-
Notifications
You must be signed in to change notification settings - Fork 16
safe/fast mode: remove fiber kill #475
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: master
Are you sure you want to change the base?
Conversation
7a3db14 to
9a3bb6a
Compare
| local function call_on_storage(run_as_user, func_name, ...) | ||
| return box.session.su(run_as_user, call_cache.func_name_to_func(func_name), ...) | ||
| end |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Сorrect that fast/safe mode semantics are now fully encapsulated in bucket_ref_unref (and related sharding helpers), and that no other side effects at the call layer are required anymore? If so, this would be good to explicitly state somewhere (comment or docs), since this is a noticeable behavioral change compared to previous versions.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
These are internal implementation details, and users do not need to be aware of them.
The API with subscriptions to rebalancing events remains unchanged, so I do not see a reason to explicitly document that some internal code was removed.
The fibers kill removal is already reflected in the changelog.
crud/common/yield_checks.lua
Outdated
| local info_prev = registry[info_curr.fid] | ||
| assert(info_curr.csw == info_prev.csw, "yield happened during fiber execution") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
check_no_yields() assumes the fiber was previously registered. For better, it might be worth adding an explicit assert(info_prev ~= nil, "fiber is not registered") before accessing info_prev.csw.
crud/common/yield_checks.lua
Outdated
| local fid = fiber.self():id() | ||
| assert(registry[fid] ~= nil, "fiber is not registered") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
register_fiber() uses info.fid as the registry key, while unregister_fiber() uses fiber.self():id(). suggest using either fiber.self():id() everywhere, or fiber.self():info().fid everywhere, to avoid surprises.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
They are the same value. In unregister_fiber needed only id, so fiber.self():id() is used. In register_fiber, check_no_yields :info used, so no need to call fiber.self():id(), when fiber id is stored in info.
crud/delete.lua
Outdated
| local finish = yield_checks.start() | ||
|
|
||
| opts = opts or {} | ||
|
|
||
| local space = box.space[space_name] | ||
| if space == nil then | ||
| return nil, DeleteError:new("Space %q doesn't exist", space_name) | ||
| return finish(nil, DeleteError:new("Space %q doesn't exist", space_name)) | ||
| end | ||
|
|
||
| local _, err = sharding.check_sharding_hash(space_name, | ||
| opts.sharding_func_hash, | ||
| opts.sharding_key_hash, | ||
| opts.skip_sharding_hash_check) | ||
|
|
||
| if err ~= nil then | ||
| return nil, err | ||
| return finish(nil, err) | ||
| end | ||
|
|
||
| local ref_ok, bucket_ref_err, unref = bucket_ref_unref.bucket_refrw(opts.bucket_id) | ||
| if not ref_ok then | ||
| return nil, bucket_ref_err | ||
| return finish(nil, bucket_ref_err) | ||
| end | ||
|
|
||
| yield_checks.check_no_yields() | ||
| -- add_space_schema_hash is false because | ||
| -- reloading space format on router can't avoid delete error on storage | ||
| local result = schema.wrap_func_result(space, space.delete, { | ||
| add_space_schema_hash = false, | ||
| field_names = field_names, | ||
| noreturn = opts.noreturn, | ||
| fetch_latest_metadata = opts.fetch_latest_metadata, | ||
| }, space, key) | ||
|
|
||
| finish() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
if an unexpected Lua error is thrown between yield_checks.start() and finish() (for example, inside schema.wrap_func_result()), the fiber would remain registered in yield_checks.registry. could this potentially affect subsequent tests and lead to confusing errors?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It would affect tests that run on the same cluster. However, there are usually not many tests sharing a single cluster, so it shouldn’t cause major issues.
e0c1ca8 to
b3e1701
Compare
In fast mode, every fiber was named “fast” so it could be killed when safe mode was enabled. However, crud-storage methods do not yield until a space operation is performed. The rebalancing trigger is executed on insert/replace into the _bucket space; this operation does yield and switches the implementation of the bucket_ref/bucket_unref functions. Therefore, once a fiber starts in fast mode, it remains in fast mode until the first box.space operation: a write for memtx or a read/write for vinyl. As a result, there is no need to mark and kill iproto fibers that handle fast requests, because there is no situation in which a fiber remains in fast mode while rebalancing is in progress. Yield checks were added to the tests (yield_checks) to ensure that no yields occur during request until box operations.
In fast mode, every fiber was named "fast" so it could be killed when safe mode was enabled.
However, CRUD storage methods do not yield until a space operation is performed.
The rebalancing trigger is executed on
insert/replaceinto the_bucketspace; this operation does yield and switches the implementation of thebucket_ref/bucket_unreffunctions.Therefore, once a fiber starts in fast mode, it remains in fast mode until the first
box.spaceoperation:As a result, there is no need to mark and kill iproto fibers that handle fast requests, because there is no situation in which a fiber remains in fast mode while rebalancing is in progress.
Yield checks were added to the tests (
yield_checks) to ensure that no yields occur during these operations.I didn't forget about
Closes #473