-
Notifications
You must be signed in to change notification settings - Fork 380
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
box: finish client fibers on shutdown #9604
Merged
Merged
Conversation
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
nshy
force-pushed
the
client-fibers-shutdown
branch
from
January 19, 2024 13:37
436da44
to
c8c9963
Compare
locker
requested changes
Jan 19, 2024
test/replication/gh-3637-misc-error-on-replica-auth-fail.result
Outdated
Show resolved
Hide resolved
I also added this hunk to fix memleak with injections. --- a/src/box/memtx_engine.cc
+++ b/src/box/memtx_engine.cc
@@ -1028,7 +1028,7 @@ checkpoint_f(va_list ap)
return -1;
}
- struct mh_i32_t *temp_space_ids = mh_i32_new();
+ struct mh_i32_t *temp_space_ids;
say_info("saving snapshot `%s'", snap->filename);
ERROR_INJECT_WHILE(ERRINJ_SNAP_WRITE_DELAY, {
@@ -1040,6 +1040,7 @@ checkpoint_f(va_list ap)
});
ERROR_INJECT(ERRINJ_SNAP_SKIP_ALL_ROWS, goto done);
struct space_read_view *space_rv;
+ temp_space_ids = mh_i32_new();
read_view_foreach_space(space_rv, &ckpt->rv) {
FiberGCChecker gc_check;
bool skip = false; |
nshy
force-pushed
the
client-fibers-shutdown
branch
from
January 24, 2024 15:09
c8c9963
to
08e6869
Compare
locker
requested changes
Jan 25, 2024
nshy
force-pushed
the
client-fibers-shutdown
branch
from
January 25, 2024 12:35
08e6869
to
df80e20
Compare
ochaplashkin
approved these changes
Jan 25, 2024
locker
approved these changes
Jan 26, 2024
In the process of graceful shutdown it is convenient to first finish all client (non system) fibers. Otherwise we should be ready for any subsystem to handle request from client fiber during or after subsystem shutdown. This would make code more complex. We first cancel client fibers and then wait for their finishing. The fiber may not respond to cancel and hang which cause shutdown hang but this is the approach we choose for iproto shutdown already. Note that as a result of this approach application will panic if it is shutdown during execution of initialization script (in particular if this script is doing box.cfg). There are changes in application/test to adopt to client fibers shutdown: - make code cancellable (only to pass existing tests, we did not investigate all the possible places that should be made such). - make console stop sending echo to client before client fibers shutdown. Otherwise as console server fiber is client one we will send message that fiber is cancelled on shutdown which breaks a lot of existing tests. This approach is on par with iproto shutdown. - some tests (7743, replication-luatest/shutdown, replication/anon, replication/force_recovery etc etc) test shutdown during execution of init script. Now panic is expected so change them accordingly. - some tests (8530, errinj_vylog) use injection that block client fiber finishing. In that tests we don't need graceful shutdown so let's just kill tarantool instead. - we change test in vinyl/errinj for tarantoolgh-3225. We don't really need to check when vinyl reader is blocked as it executes small tasks (we assume reading syscall will not hang). Also change test for vinyl dump shutdown by slowing dump down instead of blocking it entirely. This is required to finish in time client fibers in the test. - other similar changes Also we can drop code from replication shutdown which is required to handle client requests during/after shutdown. Part of tarantool#8423 NO_CHANGELOG=internal NO_DOC=internal
nshy
force-pushed
the
client-fibers-shutdown
branch
from
January 26, 2024 10:36
df80e20
to
ef70297
Compare
nshy
added a commit
to nshy/queue
that referenced
this pull request
Jan 29, 2024
We are going to finish all client (non-system) fibers in the process of Tarantool shutdown. First we cancel them and then wait for their finishing. So if the client fibers are not finished Tarantool shutdown is never finished too. Currently some of queue fibers can not be finished and we got hang on integration testing of PR [1]. Let's fix it. In `queue_state.lua` the only error that can aries on `box.ctl.wait_rw` or `box.ctl.wait_ro` is `FiberIsCancelled` so we can just drop pcall. In `utubettl.lua` we just need to fix a typo. Just as already done for fifottl in e355387 ("fix fifottl_fiber_iteration error handling") [1] PR with client fibers finishing on shutdown. tarantool/tarantool#9604
nshy
added a commit
to nshy/memcached
that referenced
this pull request
Jan 29, 2024
We are going to finish all client (non-system) fibers in the process of Tarantool shutdown. First we cancel them and then wait for their finishing. So if the client fibers are not finished Tarantool shutdown is never finished too. Currently memcached service fiber can not be finished and we got hang on integration testing of PR [1]. Let's fix it. [1] PR with client fibers finishing on shutdown. tarantool/tarantool#9604
nshy
added a commit
to nshy/memcached
that referenced
this pull request
Jan 29, 2024
We are going to finish all client (non-system) fibers in the process of Tarantool shutdown. First we cancel them and then wait for their finishing. So if the client fibers are not finished Tarantool shutdown is never finished too. Currently memcached service fiber can not be finished and we got hang on integration testing of PR [1]. Let's fix it. [1] PR with client fibers finishing on shutdown. tarantool/tarantool#9604
LeonidVas
pushed a commit
to tarantool/memcached
that referenced
this pull request
Jan 29, 2024
We are going to finish all client (non-system) fibers in the process of Tarantool shutdown. First we cancel them and then wait for their finishing. So if the client fibers are not finished Tarantool shutdown is never finished too. Currently memcached service fiber can not be finished and we got hang on integration testing of PR [1]. Let's fix it. [1] PR with client fibers finishing on shutdown. tarantool/tarantool#9604
nshy
added a commit
to nshy/queue
that referenced
this pull request
Jan 29, 2024
We are going to finish all client (non-system) fibers in the process of Tarantool shutdown. First we cancel them and then wait for their finishing. So if the client fibers are not finished Tarantool shutdown is never finished too. Currently the fiber can not be finished and we got hang in integration testing of PR tarantool/tarantool#9604.
nshy
added a commit
to nshy/queue
that referenced
this pull request
Jan 29, 2024
Same as fixing a typo in e355387 ("fix fifottl_fiber_iteration error handling") for fifottl. Required to pass integration testing of PR tarantool/tarantool#9604.
nshy
added a commit
to nshy/queue
that referenced
this pull request
Jan 29, 2024
We are going to finish all client (non-system) fibers in the process of Tarantool shutdown. First we cancel them and then wait for their finishing. So if the client fibers are not finished Tarantool shutdown is never finished too. Currently the fiber can not be finished and we got hang in integration testing of PR tarantool/tarantool#9604.
nshy
added a commit
to nshy/queue
that referenced
this pull request
Jan 29, 2024
Same as fixing a typo in e355387 ("fix fifottl_fiber_iteration error handling") for fifottl. Required to pass integration testing of PR tarantool/tarantool#9604.
LeonidVas
pushed a commit
to tarantool/queue
that referenced
this pull request
Jan 29, 2024
We are going to finish all client (non-system) fibers in the process of Tarantool shutdown. First we cancel them and then wait for their finishing. So if the client fibers are not finished Tarantool shutdown is never finished too. Currently the fiber can not be finished and we got hang in integration testing of PR tarantool/tarantool#9604.
LeonidVas
pushed a commit
to tarantool/queue
that referenced
this pull request
Jan 29, 2024
Same as fixing a typo in e355387 ("fix fifottl_fiber_iteration error handling") for fifottl. Required to pass integration testing of PR tarantool/tarantool#9604.
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
In the process of graceful shutdown it is convenient to first finish all client (non system) fibers. Otherwise we should be ready for any subsystem to handle request from client fiber during or after subsystem shutdown. This would make code more complex.
We first cancel client fibers and then wait for their finishing. The fiber may not respond to cancel and hang which cause shutdown hang but this is the approach we choose for iproto shutdown already.
Note that as a result of this approach application will panic if it is shutdown during execution of initialization script (in particular if this script is doing box.cfg).
There are changes in application/test to adopt to client fibers shutdown:
make code cancellable (only to pass existing tests, we did not investigate all the possible places that should be made such).
make console stop sending echo to client before client fibers shutdown. Otherwise as console server fiber is client one we will send message that fiber is cancelled on shutdown which breaks a lot of existing tests. This approach is on par with iproto shutdown.
some tests (7743, replication-luatest/shutdown, replication/anon, replication/force_recovery etc etc) test shutdown during execution of init script. Now panic is expected so change them accordingly.
some tests (8530, 5430, errinj_vylog) use injection that block client fiber finishing. In that tests we don't need graceful shutdown so let's just kill tarantool instead.
we change test in vinyl/errinj for
gh-3225
. We don't really need to check when vinyl reader is blocked as it executes small tasks (we assume reading syscall will not hang). Also change test for vinyl dump shutdown by slowing dump down instead of blocking it entirely. This is required to finish in time client fibers in the test.other similar changes
Also we can drop code from replication shutdown which is required to handle client requests during/after shutdown.
Part of ##8423