Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Handle ctrl+c to gracefully shutdown the server(s) #1613

Merged
merged 13 commits into from
Mar 21, 2023

Conversation

Wumpf
Copy link
Member

@Wumpf Wumpf commented Mar 20, 2023

Fixes

Tried to use tokio's ctrl-c detection but ran into issues with it, so used the ctrlc crate we already depend on and got it working nicely with that.

Tested ctrl'c'ing with:

Serving

  • cargo run -p rerun --features web_viewer -- ../api_demo.rrd --web-viewer
  • cargo run -p rerun --features web_viewer -- --web-viewer
  • cargo run -p re_web_viewer_server
  • cargo run -p api_demo -- --serve
  • python ./examples/python/api_demo/main.py --serve
  • python3 -m rerun --web-viewer ../objectron.rrd

Not serving, just to see if something else broke

  • python ./examples/python/tracking_hf_opencv/main.py
  • cargo run -p api_demo

Checklist

@Wumpf Wumpf added the 🪳 bug Something isn't working label Mar 20, 2023
@emilk
Copy link
Member

emilk commented Mar 20, 2023

I tested python3 -m rerun --web-viewer ../objectron.rrd, and that too seems to work

crates/re_web_viewer_server/src/main.rs Outdated Show resolved Hide resolved
crates/rerun/src/run.rs Outdated Show resolved Hide resolved
crates/rerun/src/run.rs Outdated Show resolved Hide resolved
crates/rerun/src/web_viewer.rs Outdated Show resolved Hide resolved
crates/rerun/src/web_viewer.rs Show resolved Hide resolved
crates/rerun/src/web_viewer.rs Outdated Show resolved Hide resolved

let web_server_join_handle = tokio_rt.spawn(async move {
// This is the server which the web viewer will talk to:
let ws_server = re_ws_comms::Server::new(re_ws_comms::DEFAULT_WS_SERVER_PORT)
.await
.unwrap();
let ws_server_handle = tokio::spawn(ws_server.listen(rerun_rx)); // TODO(emilk): use tokio_rt ?
let ws_server_handle = tokio::spawn(ws_server.listen(rerun_rx, shutdown_rx_ws_server)); // TODO(emilk): use tokio_rt ?
Copy link
Member

@emilk emilk Mar 20, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we really need this channel for the ws server? Doesn't calling self.web_server_join_handle.abort() (as we do in drop) already work?

That's one of the advantages of async tasks in general: they are easy to cancel (just drop their handles).

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hm. I could try ripping out the channel again from both servers and go with abort. Wonder if that is the same really as the shutdown I'm doing now.
I'd suggest not going there and unify as suggested above by using shutdown_txon Drop

Copy link
Member Author

@Wumpf Wumpf Mar 20, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

dropping a handle does not cancel a task though, one needs to call abort on it explicitly? (only found this old thread where that decision was made, maybe has changed since then tokio-rs/tokio#1830 (comment))
Also do all tasks implement proper abort? The channel setup here follows tokio's recommendation on how to deal with ctrl+c

Copy link
Member

@emilk emilk Mar 20, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I thought abort would work on any async task, since it simply stops the future from being polled. I don't understand why tokio would recommend something else.

Where is this recommendation?

Copy link
Member Author

@Wumpf Wumpf Mar 20, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

https://tokio.rs/tokio/topics/shutdown

let next_frame = tokio::select! {
    res = self.connection.read_frame() => res?,
    _ = self.shutdown.recv() => {
        // If a shutdown signal is received, return from `run`.
        // This will result in the task terminating.
        return Ok(());
    }
};

Maybe this is a bit too much out of context we're dealing with more specific tasks after all. Furthermore, most of this does boil down to an abort I reckon. The reason I ended up here was not exhaustive research and consideration of what we have, but rather seeing that example and going with it, discovering stuff in the process.
That said, I'm actually quite happy with this solution here since it gives us a single super easy entry point for signaling shut down which also makes it easy to work with hyper's with_graceful_shutdown

rerun_py/rerun_sdk/rerun/script_helpers.py Show resolved Hide resolved
@Wumpf
Copy link
Member Author

Wumpf commented Mar 20, 2023

cargo run -p rerun --features web_viewer -- --web-viewer does not work correctly

@Wumpf
Copy link
Member Author

Wumpf commented Mar 20, 2023

oof this has bigger ramifications.

@Wumpf Wumpf added the do-not-merge Do not merge this PR label Mar 20, 2023
…l shutdown mechanism

Had to move setup_ctrl_c_handler back to `rerun` because of this
@Wumpf Wumpf added 💣 crash crash, deadlock/freeze, do-no-start and removed do-not-merge Do not merge this PR labels Mar 20, 2023
@Wumpf
Copy link
Member Author

Wumpf commented Mar 20, 2023

works now. Tested all serve usecases again (added to the list)

@Wumpf
Copy link
Member Author

Wumpf commented Mar 20, 2023

more issues! I didn't try cargo run -p api_demo -- --serve before (which runs into the remaining sleep in clap.rs). Cancelling works but it doesn't serve properly

@emilk emilk marked this pull request as draft March 20, 2023 15:10
Wumpf added 3 commits March 20, 2023 16:23
…n't create a new runtime if there is one already. Gracefully wait on ctrl+c instead of sleeping for `RerunArgs` entrypoint
@Wumpf Wumpf marked this pull request as ready for review March 20, 2023 16:07
@Wumpf
Copy link
Member Author

Wumpf commented Mar 20, 2023

that one is passing now as well! Changes got a lot bigger, but I do think it's better - admittedly still learning all these tokio things though; got quite far 😄

@Wumpf Wumpf requested a review from emilk March 20, 2023 17:05
Copy link
Member

@emilk emilk left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks great 👏👏👏

crates/re_ws_comms/src/server.rs Outdated Show resolved Hide resolved
crates/rerun/src/clap.rs Outdated Show resolved Hide resolved
@@ -74,6 +74,18 @@ impl RerunArgs {
default_enabled: bool,
run: impl FnOnce(Session) + Send + 'static,
) -> anyhow::Result<()> {
// Ensure we have a running tokio runtime.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

very nice

crates/rerun/src/web_viewer.rs Outdated Show resolved Hide resolved
@emilk emilk merged commit d666c94 into main Mar 21, 2023
@emilk emilk deleted the andreas/fix-crash-on-serve-exit branch March 21, 2023 07:40
@emilk
Copy link
Member

emilk commented Mar 21, 2023

Found a problem after merge:

cargo r -p rerun -- ../nyud.rrd causes this:

[2023-03-21T08:15:56Z INFO  rerun::run] Loading "../nyud.rrd"…

thread 'main' panicked at 'Error setting Ctrl-C handler: MultipleHandlers', re_viewer/src/app.rs:131

   7: core::panicking::panic_fmt
             at core/src/panicking.rs:64:14
   8: core::result::unwrap_failed
             at core/src/result.rs:1791:5
   9: core::result::Result<T,E>::expect
             at core/src/result.rs:1070:23
      re_viewer::app::App::from_receiver
             at re_viewer/src/app.rs:126:13
  10: rerun::run::run_impl::{{closure}}::{{closure}}
             at rerun/src/run.rs:357:27
      core::ops::function::FnOnce::call_once{{vtable.shim}}
             at core/src/ops/function.rs:507:5
  11: <alloc::boxed::Box<F,A> as core::ops::function::FnOnce<Args>>::call_once
             at alloc/src/boxed.rs:2000:9

Troubleshooting Rerun: https://www.rerun.io/docs/getting-started/troubleshooting
libc++abi: terminating with uncaught foreign exception


Rerun caught a signal: SIGABRT

Troubleshooting Rerun: https://www.rerun.io/docs/getting-started/troubleshooting

      rerun::crash_handler::install_signal_handler::signal_handler
             at rerun/src/crash_handler.rs:163:25
   3: _OSAtomicTestAndClearBarrier
   4: __pthread_atfork_prepare_handlers
   5: <unknown>
   6: <unknown>
   7: <unknown>
   8: <unknown>
   9: <unknown>
  10: <unknown>
  11: <unknown>
  12: <unknown>
  13: <unknown>

Rerun caught a signal: SIGABRT
Troubleshooting Rerun: https://www.rerun.io/docs/getting-started/troubleshooting

zsh: abort      cargo r -p rerun -- ../nyud.rrd

Reverting in #1632

@emilk
Copy link
Member

emilk commented Mar 21, 2023

I think the solution to the above problem would be to hand down shutdown_rx to crates/re_viewer/src/app.rs, and have that spawn a task to listen for the shutdown signal

…though I am not a big fan of how tokio is infecting every part of the codebase :/

emilk added a commit that referenced this pull request Mar 21, 2023
Wumpf added a commit that referenced this pull request Mar 21, 2023
emilk pushed a commit that referenced this pull request Mar 21, 2023
* Revert "Revert "Handle ctrl+c to gracefully shutdown the server(s) (#1613)" (#1632)"

This reverts commit 93ab88b.

* pass shutdown bool to app instead of installing another ctrl+c handler
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
🪳 bug Something isn't working 💣 crash crash, deadlock/freeze, do-no-start
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants