-
Notifications
You must be signed in to change notification settings - Fork 122
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Migrate all logging to the tracing library #871
Conversation
This is a significant internal change to move completely away from the last of the "log" crate and move to "tracing". The major advantage of doing this, is that we can now trace functions across future boundaries using tracing's instrumentation. We will now be requiring all tokio::spawn's usage to be properly instrumented. closes TraceMachina#870
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
+@aaronmondal (this should make you happy)
+@adam-singer
fyi: @chrisstaite-menlo
Reviewable status: 0 of 2 LGTMs obtained, and pending CI: Analyze (javascript-typescript), Analyze (python), Bazel Dev / ubuntu-22.04, Cargo Dev / macos-13, Cargo Dev / ubuntu-22.04, Local / ubuntu-22.04, Publish image, Publish nativelink-worker-lre-cc, Remote / large-ubuntu-22.04, asan / ubuntu-22.04, docker-compose-compiles-nativelink (20.04), docker-compose-compiles-nativelink (22.04), integration-tests (20.04), integration-tests (22.04), macos-13, pre-commit-checks, ubuntu-20.04 / stable, ubuntu-22.04, ubuntu-22.04 / stable, vale, windows-2022 / stable, zig-cc ubuntu-20.04, zig-cc ubuntu-22.04 (waiting on @aaronmondal and @adam-singer)
a discussion (no related file):
If we keep expanding this usage, there's a good chance we can hook into .err_tip()
or something. There's a lot of challenges and potential disadvantages though.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewable status: 0 of 2 LGTMs obtained, and pending CI: Analyze (javascript-typescript), Analyze (python), Bazel Dev / ubuntu-22.04, Cargo Dev / macos-13, Cargo Dev / ubuntu-22.04, Local / ubuntu-22.04, Publish image, Publish nativelink-worker-lre-cc, Remote / large-ubuntu-22.04, asan / ubuntu-22.04, docker-compose-compiles-nativelink (20.04), docker-compose-compiles-nativelink (22.04), integration-tests (20.04), integration-tests (22.04), macos-13, pre-commit-checks, ubuntu-20.04 / stable, ubuntu-22.04, ubuntu-22.04 / stable, vale, windows-2022 / stable, zig-cc ubuntu-20.04, zig-cc ubuntu-22.04, and 1 discussions need to be resolved (waiting on @aaronmondal and @adam-singer)
nativelink-service/src/ac_server.rs
line 166 at r1 (raw file):
#[tonic::async_trait] impl ActionCache for AcServer { #[allow(clippy::blocks_in_conditions)]
blah, stupid clippy doesn't like the way this macro rewrites the code :-(
nativelink-util/src/retry.rs
line 159 at r1 (raw file):
Some(RetryResult::Retry(err)) => { if !self.should_retry(&err.code) { event!(Level::ERROR, ?attempt, ?err, "Not retrying permanent error");
note: Decided to upgrade this error to error
instead of info
..
nativelink-worker/src/running_actions_manager.rs
line 528 at r1 (raw file):
} async fn do_cleanup(
I found a bug due to the improved logging, so I figured I'd fix it. I can break this out into it's own patch, but it'd screw a bit with what I'd done already in this file.
The error was that if the future got dropped, it'd then cause drop()
to be called and we have an assert in there to ensure our action's get cleaned up. To get around this, we now emit an error, but then do cleanup in a spawn.
src/bin/nativelink.rs
line 667 at r1 (raw file):
// is ever dropped we will cleanup our data. let scope_guard = guard( Arc::downgrade(&connected_clients_mux),
fyi: Also figured, I'd fix this... We shouldn't hold a strong reference for cleanup code. No risk, as the code is written now, but good hygine.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewed 31 of 31 files at r1, all commit messages.
Reviewable status: 1 of 2 LGTMs obtained, and 1 discussions need to be resolved (waiting on @aaronmondal)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I love this! ❤️
Reviewed 31 of 31 files at r1, all commit messages.
Reviewable status: 2 of 2 LGTMs obtained, and 1 discussions need to be resolved
src/bin/nativelink.rs
line 696 at r1 (raw file):
.map_ok_or_else( |err| { use std::error::Error;
nit: Is this use correct?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewable status: complete! 2 of 2 LGTMs obtained
src/bin/nativelink.rs
line 696 at r1 (raw file):
Previously, aaronmondal (Aaron Siddhartha Mondal) wrote…
nit: Is this use correct?
Yes. This is a special case that we need to get .downcast_ref()
(or .source()
, don't remember which one) which lives on std::error::Error
and didn't want to make it visible to entire file.
This is a significant internal change to move completely away from the last of the "log" crate and move to "tracing". The major advantage of doing this, is that we can now trace functions across future boundaries using tracing's instrumentation. We will now be requiring all tokio::spawn's usage to be properly instrumented. closes TraceMachina#870
This is a significant internal change to move completely away from the last of the "log" crate and move to "tracing".
The major advantage of doing this, is that we can now trace functions across future boundaries usin tracing's instrumentation.
We will now be requiring all tokio::spawn's usage to be properly instrumented.
This change is