Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Migrate all logging to the tracing library #871

Merged
merged 1 commit into from
Apr 22, 2024

Conversation

allada
Copy link
Member

@allada allada commented Apr 22, 2024

This is a significant internal change to move completely away from the last of the "log" crate and move to "tracing".

The major advantage of doing this, is that we can now trace functions across future boundaries usin tracing's instrumentation.

We will now be requiring all tokio::spawn's usage to be properly instrumented.


This change is Reviewable

This is a significant internal change to move completely away
from the last of the "log" crate and move to "tracing".

The major advantage of doing this, is that we can now trace
functions across future boundaries using tracing's instrumentation.

We will now be requiring all tokio::spawn's usage to be
properly instrumented.

closes TraceMachina#870
Copy link
Member Author

@allada allada left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

+@aaronmondal (this should make you happy)
+@adam-singer

fyi: @chrisstaite-menlo

Reviewable status: 0 of 2 LGTMs obtained, and pending CI: Analyze (javascript-typescript), Analyze (python), Bazel Dev / ubuntu-22.04, Cargo Dev / macos-13, Cargo Dev / ubuntu-22.04, Local / ubuntu-22.04, Publish image, Publish nativelink-worker-lre-cc, Remote / large-ubuntu-22.04, asan / ubuntu-22.04, docker-compose-compiles-nativelink (20.04), docker-compose-compiles-nativelink (22.04), integration-tests (20.04), integration-tests (22.04), macos-13, pre-commit-checks, ubuntu-20.04 / stable, ubuntu-22.04, ubuntu-22.04 / stable, vale, windows-2022 / stable, zig-cc ubuntu-20.04, zig-cc ubuntu-22.04 (waiting on @aaronmondal and @adam-singer)

a discussion (no related file):
If we keep expanding this usage, there's a good chance we can hook into .err_tip() or something. There's a lot of challenges and potential disadvantages though.


Copy link
Member Author

@allada allada left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: 0 of 2 LGTMs obtained, and pending CI: Analyze (javascript-typescript), Analyze (python), Bazel Dev / ubuntu-22.04, Cargo Dev / macos-13, Cargo Dev / ubuntu-22.04, Local / ubuntu-22.04, Publish image, Publish nativelink-worker-lre-cc, Remote / large-ubuntu-22.04, asan / ubuntu-22.04, docker-compose-compiles-nativelink (20.04), docker-compose-compiles-nativelink (22.04), integration-tests (20.04), integration-tests (22.04), macos-13, pre-commit-checks, ubuntu-20.04 / stable, ubuntu-22.04, ubuntu-22.04 / stable, vale, windows-2022 / stable, zig-cc ubuntu-20.04, zig-cc ubuntu-22.04, and 1 discussions need to be resolved (waiting on @aaronmondal and @adam-singer)


nativelink-service/src/ac_server.rs line 166 at r1 (raw file):

#[tonic::async_trait]
impl ActionCache for AcServer {
    #[allow(clippy::blocks_in_conditions)]

blah, stupid clippy doesn't like the way this macro rewrites the code :-(


nativelink-util/src/retry.rs line 159 at r1 (raw file):

                    Some(RetryResult::Retry(err)) => {
                        if !self.should_retry(&err.code) {
                            event!(Level::ERROR, ?attempt, ?err, "Not retrying permanent error");

note: Decided to upgrade this error to error instead of info..


nativelink-worker/src/running_actions_manager.rs line 528 at r1 (raw file):

}

async fn do_cleanup(

I found a bug due to the improved logging, so I figured I'd fix it. I can break this out into it's own patch, but it'd screw a bit with what I'd done already in this file.

The error was that if the future got dropped, it'd then cause drop() to be called and we have an assert in there to ensure our action's get cleaned up. To get around this, we now emit an error, but then do cleanup in a spawn.


src/bin/nativelink.rs line 667 at r1 (raw file):

                // is ever dropped we will cleanup our data.
                let scope_guard = guard(
                    Arc::downgrade(&connected_clients_mux),

fyi: Also figured, I'd fix this... We shouldn't hold a strong reference for cleanup code. No risk, as the code is written now, but good hygine.

Copy link
Contributor

@adam-singer adam-singer left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

:lgtm:

Reviewed 31 of 31 files at r1, all commit messages.
Reviewable status: 1 of 2 LGTMs obtained, and 1 discussions need to be resolved (waiting on @aaronmondal)

Copy link
Member

@aaronmondal aaronmondal left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I love this! ❤️

:lgtm_strong:

Reviewed 31 of 31 files at r1, all commit messages.
Reviewable status: 2 of 2 LGTMs obtained, and 1 discussions need to be resolved


src/bin/nativelink.rs line 696 at r1 (raw file):

                .map_ok_or_else(
                    |err| {
                        use std::error::Error;

nit: Is this use correct?

Copy link
Member Author

@allada allada left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: :shipit: complete! 2 of 2 LGTMs obtained


src/bin/nativelink.rs line 696 at r1 (raw file):

Previously, aaronmondal (Aaron Siddhartha Mondal) wrote…

nit: Is this use correct?

Yes. This is a special case that we need to get .downcast_ref() (or .source(), don't remember which one) which lives on std::error::Error and didn't want to make it visible to entire file.

@allada allada merged commit 523ee33 into TraceMachina:main Apr 22, 2024
26 checks passed
chinchaun pushed a commit to chinchaun/nativelink that referenced this pull request May 6, 2024
This is a significant internal change to move completely away
from the last of the "log" crate and move to "tracing".

The major advantage of doing this, is that we can now trace
functions across future boundaries using tracing's instrumentation.

We will now be requiring all tokio::spawn's usage to be
properly instrumented.

closes TraceMachina#870
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants