feat: add tracing to worker and proxy #1014

Open
Wants to merge 1 commit into next

SantiagoPittella
Collaborator

@SantiagoPittella SantiagoPittella commented Dec 11, 2024

This PR is part of #1004.

@SantiagoPittella SantiagoPittella force-pushed the santiagopittella-add-tracing-to-worker-proxy branch 2 times, most recently from a06eda5 to d68170d on December 12, 2024 19:13
@bobbinth
Contributor

What is left to do on this PR? I would probably try to finish this one first, then address #1008, and only after that try to tackle metrics.

@SantiagoPittella
Collaborator Author

> What is left to do on this PR?

This PR still needs some cleanup, a few configuration options, and documentation.

> I would probably try to finish this one first, then address #1008, and only after that try to tackle metrics.

Ok! Sounds good.

@SantiagoPittella SantiagoPittella force-pushed the santiagopittella-add-tracing-to-worker-proxy branch from 9ba6f50 to e09374f on December 13, 2024 16:55
@SantiagoPittella SantiagoPittella changed the title wip: add tracing to worker and proxy feat: add tracing to worker and proxy Dec 13, 2024
@SantiagoPittella SantiagoPittella marked this pull request as ready for review December 13, 2024 16:55
Contributor

@bobbinth bobbinth left a comment

Looks good! Thank you! I left some comments inline, mostly doc-related, but it would be good for @igamigo and @Mirko-von-Leipzig to take a look as well.

Comment on lines +346 to 347
#[derive(Debug)]
pub struct LoadBalancer(pub Arc<LoadBalancerState>);
Contributor

Not from this PR, but we should update the section header on line 338 above.

Comment on lines +555 to +556
// The following methods are a copy of the default implementation defined in the trait, but
// with tracing instrumentation.
Contributor

Could we add a brief explanation of why we need these methods implemented?
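One likely explanation worth writing down: in Rust an overriding impl has no `super`-style way to call the trait's default body, so adding instrumentation around a default method means copying its logic into the override. A minimal std-only sketch of that pattern follows; the trait, method, and `Vec<String>` "span log" are hypothetical stand-ins, not Pingora's actual API.

```rust
/// Hypothetical trait with a default method (stands in for a library trait).
trait ProxyHooks {
    /// Default behaviour supplied by the trait author.
    fn on_upstream_connected(&self, log: &mut Vec<String>, peer: &str) {
        log.push(format!("connected to {peer}"));
    }
}

/// Hypothetical proxy that wants the default behaviour plus span markers.
struct TracedProxy;

impl ProxyHooks for TracedProxy {
    // An override cannot delegate to the trait's default body, so the default
    // logic is copied here and wrapped with enter/exit markers.
    fn on_upstream_connected(&self, log: &mut Vec<String>, peer: &str) {
        log.push("span:enter on_upstream_connected".to_string());
        log.push(format!("connected to {peer}"));
        log.push("span:exit on_upstream_connected".to_string());
    }
}
```

If the trait's author exposed the default logic as a free function, the override could call that instead of copying it; that is not assumed to be the case here.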

worker: None,
parent_span: info_span!("proxy:new_request", request_id = request_id.to_string()),
Contributor

Question: why don't we need to specify target both here and in other places in this file?
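For context on the question: `tracing` macros that are not given `target:` fall back to the module path of the call site (`module_path!()`), so an explicit target only matters when filters should match a custom name such as `TRACING_TARGET_NAME`. The std-only sketch below mimics that fallback with a hypothetical `span_target!` macro; it is not part of `tracing`.

```rust
/// Hypothetical macro mimicking how `tracing` picks a target: an explicit
/// `target: ...` wins, otherwise the calling module's path is used.
macro_rules! span_target {
    (target: $t:expr) => {
        $t
    };
    () => {
        module_path!()
    };
}

/// Same name and value as the constant in this PR.
const TRACING_TARGET_NAME: &str = "miden-tx-prover";

fn explicit_target() -> &'static str {
    span_target!(target: TRACING_TARGET_NAME)
}

fn default_target() -> &'static str {
    // Resolves to whatever module this function lives in.
    span_target!()
}
```

So spans created without `target:` still have a target; it just won't match a filter written against `miden-tx-prover` unless the module path happens to.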

Comment on lines +643 to +665
let server_session = session.as_mut();
let code = match e.etype() {
    HTTPStatus(code) => *code,
    _ => {
        match e.esource() {
            ErrorSource::Upstream => 502,
            ErrorSource::Downstream => {
                match e.etype() {
                    WriteError | ReadError | ConnectionClosed => {
                        /* conn already dead */
                        0
                    },
                    _ => 400,
                }
            },
            ErrorSource::Internal | ErrorSource::Unset => 500,
        }
    },
};
if code > 0 {
    server_session.respond_error(code).await
}
code
Contributor

I'm assuming that this is just a copy of a default implementation, right?
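Whether or not this is a verbatim copy of the default, the status-code selection in the excerpt is a pure decision that could be pulled out and unit-tested. A std-only sketch under that assumption; `ErrorType` and `ErrorSource` here are hypothetical local stand-ins modelling only the variants the excerpt uses, not Pingora's types.

```rust
/// Hypothetical stand-in for the error type; only variants used above.
#[derive(Debug, Clone, Copy, PartialEq)]
enum ErrorType {
    HttpStatus(u16),
    WriteError,
    ReadError,
    ConnectionClosed,
    Other,
}

/// Hypothetical stand-in for where the error originated.
#[derive(Debug, Clone, Copy, PartialEq)]
enum ErrorSource {
    Upstream,
    Downstream,
    Internal,
    Unset,
}

/// Returns the status code to send downstream, or 0 when the downstream
/// connection is already dead and no response should be written.
fn status_code_for(etype: ErrorType, esource: ErrorSource) -> u16 {
    match etype {
        ErrorType::HttpStatus(code) => code,
        _ => match esource {
            ErrorSource::Upstream => 502,
            ErrorSource::Downstream => match etype {
                ErrorType::WriteError | ErrorType::ReadError | ErrorType::ConnectionClosed => 0,
                _ => 400,
            },
            ErrorSource::Internal | ErrorSource::Unset => 500,
        },
    }
}
```

The async method would then only call this helper and decide whether to `respond_error`, which keeps the tracing-instrumented copy as small as possible.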

Comment on lines +24 to +25
// Construct TracerProvider for OpenTelemetryLayer
pub(crate) fn init_tracer_provider() -> TracerProvider {
Contributor

nit: let's use /// for doc comments (here and in other places in this file).

Also, could we add more details about how we configure the tracing provider? For example, why do we need to add an ID generator, what does with_sampler() do, etc.?
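As a starting point for those docs: a sampler decides, per trace, whether spans are recorded and exported, and the ID generator produces the trace IDs that a ratio-based sampler reads. The std-only sketch below is a simplified, hypothetical model loosely mirroring the SDK's `AlwaysOn`, `AlwaysOff`, and `TraceIdRatioBased` options; it is not the `opentelemetry_sdk` API, and the bit arithmetic is an assumption chosen for illustration.

```rust
/// Simplified, hypothetical model of what a configured sampler decides.
#[derive(Debug, Clone, Copy)]
enum Sampler {
    AlwaysOn,
    AlwaysOff,
    /// Keep roughly this fraction of traces. The decision depends only on the
    /// trace ID, so every service reaches the same verdict for one trace.
    TraceIdRatio(f64),
}

impl Sampler {
    fn should_sample(&self, trace_id: u128) -> bool {
        match *self {
            Sampler::AlwaysOn => true,
            Sampler::AlwaysOff => false,
            Sampler::TraceIdRatio(ratio) => {
                // Scale the ratio onto [0, 2^63] and compare it with 63 bits
                // taken from the low half of the trace ID.
                let bound = (ratio.clamp(0.0, 1.0) * (1u64 << 63) as f64) as u64;
                let low_bits = (trace_id as u64) >> 1;
                low_bits < bound
            }
        }
    }
}
```

The ratio only holds if trace IDs are uniformly random, which is one reason the ID generator is worth documenting alongside the sampler.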

Comment on lines +55 to +56
// Setup tracing subscriber
pub(crate) fn setup_tracing(provider: TracerProvider) -> Result<(), String> {
Contributor

Similar to the previous comment - could we add more details about what this function does?

Also, what's the motivation for having this and init_tracer_provider() as two separate functions? It seems like one is called right after the other. Should we combine them into one function?
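A combined function could look roughly like the std-only sketch below. `TracerProvider` and `GLOBAL_SUBSCRIBER` are hypothetical stand-ins; the `OnceLock` models the fact that a process-wide default subscriber (e.g. via `tracing::subscriber::set_global_default`) can only be installed once, which is where an error like the existing `Result<(), String>` naturally comes from.

```rust
use std::sync::OnceLock;

/// Hypothetical stand-in for the configured tracer provider.
#[derive(Debug)]
struct TracerProvider {
    service_name: &'static str,
}

/// Stand-in for the process-wide subscriber slot: it can be set only once.
static GLOBAL_SUBSCRIBER: OnceLock<TracerProvider> = OnceLock::new();

/// Builds the provider and installs it in one call, so callers cannot forget
/// the second step or run the two steps out of order.
fn setup_tracing(service_name: &'static str) -> Result<(), String> {
    let provider = TracerProvider { service_name };
    GLOBAL_SUBSCRIBER.set(provider).map_err(|rejected| {
        format!(
            "tracing already initialised; ignored provider for {}",
            rejected.service_name
        )
    })
}
```

Keeping two functions would mainly pay off if the provider needs to be used or shut down separately from the subscriber; if not, a single entry point is simpler.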

use tracing::Level;
use tracing_subscriber::{layer::SubscriberExt, Registry};

pub const TRACING_TARGET_NAME: &str = "miden-tx-prover";
Contributor

nit: I would maybe still call it MIDEN_TX_PROVER.

Comment on lines +45 to +46
opentelemetry-semantic-conventions = "0.27.0"
opentelemetry-jaeger = "0.22.0"
Contributor

nit: I would get rid of the patch versions.

@@ -114,6 +114,16 @@ The proxy service uses this health check to determine if a worker is available t

Both the worker and the proxy will use the `info` log level by default, but it can be changed by setting the `RUST_LOG` environment variable.

## Traces
Contributor

I would maybe combine this and logging into one section (or make this a sub-section of logging?).

Also, I would add more details here. For example, it is not clear where tracing/logging info is written to. Is it stdout? Is it some logging file? Somewhere else? It is also not clear whether there is a way to not use Jaeger (or maybe use something else) to view the logs.

Basically, a bit more context about how tracing/logging works would be helpful.
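One piece of that context the README already states is the level rule: `info` by default, overridable via `RUST_LOG`. In the service this is presumably handled by `tracing_subscriber`'s `EnvFilter`; the std-only sketch below only models the fallback rule so the docs can describe it precisely. The function names are hypothetical.

```rust
use std::env;

/// Default directive when `RUST_LOG` is unset, matching the README's `info`.
const DEFAULT_FILTER: &str = "info";

/// Uses `RUST_LOG` when it is set to a non-blank value, else the default.
fn filter_directive(rust_log: Option<String>) -> String {
    match rust_log {
        Some(value) if !value.trim().is_empty() => value,
        _ => DEFAULT_FILTER.to_string(),
    }
}

/// Reads the directive from the actual process environment.
fn filter_from_env() -> String {
    filter_directive(env::var("RUST_LOG").ok())
}
```

For example, running the worker with `RUST_LOG=debug` would select `debug`, while leaving it unset selects `info`.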

The service uses the `tracing` crate for structured logging and tracing. Traces are enabled by default and use OpenTelemetry to export traces to a Jaeger instance. The traces can be visualized in the Jaeger UI, which can be started by running:

```bash
docker run -d -p4317:4317 -p16686:16686 jaegertracing/all-in-one:latest
```
Contributor

This assumes that Docker is installed on the machine, right? If so, I would mention this.

Also, are there alternative ways to do this? We don't need to describe them but if there is a link to how to do it w/o Docker, I'd include it.
