feat: add tracing to worker and proxy #1014
base: next
Updated from `a06eda5` to `d68170d`.
What is left to do on this PR? I would probably try to finish this one first, then address #1008, and only after that try to tackle metrics.
This PR is still missing some cleanup, a few configuration options, and documentation.
Ok! Sounds good.
Updated from `9ba6f50` to `e09374f`.
Looks good! Thank you! I left some comments inline, mostly doc-related, but it would be good for @igamigo and @Mirko-von-Leipzig to take a look as well.
```rust
#[derive(Debug)]
pub struct LoadBalancer(pub Arc<LoadBalancerState>);
```
Not from this PR, but we should update the section header on line 338 above.
```rust
// The following methods are a copy of the default implementation defined in the trait, but
// with tracing instrumentation.
```
Could we add a brief explanation of why we need these methods implemented?
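One plausible explanation worth capturing in that comment: Rust offers no way to call a trait's default method body from inside an override, so instrumenting a default method means copying its body into the impl and adding the instrumentation around it. A minimal std-only sketch of the pattern; `Handler`, `Plain`, and `Instrumented` are hypothetical names, and the `events` log stands in for `tracing` spans.

```rust
use std::cell::RefCell;

/// Hypothetical trait with a default method, standing in for a trait
/// whose default methods the PR re-implements.
trait Handler {
    fn handle(&self, input: u32) -> u32 {
        // Default behaviour that any override must preserve.
        input * 2
    }
}

/// Implementor that keeps the default behaviour untouched.
struct Plain;
impl Handler for Plain {}

/// Implementor that re-implements the default body with instrumentation;
/// the `events` log stands in for `tracing` spans.
struct Instrumented {
    events: RefCell<Vec<String>>,
}

impl Handler for Instrumented {
    fn handle(&self, input: u32) -> u32 {
        // The default body cannot be invoked from inside this override,
        // so it is copied here and wrapped in enter/exit records.
        self.events.borrow_mut().push(format!("enter handle(input={input})"));
        let out = input * 2; // copied from the default implementation
        self.events.borrow_mut().push(format!("exit handle -> {out}"));
        out
    }
}
```

The cost of this approach is that the copied body must be kept in sync with the upstream default by hand, which is probably the main thing the explanatory comment should warn about.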
```rust
worker: None,
parent_span: info_span!("proxy:new_request", request_id = request_id.to_string()),
```
Question: why don't we need to specify `target` here and in other places in this file?
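A likely answer, assuming `info_span!` is the `tracing` macro: when no `target: "..."` argument is given, `tracing` uses the module path of the call site as the target. The span is therefore still valid, but it is tagged with the module path rather than `TRACING_TARGET_NAME`. The selection rule is sketched below with a hypothetical std-only macro, `span_target!`, that returns the `(target, name)` pair it would use.

```rust
/// Hypothetical macro mimicking how `tracing`'s span macros choose a target:
/// an explicit `target: ...` argument wins, otherwise `module_path!()` is used.
macro_rules! span_target {
    (target: $target:expr, $name:expr) => {
        ($target, $name)
    };
    ($name:expr) => {
        (module_path!(), $name)
    };
}
```

If filtering by a single crate-wide name matters (for example via `RUST_LOG`), the explicit form `info_span!(target: TRACING_TARGET_NAME, "proxy:new_request", ...)` would be needed at each call site.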
```rust
let server_session = session.as_mut();
let code = match e.etype() {
    HTTPStatus(code) => *code,
    _ => {
        match e.esource() {
            ErrorSource::Upstream => 502,
            ErrorSource::Downstream => {
                match e.etype() {
                    WriteError | ReadError | ConnectionClosed => {
                        /* conn already dead */
                        0
                    },
                    _ => 400,
                }
            },
            ErrorSource::Internal | ErrorSource::Unset => 500,
        }
    },
};
if code > 0 {
    server_session.respond_error(code).await
}
code
```
I'm assuming that this is just a copy of a default implementation, right?
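To make the status-code logic in the snippet above easier to review, here is the same decision tree as a pure function. The enums are simplified hypothetical stand-ins (not Pingora's actual types), and the async `respond_error` call is left out; a return value of `0` means "do not respond".

```rust
/// Hypothetical stand-in for the error type; only the cases used above.
#[allow(dead_code)]
#[derive(Debug, Clone, Copy, PartialEq)]
enum ErrorType {
    HttpStatus(u16),
    WriteError,
    ReadError,
    ConnectionClosed,
    Other,
}

/// Hypothetical stand-in for where the error originated.
#[allow(dead_code)]
#[derive(Debug, Clone, Copy, PartialEq)]
enum ErrorSource {
    Upstream,
    Downstream,
    Internal,
    Unset,
}

/// Maps an error to the status code sent to the client.
/// Returns 0 when the downstream connection is already dead.
fn status_for(etype: ErrorType, esource: ErrorSource) -> u16 {
    match etype {
        // An explicit HTTP status always wins.
        ErrorType::HttpStatus(code) => code,
        _ => match esource {
            ErrorSource::Upstream => 502,
            ErrorSource::Downstream => match etype {
                // Nothing can be written back on a broken connection.
                ErrorType::WriteError | ErrorType::ReadError | ErrorType::ConnectionClosed => 0,
                _ => 400,
            },
            ErrorSource::Internal | ErrorSource::Unset => 500,
        },
    }
}
```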
```rust
// Construct TracerProvider for OpenTelemetryLayer
pub(crate) fn init_tracer_provider() -> TracerProvider {
```
nit: let's use `///` for doc comments (here and in other places in this file).

Also, could we add more details about how we configure the tracing provider? For example, why do we need to add an ID generator, what does `with_sampler()` do, etc.?
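As background for the requested docs: an ID generator assigns the trace and span IDs that tie spans together across the proxy and workers, and a sampler decides which traces are recorded and exported. The sketch below illustrates both ideas in plain std Rust. `IdGenerator` and `should_sample` are hypothetical and are not the `opentelemetry_sdk` API, which ships its own implementations; hashing with `RandomState` stands in for a proper RNG.

```rust
use std::collections::hash_map::RandomState;
use std::hash::{BuildHasher, Hasher};

/// Hypothetical ID generator: non-zero 128-bit trace IDs and 64-bit span IDs
/// (W3C Trace Context treats all-zero IDs as invalid).
struct IdGenerator {
    state: RandomState,
    counter: u64,
}

impl IdGenerator {
    fn new() -> Self {
        Self { state: RandomState::new(), counter: 0 }
    }

    /// Next pseudo-random, non-zero 64-bit value.
    fn next_u64(&mut self) -> u64 {
        loop {
            self.counter = self.counter.wrapping_add(1);
            let mut h = self.state.build_hasher();
            h.write_u64(self.counter);
            let v = h.finish();
            if v != 0 {
                return v;
            }
        }
    }

    fn new_trace_id(&mut self) -> u128 {
        let hi = self.next_u64() as u128;
        let lo = self.next_u64() as u128;
        (hi << 64) | lo
    }

    fn new_span_id(&mut self) -> u64 {
        self.next_u64()
    }
}

/// Hypothetical ratio-based sampler: the decision depends only on the trace
/// ID, so every span in a trace gets the same keep/drop decision.
fn should_sample(trace_id: u128, ratio: f64) -> bool {
    if ratio >= 1.0 {
        return true;
    }
    if ratio <= 0.0 {
        return false;
    }
    let low = trace_id as u64; // lower 64 bits of the trace ID
    let threshold = (ratio * u64::MAX as f64) as u64;
    low < threshold
}
```

Making the decision a pure function of the trace ID is what keeps sampled traces complete: a worker and the proxy reach the same decision for the same trace without coordinating.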
```rust
// Setup tracing subscriber
pub(crate) fn setup_tracing(provider: TracerProvider) -> Result<(), String> {
```
Similar to the previous comment: could we add more details about what this function does?

Also, what's the motivation for having this and `init_tracer_provider()` as two separate functions? It seems like one is called right after the other. Should we combine them into one function?
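A rough shape for the suggested merge, keeping the two steps as private helpers behind a single entry point. Everything here is a hypothetical stand-in: `TracerProvider` is a placeholder struct (not the OpenTelemetry type), its `service_name` field and the validation in `setup_tracing` are invented for illustration, and `init_tracing` is an assumed name.

```rust
/// Hypothetical placeholder for the OpenTelemetry tracer provider.
struct TracerProvider {
    service_name: String,
}

/// Step 1 (placeholder): build the provider.
fn init_tracer_provider() -> TracerProvider {
    TracerProvider { service_name: "miden-tx-prover".to_string() }
}

/// Step 2 (placeholder): install the subscriber using the provider.
fn setup_tracing(provider: TracerProvider) -> Result<(), String> {
    if provider.service_name.is_empty() {
        return Err("tracer provider has no service name".to_string());
    }
    Ok(())
}

/// Single entry point: callers no longer need to know that the provider
/// must be built before the subscriber is installed.
pub(crate) fn init_tracing() -> Result<(), String> {
    let provider = init_tracer_provider();
    setup_tracing(provider)
}
```

Keeping the helpers private still allows testing each step separately, while the worker and the proxy each call only `init_tracing()`.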
```rust
use tracing::Level;
use tracing_subscriber::{layer::SubscriberExt, Registry};

pub const TRACING_TARGET_NAME: &str = "miden-tx-prover";
```
nit: I would maybe still call it `MIDEN_TX_PROVER`.
```toml
opentelemetry-semantic-conventions = "0.27.0"
opentelemetry-jaeger = "0.22.0"
```
nit: I would get rid of the patch versions.
`@@ -114,6 +114,16 @@ The proxy service uses this health check to determine if a worker is available t`

```markdown
Both the worker and the proxy will use the `info` log level by default, but it can be changed by setting the `RUST_LOG` environment variable.

## Traces
```
I would maybe combine this and logging into one section (or make this a sub-section of logging?).

Also, I would add more details here. For example, it is not clear where the tracing/logging info is written to. Is it `stdout`? Is it a log file? Somewhere else? It is also not clear whether there is a way to view the logs without Jaeger (or with something else).

Basically, a bit more context about how tracing/logging works would be helpful.
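As one piece of that context, the log-level rule the README states (`info` unless `RUST_LOG` is set) can be sketched as below. This is an illustrative std-only helper; `log_filter` and `log_filter_from` are hypothetical names, and the service's real parsing presumably lives in its tracing subscriber setup. Treating a blank `RUST_LOG` as unset is also an assumption.

```rust
use std::env;

/// Hypothetical helper: resolves the filter directive, falling back to
/// `info` when the value is missing or blank.
fn log_filter_from(value: Option<String>) -> String {
    match value {
        Some(v) if !v.trim().is_empty() => v,
        _ => "info".to_string(),
    }
}

/// Reads the directive from the `RUST_LOG` environment variable.
fn log_filter() -> String {
    log_filter_from(env::var("RUST_LOG").ok())
}
```

Splitting out `log_filter_from` keeps the fallback rule testable without mutating the process environment.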
````markdown
The service uses the `tracing` crate for structured logging and tracing. Traces are enabled by default, and uses opentelemetry to export traces to a Jaeger instance. The traces can be visualized using the Jaeger UI, which can be used by running:

```bash
docker run -d -p4317:4317 -p16686:16686 jaegertracing/all-in-one:latest
````
This assumes that we have `docker` installed on the machine, right? If so, I would mention this.

Also, are there alternative ways to do this? We don't need to describe them, but if there is a link explaining how to do it without Docker, I'd include it.
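Whichever way Jaeger is run, it can help to check that something is listening on the OTLP gRPC port (4317, mapped in the `docker run` command above) before starting the services. The Jaeger UI is served on port 16686 at `http://localhost:16686`. A minimal std-only sketch; `collector_reachable` is a hypothetical helper, and an open port only means that *something* accepts TCP connections there, not that it is Jaeger.

```rust
use std::net::{SocketAddr, TcpStream};
use std::time::Duration;

/// Hypothetical pre-flight check: true if a TCP connection to `addr`
/// succeeds within `timeout` (e.g. Jaeger's OTLP gRPC port 4317).
fn collector_reachable(addr: SocketAddr, timeout: Duration) -> bool {
    TcpStream::connect_timeout(&addr, timeout).is_ok()
}
```

For example, `collector_reachable("127.0.0.1:4317".parse().unwrap(), Duration::from_millis(500))` would check the local container before the worker or proxy starts exporting spans.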
This PR is part of #1004.