Health Metrics for data-pipeline + Dogstatsd-Client Crate #638
Signed-off-by: Bob Weinand <[email protected]>
LGTM from a functional standpoint, following @ajgajg1134's answers.
LGTM, just a few suggestions on design alternatives.
@@ -197,13 +235,29 @@ impl TraceExporter {
        })
    }

    /// Emit a health metric to dogstatsd
    fn emit_metric(&self, metric: HealthMetric, custom_tags: Option<Vec<&Tag>>) {
Not sure if it is worth it performance-wise, but using a closure to lazily create the metric and tags would avoid computation and allocation when stats metrics are disabled.
How likely are metrics to be disabled? If there is a decent chance we definitely should avoid any extra computation and allocation.
Agreed, delaying the allocation would probably be beneficial if health metrics are configurable by the customer.
I'm not sure I follow how one would do this / where the performance benefit would be? If the client is None then this function does nothing and just returns? 🤔 (or do you mean allocating the HealthMetric type?)
Yes, the HealthMetric and Vec still need to be allocated. Another option would be to use a macro to add an if flusher.is_some() {...} check everywhere we create metrics.
My only remaining concern is the one @VianneyRuhlmann raised. We shouldn't be calculating metrics at all if the functionality is disabled, assuming there is a non-trivial number of situations where metrics are disabled. But that can be addressed in a future PR.
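A minimal sketch of the lazy-closure idea discussed in this thread, not the PR's actual code. `Exporter`, `emit_metric_lazy`, and the simplified `HealthMetric` are hypothetical stand-ins, and the "client" here just records strings. The point is only that the closure, and therefore the metric and tag allocation, runs solely when a client is configured.

```rust
use std::cell::RefCell;

/// Hypothetical, simplified stand-in for the PR's `HealthMetric` type.
#[derive(Debug)]
enum HealthMetric {
    Count(&'static str, i64),
}

/// Hypothetical exporter. `client` stands in for the optional dogstatsd
/// client; it only records formatted metrics for illustration.
struct Exporter {
    client: Option<RefCell<Vec<String>>>,
}

impl Exporter {
    /// Builds the metric and its tags only when a client is present.
    fn emit_metric_lazy<F>(&self, build: F)
    where
        F: FnOnce() -> (HealthMetric, Vec<String>),
    {
        if let Some(client) = &self.client {
            // The closure runs here, so a disabled exporter never allocates.
            let (metric, tags) = build();
            client
                .borrow_mut()
                .push(format!("{:?}|#{}", metric, tags.join(",")));
        }
    }
}
```

A call site would then look like `exporter.emit_metric_lazy(|| (HealthMetric::Count("traces.sent", 1), vec!["env:test".to_string()]))`, where the metric name is also just an example.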
let tags = match custom_tags {
    None => Either::Left(&self.common_stats_tags),
    Some(custom) => Either::Right(self.common_stats_tags.iter().chain(custom)),
};
This is probably a personal preference because I tend to find the use of Either rather obscure. Wouldn't it be better to define a more descriptive enum type?
Yeah, I definitely don't love having to use it here, but I couldn't find a way around it. I'm also not a huge fan of a more descriptive type, since that type would only be used in exactly this one place and would only exist to get around the type system, so that tags can be something that implements IntoIterator without needing dynamic dispatch. (more context here: #638 (comment))
I'm a big fan of Either when it makes sense (as in this case). We know that a tag can be A or B, and adding our own TagAorB type just to represent that feels superfluous.
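To illustrate why an Either-style wrapper avoids dynamic dispatch here, the sketch below defines a small local `Either` enum as a stand-in for the one from the `either` crate, so that it compiles without dependencies. `Tag` and `all_tags` are hypothetical names. Both match arms produce different iterator types, and the wrapper unifies them into one concrete type without a `Box<dyn Iterator>`.

```rust
/// Local stand-in for `either::Either`, defined here only so the sketch
/// has no external dependencies.
enum Either<L, R> {
    Left(L),
    Right(R),
}

impl<L, R> Iterator for Either<L, R>
where
    L: Iterator,
    R: Iterator<Item = L::Item>,
{
    type Item = L::Item;

    fn next(&mut self) -> Option<Self::Item> {
        // Delegate to whichever side is active; dispatch is a plain match.
        match self {
            Either::Left(l) => l.next(),
            Either::Right(r) => r.next(),
        }
    }
}

/// Hypothetical tag type.
struct Tag(String);

/// Yields the common tags, followed by the custom tags when present.
fn all_tags<'a>(
    common: &'a [Tag],
    custom: Option<Vec<&'a Tag>>,
) -> impl Iterator<Item = &'a Tag> {
    match custom {
        None => Either::Left(common.iter()),
        Some(custom) => Either::Right(common.iter().chain(custom)),
    }
}
```

A dedicated enum such as the TagAorB mentioned above would need the same `Iterator` impl, which is the duplication the comment is arguing against.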
    Ok(())
}

fn create_client(endpoint: &Endpoint) -> anyhow::Result<StatsdClient> {
Nit: since the function has multiple points where it can return an error, and these errors bubble up the call stack, I would suggest adding more context by creating explicit errors with anyhow, so it's easier to debug which call failed.
Good callout, I've just added additional anyhow context in the places where it wasn't used before!
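As a rough illustration of the suggestion above: with anyhow this is usually done through the `context` / `with_context` methods of its `Context` trait. The sketch below shows the same idea with the standard library only, so it compiles without dependencies. `parse_endpoint` is a hypothetical helper, not the PR's `create_client`, and it only handles the simple IPv4 `host:port` form.

```rust
use std::net::{IpAddr, SocketAddr};

/// Hypothetical helper: each fallible step attaches a message naming the
/// step that failed, so the caller can tell the failures apart.
fn parse_endpoint(raw: &str) -> Result<SocketAddr, String> {
    let (host, port) = raw
        .rsplit_once(':')
        .ok_or_else(|| format!("endpoint {raw:?}: missing ':' between host and port"))?;
    let ip = host
        .parse::<IpAddr>()
        .map_err(|e| format!("endpoint {raw:?}: invalid IP address {host:?}: {e}"))?;
    let port = port
        .parse::<u16>()
        .map_err(|e| format!("endpoint {raw:?}: invalid port {port:?}: {e}"))?;
    Ok(SocketAddr::new(ip, port))
}
```

Without the per-step messages, a bare parse error would not say whether the host or the port was at fault.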
What does this PR do?
Adds health metrics to the data-pipeline crate to assist with debugging issues in customer environments. To support this, the dogstatsd-client code from the sidecar has been promoted to a separate crate so that both can use it.
Motivation
Health metrics are good for debugging; we've used them successfully in other tracers.
Additional Notes
Unfortunately there is a bit of new code duplication in the dogstatsd-client crate. This was to get around some very difficult-to-read Rust syntax in trying to write a DogStatsDAction type that could accept Vec<Tag>, &[&Tag], and a chained iterator of &Tag. Doing this turned into a very difficult task that seems more trouble than it's worth. If there are any Rust experts reviewing this who want to take a stab, please feel free to do so; just ensure that the resulting code can be used by BOTH the sidecar and data-pipeline, as these crates both use the client with different types.
How to test the change?
Unit tests here provide good coverage for these implementations.
Ticket: APMSP-1248
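For the tag-type problem described under Additional Notes, one possible direction is sketched below. It is untested against the real crates: it assumes the tag type exposes its text via `AsRef<str>`, and `Tag` and `format_tags` are hypothetical names. The standard library's blanket `AsRef` impl for references means an owned `Tag`, a `&Tag`, and a `&&Tag` all satisfy the same bound, so one generic function covers the three input shapes. Whether this carries over to an enum like DogStatsDAction is not shown here.

```rust
/// Hypothetical tag type, assumed to expose its text via `AsRef<str>`.
struct Tag(String);

impl AsRef<str> for Tag {
    fn as_ref(&self) -> &str {
        &self.0
    }
}

/// Joins tags with commas. Accepts any iterable whose items can be viewed
/// as `&str`, which covers `Vec<Tag>`, `&[&Tag]`, and chained `&Tag` iterators.
fn format_tags<I>(tags: I) -> String
where
    I: IntoIterator,
    I::Item: AsRef<str>,
{
    tags.into_iter()
        .map(|t| t.as_ref().to_owned())
        .collect::<Vec<_>>()
        .join(",")
}
```

This version copies each tag into a `String` for simplicity; a real client would more likely write into a reusable buffer.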