feat: add retry layer for push metrics exporters #9036
conwuegb
left a comment
It mostly looks good, I just have a few questions.
//! Retry wrapper for push metric exporters.
//!
//! Wraps a `PushMetricExporter` and retries failed exports a configurable number
//! of times with exponential backoff. Only surfaces the error after all attempts
Out of curiosity, what does "exponential backoff" mean here? Is it referring to exponentially longer wait times in between attempts? And if so, why?
That's right. It's a backoff strategy designed to avoid overloading the service with excessive requests if it's already at capacity. I checked the OTEL docs to see if they had any recommendations here (originally I didn't think to check), and interestingly they do recommend using exponential backoff with jitter: https://opentelemetry.io/docs/specs/otel/protocol/exporter/#retry. I'll go ahead and add the jitter aspect for consistency.
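For readers following this thread, here is a minimal sketch of what "exponential backoff with jitter" can look like. It is not the code in this PR: the names `backoff_ceiling`, `jittered_delay`, and `retry` are hypothetical, the base and cap values are whatever the caller passes in, and the random fraction is supplied by the caller so the sketch needs no RNG crate.

```rust
use std::time::Duration;

/// Upper bound on the wait before retry number `attempt` (0-based):
/// `base * 2^attempt`, capped at `max`. (Hypothetical helper.)
fn backoff_ceiling(attempt: u32, base: Duration, max: Duration) -> Duration {
    // checked_pow / checked_mul return None on overflow; fall back to `max`.
    2u32.checked_pow(attempt)
        .and_then(|factor| base.checked_mul(factor))
        .map_or(max, |d| d.min(max))
}

/// "Full jitter": scale the ceiling by a random fraction in [0.0, 1.0].
/// The caller provides `fraction` (for example from an RNG of its choice).
fn jittered_delay(attempt: u32, base: Duration, max: Duration, fraction: f64) -> Duration {
    // Treat NaN as 0.0 so that mul_f64 cannot panic; clamp everything else.
    let f = if fraction.is_nan() { 0.0 } else { fraction.clamp(0.0, 1.0) };
    backoff_ceiling(attempt, base, max).mul_f64(f)
}

/// Runs `op` until it succeeds or `max_attempts` calls have failed.
/// `wait` is invoked with the 0-based attempt index between failed calls;
/// a real caller would sleep for `jittered_delay(attempt, ..)` there.
/// `op` is always called at least once, even if `max_attempts` is 0.
fn retry<T, E>(
    max_attempts: u32,
    mut op: impl FnMut() -> Result<T, E>,
    mut wait: impl FnMut(u32),
) -> Result<T, E> {
    let mut attempt = 0;
    loop {
        match op() {
            Ok(value) => return Ok(value),
            // Out of attempts: surface the last error to the caller.
            Err(err) if attempt + 1 >= max_attempts => return Err(err),
            Err(_) => {
                wait(attempt);
                attempt += 1;
            }
        }
    }
}
```

With a 100 ms base and a 5 s cap, the ceilings grow as 100 ms, 200 ms, 400 ms, and so on until they hit the cap. The jitter spreads concurrent clients across each window so they do not all retry at the same instant.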
carodewig
left a comment
I'm curious how this dovetails with (or differs from) the retry mechanism already present in ApolloExporter::submit_report. Does that already retry the export, so there are now double retries involved? Or are there now two main retry mechanisms, one for ApolloOtlpExporter and one for ApolloExporter?
This area of the codebase generally confuses me (i.e. the difference between ApolloExporter and ApolloOtlpExporter), so this may not be relevant, but I wanted to ask to help with my own understanding!
@carodewig I just double-checked this, as most of this is new to me as well. They do send different data,
Add `RetryMetricExporter`, which retries up to 3 times with exponential backoff, to the `apollo metrics` and `otlp` named exporters.
Checklist
Complete the checklist (and note appropriate exceptions) before the PR is marked ready-for-review.
Notes
Footnotes
It may be appropriate to bring upcoming changes to the attention of other (impacted) groups. Please endeavour to do this before seeking PR approval. The mechanism for doing this will vary considerably, so use your judgement as to how and when to do this. ↩
Configuration is an important part of many changes. Where applicable please try to document configuration examples. ↩
A lot of (if not most) features benefit from built-in observability and debug-level logs. Please read this guidance on metrics best-practices. ↩
Tick whichever testing boxes are applicable. If you are adding Manual Tests, please document the manual testing (extensively) in the Exceptions. ↩