
Collector exporter retries and backoff #1233

Closed
mwear opened this issue Jun 22, 2020 · 3 comments · Fixed by #3207
Comments

@mwear
Member

mwear commented Jun 22, 2020

Is your feature request related to a problem? Please describe.

The collector exporter doesn't currently retry sending spans. We should consider adding retry logic with a reasonable backoff strategy.

Describe the solution you'd like

The spec mentions that exporters are responsible for implementing retry strategies and that they will differ between protocols and tracing backends. The OTLP exporter for the collector implements retries using exponential backoff. See: https://github.com/open-telemetry/opentelemetry-collector/blob/1b6e88b9995f0559b0f25b4f7c9f2e95e92965bb/exporter/otlpexporter/exporter.go#L94-L98. It would be ideal if we could do something similar for OTel JS.
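For OTel JS, a minimal sketch of what such retry logic could look like, assuming a generic async export call. All names here are illustrative, not the actual exporter API:

```typescript
// Hypothetical sketch: retry an async operation with exponential backoff.
// Not the actual OTLP exporter code; parameter defaults are illustrative.
async function retryWithExponentialBackoff<T>(
  fn: () => Promise<T>,
  maxAttempts = 5,
  initialBackoffMs = 1000,
  backoffMultiplier = 1.5,
  maxBackoffMs = 5000
): Promise<T> {
  let backoff = initialBackoffMs;
  for (let attempt = 1; ; attempt++) {
    try {
      return await fn();
    } catch (err) {
      // Give up once the attempt budget is spent.
      if (attempt >= maxAttempts) throw err;
      // Wait, then grow the delay up to the cap.
      await new Promise((resolve) => setTimeout(resolve, backoff));
      backoff = Math.min(backoff * backoffMultiplier, maxBackoffMs);
    }
  }
}
```

An exporter's send could then be wrapped as `retryWithExponentialBackoff(() => send(spans))`, keeping the backoff policy separate from the transport.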

@dyladan dyladan added the up-for-grabs Good for taking. Extra help will be provided by maintainers label Jun 23, 2020
@obecny
Member

obecny commented Jun 23, 2020

The spec says that retries should be handled by the exporter. I think that has some cons: we would have to implement more or less the same logic in every exporter, and it still might not be enough. But the API already gives us the possibility of delegating this to a processor. That would let us implement the retry logic once and use it with any exporter. Since the API allows it, I don't really see a reason why we couldn't add a new processor with different retry rules. And if the retry rules of a given processor aren't enough for someone's needs, they can create their own processor, which gives endless possibilities without changing the exporters. Having your own processor might also be much faster for people: you have full power there to decide on all the different retry rules, whereas with an exporter that is not the case.
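A minimal sketch of the processor-based approach described above, using simplified stand-in interfaces rather than the real OTel JS `SpanProcessor`/`SpanExporter` types:

```typescript
// Hypothetical sketch of a processor that owns the retry rules and can wrap
// any exporter. The interfaces are simplified stand-ins, not the OTel JS API.
interface Exporter {
  export(spans: unknown[]): Promise<void>;
}

class RetryingProcessor {
  constructor(
    private exporter: Exporter,
    private maxAttempts = 3,
    private backoffMs = 100
  ) {}

  // Retry the wrapped exporter with a linearly growing delay between attempts.
  async onEnd(spans: unknown[]): Promise<void> {
    for (let attempt = 1; attempt <= this.maxAttempts; attempt++) {
      try {
        await this.exporter.export(spans);
        return;
      } catch (err) {
        if (attempt === this.maxAttempts) throw err;
        await new Promise((r) => setTimeout(r, this.backoffMs * attempt));
      }
    }
  }
}
```

The design point being made: users who need different rules would swap in their own processor instead of patching every exporter.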

@svetlanabrennan
Contributor

I can tackle this issue.

@dyladan dyladan removed the up-for-grabs Good for taking. Extra help will be provided by maintainers label Jan 5, 2022
@alanwest
Member

As discussed in this week's SIG meeting, I promised to share some context around other SIGs' retry policies plus some ongoing discussions at the spec level. cc @svetlanabrennan @dyladan

Which status codes are retryable?

OTLP/gRPC

The specification defines the retryable gRPC status codes here.

OTLP/HTTP

Things remain a little more ambiguous. The specification does mention 429 and 503, but an open spec issue proposes to also include 502 and 504. The experimental implementation for opentelemetry-java retries 429, 502, 503, and 504.

@mwear shared Ruby's implementation. If I'm reading my Rubies right, it appears it retries 408, 429, 502, 503, 504, as well as a number of internal errors.
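Pulling the OTLP/HTTP codes mentioned above into one place, a retryable-status check might look like the sketch below. The 502/504 entries follow the open spec issue and the Java implementation; this is an illustration of the discussion, not a settled policy:

```typescript
// Sketch: which OTLP/HTTP response codes should trigger a retry.
// 429 and 503 come from the spec; 502 and 504 are proposed in an open
// spec issue and retried by opentelemetry-java; Ruby additionally
// retries 408. Treat this set as a discussion aid, not a decision.
const RETRYABLE_HTTP_STATUS = new Set([429, 502, 503, 504]);

function isRetryable(statusCode: number): boolean {
  return RETRYABLE_HTTP_STATUS.has(statusCode);
}
```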

Additional retry policy parameters

An open spec issue proposes to define a reasonable default retry policy. An attempt was made to define some options + their defaults, but given varying opinions it seems likely this will be left to the individual SIGs to define. Retry policy parameters may include things like:

  • max attempts
  • initial backoff
  • max backoff
  • backoff multiplier

These are inspired by gRPC's notion of a retry policy, but could equally apply to the OTLP/HTTP exporter.
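A sketch of how these four parameters combine under gRPC's retry policy semantics, where the nth retry waits a jittered delay of up to min(initialBackoff × multiplier^(n-1), maxBackoff). The type and values below are illustrative, not OTel defaults:

```typescript
// Hypothetical retry policy type mirroring the four parameters above.
interface RetryPolicy {
  maxAttempts: number;
  initialBackoffMs: number;
  maxBackoffMs: number;
  backoffMultiplier: number;
}

// Compute the delay before the nth retry (attempt >= 1).
// Exponential growth from initialBackoff, capped at maxBackoff;
// gRPC draws the actual delay uniformly from [0, cappedDelay].
function backoffForAttempt(policy: RetryPolicy, attempt: number): number {
  const exponential =
    policy.initialBackoffMs * Math.pow(policy.backoffMultiplier, attempt - 1);
  const capped = Math.min(exponential, policy.maxBackoffMs);
  return Math.random() * capped;
}
```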

The opentelemetry-java project has implemented these configuration parameters and supplied defaults (also inspired by gRPC's defaults), but for now they are not publicly accessible. These parameters are used by Java's gRPC and HTTP implementations.
