-
Notifications
You must be signed in to change notification settings - Fork 1.5k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[receiver/otlp] Return proper http response code based on retryable errors #9357
[receiver/otlp] Return proper http response code based on retryable errors #9357
Conversation
e58d918
to
bc6c78b
Compare
Codecov ReportAll modified and coverable lines are covered by tests ✅
Additional details and impacted files@@ Coverage Diff @@
## main #9357 +/- ##
=======================================
Coverage 91.13% 91.13%
=======================================
Files 353 353
Lines 18728 18740 +12
=======================================
+ Hits 17067 17079 +12
Misses 1333 1333
Partials 328 328 ☔ View full report in Codecov by Sentry. |
bc6c78b
to
20c5453
Compare
20c5453
to
463f465
Compare
c1a9202
to
dea6edb
Compare
The otlp receiver was recently updated via #8080 to properly propagate consumer errors back to clients as either permanent or retriable. The code we're using to indicate a non-retriable error is `codes.InvalidArgument`, which is the equivalent of `400` in HTTP. While 100% correct according to the [OTLP specification](https://github.com/open-telemetry/opentelemetry-proto/blob/main/docs/specification.md#failures) to indicate a non-retriable error, I think `codes.Internal` (which is equivalent to HTTP `500`), better conveys the actual state of the collector in these situations. Related to #9357 (comment) --------- Co-authored-by: Evan Bradley <[email protected]>
5d72d04
to
fa77168
Compare
@@ -42,7 +43,7 @@ func handleTraces(resp http.ResponseWriter, req *http.Request, tracesReceiver *t | |||
|
|||
otlpResp, err := tracesReceiver.Export(req.Context(), otlpReq) | |||
if err != nil { | |||
writeError(resp, enc, err, http.StatusInternalServerError) | |||
writeError(resp, enc, err, errors.GetHTTPStatusCodeFromStatus(err)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't like that now we also have a similar but different logic in errorMsgToStatus
. Can we consolidate?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Agreed. I took a look at consolidating before opening the PR. I believe it can be refactored, but it caused a lot of changes unrelated to the issue I was trying to solve. To keep the PRs smaller and targeted I'd like to use this PR as a solution to the issue and then a future PR that is only a refactor. I can open an issue to track that refactor if you'd like.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@bogdandrutu are you ok with that approach?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Created #9864
@open-telemetry/collector-maintainers please review |
@open-telemetry/collector-maintainers please review |
@bogdandrutu PTAL |
Hi, I was just looking at this issue myself locally, and have found this open issue... I notice this, #9307 and #8080 are somewhat coupled to GRCP status codes. I also notice we have Can you clarify if Have you considered using Using |
@0x006EA1E5 as you've deduced, |
@bogdandrutu @open-telemetry/collector-approvers please take a look. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM I've personally tested this change on tempo and has worked as expected, we get proper mappings now. Tempo uses the receiver code for both handling and being up to date with the latest otel changes.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
Description:
Updates the receiver's http response to return a proper http status based on whether or not the pipeline returned a retryable error. Builds upon the work done in #8080 and #9307
Link to tracking Issue:
Closes #9337
Closes #8132
Closes #9636
Closes #6725
Testing:
Updated lots of unit tests