[exporter/datadog] make error retryable if logs sender received nil response #28672

Merged

siarhei-kharchanka-cko
Contributor

Description:

The Datadog exporter treats network/connectivity errors (cases where the HTTP client doesn't receive a response) as permanent errors, which can lead to loss of log records. This change makes these errors retryable.

Link to tracking Issue: #24550

Testing:

Documentation:

@linux-foundation-easycla

linux-foundation-easycla bot commented Oct 27, 2023

CLA Signed

The committers listed above are authorized under a signed CLA.

  • ✅ login: siarhei-kharchanka-cko / name: Siarhei Kharchanka (2a83b6a)

@github-actions github-actions bot added the exporter/datadog Datadog components label Oct 27, 2023
Member

@songy23 songy23 left a comment

At a high level I agree with this change, though AFAICT this can be a problem in 2 corner cases:

  1. May lead to duplicated logs being sent if the initial request went through but the response was lost
  2. May increase the wait time if there is indeed a permanent error when the exporter receives a nil response

WDYT? @dineshg13 @mx-psi

@siarhei-kharchanka-cko
Contributor Author

siarhei-kharchanka-cko commented Oct 30, 2023

At a high level I agree with this change, though AFAICT this can be a problem in 2 corner cases:

  1. May lead to duplicated logs being sent if the initial request went through but the response was lost
  2. May increase the wait time if there is indeed a permanent error when the exporter receives a nil response

Just a quick note from my side. As we briefly discussed, the migration to the logs agent should fix the issue as well, so I took a look at how it works there. According to this source code, the logs agent retries a request on error regardless of whether the response is empty. The only difference is that the logs agent also checks that the error is not a cancellation error triggered by the client (we can add this check as well). It looks like the two cases described above can happen with the logs agent as well.

Member

@songy23 songy23 left a comment

Thanks for checking, LGTM then

@mx-psi mx-psi merged commit 55afb0a into open-telemetry:main Oct 30, 2023
@github-actions github-actions bot added this to the next release milestone Oct 30, 2023
jmsnll pushed a commit to jmsnll/opentelemetry-collector-contrib that referenced this pull request Nov 12, 2023
…esponse (open-telemetry#28672)

**Description:** <Describe what has changed.>
<!--Ex. Fixing a bug - Describe the bug and how this fixes the issue.
Ex. Adding a feature - Explain what this achieves.-->
The Datadog exporter treats network/connectivity errors (HTTP client
doesn't receive a response) as permanent errors, which can lead to loss
of log records. This change makes these errors retryable.

**Link to tracking Issue:** open-telemetry#24550

**Testing:** <Describe what testing was performed and which tests were
added.>

**Documentation:** <Describe the documentation added.>
RoryCrispin pushed a commit to ClickHouse/opentelemetry-collector-contrib that referenced this pull request Nov 24, 2023
…esponse (open-telemetry#28672)

Labels
exporter/datadog Datadog components
5 participants