
Fix Elasticsearch output retry backoff when receiving 429s#45073

Merged
faec merged 6 commits into elastic:main from faec:elasticsearch-429-backoff
Jun 27, 2025

Conversation

@faec

@faec faec commented Jun 26, 2025

See #36926. This fix has two components:

  • Return an error from Publish when the Elasticsearch output receives a 429 (Too Many Requests) response from Elasticsearch. This triggers a retry delay and reconnection attempt in the pipeline.
  • Split the backoff counters for Publish and Connect into separate values, so a successful Connect call (which for Elasticsearch just means that an empty HTTP GET returned an OK response) doesn't reset the exponential backoff for bulk ingest requests while they are being throttled.

Checklist

  • My code follows the style guidelines of this project
  • I have commented my code, particularly in hard-to-understand areas
  • I have made corresponding changes to the documentation
  • I have made corresponding change to the default configuration files
  • I have added tests that prove my fix is effective or that my feature works
  • I have added an entry in CHANGELOG.next.asciidoc or CHANGELOG-developer.next.asciidoc.

How to test this PR locally

This comment on the issue has local testing instructions.

Related issues

@faec faec self-assigned this Jun 26, 2025
@faec faec added bug Team:Elastic-Agent-Data-Plane Label for the Agent Data Plane team labels Jun 26, 2025
@botelastic botelastic Bot added needs_team Indicates that the issue/PR needs a Team:* label and removed needs_team Indicates that the issue/PR needs a Team:* label labels Jun 26, 2025
@github-actions
Contributor

🤖 GitHub comments


Just comment with:

  • run docs-build : Re-trigger the docs validation. (use unformatted text in the comment!)

@mergify
Contributor

mergify Bot commented Jun 26, 2025

This pull request does not have a backport label.
If this is a bug or security fix, could you label this PR @faec? 🙏.
For such, you'll need to label your PR with:

  • The upcoming major version of the Elastic Stack
  • The upcoming minor version of the Elastic Stack (if you're not pushing a breaking change)

To fixup this pull request, you need to add the backport labels for the needed
branches, such as:

  • backport-8./d is the label to automatically backport to the 8./d branch. /d is the digit
  • backport-active-all is the label that automatically backports to all active branches.
  • backport-active-8 is the label that automatically backports to all active minor branches for the 8 major.
  • backport-active-9 is the label that automatically backports to all active minor branches for the 9 major.

@faec faec marked this pull request as ready for review June 26, 2025 16:33
@faec faec requested a review from a team as a code owner June 26, 2025 16:33
@elasticmachine
Contributor

Pinging @elastic/elastic-agent-data-plane (Team:Elastic-Agent-Data-Plane)

@faec faec added the backport-active-all Automated backport with mergify to all the active branches label Jun 26, 2025
Contributor

@leehinman leehinman left a comment


Code changes LGTM. I'm wondering about cost/benefit of a test to make sure we keep the behavior in the future. What do you think?

@faec
Author

faec commented Jun 26, 2025

I'm wondering about cost/benefit of a test to make sure we keep the behavior in the future. What do you think?

Yeah, I waffled about this, since it's one simple check in the middle of a function that expects a live connection. But ok, I split off the return value into a helper function based on the computed stats, and unit tested that, which is pretty simple but will keep someone from accidentally skipping the check if they reorganize that section of the code.
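
The helper-function approach described above can be sketched as follows. The struct and function names are hypothetical, not the actual ones added in this PR: the idea is that the error decision becomes a pure function of the computed batch stats, so it can be unit tested without a live Elasticsearch connection.

```go
package main

import "fmt"

// bulkResultStats is a stand-in for the per-batch counters the output
// computes after a bulk request; the real libbeat struct differs.
type bulkResultStats struct {
	acked   int // events indexed successfully
	fails   int // events that failed permanently
	tooMany int // events rejected with HTTP 429
}

// publishError is the extracted helper: given only the stats, decide
// whether Publish should report an error so the pipeline backs off.
func publishError(s bulkResultStats) error {
	if s.tooMany > 0 {
		return fmt.Errorf("%d events rejected with 429 Too Many Requests", s.tooMany)
	}
	return nil
}

func main() {
	fmt.Println(publishError(bulkResultStats{acked: 10}))
	fmt.Println(publishError(bulkResultStats{acked: 8, tooMany: 2}))
}
```

Because the helper takes plain data and returns an error, a table-driven unit test over a few stat combinations pins the behavior even if the surrounding request code is later reorganized.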

Fae Charlton added 2 commits June 26, 2025 16:55
@faec faec merged commit 8b25d5b into elastic:main Jun 27, 2025
204 of 205 checks passed
@faec faec deleted the elasticsearch-429-backoff branch June 27, 2025 21:29
@github-actions
Contributor

@Mergifyio backport 8.17 8.18 8.19 9.0 9.1

@mergify
Contributor

mergify Bot commented Jun 27, 2025

backport 8.17 8.18 8.19 9.0 9.1

✅ Backports have been created

mergify Bot pushed a commit that referenced this pull request Jun 27, 2025
See #36926. This fix has two components:
- Return an error from `Publish` when the Elasticsearch output gets a 429 (too many requests) from Elasticsearch. This triggers a retry delay and reconnection attempt in the pipeline.
- Break the backoff counters for `Publish` and `Connect` into separate values, so a successful `Connect` call (which for Elasticsearch just means that an empty http GET gave an ok response) doesn't reset the exponential backoff for bulk ingest requests when they are being throttled.

(cherry picked from commit 8b25d5b)

# Conflicts:
#	libbeat/outputs/elasticsearch/client.go
mergify Bot pushed a commit that referenced this pull request Jun 27, 2025
mergify Bot pushed a commit that referenced this pull request Jun 27, 2025
mergify Bot pushed a commit that referenced this pull request Jun 27, 2025
mergify Bot pushed a commit that referenced this pull request Jun 27, 2025
pierrehilbert added a commit that referenced this pull request Jul 1, 2025
…ceiving 429s (#45099)
pierrehilbert pushed a commit that referenced this pull request Jul 1, 2025
…45098)
pierrehilbert pushed a commit that referenced this pull request Jul 1, 2025
…45097)
pierrehilbert added a commit that referenced this pull request Jul 1, 2025
…eceiving 429s (#45096)
pierrehilbert added a commit that referenced this pull request Jul 1, 2025
…eceiving 429s (#45095)

Development

Successfully merging this pull request may close these issues.

Publish() should backoff if Elasticsearch returns 429 HTTP rate limiting responses
