Skip to content

[kafka] Handle configuration errors#45128

Merged
khushijain21 merged 14 commits intoelastic:mainfrom
khushijain21:kafkabug
Jul 10, 2025
Merged

[kafka] Handle configuration errors#45128
khushijain21 merged 14 commits intoelastic:mainfrom
khushijain21:kafkabug

Conversation

@khushijain21
Copy link
Copy Markdown
Contributor

@khushijain21 khushijain21 commented Jul 1, 2025

Proposed commit message

This PR handles kafka configuration errors by dropping the message. Since configuration errors are of permanent error type and should not be retried.

Checklist

  • My code follows the style guidelines of this project
  • I have commented my code, particularly in hard-to-understand areas
  • I have made corresponding changes to the documentation
  • I have made corresponding change to the default configuration files
  • I have added tests that prove my fix is effective or that my feature works
  • I have added an entry in CHANGELOG.next.asciidoc or CHANGELOG-developer.next.asciidoc.

How to test this PR locally

Bring kafka server up using following command

cd filebeat
mage docker:composeUp

Use following filebeat.yml

filebeat.inputs:
  - type: filestream
    id: filestream-input-id
    enabled: true
    paths:
      - ./test.log
    file_identity.native: ~
    prospector.scanner.fingerprint.enabled: false      

output:
  kafka:
    hosts: ["localhost:9092", "localhost:9094"]
    topic: beats
    max_message_bytes: 10

and you can see this error

{
  "log.level": "error",
  "@timestamp": "2025-07-02T15:28:04.373+0530",
  "log.logger": "kafka",
  "log.origin": {
    "function": "github.com/elastic/beats/v7/libbeat/outputs/kafka.(*msgRef).fail",
    "file.name": "kafka/client.go",
    "file.line": 396
  },
  "message": "Kafka (topic=beats): dropping message due to invalid configuration kafka: invalid configuration (Attempt to produce message larger than configured Producer.MaxMessageBytes: 851 > 10)",
  "service.name": "filebeat",
  "ecs.version": "1.6.0"
}

Related issues

@botelastic botelastic Bot added the needs_team Indicates that the issue/PR needs a Team:* label label Jul 1, 2025
@github-actions
Copy link
Copy Markdown
Contributor

github-actions Bot commented Jul 1, 2025

🤖 GitHub comments

Expand to view the GitHub comments

Just comment with:

  • run docs-build : Re-trigger the docs validation. (use unformatted text in the comment!)

@mergify
Copy link
Copy Markdown
Contributor

mergify Bot commented Jul 1, 2025

This pull request does not have a backport label.
If this is a bug or security fix, could you label this PR @khushijain21? 🙏.
For such, you'll need to label your PR with:

  • The upcoming major version of the Elastic Stack
  • The upcoming minor version of the Elastic Stack (if you're not pushing a breaking change)

To fixup this pull request, you need to add the backport labels for the needed
branches, such as:

  • backport-8./d is the label to automatically backport to the 8./d branch. /d is the digit
  • backport-active-all is the label that automatically backports to all active branches.
  • backport-active-8 is the label that automatically backports to all active minor branches for the 8 major.
  • backport-active-9 is the label that automatically backports to all active minor branches for the 9 major.

Comment thread libbeat/outputs/kafka/client.go Outdated
@khushijain21 khushijain21 marked this pull request as ready for review July 2, 2025 10:22
@khushijain21 khushijain21 requested a review from a team as a code owner July 2, 2025 10:22
@khushijain21 khushijain21 added backport-active-all Automated backport with mergify to all the active branches bug Team:Elastic-Agent-Data-Plane Label for the Agent Data Plane team labels Jul 2, 2025
@botelastic botelastic Bot removed the needs_team Indicates that the issue/PR needs a Team:* label label Jul 2, 2025
@elasticmachine
Copy link
Copy Markdown
Contributor

Pinging @elastic/elastic-agent-data-plane (Team:Elastic-Agent-Data-Plane)

Comment thread libbeat/outputs/kafka/client.go Outdated
Copy link
Copy Markdown
Member

@belimawr belimawr left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think dropping messages on all configuration errors is the correct approach here.

The goal is to fix #44619 by bringing back the behaviour of dropping messages if they're too large. Other configuration issues should be handled the same way they're handled today.

The problem with dropping messages on any configuration errors is that a configuration error can be fixed with user intervention. In that case we do not want inputs that keep track of their progress (like Filestream) to ingest all available data just to have it discarded by the output. Effectively creating another instance of #43250.

Even though it's not ideal, I prefer to compare the error string looking for Attempt to produce message larger than configured Producer.MaxMessageBytes (see the PR introducing the change), if it matches, we then drop the event like we used to. Any other configuration error we still retry.

Copy link
Copy Markdown

@faec faec left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks mostly fine, a couple small requests. (I'm about to be on PTO for a few days so feel free to merge in my absence once other reviewers are happy.)

Comment thread libbeat/outputs/kafka/client.go Outdated
@khushijain21
Copy link
Copy Markdown
Contributor Author

@tiago can you take a look again. I have ensured we don't drop all configuration related and errors and also incorporated Fae's remarks

@khushijain21 khushijain21 requested review from belimawr and faec July 3, 2025 07:09
Copy link
Copy Markdown
Member

@belimawr belimawr left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The code looks good now!

Sorry for not mentioning that before, but could you add some tests for the code you added? Especially now that we're looking at the error string we should have a test enduring the behaviour is not going to change.

@khushijain21 khushijain21 requested a review from belimawr July 9, 2025 06:10
Copy link
Copy Markdown
Member

@belimawr belimawr left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It looks great now!

I added some minor comments about formatting, feel free to ignore them ;)

Comment thread libbeat/outputs/kafka/client.go
Comment thread libbeat/outputs/kafka/client.go
Comment thread libbeat/outputs/kafka/client.go
@khushijain21 khushijain21 enabled auto-merge (squash) July 10, 2025 04:18
@khushijain21 khushijain21 merged commit 23f4491 into elastic:main Jul 10, 2025
201 of 204 checks passed
@github-actions
Copy link
Copy Markdown
Contributor

@Mergifyio backport 8.17 8.18 8.19 9.0 9.1

@mergify
Copy link
Copy Markdown
Contributor

mergify Bot commented Jul 10, 2025

backport 8.17 8.18 8.19 9.0 9.1

✅ Backports have been created

Details

mergify Bot pushed a commit that referenced this pull request Jul 10, 2025
* [kafka] Handle configuration errors

(cherry picked from commit 23f4491)

# Conflicts:
#	libbeat/outputs/kafka/client.go
#	libbeat/outputs/kafka/kafka_integration_test.go
mergify Bot pushed a commit that referenced this pull request Jul 10, 2025
* [kafka] Handle configuration errors

(cherry picked from commit 23f4491)

# Conflicts:
#	libbeat/outputs/kafka/kafka_integration_test.go
mergify Bot pushed a commit that referenced this pull request Jul 10, 2025
* [kafka] Handle configuration errors

(cherry picked from commit 23f4491)
mergify Bot pushed a commit that referenced this pull request Jul 10, 2025
* [kafka] Handle configuration errors

(cherry picked from commit 23f4491)

# Conflicts:
#	libbeat/outputs/kafka/kafka_integration_test.go
mergify Bot pushed a commit that referenced this pull request Jul 10, 2025
* [kafka] Handle configuration errors

(cherry picked from commit 23f4491)
khushijain21 added a commit that referenced this pull request Jul 10, 2025
* [kafka] Handle configuration errors
khushijain21 pushed a commit that referenced this pull request Jul 10, 2025
* [kafka] Handle configuration errors
khushijain21 added a commit that referenced this pull request Jul 10, 2025
* [kafka] Handle configuration errors

(cherry picked from commit 23f4491)

Co-authored-by: Khushi Jain <khushi.jain@elastic.co>
khushijain21 pushed a commit that referenced this pull request Jul 10, 2025
khushijain21 pushed a commit that referenced this pull request Jul 11, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

backport-active-all Automated backport with mergify to all the active branches bug Team:Elastic-Agent-Data-Plane Label for the Agent Data Plane team

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Kafka output retries 'too large messages' (message len > max_message_bytes) instead of dropping it (>= v8.18.0)

7 participants