Skip to content

[receiver/kafkareceiver] Add support for disabling KIP-320 (truncation detection via leader epoch) for Franz-Go#42796

Merged
songy23 merged 12 commits into
open-telemetry:mainfrom
paulojmdias:fix/42226
Sep 25, 2025
Merged

[receiver/kafkareceiver] Add support for disabling KIP-320 (truncation detection via leader epoch) for Franz-Go#42796
songy23 merged 12 commits into
open-telemetry:mainfrom
paulojmdias:fix/42226

Conversation

@paulojmdias
Copy link
Copy Markdown
Member

Description

Add a new disable_leader_epoch option to the Kafka receiver’s client config. When set, the franz-go consumer clears leader epoch information from fetch offsets, effectively disabling KIP-320 truncation detection. This provides compatibility with brokers that don’t support leader epochs (e.g., Azure Event Hub w/ Premium SKU), while leaving the default behavior unchanged.

Link to tracking issue

Fixes #42226

Testing

Added some tests to validate this implementation.

Missing tests in Azure Event Hub with Premium SKU to ensure this is mitigated

Documentation

Updated README.md with the new option.

…n detection via leader epoch) for Franz-Go

Signed-off-by: Paulo Dias <paulodias.gm@gmail.com>
Signed-off-by: Paulo Dias <paulodias.gm@gmail.com>
Copy link
Copy Markdown
Contributor

@axw axw left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @paulojmdias! Code looks great.

Comment thread receiver/kafkareceiver/README.md Outdated
- `group_id` (default = otel-collector): The consumer group that receiver will be consuming messages from
- `client_id` (default = otel-collector): The consumer client ID that receiver will use
- `rack_id` (default = ""): The rack identifier for this client. When set and brokers are configured with a rack-aware replica selector, the client will prefer fetching from the closest replica.
- `disable_leader_epoch` (default = false): When enabled, the consumer clears leader epoch information from fetch offsets. This disables `KIP-320` truncation detection in franz-go, which avoids compatibility issues with brokers that don’t fully support the feature (e.g., Azure Event Hubs), at the cost of losing automatic log-truncation safety.
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could we make this "enable_leader_epoch" or "use_leader_epoch" to avoid the double negative?

Also, what do you think about labeling this config as experimental? If the issue in Event Hubs is fixed (#42226 (comment)) then maybe this config can go away.

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@axw I renamed it to use_leader_epoch, as I think it makes more sense. I also marked this as experimental in the README.md file. Do we need to mark it somewhere else?

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sounds good, thanks. I think the README is good enough. @MovieStoreGuy please let us know if there's something else we should do to mark the feature as experimental.

Comment thread pkg/kafka/configkafka/config.go
Comment thread pkg/kafka/configkafka/config.go Outdated
// standard "client.rack" setting. By default, this is empty.
RackID string `mapstructure:"rack_id"`

// UseLeaderEpoch forces the collector to clear the leader epoch from
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry if I was unclear, when I said to invert the name I also meant to invert the meaning. So "use_leader_epoch: true" is what franz-go would do be default -- it would use the leader epoch returned by the brokers. Setting to false would disable KIP-320. Does this sound sensible?

Copy link
Copy Markdown
Member Author

@paulojmdias paulojmdias Sep 22, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Makes sense @axw. I updated the logic and also the docs. I also updated the tests for the kafkatopicsobserver to ensure will match the default config.

Please review to see if it makes sense to you now.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The code looks good, thanks. This doc comment needs to be updated now though

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I updated the comment to match the README.md 👍

paulojmdias and others added 5 commits September 22, 2025 17:47
Co-authored-by: Andrew Wilkins <axwalk@gmail.com>
Signed-off-by: Paulo Dias <paulodias.gm@gmail.com>
Signed-off-by: Paulo Dias <paulodias.gm@gmail.com>
Signed-off-by: Paulo Dias <paulodias.gm@gmail.com>
Copy link
Copy Markdown
Contributor

@axw axw left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @paulojmdias LGTM, just needs a doc comment update

Comment thread receiver/kafkareceiver/consumer_franz.go Outdated
Comment thread pkg/kafka/configkafka/config.go Outdated
// standard "client.rack" setting. By default, this is empty.
RackID string `mapstructure:"rack_id"`

// UseLeaderEpoch forces the collector to clear the leader epoch from
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The code looks good, thanks. This doc comment needs to be updated now though

Comment thread pkg/kafka/configkafka/testdata/client_config.yaml Outdated
paulojmdias and others added 2 commits September 23, 2025 09:58
Copy link
Copy Markdown
Contributor

@axw axw left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, thank you!

@axw axw added the ready to merge Code review completed; ready to merge by maintainers label Sep 24, 2025
@songy23 songy23 merged commit 379c29b into open-telemetry:main Sep 25, 2025
203 checks passed
@github-actions github-actions Bot added this to the next release milestone Sep 25, 2025
@paulojmdias paulojmdias deleted the fix/42226 branch September 25, 2025 07:18
songy23 pushed a commit that referenced this pull request Oct 10, 2025
#### Description

I would like to propose adding @paulojmdias as a code owner of the Kafka
components. He has been doing excellent work across all of those
components (and more besides!).

-
#42507
-
#42796
-
#43105
-
#42181
-
#42327
-
#42507
-
#42796
-
#43019
-
#43083
-
#43105

#### Link to tracking issue

N/A

#### Testing

N/A

#### Documentation

N/A
ChrsMark pushed a commit to ChrsMark/opentelemetry-collector-contrib that referenced this pull request Oct 20, 2025
#### Description

I would like to propose adding @paulojmdias as a code owner of the Kafka
components. He has been doing excellent work across all of those
components (and more besides!).

-
open-telemetry#42507
-
open-telemetry#42796
-
open-telemetry#43105
-
open-telemetry#42181
-
open-telemetry#42327
-
open-telemetry#42507
-
open-telemetry#42796
-
open-telemetry#43019
-
open-telemetry#43083
-
open-telemetry#43105

#### Link to tracking issue

N/A

#### Testing

N/A

#### Documentation

N/A
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

[receiver/kafka] Offset data loss when switching existing system to FranzGo

4 participants