feat(producer): partition muting for msg ordering #3422

Merged
dnwe merged 3 commits into main from dnwe/muter
Mar 18, 2026

@dnwe
Collaborator

dnwe commented Jan 10, 2026

This commit attempts to copy the Java client's concept of a RecordAccumulator, which "mutes" a partition when a batch is in-flight. This prevents a subsequent batch for the same partition from being sent before the first one has been acknowledged.

The brokerProducer now mutes a batch's partitions before sending and unmutes them only after the request is complete (including the queuing of any retries). A partitionMuter is introduced within asyncProducer to track and manage in-flight requests for each topic-partition and the waitForSpace logic is updated to respect partition mutes.
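To illustrate the mute/unmute bookkeeping described above, here is a minimal sketch, not the code merged in this PR. Only the `partitionMuter` type name, its `mu` mutex, and the `inFlightCounts` field appear in the review excerpts; the `int` counter type and the `newPartitionMuter`, `tryMute`, and `unmute` helpers are hypothetical names chosen for the example.

```go
package main

import "sync"

// partitionMuter is a hypothetical sketch: it counts in-flight batches
// per topic-partition. The int counter type is an assumption.
type partitionMuter struct {
	mu             sync.Mutex
	inFlightCounts map[string]map[int32]int
}

// newPartitionMuter is a hypothetical constructor.
func newPartitionMuter() *partitionMuter {
	return &partitionMuter{inFlightCounts: make(map[string]map[int32]int)}
}

// tryMute mutes the partition and reports true if it was not already muted.
// A caller would hold back a batch when this returns false.
func (m *partitionMuter) tryMute(topic string, partition int32) bool {
	m.mu.Lock()
	defer m.mu.Unlock()
	if m.inFlightCounts[topic][partition] > 0 {
		return false
	}
	if m.inFlightCounts[topic] == nil {
		m.inFlightCounts[topic] = make(map[int32]int)
	}
	m.inFlightCounts[topic][partition]++
	return true
}

// unmute releases one in-flight batch for the partition.
func (m *partitionMuter) unmute(topic string, partition int32) {
	m.mu.Lock()
	defer m.mu.Unlock()
	partitions := m.inFlightCounts[topic]
	if partitions[partition] <= 1 {
		delete(partitions, partition) // delete on a nil map is a no-op
		return
	}
	partitions[partition]--
}
```

In this sketch a sender would call `tryMute` before dispatching a batch and `unmute` once the response has been handled and any retries have been queued, which is the ordering the description above calls for.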

When MaxOpenRequests > 1, mute partitions with in-flight batches to
prevent out-of-order delivery during retries.

Signed-off-by: Dominic Evans <dominic.evans@uk.ibm.com>

Retry with backoff on ErrLeaderNotAvailable and ErrOffsetNotAvailable.

Signed-off-by: Dominic Evans <dominic.evans@uk.ibm.com>
@dnwe marked this pull request as ready for review January 10, 2026 20:57
@dnwe added the feat label Jan 11, 2026
Collaborator

@puellanivis left a comment

😰 lotsa code, but looks great

Comment thread async_producer.go Outdated
// Requires: m.mu held.
func (m *partitionMuter) isMuted(topic string, partition int32) bool {
	if partitions := m.inFlightCounts[topic]; partitions != nil {
		return partitions[partition] > 0
Collaborator

If partitions == nil then partitions[partition] > 0 == false.

So this function is semantically identical to just return m.inFlightCounts[topic][partition] > 0
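The equivalence follows from Go's rule that indexing a nil map yields the element type's zero value. A small sketch, assuming an `int` counter and a trailing `return false` in the guarded version (the excerpt above is truncated, so both are assumptions), with hypothetical free functions standing in for the method:

```go
package main

// isMutedGuarded mirrors the shape of the reviewed excerpt. The final
// "return false" is assumed, since the excerpt is cut off before it.
func isMutedGuarded(counts map[string]map[int32]int, topic string, partition int32) bool {
	if partitions := counts[topic]; partitions != nil {
		return partitions[partition] > 0
	}
	return false
}

// isMutedDirect is the simplified form suggested in the review.
// Reading from a nil inner map returns 0, so no nil check is needed.
func isMutedDirect(counts map[string]map[int32]int, topic string, partition int32) bool {
	return counts[topic][partition] > 0
}
```

Both functions agree for a nil outer map, a missing topic, a missing partition, and a present partition with any count.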

Collaborator Author

Good catch!

Comment thread async_producer.go Outdated
// Requires: m.mu held.
func (m *partitionMuter) isAnyMuted(set *produceSet) bool {
	muted := false
	set.eachPartition(func(topic string, partition int32, _ *partitionSet) {
Collaborator

(We might want to consider reworking this function to work like an iter.Seq2-style function, where the callback returns a bool to allow an early return.)

Collaborator Author

Added an ‘anyPartition’ to the produceSet with that pattern and calling that here
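For readers following the thread, a rough sketch of what an early-returning `anyPartition` could look like. This is an assumption about its shape, not the merged code: the `msgs` field layout, the empty `partitionSet` stand-in, and the free-function `isAnyMuted` taking the in-flight counts as a parameter are all hypothetical.

```go
package main

// partitionSet is an empty stand-in for the real type.
type partitionSet struct{}

// produceSet is a hypothetical, reduced version holding only the
// topic -> partition -> set mapping (field name assumed).
type produceSet struct {
	msgs map[string]map[int32]*partitionSet
}

// anyPartition reports whether pred returns true for any partition.
// Iteration stops at the first match, which eachPartition cannot do.
func (ps *produceSet) anyPartition(pred func(topic string, partition int32, set *partitionSet) bool) bool {
	for topic, partitions := range ps.msgs {
		for partition, set := range partitions {
			if pred(topic, partition, set) {
				return true
			}
		}
	}
	return false
}

// isAnyMuted shows the call site: no captured "muted" flag is needed.
func isAnyMuted(set *produceSet, inFlight map[string]map[int32]int) bool {
	return set.anyPartition(func(topic string, partition int32, _ *partitionSet) bool {
		return inFlight[topic][partition] > 0
	})
}
```

Compared with the `eachPartition` version in the excerpt, the predicate form avoids visiting the remaining partitions once one muted partition has been found.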

Comment thread async_producer.go Outdated
Comment on lines +1326 to +1329
select {
case <-bp.timer.C:
default:
}
Collaborator

As of go1.23, this receive is no longer necessary as the channel cannot hold a stale value.

Collaborator Author

TIL! Thanks

Comment thread produce_set.go Outdated
return out
}

func (ps *produceSet) filteredCopy(predicate func(topic string, partition int32) bool) *produceSet {
Collaborator

I usually find “filter” to be ambiguous, as after one has filtered something, one can keep the filtrate, or the residue.

In keeping with the slices package, copyFunc would be a parallel?

Collaborator Author

Agreed, fixed
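As an illustration of the naming point, here is a sketch of a `copyFunc` in the style of the `slices` package's `...Func` helpers. The thread does not show the final implementation, so the keep-on-true predicate direction, the `msgs` field, and the shallow copy of each `partitionSet` pointer are all assumptions.

```go
package main

// partitionSet is an empty stand-in for the real type.
type partitionSet struct{}

// produceSet is a hypothetical, reduced version (field name assumed).
type produceSet struct {
	msgs map[string]map[int32]*partitionSet
}

// copyFunc returns a new produceSet containing only the partitions for
// which keep returns true. Assumption: true means "keep", which is the
// ambiguity the rename is meant to make easier to document. The
// partitionSet pointers are shared with the original, not cloned.
func (ps *produceSet) copyFunc(keep func(topic string, partition int32) bool) *produceSet {
	out := &produceSet{msgs: make(map[string]map[int32]*partitionSet)}
	for topic, partitions := range ps.msgs {
		for partition, set := range partitions {
			if !keep(topic, partition) {
				continue
			}
			if out.msgs[topic] == nil {
				out.msgs[topic] = make(map[int32]*partitionSet)
			}
			out.msgs[topic][partition] = set
		}
	}
	return out
}
```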

Signed-off-by: Dominic Evans <dominic.evans@uk.ibm.com>
@dnwe
Collaborator Author

dnwe commented Jan 11, 2026

> 😰 lotsa code, but looks great

😅 yeah I’ve written and thrown away so many branches where I’ve attempted this over the last couple of years because it ended up breaking more than it fixed. This is the first time it seems to have got to a place that is reasonably understandable and behaves well enough across the tests to actually consider merging it

@puellanivis
Collaborator

It definitely looks like you’ve spent a few prototypes figuring out how to not write it. It’s pretty solid. 👍 I don’t see anything to comment about anymore.

@dnwe
Collaborator Author

dnwe commented Mar 16, 2026

@puellanivis are you happy to re-review this since you became maintainer so that it counts as an approving review? 😅

@puellanivis
Collaborator

😅

@dnwe merged commit ae36df4 into main Mar 18, 2026
17 checks passed
@dnwe deleted the dnwe/muter branch March 18, 2026 19:32