fix(#3076): Kafka sender enqueues batch before awaiting deliveries #3078
Merged
KafkaSenderProtocol.SendBatchAsync awaited each ProduceAsync inline, which
forced a broker round-trip per message and effectively serialized
throughput. Under a durable outbox driving 100k+ messages the serialized
awaits dwarfed the on-broker work the producer settings (linger.ms,
batch.size, compression) were tuned for.
Refactor to two phases:
1. Map every envelope to a Confluent Message<>. A mapping or
serialization failure on any envelope before we've put anything on
the wire is reported as MarkSerializationFailureAsync on the whole
batch — covers the long-standing 'TODO -- separate try/catch here!'
marker the original loop carried.
2. Enqueue every produce without awaiting, then await Task.WhenAll on
the captured produce tasks. The Confluent producer's internal
accumulator can now fill from the full batch and emit a single
ProduceRequest, so linger.ms + batch.size + compression do the
coalescing the reporter expected.
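
The two phases above can be sketched roughly as follows. This is a minimal illustration, not the code from this PR: EnvelopeSketch, WireMessage and the produceAsync delegate are hypothetical stand-ins for Wolverine's envelope, Confluent's Message<> and the producer's ProduceAsync, chosen so the block compiles without any Kafka dependency.

```csharp
using System;
using System.Collections.Generic;
using System.Threading.Tasks;

// Hypothetical stand-ins; neither type comes from Wolverine or Confluent.Kafka.
public sealed record EnvelopeSketch(string Id, byte[] Body);
public sealed record WireMessage(string Key, byte[] Value);

public static class TwoPhaseSendSketch
{
    // Phase 1: map every envelope before anything is produced.
    // An exception thrown here means nothing has reached the wire yet,
    // so the caller can treat it as a whole-batch serialization failure.
    public static List<WireMessage> MapAll(IReadOnlyList<EnvelopeSketch> envelopes)
    {
        var messages = new List<WireMessage>(envelopes.Count);
        foreach (var envelope in envelopes)
        {
            if (envelope.Body is null)
            {
                throw new InvalidOperationException($"Envelope {envelope.Id} has no body");
            }

            messages.Add(new WireMessage(envelope.Id, envelope.Body));
        }

        return messages;
    }

    // Phase 2: start every produce without awaiting, then await them together.
    public static Task ProduceAllAsync(
        IReadOnlyList<WireMessage> messages,
        Func<WireMessage, Task> produceAsync)
    {
        var pending = new List<Task>(messages.Count);
        foreach (var message in messages)
        {
            pending.Add(Start(produceAsync, message));
        }

        return Task.WhenAll(pending);
    }

    // Turns a synchronous throw into a faulted task so that the tasks
    // already started are still observed by Task.WhenAll.
    private static Task Start(Func<WireMessage, Task> produceAsync, WireMessage message)
    {
        try
        {
            return produceAsync(message);
        }
        catch (Exception e)
        {
            return Task.FromException(e);
        }
    }
}
```

The point of the second method is that every call to produceAsync has been issued before the first one is awaited, which is what lets a client-side accumulator see the whole batch at once.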
Batch-level semantics stay all-or-nothing to fit ISenderCallback's
batch-shaped API: every send acked → MarkSuccessfulAsync; any failure →
MarkProcessingFailureAsync (the durable outbox layer retries the batch,
receive-side idempotency dedupes). Per-envelope partial-success
bookkeeping would need a callback-shape change that's out of scope for
this fix — matches the choice the SQS sender already made.
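
As a rough illustration of the all-or-nothing reporting described above, the sketch below awaits the captured produce tasks and makes exactly one callback call for the whole batch. IBatchCallbackSketch and RecordingCallback are hypothetical and deliberately simplified; the real ISenderCallback signatures may differ. Only the method names are borrowed from the description.

```csharp
using System;
using System.Collections.Generic;
using System.Threading.Tasks;

// Hypothetical, simplified callback shape; not Wolverine's ISenderCallback.
public interface IBatchCallbackSketch
{
    Task MarkSuccessfulAsync(int batchSize);
    Task MarkProcessingFailureAsync(int batchSize, Exception error);
}

public static class AllOrNothingSketch
{
    public static async Task ReportAsync(
        IReadOnlyList<Task> produceTasks,
        IBatchCallbackSketch callback)
    {
        var all = Task.WhenAll(produceTasks);
        try
        {
            await all;
        }
        catch (Exception first)
        {
            // Awaiting rethrows only the first failure; all.Exception
            // aggregates every faulted produce task when one is available.
            Exception error = all.Exception ?? first;
            await callback.MarkProcessingFailureAsync(produceTasks.Count, error);
            return;
        }

        await callback.MarkSuccessfulAsync(produceTasks.Count);
    }
}

// Records callback calls so the behaviour can be inspected.
public sealed class RecordingCallback : IBatchCallbackSketch
{
    public List<string> Calls { get; } = new List<string>();

    public Task MarkSuccessfulAsync(int batchSize)
    {
        Calls.Add($"success:{batchSize}");
        return Task.CompletedTask;
    }

    public Task MarkProcessingFailureAsync(int batchSize, Exception error)
    {
        Calls.Add($"failure:{batchSize}");
        return Task.CompletedTask;
    }
}
```

A single faulted task marks the whole batch as failed, even though the other sends may already have been acked. That is why the description leans on outbox retries plus receive-side idempotency rather than per-envelope bookkeeping.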
Scope: only the Buffered + Durable endpoints (the BatchedSender path).
EndpointMode.Inline still uses InlineKafkaSender, which is unchanged.
Ran the full Kafka test suite against a local Confluent container. The
delta vs main is dominated by pre-existing flakes in the 15-min compliance
sweep against a single-container broker — re-running the suspected new
failures in isolation either passes them under the fix or reproduces the
same failure on clean main (e.g. can_schedule_retry), confirming no new
real regressions.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Closes #3076.
KafkaSenderProtocol.SendBatchAsync awaited each ProduceAsync inline, which forced a broker round-trip per message and effectively serialized throughput. Under a durable outbox driving 100k+ messages the serialized awaits dwarfed the on-broker work the producer settings (linger.ms, batch.size, compression) were tuned for.

The change

Two phases:
1. Map every envelope to a Confluent Message<>. A mapping or serialization failure on any envelope before anything is on the wire is reported as MarkSerializationFailureAsync on the whole batch. This covers the long-standing 'TODO -- separate try/catch here!' marker the original loop carried.
2. Enqueue every produce without awaiting, then await Task.WhenAll on the captured produce tasks. The Confluent producer's internal accumulator can now fill from the full batch and emit a single ProduceRequest, so linger.ms + batch.size + compression do the coalescing the reporter expected.

Batch-level semantics stay all-or-nothing to fit ISenderCallback's batch-shaped API: every send acked → MarkSuccessfulAsync; any failure → MarkProcessingFailureAsync (the durable outbox layer retries the batch, receive-side idempotency dedupes). Per-envelope partial-success bookkeeping would need a callback-shape change that's out of scope for this fix. This matches the choice the SQS sender already made (see SqsSenderProtocol.SendBatchAsync).

Scope

Only the BatchedSender path: EndpointMode.Buffered and EndpointMode.Durable. EndpointMode.Inline still uses InlineKafkaSender, which is unchanged.

Test status

Ran the full Wolverine.Kafka.Tests suite (160 tests, ~15 min) against a local Confluent container with this fix and against clean main for baseline.
Re-running the suspected new failures in isolation either passes them under the fix or reproduces the same failure on clean main (e.g. BufferedSendingAndReceivingCompliance.can_schedule_retry).
InlineSendingAndReceivingCompliance.* cannot regress from this PR by construction (different code path), yet some of those tests still appeared in the noise, which indicates the failures are local-container load artefacts, not real regressions.

So the change is safe to merge from the test perspective. The expected user-visible impact for the reporter is a large throughput uplift on durable-outbox / buffered Kafka publishing once their producer batching settings (linger.ms, batch.size, compression) finally have a populated accumulator to coalesce over.

🤖 Generated with Claude Code