Fix race condition in multi-producer sharding delivery test #7975
Merged
The test ReliableDelivery_with_Sharding_must_illustrate_Sharding_usage_with_several_producers
was experiencing intermittent failures due to a premature entity termination race condition.
Root cause: When multiple producers send messages to the same sharded entities, each
producer-entity pair maintains independent sequence numbers (1-42). The test's end condition
(seqNr >= 42) would trigger when ANY producer reached seqNr 42, causing the entity to stop
immediately, before other producers could complete their message delivery.
This resulted in the test expecting 3 Collected messages (one per entity) containing all 6
producer IDs (p1-entity-{0,1,2} and p2-entity-{0,1,2}), but entities would stop after only
receiving messages from the first producer to complete, causing timeouts.
Solution: Modified TestConsumer to track per-producer completion:
- Added expectedProducerCount parameter (defaults to 1 for backward compatibility)
- Tracked which producers have met the end condition in a _completedProducers set
- Stopped the entity only when ALL expected producers have completed
- Updated multi-producer test to pass expectedProducerCount: 2
This ensures entities properly wait for all producers to complete message delivery before
terminating, eliminating the race condition without relying on timeout adjustments.
Verified with 20 consecutive successful test runs and full test suite validation.
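The stop condition described above can be modeled without Akka.NET. The following is a minimal sketch, not the actual TestConsumer code: SimulateEntity, its parameters, and the delivery ordering are hypothetical names and assumptions chosen for illustration. It replays an ordering in which p1 finishes before p2 delivers anything, then compares stopping after one completed producer (the default of 1, which behaves like the old end condition here) with waiting for two.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Simplified model of the consumer's end-condition logic; this is NOT the real
// TestConsumer. SimulateEntity and its parameters are hypothetical names.
// Each delivery is (producerId, seqNr). The entity "stops" once the required
// number of distinct producers have reached endSeqNr, and returns the set of
// producer IDs it saw before stopping.
static ISet<string> SimulateEntity(
    IEnumerable<(string ProducerId, long SeqNr)> deliveries,
    int expectedProducerCount,
    long endSeqNr = 42)
{
    var processed = new HashSet<string>();          // producer IDs seen so far
    var completedProducers = new HashSet<string>(); // producers that reached endSeqNr

    foreach (var (producerId, seqNr) in deliveries)
    {
        processed.Add(producerId);

        if (seqNr >= endSeqNr)
            completedProducers.Add(producerId); // a HashSet ignores duplicates

        // Stop only when every expected producer has completed.
        if (completedProducers.Count >= expectedProducerCount)
            break; // stands in for Context.Stop(Self)
    }

    return processed;
}

// Assumed worst-case ordering: p1 delivers all 42 messages to entity-0
// before p2 delivers any.
var p1 = Enumerable.Range(1, 42).Select(n => ("p1-entity-0", (long)n));
var p2 = Enumerable.Range(1, 42).Select(n => ("p2-entity-0", (long)n));
var deliveries = p1.Concat(p2).ToList();

var stopOnFirst = SimulateEntity(deliveries, expectedProducerCount: 1);
var waitForBoth = SimulateEntity(deliveries, expectedProducerCount: 2);

Console.WriteLine(string.Join(", ", stopOnFirst.OrderBy(id => id))); // p1-entity-0
Console.WriteLine(string.Join(", ", waitForBoth.OrderBy(id => id))); // p1-entity-0, p2-entity-0
```

With a count of 1 the model stops as soon as p1 reaches seqNr 42 and reports only p1-entity-0, which mirrors the failure described above. With a count of 2 it keeps consuming until p2 also reaches seqNr 42, so both producer IDs are collected.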
Aaronontheweb commented Dec 22, 2025
EndReplyTo.Tell(new Collected(_processed.Select(c => c.Item1).ToImmutableHashSet(), _messageCount + 1));
Context.Stop(Self);
// Track that this producer has completed
if (!_completedProducers.Contains(job.ProducerId))
This is the key fix: completions need to be tracked individually per producer when more than one producer is messaging the same TestConsumer.
…_illustrate_Sharding_usage_with_several_producers
Arkatufus pushed a commit to Arkatufus/akka.net that referenced this pull request Jan 7, 2026
Aaronontheweb added a commit that referenced this pull request Jan 8, 2026
Summary
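Fixes a race condition in the test ReliableDelivery_with_Sharding_must_illustrate_Sharding_usage_with_several_producers that was causing intermittent failures.
Root Cause
When multiple producers send messages to the same sharded entities:
- Each producer-entity pair has its own producer ID of the form {producerId}-{entityId} (e.g., p1-entity-0, p2-entity-0) and its own sequence numbers
- The end condition seqNr >= 42 would trigger when any producer reached seqNr 42
The Race
- One producer reaches seqNr 42 first (e.g., p1-entity-0)
- The entity stops before the other producer has finished (e.g., p2-entity-0)
- The test receives a Collected message with only one producer ID
Solution
Modified TestConsumer to track per-producer completion:
- Added expectedProducerCount parameter (defaults to 1 for backward compatibility)
- Track which producers have met the end condition in the _completedProducers set
- Only stop the entity when all expected producers have completed
- Updated the multi-producer test to pass expectedProducerCount: 2
Changes
src/core/Akka.Tests/Delivery/TestConsumer.cs
- Added _expectedProducerCount and _completedProducers fields
- Added the expectedProducerCount parameter
src/contrib/cluster/Akka.Cluster.Sharding.Tests/Delivery/ReliableDeliveryShardingSpec.cs
- Pass expectedProducerCount: 2 in line 91
Test Plan
- Tests in ReliableDeliveryShardingSpec pass
Impact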
This is a test-only fix with no production code changes. The fix ensures that multi-producer reliable delivery tests properly validate the scenario they're designed to test, rather than failing due to race conditions in the test infrastructure.