Add async overloads to multi-node TestConductor APIs #7750
Merged
Aaronontheweb merged 30 commits on Aug 13, 2025
Addresses GitHub issue akkadotnet#4146 (open for 5+ years) by adding async versions of all blocking TestConductor methods to eliminate thread pool starvation and timeout issues in multi-node tests.
Changes:
- Added EnterAsync() methods with CancellationToken support
- Added ExitAsync(), BlackholeAsync(), PassThroughAsync() methods
- Added GetAddressForAsync(), GetNodesAsync(), RemoveNodeAsync() methods
- Added EnterBarrierAsync() and NodeAsync() to MultiNodeSpec
- Implemented the sync-over-async pattern for backward compatibility
- All existing synchronous methods now delegate to their async versions
This maintains 100% backward compatibility while providing async alternatives that will eliminate the 20+ second timeout failures in CI/CD pipelines.
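A minimal sketch of the delegation described above, where the async method is the primary implementation and the blocking method forwards to it. `FakeConductor`, its node list, and the simulated delay are hypothetical stand-ins rather than the Akka.NET TestConductor API. The PR text does not say whether its wrappers block with `.Wait()` or `.GetAwaiter().GetResult()`; this sketch assumes the latter.

```csharp
using System;
using System.Collections.Generic;
using System.Threading;
using System.Threading.Tasks;

// Hypothetical stand-in for a conductor; NOT the Akka.NET TestConductor API.
public sealed class FakeConductor
{
    private readonly List<string> _nodes = new() { "first", "second", "third" };

    // Async version: the primary implementation, cancellable by the caller.
    public async Task<IReadOnlyList<string>> GetNodesAsync(CancellationToken cancellationToken = default)
    {
        // Simulated round-trip to the controller; Task.Delay honors the token.
        await Task.Delay(TimeSpan.FromMilliseconds(20), cancellationToken);
        return _nodes.AsReadOnly();
    }

    // Blocking version kept for backward compatibility: sync-over-async delegation.
    public IReadOnlyList<string> GetNodes() => GetNodesAsync().GetAwaiter().GetResult();
}

public static class Program
{
    public static async Task Main()
    {
        var conductor = new FakeConductor();

        // New-style caller: awaits, and bounds the call with a token.
        using var cts = new CancellationTokenSource(TimeSpan.FromSeconds(5));
        var nodes = await conductor.GetNodesAsync(cts.Token);
        Console.WriteLine(string.Join(",", nodes)); // first,second,third

        // Old-style caller: source-compatible, but blocks the calling thread.
        Console.WriteLine(conductor.GetNodes().Count); // 3

        // An already-cancelled token makes the async call fail fast.
        try
        {
            await conductor.GetNodesAsync(new CancellationToken(canceled: true));
        }
        catch (OperationCanceledException)
        {
            Console.WriteLine("cancelled");
        }
    }
}
```

The blocking wrapper still occupies a thread for the whole call, so it preserves compatibility without removing the starvation risk; only callers that switch to the async overloads get that benefit.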
Migrated the existing TestConductorSpec test to use the new async methods:
- Changed the test method to async Task
- Replaced Throttle().Wait() with ThrottleAsync()
- Used RunOnAsync for async operations
This validates that the new async APIs work correctly and provides an example of how to migrate existing multi-node tests.
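The RunOnAsync idea can be sketched with a hypothetical role-gated helper: the body runs only when the current node's role is listed, and it is awaited instead of blocked on. `RoleRunner.RunOnAsync` and its string roles are assumptions for illustration; the real MultiNodeSpec helper works with role names of its own type, and its exact signature may differ.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Threading.Tasks;

// Hypothetical role-gated helper illustrating the RunOnAsync idea; not the MultiNodeSpec API.
public static class RoleRunner
{
    // Runs the async body only if this node's role is listed; returns whether it ran.
    public static async Task<bool> RunOnAsync(string myRole, Func<Task> thunk, params string[] roles)
    {
        if (!roles.Contains(myRole))
            return false; // this node skips the block

        await thunk(); // awaited, so no thread is parked while the body is pending
        return true;
    }
}

public static class Program
{
    public static async Task Main()
    {
        var log = new List<string>();
        foreach (var role in new[] { "first", "second" })
        {
            await RoleRunner.RunOnAsync(role, async () =>
            {
                await Task.Delay(10); // stand-in for an awaited conductor call such as a throttle request
                log.Add($"{role} ran");
            }, "first");
        }
        Console.WriteLine(string.Join(";", log)); // first ran
    }
}
```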
These files are not needed for the project and were accidentally included.
Successfully migrated two test methods to be fully async:
- RemoteNodeDeathWatch_must_receive_Terminated_when_watched_node_crashAsync
- RemoteNodeDeathWatch_must_cleanup_when_watching_node_crashAsync
Changes:
- Converted test methods to async Task
- Replaced TestConductor.Exit().Wait() with ExitAsync()
- Used RunOnAsync for all async operations
- Replaced EnterBarrier with EnterBarrierAsync
- Added the System.Threading.Tasks using directive
All 6 test node variations pass, demonstrating that the async APIs work correctly in real multi-node test scenarios.
Replaced .Wait() calls with .GetAwaiter().GetResult() to avoid potential deadlocks while maintaining the synchronous execution model. This is a transitional step before fully converting StressSpec to async.
Changes:
- TestConductor.Exit().Wait() -> GetAwaiter().GetResult()
- TestConductor.Blackhole().Wait() -> GetAwaiter().GetResult()
This eliminates the blocking wait calls that were causing thread pool starvation in CI environments.
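Both forms block the calling thread until the task finishes, so neither one frees the thread; what differs is how a failure surfaces. `.Wait()` wraps the exception in an `AggregateException`, while `.GetAwaiter().GetResult()` rethrows the original exception, which keeps test failure messages readable. A minimal demonstration, using a hypothetical faulted task in place of a conductor call:

```csharp
using System;
using System.Threading.Tasks;

public static class BlockingDemo
{
    // A faulted task standing in for a conductor call that failed.
    public static Task Failing() => Task.FromException(new InvalidOperationException("node unreachable"));

    // Returns the exception type surfaced by .Wait().
    public static Type ViaWait()
    {
        try { Failing().Wait(); }
        catch (Exception ex) { return ex.GetType(); }
        return typeof(void);
    }

    // Returns the exception type surfaced by .GetAwaiter().GetResult().
    public static Type ViaGetResult()
    {
        try { Failing().GetAwaiter().GetResult(); }
        catch (Exception ex) { return ex.GetType(); }
        return typeof(void);
    }
}

public static class Program
{
    public static void Main()
    {
        Console.WriteLine(BlockingDemo.ViaWait().Name);      // AggregateException
        Console.WriteLine(BlockingDemo.ViaGetResult().Name); // InvalidOperationException
    }
}
```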
StressSpec contains deeply nested synchronous test structures (Within, RunOn, ReportResult) that make it impossible to properly await async TestConductor methods without a complete rewrite. The current TestConductor.Exit() and TestConductor.Blackhole() methods now internally use the async versions, which is an improvement over the previous direct blocking calls, but a full async conversion of StressSpec would require:
1. Converting Within() to support async operations
2. Converting ReportResult() to be async
3. Converting all test orchestration methods to async
4. Updating all callers throughout the test
This is tracked as future work.
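An async replacement for Within has to measure time while the body is awaited rather than executed inline. The hypothetical `Timing.WithinAsync` below is a simplified sketch of that idea: it races the body against a delay and cancels whichever loses. It is not the Akka.TestKit implementation, which offers more (for example min/max overloads) than is modeled here.

```csharp
using System;
using System.Threading;
using System.Threading.Tasks;

public static class Timing
{
    // Hypothetical, simplified async analogue of a Within block: throws TimeoutException
    // if the body has not completed within `max`. Not the Akka.TestKit implementation.
    public static async Task WithinAsync(TimeSpan max, Func<CancellationToken, Task> body)
    {
        using var cts = new CancellationTokenSource();
        Task work = body(cts.Token);
        Task timer = Task.Delay(max, cts.Token);

        Task first = await Task.WhenAny(work, timer);
        cts.Cancel(); // stop whichever task lost the race
        if (first != work)
            throw new TimeoutException($"Block did not complete within {max}");

        await work; // surface the body's own exception, if it failed
    }
}

public static class Program
{
    public static async Task Main()
    {
        await Timing.WithinAsync(TimeSpan.FromSeconds(1), ct => Task.Delay(10, ct));
        Console.WriteLine("fast body passed");

        try
        {
            await Timing.WithinAsync(TimeSpan.FromMilliseconds(50), ct => Task.Delay(5000, ct));
        }
        catch (TimeoutException)
        {
            Console.WriteLine("slow body timed out");
        }
    }
}
```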
This reverts commit 5aa9ceb.
- Convert main test method Cluster_under_stress to async Task
- Convert all Must* helper methods to async Task
- Fix ReportResult lambda expressions to be async
- Use WithinAsync instead of Within for async operations
- Replace all TestConductor.Exit().Wait() with await TestConductor.ExitAsync()
- Use RunOnAsync for async operations
- Replace EnterBarrier with EnterBarrierAsync calls
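Making the ReportResult lambdas async only works if the helper accepts a `Func<Task>`: an async lambda passed to an `Action` parameter compiles to `async void`, so the helper returns before the lambda's awaited work is done and the caller cannot await it. The sketch below demonstrates this; `ReportResultSync` and `ReportResultAsync` are hypothetical helpers, not the StressSpec methods.

```csharp
using System;
using System.Threading.Tasks;

public static class ReportDemo
{
    // Hypothetical helper with a synchronous delegate: an async lambda here becomes async void.
    public static void ReportResultSync(Action body) => body();

    // Hypothetical helper with an async delegate: the caller can await the lambda's Task.
    public static Task ReportResultAsync(Func<Task> body) => body();

    // Returns whether each body had finished by the time its helper returned.
    public static async Task<(bool syncFinished, bool asyncFinished)> Run()
    {
        bool syncDone = false, asyncDone = false;

        ReportResultSync(async () => { await Task.Delay(50); syncDone = true; });
        bool syncFinished = syncDone; // the async void lambda is still waiting on its delay

        await ReportResultAsync(async () => { await Task.Delay(50); asyncDone = true; });
        return (syncFinished, asyncDone);
    }
}

public static class Program
{
    public static async Task Main()
    {
        var (syncFinished, asyncFinished) = await ReportDemo.Run();
        Console.WriteLine($"Action overload finished: {syncFinished}");      // False
        Console.WriteLine($"Func<Task> overload finished: {asyncFinished}"); // True
    }
}
```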
- Convert main test method LeaderElectionSpecs to async Task
- Convert ShutdownLeaderAndVerifyNewLeader to async Task
- Replace TestConductor.Exit().Wait() with await TestConductor.ExitAsync()
- Convert all Cluster_of_four_nodes_* methods to async Task
- Use WithinAsync instead of Within for async operations
- Replace all EnterBarrier calls with EnterBarrierAsync
- Add using System.Threading.Tasks
- Convert main test method to async Task
- Convert all test helper methods to async Task
- Replace TestConductor blocking calls with async versions:
  - Blackhole().Wait() -> BlackholeAsync()
  - PassThrough().Wait() -> PassThroughAsync()
  - Exit().Wait() -> ExitAsync()
- Replace all EnterBarrier calls with EnterBarrierAsync
- Use RunOnAsync for async operations
- Add using System.Threading.Tasks
- Create comprehensive migration guide (MULTINODE_TEST_ASYNC_MIGRATION.md)
  - Documents all blocking patterns to replace
  - Provides before/after code examples
  - Lists common pitfalls to avoid
  - Includes complete checklist of 34 test files needing migration
- Add migration status checker script (check-multinode-migration.sh)
  - Automatically finds tests with blocking TestConductor calls
  - Counts .Wait() calls per file
  - Identifies tests using synchronous EnterBarrier
  - Tracks tests using Within that may need WithinAsync
  - Provides color-coded status output
Current status:
- 34 test files still have blocking TestConductor calls
- 116 files still use synchronous EnterBarrier
- 78 files use Within blocks that may need async conversion
- 5 tests already migrated successfully
This tooling helps track and manage the async migration effort across all 157 multi-node test files in the codebase.
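The core of such a checker is pattern counting over source files. The sketch below re-expresses that idea in C# with regular expressions; it is a hypothetical illustration, not the logic of check-multinode-migration.sh, and its patterns are assumptions (for instance, `.Wait(timeout)` overloads are not counted).

```csharp
using System;
using System.IO;
using System.Text.RegularExpressions;

// Hypothetical C# re-sketch of a migration scan; not the actual shell script.
public static class MigrationScan
{
    private static readonly Regex WaitCall = new(@"\.Wait\(\)");          // parameterless .Wait() only
    private static readonly Regex SyncBarrier = new(@"\bEnterBarrier\("); // does not match EnterBarrierAsync(
    private static readonly Regex SyncWithin = new(@"\bWithin\(");        // does not match WithinAsync(

    public static (int waits, int barriers, int withins) Count(string source) =>
        (WaitCall.Matches(source).Count,
         SyncBarrier.Matches(source).Count,
         SyncWithin.Matches(source).Count);
}

public static class Program
{
    public static void Main(string[] args)
    {
        var root = args.Length > 0 ? args[0] : ".";
        foreach (var file in Directory.EnumerateFiles(root, "*.cs", SearchOption.AllDirectories))
        {
            var (waits, barriers, withins) = MigrationScan.Count(File.ReadAllText(file));
            if (waits + barriers + withins > 0)
                Console.WriteLine($"{file}: .Wait()={waits} EnterBarrier={barriers} Within={withins}");
        }
    }
}
```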
- Convert test methods to async Task
- Replace EnterBarrier with EnterBarrierAsync
- Use WithinAsync for async timing constraints
- Use RunOnAsync for async operations
- Convert WhenTerminated.Wait() to WaitAsync()
- Keep TestConductor.Shutdown().Wait() as no async version exists
- Add using System.Threading.Tasks
This eliminates race conditions caused by blocking calls and thread pool starvation.
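On .NET 6 or later, `Task.WaitAsync(TimeSpan)` gives an awaitable, time-bounded wait on a termination task; which `WaitAsync` overload or helper the PR actually uses is not shown. In this sketch a `TaskCompletionSource` stands in for an actor system's `WhenTerminated` task, and `TerminationWait` is a hypothetical helper.

```csharp
using System;
using System.Threading.Tasks;

public static class TerminationWait
{
    // Awaits a termination task without blocking a thread; returns false if it takes too long.
    public static async Task<bool> AwaitTermination(Task whenTerminated, TimeSpan timeout)
    {
        try
        {
            await whenTerminated.WaitAsync(timeout); // requires .NET 6+
            return true;
        }
        catch (TimeoutException)
        {
            return false;
        }
    }
}

public static class Program
{
    public static async Task Main()
    {
        // Stand-in for WhenTerminated that completes shortly after startup.
        var tcs = new TaskCompletionSource(TaskCreationOptions.RunContinuationsAsynchronously);
        _ = Task.Run(async () => { await Task.Delay(20); tcs.SetResult(); });
        Console.WriteLine(await TerminationWait.AwaitTermination(tcs.Task, TimeSpan.FromSeconds(5))); // True

        // Stand-in for a system that never terminates.
        var never = new TaskCompletionSource().Task;
        Console.WriteLine(await TerminationWait.AwaitTermination(never, TimeSpan.FromMilliseconds(50))); // False
    }
}
```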
Arkatufus suggested changes on Aug 13, 2025
…net into mntr-async-overloads
Contributor: Waiting to see if everything turned green for this PR.
Arkatufus approved these changes on Aug 13, 2025
Arkatufus (Contributor) left a comment:
Removing my "request for change" since I've modified the PR. Would love to have another pair of eyes to check this PR.
Aaronontheweb (Member, Author) commented on Aug 13, 2025:
Found some no-nos
`var node = await _controller.Ask<IPEndPoint>(TestKit.Controller.GetSockAddr.Instance, Settings.QueryTimeout).ConfigureAwait(false);`
`var result = await Controller.Ask(new Terminate(node, new Right<bool, int>(exitValue)), Settings.QueryTimeout, cancellationToken).ConfigureAwait(false);`
Aaronontheweb (Member, Author): Need to remove all ConfigureAwait(false) from here.
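The review comment does not state its reason. For context, the observable effect of `ConfigureAwait(false)` is that the continuation after the `await` is not posted back to the captured `SynchronizationContext`. The hypothetical `CountingContext` below counts those posts to make the difference visible; it is a demonstration of the general .NET behavior, not of how the TestConductor or its test harness schedules work.

```csharp
using System;
using System.Threading;
using System.Threading.Tasks;

// Hypothetical SynchronizationContext that counts how many continuations are posted to it.
public sealed class CountingContext : SynchronizationContext
{
    private int _posts;
    public int Posts => Volatile.Read(ref _posts);

    public override void Post(SendOrPostCallback d, object? state)
    {
        Interlocked.Increment(ref _posts);
        // Run the posted continuation on the thread pool so the blocked caller cannot deadlock.
        ThreadPool.QueueUserWorkItem(_ => d(state));
    }
}

public static class ContextDemo
{
    // Default await: the continuation is posted back to the captured context.
    public static async Task CapturesContext() => await Task.Delay(10);

    // ConfigureAwait(false): the continuation ignores the captured context.
    public static async Task SkipsContext() => await Task.Delay(10).ConfigureAwait(false);

    // Installs a fresh CountingContext, runs the body to completion, and returns the post count.
    public static int Measure(Func<Task> body)
    {
        var previous = SynchronizationContext.Current;
        var ctx = new CountingContext();
        SynchronizationContext.SetSynchronizationContext(ctx);
        try
        {
            body().GetAwaiter().GetResult();
        }
        finally
        {
            SynchronizationContext.SetSynchronizationContext(previous);
        }
        return ctx.Posts;
    }
}

public static class Program
{
    public static void Main()
    {
        Console.WriteLine(ContextDemo.Measure(ContextDemo.CapturesContext)); // 1
        Console.WriteLine(ContextDemo.Measure(ContextDemo.SkipsContext));    // 0
    }
}
```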
This was referenced May 21, 2026
Summary
Addresses GitHub issue #4146 (open for 5+ years) by adding async versions of all blocking TestConductor methods to eliminate thread pool starvation and timeout issues in multi-node tests.
Changes
- `CancellationToken` support to all TestConductor methods:
  - `EnterAsync()`, `EnterBarrierAsync()`
  - `ExitAsync()`, `BlackholeAsync()`, `PassThroughAsync()`
  - `GetAddressForAsync()`, `GetNodesAsync()`, `RemoveNodeAsync()`
  - `NodeAsync()`, `ThrottleAsync()`
- `TestConductorSpec` - validates that the new async APIs work correctly
- `RemoteNodeDeathWatchSpec` - shows the full async migration pattern
Why This Matters
Based on investigation, the lack of async TestConductor methods is the root cause of:
Compatibility
Next Steps
Fixes #4146