Merged
30 commits
- 0f30e7b Add async overloads to multi-node TestConductor APIs (Aaronontheweb, Aug 12, 2025)
- 2585f10 Convert TestConductorSpec to use new async APIs (Aaronontheweb, Aug 12, 2025)
- cca2862 Remove Mono API verification files (Aaronontheweb, Aug 12, 2025)
- eb99457 Migrate RemoteNodeDeathWatchSpec to use async APIs (Aaronontheweb, Aug 12, 2025)
- 5aa9ceb Replace blocking .Wait() calls in StressSpec (Aaronontheweb, Aug 12, 2025)
- 0ce3f59 Note: StressSpec requires comprehensive async refactor (Aaronontheweb, Aug 12, 2025)
- 30dd3d7 Revert "Replace blocking .Wait() calls in StressSpec" (Aaronontheweb, Aug 12, 2025)
- 8874a48 Convert StressSpec to 100% async (Aaronontheweb, Aug 12, 2025)
- 81bada1 Convert LeaderElectionSpec to async (Aaronontheweb, Aug 12, 2025)
- 0f090a1 Convert ClusterAccrualFailureDetectorSpec to async (Aaronontheweb, Aug 12, 2025)
- b346b58 Add multi-node test async migration guide and tracking tools (Aaronontheweb, Aug 12, 2025)
- 82dbe72 Convert DistributedPubSubRestartSpec to async (Aaronontheweb, Aug 12, 2025)
- 68c3b66 Merge branch 'dev' into mntr-async-overloads (Arkatufus, Aug 13, 2025)
- 3fd3354 Fix DistributedPubSubRestartSpec async implementation (Arkatufus, Aug 13, 2025)
- 2734adf Merge branch 'mntr-async-overloads' of github.com:Aaronontheweb/akka.… (Arkatufus, Aug 13, 2025)
- 355fdb7 Fix ClusterAccrualFailureDetectorSpec async implementation (Arkatufus, Aug 13, 2025)
- 7ab9235 Fix LeaderElectionSpec async implementation (Arkatufus, Aug 13, 2025)
- 355b613 Fix StressSpec async implementation (Arkatufus, Aug 13, 2025)
- d8d62b4 Fix Conductor async implementation (Arkatufus, Aug 13, 2025)
- 37025f8 Fix MultiNodeSpec async implementation (Arkatufus, Aug 13, 2025)
- 084a71d Fix Player async implementation (Arkatufus, Aug 13, 2025)
- 9d16b91 Fix RemoteNodeDeathWatchSpec async impl (Arkatufus, Aug 13, 2025)
- b875c3a Fix TestconductorSpec async impl (Arkatufus, Aug 13, 2025)
- e1f7a9a Merge branch 'dev' into mntr-async-overloads (Arkatufus, Aug 13, 2025)
- 7290525 Async impl cleanup (Arkatufus, Aug 13, 2025)
- 125badf Merge branch 'mntr-async-overloads' of github.com:Aaronontheweb/akka.… (Arkatufus, Aug 13, 2025)
- dd8d2e8 restored old sync methods (Aaronontheweb, Aug 13, 2025)
- bc6db6e remove `JoinAsync` call (Aaronontheweb, Aug 13, 2025)
- 2562fe2 Remove JoinAsync() (Arkatufus, Aug 13, 2025)
- 4cdefdb Merge branch 'mntr-async-overloads' of github.com:Aaronontheweb/akka.… (Arkatufus, Aug 13, 2025)
254 changes: 254 additions & 0 deletions MULTINODE_TEST_ASYNC_MIGRATION.md
@@ -0,0 +1,254 @@
# Multi-Node Test Async Migration Guide

## Overview
This guide explains how to migrate multi-node tests from blocking synchronous calls to async/await. The goal is to prevent thread pool starvation and the test timeouts it causes in CI environments.

## Why This Migration Is Necessary
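- **Root Cause**: Each blocking `.Wait()` call on a TestConductor operation holds a thread-pool thread while it waits. The work being waited on typically needs pool threads of its own to finish, so enough concurrent waits starve the thread pool
- **Symptoms**: Timeout failures of 20 seconds or more in CI environments
- **Solution**: Replace every blocking call with its async counterpart and `await` it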

## Migration Patterns to Look For

### 1. TestConductor Blocking Calls
**Look for these patterns:**
```csharp
// OLD - Blocking
TestConductor.Exit(role, 0).Wait();
TestConductor.Blackhole(node1, node2, direction).Wait();
TestConductor.PassThrough(node1, node2, direction).Wait();
TestConductor.Throttle(node1, node2, direction, rate).Wait();
TestConductor.Disconnect(node1, node2).Wait();
TestConductor.Shutdown(node, abort).Wait();
TestConductor.RemoveNode(node).Wait();

// NEW - Async
await TestConductor.ExitAsync(role, 0);
await TestConductor.BlackholeAsync(node1, node2, direction);
await TestConductor.PassThroughAsync(node1, node2, direction);
await TestConductor.ThrottleAsync(node1, node2, direction, rate);
await TestConductor.DisconnectAsync(node1, node2);
await TestConductor.ShutdownAsync(node, abort);
await TestConductor.RemoveNodeAsync(node);
```
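When a blocking call fits on a single line, the rewrite is mechanical enough for `sed`. The sketch below is not tooling from this PR. `convert_conductor_calls` is a hypothetical name. It only touches statements shaped exactly like `TestConductor.Method(args).Wait();` and passes every other line through unchanged.

```bash
# Hypothetical helper (not part of Akka.NET or this PR): rewrite single-line
# `TestConductor.Method(args).Wait();` statements into
# `await TestConductor.MethodAsync(args);`. Reads C# on stdin, writes stdout.
convert_conductor_calls() {
  # \1 = method name, \2 = argument list. The greedy (.*) keeps nested
  # parentheses such as GetRole(x) inside the argument list intact.
  # Lines without the `).Wait();` shape pass through unchanged.
  sed -E 's/TestConductor\.([A-Za-z]+)\((.*)\)\.Wait\(\);/await TestConductor.\1Async(\2);/'
}

# Example: prints "    await TestConductor.ExitAsync(role, 0);"
printf '    TestConductor.Exit(role, 0).Wait();\n' | convert_conductor_calls
```

After running it over a spec file, review the diff. The enclosing method or lambda still has to be made `async` by hand, and multi-line calls are left untouched.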

### 2. Barrier Synchronization
**Look for:**
```csharp
// OLD
EnterBarrier("barrier-name");
EnterBarrier("barrier-1", "barrier-2");

// NEW
await EnterBarrierAsync("barrier-name");
await EnterBarrierAsync("barrier-1", "barrier-2");
```
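Barrier calls usually appear as standalone statements, so a line-anchored substitution can convert them conservatively. This is a minimal sketch under that assumption. `convert_barriers` is a hypothetical name, and the helper deliberately skips calls that are not the first token on their line.

```bash
# Hypothetical helper: turn statement-level EnterBarrier(...) calls into
# `await EnterBarrierAsync(...)`. Reads C# on stdin, writes stdout.
convert_barriers() {
  # Matches only when EnterBarrier( is the first token on the line, so calls
  # already written as `await EnterBarrierAsync(` or nested inside another
  # expression are left alone. \1 preserves the original indentation.
  sed -E 's/^([[:space:]]*)EnterBarrier\(/\1await EnterBarrierAsync(/'
}

# Example: prints '        await EnterBarrierAsync("barrier-1", "barrier-2");'
printf '        EnterBarrier("barrier-1", "barrier-2");\n' | convert_barriers
```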

### 3. RunOn with Async Operations
**Look for:**
```csharp
// OLD
RunOn(() => {
    TestConductor.Exit(role, 0).Wait();
}, roles);

// NEW
await RunOnAsync(async () => {
    await TestConductor.ExitAsync(role, 0);
}, roles);
```

### 4. Within Blocks
**Look for:**
```csharp
// OLD
Within(TimeSpan.FromSeconds(30), () => {
    // operations
    EnterBarrier("done");
});

// NEW
await WithinAsync(TimeSpan.FromSeconds(30), async () => {
    // operations
    await EnterBarrierAsync("done");
});
```
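Multi-line `RunOn` and `Within` lambdas are harder to rewrite blindly. It is still possible to flag the cases that clearly need attention: a synchronous wrapper whose body already contains `await`. The sketch below is a rough heuristic. It assumes braces are balanced and do not appear inside strings or comments, and `find_sync_wrappers_with_await` is a hypothetical name.

```bash
# Hypothetical detector: report synchronous RunOn(...) / Within(...) calls
# whose lambda body contains `await`; these must become RunOnAsync/WithinAsync.
# Heuristic only: counts { and } per line and ignores strings and comments.
find_sync_wrappers_with_await() {
  awk '
    FNR == 1 { tracking = 0 }   # reset state at the start of each file
    # RunOnAsync( and WithinAsync( never match: "(" must follow the name directly.
    !tracking && /(^|[^A-Za-z])(RunOn|Within)\(/ {
      tracking = 1; start = FNR; depth = 0; has_await = 0
    }
    tracking {
      if ($0 ~ /await /) has_await = 1
      # gsub returns the number of substitutions, which here is a brace count.
      depth += gsub(/[{]/, "{") - gsub(/[}]/, "}")
      if (depth <= 0) {   # lambda body closed (or the call had no braces)
        if (has_await) print FILENAME ":" start ": sync wrapper contains await"
        tracking = 0
      }
    }
  ' "$@"
}
```

Pass one or more spec files as arguments. Each reported `file:line` marks where a synchronous wrapper starts.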

### 5. Test Method Signatures
**Change:**
```csharp
// OLD
[MultiNodeFact]
public void TestName()

// NEW
[MultiNodeFact]
public async Task TestName()
```
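Test method signatures can be switched mechanically when the attribute sits on the line directly above the method. This sketch assumes that layout, and `convert_test_signatures` is a hypothetical name. Helper methods (section 6) are left alone on purpose, because changing them also requires updating their callers.

```bash
# Hypothetical helper: change `public void` to `public async Task` on the line
# immediately following a [MultiNodeFact] attribute. Reads stdin, writes stdout.
convert_test_signatures() {
  awk '
    # prev_fact is 1 when the previous line carried the [MultiNodeFact] attribute.
    prev_fact && /public void / { sub(/public void /, "public async Task ") }
    { print; prev_fact = ($0 ~ /\[MultiNodeFact/) }
  '
}

# Example: prints "[MultiNodeFact]" then "public async Task TestName()"
printf '[MultiNodeFact]\npublic void TestName()\n' | convert_test_signatures
```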

### 6. Helper Method Signatures
**Change:**
```csharp
// OLD
public void HelperMethod()

// NEW
public async Task HelperMethod()
```

## Required Imports
Add if missing:
```csharp
using System.Threading.Tasks;
```
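Adding the directive can be scripted so that running it twice is harmless. This is a minimal sketch, and `ensure_tasks_using` is a hypothetical name. It places the directive before the first existing `using` line, or at the top of the file if there is none, and leaves ordering to your code formatter.

```bash
# Hypothetical helper: add `using System.Threading.Tasks;` to a C# file if it
# is not already present. Safe to run repeatedly (idempotent).
ensure_tasks_using() {
  local file=$1
  # Already imported: nothing to do.
  grep -q '^using System\.Threading\.Tasks;' "$file" && return 0
  if grep -q '^using ' "$file"; then
    # Insert just before the first existing using directive, which keeps any
    # license header at the top of the file intact.
    awk '!done && /^using / { print "using System.Threading.Tasks;"; done = 1 } { print }' \
      "$file" > "$file.tmp"
  else
    # No using directives at all: put it on the first line.
    { printf 'using System.Threading.Tasks;\n'; cat "$file"; } > "$file.tmp"
  fi
  mv "$file.tmp" "$file"
}
```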

## Migration Checklist

### ✅ Completed Tests
- [x] StressSpec
- [x] LeaderElectionSpec
- [x] ClusterAccrualFailureDetectorSpec
- [x] TestConductorSpec (in Akka.Remote.Tests.MultiNode)
- [x] RemoteNodeDeathWatchSpec (in Akka.Remote.Tests.MultiNode)
- [x] DistributedPubSubRestartSpec (in Akka.Cluster.Tools.Tests.MultiNode)

### Core Tests - Akka.Cluster.Tests.MultiNode
- [ ] AttemptSysMsgRedeliverySpec
- [ ] ClientDowningNodeThatIsUnreachableSpec
- [ ] ClusterDeathWatchSpec
- [ ] ConvergenceSpec
- [ ] LeaderDowningAllOtherNodesSpec
- [ ] LeaderDowningNodeThatIsUnreachableSpec
- [ ] SingletonClusterSpec
- [ ] SplitBrainResolverDowningSpec
- [ ] SplitBrainSpec
- [ ] SurviveNetworkInstabilitySpec
- [ ] UnreachableNodeJoinsAgainSpec

### Core Tests - Akka.Cluster.Tests.MultiNode/Routing
- [ ] ClusterRoundRobinSpec

### Core Tests - Akka.Cluster.Tests.MultiNode/SBR (Split Brain Resolver)
- [ ] DownAllIndirectlyConnected5NodeSpec
- [ ] DownAllUnstable5NodeSpec
- [ ] IndirectlyConnected3NodeSpec
- [ ] IndirectlyConnected5NodeSpec
- [ ] LeaseMajority5NodeSpec

### Core Tests - Akka.Remote.Tests.MultiNode
- [ ] RemoteNodeRestartGateSpec
- [ ] RemoteNodeShutdownAndComesBackSpec
- [ ] RemoteReDeploymentSpec
- [ ] RemoteRestartedQuarantinedSpec

### Contrib Tests - Akka.Cluster.Sharding.Tests.MultiNode
- [ ] ClusterShardCoordinatorDowning2Spec
- [ ] ClusterShardCoordinatorDowningSpec
- [ ] ClusterShardingFailureSpec
- [ ] ClusterShardingRememberEntitiesNewExtractorSpec
- [ ] ClusterShardingRememberEntitiesSpec
- [ ] ClusterShardingSingleShardPerEntitySpec
- [ ] ClusterShardingSpec

### Contrib Tests - Akka.Cluster.Tools.Tests.MultiNode
- [ ] ClusterClient/ClusterClientDiscoverySpec
- [ ] ClusterClient/ClusterClientSpec
- [ ] PublishSubscribe/DistributedPubSubMediatorSpec
- [x] PublishSubscribe/DistributedPubSubRestartSpec
- [ ] Singleton/ClusterSingletonManagerDownedSpec
- [ ] Singleton/ClusterSingletonManagerSpec

### Tests That May Need EnterBarrier -> EnterBarrierAsync Migration
Tests that call `EnterBarrier` but have no blocking TestConductor calls should still be converted, for consistency. To find them, run:
```bash
find src -name "*.cs" -path "*Tests.MultiNode*" -exec grep -l "EnterBarrier(" {} \;
```
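To help decide which files to convert first, a variant of the command above can count the remaining synchronous barrier calls per file. This is a sketch, and `count_sync_barriers` is a hypothetical name. It uses `grep -H` so that the file name is printed even when `find` passes grep a single file.

```bash
# Hypothetical helper: list multi-node test files under a root (default: src)
# that still call the synchronous EnterBarrier(...), most matching lines first.
count_sync_barriers() {
  # -H forces "path:count" output even when find hands grep a single file,
  # grep -v drops files with no remaining calls, and sort puts the highest
  # count first (this assumes paths contain no ':').
  find "${1:-src}" -name '*.cs' -path '*Tests.MultiNode*' \
    -exec grep -Hc 'EnterBarrier(' {} + |
    grep -v ':0$' |
    sort -t: -k2,2nr
}
```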

## Migration Steps

1. **Add the `System.Threading.Tasks` import**
   ```csharp
   using System.Threading.Tasks;
   ```

2. **Convert the test method signature**
   - Change `public void` to `public async Task`

3. **Find and replace blocking patterns**
   - Search for `.Wait()` calls
   - Search for `EnterBarrier(`
   - Search for `Within(`
   - Search for `RunOn(` calls that contain async operations

4. **Update method calls**
   - Add the `await` keyword before each async call
   - Switch to the async method names (add the `Async` suffix)
   - Mark lambdas `async` where they now contain `await`

5. **Update helper methods**
   - Convert any helper method that now contains async calls
   - Propagate async/await up the call chain

6. **Build and verify**
   ```bash
   dotnet build src/core/Akka.Cluster.Tests.MultiNode/Akka.Cluster.Tests.MultiNode.csproj -c Release
   ```

7. **Run tests (example)**
   ```bash
   dotnet test src/core/Akka.Cluster.Tests.MultiNode/Akka.Cluster.Tests.MultiNode.csproj \
     -c Release --filter "FullyQualifiedName~YourTestName" --framework net8.0
   ```

## Common Pitfalls to Avoid

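1. **Don't use ConfigureAwait(false) in tests**
   - Tests should keep running on their own synchronization context

2. **Don't use GetAwaiter().GetResult()**
   - It blocks the calling thread exactly as `.Wait()` does. The only difference is that it rethrows the original exception instead of wrapping it in an `AggregateException`

3. **Ensure all async operations are awaited**
   - An un-awaited call returns immediately. The test can then move on (for example, into the next barrier) before the operation completes, and any exception the operation throws goes unobserved

4. **Watch for nested RunOn calls**
   - An inner `RunOn` may need to become `RunOnAsync` if it contains async operations

5. **Don't forget lambda async modifiers**
   ```csharp
   // Wrong: `await` is not allowed in a non-async lambda (compile error)
   ReportResult(() => { await SomeAsync(); });

   // Right
   ReportResult(async () => { await SomeAsync(); });
   ```
   The method receiving the lambda must accept a `Func<Task>`. If it only takes an `Action`, the `async` lambda compiles to `async void` and the caller cannot await it.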
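Pitfalls 1 and 2 can be caught with the same kind of search as the verification commands. This sketch assumes the `src` layout used elsewhere in this guide, and `find_blocking_pitfalls` is a hypothetical name. A plain text search cannot reliably catch missing `await`s (pitfall 3).

```bash
# Hypothetical check for pitfalls 1 and 2: list lines in multi-node test code
# that block via GetAwaiter().GetResult() or use ConfigureAwait(false).
find_blocking_pitfalls() {
  # -E enables | alternation; \( and \) match literal parentheses in ERE.
  # The second grep keeps only paths containing "multinode" (any case),
  # mirroring the verification commands in this guide.
  grep -rnE --include='*.cs' \
    'GetAwaiter\(\)\.GetResult\(\)|ConfigureAwait\(false\)' "${1:-src}" |
    grep -i multinode
}
```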

## Verification Commands

Check for remaining blocking calls:
```bash
# Find .Wait() calls
grep -r "\.Wait()" src --include="*.cs" | grep -i multinode

# Find EnterBarrier calls
grep -r "EnterBarrier(" src --include="*.cs" | grep -i multinode

# Find TestConductor blocking calls
grep -r "TestConductor\.[A-Z].*\.Wait()" src --include="*.cs"
```
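The commands above only print matches, and `check-multinode-migration.sh` reports progress without failing. To make a CI step fail while blocking TestConductor calls remain, the third search can be wrapped so that its result drives the exit status. This is a minimal sketch, and `assert_no_blocking_conductor_calls` is a hypothetical name.

```bash
# Hypothetical CI gate: return 1 if any .cs file under the given root
# (default: src) still calls TestConductor.<Method>(...).Wait().
assert_no_blocking_conductor_calls() {
  local hits
  # `|| true` keeps callers running under `set -e` alive when grep finds nothing.
  hits=$(grep -rn --include='*.cs' 'TestConductor\.[A-Z].*\.Wait()' "${1:-src}" || true)
  if [ -n "$hits" ]; then
    printf 'Blocking TestConductor calls remain:\n%s\n' "$hits" >&2
    return 1
  fi
}
```

In a CI script, `assert_no_blocking_conductor_calls src || exit 1` stops the build and lists the offending lines on stderr.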

## Git Commit Message Template
```
Convert [TestName] to async

- Convert main test method to async Task
- Replace TestConductor.[Method]().Wait() with await TestConductor.[Method]Async()
- Replace EnterBarrier with EnterBarrierAsync
- Use RunOnAsync for async operations
- Use WithinAsync for async timing constraints
- Add using System.Threading.Tasks
```

## Notes
- This migration improves test reliability by preventing thread pool starvation
- Tests should run faster and more reliably in CI environments
- The async APIs provide better cancellation support via CancellationToken
94 changes: 94 additions & 0 deletions check-multinode-migration.sh
@@ -0,0 +1,94 @@
#!/bin/bash

# Multi-Node Test Async Migration Status Checker
# This script helps identify which multi-node tests still need async migration

echo "========================================="
echo "Multi-Node Test Async Migration Status"
echo "========================================="
echo ""

# Colors for output
RED='\033[0;31m'
GREEN='\033[0;32m'
YELLOW='\033[1;33m'
NC='\033[0m' # No Color

# Function to check a directory
check_directory() {
    local dir=$1
    local name=$2

    echo -e "${YELLOW}Checking $name:${NC}"
    echo "----------------------------------------"

    # Find files with blocking TestConductor calls
    local blocking_files=$(find "$dir" -name "*.cs" -exec grep -l "TestConductor.*\.Wait()" {} \; 2>/dev/null | sort)

    if [ -z "$blocking_files" ]; then
        echo -e "${GREEN}✓ No TestConductor blocking calls found${NC}"
    else
        echo -e "${RED}✗ Files with TestConductor.*.Wait() calls:${NC}"
        for file in $blocking_files; do
            basename_file=$(basename "$file")
            # grep -c counts matching lines, not individual calls
            count=$(grep -c "\.Wait()" "$file")
            echo " - $basename_file ($count lines containing .Wait())"
        done
    fi

    # Check for EnterBarrier (non-async)
    local barrier_count=$(find "$dir" -name "*.cs" -exec grep -l "EnterBarrier(" {} \; 2>/dev/null | wc -l)
    if [ "$barrier_count" -gt 0 ]; then
        echo -e "${YELLOW}⚠ $barrier_count files still use EnterBarrier (should be EnterBarrierAsync)${NC}"
    fi

    # Check for Within (non-async)
    local within_count=$(find "$dir" -name "*.cs" -exec grep -l "Within(" {} \; 2>/dev/null | wc -l)
    if [ "$within_count" -gt 0 ]; then
        echo -e "${YELLOW}⚠ $within_count files use Within (may need WithinAsync)${NC}"
    fi

    echo ""
}

# Check core tests
check_directory "src/core/Akka.Cluster.Tests.MultiNode" "Akka.Cluster.Tests.MultiNode"
check_directory "src/core/Akka.Remote.Tests.MultiNode" "Akka.Remote.Tests.MultiNode"

# Check contrib tests
if [ -d "src/contrib/cluster/Akka.Cluster.Sharding.Tests.MultiNode" ]; then
    check_directory "src/contrib/cluster/Akka.Cluster.Sharding.Tests.MultiNode" "Akka.Cluster.Sharding.Tests.MultiNode"
fi

if [ -d "src/contrib/cluster/Akka.Cluster.Tools.Tests.MultiNode" ]; then
    check_directory "src/contrib/cluster/Akka.Cluster.Tools.Tests.MultiNode" "Akka.Cluster.Tools.Tests.MultiNode"
fi

if [ -d "src/contrib/cluster/Akka.Cluster.Metrics.Tests.MultiNode" ]; then
    check_directory "src/contrib/cluster/Akka.Cluster.Metrics.Tests.MultiNode" "Akka.Cluster.Metrics.Tests.MultiNode"
fi

if [ -d "src/contrib/cluster/Akka.DistributedData.Tests.MultiNode" ]; then
    check_directory "src/contrib/cluster/Akka.DistributedData.Tests.MultiNode" "Akka.DistributedData.Tests.MultiNode"
fi

echo "========================================="
echo "Summary"
echo "========================================="

# Count total blocking files
total_blocking=$(find src -name "*.cs" -path "*Tests.MultiNode*" -exec grep -l "TestConductor.*\.Wait()" {} \; 2>/dev/null | wc -l)
total_files=$(find src -name "*.cs" -path "*Tests.MultiNode*" 2>/dev/null | wc -l)

echo "Total multi-node test files: $total_files"
echo -e "${RED}Files with blocking TestConductor calls: $total_blocking${NC}"

if [ "$total_blocking" -eq 0 ]; then
    echo -e "${GREEN}🎉 All TestConductor blocking calls have been migrated!${NC}"
else
    echo -e "${YELLOW}⚠ Migration still needed for $total_blocking files${NC}"
fi

echo ""
echo "Run this script periodically to track migration progress."
echo "See MULTINODE_TEST_ASYNC_MIGRATION.md for migration guide."