Move dead-silo message break to connection maintainer#10090
Closed
ReubenBond wants to merge 2 commits into
Closed
Conversation
This was referenced May 12, 2026
Contributor
There was a problem hiding this comment.
Pull request overview
This PR continues the directory/membership cleanup split by relocating “break outstanding messages to dead silo” into the networking layer (SiloConnectionMaintainer) and further refactoring LocalGrainDirectory to reconcile membership from IClusterMembershipService snapshots instead of status-oracle event plumbing.
Changes:
- Move
BreakOutstandingMessagesToSiloinvocation toSiloConnectionMaintainerwhen a remote silo is markedDead. - Refactor
LocalGrainDirectoryto process membership snapshots/updates, adjust local directory/cache viaIsDefunctActivation, and take over terminating-silo activation deactivation logic previously inCatalog. - Update handoff/retry behavior and add targeted unit tests for defunct-activation determination.
Show a summary per file
| File | Description |
|---|---|
| test/Orleans.Runtime.Internal.Tests/LocalGrainDirectoryTests.cs | Adds unit coverage for LocalGrainDirectory.IsDefunctActivation behavior across silo statuses/unknowns/successors. |
| test/Orleans.Core.Tests/Directory/CachedGrainLocatorTests.cs | Updates test wiring for LocalGrainDirectory’s new IClusterMembershipService dependency. |
| src/Orleans.Runtime/Networking/SiloConnectionMaintainer.cs | Breaks outstanding messages and closes connections when a remote silo is marked Dead. |
| src/Orleans.Runtime/GrainDirectory/LocalGrainDirectory.cs | Moves to snapshot-driven membership processing; centralizes terminating-silo activation deactivation and defunct-entry cleanup. |
| src/Orleans.Runtime/GrainDirectory/ILocalGrainDirectory.cs | Removes the IsSiloInCluster method from the internal interface surface. |
| src/Orleans.Runtime/GrainDirectory/GrainDirectoryHandoffManager.cs | Adjusts successor checks, filters terminating addresses during handoff acceptance, and changes pending-operation retry/dequeue behavior. |
| src/Orleans.Runtime/Catalog/Catalog.cs | Removes catalog’s silo-status-change directory cleanup/break-outstanding-message responsibilities. |
Copilot's findings
Comments suppressed due to low confidence (1)
src/Orleans.Runtime/GrainDirectory/LocalGrainDirectory.cs:677
- Same as RegisterAsync:
hopCount > 1skips the retry delay/re-check on the first forwarded hop (hopCount==1), despite the comment saying this should happen after the first forward and other operations usinghopCount > 0. This can lead to rapid multi-hop forwarding when ownership is in flux. Consider usinghopCount > 0here (or align all forwarding methods to the same semantics).
// After the first forward, we insert a retry delay and recheck owner before forwarding again
if (hopCount > 1 && forwardAddress != null)
{
await Task.Delay(RETRY_DELAY);
forwardAddress = this.CheckIfShouldForward(address.GrainId, hopCount, "UnregisterAsync");
- Files reviewed: 7/7 changed files
- Comments generated: 2
f611705 to
3f786dc
Compare
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
When background Orleans work logs after xUnit has cleared the current test context, ITestOutputHelper throws InvalidOperationException with "There is no currently active test." That exception can escape through Microsoft.Extensions.Logging and abort the test host. Fall back to stderr for that specific late-log case so the original runtime log is still emitted without crashing the test process. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
3f786dc to
6f907c3
Compare
Member
Author
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to subscribe to this conversation on GitHub.
Already have an account?
Sign in.
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Part 5 of 7 split from #10085.
Problem:
Breaking outstanding messages to a dead silo was performed from LocalGrainDirectory/Catalog-related cleanup, even though it is unrelated to grain-directory state.
Solution:
Move BreakOutstandingMessagesToSilo handling to SiloConnectionMaintainer when a remote silo is marked Dead, keeping LocalGrainDirectory focused on directory reconciliation.
Stack:
Merge after #10089. This branch is stacked on split/pr10085-04-processing-cleanup; until earlier PRs merge, GitHub may show earlier stack changes. Incremental compare: ReubenBond/orleans@split/pr10085-04-processing-cleanup...split/pr10085-05-dead-silo-message-break
Review focus:
Dead-silo message break placement and LocalGrainDirectory/Catalog separation of concerns.