
Implement poisoned shard detection and handling in AzureStorageJobShardManager #9907

Merged
ReubenBond merged 5 commits into dotnet:main from benjaminpetit:poisoned-shards on Feb 16, 2026


benjaminpetit (Contributor) commented Feb 12, 2026

Fix #9900


Copilot AI review requested due to automatic review settings February 12, 2026 12:54

Copilot AI left a comment


Pull request overview

Implements poisoned shard detection for durable job shard managers by tracking how often shards are “stolen” from dead silos and refusing further assignment once a configurable threshold is exceeded. This addresses issue #9900 for both the Azure Storage and the in-memory implementations.

Changes:

  • Added DurableJobsOptions.MaxStolenCount to configure the poisoned-shard threshold.
  • Updated AzureStorageJobShardManager and InMemoryJobShardManager to track a per-shard “stolen count” and to stop assigning a shard once it is poisoned.
  • Added InMemoryJobShardManagerTests covering orphaned/stolen/poisoned shard assignment scenarios.
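The poison rule reduces to a counter compared against a threshold. The sketch below mirrors the check in the JobShardManager.cs snippet quoted further down (the count is incremented first, then `ownership.StolenCount > _maxStolenCount` marks the shard as poisoned) to make the boundary explicit. `ShardState`, `PoisonRuleSketch`, and `TryAdopt` are hypothetical names for illustration, not Orleans APIs, and after the later rename to "adopted" terminology the real option may no longer be called `MaxStolenCount`.

```csharp
using System;

// Hypothetical stand-in for a shard's ownership record; only the field
// relevant to poison detection is modeled.
public sealed class ShardState
{
    public int StolenCount { get; set; }
}

public static class PoisonRuleSketch
{
    // Mirrors the quoted rule: increment first, then refuse the shard once
    // the count exceeds maxStolenCount. Returns true if the calling silo
    // may take the shard.
    public static bool TryAdopt(ShardState shard, int maxStolenCount)
    {
        if (shard is null) throw new ArgumentNullException(nameof(shard));
        shard.StolenCount++;
        return shard.StolenCount <= maxStolenCount;
    }
}
```

With a threshold of 3, a shard survives three takeovers from dead silos and is refused on the fourth. In the quoted code the counter is still incremented on the refused attempt, and orphaned shards (no owner) never reach this path, so only dead-silo takeovers count toward the threshold.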

Reviewed changes

Copilot reviewed 4 out of 4 changed files in this pull request and generated 9 comments.

Files changed:
  • test/NonSilo.Tests/DurableJobs/InMemoryJobShardManagerTests.cs: adds tests for orphaned vs. stolen shard reassignment and poison-threshold behavior.
  • src/Orleans.DurableJobs/JobShardManager.cs: extends InMemoryJobShardManager with stolen-count tracking and poison-threshold logic.
  • src/Orleans.DurableJobs/Hosting/DurableJobsOptions.cs: introduces the MaxStolenCount option, with documentation describing its behavior.
  • src/Azure/Orleans.DurableJobs.AzureStorage/AzureStorageJobShardManager.cs: adds Azure blob metadata tracking (StolenCount, LastStolenTime) and poison-threshold enforcement.
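Because Azure blob metadata values are strings, the Azure manager has to parse and re-serialize the counter on every takeover. The sketch below works on a plain `IDictionary<string, string>`, the shape the Azure Blob SDK uses for metadata, so it runs without a storage account. The key strings, the round-trip `"o"` timestamp format, and the `StolenMetadataSketch` helper are assumptions, not code from the PR. A real implementation would typically guard the metadata write with an ETag condition so two silos cannot adopt the same shard at once; this sketch ignores concurrency entirely.

```csharp
using System;
using System.Collections.Generic;
using System.Globalization;

public static class StolenMetadataSketch
{
    // Assumed key names, based on the fields the PR describes.
    public const string StolenCountKey = "StolenCount";
    public const string LastStolenTimeKey = "LastStolenTime";

    // Reads the current count; a missing or malformed value counts as 0.
    public static int ReadStolenCount(IDictionary<string, string> metadata)
    {
        return metadata.TryGetValue(StolenCountKey, out var raw)
            && int.TryParse(raw, NumberStyles.Integer, CultureInfo.InvariantCulture, out var count)
            ? count
            : 0;
    }

    // Records one more takeover (count and timestamp) and returns the new count.
    public static int RecordSteal(IDictionary<string, string> metadata, DateTimeOffset now)
    {
        int next = ReadStolenCount(metadata) + 1;
        metadata[StolenCountKey] = next.ToString(CultureInfo.InvariantCulture);
        metadata[LastStolenTimeKey] = now.ToString("o", CultureInfo.InvariantCulture);
        return next;
    }

    // Same comparison as the in-memory rule: poisoned once count > threshold.
    public static bool IsPoisoned(IDictionary<string, string> metadata, int maxStolenCount)
        => ReadStolenCount(metadata) > maxStolenCount;
}
```

Using the invariant culture on both sides keeps the stored values stable regardless of the silo's locale.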
Comments suppressed due to low confidence (1)

src/Orleans.DurableJobs/JobShardManager.cs:218

  • These 'if' statements can be combined.
                if (isOrphaned || isFromDeadSilo)
                {
                    if (ownership.Shard.StartTime <= maxDueTime)
                    {
                        // If stolen from dead silo, increment stolen count
                        if (isFromDeadSilo)
                        {
                            ownership.StolenCount++;

                            // Check if shard is poisoned
                            if (ownership.StolenCount > _maxStolenCount)
                            {
                                // Shard is poisoned - don't assign it
                                continue;
                            }
                        }

                        ownership.OwnerSiloAddress = SiloAddress.ToString();
                        stolenShards.Add(ownership.Shard);
                    }
                }
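Copilot's suggestion to combine the nested `if` statements could look roughly like this. It keeps the behavior of the quoted snippet (only dead-silo takeovers increment the counter, and the counter is incremented even when the shard is then refused) but flattens the nesting into early `continue`s. The `Shard`, `Ownership`, and `AdoptionSketch` types are hypothetical stand-ins, and computing `isOrphaned` as a null owner and `isFromDeadSilo` as membership in a set of dead silos is an assumption; this is not the merged Orleans code.

```csharp
#nullable enable
using System;
using System.Collections.Generic;

// Hypothetical stand-ins for the types referenced in the quoted snippet.
public sealed record Shard(string Id, DateTimeOffset StartTime);

public sealed class Ownership
{
    public Ownership(Shard shard, string? ownerSiloAddress)
    {
        Shard = shard;
        OwnerSiloAddress = ownerSiloAddress;
    }

    public Shard Shard { get; }
    public string? OwnerSiloAddress { get; set; }
    public int StolenCount { get; set; }
}

public static class AdoptionSketch
{
    public static List<Shard> Select(
        IEnumerable<Ownership> ownerships,
        ISet<string> deadSilos,
        string localSilo,
        DateTimeOffset maxDueTime,
        int maxStolenCount)
    {
        var adopted = new List<Shard>();
        foreach (var ownership in ownerships)
        {
            bool isOrphaned = ownership.OwnerSiloAddress is null;
            bool isFromDeadSilo = !isOrphaned && deadSilos.Contains(ownership.OwnerSiloAddress!);

            // Combined guard: skip shards owned by live silos and shards not yet due.
            if (!(isOrphaned || isFromDeadSilo) || ownership.Shard.StartTime > maxDueTime)
                continue;

            // Only takeovers from dead silos count; poisoned shards stay unassigned.
            if (isFromDeadSilo && ++ownership.StolenCount > maxStolenCount)
                continue;

            ownership.OwnerSiloAddress = localSilo;
            adopted.Add(ownership.Shard);
        }
        return adopted;
    }
}
```

The `&&` short-circuit is what keeps orphaned shards from touching the counter, matching the nested `if (isFromDeadSilo)` in the original.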

Comment thread test/NonSilo.Tests/DurableJobs/InMemoryJobShardManagerTests.cs
Comment thread src/Orleans.DurableJobs/Hosting/DurableJobsOptions.cs Outdated
Comment thread src/Orleans.DurableJobs/Hosting/DurableJobsOptions.cs Outdated
Comment thread src/Azure/Orleans.DurableJobs.AzureStorage/AzureStorageJobShardManager.cs Outdated
Comment thread src/Azure/Orleans.DurableJobs.AzureStorage/AzureStorageJobShardManager.cs Outdated
Comment thread src/Orleans.DurableJobs/JobShardManager.cs Outdated
Comment thread test/NonSilo.Tests/DurableJobs/InMemoryJobShardManagerTests.cs
Comment thread test/NonSilo.Tests/DurableJobs/InMemoryJobShardManagerTests.cs Outdated
benjaminpetit and others added 5 commits February 16, 2026 08:43
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Replace direct DurableJobsOptions constructor usage with IOptions<DurableJobsOptions> and remove redundant constructor overloads. Update Azure durable jobs tests to pass options via IOptions.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
ReubenBond enabled auto-merge February 16, 2026 17:49
ReubenBond added this pull request to the merge queue Feb 16, 2026
Merged via the queue into dotnet:main with commit ca44fa4 Feb 16, 2026
112 of 113 checks passed
rkargMsft pushed a commit to rkargMsft/orleans that referenced this pull request Feb 27, 2026
…rdManager (dotnet#9907)

* feat: implement poisoned shard detection and handling in AzureStorageJobShardManager

* Fix in-memory durable jobs MaxStolenCount wiring

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

* Address PR 9907 review feedback

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

* Use IOptions<DurableJobsOptions> in Azure shard manager

Replace direct DurableJobsOptions constructor usage with IOptions<DurableJobsOptions> and remove redundant constructor overloads. Update Azure durable jobs tests to pass options via IOptions.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

* Rename stolen shard terminology to adopted

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

---------

Co-authored-by: Reuben Bond <reuben.bond@gmail.com>
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
github-actions (bot) locked and limited conversation to collaborators Mar 19, 2026


Development

Successfully merging this pull request may close these issues.

Implement a way to handle poisoned job shards

3 participants