
Add concurrent job slow start for durable jobs #9910

Merged: ReubenBond merged 5 commits into dotnet:main from benjaminpetit:feature/durable-jobs-slow-start on Feb 17, 2026

benjaminpetit (Contributor) commented Feb 12, 2026

Gradually increase job concurrency during startup to avoid starvation issues that can occur before caches, connection pools, and thread pool sizing have warmed up.

The semaphore starts at SlowStartInitialConcurrency (default: ProcessorCount) and doubles every SlowStartInterval (default: 10s) until MaxConcurrentJobsPerSilo is reached. The ramp-up begins when the first shard starts executing.

New DurableJobsOptions:

  • ConcurrencySlowStartEnabled (default: true)
  • SlowStartInitialConcurrency (default: Environment.ProcessorCount)
  • SlowStartInterval (default: 10 seconds)
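
To make the ramp concrete, here is a minimal Python sketch of the doubling schedule described above. It is illustrative only and is not the Orleans implementation: the function name `slow_start_schedule` and the example values (8 initial slots, a cap of 100) are assumptions chosen for the arithmetic, and the actual default of MaxConcurrentJobsPerSilo is not stated in this description.

```python
def slow_start_schedule(initial, maximum, interval_seconds=10.0):
    """Return (elapsed_seconds, capacity) pairs for a doubling ramp-up.

    Hypothetical helper: capacity starts at `initial`, doubles once per
    interval, and is capped at `maximum`.
    """
    if initial < 1 or maximum < 1:
        raise ValueError("initial and maximum must be at least 1")

    capacity = min(initial, maximum)  # never start above the cap
    elapsed = 0.0
    steps = [(elapsed, capacity)]
    while capacity < maximum:
        elapsed += interval_seconds
        capacity = min(capacity * 2, maximum)  # double, then clamp to the cap
        steps.append((elapsed, capacity))
    return steps


if __name__ == "__main__":
    # Assumed values: 8 processors, a per-silo cap of 100, 10 s interval.
    for elapsed, capacity in slow_start_schedule(8, 100):
        print(f"t={elapsed:>4.0f}s  capacity={capacity}")
```

With these assumed values the cap is reached after four intervals (40 seconds): 8, 16, 32, 64, then 100.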

Related issue: #9750


Copilot AI review requested due to automatic review settings February 12, 2026 13:21

Copilot AI (Contributor) left a comment

Pull request overview

Introduces a “concurrency slow start” mechanism for Durable Jobs that ramps up job execution concurrency during startup, aiming to reduce starvation before caches, connection pools, and the thread pool have warmed up.

Changes:

  • Add new DurableJobsOptions settings to control slow-start behavior (enabled flag, initial concurrency, ramp interval) plus validation.
  • Update ShardExecutor to initialize a lower concurrency semaphore and ramp it up over time.
  • Add unit tests covering slow start enabled/disabled behavior.
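
One way to picture ramping up a semaphore is to release extra permits into it on each step. The sketch below does this with Python's `threading.Semaphore`. The class `RampingSemaphore` and its methods are hypothetical names used for illustration; they do not mirror the actual ShardExecutor code.

```python
import threading


class RampingSemaphore:
    """Semaphore whose capacity can be doubled up to a fixed maximum.

    Illustrative sketch only: capacity grows by releasing extra permits
    into an ordinary (unbounded) counting semaphore.
    """

    def __init__(self, initial, maximum):
        self._capacity = min(initial, maximum)
        self._maximum = maximum
        self._sem = threading.Semaphore(self._capacity)
        self._lock = threading.Lock()  # serializes concurrent step() calls

    @property
    def capacity(self):
        return self._capacity

    def step(self):
        """Double the capacity (capped). Return True once the cap is reached."""
        with self._lock:
            target = min(self._capacity * 2, self._maximum)
            for _ in range(target - self._capacity):
                self._sem.release()  # each release adds one usable slot
            self._capacity = target
            return self._capacity >= self._maximum

    def try_acquire(self):
        """Take a slot without blocking; return False if none is free."""
        return self._sem.acquire(blocking=False)

    def release(self):
        """Return a slot after a job finishes."""
        self._sem.release()
```

A background timer would call `step()` once per interval and stop when it returns True. Slots already held by running jobs are unaffected, because growing the capacity only adds permits.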

Reviewed changes

Copilot reviewed 4 out of 4 changed files in this pull request and generated 5 comments.

| File | Description |
| --- | --- |
| src/Orleans.DurableJobs/ShardExecutor.cs | Adds slow-start ramp-up logic which increases semaphore capacity over time. |
| src/Orleans.DurableJobs/ShardExecutor.Log.cs | Adds structured logs for slow-start begin/increase/complete/error. |
| src/Orleans.DurableJobs/Hosting/DurableJobsOptions.cs | Adds slow-start options + configuration validation rules. |
| test/NonSilo.Tests/DurableJobs/ShardExecutorTests.cs | Adds tests for gradual concurrency ramp-up and for “disabled = immediate full concurrency”. |

Comment thread src/Orleans.DurableJobs/ShardExecutor.cs Outdated
Comment thread src/Orleans.DurableJobs/Hosting/DurableJobsOptions.cs Outdated
Comment thread test/NonSilo.Tests/DurableJobs/ShardExecutorTests.cs
Comment thread src/Orleans.DurableJobs/ShardExecutor.cs Outdated
Comment thread src/Orleans.DurableJobs/ShardExecutor.cs Outdated
@benjaminpetit benjaminpetit mentioned this pull request Feb 12, 2026
14 tasks
ReubenBond force-pushed the feature/durable-jobs-slow-start branch from 6fec786 to ace30d9 on February 15, 2026 17:19
ReubenBond added this pull request to the merge queue Feb 15, 2026
ReubenBond removed this pull request from the merge queue due to a manual request Feb 15, 2026
ReubenBond force-pushed the feature/durable-jobs-slow-start branch from 0bacee0 to 3a094fd on February 16, 2026 17:57
benjaminpetit and others added 3 commits February 16, 2026 14:13

  • Add concurrent job slow start for durable jobs

  • Fix durable jobs slow-start issues
    Address PR review feedback by making slow-start ramp-up thread-safe and single-start, relaxing slow-start validation to avoid startup breaks, and stabilizing the slow-start concurrency test.
    Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

  • Fix slow-start validation when disabled
    Gate SlowStartInitialConcurrency validation on ConcurrencySlowStartEnabled so disabled slow-start settings do not block startup. Add regression coverage in ShardExecutorTests.
    Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
ReubenBond force-pushed the feature/durable-jobs-slow-start branch from 3a094fd to 85388eb on February 16, 2026 22:19
ReubenBond enabled auto-merge February 16, 2026 22:20
ReubenBond and others added 2 commits February 16, 2026 14:43
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
ReubenBond added this pull request to the merge queue Feb 16, 2026
Merged via the queue into dotnet:main with commit b420246 Feb 17, 2026
112 of 113 checks passed
rkargMsft pushed a commit to rkargMsft/orleans that referenced this pull request Feb 27, 2026
* Add concurrent job slow start for durable jobs

Gradually increase job concurrency during startup to avoid starvation
issues that can occur before caches, connection pools, and thread pool
sizing have warmed up.

The semaphore starts at SlowStartInitialConcurrency (default: ProcessorCount)
and doubles every SlowStartInterval (default: 10s) until MaxConcurrentJobsPerSilo
is reached. The ramp-up begins when the first shard starts executing.

New DurableJobsOptions:
- ConcurrencySlowStartEnabled (default: true)
- SlowStartInitialConcurrency (default: Environment.ProcessorCount)
- SlowStartInterval (default: 10 seconds)

* Fix durable jobs slow-start issues

Address PR review feedback by making slow-start ramp-up thread-safe and single-start, relaxing slow-start validation to avoid startup breaks, and stabilizing the slow-start concurrency test.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

* Fix slow-start validation when disabled

Gate SlowStartInitialConcurrency validation on ConcurrencySlowStartEnabled so disabled slow-start settings do not block startup. Add regression coverage in ShardExecutorTests.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

* Simplify slow-start capacity initialization

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

---------

Co-authored-by: Reuben Bond <reuben.bond@gmail.com>
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Co-authored-by: Reuben Bond <203839+ReubenBond@users.noreply.github.com>
github-actions (bot) locked and limited conversation to collaborators Mar 19, 2026