Add concurrent job slow start for durable jobs #9910
Merged
ReubenBond merged 5 commits on Feb 17, 2026
Pull request overview
Introduces a “concurrency slow start” mechanism for Durable Jobs to ramp up job execution concurrency during startup, aiming to reduce starvation before runtime caches/pools/threadpool warm up.
Changes:
- Add new `DurableJobsOptions` settings to control slow-start behavior (enabled flag, initial concurrency, ramp interval) plus validation.
- Update `ShardExecutor` to initialize a lower concurrency semaphore and ramp it up over time.
- Add unit tests covering slow start enabled/disabled behavior.
Reviewed changes
Copilot reviewed 4 out of 4 changed files in this pull request and generated 5 comments.
| File | Description |
|---|---|
| `src/Orleans.DurableJobs/ShardExecutor.cs` | Adds slow-start ramp-up logic which increases semaphore capacity over time. |
| `src/Orleans.DurableJobs/ShardExecutor.Log.cs` | Adds structured logs for slow-start begin/increase/complete/error. |
| `src/Orleans.DurableJobs/Hosting/DurableJobsOptions.cs` | Adds slow-start options + configuration validation rules. |
| `test/NonSilo.Tests/DurableJobs/ShardExecutorTests.cs` | Adds tests for gradual concurrency ramp-up and for “disabled = immediate full concurrency”. |
Gradually increase job concurrency during startup to avoid starvation issues that can occur before caches, connection pools, and thread pool sizing have warmed up.

The semaphore starts at SlowStartInitialConcurrency (default: ProcessorCount) and doubles every SlowStartInterval (default: 10s) until MaxConcurrentJobsPerSilo is reached. The ramp-up begins when the first shard starts executing.

New DurableJobsOptions:
- ConcurrencySlowStartEnabled (default: true)
- SlowStartInitialConcurrency (default: Environment.ProcessorCount)
- SlowStartInterval (default: 10 seconds)
Address PR review feedback by making slow-start ramp-up thread-safe and single-start, relaxing slow-start validation to avoid startup breaks, and stabilizing the slow-start concurrency test. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
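The "thread-safe and single-start" property mentioned in this commit message can be illustrated with a small sketch. This is not the `ShardExecutor` code, which is C# and is not shown on this page; the class name `RampStarter` and the method `try_start` are hypothetical, and Python is used only for illustration. The idea is that many shards may begin executing at once, but only the first caller should kick off the ramp-up.

```python
import threading


class RampStarter:
    """Hypothetical guard that lets exactly one caller start the ramp-up."""

    def __init__(self):
        self._lock = threading.Lock()
        self._started = False

    def try_start(self):
        """Return True for the first caller only; False for everyone after."""
        with self._lock:
            if self._started:
                return False
            self._started = True
            return True


if __name__ == "__main__":
    starter = RampStarter()
    results = []

    def shard_begins():
        # Each simulated shard asks to start the ramp-up; list.append is
        # safe to call from several threads in CPython.
        results.append(starter.try_start())

    threads = [threading.Thread(target=shard_begins) for _ in range(8)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    print("ramp-up started by", results.count(True), "caller(s)")
```

A lock-protected flag is only one way to get this behavior; an atomic compare-and-swap on an integer field would be a common alternative in C#.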
Gate SlowStartInitialConcurrency validation on ConcurrencySlowStartEnabled so disabled slow-start settings do not block startup. Add regression coverage in ShardExecutorTests. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
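The gating described in this commit message can be sketched as follows. This is an illustration only: the `Options` dataclass and `validate` function are hypothetical stand-ins for `DurableJobsOptions` and its validator, and the specific rule (initial concurrency must be positive) is an assumption, since the page does not show the actual validation rules. The point is that the slow-start check is skipped entirely when slow start is disabled.

```python
from dataclasses import dataclass


@dataclass
class Options:
    """Hypothetical mirror of the slow-start settings named in the PR."""
    concurrency_slow_start_enabled: bool = True
    slow_start_initial_concurrency: int = 8


def validate(options):
    """Return a list of error messages; an empty list means valid."""
    errors = []
    if not options.concurrency_slow_start_enabled:
        # Slow start is off, so its settings are not checked and
        # cannot block startup.
        return errors
    # Assumed rule: the initial concurrency must be a positive number.
    if options.slow_start_initial_concurrency <= 0:
        errors.append("SlowStartInitialConcurrency must be greater than 0")
    return errors


if __name__ == "__main__":
    print(validate(Options(True, 0)))   # one error reported
    print(validate(Options(False, 0)))  # no errors: check is gated off
```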
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
rkargMsft pushed a commit to rkargMsft/orleans that referenced this pull request on Feb 27, 2026
* Add concurrent job slow start for durable jobs

Gradually increase job concurrency during startup to avoid starvation issues that can occur before caches, connection pools, and thread pool sizing have warmed up.

The semaphore starts at SlowStartInitialConcurrency (default: ProcessorCount) and doubles every SlowStartInterval (default: 10s) until MaxConcurrentJobsPerSilo is reached. The ramp-up begins when the first shard starts executing.

New DurableJobsOptions:
- ConcurrencySlowStartEnabled (default: true)
- SlowStartInitialConcurrency (default: Environment.ProcessorCount)
- SlowStartInterval (default: 10 seconds)

* Fix durable jobs slow-start issues

Address PR review feedback by making slow-start ramp-up thread-safe and single-start, relaxing slow-start validation to avoid startup breaks, and stabilizing the slow-start concurrency test.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

* Fix slow-start validation when disabled

Gate SlowStartInitialConcurrency validation on ConcurrencySlowStartEnabled so disabled slow-start settings do not block startup. Add regression coverage in ShardExecutorTests.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

* Simplify slow-start capacity initialization

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

---------

Co-authored-by: Reuben Bond <reuben.bond@gmail.com>
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Co-authored-by: Reuben Bond <203839+ReubenBond@users.noreply.github.com>
Gradually increase job concurrency during startup to avoid starvation issues that can occur before caches, connection pools, and thread pool sizing have warmed up.
The semaphore starts at SlowStartInitialConcurrency (default: ProcessorCount) and doubles every SlowStartInterval (default: 10s) until MaxConcurrentJobsPerSilo is reached. The ramp-up begins when the first shard starts executing.
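To make the doubling ramp described above concrete, here is a minimal sketch of the resulting schedule. It is not the Orleans implementation (which is C# in `ShardExecutor.cs` and is not shown on this page); the function name `slow_start_schedule` is hypothetical, Python is used only for illustration, and the example values 8 and 100 are arbitrary rather than documented defaults. Clamping an initial value that exceeds the maximum is also an assumption.

```python
def slow_start_schedule(initial, maximum, interval_seconds=10.0):
    """Return (elapsed_seconds, concurrency_limit) steps for a doubling ramp.

    `initial` plays the role of SlowStartInitialConcurrency, `maximum` the
    role of MaxConcurrentJobsPerSilo, and `interval_seconds` the role of
    SlowStartInterval.
    """
    if initial <= 0 or maximum <= 0 or interval_seconds <= 0:
        raise ValueError("all arguments must be positive")
    # Assumption: an initial value above the maximum is clamped to it.
    limit = min(initial, maximum)
    elapsed = 0.0
    steps = [(elapsed, limit)]
    while limit < maximum:
        elapsed += interval_seconds
        # Double the limit each interval, but never exceed the maximum.
        limit = min(limit * 2, maximum)
        steps.append((elapsed, limit))
    return steps


if __name__ == "__main__":
    # Example: 8 initial slots ramping to a hypothetical maximum of 100.
    for elapsed, limit in slow_start_schedule(initial=8, maximum=100):
        print(f"t={elapsed:>4.0f}s  limit={limit}")
```

With these example values the limit moves through 8, 16, 32, 64, and 100, reaching full concurrency 40 seconds after the first shard starts executing.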
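New DurableJobsOptions:
- ConcurrencySlowStartEnabled (default: true)
- SlowStartInitialConcurrency (default: Environment.ProcessorCount)
- SlowStartInterval (default: 10 seconds)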
Related issue: #9750