Fix ScheduledTaskExecutor deadlock when TrySetResult runs continuations inline#2953
Conversation
ScheduledTaskExecutor deadlock when TrySetResult runs continuations inline
@dotnet-policy-service agree company="Redgate Software Ltd" |
test/Polly.Core.Tests/CircuitBreaker/Controller/ScheduledTaskExecutorTests.cs
Outdated
Show resolved
Hide resolved
Codecov Report✅ All modified and coverable lines are covered by tests. Additional details and impacted files@@ Coverage Diff @@
## main #2953 +/- ##
=======================================
Coverage 96.15% 96.15%
=======================================
Files 309 309
Lines 7128 7128
Branches 1005 1005
=======================================
Hits 6854 6854
Misses 221 221
Partials 53 53
Flags with carried forward coverage won't be shown. Click here to find out more. ☔ View full report in Codecov by Sentry. |
|
@crnhrv In the process of releasing this change, one of the new tests failed here: Does this indicate an issue with the fix, or is it just that the timeout is too small for CI? |
|
@martincostello Hmm, the tests were passing on my machine in under <10ms so I thought 250ms would be safe enough for CI, but I think that's the only explanation considering the actual behaviour should be deterministic otherwise as far as I can tell. I notice it failed on tests targeting |
|
Thanks for your contribution @crnhrv - the changes from this pull request have been published as part of version 8.6.6 📦, which is now available from NuGet.org 🚀 |
|
There was also a failure here for net10.0: logs If you're happy that it's just a test flake, then I'll do a PR to increase the value. |
|
I've been running the tests until failure locally in ~10 concurrent sessions for the past 5 minutes or so and haven't hit a failure so I can't imagine it's flaky for any other reason than the timeout being too low. |
Updated [Polly.Core](https://github.com/App-vNext/Polly) from 8.6.5 to 8.6.6. <details> <summary>Release notes</summary> _Sourced from [Polly.Core's releases](https://github.com/App-vNext/Polly/releases)._ ## 8.6.6 ## Highlights * Fix `ScheduledTaskExecutor` deadlock when `TrySetResult` runs continuations inline by @crnhrv in App-vNext/Polly#2953 ## What's Changed * Add specification tests for jitter by @martincostello in App-vNext/Polly#2830 * Refactor property-based tests by @martincostello in App-vNext/Polly#2831 * .NET 10 preparation by @martincostello in App-vNext/Polly#2842 * Fix CS7035 warning in dependabot jobs by @martincostello in App-vNext/Polly#2849 * Remove codecov/test-results-action by @martincostello in App-vNext/Polly#2872 * Update to .NET 10 SDK by @martincostello in App-vNext/Polly#2531 * Bump zizmor to v1.19.0 by @martincostello in App-vNext/Polly#2882 * Fix typo by @martincostello in App-vNext/Polly#2886 * Add RateLimitHeaders library to community resources by @alexis- in App-vNext/Polly#2887 * Bump zizmor to 1.21.0 by @martincostello in App-vNext/Polly#2905 * .NET 11 preparation by @martincostello in App-vNext/Polly#2932 * Remove Stryker workaround by @martincostello in App-vNext/Polly#2933 * Group .NET dependency updates by @martincostello in App-vNext/Polly#2944 * Migrate to actions/attest by @martincostello in App-vNext/Polly#2952 ## New Contributors * @alexis- made their first contribution in App-vNext/Polly#2887 * @crnhrv made their first contribution in App-vNext/Polly#2953 **Full Changelog**: App-vNext/Polly@8.6.5...8.6.6 Commits viewable in [compare view](App-vNext/Polly@8.6.5...8.6.6). </details> [](https://docs.github.com/en/github/managing-security-vulnerabilities/about-dependabot-security-updates#about-compatibility-scores) Dependabot will resolve any conflicts with this PR as long as you don't alter it yourself. You can also trigger a rebase manually by commenting `@dependabot rebase`. [//]: # (dependabot-automerge-start) [//]: # (dependabot-automerge-end) --- <details> <summary>Dependabot commands and options</summary> <br /> You can trigger Dependabot actions by commenting on this PR: - `@dependabot rebase` will rebase this PR - `@dependabot recreate` will recreate this PR, overwriting any edits that have been made to it - `@dependabot show <dependency name> ignore conditions` will show all of the ignore conditions of the specified dependency - `@dependabot ignore this major version` will close this PR and stop Dependabot creating any more for this major version (unless you reopen the PR or upgrade to it yourself) - `@dependabot ignore this minor version` will close this PR and stop Dependabot creating any more for this minor version (unless you reopen the PR or upgrade to it yourself) - `@dependabot ignore this dependency` will close this PR and stop Dependabot creating any more for this dependency (unless you reopen the PR or upgrade to it yourself) </details> --------- Signed-off-by: dependabot[bot] <support@github.com> Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com> Co-authored-by: James Gunn <james@gunn.io>
Pull Request
The issue or feature being addressed
Fixes #2948 —
ScheduledTaskExecutordeadlock whenTrySetResultruns continuations inline.Details on the issue fix or feature implementation
ScheduledTaskExecutor.ScheduleTaskwas creating itsTaskCompletionSource<object>withoutTaskCreationOptions.RunContinuationsAsynchronously. This meant that whenStartProcessingAsynccalledTrySetResult, any continuation awaiting the returned task could run inline on the executor's single processing thread. If that continuation blocked (e.g. by holding a lock and making a synchronous Polly call, or by waiting for a second scheduled task), the executor thread would stall and deadlock.To fix we pass
TaskCreationOptions.RunContinuationsAsynchronouslyto theTaskCompletionSourceconstructor, ensuring continuations are always sent to the thread pool rather than running inline.A regression test is included that forces a continuation to run synchronously via
TaskContinuationOptions.ExecuteSynchronouslyand has it block waiting for a second scheduled task. This deadlocks (and is cancelled after 250ms) without the fix.Confirm the following