
Background Jobs: Rewrite RecurringHostedServiceBase with SemaphoreSlim and add signalling support #22331

Merged: AndyButland merged 53 commits into v17/dev from v17/feature/recurringbackgroundjob-signalling on May 21, 2026


@ronaldbarendse ronaldbarendse commented Apr 1, 2026

Contributor

Note

This PR is based on #22330 (period drift fix). Review that PR first — this one contains only the additional commits.

Summary

Replace Timer with SemaphoreSlim-based loop on BackgroundService

The previous implementation used System.Threading.Timer, which executes callbacks on the ThreadPool. This had several issues:

  • No cancellation support: PerformExecuteAsync(object? state) had no way to observe host shutdown. The new PerformExecuteAsync(CancellationToken) overload enables cooperative cancellation.
  • Exception handling: An unhandled exception in the timer callback could crash the process. The new loop catches exceptions and continues.
  • Parallel execution risk: The old code disabled and re-enabled the timer around execution to prevent overlap — a pattern that is fragile and race-prone.

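The rewrite inherits from BackgroundService and uses a SemaphoreSlim(0, 1) as the wait primitive inside ExecuteAsync. SemaphoreSlim.WaitAsync(TimeSpan, CancellationToken) was chosen over PeriodicTimer because a single call covers several needs at once: the timed wait until the next period, cooperative cancellation on host shutdown (through the token), and an immediate wake-up for triggered executions (by releasing the semaphore). Exceptions are handled by the surrounding loop.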
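The loop shape described above can be sketched with BCL types only. This is a minimal, hypothetical illustration: the SignalledLoop class and its members are not Umbraco's API, and the sketch leaves out drift compensation, the trigger strategies, and notifications. It shows one SemaphoreSlim(0, 1) serving as the timed wait, the cancellation point, and the early wake-up, with a catch-all so that a failing execution does not end the loop.

```csharp
using System;
using System.Threading;
using System.Threading.Tasks;

// Hypothetical sketch, not Umbraco's RecurringHostedServiceBase: a SemaphoreSlim(0, 1)
// is the single wait primitive for the recurring loop.
public sealed class SignalledLoop : IDisposable
{
    private readonly SemaphoreSlim _signal = new(0, 1);
    private readonly TimeSpan _period;
    private readonly Func<CancellationToken, Task> _work;

    public SignalledLoop(TimeSpan period, Func<CancellationToken, Task> work)
    {
        _period = period;
        _work = work;
    }

    // Wakes the waiting loop so it executes immediately.
    // Returns false if a trigger is already pending (the semaphore's maximum count is 1).
    public bool TriggerExecution()
    {
        try
        {
            _signal.Release();
            return true;
        }
        catch (SemaphoreFullException)
        {
            return false;
        }
    }

    public async Task RunAsync(CancellationToken stoppingToken)
    {
        while (!stoppingToken.IsCancellationRequested)
        {
            try
            {
                // Completes with false when the period elapses, with true when released by
                // TriggerExecution(), and throws OperationCanceledException on shutdown.
                // A period of Timeout.InfiniteTimeSpan means "wait for a trigger only".
                await _signal.WaitAsync(_period, stoppingToken);
                await _work(stoppingToken);
            }
            catch (OperationCanceledException) when (stoppingToken.IsCancellationRequested)
            {
                break; // The host is stopping.
            }
            catch (Exception)
            {
                // A real implementation would log here; the loop keeps running.
            }
        }
    }

    public void Dispose() => _signal.Dispose();
}
```

In this sketch the period is measured from the end of each execution. The PR's implementation also compensates for drift, which the sketch does not attempt.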

Add TriggerExecution() for on-demand signalling

The SemaphoreSlim wait can be interrupted by releasing the semaphore, causing the loop to execute immediately. Four protected internal overloads on RecurringHostedServiceBase control what happens after the triggered execution:

| Method | Behavior after triggered execution |
| --- | --- |
| TriggerExecution() | Resume the original schedule. If the triggered execution overshot the next scheduled tick, skip it to avoid double-execution. |
| TriggerExecution(NextExecutionStrategy.Reset) | Wait a full period. The schedule shifts forward from the triggered execution. |
| TriggerExecution(NextExecutionStrategy.Replace) | The triggered execution replaces the next scheduled tick. The following execution occurs one full period after the originally-scheduled time. |
| TriggerExecution(TimeSpan nextDelay) | Wait the specified custom delay before the next execution. |
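To make the four behaviours concrete, here is a hypothetical sketch of how the next automatic execution time could be computed after a triggered run. The Strategy enum and the NextExecution class are illustrative stand-ins for NextExecutionStrategy, not Umbraco code. Two details are assumptions that the PR text does not state: Reset measures the full period from the start of the triggered execution, and a custom delay is measured from its end.

```csharp
using System;

// Hypothetical mirror of the three NextExecutionStrategy values described in the PR.
public enum Strategy { None, Reset, Replace }

public static class NextExecution
{
    // All times are offsets from an arbitrary origin. scheduledNext is the tick that was
    // pending when the trigger arrived; the triggered run spans [triggerStart, triggerEnd].
    public static TimeSpan Compute(
        TimeSpan scheduledNext, TimeSpan triggerStart, TimeSpan triggerEnd, TimeSpan period,
        Strategy strategy = Strategy.None, TimeSpan? customDelay = null)
    {
        if (customDelay is TimeSpan delay)
        {
            // Assumption: the custom delay starts when the triggered execution finishes.
            return triggerEnd + delay;
        }

        return strategy switch
        {
            // Keep the original schedule, but skip the tick the triggered run overshot.
            Strategy.None => scheduledNext > triggerEnd ? scheduledNext : scheduledNext + period,

            // Assumption: the full period is measured from the start of the triggered run.
            Strategy.Reset => triggerStart + period,

            // The triggered run stands in for the scheduled tick.
            Strategy.Replace => scheduledNext + period,

            _ => throw new ArgumentOutOfRangeException(nameof(strategy)),
        };
    }
}
```

For example, with a 5-minute period, a tick due at 5 min, and a trigger at 2 min that finishes 1 s later, the sketch yields 5 min (None), 7 min (Reset), and 10 min (Replace).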

Triggering an IRecurringBackgroundJob (opt-in)

The four TriggerExecution(...) instance methods above are protected internal, so subclasses of RecurringHostedServiceBase (custom hosted services that don't go through IRecurringBackgroundJob) can trigger themselves directly. For IRecurringBackgroundJob implementations, triggering is now an explicit opt-in to keep the public API surface small and intentional:

  1. Mark the job with ITriggerableRecurringBackgroundJob (an empty marker interface extending IRecurringBackgroundJob):

    public class MyJob : RecurringBackgroundJobBase, ITriggerableRecurringBackgroundJob
    {
        public override TimeSpan Period => TimeSpan.FromMinutes(5);

        // Replace with the job's work; observe cancellationToken so shutdown can interrupt it.
        public override Task RunJobAsync(CancellationToken cancellationToken) => Task.CompletedTask;
    }
  2. Register the job the usual way:

    services.AddRecurringBackgroundJob<MyJob>();
  3. Inject IRecurringBackgroundJobTrigger<MyJob> where you want to trigger it:

    using Microsoft.AspNetCore.Mvc;

    public class MyController(IRecurringBackgroundJobTrigger<MyJob> trigger) : ControllerBase
    {
        [HttpPost]
        public IActionResult Run() => Ok(trigger.TriggerExecution()); // false if MyJob is not running
    }

IRecurringBackgroundJobTrigger<> is registered once as an open generic by AddBackgroundJobs, so the typed trigger is resolvable for any job that opts in via the ITriggerableRecurringBackgroundJob marker — no per-job DI registration is required. The generic constraint on the trigger interface enforces opt-in at compile time: requesting IRecurringBackgroundJobTrigger<NotMarked> is a compile error.

The typed trigger exposes the same overloads as the base class (TriggerExecution(), TriggerExecution(NextExecutionStrategy), TriggerExecution(TimeSpan)) and returns false if no hosted service is currently running for the job (e.g. before StartAsync, or if the job is not registered). Internally it delegates to RecurringBackgroundJobHostedServiceRunner, whose TriggerExecution<TJob> overloads are internal and not part of the public API.
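The following hypothetical sketch shows the delegation described above using only BCL types. JobRunner, JobTrigger<TJob>, ITriggerableJob and the job classes are illustrative names, not Umbraco's. The sketch shows the trigger returning false when nothing is running for the job. It also shows why a lookup keyed on job.GetType() but queried with typeof(TJob) only matches a concrete TJob, which is the caveat documented on IRecurringBackgroundJobTrigger<TJob> further down this PR.

```csharp
using System;
using System.Collections.Concurrent;

// Hypothetical stand-ins; none of these names are Umbraco's API.
public interface ITriggerableJob { }
public class CleanupJobBase : ITriggerableJob { }
public sealed class CleanupJob : CleanupJobBase { }

public sealed class JobRunner
{
    // Running jobs keyed by their concrete runtime type.
    private readonly ConcurrentDictionary<Type, Func<bool>> _running = new();

    public void Started(ITriggerableJob job, Func<bool> trigger) => _running[job.GetType()] = trigger;

    public void Stopped(ITriggerableJob job) => _running.TryRemove(job.GetType(), out _);

    // Looks up by the static type argument: false unless something runs under exactly typeof(TJob).
    public bool TriggerExecution<TJob>() where TJob : ITriggerableJob
        => _running.TryGetValue(typeof(TJob), out Func<bool>? trigger) && trigger();
}

// Typed facade; the constraint rejects jobs without the marker interface at compile time.
public sealed class JobTrigger<TJob> where TJob : ITriggerableJob
{
    private readonly JobRunner _runner;

    public JobTrigger(JobRunner runner) => _runner = runner;

    public bool TriggerExecution() => _runner.TriggerExecution<TJob>();
}
```

A JobTrigger<CleanupJobBase> compiles because the base class carries the marker, but it never finds the running CleanupJob.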

Add RecurringBackgroundJobBase abstract class

Default values for Delay, ServerRoles, and the PeriodChanged event previously lived as default interface implementations on IRecurringBackgroundJob. A base class is the more natural place for these defaults. RecurringBackgroundJobBase also hides the now-obsoleted parameterless RunJobAsync() by routing it to RunJobAsync(CancellationToken). Implementors only need to provide Period and RunJobAsync(CancellationToken).

The DefaultDelay and DefaultServerRoles constants are declared as protected internal static readonly on the base class — protected for subclasses, internal so the default interface implementations (kept for backward compatibility with direct IRecurringBackgroundJob implementors) can still reference them.

Add IgnoredDelay to prevent tight looping when execution is skipped

The hosted service short-circuits PerformExecuteAsync and publishes a RecurringBackgroundJobIgnoredNotification when the runtime is not ready, the current server role is not allowed, or this is not the MainDom. Previously this returned immediately, so a job with a very short or zero Period (e.g. one that throttles itself via a semaphore inside RunJobAsync) would spin at 100% CPU and flood logs and notifications whenever the CMS chose to ignore it — see Umbraco.Engage.Issues#65 and #22859.

A new IRecurringBackgroundJob.IgnoredDelay property (default 1 minute, overridable per job) is now awaited after each ignored execution before the loop continues. The wait uses the injected TimeProvider for deterministic testing, is cancellable via the stoppingToken, and is skipped entirely when IgnoredDelay <= TimeSpan.Zero.

A regular execution path (job actually runs) is unaffected — back-off is only applied when the CMS prevents the job from running.

Add CancellationToken to RunJobAsync

The new RunJobAsync(CancellationToken) overload on IRecurringBackgroundJob enables cooperative cancellation during host shutdown. The parameterless RunJobAsync() is obsoleted with a default interface implementation that delegates to the new overload (scheduled for removal in Umbraco 19).

Test plan

  • Integration-style tests for the execution loop: periodic execution, cancellation, exception resilience
  • Tests for all TriggerExecution strategies (None, Reset, Replace, custom delay)
  • Test for None strategy overshoot-skip behavior
  • Test for ChangePeriod taking effect on next cycle
  • Runner trigger tests: returns true/false based on job registration, verifies immediate execution
  • IRecurringBackgroundJobTrigger<TJob> typed trigger tests (delegation to runner)
  • IgnoredDelay back-off tests: waits the configured delay, honors per-job override, skipped when zero, cancellable on shutdown
  • Existing RecurringBackgroundJobHostedServiceTests updated to use new signatures

Copilot AI review requested due to automatic review settings April 1, 2026 22:28


@AndyButland

Contributor

I've reviewed and will merge #22330 @ronaldbarendse, but this one has a few conflicts and comments to consider. Please can you look at resolving/addressing those, and then you can re-target this to main for review and testing?

Base automatically changed from v17/bugfix/recurringbackgroundjob-period-drift to main April 2, 2026 05:19
# Conflicts:
#	src/Umbraco.Infrastructure/BackgroundJobs/IRecurringBackgroundJob.cs
#	src/Umbraco.Infrastructure/BackgroundJobs/RecurringBackgroundJobHostedService.cs
#	src/Umbraco.Infrastructure/BackgroundJobs/RecurringBackgroundJobHostedServiceRunner.cs
#	src/Umbraco.Infrastructure/Extensions/ServiceCollectionExtensions.cs
#	src/Umbraco.Infrastructure/HostedServices/RecurringHostedServiceBase.cs
#	tests/Umbraco.Tests.UnitTests/Umbraco.Infrastructure/HostedServices/RecurringHostedServiceBaseTests.cs


@AndyButland AndyButland left a comment

Contributor


Thanks @ronaldbarendse - it looks like a solid and robust rewrite to me. Can you just fill me in a bit on the background of what triggered you to make these updates? Is it just something you saw that could be improved? Or are you running into real-world issues that this update should solve, or have features in mind that depends on this being updated in core?

Then there are various internal jobs that haven't been updated to use the new API: TempFileCleanupJob, ReportSiteJob, TouchServerJob, and InstructionProcessJob still implement only RunJobAsync() (without the CancellationToken). If we are going to change this, I think we should make sure internal code now calls the non-obsolete overloads.

@ronaldbarendse

Contributor Author

@AndyButland All CMS jobs are now using the new RecurringBackgroundJobBase class and implement the cancellation token where possible (ReportSiteJob cancels the HTTP request and TempFileCleanupJob stops cleanup when the application is stopping during their execution).

I've also added support for setting either/both Delay and Period to Timeout.InfiniteTimeSpan, allowing the initial delay and recurring period to wait for a manual trigger. This can be used to wait for a specific event to happen before starting the recurring background job. Engage also has a background job that removes segment data, which currently executes after startup (to ensure the data is cleaned up) and is triggered when a segment is soft-deleted (to ensure data gets removed when the user requested this). We don't want multiple cleanup processes executing at the same time and if the user deletes a segment just after a reboot, we don't want to wait or execute the cleanup twice, so this relatively simple change makes it possible to use recurring background jobs for this as well. The definition of 'recurring' has slightly changed though: instead of periodically executing, it can also be a manually triggered job...

This makes all the following use-cases possible:

  • Delay = 0, Period = 0
    • Timeline: t = 0, 0+ε, 0+2ε, … - back-to-back, no automatic gap.
    • Use case: Self-throttling worker that pulses inside RunJobAsync (e.g. queue drains via internal semaphore, like the Umbraco Engage pageview flusher pattern).
  • Delay = 0, Period = 5min
    • Timeline: t = 0, 5, 10, 15 min, …
    • Use case: Standard recurring job that should also run immediately on startup. Useful for jobs whose first run doesn't significantly impact boot/startup performance.
  • Delay = 0, Period = Infinite
    • Timeline: t = 0, then idle. With manual trigger at t = 2min: t = 0, 2 min, then idle again.
    • Use case: One-shot startup task that can be re-fired manually, e.g. cache warm-up.
  • Delay = 10s, Period = 0
    • Timeline: t = 10s, 10s+ε, 10s+2ε, … - startup grace period, then back-to-back.
    • Use case: Self-throttling worker that needs the app to stabilize before kicking in. Unusual but valid.
  • Delay = 10s, Period = 5min (the typical default shape)
    • Timeline: t = 10s, 5min 10s, 10min 10s, 15min 10s, …
    • Use case: The default recurring-job shape. RecurringBackgroundJobBase.DefaultDelay is 3 min for the same reason - give the app time to start.
  • Delay = 10s, Period = Infinite
    • Timeline: t = 10s, then idle. With manual trigger at t = 2min: t = 10s, 2min, then idle again.
    • Use case: Deferred one-shot - initialize after startup grace period, then sit idle until explicitly re-triggered.
  • Delay = Infinite, Period = 0
    • Timeline: idle until first manual trigger at t = 2min, then t = 2min, 2min+ε, 2min+2ε, …
    • Use case: Self-throttling worker that should be inert until activated (e.g. a feature flag turns it on later via TriggerExecution).
  • Delay = Infinite, Period = 5min
    • Timeline: idle until first manual trigger at t = 2min, then t = 2min, 7min, 12min, 17min, …
    • Use case: Scheduled job that should not run automatically at startup, e.g. operator-initiated maintenance task that, once started, recurs on a fixed cadence.
  • Delay = Infinite, Period = Infinite
    • Timeline: idle until each manual trigger. With triggers at t = 2min, t = 30min: t = 2min, 30min. Nothing else.
    • Use case: Pure on-demand worker - every TriggerExecution() produces exactly one execution. Command bus, manually-dispatched batch job, etc.
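As a worked example of the list above, the automatic part of each timeline can be computed from Delay and Period alone. This hypothetical helper is an illustration of the listed timelines, not Umbraco code. It ignores manual triggers and treats execution time as zero, so the ε gaps collapse to 0.

```csharp
using System;
using System.Collections.Generic;
using System.Threading;

// Hypothetical helper: automatic execution start times (offsets from startup) for a
// Delay/Period pair, ignoring manual triggers and assuming executions take no time.
public static class Timeline
{
    public static List<TimeSpan> Automatic(TimeSpan delay, TimeSpan period, int maxExecutions)
    {
        var starts = new List<TimeSpan>();

        // Infinite Delay: nothing runs until the first manual trigger.
        if (delay == Timeout.InfiniteTimeSpan)
        {
            return starts;
        }

        TimeSpan next = delay;
        for (int i = 0; i < maxExecutions; i++)
        {
            starts.Add(next);

            // Infinite Period: one execution, then idle until triggered.
            if (period == Timeout.InfiniteTimeSpan)
            {
                break;
            }

            // Period = 0 gives back-to-back executions (the ε gaps are treated as zero here).
            next += period;
        }

        return starts;
    }
}
```

Automatic(10 s, 5 min, 3) returns 10 s, 5 min 10 s and 10 min 10 s, which matches the "typical default shape" entry.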

@AndyButland

Contributor

Great @ronaldbarendse, thanks. I'll take a final look over this tomorrow (travelling today) and look to get this in. As it stands it will be 17.6 and 18.1, but if you think there is good reason to accelerate that, please let me know.

We should think about documentation for this too I would suggest. I can have a stab at that unless you are minded to, but given there's quite a bit of flexibility here now for custom jobs, it's worth having it documented, so I've added the "needs docs" label.

@AndyButland AndyButland added the status/needs-docs Requires new or updated documentation label May 20, 2026
@ronaldbarendse

Contributor Author

> Great @ronaldbarendse, thanks. I'll take a final look over this tomorrow (travelling today) and look to get this in. As it stands it will be 17.6 and 18.1, but if you think there is good reason to accelerate that, please let me know.

Having this available in 18.0 would allow the products to start taking advantage of this straight away. Otherwise, we'd push these improvements to v19, since we don't want to force users to upgrade the CMS before the product (especially for Deploy: we want to keep the minimum CMS version at x.0.0). Also note that these changes are all additive and backwards compatible!

> We should think about documentation for this too I would suggest. I can have a stab at that unless you are minded to, but given there's quite a bit of flexibility here now for custom jobs, it's worth having it documented, so I've added the "needs docs" label.

This does indeed require updating the docs page, so please do have a stab at this (there's more than enough information in this PR to have a decent restructure kicked off by Claude) 👍🏻

@AndyButland AndyButland changed the title Background Jobs: Rewrite RecurringHostedServiceBase with SemaphoreSlim and add signalling support Background Jobs: Rewrite RecurringHostedServiceBase with SemaphoreSlim and add signalling support May 21, 2026
@AndyButland

Contributor

I've taken a further pass through this and pushed a few changes on top of your work @ronaldbarendse. Please see the summary below, plus details on a runtime issue I uncovered while testing and the fix for it. If you spot any concerns, please shout. Or if it looks OK to you, of course, please confirm.

Changes

Three minor bits of clean-up/nit-picking, and one bug fix.

1. IgnoredDelay = Timeout.InfiniteTimeSpan handling

The XML doc on IRecurringBackgroundJob.IgnoredDelay states that Timeout.InfiniteTimeSpan is "not allowed", but the implementation in RecurringBackgroundJobHostedService.IgnoreAndWaitAsync was checking ignoredDelay <= TimeSpan.Zero. Since Timeout.InfiniteTimeSpan == -1 ms, that comparison silently treats infinite as "skip the back-off entirely" — exactly the tight-loop scenario the property was added to prevent.

Now we detect Timeout.InfiniteTimeSpan explicitly, log a warning naming the offending job, and fall back to RecurringBackgroundJobBase.DefaultIgnoredDelay. Added Falls_Back_To_Default_When_IgnoredDelay_Is_Infinite to cover it.
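The corrected check order can be sketched as follows. IgnoredBackOff and its members are hypothetical, and the one-minute fallback mirrors the default IgnoredDelay described earlier in the PR. Because Timeout.InfiniteTimeSpan is -1 ms, the infinite case has to be tested before the "<= TimeSpan.Zero" skip. The wait uses the .NET 8 Task.Delay(TimeSpan, TimeProvider, CancellationToken) overload, so it stays testable with a fake time provider and is cancelled by the stopping token.

```csharp
using System;
using System.Threading;
using System.Threading.Tasks;

// Hypothetical helper mirroring the fix described above; not Umbraco's implementation.
public static class IgnoredBackOff
{
    // Assumed to match the documented default IgnoredDelay of one minute.
    public static readonly TimeSpan DefaultIgnoredDelay = TimeSpan.FromMinutes(1);

    // Returns the delay to wait after an ignored execution, or null to skip the back-off.
    public static TimeSpan? Resolve(TimeSpan ignoredDelay, Action<string> warn)
    {
        // Checked first: Timeout.InfiniteTimeSpan is -1 ms and would otherwise hit the "<= Zero" skip.
        if (ignoredDelay == Timeout.InfiniteTimeSpan)
        {
            warn("IgnoredDelay must not be infinite; falling back to the default.");
            return DefaultIgnoredDelay;
        }

        return ignoredDelay <= TimeSpan.Zero ? (TimeSpan?)null : ignoredDelay;
    }

    // Waits using the injected TimeProvider (testable) and the stopping token (cancellable).
    public static Task WaitAsync(TimeSpan ignoredDelay, TimeProvider timeProvider, Action<string> warn, CancellationToken stoppingToken)
        => Resolve(ignoredDelay, warn) is TimeSpan delay
            ? Task.Delay(delay, timeProvider, stoppingToken)
            : Task.CompletedTask;
}
```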

2. Extract ApplyTriggerState helper

WaitForNextExecutionAsync had 22 lines at the bottom that consumed the pending TriggerState and computed the next delay basis depending on NextExecutionStrategy.None / Reset / Replace. Extracted that into a private ApplyTriggerState(delay, waitStart, period) so the wait loop ends with a clean return ApplyTriggerState(...) and the strategy switch (including the _nextExecutionSkipOnOvershoot side-effect for None) is isolated.

3. Documented TJob constraint on IRecurringBackgroundJobTrigger<TJob>

The runner indexes hosted services by job.GetType(), and FindHostedService<TJob> looks up by typeof(TJob). The constraint where TJob : ITriggerableRecurringBackgroundJob allows base classes / interfaces, but a non-concrete TJob would silently miss the dictionary lookup and return false. Since FindHostedService and the runner's TriggerExecution<TJob> overloads are non-public and only reachable through the public trigger interface, I added a <typeparam> note on IRecurringBackgroundJobTrigger<TJob> calling this out.

4. Suppress ExecutionContext flow when starting the recurring background loop

While manual testing I saw this on every boot, about a minute after the recurring hosted services started:

[ERR] An exception occurred while attempting to ensure distributed background jobs on startup.
System.InvalidOperationException: The Scope ec3a4a8e-... being disposed is not the Ambient Scope c28f2201-...
   at Umbraco.Cms.Infrastructure.Scoping.Scope.Dispose()
   at Umbraco.Cms.Infrastructure.Services.Implement.DistributedJobService.EnsureJobsAsync()
   at Umbraco.Cms.Infrastructure.BackgroundJobs.DistributedBackgroundJobHostedService.ExecuteAsync(...)

Followed shortly after by System.InvalidOperationException: There is already an open DataReader associated with this Connection which must be closed first. from DistributedJobRepository.Update — the connection state corruption downstream of the scope mismatch.

The exception is in DistributedJobService / DistributedBackgroundJobHostedService, which this PR doesn't touch. I confirmed via a test on origin/v17/dev (without this PR's changes) that the exception does not occur, so this PR is at least what surfaces the issue.

Root cause

Scope.Dispose (Scope.cs:432) explicitly says:

> If using Task.Run (or similar) as a fire and forget tasks or to run threads in parallel you must suppress execution context flow with ExecutionContext.SuppressFlow() and ExecutionContext.RestoreFlow().

The pre-PR RecurringHostedServiceBase.StartAsync did exactly that — it wrapped new Timer(ExecuteAsync, ...) with ExecutionContext.SuppressFlow(), so every Timer callback ran with an empty ExecutionContext.

In this PR RecurringHostedServiceBase now inherits from BackgroundService, whose StartAsync kicks off ExecuteTask = ExecuteAsync(_stoppingCts.Token) as fire-and-forget without suppressing flow. The captured ExecutionContext is whatever the caller's context was. That matters because Umbraco's ambient scope stack is a static AsyncLocal<ConcurrentStack<IScope>> (AmbientScopeStack.cs:8), and the first push uses _stack.Value ??= new ConcurrentStack<IScope>(). Once any scope is created in a context the stack reference is set and stays non-null even after pop — AsyncLocal's copy-on-write only protects the slot, not the referenced mutable object. From that point on every fire-and-forget hosted-service loop that inherits the host's context shares the same ConcurrentStack<IScope> instance, and concurrent pushes/pops interleave.

The pollution originates specifically in RecurringBackgroundJobHostedService<TJob>.StartAsync: it publishes RecurringBackgroundJobStartingNotification via await _eventAggregator.PublishAsync(...) before calling base.StartAsync. Any handler (or PublishAsync itself) that synchronously creates a scope initialises the static AmbientScopeStack slot on the caller's ExecutionContext, and that's the polluted EC that base.StartAsync then captures into ExecuteTask.

The visible victim was DistributedJobService.EnsureJobsAsync because the recent merge of #22796 changed DistributedJobRepository.Update from a silent return to throw new InvalidOperationException("No scope, could not update distributed job") on a null ambient — making a previously-silent scoping mismatch loud. The bug was reachable before #22796, just invisible.
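The copy-on-write behaviour described in the root cause can be reproduced without Umbraco. In this minimal sketch, AmbientStackDemo is hypothetical and only mimics the static AsyncLocal<ConcurrentStack<...>> pattern. A scope is pushed and popped, so the stack is empty but the AsyncLocal slot still holds the reference. A Task.Run started with normal flow then sees the very same stack instance, while one started under ExecutionContext.SuppressFlow() sees null.

```csharp
using System;
using System.Collections.Concurrent;
using System.Threading;
using System.Threading.Tasks;

// Hypothetical stand-in for a static AsyncLocal<ConcurrentStack<IScope>> ambient stack.
public static class AmbientStackDemo
{
    private static readonly AsyncLocal<ConcurrentStack<string>?> Stack = new();

    private static void Push(string scope) => (Stack.Value ??= new ConcurrentStack<string>()).Push(scope);

    private static bool Pop() => Stack.Value?.TryPop(out _) ?? false;

    public static async Task<(bool SharedWhenFlowing, bool EmptyWhenSuppressed)> RunAsync()
    {
        // Create and dispose a "scope": the stack is empty again, but the AsyncLocal
        // slot still references the same ConcurrentStack instance.
        Push("starting-notification-scope");
        Pop();
        ConcurrentStack<string>? callerStack = Stack.Value;

        // Normal flow: the child task captures the caller's ExecutionContext and
        // therefore shares the mutable stack object with the caller.
        ConcurrentStack<string>? flowedStack = await Task.Run(() => Stack.Value);

        // Suppressed flow: the child task starts with an empty ExecutionContext.
        // SuppressFlow and its Dispose (Undo) run on the same thread, with no await in between.
        Task<ConcurrentStack<string>?> isolated;
        using (ExecutionContext.SuppressFlow())
        {
            isolated = Task.Run(() => Stack.Value);
        }
        ConcurrentStack<string>? suppressedStack = await isolated;

        return (callerStack is not null && ReferenceEquals(callerStack, flowedStack), suppressedStack is null);
    }
}
```

Two long-running loops started the "flowing" way would push and pop on the same ConcurrentStack, which is the interleaving described above.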

Fix

In RecurringBackgroundJobHostedService<TJob>.StartAsync, wrap the call to base.StartAsync (which is BackgroundService.StartAsync — synchronous, creates ExecuteTask) with ExecutionContext.SuppressFlow(). ExecuteTask is captured inside the suppression and so inherits an empty ExecutionContext, restoring the isolation the old Timer-based code had. SuppressFlow is released before the rest of StartAsync continues, so the post-base notification publish runs with normal flow.

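public override async Task StartAsync(CancellationToken cancellationToken)
{
    EventMessages eventMessages = _eventMessagesFactory.Get();
    var startingNotification = new RecurringBackgroundJobStartingNotification(_job, eventMessages);
    await _eventAggregator.PublishAsync(startingNotification, cancellationToken);

    // BackgroundService.StartAsync creates ExecuteTask synchronously, so suppressing flow around
    // the call gives the recurring loop an empty ExecutionContext, as the old Timer did.
    // SuppressFlow() throws if flow is already suppressed, hence the IsFlowSuppressed() check.
    Task startTask;
    using (ExecutionContext.IsFlowSuppressed() ? null : (IDisposable?)ExecutionContext.SuppressFlow())
    {
        startTask = base.StartAsync(cancellationToken);
    }

    // Flow is restored here, so the started notification is published with normal flow.
    await startTask;

    await _eventAggregator.PublishAsync(new RecurringBackgroundJobStartedNotification(_job, eventMessages).WithStateFrom(startingNotification), cancellationToken);
}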

After this change the DistributedJobService exception no longer reproduces. The fix lives in the generic recurring-job hosted service rather than on RecurringHostedServiceBase itself, so we don't introduce a new override on the base class (which would have removed the newslot IL flag from the previously-virtual StartAsync and tripped CP0012 in package validation against the 17.0.0 baseline). Every recurring job — the four framework ones and any user-registered IRecurringBackgroundJob — runs through RecurringBackgroundJobHostedService<TJob>, so all of them are covered.


Manual testing

I added a set of throwaway jobs (uncommitted on my dev branch) to exercise the new surface end-to-end:

| Job | Purpose |
| --- | --- |
| HeartbeatJob | Plain RecurringBackgroundJobBase, 15-s period, 5-s delay; confirms the basic semaphore loop |
| TriggerableHeartbeatJob | ITriggerableRecurringBackgroundJob, 2-min period; long enough that manual triggers via IRecurringBackgroundJobTrigger<TJob> are clearly visible |
| ManualOnlyJob | Period and Delay both Timeout.InfiniteTimeSpan; runs only when triggered |
| CancellationAwareJob | 30-s loop awaiting Task.Delay(ct) inside RunJobAsync(CancellationToken) |
| IgnoredDelayDemoJob | ServerRoles = [Subscriber], Period = TimeSpan.Zero, IgnoredDelay = 20s; always ignored on a single-server dev setup, and the back-off prevents the tight loop |

Plus a BackgroundJobsTestController exposing endpoints to fire each TriggerExecution overload (None, Reset, Replace, custom delay) against TriggerableHeartbeatJob, and to fire ManualOnlyJob.

Verified:

  • All five jobs register correctly at boot. ManualOnlyJob logs "delay -00:00:00.0010000 every -00:00:00.0010000", which is Timeout.InfiniteTimeSpan rendered by the existing framework logging (cosmetic only).
  • IgnoredDelayDemoJob is correctly ignored on every iteration with the RecurringBackgroundJobIgnoredNotification published at a clean ~20-second cadence — the IgnoredDelay back-off is working.
  • HeartbeatJob and CancellationAwareJob fire on their declared schedules once the server role is resolved.
  • CancellationAwareJob cancels cleanly on app shutdown — OperationCanceledException propagates through RunJobAsync(ct), the RecurringBackgroundJobCanceledNotification is published, the loop exits.
  • TriggerableHeartbeatJob reacts to all four trigger overloads as documented:
    • TriggerExecution() (None) — immediate run, original schedule resumes
    • TriggerExecution(NextExecutionStrategy.Reset) — immediate run, next tick a full period later
    • TriggerExecution(NextExecutionStrategy.Replace) — immediate run, the originally-scheduled tick is skipped
    • TriggerExecution(TimeSpan) — immediate run, next tick after the custom delay
  • ManualOnlyJob only fires when its trigger endpoint is hit.
  • IRecurringBackgroundJobTrigger<TJob>.TriggerExecution(...) returns false before StartAsync and true afterwards, as expected from the runner lookup.
  • DistributedJobService scope exception described above no longer reproduces after the SuppressFlow fix.

@AndyButland AndyButland merged commit a547697 into v17/dev May 21, 2026
26 of 27 checks passed
@AndyButland AndyButland deleted the v17/feature/recurringbackgroundjob-signalling branch May 21, 2026 17:36
AndyButland added a commit that referenced this pull request May 21, 2026
…Slim` and add signalling support (#22331)

* Compute next delay to compensate for time drift

* Use SemaphoreSlim to properly handle exceptions, cancellation tokens and triggering immediate executions

* Add RecurringBackgroundJobBase to contain default values and hide obsoleted method

* Add NextExecutionStrategy parameter to adjust the schedule after triggered executions

* Add TriggerExecution methods to RecurringBackgroundJobHostedServiceRunner

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* Handle cancellation (application shutdown) and publish RecurringBackgroundJobCanceledNotification

* Match hosted services by Type instead of type name string

* Extract shared helper for TriggerExecution tests

* Clear trigger state when initial delay is interrupted

* Clear _nextExecutionSkipOnOvershoot unconditionally

* Combine ComputeNextDelay tests

* Consolidate trigger state into an immutable record for thread safety

* Use ConcurrentDictionary for thread-safe hosted service lookup

* Remove hosted services from dictionary on stop

* Fix API compatibility errors

* Removed unneeded using.

* Register RecurringBackgroundJobHostedServiceRunner as resolvable singleton

* Remove failed hosted service from dictionary when StartAsync throws

* Use semaphore signaling instead of Task.Delay in trigger tests

Use semaphore signaling instead of Task.Delay in trigger tests 2

* Inject TimeProvider into RecurringHostedServiceBase for deterministic testing

Fix timeprovider

* Use DelayCalculator.GetDelay instead of RecurringHostedServiceBase.GetDelay

* Fix Exception_In_PerformExecuteAsync_Does_Not_Kill_Loop test

* Avoid disposing period-change CTS while wait loop may still reference it

* Configure IEventMessagesFactory mock to return real EventMessages

* Clarify TriggerExecution(TimeSpan) docs and add ChangePeriod test

* Validate period is positive and use GetOrAdd to avoid creating unused hosted services

* Set up Period and Delay on mock job to satisfy constructor validation

* Ensure PeriodChanged event is unsubscribed again

* Fix trigger state race, simplify ReleaseSignal, and add canceled notification test

Fix trigger state

* Use Interlocked for _period reads/writes and implement thread-safe dispose pattern

* Remove hosted service from dictionary before stopping to prevent triggering during shutdown

* Replace Task.Yield with semaphore timeouts in negative assertions

* Tidy RecurringBackgroundJobBase docs and runner error handling

* Wait IgnoredDelay after ignored execution to prevent tight looping when Period is short or zero

* Add IRecurringBackgroundJobTrigger<TJob> for opt-in job triggering

* Register IRecurringBackgroundJobTrigger as open generic and drop AddTriggerableRecurringBackgroundJob

* Fix and add parameter validation

* Allow Timeout.InfiniteTimeSpan as Period for manual-trigger-only recurring jobs

* Migrate built-in jobs to RecurringBackgroundJobBase and require ITriggerableRecurringBackgroundJob in runner trigger overloads

* Support infinite Delay and honor TriggerExecution(TimeSpan) issued during the initial delay

* Handle edge case of backoff via InfiniteTimeSpan.

* Refactored large method.

* Added clarifying documentation.

* Suppress ExecutionContext flow when starting the recurring background loop, restoring previous timer behaviour.

* Relocate Suppress ExecutionContext flow to avoid package validation error.

* Align IRecurringBackgroundJobTrigger generic type constraint with AddRecurringBackgroundJob

* Rename ApplyTriggerState to ComputeNextDelayFromTriggerState

* Allow Timeout.InfiniteTimeSpan as IgnoredDelay to fully disable a job for the remaining application lifecycle

* Fix generic type constraint

---------

Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Co-authored-by: Andy Butland <abutland73@gmail.com>
AndyButland added a commit that referenced this pull request May 21, 2026
@AndyButland

Contributor

Cherry picked to release/17.5.0 and release/18.0.


Labels

release/17.5.0 release/18.0.0 status/needs-docs Requires new or updated documentation


3 participants