Skip to content

Published Content Cache: Defensive hardening against race conditions (closes #22254, #22384)#22393

Merged
Zeegaan merged 3 commits into
mainfrom
v17/bugfix/defensive-hardening-of-published-content-cache
Apr 22, 2026
Merged

Published Content Cache: Defensive hardening against race conditions (closes #22254, #22384)#22393
Zeegaan merged 3 commits into
mainfrom
v17/bugfix/defensive-hardening-of-published-content-cache

Conversation

@AndyButland

@AndyButland AndyButland commented Apr 9, 2026

Copy link
Copy Markdown
Contributor

Description

We have a couple of issues reporting concerns with published content being or becoming unavailable, and needing restore via publishing or rebuilding of caches: #22254, #22384. Neither have steps to replicate, but I've done some code analysis and found three areas where we could have problems in particular multithreading scenarios.

  • Skip caching null when ancestor check fails in DocumentCacheService.GetNodeAsync(): When content exists in the database but HasPublishedAncestorPath() transiently returns false, null was cached. Now we skip the cache write when the null is due to a failed ancestor check (but continue to cache null if the content truly does not exist).
  • Atomic swap in PublishStatusService.InitializeAsync(): Previously called Clear() then rebuilt from the database, leaving a window where concurrent readers saw an empty dictionary and concluded content was unpublished. Now builds a new ConcurrentDictionary and swaps it in with Interlocked.Exchange.
  • Atomic swap in ContentNavigationServiceBase.HandleRebuildAsync(): Previously cleared _roots (a non-thread-safe HashSet) then rebuilt, causing concurrent readers to see empty roots or throw. Now bundles Structure and Roots into a single NavigationSnapshot record so HandleRebuildAsync can replace both with one Interlocked.Exchange call. Readers snapshot the single reference to get a guaranteed-consistent pair.

For each of these I've created an integration test to demonstrate the problem and fail expectedly, applied the fix and verified the tests now pass. So I can be fairly confident that the race conditions identified are real and the fixes are correct. It's not 100% the case of course that these are the cause of the reported problems, but they do look to align with the symptoms.

Testing

  • PublishStatusServiceTests.Concurrent_Initialize_Never_Transiently_Loses_Published_Status — verifies no transient loss during re-initialization.
  • DocumentNavigationServiceTests.Concurrent_Rebuild_And_Queries_Never_Transiently_Lose_Content — verifies root keys are never transiently empty during concurrent rebuilds.
  • DocumentHybridCacheMockTests.Null_Is_Not_Cached_When_Content_Exists_But_Ancestor_Check_Fails — verifies null is not cached when ancestor check fails.

Copilot AI review requested due to automatic review settings April 9, 2026 10:18
@AndyButland AndyButland changed the title HybridCache: Defensive hardening against published content cache race conditions (closes #22254, #22322, #22384) Published Content Cache: Defensive hardening against published content cache race conditions (closes #22254, #22322, #22384) Apr 9, 2026
@AndyButland AndyButland changed the title Published Content Cache: Defensive hardening against published content cache race conditions (closes #22254, #22322, #22384) Published Content Cache: Defensive hardening against published content cache race conditions (closes #22254, #22384) Apr 9, 2026

Copilot AI left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Hardens the published-content HybridCache and related navigation/publish-status caches against race conditions that can transiently make published content appear missing (especially during rebuild/refresh notifications in load-balanced setups).

Changes:

  • Build-and-swap (via Interlocked.Exchange) in PublishStatusService.InitializeAsync() to avoid transient empty published-status state during re-initialization.
  • Build-and-swap in ContentNavigationServiceBase.HandleRebuildAsync() to avoid transient empty/unsafe _roots/navigation structures during rebuild, plus adjustments to root-key queries.
  • Avoid caching null into HybridCache when content exists but HasPublishedAncestorPath() fails (to prevent distributed cache poisoning), and add regression tests.

Reviewed changes

Copilot reviewed 6 out of 6 changed files in this pull request and generated 4 comments.

Show a summary per file
File Description
src/Umbraco.PublishedCache.HybridCache/Services/DocumentCacheService.cs Skips HybridCache writes for null results caused by a failed ancestor check to avoid poisoning L1/L2.
src/Umbraco.Core/Services/PublishStatus/PublishStatusService.cs Replaces clear-and-rebuild with atomic dictionary swap during initialization.
src/Umbraco.Core/Services/Navigation/ContentNavigationServiceBase.cs Rebuilds navigation into new collections and swaps them in; updates root-key query path.
tests/Umbraco.Tests.Integration/Umbraco.PublishedCache.HybridCache/DocumentHybridCacheMockTests.cs Adds test ensuring null isn’t cached when ancestor check is transiently wrong.
tests/Umbraco.Tests.Integration/Umbraco.Core/Services/PublishStatusServiceTests.ThreadSafety.cs Adds stress test for published-status initialization under concurrent reads.
tests/Umbraco.Tests.Integration/Umbraco.Core/Services/DocumentNavigationServiceTests.ThreadSafety.cs Adds stress test for navigation rebuild under concurrent root-key queries.

Comment thread src/Umbraco.Core/Services/Navigation/ContentNavigationServiceBase.cs Outdated
Comment thread src/Umbraco.PublishedCache.HybridCache/Services/DocumentCacheService.cs Outdated
@AndyButland AndyButland changed the title Published Content Cache: Defensive hardening against published content cache race conditions (closes #22254, #22384) Published Content Cache: Defensive hardening against race conditions (closes #22254, #22384) Apr 9, 2026

@Zeegaan Zeegaan left a comment

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good to me 😁

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants