Published Content Cache: Defensive hardening against race conditions (closes #22254, #22384) #22393
Merged
Zeegaan merged 3 commits on Apr 22, 2026
Pull request overview
Hardens the published-content HybridCache and related navigation/publish-status caches against race conditions that can transiently make published content appear missing (especially during rebuild/refresh notifications in load-balanced setups).
Changes:
- Build-and-swap (via `Interlocked.Exchange`) in `PublishStatusService.InitializeAsync()` to avoid transient empty published-status state during re-initialization.
- Build-and-swap in `ContentNavigationServiceBase.HandleRebuildAsync()` to avoid transient empty/unsafe `_roots`/navigation structures during rebuild, plus adjustments to root-key queries.
- Avoid caching `null` into HybridCache when content exists but `HasPublishedAncestorPath()` fails (to prevent distributed cache poisoning), and add regression tests.
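
As a rough illustration of the build-and-swap pattern named in the first bullet, here is a minimal sketch. The class `PublishStatusCacheSketch`, its members, and the `loadFromDatabase` delegate are hypothetical stand-ins and not the actual `PublishStatusService` code; only `Interlocked.Exchange`, `Volatile.Read`, and `ConcurrentDictionary` are real .NET APIs.

```csharp
using System;
using System.Collections.Concurrent;
using System.Collections.Generic;
using System.Threading;

// Hypothetical stand-in for a published-status cache; not Umbraco's actual class.
public sealed class PublishStatusCacheSketch
{
    // Readers dereference this field once per call, so they see either the old
    // dictionary or the new one, never a cleared or half-built one.
    private ConcurrentDictionary<Guid, HashSet<string>> _publishedCultures = new();

    // loadFromDatabase is a hypothetical delegate standing in for the real query.
    public void Initialize(Func<IEnumerable<KeyValuePair<Guid, HashSet<string>>>> loadFromDatabase)
    {
        // Build the replacement off to the side while readers keep using the old one.
        var fresh = new ConcurrentDictionary<Guid, HashSet<string>>(loadFromDatabase());

        // Publish it with a single atomic reference swap.
        Interlocked.Exchange(ref _publishedCultures, fresh);
    }

    public bool IsPublished(Guid key, string culture)
    {
        // Take one reference to the current dictionary and use only that.
        var current = Volatile.Read(ref _publishedCultures);
        return current.TryGetValue(key, out var cultures) && cultures.Contains(culture);
    }
}
```

The point of the design is that there is no moment at which a reader can observe an empty dictionary, which a clear-then-rebuild sequence cannot guarantee.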
Reviewed changes
Copilot reviewed 6 out of 6 changed files in this pull request and generated 4 comments.
| File | Description |
|---|---|
| src/Umbraco.PublishedCache.HybridCache/Services/DocumentCacheService.cs | Skips HybridCache writes for null results caused by a failed ancestor check to avoid poisoning L1/L2. |
| src/Umbraco.Core/Services/PublishStatus/PublishStatusService.cs | Replaces clear-and-rebuild with atomic dictionary swap during initialization. |
| src/Umbraco.Core/Services/Navigation/ContentNavigationServiceBase.cs | Rebuilds navigation into new collections and swaps them in; updates root-key query path. |
| tests/Umbraco.Tests.Integration/Umbraco.PublishedCache.HybridCache/DocumentHybridCacheMockTests.cs | Adds test ensuring null isn’t cached when ancestor check is transiently wrong. |
| tests/Umbraco.Tests.Integration/Umbraco.Core/Services/PublishStatusServiceTests.ThreadSafety.cs | Adds stress test for published-status initialization under concurrent reads. |
| tests/Umbraco.Tests.Integration/Umbraco.Core/Services/DocumentNavigationServiceTests.ThreadSafety.cs | Adds stress test for navigation rebuild under concurrent root-key queries. |
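
To make the `DocumentCacheService.cs` row more concrete, the following sketch shows the general idea of caching a null only when the content truly does not exist. It is an assumption-laden illustration: `NodeLookupSketch`, `ContentNode`, the two delegates, and the in-memory dictionary (standing in for HybridCache) are all hypothetical and do not reflect the real service's signatures.

```csharp
#nullable enable
using System;
using System.Collections.Concurrent;

// Hypothetical content type, for illustration only.
public sealed record ContentNode(Guid Key, string Name);

// Hypothetical lookup; a ConcurrentDictionary stands in for the real cache.
public sealed class NodeLookupSketch
{
    private readonly ConcurrentDictionary<Guid, ContentNode?> _cache = new();
    private readonly Func<Guid, ContentNode?> _loadFromDatabase;
    private readonly Func<Guid, bool> _hasPublishedAncestorPath;

    public NodeLookupSketch(Func<Guid, ContentNode?> loadFromDatabase, Func<Guid, bool> hasPublishedAncestorPath)
    {
        _loadFromDatabase = loadFromDatabase;
        _hasPublishedAncestorPath = hasPublishedAncestorPath;
    }

    public ContentNode? GetNode(Guid key)
    {
        if (_cache.TryGetValue(key, out ContentNode? cached))
        {
            return cached;
        }

        ContentNode? node = _loadFromDatabase(key);
        if (node is null)
        {
            // The content does not exist, so caching null is safe.
            _cache[key] = null;
            return null;
        }

        if (!_hasPublishedAncestorPath(key))
        {
            // The check may be transiently wrong: return null but skip the
            // cache write so a later call can re-evaluate it.
            return null;
        }

        _cache[key] = node;
        return node;
    }

    public bool IsCached(Guid key) => _cache.ContainsKey(key);
}
```

With this shape, a transient false from the ancestor check costs one extra database lookup on the next request instead of leaving a null entry behind in the cache.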
Description
We have a couple of issues reporting published content being, or becoming, unavailable and needing to be restored by publishing or rebuilding caches: #22254, #22384. Neither has steps to replicate, but I've done some code analysis and found three areas where we could have problems in particular multithreading scenarios.
- `DocumentCacheService.GetNodeAsync()`: When content exists in the database but `HasPublishedAncestorPath()` transiently returns false, null was cached. Now we skip the cache write when the null is due to a failed ancestor check (but continue to cache null if the content truly does not exist).
- `PublishStatusService.InitializeAsync()`: Previously called `Clear()` then rebuilt from the database, leaving a window where concurrent readers saw an empty dictionary and concluded content was unpublished. Now builds a new `ConcurrentDictionary` and swaps it in with `Interlocked.Exchange`.
- `ContentNavigationServiceBase.HandleRebuildAsync()`: Previously cleared `_roots` (a non-thread-safe `HashSet`) then rebuilt, causing concurrent readers to see empty roots or throw. Now bundles `Structure` and `Roots` into a single `NavigationSnapshot` record so `HandleRebuildAsync` can replace both with one `Interlocked.Exchange` call. Readers snapshot the single reference to get a guaranteed-consistent pair.

For each of these I've created an integration test that demonstrates the problem and fails as expected, applied the fix, and verified that the test now passes. So I can be fairly confident that the race conditions identified are real and the fixes are correct. Of course it's not certain that these are the cause of the reported problems, but they do align with the symptoms.
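
The snapshot idea described for `HandleRebuildAsync` could look roughly like the sketch below. Only the names `NavigationSnapshot`, `Structure`, and `Roots` come from the description; the field types, the `NavigationSketch` class, and its methods are assumptions made for illustration, and the sketch ignores any mutation of the collections between rebuilds.

```csharp
using System;
using System.Collections.Concurrent;
using System.Collections.Generic;
using System.Linq;
using System.Threading;

// Hypothetical, simplified navigation service; not the actual Umbraco base class.
public sealed class NavigationSketch
{
    // Assumed shape: each node's parent key, plus the set of root keys.
    private sealed record NavigationSnapshot(
        ConcurrentDictionary<Guid, Guid?> Structure,
        HashSet<Guid> Roots);

    private NavigationSnapshot _snapshot =
        new(new ConcurrentDictionary<Guid, Guid?>(), new HashSet<Guid>());

    public void Rebuild(IEnumerable<(Guid Key, Guid? ParentKey)> nodes)
    {
        // Build both collections privately; no reader can see them yet.
        var structure = new ConcurrentDictionary<Guid, Guid?>();
        var roots = new HashSet<Guid>();
        foreach ((Guid key, Guid? parentKey) in nodes)
        {
            structure[key] = parentKey;
            if (parentKey is null)
            {
                roots.Add(key);
            }
        }

        // One atomic swap replaces the structure and the roots together.
        Interlocked.Exchange(ref _snapshot, new NavigationSnapshot(structure, roots));
    }

    public IReadOnlyList<Guid> GetRootKeys()
    {
        // Read the reference once so Structure and Roots come from the same rebuild.
        NavigationSnapshot snapshot = Volatile.Read(ref _snapshot);
        return snapshot.Roots.ToList();
    }
}
```

Because readers only ever hold a fully built snapshot, the plain `HashSet` is never read while it is being cleared or populated in this sketch.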
Testing
- `PublishStatusServiceTests.Concurrent_Initialize_Never_Transiently_Loses_Published_Status`: verifies no transient loss during re-initialization.
- `DocumentNavigationServiceTests.Concurrent_Rebuild_And_Queries_Never_Transiently_Lose_Content`: verifies root keys are never transiently empty during concurrent rebuilds.
- `DocumentHybridCacheMockTests.Null_Is_Not_Cached_When_Content_Exists_But_Ancestor_Check_Fails`: verifies null is not cached when ancestor check fails.
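
For readers unfamiliar with this style of test, the general shape of a concurrent stress check can be sketched as follows. This is not the code of the tests listed above; `ConcurrencyStressSketch` and its parameters are hypothetical, and the helper only counts reads that observed a broken invariant while a mutation runs in a loop.

```csharp
using System;
using System.Threading;
using System.Threading.Tasks;

// Hypothetical helper illustrating a reader/writer stress loop.
public static class ConcurrencyStressSketch
{
    // Calls mutate() `iterations` times while `readers` background tasks keep
    // calling invariantHolds(); returns how many reads saw the invariant broken.
    public static async Task<int> CountViolationsAsync(
        Action mutate, Func<bool> invariantHolds, int readers, int iterations)
    {
        int violations = 0;
        using var cts = new CancellationTokenSource();

        var readerTasks = new Task[readers];
        for (int i = 0; i < readers; i++)
        {
            readerTasks[i] = Task.Run(() =>
            {
                while (!cts.IsCancellationRequested)
                {
                    if (!invariantHolds())
                    {
                        Interlocked.Increment(ref violations);
                    }
                }
            });
        }

        for (int i = 0; i < iterations; i++)
        {
            mutate();
        }

        // Stop the readers and wait for them before reading the counter.
        cts.Cancel();
        await Task.WhenAll(readerTasks);
        return violations;
    }
}
```

A test built on this would pass a rebuild or re-initialize call as `mutate`, pass a check such as "the root keys are not empty" as `invariantHolds`, and assert that the returned count is zero. Such tests are probabilistic: a zero count does not prove the absence of a race, but a non-zero count demonstrates one.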