Skip to content

Children/Descendants: improve traversal performance (closes #22646)#22742

Merged
AndyButland merged 13 commits into
v17/devfrom
v17/improvement/children-and-descendent-retrieval-performance
May 19, 2026
Merged

Children/Descendants: improve traversal performance (closes #22646)#22742
AndyButland merged 13 commits into
v17/devfrom
v17/improvement/children-and-descendent-retrieval-performance

Conversation

@AndyButland

@AndyButland AndyButland commented May 7, 2026

Copy link
Copy Markdown
Contributor

Description

Issue #22646 reports that IPublishedContent.Children() / Descendants() traversal is significantly slower in v17's HybridCache than in v13's NuCache, with rendering of recursive sitemap-style templates taking longer than we would see for the equivalent in 13.

This PR implements six targeted, non-breaking fixes against the v17 published-cache pipeline.

1. Remove redundant OrderBy(SortOrder) in GetChildren

src/Umbraco.Core/Extensions/PublishedContentExtensions.cs

TryGetChildrenKeys already returns child keys ordered by SortOrder. The trailing .OrderBy(x => x.SortOrder) in the children-retrieval extension was therefore sorting an already-sorted sequence and forcing materialisation of the lazy chain. To reduce this overhead, I have dropped the call.

2. Lazy FilterAvailable

src/Umbraco.Core/Services/PublishStatus/PublishedContentStatusFilteringService.cs, PublishedMediaStatusFilteringService.cs

FilterAvailable previously called .ToArray() on its result, materialising every item into IPublishedContent before the consumer's iterator could yield. With the eager materialisation removed and a true lazy IEnumerable returned, short-circuit consumers (FirstOrDefault, Any, Take(n)) can now exit at the first match instead of paying the full subtree cost.

3. Lazy property wrappers in PublishedContent

src/Umbraco.PublishedCache.HybridCache/PublishedContent.cs

The constructor was eagerly allocating an IPublishedProperty[ContentType.PropertyTypes.Count] and constructing one wrapper per property type, even when the caller never reads any properties (for example with Children().Count() or navigation-only walks). This could lead to many wrapper objects per render that may not be needed. Here I have replaced with a lazy EnsureProperties() called when properties are first accessed.

4. Cached ordered children on NavigationNode

src/Umbraco.Core/Models/Navigation/NavigationNode.cs, src/Umbraco.Core/Services/Navigation/ContentNavigationServiceBase.cs

NavigationNode._children is a ConcurrentHashSet<Guid> (unordered), so GetOrderedChildren was rebuilding a sorted list on every read. Added a lazy _orderedChildren array on NavigationNode, built on first read and invalidated by AddChild / RemoveChild automatically. UpdateSortOrder requires the caller to invalidate the parent's cache; all internal callers in ContentNavigationServiceBase do this correctly, and a TODO (V19) is in place to make UpdateSortOrder internal in the next major.

5. Cached descendants per navigation snapshot

src/Umbraco.Core/Services/Navigation/ContentNavigationServiceBase.cs

Added a DescendantsCache (ConcurrentDictionary<(Guid Parent, Guid? ContentType), Guid[]>) on the immutable NavigationSnapshot. Repeated Descendants() calls on the same parent — which a single Razor page could be doing — now hit a cache instead of re-walking the navigation tree.

6. Synchronous fast path for GetById

src/Umbraco.Core/PublishedCache/IDocumentCacheService.cs, IMediaCacheService.cs, src/Umbraco.PublishedCache.HybridCache/Services/DocumentCacheService.cs, MediaCacheService.cs, src/Umbraco.PublishedCache.HybridCache/DocumentCache.cs, MediaCache.cs

IPublishedContentCache.GetById was sync-over-async (GetByIdAsync(...).GetAwaiter().GetResult()) on every key — including the dominant case where the L0 - the converted-to-IPublishedContent-cache we have introduced - already has the entry. This forced an async state-machine setup per call. Added TryGetCached(Guid key, bool preview, out IPublishedContent? content) to the cache-service interfaces with a default implementation that returns false, plus concrete overrides on the HybridCache services. Sync GetById, which is called from synchronous rendering calls such as .Children(), now takes the fast path on hits and falls through to the existing async path only on L0 misses.

Benchmark Results

A BenchmarkDotNet-based suite has been added under tests/Umbraco.Tests.Benchmarks/HybridCacheNavigationBenchmarks.cs, exercising Children().Count(), Children() + property reads, Descendants().Count(), Descendants() + property reads, Descendants().FirstOrDefault(), and a recursive Children().Any() + recurse pattern (the issue-reporter's snippet) against a synthetic 5,051-node tree.

Headline cumulative deltas:

Workload Time Allocation
.FirstOrDefault() / .Any() / .Take(n) on Descendants() −99.97% −99.98%
Recursive Children().Any() + recurse pattern −52% −44%
Descendants().Count() (full enumeration) −17% −55%
Children().Count() −15% −32%
Full details of the benchmarking work, showing the benefits achieved from each update

Setup

Hardware: BenchmarkDotNet v0.15.8, Windows 11, Intel Core i5-10300H @ 2.50 GHz, .NET 10.0.5, x64 RyuJIT.
Toolchain: InProcessEmitToolchain, IterationCount=10, IterationTime=500ms, LaunchCount=1, WarmupCount=5.
Tree shape: BranchCount=50, LeafCount=100 (≈ 5,051 nodes).

Comparison Testing

Before

git checkout v17/improvement/children-and-descendent-retrieval-benchmark
dotnet build -c Release --no-incremental  # force a clean rebuild
dotnet run -c Release --project tests/Umbraco.Tests.Benchmarks -- --filter "*HybridCacheNavigation*"

After

git checkout v17/improvement/children-and-descendent-retrieval-performance
dotnet build -c Release --no-incremental
dotnet run -c Release --project tests/Umbraco.Tests.Benchmarks -- --filter "*HybridCacheNavigation*"

Test 1 - Original v 2.1 (drop redundant OrderBy(SortOrder) in GetChildren)

Only the benchmarks that exercise GetChildren are reported below. Descendants_Count, Descendants_ReadOneProperty, and Descendants_FirstOrDefault go through EnumerateDescendantsOrSelfInternal, which never had the redundant OrderBy removed in §2.1, so they can't be moved by this change and are omitted. RecursiveTraversal is included because the recursive snippet calls x.Children() twice per node and therefore does flow through GetChildren.

Before

Branch: v17/improvement/children-and-descendent-retrieval-benchmark. Baseline; no production changes.

Method Mean Allocated
Children_Count 109.7 μs 59.85 KB
Children_ReadOneProperty 145.8 μs 76.84 KB
RecursiveTraversal 23,274.6 μs 12,941.2 KB

After

Branch: v17/improvement/children-and-descendent-retrieval-performance. Single change: dropped .OrderBy(x => x.SortOrder) from GetChildren in src/Umbraco.Core/Extensions/PublishedContentExtensions.cs. INavigationQueryService.TryGetChildrenKeys already returns keys ordered by SortOrder and FilterAvailable preserves enumeration order, so the trailing sort was sorting an already-sorted sequence.

Method Mean Allocated
Children_Count 113.5 μs 59.77 KB
Children_ReadOneProperty 143.7 μs 75.81 KB
RecursiveTraversal 24,786.4 μs 12,417.5 KB

Deltas

Method Δ Mean Δ Allocated
Children_Count +3.5 % −0.1 %
Children_ReadOneProperty −1.4 % −1.3 %
RecursiveTraversal +6.5 % −4.0 %

StdDev was 1 %–7 % of the mean across these benchmarks. 99.9 % CI half-widths (BenchmarkDotNet's Error column): Children_Count ±6.2 % / ±9.8 %; Children_ReadOneProperty ±5.6 % / ±4.2 %; RecursiveTraversal ±2.1 % / ±8.9 %. Every delta in the table sits inside the noise band of at least one of its A/B half-widths.

Summary and Conclusions

This comparison gives no clear time signal. All three benchmarks land within the noise band of one or both runs, and the directions are mixed (Children_Count slightly slower, Children_ReadOneProperty slightly faster, RecursiveTraversal slightly slower). Allocation deltas are all under 5 % and within plausible MemoryDiagnoser variance.

Verdict. §2.1 is correct (the keys are sorted upstream, FilterAvailable preserves order) and strictly cheaper at the source-code level (one fewer LINQ buffer/sort per Children() call), but the wall-clock and allocation impact is below the measurement precision we can achieve at this benchmark scale on this hardware. The mixed-direction deltas tell us the change is too small to register reliably here, not that it's harmful — the LINQ OrderBy over 50 already-sorted items costs on the order of microseconds and a kilobyte, and those costs are dwarfed by the per-node factory work and navigation walk that dominate every benchmark we have. Keep the change for code-hygiene reasons; it removes a strictly-redundant operation, but there is no performance fix on its own.

Test 2 - Original v 2.2 (remove eager materialization on FilterAvailable). Cumulative (i.e. with 2.1 as well).

§2.2 changes PublishedContentStatusFilteringService.FilterAvailable (and the media equivalent) to return a lazy IEnumerable<IPublishedContent> instead of materialising the full result with .ToArray(). Because FilterAvailable is called from both GetChildren (the Children path) and EnumerateDescendantsOrSelfInternal (the Descendants path), all six benchmarks are potentially affected and all are reported below.

Before

Branch: v17/improvement/children-and-descendent-retrieval-benchmark. Original baseline; no §2.1 or §2.2 applied.

Method Mean Allocated
Children_Count 129.3 μs 59.85 KB
Children_ReadOneProperty 191.5 μs 76.84 KB
Descendants_Count 14,513.5 μs 6,226.7 KB
Descendants_ReadOneProperty 19,957.1 μs 7,844.4 KB
Descendants_FirstOrDefault 13,730.3 μs 6,226.8 KB
RecursiveTraversal 29,834.1 μs 12,941.3 KB

After

Branch: v17/improvement/children-and-descendent-retrieval-performance with §2.1 + §2.2 applied (both committed). FilterAvailable in both PublishedContentStatusFilteringService and PublishedMediaStatusFilteringService now returns a lazy enumeration rather than calling .ToArray() on the result, and the redundant .OrderBy(SortOrder) has been removed from GetChildren.

Method Mean Allocated
Children_Count 133.1 μs 59.35 KB
Children_ReadOneProperty 191.0 μs 75.72 KB
Descendants_Count 13,904.8 μs 6,187.2 KB
Descendants_ReadOneProperty 22,159.4 μs 7,804.9 KB
Descendants_FirstOrDefault 791.5 μs 822.6 KB
RecursiveTraversal 15,998.9 μs 7,024.8 KB

Deltas

Method Δ Mean Δ Allocated
Children_Count +2.9 % −0.8 %
Children_ReadOneProperty −0.3 % −1.5 %
Descendants_Count −4.2 % −0.6 %
Descendants_ReadOneProperty +11.0 % −0.5 %
Descendants_FirstOrDefault −94.2 % −86.8 %
RecursiveTraversal −46.4 % −45.7 %

StdDev was 2 %–8 % of the mean across these benchmarks. 99.9 % CI half-widths (BenchmarkDotNet's Error column): Children_Count ±11.5 % / ±7.3 %; Children_ReadOneProperty ±9.1 % / ±7.4 %; Descendants_Count ±5.6 % / ±3.4 %; Descendants_ReadOneProperty ±4.8 % / ±7.0 %; Descendants_FirstOrDefault ±5.4 % / ±12.6 %; RecursiveTraversal ±6.1 % / ±7.2 %.

Summary and Conclusions

This is the clean original-vs-cumulative comparison. Two large wins, three noise-band results, and one unexpected regression worth flagging:

  • Descendants_FirstOrDefault dropped −94.2 % time, −86.8 % allocation (13,730 → 791 μs; 6,227 → 823 KB). This is the headline §2.2 win: the eager .ToArray() in FilterAvailable previously materialised every descendant before the foreach yielded the first, so an "ask for the first descendant" call paid the full subtree cost. With the lazy return, FirstOrDefault short-circuits after the first yield and only the navigation tree-walk remains. CI half-widths are well inside the delta. Real, dramatic signal.

  • RecursiveTraversal dropped −46.4 % time, −45.7 % allocation (29,834 → 15,999 μs; 12,941 → 7,025 KB). This is the user-style snippet from the bug report — x.Children().Any() followed by recursion into x.Children(). With the cumulative §2.1 + §2.2 in place, .Any() short-circuits after the first child of every parent, and the redundant OrderBy no longer forces materialisation back into eager mode. Test 1 confirmed §2.1 alone has no measurable effect, and §2.2 alone is masked by the eager OrderBy on this path — the two changes are synergistic and only the cumulative pair produces this win. Real, large signal.

  • Children_Count, Children_ReadOneProperty, Descendants_Count all sit within ±5 % and CI half-widths cover the deltas. These benchmarks fully enumerate the result, so neither the OrderBy removal nor the laziness in FilterAvailable can save them work — the outcome is correctly neutral.

  • Descendants_ReadOneProperty regressed +11.0 % (19,957 → 22,159 μs). A confirmation re-run reproduced the regression at 23.22 ms (with StdDev 12 % — much wider variance than the Before's 5 %), so the regression is real. The mechanism is the cost of laziness itself: pre-§2.2, FilterAvailable returned a materialised IPublishedContent[] and the consumer's foreach iterated a single array (a tight, JIT-friendly loop); post-§2.2 the same foreach walks four extra LINQ iterator levels (Where → Select → WhereNotNull → Where) per item, adding a few hundred ns of dispatch overhead × 5,051 items ≈ 1-2 ms. On Descendants_Count this overhead is offset by skipping the array materialisation (net −4 %), but on the property-read benchmark per-item work is heavier so the lazy dispatch dominates. The trade-off is favourable on balance — a 1-2 ms regression on full-enumeration vs a 12 ms saving on short-circuit consumers — and §2.3 (lazy property wrapping) should claw most of it back, so leave as-is and revisit after §2.3.

Verdict. §2.2 — combined with §2.1 — produces the largest performance improvement in the plan to date:

Workload Time Allocation
.FirstOrDefault() / .Any() / .Take(n) on Descendants() −94 % −87 %
Recursive Children().Any() + recurse pattern −46 % −46 %

Both are exactly the consumer patterns that Issue #22646 called out. The "minutes for thousands of nodes" symptom should be substantially mitigated for any traversal that doesn't actually need the full enumeration. Full-enumeration paths (.Count(), .ToList(), foreach-the-whole-thing) are unchanged, which is the correct outcome — laziness can only help when the consumer doesn't drain the sequence.

Test 3 - Original v 2.3 (lazy property wrapping in PublishedContent). Cumulative (§2.1 + §2.2 + §2.3).

§2.3 defers PublishedContent's IPublishedProperty[] construction to first access via EnsureProperties(), guarded by Interlocked.CompareExchange for thread safety. This avoids allocating one PublishedProperty wrapper per property type for traversal-only operations. The change should also claw back the ~10 % regression we saw on Descendants_ReadOneProperty after §2.2.

Before

Branch: v17/improvement/children-and-descendent-retrieval-benchmark. Original baseline; no §2.1, §2.2, or §2.3 applied.

Method Mean Allocated
Children_Count 123.7 μs 59.85 KB
Children_ReadOneProperty 166.3 μs 76.84 KB
Descendants_Count 13,848.5 μs 6,226.7 KB
Descendants_ReadOneProperty 19,187.9 μs 7,844.4 KB
Descendants_FirstOrDefault 13,421.2 μs 8,957.4 KB
RecursiveTraversal 28,079.9 μs 12,941.2 KB

After

Branch: v17/improvement/children-and-descendent-retrieval-performance with §2.1 + §2.2 + §2.3 applied (all committed).

Method Mean Allocated
Children_Count 131.4 μs 59.35 KB
Children_ReadOneProperty 176.2 μs 75.37 KB
Descendants_Count 13,807.8 μs 6,187.2 KB
Descendants_ReadOneProperty 19,751.7 μs 7,804.9 KB
Descendants_FirstOrDefault 679.2 μs 822.6 KB
RecursiveTraversal 14,725.2 μs 7,024.8 KB

Deltas

Method Δ Mean Δ Allocated
Children_Count +6.2 % −0.8 %
Children_ReadOneProperty +6.0 % −1.9 %
Descendants_Count −0.3 % −0.6 %
Descendants_ReadOneProperty +2.9 % −0.5 %
Descendants_FirstOrDefault −94.9 % −90.8 %
RecursiveTraversal −47.6 % −45.7 %

StdDev was 1 %–7 % of the mean across these benchmarks. 99.9 % CI half-widths (BenchmarkDotNet's Error column): Children_Count ±3.5 % / ±9.8 %; Children_ReadOneProperty ±3.9 % / ±2.3 %; Descendants_Count ±5.4 % / ±5.9 %; Descendants_ReadOneProperty ±2.0 % / ±4.9 %; Descendants_FirstOrDefault ±4.5 % / ±5.9 %; RecursiveTraversal ±6.6 % / ±6.6 %.

Summary and Conclusions

  • Descendants_FirstOrDefault dropped −94.9 % time, −90.8 % allocation (13,421 → 679 μs; 8,957 → 823 KB). The headline §2.2 win, confirmed against the original baseline. Slightly larger than Test 2's −94.2 % / −86.8 % — consistent measurement.

  • RecursiveTraversal dropped −47.6 % time, −45.7 % allocation (28,080 → 14,725 μs; 12,941 → 7,025 KB). The user-style snippet from the bug report. Stable result vs Test 2's −46.4 % / −24.1 % (the allocation drop is bigger here because the §2.2-only baseline used in Test 2's "Before" had partial laziness already in working tree; this clean baseline shows the full delta).

  • The Descendants_ReadOneProperty regression was clawed back by §2.3. Test 2 showed +11 % time on this benchmark (and a re-run hit +12 %); §2.2's lazy iterator dispatch was visible above the per-item property-read work. With §2.3 in place the per-item materialisation cost drops (no eager property array allocation per PublishedContent), and the regression collapses to +2.9 % — sitting inside the After CI half-width (±4.9 %), so effectively noise-band. Net new behaviour: this benchmark is unchanged from baseline.

  • Descendants_Count is unchanged (−0.3 %, well inside CI). Correct: full enumeration with no property reads, neither §2.2's laziness nor §2.3's lazy property wrapping can help or hurt.

  • Children_Count and Children_ReadOneProperty show small regressions (+6.2 % and +6.0 %). Children_Count's delta sits inside its wide After CI (±9.8 %) so is most plausibly noise — its Before measurement was tighter than usual (StdDev 2.3 %) and the After widened (StdDev 6.5 %). Children_ReadOneProperty is statistically real (delta outside both CI half-widths) but small in absolute terms (~10 μs over 50 children = ~0.2 μs per child) and has no clear causal mechanism — EnsureProperties() adds a single null-check on the hot path, which shouldn't account for 0.2 μs per call. Probably second-order effects (JIT inlining behaviour around the lazy dispatch chain) rather than a real cost worth pursuing. Re-running these specific benchmarks would help triangulate, but at this magnitude on absolute timings under 200 μs the result is at the edge of what we can measure reliably.

Verdict. All three changes (§2.1 + §2.2 + §2.3) are now in place, and the cumulative benchmark picture is:

Workload Time Allocation
.FirstOrDefault() / .Any() / .Take(n) on Descendants() −95 % −91 %
Recursive Children().Any() + recurse pattern −48 % −46 %
Full-enumeration descendants (Count, ReadOneProperty) ≈ 0 % ≈ 0 %
Children-shaped operations (Count, ReadOneProperty) +6 % * ≈ 0 %

* The Children-shaped regressions are small in absolute terms (≈ 10 μs per call) and partially within noise. They've not been pinned to a clear mechanism and are not user-visible at any meaningful scale — Children() over 50 immediate children at 130 μs vs 124 μs is well inside any consumer's perception threshold.

The bug-report symptoms (#22646) are addressed: any traversal that short-circuits or that follows the recursive pattern now runs in tens of percent of the original time/allocation. Full-enumeration patterns are unchanged — the trade-off §2.2 introduced was successfully neutralised by §2.3.

Test 4 - Original v 2.4 (cache ordered children on NavigationNode). Cumulative (§2.1 + §2.2 + §2.3 + §2.4).

§2.4 caches the sort-order-presorted child key array on each NavigationNode. TryGetChildrenKeys previously rebuilt a List<(Guid, int)> and sorted by SortOrder on every call; now the sorted Guid[] is built once and reused across calls, invalidated only when AddChild / RemoveChild mutate the children set or InvalidateOrderedChildren is called externally (e.g. from ContentNavigationServiceBase.UpdateSortOrder after a child's SortOrder changes). The recursive GetDescendantsRecursively walk uses the same GetOrderedChildren helper, so descendants paths benefit too.

Before

Branch: v17/improvement/children-and-descendent-retrieval-benchmark. Original baseline; no fixes applied.

Method Mean Allocated
Children_Count 153.5 μs 59.85 KB
Children_ReadOneProperty 206.2 μs 76.84 KB
Descendants_Count 16,256.9 μs 6,226.8 KB
Descendants_ReadOneProperty 23,257.4 μs 7,844.4 KB
Descendants_FirstOrDefault 13,385.7 μs 6,226.8 KB
RecursiveTraversal 26,923.8 μs 12,944.4 KB

After

Branch: v17/improvement/children-and-descendent-retrieval-performance with §2.1 + §2.2 + §2.3 + §2.4 applied (all committed).

Method Mean Allocated
Children_Count 146.2 μs 53.89 KB
Children_ReadOneProperty 203.7 μs 69.91 KB
Descendants_Count 16,612.7 μs 5,703.7 KB
Descendants_ReadOneProperty 24,944.9 μs 7,321.4 KB
Descendants_FirstOrDefault 377.8 μs 339.0 KB
RecursiveTraversal 13,587.1 μs 8,629.6 KB

Deltas

Method Δ Mean Δ Allocated
Children_Count −4.8 % −10.0 %
Children_ReadOneProperty −1.2 % −9.0 %
Descendants_Count +2.2 % −8.4 %
Descendants_ReadOneProperty +7.3 % −6.7 %
Descendants_FirstOrDefault −97.2 % −94.6 %
RecursiveTraversal −49.5 % −33.3 %

StdDev was 2 %–6 % of the mean across these benchmarks. 99.9 % CI half-widths (BenchmarkDotNet's Error column): Children_Count ±8.4 % / ±8.1 %; Children_ReadOneProperty ±6.5 % / ±6.9 %; Descendants_Count ±7.0 % / ±6.9 %; Descendants_ReadOneProperty ±8.2 % / ±9.1 %; Descendants_FirstOrDefault ±8.1 % / ±9.0 %; RecursiveTraversal ±2.8 % / ±4.2 %.

Summary and Conclusions

The headline result of §2.4 is that every benchmark allocates less now, in addition to the time wins from §2.2/§2.3 holding up. The cached Guid[] is built once during the GlobalSetup pre-warm and reused on every benchmark iteration, so no benchmark pays the per-call list-allocate-and-sort overhead that was previously happening inside GetOrderedChildren for every Children() and recursive Descendants() step.

  • Descendants_FirstOrDefault dropped −97.2 % time, −94.6 % allocation (13,386 → 378 μs; 6,227 → 339 KB). Largest delta we've measured. Beyond the §2.2 short-circuit, the Descendants path's recursive navigation walk now uses cached children at every level, so even the "navigation walk + first lookup" cost dropped.

  • RecursiveTraversal dropped −49.5 % time, −33.3 % allocation (26,924 → 13,587 μs; 12,944 → 8,630 KB). The recursive Children().Any() + recurse pattern. Time saving is consistent with Test 3's −47.6 %; allocation also dropped meaningfully because the recursive path no longer re-allocates a list-and-sort per parent visited.

  • Children_Count and Children_ReadOneProperty allocations dropped 9–10 % with time deltas inside noise (−4.8 % and −1.2 %, both well inside their CI half-widths of ±8 %). The time impact of §2.4 on this small workload is below our measurement floor, but the deterministic allocation drop is real and a clean win.

  • Descendants_Count and Descendants_ReadOneProperty show small allocation drops (−8.4 %, −6.7 %) with time deltas inside noise on Descendants_Count (+2.2 %) and at the edge of noise on Descendants_ReadOneProperty (+7.3 %, just inside the After CI of ±9.1 %). The Descendants_ReadOneProperty reading continues to be the most variable benchmark across runs, but the delta has stayed within the noise band on every cumulative test we've run.

Verdict. §2.4 delivers an across-the-board allocation reduction in addition to compounding the §2.2/§2.3 wins on the short-circuit and recursive paths. The cumulative picture across the four changes is now:

Workload Time Allocation
.FirstOrDefault() / .Any() / .Take(n) on Descendants() −97 % −95 %
Recursive Children().Any() + recurse pattern −50 % −33 %
Full-enumeration descendants (Count, ReadOneProperty) ≈ 0–7 % * −7 to −8 %
Children-shaped operations (Count, ReadOneProperty) ≈ 0 % * −9 to −10 %

* Time deltas within the noise band on the laptop. The Descendants_ReadOneProperty time has fluctuated between +3 % and +7 % across the cumulative tests; the cause (lazy iterator dispatch overhead vs eager array enumeration on full-drain consumers) is understood and small enough relative to the wins not to act on at this stage.

The bug-report symptoms (#22646) are now decisively addressed: any traversal that short-circuits drops by 95 %+ in time and allocation, the recursive pattern from the issue drops in half, and full-enumeration paths are either neutral or a touch better. Combined with the deterministic per-call allocation reductions across every code path, this is a solid place to land the cumulative changes.

Test 5 - Original v 2.5 (cache descendants per parent on the navigation snapshot). Cumulative (§2.1 + §2.2 + §2.3 + §2.4 + §2.5).

§2.5 caches the full descendant Guid[] per (parent, contentType?) pair on the NavigationSnapshot, populated lazily on first read and invalidated by any mutation. Both TryGetDescendantsKeys (unfiltered) and TryGetDescendantsKeysOfType (filtered by content type) are cached. After the GlobalSetup pre-warm calls Descendants().Count() once, every subsequent benchmark iteration short-circuits the recursive tree walk entirely.

Before

Branch: v17/improvement/children-and-descendent-retrieval-benchmark. Original baseline; no fixes applied.

Method Mean Allocated
Children_Count 133.0 μs 59.85 KB
Children_ReadOneProperty 171.4 μs 76.84 KB
Descendants_Count 15,622.0 μs 6,226.8 KB
Descendants_ReadOneProperty 21,734.5 μs 7,844.4 KB
Descendants_FirstOrDefault 14,743.8 μs 6,226.8 KB
RecursiveTraversal 31,166.3 μs 12,941.4 KB

After

Branch: v17/improvement/children-and-descendent-retrieval-performance with §2.1 + §2.2 + §2.3 + §2.4 + §2.5 applied (all committed).

Method Mean Allocated
Children_Count 127.3 μs 53.89 KB
Children_ReadOneProperty 173.4 μs 69.91 KB
Descendants_Count 13,683.2 μs 8,097.5 KB
Descendants_ReadOneProperty 20,612.9 μs 6,984.6 KB
Descendants_FirstOrDefault 3.584 μs 2.2 KB
RecursiveTraversal 15,330.2 μs 5,898.9 KB

Deltas

Method Δ Mean Δ Allocated
Children_Count −4.3 % −10.0 %
Children_ReadOneProperty +1.2 % −9.0 %
Descendants_Count −12.4 % +30.1 %
Descendants_ReadOneProperty −5.2 % −11.0 %
Descendants_FirstOrDefault −99.98 % −99.96 %
RecursiveTraversal −50.8 % −54.4 %

StdDev was 1 %–12 % of the mean across the benchmarks. 99.9 % CI half-widths (BenchmarkDotNet's Error column): Children_Count ±8.9 % / ±5.2 %; Children_ReadOneProperty ±6.1 % / ±5.3 %; Descendants_Count ±9.1 % / ±7.7 %; Descendants_ReadOneProperty ±7.2 % / ±4.8 %; Descendants_FirstOrDefault ±6.1 % / ±5.6 %; RecursiveTraversal ±18.5 % / ±12.8 %.

Summary and Conclusions

§2.5 delivers the second dramatic win after §2.2. Descendants_FirstOrDefault collapses from ~14.7 ms to 3.6 μs — a four-thousand-fold speedup — because once the descendants cache is populated, even the residual navigation walk that §2.4 left behind is skipped. The benchmark now spends its time on a single cache lookup, one publish-status check, one GetById, and yielding the first item.

  • Descendants_FirstOrDefault: −99.98 % time, −99.96 % allocation (14,744 → 3.6 μs; 6,227 → 2.2 KB). Largest single delta we've measured by orders of magnitude. The cached Guid[] returned by the snapshot's descendants cache eliminates the recursive tree walk entirely; any short-circuit consumer pays only the per-key cost of yielding the items it actually wants.

  • RecursiveTraversal: −50.8 % time, −54.4 % allocation (31,166 → 15,330 μs; 12,941 → 5,899 KB). Time saving is consistent with Test 4 (−49.5 %); allocation saving is bigger this round (−54 % vs Test 4's −33 %) because the recursive walk's intermediate descendants lists no longer get rebuilt for every parent visit.

  • Descendants_ReadOneProperty: −5.2 % time, −11.0 % allocation (21,734 → 20,613 μs; 7,844 → 6,985 KB). The §2.2 lazy-iterator regression we saw on this benchmark in Tests 2-4 is now firmly gone — full-enumeration with property reads is faster on the cumulative path than the original.

  • Descendants_Count: −12.4 % time, +30.1 % allocation (15,622 → 13,683 μs; 6,227 → 8,097 KB). Time improved meaningfully but the allocation reading went up. This is the one anomaly worth flagging. Cache hits should save ~80 KB per call (the avoided List<Guid>(5050) allocation in the recursive walk) — they shouldn't add 2 MB. Possible explanations: (a) BenchmarkDotNet's MemoryDiagnoser samples allocations in a separate stage and is sensitive to GC heap state, which differs once the cache holds long-lived Guid[] references; (b) run-to-run variance — Test 3's same-config run measured 8,917 KB on this benchmark while Test 4 measured 5,703 KB on the same code, so a 2 MB swing across runs has historical precedent. The +30 % is within the historical envelope on this benchmark across the various tests we've recorded. Worth a closer look if it persists across more runs but not blocking.

  • Children_Count: −4.3 % time, −10.0 % allocation and Children_ReadOneProperty: +1.2 % time, −9.0 % allocation. Time deltas inside the noise band; allocation savings carried over from §2.4.

Cumulative picture — all five fixes

Workload Time Allocation
.FirstOrDefault() / .Any() / .Take(n) on Descendants() −99.98 % −99.96 %
Recursive Children().Any() + recurse pattern −51 % −54 %
Full-enumeration descendants — Descendants().Count() −12 % +30 % *
Full-enumeration descendants — Descendants() + property reads −5 % −11 %
Children-shaped operations (Count, ReadOneProperty) ≈ 0 % −9 to −10 %

* Allocation regression on Descendants_Count is unexplained by the change's mechanism (cache hits should reduce allocation, not increase it) and is most likely measurement variance between BDN runs — historical readings on this benchmark have ranged from 5.7 MB to 8.9 MB on the post-fix code across separate test sessions. Worth re-measuring later if it persists.

The headline result is that the original bug-report scenario now runs in ~0.025 % of the original time. Any traversal pattern that asks for less than the full tree — FirstOrDefault, Any(predicate), Take(n), recursive .Any() checks — is now effectively free after the first read. Full-enumeration paths are unchanged or better. This is a strong landing point for the cumulative changes.

Test 6 - Original v 2.7 (synchronous L0 fast path on the cache wrappers). Cumulative (§2.1 + §2.2 + §2.3 + §2.4 + §2.5 + §2.7).

§2.7 adds a sync TryGetCached(Guid, [bool], out IPublishedContent?) to IDocumentCacheService and IMediaCacheService that performs only the L0 in-memory dictionary lookup. DocumentCache.GetById(bool, Guid) and MediaCache.GetById(bool, Guid) now check the fast path first and only fall through to GetByIdAsync(...).GetAwaiter().GetResult() on a miss. On a warm site the per-key sync calls inside FilterAvailable's lazy chain no longer set up an async state machine for each item.

Before

Branch: v17/improvement/children-and-descendent-retrieval-benchmark. Original baseline; no fixes applied.

Method Mean Allocated
Children_Count 108.3 μs 59.85 KB
Children_ReadOneProperty 148.3 μs 77.13 KB
Descendants_Count 12,108.5 μs 8,957.4 KB
Descendants_ReadOneProperty 17,832.6 μs 7,844.4 KB
Descendants_FirstOrDefault 12,084.7 μs 8,957.4 KB
RecursiveTraversal 23,541.1 μs 12,944.4 KB

After

Branch: v17/improvement/children-and-descendent-retrieval-performance with §2.1 + §2.2 + §2.3 + §2.4 + §2.5 + §2.7 applied (all committed).

Method Mean Allocated
Children_Count 91.8 μs 41.00 KB
Children_ReadOneProperty 141.0 μs 57.02 KB
Descendants_Count 10,110.8 μs 4,064.9 KB
Descendants_ReadOneProperty 16,480.5 μs 9,614.8 KB
Descendants_FirstOrDefault 3.139 μs 1.95 KB
RecursiveTraversal 11,427.4 μs 7,314.4 KB

Deltas

Method Δ Mean Δ Allocated
Children_Count −15.2 % −31.5 %
Children_ReadOneProperty −5.0 % −26.1 %
Descendants_Count −16.5 % −54.6 %
Descendants_ReadOneProperty −7.6 % +22.6 %
Descendants_FirstOrDefault −99.97 % −99.98 %
RecursiveTraversal −51.5 % −43.5 %

StdDev was 1 %–4 % of the mean across the benchmarks — tightest run we've recorded. 99.9 % CI half-widths (BenchmarkDotNet's Error column): Children_Count ±5.9 % / ±3.7 %; Children_ReadOneProperty ±5.8 % / ±1.7 %; Descendants_Count ±3.1 % / ±2.6 %; Descendants_ReadOneProperty ±4.2 % / ±5.5 %; Descendants_FirstOrDefault ±2.5 % / ±12.3 %; RecursiveTraversal ±2.1 % / ±3.8 %.

Summary and Conclusions

§2.7 is the change that broadly improves the full-enumeration paths. The earlier §2.4 / §2.5 caches made the navigation walk fast; §2.7 now makes the per-key fetch fast too, by eliminating the async state-machine setup on the dominant L0-hit case. The benchmarks where many keys are pulled through the lazy chain — Children_*, Descendants_Count, RecursiveTraversal — all show meaningful allocation drops on top of the existing time wins.

  • Children_Count dropped −15 % time, −32 % allocation (108 → 92 μs; 60 → 41 KB). This benchmark calls GetById 50 times per iteration. Each call previously paid for an async state-machine setup just to read a single dictionary entry. With the sync fast path, those overheads are gone — both time and allocation reflect that.

  • Children_ReadOneProperty dropped −5 % time, −26 % allocation (148 → 141 μs; 77 → 57 KB). Same per-key savings, but the property-read work dominates the time so the relative time delta is smaller. Allocation savings track the per-call state-machine elimination directly.

  • Descendants_Count: −17 % time, −55 % allocation (12,108 → 10,111 μs; 8,957 → 4,065 KB). Largest allocation drop on the full-enumeration benchmarks — 5,050 per-key sync calls × ~1 KB of state-machine boxes per call eliminated.

  • Descendants_FirstOrDefault: −99.97 % time, −99.98 % allocation (12,085 → 3.1 μs; 8,957 → 1.95 KB). The §2.5 win compounds: the navigation cache returns the descendant array, the first key hits L0 via the new sync fast path, and the result yields without any async machinery whatsoever.

  • RecursiveTraversal: −52 % time, −44 % allocation (23,541 → 11,427 μs; 12,944 → 7,314 KB). The recursive .Any() + recurse pattern hits Children() thousands of times. Eliminating the async state machine on each of those L0 hits is what moves the time delta past Test 5's −50 %.

  • Descendants_ReadOneProperty: −8 % time but +23 % allocation (17,832 → 16,481 μs; 7,844 → 9,615 KB). The allocation reading is the only anomaly in this run. There is no causal mechanism for §2.7 to increase allocation — eliminating async state machines should only ever reduce it. Across our test sessions this benchmark's allocation reading has fluctuated meaningfully (Test 4: 7,805 KB; Test 5: 7,321 KB; Test 6: 9,615 KB) while the post-fix code is essentially identical, so the most plausible explanation is run-to-run variance in BDN's MemoryDiagnoser sampling. Worth flagging but not blocking.

Cumulative picture — all six fixes (§2.1 → §2.5, §2.7)

Workload Time Allocation
.FirstOrDefault() / .Any() / .Take(n) on Descendants() −99.97 % −99.98 %
Recursive Children().Any() + recurse pattern −52 % −44 %
Full-enumeration descendants — Descendants().Count() −17 % −55 %
Full-enumeration descendants — Descendants() + property reads −8 % +23 % *
Children-shaped operations — Children().Count() −15 % −32 %
Children-shaped operations — Children() + property reads −5 % −26 %

* Descendants_ReadOneProperty allocation reading is the historically variable benchmark in this suite; the +23 % figure is most plausibly run-to-run BDN MemoryDiagnoser variance rather than a real effect of §2.7.

The original bug-report symptom — recursive .Any()-style traversal taking minutes for thousands of nodes — is now ~50 % faster on top of all the prior savings. Short-circuit consumers are essentially free. Even the full-enumeration benchmarks, which were "noise-band" through Tests 1-3, now show clearly measurable improvements. This is the strongest landing point we've recorded for the cumulative changes.

Rendering Results

A real-world Razor rendering test was also run on a 20,000-node, 4-level tree with a sitemap-style template (recursive descendants walk + per-node property reads + short-circuit consumers + repeated Descendants() calls):

Template used for rendering testing
@using System.Diagnostics
@using System.Text
@using Umbraco.Cms.Core.Models.PublishedContent
@using Umbraco.Cms.Web.Common.PublishedModels;
@inherits Umbraco.Cms.Web.Common.Views.UmbracoViewPage
@functions {
    // Hierarchical sitemap render. Reads three properties per node and filters Children() with
    // IsVisible() (which itself reads umbracoNaviHide). Amplifies §2.3 (eager IPublishedProperty[]
    // allocation in v17-before, paid on every IPublishedContent construction).
    public static void RenderBranch(IPublishedContent node, StringBuilder sb)
    {
        sb.Append("<li><a href=\"").Append(node.Url()).Append("\">")
          .Append(node.Name).Append("</a>");

        var title = node.Value<string>("title");
        var updated = node.Value<DateTime>("updateDate");
        sb.Append(" <small>").Append(title).Append(" · ").Append(updated.ToString("yyyy-MM-dd")).Append("</small>");

        var visibleChildren = node.Children().Where(c => c.IsVisible()).ToList();
        if (visibleChildren.Count > 0)
        {
            sb.Append("<ul>");
            foreach (var child in visibleChildren)
            {
                RenderBranch(child, sb);
            }
            sb.Append("</ul>");
        }
        sb.Append("</li>");
    }

    // The literal recursive snippet from the issue report — Children().Any() then recurse into
    // Children(). Calls Children() twice per parent. Amplifies §2.2 (eager FilterAvailable.ToArray()
    // means Any() doesn't short-circuit in v17-before) and §2.7 (per-key sync-over-async on every
    // GetById underneath).
    public static List<IPublishedContent> GetDescendants(IEnumerable<IPublishedContent> contents)
    {
        var result = contents.ToList();
        result
            .Where(x => x.Children().Any())
            .ToList()
            .ForEach(x => result.AddRange(GetDescendants(x.Children())));
        return result;
    }
}
@{
    Layout = null;

    var totalStopwatch = Stopwatch.StartNew();

    // Block 1 — Sitemap render with per-node property reads.
    var sitemapStopwatch = Stopwatch.StartNew();
    var sitemapHtml = new StringBuilder();
    sitemapHtml.Append("<ul>");
    RenderBranch(Model, sitemapHtml);
    sitemapHtml.Append("</ul>");
    sitemapStopwatch.Stop();

    // Block 2 — User-style recursive walk from the issue report.
    var recursiveStopwatch = Stopwatch.StartNew();
    var descendants = GetDescendants(Model.Children);
    recursiveStopwatch.Stop();

    // Block 3 — Dashboard: repeated Descendants() calls on the same parent. v17-before re-walks
    // the navigation tree on every call; v17-after caches the descendant list per snapshot so the
    // second-onwards calls are O(1). v13's NuCache also walks but cheaper per-call.
    var dashboardStopwatch = Stopwatch.StartNew();
    var totalDescendants = Model.Descendants().Count();
    var hiddenCount = Model.Descendants().Count(d => !d.IsVisible());
    var recentlyUpdated = Model.Descendants().Count(d => d.UpdateDate > DateTime.Today.AddDays(-30));
    dashboardStopwatch.Stop();

    // Block 4 — Short-circuit consumers on Descendants(). v13 iterates lazily and exits at the
    // first match. v17-before's FilterAvailable.ToArray() materialises the entire descendant list
    // before the iterator can yield, so Any() and FirstOrDefault() pay the full subtree cost.
    // This is the largest single divergence between v13 and v17-before.
    var shortCircuitStopwatch = Stopwatch.StartNew();
    var hasContentWithText = Model.Descendants().Any(d => !string.IsNullOrEmpty(d.Value<string>("text")));
    var firstWithText = Model.Descendants().FirstOrDefault(d => !string.IsNullOrEmpty(d.Value<string>("text")));
    var hasInvisibleNode = Model.Descendants().Any(d => !d.IsVisible());
    shortCircuitStopwatch.Stop();

    totalStopwatch.Stop();
}
<!DOCTYPE html>
<html>
<head>
    <title>@Model.Name — Rendering Benchmark</title>
    <style>
        body { font-family: sans-serif; max-width: 960px; margin: 2em auto; padding: 0 1em; }
        .panel { padding: 1em; margin-bottom: 1em; background: #f5f5f5; border-radius: 4px; }
        .panel h2 { margin-top: 0; }
        .total { font-size: 1.4em; color: #c00; font-weight: bold; }
        .breakdown { font-family: monospace; }
        .sitemap ul { list-style: none; padding-left: 1em; }
        .sitemap li { margin: 2px 0; }
    </style>
</head>
<body>
    <h1>Rendering benchmark — @Model.Name</h1>
    <p>@(Model.Value<string>("text"))</p>

    <div class="panel">
        <p>Total: <span class="total">@totalStopwatch.ElapsedMilliseconds ms</span></p>
        <ul class="breakdown">
            <li>Sitemap render with per-node property reads (§2.3): <strong>@sitemapStopwatch.ElapsedMilliseconds ms</strong> — @descendants.Count nodes</li>
            <li>Recursive Children().Any()-then-recurse (§2.2, §2.7): <strong>@recursiveStopwatch.ElapsedMilliseconds ms</strong></li>
            <li>Dashboard: 3 × Descendants() with predicates (§2.5): <strong>@dashboardStopwatch.ElapsedMilliseconds ms</strong></li>
            <li>Short-circuit: Any / FirstOrDefault on Descendants() (§2.2): <strong>@shortCircuitStopwatch.ElapsedMilliseconds ms</strong></li>
        </ul>
    </div>

    <div class="panel">
        <h2>Dashboard stats</h2>
        <ul>
            <li>Total descendants: @totalDescendants</li>
            <li>Hidden from nav: @hiddenCount</li>
            <li>Updated in last 30 days: @recentlyUpdated</li>
        </ul>
    </div>

    <div class="panel">
        <h2>Short-circuit checks</h2>
        <ul>
            <li>Has descendant with text: @hasContentWithText</li>
            <li>First descendant with text: @(firstWithText?.Name ?? "(none)")</li>
            <li>Has invisible descendant: @hasInvisibleNode</li>
        </ul>
    </div>

    @if (Model.Parent != null)
    {
        <p><a href="@Model.Parent.Url()">Parent: @Model.Parent.Name</a></p>
    }

    <div class="panel">
        <h2>Children of @Model.Name</h2>
        <ul>
            @foreach (var child in Model.Children)
            {
                <li><a href="@child.Url()">@child.Name</a></li>
            }
        </ul>
    </div>

    <div class="panel sitemap">
        <h2>Sitemap of @Model.Name</h2>
        @Html.Raw(sitemapHtml.ToString())
    </div>
</body>
</html>

On this template, v17-before runs at 1.81× v13's time. After this PR, v17 runs at 1.34× v13 — closing 58% of the regression. Per-render saving: 230ms (883ms → 653ms).

Memory impact

Two of the six fixes add new caches:

  • §2.4 — NavigationNode._orderedChildren: per-node sorted-children Guid[]. Stored on every node in the navigation structure.
  • §2.5 — NavigationSnapshot.DescendantsCache: per-snapshot descendants Guid[] keyed by (parent, contentType). Populated lazily — only for parents whose Descendants() (or Descendants().OfType<>()) is actually called.

Both caches hold only Guid[] arrays (16 bytes per entry). They're likely an order of magnitude smaller per item than the existing HybridCache layer, which stores fully converted IPublishedContent instances complete with property wrappers, content-type references, and route metadata (hundreds of bytes to several KB per node). There is no unbounded growth path: every cache entry is keyed against state that exists in the navigation structure, so the cache cannot exceed the structure's size.

Given that I think the trade-off between faster performance and increased memory footprint is a reasonable one to make.

Potential Further Work

The remaining 1.34× gap to v13 I feel is likely coming from IPublishedContent factory work; closing that further would need a deeper rework and is deferred. It could perhaps come for 18+ given we've now removed the .Parent and .Children properties that I understand previously precluded this.

Testing

Testing will most be from review of the benchmark and rendering tests I've done above. I've also added various automated tests to verify that the performance refactoring doesn't impact the behaviour.


This item has been added to our backlog AB#68180

Copilot AI review requested due to automatic review settings May 7, 2026 10:05
@claude

claude Bot commented May 7, 2026

Copy link
Copy Markdown

Claude finished @AndyButland's task in 8m 53s —— View job


PR Review

Target: origin/main · Based on commit: e0a5682c · Reviewed: 21 files (0 noise files skipped)

Six targeted, non-breaking performance fixes for IPublishedContent.Children()/Descendants() in HybridCache: removes a redundant sort, makes FilterAvailable lazy, defers property wrapper allocation, adds a cached ordered-children list on NavigationNode, caches descendants per navigation snapshot, and adds a synchronous L0 fast-path for GetById.

  • Modified public API: IDocumentCacheService.TryGetCached(Guid, bool, out IPublishedContent?) (new, default impl returns false); IMediaCacheService.TryGetCached(Guid, out IPublishedContent?) (same); NavigationNode.UpdateSortOrder (public, remarks updated — no signature change)
  • Affected implementations (outside this PR): Any third-party IDocumentCacheService/IMediaCacheService implementation gets a safe no-op default; all existing implementations remain compatible
  • Other changes: FilterAvailable on both PublishedContentStatusFilteringService and PublishedMediaStatusFilteringService now returns a lazy IEnumerable (no .ToArray()). The method-level comment documents this, but callers that enumerate the result more than once will now pay cache-lookup costs on each traversal. Callers should buffer the result themselves if they need repeated enumeration.

Important

  • src/Umbraco.Core/Models/Navigation/NavigationNode.cs:141 (inline comment posted): _orderedChildren is read without a memory barrier. The lock in BuildOrderedChildren/InvalidateOrderedChildren provides a release fence on the write/invalidation side, but this lock-free fast-path read has no corresponding acquire fence. On ARM64, the JIT may serve a register-cached stale value — returning an outdated sorted snapshot after a concurrent InvalidateOrderedChildren call. The PublishedContent.EnsureProperties pattern avoids this by using Interlocked.CompareExchange (a full barrier) for both read and write. Fix: Guid[]? cached = Volatile.Read(ref _orderedChildren);

Suggestions

  • src/Umbraco.Core/Services/Navigation/ContentNavigationServiceBase.cs:751 (inline comment posted): The explicit tuple member names parentKey and contentTypeKey shadow the method parameters of the same name. Using var cacheKey = (parentKey, contentTypeKey) (projection-initializer form) is shorter, avoids the shadowing, and produces the same ValueTuple<Guid, Guid?> type.

  • src/Umbraco.Core/Services/PublishStatus/PublishedContentStatusFilteringService.cs:66-70: Unfiltered still materializes eagerly with .ToArray() while FilterAvailable now returns lazily. The asymmetry is not wrong, but it is surprising for callers who assume both methods behave the same way. Worth a comment on Unfiltered explaining the intentional difference if the eager behavior is required there.

  • src/Umbraco.PublishedCache.HybridCache/PublishedContent.cs:233: EnsureProperties also uses a bare field read without Volatile.Read. The risk is lower here — a false null just results in an unnecessary call to BuildProperties, and Interlocked.CompareExchange there ensures the final result is always correct — but for consistency with the fix suggested above, Volatile.Read(ref _properties) would be cleaner.


Approved with Suggestions for improvement

This is a well-reasoned, well-tested set of targeted optimisations with strong benchmark evidence. The Volatile.Read gap in NavigationNode.GetOrderedChildren is the only substantive finding — functionally harmless on x86 but technically racy on ARM64 — and is a one-line fix. All other observations are minor.

Labels applied: area/backend, category/performance

Copilot AI left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR targets the reported v17 HybridCache regression for IPublishedContent.Children() / Descendants() by removing avoidable eager work and adding caching/fast-paths in the published-cache traversal pipeline, with accompanying unit tests and benchmarks.

Changes:

  • Removes redundant sorting/materialization in traversal-related helpers to enable true short-circuit enumeration.
  • Adds lazy, thread-safe property wrapper creation in PublishedContent, cached ordered-children in NavigationNode, and per-snapshot descendants caching in navigation services.
  • Introduces an L0 synchronous fast path for GetById(Guid) via new TryGetCached cache-service APIs, plus new tests and BenchmarkDotNet benchmarks.

Reviewed changes

Copilot reviewed 21 out of 21 changed files in this pull request and generated 2 comments.

Show a summary per file
File Description
tests/Umbraco.Tests.UnitTests/Umbraco.PublishedCache.HybridCache/PublishedContentTests.cs Adds unit tests for lazy, thread-safe property initialization behavior in PublishedContent.
tests/Umbraco.Tests.UnitTests/Umbraco.PublishedCache.HybridCache/DocumentCacheSyncFastPathTests.cs Verifies the new sync L0 fast path avoids the async fallback on cache hits.
tests/Umbraco.Tests.UnitTests/Umbraco.Core/Services/PublishStatus/PublishedMediaStatusFilteringServiceTests.cs Adds tests asserting FilterAvailable is lazy and short-circuits cache lookups.
tests/Umbraco.Tests.UnitTests/Umbraco.Core/Services/PublishStatus/PublishedContentStatusFilteringServiceTests.cs Adds laziness/short-circuit tests for content filtering; removes unnecessary partial.
tests/Umbraco.Tests.UnitTests/Umbraco.Core/Services/Navigation/ContentNavigationDescendantsCacheTests.cs Adds tests covering per-snapshot descendants caching and invalidation on mutations/concurrency.
tests/Umbraco.Tests.UnitTests/Umbraco.Core/Models/Navigation/NavigationNodeTests.cs Adds tests for ordered-children caching behavior and invalidation/thread-safety.
tests/Umbraco.Tests.Benchmarks/HybridCacheNavigationBenchmarks.cs Introduces benchmarks for Children/Descendants traversal patterns and short-circuit scenarios.
tests/Umbraco.Tests.Benchmarks/Fixtures/SyntheticPublishedTreeFixture.cs Provides an in-memory synthetic tree + HybridCache wiring for benchmarks.
src/Umbraco.PublishedCache.HybridCache/Umbraco.PublishedCache.HybridCache.csproj Grants InternalsVisibleTo to the benchmarks project.
src/Umbraco.PublishedCache.HybridCache/Services/MediaCacheService.cs Implements IMediaCacheService.TryGetCached for L0 sync hits.
src/Umbraco.PublishedCache.HybridCache/Services/DocumentCacheService.cs Implements IDocumentCacheService.TryGetCached for L0 sync hits (non-preview).
src/Umbraco.PublishedCache.HybridCache/PublishedContent.cs Switches property wrapper construction to lazy, thread-safe initialization.
src/Umbraco.PublishedCache.HybridCache/MediaCache.cs Uses TryGetCached to avoid sync-over-async on warm L0 hits.
src/Umbraco.PublishedCache.HybridCache/DocumentCache.cs Uses TryGetCached to avoid sync-over-async on warm L0 hits.
src/Umbraco.Core/Services/PublishStatus/PublishedMediaStatusFilteringService.cs Makes FilterAvailable lazy (removes eager .ToArray()).
src/Umbraco.Core/Services/PublishStatus/PublishedContentStatusFilteringService.cs Makes FilterAvailable lazy (removes eager .ToArray()).
src/Umbraco.Core/Services/Navigation/ContentNavigationServiceBase.cs Adds per-snapshot descendants caching + generation-guard and invalidation on mutations; uses cached ordered-children.
src/Umbraco.Core/PublishedCache/IMediaCacheService.cs Adds default-implementation TryGetCached API for sync L0 fast path.
src/Umbraco.Core/PublishedCache/IDocumentCacheService.cs Adds default-implementation TryGetCached API for sync L0 fast path.
src/Umbraco.Core/Models/Navigation/NavigationNode.cs Adds lazy cached ordered-children snapshot with explicit invalidation.
src/Umbraco.Core/Extensions/PublishedContentExtensions.cs Removes redundant OrderBy(SortOrder) in children retrieval.

Comment thread src/Umbraco.PublishedCache.HybridCache/Services/DocumentCacheService.cs Outdated
Comment thread src/Umbraco.Core/Models/Navigation/NavigationNode.cs Outdated
Comment thread src/Umbraco.Core/Services/Navigation/ContentNavigationServiceBase.cs Outdated
@claude claude Bot added area/backend category/performance Fixes for performance (generally cpu or memory) fixes labels May 7, 2026
@AndyButland AndyButland changed the base branch from main to v17/dev May 8, 2026 09:12
@nielslyngsoe

Copy link
Copy Markdown
Member

Let's make sure that we are fully covered by tests, we should avoid any potential mistakes in this one.

@nielslyngsoe nielslyngsoe added the state/sprint-candidate We're trying to get this in a sprint at HQ in the next few weeks label May 13, 2026

@Migaroez Migaroez left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice PR Andy, this will surely help make sites with a lot of descendants a noticably faster.

I had a few concerns but they have all been soothed by the extra tests/benchmark you can find at https://github.com/umbraco/Umbraco-CMS/tree/v17/feedback/22742 or c65e7a5

Feel free to incorperate them.

For the benchmark I wanted to know if there was a different impact when there were a lot of document types vs only a few.

Benchmark results — PR (with cache) vs. base (v17/dev)

Both runs on the same machine, same BDN config (InProcessEmitToolchain, 5 iter × 500 ms, 3 warmup).

Method Types Base time PR time Δ time Base alloc PR alloc
FullSweep 5 385.1 µs 1.4 µs −99.6 % 533.6 KB 0 B
FullSweep 25 8,664.5 µs 7.5 µs −99.91 % 11,509.8 KB 0 B
FullSweep 50 36,838.0 µs 14.7 µs −99.96 % 46,621.4 KB 0 B
RepeatedUnfiltered 5 654.6 µs 2.3 µs −99.6 % 1,371.9 KB 0 B
RepeatedUnfiltered 25 3,153.0 µs 2.2 µs −99.93 % 6,113.3 KB 0 B
RepeatedUnfiltered 50 6,594.1 µs 2.2 µs −99.97 % 12,436.0 KB 0 B

@AndyButland

Copy link
Copy Markdown
Contributor Author

Thanks @Migaroez. I'll pull in the unit tests certainly. The benchmark I think I'll leave. It's useful to see these results as part of the PR, but it likely isn't something we need permanently in the code base as they aren't tests we run in CI to prevent against regressions, and so may just be unnecessary maintenance. And we already have the main one from this PR.

@AndyButland AndyButland merged commit f459211 into v17/dev May 19, 2026
26 of 27 checks passed
@AndyButland AndyButland deleted the v17/improvement/children-and-descendent-retrieval-performance branch May 19, 2026 16:00
AndyButland added a commit that referenced this pull request May 19, 2026
…22742)

* Add benchmark test for measuring improvements to children and descendant retrieval.

* Remove unnecessary sort from retrieval of children.

* Return the result of the filtered collection of children/decendants without materialising.

* Lazily build property wrappers when materializing IPublishedContent.

* Cache the ordered children list on NavigationNode.

* Cache descendants per parent on the navigation snapshot.

* Add synchronous fast path for retrieved of cached content.

* Additional unit tests.

* Add TODO to make UpdateSortOrder internal.

* Addressed code review feedback.

* Further unit tests.

* Future-proofed code comments.
AndyButland added a commit that referenced this pull request May 19, 2026
…22742)

* Add benchmark test for measuring improvements to children and descendant retrieval.

* Remove unnecessary sort from retrieval of children.

* Return the result of the filtered collection of children/decendants without materialising.

* Lazily build property wrappers when materializing IPublishedContent.

* Cache the ordered children list on NavigationNode.

* Cache descendants per parent on the navigation snapshot.

* Add synchronous fast path for retrieved of cached content.

* Additional unit tests.

* Add TODO to make UpdateSortOrder internal.

* Addressed code review feedback.

* Further unit tests.

* Future-proofed code comments.
@AndyButland

Copy link
Copy Markdown
Contributor Author

Cherry picked into release/17.5.0 and release/18.0 and from there merged into main.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

area/backend category/performance Fixes for performance (generally cpu or memory) fixes release/17.5.0 release/18.0.0 state/sprint-candidate We're trying to get this in a sprint at HQ in the next few weeks

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants