Skip to content

perf(shader-cache): hybrid-CPU scheduling and LPT dispatch#2021

Merged
alandtse merged 6 commits into
community-shaders:devfrom
alandtse:shader_cores
Mar 31, 2026
Merged

perf(shader-cache): hybrid-CPU scheduling and LPT dispatch#2021
alandtse merged 6 commits into
community-shaders:devfrom
alandtse:shader_cores

Conversation

@alandtse
Copy link
Copy Markdown
Collaborator

@alandtse alandtse commented Mar 28, 2026

Detect P-core count via GetLogicalProcessorInformationEx so thread limits are meaningful on Intel hybrid CPUs (Alder/Raptor Lake). Result is cached — topology never changes at runtime.

Thread pool sizing:

  • Startup: all logical cores (E-cores help; no game loop running)
  • Background (in-game): half of P-core count to spare the render thread
  • Management and file-watcher moved to dedicated std::jthread so they no longer consume pool slots or inflate get_tasks_running()

Compilation dispatch (WaitTake):

  • Tasks are assigned a static priority score (LPT: Longest Processing Time first) based on shader type, class, descriptor popcount, and known-heavy techniques (MTLand, TruePBR, AnisoLighting, Parallax)
  • Heavy tasks (score >= 500) are limited to P-core count concurrent slots on hybrid CPUs, leaving E-cores for lighter permutations
  • Reduces tail latency caused by heavy shaders landing on slow cores

Compilation correctness:

  • ClaimCompilation atomically marks a shader Pending before releasing mapMutex, preventing redundant parallel compilation of the same key
  • Threads waiting on a Pending entry block on mapCV and resume when the compiling thread calls AddCompletedShader

Observability:

  • ProcessCompilationSet records wall-clock time per task; logs slow (>=2s) at debug and very slow (>=8s) at info with descriptor complexity and source file size as straggler proxies
  • ETA now uses priority-weighted progress fraction over wall-clock elapsed time, which is more accurate under parallel workloads
  • Dev overlay shows active threads, heavy-task concurrency, and running slow/very-slow counts
  • Settings tab adds a stacked bar breaking down the last build by deduplicated, fast, slow, very slow, and failed shaders

Summary by CodeRabbit

  • New Features

    • Stacked shader-cache task breakdown in Settings with hover details.
    • Developer-only diagnostics and overlay showing compilation metrics, progress bars, top‑3 slowest shaders, and per-item copy action.
    • Human-readable duration formatting for compilation timers.
  • Improvements

    • Clarified tooltips for compiler and background-thread sliders with concrete defaults and impact guidance.
    • Performance-core-aware thread selection on hybrid CPUs; enhanced scheduling, failure handling, deduplication, and more accurate ETA/progress metrics.

@coderabbitai
Copy link
Copy Markdown
Contributor

coderabbitai Bot commented Mar 28, 2026

Note

Reviews paused

It looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior by changing the reviews.auto_review.auto_pause_after_reviewed_commits setting.

Use the following commands to manage reviews:

  • @coderabbitai resume to resume automatic reviews.
  • @coderabbitai review to trigger a single review.

Use the checkboxes below for quick actions:

  • ▶️ Resume reviews
  • 🔍 Trigger review
📝 Walkthrough

Walkthrough

Replaced hardware-concurrency sizing with a perf-core query; added Util::GetPerformanceCoreCount and FormatDuration; implemented per-key claim/wait shader compilation with priority-weighted scheduling, heavy-task throttling, dedicated management/file-watcher threads, slow-task instrumentation, and developer UI/overlay updates.

Changes

Cohort / File(s) Summary
Performance Core Utility
src/Utils/WinApi.h, src/Utils/WinApi.cpp
Add Util::GetPerformanceCoreCount() that queries CPU topology for highest-efficiency cores, caches the result, and falls back to hardware_concurrency() on failure.
Formatting Helper
src/Utils/Format.h, src/Utils/Format.cpp
Add Util::FormatDuration(double ms) returning zero-padded HH:MM:SS, clamping negative or non-finite inputs to 00:00:00.
Shader Cache Core
src/ShaderCache.h, src/ShaderCache.cpp
Introduce per-key claim/wait API (ClaimCompilation, ResolvePendingFailure), cached per-task priority, priority-ordered task containers, heavy-task threshold and inflight tracking, top-slow/parallelism telemetry, priority-weighted ETA/progress, and move management and file-watcher to dedicated std::jthreads; add mapCV signalling and adjust pool sizing/stop logic.
UI: Settings / Overlay / Advanced
src/Menu/SettingsTabRenderer.cpp, src/Menu/OverlayRenderer.cpp, src/Menu/AdvancedSettingsRenderer.cpp
Add stacked-bar shader-task visualization and hover tooltip, developer-only thread diagnostics and derived parallelism stats, update tooltips for compiler-thread controls, and add Top-3 slowest-shaders list with copy-key context menu.
WaterCache change
src/Features/UnifiedWater/WaterCache.cpp
Switched async build thread-source from std::thread::hardware_concurrency() to Util::GetPerformanceCoreCount() while retaining existing clamping/derivation.
Includes / Threading helpers
src/ShaderCache.h, various files
Added Utils/WinApi.h includes where used; declared managementJthread, fileWatcherThread, mapCV, and updated pool/thread-count initialization.

Sequence Diagram(s)

sequenceDiagram
    participant Caller as Compile caller
    participant Cache as ShaderCache
    participant Pool as Compilation Pool
    participant Mgmt as Management Thread
    participant FW as FileWatcher Thread

    Caller->>Cache: CompileShader(key)
    Cache->>Cache: ClaimCompilation(key)
    alt CacheHit
        Cache-->>Caller: (CacheHit, blob)
    else AlreadyPending
        Cache-->>Caller: wait on mapCV until Completed/Failed
    else Claimed
        Cache-->>Caller: Claimed (will compile)
        Caller->>Pool: Submit ShaderCompilationTask(priority)
        par Pool compiles
            Pool->>Pool: ComputePriority, set thread priority
            Pool->>Pool: Execute task, measure wall-time
            Pool->>Cache: Record Completed/Failed (blob/nullptr)
            Cache->>Cache: Update slow/very-slow counters & weights
            Cache->>Cache: mapCV.notify_all()
        and Mgmt enforces limits
            Mgmt->>Pool: Enforce heavyTasksInFlight limit (perf-core based)
        and FW processes file changes
            FW->>Cache: enqueue file-watcher events
        end
        Cache-->>Caller: Unblocked on mapCV signal
    end
Loading

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~50 minutes

Possibly related PRs

Suggested reviewers

  • doodlum
  • jiayev

Poem

🐰 I count the perf cores, nibble each thread,
Tasks line up neatly — some sleep, some tread,
Slow shaders noted, keys copied with cheer,
Management hums, file-watcher keeps near,
Hooray — builds hop forward, light and clear.

🚥 Pre-merge checks | ✅ 2 | ❌ 1

❌ Failed checks (1 warning)

Check name Status Explanation Resolution
Docstring Coverage ⚠️ Warning Docstring coverage is 1.79% which is insufficient. The required threshold is 80.00%. Write docstrings for the functions missing them to satisfy the coverage threshold.
✅ Passed checks (2 passed)
Check name Status Explanation
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.
Title check ✅ Passed The title directly and accurately summarizes the main objectives of the changeset: hybrid-CPU scheduling detection and LPT-based dispatch improvements to the shader cache.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
🧪 Generate unit tests (beta)
  • Create PR with unit tests

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

@github-actions
Copy link
Copy Markdown

No actionable suggestions for changed features.

Copy link
Copy Markdown
Contributor

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 3

🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@src/ShaderCache.cpp`:
- Around line 2354-2358: The destructor’s timeout fallback escalates the wrong
thread: after moving ManageCompilationSet onto managementJthread, update the
timeout recovery path to target the managementJthread (and the
ManageCompilationSet stop/join logic) instead of the old managementThread/worker
escalation; specifically, when waiting on the compilation pool times out, send
the stop request and join/interrupt the managementJthread (or invoke its stop
token/notify path tied to ManageCompilationSet) ensuring you check for
managementJthread's existence and join/cleanup it rather than trying to escalate
pool worker threads.
- Around line 2392-2394: The new watcher thread captures ShaderCache::listener
by reference causing a race where ShaderCache::listener can be nulled before the
thread runs; update the jthread creation to capture the concrete
watcher/listener instance by value (copy or shared_ptr) so the thread holds its
own lifetime, and modify StopFileWatcher() to explicitly request stop and join
the listener->fileWatcherThread (or call .join()/.request_stop() on the jthread)
before deleting or nulling any watcher/listener objects; ensure processQueue()
and the watcher destructor are safe to run after request_stop and only then
reset raw pointers to nullptr to avoid null-deref/lifetime bugs.

In `@src/ShaderCache.h`:
- Around line 483-488: The code still subtracts 1 from
std::thread::hardware_concurrency() so startup doesn’t actually use all logical
cores; change compilationThreadCount to use the full logical core count (remove
the "- 1") or update the comment/UI if you intend to keep reserving a core, and
initialize BS::thread_pool compilationPool with that compilationThreadCount (use
compilationThreadCount as the pool size instead of calling
hardware_concurrency() again) so the pool size matches the intended startup
behavior; ensure references are to compilationThreadCount,
backgroundCompilationThreadCount, and compilationPool.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: 15b0e8ee-50b0-4bc2-89d1-34ec8878cf8c

📥 Commits

Reviewing files that changed from the base of the PR and between d42bcf6 and 56e2728.

📒 Files selected for processing (8)
  • src/Features/UnifiedWater/WaterCache.cpp
  • src/Menu/AdvancedSettingsRenderer.cpp
  • src/Menu/OverlayRenderer.cpp
  • src/Menu/SettingsTabRenderer.cpp
  • src/ShaderCache.cpp
  • src/ShaderCache.h
  • src/Utils/WinApi.cpp
  • src/Utils/WinApi.h

Comment thread src/ShaderCache.cpp
Comment thread src/ShaderCache.cpp Outdated
Comment thread src/ShaderCache.h Outdated
@github-actions
Copy link
Copy Markdown

github-actions Bot commented Mar 28, 2026

✅ A pre-release build is available for this PR:
Download

@alandtse alandtse closed this in 15df3df Mar 29, 2026
@alandtse alandtse reopened this Mar 29, 2026
alandtse and others added 2 commits March 29, 2026 17:32
Detect P-core count via GetLogicalProcessorInformationEx so thread
limits are meaningful on Intel hybrid CPUs (Alder/Raptor Lake).
Result is cached — topology never changes at runtime.

Thread pool sizing:
- Startup: all logical cores (E-cores help; no game loop running)
- Background (in-game): half of P-core count to spare the render thread
- Management and file-watcher moved to dedicated std::jthread so they
  no longer consume pool slots or inflate get_tasks_running()

Compilation dispatch (WaitTake):
- Tasks are assigned a static priority score (LPT: Longest Processing
  Time first) based on shader type, class, descriptor popcount, and
  known-heavy techniques (MTLand, TruePBR, AnisoLighting, Parallax)
- Heavy tasks (score >= 500) are limited to P-core count concurrent
  slots on hybrid CPUs, leaving E-cores for lighter permutations
- Reduces tail latency caused by heavy shaders landing on slow cores

Compilation correctness:
- ClaimCompilation atomically marks a shader Pending before releasing
  mapMutex, preventing redundant parallel compilation of the same key
- Threads waiting on a Pending entry block on mapCV and resume when
  the compiling thread calls AddCompletedShader

Observability:
- ProcessCompilationSet records wall-clock time per task; logs slow
  (>=2s) at debug and very slow (>=8s) at info with descriptor
  complexity and source file size as straggler proxies
- ETA now uses priority-weighted progress fraction over wall-clock
  elapsed time, which is more accurate under parallel workloads
- Dev overlay shows active threads, heavy-task concurrency, and
  running slow/very-slow counts
- Settings tab adds a stacked bar breaking down the last build by
  deduplicated, fast, slow, very slow, and failed shaders

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Copy link
Copy Markdown
Contributor

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 2

🧹 Nitpick comments (1)
src/Utils/Format.h (1)

69-77: Clarify duration formatting contract in docstring

Consider documenting how fractional, negative, and invalid inputs are handled (truncate vs round; clamp behavior). It will prevent API ambiguity for call sites.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/Utils/Format.h` around lines 69 - 77, Update the FormatDuration(double
ms) docstring to explicitly state how fractional milliseconds, negative values,
and invalid inputs are handled: specify whether the function rounds or truncates
fractional milliseconds when computing seconds/minutes/hours, whether negative
durations are clamped to "00:00:00" or prefixed with a "-" sign, and what
constitutes an "invalid" input (e.g., NaN/inf) and how it is handled (e.g.,
treat as zero or return an error string). Mention the time range behavior as
well (if durations beyond 24 hours are allowed and how hours are formatted) so
callers of FormatDuration know the exact contract.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@src/Menu/SettingsTabRenderer.cpp`:
- Around line 243-246: The compiled bucket calculation wrongly subtracts
cacheHits from completed (variables compiled, completed, cacheHits) even though
cache hits are mutually exclusive with queued/compiled tasks; change the
calculation to use completed directly (uint64_t compiled = completed) so
compiled represents all completed compilations, and keep the downstream bucket
math (fast = compiled > slow ? compiled - slow : 0, medium = slow > verySlow ?
slow - verySlow : 0) unchanged; update any comment explaining "Fast = compiled
successfully..." to reflect that compiled already excludes cache hits.

In `@src/Utils/Format.cpp`:
- Around line 161-168: FormatDuration currently casts milliseconds to int and
does not guard against negative or non-finite inputs and potential overflow;
update FormatDuration to first check std::isfinite(ms) and clamp non-finite or
negative values to 0, convert ms to a 64-bit integer (e.g., int64_t total_ms =
static_cast<int64_t>(ms)) before doing arithmetic to avoid narrowing/overflow,
compute seconds from total_ms using 64-bit math, and optionally cap the result
to a sane max (e.g., clamp total seconds to a reasonably large value) before
formatting hours/minutes/seconds with fmt::format so invalid inputs and very
large durations are handled safely.

---

Nitpick comments:
In `@src/Utils/Format.h`:
- Around line 69-77: Update the FormatDuration(double ms) docstring to
explicitly state how fractional milliseconds, negative values, and invalid
inputs are handled: specify whether the function rounds or truncates fractional
milliseconds when computing seconds/minutes/hours, whether negative durations
are clamped to "00:00:00" or prefixed with a "-" sign, and what constitutes an
"invalid" input (e.g., NaN/inf) and how it is handled (e.g., treat as zero or
return an error string). Mention the time range behavior as well (if durations
beyond 24 hours are allowed and how hours are formatted) so callers of
FormatDuration know the exact contract.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: 07d97057-b97a-40d7-85f1-66207cb16a41

📥 Commits

Reviewing files that changed from the base of the PR and between 56e2728 and b77e864.

📒 Files selected for processing (10)
  • src/Features/UnifiedWater/WaterCache.cpp
  • src/Menu/AdvancedSettingsRenderer.cpp
  • src/Menu/OverlayRenderer.cpp
  • src/Menu/SettingsTabRenderer.cpp
  • src/ShaderCache.cpp
  • src/ShaderCache.h
  • src/Utils/Format.cpp
  • src/Utils/Format.h
  • src/Utils/WinApi.cpp
  • src/Utils/WinApi.h
✅ Files skipped from review due to trivial changes (1)
  • src/ShaderCache.h
🚧 Files skipped from review as they are similar to previous changes (4)
  • src/Features/UnifiedWater/WaterCache.cpp
  • src/Utils/WinApi.h
  • src/Utils/WinApi.cpp
  • src/ShaderCache.cpp

Comment thread src/Menu/SettingsTabRenderer.cpp Outdated
Comment thread src/Utils/Format.cpp
Copy link
Copy Markdown
Contributor

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

♻️ Duplicate comments (1)
src/ShaderCache.h (1)

504-510: ⚠️ Potential issue | 🟡 Minor

Thread count variable and pool size are inconsistent.

compilationThreadCount is hardware_concurrency() - 1, but compilationPool is initialized with the full hardware_concurrency(). The comment says "all logical cores minus one at startup for OS headroom," yet the pool itself uses all cores.

If the intent is to use all cores at startup (since management/file-watcher are on dedicated jthreads), then compilationThreadCount should match the pool size. If the intent is to reserve one core, the pool should use compilationThreadCount.

🔧 Option A: Use all cores (align variable with pool)
-	int32_t compilationThreadCount = std::max(static_cast<int32_t>(std::thread::hardware_concurrency()) - 1, 1);
+	int32_t compilationThreadCount = std::max(static_cast<int32_t>(std::thread::hardware_concurrency()), 1);
🔧 Option B: Reserve one core (align pool with variable)
-	BS::thread_pool<> compilationPool{ static_cast<std::size_t>(std::thread::hardware_concurrency()) };
+	BS::thread_pool<> compilationPool{ static_cast<std::size_t>(compilationThreadCount) };
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/ShaderCache.h` around lines 504 - 510, compilationThreadCount,
backgroundCompilationThreadCount and compilationPool are inconsistent:
compilationThreadCount uses hardware_concurrency()-1 while compilationPool is
constructed with the full hardware_concurrency(); pick one approach and make
them match. Either set compilationPool to use
static_cast<std::size_t>(compilationThreadCount) (to reserve one core) or change
compilationThreadCount to
static_cast<int32_t>(std::thread::hardware_concurrency()) (to use all cores);
ensure types/casts remain consistent and keep backgroundCompilationThreadCount
logic unchanged (it can still use Util::GetPerformanceCoreCount()/2).
🧹 Nitpick comments (1)
src/ShaderCache.h (1)

289-289: Consider using std::atomic<uint32_t> for heavyTasksInFlight.

Using int allows negative values if there's a bug in increment/decrement pairing. An unsigned type would make such bugs more apparent (wrap-around) and semantically reflects that task counts cannot be negative.

♻️ Suggested change
-	std::atomic<int> heavyTasksInFlight = 0;            // number of dispatched heavy (>= kHeavyPriorityThreshold) tasks still running
+	std::atomic<uint32_t> heavyTasksInFlight = 0;       // number of dispatched heavy (>= kHeavyPriorityThreshold) tasks still running
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/ShaderCache.h` at line 289, Change the atomic type for the task counter
to an unsigned integral to better express non-negativity and catch decrement
bugs: replace std::atomic<int> heavyTasksInFlight with std::atomic<uint32_t> (or
std::atomic<std::uint32_t>) and update any code that reads, increments, or
decrements heavyTasksInFlight (operators, comparisons, initializers) to use the
unsigned type consistently, ensuring any signed-to-unsigned conversions are
handled where necessary (e.g., comparisons against 0 or casting when interfacing
with APIs expecting signed ints).
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@src/ShaderCache.h`:
- Around line 437-443: ProcessCompilationSet currently calls task.Perform()
(which invokes compile helpers like MakeAndAddVertexShader) without guarding
against exceptions, so a Pending entry from ClaimCompilation can remain
unresolved and block waiting threads; modify ProcessCompilationSet (around the
task.Perform() call) to wrap the compilation in a try/catch or use a RAII guard
that on exit always calls AddCompletedShader with a failed result and then
mapCV.notify_all(), ensuring any Pending entry is transitioned to Failed and
waiting ClaimCompilation callers are awakened even if compilation throws. Ensure
the guard or catch references the same Pending key used by ClaimCompilation and
calls mapCV.notify_all() after AddCompletedShader to maintain the existing
wake-up semantics.

---

Duplicate comments:
In `@src/ShaderCache.h`:
- Around line 504-510: compilationThreadCount, backgroundCompilationThreadCount
and compilationPool are inconsistent: compilationThreadCount uses
hardware_concurrency()-1 while compilationPool is constructed with the full
hardware_concurrency(); pick one approach and make them match. Either set
compilationPool to use static_cast<std::size_t>(compilationThreadCount) (to
reserve one core) or change compilationThreadCount to
static_cast<int32_t>(std::thread::hardware_concurrency()) (to use all cores);
ensure types/casts remain consistent and keep backgroundCompilationThreadCount
logic unchanged (it can still use Util::GetPerformanceCoreCount()/2).

---

Nitpick comments:
In `@src/ShaderCache.h`:
- Line 289: Change the atomic type for the task counter to an unsigned integral
to better express non-negativity and catch decrement bugs: replace
std::atomic<int> heavyTasksInFlight with std::atomic<uint32_t> (or
std::atomic<std::uint32_t>) and update any code that reads, increments, or
decrements heavyTasksInFlight (operators, comparisons, initializers) to use the
unsigned type consistently, ensuring any signed-to-unsigned conversions are
handled where necessary (e.g., comparisons against 0 or casting when interfacing
with APIs expecting signed ints).
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: 654fd53a-dc77-41c8-a266-a0aacfc7883b

📥 Commits

Reviewing files that changed from the base of the PR and between b77e864 and db1d452.

📒 Files selected for processing (4)
  • src/Menu/SettingsTabRenderer.cpp
  • src/ShaderCache.h
  • src/Utils/Format.cpp
  • src/Utils/Format.h
🚧 Files skipped from review as they are similar to previous changes (2)
  • src/Utils/Format.cpp
  • src/Menu/SettingsTabRenderer.cpp

Comment thread src/ShaderCache.h
Copy link
Copy Markdown
Contributor

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
src/ShaderCache.cpp (1)

3077-3112: ⚠️ Potential issue | 🟠 Major

The heavy-task cap is never enforced.

WaitTake() always dispatches the highest-priority task whenever the pool has room, and heavyTasksInFlight is only updated for telemetry. Once the heavy count reaches the intended P-core limit, this still picks another heavy task instead of falling back to a light one.

Based on learnings, "Flag potential problems proactively including performance impact on rendering, SE/AE/VR runtime compatibility issues, GPU register conflicts, and security risks from malformed configurations".

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/ShaderCache.cpp` around lines 3077 - 3112, WaitTake() currently always
picks the highest-priority task and only updates heavyTasksInFlight for
telemetry, so heavy tasks can exceed the intended P-core limit; modify the
selection logic (in WaitTake()/the block that reads availableTasks and uses
bestIt) to enforce the heavy-task cap: compute the current heavy limit (use
shaderCache->backgroundCompilation ?
shaderCache->backgroundCompilationThreadCount :
shaderCache->compilationThreadCount), and if the candidate task's GetPriority()
>= kHeavyPriorityThreshold and heavyTasksInFlight.load(...) >= heavyLimit, scan
availableTasks from highest priority down to find the first non-heavy task (or
return std::nullopt if none); only after choosing a heavy task increment
heavyTasksInFlight (fetch_add) while still under the same lock and before
inserting into tasksInProgress, so the cap is respected and telemetry remains
accurate.
♻️ Duplicate comments (4)
src/ShaderCache.cpp (3)

1347-1355: ⚠️ Potential issue | 🟠 Major

Resolve Pending inside CompileShader().

ClaimCompilation() marks the key Pending before any filesystem/D3D work. The async path catches and calls ResolvePendingFailure(), but synchronous MakeAndAdd*Shader() callers do not, so an exception out of this function can leave future callers blocked on mapCV forever.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/ShaderCache.cpp` around lines 1347 - 1355, CompileShader() calls
cache.ClaimCompilation(key) which sets the slot to Pending but does not call
ResolvePendingFailure() on synchronous exceptions, leaving threads blocked on
mapCV; modify CompileShader() to wrap the work done after claiming (the calls to
MakeAndAdd*Shader() / filesystem/D3D operations) in a try/catch that on any
exception calls cache.ResolvePendingFailure(key, exception) (or the equivalent
resolve/fail method) and then rethrows, and ensure the catch also signals
waiting threads via mapCV so Pending is cleared; this mirrors the async path
behavior and prevents permanent Pending entries.

1932-1944: ⚠️ Potential issue | 🔴 Critical

Stop/join background work before tearing down the cache.

Clear() runs before the file watcher, management thread, and pool workers are stopped, so in-flight callbacks can still re-enter ShaderCache while its state is being torn down. If wait_for() then times out, TerminateThread(managementHandle, 0) still does not address the worker threads you were actually waiting on.

As per coding guidelines, "Include proper resource management and graceful degradation for DirectX 11 resources and user input validation to prevent crashes from malformed configurations".

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/ShaderCache.cpp` around lines 1932 - 1944, The teardown order is unsafe:
Clear() runs while file-watcher, managementJthread and compilationPool workers
may still invoke back into ShaderCache, and the code uses
TerminateThread(managementHandle) which doesn't stop worker threads; change the
shutdown sequence to first stop the file watcher (call StopFileWatcher()),
request_stop() on managementJthread and signal/stop the compilationPool workers,
then purge unstarted tasks and wait/join the managementJthread and worker
threads via safe joins/waits (use managementJthread.join() or equivalent and
compilationPool.wait_for() with a proper stop token), removing any
TerminateThread usage and ensuring Clear() runs last after all background work
has cleanly stopped.

2413-2418: ⚠️ Potential issue | 🔴 Critical

Shut the watcher down before deleting listener.

This thread only exits after UseFileWatcher() flips, but StopFileWatcher() then deletes listener directly. Because src/ShaderCache.h declares UpdateListener::fileWatcherThread before deps, actionMutex, and queue, delete listener destroys the queue/mutex before the std::jthread is joined; fileWatcher is also nulled without ever being deleted.

As per coding guidelines, "Include proper resource management and graceful degradation for DirectX 11 resources and user input validation to prevent crashes from malformed configurations".

Also applies to: 2427-2437

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/ShaderCache.cpp` around lines 2413 - 2418, The spawned file-watcher
jthread can outlive the UpdateListener object because StopFileWatcher() deletes
listener while fileWatcherThread (UpdateListener::fileWatcherThread) may still
be running; fix by ensuring the watcher thread is stopped/joined before deleting
or nulling listener. Modify the code that creates the thread (where
capturedListener->fileWatcherThread = std::jthread(...)) and
StopFileWatcher()/UseFileWatcher() so you either: (a) use a owning shared_ptr
for listener and capture that shared_ptr into the thread lambda to extend
lifetime until processQueue() finishes, or (b) explicitly signal the stop
condition, notify the queue, then call fileWatcherThread.join() / ensure jthread
is joined (or request_stop and wait) inside StopFileWatcher() before deleting
listener and before deleting/clearing deps, actionMutex, and queue; also ensure
fileWatcher (the watcher pointer) is deleted only after the thread has joined.
Ensure references to UpdateListener::processQueue(),
UpdateListener::fileWatcherThread, listener, deps, actionMutex, and queue are
handled in that order to prevent use-after-free.
src/ShaderCache.h (1)

538-543: ⚠️ Potential issue | 🟠 Major

Startup still leaves one worker unused.

compilationThreadCount still starts at hardware_concurrency() - 1, so this does not match the new “all logical cores at startup” policy. Since compilationPool is also constructed from that initial value, increasing the slider later cannot recover that missing startup worker.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/ShaderCache.h` around lines 538 - 543, The startup worker count is wrong:
change compilationThreadCount to use all logical cores (remove the "- 1") so it
matches the "all logical cores at startup" policy and ensure it still clamps to
at least 1; update the construction of BS::thread_pool<> compilationPool to use
that corrected compilationThreadCount; optionally update or remove the comment
about reserving a core for the OS to reflect the new policy. Reference symbols:
compilationThreadCount, compilationPool, std::thread::hardware_concurrency(),
and Util::GetPerformanceCoreCount().
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Outside diff comments:
In `@src/ShaderCache.cpp`:
- Around line 3077-3112: WaitTake() currently always picks the highest-priority
task and only updates heavyTasksInFlight for telemetry, so heavy tasks can
exceed the intended P-core limit; modify the selection logic (in WaitTake()/the
block that reads availableTasks and uses bestIt) to enforce the heavy-task cap:
compute the current heavy limit (use shaderCache->backgroundCompilation ?
shaderCache->backgroundCompilationThreadCount :
shaderCache->compilationThreadCount), and if the candidate task's GetPriority()
>= kHeavyPriorityThreshold and heavyTasksInFlight.load(...) >= heavyLimit, scan
availableTasks from highest priority down to find the first non-heavy task (or
return std::nullopt if none); only after choosing a heavy task increment
heavyTasksInFlight (fetch_add) while still under the same lock and before
inserting into tasksInProgress, so the cap is respected and telemetry remains
accurate.

---

Duplicate comments:
In `@src/ShaderCache.cpp`:
- Around line 1347-1355: CompileShader() calls cache.ClaimCompilation(key) which
sets the slot to Pending but does not call ResolvePendingFailure() on
synchronous exceptions, leaving threads blocked on mapCV; modify CompileShader()
to wrap the work done after claiming (the calls to MakeAndAdd*Shader() /
filesystem/D3D operations) in a try/catch that on any exception calls
cache.ResolvePendingFailure(key, exception) (or the equivalent resolve/fail
method) and then rethrows, and ensure the catch also signals waiting threads via
mapCV so Pending is cleared; this mirrors the async path behavior and prevents
permanent Pending entries.
- Around line 1932-1944: The teardown order is unsafe: Clear() runs while
file-watcher, managementJthread and compilationPool workers may still invoke
back into ShaderCache, and the code uses TerminateThread(managementHandle) which
doesn't stop worker threads; change the shutdown sequence to first stop the file
watcher (call StopFileWatcher()), request_stop() on managementJthread and
signal/stop the compilationPool workers, then purge unstarted tasks and
wait/join the managementJthread and worker threads via safe joins/waits (use
managementJthread.join() or equivalent and compilationPool.wait_for() with a
proper stop token), removing any TerminateThread usage and ensuring Clear() runs
last after all background work has cleanly stopped.
- Around line 2413-2418: The spawned file-watcher jthread can outlive the
UpdateListener object because StopFileWatcher() deletes listener while
fileWatcherThread (UpdateListener::fileWatcherThread) may still be running; fix
by ensuring the watcher thread is stopped/joined before deleting or nulling
listener. Modify the code that creates the thread (where
capturedListener->fileWatcherThread = std::jthread(...)) and
StopFileWatcher()/UseFileWatcher() so you either: (a) use a owning shared_ptr
for listener and capture that shared_ptr into the thread lambda to extend
lifetime until processQueue() finishes, or (b) explicitly signal the stop
condition, notify the queue, then call fileWatcherThread.join() / ensure jthread
is joined (or request_stop and wait) inside StopFileWatcher() before deleting
listener and before deleting/clearing deps, actionMutex, and queue; also ensure
fileWatcher (the watcher pointer) is deleted only after the thread has joined.
Ensure references to UpdateListener::processQueue(),
UpdateListener::fileWatcherThread, listener, deps, actionMutex, and queue are
handled in that order to prevent use-after-free.

In `@src/ShaderCache.h`:
- Around line 538-543: The startup worker count is wrong: change
compilationThreadCount to use all logical cores (remove the "- 1") so it matches
the "all logical cores at startup" policy and ensure it still clamps to at least
1; update the construction of BS::thread_pool<> compilationPool to use that
corrected compilationThreadCount; optionally update or remove the comment about
reserving a core for the OS to reflect the new policy. Reference symbols:
compilationThreadCount, compilationPool, std::thread::hardware_concurrency(),
and Util::GetPerformanceCoreCount().

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: d8403983-3619-4d3c-bf43-56d41cebef55

📥 Commits

Reviewing files that changed from the base of the PR and between 0943ebc and b126603.

📒 Files selected for processing (3)
  • src/Menu/AdvancedSettingsRenderer.cpp
  • src/ShaderCache.cpp
  • src/ShaderCache.h

@alandtse alandtse merged commit 72143ae into community-shaders:dev Mar 31, 2026
17 checks passed
alandtse added a commit that referenced this pull request Mar 31, 2026
ParticleTroned pushed a commit to ParticleTroned/skyrim-community-shaders that referenced this pull request May 2, 2026
…-shaders#2021)

Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>
(cherry picked from commit 72143ae)
ParticleTroned pushed a commit to ParticleTroned/skyrim-community-shaders that referenced this pull request May 2, 2026
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants