perf(shader-cache): hybrid-CPU scheduling and LPT dispatch#2021
Conversation
|
Note Reviews pausedIt looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior by changing the Use the following commands to manage reviews:
Use the checkboxes below for quick actions:
📝 WalkthroughWalkthroughReplaced hardware-concurrency sizing with a perf-core query; added Util::GetPerformanceCoreCount and FormatDuration; implemented per-key claim/wait shader compilation with priority-weighted scheduling, heavy-task throttling, dedicated management/file-watcher threads, slow-task instrumentation, and developer UI/overlay updates. Changes
Sequence Diagram(s)sequenceDiagram
participant Caller as Compile caller
participant Cache as ShaderCache
participant Pool as Compilation Pool
participant Mgmt as Management Thread
participant FW as FileWatcher Thread
Caller->>Cache: CompileShader(key)
Cache->>Cache: ClaimCompilation(key)
alt CacheHit
Cache-->>Caller: (CacheHit, blob)
else AlreadyPending
Cache-->>Caller: wait on mapCV until Completed/Failed
else Claimed
Cache-->>Caller: Claimed (will compile)
Caller->>Pool: Submit ShaderCompilationTask(priority)
par Pool compiles
Pool->>Pool: ComputePriority, set thread priority
Pool->>Pool: Execute task, measure wall-time
Pool->>Cache: Record Completed/Failed (blob/nullptr)
Cache->>Cache: Update slow/very-slow counters & weights
Cache->>Cache: mapCV.notify_all()
and Mgmt enforces limits
Mgmt->>Pool: Enforce heavyTasksInFlight limit (perf-core based)
and FW processes file changes
FW->>Cache: enqueue file-watcher events
end
Cache-->>Caller: Unblocked on mapCV signal
end
Estimated code review effort🎯 4 (Complex) | ⏱️ ~50 minutes Possibly related PRs
Suggested reviewers
Poem
🚥 Pre-merge checks | ✅ 2 | ❌ 1❌ Failed checks (1 warning)
✅ Passed checks (2 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches🧪 Generate unit tests (beta)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
|
No actionable suggestions for changed features. |
There was a problem hiding this comment.
Actionable comments posted: 3
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@src/ShaderCache.cpp`:
- Around line 2354-2358: The destructor’s timeout fallback escalates the wrong
thread: after moving ManageCompilationSet onto managementJthread, update the
timeout recovery path to target the managementJthread (and the
ManageCompilationSet stop/join logic) instead of the old managementThread/worker
escalation; specifically, when waiting on the compilation pool times out, send
the stop request and join/interrupt the managementJthread (or invoke its stop
token/notify path tied to ManageCompilationSet) ensuring you check for
managementJthread's existence and join/cleanup it rather than trying to escalate
pool worker threads.
- Around line 2392-2394: The new watcher thread captures ShaderCache::listener
by reference causing a race where ShaderCache::listener can be nulled before the
thread runs; update the jthread creation to capture the concrete
watcher/listener instance by value (copy or shared_ptr) so the thread holds its
own lifetime, and modify StopFileWatcher() to explicitly request stop and join
the listener->fileWatcherThread (or call .join()/.request_stop() on the jthread)
before deleting or nulling any watcher/listener objects; ensure processQueue()
and the watcher destructor are safe to run after request_stop and only then
reset raw pointers to nullptr to avoid null-deref/lifetime bugs.
In `@src/ShaderCache.h`:
- Around line 483-488: The code still subtracts 1 from
std::thread::hardware_concurrency() so startup doesn’t actually use all logical
cores; change compilationThreadCount to use the full logical core count (remove
the "- 1") or update the comment/UI if you intend to keep reserving a core, and
initialize BS::thread_pool compilationPool with that compilationThreadCount (use
compilationThreadCount as the pool size instead of calling
hardware_concurrency() again) so the pool size matches the intended startup
behavior; ensure references are to compilationThreadCount,
backgroundCompilationThreadCount, and compilationPool.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
Run ID: 15b0e8ee-50b0-4bc2-89d1-34ec8878cf8c
📒 Files selected for processing (8)
src/Features/UnifiedWater/WaterCache.cppsrc/Menu/AdvancedSettingsRenderer.cppsrc/Menu/OverlayRenderer.cppsrc/Menu/SettingsTabRenderer.cppsrc/ShaderCache.cppsrc/ShaderCache.hsrc/Utils/WinApi.cppsrc/Utils/WinApi.h
|
✅ A pre-release build is available for this PR: |
Detect P-core count via GetLogicalProcessorInformationEx so thread limits are meaningful on Intel hybrid CPUs (Alder/Raptor Lake). Result is cached — topology never changes at runtime. Thread pool sizing: - Startup: all logical cores (E-cores help; no game loop running) - Background (in-game): half of P-core count to spare the render thread - Management and file-watcher moved to dedicated std::jthread so they no longer consume pool slots or inflate get_tasks_running() Compilation dispatch (WaitTake): - Tasks are assigned a static priority score (LPT: Longest Processing Time first) based on shader type, class, descriptor popcount, and known-heavy techniques (MTLand, TruePBR, AnisoLighting, Parallax) - Heavy tasks (score >= 500) are limited to P-core count concurrent slots on hybrid CPUs, leaving E-cores for lighter permutations - Reduces tail latency caused by heavy shaders landing on slow cores Compilation correctness: - ClaimCompilation atomically marks a shader Pending before releasing mapMutex, preventing redundant parallel compilation of the same key - Threads waiting on a Pending entry block on mapCV and resume when the compiling thread calls AddCompletedShader Observability: - ProcessCompilationSet records wall-clock time per task; logs slow (>=2s) at debug and very slow (>=8s) at info with descriptor complexity and source file size as straggler proxies - ETA now uses priority-weighted progress fraction over wall-clock elapsed time, which is more accurate under parallel workloads - Dev overlay shows active threads, heavy-task concurrency, and running slow/very-slow counts - Settings tab adds a stacked bar breaking down the last build by deduplicated, fast, slow, very slow, and failed shaders Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
There was a problem hiding this comment.
Actionable comments posted: 2
🧹 Nitpick comments (1)
src/Utils/Format.h (1)
69-77: Clarify duration formatting contract in docstringConsider documenting how fractional, negative, and invalid inputs are handled (truncate vs round; clamp behavior). It will prevent API ambiguity for call sites.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@src/Utils/Format.h` around lines 69 - 77, Update the FormatDuration(double ms) docstring to explicitly state how fractional milliseconds, negative values, and invalid inputs are handled: specify whether the function rounds or truncates fractional milliseconds when computing seconds/minutes/hours, whether negative durations are clamped to "00:00:00" or prefixed with a "-" sign, and what constitutes an "invalid" input (e.g., NaN/inf) and how it is handled (e.g., treat as zero or return an error string). Mention the time range behavior as well (if durations beyond 24 hours are allowed and how hours are formatted) so callers of FormatDuration know the exact contract.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@src/Menu/SettingsTabRenderer.cpp`:
- Around line 243-246: The compiled bucket calculation wrongly subtracts
cacheHits from completed (variables compiled, completed, cacheHits) even though
cache hits are mutually exclusive with queued/compiled tasks; change the
calculation to use completed directly (uint64_t compiled = completed) so
compiled represents all completed compilations, and keep the downstream bucket
math (fast = compiled > slow ? compiled - slow : 0, medium = slow > verySlow ?
slow - verySlow : 0) unchanged; update any comment explaining "Fast = compiled
successfully..." to reflect that compiled already excludes cache hits.
In `@src/Utils/Format.cpp`:
- Around line 161-168: FormatDuration currently casts milliseconds to int and
does not guard against negative or non-finite inputs and potential overflow;
update FormatDuration to first check std::isfinite(ms) and clamp non-finite or
negative values to 0, convert ms to a 64-bit integer (e.g., int64_t total_ms =
static_cast<int64_t>(ms)) before doing arithmetic to avoid narrowing/overflow,
compute seconds from total_ms using 64-bit math, and optionally cap the result
to a sane max (e.g., clamp total seconds to a reasonably large value) before
formatting hours/minutes/seconds with fmt::format so invalid inputs and very
large durations are handled safely.
---
Nitpick comments:
In `@src/Utils/Format.h`:
- Around line 69-77: Update the FormatDuration(double ms) docstring to
explicitly state how fractional milliseconds, negative values, and invalid
inputs are handled: specify whether the function rounds or truncates fractional
milliseconds when computing seconds/minutes/hours, whether negative durations
are clamped to "00:00:00" or prefixed with a "-" sign, and what constitutes an
"invalid" input (e.g., NaN/inf) and how it is handled (e.g., treat as zero or
return an error string). Mention the time range behavior as well (if durations
beyond 24 hours are allowed and how hours are formatted) so callers of
FormatDuration know the exact contract.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
Run ID: 07d97057-b97a-40d7-85f1-66207cb16a41
📒 Files selected for processing (10)
src/Features/UnifiedWater/WaterCache.cppsrc/Menu/AdvancedSettingsRenderer.cppsrc/Menu/OverlayRenderer.cppsrc/Menu/SettingsTabRenderer.cppsrc/ShaderCache.cppsrc/ShaderCache.hsrc/Utils/Format.cppsrc/Utils/Format.hsrc/Utils/WinApi.cppsrc/Utils/WinApi.h
✅ Files skipped from review due to trivial changes (1)
- src/ShaderCache.h
🚧 Files skipped from review as they are similar to previous changes (4)
- src/Features/UnifiedWater/WaterCache.cpp
- src/Utils/WinApi.h
- src/Utils/WinApi.cpp
- src/ShaderCache.cpp
There was a problem hiding this comment.
Actionable comments posted: 1
♻️ Duplicate comments (1)
src/ShaderCache.h (1)
504-510:⚠️ Potential issue | 🟡 MinorThread count variable and pool size are inconsistent.
compilationThreadCountishardware_concurrency() - 1, butcompilationPoolis initialized with the fullhardware_concurrency(). The comment says "all logical cores minus one at startup for OS headroom," yet the pool itself uses all cores.If the intent is to use all cores at startup (since management/file-watcher are on dedicated jthreads), then
compilationThreadCountshould match the pool size. If the intent is to reserve one core, the pool should usecompilationThreadCount.🔧 Option A: Use all cores (align variable with pool)
- int32_t compilationThreadCount = std::max(static_cast<int32_t>(std::thread::hardware_concurrency()) - 1, 1); + int32_t compilationThreadCount = std::max(static_cast<int32_t>(std::thread::hardware_concurrency()), 1);🔧 Option B: Reserve one core (align pool with variable)
- BS::thread_pool<> compilationPool{ static_cast<std::size_t>(std::thread::hardware_concurrency()) }; + BS::thread_pool<> compilationPool{ static_cast<std::size_t>(compilationThreadCount) };🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@src/ShaderCache.h` around lines 504 - 510, compilationThreadCount, backgroundCompilationThreadCount and compilationPool are inconsistent: compilationThreadCount uses hardware_concurrency()-1 while compilationPool is constructed with the full hardware_concurrency(); pick one approach and make them match. Either set compilationPool to use static_cast<std::size_t>(compilationThreadCount) (to reserve one core) or change compilationThreadCount to static_cast<int32_t>(std::thread::hardware_concurrency()) (to use all cores); ensure types/casts remain consistent and keep backgroundCompilationThreadCount logic unchanged (it can still use Util::GetPerformanceCoreCount()/2).
🧹 Nitpick comments (1)
src/ShaderCache.h (1)
289-289: Consider usingstd::atomic<uint32_t>forheavyTasksInFlight.Using
intallows negative values if there's a bug in increment/decrement pairing. An unsigned type would make such bugs more apparent (wrap-around) and semantically reflects that task counts cannot be negative.♻️ Suggested change
- std::atomic<int> heavyTasksInFlight = 0; // number of dispatched heavy (>= kHeavyPriorityThreshold) tasks still running + std::atomic<uint32_t> heavyTasksInFlight = 0; // number of dispatched heavy (>= kHeavyPriorityThreshold) tasks still running🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@src/ShaderCache.h` at line 289, Change the atomic type for the task counter to an unsigned integral to better express non-negativity and catch decrement bugs: replace std::atomic<int> heavyTasksInFlight with std::atomic<uint32_t> (or std::atomic<std::uint32_t>) and update any code that reads, increments, or decrements heavyTasksInFlight (operators, comparisons, initializers) to use the unsigned type consistently, ensuring any signed-to-unsigned conversions are handled where necessary (e.g., comparisons against 0 or casting when interfacing with APIs expecting signed ints).
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@src/ShaderCache.h`:
- Around line 437-443: ProcessCompilationSet currently calls task.Perform()
(which invokes compile helpers like MakeAndAddVertexShader) without guarding
against exceptions, so a Pending entry from ClaimCompilation can remain
unresolved and block waiting threads; modify ProcessCompilationSet (around the
task.Perform() call) to wrap the compilation in a try/catch or use a RAII guard
that on exit always calls AddCompletedShader with a failed result and then
mapCV.notify_all(), ensuring any Pending entry is transitioned to Failed and
waiting ClaimCompilation callers are awakened even if compilation throws. Ensure
the guard or catch references the same Pending key used by ClaimCompilation and
calls mapCV.notify_all() after AddCompletedShader to maintain the existing
wake-up semantics.
---
Duplicate comments:
In `@src/ShaderCache.h`:
- Around line 504-510: compilationThreadCount, backgroundCompilationThreadCount
and compilationPool are inconsistent: compilationThreadCount uses
hardware_concurrency()-1 while compilationPool is constructed with the full
hardware_concurrency(); pick one approach and make them match. Either set
compilationPool to use static_cast<std::size_t>(compilationThreadCount) (to
reserve one core) or change compilationThreadCount to
static_cast<int32_t>(std::thread::hardware_concurrency()) (to use all cores);
ensure types/casts remain consistent and keep backgroundCompilationThreadCount
logic unchanged (it can still use Util::GetPerformanceCoreCount()/2).
---
Nitpick comments:
In `@src/ShaderCache.h`:
- Line 289: Change the atomic type for the task counter to an unsigned integral
to better express non-negativity and catch decrement bugs: replace
std::atomic<int> heavyTasksInFlight with std::atomic<uint32_t> (or
std::atomic<std::uint32_t>) and update any code that reads, increments, or
decrements heavyTasksInFlight (operators, comparisons, initializers) to use the
unsigned type consistently, ensuring any signed-to-unsigned conversions are
handled where necessary (e.g., comparisons against 0 or casting when interfacing
with APIs expecting signed ints).
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
Run ID: 654fd53a-dc77-41c8-a266-a0aacfc7883b
📒 Files selected for processing (4)
src/Menu/SettingsTabRenderer.cppsrc/ShaderCache.hsrc/Utils/Format.cppsrc/Utils/Format.h
🚧 Files skipped from review as they are similar to previous changes (2)
- src/Utils/Format.cpp
- src/Menu/SettingsTabRenderer.cpp
There was a problem hiding this comment.
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
src/ShaderCache.cpp (1)
3077-3112:⚠️ Potential issue | 🟠 MajorThe heavy-task cap is never enforced.
WaitTake()always dispatches the highest-priority task whenever the pool has room, andheavyTasksInFlightis only updated for telemetry. Once the heavy count reaches the intended P-core limit, this still picks another heavy task instead of falling back to a light one.Based on learnings, "Flag potential problems proactively including performance impact on rendering, SE/AE/VR runtime compatibility issues, GPU register conflicts, and security risks from malformed configurations".
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@src/ShaderCache.cpp` around lines 3077 - 3112, WaitTake() currently always picks the highest-priority task and only updates heavyTasksInFlight for telemetry, so heavy tasks can exceed the intended P-core limit; modify the selection logic (in WaitTake()/the block that reads availableTasks and uses bestIt) to enforce the heavy-task cap: compute the current heavy limit (use shaderCache->backgroundCompilation ? shaderCache->backgroundCompilationThreadCount : shaderCache->compilationThreadCount), and if the candidate task's GetPriority() >= kHeavyPriorityThreshold and heavyTasksInFlight.load(...) >= heavyLimit, scan availableTasks from highest priority down to find the first non-heavy task (or return std::nullopt if none); only after choosing a heavy task increment heavyTasksInFlight (fetch_add) while still under the same lock and before inserting into tasksInProgress, so the cap is respected and telemetry remains accurate.
♻️ Duplicate comments (4)
src/ShaderCache.cpp (3)
1347-1355:⚠️ Potential issue | 🟠 MajorResolve
PendinginsideCompileShader().
ClaimCompilation()marks the keyPendingbefore any filesystem/D3D work. The async path catches and callsResolvePendingFailure(), but synchronousMakeAndAdd*Shader()callers do not, so an exception out of this function can leave future callers blocked onmapCVforever.🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@src/ShaderCache.cpp` around lines 1347 - 1355, CompileShader() calls cache.ClaimCompilation(key) which sets the slot to Pending but does not call ResolvePendingFailure() on synchronous exceptions, leaving threads blocked on mapCV; modify CompileShader() to wrap the work done after claiming (the calls to MakeAndAdd*Shader() / filesystem/D3D operations) in a try/catch that on any exception calls cache.ResolvePendingFailure(key, exception) (or the equivalent resolve/fail method) and then rethrows, and ensure the catch also signals waiting threads via mapCV so Pending is cleared; this mirrors the async path behavior and prevents permanent Pending entries.
1932-1944:⚠️ Potential issue | 🔴 CriticalStop/join background work before tearing down the cache.
Clear()runs before the file watcher, management thread, and pool workers are stopped, so in-flight callbacks can still re-enterShaderCachewhile its state is being torn down. Ifwait_for()then times out,TerminateThread(managementHandle, 0)still does not address the worker threads you were actually waiting on.As per coding guidelines, "Include proper resource management and graceful degradation for DirectX 11 resources and user input validation to prevent crashes from malformed configurations".
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@src/ShaderCache.cpp` around lines 1932 - 1944, The teardown order is unsafe: Clear() runs while file-watcher, managementJthread and compilationPool workers may still invoke back into ShaderCache, and the code uses TerminateThread(managementHandle) which doesn't stop worker threads; change the shutdown sequence to first stop the file watcher (call StopFileWatcher()), request_stop() on managementJthread and signal/stop the compilationPool workers, then purge unstarted tasks and wait/join the managementJthread and worker threads via safe joins/waits (use managementJthread.join() or equivalent and compilationPool.wait_for() with a proper stop token), removing any TerminateThread usage and ensuring Clear() runs last after all background work has cleanly stopped.
2413-2418:⚠️ Potential issue | 🔴 CriticalShut the watcher down before deleting
listener.This thread only exits after
UseFileWatcher()flips, butStopFileWatcher()then deleteslistenerdirectly. Becausesrc/ShaderCache.hdeclaresUpdateListener::fileWatcherThreadbeforedeps,actionMutex, andqueue,delete listenerdestroys the queue/mutex before thestd::jthreadis joined;fileWatcheris also nulled without ever being deleted.As per coding guidelines, "Include proper resource management and graceful degradation for DirectX 11 resources and user input validation to prevent crashes from malformed configurations".
Also applies to: 2427-2437
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@src/ShaderCache.cpp` around lines 2413 - 2418, The spawned file-watcher jthread can outlive the UpdateListener object because StopFileWatcher() deletes listener while fileWatcherThread (UpdateListener::fileWatcherThread) may still be running; fix by ensuring the watcher thread is stopped/joined before deleting or nulling listener. Modify the code that creates the thread (where capturedListener->fileWatcherThread = std::jthread(...)) and StopFileWatcher()/UseFileWatcher() so you either: (a) use a owning shared_ptr for listener and capture that shared_ptr into the thread lambda to extend lifetime until processQueue() finishes, or (b) explicitly signal the stop condition, notify the queue, then call fileWatcherThread.join() / ensure jthread is joined (or request_stop and wait) inside StopFileWatcher() before deleting listener and before deleting/clearing deps, actionMutex, and queue; also ensure fileWatcher (the watcher pointer) is deleted only after the thread has joined. Ensure references to UpdateListener::processQueue(), UpdateListener::fileWatcherThread, listener, deps, actionMutex, and queue are handled in that order to prevent use-after-free.src/ShaderCache.h (1)
538-543:⚠️ Potential issue | 🟠 MajorStartup still leaves one worker unused.
compilationThreadCountstill starts athardware_concurrency() - 1, so this does not match the new “all logical cores at startup” policy. SincecompilationPoolis also constructed from that initial value, increasing the slider later cannot recover that missing startup worker.🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@src/ShaderCache.h` around lines 538 - 543, The startup worker count is wrong: change compilationThreadCount to use all logical cores (remove the "- 1") so it matches the "all logical cores at startup" policy and ensure it still clamps to at least 1; update the construction of BS::thread_pool<> compilationPool to use that corrected compilationThreadCount; optionally update or remove the comment about reserving a core for the OS to reflect the new policy. Reference symbols: compilationThreadCount, compilationPool, std::thread::hardware_concurrency(), and Util::GetPerformanceCoreCount().
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Outside diff comments:
In `@src/ShaderCache.cpp`:
- Around line 3077-3112: WaitTake() currently always picks the highest-priority
task and only updates heavyTasksInFlight for telemetry, so heavy tasks can
exceed the intended P-core limit; modify the selection logic (in WaitTake()/the
block that reads availableTasks and uses bestIt) to enforce the heavy-task cap:
compute the current heavy limit (use shaderCache->backgroundCompilation ?
shaderCache->backgroundCompilationThreadCount :
shaderCache->compilationThreadCount), and if the candidate task's GetPriority()
>= kHeavyPriorityThreshold and heavyTasksInFlight.load(...) >= heavyLimit, scan
availableTasks from highest priority down to find the first non-heavy task (or
return std::nullopt if none); only after choosing a heavy task increment
heavyTasksInFlight (fetch_add) while still under the same lock and before
inserting into tasksInProgress, so the cap is respected and telemetry remains
accurate.
---
Duplicate comments:
In `@src/ShaderCache.cpp`:
- Around line 1347-1355: CompileShader() calls cache.ClaimCompilation(key) which
sets the slot to Pending but does not call ResolvePendingFailure() on
synchronous exceptions, leaving threads blocked on mapCV; modify CompileShader()
to wrap the work done after claiming (the calls to MakeAndAdd*Shader() /
filesystem/D3D operations) in a try/catch that on any exception calls
cache.ResolvePendingFailure(key, exception) (or the equivalent resolve/fail
method) and then rethrows, and ensure the catch also signals waiting threads via
mapCV so Pending is cleared; this mirrors the async path behavior and prevents
permanent Pending entries.
- Around line 1932-1944: The teardown order is unsafe: Clear() runs while
file-watcher, managementJthread and compilationPool workers may still invoke
back into ShaderCache, and the code uses TerminateThread(managementHandle) which
doesn't stop worker threads; change the shutdown sequence to first stop the file
watcher (call StopFileWatcher()), request_stop() on managementJthread and
signal/stop the compilationPool workers, then purge unstarted tasks and
wait/join the managementJthread and worker threads via safe joins/waits (use
managementJthread.join() or equivalent and compilationPool.wait_for() with a
proper stop token), removing any TerminateThread usage and ensuring Clear() runs
last after all background work has cleanly stopped.
- Around line 2413-2418: The spawned file-watcher jthread can outlive the
UpdateListener object because StopFileWatcher() deletes listener while
fileWatcherThread (UpdateListener::fileWatcherThread) may still be running; fix
by ensuring the watcher thread is stopped/joined before deleting or nulling
listener. Modify the code that creates the thread (where
capturedListener->fileWatcherThread = std::jthread(...)) and
StopFileWatcher()/UseFileWatcher() so you either: (a) use a owning shared_ptr
for listener and capture that shared_ptr into the thread lambda to extend
lifetime until processQueue() finishes, or (b) explicitly signal the stop
condition, notify the queue, then call fileWatcherThread.join() / ensure jthread
is joined (or request_stop and wait) inside StopFileWatcher() before deleting
listener and before deleting/clearing deps, actionMutex, and queue; also ensure
fileWatcher (the watcher pointer) is deleted only after the thread has joined.
Ensure references to UpdateListener::processQueue(),
UpdateListener::fileWatcherThread, listener, deps, actionMutex, and queue are
handled in that order to prevent use-after-free.
In `@src/ShaderCache.h`:
- Around line 538-543: The startup worker count is wrong: change
compilationThreadCount to use all logical cores (remove the "- 1") so it matches
the "all logical cores at startup" policy and ensure it still clamps to at least
1; update the construction of BS::thread_pool<> compilationPool to use that
corrected compilationThreadCount; optionally update or remove the comment about
reserving a core for the OS to reflect the new policy. Reference symbols:
compilationThreadCount, compilationPool, std::thread::hardware_concurrency(),
and Util::GetPerformanceCoreCount().
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
Run ID: d8403983-3619-4d3c-bf43-56d41cebef55
📒 Files selected for processing (3)
src/Menu/AdvancedSettingsRenderer.cppsrc/ShaderCache.cppsrc/ShaderCache.h
…-shaders#2021) Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com> (cherry picked from commit 72143ae)
closes community-shaders#2021 (cherry picked from commit 15df3df)
Detect P-core count via GetLogicalProcessorInformationEx so thread limits are meaningful on Intel hybrid CPUs (Alder/Raptor Lake). Result is cached — topology never changes at runtime.
Thread pool sizing:
Compilation dispatch (WaitTake):
Compilation correctness:
Observability:
Summary by CodeRabbit
New Features
Improvements