Conversation
Adds D3D11 timestamp query and QPC-based profiling with LegitProfiler graphs, a Statistics built-in menu tab, and per-feature Profiling sections with GPU/CPU toggle. Instruments all feature compute dispatches. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
|
Note Reviews pausedIt looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior by changing the Use the following commands to manage reviews:
Use the checkboxes below for quick actions:
📝 WalkthroughWalkthroughAdds a D3D11 GPU/CPU profiler, wires a global profiler instance, instruments many compute/draw sites with named pass scopes, and adds ImGui visualizations (graphs, tables, per-group/per-pass aggregates, percentiles, and per-feature views). ChangesGPU/CPU Profiling System
Sequence DiagramsequenceDiagram
participant Engine as Engine
participant State as State
participant Profiler as globals::profiler
participant D3D11 as D3D11
participant Feature as Feature GPU Code
participant UI as Profiling UI
Engine->>State: SetupResources()
State->>Profiler: Initialize(device, context)
Profiler->>D3D11: Create timestamp queries\ncompute tick->ms
loop Per frame
Engine->>Profiler: BeginFrame()
Feature->>Profiler: BeginPass("name")
Profiler->>D3D11: Issue GPU timestamp begin
Feature->>D3D11: GPU Dispatch/Draw
Profiler->>D3D11: Issue GPU timestamp end
Feature->>Profiler: EndPass()
Engine->>Profiler: EndFrame()
end
Note over Profiler: After kFrameLatency
Profiler->>D3D11: Read queries
Profiler->>Profiler: Compute GPU/CPU ms, update history
UI->>Profiler: GetResults()
Profiler-->>UI: TimerResult list
Estimated code review effort🎯 4 (Complex) | ⏱️ ~60 minutes Possibly related PRs
Suggested reviewers
🚥 Pre-merge checks | ✅ 4 | ❌ 1❌ Failed checks (1 warning)
✅ Passed checks (4 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches🧪 Generate unit tests (beta)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
|
No actionable suggestions for changed features. |
There was a problem hiding this comment.
Actionable comments posted: 5
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
src/Features/Upscaling/RCAS/RCAS.cpp (1)
48-83:⚠️ Potential issue | 🟡 Minor | ⚡ Quick winFix the profiler/perf-event nesting order.
The profiler pass and state perf event are not properly nested. Currently:
Line 48: profiler->BeginPass Line 49: state->BeginPerfEvent Line 81: profiler->EndPass ← ends before state Line 82: state->EndPerfEventThis creates overlapping scopes rather than proper nesting. Compare with the pattern in
Upscaling.cpplines 1717-1777, wherestate->EndPerfEvent()is called beforeprofiler->EndPass().🔄 Proposed fix
context->CSSetShader(nullptr, nullptr, 0); - globals::profiler->EndPass(); state->EndPerfEvent(); + globals::profiler->EndPass(); }🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@src/Features/Upscaling/RCAS/RCAS.cpp` around lines 48 - 83, The profiler and perf-event scopes are closed in the wrong order; move the perf-event end call before ending the profiler pass so nesting matches BeginPass/BeginPerfEvent order — specifically, call state->EndPerfEvent() prior to globals::profiler->EndPass() at the end of the RCAS sharpening block (matching the pattern used in Upscaling.cpp), leaving all other dispatch and resource unbind calls intact.
🧹 Nitpick comments (1)
src/Features/ScreenSpaceShadows.cpp (1)
198-200: ⚡ Quick winAvoid per-frame
std::formatfor profiler pass names in the render hot path.Line 198 builds a formatted string every eye/frame. Use fixed pass-name literals selected by branch to remove avoidable CPU overhead.
As per coding guidelines, "Flag potential performance impacts on rendering performance when implementing graphics features and suggest user toggles for feature control".
♻️ Proposed refactor
-auto DispatchEye = [&](const char* eyeName, ID3D11ComputeShader* shader, const float* lightProj, - float invTexSizeX, float invTexSizeY) { - std::string timerName = eyeName ? std::format("ScreenSpaceShadows::RayMarch({})", eyeName) : "ScreenSpaceShadows::RayMarch"; - globals::profiler->BeginPass(timerName); +auto DispatchEye = [&](const char* eyeName, const char* profilerPassName, ID3D11ComputeShader* shader, const float* lightProj, + float invTexSizeX, float invTexSizeY) { + globals::profiler->BeginPass(profilerPassName);-DispatchEye(nullptr, GetComputeRaymarch(), lightProjectionF.data(), InvTexSizeX, InvTexSizeY); +DispatchEye(nullptr, "ScreenSpaceShadows::RayMarch", GetComputeRaymarch(), lightProjectionF.data(), InvTexSizeX, InvTexSizeY); ... -DispatchEye("Left Eye", GetComputeRaymarch(), lightProjectionF.data(), InvTexSizeX, InvTexSizeY); +DispatchEye("Left Eye", "ScreenSpaceShadows::RayMarch(LeftEye)", GetComputeRaymarch(), lightProjectionF.data(), InvTexSizeX, InvTexSizeY); ... -DispatchEye("Right Eye", GetComputeRaymarchRight(), lightProjectionRightF.data(), InvTexSizeX, InvTexSizeY); +DispatchEye("Right Eye", "ScreenSpaceShadows::RayMarch(RightEye)", GetComputeRaymarchRight(), lightProjectionRightF.data(), InvTexSizeX, InvTexSizeY);🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@src/Features/ScreenSpaceShadows.cpp` around lines 198 - 200, The code currently constructs a per-frame std::format string into timerName using eyeName and passes it to globals::profiler->BeginPass, causing unnecessary allocations in the render hot path; replace this with two fixed string literals (e.g., "ScreenSpaceShadows::RayMarch_Left" and "ScreenSpaceShadows::RayMarch_Right" or a generic "ScreenSpaceShadows::RayMarch" when eyeName is null) and select the appropriate literal by branching on eyeName before calling globals::profiler->BeginPass to eliminate per-frame formatting and allocations while preserving distinct pass names for profiling.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@src/Menu/StatisticsRenderer.cpp`:
- Around line 93-95: The code unconditionally dereferences globals::profiler
(via "auto& profiler = (*globals::profiler)") which can crash if it’s null;
update each render entry point that uses this pattern (the blocks at the diff
and the similar locations around lines 146-147 and 293-295) to first check if
globals::profiler is non-null, and if null either early-return or use a safe
fallback (e.g., skip rendering the profiler UI, set results to an empty/default
value, and log a warning), then only call profiler.GetResults() and evaluate
timingMode/TimmingMode::CPU after the null check so no dereference occurs on a
null pointer.
In `@src/Profiler.cpp`:
- Around line 188-190: activeTimers currently overwrites entries when the same
pass name appears multiple times (activeTimers[timer.name] = { ms, timer.cpuMs
};), dropping earlier samples and underreporting; change the logic in
Profiler.cpp where you populate activeTimers to check for an existing entry for
timer.name and accumulate the values (add ms to the existing elapsed ms and add
timer.cpuMs to the existing cpu ms) rather than replacing them, and continue to
increment activeTotalMs by the ms for each occurrence so total time remains
correct.
- Around line 44-55: The CreateQuery calls inside the frames loop
(device->CreateQuery for frame.disjoint and for each timer.begin/timer.end)
currently ignore HRESULTs and can leave null/invalid ComPtrs; change these to
check the returned HRESULT (use SUCCEEDED/FAILED) for each CreateQuery, log or
mark the frame as profiling-disabled on failure, release any partially-created
queries for that frame, and skip issuing Begin/End/GetData for frames whose
queries failed (or propagate an error state) so later calls don't operate on
null queries; ensure this logic covers creation of frame.disjoint, all
timer.begin/timer.end in the frame.timers array and uses safe guards around
kMaxTimers handling.
In `@src/State.cpp`:
- Around line 763-767: When frameAnnotations is false you must clear the
existing profiler callbacks so perf events stop firing; update the conditional
around globals::profiler->SetPerfEventCallbacks to set the
BeginPerfEvent/EndPerfEvent lambdas when frameAnnotations is true and to
clear/reset (pass no-op lambdas or nullptr equivalents) when false so the
profiler no longer invokes BeginPerfEvent/EndPerfEvent after annotations are
toggled off; refer to globals::profiler->SetPerfEventCallbacks, BeginPerfEvent,
EndPerfEvent, and frameAnnotations to locate the change.
In `@src/Utils/LegitProfiler.h`:
- Around line 7-11: This header is not self-contained: add the missing includes
<algorithm> (for std::sort, std::min, std::max), <cstdint> (for uint32_t) and
<cmath> (for floating-point std::abs overload) to the top of LegitProfiler.h,
and update the abs usage to std::abs to ensure the floating-point overload is
selected; also qualify calls to sort/min/max with std:: where appropriate to
avoid relying on transitive includes.
---
Outside diff comments:
In `@src/Features/Upscaling/RCAS/RCAS.cpp`:
- Around line 48-83: The profiler and perf-event scopes are closed in the wrong
order; move the perf-event end call before ending the profiler pass so nesting
matches BeginPass/BeginPerfEvent order — specifically, call
state->EndPerfEvent() prior to globals::profiler->EndPass() at the end of the
RCAS sharpening block (matching the pattern used in Upscaling.cpp), leaving all
other dispatch and resource unbind calls intact.
---
Nitpick comments:
In `@src/Features/ScreenSpaceShadows.cpp`:
- Around line 198-200: The code currently constructs a per-frame std::format
string into timerName using eyeName and passes it to
globals::profiler->BeginPass, causing unnecessary allocations in the render hot
path; replace this with two fixed string literals (e.g.,
"ScreenSpaceShadows::RayMarch_Left" and "ScreenSpaceShadows::RayMarch_Right" or
a generic "ScreenSpaceShadows::RayMarch" when eyeName is null) and select the
appropriate literal by branching on eyeName before calling
globals::profiler->BeginPass to eliminate per-frame formatting and allocations
while preserving distinct pass names for profiling.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro Plus
Run ID: ed7f2a83-cd32-4f8f-ae2c-90b7acd8ebc6
📒 Files selected for processing (28)
include/PCH.hsrc/Deferred.cppsrc/Features/DynamicCubemaps.cppsrc/Features/GrassCollision.cppsrc/Features/HDRDisplay.cppsrc/Features/IBL.cppsrc/Features/LightLimitFix.cppsrc/Features/PerformanceOverlay.cppsrc/Features/PerformanceOverlay.hsrc/Features/ScreenSpaceGI.cppsrc/Features/ScreenSpaceShadows.cppsrc/Features/Skylighting.cppsrc/Features/SubsurfaceScattering.cppsrc/Features/TerrainBlending.cppsrc/Features/TerrainShadows.cppsrc/Features/Upscaling.cppsrc/Features/Upscaling/RCAS/RCAS.cppsrc/Features/VRStereoOptimizations.cppsrc/Features/VolumetricShadows.cppsrc/Globals.cppsrc/Globals.hsrc/Menu/FeatureListRenderer.cppsrc/Menu/StatisticsRenderer.cppsrc/Menu/StatisticsRenderer.hsrc/Profiler.cppsrc/Profiler.hsrc/State.cppsrc/Utils/LegitProfiler.h
There was a problem hiding this comment.
Actionable comments posted: 2
♻️ Duplicate comments (1)
src/Menu/ProfilingRenderer.cpp (1)
93-95:⚠️ Potential issue | 🟠 Major | ⚡ Quick winGuard
globals::profilerbefore dereference in all render entry points.These paths can crash when profiler is unavailable (startup/reset edge cases).
Suggested fix
void ProfilingRenderer::RenderGraph() { + if (!globals::profiler) + return; auto& profiler = (*globals::profiler); const auto& results = profiler.GetResults(); @@ void ProfilingRenderer::RenderStatistics(bool showTable, bool showModeToggle) { + if (!globals::profiler) { + ImGui::TextDisabled("Profiler unavailable"); + return; + } auto& profiler = (*globals::profiler); @@ void ProfilingRenderer::RenderFeatureTimers(const std::string& featurePrefix) { + if (!globals::profiler) { + ImGui::TextDisabled("Profiler unavailable"); + return; + } auto& profiler = (*globals::profiler);As per coding guidelines, "Include proper resource management and graceful degradation error handling in all DirectX code to prevent crashes from malformed configurations".
Also applies to: 147-148, 295-297
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@src/Menu/ProfilingRenderer.cpp` around lines 93 - 95, Guard against a null globals::profiler before dereferencing it in all render entry points: replace the direct dereference pattern (auto& profiler = (*globals::profiler); const auto& results = profiler.GetResults(); bool cpuMode = (timingMode == TimingMode::CPU);) with an initial null-check (if (!globals::profiler) { /* early return or set safe defaults and degrade gracefully */ }) and then only dereference globals::profiler after the check; apply the same change to the other occurrences that access globals::profiler and call GetResults() so the renderer returns or uses safe defaults when the profiler is unavailable.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@src/Features/PerformanceOverlay.cpp`:
- Around line 1589-1592: The code reads globals::profiler->GetTotalTimeMs()
without checking profiler, which can crash if profiler is null; update the
PerformanceOverlay calculation to null-check globals::profiler before calling
GetTotalTimeMs (e.g., use a local float csPassesTime = globals::profiler ?
globals::profiler->GetTotalTimeMs() : 0.0f) and then compute csPercent,
remainingOtherTime and remainingOtherPercent from that safe value so the overlay
gracefully degrades when profiler is uninitialized.
In `@src/Menu/ProfilingRenderer.cpp`:
- Around line 149-160: cpuMode is only set inside the showModeToggle branch so
when the toggle is hidden the UI always treats profiling as GPU; initialize
cpuMode from timingMode before the if block (e.g., set cpuMode = (timingMode ==
TimingMode::CPU) immediately after declaring cpuMode) and keep the existing
toggle logic that updates timingMode and timeSinceLastUpdate so the
hidden-toggle path respects the currently selected TimingMode.
---
Duplicate comments:
In `@src/Menu/ProfilingRenderer.cpp`:
- Around line 93-95: Guard against a null globals::profiler before dereferencing
it in all render entry points: replace the direct dereference pattern (auto&
profiler = (*globals::profiler); const auto& results = profiler.GetResults();
bool cpuMode = (timingMode == TimingMode::CPU);) with an initial null-check (if
(!globals::profiler) { /* early return or set safe defaults and degrade
gracefully */ }) and then only dereference globals::profiler after the check;
apply the same change to the other occurrences that access globals::profiler and
call GetResults() so the renderer returns or uses safe defaults when the
profiler is unavailable.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro Plus
Run ID: b386b229-e117-4471-a111-e5f8224ea831
📒 Files selected for processing (5)
src/Features/PerformanceOverlay.cppsrc/Features/PerformanceOverlay.hsrc/Menu/FeatureListRenderer.cppsrc/Menu/ProfilingRenderer.cppsrc/Menu/ProfilingRenderer.h
Co-authored-by: coderabbitai[bot] <136622811+coderabbitai[bot]@users.noreply.github.com>
There was a problem hiding this comment.
Actionable comments posted: 1
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@src/Menu/ProfilingRenderer.cpp`:
- Around line 162-164: Remove the stray closing brace that prematurely ends
ProfilingRenderer::RenderStatistics: delete the extra `}` (the one that closes
the function at line 162) so that the subsequent statements (starting with the
`float currentTime = static_cast<float>(ImGui::GetTime());` and all code through
the end of the intended function) remain inside
ProfilingRenderer::RenderStatistics; verify the function's opening and closing
braces now correctly encompass all logic and recompile.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro Plus
Run ID: c38eafb3-6f2d-4693-bd46-ff2c0a2d3bcf
📒 Files selected for processing (1)
src/Menu/ProfilingRenderer.cpp
There was a problem hiding this comment.
🧹 Nitpick comments (1)
src/Features/DynamicCubemaps.cpp (1)
377-379: ⚡ Quick winGate profiling scopes behind the runtime profiling toggle.
Lines 377-379, 420-423, 465-477, and 512-531 now profile every dispatch unconditionally. Please short-circuit these
BeginPass/EndPasscalls when profiling is disabled to avoid unnecessary per-pass overhead on the render path.As per coding guidelines, "src/Features/**/*.{h,cpp}: Flag potential performance impacts on rendering performance when implementing graphics features and suggest user toggles for feature control".
Also applies to: 420-423, 465-477, 512-531
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@src/Features/DynamicCubemaps.cpp` around lines 377 - 379, The profiling BeginPass/EndPass calls around the dispatches in DynamicCubemaps.cpp should be short-circuited when runtime profiling is disabled: wrap the globals::profiler->BeginPass(...) / globals::profiler->EndPass() pair (the block that uses the a_reflections ? "DynamicCubemaps::CaptureReflections" : "DynamicCubemaps::Capture" name and the other profiling strings at the other sites) with a runtime check (e.g. verify globals::profiler is non-null and the profiler is enabled via the profiler's runtime-enabled API) so the Dispatch(...) calls run without the per-pass profiling overhead when profiling is off; apply the same guard to the other BeginPass/EndPass occurrences at the noted locations (lines ~420-423, ~465-477, ~512-531) that currently call globals::profiler->BeginPass/EndPass unconditionally.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Nitpick comments:
In `@src/Features/DynamicCubemaps.cpp`:
- Around line 377-379: The profiling BeginPass/EndPass calls around the
dispatches in DynamicCubemaps.cpp should be short-circuited when runtime
profiling is disabled: wrap the globals::profiler->BeginPass(...) /
globals::profiler->EndPass() pair (the block that uses the a_reflections ?
"DynamicCubemaps::CaptureReflections" : "DynamicCubemaps::Capture" name and the
other profiling strings at the other sites) with a runtime check (e.g. verify
globals::profiler is non-null and the profiler is enabled via the profiler's
runtime-enabled API) so the Dispatch(...) calls run without the per-pass
profiling overhead when profiling is off; apply the same guard to the other
BeginPass/EndPass occurrences at the noted locations (lines ~420-423, ~465-477,
~512-531) that currently call globals::profiler->BeginPass/EndPass
unconditionally.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro Plus
Run ID: ae4e8924-b5fc-4b8b-96e5-feb25e4f4c95
📒 Files selected for processing (18)
features/Dynamic Cubemaps/Shaders/Features/DynamicCubemaps.inifeatures/Grass Collision/Shaders/Features/GrassCollision.inifeatures/HDR Display/Shaders/Features/HDRDisplay.inifeatures/IBL/Shaders/Features/ImageBasedLighting.inifeatures/Light Limit Fix/Shaders/Features/LightLimitFix.inifeatures/Performance Overlay/Shaders/Features/PerformanceOverlay.inifeatures/Screen Space GI/Shaders/Features/ScreenSpaceGI.inifeatures/Skylighting/Shaders/Features/Skylighting.inifeatures/Subsurface Scattering/Shaders/Features/SubsurfaceScattering.inifeatures/Terrain Blending/Shaders/Features/TerrainBlending.inifeatures/Terrain Shadows/Shaders/Features/TerrainShadows.inifeatures/Volumetric Shadows/Shaders/Features/VolumetricShadows.inisrc/Features/DynamicCubemaps.cppsrc/Menu/ProfilingRenderer.cppsrc/Profiler.cppsrc/Profiler.hsrc/State.cppsrc/Utils/LegitProfiler.h
💤 Files with no reviewable changes (1)
- src/Menu/ProfilingRenderer.cpp
✅ Files skipped from review due to trivial changes (11)
- features/Light Limit Fix/Shaders/Features/LightLimitFix.ini
- features/Terrain Shadows/Shaders/Features/TerrainShadows.ini
- features/Grass Collision/Shaders/Features/GrassCollision.ini
- features/HDR Display/Shaders/Features/HDRDisplay.ini
- features/Performance Overlay/Shaders/Features/PerformanceOverlay.ini
- features/Volumetric Shadows/Shaders/Features/VolumetricShadows.ini
- features/Terrain Blending/Shaders/Features/TerrainBlending.ini
- features/Dynamic Cubemaps/Shaders/Features/DynamicCubemaps.ini
- features/Screen Space GI/Shaders/Features/ScreenSpaceGI.ini
- features/Skylighting/Shaders/Features/Skylighting.ini
- features/IBL/Shaders/Features/ImageBasedLighting.ini
|
✅ A pre-release build is available for this PR: |
Adds D3D11 timestamp query and QPC-based profiling with LegitProfiler graphs, a Profiling menu tab, and per-feature Profiling sections with GPU/CPU toggle.
Summary by CodeRabbit