perf: reduce copy operations#2193
Conversation
|
No actionable comments were generated in the recent review. 🎉 ℹ️ Recent review info⚙️ Run configurationConfiguration used: Path: .coderabbit.yaml Review profile: CHILL Plan: Pro Run ID: 📒 Files selected for processing (5)
🚧 Files skipped from review as they are similar to previous changes (1)
📝 WalkthroughWalkthroughTerrain blending now writes a main-depth snapshot inline from the compute shader to a new UAV; upscaling sharpening and DLSS output paths were refactored to avoid intermediate CopyResource operations and to conditionally skip depth copies in certain UI/paused contexts. Changes
Sequence Diagram(s)sequenceDiagram
participant CS as Compute Shader (DepthBlend)
participant MainSRV as Main Depth (SRV)
participant TerrainSRV as Terrain Depth (SRV)
participant MainCopyUAV as MainDepthCopy (UAV u2)
participant BlendUAV as Blended Depth (UAV)
CS->>MainSRV: Sample mainDepth
CS->>TerrainSRV: Sample terrainDepth
CS->>MainCopyUAV: Write mainDepth snapshot
CS->>BlendUAV: Write mixedDepth (min(main, terrain))
sequenceDiagram
participant App as Application
participant DLSS as DLSS Upscaler
participant Sharp as Sharpening Pipeline
participant MainUAV as Main Texture (UAV)
participant SharpTex as Sharpener Texture
App->>DLSS: Upscale frame
alt RCAS sharpening && sharpenerTexture available
DLSS->>SharpTex: Write DLSS output (sharpener)
Sharp->>MainUAV: Sharpen into main (direct UAV write)
else
DLSS->>MainUAV: Write DLSS output directly into main
end
Estimated Code Review Effort🎯 4 (Complex) | ⏱️ ~45 minutes Possibly Related PRs
Suggested Reviewers
Poem
🚥 Pre-merge checks | ✅ 4 | ❌ 1❌ Failed checks (1 warning)
✅ Passed checks (4 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches🧪 Generate unit tests (beta)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
|
No actionable suggestions for changed features. |
There was a problem hiding this comment.
Actionable comments posted: 1
🧹 Nitpick comments (2)
src/Features/Upscaling.cpp (1)
1980-2007: Zero-copy RCAS path looks correct; please confirmsharpenerTexturelifetime matches the DLSS UAV write inStreamline::Upscale.The early-out on
!main.UAV(Line 1998) is a nice safety net, and the direction of data flow (DLSS →sharpenerTextureSRV → RCAS →main.UAV) aligns with the companion change inStreamline::Upscale. A couple of things worth confirming:
sharpenerTextureis created only in the DLSS branch ofCreateUpscalingTextureResources(Line 696-714) and destroyed in the non-DLSS branch ofDestroyUpscalingTextureResources(Line 754-761).ApplySharpeningis only invoked whenupscaleMethod == kDLSS(Line 2042-2043), so the null-check onsharpenerTextureat Line 1988 should never fire in steady state — good.- Per the PR summary (~214 µs saved), the critical invariant is that DLSS actually wrote into
sharpenerTexturethis frame. InStreamline::Upscalethe selector issharpnessDLSS > 0.0f && upscaling.sharpenerTexture, while here the selector issettings.sharpnessDLSS <= 0.0f→ return. IfsharpenerTextureexists but is null-checked differently across the two files (e.g.,sharpnessDLSS == 0.0fvs<= 0.0fvs FP edge cases like NaN/negative), the two paths can diverge: DLSS would write toa_upscalingTexture(=main.texture) while RCAS would read from a stalesharpenerTexture. Consider centralizing the "should sharpen" predicate so both call sites cannot drift.- Barrier/hazard: DLSS writes
sharpenerTexturevia Streamline (likely leaves it bound as UAV or in an indeterminate state). Then RCAS readssharpenerTexture->srvand writesmain.UAV. D3D11 will auto-unbind conflicting views, but please verify with Tracy/PIX that there isn't a residual UAV binding onmain.UAVleft over from DLSS's internal pipeline that would force an implicit state transition and negate the win.♻️ Optional: extract a single "sharpen active" predicate so both files share it
+// In Upscaling.h +bool IsDLSSSharpeningActive() const { return settings.sharpnessDLSS > 0.0f && sharpenerTexture != nullptr; }Then use
upscaling.IsDLSSSharpeningActive()in bothStreamline::UpscaleandUpscaling::ApplySharpening.🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@src/Features/Upscaling.cpp` around lines 1980 - 2007, The DLSS-sharpening predicate can drift between Streamline::Upscale and Upscaling::ApplySharpening causing stale reads; create a single predicate method (e.g., Upscaling::IsDLSSSharpeningActive()) that checks std::isfinite(settings.sharpnessDLSS) && settings.sharpnessDLSS > 0.0f && sharpenerTexture != nullptr (and any "was written this frame" flag if you have one), then replace the separate checks in ApplySharpening and Streamline::Upscale with calls to IsDLSSSharpeningActive(); also audit CreateUpscalingTextureResources and DestroyUpscalingTextureResources to ensure sharpenerTexture lifetime matches that predicate.src/Features/Upscaling/Streamline.cpp (1)
615-654: VR SBS extent alignment is correct; update non-VR comment to reflect conditional colorOut redirection.
VR FinalizePerEyeOutputs extents: The per-eye output copies land in the correct SBS regions.
sharpenerTextureinherits full SBS dimensions frommain.texture, andFinalizePerEyeOutputscopies each eye to the correct offset (offsetXOut = 0for left,eyeWidthOutfor right). RCAS then processes the full SBS width (line 50:screenWidth = state->screenSize.x), so the layout is sound.Predicate gates are consistent: The
sharpnessDLSS > 0.0f && sharpenerTextureselector in Streamline (line 619) and the return guards inApplySharpening(lines 1985–1989) are logically symmetric—both activate together and deactivate together. No drift risk.Non-VR comment (line 646) is misleading as written. It should note that
colorOutis conditionally redirected for RCAS zero-copy:Suggested comment update
- // Non-VR: Simple full-texture upscale. + // Non-VR: Full-texture upscale with conditional colorOut redirection for RCAS sharpening.The optional refactor (shared
IsDLSSSharpeningActive()helper) remains defensible as a best practice, though the current dual guards are not immediately problematic.🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@src/Features/Upscaling/Streamline.cpp` around lines 615 - 654, The non-VR branch comment is misleading about output behavior—update the comment above the non-VR path in Streamline.cpp to state that colorOut may be conditionally redirected to upscaling.sharpenerTexture when upscaling.settings.sharpnessDLSS > 0.0f and upscaling.sharpenerTexture exists (see the colorOut selector where it chooses between upscaling.sharpenerTexture->resource.get() and a_upscalingTexture) so that RCAS can perform zero-copy sharpening; reference EvaluateDLSS(...) and the colorOut variable and mention the symmetry with ApplySharpening guards (upscaling.settings.sharpnessDLSS and upscaling.sharpenerTexture) or replace the comment with a brief note that non-VR uses a full-texture upscale but output may be redirected for DLSS sharpening.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@src/Features/Upscaling.cpp`:
- Around line 1917-1926: The menu-gated copy may let depthCopy be stale across
transitions; add a stale-frame detection and force a copy when needed: introduce
a frame-tracking variable (e.g., lastDepthCopyFrame) updated wherever
kMAIN→kMAIN_COPY is explicitly copied (including the VR path code path that
currently copies after upscaling), then in UpscaleDepth (or where depthCopy is
sampled) compare against the current frame counter and call
copyIfNonAliased(depthCopy.texture, depth.texture) when lastDepthCopyFrame !=
currentFrame; also ensure any existing unconditional copies (VR branch) update
lastDepthCopyFrame so all copy paths are tracked.
---
Nitpick comments:
In `@src/Features/Upscaling.cpp`:
- Around line 1980-2007: The DLSS-sharpening predicate can drift between
Streamline::Upscale and Upscaling::ApplySharpening causing stale reads; create a
single predicate method (e.g., Upscaling::IsDLSSSharpeningActive()) that checks
std::isfinite(settings.sharpnessDLSS) && settings.sharpnessDLSS > 0.0f &&
sharpenerTexture != nullptr (and any "was written this frame" flag if you have
one), then replace the separate checks in ApplySharpening and
Streamline::Upscale with calls to IsDLSSSharpeningActive(); also audit
CreateUpscalingTextureResources and DestroyUpscalingTextureResources to ensure
sharpenerTexture lifetime matches that predicate.
In `@src/Features/Upscaling/Streamline.cpp`:
- Around line 615-654: The non-VR branch comment is misleading about output
behavior—update the comment above the non-VR path in Streamline.cpp to state
that colorOut may be conditionally redirected to upscaling.sharpenerTexture when
upscaling.settings.sharpnessDLSS > 0.0f and upscaling.sharpenerTexture exists
(see the colorOut selector where it chooses between
upscaling.sharpenerTexture->resource.get() and a_upscalingTexture) so that RCAS
can perform zero-copy sharpening; reference EvaluateDLSS(...) and the colorOut
variable and mention the symmetry with ApplySharpening guards
(upscaling.settings.sharpnessDLSS and upscaling.sharpenerTexture) or replace the
comment with a brief note that non-VR uses a full-texture upscale but output may
be redirected for DLSS sharpening.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
Run ID: abbd9064-e708-4521-b159-a6978bc10958
📒 Files selected for processing (5)
features/Terrain Blending/Shaders/TerrainBlending/DepthBlend.hlslsrc/Features/TerrainBlending.cppsrc/Features/TerrainBlending.hsrc/Features/Upscaling.cppsrc/Features/Upscaling/Streamline.cpp
|
✅ A pre-release build is available for this PR: |
|
@coderabbitai does this impact nonVR at all? |
|
Tip For best results, initiate chat on the files or code changes. 🧩 Analysis chain🏁 Script executed: Length of output: 438 🏁 Script executed: Length of output: 0 🏁 Script executed: Length of output: 9097
In short, all three are cross-platform perf improvements; VR just sees a proportionally larger gain because of the stereo buffer overhead being doubled there (D24S8 copy at full SBS resolution, etc.). 🧠 Learnings used |
- Zero-copy RCAS: redirect DLSS output to sharpenerTexture so RCAS writes directly to kMAIN.UAV, eliminating a ~214µs CopyResource - Terrain blending: DepthBlend CS writes mainDepth inline to u2 (R32_FLOAT UAV), eliminating a CopyResource(terrainDepth←mainDepth) of ~393µs - Depth upscale: gate kMAIN→kMAIN_COPY copy to non-3D menu contexts only; the engine already does this copy during gameplay (~200µs) Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
|
Caution Failed to replace (edit) comment. This is likely due to insufficient permissions or the comment being deleted. Error details |
5119906 to
4a6e7da
Compare
Summary
Three independent copy eliminations profiled via Tracy, targeting overhead (tested in VR):
sharpenerTextureso RCAS writes directly tokMAIN.UAV, eliminating aCopyResourceround-tripDepthBlend.hlslCS writesmainDepthto a UAV (u2, R32_FLOAT) inline, eliminating a separateCopyResource(terrainDepth←mainDepth)before the terrain blending composite passkMAIN→kMAIN_COPYduring 3D scene rendering; our extra copy is only needed in non-3D menu contexts (map menu, main menu, loading screen, pause). Gate the copy to only run thenTest plan
🤖 Generated with Claude Code
Summary by CodeRabbit