refactor(ssgi): tidy GI compute shader internals#2225
Conversation
Replace the 9-tap box history clamp with a 5-tap cross pattern (SVGF, Schied 2017). Named-neighbour reads (yL/yR/yU/yD) replace a nested loop, making the algorithm explicit at source level. The clamp short-circuits on saturated history (lerpFactor < 0.15, ~7+ accumulated frames). The temporal blend has self-stabilised by that point, so the bounding effect is marginal. Disocclusion remains covered: motion-clamped accumFrames keeps lerpFactor above 0.15 in transient regions. Universal change (VR + flat). Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
|
No actionable comments were generated in the recent review. 🎉 ℹ️ Recent review info⚙️ Run configurationConfiguration used: Path: .coderabbit.yaml Review profile: CHILL Plan: Pro Run ID: 📒 Files selected for processing (3)
✅ Files skipped from review due to trivial changes (1)
🚧 Files skipped from review as they are similar to previous changes (1)
📝 WalkthroughWalkthroughReworks Screen Space GI shader sampling and temporal history handling (guarded rsqrt, revised mip selection, direct back-side vectors, 5-tap cross clamping with lerp threshold) and updates C++ SRV bindings to match the HLSL parameter reordering. No public API changes. Changes
Sequence Diagram(s)sequenceDiagram
participant CPU as DrawSSGI (CPU)
participant GPU as Compute Shader (GI)
participant SRV as Texture SRVs
participant UAV as GI UAVs
CPU->>SRV: Bind SRVs (`texIlY`,`texIlCoCg`,`texGiSpecular`,`texNormal`,...)
CPU->>GPU: Dispatch giCompute (with updated SRV slots)
GPU->>SRV: Sample temporal histories, normals, indirect maps
GPU->>GPU: Compute GI using guarded rsqrt, split mip selection, back-side vectors
GPU-->>GPU: Apply temporal denoiser (5-tap cross clamp when lerpFactor >= 0.15)
GPU->>UAV: Write GI outputs (radiance, normal mips, specular)
Estimated code review effort🎯 3 (Moderate) | ⏱️ ~20 minutes Possibly related PRs
Suggested reviewers
Poem
🚥 Pre-merge checks | ✅ 4 | ❌ 1❌ Failed checks (1 warning)
✅ Passed checks (4 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches🧪 Generate unit tests (beta)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
|
No actionable suggestions for changed features. |
Three mathematically equivalent rewrites in the GI trace inner loop, expressing the underlying identities at source level rather than relying on compiler hoisting: 1. Per-slice constant logLenOmega = 0.5 * log2(dot(omega, omega)). Per-step mip becomes log2(s) + logLenOmega for sampleOffset = s * omega with s >= 0. 2. rcpProjectedNormalVecLength = rsqrt(dot(v, v)) for cosNorm. The max(..., 1e-20) clamp also guards a degenerate-normal NaN edge case the original could produce. 3. sampleBackHorizonVec = normalize(sampleDelta - viewVec * Thickness) directly, instead of building sampleBackPos as an intermediate. Same for the GI-radius back-horizon vector. All three are bit-equivalent for non-degenerate inputs. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
gi.cs.hlsl declared Texture2D<float> srcPrevAo at t5 but never sampled it. The C++ dispatch still bound texAo[inputAoTexIdx]->srv to slot 5 on every GI dispatch. Remove the HLSL declaration, drop the C++ bind, and shift the remaining textures (srcPrevY/CoCg/GISpecular/Normal) from t6-t9 down to t5-t8 to keep the SRV layout dense. No functional change. Mainly tidies up dead code; minor CPU savings from one less SRV bind per frame. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
964bc16 to
4eb7c7d
Compare
|
✅ A pre-release build is available for this PR: |
Summary
logLenOmegafor the mip selection,rsqrt(dot(v,v))for cosNorm (also guards a degenerate-normal NaN edge case viamax(..., 1e-20)), and foldingsampleBackPosintosampleDelta - viewVec * ThicknesssrcPrevAoSRV declaration fromgi.cs.hlsl(declared at t5 but never read); drop the matching C++ bind and collapsesrcPrevY/CoCg/GISpecular/Normalfrom t6..t9 down to t5..t8All math changes are bit-equivalent for non-degenerate inputs. No new settings.
Test plan
hlslkit-compilevalidates SSGI variants (HALF_RES / QUARTER_RES / full, GI on/off, GI_SPECULAR on/off, TEMPORAL_DENOISER on/off, VR + flat)SSGI - GIper-dispatch cost stays within scene-noise of upstream baseline🤖 Generated with Claude Code
Summary by CodeRabbit