Skip to content

perf(SSGI): eliminate radiance copy via direct UAV write#2127

Merged
doodlum merged 1 commit into
community-shaders:devfrom
alandtse:perf/ssgi-eliminate-radiance-copy
Apr 16, 2026
Merged

perf(SSGI): eliminate radiance copy via direct UAV write#2127
doodlum merged 1 commit into
community-shaders:devfrom
alandtse:perf/ssgi-eliminate-radiance-copy

Conversation

@alandtse
Copy link
Copy Markdown
Collaborator

@alandtse alandtse commented Apr 16, 2026

Summary

radianceDisocc previously wrote mip 0 to texRadiance, then a CopySubresourceRegion copied it to texRadianceTemp before prefilterRadiance could safely read it (avoiding the read/write hazard on the same texture). This PR eliminates that intermediate copy by having radianceDisocc write directly to texRadianceTemp as a UAV.

  • texRadianceTemp gains a UAV view (D3D11_BIND_UNORDERED_ACCESS replaces D3D11_BIND_RENDER_TARGET)
  • radianceDisocc binds texRadianceTemp->uav at slot 0 instead of texRadiance->uav
  • prefilterRadiance continues to read texRadianceTemp as SRV and write the mip chain back to texRadiance — unchanged
  • The CopySubresourceRegion call is removed entirely

No behaviour change, no quality impact. Removes one full-texture GPU copy per frame.

Test plan

  • Verified in RenderDoc (VR capture) that prefilterRadiance receives correct mip 0 input after the change
  • No visual regression in outdoor/interior scenes
  • Builds cleanly under ALL preset

🤖 Generated with Claude Code

Summary by CodeRabbit

  • Refactor
    • Optimized the Screen Space Global Illumination pipeline to eliminate an internal copy step and enable direct radiance writes into the staging buffer. This reduces GPU memory traffic, lowers workload, and improves rendering performance and frame-time stability.

@coderabbitai
Copy link
Copy Markdown
Contributor

coderabbitai Bot commented Apr 16, 2026

No actionable comments were generated in the recent review. 🎉

ℹ️ Recent review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: 0ea045fe-71bd-48b6-a18b-813105b8d79f

📥 Commits

Reviewing files that changed from the base of the PR and between 5b344a7 and fe572cd.

📒 Files selected for processing (1)
  • src/Features/ScreenSpaceGI.cpp
🚧 Files skipped from review as they are similar to previous changes (1)
  • src/Features/ScreenSpaceGI.cpp

📝 Walkthrough

Walkthrough

Temporary radiance texture now supports direct UAV writes for mip 0; the disocclusion compute writes mip 0 into the temp UAV, and the prefilter reads mip 0 from the temp SRV while writing the mip-chain back into the main radiance texture.

Changes

Cohort / File(s) Summary
Screen Space GI Radiance Texture Refactoring
src/Features/ScreenSpaceGI.cpp
SetupResources: make texRadianceTemp UAV-writable and create mip‑0 UAV. DrawSSGI: route radiance-disocc writes for mip 0 to texRadianceTemp->uav, remove the mip‑0 CopySubresourceRegion, and have Prefilter bind texRadianceTemp->srv for mip 0 while keeping per-mip UAVs for writing the radiance mip chain into texRadiance.

Sequence Diagram(s)

sequenceDiagram
    participant Disocc as RadianceDisoccCompute
    participant TempTex as texRadianceTemp (UAV / SRV)
    participant Radiance as texRadiance (UAV chain)
    participant Prefilter as PrefilterCompute

    Disocc->>TempTex: write mip0 (UAV)
    Note right of TempTex: mip0 stored in temp UAV (created in SetupResources)
    Prefilter->>TempTex: read mip0 (SRV)
    Prefilter->>Radiance: write/read other mips (per-mip UAVs)
Loading

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~20 minutes

Suggested reviewers

  • doodlum
  • jiayev

Poem

🐰 I hopped through code with careful paws,

mip0 found a new, direct clause,
no copy sprint, just tidy flow,
temp UAV helps radiance grow,
nibbles of light — a carrot glow. 🥕✨

🚥 Pre-merge checks | ✅ 2 | ❌ 1

❌ Failed checks (1 warning)

Check name Status Explanation Resolution
Docstring Coverage ⚠️ Warning Docstring coverage is 0.00% which is insufficient. The required threshold is 80.00%. Write docstrings for the functions missing them to satisfy the coverage threshold.
✅ Passed checks (2 passed)
Check name Status Explanation
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.
Title check ✅ Passed The title 'perf(SSGI): eliminate radiance copy via direct UAV write' is specific and directly summarizes the main optimization: removing a GPU copy operation by using direct UAV writes instead.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
🧪 Generate unit tests (beta)
  • Create PR with unit tests

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

@alandtse alandtse changed the title perf(VR): eliminate SSGI radiance copy via direct UAV write perf(SSGI): eliminate radiance copy via direct UAV write Apr 16, 2026
@github-actions
Copy link
Copy Markdown

No actionable suggestions for changed features.

@alandtse alandtse force-pushed the perf/ssgi-eliminate-radiance-copy branch from 45b5391 to 5b344a7 Compare April 16, 2026 04:18
@alandtse
Copy link
Copy Markdown
Collaborator Author

@coderabbitai please carefully review the code changes (discounting any comments) and determine if it should result in performance gains and explain why. Please identify all changes and potential regressions with the approach.

@coderabbitai
Copy link
Copy Markdown
Contributor

coderabbitai Bot commented Apr 16, 2026

Tip

For best results, initiate chat on the files or code changes.

🧩 Analysis chain

🏁 Script executed:

#!/bin/bash
# Get the full diff for the PR
git diff HEAD~1 HEAD -- src/Features/ScreenSpaceGI.cpp 2>/dev/null || git log --oneline -5

# Also show the changed file
fd -e cpp -e h ScreenSpaceGI | head -20

Length of output: 3573


🏁 Script executed:

#!/bin/bash
# Get broader context around texRadiance and texRadianceTemp setup
grep -n "texRadiance" src/Features/ScreenSpaceGI.cpp | head -60

# Check the header for texRadiance declarations
grep -n "texRadiance" src/Features/ScreenSpaceGI.h 2>/dev/null | head -30

# Look at how texRadiance UAV is set up (bind flags)
grep -n -A5 -B5 "D3D11_BIND_UNORDERED_ACCESS" src/Features/ScreenSpaceGI.cpp | head -80

Length of output: 127


🏁 Script executed:

#!/bin/bash
# Find the file first
fd ScreenSpaceGI

# Also search across all cpp/h files
rg -l "texRadiance" --type cpp --type h | head -10

Length of output: 196


🏁 Script executed:

#!/bin/bash
# List all files to understand repo structure
tree -L 3 --filelimit 30 | head -60

# Try finding cpp/h files broadly
fd -e cpp -e h | grep -i "screen" | head -20

# Check features folder
ls "features/Screen Space GI/" 2>/dev/null

Length of output: 761


🏁 Script executed:

#!/bin/bash
# The src/ directory seems absent; look at SSGI shaders which may reveal expected resource bindings
ls "features/Screen Space GI/Shaders/ScreenSpaceGI/"

# Check shader files for UAV/SRV slot usage in radianceDisocc
cat "features/Screen Space GI/Shaders/ScreenSpaceGI/RadianceDisocc.hlsl" 2>/dev/null | head -80

# Also check prefilter shader
cat "features/Screen Space GI/Shaders/ScreenSpaceGI/PrefilterRadiance.hlsl" 2>/dev/null | head -80

Length of output: 132


🏁 Script executed:

#!/bin/bash
cat "features/Screen Space GI/Shaders/ScreenSpaceGI/radianceDisocc.cs.hlsl"
echo "=== PREFILTER ==="
cat "features/Screen Space GI/Shaders/ScreenSpaceGI/prefilterRadiance.cs.hlsl"

Length of output: 11495


@alandtse: Here is a thorough analysis of all changes and their implications.


Performance Analysis

Yes — this PR should produce a real, measurable per-frame GPU saving.

The old pipeline (3 steps):

radianceDisocc → texRadiance (u0 UAV)
    ↓
CopySubresourceRegion: texRadiance[mip0] → texRadianceTemp
    ↓
prefilterRadiance reads texRadianceTemp (SRV), writes mip chain → texRadiance

The new pipeline (2 steps):

radianceDisocc → texRadianceTemp (u0 UAV, direct write)
    ↓
prefilterRadiance reads texRadianceTemp (SRV), writes mip chain → texRadiance

Why the gains are real:

Source of cost Old New
Full-texture GPU blit CopySubresourceRegion every frame ❌ Eliminated
GPU memory bandwidth Read mip0 + write mip0 (full copy) ❌ Eliminated (~2× mip0 BW per frame)
GPU pipeline stall D3D11 must drain radianceDisocc CS before starting copy ❌ Eliminated
CPU API overhead One D3D11 command recorded per frame ❌ Eliminated

For a full-res DXGI_FORMAT_R11G11B10_FLOAT texture (4 bytes/pixel), mip 0 at 1920×1080 is ~8 MB — the copy consumed ~16 MB of memory bandwidth (read + write) every frame, plus the associated GPU sync. At 60fps that is nearly 1 GB/s of memory bandwidth reclaimed.


All Code Changes

SetupResources — resource creation

Before After
texRadianceTemp bind flags D3D11_BIND_SHADER_RESOURCE | D3D11_BIND_RENDER_TARGET D3D11_BIND_SHADER_RESOURCE | D3D11_BIND_UNORDERED_ACCESS
texRadianceTemp UAV Not created New: DXGI_FORMAT_R11G11B10_FLOAT, Texture2D, MipSlice=0

DrawSSGI — radianceDisocc pass

- uavs.at(0) = texRadiance->uav.get();
+ uavs.at(0) = texRadianceTemp->uav.get();

radianceDisocc now writes outRadianceDisocc at u0 directly into texRadianceTemp.

DrawSSGI — prefilterRadiance pass

- context->CopySubresourceRegion(
-     texRadianceTemp->resource.get(), 0, 0, 0, 0,
-     texRadiance->resource.get(), 0, nullptr);

The entire copy is removed. The srvs.at(0) = texRadianceTemp->srv.get() binding and all uavRadiance[0..4] bindings are unchanged.


Correctness / Hazard Analysis

Why the original copy existed: prefilterRadiance writes mip 0 of texRadiance (uavRadiance[0]) but also needs mip 0 as its SRV input — a read/write conflict on the same resource. texRadianceTemp was the intermediate buffer to break that hazard.

This hazard is fully preserved in the new approach: prefilterRadiance still reads texRadianceTemp (SRV) and writes to texRadiance (UAV) — they are different resources.

UAV→SRV barrier: The resetViews() call before the prefilterRadiance bindings clears slot u0, which D3D11 uses to enforce the UAV flush before CSSetShaderResources binds the same resource as an SRV. This pattern was already present before this PR.

Format UAV store support: DXGI_FORMAT_R11G11B10_FLOAT is valid for typed UAV stores (RWTexture2D<float3> writes) in D3D11 Feature Level 11.0. This was already proven working for texRadiance->uav in the same dispatch.


Potential Regressions

  1. texRadiance->uav is now orphaned. The default/full-resource UAV on texRadiance was used in the old radianceDisocc pass but is no longer bound anywhere. It remains allocated in memory but is harmless — just a minor resource leak worth noting.

  2. D3D11_BIND_RENDER_TARGET removed from texRadianceTemp. If any code path outside this diff attempts to use texRadianceTemp as a render target, CreateRenderTargetView would fail or silently produce an invalid view. No such usage is visible in the shader or diff, so this appears safe.

  3. Content of texRadiance mip 0 between passes. Before this PR, texRadiance mip 0 held the raw disocclusion output from radianceDisocc until prefilterRadiance overwrote it with the filtered version. After this PR, texRadiance mip 0 holds stale data from the previous frame until prefilterRadiance runs. If any intermediate pass reads texRadiance mip 0 before prefilterRadiance completes — which is not visible in this diff — it would receive one-frame-stale radiance. Based on the visible pipeline ordering this does not appear to be the case, but it is worth confirming no pass samples texRadiance mip 0 between these two dispatches.

  4. D3D11 debug layer warnings. If resetViews() does not null out all UAV slots before the prefilterRadiance bindings (e.g., if it only resets SRV slots), the D3D11 debug layer may warn about texRadianceTemp being simultaneously bound as UAV (from the prior dispatch's residual state) and SRV. This was a pre-existing concern, not introduced here.


Overall verdict: The optimization is sound and the performance gain is genuine. The data flow and read/write hazard avoidance are maintained correctly. The main item to double-check is regression point 3 — confirming no intermediate pass reads texRadiance mip 0 in the window between radianceDisocc completing and prefilterRadiance completing.


🧠 Learnings used
Learnt from: vrnord
Repo: doodlum/skyrim-community-shaders PR: 2003
File: features/Upscaling/Shaders/Upscaling/DepthUpscalePS.hlsl:55-61
Timestamp: 2026-03-24T04:30:32.809Z
Learning: In `features/Upscaling/Shaders/Upscaling/DepthUpscalePS.hlsl` (doodlum/skyrim-community-shaders PR `#2003`), the shader is dispatched per-eye using per-eye dimensions (DstWidth/DstHeight). The UV (`input.TexCoord`) is derived from `dispatchID / DstDim`, which is already eye-local — not full SBS/packed stereo coordinates. The per-eye UV clamping logic (isRight, halfScale) operates entirely in eye-local space, and there is no stereo seam issue. Do not flag SBS seam concerns for this shader.

Learnt from: alandtse
Repo: doodlum/skyrim-community-shaders PR: 2084
File: src/ShaderCache.cpp:1646-1649
Timestamp: 2026-04-10T07:34:27.849Z
Learning: In `src/ShaderCache.cpp` (doodlum/skyrim-community-shaders PR `#2084`), `GetImagespaceShaderDescriptor` intentionally returns `false` for VR when a shader is not reverse-engineered. Returning `false` causes `ResolveImageSpaceDescriptor` to return `false`, which makes all callers (GetVertexShader, GetPixelShader, GetComputeShader, CompileShader) skip custom shader handling and defer to the vanilla shader. This is the correct pattern for un-RE'd imagespace shaders on VR — do not flag it as "skipping all custom VR handling" or suggest gating on HDR feature state.

Learnt from: vrnord
Repo: doodlum/skyrim-community-shaders PR: 1982
File: package/Shaders/Lighting.hlsl:3089-3095
Timestamp: 2026-03-16T13:13:11.490Z
Learning: In `package/Shaders/Lighting.hlsl` (doodlum/skyrim-community-shaders), the `TREE_ANIM` alpha test path intentionally uses a hardcoded `0.1` floor check (`if (alpha < 0.1) discard;`) and `AlphaTestRefRS` rather than `SharedData::VRAlphaTestThreshold`. `VRAlphaTestThreshold` is wired into `DistantTree.hlsl` and `RunGrass.hlsl` only. The threshold wiring for `Lighting.hlsl`'s `TREE_ANIM` path is deferred to a separate VR foliage PR.

Learnt from: Dlizzio
Repo: doodlum/skyrim-community-shaders PR: 1950
File: package/Shaders/Lighting.hlsl:2008-2012
Timestamp: 2026-03-07T00:53:21.267Z
Learning: In doodlum/skyrim-community-shaders PR `#1950`, the maintainer (Dlizzio) confirmed that per-axis normal reorientation for triplanar-projected normals/detail in package/Shaders/Lighting.hlsl is not required for their use case; the current approach intentionally mixes samples without axis-specific swizzle/sign correction.

Learnt from: Dlizzio
Repo: doodlum/skyrim-community-shaders PR: 2000
File: src/WeatherEditor/EditorWindow.cpp:1271-1274
Timestamp: 2026-03-22T18:40:49.633Z
Learning: Repo: doodlum/skyrim-community-shaders — Maintainer preference (PR `#2000` on 2026-03-22): In src/WeatherEditor/EditorWindow.cpp within EditorWindow::RenderUI, keep `resetLayout = false;` at the end of the frame (after PaletteWindow::Draw). The reset is intentionally one-shot for windows rendered that frame; closed windows are allowed to reopen with their previous geometry. Do not move the clear later or broadcast resets to closed windows.

Learnt from: jiayev
Repo: doodlum/skyrim-community-shaders PR: 1947
File: package/Shaders/DeferredCompositeCS.hlsl:234-243
Timestamp: 2026-03-14T08:35:42.651Z
Learning: In `package/Shaders/DeferredCompositeCS.hlsl` (doodlum/skyrim-community-shaders), the fallback non-IBL SKYLIGHTING specular path intentionally double-weights with `skylightingSpecular`: `dalcScaled` is computed as `IrradianceToGamma(IrradianceToLinear(directionalAmbientColorSpecular) * skylightingSpecular)`, and the final `finalIrradiance` is then `lerp(specularIrradiance, specularIrradianceReflections, skylightingSpecular)`. This is original engine logic preserved from before PR `#1947` and should not be flagged as a double-attenuation bug.

Learnt from: jiayev
Repo: doodlum/skyrim-community-shaders PR: 0
File: :0-0
Timestamp: 2025-08-03T18:37:19.690Z
Learning: ISReflectionsRayTracing.hlsl and ISWorldMap.hlsl in the skyrim-community-shaders repository are image-space post-processing shaders that perform color sampling and blending operations that need proper linear color space handling for the linear lighting system. ISReflectionsRayTracing handles screen-space reflections and should use conditional Color::IrradianceToLinear/Gamma conversions similar to ISCompositeLensFlareVolumetricLighting.hlsl. ISWorldMap performs 7x7 color accumulation that should be done in linear space similar to the pattern used in ISSAOComposite.hlsl.

Learnt from: vrnord
Repo: doodlum/skyrim-community-shaders PR: 2002
File: package/Shaders/VRStereoOptimizations/StencilCS.hlsl:97-143
Timestamp: 2026-03-24T05:34:21.730Z
Learning: In `package/Shaders/VRStereoOptimizations/StencilCS.hlsl` (doodlum/skyrim-community-shaders PR `#2002`), `EdgeWidth` has been fully removed as a setting. The edge detection search width is hardcoded to `kInnerWidth = 2` with `maxWidth = kInnerWidth`. Do not flag EdgeWidth as a dead/unused parameter — it no longer exists in `VRStereoOptParams` or `cbuffers.hlsli` in this PR.

Learnt from: vrnord
Repo: doodlum/skyrim-community-shaders PR: 1982
File: src/Features/VR/StereoBlend.cpp:98-109
Timestamp: 2026-03-16T13:13:31.383Z
Learning: In `src/Features/VR/StereoBlend.cpp` (doodlum/skyrim-community-shaders), debug modes 4/5 ("Overwrite", "Overwrite Eye1") in the `StereoBlendDebugMode` selector are NOT separate shader variants. They use the same `stereoBlendOverwriteCS` shader with different cbuffer parameters. When `vrStereoOptActive` is true, `stereoBlendOverwriteCS` is always selected unconditionally and `StereoBlendDebugMode` values 4/5 do not map to any distinct code path.

Learnt from: jiayev
Repo: doodlum/skyrim-community-shaders PR: 1947
File: features/Dynamic Cubemaps/Shaders/DynamicCubemaps/DynamicCubemaps.hlsli:107-115
Timestamp: 2026-03-14T08:35:02.326Z
Learning: In `features/Dynamic Cubemaps/Shaders/DynamicCubemaps/DynamicCubemaps.hlsli`, the pattern of scaling `directionalAmbientColorSpecular` by `skylightingSpecular` (via `dalcScaled`) before the final `lerp(specularIrradiance, specularIrradianceReflections, skylightingSpecular)` is intentional pre-existing behavior preserved from the original implementation. Do not flag this as double-attenuation; the combined attenuation is by design.

radianceDisocc now writes mip 0 directly to texRadianceTemp, which
prefilterRadiance reads as SRV while writing the mip chain back to
texRadiance. Removes the per-frame CopySubresourceRegion that was
previously used to resolve the read/write hazard on texRadiance.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
@alandtse alandtse force-pushed the perf/ssgi-eliminate-radiance-copy branch from 5b344a7 to fe572cd Compare April 16, 2026 04:31
@github-actions
Copy link
Copy Markdown

✅ A pre-release build is available for this PR:
Download

@doodlum doodlum merged commit 422c541 into community-shaders:dev Apr 16, 2026
17 checks passed
YtzyFvra pushed a commit to YtzyFvra/skyrim-community-shaders that referenced this pull request Apr 19, 2026
…haders#2127)

Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>
ParticleTroned added a commit to ParticleTroned/skyrim-community-shaders that referenced this pull request May 2, 2026
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants