perf: optimise DeferredComposite #2150

Merged
doodlum merged 22 commits into dev from ps-opt
Apr 19, 2026

Conversation

@doodlum
Collaborator

@doodlum doodlum commented Apr 19, 2026

VRStereoOptimizations should now run much faster, and the DeferredComposite pass itself should be much faster on all hardware. However, I have not tested this yet.

Summary by CodeRabbit

  • Refactor
    • Switched deferred composite from a compute dispatch to a fullscreen graphics pass (new vertex/pixel shaders), consolidated inputs, removed VR eye-1 early-return and motion-vector/TAA UAV outputs, and updated normal+roughness render-target usage and related bindings.
    • Reworked shader normal encode/decode logic and adjusted related test/vector types and tolerances.
  • New Features
    • Added a D3D state backup/restore utility to preserve/restore pipeline state.
  • Tests
    • Updated normal-encoding tests with relaxed tolerances.
  • Documentation
    • Updated VR stereo usage example to reflect the new composite path.

@coderabbitai
Contributor

coderabbitai Bot commented Apr 19, 2026

Note

Reviews paused

It looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior by changing the reviews.auto_review.auto_pause_after_reviewed_commits setting.

Use the following commands to manage reviews:

  • @coderabbitai resume to resume automatic reviews.
  • @coderabbitai review to trigger a single review.


Walkthrough

Converts deferred composite from a compute dispatch to a fullscreen VS+PS draw, refactors GBuffer normal encode/decode to an analytic reconstruction with signature/type changes, updates related tests, adds a D3D11 state-backup utility, updates Deferred pipeline/shader cache and bindings, and shifts a few texture register indices.

Changes

| Cohort / File(s) | Summary |
| --- | --- |
| Normal encoding & tests: package/Shaders/Common/GBuffer.hlsli, package/Shaders/Tests/TestGBuffer.hlsl | Replaced octahedral helpers with analytic reconstruction; changed EncodeNormal/DecodeNormal to float types/signatures; removed OctWrap/EncodeNormalVanilla; updated tests (new equator test, relaxed tolerances, removed some range asserts). |
| Composite shaders: package/Shaders/DeferredCompositePS.hlsl, package/Shaders/DeferredCompositeVS.hlsl | Replaced compute entry with VS+PS fullscreen path: new VS, PS returns SV_Target0/SV_Target1; removed UAV writes (MainRW/NormalTAAMaskSpecularMaskRW/MotionVectorsRW) and Eye-1 early-return/VR StereoOptModeTexture; switched to MainInputTexture SRV and conditionalized/shuffled SRV/register bindings. |
| Deferred pipeline & shader cache: src/Deferred.cpp, src/Deferred.h | Switched composite from CS Dispatch to Draw(3) fullscreen triangle; added blend/depth-stencil/rasterizer states; bind per-frame CBs/SRVs to PS; replaced compute shader cache entries with VS/PS caches; introduced winrt::com_ptr members and normalRoughnessRT selection. |
| D3D state backup utility: src/Utils/D3DStateBackup.h | Added Util::D3DStateBackup to capture/restore D3D11 IA/VS/GS/RS/PS/OM state with Backup, Restore, and Release; fixed-size CB/SRV/sampler storage and full pipeline state capture. |
| SRV/RT binding updates: src/Features/ScreenSpaceGI.cpp, src/Features/SubsurfaceScattering.cpp | Replaced uses of rts[NORMALROUGHNESS] with rts[globals::deferred->normalRoughnessRT] for normal+roughness SRV in multiple compute passes. |
| IBL register shifts: features/IBL/Shaders/IBL/IBL.hlsli | Deferred-path texture registers shifted: EnvIBLTexture t14→t15, SkyIBLTexture t15→t16. |
| Docs/comment only: src/Features/VRStereoOptimizations.h | Updated comment to reference DeferredCompositePS Eye-1 skip instead of DeferredCompositeCS. |

Sequence Diagram(s)

sequenceDiagram
    participant App as Application
    participant Context as D3D11 DeviceContext
    participant State as Util::D3DStateBackup
    participant Pipeline as Graphics Pipeline (VS/PS)
    participant GPU as GPU

    App->>State: Backup(context)
    State->>Context: Get* state & capture pointers
    Context-->>State: Return captured state

    App->>Context: CopySubresourceRegion(kMAIN → kMAIN_COPY)
    Context->>GPU: Perform texture copy

    App->>Context: Bind compositeBlend/Depth/RS states
    App->>Context: Set VS (DeferredCompositeVS) and PS (DeferredCompositePS)
    App->>Context: Bind SRVs (mainCopy.SRV, GBuffers, SSGI...)
    App->>Context: Set RTV(s) and optional stencil DS state

    App->>GPU: Draw(3) fullscreen triangle
    GPU->>GPU: VS -> PS -> Output Merge (SV_Target0, SV_Target1)

    App->>State: Restore(context)
    State->>Context: Reapply saved bindings & Release()
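
To make the Draw(3) step in the diagram above more concrete, here is a small CPU-side C++ sketch of the vertex-ID trick that fullscreen-triangle vertex shaders commonly use. It is an illustration only: the actual DeferredCompositeVS source is not shown in this thread and may differ, and the names `FullscreenVertex` and `fullscreenTriangleVertex` are hypothetical.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical CPU-side mirror of a vertex-ID fullscreen-triangle vertex shader.
// Assumption: the real DeferredCompositeVS may compute its outputs differently.
struct FullscreenVertex {
    float u, v;  // texture coordinate
    float x, y;  // clip-space position (z = 0, w = 1 implied)
};

FullscreenVertex fullscreenTriangleVertex(std::uint32_t vertexId)
{
    assert(vertexId < 3);  // Draw(3, 0) only issues vertex IDs 0, 1 and 2
    FullscreenVertex out{};
    // IDs 0, 1, 2 give UVs (0,0), (2,0), (0,2).
    out.u = static_cast<float>((vertexId << 1) & 2u);
    out.v = static_cast<float>(vertexId & 2u);
    // Map UV to clip space with Y flipped so UV (0,0) is the top-left corner.
    out.x = out.u * 2.0f - 1.0f;
    out.y = 1.0f - out.v * 2.0f;
    return out;
}
```

The three vertices land at (-1, 1), (3, 1) and (-1, -3), so the single oversized triangle covers the whole [-1, 1] clip-space square and no vertex or index buffer is needed.
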

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~60 minutes

Suggested reviewers

  • alandtse
  • davo0411
  • jiayev

Poem

🐇 I hopped from compute into pixel light,

Fullscreen triangles now take flight.
Normals solved with algebra bright,
States held safe before the rite.
A rabbit cheers — the frame sits tight!

🚥 Pre-merge checks | ✅ 2 | ❌ 1

❌ Failed checks (1 warning)

| Check name | Status | Explanation | Resolution |
| --- | --- | --- | --- |
| Docstring Coverage | ⚠️ Warning | Docstring coverage is 6.67% which is insufficient. The required threshold is 80.00%. | Write docstrings for the functions missing them to satisfy the coverage threshold. |

✅ Passed checks (2 passed)

| Check name | Status | Explanation |
| --- | --- | --- |
| Description Check | ✅ Passed | Check skipped - CodeRabbit’s high-level summary is enabled. |
| Title check | ✅ Passed | The PR title 'perf: optimise DeferredComposite' accurately reflects the main objective of converting the deferred composite pass from a compute shader to a pixel shader, which is a performance optimization mentioned in the PR objectives. |

Automated formatting by clang-format, prettier, and other hooks.
See https://pre-commit.ci for details.
@github-actions

No actionable suggestions for changed features.

Comment thread package/Shaders/DeferredCompositePS.hlsl
Contributor

@coderabbitai coderabbitai Bot left a comment

Actionable comments posted: 4

🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@package/Shaders/Common/GBuffer.hlsli`:
- Around line 14-21: The DecodeNormal function's pole/sign convention is
inverted: for enc=(0.5,0.5) (EncodeNormal(0,0,1)) the current n.z = f/2 - 1
yields -1; change the z reconstruction in DecodeNormal to the matching sign
convention (set n.z = 1.0 - f/2.0) and return a normalized normal (normalize(n))
to ensure unit-length output; update the DecodeNormal(half2 enc) implementation
accordingly so it matches EncodeNormal and the up-normal tests.

In `@package/Shaders/Tests/TestGBuffer.hlsl`:
- Around line 8-23: The test currently assumes both exact pole normals roundtrip
through GBuffer::EncodeNormal/GBuffer::DecodeNormal, but the chosen encoding
only preserves one pole; update the TestGBuffer.hlsl tests to stop expecting
both exact poles to decode to themselves: adjust the testNormals or the ASSERT
checks so the pole that the implementation supports (the pole that DecodeNormal
returns for xy==0) is asserted exact (or within existing 0.05h tolerance) and
replace the opposite pole with a near-pole sample (e.g., tiny nonzero x or y)
and assert it decodes within tolerance; apply the same change for the similar
cases around lines 48-56 so tests no longer require symmetric pole roundtrip but
still validate near-pole behavior for both poles.

In `@src/Deferred.cpp`:
- Around line 415-423: The composite fullscreen draw leaves HS/DS (hull/domain
shaders) possibly bound which can leave the device in a tessellation state
incompatible with Draw(3, 0); extend the D3DStateBackup class to capture and
restore hull and domain shaders (HS/DS) alongside VS/PS/GS, then before the
composite draw (where GetCompositeVS/GetCompositePS(interior) are set and GS is
explicitly unbound) explicitly unbind HS and DS (set them to nullptr) so the IA
primitive topology D3D11_PRIMITIVE_TOPOLOGY_TRIANGLELIST is valid; ensure the
backup/restore paths handle nulls gracefully and do not leak references.

In `@src/Utils/D3DStateBackup.h`:
- Around line 7-12: D3DStateBackup currently stores AddRef'd COM pointers via
Backup() but is trivially copyable and lacks a destructor, leading to double
Release or leaks; make D3DStateBackup non-copyable (delete copy constructor and
copy assignment), implement a move constructor and move assignment to transfer
ownership, add a destructor that releases any held COM pointers, and ensure
Restore() and Backup() manage reference counts consistently (e.g., call AddRef
when capturing and Release in destructor/Restore), updating any helper members
(kNumSRVSlots, kNumSamplerSlots, kNumCBSlots) usage accordingly so captured
arrays are safely cleared on move and release on destruction.
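
The ownership rules requested for D3DStateBackup in the last item can be sketched in plain C++. This is a hedged illustration, not the repository's implementation: `FakeComObject` is a hypothetical stand-in for a D3D11 COM interface and `ScopedBinding` is a hypothetical single-slot holder. In real code the D3D11 `*Get*` calls already add a reference, whereas the constructor here does it by hand.

```cpp
#include <cassert>
#include <utility>

// Hypothetical stand-in for a COM interface; real code would hold ID3D11* pointers.
struct FakeComObject {
    int refCount = 1;
    void AddRef() { ++refCount; }
    void Release() { --refCount; }
};

// Hypothetical holder showing the rules the comment asks for:
// no copies, ownership-transferring moves, and release on destruction.
class ScopedBinding {
public:
    ScopedBinding() = default;
    explicit ScopedBinding(FakeComObject* captured) : ptr_(captured)
    {
        if (ptr_) ptr_->AddRef();  // the backup owns exactly one reference
    }
    ScopedBinding(const ScopedBinding&) = delete;
    ScopedBinding& operator=(const ScopedBinding&) = delete;
    ScopedBinding(ScopedBinding&& other) noexcept : ptr_(std::exchange(other.ptr_, nullptr)) {}
    ScopedBinding& operator=(ScopedBinding&& other) noexcept
    {
        if (this != &other) {
            reset();
            ptr_ = std::exchange(other.ptr_, nullptr);
        }
        return *this;
    }
    ~ScopedBinding() { reset(); }

    // Drop the held reference, if any; safe to call repeatedly.
    void reset()
    {
        if (ptr_) {
            ptr_->Release();
            ptr_ = nullptr;
        }
    }
    FakeComObject* get() const { return ptr_; }

private:
    FakeComObject* ptr_ = nullptr;
};
```

Because a moved-from holder is left null, each captured reference is released exactly once however the backup object is passed around. A reference-counting smart pointer such as winrt::com_ptr, which the walkthrough says Deferred.h already uses, would take care of the AddRef/Release pairing automatically.
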

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: 4972dc2f-2b07-4172-a36d-2755a0b3b393

📥 Commits

Reviewing files that changed from the base of the PR and between 31a7b55 and 6fa9405.

📒 Files selected for processing (8)
  • package/Shaders/Common/GBuffer.hlsli
  • package/Shaders/DeferredCompositePS.hlsl
  • package/Shaders/DeferredCompositeVS.hlsl
  • package/Shaders/Tests/TestGBuffer.hlsl
  • src/Deferred.cpp
  • src/Deferred.h
  • src/Features/VRStereoOptimizations.h
  • src/Utils/D3DStateBackup.h

Comment thread package/Shaders/Common/GBuffer.hlsli Outdated
Comment thread package/Shaders/Tests/TestGBuffer.hlsl Outdated
Comment thread src/Deferred.cpp
Comment thread src/Utils/D3DStateBackup.h
doodlum and others added 3 commits April 19, 2026 01:34
Automated formatting by clang-format, prettier, and other hooks.
See https://pre-commit.ci for details.
Contributor

@coderabbitai coderabbitai Bot left a comment

🧹 Nitpick comments (1)
package/Shaders/Tests/TestGBuffer.hlsl (1)

16-23: Keep the tests proving encoded range and decoded direction.

The angled-normal test now only checks unit length, so a decoder returning the wrong unit vector would pass. Also keep [0,1] assertions for EncodeNormal, since these values are stored through unorm GBuffer channels.

🧪 Proposed test strengthening
 	for (int i = 0; i < 6; i++) {
 		half3 original = normalize(testNormals[i]);
 		half2 encoded = GBuffer::EncodeNormal(original);
 		half3 decoded = GBuffer::DecodeNormal(encoded);
 
+		ASSERT(IsTrue, encoded.x >= 0.0h && encoded.x <= 1.0h);
+		ASSERT(IsTrue, encoded.y >= 0.0h && encoded.y <= 1.0h);
 		ASSERT(IsTrue, abs(decoded.x - original.x) < 0.05h);
 		ASSERT(IsTrue, abs(decoded.y - original.y) < 0.05h);
 		ASSERT(IsTrue, abs(decoded.z - original.z) < 0.05h);
 	}
@@
 	for (int i = 0; i < 4; i++) {
 		half3 original = testNormals[i];
 		half2 encoded = GBuffer::EncodeNormal(original);
 		half3 decoded = GBuffer::DecodeNormal(encoded);
 
+		ASSERT(IsTrue, encoded.x >= 0.0h && encoded.x <= 1.0h);
+		ASSERT(IsTrue, encoded.y >= 0.0h && encoded.y <= 1.0h);
+		ASSERT(IsTrue, dot(decoded, original) > 0.95h);
 		half length = sqrt(decoded.x * decoded.x + decoded.y * decoded.y + decoded.z * decoded.z);
 		ASSERT(IsTrue, abs(length - 1.0h) < 0.05h);
 	}
@@
 	half2 encoded = GBuffer::EncodeNormal(equatorNormal);
 	half3 decoded = GBuffer::DecodeNormal(encoded);
 
+	ASSERT(IsTrue, encoded.x >= 0.0h && encoded.x <= 1.0h);
+	ASSERT(IsTrue, encoded.y >= 0.0h && encoded.y <= 1.0h);
 	ASSERT(IsTrue, abs(decoded.x - 1.0h) < 0.01h);

Also applies to: 36-42, 47-54
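
A minimal C++ sketch of why the length-only assertion is too weak, using the 0.05 and 0.95 thresholds from the proposed diff. The helper names (`hasUnitLength`, `sameDirection`) are hypothetical and exist only for this illustration; the actual tests are HLSL.

```cpp
#include <cassert>
#include <cmath>

struct Vec3 { double x, y, z; };

double dot(Vec3 a, Vec3 b) { return a.x * b.x + a.y * b.y + a.z * b.z; }

// The check the loop currently performs: only the magnitude of the decoded vector.
bool hasUnitLength(Vec3 v, double tolerance = 0.05)
{
    return std::fabs(std::sqrt(dot(v, v)) - 1.0) < tolerance;
}

// The additional check proposed above: the decoded vector must point the same way.
bool sameDirection(Vec3 decoded, Vec3 original, double minCosine = 0.95)
{
    return dot(decoded, original) > minCosine;
}
```

A decoder that returned (0, 0, -1) for an input of (0, 0, 1) would satisfy `hasUnitLength` but fail `sameDirection`, which is exactly the class of bug the extra assertion is meant to catch.
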

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@package/Shaders/Tests/TestGBuffer.hlsl` around lines 16 - 23, The tests in
TestGBuffer.hlsl for the angled-normal cases currently only verify unit length
and therefore would accept wrong directions; update the test loops (the blocks
around the GBuffer::EncodeNormal/GBuffer::DecodeNormal usages — repeated at the
other sites mentioned) to also assert that the encoded half2 is within [0,1]
(since it will be stored in unorm channels) and that the decoded half3 matches
the original direction component-wise (e.g., abs(decoded.x - original.x) <
0.05h, same for y and z) in addition to the unit-length check; apply these extra
assertions to the other test blocks (the ranges referenced in the comment) so
both encoded-range and decoded-direction are validated.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: 043980eb-29b9-4999-ae71-a4b90df99263

📥 Commits

Reviewing files that changed from the base of the PR and between 6fa9405 and ec4ffb5.

📒 Files selected for processing (4)
  • features/IBL/Shaders/IBL/IBL.hlsli
  • package/Shaders/DeferredCompositePS.hlsl
  • package/Shaders/Tests/TestGBuffer.hlsl
  • src/Deferred.cpp
🚧 Files skipped from review as they are similar to previous changes (1)
  • src/Deferred.cpp

@alandtse
Collaborator

@coderabbitai please confirm this does not break any existing functionality like stereo reprojection.

Contributor

@coderabbitai coderabbitai Bot left a comment

Actionable comments posted: 1

♻️ Duplicate comments (1)
package/Shaders/Common/GBuffer.hlsli (1)

7-21: ⚠️ Potential issue | 🟠 Major

Handle the exact +Z pole explicitly.

EncodeNormal(float3(0, 0, 1)) still returns float2(0.5, 0.5), and DecodeNormal(float2(0.5, 0.5)) reconstructs float3(0, 0, -1). Near-pole tests avoid this, but an exact +Z normal would still flip.

🐛 One possible fix while keeping the current center-as-`-Z` convention
 float2 EncodeNormal(float3 n)
 {
+	if (n.z > 0.999999 && dot(n.xy, n.xy) < 1e-12) {
+		return float2(1.0, 0.5);
+	}
 	n.z = max(0.001, sqrt(8.0 - 8.0 * n.z));
 	n.xy /= n.z;
 	return n.xy + 0.5;
 }
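
The pole flip and the effect of the proposed guard can be reproduced on the CPU with the C++ sketch below. The encode follows the lines quoted in the diff above. The decode is an assumption: only `n.z = f/2 - 1` is quoted in this thread, so the xy terms use the standard Lambert azimuthal equal-area inverse and may not match GBuffer.hlsli exactly. All type and function names are hypothetical.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>

struct Vec2 { double x, y; };
struct Vec3 { double x, y, z; };

// Encode as quoted in the review: the centre of the map, (0.5, 0.5), stands for -Z.
Vec2 encodeNormal(Vec3 n)
{
    assert(std::fabs(n.x * n.x + n.y * n.y + n.z * n.z - 1.0) < 1e-6);  // expects a unit normal
    const double d = std::max(0.001, std::sqrt(8.0 - 8.0 * n.z));
    return { n.x / d + 0.5, n.y / d + 0.5 };
}

// Assumed decode (see the note above): standard inverse with the centre-as--Z sign on z.
Vec3 decodeNormal(Vec2 enc)
{
    const double fx = enc.x * 4.0 - 2.0;
    const double fy = enc.y * 4.0 - 2.0;
    const double f = fx * fx + fy * fy;
    const double g = std::sqrt(std::max(0.0, 1.0 - f / 4.0));
    return { fx * g, fy * g, f / 2.0 - 1.0 };
}

// Encode with the guard proposed above: an exact +Z input is sent to the rim of the map.
Vec2 encodeNormalGuarded(Vec3 n)
{
    if (n.z > 0.999999 && n.x * n.x + n.y * n.y < 1e-12) {
        return { 1.0, 0.5 };
    }
    return encodeNormal(n);
}
```

Under these assumptions, `decodeNormal(encodeNormal({0, 0, 1}))` comes back with z = -1, while the guarded encode round-trips to z = +1. Every non-degenerate normal is unaffected, because with xy = 0 both poles would otherwise collapse onto the centre of the map.
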
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@package/Shaders/Common/GBuffer.hlsli` around lines 7 - 21, The
encoding/decoding pair flips the exact +Z pole; update EncodeNormal and
DecodeNormal to special-case the +Z pole: in EncodeNormal(float3 n) detect if
n.z is >= 1.0 - EPS (or very close to 1.0) and return float2(0.5,0.5) directly,
and in DecodeNormal(float2 enc) detect when enc is within EPS of float2(0.5,0.5)
and return float3(0,0,1) (otherwise keep the existing math); use a small epsilon
(e.g. 1e-6) for the comparisons to avoid floating-point equality issues.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@package/Shaders/Tests/TestGBuffer.hlsl`:
- Around line 30-43: The test currently only verifies decoded normal magnitude;
update the test loop that iterates testNormals and calls GBuffer::EncodeNormal /
GBuffer::DecodeNormal to also assert direction correctness by comparing the
decoded normal to the original (e.g., require dot(decoded, original) near 1.0 or
check per-component closeness) in addition to the existing length check; use the
same ASSERT/IsTrue mechanism to fail when abs(dot(decoded, original) - 1.0)
exceeds a small threshold (or when component differences exceed a small
epsilon).

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: 1cc64991-851c-47ff-bfa8-8ae2a6100834

📥 Commits

Reviewing files that changed from the base of the PR and between ec4ffb5 and e978647.

📒 Files selected for processing (3)
  • package/Shaders/Common/GBuffer.hlsli
  • package/Shaders/Tests/TestGBuffer.hlsl
  • src/Deferred.cpp
🚧 Files skipped from review as they are similar to previous changes (1)
  • src/Deferred.cpp

Comment thread package/Shaders/Tests/TestGBuffer.hlsl
@coderabbitai
Contributor

coderabbitai Bot commented Apr 19, 2026

🧩 Analysis chain

🏁 Script executed:

#!/bin/bash
# Check the new DeferredCompositePS.hlsl for VR/stereo handling
echo "=== DeferredCompositePS.hlsl ===" 
cat package/Shaders/DeferredCompositePS.hlsl

echo ""
echo "=== DeferredCompositeVS.hlsl ===" 
cat package/Shaders/DeferredCompositeVS.hlsl

echo ""
echo "=== VRStereoOptimizations.h ===" 
cat src/Features/VRStereoOptimizations.h

echo ""
echo "=== Deferred.cpp VR stencil and DeferredPasses ===" 
cat src/Deferred.cpp

echo ""
echo "=== D3DStateBackup.h ===" 
cat src/Utils/D3DStateBackup.h

Length of output: 13636


🏁 Script executed:

#!/bin/bash
# Find the correct file paths
fd -t f "Deferred.cpp" 
fd -t f "VRStereoOptimizations" 
fd -t f "D3DStateBackup"

echo ""
echo "=== Searching for motion vector writes in the codebase ==="
rg -n "MotionVector" --type cpp -C2
rg -n "MotionVectors" --type hlsl -C2

Length of output: 237


🏁 Script executed:

#!/bin/bash
# Find files with find instead
find . -name "Deferred.cpp" 2>/dev/null
find . -name "VRStereoOptimizations*" 2>/dev/null
find . -name "D3DStateBackup*" 2>/dev/null

echo ""
echo "=== Directory structure ==="
ls -la

echo ""
echo "=== Top level src structure ==="
find . -maxdepth 3 -type d | head -60

Length of output: 1653


🏁 Script executed:

#!/bin/bash
echo "=== VRStereoOptimizations shaders ==="
find ./package/Shaders/VRStereoOptimizations -type f | sort
echo ""
find ./package/Shaders/VRStereoOptimizations -type f | sort | while read f; do
  echo "--- $f ---"
  cat "$f"
  echo ""
done

echo "=== Search for MotionVector in all shaders ==="
grep -rn "MotionVector" ./package/ ./features/ 2>/dev/null

echo "=== Search for NormalTAAMask in all shaders ==="
grep -rn "NormalTAAMask" ./package/ ./features/ 2>/dev/null

echo "=== Search for stereo reprojection references ==="
grep -rn -i "reprojection\|stereoopt\|StereoOpt\|stereo_opt" ./package/ ./features/ 2>/dev/null | head -60

echo "=== GBuffer.hlsli ==="
cat ./package/Shaders/Common/GBuffer.hlsli

Length of output: 18201


🏁 Script executed:

#!/bin/bash
echo "=== StereoBlendCS.hlsl ==="
cat ./package/Shaders/VR/StereoBlendCS.hlsl

echo ""
echo "=== VRPostProcessCS.hlsl ==="
cat ./package/Shaders/VR/VRPostProcessCS.hlsl

echo ""
echo "=== Search for GBuffer/DecodeNormal in VR shaders ==="
grep -rn "GBuffer\|DecodeNormal\|EncodeNormal" ./package/Shaders/VR/ 2>/dev/null
grep -rn "GBuffer\|DecodeNormal\|EncodeNormal" ./package/Shaders/VRStereoOptimizations/ 2>/dev/null

echo ""
echo "=== Search for NormalTAAMask anywhere ==="
grep -rn "NormalTAAMask\|TAAMask" ./package/ ./features/ 2>/dev/null

echo ""
echo "=== Check what Lighting.hlsl VR_STEREO_OPT section does ==="
grep -n "VR_STEREO_OPT\|StereoOpt\|NormalTAA" ./package/Shaders/Lighting.hlsl | head -30

Length of output: 19867


@alandtse: After a thorough analysis of the shaders, here is my assessment of stereo reprojection and related functionality:


✅ Stereo Reprojection — Core Pipeline Intact

The heart of the stereo optimization pipeline — StencilCS.hlsl — is completely unchanged. It still performs:

  • Per-pixel disocclusion detection via Stereo::ReprojectToOtherEye
  • Directional occlusion detection (Eye 1 silhouette edge coverage)
  • Mode classification (MODE_MAIN, MODE_DISOCCLUDED, MODE_EDGE, MODE_FULL_BLEND)

StereoBlendCS.hlsl (the STEREO_OVERWRITE path) is also intact: it reads from ModeTexture, reprojects Eye 1 pixels from Eye 0, and propagates motion vectors (MotionRW[dtid] = MotionRW[r.otherPx]). VRPostProcessCS.hlsl is likewise unmodified.


✅ Eye-1 Skip — Replaced by Stencil (Equivalent Behavior)

The old compute path checked StereoOptModeTexture and returned early for Eye-1 MODE_MAIN pixels. The new pixel shader approach replaces this with hardware stencil culling:

  • StencilWritePS.hlsl writes stencil ref=1 for MODE_MAIN pixels
  • compositeStencilDSState (NOT_EQUAL, ref=1) causes the GPU to skip DeferredCompositePS on those pixels

This is functionally equivalent and is actually a cleaner mechanism — hardware stencil culling avoids divergent branching in the shader.
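
The equivalence claimed above can be checked with a small truth-table sketch in C++. It models a single Eye-1 pixel and rests on stated assumptions: the numeric values of the modes are made up for illustration, and pixels that are not MODE_MAIN are assumed to carry a stencil value other than 1 (0 here). The function names are hypothetical.

```cpp
#include <cassert>
#include <cstdint>

// Modes named in the analysis above; the numeric values are assumptions for illustration.
enum class Mode : std::uint8_t { Main = 0, Disoccluded = 1, Edge = 2, FullBlend = 3 };

// Stencil value a StencilWritePS-style pass would leave behind: 1 marks MODE_MAIN pixels.
std::uint8_t stencilForMode(Mode mode)
{
    return mode == Mode::Main ? 1 : 0;
}

// NOT_EQUAL comparison with a read mask: the pixel shader runs only when this returns true.
bool stencilTestNotEqual(std::uint8_t ref, std::uint8_t stored, std::uint8_t readMask = 0xFF)
{
    return (ref & readMask) != (stored & readMask);
}

// Old compute-path behaviour for an Eye-1 pixel: early-return when the mode is MODE_MAIN.
bool oldPathShades(Mode mode) { return mode != Mode::Main; }

// New behaviour for the same pixel: shade only if the stencil test (ref = 1) passes.
bool newPathShades(Mode mode) { return stencilTestNotEqual(1, stencilForMode(mode)); }
```

For all four modes the two predicates agree, which is the sense in which the stencil path is equivalent. The practical difference is where the rejection happens: the fixed-function stencil test discards the pixel before the pixel shader is invoked, instead of the shader reading a mode texture and branching.
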


✅ Motion Vectors — Not Broken

Motion vectors are written by the geometry-pass shaders themselves (Lighting.hlsl, RunGrass.hlsl, Sky.hlsl, DistantTree.hlsl, Water.hlsl) as SV_Target1, so removing the motion-vector UAV write that DeferredCompositeCS used to perform does not affect them. StereoBlendCS continues to propagate motion vectors from Eye 0 to reprojected Eye 1 pixels.

EncodeTexturesCS.hlsl reads TAAMask (from geometry passes) — also unaffected.


✅ VR-Specific Paths in DeferredCompositePS — Preserved

The new pixel shader retains all VR-specific logic:

  • Stereo::GetEyeIndexFromTexCoord(uv) + Stereo::ConvertFromStereoUV(uv, eyeIndex)
  • Per-eye matrix lookups (FrameBuffer::CameraViewProjInverse[eyeIndex])
  • VR skylighting positionMS adjustment (#if defined(VR) blocks)

✅ GBuffer Normal Encoding Change — Self-Consistent

Neither StereoBlendCS nor VRPostProcessCS use GBuffer::DecodeNormal — confirmed by grep. The encoding algorithm change in GBuffer.hlsli is self-consistent (encode and decode both updated in the same header, with updated unit tests in TestGBuffer.hlsl).


⚠️ One Unverifiable Item

The C++ source (src/Deferred.cpp) is not available in the sandbox. The critical ordering requirement is that StencilWritePS must execute before DeferredCompositePS for stencil culling to work correctly. The AI summary of Deferred.cpp confirms the stencil state is set up and the stencil culling DSS is applied when VR is present, but I cannot verify the exact draw-call ordering. This would be worth a manual review of the C++ pass sequencing in DeferredPasses().


Summary: The stereo reprojection pipeline is intact. The only structural change is the Eye-1 skip mechanism, which moves from a shader texture-check to hardware stencil culling — a sound optimization. No functionality appears broken from the shader side.


🧠 Learnings used
Learnt from: Dlizzio
Repo: doodlum/skyrim-community-shaders PR: 2000
File: src/WeatherEditor/EditorWindow.cpp:1271-1274
Timestamp: 2026-03-22T18:40:49.633Z
Learning: Repo: doodlum/skyrim-community-shaders — Maintainer preference (PR `#2000` on 2026-03-22): In src/WeatherEditor/EditorWindow.cpp within EditorWindow::RenderUI, keep `resetLayout = false;` at the end of the frame (after PaletteWindow::Draw). The reset is intentionally one-shot for windows rendered that frame; closed windows are allowed to reopen with their previous geometry. Do not move the clear later or broadcast resets to closed windows.

Learnt from: alandtse
Repo: doodlum/skyrim-community-shaders PR: 2084
File: src/ShaderCache.cpp:1646-1649
Timestamp: 2026-04-10T07:34:27.849Z
Learning: In `src/ShaderCache.cpp` (doodlum/skyrim-community-shaders PR `#2084`), `GetImagespaceShaderDescriptor` intentionally returns `false` for VR when a shader is not reverse-engineered. Returning `false` causes `ResolveImageSpaceDescriptor` to return `false`, which makes all callers (GetVertexShader, GetPixelShader, GetComputeShader, CompileShader) skip custom shader handling and defer to the vanilla shader. This is the correct pattern for un-RE'd imagespace shaders on VR — do not flag it as "skipping all custom VR handling" or suggest gating on HDR feature state.

Learnt from: CR
Repo: doodlum/skyrim-community-shaders PR: 0
File: .github/copilot-instructions.md:0-0
Timestamp: 2026-02-25T04:43:36.075Z
Learning: Applies to src/Features/**/*.{h,hpp,cpp,hlsl} : Consider GPU workload and performance impact when implementing graphics features, with special attention to shader compilation and runtime performance

Learnt from: vrnord
Repo: doodlum/skyrim-community-shaders PR: 1982
File: src/Features/VR/StereoBlend.cpp:98-109
Timestamp: 2026-03-16T13:13:31.383Z
Learning: In `src/Features/VR/StereoBlend.cpp` (doodlum/skyrim-community-shaders), debug modes 4/5 ("Overwrite", "Overwrite Eye1") in the `StereoBlendDebugMode` selector are NOT separate shader variants. They use the same `stereoBlendOverwriteCS` shader with different cbuffer parameters. When `vrStereoOptActive` is true, `stereoBlendOverwriteCS` is always selected unconditionally and `StereoBlendDebugMode` values 4/5 do not map to any distinct code path.

Learnt from: jiayev
Repo: doodlum/skyrim-community-shaders PR: 1947
File: package/Shaders/DeferredCompositeCS.hlsl:234-243
Timestamp: 2026-03-14T08:35:42.651Z
Learning: In `package/Shaders/DeferredCompositeCS.hlsl` (doodlum/skyrim-community-shaders), the fallback non-IBL SKYLIGHTING specular path intentionally double-weights with `skylightingSpecular`: `dalcScaled` is computed as `IrradianceToGamma(IrradianceToLinear(directionalAmbientColorSpecular) * skylightingSpecular)`, and the final `finalIrradiance` is then `lerp(specularIrradiance, specularIrradianceReflections, skylightingSpecular)`. This is original engine logic preserved from before PR `#1947` and should not be flagged as a double-attenuation bug.

Learnt from: Dlizzio
Repo: doodlum/skyrim-community-shaders PR: 1950
File: package/Shaders/Lighting.hlsl:2008-2012
Timestamp: 2026-03-07T00:53:21.267Z
Learning: In doodlum/skyrim-community-shaders PR `#1950`, the maintainer (Dlizzio) confirmed that per-axis normal reorientation for triplanar-projected normals/detail in package/Shaders/Lighting.hlsl is not required for their use case; the current approach intentionally mixes samples without axis-specific swizzle/sign correction.

Learnt from: vrnord
Repo: doodlum/skyrim-community-shaders PR: 2003
File: features/Upscaling/Shaders/Upscaling/DepthUpscalePS.hlsl:55-61
Timestamp: 2026-03-24T04:30:32.809Z
Learning: In `features/Upscaling/Shaders/Upscaling/DepthUpscalePS.hlsl` (doodlum/skyrim-community-shaders PR `#2003`), the shader is dispatched per-eye using per-eye dimensions (DstWidth/DstHeight). The UV (`input.TexCoord`) is derived from `dispatchID / DstDim`, which is already eye-local — not full SBS/packed stereo coordinates. The per-eye UV clamping logic (isRight, halfScale) operates entirely in eye-local space, and there is no stereo seam issue. Do not flag SBS seam concerns for this shader.

Learnt from: alandtse
Repo: doodlum/skyrim-community-shaders PR: 1880
File: src/Features/VR/InSceneOverlay.cpp:603-604
Timestamp: 2026-02-15T08:13:49.292Z
Learning: The skyrim-community-shaders repository uses OpenVR SDK version 1.0.10, where IVRCompositor::Submit is at vtable index 5 (0-based). Do not suggest changing this to index 6, as that is only correct for newer OpenVR versions (2.x).

Learnt from: vrnord
Repo: doodlum/skyrim-community-shaders PR: 2002
File: package/Shaders/VRStereoOptimizations/StencilCS.hlsl:97-143
Timestamp: 2026-03-24T05:34:21.730Z
Learning: In `package/Shaders/VRStereoOptimizations/StencilCS.hlsl` (doodlum/skyrim-community-shaders PR `#2002`), `EdgeWidth` has been fully removed as a setting. The edge detection search width is hardcoded to `kInnerWidth = 2` with `maxWidth = kInnerWidth`. Do not flag EdgeWidth as a dead/unused parameter — it no longer exists in `VRStereoOptParams` or `cbuffers.hlsli` in this PR.

Learnt from: alandtse
Repo: doodlum/skyrim-community-shaders PR: 0
File: :0-0
Timestamp: 2025-07-05T05:20:45.823Z
Learning: In the skyrim-community-shaders repository, file deletion error handling improvements that replace existence checks and try-catch blocks with std::filesystem::remove error-code-based approaches are considered bug fixes rather than refactoring, as they address inadequate error handling and misleading log messages.

Contributor

@coderabbitai coderabbitai Bot left a comment

🧹 Nitpick comments (1)
src/Features/ScreenSpaceGI.cpp (1)

776-776: LGTM — the three SRV bindings (radiance disocc, GI, blur) are migrated consistently to globals::deferred->normalRoughnessRT. Note deferred is already aliased locally at line 726; consider using rts[deferred->normalRoughnessRT].SRV for consistency with the adjacent rts[deferred->forwardRenderTargets[0]].SRV usage, but this is purely stylistic.

Also applies to: 829-829, 861-861

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/Features/ScreenSpaceGI.cpp` at line 776, Replace usages of
globals::deferred in SRV lookups with the local alias deferred for consistency:
change expressions like rts[globals::deferred->normalRoughnessRT].SRV to
rts[deferred->normalRoughnessRT].SRV (and likewise for the other two occurrences
where radiance disocc, GI, blur SRVs were migrated) so they match the adjacent
rts[deferred->forwardRenderTargets[0]].SRV style.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: 21ff6cd3-d93d-4e28-b9f3-8efd2b6da741

📥 Commits

Reviewing files that changed from the base of the PR and between e978647 and 751fcbd.

📒 Files selected for processing (4)
  • src/Deferred.cpp
  • src/Deferred.h
  • src/Features/ScreenSpaceGI.cpp
  • src/Features/SubsurfaceScattering.cpp
🚧 Files skipped from review as they are similar to previous changes (2)
  • src/Deferred.h
  • src/Deferred.cpp

@coderabbitai coderabbitai Bot left a comment

Actionable comments posted: 1

♻️ Duplicate comments (1)
src/Deferred.cpp (1)

418-426: ⚠️ Potential issue | 🟠 Major

Clear tessellation stages before the fullscreen draw.

GS is cleared, but stale HS/DS bindings can still leak into this triangle-list fullscreen draw. Extend the state backup to capture/restore HS/DS and explicitly unbind them before Draw(3, 0). As per coding guidelines, “Include proper resource management and graceful degradation for DirectX 11 resources and user input validation to prevent crashes from malformed configurations.”

#!/bin/bash
# Verify whether Deferred.cpp clears only GS and whether D3DStateBackup currently captures HS/DS.
# Expected fixed result: Deferred draw clears HSSetShader/DSSetShader, and backup/restore handles HS/DS.
rg -n -C3 'GSSetShader|HSSetShader|DSSetShader|class D3DStateBackup|struct D3DStateBackup|Backup\(.*context|Restore\(.*context' src/Deferred.cpp src/Utils/D3DStateBackup.h
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/Deferred.cpp` around lines 418 - 426, The fullscreen composite draw
currently clears GS but may leak Hull/Domain shaders; update D3DStateBackup to
capture and restore HS/DS (add storage of the hull and domain shader interfaces
and include them in Backup/Restore) and, in the draw path before
context->Draw(3, 0), explicitly unbind hull/domain shaders via
context->HSSetShader(nullptr, nullptr, 0) and context->DSSetShader(nullptr,
nullptr, 0); ensure the changes reference D3DStateBackup, HSSetShader,
DSSetShader, GetCompositeVS/GetCompositePS and maintain existing GS clearing
logic so HS/DS are both saved/restored and explicitly cleared for the
triangle-list fullscreen pass.
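A minimal sketch of the save, clear, draw, restore pattern this comment asks for. `MockContext` and `StageBackup` are hypothetical stand-ins so the example compiles without the Windows SDK; they are not the repository's `D3DStateBackup` nor the real `ID3D11DeviceContext`. In real code the equivalent calls would be `HSGetShader`/`DSGetShader` to capture and `HSSetShader(nullptr, nullptr, 0)`/`DSSetShader(nullptr, nullptr, 0)` to unbind.

```cpp
#include <cassert>

// Hypothetical stand-in for ID3D11DeviceContext: it only tracks which
// optional shader stages are bound and whether a draw saw tessellation.
struct MockContext {
    const void* hs = nullptr;  // hull shader
    const void* ds = nullptr;  // domain shader
    const void* gs = nullptr;  // geometry shader
    int drawCalls = 0;
    bool drewWithTessellation = false;

    void Draw(unsigned /*vertexCount*/, unsigned /*startVertex*/) {
        ++drawCalls;
        if (hs != nullptr || ds != nullptr) {
            drewWithTessellation = true;
        }
    }
};

// Hypothetical RAII backup: captures HS/DS/GS on construction and puts
// them back on destruction, so every exit path restores the caller's state.
class StageBackup {
public:
    explicit StageBackup(MockContext& ctx) :
        ctx_(ctx), hs_(ctx.hs), ds_(ctx.ds), gs_(ctx.gs) {}
    ~StageBackup() {
        ctx_.hs = hs_;
        ctx_.ds = ds_;
        ctx_.gs = gs_;
    }
    StageBackup(const StageBackup&) = delete;
    StageBackup& operator=(const StageBackup&) = delete;

private:
    MockContext& ctx_;
    const void* hs_;
    const void* ds_;
    const void* gs_;
};

// Fullscreen triangle draw: unbind every optional stage first, not only GS.
// Returns true when the draw ran without stale tessellation shaders.
bool DrawFullscreenComposite(MockContext& ctx) {
    StageBackup backup(ctx);
    ctx.hs = nullptr;
    ctx.ds = nullptr;
    ctx.gs = nullptr;
    assert(ctx.hs == nullptr && ctx.ds == nullptr && ctx.gs == nullptr);
    ctx.Draw(3, 0);
    return !ctx.drewWithTessellation;
}
```

Tying the restore to a destructor means an early return added later cannot skip it. Whether the repository's backup utility should own HS/DS or the draw path should clear them inline is left to the author.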
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@src/Deferred.cpp`:
- Around line 375-405: The normalRoughness texture is currently bound as both an
SRV (t2) and an RTV (normalRoughness.RTV) which is illegal in D3D11; create a
copy like mainCopy (e.g., normalRoughnessCopy or normalRoughnessCopy.SRV) before
the draw and sample from that SRV in the SRV array passed to
PSSetShaderResources (replace normalRoughness.SRV with the copy's SRV), while
leaving normalRoughness.RTV bound as the render target; mirror the existing
mainCopy copy-before-sample logic used elsewhere (see mainCopy,
PSSetShaderResources and OMSetRenderTargets) so DeferredCompositePS.hlsl reads
from the copied SRV not the active RTV.
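
The copy-before-sample ordering described above can be sketched as follows. Everything here is a simplified mock under stated assumptions: `MockTexture`, `MockContext`, and `BindForComposite` are hypothetical names, and the binding rules only approximate how the D3D11 runtime nulls out a shader resource slot when the same resource is bound for output. In real code the copy would go through `ID3D11DeviceContext::CopyResource`.

```cpp
#include <cassert>
#include <vector>

// Hypothetical stand-in for a render-target texture.
struct MockTexture {
    std::vector<float> texels;
};

// Hypothetical stand-in for the device context with one SRV slot and one
// RTV slot. It models the read/write hazard rule in a simplified way: a
// resource bound as the render target cannot stay bound as an SRV.
struct MockContext {
    MockTexture* boundSRV = nullptr;
    MockTexture* boundRTV = nullptr;

    void CopyResource(MockTexture& dst, const MockTexture& src) {
        dst.texels = src.texels;
    }
    void OMSetRenderTarget(MockTexture* rt) {
        boundRTV = rt;
        if (boundSRV == rt) {
            boundSRV = nullptr;  // conflicting read binding is dropped
        }
    }
    void PSSetShaderResource(MockTexture* srv) {
        // Binding the active render target for reading yields a null slot.
        boundSRV = (srv != nullptr && srv == boundRTV) ? nullptr : srv;
    }
};

// Copy first, then write to the original and sample from the copy.
// Returns true when both bindings survived.
bool BindForComposite(MockContext& ctx, MockTexture& normalRoughness,
                      MockTexture& normalRoughnessCopy) {
    ctx.CopyResource(normalRoughnessCopy, normalRoughness);
    ctx.OMSetRenderTarget(&normalRoughness);
    ctx.PSSetShaderResource(&normalRoughnessCopy);
    return ctx.boundSRV == &normalRoughnessCopy &&
           ctx.boundRTV == &normalRoughness;
}
```

The extra copy costs bandwidth each frame, so it is worth checking first whether the pass really needs to read and write the same target.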

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: 5b68cd7a-149a-45f5-a7b3-0d7b10b51698

📥 Commits

Reviewing files that changed from the base of the PR and between 751fcbd and be1f481.

📒 Files selected for processing (2)
  • package/Shaders/DeferredCompositePS.hlsl
  • src/Deferred.cpp
🚧 Files skipped from review as they are similar to previous changes (1)
  • package/Shaders/DeferredCompositePS.hlsl

Comment thread src/Deferred.cpp Outdated
doodlum and others added 5 commits April 19, 2026 02:13
Automated formatting by clang-format, prettier, and other hooks.
See https://pre-commit.ci for details.
@doodlum doodlum changed the title from "perf: optimise DeferredComposite for VR" to "perf: optimise DeferredComposite" on Apr 19, 2026
@github-actions

✅ A pre-release build is available for this PR:
Download
