perf: optimize utility and diffuseiblcs#1346
Conversation
WalkthroughThe changes refactor a diffuse image-based lighting compute shader to use parallel processing with a 16x16 thread group and shared memory reduction, replacing a serial nested loop. Separately, the Poisson disk shadow filtering function is optimized by reducing samples, precomputing values, unrolling loops, and adding early termination for efficiency. Changes
Sequence Diagram(s)sequenceDiagram
participant CPU
participant GPU
participant ThreadGroup
participant SharedMemory
CPU->>GPU: Dispatch DiffuseIBLCS compute shader
GPU->>ThreadGroup: Launch 16x16 threads
loop For each thread
ThreadGroup->>ThreadGroup: Compute sample direction & SH value
ThreadGroup->>SharedMemory: Store SH contribution
end
ThreadGroup->>SharedMemory: Parallel reduction (barrier sync)
SharedMemory->>ThreadGroup: Accumulated SH coefficients
ThreadGroup->>GPU: First thread writes output
sequenceDiagram
participant CPU
participant GPU
participant ShadowFunc
CPU->>GPU: Call GetPoissonDiskFilteredShadowVisibility
GPU->>ShadowFunc: Precompute parameters
loop For sample 0 to 3 (unrolled)
ShadowFunc->>ShadowFunc: Compute random offset, sample shadow map
end
loop For sample 4 to 11
ShadowFunc->>ShadowFunc: Compute random offset, sample shadow map
alt Early termination condition met
ShadowFunc->>ShadowFunc: Extrapolate and break
end
end
ShadowFunc->>GPU: Return averaged visibility
Estimated code review effort🎯 4 (Complex) | ⏱️ ~40 minutes Possibly related PRs
Poem
Note 🔌 MCP (Model Context Protocol) integration is now available in Early Access!Pro users can now connect to remote MCP servers under the Integrations page to get reviews and chat conversations that understand additional development context. 📜 Recent review detailsConfiguration used: .coderabbit.yaml 📒 Files selected for processing (1)
🚧 Files skipped from review as they are similar to previous changes (1)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (4)
✨ Finishing Touches🧪 Generate unit tests
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. 🪧 TipsChatThere are 3 ways to chat with CodeRabbit:
SupportNeed help? Create a ticket on our support page for assistance with any issues or questions. CodeRabbit Commands (Invoked using PR comments)
Other keywords and placeholders
Documentation and Community
|
There was a problem hiding this comment.
Actionable comments posted: 1
🧹 Nitpick comments (1)
package/Shaders/Utility.hlsl (1)
370-370: Document the quality vs performance trade-offThe reduction from 16 to 12 samples provides significant performance gains. Consider adding a comment explaining the quality impact assessment that led to choosing 12 samples.
- const int sampleCount = 12; // reduced from 16 + const int sampleCount = 12; // Reduced from 16 - provides 25% performance gain with minimal quality impact
📜 Review details
Configuration used: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (2)
features/IBL/Shaders/IBL/DiffuseIBLCS.hlsl(1 hunks)package/Shaders/Utility.hlsl(1 hunks)
🧰 Additional context used
🧠 Learnings (7)
📓 Common learnings
Learnt from: alandtse
PR: doodlum/skyrim-community-shaders#0
File: :0-0
Timestamp: 2025-07-05T05:20:45.823Z
Learning: In the skyrim-community-shaders repository, file deletion error handling improvements that replace existence checks and try-catch blocks with std::filesystem::remove error-code-based approaches are considered bug fixes rather than refactoring, as they address inadequate error handling and misleading log messages.
Learnt from: alandtse
PR: doodlum/skyrim-community-shaders#0
File: :0-0
Timestamp: 2025-06-24T07:17:36.604Z
Learning: When reviewing PRs, always clarify the scope if there are multiple related features or dependencies. WeatherPicker was a separate PR that was already merged, while this PR focuses specifically on WetnessEffects climate preset system enhancements.
Learnt from: alandtse
PR: doodlum/skyrim-community-shaders#577
File: features/Wetness Effects/Shaders/WetnessEffects/WetnessEffects.hlsli:57-61
Timestamp: 2025-06-17T05:40:22.785Z
Learning: Default parameter values are supported in the HLSL compiler used by the skyrim-community-shaders project, contrary to standard HLSL (FXC/DXC) limitations.
Learnt from: jiayev
PR: doodlum/skyrim-community-shaders#0
File: :0-0
Timestamp: 2025-07-18T15:21:03.641Z
Learning: In the skyrim-community-shaders rendering pipeline, materials with alpha < 1 or alpha blending enabled are rendered in non-deferred mode rather than deferred mode. This means issues with dynamic cubemaps on transparent materials are actually non-deferred rendering issues.
Learnt from: alandtse
PR: doodlum/skyrim-community-shaders#577
File: src/Features/WetnessEffects.h:36-36
Timestamp: 2025-06-08T11:25:14.536Z
Learning: In the skyrim-community-shaders project, boolean flags in C++ structs that interface with HLSL shaders use `uint` type instead of `bool` for compatibility reasons. This ensures consistent 4-byte size, proper 16-byte alignment in constant buffers, and cross-platform compatibility when passing data between C++ and HLSL shaders.
Learnt from: davo0411
PR: doodlum/skyrim-community-shaders#1070
File: src/State.cpp:79-83
Timestamp: 2025-05-30T11:44:15.542Z
Learning: In the Skyrim Community Shaders project, the `smoothDrawCalls` array in the `State` class is declared as type `double` in `src/State.h`, which is the correct floating-point type for performing exponential moving average smoothing calculations.
Learnt from: alandtse
PR: doodlum/skyrim-community-shaders#0
File: :0-0
Timestamp: 2025-07-01T18:01:07.079Z
Learning: In the skyrim-community-shaders project, simple scalar constants in HLSL shaders use #define (e.g., #define NTHREADS 128), while more complex constants use static const within namespaces (e.g., Math namespace in Math.hlsli). For epsilon standardization, #define is the appropriate choice since epsilon values are simple scalar constants.
📚 Learning: in the skyrim-community-shaders project, boolean flags in c++ structs that interface with hlsl shade...
Learnt from: alandtse
PR: doodlum/skyrim-community-shaders#577
File: src/Features/WetnessEffects.h:36-36
Timestamp: 2025-06-08T11:25:14.536Z
Learning: In the skyrim-community-shaders project, boolean flags in C++ structs that interface with HLSL shaders use `uint` type instead of `bool` for compatibility reasons. This ensures consistent 4-byte size, proper 16-byte alignment in constant buffers, and cross-platform compatibility when passing data between C++ and HLSL shaders.
Applied to files:
package/Shaders/Utility.hlslfeatures/IBL/Shaders/IBL/DiffuseIBLCS.hlsl
📚 Learning: in the skyrim community shaders project, the `smoothdrawcalls` array in the `state` class is declare...
Learnt from: davo0411
PR: doodlum/skyrim-community-shaders#1070
File: src/State.cpp:79-83
Timestamp: 2025-05-30T11:44:15.542Z
Learning: In the Skyrim Community Shaders project, the `smoothDrawCalls` array in the `State` class is declared as type `double` in `src/State.h`, which is the correct floating-point type for performing exponential moving average smoothing calculations.
Applied to files:
package/Shaders/Utility.hlsl
📚 Learning: default parameter values are supported in the hlsl compiler used by the skyrim-community-shaders pro...
Learnt from: alandtse
PR: doodlum/skyrim-community-shaders#577
File: features/Wetness Effects/Shaders/WetnessEffects/WetnessEffects.hlsli:57-61
Timestamp: 2025-06-17T05:40:22.785Z
Learning: Default parameter values are supported in the HLSL compiler used by the skyrim-community-shaders project, contrary to standard HLSL (FXC/DXC) limitations.
Applied to files:
features/IBL/Shaders/IBL/DiffuseIBLCS.hlsl
📚 Learning: in the skyrim-community-shaders project, simple scalar constants in hlsl shaders use #define (e.g., ...
Learnt from: alandtse
PR: doodlum/skyrim-community-shaders#0
File: :0-0
Timestamp: 2025-07-01T18:01:07.079Z
Learning: In the skyrim-community-shaders project, simple scalar constants in HLSL shaders use #define (e.g., #define NTHREADS 128), while more complex constants use static const within namespaces (e.g., Math namespace in Math.hlsli). For epsilon standardization, #define is the appropriate choice since epsilon values are simple scalar constants.
Applied to files:
features/IBL/Shaders/IBL/DiffuseIBLCS.hlsl
📚 Learning: in the skyrim-community-shaders project, boolean flags in c++ structs that interface with hlsl shade...
Learnt from: alandtse
PR: doodlum/skyrim-community-shaders#577
File: src/Features/WetnessEffects.h:36-36
Timestamp: 2025-06-08T11:25:14.536Z
Learning: In the skyrim-community-shaders project, boolean flags in C++ structs that interface with HLSL shaders use `uint` type instead of `bool` for compatibility reasons. This ensures consistent size, alignment, and cross-platform compatibility when passing data to shader constant buffers.
Applied to files:
features/IBL/Shaders/IBL/DiffuseIBLCS.hlsl
📚 Learning: in the skyrim-community-shaders rendering pipeline, materials with alpha < 1 or alpha blending enabl...
Learnt from: jiayev
PR: doodlum/skyrim-community-shaders#0
File: :0-0
Timestamp: 2025-07-18T15:21:03.641Z
Learning: In the skyrim-community-shaders rendering pipeline, materials with alpha < 1 or alpha blending enabled are rendered in non-deferred mode rather than deferred mode. This means issues with dynamic cubemaps on transparent materials are actually non-deferred rendering issues.
Applied to files:
features/IBL/Shaders/IBL/DiffuseIBLCS.hlsl
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (3)
- GitHub Check: Validate shader compilation (VR, .github/configs/shader-validation-vr.yaml)
- GitHub Check: Build plugin and addons
- GitHub Check: Validate shader compilation (Flatrim, .github/configs/shader-validation.yaml)
🔇 Additional comments (5)
package/Shaders/Utility.hlsl (1)
368-463: Great GPU performance optimizations!The refactoring effectively reduces GPU timing costs through sample count reduction, precomputation of expensive operations, loop unrolling, and early termination. The use of
rsqrtfor fast normalization and simplified random generation are excellent choices.features/IBL/Shaders/IBL/DiffuseIBLCS.hlsl (4)
28-70: Excellent parallelization of the spherical integral computationThe parallel implementation correctly preserves the mathematical operations while achieving ~4x performance improvement. The optimizations in the dynamic cubemap conditional check and pre-computed constants are well done.
73-89: Well-implemented parallel reduction algorithmThe tree-based reduction correctly handles the 256→1 reduction in logarithmic steps with proper synchronization. The [unroll] directive and single-thread output write are appropriate optimizations.
27-28: Verify compute shader dispatch call parameters match updated thread group sizeI couldn't locate any C++ invocations of the
DiffuseIBLCScompute shader via search, so please manually confirm that your dispatch call usesAXIS_SAMPLE_COUNTfor both X and Y thread-group dimensions. For example, ensure any call resembling:deviceContext->Dispatch( /* x = */ AXIS_SAMPLE_COUNT, /* y = */ AXIS_SAMPLE_COUNT, /* z = */ 1 );is in place. Common places to check include:
- Graphics or render-pass implementation files (e.g.,
GraphicsRenderer.cpp,DiffuseIBLPass.cpp)- Any module that sets up or dispatches compute shaders
16-24: Confirm shared memory footprint fits hardware limitsWe couldn’t locate the
sh2definition in the repo to calculate its size. Please verify that:
- Total shared memory = 3 channels × (AXIS_SAMPLE_COUNT² = 256) ×
sizeof(sh2)- This value stays within your GPU’s shared memory capacity (typically 32 KB or 48 KB)
If it exceeds the limit, consider reducing the sample count or restructuring the reduction.
• File: features/IBL/Shaders/IBL/DiffuseIBLCS.hlsl (lines 22–24)
alandtse
left a comment
There was a problem hiding this comment.
Please make sure to address any new warnings. If you believe they need to be ignored or suppressed, please explain the reasoning. I assume you've tested for correctness.
Lastly, note the perf: commit message.
b0fd650 to
93c6acc
Compare
Automated formatting by clang-format, prettier, and other hooks. See https://pre-commit.ci for details.
|
✅ A pre-release build is available for this PR: |
|
Closing and reopening to try to trigger the ci. |
|
Need some testers. Once you confirm it's been tested, I'll merge. |
Frame GPU timing costs as result of this refactor:
Utility.hlsl (shadowmasks)
-20% cost
DiffuseIBLCS.hlsl
-96% cost
Summary by CodeRabbit
Performance Improvements
Other Changes