perf: optimize utility and diffuseiblcs #1346

Merged: Pentalimbed merged 5 commits into community-shaders:dev from soda3000:dev-31-07-2025 on Aug 10, 2025

soda3000 (Contributor) commented Jul 31, 2025

Per-frame GPU timing changes resulting from this refactor:

  • Utility.hlsl (shadow masks): -20% cost
  • DiffuseIBLCS.hlsl: -96% cost

Summary by CodeRabbit

  • Performance Improvements

    • Enhanced diffuse image-based lighting with a parallelized compute shader, delivering up to 4x faster processing.
    • Optimized shadow visibility sampling by reducing sample count, adding precomputation, loop unrolling, and early exit conditions for improved efficiency.
  • Other Changes

    • Refined sampling strategies and memory usage for smoother rendering and lighting effects.

coderabbitai Bot (Contributor) commented Jul 31, 2025

Walkthrough

The changes refactor a diffuse image-based lighting compute shader to use parallel processing with a 16x16 thread group and shared memory reduction, replacing a serial nested loop. Separately, the Poisson disk shadow filtering function is optimized by reducing samples, precomputing values, unrolling loops, and adding early termination for efficiency.

Changes

| Cohort / File(s) | Change Summary |
|---|---|
| Diffuse IBL Compute Shader Refactor: features/IBL/Shaders/IBL/DiffuseIBLCS.hlsl | Refactored compute shader from single-threaded nested loop to 16x16 thread group parallelization. Each thread computes one spherical harmonic sample with center sampling, performs gamma correction, and writes scaled contributions to groupshared memory. A parallel tree-based reduction sums all contributions within the thread group. Sample count reduced from 1024 to 256. Output write occurs only from the first thread after reduction. Shader entry point signature updated to include thread group index. |
| Poisson Disk Shadow Filter Optimization: package/Shaders/Utility.hlsl | Optimized shadow visibility function by reducing sample count from 16 to 12, precomputing constants outside the sampling loop, unrolling the first 4 samples, and adding an early exit condition when visibility is near fully shadowed or lit. The sampling loop uses optimized math operations like reciprocal square root and simplified hashing. Function signature remains unchanged. |
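
As a rough illustration of the Diffuse IBL row above, the following CPU-side C++ sketch shows how a thread's 2D position in a 16x16 group maps to a flattened group index and to a cell-centered sample coordinate. The flattening matches how `SV_GroupIndex` is defined for a 2D group; the `(i + 0.5) / N` form of "center sampling" and every name here (`kAxisSampleCount`, `FlattenGroupIndex`, `CenterSampleUV`) are assumptions for illustration, not code taken from the PR.

```cpp
#include <cassert>
#include <cstdint>

// Assumed to mirror the 16x16 thread group described in the summary.
constexpr std::uint32_t kAxisSampleCount = 16;

struct SampleUV {
    float u;
    float v;
};

// Flattened index of a thread inside a (16 x 16 x 1) group.
// For a 2D group this is y * sizeX + x, which is how SV_GroupIndex is defined.
constexpr std::uint32_t FlattenGroupIndex(std::uint32_t x, std::uint32_t y) {
    return y * kAxisSampleCount + x;
}

// Hypothetical center sampling: each thread samples the middle of its grid
// cell, so coordinates stay strictly inside (0, 1).
inline SampleUV CenterSampleUV(std::uint32_t x, std::uint32_t y) {
    assert(x < kAxisSampleCount && y < kAxisSampleCount);
    const float inv = 1.0f / static_cast<float>(kAxisSampleCount);
    return { (static_cast<float>(x) + 0.5f) * inv,
             (static_cast<float>(y) + 0.5f) * inv };
}
```

With these definitions, thread (0, 0) gets index 0 and thread (15, 15) gets index 255, so each of the 256 threads owns exactly one slot in a 256-entry shared array.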

Sequence Diagram(s)

sequenceDiagram
    participant CPU
    participant GPU
    participant ThreadGroup
    participant SharedMemory

    CPU->>GPU: Dispatch DiffuseIBLCS compute shader
    GPU->>ThreadGroup: Launch 16x16 threads
    loop For each thread
        ThreadGroup->>ThreadGroup: Compute sample direction & SH value
        ThreadGroup->>SharedMemory: Store SH contribution
    end
    ThreadGroup->>SharedMemory: Parallel reduction (barrier sync)
    SharedMemory->>ThreadGroup: Accumulated SH coefficients
    ThreadGroup->>GPU: First thread writes output
sequenceDiagram
    participant CPU
    participant GPU
    participant ShadowFunc

    CPU->>GPU: Call GetPoissonDiskFilteredShadowVisibility
    GPU->>ShadowFunc: Precompute parameters
    loop For sample 0 to 3 (unrolled)
        ShadowFunc->>ShadowFunc: Compute random offset, sample shadow map
    end
    loop For sample 4 to 11
        ShadowFunc->>ShadowFunc: Compute random offset, sample shadow map
        alt Early termination condition met
            ShadowFunc->>ShadowFunc: Extrapolate and break
        end
    end
    ShadowFunc->>GPU: Return averaged visibility
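
The control flow in the second diagram can be sketched on the CPU in C++ as follows. This is only a minimal model under stated assumptions: the threshold `kEpsilon`, the callback `sampleShadow`, and the choice to "extrapolate" by dividing by the number of samples actually taken are all hypothetical, and the real shader may weight or test samples differently.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>

// Assumed constants: 12 total samples, the first 4 always taken.
constexpr std::size_t kSampleCount = 12;
constexpr std::size_t kAlwaysTaken = 4;
constexpr float kEpsilon = 0.01f;  // hypothetical "near 0 or near 1" threshold

struct FilterResult {
    float visibility;         // averaged visibility in [0, 1]
    std::size_t samplesUsed;  // how many taps were actually evaluated
};

// sampleShadow(i) stands in for one shadow-map tap and returns a value in [0, 1].
inline FilterResult FilteredVisibility(
    const std::function<float(std::size_t)>& sampleShadow) {
    float sum = 0.0f;
    std::size_t taken = 0;

    // First four taps: always evaluated (unrolled in the shader).
    for (; taken < kAlwaysTaken; ++taken) {
        sum += sampleShadow(taken);
    }

    // Remaining taps: stop once the running mean is close to 0 or 1.
    while (taken < kSampleCount) {
        sum += sampleShadow(taken);
        ++taken;
        const float mean = sum / static_cast<float>(taken);
        if (mean <= kEpsilon || mean >= 1.0f - kEpsilon) {
            break;
        }
    }

    // Dividing by the taps taken assumes the skipped taps match the running mean.
    return { sum / static_cast<float>(taken), taken };
}
```

In this model a fully lit or fully shadowed pixel stops after 5 taps, while a pixel in the penumbra still evaluates all 12.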

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~40 minutes

Possibly related PRs

  • build(hlsl): fix warnings #1370: Minor fix in the same DiffuseIBLCS compute shader related to explicit RGB extraction from sampled color, overlapping with color accumulation logic in this refactor.

Poem

In parallel threads we now compute,
Lighting samples, swift and astute.
Shadows filter with fewer tries,
Early exits, optimized skies.
With shared memory and clever math,
We bunnies hop a faster path!
🐇✨


📜 Recent review details

Configuration used: .coderabbit.yaml
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 2ae40e2 and 3e4262c.

📒 Files selected for processing (1)
  • features/IBL/Shaders/IBL/DiffuseIBLCS.hlsl (1 hunks)
🚧 Files skipped from review as they are similar to previous changes (1)
  • features/IBL/Shaders/IBL/DiffuseIBLCS.hlsl
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (4)
  • GitHub Check: Build plugin and addons
  • GitHub Check: Validate shader compilation (VR, .github/configs/shader-validation-vr.yaml)
  • GitHub Check: Validate shader compilation (Flatrim, .github/configs/shader-validation.yaml)
  • GitHub Check: Feature Version Audit (PR)


github-actions Bot commented Jul 31, 2025

Using provided base ref: b2b736e
Using base ref: b2b736e
Base commit date: 2025-08-06T23:50:31-07:00 (Wednesday, August 06, 2025 11:50 PM)
No actionable suggestions for changed features.

coderabbitai Bot (Contributor) left a comment

Actionable comments posted: 1

🧹 Nitpick comments (1)
package/Shaders/Utility.hlsl (1)

370-370: Document the quality vs performance trade-off

The reduction from 16 to 12 samples provides significant performance gains. Consider adding a comment explaining the quality impact assessment that led to choosing 12 samples.

-	const int sampleCount = 12; // reduced from 16
+	const int sampleCount = 12; // Reduced from 16 - provides 25% performance gain with minimal quality impact
📜 Review details

Configuration used: .coderabbit.yaml
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 3c80909 and 7c69e9c.

📒 Files selected for processing (2)
  • features/IBL/Shaders/IBL/DiffuseIBLCS.hlsl (1 hunks)
  • package/Shaders/Utility.hlsl (1 hunks)
🧰 Additional context used
🧠 Learnings (7)
📓 Common learnings
Learnt from: alandtse
PR: doodlum/skyrim-community-shaders#0
File: :0-0
Timestamp: 2025-07-05T05:20:45.823Z
Learning: In the skyrim-community-shaders repository, file deletion error handling improvements that replace existence checks and try-catch blocks with std::filesystem::remove error-code-based approaches are considered bug fixes rather than refactoring, as they address inadequate error handling and misleading log messages.
Learnt from: alandtse
PR: doodlum/skyrim-community-shaders#0
File: :0-0
Timestamp: 2025-06-24T07:17:36.604Z
Learning: When reviewing PRs, always clarify the scope if there are multiple related features or dependencies. WeatherPicker was a separate PR that was already merged, while this PR focuses specifically on WetnessEffects climate preset system enhancements.
Learnt from: alandtse
PR: doodlum/skyrim-community-shaders#577
File: features/Wetness Effects/Shaders/WetnessEffects/WetnessEffects.hlsli:57-61
Timestamp: 2025-06-17T05:40:22.785Z
Learning: Default parameter values are supported in the HLSL compiler used by the skyrim-community-shaders project, contrary to standard HLSL (FXC/DXC) limitations.
Learnt from: jiayev
PR: doodlum/skyrim-community-shaders#0
File: :0-0
Timestamp: 2025-07-18T15:21:03.641Z
Learning: In the skyrim-community-shaders rendering pipeline, materials with alpha < 1 or alpha blending enabled are rendered in non-deferred mode rather than deferred mode. This means issues with dynamic cubemaps on transparent materials are actually non-deferred rendering issues.
Learnt from: alandtse
PR: doodlum/skyrim-community-shaders#577
File: src/Features/WetnessEffects.h:36-36
Timestamp: 2025-06-08T11:25:14.536Z
Learning: In the skyrim-community-shaders project, boolean flags in C++ structs that interface with HLSL shaders use `uint` type instead of `bool` for compatibility reasons. This ensures consistent 4-byte size, proper 16-byte alignment in constant buffers, and cross-platform compatibility when passing data between C++ and HLSL shaders.
Learnt from: davo0411
PR: doodlum/skyrim-community-shaders#1070
File: src/State.cpp:79-83
Timestamp: 2025-05-30T11:44:15.542Z
Learning: In the Skyrim Community Shaders project, the `smoothDrawCalls` array in the `State` class is declared as type `double` in `src/State.h`, which is the correct floating-point type for performing exponential moving average smoothing calculations.
Learnt from: alandtse
PR: doodlum/skyrim-community-shaders#0
File: :0-0
Timestamp: 2025-07-01T18:01:07.079Z
Learning: In the skyrim-community-shaders project, simple scalar constants in HLSL shaders use #define (e.g., #define NTHREADS 128), while more complex constants use static const within namespaces (e.g., Math namespace in Math.hlsli). For epsilon standardization, #define is the appropriate choice since epsilon values are simple scalar constants.
Learnings applied to files (full text in the list above):

  • Boolean flags in C++ structs that interface with HLSL shaders use `uint` instead of `bool` (alandtse, PR #577): package/Shaders/Utility.hlsl, features/IBL/Shaders/IBL/DiffuseIBLCS.hlsl
  • The `smoothDrawCalls` array in the `State` class is declared as `double` (davo0411, PR #1070): package/Shaders/Utility.hlsl
  • Default parameter values are supported in the HLSL compiler used by the project (alandtse, PR #577): features/IBL/Shaders/IBL/DiffuseIBLCS.hlsl
  • Simple scalar constants use #define; more complex constants use static const within namespaces (alandtse, PR #0): features/IBL/Shaders/IBL/DiffuseIBLCS.hlsl
  • Materials with alpha < 1 or alpha blending enabled are rendered in non-deferred mode (jiayev, PR #0): features/IBL/Shaders/IBL/DiffuseIBLCS.hlsl
🔇 Additional comments (5)
package/Shaders/Utility.hlsl (1)

368-463: Great GPU performance optimizations!

The refactoring effectively reduces GPU timing costs through sample count reduction, precomputation of expensive operations, loop unrolling, and early termination. The use of rsqrt for fast normalization and simplified random generation are excellent choices.
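
For readers unfamiliar with the rsqrt trick mentioned here, a small C++ sketch of the idea follows. It assumes the shader normalizes a 2D offset by multiplying with the reciprocal square root of its squared length; `Float2`, `Rsqrt`, and `NormalizeFast` are illustrative names, and `Rsqrt` is only a CPU stand-in for the HLSL `rsqrt` intrinsic.

```cpp
#include <cassert>
#include <cmath>

struct Float2 {
    float x;
    float y;
};

// CPU stand-in for HLSL's rsqrt intrinsic (reciprocal square root).
inline float Rsqrt(float v) {
    return 1.0f / std::sqrt(v);
}

// Normalize by multiplying with rsqrt(dot(v, v)) instead of dividing by
// length(v): one reciprocal square root and two multiplies, no division.
inline Float2 NormalizeFast(Float2 v) {
    const float lenSq = v.x * v.x + v.y * v.y;
    assert(lenSq > 0.0f);  // a zero vector has no direction
    const float invLen = Rsqrt(lenSq);
    return { v.x * invLen, v.y * invLen };
}
```

On a GPU the benefit comes from `rsqrt` typically being a single hardware instruction; on the CPU this version is shown only to make the arithmetic explicit.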

features/IBL/Shaders/IBL/DiffuseIBLCS.hlsl (4)

28-70: Excellent parallelization of the spherical integral computation

The parallel implementation correctly preserves the mathematical operations while achieving ~4x performance improvement. The optimizations in the dynamic cubemap conditional check and pre-computed constants are well done.


73-89: Well-implemented parallel reduction algorithm

The tree-based reduction correctly handles the 256→1 reduction in logarithmic steps with proper synchronization. The [unroll] directive and single-thread output write are appropriate optimizations.
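
The 256→1 reduction described above can be modeled on the CPU with the C++ sketch below. It is an illustration of the general tree-reduction pattern, not the shader's code: the outer loop corresponds to one barrier-separated step on the GPU, the inner loop plays the role of the threads whose index is below the current stride, and the names `kGroupSize`, `TreeReduce`, and `ReductionSteps` are made up for this example.

```cpp
#include <array>
#include <cassert>
#include <cstddef>

constexpr std::size_t kGroupSize = 256;  // 16 x 16 threads, as in the review
static_assert((kGroupSize & (kGroupSize - 1)) == 0,
              "tree reduction below assumes a power-of-two group size");

// Sums all entries in place; the total ends up in shared[0].
inline float TreeReduce(std::array<float, kGroupSize>& shared) {
    for (std::size_t stride = kGroupSize / 2; stride > 0; stride >>= 1) {
        // On the GPU, only threads with tid < stride do work in this step.
        for (std::size_t tid = 0; tid < stride; ++tid) {
            shared[tid] += shared[tid + stride];
        }
        // GPU equivalent: GroupMemoryBarrierWithGroupSync() before the next step.
    }
    return shared[0];
}

// Number of halving steps needed for a power-of-two element count.
constexpr std::size_t ReductionSteps(std::size_t n) {
    std::size_t steps = 0;
    while (n > 1) {
        n >>= 1;
        ++steps;
    }
    return steps;
}
```

For 256 elements this takes 8 steps (128, 64, 32, 16, 8, 4, 2, 1 active threads), which is the "logarithmic steps" property the comment refers to.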


27-28: Verify compute shader dispatch call parameters match updated thread group size

I couldn't locate any C++ invocations of the DiffuseIBLCS compute shader via search, so please manually confirm that your dispatch call uses AXIS_SAMPLE_COUNT for both X and Y thread-group dimensions. For example, ensure any call resembling:

deviceContext->Dispatch(
    /* x = */ AXIS_SAMPLE_COUNT,
    /* y = */ AXIS_SAMPLE_COUNT,
    /* z = */ 1
);

is in place. Common places to check include:

  • Graphics or render-pass implementation files (e.g., GraphicsRenderer.cpp, DiffuseIBLPass.cpp)
  • Any module that sets up or dispatches compute shaders
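
When checking the dispatch call, it may help to keep in mind that in Direct3D 11 the arguments to `Dispatch` are thread-group counts, while the threads per group come from the shader's `[numthreads]` attribute. The C++ sketch below only does that arithmetic; `AXIS_SAMPLE_COUNT` is the name used in the comment above with an assumed value of 16, and `Dim3` and `TotalThreads` are hypothetical helpers. Which group count the project actually uses is not shown in this thread.

```cpp
#include <cassert>
#include <cstdint>

// Name taken from the review comment; the value 16 is an assumption.
constexpr std::uint32_t AXIS_SAMPLE_COUNT = 16;

struct Dim3 {
    std::uint32_t x;
    std::uint32_t y;
    std::uint32_t z;
};

// Total threads launched = (thread groups dispatched) x (threads per group).
constexpr std::uint64_t TotalThreads(Dim3 groups, Dim3 threadsPerGroup) {
    return std::uint64_t{groups.x} * groups.y * groups.z *
           threadsPerGroup.x * threadsPerGroup.y * threadsPerGroup.z;
}
```

With `[numthreads(16, 16, 1)]`, `Dispatch(1, 1, 1)` launches 256 threads in a single group, whereas `Dispatch(16, 16, 1)` launches 256 groups and 65,536 threads. Since the reduction described in this review sums within one group, the group count determines whether one or many groups each produce a result.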

16-24: Confirm shared memory footprint fits hardware limits

We couldn’t locate the sh2 definition in the repo to calculate its size. Please verify that:

  • Total shared memory = 3 channels × (AXIS_SAMPLE_COUNT² = 256) × sizeof(sh2)
  • This value stays within your GPU’s shared memory capacity (typically 32 KB or 48 KB)

If it exceeds the limit, consider reducing the sample count or restructuring the reduction.

• File: features/IBL/Shaders/IBL/DiffuseIBLCS.hlsl (lines 22–24)
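
The footprint formula above can be worked through in a few lines of C++. Because the `sh2` definition was not located, this sketch assumes it holds four 32-bit floats (16 bytes); if the real type differs, only `sizeof(sh2)` changes. The constant names are illustrative.

```cpp
#include <cassert>
#include <cstddef>

// Assumption: sh2 is four 32-bit floats. The real definition was not found
// during the review, so adjust this struct if it differs.
struct sh2 {
    float c[4];
};

constexpr std::size_t kChannels = 3;          // one shared array per color channel
constexpr std::size_t kAxisSampleCount = 16;  // assumed AXIS_SAMPLE_COUNT
constexpr std::size_t kThreadsPerGroup = kAxisSampleCount * kAxisSampleCount;

// Total groupshared bytes = channels x threads per group x element size.
constexpr std::size_t SharedBytes(std::size_t elementSize) {
    return kChannels * kThreadsPerGroup * elementSize;
}

constexpr std::size_t kBudget32KiB = 32 * 1024;  // lower of the two limits quoted above
```

Under that assumption the total is 3 × 256 × 16 = 12,288 bytes (12 KiB), which leaves headroom below the 32 KB figure.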

alandtse changed the title from "refactor: GPU performance optimization for some HLSL files" to "perf: optimize utility and diffuseiblcs" on Aug 1, 2025
alandtse (Collaborator) left a comment

Please make sure to address any new warnings. If you believe they need to be ignored or suppressed, please explain the reasoning. I assume you've tested for correctness.

Lastly, note the perf: commit message.

soda3000 marked this pull request as draft on August 4, 2025 16:11
Automated formatting by clang-format, prettier, and other hooks.
See https://pre-commit.ci for details.
soda3000 marked this pull request as ready for review on August 4, 2025 18:56

github-actions Bot commented Aug 4, 2025

✅ A pre-release build is available for this PR:
Download

soda3000 requested a review from alandtse on August 4, 2025 19:54
alandtse (Collaborator) commented Aug 4, 2025

Closing and reopening to try to trigger the ci.

alandtse closed this on Aug 4, 2025
alandtse reopened this on Aug 4, 2025
alandtse (Collaborator) commented Aug 5, 2025

Need some testers. Once you confirm it's been tested, I'll merge.

Pentalimbed requested a review from jiayev on August 7, 2025 03:56
Pentalimbed merged commit 0b469a5 into community-shaders:dev on Aug 10, 2025
13 checks passed
coderabbitai Bot mentioned this pull request on Apr 18, 2026

5 participants