refactor(llf): align cluster light cap#61
Conversation
|
Warning Review limit reached
More reviews will be available in 26 minutes and 24 seconds. Learn how PR review limits work. Your organization has run out of usage credits. Purchase more in the billing tab. ⌛ How to resolve this issue?After more reviews become available, a review can be triggered using the We recommend that you space out your commits to avoid hitting the rate limit. 🚦 How do rate limits work?CodeRabbit enforces hourly rate limits for each developer per organization. Our paid plans include higher PR review limits than trial, open-source, and free plans. In all cases, reviews become available again over time. During sustained high-volume PR review activity, CodeRabbit may temporarily slow when the next review becomes available. Please see our Fair Usage Limits Policy for further information. ℹ️ Review info⚙️ Run configurationConfiguration used: Path: .coderabbit.yaml Review profile: CHILL Plan: Pro Run ID: 📒 Files selected for processing (3)
📝 WalkthroughWalkthroughThe PR reduces the per-cluster light capacity from 256 to 128, documents the constraint across shader and C++ headers to prevent light index pool overruns, and removes unnecessary LDS staging from the compute shader, allowing threads to read directly from the global lights buffer. ChangesLight Limit Cap and Shader Optimization
Possibly related PRs
Estimated code review effort🎯 2 (Simple) | ⏱️ ~8 minutes Poem
🚥 Pre-merge checks | ✅ 3 | ❌ 2❌ Failed checks (1 warning, 1 inconclusive)
✅ Passed checks (3 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches🧪 Generate unit tests (beta)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
|
No actionable suggestions for changed features. |
There was a problem hiding this comment.
🧹 Nitpick comments (1)
features/Light Limit Fix/Shaders/LightLimitFix/ClusterCullingCS.hlsl (1)
14-20: Run targeted hlslkit validation for this shader path.Bindings look unchanged here, but this culling-path update should stay covered by feature-scoped hlslkit compile + buffer-scan checks (flat + VR variants) to catch future register/buffer regressions early.
As per coding guidelines: "Highlight GPU register conflicts and recommend hlslkit buffer scanning for shader development" and "Use targeted hlslkit-compile shader validation for specific features during development rather than full validation."
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@features/Light` Limit Fix/Shaders/LightLimitFix/ClusterCullingCS.hlsl around lines 14 - 20, Run a targeted hlslkit validation for ClusterCullingCS.hlsl to ensure no GPU register or buffer conflicts were introduced: run the feature-scoped hlslkit compile and buffer-scan checks (flat + VR variants) against the bindings for StructuredBuffer<ClusterAABB> clusters, StructuredBuffer<Light> lights and RWStructuredBuffer<uint> lightIndexCounter, lightIndexList, and RWStructuredBuffer<LightGrid> lightGrid; report any register collisions or missing/incorrect bindings and fix them in the shader (adjust registers or buffer declarations) so the compile+scan passes for this culling path.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Nitpick comments:
In `@features/Light` Limit Fix/Shaders/LightLimitFix/ClusterCullingCS.hlsl:
- Around line 14-20: Run a targeted hlslkit validation for ClusterCullingCS.hlsl
to ensure no GPU register or buffer conflicts were introduced: run the
feature-scoped hlslkit compile and buffer-scan checks (flat + VR variants)
against the bindings for StructuredBuffer<ClusterAABB> clusters,
StructuredBuffer<Light> lights and RWStructuredBuffer<uint> lightIndexCounter,
lightIndexList, and RWStructuredBuffer<LightGrid> lightGrid; report any register
collisions or missing/incorrect bindings and fix them in the shader (adjust
registers or buffer declarations) so the compile+scan passes for this culling
path.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
Run ID: 33fca8b1-3862-4ca5-a0a6-83cb19dce15c
📒 Files selected for processing (3)
features/Light Limit Fix/Shaders/LightLimitFix/ClusterCullingCS.hlslfeatures/Light Limit Fix/Shaders/LightLimitFix/Common.hlslisrc/Features/LightLimitFix.h
There was a problem hiding this comment.
Pull request overview
Aligns the LightLimitFix clustered light-culling compute shader with the C++-allocated light index pool size to prevent silent index drops in dense lighting scenarios, and removes provably-dead groupshared staging code from the culling pass.
Changes:
- Tighten the shader per-cluster visible light cap (
MAX_CLUSTER_LIGHTS) from 256 to 128 to matchLightLimitFix::CLUSTER_MAX_LIGHTSand thelightIndexListpool sizing. - Remove unused
groupsharedstaging and unnecessaryGroupMemoryBarrierWithGroupSync()calls inClusterCullingCS.hlsl. - Add cross-referenced documentation comments on both C++ and shader sides to keep the constants in sync.
Reviewed changes
Copilot reviewed 3 out of 3 changed files in this pull request and generated no comments.
| File | Description |
|---|---|
src/Features/LightLimitFix.h |
Adds a clarifying comment documenting that CLUSTER_MAX_LIGHTS must match the shader cap to avoid overrunning the global index pool. |
features/Light Limit Fix/Shaders/LightLimitFix/Common.hlsli |
Updates MAX_CLUSTER_LIGHTS to 128 (behavior fix) and documents why it must match the C++ allocation. |
features/Light Limit Fix/Shaders/LightLimitFix/ClusterCullingCS.hlsl |
Removes dead groupshared staging/barriers and adds rationale explaining why the shader reads lights directly. |
5037926 to
cb1302c
Compare
Two related cleanups to the clustered light-culling pass; no observable behavior change in any realistic scene. 1. Align the per-cluster cap. MAX_CLUSTER_LIGHTS (Common.hlsli) was 256 while the C++ pool is clusterCount * CLUSTER_MAX_LIGHTS = 128. The constants represent the same quantity; set the shader cap to 128 and cross-reference both sides. Overrun was effectively unreachable (global pool, mostly-empty clusters), so this is consistency hardening, not a user-visible fix. Also halves the per-thread visibleLightIndices[] indexable-temp array. 2. Remove dead groupshared staging. The sharedLights copy and its barriers were never read in any commit (a name collision on the pezcode port) and could not have worked anyway: Light[GROUP_SIZE] = 96 KB exceeds the 32 KB cs_5_0 LDS limit, so a live read would not compile. fxc already dead-stripped it; the only DXBC delta is the 256->128 cap. Verified with standalone fxc cs_5_0 compiles (flat + VR) and before/after disassembly diff. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
cb1302c to
8704216
Compare
|
✅ A pre-release build is available for this PR: |
Summary
Two related cleanups to the LightLimitFix clustered light-culling pass (
ClusterCullingCS.hlsl), both confirmed by standalonefxc cs_5_0compiles (flat + VR) and before/after DXBC disassembly diff. No observable behavior change in any realistic scene — this is consistency hardening + dead-code removal, hencerefactor.1. Align the per-cluster light cap with the index-pool size
MAX_CLUSTER_LIGHTSin the shader was 256 while the C++ side allocates the globallightIndexListpool asclusterCount * CLUSTER_MAX_LIGHTS= 128 per cluster (LightLimitFix.h). The two constants are meant to represent the same quantity, so the shader cap is set to 128 and both sides now cross-reference each other in comments.In principle a 256 cap could let the summed per-cluster counts overrun the pool (D3D11 silently discards out-of-bounds
RWStructuredBufferwrites), but in practice this is unreachable: the pool is global and filled via a shared atomic, and the overwhelming majority of clusters hold zero lights, so the average per-cluster count stays far below 128 even in dense scenes. The change is therefore a correctness/consistency hardening, not a user-visible fix. It also halves the per-threadvisibleLightIndices[]indexable-temp array.2. Remove dead
groupsharedstaging (zero codegen change)The
groupshared Light sharedLights[GROUP_SIZE]copy and its twoGroupMemoryBarrierWithGroupSync()calls were vestigial:git log --all -Gfor a read ofsharedLightsreturns nothing. The inner cull loop has read the globallightsbuffer directly since the first LLF commit (4118e7f7d) — a name collision on the original pezcode port: the shared array was renamedsharedLightswhile the global SRV kept the namelights, solights[i]always resolved to global memory. PR chore: remove cluster culling while loop community-shaders/skyrim-community-shaders#697 later removed the batchingwhileloop, collapsing the staging into obvious single-element dead code.Light[GROUP_SIZE]= 1024 × 96 B = 96 KB, 3× over the 32 KBcs_5_0/DX11 groupshared limit. A live read would be a hard compile error, so the shader only ever compiled because the staging was dead andfxcstripped it.dcl_tgsmand zerosync_*. The only DXBC delta in this PR is the256 → 128cap from change 1.Risk
Low. Change 2 is provably codegen-neutral; change 1 only tightens a cap the allocation already assumed.
Notes
A larger occupancy optimization for this shader (eliminating the per-thread index array via a two-pass count/write —
fxcconfirms it removes the X4714 occupancy warning) is being evaluated separately with an in-game Tracy A/B; it is intentionally not in this PR.🤖 Generated with Claude Code