
refactor(llf): align cluster light cap #61

Merged
alandtse merged 1 commit into dev from claude/nifty-austin-50f8ad
May 30, 2026

Conversation

@alandtse
Owner

@alandtse alandtse commented May 30, 2026

Summary

Two related cleanups to the LightLimitFix clustered light-culling pass (ClusterCullingCS.hlsl), both verified with standalone fxc cs_5_0 compiles (flat + VR) and a before/after DXBC disassembly diff. Neither changes observable behavior in any realistic scene: this is consistency hardening plus dead-code removal, hence the refactor prefix.

1. Align the per-cluster light cap with the index-pool size

MAX_CLUSTER_LIGHTS in the shader (Common.hlsli) was 256, while the C++ side allocates the global lightIndexList pool as clusterCount * CLUSTER_MAX_LIGHTS entries, i.e. 128 per cluster (CLUSTER_MAX_LIGHTS = 128 in LightLimitFix.h). The two constants are meant to represent the same quantity, so the shader cap is now 128 and each side cross-references the other in a comment.

In principle a 256 cap could let the summed per-cluster counts overrun the pool (D3D11 silently discards out-of-bounds RWStructuredBuffer writes), but in practice this is unreachable: the pool is global and filled via a shared atomic, and the overwhelming majority of clusters hold zero lights, so the average per-cluster count stays far below 128 even in dense scenes. The change is therefore a correctness/consistency hardening, not a user-visible fix. It also halves the per-thread visibleLightIndices[] indexable-temp array.

2. Remove dead groupshared staging (zero codegen change)

The groupshared Light sharedLights[GROUP_SIZE] copy and its two GroupMemoryBarrierWithGroupSync() calls were vestigial:

  • Never read, in any commit on any branch. A git log --all -G search for a read of sharedLights returns nothing. The inner cull loop has read the global lights buffer directly since the first LLF commit (4118e7f7d), the result of a name collision in the original pezcode port: the shared array was renamed sharedLights while the global SRV kept the name lights, so lights[i] always resolved to global memory. PR community-shaders/skyrim-community-shaders#697 (chore: remove cluster culling while loop) later removed the batching while loop, collapsing the staging into obvious single-element dead code.
  • Could never have worked here. Light[GROUP_SIZE] = 1024 × 96 B = 96 KB, three times the 32 KB cs_5_0/DX11 groupshared limit. A live read would have been a hard compile error, so the shader only ever compiled because the staging was dead and fxc stripped it.
  • Removal is byte-identical in codegen. The prior shader's DXBC already contained no dcl_tgsm declarations and no sync_* instructions, so the only DXBC delta in this PR is the 256 → 128 cap from change 1.

Risk

Low. Change 2 is provably codegen-neutral; change 1 only tightens a cap the allocation already assumed.

Notes

A larger occupancy optimization for this shader (eliminating the per-thread index array via a two-pass count/write — fxc confirms it removes the X4714 occupancy warning) is being evaluated separately with an in-game Tracy A/B; it is intentionally not in this PR.

🤖 Generated with Claude Code

Copilot AI review requested due to automatic review settings May 30, 2026 07:11
@coderabbitai

coderabbitai Bot commented May 30, 2026

Warning

Review limit reached

@alandtse, we couldn't start this review because you've reached your PR review rate limit.

More reviews will be available in 26 minutes and 24 seconds.

Your organization has run out of usage credits. Purchase more in the billing tab.


ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: f023f895-e706-4cd9-80c9-908c4eee90fb

📥 Commits

Reviewing files that changed from the base of the PR and between 5497b38 and 8704216.

📒 Files selected for processing (3)
  • features/Light Limit Fix/Shaders/LightLimitFix/ClusterCullingCS.hlsl
  • features/Light Limit Fix/Shaders/LightLimitFix/Common.hlsli
  • src/Features/LightLimitFix.h
📝 Walkthrough

The PR reduces the per-cluster light capacity from 256 to 128, documents the constraint across shader and C++ headers to prevent light index pool overruns, and removes unnecessary LDS staging from the compute shader, allowing threads to read directly from the global lights buffer.

Changes

Light Limit Cap and Shader Optimization

  • Constant reduction and cross-file documentation (features/Light Limit Fix/Shaders/LightLimitFix/Common.hlsli, src/Features/LightLimitFix.h): MAX_CLUSTER_LIGHTS reduced from 256 to 128 with added comments explaining the constraint on the global lightIndexList pool (clusterCount * CLUSTER_MAX_LIGHTS) and synchronization requirement with C++ CLUSTER_MAX_LIGHTS. Corresponding C++ header documentation added.
  • Compute shader LDS optimization (features/Light Limit Fix/Shaders/LightLimitFix/ClusterCullingCS.hlsl): Removed the sharedLights LDS staging buffer and GroupMemoryBarrierWithGroupSync() call; each thread now directly reads from the global lights buffer, simplifying the light loading loop.

Possibly related PRs

  • alandtse/open-shaders#35: Both PRs modify LightLimitFix's shared shader definitions and light data handling within the cluster-culling pipeline.

Estimated code review effort

🎯 2 (Simple) | ⏱️ ~8 minutes

Poem

🐰 Lights once staged in shared mystique,
Now gather straight—efficient, sleek,
With caps well-tuned at one-two-eight,
Each thread reads freely, keeps the gate.

🚥 Pre-merge checks | ✅ 3 | ❌ 2

❌ Failed checks (1 warning, 1 inconclusive)

  • Docstring Coverage (⚠️ Warning): Docstring coverage is 0.00%, which is insufficient. The required threshold is 80.00%. Resolution: Write docstrings for the functions missing them to satisfy the coverage threshold.
  • Linked Issues check (❓ Inconclusive): The PR objectives and changes are unrelated to the linked issues (#1 about clang_format action, #2 about progressTitle refactoring), making compliance assessment inconclusive. Resolution: Verify that the correct issues are linked to this PR, or confirm whether this PR should proceed without linked issue validation.

✅ Passed checks (3 passed)

  • Out of Scope Changes check (✅ Passed): All changes are directly related to the PR objectives: fixing the cluster light cap mismatch, removing dead groupshared staging, and adding documentation. No out-of-scope modifications detected.
  • Description Check (✅ Passed): Check skipped - CodeRabbit’s high-level summary is enabled.
  • Title check (✅ Passed): The PR title 'refactor(llf): align cluster light cap' accurately describes the main changes: aligning the cluster light capacity constant between shader and C++ code. It matches the core objective of making the MAX_CLUSTER_LIGHTS constant consistent with the C++ allocation.


@github-actions

No actionable suggestions for changed features.


@coderabbitai coderabbitai Bot left a comment


🧹 Nitpick comments (1)
features/Light Limit Fix/Shaders/LightLimitFix/ClusterCullingCS.hlsl (1)

14-20: Run targeted hlslkit validation for this shader path.

Bindings look unchanged here, but this culling-path update should stay covered by feature-scoped hlslkit compile + buffer-scan checks (flat + VR variants) to catch future register/buffer regressions early.

As per coding guidelines: "Highlight GPU register conflicts and recommend hlslkit buffer scanning for shader development" and "Use targeted hlslkit-compile shader validation for specific features during development rather than full validation."

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@features/Light Limit Fix/Shaders/LightLimitFix/ClusterCullingCS.hlsl` around
lines 14 - 20, Run a targeted hlslkit validation for ClusterCullingCS.hlsl to
ensure no GPU register or buffer conflicts were introduced: run the
feature-scoped hlslkit compile and buffer-scan checks (flat + VR variants)
against the bindings for StructuredBuffer<ClusterAABB> clusters,
StructuredBuffer<Light> lights and RWStructuredBuffer<uint> lightIndexCounter,
lightIndexList, and RWStructuredBuffer<LightGrid> lightGrid; report any register
collisions or missing/incorrect bindings and fix them in the shader (adjust
registers or buffer declarations) so the compile+scan passes for this culling
path.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: 33fca8b1-3862-4ca5-a0a6-83cb19dce15c

📥 Commits

Reviewing files that changed from the base of the PR and between f431ecd and 5497b38.

📒 Files selected for processing (3)
  • features/Light Limit Fix/Shaders/LightLimitFix/ClusterCullingCS.hlsl
  • features/Light Limit Fix/Shaders/LightLimitFix/Common.hlsli
  • src/Features/LightLimitFix.h


Copilot AI left a comment


Pull request overview

Aligns the LightLimitFix clustered light-culling compute shader with the C++-allocated light index pool size to prevent silent index drops in dense lighting scenarios, and removes provably-dead groupshared staging code from the culling pass.

Changes:

  • Tighten the shader per-cluster visible light cap (MAX_CLUSTER_LIGHTS) from 256 to 128 to match LightLimitFix::CLUSTER_MAX_LIGHTS and the lightIndexList pool sizing.
  • Remove unused groupshared staging and unnecessary GroupMemoryBarrierWithGroupSync() calls in ClusterCullingCS.hlsl.
  • Add cross-referenced documentation comments on both C++ and shader sides to keep the constants in sync.

Reviewed changes

Copilot reviewed 3 out of 3 changed files in this pull request and generated no comments.

  • src/Features/LightLimitFix.h: Adds a clarifying comment documenting that CLUSTER_MAX_LIGHTS must match the shader cap to avoid overrunning the global index pool.
  • features/Light Limit Fix/Shaders/LightLimitFix/Common.hlsli: Updates MAX_CLUSTER_LIGHTS to 128 (behavior fix) and documents why it must match the C++ allocation.
  • features/Light Limit Fix/Shaders/LightLimitFix/ClusterCullingCS.hlsl: Removes dead groupshared staging/barriers and adds rationale explaining why the shader reads lights directly.

@alandtse alandtse changed the title from fix(llf): align cluster light cap with index pool to refactor(llf): align cluster light cap May 30, 2026
@alandtse alandtse force-pushed the claude/nifty-austin-50f8ad branch 2 times, most recently from 5037926 to cb1302c May 30, 2026 07:40
Copilot AI review requested due to automatic review settings May 30, 2026 07:40

Copilot AI left a comment


Pull request overview

Copilot reviewed 3 out of 3 changed files in this pull request and generated 1 comment.

Comment thread features/Light Limit Fix/Shaders/LightLimitFix/Common.hlsli
Two related cleanups to the clustered light-culling pass; no observable
behavior change in any realistic scene.

1. Align the per-cluster cap. MAX_CLUSTER_LIGHTS (Common.hlsli) was 256
   while the C++ pool is clusterCount * CLUSTER_MAX_LIGHTS = 128. The
   constants represent the same quantity; set the shader cap to 128 and
   cross-reference both sides. Overrun was effectively unreachable (global
   pool, mostly-empty clusters), so this is consistency hardening, not a
   user-visible fix. Also halves the per-thread visibleLightIndices[]
   indexable-temp array.

2. Remove dead groupshared staging. The sharedLights copy and its barriers
   were never read in any commit (a name collision on the pezcode port) and
   could not have worked anyway: Light[GROUP_SIZE] = 96 KB exceeds the 32 KB
   cs_5_0 LDS limit, so a live read would not compile. fxc already
   dead-stripped it; the only DXBC delta is the 256->128 cap.

Verified with standalone fxc cs_5_0 compiles (flat + VR) and before/after
disassembly diff.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
@alandtse alandtse force-pushed the claude/nifty-austin-50f8ad branch from cb1302c to 8704216
@github-actions

✅ A pre-release build is available for this PR:
Download

@alandtse alandtse merged commit fcfaab0 into dev May 30, 2026
19 checks passed
@alandtse alandtse deleted the claude/nifty-austin-50f8ad branch May 30, 2026 08:28

Labels

None yet

Projects

None yet


2 participants