refactor(TAA): restructure ISTemporalAA into readable code#98
Conversation
|
No actionable comments were generated in the recent review. 🎉 ℹ️ Recent review info⚙️ Run configurationConfiguration used: Path: .coderabbit.yaml Review profile: CHILL Plan: Pro Run ID: 📒 Files selected for processing (1)
🚧 Files skipped from review as they are similar to previous changes (1)
📝 WalkthroughWalkthroughISTemporalAA.hlsl adds named luma/blend constants and helper math, centralizes history reprojection into ReprojectHistoryUV, and rebuilds motion sampling, neighbourhood luma bracket accumulation, combined reject logic, and alpha/HDR-aware output with feedback encoding. ChangesTemporal AA Core Refactor
Estimated code review effort🎯 4 (Complex) | ⏱️ ~75 minutes Possibly related issues
Possibly related PRs
Poem
🚥 Pre-merge checks | ✅ 5✅ Passed checks (5 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches🧪 Generate unit tests (beta)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
|
No actionable suggestions for changed features. |
There was a problem hiding this comment.
🧹 Nitpick comments (1)
package/Shaders/ISTemporalAA.hlsl (1)
286-544: Keep the targeted shader validation cadence on this refactor stack.I don't see a new resource-register conflict here—the explicit
t0-t5,s0-s5, andb2layout stays stable—but this kind of scratch-pack destructuring and[unroll]fold consolidation is exactly where a quickhlslkitbuffer scan plus permutation-targetedhlslkit-compilepass helps catch accidental binding or semantic drift before merge. As per coding guidelines, "Highlight GPU register conflicts and recommend hlslkit buffer scanning for shader development" and "Use targeted hlslkit-compile shader validation for specific features during development rather than full validation."🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@package/Shaders/ISTemporalAA.hlsl` around lines 286 - 544, The change introduces aggressive tap folding/unrolls and register-packed neighbor handling (see ISTemporalAA.hlsl symbols SelectDepthGuidedUV, SampleNeighborGRB, AssignPackedNeighbor and the [unroll] fold loops) which can mask GPU register/semantic drift; run an hlslkit buffer scan over this shader and a targeted hlslkit-compile pass (not full validate) for the features touched (neighbor taps, motion reprojection, HDR_OUTPUT branches) to detect any t0–t5, s0–s5, b2 binding conflicts or semantic/register spills, then resolve any reported conflicts by reassigning bindings or refactoring the offending packed variables (unpack/repack around the hot loops) and re-run the hlslkit-compile until clean; add a short comment near the folded loop and the SelectDepthGuidedUV call noting the validation step and tool/command used.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Nitpick comments:
In `@package/Shaders/ISTemporalAA.hlsl`:
- Around line 286-544: The change introduces aggressive tap folding/unrolls and
register-packed neighbor handling (see ISTemporalAA.hlsl symbols
SelectDepthGuidedUV, SampleNeighborGRB, AssignPackedNeighbor and the [unroll]
fold loops) which can mask GPU register/semantic drift; run an hlslkit buffer
scan over this shader and a targeted hlslkit-compile pass (not full validate)
for the features touched (neighbor taps, motion reprojection, HDR_OUTPUT
branches) to detect any t0–t5, s0–s5, b2 binding conflicts or semantic/register
spills, then resolve any reported conflicts by reassigning bindings or
refactoring the offending packed variables (unpack/repack around the hot loops)
and re-run the hlslkit-compile until clean; add a short comment near the folded
loop and the SelectDepthGuidedUV call noting the validation step and
tool/command used.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
Run ID: 3645512f-6f24-429d-9519-32f16e22234e
📒 Files selected for processing (1)
package/Shaders/ISTemporalAA.hlsl
There was a problem hiding this comment.
Pull request overview
This PR refactors Skyrim’s decompiled Temporal AA pixel shader (ISTemporalAA.hlsl) into a more readable, “named TAA” structure while aiming to preserve behavior (per PR description: byte-identity where possible, and RenderDoc A/B equivalence where not).
Changes:
- Introduces named constants and helper functions (e.g., bracket merges, sharpen delta, history UV reprojection) to replace repeated decompile idioms.
- Reorganizes
main()to replace pervasive float4 register-aliasing with named locals and structured intermediate steps (brackets, flicker score, history rectification, blend). - Consolidates repeated fold logic into small unrolled loops/arrays for min/max neighborhood bracket evaluation and alpha-coverage gating.
The 9-line subtraction chain is a sum of 9 integer ceil() contributions; collapse to one saturate(4 - ...) expression in the same left-to-right order. Bytecode- identical across flat/VR/HDR/VR+HDR (offline verifier).
The 6-tap min-luma colour bracket reduction (18 lines of identical cmp/select/ select) collapses to 6 MergeLumaBracket() calls. Bytecode diverges (helper inlining), so validated via the Tier-3 RenderDoc harness on a live SE frame: EQUIVALENT, candidate mean_abs == baseline noise floor (1.68e-4).
The flicker section interleaved 8 FlickerLumaContribution stashes (which reused the taps' .x belowHist slots) with the min-luma colour bubble-sort. Each contribution reads a tap's original .w luma before the sort mutates it, so hoist all 8 into named fc* locals up front; the colour sort stays as register packing with a comment. Validated EQUIVALENT via the runtime harness (mean == floor).
The two sharpen candidates (8 lines of -a*0.25+center accumulation, each via a dead scratch slot) collapse to SharpenDelta() calls. Validated EQUIVALENT via the runtime harness (mean == floor).
The temporal-blend/clamp section is dense per-line packing with no repeating structure to factor safely; add an intent comment instead of restructuring. Comment-only -> bytecode-identical across all four permutations.
kHistoryLumaDecay (0.95), kHistoryBlendThreshold (0.9025), kFeedbackBlendMax (~0.99) replace inline literals, matching the existing k* constant pattern. Bytecode-identical across all four permutations (offline verifier).
kFlickerThreshold (0.2) replaces the inline literal in FlickerLumaContribution. Bytecode-identical across all four permutations (offline verifier).
The centre-seed fold of tapC1 into the max bracket is the same gated merge as the cascade; replace the inline cmp/select/select with MergeLumaBracket(tapC1, ...). Bytecode-identical across all four permutations (offline verifier).
The ungated corner fold (cmp/select into bracketMinReg) is the same lower-luma merge with gate 0; the neighbourhood min-luma bracket is now expressed entirely through MergeLumaBracket. Bytecode-identical across all four permutations.
…ect) The 'colour sort' is the neighbourhood MAX-luma bracket (complement to the min bracket). Prior attempt inverted the gate (decompile's max pass commits the new pick when belowHist is SET: gate ? cand : bracket — opposite of the min pass); that read EQUIVALENT on a static menu but diverged ~4x under motion. Corrected and validated EQUIVALENT on an in-game vanity-cam MOTION frame (eid 34605, cand_mean == baseline floor) — the fixture that exposed the original bug.
…validated) Final corner fold = ungated MergeMaxBracket(corner, maxBracket, 1). Colour sort now fully expressed as the neighbourhood max bracket (6 gated folds + corner), mirroring the min bracket. Validated EQUIVALENT on the in-game motion frame (eid 34605, cand_mean == floor).
The min bracket lived in the misleadingly-named bracketMax register (.yzw); lift it into a float3 minBracket local mirroring maxBracket. bracketMax.x (centre belowHist) and the later kMinLumaCap clamp lifetime are untouched. Validated EQUIVALENT on the in-game motion frame (eid 34605, cand_mean == floor).
The kMinLumaCap low-clamp of tapC1 is a centre-vs-floor bound folded into tapC1 with MergeMaxBracket; lift it into a lowClamp local. Retires the bracketMax.yzw lifetime (register now only holds .x = centre belowHist). Validated EQUIVALENT on the in-game motion frame (eid 34605, cand_mean == floor).
…no-op) After minBracket/lowClamp extraction the bracketMax register held only .x (the centre belowHist mask). Replace with a float belowHistCentre local and drop it from the register pool. Bytecode-identical across all four permutations.
The select tail picks the neighbourhood AABB bounds for history clamping: motionReject.y toggles corner inclusion (max with-corner / min without-corner when set). Express as maxBoundSel/minBoundSel float3 locals and pack into tapC0/tapC1 in the blend's expected layout; drops the bracketMinReg.x / mergeScratch.x / tapA0.yzw scratch packing. Validated EQUIVALENT under in-game motion (eid 34605, cand_mean == floor).
tapA1.x = min(max(tapMin.w, history.x), tapA0.w) is clamp(history.x, lo, hi) over the two sharpened candidates' luma range. Validated EQUIVALENT under in-game motion (eid 34605, cand_mean == floor).
The core restructure orphaned the weightedColor float4 pack — zero references remain in main(). Compiled DXBC is byte-identical before and after removal across all four permutations. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
The min and max neighbourhood colour brackets fold the same six taps in the same order with the same belowHist gates; only the merge rule and seed differ. Extract the (tap, gate) sequence into foldTaps/foldGates arrays and replace the two unrolled 6-line cascades with [unroll] loops. DXBC byte-identical across all four permutations. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Replace the three narrow scratch register-packs with named locals: centerMeta -> centerRBL (centre colour as R,B,luma); the min-bound packs mergeScratch.yzw/bracketMinReg.yzw -> minBoundNoCorner/minBoundWithCorner. DXBC byte-identical across all four permutations. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
center was only ever used as center.x (the C1 tap below-history mask); replace the float4 pack with a named scalar. DXBC byte-identical across all four permutations. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
The temporal-blend tail packed the two RCAS-sharpened clip candidates and the history-luma clamp into reused tap registers (corner/tapC0/tapMin/ tapA0/tapA1/tapB0). Replace with named locals (clipLo/clipHi/clipPick/ clipFinal/clampedHistory/rectifyLo/rectifyHi); history.xyz and the corner.xyz output stay as carriers for their downstream lifetime. DXBC byte-identical across all four permutations. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Replace the motionReject.z accumulator and sampleUV.x/.w scratch in the rejection block with named scalars (reject spanning the VR/SE split, prevUV/uvLow/uvHigh/maskReject). DXBC byte-identical across all four permutations. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
The luma-convergence block aliased motionReject.z as the luma diff, motionReject.x as the motion-vs-history term, and history.xw as the resulting similarity weights. Name them lumaDiff/motionVsHistory/ similarity (.x colour weight, .y feedback weight) and route the downstream consumers (colour blend, weight clamp, feedback, final luma fixup) to them. DXBC byte-identical across all four permutations. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Name the running temporal blend weight (motionReject.x -> blendWeight), the history feedback weight (tapA0.y -> historyFeedback), and the final feedback-luma fixup (feedbackLuma/lumaNudge/lumaNudgeTiny). DXBC byte-identical across all four permutations. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Replace the reused tap/sampleUV register-packs in the temporal colour blend with named locals: workColor (evolving history->blended colour), neighborBlend, centerVsNeighbor, targetColor, detailDelta. tapMin and sampleUV.xyw are now free of the blend; corner stays only as the output carrier. DXBC byte-identical across all four permutations. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
history.yz (mask sample) -> maskValues; the corner pack reused as the final output carrier -> outPacked (.x feedback luma, .yzw colour). DXBC byte-identical across all four permutations. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
sampleUV only held UVs already available as drCenter/drUVMin (the blend now uses workColor) plus one scratch scalar -> historyMotionDecay; use the existing locals directly and drop sampleUV. Also drop the throwaway history.xy = drMax copy (pass drMax to ClampScreenUV directly). DXBC byte-identical across all four permutations. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
motionReject.y -> cornerBelowHist (toggles corner-tap inclusion in the neighbourhood AABB); update the stale sampleUV reference in the channel- layout comment. DXBC byte-identical across all four permutations. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Drop the tapC0/tapC1 repack bridge between the AABB bound selection and the clip candidates; build clipLo/clipHi straight from minBoundSel/ maxBoundSel. tapC0/tapC1 are now used only as neighbour taps. DXBC byte-identical across all four permutations. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Drop comments describing the pre-destructure code: the 'verbatim math' tag, the 'handed back to tapA0.yzw / bespoke tail' note, and the 'was center.x' intermediate reference; reframe the register notes as decompile cross-refs. Comment-only; DXBC byte-identical. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Replace the 8 explicit FlickerLumaContribution calls + the hand-written 8-term sum with an array + [unroll] reduce (tap order preserved). DXBC byte-identical across all four permutations. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Move the VR/SE history-reprojection #ifdef out of main() into ReprojectHistoryUV(): VR per-eye reproject + OOB flag, SE screen-space reproject returning the raw UV (read later by the SE disocclusion test). Both out-params written on both paths. DXBC byte-identical across all four permutations. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Replace the six AssignPackedNeighbor calls + scattered .x belowHist slots with neighbors[7]/neighborBelowHist[7] sampled in a loop (vanilla order: uvMin,A0,A1,B0,B1,C0,C1). The min/max brackets and flicker now index that array (fold runs neighbors[5..0] = the old foldTaps order); drops the foldTaps/foldGates/belowHistC1 scaffolding and the named tap packs. DXBC byte-identical across all four permutations. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
The neighbors[] sampling loop calls SampleNeighborGRB/PackNeighborTap directly, so AssignPackedNeighbor has no callers. Removing an uncalled function is a no-op; DXBC byte-identical across all four permutations. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
tapMin was only the reprojected history UV -> float2 historyUV (out of the register-pack declaration). corner.xyz was overloaded as the rectified blend colour after serving as the corner tap -> float3 rectifiedColor, so corner stays the corner tap. DXBC byte-identical across all four permutations. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
history was the last register carrying four shifting lifetimes. Split into named locals: historyLuma (feedback luma), historyFeedbackPrev / prevMotion Confidence (the feedback-RT round-trip weights), clipState (selected luma, lo, hi) + clipRatio for the rectification, and blendLumaMotion for the post-reject luma/motion (kept as one float2 select to match the decompile's movc). DXBC byte-identical across all four permutations. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
similarity: the #ifdef now selects only the diff input (HDR luma diff vs SDR motion-vs-history); the shared '-diff*scale+1' and max(0,...) move outside the branch. feedback luma: hoist the shared saturate() and drop the #else (HDR just wraps it in EncodeFeedbackLuma). Less duplicated code across permutations. DXBC byte-identical across all four. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Fold the SE out-of-bounds test into ReprojectHistoryUV (same boolean as the decompile: any(prevUV >= 1) || any(prevUV <= 0)) so both paths return outOfBounds, and drop the VR/SE #ifdef in main() — reject is now a single 'prevUVOutOfBounds ? 1 : 0'. The dead rawReprojUV out-param is removed. Not byte-identical (SE/HDR bytecode reexpresses the OOB boolean + one extra 'and' from ?1:0), but OUTPUT-identical: reject is used only as a select condition and the boolean is unchanged. VR/VR+HDR are byte-identical (reject expression unchanged); SE confirmed bit-identical output via the RenderDoc same-frame A/B (mean_abs=max_abs=0 on an in-game frame). Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
f58959b to
bf3a8fc
Compare
Address Copilot review: store the low-clamped C1 bracket seed in its own lowClampedC1 (used only to seed the max bracket) instead of overwriting neighbors[6].xyz, so neighbors[] stays a pure tap array (preserves the documented GRB.xyz/luma.w invariant). Also reword two header comments that described the refactor action rather than the present code (per CLAUDE.md comment guidelines). DXBC byte-identical across all four permutations. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Restructures
package/Shaders/ISTemporalAA.hlsl— Skyrim's TAA decompile transcription — into readable, named TAA, with no behaviour change. Single PR (consolidated from the former #90/#96/#98 stack now that the work is complete); built on the VR fix #84, already merged to dev.What changed (1 file, ISTemporalAA.hlsl)
MergeLumaBracket/MergeMaxBracket(neighbourhood colour brackets),SharpenDelta(RCAS),FlickerLumaContribution,SampleNeighborGRB/AssignPackedNeighbor,SelectDepthGuidedUV, the UV clamps.kMaxLumaCap,kSimilarityScale,kHistoryBlendThreshold, …).foldTaps/foldGatesloop.main()into named locals — the deep decompile aliasing where one component meant different things at different points (motionReject.z= reject flag then luma diff;history.xw= luma then similarity weights;sampleUV.xyw= UV then working colour). Now:reject,lumaDiff,similarity,blendWeight,workColor,neighborBlend,targetColor,clipLo/clipHi,maskValues,outPacked,cornerBelowHist, … ThesampleUVpack, thehistory.xy = drMaxtemp, and thetapC0/tapC1clip bridge are gone. Only genuine packed data (motion vector, history sample, neighbour taps) remains as float4.Validation
Two tiers, both green:
tools/verify-shader-refactor.ps1reports all four permutations IDENTICAL (PSHADER × {VR} × {HDR_OUTPUT}) for the destructure + dedup layers.HDR (non-VR) path: the PQ pipeline is intact; HDR-block changes are value-identical renames (couldn't be runtime-captured because the DX12-interop swapchain hides the D3D11 frame, so it rests on static analysis + the byte-identity check).
Will be squash-merged — this PR title becomes the commit.
🤖 Generated with Claude Code
Summary by CodeRabbit