Skip to content

codegen: handle unaligned K for TN#3679

Merged
jfactory07 merged 35 commits into
developfrom
users/jzhou/address-interleave
Feb 11, 2026
Merged

codegen: handle unaligned K for TN#3679
jfactory07 merged 35 commits into
developfrom
users/jzhou/address-interleave

Conversation

@jfactory07
Copy link
Copy Markdown
Contributor

@jfactory07 jfactory07 commented Jan 7, 2026

Motivation

  • Enable new address modes: Add support for B address interleave and K-alignment (KRingShift) in TensileLite codegen.

  • Correctness in tail paths: Ensure tail global reads behave correctly when KRS is disabled at runtime (sgprKRingShift==0) by falling back to the original load-only behavior.

  • Performance: Reduce redundant tail offset work by hoisting invariants and interleaving offset-apply with buffer_load in the tail path.

Technical Details

  • BAddrInterleave (e77420a)

    • Adds BAddrInterleave validation/knob and ISA capability wiring.
    • Computes runtime G once and reuses it across SRD/address calculations (kept live in SGPRs).
  • KRingShift align-k (cec7d49)

    • Adds KRingShift knob and per-workgroup initialization of sgprKRingShift based on cacheline constraints.
    • Applies KRS adjustment to computed global addresses and introduces reference-style tail offset remap macros.
  • Tail-path refinements (5ef3eb7)

    • Moves KRS tail offset patching to just-in-time per-load emission (setup once; apply right before each load).
    • Adds runtime branching so sgprKRingShift==0 takes a no-KRS load-only path; otherwise executes the KRS-enabled interleaved path (including shared A/B label flow when applicable).
    • Fixes tail LDS “zero-out mask” control flow to be conditional (only skip when aligned) and skips the mask sequence when KRS is enabled (since KRS already forces safe OOB behavior).
    • Ensures SGPR cleanup: emits .set ... , UNDEF for KRS/BInterleaveG after last use to avoid accidental remapping.
  • Macro/rocisa robustness (448c4d6)

    • Converts KRS tail offset macros to rocisa.code.Macro API.
    • Fixes RegisterContainer::toString() handling for macro arg register ranges to prevent invalid expansions.
    • Forces specific literals to print as intended (e.g., 0xffffffff).

Test Plan

  • Codegen build: Run Tensile library generation for gfx950 (e.g., asm-debug + keep-build-tmp) and confirm codegen completes without asm errors.
  • Assembly inspection:
    • Confirm KRS markers/macros appear as expected and macro expansions are valid.
    • Validate tail-path behavior:
      • sgprKRingShift==0 → load-only path (no KRS offset apply)
      • sgprKRingShift!=0 → KRS-enabled interleaved path
  • Runtime validation:
    • Run hipblaslt-bench with cases that toggle KRS enable/disable (shapes where cacheline congruence permits/disables KRS) and compare correctness vs baseline.

Test Result

benchmark test for 2048x3072x1880 TN: 9% uplift

Submission Checklist

Comment thread projects/hipblaslt/tensilelite/Tensile/KernelWriterAssembly.py Outdated
Comment thread projects/hipblaslt/tensilelite/Tensile/Common/ValidParameters.py
 - require tiles1 = SizeJ / MT1 to be an integer (SizeJ % MT1 == 0)
    #   - require lowbit(tiles1) > 1 so that G=min(lowbit(tiles1), LVCB) is > 1 (enabled)
    # Note: if lowbit(tiles1) == 1, then G==1 and the kernel disables BAddrInterleave.
Comment thread projects/hipblaslt/tensilelite/Tensile/Common/ValidParameters.py
Comment thread projects/hipblaslt/tensilelite/Tensile/KernelWriterAssembly.py Outdated
@jfactory07 jfactory07 marked this pull request as ready for review January 23, 2026 06:06
@jfactory07 jfactory07 requested a review from a team as a code owner January 23, 2026 06:06
@jfactory07 jfactory07 changed the title codegen: BAddrInterleave codegen: handle unaligned K for TN Jan 23, 2026
Add a VS Code/Cursor extension to hover Tensile YAML and view/edit a single kernel block, and ignore VSIX/build artifacts.
Copy link
Copy Markdown
Contributor

@aazz44ss aazz44ss left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

Resolve conflicts in Tensile codegen and gfx950 logic YAML.
@jfactory07 jfactory07 enabled auto-merge (squash) February 5, 2026 08:43
@math-ci-webhook
Copy link
Copy Markdown

perfci run on commit 250655e

math-ci run

@codecov-commenter
Copy link
Copy Markdown

codecov-commenter commented Feb 6, 2026

Codecov Report

✅ All modified and coverable lines are covered by tests.

❌ Your project status has failed because the head coverage (76.83%) is below the target coverage (80.00%). You can increase the head coverage or adjust the target coverage.

Additional details and impacted files
@@           Coverage Diff            @@
##           develop    #3679   +/-   ##
========================================
  Coverage    65.33%   65.33%           
========================================
  Files         1579     1577    -2     
  Lines       242119   242154   +35     
  Branches     33920    33912    -8     
========================================
+ Hits        158167   158201   +34     
- Misses       69931    69945   +14     
+ Partials     14021    14008   -13     
Flag Coverage Δ *Carryforward flag
hipBLAS 90.67% <ø> (ø) Carriedforward from fe1643e
hipBLASLt 43.62% <ø> (ø)
hipDNN 81.56% <ø> (+0.06%) ⬆️ Carriedforward from fe1643e
hipFFT 56.68% <ø> (ø) Carriedforward from fe1643e
hipSPARSE 84.70% <ø> (ø) Carriedforward from fe1643e
rocBLAS 47.97% <ø> (ø) Carriedforward from fe1643e
rocFFT 48.57% <ø> (ø) Carriedforward from fe1643e
rocSOLVER 76.83% <ø> (ø) Carriedforward from fe1643e
rocSPARSE 71.53% <ø> (-<0.01%) ⬇️ Carriedforward from fe1643e

*This pull request uses carry forward flags. Click here to find out more.
see 48 files with indirect coverage changes

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.
  • 📦 JS Bundle Analysis: Save yourself from yourself by tracking and limiting bundle sizes in JS merges.

@math-ci-webhook
Copy link
Copy Markdown

perfci run on commit e2ac19b

math-ci run

@math-ci-webhook
Copy link
Copy Markdown

perfci run on commit 4d50379

math-ci run

@math-ci-webhook
Copy link
Copy Markdown

perfci run on commit bc6a29e

math-ci run

@jfactory07 jfactory07 merged commit e5bbd57 into develop Feb 11, 2026
65 of 67 checks passed
@jfactory07 jfactory07 deleted the users/jzhou/address-interleave branch February 11, 2026 12:00
kamuruga08 pushed a commit that referenced this pull request Feb 19, 2026
kamuruga08 pushed a commit that referenced this pull request Feb 19, 2026
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants