
[KERNELS] simplify mx shuffled weights defaults#9986

Merged
aeng-openai merged 5 commits into triton-lang:main from aeng-openai:aeng/mx-shuffle
Apr 10, 2026

Conversation

@aeng-openai
Collaborator

@aeng-openai aeng-openai commented Apr 10, 2026

Simplify use of shuffled Blackwell MX value weights

  • convert directly to BlackwellMX4ValueShuffledLayout; don't require first going through BlackwellValueLayout
  • use the block sizes from BlackwellMX4ValueShuffledLayout as opt-flag constraints. This removes the complicated code previously needed to infer the block sizes before constructing the layout. It also picks a better default of block_n = 256, block_k = 128, which generally works well and matches the inferred sizes except when N < 256, and it makes shuffled weights simpler to use: there is no longer a need to also override disable_mx4_block_swap = True.
  • add more test coverage

Performance is unchanged when running `torchrun --nproc-per-node=1 python/triton_kernels/bench/bench_mlp.py`.
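The default block-size rule described above (block_n = 256, block_k = 128, falling back to an inferred block_n only when N < 256) can be sketched as a small standalone function. This is an illustrative sketch, not the PR's actual code; the function name `pick_block_sizes` and the power-of-two fallback rule for small N are assumptions for illustration.

```python
def pick_block_sizes(n: int, default_block_n: int = 256, block_k: int = 128):
    """Return (block_n, block_k) defaults for the shuffled-weight layout.

    Hypothetical sketch: use the fixed defaults block_n = 256, block_k = 128,
    except when the weight's N dimension is smaller than 256, in which case
    clamp block_n to the smallest power of two covering N (one plausible
    inference rule; the real constraint comes from the layout itself).
    """
    if n >= default_block_n:
        block_n = default_block_n
    else:
        # Next power of two >= n, so the block still tiles the tensor.
        block_n = 1 << max(0, (n - 1).bit_length())
    return block_n, block_k
```

For example, a weight with N = 4096 gets the (256, 128) default, while N = 128 yields (128, 128); callers no longer need to pre-compute these sizes before constructing the layout.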

@aeng-openai aeng-openai marked this pull request as ready for review April 10, 2026 18:55
@aeng-openai aeng-openai requested a review from ptillet as a code owner April 10, 2026 18:55
Collaborator

@ThomasRaoux ThomasRaoux left a comment


Nice!

@aeng-openai aeng-openai merged commit 028e5da into triton-lang:main Apr 10, 2026
9 checks passed
aeng-openai added a commit that referenced this pull request Apr 11, 2026
plognjen pushed a commit to plognjen/triton that referenced this pull request Apr 14, 2026
