[AMD] Clean up shuffleXor implementation by FrederickVu · Pull Request #10065 · triton-lang/triton

FrederickVu · 2026-04-17T19:27:48Z

We make the implementation more uniform by decomposing the xor mask and emitting instructions accordingly. For a `mask` in [1, 15], RDNA and gfx1250 use a single `row_xmask` DPP instruction, while CDNA uses one or two DPP instructions. For `mask >= 16`, RDNA uses a single `v_permlanex16` and CDNA uses `ds_bpermute`.

We also pull some static utility functions into an anonymous namespace and remove the `ShflKind::down` case from the enum, as it was unimplemented.
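
For readers skimming the thread, the dispatch described above can be sketched roughly as follows. This is a standalone model, not the code in this PR: `Family`, `Lowering`, `sourceLane`, and `selectShuffleXor` are hypothetical names, and the sketch only encodes the cases the description states. In particular, how gfx1250 handles `mask >= 16`, and which one or two DPP instructions CDNA picks for a given mask, are not stated here and are not modeled.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical names; this mirrors the PR description, not the Triton sources.
// Per the description, gfx1250 behaves like RDNA for masks in [1, 15]; its
// handling of mask >= 16 is not stated and is not modeled.
enum class Family { RDNA, CDNA };

enum class Lowering {
  RowXmaskDpp,  // single row_xmask DPP instruction
  DppOneOrTwo,  // one or two DPP instructions
  PermlaneX16,  // single v_permlanex16
  DsBpermute    // ds_bpermute
};

// Semantics of a xor shuffle: each lane reads the value held by lane ^ mask.
constexpr uint32_t sourceLane(uint32_t lane, uint32_t mask) {
  return lane ^ mask;
}

// Choose a lowering for a nonzero xor mask, following the description.
inline Lowering selectShuffleXor(Family family, uint32_t mask) {
  assert(mask != 0 && "a zero mask is the identity and needs no shuffle");
  if (mask < 16)
    return family == Family::RDNA ? Lowering::RowXmaskDpp
                                  : Lowering::DppOneOrTwo;
  return family == Family::RDNA ? Lowering::PermlaneX16
                                : Lowering::DsBpermute;
}

// Decomposition is sound because xor shuffles compose: shuffling by a and
// then by b is the same permutation as shuffling by a ^ b.
constexpr bool composes(uint32_t lane, uint32_t a, uint32_t b) {
  return sourceLane(sourceLane(lane, a), b) == sourceLane(lane, a ^ b);
}
```

For example, `selectShuffleXor(Family::CDNA, 16)` yields `Lowering::DsBpermute`, which matches the "stride 16: CDNA fallback to bpermute" comment in the updated test.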

antiagainst · 2026-04-18T01:13:09Z

  // CHECK-LABEL: reduce_xor_max
  tt.func @reduce_xor_max(%arg0: tensor<32xf32, #blocked4>) {
-    // CHECK: rocdl.ds_swizzle
+    // stride 16: CDNA fallback to bpermute


Right now this is only checking gfx942. Can you also add check lines for gfx1250 given your changes?

antiagainst · 2026-04-18T01:15:00Z

+  Value hiSel = b.i32_val(buildSelectorMask(8));
+  return ROCDL::PermlaneX16Op::create(rewriter, loc, val.getType(), val, val,
+                                      loSel, hiSel, true, false)
+      .getRes();


Nit: You don't need to call `getRes` explicitly for one-result ops; the op will convert automatically to its only result value, I believe.
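
The mechanism behind this nit can be illustrated with a plain C++ analogy. This is a sketch assuming, as the comment suggests, that single-result ops provide an implicit conversion to their result. `Value`, `OneResultOp`, and `createOp` are hypothetical stand-ins, not MLIR classes.

```cpp
#include <cassert>

// Hypothetical stand-ins for illustration only; these are not MLIR types.
struct Value {
  int id;
};

struct OneResultOp {
  Value result;

  // Explicit accessor, analogous to calling getRes() on the created op.
  Value getRes() const { return result; }

  // Implicit conversion to the single result, so callers can omit getRes().
  operator Value() const { return result; }
};

// Stand-in for an Op::create(...) call that returns the op itself.
inline OneResultOp createOp(int id) { return OneResultOp{Value{id}}; }

// Both functions return the same Value; the second relies on the conversion.
inline Value viaGetter() { return createOp(7).getRes(); }
inline Value viaConversion() { return createOp(7); }
```

With such a conversion in place, a function whose return type is the value type can return the created op directly, which is the shape the follow-up commit adopts by dropping the `getRes()` calls.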


Clean up shufflexor implementation

c4f1e2a

FrederickVu requested review from antiagainst, ptillet and zhanglx13 as code owners April 17, 2026 19:27

retrigger CI

67209a6

antiagainst reviewed Apr 18, 2026


Add gfx1250 test and drop getRes() calls

1a9639a

antiagainst approved these changes Apr 18, 2026


antiagainst merged commit 2796cea into triton-lang:main Apr 18, 2026
9 checks passed

antiagainst merged 3 commits into triton-lang:main from FrederickVu:shufflexor
