@arun-thmn
Contributor

This PR updates a small piece of the micro-kernel lowering logic: the order of loads, broadcasts, and FMAs now depends on the m and n tile sizes. If:

  • m >= n: first load all B matrix elements, then broadcast the A elements one by one, doing the FMAs after each broadcast.
  • n > m: do the opposite. First broadcast all A matrix elements, then load the B elements one by one, doing the FMAs after each load.

The logic is updated for fp32 (both AVX-512 and AVX2) and bf16 (AVX-512 only).
bf16 on AVX2 will follow later, once the LLVM pattern-matching problem on ADL machines is fixed.
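
For readers skimming the thread, the two orderings described above can be sketched in plain Python. This is only an illustration, not the lowering code from this PR: `microkernel_step` is a hypothetical name, each list element stands in for one vector register, and the trace simply records the order in which the emulated load, broadcast, and FMA operations would be issued for a single reduction step.

```python
def microkernel_step(a_col, b_row, acc):
    """Emulate one reduction step: acc[i][j] += a_col[i] * b_row[j].

    a_col: m values of A (each one would be broadcast to a register).
    b_row: n values of B (each one stands in for a loaded register).
    Returns the trace of emulated operations in issue order.
    Hypothetical helper for illustration only.
    """
    m, n = len(a_col), len(b_row)
    trace = []
    if m >= n:
        # Load every B element first and keep them all live.
        b_regs = []
        for j in range(n):
            trace.append(("load_b", j))
            b_regs.append(b_row[j])
        # Then broadcast A one element at a time and do the FMAs.
        for i in range(m):
            trace.append(("bcast_a", i))
            a_reg = a_col[i]
            for j in range(n):
                trace.append(("fma", i, j))
                acc[i][j] += a_reg * b_regs[j]
    else:
        # Broadcast every A element first and keep them all live.
        a_regs = []
        for i in range(m):
            trace.append(("bcast_a", i))
            a_regs.append(a_col[i])
        # Then load B one element at a time and do the FMAs.
        for j in range(n):
            trace.append(("load_b", j))
            b_reg = b_row[j]
            for i in range(m):
                trace.append(("fma", i, j))
                acc[i][j] += a_regs[i] * b_reg
    return trace


# m = 3, n = 2: B is loaded up front, A is broadcast one by one.
acc = [[0, 0] for _ in range(3)]
order = microkernel_step([1, 2, 3], [10, 20], acc)
print(order[:3])
print(acc)
```

Both branches compute the same accumulator values; only the issue order differs. In this sketch the smaller of the two operand sets is the one held live while the other is streamed through one element at a time.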

@arun-thmn arun-thmn added the benchmark-full Benchmark all targets label Jul 8, 2025
@arun-thmn arun-thmn marked this pull request as ready for review July 8, 2025 07:34
@arun-thmn arun-thmn requested review from adam-smnk and shahidact July 8, 2025 07:35
Contributor

@adam-smnk adam-smnk left a comment

Judging by the tests looks fine.

To be honest, I'm getting lost in all the branches here 😅
Perhaps it could be simplified if you created all the needed ops first then reshuffled them using rewrite.moveOp....
But ultimately as you prefer, as long as you know what's going on. 🙂

@arun-thmn
Contributor Author

> Judging by the tests looks fine.
>
> To be honest, I'm getting lost in all the branches here 😅 Perhaps it could be simplified if you created all the needed ops first then reshuffled them using rewrite.moveOp.... But ultimately as you prefer, as long as you know what's going on. 🙂

True, @adam-smnk.
This pass requires a lot of conditional branches. There are a few TODOs for this pass, such as i8 support, and I want to finish them first. Afterwards, I will definitely try to simplify it.

@arun-thmn arun-thmn merged commit 4a95804 into libxsmm:main Jul 8, 2025
14 checks passed