Update the Triton softmax micro-bench. #1207

Merged
chengjunlu merged 1 commit into llvm-target from chengjun/llvm-target-softmax-microbench on May 30, 2024
Conversation

@chengjunlu
Contributor

  1. Modify the softmax kernel to improve performance when N < 1024.
  2. Use synchronous submission by default for the benchmark.
  3. Align the tile configuration of the XeTLA kernel with that of the Triton kernel.
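
For readers who want a host-side baseline to check the benchmarked kernels against, here is a minimal pure-Python sketch of a numerically stable row softmax. It is not the kernel from this PR. The helper names `next_power_of_2` and `softmax_row`, and the choice to pad each row to a power-of-two block with `-inf` (a common pattern in Triton softmax examples), are assumptions made for illustration only.

```python
import math


def next_power_of_2(n: int) -> int:
    """Smallest power of two that is >= n, for n >= 1 (hypothetical helper)."""
    return 1 << (n - 1).bit_length()


def softmax_row(row, block_size=None):
    """Numerically stable softmax over one row of N values.

    The row is padded to `block_size` with -inf, mimicking a masked load
    in a block-based kernel. Padded lanes contribute exp(-inf) == 0.
    """
    n = len(row)
    if n == 0:
        raise ValueError("row must contain at least one element")
    if block_size is None:
        # Assumption: one block covers the whole row, e.g. N=1000 -> 1024.
        block_size = next_power_of_2(n)
    if block_size < n:
        raise ValueError("block_size must be >= len(row)")

    padded = list(row) + [float("-inf")] * (block_size - n)
    row_max = max(padded)  # subtract the max to avoid overflow in exp
    exps = [math.exp(x - row_max) for x in padded]
    denom = sum(exps)
    # Drop the padded lanes before returning.
    return [e / denom for e in exps[:n]]
```

A row of any length N yields N probabilities that sum to 1, and large inputs such as `[1000.0, 1000.0]` do not overflow because the row maximum is subtracted before exponentiation.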

@chengjunlu chengjunlu force-pushed the chengjun/llvm-target-softmax-microbench branch from f8f2184 to 3546f31 Compare May 29, 2024 03:07
Comment thread benchmarks/xetla_benchmark/fused_softmax.py Outdated
Comment thread benchmarks/xetla_benchmark/fused_softmax.py Outdated
@chengjunlu chengjunlu force-pushed the chengjun/llvm-target-softmax-microbench branch from 3546f31 to eea9219 Compare May 30, 2024 05:28
@chengjunlu chengjunlu merged commit ceadb1b into llvm-target May 30, 2024
@chengjunlu chengjunlu deleted the chengjun/llvm-target-softmax-microbench branch May 31, 2024 04:38
wdziurdz pushed a commit that referenced this pull request Apr 7, 2026
These changes are needed for #1179 to run llama kernels on simulator
with performance traces. They are ported from corresponding JGS files
with no modifications.

---------

Signed-off-by: Gregory Shimansky <gregory.shimansky@intel.com>
3 participants