# Fused MoE Triton Kernel Configurations

## Introduction

This directory contains tuned configurations for the fused_moe kernel.
For each combination of

- E (number of experts)
- N (intermediate size)
- device_name (`torch.cuda.get_device_name()`)

the JSON file contains a mapping from M (batch size) to the chosen configuration.

The example configurations provided are for the Mixtral model with TP2 on H100
and TP4 on A100. Mixtral has intermediate size N = 14336, i.e., for TP2 we have
N = 7168 and for TP4 we have N = 3584.

See `benchmark/kernels/benchmark_moe.py` for how to generate these config files.
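
As an illustration, a file generated for the H100/TP2 case above might be named
along the lines of `E=8,N=7168,device_name=NVIDIA_H100_80GB_HBM3.json` (Mixtral
has E = 8 experts) and map each batch size M to a set of Triton launch
parameters. The exact device-name string, parameter names, and numbers below
are an illustrative sketch, not tuned values:

```json
{
  "1": {
    "BLOCK_SIZE_M": 16,
    "BLOCK_SIZE_N": 32,
    "BLOCK_SIZE_K": 64,
    "GROUP_SIZE_M": 1,
    "num_warps": 4,
    "num_stages": 3
  },
  "64": {
    "BLOCK_SIZE_M": 64,
    "BLOCK_SIZE_N": 64,
    "BLOCK_SIZE_K": 32,
    "GROUP_SIZE_M": 8,
    "num_warps": 8,
    "num_stages": 4
  }
}
```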

### Tune MoE kernel
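
To tune the kernel for a new model, tensor-parallel size, or GPU, run the
tuning mode of `benchmark/kernels/benchmark_moe.py`. The script benchmarks
candidate Triton configurations across a range of batch sizes M and writes the
best-performing one per M to a JSON file named after E, N, and the device,
which can then be placed in this directory. A typical invocation might look
like `python benchmark/kernels/benchmark_moe.py --model <model> --tp-size <tp> --tune`;
the flag names here are an assumption, so check the script's `--help` output
for the actual options.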
