[RFC]: Improve MoE triton kernel tuning #24112

@jeejeelee

Motivation.

Tuning the MoE kernels under different triton versions produces different tuned configurations. For example, the config tuned with triton 3.4.0 differs from the config on version 0.10.1.1. These differences may impact model performance. This RFC discusses how to manage tuned configs across triton versions.

Proposed Change.

We want to:

  • Split the MoE tuned configs into N folders, one per triton version. Currently there are only two: legacy_configs and triton_3_4_0. legacy_configs holds configurations whose originating triton version cannot be traced back, while triton_3_4_0 holds configurations tuned with the corresponding triton version, currently 3.4.0. As triton is updated, more version folders may be added. See: [Kernel] Split moe tuned configs #24113
  • Add documentation for benchmark_moe, including how to run the benchmark, how to pass locally tuned configs through VLLM_TUNED_CONFIG_FOLDER, etc. This will encourage more users to tune kernels on their own hardware. See: [Doc] Add benckmark_moe doc #24860
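
To make the folder layout above concrete, here is a minimal sketch of how a lookup over version folders could work. It is an illustration only, not the code in #24113: the helper names (`version_folder_name`, `candidate_config_dirs`, `find_config`) are hypothetical, and the search order (user folder from VLLM_TUNED_CONFIG_FOLDER first, then the folder for the installed triton version, then legacy_configs) is an assumption rather than something this RFC specifies.

```python
import os
from pathlib import Path
from typing import List, Mapping, Optional


def version_folder_name(triton_version: str) -> str:
    """Map a version string such as '3.4.0' to a folder name such as 'triton_3_4_0'."""
    # Drop any local build suffix (e.g. '3.4.0+gitabc') before converting.
    release = triton_version.split("+")[0]
    return "triton_" + release.replace(".", "_")


def candidate_config_dirs(
    base_dir: str,
    triton_version: str,
    env: Optional[Mapping[str, str]] = None,
) -> List[Path]:
    """Return the folders to search, in priority order (the order is an assumption)."""
    env = os.environ if env is None else env
    dirs: List[Path] = []
    # 1. A user-supplied folder of locally tuned configs, if set.
    user_dir = env.get("VLLM_TUNED_CONFIG_FOLDER")
    if user_dir:
        dirs.append(Path(user_dir))
    # 2. The folder matching the given triton version.
    dirs.append(Path(base_dir) / version_folder_name(triton_version))
    # 3. Configs whose triton version is unknown.
    dirs.append(Path(base_dir) / "legacy_configs")
    return dirs


def find_config(
    config_name: str,
    base_dir: str,
    triton_version: str,
    env: Optional[Mapping[str, str]] = None,
) -> Optional[Path]:
    """Return the first existing config file among the candidate folders, or None."""
    for folder in candidate_config_dirs(base_dir, triton_version, env):
        path = folder / config_name
        if path.is_file():
            return path
    return None
```

With this ordering, a config placed in the user folder shadows a shipped config of the same file name, and legacy_configs acts as a last-resort fallback. Whether the real implementation falls back across folders at all is left open here.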

Feedback Period.

No response

CC List.

@mgoin @simon-mo @WoosukKwon @youkaichao

Any Other Things.

No response
