Description
Motivation.
Different triton versions produce different MoE tuned configurations. For example, the config tuned with triton 3.4.0 differs from the config on version 0.10.1.1. These differences may impact model performance. This RFC mainly discusses how to handle tuned configs produced by different triton versions.
Proposed Change.
We want to:
- Split the MoE tuned configs into N folders (currently only `legacy_configs` and `triton_3_4_0`). `legacy_configs` is the folder for configurations that cannot be traced back to the triton version they were committed with, while `triton_3_4_0` is the folder for the corresponding triton version, currently `3.4.0`. As triton versions are updated, more triton version folders may be added. See: [Kernel] Split moe tuned configs #24113
- Add documentation for `benchmark_moe`, including how to benchmark, how to pass local tuned configs through `VLLM_TUNED_CONFIG_FOLDER`, etc. This will encourage more users to tune kernels on their hardware. See: [Doc] Add benckmark_moe doc #24860
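To make the folder layout concrete, here is a minimal sketch of how a config directory might be chosen for a given triton version. The function names (`folder_name_for_triton`, `resolve_config_dir`) and the lookup order (environment override first, then the version folder, then `legacy_configs`) are assumptions for illustration only, not the behavior implemented in #24113.

```python
import os
import re
from pathlib import Path
from typing import Mapping, Optional, Union


def folder_name_for_triton(version: str) -> str:
    """Map a triton version such as '3.4.0' to a folder name such as 'triton_3_4_0'.

    Hypothetical helper: the naming rule is inferred from the folder name in the RFC.
    """
    match = re.match(r"(\d+)\.(\d+)\.(\d+)", version)
    if match is None:
        raise ValueError(f"unrecognized triton version: {version!r}")
    return "triton_" + "_".join(match.groups())


def resolve_config_dir(
    base_dir: Union[str, Path],
    triton_version: str,
    env: Optional[Mapping[str, str]] = None,
) -> Path:
    """Pick the folder to load tuned MoE configs from.

    Assumed lookup order (not confirmed by the RFC):
      1. a user-supplied folder from VLLM_TUNED_CONFIG_FOLDER,
      2. the folder matching the installed triton version,
      3. the legacy_configs folder as a fallback.
    """
    env = os.environ if env is None else env
    override = env.get("VLLM_TUNED_CONFIG_FOLDER")
    if override:
        return Path(override)

    candidate = Path(base_dir) / folder_name_for_triton(triton_version)
    if candidate.is_dir():
        return candidate
    return Path(base_dir) / "legacy_configs"
```

With this sketch, a user who tuned kernels locally would only need to point `VLLM_TUNED_CONFIG_FOLDER` at their output folder, and everyone else would get the folder for their installed triton version when one exists.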
Feedback Period.
No response
CC List.
@mgoin @simon-mo @WoosukKwon @youkaichao
Any Other Things.
No response
Before submitting a new issue...
- Make sure you already searched for relevant issues, and asked the chatbot living at the bottom right corner of the documentation page, which can answer lots of frequently asked questions.