[benchmark] Add triton version in the moe tuned config#24769
[benchmark] Add triton version in the moe tuned config#24769jeejeelee merged 4 commits intovllm-project:mainfrom
Conversation
There was a problem hiding this comment.
Code Review
This pull request adds the Triton version to the tuned MoE configuration files. The changes involve updating the benchmark script to save the version and modifying the MoE layer to handle this new metadata when loading configurations. The version of the Triton placeholder is also updated.
My review identifies a potential robustness issue in how the configuration is loaded. The current approach of explicitly removing the triton_version key is brittle. I've suggested a more resilient method that filters for numeric keys, which will prevent crashes if other metadata is added to the configuration files in the future.
| logger.info("Using configuration from %s for MoE layer.", | ||
| config_file_path) | ||
| # If a configuration has been found, return it | ||
| return {int(key): val for key, val in json.load(f).items()} |
There was a problem hiding this comment.
The proposed change to explicitly pop triton_version is brittle. If other metadata is added to the configuration file in the future, this code will fail with a ValueError when trying to convert a non-integer string key to an integer.
A more robust approach is to filter for keys that are valid integers. By checking key.isdigit(), you can automatically ignore any non-numeric keys, including triton_version and any future metadata, making the code more resilient to future changes.
| return {int(key): val for key, val in json.load(f).items()} | |
| return {int(key): val for key, val in json.load(f).items() if key.isdigit()} |
|
Let's merge this PR first, then consider fp8 gemm in a subsequent PR |
…24769) Signed-off-by: Jee Jee Li <pandaleefree@gmail.com>
…24769) Signed-off-by: Jee Jee Li <pandaleefree@gmail.com>
Purpose
See: #24112 (comment)
The tuned configuration will include the triton_version, like:
{ "triton_version": "3.4.0", "1": { "BLOCK_SIZE_M": 16, "BLOCK_SIZE_N": 32, "BLOCK_SIZE_K": 64, "GROUP_SIZE_M": 1, "num_warps": 4, "num_stages": 4 }, "2": { "BLOCK_SIZE_M": 16, "BLOCK_SIZE_N": 128, "BLOCK_SIZE_K": 64, "GROUP_SIZE_M": 1, "num_warps": 4, "num_stages": 3 }, "4": { "BLOCK_SIZE_M": 16, "BLOCK_SIZE_N": 128, "BLOCK_SIZE_K": 64, "GROUP_SIZE_M": 1, "num_warps": 4, "num_stages": 3 }, }Test Plan
Test Result
Essential Elements of an Effective PR Description Checklist
supported_models.mdandexamplesfor a new model.