Add more CI for MoE refactor (B200) #31769
Conversation
Signed-off-by: Robert Shaw <robshaw@redhat.com>
Code Review
This pull request adds new CI tests for a Mixture-of-Experts (MoE) refactor. While adding test coverage is valuable, the implementation contains several critical and high-severity issues, primarily due to copy-paste errors. There's a critical error in the Buildkite pipeline configuration where the B200 test incorrectly uses the H100 configuration file. Additionally, there are multiple inconsistencies in the YAML test configuration files, including typos in environment variable names, invalid YAML syntax, and mismatches between filenames and their corresponding test settings. These issues will likely cause CI failures or lead to tests running with incorrect configurations, undermining their purpose.
```yaml
optional: true
num_gpus: 2
commands:
  - pytest -s -v evals/gsm8k/test_gsm8k_correctness.py --config-list-file=evals/gsm8k/configs/moe-refactor/config-h100.txt
```
The B200 integration test incorrectly uses the H100 configuration file (`config-h100.txt`), so the wrong set of tests will run on B200 hardware. It should use `config-b200.txt` to run the tests intended for B200.
Suggested change:

```yaml
- pytest -s -v evals/gsm8k/test_gsm8k_correctness.py --config-list-file=evals/gsm8k/configs/moe-refactor/config-b200.txt
```

```yaml
server_args: "--enforce-eager --max-model-len 8192 --tensor-parallel-size 2"
env:
  VLLM_USE_FLASHINFER_MOE_FP8: "1"
  VLLM_FLASHINFER_MaOE_BACKEND: "latency"
```
The environment variable name `VLLM_FLASHINFER_MaOE_BACKEND` contains a typo; it should be `VLLM_FLASHINFER_MOE_BACKEND`. As written, the setting is silently ignored and the test will not run with the intended backend.
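A misspelled key like `VLLM_FLASHINFER_MaOE_BACKEND` fails silently: the server simply never reads it. As a hedged sketch, a CI lint could reject unrecognized `VLLM_` keys in these eval configs. The set of known keys below is partial and assumed for illustration, not vLLM's authoritative list:

```python
# Hypothetical CI lint: flag unknown VLLM_* env keys in eval configs,
# catching typos such as VLLM_FLASHINFER_MaOE_BACKEND.
# NOTE: this whitelist is a partial, illustrative set, not vLLM's full list.
KNOWN_VLLM_ENV_KEYS = {
    "VLLM_USE_FLASHINFER_MOE_FP8",
    "VLLM_USE_FLASHINFER_MOE_FP4",
    "VLLM_FLASHINFER_MOE_BACKEND",
    "VLLM_TEST_FORCE_FP8_MARLIN",
    "VLLM_USE_DEEP_GEMM",
    "VLLM_USE_DEEP_GEMM_MOE",
}


def unknown_env_keys(env: dict) -> list[str]:
    """Return env keys that look like vLLM settings but are not recognized."""
    return [k for k in env if k.startswith("VLLM_") and k not in KNOWN_VLLM_ENV_KEYS]
```

Running this over each config's `env` block at CI collection time would turn a silent no-op into a hard failure.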
```yaml
num_questions: 1319
num_fewshot: 5
server_args: "--enforce-eager --max-model-len 8192 --tensor-parallel-size 2"
env:
  VLLM_TEST_FORCE_FP8_MARLIN: "1"
```
```yaml
env:
  VLLM_USE_FLASHINFER_MOE_FP4: "1"
  VLLM_FLASHINFER_MOE_BACKEND: "throughput"
```
The filename indicates a Marlin test configuration, but the environment variables enable FlashInfer. This is inconsistent and will not exercise the Marlin kernel. Please update the environment variables to match a Marlin test for this model type.
Suggested change:

```yaml
env:
  VLLM_USE_DEEP_GEMM: "0"
  VLLM_USE_DEEP_GEMM_MOE: "0"
  VLLM_TEST_FORCE_FP8_MARLIN: "1"
```
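Beyond fixing this one file, a small consistency check could catch kernel/backend mismatches like this across every config. The sketch below is hypothetical (the filename convention and helper are assumptions, not existing vLLM tooling):

```python
# Hypothetical consistency check: a config whose filename names a kernel
# backend should actually enable that backend in its env block.
def backend_mismatch(filename: str, env: dict) -> bool:
    """Return True if the filename says 'marlin' but the env block
    enables FlashInfer, or never forces Marlin -- a copy-paste smell."""
    if "marlin" not in filename.lower():
        return False  # only check configs that claim to be Marlin tests
    uses_flashinfer = any(
        k.startswith("VLLM_USE_FLASHINFER") and v == "1" for k, v in env.items()
    )
    forces_marlin = env.get("VLLM_TEST_FORCE_FP8_MARLIN") == "1"
    return uses_flashinfer or not forces_marlin
```

A pytest collection hook could run this over every YAML in `configs/moe-refactor/` and fail fast on mismatches.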
```text
@@ -0,0 +1,12 @@
Llama-4-Scout-Fp8-ModelOpt-fi-trtllm.yaml
Qwen3-30B-A3B-Fp8-AutoFp8-fi-trtllm.yaml
```
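Since `--config-list-file` names one YAML config per line, a quick sanity check can verify that every listed file actually exists before the eval suite starts. This helper is a hypothetical sketch, not part of the vLLM test harness:

```python
# Hypothetical pre-flight check for a --config-list-file: every listed
# YAML config must exist next to the list file, so a renamed or missing
# config fails CI immediately instead of being silently skipped.
from pathlib import Path


def missing_configs(list_file: Path) -> list[str]:
    """Return names from the config list that have no matching file."""
    names = [
        line.strip()
        for line in list_file.read_text().splitlines()
        if line.strip() and not line.startswith("#")
    ]
    return [n for n in names if not (list_file.parent / n).is_file()]
```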
Purpose
Test Plan
Test Result
Essential Elements of an Effective PR Description Checklist
supported_models.md and examples for a new model.