Skip to content

Add more ci for moe refactor b200#31769

Closed
robertgshaw2-redhat wants to merge 10 commits intomainfrom
add-more-ci-for-moe-refactor-b200
Closed

Add more ci for moe refactor b200#31769
robertgshaw2-redhat wants to merge 10 commits intomainfrom
add-more-ci-for-moe-refactor-b200

Conversation

@robertgshaw2-redhat
Copy link
Copy Markdown
Collaborator

@robertgshaw2-redhat robertgshaw2-redhat commented Jan 6, 2026

Purpose

Test Plan

Test Result


Essential Elements of an Effective PR Description Checklist
  • The purpose of the PR, such as "Fix some issue (link existing issues this PR will resolve)".
  • The test plan, such as providing test command.
  • The test results, such as pasting the results comparison before and after, or e2e results
  • (Optional) The necessary documentation update, such as updating supported_models.md and examples for a new model.
  • (Optional) Release notes update. If your change is user facing, please update the release notes draft in the Google Doc.

Robert Shaw and others added 10 commits January 5, 2026 20:31
Signed-off-by: Robert Shaw <robshaw@redhat.com>
Signed-off-by: Robert Shaw <robshaw@redhat.com>
Signed-off-by: Robert Shaw <robshaw@redhat.com>
Signed-off-by: Robert Shaw <robshaw@redhat.com>
Signed-off-by: Robert Shaw <robshaw@redhat.com>
Signed-off-by: Robert Shaw <robshaw@redhat.com>
Signed-off-by: Robert Shaw <robshaw@redhat.com>
Signed-off-by: Robert Shaw <robshaw@redhat.com>
@robertgshaw2-redhat robertgshaw2-redhat deleted the add-more-ci-for-moe-refactor-b200 branch January 6, 2026 04:05
@mergify
Copy link
Copy Markdown

mergify bot commented Jan 6, 2026

⚠️ The sha of the head commit of this PR conflicts with #31759. Mergify cannot evaluate rules on this PR. ⚠️

Copy link
Copy Markdown
Contributor

@gemini-code-assist gemini-code-assist bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Review

This pull request adds new CI tests for a Mixture-of-Experts (MoE) refactor. While adding test coverage is valuable, the implementation contains several critical and high-severity issues, primarily due to copy-paste errors. There's a critical error in the Buildkite pipeline configuration where the B200 test incorrectly uses the H100 configuration file. Additionally, there are multiple inconsistencies in the YAML test configuration files, including typos in environment variable names, invalid YAML syntax, and mismatches between filenames and their corresponding test settings. These issues will likely cause CI failures or lead to tests running with incorrect configurations, undermining their purpose.

optional: true
num_gpus: 2
commands:
- pytest -s -v evals/gsm8k/test_gsm8k_correctness.py --config-list-file=evals/gsm8k/configs/moe-refactor/config-h100.txt No newline at end of file
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

critical

The B200 integration test is incorrectly using the configuration file for H100 (config-h100.txt). This will cause the wrong set of tests to be executed on the B200 hardware. It should be using config-b200.txt to run the tests intended for B200.

    - pytest -s -v evals/gsm8k/test_gsm8k_correctness.py --config-list-file=evals/gsm8k/configs/moe-refactor/config-b200.txt

server_args: "--enforce-eager --max-model-len 8192 --tensor-parallel-size 2"
env:
VLLM_USE_FLASHINFER_MOE_FP8: "1"
VLLM_FLASHINFER_MaOE_BACKEND: "latency"
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

high

There is a typo in the environment variable name VLLM_FLASHINFER_MaOE_BACKEND. It should be VLLM_FLASHINFER_MOE_BACKEND. This typo will prevent the correct backend from being configured, causing the test to not run as intended.

  VLLM_FLASHINFER_MOE_BACKEND: "latency"

num_questions: 1319
num_fewshot: 5
server_args: "--enforce-eager --max-model-len 8192 --tensor-parallel-size 2"

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

high

This line contains only indentation, which makes the YAML file invalid. This will cause a parsing error when the test configuration is loaded. Please remove this line to ensure the file is valid.

Comment on lines +6 to +7
env:
VLLM_TEST_FORCE_FP8_MARLIN: "1"
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

high

The filename suggests a vllm-cutlass configuration, but the environment variable VLLM_TEST_FORCE_FP8_MARLIN is set, which forces the marlin kernel. This is inconsistent and will not test the intended vllm-cutlass kernel. Based on other vllm-cutlass configurations, this env block should be removed.

Comment on lines +6 to +8
env:
VLLM_USE_FLASHINFER_MOE_FP4: "1"
VLLM_FLASHINFER_MOE_BACKEND: "throughput"
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

high

The filename indicates a marlin test configuration, but the environment variables are set for flashinfer. This is inconsistent and will not test the marlin kernel. Please update the environment variables to be consistent with a marlin test for this model type.

env:
  VLLM_USE_DEEP_GEMM: "0"
  VLLM_USE_DEEP_GEMM_MOE: "0"
  VLLM_TEST_FORCE_FP8_MARLIN: "1"

Comment on lines +6 to +7
env:
VLLM_TEST_FORCE_FP8_MARLIN: "1"
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

high

The filename suggests a vllm-cutlass configuration, but the environment variable VLLM_TEST_FORCE_FP8_MARLIN is set, which forces the marlin kernel. This is inconsistent and will not test the intended vllm-cutlass kernel. The env block should be removed to allow the default kernel to be used.

Comment on lines +6 to +8
env:
VLLM_USE_FLASHINFER_MOE_FP4: "1"
VLLM_FLASHINFER_MOE_BACKEND: "throughput"
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

high

The filename indicates a marlin test configuration, but the environment variables are set for flashinfer. This is inconsistent and will not test the marlin kernel. Please update the environment variables to force the marlin kernel.

env:
  VLLM_TEST_FORCE_FP8_MARLIN: "1"

Comment on lines +6 to +7
env:
VLLM_TEST_FORCE_FP8_MARLIN: "1"
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

high

The filename suggests a vllm-cutlass configuration, but the environment variable VLLM_TEST_FORCE_FP8_MARLIN is set, which forces the marlin kernel. This is inconsistent and will not test the intended vllm-cutlass kernel. The env block should be removed.

@@ -0,0 +1,12 @@
Llama-4-Scout-Fp8-ModelOpt-fi-trtllm.yaml
Qwen3-30B-A3B-Fp8-AutoFp8-fi-trtllm.yaml
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

high

This line has trailing whitespace, which could cause parsing issues for scripts that consume this file, potentially leading to test failures. Please remove the trailing spaces.

Qwen3-30B-A3B-Fp8-AutoFp8-fi-trtllm.yaml

@mergify
Copy link
Copy Markdown

mergify bot commented Jan 6, 2026

⚠️ The sha of the head commit of this PR conflicts with #31759. Mergify cannot evaluate rules on this PR. ⚠️

1 similar comment
@mergify
Copy link
Copy Markdown

mergify bot commented Jan 6, 2026

⚠️ The sha of the head commit of this PR conflicts with #31759. Mergify cannot evaluate rules on this PR. ⚠️

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant