[Test] Add nightly MoE eval tests by bnellnm · Pull Request #39956 · vllm-project/vllm

bnellnm · 2026-04-16T00:20:56Z

Purpose

Add eval tests for important models that can be run nightly. Models listed in the -small.txt file run on at most 2 GPUs, while models listed in the -large.txt file need more than 2.
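
The small/large split can be sketched as a tiny helper. This is an illustration only, not code from the PR: the function name `suite_for` and the `num_gpus` parameter are hypothetical, and only the 2-GPU threshold comes from the description.

```python
def suite_for(num_gpus: int) -> str:
    """Return the nightly list suffix for a model that needs `num_gpus` GPUs.

    Hypothetical helper: it only mirrors the rule in the description
    (at most 2 GPUs -> "small", more than 2 -> "large").
    """
    if num_gpus < 1:
        raise ValueError("num_gpus must be at least 1")
    return "small" if num_gpus <= 2 else "large"
```

Note that the rule keys on GPU count rather than parameter count, so a large quantized model could still land in the small list if it fits on 2 GPUs.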

Test Plan

Ran the tests locally.

Test Result

Model	Baseline
arcee-ai/Trinity-Mini	0.8408
deepseek-ai/DeepSeek-R1	0.9492
google/gemma-4-26B-A4B-it	0.3017
zai-org/GLM-4.7-Flash	0.8241
openai/gpt-oss-20b	0.3154
ibm-granite/granite-4.0-h-small	0.8400
ai21labs/AI21-Jamba2-Mini	0.7665
LiquidAI/LFM2.5-350M	0.2092
meta-llama/Llama-4-Scout-17B-16E-Instruct	TIMEOUT
MiniMaxAI/MiniMax-M2.7	0.9249
mistralai/Mixtral-8x7B-v0.1	0.5512
nvidia/NVIDIA-Nemotron-3-Super-120B-A12B-NVFP4	0.9295
allenai/OLMoE-1B-7B-0125-Instruct	0.6770
microsoft/Phi-tiny-MoE-instruct	0.7020
sarvamai/sarvam-30b	0.6588
stepfun-ai/Step-3.5-Flash	FAIL

Issues will be created for the failing tests.
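
A baseline table like the one above is typically used as a regression gate. The sketch below shows one way such a check might look; it is not the PR's implementation. The names `check_accuracy` and `find_regressions` and the `tolerance` default of 0.02 are assumptions made for illustration, and only two baselines from the table are reproduced.

```python
# Two baselines copied from the table above; the rest are omitted for brevity.
BASELINES = {
    "arcee-ai/Trinity-Mini": 0.8408,
    "deepseek-ai/DeepSeek-R1": 0.9492,
}


def check_accuracy(measured: float, baseline: float, tolerance: float = 0.02) -> bool:
    """Return True when `measured` is no more than `tolerance` below `baseline`.

    The default tolerance is an assumed value, not one taken from the PR.
    """
    return measured >= baseline - tolerance


def find_regressions(results: dict[str, float], tolerance: float = 0.02) -> list[str]:
    """List models whose measured accuracy fell below baseline minus tolerance.

    Models without a measured result are skipped rather than reported.
    """
    return [
        model
        for model, baseline in BASELINES.items()
        if model in results and not check_accuracy(results[model], baseline, tolerance)
    ]
```

For example, a run that scores 0.83 on Trinity-Mini passes with this tolerance, while 0.90 on DeepSeek-R1 would be flagged.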

Signed-off-by: Bill Nell <bnell@redhat.com>

gemini-code-assist

Code Review

This pull request adds several new model configuration files for GSM8K evaluations and updates the nightly test lists. Several referenced configuration files are missing from the PR, including DeepSeek-R1-TP.yaml and those in the moe-refactor directory, which will cause CI failures. Additionally, the 120B parameter Nemotron model is incorrectly categorized in the small evaluation suite and should be moved to the large suite.
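
The missing-file problem described in this review could be caught before CI with a small consistency check. The sketch below assumes each model list is a plain-text file with one config filename per line, resolved relative to a config directory; that layout, the `#` comment handling, and the name `find_missing_configs` are assumptions for illustration, not details confirmed by the PR.

```python
from pathlib import Path


def find_missing_configs(list_file: Path, config_dir: Path) -> list[str]:
    """Return config names listed in `list_file` that do not exist in `config_dir`.

    Assumes one filename per line; blank lines and lines starting with '#'
    are ignored (an assumed convention).
    """
    missing = []
    for line in list_file.read_text().splitlines():
        name = line.strip()
        if not name or name.startswith("#"):
            continue
        if not (config_dir / name).is_file():
            missing.append(name)
    return missing
```

Running a check like this over both nightly lists would report entries such as a referenced DeepSeek-R1-TP.yaml that was never added.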

robertgshaw2-redhat · 2026-04-20T23:12:22Z

which job runs these?

bnellnm · 2026-04-21T01:17:53Z

> which job runs these?

Good question. Is there a spot to add nightly tests?

vadiklyutiy · 2026-04-21T02:01:29Z

Just wondering on what GPU arch we are going to run it?

bnellnm · 2026-04-21T15:04:27Z

> Just wondering on what GPU arch we are going to run it?

I was planning on H100 but we could run on other arches too.

bnellnm added 2 commits April 15, 2026 22:58

add nightly eval tests for important models

109dad0

Signed-off-by: Bill Nell <bnell@redhat.com>

bump up accuracy

f2aae25

Signed-off-by: Bill Nell <bnell@redhat.com>

gemini-code-assist Bot reviewed Apr 16, 2026

Comment thread tests/evals/gsm8k/configs/models-nightly-large.txt

Comment thread tests/evals/gsm8k/configs/models-nightly-small.txt Outdated

bnellnm added 2 commits April 16, 2026 02:32

tweaks

2bab064

Signed-off-by: Bill Nell <bnell@redhat.com>

tweak

ae10abd

Signed-off-by: Bill Nell <bnell@redhat.com>

bnellnm marked this pull request as ready for review April 16, 2026 13:03

bnellnm requested review from mgoin and vadiklyutiy as code owners April 16, 2026 13:03

bnellnm requested a review from robertgshaw2-redhat April 16, 2026 13:04

remove mamba-backend arg from jamba

7ea78fd

Signed-off-by: Bill Nell <bnell@redhat.com>

bnellnm changed the title ~~Nightly eval tests~~ [Test] Add nightly eval tests Apr 16, 2026

bnellnm and others added 2 commits April 17, 2026 18:20

add ernie

6779d26

Signed-off-by: Bill Nell <bnell@redhat.com>

Merge branch 'main' into nightly-eval-tests

54d9d08

bnellnm changed the title ~~[Test] Add nightly eval tests~~ [Test] Add nightly MoE eval tests Apr 20, 2026

bnellnm added 3 commits April 21, 2026 17:17

add lm_eval.yaml tests

6fbf545

Signed-off-by: Bill Nell <bnell@redhat.com>

comment

d565283

Signed-off-by: Bill Nell <bnell@redhat.com>

Merge remote-tracking branch 'nm-vllm/nightly-eval-tests' into nightl…

dc8fb1c

…y-eval-tests

mergify Bot added the ci/build label Apr 21, 2026

bnellnm mentioned this pull request Apr 23, 2026

[Bugfix] Fix DeepSeek V2-Lite Accuracy drop #40673

Merged

4 tasks

Merge branch 'main' into nightly-eval-tests

84e4ccb

bnellnm wants to merge 11 commits into vllm-project:main from neuralmagic:nightly-eval-tests

3 participants
