[CI] Add mandatory H100 TP=2 smoke test #36157
stecasta wants to merge 1 commit into vllm-project:main
Conversation
Add a smoke test that starts a vLLM server with TP=2 on H100 using default settings and verifies it can serve requests across dense BF16, dense FP8, and MoE BF16 models. Signed-off-by: Stefano Castagnetta <scastagnetta@nvidia.com>
@ProExpertProg @zou3519 @benchislett could you review when you get a chance?
Code Review
This pull request adds a mandatory smoke test for Tensor Parallelism (TP=2) on H100 GPUs to catch regressions in the default serving path. The changes include a new Buildkite CI step and the corresponding pytest test file. The test covers dense, FP8, and MoE models. My review found one critical issue in the new test file where one of the selected models for testing is not registered, which will cause the test to fail.
```python
MODELS = [
    "meta-llama/Llama-3.2-1B-Instruct",
    "RedHatAI/Meta-Llama-3.1-8B-Instruct-FP8",
    "microsoft/Phi-mini-MoE-instruct",
]
```
The model `microsoft/Phi-mini-MoE-instruct` is not present in `tests/models/registry.py`. This will cause `HF_EXAMPLE_MODELS.find_hf_info(model_id)` on line 24 to raise a `ValueError`, failing the test for this model.
To fix this, you can either add the model to the registry or replace it with an existing small MoE model. For example, you could use `TitanML/tiny-mixtral`, which is already in the registry.
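The failure mode can be illustrated with a toy registry lookup. This is a minimal sketch: the function name mirrors the `find_hf_info` helper referenced above, but the registry contents and implementation here are illustrative, not vLLM's actual code:

```python
# Illustrative stand-in for the model registry in tests/models/registry.py.
# The real registry maps model IDs to richer metadata; a dict suffices here.
REGISTRY = {
    "meta-llama/Llama-3.2-1B-Instruct": {"arch": "LlamaForCausalLM"},
    "TitanML/tiny-mixtral": {"arch": "MixtralForCausalLM"},
}

def find_hf_info(model_id: str) -> dict:
    """Mimic HF_EXAMPLE_MODELS.find_hf_info: raise for unregistered models."""
    if model_id not in REGISTRY:
        raise ValueError(f"{model_id} is not registered")
    return REGISTRY[model_id]

# A registered model resolves normally...
print(find_hf_info("TitanML/tiny-mixtral"))
# ...while an unregistered one raises ValueError, which would fail
# the parametrized test case before the server even starts.
try:
    find_hf_info("microsoft/Phi-mini-MoE-instruct")
except ValueError as exc:
    print(f"ValueError: {exc}")
```

This is why swapping in a model that is already registered (or registering the new one) is the fix.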
```diff
 MODELS = [
     "meta-llama/Llama-3.2-1B-Instruct",
     "RedHatAI/Meta-Llama-3.1-8B-Instruct-FP8",
-    "microsoft/Phi-mini-MoE-instruct",
+    "TitanML/tiny-mixtral",
 ]
```
The current H100 tests are optional due to capacity constraints; they run every night or on a conditional basis. Closing this. Please feel free to join #ci-notifications, where we monitor and triage nightly status.
@robertgshaw2-redhat @stecasta do you think it makes more sense to target B200 instead?
From the test perspective it makes sense to target B200. The idea is to avoid having a broken main that slows down development. Ideally we would target both Hopper and Blackwell, but we can start with the less scarce of the two.
Summary
Add a mandatory test that starts a vLLM server with TP=2 on H100 using default settings and verifies it can serve requests across 3 models (dense BF16, dense FP8, MoE BF16).
Context
No mandatory test currently starts a vLLM server on Hopper/Blackwell with TP>1, DP=1, and default config. On this hardware, vLLM auto-enables optimizations like `fuse_allreduce_rms` and CUDA graphs. Existing mandatory tests either run on L4 (where these are disabled), use DP>1, or set `CUDAGraphMode.NONE`. This means regressions in the default TP serving path on Hopper/Blackwell can go undetected until users hit them. For example, #34109 introduced a cudagraph capture hang (#35772) that passed all mandatory CI. The issue surfaced in an optional nightly LM eval test but took several days before it was noticed and reported.
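For reference, the default serving path this test exercises can be sketched from the command line. This is an illustrative sequence, not the CI step itself; it assumes a machine with two H100 GPUs, the `vllm` CLI installed, and one of the models from the PR's list:

```shell
# Start a vLLM server with TP=2 and otherwise default settings, so the
# Hopper-specific defaults (fuse_allreduce_rms, CUDA graphs) stay enabled.
vllm serve meta-llama/Llama-3.2-1B-Instruct \
    --tensor-parallel-size 2 &

# Once the server is up, verify it can serve a request through the
# OpenAI-compatible completions endpoint on the default port.
curl http://localhost:8000/v1/completions \
    -H "Content-Type: application/json" \
    -d '{
          "model": "meta-llama/Llama-3.2-1B-Instruct",
          "prompt": "Hello,",
          "max_tokens": 8
        }'
```

The smoke test automates essentially this loop across the three models, failing fast if server startup hangs (e.g. during cudagraph capture) or the request errors.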
Models
I chose these 3 models to cover dense, FP8, and MoE architectures with minimal overhead (~8 min total on H100). Open to suggestions for additional models, quantizations, or platforms.
Test plan