[CI/Testing] Add basic single node dual batch overlap test by LucasWilkinson · Pull Request #27235 · vllm-project/vllm

LucasWilkinson · 2025-10-21T00:59:17Z

Ideally we'd do a multi-node test, but this adds a single-node test for now to make sure we get at least some coverage.

Not sure exactly which test suite to put it in; it uses DeepEP, so it needs to run on Hopper or Blackwell.

gemini-code-assist

Code Review

This pull request introduces a new test for Dual Batch Overlap (DBO) with Data Parallelism and Expert Parallelism. The test is well-structured, using a GSM8K evaluation to verify correctness on a multi-GPU single-node setup. The CI configuration is also updated to run this test on H100 GPUs. My review found one high-severity issue related to missing test dependencies in the CI configuration, which could lead to the test not running when its helper utilities are modified. Otherwise, the changes are solid and a good addition to the test suite.

gemini-code-assist · 2025-10-21T01:01:13Z

.buildkite/test-pipeline.yaml

+  source_file_dependencies:
+    - docker/Dockerfile # To catch DeepEP updates
+    - vllm/model_executor/layers/fused_moe
+    - vllm/distributed/device_communicators
+    - vllm/v1/worker/
+    - vllm/v1/attention/backends/utils.py


The source_file_dependencies list is missing dependencies on the test utility files used by tests/v1/distributed/test_dbo.py. The test imports from tests.evals.gsm8k.gsm8k_eval and tests.utils. Changes to these files could affect the test's behavior or correctness, but they won't trigger this test run. Please add them to the dependency list to ensure the test is run when its dependencies change.

  source_file_dependencies:
    - tests/evals/gsm8k/gsm8k_eval.py
    - tests/utils.py
    - docker/Dockerfile # To catch DeepEP updates
    - vllm/model_executor/layers/fused_moe
    - vllm/distributed/device_communicators
    - vllm/v1/worker/
    - vllm/v1/attention/backends/utils.py
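To illustrate why the missing entries matter, here is a minimal sketch of path-based triggering. It assumes a step runs when any changed file equals, or sits under, one of the listed paths; that matching rule and the helper name `step_should_run` are hypothetical and are not taken from the CI's actual implementation.

```python
from pathlib import PurePosixPath

# Paths copied from the suggested dependency list in this review.
SOURCE_FILE_DEPENDENCIES = [
    "tests/evals/gsm8k/gsm8k_eval.py",
    "tests/utils.py",
    "docker/Dockerfile",
    "vllm/model_executor/layers/fused_moe",
    "vllm/distributed/device_communicators",
    "vllm/v1/worker/",
    "vllm/v1/attention/backends/utils.py",
]


def step_should_run(changed_files, dependencies=SOURCE_FILE_DEPENDENCIES):
    """Return True if any changed file equals or lives under a dependency path.

    Hypothetical helper: the component-wise prefix rule is an assumption
    made for illustration, not the pipeline generator's real logic.
    """
    for changed in changed_files:
        changed_parts = PurePosixPath(changed).parts
        for dep in dependencies:
            dep_parts = PurePosixPath(dep).parts
            # Compare whole path components so "worker_x" never matches "worker".
            if changed_parts[: len(dep_parts)] == dep_parts:
                return True
    return False
```

Under this assumed rule, a change to `tests/utils.py` triggers the step only when that path is in the list, which is the gap the comment above points out.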

chatgpt-codex-connector

💡 Codex Review

Here are some automated review suggestions for this pull request.

chatgpt-codex-connector · 2025-10-21T01:02:14Z

.buildkite/test-pipeline.yaml

+- label: Distributed Tests (H100) # optional
+  gpu: h100
+  working_dir: "/vllm-workspace/"
+  num_gpus: 2
+  commands: 
+    - pytest -v -s tests/v1/distributed/test_dbo.py
+  source_file_dependencies:


Mark H100 DBO step optional

The new H100 pipeline step is commented as optional but the Buildkite block doesn’t set optional: true. Without that flag Buildkite will treat the step as required, so every CI run now waits for an H100 agent even when the queue has none available. This effectively blocks the pipeline whenever H100 hardware isn’t provisioned, defeating the stated intent of having an optional dual batch overlap test.
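A mismatch like this can be caught mechanically. The sketch below is a text-based check, assuming steps start with `- label:` at column 0 and that the intent is signalled by a trailing `# optional` comment on the label line, as in the diff above. The function name `steps_missing_optional_flag` is hypothetical, and the check is not part of the repository.

```python
import re


def steps_missing_optional_flag(pipeline_text):
    """Return labels of steps commented '# optional' that lack 'optional: true'.

    Hypothetical lint helper: it works on raw text because YAML parsers
    normally discard comments.
    """
    missing = []
    # Split the file at every line that starts a new step.
    blocks = re.split(r"(?m)^(?=- label:)", pipeline_text)
    for block in blocks:
        if not block.strip():
            continue
        first_line = block.splitlines()[0]
        match = re.match(r"- label:\s*(.*?)\s*#\s*optional\s*$", first_line)
        has_flag = re.search(r"(?m)^\s+optional:\s*true\s*$", block)
        if match and not has_flag:
            missing.append(match.group(1))
    return missing
```

Running it over the pipeline file would list the H100 step from this diff until the flag is added or the step is moved elsewhere.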

Seems like a reasonable suggestion

Moved it to B200 and H200 nightly 👍 (per suggestion from @mgoin)

SageMoore

Thanks for the test @LucasWilkinson

SageMoore · 2025-10-21T03:24:27Z

tests/v1/distributed/test_dbo.py

+        # Note: Not using --enforce-eager to test DBO's alternate CUDA graph dispatching
+        "--data-parallel-size", str(DP_SIZE),
+        "--enable-expert-parallel",
+        "--enable-dbo",


Do we want to drop the decode threshold as well?

We could. I already verified that we hit cases above and below both thresholds, but it's probably good to pin them so that if the defaults get updated we don't suddenly start testing no-DBO.
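Pinning the thresholds could look roughly like the sketch below, which builds the server argument list shown in the diff and appends explicit threshold values. The flag names `--dbo-decode-token-threshold` and `--dbo-prefill-token-threshold`, the helper `build_dbo_server_args`, and the numeric values in the usage lines are assumptions for illustration; check `vllm serve --help` for the real option names before relying on them.

```python
DP_SIZE = 2  # assumption: matches num_gpus in the pipeline step


def build_dbo_server_args(dp_size, decode_threshold=None, prefill_threshold=None):
    """Build server args for a DBO test, optionally pinning both thresholds.

    Hypothetical helper. The first three flags come from the test diff;
    the two threshold flag names are assumed, not confirmed by this thread.
    """
    args = [
        # No --enforce-eager, so CUDA graph dispatching is exercised too.
        "--data-parallel-size", str(dp_size),
        "--enable-expert-parallel",
        "--enable-dbo",
    ]
    if decode_threshold is not None:
        args += ["--dbo-decode-token-threshold", str(decode_threshold)]
    if prefill_threshold is not None:
        args += ["--dbo-prefill-token-threshold", str(prefill_threshold)]
    return args


# Illustrative values only; the thread does not state the real thresholds.
server_args = build_dbo_server_args(DP_SIZE, decode_threshold=16, prefill_threshold=256)
```

Passing the values explicitly means a later change to the defaults cannot silently move every batch below the thresholds and turn the test into a no-DBO run.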

mergify bot added ci/build v1 labels Oct 21, 2025

gemini-code-assist bot reviewed Oct 21, 2025

chatgpt-codex-connector bot reviewed Oct 21, 2025

SageMoore reviewed Oct 21, 2025

LucasWilkinson added the ready label (ONLY add when PR is ready to merge/full CI is needed) Oct 21, 2025

SageMoore approved these changes Oct 21, 2025

LucasWilkinson added 4 commits November 3, 2025 14:43

dbo test

7f1aa5a

Signed-off-by: Lucas Wilkinson <lwilkins@redhat.com>

make sure we surpass thresholds

4361272

Signed-off-by: Lucas Wilkinson <lwilkins@redhat.com>

review comments

2dca7dd

Signed-off-by: Lucas Wilkinson <lwilkins@redhat.com>

format

d498913

Signed-off-by: Lucas Wilkinson <lwilkins@redhat.com>

LucasWilkinson force-pushed the lwilkinson/dbo-test branch from 53c444d to d498913 Compare November 3, 2025 14:44

tlrmchlsmth approved these changes Nov 3, 2025

tlrmchlsmth enabled auto-merge (squash) November 3, 2025 15:24

tlrmchlsmth merged commit 4bc400f into vllm-project:main Nov 3, 2025
24 checks passed

ZhengHongming888 pushed a commit to ZhengHongming888/vllm that referenced this pull request Nov 8, 2025

[CI/Testing] Add basic single node dual batch overlap test (vllm-proj…

acdd3e9

…ect#27235) Signed-off-by: Lucas Wilkinson <lwilkins@redhat.com>

devpatelio pushed a commit to SumanthRH/vllm that referenced this pull request Nov 29, 2025

[CI/Testing] Add basic single node dual batch overlap test (vllm-proj…

a31e41e

…ect#27235) Signed-off-by: Lucas Wilkinson <lwilkins@redhat.com>

tlrmchlsmth merged 4 commits into vllm-project:main from neuralmagic:lwilkinson/dbo-test
