[BugFix] Fix flakiness in test_eagle_dp for PyTorch 2.10#31915
zou3519 merged 1 commit into vllm-project:main
Conversation
Signed-off-by: Richard Zou <zou3519@gmail.com>
Code Review
This pull request addresses flakiness in the test_eagle_dp test by reducing the number of expected tokens from 100 to 20. While this change is likely to make the test more stable, I have a concern that it also reduces the test's coverage. A bug that only appears in longer generation sequences might be missed with this change. I've left a comment suggesting to investigate the root cause of the flakiness, such as increasing timeouts if it's a performance issue, rather than reducing the test's scope.
# This test might be flaky, see
# https://github.com/vllm-project/vllm/issues/31913
num_expected_tokens = 20
While reducing num_expected_tokens from 100 to 20 might fix the test flakiness, it also reduces the test's coverage. A bug in the data parallel logic for Eagle that only manifests with longer sequences (more than 20 tokens) might now be missed. This could be problematic for ensuring correctness.
Consider investigating the root cause of the flakiness. If it is a timeout issue (see line 75), increasing the timeout may be a better solution. If it is a deeper race condition or source of non-determinism, that should be addressed directly. Reducing the test's scope should be a last resort; if a better fix is found, this change should be reverted.
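If the flakiness does stem from a per-token generation timeout, one alternative to shrinking the token count is to scale the timeout with the expected sequence length. The sketch below is purely illustrative: the names `BASE_TIMEOUT_S`, `PER_TOKEN_S`, and `generation_timeout` are hypothetical and do not appear in the vLLM test suite.

```python
# Hypothetical sketch: grow the timeout with the expected token count
# instead of reducing test coverage. All names and constants here are
# illustrative assumptions, not vLLM's actual test configuration.
BASE_TIMEOUT_S = 30.0   # fixed budget for engine startup and warmup
PER_TOKEN_S = 0.5       # generous per-token budget for slow CI hardware

def generation_timeout(num_expected_tokens: int) -> float:
    """Return a timeout budget that scales with sequence length."""
    return BASE_TIMEOUT_S + PER_TOKEN_S * num_expected_tokens
```

With this scheme, keeping `num_expected_tokens = 100` yields an 80-second budget instead of forcing a shorter sequence, so long-sequence coverage is preserved while slow runs get more headroom.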
- # This test might be flaky, see
- # https://github.com/vllm-project/vllm/issues/31913
- num_expected_tokens = 20
+ num_expected_tokens = 100
…t#31915) Signed-off-by: Richard Zou <zou3519@gmail.com>
…t#31915) Signed-off-by: Richard Zou <zou3519@gmail.com> Signed-off-by: dsuhinin <suhinin.dmitriy@gmail.com>
Purpose
Fix flakiness in this test for PyTorch 2.10. The test can fail in PyTorch 2.9 too, see #31913 for explanation.
Test Plan
Ran `TP_SIZE=2 DP_SIZE=2 pytest tests/v1/distributed/test_eagle_dp.py::test_run_eagle_dp` on 2x L4 and verified it passed.

Test Result

Pass