[BugFix] Fix flakiness in test_eagle_dp for PyTorch 2.10#31915
zou3519 merged 1 commit into vllm-project:main
Conversation
Signed-off-by: Richard Zou <zou3519@gmail.com>
Code Review
This pull request addresses flakiness in the test_eagle_dp test by reducing the number of expected tokens from 100 to 20. While this change is likely to make the test more stable, I have a concern that it also reduces the test's coverage. A bug that only appears in longer generation sequences might be missed with this change. I've left a comment suggesting to investigate the root cause of the flakiness, such as increasing timeouts if it's a performance issue, rather than reducing the test's scope.
# This test might be flaky, see
# https://github.com/vllm-project/vllm/issues/31913
num_expected_tokens = 20
While reducing num_expected_tokens from 100 to 20 might fix the test flakiness, it also reduces the test's coverage. A bug in the data parallel logic for Eagle that only manifests with longer sequences (more than 20 tokens) might now be missed. This could be problematic for ensuring correctness.
Consider investigating the root cause of the flakiness. If it is a timeout issue (see line 75), increasing the timeout may be a better solution. If it is a deeper race condition or source of non-determinism, that should be addressed directly. Reducing the test's scope should be a last resort; if a better fix is found, this change should be reverted.
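If the flakiness does stem from a per-token generation timeout, one alternative to shrinking the token count is to scale the timeout with the expected sequence length. The sketch below is purely illustrative: the names `BASE_TIMEOUT_S`, `PER_TOKEN_S`, and `generation_timeout` are hypothetical and do not appear in the vLLM test suite.

```python
# Hypothetical sketch: grow the timeout with the expected token count
# instead of reducing test coverage. All names and constants here are
# illustrative assumptions, not vLLM's actual test configuration.
BASE_TIMEOUT_S = 30.0   # fixed budget for engine startup and warmup
PER_TOKEN_S = 0.5       # generous per-token budget for slow CI hardware

def generation_timeout(num_expected_tokens: int) -> float:
    """Return a timeout budget that scales with sequence length."""
    return BASE_TIMEOUT_S + PER_TOKEN_S * num_expected_tokens
```

With this scheme, keeping `num_expected_tokens = 100` yields an 80-second budget instead of forcing a shorter sequence, so long-sequence coverage is preserved while slow runs get more headroom.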
- # This test might be flaky, see
- # https://github.com/vllm-project/vllm/issues/31913
- num_expected_tokens = 20
+ num_expected_tokens = 100
…t#31915) Signed-off-by: Richard Zou <zou3519@gmail.com>
…t#31915) Signed-off-by: Richard Zou <zou3519@gmail.com> Signed-off-by: dsuhinin <suhinin.dmitriy@gmail.com>
Purpose
Fix flakiness in this test for PyTorch 2.10. The test can fail in PyTorch 2.9 too, see #31913 for explanation.
Test Plan
Ran `TP_SIZE=2 DP_SIZE=2 pytest tests/v1/distributed/test_eagle_dp.py::test_run_eagle_dp` on 2x L4 and verified it passed.

Test Result

Pass