[CI][Bugfix] Fix test_run_eagle_dp #38584

Merged
NickLucche merged 2 commits into vllm-project:main from MatthewBonanni:fix_dp_batch_invariant
Mar 31, 2026

MatthewBonanni (Collaborator) commented Mar 30, 2026

FIX: #38234
FIX: #31913
Revert: #31915

Purpose

Fixes flaky test by disabling AOT scheduling when VLLM_BATCH_INVARIANT is enabled.
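The gating described above can be sketched as a small truth-table helper. This is only an illustration under stated assumptions: `batch_invariant_enabled` and `use_aot_schedule` are hypothetical names, and the rule that only the string "1" turns the flag on is an assumption about how VLLM_BATCH_INVARIANT is parsed, not something this PR states.

```python
import os
from typing import Mapping


def batch_invariant_enabled(env: Mapping[str, str] = os.environ) -> bool:
    """Hypothetical parser: treats "1" as on and anything else as off (assumption)."""
    return env.get("VLLM_BATCH_INVARIANT", "0") == "1"


def use_aot_schedule(supported: bool, fast_build: bool, batch_invariant: bool) -> bool:
    """Hypothetical helper: AOT scheduling stays on only when nothing vetoes it."""
    return supported and not fast_build and not batch_invariant


if __name__ == "__main__":
    # Print the decision for every combination of the two veto conditions.
    for fast_build in (False, True):
        for batch_invariant in (False, True):
            decision = use_aot_schedule(True, fast_build, batch_invariant)
            print(f"fast_build={fast_build} batch_invariant={batch_invariant} -> {decision}")
```

With this shape, setting the flag is enough to turn the AOT schedule off regardless of the other inputs.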

Test Plan

Distributed DP Tests (2 GPUs)
pytest tests/v1/distributed/test_eagle_dp.py::test_run_eagle_dp[FLASH_ATTN]

Test Result

Should pass in CI


Signed-off-by: Matthew Bonanni <mbonanni@redhat.com>

claude bot left a comment

Claude Code Review

This pull request is from a fork — automated review is disabled. A repository maintainer can comment @claude review to run a one-time review.

mergify bot added the v1 and bug labels Mar 30, 2026

gemini-code-assist bot (Contributor) left a comment

Code Review

This pull request modifies the Flash Attention backend to disable the Ahead-of-Time (AOT) schedule when batch invariance is enabled via the VLLM_BATCH_INVARIANT environment variable. This change is necessary because the AOT schedule varies with the maximum sequence lengths of the query and key, which is incompatible with batch-invariant execution. I have no feedback to provide as no review comments were submitted.
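As a toy illustration of why a shape-dependent schedule can conflict with batch invariance, the sketch below sums the same values with a different number of chunks and gets different floating-point results. It is an analogy only: `chunked_sum` and `VALUES` are hypothetical, and the assumption that a different schedule changes how a reduction is split is not confirmed by this PR or taken from the Flash Attention code.

```python
from typing import Sequence

# Values chosen so that rounding makes the accumulation order visible.
VALUES = [1e16, 1.0, -1e16, 1.0]


def chunked_sum(values: Sequence[float], num_chunks: int) -> float:
    """Sum each chunk left to right, then add the partial sums left to right."""
    size = -(-len(values) // num_chunks)  # ceiling division
    partials = []
    for start in range(0, len(values), size):
        acc = 0.0
        for v in values[start:start + size]:
            acc += v
        partials.append(acc)
    total = 0.0
    for p in partials:
        total += p
    return total


if __name__ == "__main__":
    print(chunked_sum(VALUES, 1))  # one chunk: plain left-to-right sum
    print(chunked_sum(VALUES, 2))  # two chunks: same inputs, different split
```

The inputs are identical in both calls; only the split changes, which is the property a batch-invariant mode has to rule out.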

Signed-off-by: Matthew Bonanni <mbonanni@redhat.com>
MatthewBonanni added the ready label (ONLY add when PR is ready to merge/full CI is needed) Mar 30, 2026

markmc (Member) commented Mar 31, 2026

From @NickLucche on Slack:

I am not really convinced this is a batch invariance issue, given I can occasionally repro with the flag set and with a single request.

# Disable AOT schedule for spec-decode proposer (not worth the overhead)
# and for batch invariance (schedule varies with max_seqlen_q/k).
aot_schedule = (
    self.aot_schedule and not fast_build and not envs.VLLM_BATCH_INVARIANT
)
Collaborator

nit: this is a lambda

@NickLucche NickLucche merged commit 7d65463 into vllm-project:main Mar 31, 2026
62 checks passed
@markmc markmc mentioned this pull request Mar 31, 2026
1 task

MatthewBonanni (Collaborator, Author) commented

Note: I wasn't intending for this to be merged yet; I wanted to run CI a few times on this PR to verify the fix. Unfortunately, it looks like the test is still flaky. #38566 disables the test temporarily.
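Checking a flaky test locally by running it several times could look like the sketch below. It is a minimal example, not part of this PR: `run_once`, `count_failures`, and `PYTEST_CMD` are hypothetical names, and the command simply reuses the test ID from the Test Plan. It assumes `pytest` is on the PATH and that the required GPUs are available.

```python
import subprocess
from typing import Callable, Sequence

# Test ID copied from the Test Plan; the constant name is hypothetical.
PYTEST_CMD = [
    "pytest",
    "tests/v1/distributed/test_eagle_dp.py::test_run_eagle_dp[FLASH_ATTN]",
]


def run_once(cmd: Sequence[str]) -> bool:
    """Run the command and return True when it exits with status 0."""
    return subprocess.run(cmd, check=False).returncode == 0


def count_failures(
    cmd: Sequence[str],
    runs: int,
    runner: Callable[[Sequence[str]], bool] = run_once,
) -> int:
    """Run the command `runs` times and count how many runs failed."""
    return sum(0 if runner(cmd) else 1 for _ in range(runs))
```

For example, `count_failures(PYTEST_CMD, 20)` returns the number of failing runs out of 20. The `runner` parameter exists so the counting logic can be exercised without launching pytest.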

@MatthewBonanni MatthewBonanni deleted the fix_dp_batch_invariant branch March 31, 2026 14:11
@MatthewBonanni MatthewBonanni restored the fix_dp_batch_invariant branch March 31, 2026 15:03
yewentao256 changed the title from "[WIP][CI][Bugfix] Fix test_run_eagle_dp" to "[CI][Bugfix] Fix test_run_eagle_dp" Mar 31, 2026
EricccYang pushed a commit to EricccYang/vllm that referenced this pull request Apr 1, 2026
Signed-off-by: Matthew Bonanni <mbonanni@redhat.com>
Signed-off-by: EricccYang <yangyang4991@gmail.com>
bhargav-patel-29 pushed a commit to Bharatgen-Tech/vllm that referenced this pull request Apr 1, 2026
Signed-off-by: Matthew Bonanni <mbonanni@redhat.com>
Signed-off-by: bhargav-patel-29 <bhargav.patel@tihiitb.org>

Development

Successfully merging this pull request may close these issues.

Test Failure: test_run_eagle_dp[FLASH_ATTN] produces non-deterministic outputs with EAGLE speculative decoding
[Bug]: test_eagle_dp test is flaky
