[CI][Bugfix] Fix test_run_eagle_dp #38584
Code Review
This pull request modifies the Flash Attention backend to disable the Ahead-of-Time (AOT) schedule when batch invariance is enabled via the VLLM_BATCH_INVARIANT environment variable. This change is necessary because the AOT schedule varies with the maximum sequence lengths of the query and key, which is incompatible with batch-invariant execution. I have no feedback to provide as no review comments were submitted.
Signed-off-by: Matthew Bonanni <mbonanni@redhat.com>
From @NickLucche on Slack:
    # Disable AOT schedule for spec-decode proposer (not worth the overhead)
    # and for batch invariance (schedule varies with max_seqlen_q/k).
    aot_schedule = (
        self.aot_schedule and not fast_build and not envs.VLLM_BATCH_INVARIANT
    )
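For readers skimming the diff, the gating condition can be sketched as a standalone function. This is only an illustration: `should_use_aot_schedule` and `env_flag` are hypothetical names that do not come from vLLM, the set of truthy strings is an assumption, and the real code reads the flag through `envs.VLLM_BATCH_INVARIANT` rather than from `os.environ` directly.

```python
import os


def env_flag(name: str, default: bool = False) -> bool:
    """Hypothetical helper: interpret an environment variable as a boolean.

    The accepted truthy spellings are an assumption made for this sketch,
    not vLLM's actual parsing rules.
    """
    raw = os.environ.get(name)
    if raw is None:
        return default
    return raw.strip().lower() in {"1", "true", "yes", "on"}


def should_use_aot_schedule(
    backend_supports_aot: bool,
    fast_build: bool,
    batch_invariant: bool,
) -> bool:
    """Mirror the condition in the diff as a pure function.

    The AOT schedule is kept only when the backend supports it, the
    metadata build is not the fast path used by the spec-decode proposer,
    and batch invariance is not requested.
    """
    return backend_supports_aot and not fast_build and not batch_invariant
```

With this shape, setting `VLLM_BATCH_INVARIANT=1` and passing `env_flag("VLLM_BATCH_INVARIANT")` as `batch_invariant` turns the AOT schedule off regardless of the other two inputs.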
Note: I wasn't intending for this to be merged yet; I wanted to run CI a few times on this PR to verify the fix. Unfortunately, it looks like the test is still flaky. #38566 disables the test temporarily.
Signed-off-by: Matthew Bonanni <mbonanni@redhat.com> Signed-off-by: EricccYang <yangyang4991@gmail.com>
Signed-off-by: Matthew Bonanni <mbonanni@redhat.com> Signed-off-by: bhargav-patel-29 <bhargav.patel@tihiitb.org>
FIX: #38234
FIX: #31913
Revert: #31915
Purpose
Fixes flaky test by disabling AOT scheduling when VLLM_BATCH_INVARIANT is enabled.
Test Plan
Distributed DP Tests (2 GPUs)
pytest tests/v1/distributed/test_eagle_dp.py::test_run_eagle_dp[FLASH_ATTN]
Test Result
Should pass in CI