Disable flashinfer autotune temporarily due to correctness issues #41524
Conversation
Signed-off-by: wzhao18 <wzhao18.sz@gmail.com>
Code Review
This pull request disables the enable_flashinfer_autotune setting within the OPTIMIZATION_LEVEL_01 and OPTIMIZATION_LEVEL_02 configurations in vllm/config/vllm.py to address known correctness issues. Feedback suggests that this feature should also be disabled for OPTIMIZATION_LEVEL_03 to ensure consistency and prevent potential errors for users on that optimization level.
| "enable_flashinfer_autotune": True, | ||
| # Disabled for now due to correctness issues: | ||
| # https://github.com/flashinfer-ai/flashinfer/issues/3197 | ||
| "enable_flashinfer_autotune": False, |
The correctness issues with FlashInfer autotuning likely affect OPTIMIZATION_LEVEL_03 as well, especially since it is documented as being the same as OPTIMIZATION_LEVEL_02 (line 80). Leaving it enabled in O3 (line 256) while disabling it in O1 and O2 creates an inconsistency that could lead to incorrect results for users selecting the O3 level. Please consider disabling it for O3 as well.
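For illustration, a minimal sketch of the consistency this review comment asks for, assuming the optimization levels are plain preset dicts in vllm/config/vllm.py; the preset names and layout here are assumptions, and only the enable_flashinfer_autotune key comes from the diff above:

```python
# Illustrative sketch only -- not the actual vllm/config/vllm.py structure.
# The idea: derive the flag from a single default so O1, O2, and O3 cannot drift apart.
_FLASHINFER_AUTOTUNE_DEFAULT = False  # disabled until the upstream fix is integrated

OPTIMIZATION_LEVEL_01 = {"enable_flashinfer_autotune": _FLASHINFER_AUTOTUNE_DEFAULT}
OPTIMIZATION_LEVEL_02 = {"enable_flashinfer_autotune": _FLASHINFER_AUTOTUNE_DEFAULT}
OPTIMIZATION_LEVEL_03 = {"enable_flashinfer_autotune": _FLASHINFER_AUTOTUNE_DEFAULT}
```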
mgoin
left a comment
I've been seeing workarounds like this recently, LGTM
yewentao256
left a comment
LGTM, thanks for the work!
Merge from main as "Rule: auto-rebase to keep merge candidate within 1 day behind main (update)"
…lm-project#41524) Signed-off-by: wzhao18 <wzhao18.sz@gmail.com> Signed-off-by: Joachim Studnia <joachim@mistral.ai>
…lm-project#41524) Signed-off-by: wzhao18 <wzhao18.sz@gmail.com>
…lm-project#41524) Signed-off-by: wzhao18 <wzhao18.sz@gmail.com> Co-authored-by: hongbolv <33214277+hongbolv@users.noreply.github.com>
…lm-project#41524) Signed-off-by: wzhao18 <wzhao18.sz@gmail.com> Signed-off-by: Ifta Khairul Alam Adil <ikaadil007@gmail.com>
The issue has been root-caused in flashinfer and fixed by flashinfer-ai/flashinfer#3227. We can re-enable autotuning when flashinfer v0.6.11 is integrated.
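If a version gate is wanted when re-enabling, one possible sketch is below. It assumes FlashInfer exposes a PEP 440-style __version__ attribute; the 0.6.11 threshold comes from the comment above, and everything else is hypothetical rather than the actual vLLM mechanism:

```python
from packaging.version import Version


def flashinfer_autotune_default() -> bool:
    """Hypothetical helper: enable autotuning only if the installed FlashInfer has the fix."""
    try:
        import flashinfer
    except ImportError:
        return False  # FlashInfer not installed; nothing to autotune
    # __version__ is assumed to be PEP 440-compliant; 0.6.11 is the first release with the fix.
    return Version(flashinfer.__version__) >= Version("0.6.11")
```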
…sues (vllm-project#41524)" This reverts commit c51df43.
Purpose
We have observed correctness bugs with flashinfer autotuning, as reported in flashinfer-ai/flashinfer#3197; a kernel-level reproduction is available there.
While the fix is pending upstream, this PR disables flashinfer autotuning by default for the O1 and O2 optimization levels.
Test Plan
Test Result
Essential Elements of an Effective PR Description Checklist
supported_models.md and examples for a new model.