[UX] Fallback to native implementation when flashinfer sampler failed to compile #26799
Isotr0py wants to merge 6 commits into vllm-project:main
Conversation
Signed-off-by: Isotr0py <mozf@mail2.sysu.edu.cn>
Code Review
This pull request introduces a fallback mechanism for the flashinfer sampler. If flashinfer fails to compile its kernels at runtime, vLLM will now gracefully fall back to its native PyTorch-based sampler implementation and log a helpful warning message. This improves the user experience for those who have flashinfer installed but are missing its pre-compiled kernels. My review includes a critical suggestion to also handle AttributeError in the exception block to prevent a potential fatal error during module import if an older version of flashinfer is used.
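The fallback described above can be sketched as a guarded import that catches compile-time failures, including the `AttributeError` case the review raises for older flashinfer versions. This is a minimal illustration, not vLLM's actual code; the function name and the `top_k_top_p_sampling_from_probs` symbol are used here only as examples.

```python
import importlib
import logging

logger = logging.getLogger(__name__)

def load_flashinfer_sampler(module_name="flashinfer.sampling"):
    """Return the accelerated sampling op if it loads cleanly, else None."""
    try:
        mod = importlib.import_module(module_name)
        # Kernel compilation or a missing symbol (older flashinfer
        # versions) can fail here, hence AttributeError in the handler.
        return getattr(mod, "top_k_top_p_sampling_from_probs")
    except (ImportError, RuntimeError, AttributeError) as exc:
        logger.warning(
            "flashinfer sampler unavailable (%s); falling back to the "
            "native PyTorch sampler.", exc)
        return None
```

A caller would then use the native sampler whenever this returns `None`, so a broken flashinfer install degrades to a warning instead of a fatal import error.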
💡 Codex Review
Here are some automated review suggestions for this pull request.
We disabled the flashinfer sampler by default last night in #26859, so I think this change will make users see this warning for no reason, since it immediately attempts to compile on import. Could we make this only trigger when the flashinfer sampler is requested?
Signed-off-by: Isotr0py <mozf@mail2.sysu.edu.cn>
Done in e0b95c2
Purpose
`flashinfer-cubin` and `flashinfer-jit-cache` are not included. With cubin and jit-cache missing, the flashinfer sampler can fail to compile during runtime for various reasons; users need to install `flashinfer-cubin` and `flashinfer-jit-cache` manually.

Test Plan
Test Result
Essential Elements of an Effective PR Description Checklist
`supported_models.md` and `examples` for a new model.