[UX] Fallback to native implementation when flashinfer sampler failed to compile #26799

Closed
Isotr0py wants to merge 6 commits into vllm-project:main from Isotr0py:flashinfer-sampler

Isotr0py (Member) commented Oct 14, 2025

Purpose

  • After #26443 ([UX] Add FlashInfer as default CUDA dependency), flashinfer became a default dependency, but flashinfer-cubin and flashinfer-jit-cache are not included. With cubin and jit-cache missing, the flashinfer sampler can fail to compile at runtime for various reasons:
(EngineCore_DP0 pid=7122) RuntimeError: Ninja build failed. Ninja output:
(EngineCore_DP0 pid=7122) ninja: Entering directory `/root/.cache/flashinfer/75/cached_ops'
...
(EngineCore_DP0 pid=7122) /usr/bin/ld: cannot find -lcuda: No such file or directory
(EngineCore_DP0 pid=815718) RuntimeError: Ninja build failed. Ninja output:
(EngineCore_DP0 pid=815718) ninja: Entering directory `/home/mozf/.cache/flashinfer/86/cached_ops'
(EngineCore_DP0 pid=815718) ninja: error: '/home/mozf/develop-projects/vllm/.venv/lib/python3.12/site-packages/flashinfer/data/csrc/flashinfer_sampling_ops.cu', needed by 'sampling/flashinfer_sampling_ops.cuda.o', missing and no known rule to make it
  • This PR lets the sampler fall back to the native implementation when the flashinfer one fails to compile, and logs a message suggesting that the user install flashinfer-cubin and flashinfer-jit-cache manually.

Test Plan

Test Result



Signed-off-by: Isotr0py <mozf@mail2.sysu.edu.cn>
Signed-off-by: Isotr0py <mozf@mail2.sysu.edu.cn>
@mergify mergify bot added the v1 label Oct 14, 2025
@Isotr0py Isotr0py requested a review from mgoin October 14, 2025 11:33
gemini-code-assist bot (Contributor) left a comment

Code Review

This pull request introduces a fallback mechanism for the flashinfer sampler. If flashinfer fails to compile its kernels at runtime, vLLM will now gracefully fall back to its native PyTorch-based sampler implementation and log a helpful warning message. This improves the user experience for those who have flashinfer installed but are missing its pre-compiled kernels. My review includes a critical suggestion to also handle AttributeError in the exception block to prevent a potential fatal error during module import if an older version of flashinfer is used.

chatgpt-codex-connector bot left a comment

💡 Codex Review

Here are some automated review suggestions for this pull request.


mgoin (Member) commented Oct 15, 2025

We disabled the flashinfer sampler by default last night in #26859, so I think this change will make it such that users see this warning for no reason, since it immediately attempts to compile on import. Could we make this only trigger when we are requesting the flashinfer sampler?

Signed-off-by: Isotr0py <mozf@mail2.sysu.edu.cn>
Isotr0py (Member, Author) commented

Could we make this only trigger when we are requesting the flashinfer sampler?

Done in e0b95c2
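
One way to defer the compile attempt until the compiled sampler is actually requested can be sketched as below. This is an illustration under assumptions, not the change in e0b95c2: `LazySamplerBackend`, `want_compiled`, and the loader callables are all hypothetical names, and no flashinfer API is called.

```python
import logging
from typing import Callable, Optional

logger = logging.getLogger("lazy_sampler_sketch")


class LazySamplerBackend:
    """Hypothetical holder that resolves the sampler on first use only."""

    def __init__(
        self,
        want_compiled: bool,
        load_compiled: Callable[[], Callable],
        native: Callable,
    ) -> None:
        self._want_compiled = want_compiled
        self._load_compiled = load_compiled
        self._native = native
        self._resolved: Optional[Callable] = None

    def get(self) -> Callable:
        """Return the sampler, resolving (and caching) it on the first call."""
        if self._resolved is None:
            self._resolved = self._resolve()
        return self._resolved

    def _resolve(self) -> Callable:
        # If the compiled sampler was not requested, never touch the loader,
        # so no compilation is attempted and no warning is logged.
        if not self._want_compiled:
            return self._native
        try:
            return self._load_compiled()
        except (ImportError, RuntimeError, AttributeError) as exc:
            logger.warning("Compiled sampler failed to load (%s); using native.", exc)
            return self._native
```

Nothing runs at import or construction time; the loader is invoked at most once, and only when `want_compiled` is true and `get()` is called.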

@mgoin mgoin closed this Jan 5, 2026
@Isotr0py Isotr0py deleted the flashinfer-sampler branch January 6, 2026 00:22