[UX] Fallback to native implementation when flashinfer sampler failed to compile by Isotr0py · Pull Request #26799 · vllm-project/vllm

Isotr0py · 2025-10-14T11:33:03Z

Purpose

After #26443 ([UX] Add FlashInfer as default CUDA dependency), flashinfer became a default dependency, but flashinfer-cubin and flashinfer-jit-cache are not included. Without cubin and jit-cache, the flashinfer sampler can fail to compile at runtime for various reasons, for example:

(EngineCore_DP0 pid=7122) RuntimeError: Ninja build failed. Ninja output:
(EngineCore_DP0 pid=7122) ninja: Entering directory `/root/.cache/flashinfer/75/cached_ops'
...
(EngineCore_DP0 pid=7122) /usr/bin/ld: cannot find -lcuda: No such file or directory

(EngineCore_DP0 pid=815718) RuntimeError: Ninja build failed. Ninja output:
(EngineCore_DP0 pid=815718) ninja: Entering directory `/home/mozf/.cache/flashinfer/86/cached_ops'
(EngineCore_DP0 pid=815718) ninja: error: '/home/mozf/develop-projects/vllm/.venv/lib/python3.12/site-packages/flashinfer/data/csrc/flashinfer_sampling_ops.cu', needed by 'sampling/flashinfer_sampling_ops.cuda.o', missing and no known rule to make it

This PR makes the sampler fall back to the native implementation when the flashinfer kernels fail to compile, and logs a message suggesting that users install flashinfer-cubin and flashinfer-jit-cache manually.
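The fallback described above can be sketched as a try/except around the compile step. This is a minimal, self-contained sketch, not the PR's actual code: `compile_flashinfer_top_k`, `native_top_k` and `select_top_k_impl` are hypothetical names, the "compile" step is simulated so it always raises `RuntimeError` (like the Ninja failures above), and plain Python lists stand in for the PyTorch tensors that the real vllm/v1/sample/ops/topk_topp_sampler.py operates on.

```python
import logging
from typing import Callable, Sequence

logger = logging.getLogger("topk_topp_sampler_sketch")

# Signature shared by both implementations: (probs, k) -> indices of the top-k entries.
TopKFn = Callable[[Sequence[float], int], list[int]]


def native_top_k(probs: Sequence[float], k: int) -> list[int]:
    """Pure-Python stand-in for the native (PyTorch) sampler path."""
    return sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)[:k]


def compile_flashinfer_top_k() -> TopKFn:
    """Hypothetical stand-in for JIT-compiling the flashinfer sampling op.

    It always fails here, mimicking the Ninja build errors quoted above.
    """
    raise RuntimeError("Ninja build failed. Ninja output: ...")


def select_top_k_impl(
    compile_fn: Callable[[], TopKFn] = compile_flashinfer_top_k,
) -> TopKFn:
    """Return the compiled flashinfer op if compilation succeeds, else the native one."""
    try:
        return compile_fn()
    except RuntimeError as e:
        # Compilation failed: warn once here and hand back the native implementation.
        logger.warning(
            "FlashInfer sampler failed to compile (%s); falling back to the native "
            "implementation. Consider installing flashinfer-cubin and "
            "flashinfer-jit-cache.",
            e,
        )
        return native_top_k
```

With the simulated failure, `select_top_k_impl()([0.1, 0.5, 0.3, 0.1], 2)` runs on the native path and returns `[1, 2]`. Injecting `compile_fn` keeps the selection logic testable without a GPU.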

Test Plan

Test Result

Essential Elements of an Effective PR Description Checklist

The purpose of the PR, such as "Fix some issue (link existing issues this PR will resolve)".
The test plan, such as providing test command.
The test results, such as pasting the results comparison before and after, or e2e results
(Optional) The necessary documentation update, such as updating supported_models.md and examples for a new model.
(Optional) Release notes update. If your change is user facing, please update the release notes draft in the Google Doc.

Signed-off-by: Isotr0py <mozf@mail2.sysu.edu.cn>

gemini-code-assist

Code Review

This pull request introduces a fallback mechanism for the flashinfer sampler. If flashinfer fails to compile its kernels at runtime, vLLM will now gracefully fall back to its native PyTorch-based sampler implementation and log a helpful warning message. This improves the user experience for those who have flashinfer installed but are missing its pre-compiled kernels. My review includes a critical suggestion to also handle AttributeError in the exception block to prevent a potential fatal error during module import if an older version of flashinfer is used.
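The review's point about `AttributeError` can be illustrated with a loader that catches both exception types: an older flashinfer build may simply lack the attribute being looked up, and catching only `RuntimeError` would let that error escape during module import. This is a hedged sketch; `try_load_flashinfer_op` and the op name `some_sampling_op` are hypothetical, and a `types.ModuleType` instance stands in for an outdated flashinfer module.

```python
import logging
import types
from typing import Callable, Optional, TypeVar

logger = logging.getLogger("flashinfer_fallback_sketch")

T = TypeVar("T")


def try_load_flashinfer_op(loader: Callable[[], T]) -> Optional[T]:
    """Run `loader`; return None (meaning "use the native sampler") on failure."""
    try:
        return loader()
    except (RuntimeError, AttributeError) as e:
        # RuntimeError: JIT compilation failed (e.g. a Ninja build error).
        # AttributeError: an older flashinfer version does not expose the op.
        logger.warning("Falling back to the native sampler: %s", e)
        return None


if __name__ == "__main__":
    old_flashinfer = types.ModuleType("old_flashinfer")  # hypothetical outdated module
    op = try_load_flashinfer_op(lambda: old_flashinfer.some_sampling_op)
    print("use native sampler" if op is None else "use flashinfer")
```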

vllm/v1/sample/ops/topk_topp_sampler.py

chatgpt-codex-connector

💡 Codex Review

Here are some automated review suggestions for this pull request.


vllm/v1/sample/ops/topk_topp_sampler.py

Signed-off-by: Isotr0py <mozf@mail2.sysu.edu.cn>

mgoin · 2025-10-15T15:54:24Z

We disabled the flashinfer sampler by default last night (#26859), so I think this change would make users see this warning for no reason, since it immediately attempts to compile on import. Could we make this only trigger when we are requesting the flashinfer sampler?

Signed-off-by: Isotr0py <mozf@mail2.sysu.edu.cn>

Isotr0py · 2025-10-15T17:40:10Z

Could we make this only trigger when we are requesting the flashinfer sampler?

Done in e0b95c2

Isotr0py added 2 commits October 14, 2025 19:19

fallback to native sampler

b7ec2b9

Signed-off-by: Isotr0py <mozf@mail2.sysu.edu.cn>

msg

dd05760

Signed-off-by: Isotr0py <mozf@mail2.sysu.edu.cn>

Isotr0py requested review from 22quinn, houseroad and njhill as code owners October 14, 2025 11:33

mergify bot added the v1 label Oct 14, 2025

Isotr0py requested a review from mgoin October 14, 2025 11:33

gemini-code-assist bot reviewed Oct 14, 2025


vllm/v1/sample/ops/topk_topp_sampler.py (resolved)

chatgpt-codex-connector bot reviewed Oct 14, 2025


vllm/v1/sample/ops/topk_topp_sampler.py (resolved)

Isotr0py and others added 3 commits October 14, 2025 19:53

gemini

3a8c9e9

Signed-off-by: Isotr0py <mozf@mail2.sysu.edu.cn>

Merge branch 'vllm-project:main' into flashinfer-sampler

bf9a6c3

Merge branch 'vllm-project:main' into flashinfer-sampler

edd149b

only trigger compilation when enabling flashinfer sampler

e0b95c2

Signed-off-by: Isotr0py <mozf@mail2.sysu.edu.cn>

mgoin closed this Jan 5, 2026

Isotr0py deleted the flashinfer-sampler branch January 6, 2026 00:22

Isotr0py wants to merge 6 commits into vllm-project:main from Isotr0py:flashinfer-sampler

Isotr0py commented Oct 14, 2025 • edited by github-actions bot


2 participants