[flashinfer][fix] do not check nvcc availability when using pre-downloaded cubins#27990

Merged
houseroad merged 3 commits into vllm-project:main from mxz297:export-D86104899
Nov 8, 2025
Conversation

@mxz297
Contributor

@mxz297 mxz297 commented Nov 3, 2025

Summary: #26443 added an nvcc availability check as a condition for enabling FlashInfer MoE. Our deployment environment has no nvcc, so FlashInfer MoE is disabled there.
Differential Revision: D86104899

Contributor

@gemini-code-assist gemini-code-assist bot left a comment


Code Review

This pull request aims to enable FlashInfer in environments where nvcc is not available by removing the nvcc availability check. While this addresses the issue for environments with pre-compiled kernels, it could introduce runtime crashes for users who lack both nvcc and pre-compiled kernels. I've suggested a safer alternative that makes the nvcc check conditional on the VLLM_HAS_FLASHINFER_CUBIN environment variable. This approach provides the desired flexibility for production environments while preserving the safeguard for other users.
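The conditional check suggested here could be sketched as follows. This is a minimal illustration, not the actual vLLM code: the function names `has_flashinfer_cubin`, `has_nvcc`, and `can_enable_flashinfer_moe` are hypothetical stand-ins for whatever the real gating logic is called.

```python
import os
import shutil


def has_flashinfer_cubin() -> bool:
    # Pre-downloaded cubins are signalled via the env var discussed in
    # this thread (assumed convention: "1" means cubins are present).
    return os.environ.get("VLLM_HAS_FLASHINFER_CUBIN") == "1"


def has_nvcc() -> bool:
    # nvcc is only needed when FlashInfer must JIT-compile kernels.
    return shutil.which("nvcc") is not None


def can_enable_flashinfer_moe() -> bool:
    # With pre-downloaded cubins, skip the nvcc check entirely;
    # otherwise JIT compilation requires nvcc on PATH.
    return has_flashinfer_cubin() or has_nvcc()
```

This keeps the safeguard for JIT users (no nvcc and no cubins still means FlashInfer MoE stays off) while letting AOT deployments opt in via the environment variable.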

@mxz297
Contributor Author

mxz297 commented Nov 3, 2025

@mgoin our internal prod environment uses flashinfer in an AOT fashion and does not have nvcc. So right now we are seeing flashinfer moe being disabled internally, causing a perf regression.

@alecsolder
Contributor

Is there a way we can add unit tests to ensure this doesn't get turned off accidentally again for the model?
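A regression test along these lines could guard against the check being reintroduced unconditionally. This is a self-contained sketch: `can_enable_flashinfer_moe` is a hypothetical stand-in for the real vLLM gating function, and the test would need to target the actual code path instead.

```python
import os
import shutil
from unittest import mock


def can_enable_flashinfer_moe() -> bool:
    # Hypothetical stand-in for the real check: pre-downloaded cubins
    # bypass the nvcc requirement; otherwise JIT needs nvcc on PATH.
    if os.environ.get("VLLM_HAS_FLASHINFER_CUBIN") == "1":
        return True
    return shutil.which("nvcc") is not None


def test_moe_enabled_without_nvcc():
    # Simulate an AOT environment: cubins present, nvcc missing.
    with mock.patch.dict(os.environ, {"VLLM_HAS_FLASHINFER_CUBIN": "1"}):
        with mock.patch("shutil.which", return_value=None):
            assert can_enable_flashinfer_moe()
```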

@mxz297 mxz297 changed the title do not check nvcc availability [flashinfer][fix] do not check nvcc availability Nov 3, 2025
@heheda12345
Collaborator

Is nvcc required by the jit compilation of FlashInfer?

@mxz297 mxz297 changed the title [flashinfer][fix] do not check nvcc availability [flashinfer][fix] do not check nvcc availability when using pre-downloaded cubins Nov 7, 2025
@mxz297
Contributor Author

mxz297 commented Nov 7, 2025

Is nvcc required by the jit compilation of FlashInfer?

Yes, so it makes sense to take gemini-code-assist's recommendation and guard the check with VLLM_HAS_FLASHINFER_CUBIN.

@gemini-code-assist
Contributor

That's correct. Guarding the nvcc check with VLLM_HAS_FLASHINFER_CUBIN ensures that FlashInfer JIT compilation only attempts to use nvcc when pre-downloaded cubins are not available, which aligns with the goal of supporting AOT environments. This change effectively addresses the issue of FlashInfer being disabled in environments without nvcc.

…-project#27990)

Summary:

vllm-project#26443 added an nvcc availability check as a condition for enabling flashinfer moe. On devgpus, nvcc may be present, so there is no issue. But in tw jobs, there is no nvcc, so flashinfer moe is disabled.

Differential Revision: D86104899

Signed-off-by: Xiaozhu <mxz297@gmail.com>
Collaborator

@houseroad houseroad left a comment


Looks good.

@houseroad houseroad added the ready label (ONLY add when PR is ready to merge/full CI is needed) Nov 7, 2025
@houseroad houseroad merged commit 4a36681 into vllm-project:main Nov 8, 2025
45 checks passed
@mxz297 mxz297 deleted the export-D86104899 branch November 8, 2025 18:10
@mgoin mgoin added the nvidia label Nov 10, 2025
devpatelio pushed a commit to SumanthRH/vllm that referenced this pull request Nov 29, 2025
…oaded cubins (vllm-project#27990)

Signed-off-by: Xiaozhu <mxz297@gmail.com>
Co-authored-by: Lu Fang <30275821+houseroad@users.noreply.github.com>

Labels

nvidia, ready (ONLY add when PR is ready to merge/full CI is needed)


5 participants