[NVIDIA] Update FlashInfer to version 0.6.7.post3. Avoid re-downloading BMM export headers when flashinfer-cubin is installed by johnnynunez · Pull Request #38913 · vllm-project/vllm

johnnynunez · 2026-04-03T11:34:50Z

Update to 0.6.7.post3, which includes important fixes, in preparation for 0.6.8.

fix: avoid re-downloading BMM export headers when flashinfer-cubin is installed
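A minimal sketch of the kind of gate this fix describes. It assumes that the installed flashinfer-cubin package already provides the BMM export headers, so the download only needs to happen when that distribution is absent. The helper names is_distribution_installed and should_download_bmm_headers are hypothetical and do not come from vLLM or FlashInfer. Only the standard-library importlib.metadata API is real.

```python
from importlib.metadata import PackageNotFoundError, version


def is_distribution_installed(dist_name: str) -> bool:
    """Return True if a distribution with this name is installed in the current environment."""
    try:
        version(dist_name)  # raises PackageNotFoundError when the distribution is missing
    except PackageNotFoundError:
        return False
    return True


def should_download_bmm_headers(cubin_dist: str = "flashinfer-cubin") -> bool:
    """Hypothetical gate: fetch the BMM export headers only when the prebuilt-cubin
    package is absent (assumption: that package already ships the headers)."""
    return not is_distribution_installed(cubin_dist)


if __name__ == "__main__":
    if should_download_bmm_headers():
        print("flashinfer-cubin not found; headers would be downloaded")
    else:
        print("flashinfer-cubin found; skipping header download")
```

Checking installed distribution metadata avoids importing the package itself. This is cheap, and it is safe to run during an image build before any GPU is available.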

johnnynunez · 2026-04-03T11:36:48Z

gemini-code-assist

Code Review

This pull request updates the FlashInfer version from 0.6.7 to 0.6.7.post1 across the repository, including Dockerfiles, version configuration files, and dependency requirements. It also updates internal documentation comments to reflect the new version. I have no feedback to provide.

mgoin

Thanks!


cjackal · 2026-04-04T15:00:27Z

I'm not sure whether it's appropriate to comment here, but flashinfer==0.6.7.post2 was released a few hours ago. It contains the GB300 deadlock fix for TRTLLM attention that #38730 is hotfixing.


johnnynunez · 2026-04-07T15:31:38Z

@mgoin this should be ok now

wzhao18 · 2026-04-07T18:27:23Z

After this, maybe we should remove the workaround for GB300 trtllm attention?

johnnynunez · 2026-04-13T07:38:10Z

Closing in favor of 0.6.8 (flashinfer-ai/flashinfer#3042).

johnnynunez force-pushed the main branch from 6ea66ba to db0d3fd April 3, 2026 11:35

mergify Bot added the ci/build and nvidia labels Apr 3, 2026

github-project-automation Bot added this to NVIDIA Apr 3, 2026

johnnynunez changed the title ~~Update FlashInfer to version 0.6.7.post1 in Dockerfiles and related f…~~ [NVIDIA] Update FlashInfer to version 0.6.7.post1 Apr 3, 2026

gemini-code-assist Bot reviewed Apr 3, 2026

johnnynunez changed the title ~~[NVIDIA] Update FlashInfer to version 0.6.7.post1~~ [NVIDIA] Update FlashInfer to version 0.6.7.post1. Hot fix for DGX Spark Apr 3, 2026

mgoin approved these changes Apr 3, 2026

github-project-automation Bot moved this to Ready in NVIDIA Apr 3, 2026

mgoin added the ready ("ONLY add when PR is ready to merge/full CI is needed") and ready-run-all-tests ("Trigger CI with all tests for wide-ranging PRs") labels Apr 3, 2026

johnnynunez changed the title ~~[NVIDIA] Update FlashInfer to version 0.6.7.post1. Hot fix for DGX Spark~~ [NVIDIA] Update FlashInfer to version 0.6.7.post1. Avoid re-downloading BMM export headers when flashinfer-cubin is installed Apr 3, 2026

johnnynunez added 2 commits April 3, 2026 18:56

Update FlashInfer to version 0.6.7.post1 in Dockerfiles and related files

26bbbaa

Signed-off-by: johnnynunez <johnnynuca14@gmail.com>

Remove pre-download step for FlashInfer TRTLLM BMM headers in Dockerfile

0e7b5ed

Signed-off-by: johnnynunez <johnnynuca14@gmail.com>

johnnynunez force-pushed the main branch from 1acabfa to 0e7b5ed April 3, 2026 16:56

johnnynunez and others added 2 commits April 3, 2026 18:57

Merge branch 'main' into main

0a459b3

Merge branch 'vllm-project:main' into main

88f7c9b

johnnynunez and others added 2 commits April 4, 2026 18:38

0.6.7.post2

e6a8591

Signed-off-by: johnnynunez <johnnynuca14@gmail.com>

Merge branch 'vllm-project:main' into main

12dcd47

johnnynunez changed the title ~~[NVIDIA] Update FlashInfer to version 0.6.7.post1. Avoid re-downloading BMM export headers when flashinfer-cubin is installed~~ [NVIDIA] Update FlashInfer to version 0.6.7.post2. Avoid re-downloading BMM export headers when flashinfer-cubin is installed Apr 4, 2026

johnnynunez added 2 commits April 4, 2026 18:42

Add startup_max_wait_seconds parameter to Llama-4-Scout-BF16-fi-cutlass configuration

e6266b5

Signed-off-by: johnnynunez <johnnynuca14@gmail.com>

Merge remote-tracking branch 'origin/main'

0a85d8d

johnnynunez requested a review from vadiklyutiy as a code owner April 4, 2026 16:42

johnnynunez closed this Apr 4, 2026

github-project-automation Bot moved this from Ready to Done in NVIDIA Apr 4, 2026

johnnynunez reopened this Apr 6, 2026

johnnynunez changed the title ~~[NVIDIA] Update FlashInfer to version 0.6.7.post2. Avoid re-downloading BMM export headers when flashinfer-cubin is installed~~ [NVIDIA] Update FlashInfer to version 0.6.7.post3. Avoid re-downloading BMM export headers when flashinfer-cubin is installed Apr 6, 2026

Merge branch 'vllm-project:main' into main

656b6ca

johnnynunez added 2 commits April 6, 2026 10:14

Update FlashInfer to version 0.6.7.post3 in Dockerfiles and related files

a4c2278

Signed-off-by: johnnynunez <johnnynuca14@gmail.com>

Merge branch 'main' into main

3de2b38

johnnynunez added 2 commits April 7, 2026 17:32

Merge branch 'main' into main

768f9db

Merge branch 'main' into main

be3de93

johnnynunez and others added 3 commits April 7, 2026 20:50

Merge branch 'main' into main

528531a

Merge branch 'main' into main

0a3e02e

Merge branch 'main' into main

fcd42ff

johnnynunez requested a review from mgoin April 10, 2026 07:43

ZJY0516 mentioned this pull request Apr 11, 2026

[Perf]: ~23% output throughput regression on Qwen3.5-397B NVFP4 decode (8×B200) over the last 10 days #39004 (closed)

Merge branch 'main' into main

8d3d7d6

johnnynunez closed this Apr 13, 2026

johnnynunez wants to merge 17 commits into vllm-project:main from johnnynunez:main
