[NVIDIA] Update FlashInfer to version 0.6.7.post3. Avoid re-downloading BMM export headers when flashinfer-cubin is installed#38913

Closed
johnnynunez wants to merge 17 commits into vllm-project:main from johnnynunez:main

johnnynunez (Contributor) commented Apr 3, 2026

Update FlashInfer to 0.6.7.post3, which includes important fixes, in preparation for 0.6.8.

  • fix: avoid re-downloading BMM export headers when flashinfer-cubin is installed
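
The idea behind the fix above can be illustrated with a minimal sketch. This is not FlashInfer's actual implementation: the function names `is_distribution_installed` and `should_download_headers` are hypothetical, and the assumption is simply that the download can be skipped whenever the `flashinfer-cubin` distribution is already installed. Only the standard-library `importlib.metadata` API is used.

```python
from importlib import metadata


def is_distribution_installed(name: str) -> bool:
    """Return True if a distribution with this name is installed."""
    try:
        metadata.version(name)
    except metadata.PackageNotFoundError:
        return False
    return True


def should_download_headers(cubin_package: str = "flashinfer-cubin") -> bool:
    # Hypothetical policy (an assumption, not FlashInfer's real logic):
    # only fetch the headers when the prebuilt cubin package is absent.
    return not is_distribution_installed(cubin_package)


if __name__ == "__main__":
    print("download headers:", should_download_headers())
```

A check of this kind is cheap because it reads installed package metadata only and never imports the package itself.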

johnnynunez changed the title from "Update FlashInfer to version 0.6.7.post1 in Dockerfiles and related f…" to "[NVIDIA] Update FlashInfer to version 0.6.7.post1" on Apr 3, 2026

johnnynunez (Contributor Author) commented

cc @mgoin

gemini-code-assist Bot (Contributor) left a comment

Code Review

This pull request updates the FlashInfer version from 0.6.7 to 0.6.7.post1 across the repository, including Dockerfiles, version configuration files, and dependency requirements. It also updates internal documentation comments to reflect the new version. I have no feedback to provide.

johnnynunez changed the title from "[NVIDIA] Update FlashInfer to version 0.6.7.post1" to "[NVIDIA] Update FlashInfer to version 0.6.7.post1. Hot fix for DGX Spark" on Apr 3, 2026

mgoin (Member) left a comment

Thanks!

github-project-automation Bot moved this to Ready in NVIDIA on Apr 3, 2026
mgoin added the labels ready (ONLY add when PR is ready to merge/full CI is needed) and ready-run-all-tests (Trigger CI with all tests for wide-ranging PRs) on Apr 3, 2026
johnnynunez changed the title from "[NVIDIA] Update FlashInfer to version 0.6.7.post1. Hot fix for DGX Spark" to "[NVIDIA] Update FlashInfer to version 0.6.7.post1. Avoid re-downloading BMM export headers when flashinfer-cubin is installed" on Apr 3, 2026
…iles

Signed-off-by: johnnynunez <johnnynuca14@gmail.com>
Signed-off-by: johnnynunez <johnnynuca14@gmail.com>

cjackal (Contributor) commented Apr 4, 2026

I'm not sure whether this is the right place to comment, but flashinfer==0.6.7.post2 was released a few hours ago, and it contains the GB300 deadlock fix for TRTLLM attention that #38730 is hotfixing.

johnnynunez and others added 2 commits April 4, 2026 18:38
Signed-off-by: johnnynunez <johnnynuca14@gmail.com>
johnnynunez changed the title from "[NVIDIA] Update FlashInfer to version 0.6.7.post1. Avoid re-downloading BMM export headers when flashinfer-cubin is installed" to "[NVIDIA] Update FlashInfer to version 0.6.7.post2. Avoid re-downloading BMM export headers when flashinfer-cubin is installed" on Apr 4, 2026
johnnynunez requested a review from vadiklyutiy as a code owner on April 4, 2026 16:42
johnnynunez closed this on Apr 4, 2026
github-project-automation Bot moved this from Ready to Done in NVIDIA on Apr 4, 2026
johnnynunez reopened this on Apr 6, 2026
johnnynunez changed the title from "[NVIDIA] Update FlashInfer to version 0.6.7.post2. Avoid re-downloading BMM export headers when flashinfer-cubin is installed" to "[NVIDIA] Update FlashInfer to version 0.6.7.post3. Avoid re-downloading BMM export headers when flashinfer-cubin is installed" on Apr 6, 2026

johnnynunez (Contributor Author) commented

@mgoin this should be ok now

wzhao18 (Contributor) commented Apr 7, 2026

After this, maybe we should remove the workaround for GB300 trtllm attention?

johnnynunez (Contributor Author) commented

Closing in favor of 0.6.8: flashinfer-ai/flashinfer#3042

Labels

ci/build, nvidia, ready (ONLY add when PR is ready to merge/full CI is needed), ready-run-all-tests (Trigger CI with all tests for wide-ranging PRs)

Projects

Status: Done

4 participants