update flashinfer to v0.2.9rc1 #21485
Conversation
👋 Hi! Thank you for contributing to the vLLM project. 💬 Join our developer Slack at https://slack.vllm.ai to discuss your PR in #pr-reviews, coordinate on features in #feat- channels, or join special interest groups in #sig- channels. Just a reminder: PRs do not trigger a full CI run by default. Once the PR is approved and ready to go, your PR reviewer(s) can run CI to test the changes comprehensively before merging. To run CI, PR reviewers can either: Add 🚀
Code Review
This pull request updates the FlashInfer dependency to version v0.2.9rc1. The code changes correctly adapt the calls to trtllm_batch_decode_with_kv_cache to match the updated API in the new version. The changes are consistent across the affected files and appear to be correct.
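Adapting a call site to a changed upstream signature can be sketched as below. Note the stub function here only stands in for flashinfer's `trtllm_batch_decode_with_kv_cache`; its parameter names (e.g. `bmm1_scale`) are illustrative assumptions, not the actual v0.2.9rc1 signature.

```python
import inspect

# HYPOTHETICAL stand-in for flashinfer's kernel entry point; the real
# v0.2.9rc1 signature differs -- this only illustrates the adaptation pattern.
def trtllm_batch_decode_with_kv_cache(query, kv_cache, *, bmm1_scale=1.0):
    return ("decoded", bmm1_scale)

def call_decode(query, kv_cache, scale):
    # Pass the scale kwarg only when the installed version accepts it,
    # so the caller keeps working across flashinfer versions.
    params = inspect.signature(trtllm_batch_decode_with_kv_cache).parameters
    if "bmm1_scale" in params:
        return trtllm_batch_decode_with_kv_cache(query, kv_cache, bmm1_scale=scale)
    return trtllm_batch_decode_with_kv_cache(query, kv_cache)

print(call_decode("q", "kv", 0.5))  # ('decoded', 0.5)
```

In the PR itself the call sites are updated directly rather than guarded, since vLLM pins the flashinfer version.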
Signed-off-by: Weiliang Liu <weiliangl@nvidia.com>
Force-pushed from aeec6ab to 6289473.
This pull request has merge conflicts that must be resolved before it can be merged.
#21408 is already merged. No more changes needed.
Signed-off-by: Weiliang Liu <weiliangl@nvidia.com> Signed-off-by: x22x22 <wadeking@qq.com>
Signed-off-by: Weiliang Liu <weiliangl@nvidia.com> Signed-off-by: Jinzhen Lin <linjinzhen@hotmail.com>
Signed-off-by: Weiliang Liu <weiliangl@nvidia.com> Signed-off-by: Paul Pak <paulpak58@gmail.com>
Signed-off-by: Weiliang Liu <weiliangl@nvidia.com> Signed-off-by: Diego-Castan <diego.castan@ibm.com>
Essential Elements of an Effective PR Description Checklist
supported_models.md and examples for a new model.

Purpose
Update FlashInfer to v0.2.9rc1 and modify its trtllm-gen call to use the latest API.
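Since the pin targets a release candidate, it is worth remembering that PEP 440 orders `0.2.9rc1` *before* the final `0.2.9`. A minimal check with the `packaging` library (not vLLM's actual version-gating code) illustrates this:

```python
from packaging.version import Version

# Release candidates are pre-releases and sort below the final release.
pinned = Version("0.2.9rc1")
assert pinned.is_prerelease
assert Version("0.2.8") < pinned < Version("0.2.9")
print(pinned)  # 0.2.9rc1
```

Any code that compares the installed flashinfer version against `0.2.9` with `>=` would therefore exclude this rc build.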
Test Plan
Test Llama 4 with lm_eval.
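A typical lm_eval invocation against vLLM looks like the following; the model path and task are placeholders, not the exact configuration used in this PR's test run.

```shell
# Evaluate a Llama 4 checkpoint through lm_eval's vLLM backend.
# Model name and task are illustrative placeholders.
lm_eval --model vllm \
    --model_args pretrained=<llama4-checkpoint>,tensor_parallel_size=8 \
    --tasks gsm8k \
    --batch_size auto
```

Comparing the scores before and after the flashinfer bump verifies that the updated trtllm-gen call path produces equivalent accuracy.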
Test Result
(Optional) Documentation Update