Conversation

@MasterJH5574 (Contributor) commented on Aug 2, 2024

This PR bumps FlashInfer and updates PagedKVCache accordingly to improve performance.

Some notes on this bump:

  • When FlashInfer is enabled and the Grouped-Query Attention group size is at least 4, we use the prefill attention kernel for better performance (see the sketch after this list).
  • We enlarge the temporary workspace for FlashInfer accordingly, as the current FlashInfer version may consume a much larger workspace. The workspace is not allocated when FlashInfer is disabled.
  • We reduce the maximum block depth to 2, since cascade inference offers limited benefit when the batch size is not large and prompt reuse is low.
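
Below is a minimal, hypothetical Python sketch of the first two points; the function names, kernel labels, and workspace size are illustrative assumptions, not the actual PagedKVCache or FlashInfer API.

```python
# Hypothetical sketch only; names and sizes are illustrative assumptions.

def choose_attention_kernel(num_qo_heads: int, num_kv_heads: int,
                            flashinfer_enabled: bool) -> str:
    """Pick the attention kernel for decode steps.

    When FlashInfer is enabled and the GQA group size (query heads per
    KV head) is at least 4, the prefill kernel is used, since it handles
    large group sizes more efficiently.
    """
    group_size = num_qo_heads // num_kv_heads
    if flashinfer_enabled and group_size >= 4:
        return "flashinfer_prefill"
    return "flashinfer_decode" if flashinfer_enabled else "tir_decode"


def workspace_bytes(flashinfer_enabled: bool) -> int:
    """Size the temporary FlashInfer workspace.

    The newer FlashInfer version may need a much larger scratch buffer,
    so the workspace is enlarged when FlashInfer is enabled and skipped
    entirely otherwise. The size below is a placeholder, not the value
    used in the PR.
    """
    return 128 * 1024 * 1024 if flashinfer_enabled else 0
```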

@MasterJH5574 force-pushed the tvm-dev/2024-08-02-bump-flashinfer branch from d695af4 to e6987df on August 2, 2024, 21:45