
Conversation

@iofu728 (Collaborator) commented May 5, 2025

What does this PR do?

Before submitting

  • This PR fixes a typo or improves the docs (you can dismiss the other checks if that's the case).
  • Was this discussed/approved via a GitHub issue? Please add a link to it if that's the case.
  • Did you make sure to update the documentation with your changes?
  • Did you write any new necessary tests?

Who can review?

@iofu728

@iofu728 added the documentation label May 5, 2025
@iofu728 requested a review from Copilot May 5, 2025 06:37
@iofu728 self-assigned this May 5, 2025
@iofu728 merged commit 9d76f96 into main May 5, 2025
@iofu728 deleted the hjiang/add_sglang_support branch May 5, 2025 06:38

Copilot AI left a comment


Pull Request Overview

This PR documents SGLang's support for the MInference sparse attention kernel, updating the project's news section to highlight the integration and acknowledge external contributions.

  • Adds a news entry for SGLang and FlashAttention-3 kernel support.
  • Updates the detailed news list with a new SCBench announcement entry.
Comments suppressed due to low confidence (1)

README.md:30

  • [nitpick] Verify that the formatting and ordering of news entries in the additional details section are consistent with the summary above and that there is no redundant information.
+  <li> 🍩 [24/12/13] We are excited to announce the release of our KV cache-centric analysis work, <a href="https://aka.ms/SCBench">SCBench</a>, which evaluates long-context methods from a KV cache perspective.</li>


## 📰 News
- 🐝 [25/05/02] MMInference has been accepted at **ICML'25**.
- 👨‍💻‍ [25/04/14] [SGLang](https://github.com/sgl-project/sglang/pull/5327) and [vLLM](https://github.com/vllm-project/flash-attention/pull/33) have merged the MInference sparse attention kernel. Notably, SGLang also adapted it for FlashAttention-3. Special thanks to @zhyncs and @yinfan98 for their contributions!

Copilot AI commented May 5, 2025


[nitpick] Consider adding a brief note or reference to documentation that describes how the SGLang kernel support differs from previous implementations so that readers have additional context.

Suggested change
- 👨‍💻‍ [25/04/14] [SGLang](https://github.com/sgl-project/sglang/pull/5327) and [vLLM](https://github.com/vllm-project/flash-attention/pull/33) have merged the MInference sparse attention kernel. Notably, SGLang also adapted it for FlashAttention-3. Special thanks to @zhyncs and @yinfan98 for their contributions!
- 👨‍💻‍ [25/04/14] [SGLang](https://github.com/sgl-project/sglang/pull/5327) and [vLLM](https://github.com/vllm-project/flash-attention/pull/33) have merged the MInference sparse attention kernel. Notably, SGLang also adapted it for FlashAttention-3, introducing optimizations for memory efficiency and computational speed. For more details, see the [SGLang documentation](https://sgl-project.github.io/docs/kernels) or the [pull request discussion](https://github.com/sgl-project/sglang/pull/5327). Special thanks to @zhyncs and @yinfan98 for their contributions!
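
For readers who want concrete context for the news entry under discussion, below is a minimal sketch of how MInference's sparse attention is typically enabled on a Hugging Face pipeline, following the patching pattern shown in the MInference README. The model name is illustrative only, and note that SGLang and vLLM ship the merged kernel inside their own serving engines rather than through this patch interface.

```python
# Minimal sketch (not part of this PR): enabling MInference sparse attention
# on a Hugging Face pipeline, following the patching pattern from the
# MInference README. The model name is illustrative; SGLang and vLLM expose
# the merged kernel inside their own serving engines, not via this patch.
from transformers import pipeline
from minference import MInference

model_name = "gradientai/Llama-3-8B-Instruct-262k"  # example long-context model
pipe = pipeline("text-generation", model=model_name, torch_dtype="auto", device="cuda")

# Patch the loaded model so long-context prefill uses the sparse attention kernel.
minference_patch = MInference("minference", model_name)
pipe.model = minference_patch(pipe.model)

print(pipe("Summarize the following document: ...", max_new_tokens=64))
```

This is the same kernel that the linked SGLang pull request merged and adapted for FlashAttention-3.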

