
Conversation

@iofu728 (Collaborator) commented May 5, 2025

What does this PR do?

Before submitting

  • This PR fixes a typo or improves the docs (you can dismiss the other checks if that's the case).
  • Was this discussed/approved via a GitHub issue? Please add a link to it if that's the case.
  • Did you make sure to update the documentation with your changes?
  • Did you write any new necessary tests?

Who can review?

@iofu728

@iofu728 added the documentation label May 5, 2025
@iofu728 requested a review from Copilot May 5, 2025 06:37
@iofu728 self-assigned this May 5, 2025
@iofu728 merged commit 9d76f96 into main May 5, 2025
@iofu728 deleted the hjiang/add_sglang_support branch May 5, 2025 06:38

Copilot AI left a comment


Pull Request Overview

This PR documents SGLang's support for the MInference sparse attention kernel, updating the project's news section to highlight the integration and acknowledge external contributions.

  • Adds a news entry for SGLang and FlashAttention-3 kernel support.
  • Updates the detailed news list with a new SCBench announcement entry.
Comments suppressed due to low confidence (1)

README.md:30

  • [nitpick] Verify that the formatting and ordering of news entries in the additional details section are consistent with the summary above and that there is no redundant information.
+  <li> 🍩 [24/12/13] We are excited to announce the release of our KV cache-centric analysis work, <a href="https://aka.ms/SCBench">SCBench</a>, which evaluates long-context methods from a KV cache perspective.</li>


## 📰 News
- 🐝 [25/05/02] MMInference has been accepted at **ICML'25**.
- 👨‍💻‍ [25/04/14] [SGLang](https://github.com/sgl-project/sglang/pull/5327) and [vLLM](https://github.com/vllm-project/flash-attention/pull/33) have merged the MInference sparse attention kernel. Notably, SGLang also adapted it for FlashAttention-3. Special thanks to @zhyncs and @yinfan98 for their contributions!

Copilot AI commented May 5, 2025


[nitpick] Consider adding a brief note or reference to documentation that describes how the SGLang kernel support differs from previous implementations so that readers have additional context.

Suggested change
- 👨‍💻‍ [25/04/14] [SGLang](https://github.com/sgl-project/sglang/pull/5327) and [vLLM](https://github.com/vllm-project/flash-attention/pull/33) have merged the MInference sparse attention kernel. Notably, SGLang also adapted it for FlashAttention-3. Special thanks to @zhyncs and @yinfan98 for their contributions!
- 👨‍💻‍ [25/04/14] [SGLang](https://github.com/sgl-project/sglang/pull/5327) and [vLLM](https://github.com/vllm-project/flash-attention/pull/33) have merged the MInference sparse attention kernel. Notably, SGLang also adapted it for FlashAttention-3, introducing optimizations for memory efficiency and computational speed. For more details, see the [SGLang documentation](https://sgl-project.github.io/docs/kernels) or the [pull request discussion](https://github.com/sgl-project/sglang/pull/5327). Special thanks to @zhyncs and @yinfan98 for their contributions!
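
For readers who want concrete context for the news entry under discussion, below is a minimal sketch of how MInference's sparse attention is typically enabled on a Hugging Face pipeline, following the patching pattern shown in the MInference README. The model name is illustrative only, and note that SGLang and vLLM ship the merged kernel inside their own serving engines rather than through this patch interface.

```python
# Minimal sketch (not part of this PR): enabling MInference sparse attention
# on a Hugging Face pipeline, following the patching pattern from the
# MInference README. The model name is illustrative; SGLang and vLLM expose
# the merged kernel inside their own serving engines, not via this patch.
from transformers import pipeline
from minference import MInference

model_name = "gradientai/Llama-3-8B-Instruct-262k"  # example long-context model
pipe = pipeline("text-generation", model=model_name, torch_dtype="auto", device="cuda")

# Patch the loaded model so long-context prefill uses the sparse attention kernel.
minference_patch = MInference("minference", model_name)
pipe.model = minference_patch(pipe.model)

print(pipe("Summarize the following document: ...", max_new_tokens=64))
```

This is the same kernel that the linked SGLang pull request merged and adapted for FlashAttention-3.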

