Development Roadmap (2025 H1) #4042
Comments
Hi, as part of long-context optimization, will the implementation of HiP attention (#3930) be considered?
@Swipe4057 Thanks. We will review this and merge it.
Hi @artetaout, now it is
Hi @zhyncs, could you please clarify your plans for unsloth model support a bit? Will you be supporting unsloth's 1.58-bit dynamic quantization for deepseek-r1?
Hi @SandroPats, please join https://slack.sglang.ai and discuss this in the #quantization channel. Thanks!
Do we plan to get performance improvements via te.layernorm_mlp or te.layernorm_linear? I've integrated them, but didn't see any improvement in bf16.
In TE, if you want to enable TP overlap in inference only, you need to split the sequences manually (SP/TP). I guess that is probably why you did not see an improvement. You can join https://slack.sglang.ai/ and find me to discuss further.
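For reference, here is a minimal sketch of how the fused Transformer Engine modules mentioned above are typically instantiated (assuming `transformer_engine.pytorch` is installed and a CUDA GPU is available; the hidden/FFN sizes, batch/sequence shapes, and dtype below are illustrative, not SGLang's actual configuration):

```python
# Minimal sketch of the fused TE modules discussed above.
# Assumes NVIDIA Transformer Engine (transformer_engine.pytorch) and a CUDA GPU;
# all sizes are illustrative, not SGLang's actual model configuration.
import torch
import transformer_engine.pytorch as te

hidden_size, ffn_size = 4096, 11008
x = torch.randn(8, 128, hidden_size, dtype=torch.bfloat16, device="cuda")

# LayerNormLinear fuses the input LayerNorm with the following projection
# (e.g., the QKV projection) into a single module.
ln_linear = te.LayerNormLinear(
    hidden_size, 3 * hidden_size, params_dtype=torch.bfloat16
).cuda()

# LayerNormMLP fuses LayerNorm + fc1 + activation + fc2 into a single module.
ln_mlp = te.LayerNormMLP(
    hidden_size, ffn_size, params_dtype=torch.bfloat16
).cuda()

qkv = ln_linear(x)  # shape: (8, 128, 3 * hidden_size)
out = ln_mlp(x)     # shape: (8, 128, hidden_size)
```

Note that, as the reply above suggests, the fusion by itself may not yield a bf16 speedup at inference time: without manually splitting sequences (SP/TP), the TP communication cannot overlap with these fused kernels.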
Here is the development roadmap for 2025 H1. Contributions and feedback are welcome (join the Bi-weekly Development Meeting). The previous 2024 Q4 roadmap can be found in #1487.
Focus
- Parallelism
- Caching
- Kernel
- Quantization
- RL Framework integration
- Core refactor: `scheduler.py` and `model_runner.py`, to make them more modular
- Speculative decoding
- Multi-LoRA serving
- Hardware
- Model coverage
- Function Calling
- Others
See sglang/docs/references/faq.md, line 3 at commit 8912b76.