Highlights
- DeepSeek-R1 Multi-Node H100 Deployment Support
- FlashInfer Integration
- XGrammar Integration
What's Changed
- Benchclient by @shihaobai in #740
- fix pause reqs by @shihaobai in #741
- add RETURN_LIST for tgi_api by @shihaobai in #742
- fix: fix a precision bug in the context_flashattention by @blueswhen in #743
- Improve the accuracy of deepseekv3 by @hiworldwzj in #744
- deepseekv3 bmm noquant and fix moe gemm bug. by @hiworldwzj in #745
- Add Xgrammar Support by @flyinglandlord in #701
- fuse fp8 quant in kv copying and add flashinfer decode mla operator in the attention module by @blueswhen in #737
- fix: add flashinfer-python in the requirements.txt by @blueswhen in #749
- Fix tokens2 by @SangChengC in #748
- Fix Unit-test in PR: Add xgrammar by @flyinglandlord in #750
- add support for multinode tp by @shihaobai in #751
Full Changelog: v1.0.0...v1.0.1