<!-- .github/pull_request_template.md -->
## 📌 Description
This PR reverts #1774 and #1835, which have issues with some shapes
under CUDA graph. The kernels ported in this PR come from SGLang:
[[NVIDIA] [1/N] Nvfp4 Masked Gemm: Add quant op for the flashinfer
grouped gemm](https://github.com/sgl-project/sglang/pull/9200/files) and
[[NVIDIA] [2/N] Optimize silu_and_mul_scaled_fp4_grouped_quant
perf](https://github.com/sgl-project/sglang/pull/9556/files) by @kaixih.
## 🔍 Related Issues
<!-- Link any related issues here -->
## 🚀 Pull Request Checklist
Thank you for contributing to FlashInfer! Before we review your pull
request, please make sure the following items are complete.
### ✅ Pre-commit Checks
- [ ] I have installed `pre-commit` by running `pip install pre-commit`
(or used your preferred method).
- [ ] I have installed the hooks with `pre-commit install`.
- [ ] I have run the hooks manually with `pre-commit run --all-files`
and fixed any reported issues.
> If you are unsure about how to set up `pre-commit`, see [the
pre-commit documentation](https://pre-commit.com/).
## 🧪 Tests
- [ ] Tests have been added or updated as needed.
- [ ] All tests are passing (`unittest`, etc.).
## Reviewer Notes
<!-- Optional: anything you'd like reviewers to focus on, concerns, etc.
-->
<!-- This is an auto-generated comment: release notes by coderabbit.ai
-->
## Summary by CodeRabbit
* **New Features**
  - Added grouped FP4 quantization (`scaled_fp4_grouped_quantize`) and an
    NV-focused Silu+Mul expert quantization entry
    (`silu_and_mul_scaled_nvfp4_experts_quantize`).
* **API Changes**
  - Replaced legacy batched APIs with new expert/grouped APIs; removed
    legacy mask parameter from FP4/MXFP8 quantization signatures and
    adjusted FP4 output layouts/types.
* **Documentation**
  - Updated docs to list new functions and remove deprecated symbols.
* **Tests**
  - Updated tests to validate new quantization paths, shapes, dtypes, and
    layouts.
<!-- end of auto-generated comment: release notes by coderabbit.ai -->
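For reviewers who want a mental model of what a fused Silu+Mul grouped FP4
quantization computes, here is a minimal NumPy sketch. It is not the kernel
in this PR and does not reproduce its signatures or output layout. The
function names, the `rows_per_expert` argument, the 16-element block width,
and the float32 block scales are all assumptions made for illustration
(real NVFP4 stores FP8 block scales plus a global scale and packs two
4-bit values per byte).

```python
import numpy as np

# Magnitudes representable by a 4-bit E2M1 float (sign handled separately).
E2M1_VALUES = np.array([0.0, 0.5, 1.0, 1.5, 2.0, 3.0, 4.0, 6.0], dtype=np.float32)
BLOCK = 16  # assumed scaling-block width


def silu_and_mul(x):
    """Split the last dim in half and return silu(gate) * up."""
    gate, up = np.split(x.astype(np.float32), 2, axis=-1)
    return gate / (1.0 + np.exp(-gate)) * up


def quantize_fp4_blocks(y):
    """Snap each BLOCK-wide group of y onto the scaled E2M1 grid."""
    blocks = y.reshape(*y.shape[:-1], -1, BLOCK)
    amax = np.abs(blocks).max(axis=-1, keepdims=True)
    # One scale per block so that the block maximum maps to 6.0;
    # all-zero blocks get a scale of 1 to avoid dividing by zero.
    scale = np.where(amax > 0, amax / E2M1_VALUES[-1], 1.0).astype(np.float32)
    scaled = blocks / scale
    # Round each magnitude to the nearest representable E2M1 value.
    idx = np.abs(np.abs(scaled)[..., None] - E2M1_VALUES).argmin(axis=-1)
    q = np.sign(scaled) * E2M1_VALUES[idx]
    return q.reshape(y.shape), scale.squeeze(-1)


def silu_and_mul_fp4_reference(x, rows_per_expert):
    """x: (experts, max_rows, 2*k); rows past rows_per_expert[e] are padding."""
    y = silu_and_mul(x)
    valid = np.arange(x.shape[1])[None, :] < np.asarray(rows_per_expert)[:, None]
    y = np.where(valid[..., None], y, 0.0)  # zero out padded rows
    return quantize_fp4_blocks(y)
```

Multiplying the returned values by their block scales gives an
approximation of `silu_and_mul(x)` for the valid rows, which is one way a
test could compare a kernel's dequantized output against a reference.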
---------
Signed-off-by: Shu Wang. <[email protected]>