Commit e450c7d
Fix moe fp8 failure for sm121 (#2061)
## Description
Fix the FP8 MoE failure on SM121 seen in this CI
[pipeline](https://gitlab-master.nvidia.com/dl/flashinfer/flashinfer-ci/-/jobs/230180150).
## Related Issues
## Pull Request Checklist
Thank you for contributing to FlashInfer! Before we review your pull
request, please make sure the following items are complete.
### Pre-commit Checks
- [ ] I have installed `pre-commit` by running `pip install pre-commit`
(or used your preferred method).
- [ ] I have installed the hooks with `pre-commit install`.
- [ ] I have run the hooks manually with `pre-commit run --all-files`
and fixed any reported issues.
> If you are unsure about how to set up `pre-commit`, see [the
pre-commit documentation](https://pre-commit.com/).
## Tests
- [ ] Tests have been added or updated as needed.
- [ ] All tests are passing (`unittest`, etc.).
## Reviewer Notes
## Summary by CodeRabbit
* **Bug Fixes**
  * Extended FP8 grouped matrix-multiplication (GEMM) support to an
    additional GPU architecture, SM121, which now receives the same
    optimized tile configuration options as the previously supported SM
    variants. This broadens hardware compatibility for FP8 workloads and
    keeps performance consistent across these architectures.
Co-authored-by: Zihao Ye <[email protected]>

Parent commit: c8f2b03
1 file changed (1 addition, 1 deletion) under
`csrc/nv_internal/tensorrt_llm/kernels/cutlass_kernels`.
Diff (changed code not captured in this copy): original lines 158 to 164 are shown as context; original line 161 was deleted and replaced by new line 161.