Commit 4aed50c
perf: enable pdl for cutlass fp4 gemm (#2095)
<!-- .github/pull_request_template.md -->
## 📌 Description
The `enablePDL` flag was set to `false` in the CUTLASS FP4 GEMM paths; this PR turns it on.
It is now `true` for both the sm_100 and sm_120 paths, because both
architectures should support PDL (programmatic dependent launch).
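For readers unfamiliar with PDL, the following is a minimal sketch of how a launch can opt in to programmatic dependent launch using the plain CUDA runtime API. It is not the FlashInfer or CUTLASS code touched by this PR: `scale_kernel`, `make_pdl_attribute`, and `launch_scale` are hypothetical names chosen for illustration, and the boolean parameter only mirrors the role that an `enablePDL`-style flag plays. It assumes a CUDA 12 toolkit and compilation for sm_90 or newer for the device-side synchronization call to take effect.

```cuda
#include <cassert>
#include <cuda_runtime.h>

// Hypothetical downstream kernel that consumes the output of an earlier kernel.
__global__ void scale_kernel(const float* in, float* out, int n) {
#if defined(__CUDA_ARCH__) && __CUDA_ARCH__ >= 900
  // With PDL the grid may start before the preceding kernel has finished,
  // so wait here until that kernel has completed and its writes are visible.
  cudaGridDependencySynchronize();
#endif
  int i = blockIdx.x * blockDim.x + threadIdx.x;
  if (i < n) out[i] = 2.0f * in[i];
}

// Builds the launch attribute that allows (1) or disallows (0) overlapping
// this launch with the tail of the previous kernel in the same stream.
inline cudaLaunchAttribute make_pdl_attribute(bool enable_pdl) {
  cudaLaunchAttribute attr = {};
  attr.id = cudaLaunchAttributeProgrammaticStreamSerialization;
  attr.val.programmaticStreamSerializationAllowed = enable_pdl ? 1 : 0;
  return attr;
}

// Launches scale_kernel with or without PDL; assumes n > 0.
inline cudaError_t launch_scale(const float* in, float* out, int n,
                                cudaStream_t stream, bool enable_pdl) {
  cudaLaunchAttribute attr = make_pdl_attribute(enable_pdl);
  cudaLaunchConfig_t config = {};
  config.gridDim = dim3((n + 255) / 256);
  config.blockDim = dim3(256);
  config.dynamicSmemBytes = 0;
  config.stream = stream;
  config.attrs = &attr;
  config.numAttrs = 1;
  return cudaLaunchKernelEx(&config, scale_kernel, in, out, n);
}
```

Calling `launch_scale(d_in, d_out, n, stream, true)` right after the kernel that produces `d_in` lets the runtime begin scheduling `scale_kernel` while the producer is still draining. Passing `false` gives ordinary stream-ordered behavior, which is roughly the state the GEMM launches were in before this change.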
## 🔍 Related Issues
<!-- Link any related issues here -->
## 🚀 Pull Request Checklist
Thank you for contributing to FlashInfer! Before we review your pull
request, please make sure the following items are complete.
### ✅ Pre-commit Checks
- [ ] I have installed `pre-commit` by running `pip install pre-commit`
(or used your preferred method).
- [ ] I have installed the hooks with `pre-commit install`.
- [ ] I have run the hooks manually with `pre-commit run --all-files`
and fixed any reported issues.
> If you are unsure about how to set up `pre-commit`, see [the
pre-commit documentation](https://pre-commit.com/).
## 🧪 Tests
- [ ] Tests have been added or updated as needed.
- [ ] All tests are passing (`unittest`, etc.).
## Reviewer Notes
<!-- Optional: anything you'd like reviewers to focus on, concerns, etc.
-->
<!-- This is an auto-generated comment: release notes by coderabbit.ai
-->
## Summary by CodeRabbit
* **Refactor**
* Updated runtime configuration for FP4 GEMM operations to enhance
execution performance on SM100 and SM120 GPU architectures.
<!-- end of auto-generated comment: release notes by coderabbit.ai -->

1 parent d42b71f commit 4aed50c
2 files changed in include/flashinfer/gemm (+2, -2 lines)

| Original file line number | Diff line number | Diff line change |
|---|---|---|
| 273 | 273 | |
| 274 | 274 | |
| 275 | 275 | |
| 276 | | - |
| | 276 | + |
| 277 | 277 | |
| 278 | 278 | |
| 279 | 279 | |

| Original file line number | Diff line number | Diff line change |
|---|---|---|
| 257 | 257 | |
| 258 | 258 | |
| 259 | 259 | |
| 260 | | - |
| | 260 | + |
| 261 | 261 | |
| 262 | 262 | |
| 263 | 263 | |