Commit a4ddf26
authored
PDL patch for TGV GEMM (#1877)
<!-- .github/pull_request_template.md -->
## 📌 Description
<!-- What does this PR do? Briefly describe the changes and why they’re
needed. -->
## 🔍 Related Issues
Add missing nvcc flags to TGV gemm including the one that enables PDL
feature.
Also add safe guarding checks to TGV gemm to prevent tensor size being
too large for TMA.
## 🚀 Pull Request Checklist
Thank you for contributing to FlashInfer! Before we review your pull
request, please make sure the following items are complete.
### ✅ Pre-commit Checks
- [x] I have installed `pre-commit` by running `pip install pre-commit`
(or used your preferred method).
- [x] I have installed the hooks with `pre-commit install`.
- [x] I have run the hooks manually with `pre-commit run --all-files`
and fixed any reported issues.
> If you are unsure about how to set up `pre-commit`, see [the
pre-commit documentation](https://pre-commit.com/).
## 🧪 Tests
- [x] Tests have been added or updated as needed.
- [x] All tests are passing (`unittest`, etc.).
## Reviewer Notes
<!-- Optional: anything you'd like reviewers to focus on, concerns, etc.
-->1 parent 792dcb1 commit a4ddf26
3 files changed
+16
-2
lines changed| Original file line number | Diff line number | Diff line change | |
|---|---|---|---|
| |||
148 | 148 | | |
149 | 149 | | |
150 | 150 | | |
| 151 | + | |
| 152 | + | |
| 153 | + | |
| 154 | + | |
| 155 | + | |
| 156 | + | |
| 157 | + | |
| 158 | + | |
151 | 159 | | |
152 | 160 | | |
153 | 161 | | |
| |||
| Original file line number | Diff line number | Diff line change | |
|---|---|---|---|
| |||
584 | 584 | | |
585 | 585 | | |
586 | 586 | | |
587 | | - | |
| 587 | + | |
588 | 588 | | |
589 | 589 | | |
590 | 590 | | |
| |||
| Original file line number | Diff line number | Diff line change | |
|---|---|---|---|
| |||
445 | 445 | | |
446 | 446 | | |
447 | 447 | | |
448 | | - | |
| 448 | + | |
| 449 | + | |
| 450 | + | |
| 451 | + | |
| 452 | + | |
| 453 | + | |
| 454 | + | |
449 | 455 | | |
450 | 456 | | |
451 | 457 | | |
| |||
0 commit comments