Add cute dsl mla decode op #2743
Merged
+10,449
−45
35 commits
0129aeb feat: Add CuTe DSL MLA decode kernel for Blackwell SM100 (limin2021)
d95da37 style: Fix pre-commit lint/format/type errors in MLA decode kernel files (limin2021)
5cc2f74 chore: Update copyright year to 2026 (limin2021)
3c38f20 feat: Add dtype assertions and FP8 tests for cute_dsl_mla_decode (limin2021)
7b12c25 perf: Reduce host overhead in cute_dsl_mla_decode (limin2021)
a8c3b5a perf: Move permute logic into kernel __call__ to eliminate Python-sid… (limin2021)
7420b6b feat: Add is_var_split_kv parameter and workspace size check to cute_… (limin2021)
c480145 style: Fix trailing whitespace and ruff formatting (limin2021)
b3b0f8b perf: Simplify split_kv computation and remove is_var_split_kv parameter (limin2021)
020fea5 feat: Add BFloat16 support to CuTe DSL MLA decode kernel (limin2021)
842b624 minor. (limin2021)
deee819 format (limin2021)
5cef493 refactor: Replace hardcoded MLA config constants with function parame… (limin2021)
2ece5a7 refactor: Split can_implement check from kernel compilation to avoid … (limin2021)
98eae77 fix: Align cute-dsl output shape with trtllm-gen and fix tensor scale… (limin2021)
a4a8723 fix workspace None issue. (limin2021)
c2e769e fix: align assumed_align values with kernel's from_dlpack settings (limin2021)
b66eb4e perf: add divisibility hints and opt-level 2 for CuTe DSL MLA compila… (limin2021)
114460f format. (limin2021)
104c9fe feat: add is_var_seq parameter for auto persistent/non-persistent str… (limin2021)
0999000 doc update (limin2021)
80a93fd fix: address review feedback for CuTe DSL MLA decode (limin2021)
9dddc5b fix: resolve merge conflict and validate unsupported args in cute-dsl… (limin2021)
72bec07 fix: add compat shim for cutlass-dsl setmaxregister API (limin2021)
a913f90 refactor: move MLA CuTe DSL kernels to flashinfer/mla/cute_dsl/ (limin2021)
82cc2c3 fix: update copyright year to 2026 in flashinfer/mla/ (limin2021)
2423c57 fix: update copyright years to 2026 (limin2021)
037eab6 fix: add compat shim for cutlass-dsl get_max_tmem_alloc_cols API (limin2021)
c09a2cc feat: add cute-dsl backend to test_trtllm_gen_mla uniform testing (limin2021)
96fb34e feat: add cute-dsl backend support for MLA microbenchmark (limin2021)
0455e5c Merge remote-tracking branch 'origin/main' into add_cute_dsl_mla_new (limin2021)
fcc1b78 Merge origin/main into add_cute_dsl_mla_new (limin2021)
81ec3aa feat: support flexible output dtype for CuTe DSL MLA FP8 decode kernel (limin2021)
4d221fd fix: skip CuTe DSL MLA tests on unsupported archs (SM120+) (limin2021)
adafa35 Merge remote-tracking branch 'origin/main' into add_cute_dsl_mla_new (limin2021)
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,15 @@ | ||
| # Copyright (c) 2026 by FlashInfer team. | ||
| # | ||
| # Licensed under the Apache License, Version 2.0 (the "License"); | ||
| # you may not use this file except in compliance with the License. | ||
| # You may obtain a copy of the License at | ||
| # | ||
| # http://www.apache.org/licenses/LICENSE-2.0 | ||
| # | ||
| # Unless required by applicable law or agreed to in writing, software | ||
| # distributed under the License is distributed on an "AS IS" BASIS, | ||
| # WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. | ||
| # See the License for the specific language governing permissions and | ||
| # limitations under the License. | ||
| | ||
| from ._core import * # noqa: F401,F403 |
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,30 @@ | ||
| # Copyright (c) 2026 by FlashInfer team. | ||
| # | ||
| # Licensed under the Apache License, Version 2.0 (the "License"); | ||
| # you may not use this file except in compliance with the License. | ||
| # You may obtain a copy of the License at | ||
| # | ||
| # http://www.apache.org/licenses/LICENSE-2.0 | ||
| # | ||
| # Unless required by applicable law or agreed to in writing, software | ||
| # distributed under the License is distributed on an "AS IS" BASIS, | ||
| # WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. | ||
| # See the License for the specific language governing permissions and | ||
| # limitations under the License. | ||
| """ | ||
| CuTe DSL MLA Decode Kernels for Blackwell SM100. | ||
| """ | ||
| | ||
| from flashinfer.cute_dsl.utils import is_cute_dsl_available | ||
| | ||
| if is_cute_dsl_available(): | ||
| from .mla_decode import cute_dsl_mla_decode | ||
| | ||
| __all__ = [ | ||
| "is_cute_dsl_available", | ||
| ] | ||
| | ||
| if is_cute_dsl_available(): | ||
| __all__ += [ | ||
| "cute_dsl_mla_decode", | ||
| ] |
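The guarded export in the file above (import the kernel and extend `__all__` only when the optional backend is present) can be tried in isolation. The following is a minimal sketch, not FlashInfer's code: `is_backend_available` and `build_exports` are hypothetical names, and `importlib.util.find_spec` is only an assumed stand-in for whatever `is_cute_dsl_available` actually checks.

```python
from __future__ import annotations

import importlib.util


def is_backend_available(module_name: str) -> bool:
    """Hypothetical probe: True if `module_name` can be located for import."""
    try:
        return importlib.util.find_spec(module_name) is not None
    except (ImportError, ValueError):
        # A missing parent package or a module without a spec counts as unavailable.
        return False


def build_exports(module_name: str, optional_names: list[str]) -> list[str]:
    """Mirror the __init__.py logic: always export the probe itself,
    and add the optional names only when the backend is available."""
    exports = ["is_backend_available"]
    if is_backend_available(module_name):
        exports += optional_names
    return exports


if __name__ == "__main__":
    # "json" ships with Python, so the optional name is exported.
    print(build_exports("json", ["cute_dsl_mla_decode"]))
    # A module that does not exist leaves only the probe in the export list.
    print(build_exports("no_such_backend_module", ["cute_dsl_mla_decode"]))
```

With this shape, `import` of the package still succeeds on machines without the optional dependency, and callers can query the probe before reaching for the kernel entry point.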
We are moving kernels out of this directory. Maybe create a module `flashinfer/mla/` and move these kernels under `flashinfer/mla/cute_dsl`?
Done.