Mamba SSU: better automatic kernel selection + algorithm selection optionally exposed to the user. #2591
Merged
38 commits
d0a53b5 adopt reference implementation from sglang (ishovkun)
320a72c Extract create_test_inputs to shared test_utils module (ishovkun)
4022f10 Rename test to reflect that it's a single-token test file (ishovkun)
a8bc286 Add multi-token support to the interface of selective_state_update (ishovkun)
2e70ea4 Refactor selective_state_update: add validation helpers and update param (ishovkun)
295ae56 Non-contiguous state (ishovkun)
5541624 Simplify code for template dispatching (ishovkun)
ab33cc1 Refactor dispatch logic in selective_state_update.cuh (ishovkun)
26271a9 Refactor pointer alignment checking away from the logic. (ishovkun)
f3f02f5 Support int32 and int64 state_batch_indices in selective_state_update (ishovkun)
1cb4ac7 Refactor Mamba selective state update kernel dispatch and add dtype (ishovkun)
3265bd5 Merge branch 'flashinfer-ai:main' into main (ishovkun)
9d6d35c Fix simple stp kernel to only write state if a flag is provided (ishovkun)
5b5756d Fix Triton kernel intermediate state caching to match CUDA behavior (ishovkun)
e3f751e Merge branch 'main' of github.com:ishovkun/flashinfer-dev (ishovkun)
fb693d0 Add Mamba2 SSD chunk scan test and reorganize Triton refs (ishovkun)
0ce5d47 Merge branch 'main' of github.com:ishovkun/flashinfer-dev (ishovkun)
304fd59 Enable .jinja templates for mamba (ishovkun)
329bfd0 Remove SM100 module, unify SM90+ selective state update handling (ishovkun)
f464097 Add algorithm selection to selective_state_update kernels (ishovkun)
c65670c Fix include order: config.inc before header in selective_state_update… (ishovkun)
44b6c25 Parallelize consumer warp loads in vertical SSU kernel (ishovkun)
eff403c Reduce test combinations in SSU tests to base + independent deviations (ishovkun)
afc7c6a Add algorithm parameter to selective_state_update tests (ishovkun)
74accb0 Merge branch 'flashinfer-ai:main' into main (ishovkun)
1d42007 Update selective_state_update instantiations to include SSUAlgorithm (ishovkun)
61d88bd Clarify algorithm selection docstring in selective_state_update (ishovkun)
ead4943 Merge branch 'main' of github.com:ishovkun/flashinfer-dev (ishovkun)
6f6a3d7 Remove chunk scan combined kernels as they are irrelevant to this PR (ishovkun)
de96dd5 Remove ssd_chunk_state.py Triton reference implementation (irrelevant to (ishovkun)
4c30f07 Delete test_utils.py (ishovkun)
1f1c2f4 Suppress mypy false positive for gen_selective_state_update calls (ishovkun)
157ecb5 Move Triton reference kernel to triton_reference subdir and update (ishovkun)
f32b63b mark an unused variable with "_" in a test (ishovkun)
2656202 rename an unused test variable to _state_ref (ishovkun)
5580d28 Refactor Triton reference import for selective_state_update (ishovkun)
58f56cd Fixes aot compilation of the gdn_prefill_sm90 module (ishovkun)
5d8184e Substantially reduce the number of SSU aot compilation units. Limited to (ishovkun)
| Diff line change |
|---|
| `@@ -0,0 +1,14 @@` |
| `#pragma once` |
| `#include <cuda_bf16.h>` |
| `#include <cuda_fp16.h>` |
| `#include <cstdint>` |
| |
| `using state_t = {{ state_dtype }};` |
| `using input_t = {{ input_dtype }};` |
| `using weight_t = {{ weight_dtype }};` |
| `using matrixA_t = {{ matrixA_dtype }};` |
| `using stateIndex_t = {{ stateIndex_dtype }};` |
| |
| `constexpr int DIM = {{ dim }};` |
| `constexpr int DSTATE = {{ dstate }};` |
| `constexpr int NTOKENS_MTP = {{ ntokens_mtp }};` |
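To make the template above more concrete, here is a minimal sketch of what one rendered instance might look like and how a translation unit could consume it. Everything concrete in it is an assumption for illustration: the chosen types (`float`, `std::int32_t`), the values 64, 128, and 1, the row-major `DIM x DSTATE` tile layout, and the helper names `state_tile_bytes` and `state_offset` are hypothetical and do not come from the PR. The CUDA half-precision headers are left out so the sketch compiles with a plain C++17 host compiler.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <type_traits>

// --- One possible rendering of the template (all values are assumptions) ---
using state_t = float;              // assumed; a real build may pick a CUDA half/bf16 type
using input_t = float;              // assumed
using weight_t = float;             // assumed
using matrixA_t = float;            // assumed
using stateIndex_t = std::int32_t;  // commit f3f02f5 mentions int32 and int64 indices

constexpr int DIM = 64;         // assumed value
constexpr int DSTATE = 128;     // assumed value
constexpr int NTOKENS_MTP = 1;  // assumed value

// --- Hypothetical consumer code: compile-time sanity checks on the config ---
static_assert(std::is_integral_v<stateIndex_t>, "state indices must be an integer type");
static_assert(DIM > 0 && DSTATE > 0 && NTOKENS_MTP >= 1, "dimensions must be positive");

// Bytes occupied by one DIM x DSTATE state tile, known at compile time.
constexpr std::size_t state_tile_bytes() {
  return sizeof(state_t) * static_cast<std::size_t>(DIM) * DSTATE;
}

// Flat offset of element (d, s) inside one tile, assuming a contiguous
// row-major layout (an assumption of this sketch only).
inline std::size_t state_offset(int d, int s) {
  assert(d >= 0 && d < DIM && s >= 0 && s < DSTATE);
  return static_cast<std::size_t>(d) * DSTATE + s;
}
```

Because the types and sizes are fixed when the header is rendered, checks such as the `static_assert`s above cost nothing at run time, and each distinct combination of template values yields a separate compilation unit.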