int16 Block-Scaled State and Stochastic Rounding for SSU (mamba) #2645
Merged
69 commits (all by ishovkun)
d0a53b5 adopt reference implementation from sglang
320a72c Extract create_test_inputs to shared test_utils module
4022f10 Rename test to reflect that it's a single-token test file
a8bc286 Add multi-token support to the interface of selective_state_update
2e70ea4 Refactor selective_state_update: add validation helpers and update param
295ae56 Non-contiguous state
5541624 Simplify code for template dispatching
ab33cc1 Refactor dispatch logic in selective_state_update.cuh
26271a9 Refactor pointer alignment checking away from the logic.
f3f02f5 Support int32 and int64 state_batch_indices in selective_state_update
1cb4ac7 Refactor Mamba selective state update kernel dispatch and add dtype
3265bd5 Merge branch 'flashinfer-ai:main' into main
9d6d35c Fix simple stp kernel to only write state if a flag is provided
5b5756d Fix Triton kernel intermediate state caching to match CUDA behavior
e3f751e Merge branch 'main' of github.com:ishovkun/flashinfer-dev
fb693d0 Add Mamba2 SSD chunk scan test and reorganize Triton refs
0ce5d47 Merge branch 'main' of github.com:ishovkun/flashinfer-dev
304fd59 Enable .jinja templates for mamba
329bfd0 Remove SM100 module, unify SM90+ selective state update handling
f464097 Add algorithm selection to selective_state_update kernels
c65670c Fix include order: config.inc before header in selective_state_update…
44b6c25 Parallelize consumer warp loads in vertical SSU kernel
eff403c Reduce test combinations in SSU tests to base + independent deviations
afc7c6a Add algorithm parameter to selective_state_update tests
74accb0 Merge branch 'flashinfer-ai:main' into main
1d42007 Update selective_state_update instantiations to include SSUAlgorithm
61d88bd Clarify algorithm selection docstring in selective_state_update
ead4943 Merge branch 'main' of github.com:ishovkun/flashinfer-dev
6f6a3d7 Remove chunk scan combined kernels as they are irrelevant to this PR
de96dd5 Remove ssd_chunk_state.py Triton reference implementation (irrelevant to…
4c30f07 Delete test_utils.py
1f1c2f4 Suppress mypy false positive for gen_selective_state_update calls
157ecb5 Move Triton reference kernel to triton_reference subdir and update…
f32b63b mark an unused variable with "_" in a test
2656202 rename an unused test variable to _state_ref
5580d28 Refactor Triton reference import for selective_state_update
8738964 Add int16 state quantization with block scaling to…
02db096 Add int16 quantized state support to selective_state_update
58f56cd Fixes aot compilation of the gdn_prefill_sm90 module
d4e33de Merge branch 'main' into ssu_int16
5d8184e Substantially reduce the number of SSU aot compilation units. Limited to…
9775391 Merge branch 'main' into ssu_int16
7f1173f Add int16 support for block scaling in selective_state_update kernel
35cc7ba Add int16 block scaling support to selective_state_update MTP
6cf61b7 Fix rNewState array size calculation for scaleState flag
e9ab619 Refactor selective_state_update to use state_scale dtype
60b627e Add Philox-4x32 PRNG matching Triton tl.randint and tests
b873d10 Refactor philox_randint to template and add rounding tests
3292662 Stochastic rounding support for fp16 state update (plumbing)
b206a5f Implement stochastic rounding for fp16 state in selective_state_update
c70efbd Optimize Philox PRNG usage in selective_state_update kernel
181d80d Fix Philox random offset calculation for state updates
fd1af7c Remove .plans directory from .gitignore
ff8dfde Merge branch 'ssu_int16': int16 block-scaled state and stochastic rou…
60bbb5d Merge remote-tracking branch 'upstream/main'
c01eced Replace asserts with if checks in the python wrapper
0bf77aa Remove redundant dtype check for state_batch_indices and…
deb48a8 Use tuples instead of lists for parameter sets in tests
e1d9dc3 Fix selective_state_update argument order for state_scale_dtype
b30db63 Replace float pointer casts with __float_as_uint in conversion kernels
317f6bb Handle zero max value in state scaling calculations
852b9a2 Add static_assert for fp16 state in SR branch to check that an edge case…
2ca355e `if not philox_rounds > 0:` → `if philox_rounds <= 0:`, same semantics,…
fa95f9d Fix SM gencode flag to match current device compute capability (fixed…
3b20f6e Fix Triton reference to skip stochastic rounding on pre-SM100a GPUs
f93b325 Rename unused state_dtype and kwargs parameters to _state_dtype and…
6b65cff Change _SR_PARAMS from list to tuple in test_selective_state_update_stp
4cb8c40 Restore issue-claim.yml accidentally deleted during rebase
10a36c4 Refactor selective_state_update to use device-side rand_seed tensor
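The commits above add int16 state quantization with block scaling, a guard for blocks whose maximum is zero, and stochastic rounding. As a rough illustration of those ideas only, here is a minimal pure-Python sketch. It assumes a per-block absmax scale (scale = max|x| / 32767) and uses Python's `random` module in place of the Philox-4x32 generator the PR adds. The function names are hypothetical and are not flashinfer's API. The PR's actual block size, scale dtype (`state_scale_dtype`), and zero-max handling may differ. The commit messages describe stochastic rounding for the fp16 state; the sketch applies the same unbiased rounding rule to the int16 grid purely for illustration.

```python
import math
import random
from typing import List, Optional, Tuple

INT16_MAX = 32767  # symmetric range [-32767, 32767]; -32768 is left unused


def quantize_block_int16(
    block: List[float], rng: Optional[random.Random] = None
) -> Tuple[List[int], float]:
    """Quantize one block to int16 codes sharing a single absmax scale.

    With rng=None, codes use round-to-nearest (Python's round, ties to even).
    With an rng, codes use stochastic rounding.
    """
    max_abs = max((abs(x) for x in block), default=0.0)
    # Assumed zero-max guard: an all-zero block would otherwise give scale 0
    # and a division by zero below. Any nonzero scale decodes zeros correctly.
    if max_abs == 0.0:
        return [0] * len(block), 1.0
    scale = max_abs / INT16_MAX
    codes: List[int] = []
    for x in block:
        y = x / scale  # lies in [-32767, 32767] up to float error
        if rng is None:
            r = round(y)
        else:
            lo = math.floor(y)
            # Round up with probability equal to the fractional part,
            # so the expected code equals y (unbiased rounding).
            r = lo + (1 if rng.random() < y - lo else 0)
        codes.append(max(-INT16_MAX, min(INT16_MAX, r)))  # clamp float overshoot
    return codes, scale


def dequantize_block_int16(codes: List[int], scale: float) -> List[float]:
    """Decode int16 codes back to floats using the block's scale."""
    return [c * scale for c in codes]


def quantize_int16_blocked(
    values: List[float], block_size: int, rng: Optional[random.Random] = None
) -> Tuple[List[int], List[float]]:
    """Split values into fixed-size blocks; return all codes plus one scale per block."""
    codes: List[int] = []
    scales: List[float] = []
    for start in range(0, len(values), block_size):
        q, s = quantize_block_int16(values[start : start + block_size], rng)
        codes.extend(q)
        scales.append(s)
    return codes, scales
```

Round-to-nearest minimizes the error of each individual step, while stochastic rounding accepts a slightly larger per-step error in exchange for zero bias in expectation. That property is the usual motivation for using it on a recurrent state that is re-quantized on every update.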