Support nonpad_kv_seqlen and external kv cache in-place update in Attention-24 (CUDA) #27486
Closed
Commits
28 commits
23f284f refactor (titaiwangms)
de61cdd Add opset 24 nonpad_kv_seqlen tests for Attention op (titaiwangms)
17cbe1d Fix test bugs from code review for Attention-24 nonpad_kv_seqlen (titaiwangms)
e40249e Fix CUDA kernel review findings: GPU bounds clamping and pointer safety (titaiwangms)
bef5b9a lint (titaiwangms)
db65ef5 resolve conflict (titaiwangms)
5b748d0 address review and use tensorscatter op in tests (titaiwangms)
e04597f Update ONNX backend test filters for TensorScatter and nonpad_kv_seqlen (titaiwangms)
efa3ac2 Address review findings: CUDA kernel assertions, pointer safety, and … (titaiwangms)
3649935 Revert TestCase.cc change — QNN does not yet support nonpad_kv_seqlen (titaiwangms)
31aac47 Add causal mode test variants for TensorScatter + nonpad_kv_seqlen (titaiwangms)
a6713ad Address P2 review findings: tolerance, GQA TODO, and parameter docume… (titaiwangms)
acbc057 Add batch=1 edge case test for TensorScatter attention (titaiwangms)
a221099 Merge branch 'main' into titaiwang/support_nonpad_kv_seqlen (titaiwangms)
bd468ca update docs (titaiwangms)
9f9e768 Add GQA decode test cases to test_tensorscatter_attention.py (titaiwangms)
1d01361 Add FlashAttentionForExternalKVCache helper for TensorScatter + Atten… (titaiwangms)
2d06220 Address code review: guard header declaration and consolidate seqlens… (titaiwangms)
5a2475c Fix critical review: OOB guard and BNSH check for flash KV cache (titaiwangms)
cc4c70e Use ORT_MAKE_STATUS instead of ORT_ENFORCE for OOM guard (titaiwangms)
f7c3ae2 Fix GQA prompt + nonpad_kv_seqlen: skip seqlens_k conversion in promp… (titaiwangms)
4f00ed2 Fix GQA prompt + nonpad_kv_seqlen: skip seqlens_k conversion in promp… (titaiwangms)
61ac657 Restore MHA partial masking test coverage per review feedback (titaiwangms)
53bae3d Add warning for GQA prompt + nonpad_kv_seqlen partial masking limitation (titaiwangms)
2678de7 Downgrade GQA prompt nonpad log from WARNING to VERBOSE (titaiwangms)
9be02c9 Reject GQA + prompt + nonpad_kv_seqlen with hard error on CUDA (titaiwangms)
5d187a4 Replace GQA prompt+nonpad hard error with CUDA_KERNEL_ASSERT (titaiwangms)
e4ca928 Add comment on total_seq_lens invariant in GQA prompt mode (titaiwangms)