[Feat] Integrate FIA operator in mla_cp._forward_decode #5641
Merged
30 commits
6ae53b5
integrate FIA operator into mla_cp
08de021
make it more readable
048b04f
Merge branch 'main' of https://github.com/vllm-project/vllm-ascend in…
daafaff
adapt acl_graph in mla_cp FIA
cab49ba
Merge branch 'main' of https://github.com/vllm-project/vllm-ascend in…
452c663
adapt graph mode
6733ce3
support mtp
3650848
Merge branch 'main' of https://github.com/vllm-project/vllm-ascend in…
410be4d
remove redundant attributes
8d06f81
remove data cleaning
1352315
Update vllm_ascend/attention/context_parallel/mla_cp.py
845473182 47072e3
fix lint
120ac20
Merge branch 'FIA_rebase' of https://github.com/845473182/vllm-ascend…
7e899c6
fix lint
40afa15
fix lint
4134757
Merge branch 'main' into FIA_rebase
845473182 c3f5465
fix ut
b559ab0
Merge branch 'FIA_rebase' of https://github.com/845473182/vllm-ascend…
92436a2
fix lint
a2a6f72
[Ops] replace _update_out_and_lse with _npu_attn_out_lse_update
6a563e2
Merge branch 'ops' of https://github.com/YzTongNiar/vllm-ascend into …
73976cb
Merge branch 'main' of https://github.com/vllm-project/vllm-ascend in…
7b1dd4a
fix pre-commit
bba3ddf
restore _process_attn_out_lse
92b50c3
restore _process_attn_out_lse
c51a43b
fix ut
0d80040
Revert "[Ops] replace _update_out_and_lse with _npu_attn_out_lse_update"
188edfa
Merge branch 'main' of https://github.com/vllm-project/vllm-ascend in…
a22aa13
Merge branch 'main' of https://github.com/vllm-project/vllm-ascend in…
8b2138c
Merge branch 'main' of https://github.com/vllm-project/vllm-ascend in…
Dynamic parameters such as `block_table`, `spec_attn_mask`, and `actual_seq_lengths` are not updated during graph replay. They are read from the `param` tuple, which contains values from the time of graph capture. This will cause the replayed graph to execute with stale data, leading to incorrect attention outputs. These parameters must be updated from the current `forward_context` at every step, similar to how `actual_seq_lengths_kv` is being updated.
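
To illustrate the concern, here is a minimal pure-Python sketch of the pattern being asked for. It is not the code from this PR: `CapturedParams`, `StepContext`, `refresh_for_replay`, and the `num_heads` field are hypothetical stand-ins for the captured `param` tuple and the per-step `forward_context`, and plain tuples stand in for tensors. It only shows the idea that every dynamic field is overwritten from the current step before replay, while static fields are reused from capture time.

```python
from dataclasses import dataclass, replace
from typing import Tuple


@dataclass(frozen=True)
class CapturedParams:
    """Hypothetical stand-in for the `param` tuple saved at graph capture."""
    block_table: Tuple[int, ...]
    spec_attn_mask: Tuple[int, ...]
    actual_seq_lengths: Tuple[int, ...]
    actual_seq_lengths_kv: Tuple[int, ...]
    num_heads: int  # assumed static: safe to reuse across replays


@dataclass(frozen=True)
class StepContext:
    """Hypothetical stand-in for the metadata in the current forward_context."""
    block_table: Tuple[int, ...]
    spec_attn_mask: Tuple[int, ...]
    actual_seq_lengths: Tuple[int, ...]
    actual_seq_lengths_kv: Tuple[int, ...]


def refresh_for_replay(captured: CapturedParams, ctx: StepContext) -> CapturedParams:
    """Return replay parameters with every dynamic field taken from `ctx`.

    Static fields (here only `num_heads`) keep their capture-time values.
    """
    return replace(
        captured,
        block_table=ctx.block_table,
        spec_attn_mask=ctx.spec_attn_mask,
        actual_seq_lengths=ctx.actual_seq_lengths,
        actual_seq_lengths_kv=ctx.actual_seq_lengths_kv,
    )


# Values recorded when the graph was captured.
captured = CapturedParams(
    block_table=(0, 1),
    spec_attn_mask=(1, 0),
    actual_seq_lengths=(1, 1),
    actual_seq_lengths_kv=(4, 4),
    num_heads=8,
)

# Values for the decode step that is about to be replayed.
ctx = StepContext(
    block_table=(2, 3),
    spec_attn_mask=(1, 1),
    actual_seq_lengths=(1, 2),
    actual_seq_lengths_kv=(5, 6),
)

fresh = refresh_for_replay(captured, ctx)
```

Refreshing only `actual_seq_lengths_kv` would leave `block_table`, `spec_attn_mask`, and `actual_seq_lengths` at their capture-time values, which is the stale-data case described above. How the refreshed values are handed to the captured graph in the real code path is not shown here.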