[Attention][4/n] Remove usage of deprecated seq_lens_cpu and num_computed_tokens_cpu CommonAttentionMetadata properties#32073
Conversation
There was a problem hiding this comment.
Code Review
This pull request is part of a series to remove deprecated properties from CommonAttentionMetadata, specifically seq_lens_cpu and num_computed_tokens_cpu. The changes correctly replace direct access to these properties by deriving the values from other available metadata, often directly on the GPU to avoid device-to-host synchronization. This refactoring is applied consistently across various components, including attention backends and speculative decoding logic. The changes are correct, improve code clarity, and should lead to better performance. I have not found any issues of high or critical severity.
|
This pull request has merge conflicts that must be resolved before it can be |
9a68b38 to
a75a466
Compare
8bb7c1f to
1093c10
Compare
benchislett
left a comment
There was a problem hiding this comment.
I don't understand why we're moving from .seq_lens_cpu to .seq_lens.cpu(). It feels like this would introduce more DtoH copies than would otherwise be necessary.
| @@ -266,6 +266,7 @@ def build( | |||
| num_tokens = common_attn_metadata.num_actual_tokens | |||
|
|
|||
| query_start_loc_cpu = common_attn_metadata.query_start_loc_cpu | |||
| seq_lens_cpu = common_attn_metadata.seq_lens.cpu() | |||
There was a problem hiding this comment.
move this so it only happens when we has num_prefills > 0
benchislett
left a comment
There was a problem hiding this comment.
Given that this impacts only rare cases and will eventually be phased out entirely, I'm happy with this change. Just needs one line moved to avoid synchronizing on the decode codepath when it's not needed
|
This pull request has merge conflicts that must be resolved before it can be |
9b1ea04 to
36f2a9d
Compare
aa13fd0 to
27a5618
Compare
|
Hi @LucasWilkinson, the pre-commit checks have failed. Please run: uv pip install pre-commit
pre-commit install
pre-commit run --all-filesThen, commit the changes and push to your branch. For future commits, Tip Is
|
059a956 to
b40c717
Compare
|
This pull request has merge conflicts that must be resolved before it can be |
Signed-off-by: Lucas Wilkinson <lwilkins@redhat.com>
b40c717 to
e854658
Compare
|
Hi @LucasWilkinson, the pre-commit checks have failed. Please run: uv pip install pre-commit
pre-commit install
pre-commit run --all-filesThen, commit the changes and push to your branch. For future commits, Tip Is
|
Signed-off-by: Lucas Wilkinson <lwilkins@redhat.com>
Continue deprecating
seq_lens_cpuandnum_computed_tokens_cpuDon't fully deprecate
_seq_lens_cpudue to FlashInfer perf regressions, see: #33771