
[Attention][2/n] Remove usage of deprecated seq_lens_cpu and num_computed_tokens_cpu CommonAttentionMetadata properties #31774

Merged
DarkLight1337 merged 5 commits into vllm-project:main from
neuralmagic:deprecate-cpu-props/ssm-backends
Jan 6, 2026

@LucasWilkinson
Collaborator

@LucasWilkinson LucasWilkinson commented Jan 6, 2026

Update SSM backends

@mergify mergify bot added the v1 label Jan 6, 2026
Contributor

@gemini-code-assist gemini-code-assist bot left a comment

Code Review

This pull request effectively removes the usage of the deprecated num_computed_tokens_cpu property from CommonAttentionMetadata in the GDN and Mamba attention backends. The changes correctly replace the deprecated property with direct computation. In the Mamba backends, this is cleanly handled by introducing a new helper method _get_num_computed_tokens_cpu.

I've identified one area for improvement in gdn_attn.py where a line of code appears to be redundant after the refactoring. Please see the specific comment for details.

Comment on lines +375 to +376
# Note: Setting _num_computed_tokens_cpu directly for cudagraph capture
m._num_computed_tokens_cpu = m.seq_lens.cpu() - num_accepted_tokens.cpu()
Contributor

Priority: high

The build method, which is called by this function, no longer relies on m.num_computed_tokens_cpu. Instead, it computes context_lens from scratch. As a result, setting m._num_computed_tokens_cpu here is redundant and has no effect. This line and the comment above it can be safely removed.
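To illustrate the reviewer's point, here is a minimal sketch of why the write is dead code. It assumes the builder derives context lengths from its inputs (`seq_lens` and per-request query lengths) on every call. `FakeMetadata` and `build_context_lens` are hypothetical names, not vLLM's API, and plain lists stand in for torch tensors. Because nothing reads the private cache attribute, populating it beforehand cannot change the result.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class FakeMetadata:
    # Hypothetical stand-in for CommonAttentionMetadata (lists instead of tensors).
    query_start_loc: list[int]  # cumulative query offsets, length num_reqs + 1
    seq_lens: list[int]  # total tokens per request
    # Private cache that the old code populated before calling build().
    _num_computed_tokens_cpu: Optional[list[int]] = None


def build_context_lens(m: FakeMetadata) -> list[int]:
    """Recompute context lengths from scratch; the cache attribute is never read."""
    q = m.query_start_loc
    query_lens = [end - start for start, end in zip(q[:-1], q[1:])]
    return [s - ql for s, ql in zip(m.seq_lens, query_lens)]


meta = FakeMetadata(query_start_loc=[0, 2, 4], seq_lens=[5, 3])
before = build_context_lens(meta)
meta._num_computed_tokens_cpu = [999, 999]  # the now-redundant write
after = build_context_lens(meta)
print(before, after)
```

Both calls return the same context lengths, so the assignment and its comment can be dropped without changing behavior.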

@LucasWilkinson LucasWilkinson added the ready ONLY add when PR is ready to merge/full CI is needed label Jan 6, 2026
…ends

Replace deprecated CommonAttentionMetadata.num_computed_tokens_cpu and
seq_lens_cpu properties with explicit computation:
  num_computed_tokens = seq_lens - query_lens

Changes:
- mamba_attn.py: Add helper method _get_num_computed_tokens_cpu() and use it
- mamba2_attn.py: Use inherited helper from base class
- gdn_attn.py: Inline the computation
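The split described in the list above (a shared helper in the Mamba base builder that the Mamba2 builder inherits) can be sketched as follows. This is an illustration under assumptions, not the code merged in this PR: the class names are hypothetical, plain lists replace the torch tensors the real metadata holds, and per-request query lengths are assumed to come from cumulative offsets in a `query_start_loc_cpu` field.

```python
from dataclasses import dataclass


@dataclass
class MetadataSketch:
    # Hypothetical stand-in for CommonAttentionMetadata; plain lists replace tensors.
    query_start_loc_cpu: list[int]  # cumulative query offsets, length num_reqs + 1
    seq_lens: list[int]  # total tokens per request (already computed + new)


class MambaBuilderSketch:
    """Base builder owning the shared helper (mirrors the mamba_attn.py change)."""

    def _get_num_computed_tokens_cpu(self, m: MetadataSketch) -> list[int]:
        starts = m.query_start_loc_cpu
        # query_lens[i] = number of new tokens scheduled for request i
        query_lens = [end - start for start, end in zip(starts[:-1], starts[1:])]
        # num_computed_tokens = seq_lens - query_lens
        return [s - q for s, q in zip(m.seq_lens, query_lens)]


class Mamba2BuilderSketch(MambaBuilderSketch):
    """Subclass reusing the inherited helper (mirrors the mamba2_attn.py change)."""

    def build(self, m: MetadataSketch) -> dict:
        return {"num_computed_tokens": self._get_num_computed_tokens_cpu(m)}


meta = MetadataSketch(query_start_loc_cpu=[0, 4, 5, 7], seq_lens=[10, 6, 2])
print(Mamba2BuilderSketch().build(meta))
```

With query lengths of 4, 1, and 2 and sequence lengths of 10, 6, and 2, this prints `{'num_computed_tokens': [6, 5, 0]}`. Keeping the helper on the base class means the subtraction lives in one place for both Mamba backends, while GDN, which does not share that base, inlines the same expression.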

This is part of the deprecation effort for seq_lens_cpu and
num_computed_tokens_cpu properties (to be removed in v0.14.0).

Signed-off-by: Lucas Wilkinson <lwilkins@redhat.com>
@LucasWilkinson LucasWilkinson force-pushed the deprecate-cpu-props/ssm-backends branch from 5f53307 to ef81b2a Compare January 6, 2026 15:43
LucasWilkinson and others added 2 commits January 6, 2026 10:45
Co-authored-by: Cyrus Leung <cyrus.tl.leung@gmail.com>
Signed-off-by: Lucas Wilkinson <LucasWilkinson@users.noreply.github.com>
Signed-off-by: Lucas Wilkinson <lwilkins@redhat.com>
@DarkLight1337 DarkLight1337 enabled auto-merge (squash) January 6, 2026 15:58
@DarkLight1337 DarkLight1337 merged commit 4c73be1 into vllm-project:main Jan 6, 2026
51 checks passed
yugong333 pushed a commit to yugong333/vllm that referenced this pull request Jan 9, 2026
…omputed_tokens_cpu` CommonAttentionMetadata properties (vllm-project#31774)

Signed-off-by: Lucas Wilkinson <lwilkins@redhat.com>
Signed-off-by: Lucas Wilkinson <LucasWilkinson@users.noreply.github.com>
Co-authored-by: Cyrus Leung <cyrus.tl.leung@gmail.com>
akh64bit pushed a commit to akh64bit/vllm that referenced this pull request Jan 16, 2026
…omputed_tokens_cpu` CommonAttentionMetadata properties (vllm-project#31774)

Signed-off-by: Lucas Wilkinson <lwilkins@redhat.com>
Signed-off-by: Lucas Wilkinson <LucasWilkinson@users.noreply.github.com>
Co-authored-by: Cyrus Leung <cyrus.tl.leung@gmail.com>
dsuhinin pushed a commit to dsuhinin/vllm that referenced this pull request Jan 21, 2026
…omputed_tokens_cpu` CommonAttentionMetadata properties (vllm-project#31774)

Signed-off-by: Lucas Wilkinson <lwilkins@redhat.com>
Signed-off-by: Lucas Wilkinson <LucasWilkinson@users.noreply.github.com>
Co-authored-by: Cyrus Leung <cyrus.tl.leung@gmail.com>
Signed-off-by: dsuhinin <suhinin.dmitriy@gmail.com>
ItzDEXX pushed a commit to ItzDEXX/vllm that referenced this pull request Feb 19, 2026
…omputed_tokens_cpu` CommonAttentionMetadata properties (vllm-project#31774)

Signed-off-by: Lucas Wilkinson <lwilkins@redhat.com>
Signed-off-by: Lucas Wilkinson <LucasWilkinson@users.noreply.github.com>
Co-authored-by: Cyrus Leung <cyrus.tl.leung@gmail.com>

Labels

ready ONLY add when PR is ready to merge/full CI is needed v1

2 participants