[Attention][2/n] Remove usage of deprecated `seq_lens_cpu` and `num_computed_tokens_cpu` CommonAttentionMetadata properties by LucasWilkinson · Pull Request #31774 · vllm-project/vllm

LucasWilkinson · 2026-01-06T04:57:04Z

Update SSM backends

gemini-code-assist

Code Review

This pull request removes the usage of the deprecated `num_computed_tokens_cpu` property of `CommonAttentionMetadata` from the GDN and Mamba attention backends, replacing it with direct computation. In the Mamba backends, this is handled cleanly by a new helper method, `_get_num_computed_tokens_cpu`.

I've identified one area for improvement in `gdn_attn.py`, where a line of code appears to be redundant after the refactoring. Please see the specific comment for details.

gemini-code-assist · 2026-01-06T04:59:44Z

vllm/v1/attention/backends/gdn_attn.py

+        # Note: Setting _num_computed_tokens_cpu directly for cudagraph capture
+        m._num_computed_tokens_cpu = m.seq_lens.cpu() - num_accepted_tokens.cpu()


The `build` method, which is called by this function, no longer relies on `m.num_computed_tokens_cpu`. Instead, it computes `context_lens` from scratch. As a result, setting `m._num_computed_tokens_cpu` here is redundant and has no effect. This line and the comment above it can be safely removed.

vllm/v1/attention/backends/mamba_attn.py

…ends

Replace deprecated CommonAttentionMetadata.num_computed_tokens_cpu and seq_lens_cpu properties with explicit computation:

    num_computed_tokens = seq_lens - query_lens

Changes:
- mamba_attn.py: Add helper method _get_num_computed_tokens_cpu() and use it
- mamba2_attn.py: Use inherited helper from base class
- gdn_attn.py: Inline the computation

This is part of the deprecation effort for seq_lens_cpu and num_computed_tokens_cpu properties (to be removed in v0.14.0).

Signed-off-by: Lucas Wilkinson <lwilkins@redhat.com>
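As a rough illustration of the formula in the commit message above, the following sketch derives per-request computed-token counts from sequence lengths and query lengths. It is not the code from this PR: the function name `num_computed_tokens_from_lens`, the use of a cumulative `query_start_loc` array to obtain `query_lens`, and plain Python lists standing in for tensors are all assumptions made for the example.

```python
def num_computed_tokens_from_lens(seq_lens, query_start_loc):
    """Hypothetical helper (not part of vLLM): seq_lens - query_lens per request.

    seq_lens[i] is the total length of request i (already-computed context
    plus the tokens scheduled in this step). query_start_loc is assumed to be
    a cumulative offset array with len(seq_lens) + 1 entries, so that
    query_lens[i] = query_start_loc[i + 1] - query_start_loc[i].
    """
    if len(query_start_loc) != len(seq_lens) + 1:
        raise ValueError("query_start_loc must have len(seq_lens) + 1 entries")

    # Per-request query lengths from adjacent cumulative offsets.
    query_lens = [
        end - start for start, end in zip(query_start_loc, query_start_loc[1:])
    ]

    # num_computed_tokens = seq_lens - query_lens, as stated in the commit message.
    return [seq_len - q_len for seq_len, q_len in zip(seq_lens, query_lens)]


# Three requests: a 3-token chunk on top of 7 computed tokens,
# a single decode token on top of 6, and a fresh 1-token request.
seq_lens = [10, 7, 1]
query_start_loc = [0, 3, 4, 5]
num_computed = num_computed_tokens_from_lens(seq_lens, query_start_loc)
```

Computing the value where it is needed, rather than reading a cached CPU-side property, is the pattern the commit message describes for all three backends.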

…omputed_tokens_cpu` CommonAttentionMetadata properties (vllm-project#31774) Signed-off-by: Lucas Wilkinson <lwilkins@redhat.com> Signed-off-by: Lucas Wilkinson <LucasWilkinson@users.noreply.github.com> Co-authored-by: Cyrus Leung <cyrus.tl.leung@gmail.com>

…omputed_tokens_cpu` CommonAttentionMetadata properties (vllm-project#31774) Signed-off-by: Lucas Wilkinson <lwilkins@redhat.com> Signed-off-by: Lucas Wilkinson <LucasWilkinson@users.noreply.github.com> Co-authored-by: Cyrus Leung <cyrus.tl.leung@gmail.com> Signed-off-by: dsuhinin <suhinin.dmitriy@gmail.com>

mergify bot added the v1 label Jan 6, 2026

gemini-code-assist bot reviewed Jan 6, 2026

LucasWilkinson added the ready label (ONLY add when PR is ready to merge/full CI is needed) Jan 6, 2026

LucasWilkinson requested a review from tdoublep January 6, 2026 05:13

DarkLight1337 reviewed Jan 6, 2026

vllm/v1/attention/backends/mamba_attn.py (outdated, resolved)

LucasWilkinson added 3 commits January 6, 2026 15:42

cleanup

12c8b9c

Signed-off-by: Lucas Wilkinson <lwilkins@redhat.com>

cleanup

ef81b2a

Signed-off-by: Lucas Wilkinson <lwilkins@redhat.com>

LucasWilkinson force-pushed the deprecate-cpu-props/ssm-backends branch from 5f53307 to ef81b2a Compare January 6, 2026 15:43

LucasWilkinson and others added 2 commits January 6, 2026 10:45

Apply suggestion from @DarkLight1337

d603fe7

Co-authored-by: Cyrus Leung <cyrus.tl.leung@gmail.com> Signed-off-by: Lucas Wilkinson <LucasWilkinson@users.noreply.github.com>

cleanup

85d2931

Signed-off-by: Lucas Wilkinson <lwilkins@redhat.com>

DarkLight1337 approved these changes Jan 6, 2026

DarkLight1337 enabled auto-merge (squash) January 6, 2026 15:58

DarkLight1337 merged commit 4c73be1 into vllm-project:main Jan 6, 2026
51 checks passed

LucasWilkinson mentioned this pull request Jan 10, 2026

[Tracking]: Deprecate CPU seqlen related CommonAttentionMetadata properties #32072 (Open, 1 task)

DarkLight1337 merged 5 commits into vllm-project:main from neuralmagic:deprecate-cpu-props/ssm-backends


2 participants
