perf(v1): optimize InputBatch.swap_states by swapping active token prefixes #35018
VedantMadane wants to merge 4 commits into vllm-project:main
Conversation
👋 Hi! Thank you for contributing to the vLLM project. 💬 Join our developer Slack at https://slack.vllm.ai to discuss your PR in #pr-reviews, coordinate on features in #feat- channels, or join special interest groups in #sig- channels.

Just a reminder: PRs do not trigger a full CI run by default; only a small, essential subset of tests runs to quickly catch errors. You can ask your reviewers to trigger select CI tests on top of that subset. Once the PR is approved and ready to go, your PR reviewer(s) can run CI to test the changes comprehensively before merging. To run CI, PR reviewers can either add the `ready` label to the PR or enable auto-merge.

If you have any questions, please reach out to us on Slack at https://slack.vllm.ai. 🚀
Force-pushed from 7487bf0 to 4304bcc (compare)
Code Review
This pull request optimizes InputBatch.swap_states by only swapping active token prefixes, which is a great performance improvement. However, I've found a critical bug in the implementation of swap_states in vllm/v1/worker/gpu_input_batch.py. The calculation of token lengths happens after some metadata has been swapped, leading to incorrect lengths being used for slicing, which can cause data corruption or out-of-bounds access. I've provided a suggestion to fix this. The other changes in vllm/v1/attention/backends/triton_attn.py look like correct refactoring.
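A minimal sketch of the ordering issue the review describes, using hypothetical field names rather than the actual vLLM code: the prefix lengths must be read from the per-request token counts before any metadata is swapped, otherwise the slice bounds describe the already-swapped state.

```python
import numpy as np

def swap_states_prefix(token_ids_cpu: np.ndarray, num_tokens: np.ndarray,
                       i1: int, i2: int) -> None:
    """Swap only the active token prefixes of rows i1 and i2.

    Hypothetical helper illustrating the fix: capture both lengths
    BEFORE mutating any metadata, then slice with the larger length
    so both rows' active tokens are fully exchanged.
    """
    # Read lengths first. Swapping num_tokens before this point would
    # make the slice bounds wrong -- the bug flagged in the review.
    n1, n2 = int(num_tokens[i1]), int(num_tokens[i2])
    n = max(n1, n2)
    tmp = token_ids_cpu[i1, :n].copy()
    token_ids_cpu[i1, :n] = token_ids_cpu[i2, :n]
    token_ids_cpu[i2, :n] = tmp
    # Metadata is swapped only after the prefix copy.
    num_tokens[i1], num_tokens[i2] = n2, n1
```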
Hey @VedantMadane, thanks for taking a look at the issue. I already have a PR up addressing
Force-pushed from 80c0a50 to b55b1c2 (compare)
Hi @VedantMadane, the pre-commit checks have failed. Please run:

```
uv pip install pre-commit
pre-commit install
pre-commit run --all-files
```

Then, commit the changes and push to your branch. For future commits, the installed hooks will run automatically before each commit.
This pull request has merge conflicts that must be resolved before it can be merged.
Force-pushed from 980a1c4 to 1b4c99d (compare)
Signed-off-by: Vedant Madane <6527493+VedantMadane@users.noreply.github.com>
…efixes Signed-off-by: Vedant Madane <6527493+VedantMadane@users.noreply.github.com>
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com> Signed-off-by: Vedant Madane <6527493+VedantMadane@users.noreply.github.com>
Co-authored-by: Cursor <cursoragent@cursor.com> Signed-off-by: Vedant Madane <6527493+VedantMadane@users.noreply.github.com>
Force-pushed from 1b4c99d to 37bbcbf (compare)
Addresses #34731.
Summary
This PR optimizes InputBatch.swap_states() in vllm/v1/worker/gpu_input_batch.py by only swapping the active token prefix of token_ids_cpu and is_token_ids instead of copying the entire rows.
Changes
Impact
Reduces memory copy overhead during request reordering in the V1 engine, especially beneficial when max_model_len is large (e.g., 32k or 128k).
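A rough illustration of why this helps, using assumed shapes rather than a vLLM benchmark: with max_model_len = 131072 but only a few hundred active tokens per row, a full-row swap moves hundreds of times more data than a prefix swap.

```python
import numpy as np
import timeit

max_model_len = 131_072  # assumed large context length
token_ids = np.zeros((2, max_model_len), dtype=np.int32)
active = 256  # assumed active prefix length per request

def swap_full() -> None:
    # Old behavior: copy entire rows regardless of how many
    # tokens are actually in use.
    tmp = token_ids[0].copy()
    token_ids[0] = token_ids[1]
    token_ids[1] = tmp

def swap_prefix() -> None:
    # New behavior: copy only the active token prefix.
    tmp = token_ids[0, :active].copy()
    token_ids[0, :active] = token_ids[1, :active]
    token_ids[1, :active] = tmp

full = timeit.timeit(swap_full, number=1000)
pref = timeit.timeit(swap_prefix, number=1000)
print(f"full-row: {full:.4f}s  prefix: {pref:.4f}s")
```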