perf(v1): optimize InputBatch.swap_states by swapping active token prefixes#35018

Closed
VedantMadane wants to merge 4 commits into vllm-project:main from VedantMadane:optimize-swap-states

@VedantMadane

Addresses #34731.

Summary

This PR optimizes InputBatch.swap_states() in vllm/v1/worker/gpu_input_batch.py by swapping only the active token prefix of token_ids_cpu and is_token_ids instead of copying entire rows.

Changes

  • Calculated the active token count for each request being swapped (including speculative draft tokens).
  • Used these counts to perform partial row copies/swaps using numpy slices.
  • Ensured that is_token_ids is correctly reset for indices beyond the new prefix length to prevent stale state.
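The steps above can be sketched roughly as follows. This is a minimal illustration, not the code from this PR: the function name `swap_active_prefixes` and its parameters are hypothetical, and it assumes that flags beyond the longer of the two prefixes are already cleared or never read.

```python
import numpy as np


def swap_active_prefixes(token_ids_cpu, is_token_ids, i1, i2, len1, len2):
    """Swap rows i1 and i2, touching only the first max(len1, len2) columns.

    len1 and len2 are the active token counts of rows i1 and i2 BEFORE the
    swap (hypothetical parameters for this sketch).
    """
    n = max(len1, len2)
    # Advanced indexing on the right-hand side produces a copy, so both
    # prefixes are read before either row is overwritten.
    token_ids_cpu[[i1, i2], :n] = token_ids_cpu[[i2, i1], :n]
    is_token_ids[[i1, i2], :n] = is_token_ids[[i2, i1], :n]
    # Row i1 now holds a prefix of length len2 and row i2 one of length len1;
    # clear any flags between the new length and n so no stale state remains.
    is_token_ids[i1, len2:n] = False
    is_token_ids[i2, len1:n] = False
```

Columns at index `n` and beyond are never read or written, which is where the savings come from when rows are much longer than the active prefixes.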

Impact

Reduces memory copy overhead during request reordering in the V1 engine, especially beneficial when max_model_len is large (e.g., 32k or 128k).
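To put the claim in rough numbers, the snippet below compares the bytes in one full row with the bytes in an active prefix. It is only a back-of-the-envelope sketch: the int32 element type, the reading of "128k" as 131072 tokens, and the 2048-token prefix are all assumptions made for illustration, not figures from this PR.

```python
import numpy as np


def row_bytes(num_tokens, dtype=np.int32):
    """Bytes occupied by num_tokens elements of the given dtype."""
    return num_tokens * np.dtype(dtype).itemsize


# Assumed values for illustration only.
max_model_len = 128 * 1024   # "128k" read as 131072 tokens
active_prefix = 2048         # hypothetical number of active tokens

full_row = row_bytes(max_model_len)   # bytes in one complete row
prefix = row_bytes(active_prefix)     # bytes in the active prefix only
ratio = full_row // prefix            # how many times less data per row
```

Under these assumptions a full row is 524288 bytes and the prefix is 8192 bytes, so a prefix-only copy moves 64 times less data per row.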

@github-actions

👋 Hi! Thank you for contributing to the vLLM project.

💬 Join our developer Slack at https://slack.vllm.ai to discuss your PR in #pr-reviews, coordinate on features in #feat- channels, or join special interest groups in #sig- channels.

Just a reminder: PRs do not trigger a full CI run by default. Instead, only fastcheck CI runs, which covers a small, essential subset of CI tests to catch errors quickly.

You can ask your reviewers to trigger select CI tests on top of fastcheck CI.

Once the PR is approved and ready to go, your PR reviewer(s) can run CI to test the changes comprehensively before merging.

To run CI, PR reviewers can either add the ready label to the PR or enable auto-merge.

If you have any questions, please reach out to us on Slack at https://slack.vllm.ai.

🚀

@dosubot

dosubot bot commented Feb 21, 2026

Related Documentation

Checked 0 published document(s) in 1 knowledge base(s). No updates required.

@mergify mergify bot added the v1 label Feb 21, 2026
Contributor

@gemini-code-assist gemini-code-assist bot left a comment

Code Review

This pull request optimizes InputBatch.swap_states by only swapping active token prefixes, which is a great performance improvement. However, I've found a critical bug in the implementation of swap_states in vllm/v1/worker/gpu_input_batch.py. The calculation of token lengths happens after some metadata has been swapped, leading to incorrect lengths being used for slicing, which can cause data corruption or out-of-bounds access. I've provided a suggestion to fix this. The other changes in vllm/v1/attention/backends/triton_attn.py look like correct refactoring.
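The ordering issue described in this review can be illustrated with a small sketch. It is not the suggested fix itself: the function name and the per-row length array `num_tokens` are assumptions made for the example. The point is that both lengths are read while the metadata still describes the original rows, and the metadata is swapped only afterwards.

```python
import numpy as np


def swap_rows_with_lengths(token_ids_cpu, num_tokens, i1, i2):
    """Swap the active prefixes of rows i1 and i2, then their lengths.

    num_tokens is a hypothetical per-row array of active token counts.
    """
    # Read both lengths FIRST, while num_tokens[i] still describes row i.
    len1, len2 = int(num_tokens[i1]), int(num_tokens[i2])

    # Save row i1's prefix, overwrite it with row i2's, then restore into i2.
    tmp = token_ids_cpu[i1, :len1].copy()
    token_ids_cpu[i1, :len2] = token_ids_cpu[i2, :len2]
    token_ids_cpu[i2, :len1] = tmp

    # Only now swap the metadata. Doing this before reading len1 and len2
    # would slice each row with the other row's length.
    num_tokens[[i1, i2]] = num_tokens[[i2, i1]]
```

If the last line ran before the lengths were read, `tmp` would be cut to the wrong size and part of a prefix could be lost or stale data copied in.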

@pjo256
Contributor

pjo256 commented Feb 21, 2026

Hey @VedantMadane, thanks for taking a look at the issue. I already have a PR up addressing swap_states - #34733, including validation/benchmark notes. To avoid duplicate reviews, would you be open to closing this PR and adding any feedback on that PR?

@mergify

mergify bot commented Feb 23, 2026

Hi @VedantMadane, the pre-commit checks have failed. Please run:

uv pip install pre-commit
pre-commit install
pre-commit run --all-files

Then, commit the changes and push to your branch.

For future commits, pre-commit will run automatically on changed files before each commit.

Tip

Is mypy or markdownlint failing?
mypy and markdownlint are run differently in CI. If the failure is related to either of these checks, please use the following commands to run them locally:
# For mypy (substitute "3.10" with the failing version if needed)
pre-commit run --hook-stage manual mypy-3.10
# For markdownlint
pre-commit run --hook-stage manual markdownlint

@mergify

mergify bot commented Feb 24, 2026

This pull request has merge conflicts that must be resolved before it can be
merged. Please rebase the PR, @VedantMadane.

https://docs.github.com/en/pull-requests/collaborating-with-pull-requests/working-with-forks/syncing-a-fork

VedantMadane and others added 4 commits February 24, 2026 09:52
Signed-off-by: Vedant Madane <6527493+VedantMadane@users.noreply.github.com>
…efixes

Signed-off-by: Vedant Madane <6527493+VedantMadane@users.noreply.github.com>
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
Signed-off-by: Vedant Madane <6527493+VedantMadane@users.noreply.github.com>
Co-authored-by: Cursor <cursoragent@cursor.com>
Signed-off-by: Vedant Madane <6527493+VedantMadane@users.noreply.github.com>