
[BugFix] Mistakenly passing num_reqs_padded as num_reqs in _dummy_run#34121

Closed
Selkh wants to merge 0 commits into vllm-project:main from Selkh:main

Conversation


@Selkh Selkh commented Feb 9, 2026

Purpose

In _dummy_run, num_tokens_padded was mistakenly passed as num_tokens, leading to an assertion error in split_decodes_and_prefills when an attention backend requires a uniform batch and enable_sp is on.
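A minimal sketch of the failure mode (function name and numbers are illustrative, simplified from vLLM's actual uniform-batch check): when the padded token count is passed where the real count is expected, the per-request token arithmetic no longer multiplies out.

```python
def check_uniform_batch(num_reqs: int, num_tokens: int, query_len: int) -> None:
    # A "requires uniform" attention backend expects every request to
    # contribute exactly query_len tokens, so the counts must multiply out.
    assert num_reqs * query_len == num_tokens, "tokens not padded correctly"

num_reqs, query_len = 3, 1
num_tokens = num_reqs * query_len   # 3: the real (unpadded) token count
num_tokens_padded = 4               # padded up, e.g. for sequence parallelism

check_uniform_batch(num_reqs, num_tokens, query_len)  # passes

try:
    # Passing the padded count where the real one belongs trips the check.
    check_uniform_batch(num_reqs, num_tokens_padded, query_len)
except AssertionError as exc:
    print(f"AssertionError: {exc}")
```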

Test Plan

Test Result


Essential Elements of an Effective PR Description Checklist
  • The purpose of the PR, such as "Fix some issue (link existing issues this PR will resolve)".
  • The test plan, such as providing test command.
  • The test results, such as pasting the results comparison before and after, or e2e results
  • (Optional) The necessary documentation update, such as updating supported_models.md and examples for a new model.
  • (Optional) Release notes update. If your change is user facing, please update the release notes draft in the Google Doc.

@mergify mergify bot added v1 bug Something isn't working labels Feb 9, 2026

@gemini-code-assist gemini-code-assist bot left a comment


Code Review

This pull request fixes a bug in _dummy_run where num_reqs_padded was incorrectly passed as the num_reqs argument to _build_attention_metadata. While this change is correct, the call is still incomplete as it's missing the num_tokens_padded and num_reqs_padded arguments, which can lead to incorrect behavior when CUDA graph padding is enabled. I've suggested a more complete fix to ensure that padded values are used correctly when pad_attn is true.
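A sketch of the call shape the review describes (the helper function here is hypothetical; the argument names mirror those discussed in the thread): the padded counts are forwarded only when attention padding is enabled, and dropped to None otherwise.

```python
# Hypothetical helper illustrating the suggested argument wiring: forward
# padded counts to attention-metadata building only when pad_attn is set.
def attn_metadata_kwargs(num_reqs, num_reqs_padded,
                         num_tokens, num_tokens_padded, pad_attn):
    return dict(
        num_reqs=num_reqs,
        num_tokens=num_tokens,
        num_reqs_padded=num_reqs_padded if pad_attn else None,
        num_tokens_padded=num_tokens_padded if pad_attn else None,
    )

# With padding on, padded counts flow through; with it off, they are dropped.
print(attn_metadata_kwargs(3, 4, 3, 4, pad_attn=True))
print(attn_metadata_kwargs(3, 4, 3, 4, pad_attn=False))
```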


mergify bot commented Feb 10, 2026

This pull request has merge conflicts that must be resolved before it can be
merged. Please rebase the PR, @Selkh.

https://docs.github.com/en/pull-requests/collaborating-with-pull-requests/working-with-forks/syncing-a-fork


Selkh commented Feb 10, 2026

@LucasWilkinson This is still not fixed after #34187 when SP is enabled with a uniform-batch attention backend.


@LucasWilkinson LucasWilkinson left a comment


Thanks for the contribution! Let's just make this fully match execute_model, i.e.

                    num_reqs=num_reqs,
                    num_reqs_padded=num_reqs_padded if pad_attn else None,


Selkh commented Feb 10, 2026

Thanks for the contribution! Let's just make this fully match execute_model, i.e.

                    num_reqs=num_reqs,
                    num_reqs_padded=num_reqs_padded if pad_attn else None,

_dummy_run may need the unpadded num_reqs and num_tokens rather than matching execute_model. Consider the following two scenarios:

  • In dummy_run, the query_start_loc is constructed from a fake scheduled_tokens_list rather than real values. This may cause an assertion failure during speculative decoding: assert num_reqs * query_lens[0] == num_tokens, "tokens not padded correctly".
  • The num_tokens_padded and num_reqs_padded values after _determine_batch_execution_and_padding may become inconsistent. For example, when capturing the execution graph with CUDAGraphMode.NONE, num_reqs may not get padded, while num_tokens might be padded by speculative decoding logic, leading to a mismatch.
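The second scenario above can be sketched numerically (all values are illustrative, not taken from vLLM): if speculative-decoding logic pads num_tokens while num_reqs is left unpadded under CUDAGraphMode.NONE, the padded pair no longer satisfies the uniform-batch invariant.

```python
num_reqs = 3
query_len = 2                       # e.g. 1 scheduled token + 1 draft token
num_tokens = num_reqs * query_len   # 6: real values are consistent

num_reqs_padded = num_reqs          # not padded under CUDAGraphMode.NONE
num_tokens_padded = 8               # padded by spec-decode alignment logic

# The real pair is uniform, but the padded pair is not, so a uniform-batch
# check fed the padded values would fail.
assert num_reqs * query_len == num_tokens
assert num_reqs_padded * query_len != num_tokens_padded
print("padded pair violates the uniform-batch invariant")
```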

@LucasWilkinson

the query_start_loc is constructed from a fake scheduled_tokens_list rather than real values. This may cause an assertion failure during speculative decoding: assert num_reqs * query_lens[0] == num_tokens, "tokens not padded correctly".

can you please provide a reproducer?

@vadiklyutiy

maybe similar to #35243
