Commit 5fe0c52

Author: Varun Sundar Rabindranath
assert num_tokens_after_padding bounds
Signed-off-by: Varun Sundar Rabindranath <[email protected]>
1 parent f38a179 commit 5fe0c52

1 file changed: +3 -0 lines changed

vllm/v1/worker/gpu_model_runner.py

Lines changed: 3 additions & 0 deletions
@@ -3200,6 +3200,9 @@ def _dummy_run(
 
         with self.maybe_dummy_run_with_lora(self.lora_config,
                                             num_scheduled_tokens, remove_lora):
+
+            # Make sure padding doesn't exceed max_num_tokens
+            assert num_tokens_after_padding <= self.max_num_tokens
             model_kwargs = self._init_model_kwargs(num_tokens_after_padding)
             if (self.supports_mm_inputs
                     and not self.model_config.is_encoder_decoder):
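
A minimal sketch of the invariant the new assertion guards, assuming padding rounds a token count up to the nearest entry in a list of allowed sizes. The helpers `pad_to_capture_size` and `checked_padding` and the sizes used are hypothetical and not taken from vLLM; only the names `num_tokens_after_padding` and `max_num_tokens` mirror the diff.

```python
from __future__ import annotations

import bisect


def pad_to_capture_size(num_tokens: int, capture_sizes: list[int]) -> int:
    """Round num_tokens up to the smallest allowed size that fits it.

    Hypothetical helper for illustration only. Returns num_tokens
    unchanged when it is larger than every allowed size.
    """
    sizes = sorted(capture_sizes)
    idx = bisect.bisect_left(sizes, num_tokens)
    return sizes[idx] if idx < len(sizes) else num_tokens


def checked_padding(num_tokens: int, capture_sizes: list[int],
                    max_num_tokens: int) -> int:
    """Pad the token count, then apply the same bound check as the diff."""
    num_tokens_after_padding = pad_to_capture_size(num_tokens, capture_sizes)
    # Make sure padding doesn't exceed max_num_tokens
    assert num_tokens_after_padding <= max_num_tokens
    return num_tokens_after_padding
```

Under these assumptions, `checked_padding(5, [1, 2, 4, 8, 16], 8)` pads to 8 and passes, while `checked_padding(9, [1, 2, 4, 8, 16], 8)` pads to 16 and trips the assertion. Note that `assert` statements are skipped when Python runs with `-O`, so a check like this documents and tests an invariant rather than enforcing it in optimized runs.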
