[V1] Warm up slot mapping before JIT monitor #60

Closed
lesj0610 wants to merge 1 commit into lesj/pr-integration-dirty-diff-045d4230e1 from lesj/v1-slot-mapping-jit-warmup-clean-20260509

Conversation

Owner

@lesj0610 lesj0610 commented May 9, 2026

Purpose

The JIT monitoring change added Triton JIT monitoring after warmup. In V1, the current warmup path still does not cover the input preparation path that real requests use for slot mapping.

As a result, the first real request can compile _compute_slot_mapping_kernel after the JIT monitor is already active, and users see a warning during inference.

The root cause has two parts:

  • V1 _dummy_run() does not call BlockTable.compute_slot_mapping().
  • _compute_slot_mapping_kernel was specialized on num_tokens, so compiling one token count does not cover another request size.

This PR fixes both for the V1 slot mapping path.

I added a small V1 warmup that calls BlockTable.compute_slot_mapping() directly before enabling the JIT monitor. It uses block id 1 because block 0 is the null block, then clears the temporary block table row in a finally block.
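
The control flow of that warmup can be sketched roughly as below. This is a stand-in and not the code in this PR: ToyBlockTable, its fields, and warm_up_slot_mapping are hypothetical names, the slot formula is the usual paged layout (block id times block size plus in-block offset), and the real BlockTable.compute_slot_mapping() launches a Triton kernel and may take different arguments. The sketch only shows the pattern: write a non-null block id into a temporary row, run slot mapping once, and clear the row in finally.

```python
NULL_BLOCK_ID = 0  # block 0 is the null block, so warmup must not point at it


class ToyBlockTable:
    """Hypothetical stand-in for a block table (NOT vLLM's BlockTable)."""

    def __init__(self, max_num_reqs: int, max_blocks_per_req: int, block_size: int):
        self.block_size = block_size
        self.rows = [[NULL_BLOCK_ID] * max_blocks_per_req for _ in range(max_num_reqs)]
        self.num_calls = 0  # how many times slot mapping has run

    def compute_slot_mapping(self, req_indices, positions):
        # Assumed layout: slot = physical block id * block size + offset in block.
        self.num_calls += 1
        slots = []
        for req, pos in zip(req_indices, positions):
            block_id = self.rows[req][pos // self.block_size]
            slots.append(block_id * self.block_size + pos % self.block_size)
        return slots


def warm_up_slot_mapping(table: ToyBlockTable, num_tokens: int = 1):
    """Run slot mapping once on a temporary row, then restore the row."""
    row = 0
    table.rows[row][0] = 1  # block id 1: the first non-null block
    try:
        positions = list(range(num_tokens))
        return table.compute_slot_mapping([row] * num_tokens, positions)
    finally:
        # Always clear the temporary entry, even if the call above raises.
        table.rows[row][0] = NULL_BLOCK_ID
```

In this sketch the warmup would be called once, before whatever enables the JIT monitor, so the first compile happens while compilation is still expected.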

I also mark num_tokens as do_not_specialize, so different request token counts reuse the same compiled kernel. max_num_tokens stays specialized because it is fixed for the engine.
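
To illustrate why specialization matters here, the following is a toy model of a JIT compile cache. It is a simplification and not Triton's actual cache: ToyJitCache and specialization_key are hypothetical names, and the rule that integer arguments are specialized on "equals 1" and "divisible by 16" is an assumption about Triton's usual behavior. The real cache key also depends on dtypes, constexpr values, and compile options.

```python
def specialization_key(args: dict, do_not_specialize: frozenset = frozenset()) -> tuple:
    """Toy cache key for integer kernel arguments (assumed rule, see above)."""
    key = []
    for name, value in args.items():
        if name in do_not_specialize:
            # Only the argument type contributes, so every value shares one key.
            key.append((name, "int"))
        else:
            # Value-dependent properties become part of the key.
            key.append((name, "int", value == 1, value % 16 == 0))
    return tuple(key)


class ToyJitCache:
    """Hypothetical model of a JIT cache; counts how often a compile happens."""

    def __init__(self, do_not_specialize=()):
        self.do_not_specialize = frozenset(do_not_specialize)
        self.compiled = set()
        self.compile_count = 0

    def launch(self, **args) -> bool:
        """Return True if this launch had to compile, False on a cache hit."""
        key = specialization_key(args, self.do_not_specialize)
        if key in self.compiled:
            return False
        self.compiled.add(key)
        self.compile_count += 1
        return True


# Specialized num_tokens: token counts 1, 16 and 17 each produce a new key.
specialized = ToyJitCache()
for n in (1, 16, 17):
    specialized.launch(num_tokens=n, max_num_tokens=8192)

# num_tokens not specialized: one compile during warmup covers all three.
unspecialized = ToyJitCache(do_not_specialize={"num_tokens"})
for n in (1, 16, 17):
    unspecialized.launch(num_tokens=n, max_num_tokens=8192)
```

Under this model, a single warmup launch is enough only in the second case, which is why the warmup and the do_not_specialize change go together. max_num_tokens can stay specialized because its value never changes between launches.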

This PR does not run a synthetic execute_model() warmup. That path is model-specific and can require model-specific dummy inputs, so this PR keeps warmup limited to slot mapping only.

V2 warmup is not changed. Existing V1 sampler warmup is not changed.

I checked open PRs and found no existing PR for the V1 slot mapping JIT warning after the JIT monitoring change.

Test Plan

.venv/bin/python -m pytest tests/v1/worker/test_gpu_model_runner.py -v

pre-commit run ruff-format --files \
  vllm/v1/worker/block_table.py \
  vllm/v1/worker/gpu/warmup.py \
  vllm/v1/worker/gpu_worker.py \
  tests/v1/worker/test_gpu_model_runner.py

pre-commit run ruff-check --files \
  vllm/v1/worker/block_table.py \
  vllm/v1/worker/gpu/warmup.py \
  vllm/v1/worker/gpu_worker.py \
  tests/v1/worker/test_gpu_model_runner.py

pre-commit run mypy-3.10 --files \
  vllm/v1/worker/block_table.py \
  vllm/v1/worker/gpu/warmup.py \
  vllm/v1/worker/gpu_worker.py \
  tests/v1/worker/test_gpu_model_runner.py \
  --hook-stage manual

git diff --check

Test Result

tests/v1/worker/test_gpu_model_runner.py: 42 passed, 16 warnings.

ruff format / ruff check: passed.

mypy-3.10: passed.

git diff --check: passed.

Local OAI smoke on V1 runner:

  • Qwen3-8B text-only: HTTP 200, no _compute_slot_mapping_kernel warning on first request.

AI assistance was used (Codex, Claude).

Signed-off-by: lesj0610 <lesj0610@users.noreply.github.com>
@lesj0610 lesj0610 marked this pull request as draft May 9, 2026 13:10
@lesj0610 lesj0610 closed this May 9, 2026
@lesj0610 lesj0610 deleted the lesj/v1-slot-mapping-jit-warmup-clean-20260509 branch May 9, 2026 13:10