[V1] Warm up slot mapping before JIT monitor by lesj0610 · Pull Request #60 · lesj0610/vllm

lesj0610 · 2026-05-09T12:25:07Z

Purpose

The JIT monitoring change added Triton JIT monitoring after warmup. In V1, the current warmup path still does not cover the real request input preparation path for slot mapping.

As a result, the first real request can compile _compute_slot_mapping_kernel after the JIT monitor is already active, and users see a warning during inference.

The root cause has two parts:

V1 _dummy_run() does not call BlockTable.compute_slot_mapping().
_compute_slot_mapping_kernel was specialized on num_tokens, so compiling it for one token count does not cover a request with a different token count.

This PR fixes the V1 slot mapping path.

I added a small V1 warmup that calls BlockTable.compute_slot_mapping() directly before enabling the JIT monitor. It uses block id 1 because block 0 is the null block, then clears the temporary block table row in a finally block.
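
The warmup pattern described above can be sketched roughly as follows. This is a pure-Python illustration, not the code in this PR: `FakeBlockTable`, its method signatures, and `warm_up_slot_mapping` are hypothetical stand-ins, and the real BlockTable.compute_slot_mapping() launches a Triton kernel on GPU tensors rather than looping over Python lists.

```python
NULL_BLOCK_ID = 0  # block 0 is reserved as the null block


class FakeBlockTable:
    """Hypothetical stand-in for vLLM's BlockTable (illustration only)."""

    def __init__(self, max_reqs: int, max_blocks_per_req: int, block_size: int):
        self.block_size = block_size
        self.rows = [[NULL_BLOCK_ID] * max_blocks_per_req for _ in range(max_reqs)]
        self.compute_calls = 0

    def compute_slot_mapping(self, req_index: int, positions: list) -> list:
        # slot = physical block id * block_size + offset inside the block
        self.compute_calls += 1
        row = self.rows[req_index]
        return [
            row[pos // self.block_size] * self.block_size + pos % self.block_size
            for pos in positions
        ]


def warm_up_slot_mapping(table: FakeBlockTable, num_tokens: int = 1) -> list:
    """Exercise compute_slot_mapping once, then clear the scratch row."""
    try:
        # Use block id 1 for the dummy request, because block 0 is the null block.
        table.rows[0][0] = 1
        return table.compute_slot_mapping(0, list(range(num_tokens)))
    finally:
        # Always clear the temporary row, even if the call above raised.
        table.rows[0] = [NULL_BLOCK_ID] * len(table.rows[0])


table = FakeBlockTable(max_reqs=4, max_blocks_per_req=8, block_size=16)
slots = warm_up_slot_mapping(table, num_tokens=3)
```

In the real worker, the equivalent call would run before the JIT monitor is enabled, so the kernel compilation happens during startup instead of on the first request.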

I also mark num_tokens as do_not_specialize, so different request token counts reuse the same compiled kernel. max_num_tokens stays specialized because it is fixed for the engine.
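
To illustrate why specialization on num_tokens matters, here is a toy compile cache. It is a simplified model and not Triton itself: `ToyJit` and `int_specialization_key` are hypothetical names, and the bucketing rule (value equal to 1, value divisible by 16, everything else) is an assumption made for illustration, since the exact rules depend on the Triton version. In Triton, this behavior is requested through the `do_not_specialize` argument of `triton.jit`.

```python
def int_specialization_key(value: int) -> str:
    """Simplified, assumed model of how a JIT buckets an integer argument."""
    if value == 1:
        return "eq1"
    if value % 16 == 0:
        return "div16"
    return "generic"


class ToyJit:
    """Toy compile cache; a stand-in for a real JIT, not Triton."""

    def __init__(self, arg_names, do_not_specialize=()):
        self.arg_names = list(arg_names)
        self.do_not_specialize = set(do_not_specialize)
        self.cache = set()
        self.compile_count = 0

    def launch(self, **kwargs: int) -> None:
        # Arguments excluded from specialization all share one cache bucket.
        key = tuple(
            "any" if name in self.do_not_specialize
            else int_specialization_key(kwargs[name])
            for name in self.arg_names
        )
        if key not in self.cache:
            # A cache miss is where a real JIT would compile (and a monitor would warn).
            self.cache.add(key)
            self.compile_count += 1


specialized = ToyJit(["num_tokens", "max_num_tokens"])
relaxed = ToyJit(["num_tokens", "max_num_tokens"], do_not_specialize=["num_tokens"])

for n in (1, 7, 32):
    specialized.launch(num_tokens=n, max_num_tokens=8192)
    relaxed.launch(num_tokens=n, max_num_tokens=8192)
```

Under this model, the specialized kernel compiles once per bucket that the request sizes fall into, while the relaxed kernel compiles once in total. max_num_tokens can remain specialized because its value never changes during the engine's lifetime.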

This does not run a synthetic execute_model() warmup. That path is model-specific and can require model-specific dummy inputs, so this PR keeps warmup limited to slot mapping only.

V2 warmup is not changed. The existing V1 sampler warmup is not changed either.

Checked open PRs: there is no existing PR for the V1 slot mapping JIT warning after the JIT monitoring change.

Test Plan

.venv/bin/python -m pytest tests/v1/worker/test_gpu_model_runner.py -v

pre-commit run ruff-format --files \
  vllm/v1/worker/block_table.py \
  vllm/v1/worker/gpu/warmup.py \
  vllm/v1/worker/gpu_worker.py \
  tests/v1/worker/test_gpu_model_runner.py

pre-commit run ruff-check --files \
  vllm/v1/worker/block_table.py \
  vllm/v1/worker/gpu/warmup.py \
  vllm/v1/worker/gpu_worker.py \
  tests/v1/worker/test_gpu_model_runner.py

pre-commit run mypy-3.10 --files \
  vllm/v1/worker/block_table.py \
  vllm/v1/worker/gpu/warmup.py \
  vllm/v1/worker/gpu_worker.py \
  tests/v1/worker/test_gpu_model_runner.py \
  --hook-stage manual

git diff --check

Test Result

tests/v1/worker/test_gpu_model_runner.py: 42 passed, 16 warnings.

ruff format / ruff check: passed.

mypy-3.10: passed.

git diff --check: passed.

Local OAI smoke on V1 runner:

Qwen3-8B text-only: HTTP 200, no _compute_slot_mapping_kernel warning on first request.

AI assistance was used (Codex, Claude).

Signed-off-by: lesj0610 <lesj0610@users.noreply.github.com>

Warm up V1 slot mapping before JIT monitor

393e70f

Signed-off-by: lesj0610 <lesj0610@users.noreply.github.com>

lesj0610 marked this pull request as draft May 9, 2026 13:10

lesj0610 closed this May 9, 2026

lesj0610 deleted the lesj/v1-slot-mapping-jit-warmup-clean-20260509 branch May 9, 2026 13:10

lesj0610 wants to merge 1 commit into lesj/pr-integration-dirty-diff-045d4230e1 from lesj/v1-slot-mapping-jit-warmup-clean-20260509
