[CI] Reject out-of-vocabulary before they reach the GPU logprob path by AndreasKaratzas · Pull Request #44042 · vllm-project/vllm

AndreasKaratzas · 2026-05-30T01:49:49Z

This stabilizes a flakiness in ROCm CI seen at "AMD: Entrypoints Integration (API Server openai - Part 3) (mi325_1)":

https://buildkite.com/vllm/ci/builds/68845/canvas?sid=019e7252-a1c0-4bdd-a643-d3d57f1b41f1&tab=output

The first real failure came from the Schemathesis-generated /inference/v1/generate request:

{
  "sampling_params": {
    "logprob_token_ids": [-35, -11378, 3392, 1873042417, 126]
  },
  "token_ids": [0]
}

After that request, the engine hit a ROCm/HSA hardware exception and died. The later failures in the same test were cascade failures from the module-scoped API server being dead. SamplingParams.verify() validated the length of logprob_token_ids, but did not validate that every requested token ID was in the model vocabulary. This was flaky because Schemathesis only sometimes generated the invalid logprob_token_ids field during the full OpenAI schema test group. When it did, the first bad request killed the server and unrelated endpoint checks failed afterward.

cc @kenroche

Signed-off-by: Andreas Karatzas <akaratza@amd.com>

hclsys · 2026-05-30T02:31:02Z

right shape — admission-time rejection on < 0 or >= vocab_size cuts the crash before it hits the kernel, and _validate_logprobs is the consistent place for it (same pattern as the recent max_logprobs ge=-1 fix).

one edge worth a thought: vocab_size = model_config.get_vocab_size() assumes the model config is fully populated by the time .verify() runs. is there a path where verify() is called before the model loads (or with a model that doesn't expose get_vocab_size — your test uses SimpleNamespace, real ModelConfig always has it?) so this'd raise AttributeError instead of returning a clean VLLMValidationError? probably fine in practice since the existing max_logprobs check above already uses model_config, just flagging.

AndreasKaratzas · 2026-05-30T03:11:52Z

@hclsys I think this is okay as-is because verify is typed around ModelConfig, and real ModelConfig always exposes get_vocab_size() from model_arch_config; this does not depend on weights being loaded. We already use get_vocab_size() for logprobs=-1 / prompt_logprobs=-1, so this should not introduce a new model_config assumption.

…mpler

…l_token_threshold Both fields were declared as bare `int` with no Field constraint, and the downstream validation chain only handled specific values: - `max_logprobs`: only `-1` is rewritten to vocab_size; other negatives flow through and either land in a confusing "max allowed: -5" error or silently no-op on the cap check. - `long_prefill_token_threshold`: the clamp is guarded by `0 < threshold < num_new_tokens` and the cap by `> max_model_len`, so a negative value matches neither and silently passes through unvalidated. Add field_validators (mode="after"), matching the pattern landed in vllm-project#43794 and the recent vllm-project#44002 / vllm-project#44042 / vllm-project#44057. `max_logprobs` keeps the `-1` sentinel for auto-derive; `long_prefill_token_threshold` requires `>= 0` (0 = off, > 0 = clamp). Fixes vllm-project#43985. Signed-off-by: Chenglun Hu <chenglunhu@gmail.com>

…mpler

Signed-off-by: Andreas Karatzas <akaratza@amd.com>

AndreasKaratzas · 2026-06-01T02:39:21Z

Update: Hardened the model runner shutdown path on ROCm by explicitly releasing captured CUDA/HIP graph state before the worker clears static forward context, drops model references, resets workspace state, and tears down distributed state. The cause was an intermittent ROCm CI failure in:

tests/v1/sample/test_logprobs.py::test_spec_decode_logprobs[eagle0-raw_logits]

The condition is intentionally ROCm-only. CUDA and other platforms keep the previous shutdown behavior. The failing test has a very specific lifetime pattern:

ref_llm = LLM(...)
ref_llm.generate(...)
del ref_llm
torch.accelerator.empty_cache()
cleanup_dist_env_and_memory()

spec_llm = LLM(..., speculative_config=...)

The Buildkite failure occurs after the reference engine logs Shutdown complete, then the EAGLE speculative engine starts. The HSA fault appears right after distributed initialization and before model loading:

Initializing a V1 LLM engine ...
world_size=1 rank=0 ...
rank 0 in world size 1 ...
Memory access fault by GPU node-2 ... Reason: Write access to a read-only page.

That timing points at a delayed GPU resource lifetime problem from the previous engine, not a synchronous error in the new engine's Python config parsing or model loading.

AndreasKaratzas · 2026-06-01T02:39:56Z

cc @njhill Lmk if you are still OK with this PR. Don't want to squash it with new changes, even though they are all ROCm specific.

…mpler

Signed-off-by: Andreas Karatzas <akaratza@amd.com>

AndreasKaratzas · 2026-06-01T20:07:57Z

Another small patch to keep the existing index_put_ logits update path for non-ROCm platforms, while routing ROCm through a flattened index_fill_ write. The previous CI run of this PR showed a GPU memory access fault during the thinking-budget forced-token write. My take was that ROCm graph/runtime changes exposed a latent ROCm/PyTorch advanced-indexing issue rather than introducing a thinking-budget logic regression. Flattening row/token pairs into one-dimensional indices avoids the fragile 2-D advanced-indexing kernel while preserving the same logits result.

…mpler

njhill · 2026-06-02T15:44:34Z

Thanks @AndreasKaratzas ... do you think we could split these other rocm fixes into a separate PR?

The OOV validation I think makes sense generally, the other things are partially hack workaround for rocm bug IIUC.

AndreasKaratzas · 2026-06-02T17:58:53Z

@njhill I added them here so that CI is green, cause without some of these, I would get errors, that were always there, but due to the refactoring were exposed only now.

…llm-project#44042) Signed-off-by: Andreas Karatzas <akaratza@amd.com> Signed-off-by: Matt Van Horn <455140+mvanhorn@users.noreply.github.com>

…llm-project#44042) Signed-off-by: Andreas Karatzas <akaratza@amd.com>

…llm-project#44042) Signed-off-by: Andreas Karatzas <akaratza@amd.com> Signed-off-by: JisoLya <523420504@qq.com>

Reject out-of-vocabulary before they reach the GPU logprob path

7db9924

Signed-off-by: Andreas Karatzas <akaratza@amd.com>

AndreasKaratzas marked this pull request as ready for review May 30, 2026 01:50

AndreasKaratzas requested review from NickLucche and njhill as code owners May 30, 2026 01:50

AndreasKaratzas added the ready ONLY add when PR is ready to merge/full CI is needed label May 30, 2026

mergify Bot added the v1 label May 30, 2026

Merge remote-tracking branch 'origin/main' into akaratza_stabilize_sa…

f507504

…mpler

This was referenced May 30, 2026

[Bugfix] Reject non-positive values for ParallelConfig int knobs #44057

Merged

fix(config): reject negative max_logprobs (except -1) and long_prefill_token_threshold #44070

Open

njhill approved these changes May 31, 2026

View reviewed changes

AndreasKaratzas added 2 commits May 31, 2026 21:30

Merge remote-tracking branch 'origin/main' into akaratza_stabilize_sa…

dabc06e

…mpler

[ROCm] Clear captured V1 CUDA graphs before shutdown teardown

ce2fcd5

Signed-off-by: Andreas Karatzas <akaratza@amd.com>

AndreasKaratzas added 3 commits June 1, 2026 02:47

Merge remote-tracking branch 'origin/main' into akaratza_stabilize_sa…

085f3d7

…mpler

Merge remote-tracking branch 'origin/main' into akaratza_stabilize_sa…

90a5589

…mpler

[ROCm] Avoid ROCm index_put fault in thinking budget

28d637b

Signed-off-by: Andreas Karatzas <akaratza@amd.com>

AndreasKaratzas requested review from 22quinn and houseroad as code owners June 1, 2026 20:06

Merge remote-tracking branch 'origin/main' into akaratza_stabilize_sa…

e43c09c

…mpler

AndreasKaratzas merged commit 53b88d1 into vllm-project:main Jun 3, 2026
67 checks passed

AndreasKaratzas deleted the akaratza_stabilize_sampler branch June 3, 2026 03:27

andakai pushed a commit to andakai/vllm that referenced this pull request Jun 4, 2026

[CI] Reject out-of-vocabulary before they reach the GPU logprob path (v…

a6f526b

…llm-project#44042) Signed-off-by: Andreas Karatzas <akaratza@amd.com>

JisoLya pushed a commit to JisoLya/vllm that referenced this pull request Jun 5, 2026

[CI] Reject out-of-vocabulary before they reach the GPU logprob path (v…

1dd54a4

…llm-project#44042) Signed-off-by: Andreas Karatzas <akaratza@amd.com> Signed-off-by: JisoLya <523420504@qq.com>

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

[CI] Reject out-of-vocabulary before they reach the GPU logprob path#44042

[CI] Reject out-of-vocabulary before they reach the GPU logprob path#44042
AndreasKaratzas merged 8 commits into
vllm-project:mainfrom
ROCm:akaratza_stabilize_sampler

AndreasKaratzas commented May 30, 2026

Uh oh!

hclsys commented May 30, 2026

Uh oh!

AndreasKaratzas commented May 30, 2026

Uh oh!

AndreasKaratzas commented Jun 1, 2026

Uh oh!

AndreasKaratzas commented Jun 1, 2026

Uh oh!

AndreasKaratzas commented Jun 1, 2026

Uh oh!

njhill commented Jun 2, 2026

Uh oh!

AndreasKaratzas commented Jun 2, 2026

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants

Uh oh!

Conversation

AndreasKaratzas commented May 30, 2026

Uh oh!

hclsys commented May 30, 2026

Uh oh!

AndreasKaratzas commented May 30, 2026

Uh oh!

AndreasKaratzas commented Jun 1, 2026

Uh oh!

AndreasKaratzas commented Jun 1, 2026

Uh oh!

AndreasKaratzas commented Jun 1, 2026

Uh oh!

njhill commented Jun 2, 2026

Uh oh!

AndreasKaratzas commented Jun 2, 2026

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants