[CI] De-flake Language Models Test (Extended Generation) test_models(False-False-5-32-bigcode/starcoder2-3b) by haosdent · Pull Request #42392 · vllm-project/vllm

haosdent · 2026-05-12T08:06:02Z

Purpose

Fix the long-standing CI failure for bigcode/starcoder2-3b in tests/models/language/generation/test_common.py on L4 (SM 8.9). Tracked in #37304 and #42336; #42336 narrows the regression window to 08bfedc15..2488d1dc, which contains #34644 (PyTorch 2.10 → 2.11).

Test1 is an open-ended natural-language prompt fed to a code-completion model. After ~8 matched tokens, starcoder2-3b wanders into a Jupyter-style markdown id and lands in a region where the digit-token logits are nearly uniform. bf16 drift between HF and vLLM, accumulated over 30 layers, is enough to reorder the top-K there (a difference of about one logprob bit), so the `output_id_0 in logprobs_elem_1` check fails. This PR replaces Test1 with a code prompt for this model only, which keeps it on its training distribution; other models are unchanged.
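To make the failure mode concrete, here is a minimal, self-contained sketch of why top-K membership is fragile when logits are nearly flat. It is not the helper used in test_common.py; the function names, the logit values, and the `DRIFT` constant are all hypothetical, chosen only to illustrate the effect described above.

```python
def top_k_ids(logits, k):
    """Token ids of the k largest logits (softmax preserves this ranking)."""
    order = sorted(range(len(logits)), key=lambda i: logits[i], reverse=True)
    return set(order[:k])


def argmax(logits):
    """Greedy token id for one decoding step."""
    return max(range(len(logits)), key=lambda i: logits[i])


def cross_check(logits_a, logits_b, k=5):
    """True if each backend's greedy token is inside the other's top-k."""
    tok_a, tok_b = argmax(logits_a), argmax(logits_b)
    if tok_a == tok_b:
        return True
    return tok_a in top_k_ids(logits_b, k) and tok_b in top_k_ids(logits_a, k)


# Hypothetical numeric drift applied to one token's logit in the second backend.
DRIFT = -0.055

# Near-uniform "digit token" logits: neighbours are only 0.01 apart.
flat = [2.0 - 0.01 * i for i in range(10)]
flat_drifted = [flat[0] + DRIFT] + flat[1:]

# Sharp distribution: one token dominates, so the same drift is harmless.
sharp = [10.0] + [1.0 - 0.01 * i for i in range(9)]
sharp_drifted = [sharp[0] + DRIFT] + sharp[1:]

flat_ok = cross_check(flat, flat_drifted)     # drift pushes token 0 out of the other top-5
sharp_ok = cross_check(sharp, sharp_drifted)  # both backends still pick token 0

print("flat logits pass:", flat_ok)
print("sharp logits pass:", sharp_ok)
```

With the flat logits, the same small drift that leaves the sharp case untouched moves the first backend's greedy token to sixth place in the second backend, so the cross-membership check fails.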

Recent failing daily/nightly runs (all on L4): #65678, #65498, #65468, #65423, #65378, #65324, #65246, #65099, #65033, #64859, #64792.

Test Plan

Test Result

bigcode/starcoder2-3b fails models/language/generation/test_common.py on L4 (SM 8.9). The failure is on Test1, an open-ended NL prompt fed to a code-completion model. After 8 matched tokens the model wanders into a Jupyter notebook markdown id and lands in a near-uniform digit-token logit region; HF and vLLM disagree on which digit lands in top-5 by ~1 logprob bit, so `output_id_0 in logprobs_elem_1` fails.

Replace Test1 only for bigcode/starcoder2-3b with a code prompt that keeps the model on its training distribution and produces sharp top-1 logits at every position. Other models are unchanged.

Verified locally on GB10: both parametrizations PASS in ~95s each.

Signed-off-by: haosdent <haosdent@gmail.com>
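The per-model prompt swap described above could look roughly like the sketch below. This is an illustration only: `CODE_PROMPT`, `CODE_PROMPT_MODELS`, `prompts_for_model`, and the assumption that Test1 sits at index 1 of the shared prompt list are all hypothetical, and the placeholder prompt is not the one used in the PR.

```python
# Hypothetical stand-in; the PR's actual replacement prompt is not shown here.
CODE_PROMPT = "def fibonacci(n):\n"

# Models that get the code prompt (per the PR, only starcoder2-3b).
CODE_PROMPT_MODELS = {"bigcode/starcoder2-3b"}


def prompts_for_model(model, example_prompts, test1_index=1):
    """Return a copy of example_prompts, swapping Test1 for code-only models.

    test1_index is an assumption about where Test1 sits in the shared list.
    """
    prompts = list(example_prompts)  # never mutate the shared prompt list
    if model in CODE_PROMPT_MODELS:
        prompts[test1_index] = CODE_PROMPT
    return prompts


# Usage with placeholder prompts.
shared = ["prompt 0", "prompt 1 (open-ended NL)", "prompt 2"]
starcoder_prompts = prompts_for_model("bigcode/starcoder2-3b", shared)
other_prompts = prompts_for_model("some-org/other-model", shared)
```

Copying the list keeps the override local to one parametrization, which matches the stated goal that other models are unchanged.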

gemini-code-assist

Code Review

This pull request updates the common generation tests in tests/models/language/generation/test_common.py to specifically handle the bigcode/starcoder2-3b model. It replaces a natural language prompt with a code-based prompt to mitigate test instability caused by logit drift between Hugging Face and vLLM implementations. I have no feedback to provide as there were no review comments.

claude

Claude Code Review

This pull request is from a fork — automated review is disabled. A repository maintainer can comment @claude review to run a one-time review.

haosdent · 2026-05-12T08:20:07Z

@zou3519 @yzong-rh Could you help review this when you are available?
cc @ZJY0516

ZJY0516

Thanks

yzong-rh · 2026-05-12T12:52:03Z

Thank you!

…False-False-5-32-bigcode/starcoder2-3b) (vllm-project#42392) Signed-off-by: haosdent <haosdent@gmail.com>

gemini-code-assist Bot reviewed May 12, 2026

haosdent changed the title ~~[WIP][CI/Build] Replace Test1 with a code prompt for starcoder2-3b~~ [CI] De-flake Language Models Test (Extended Generation) test_models(False-False-5-32-bigcode/starcoder2-3b) May 12, 2026

haosdent marked this pull request as ready for review May 12, 2026 08:19

haosdent requested review from DarkLight1337 and ywang96 as code owners May 12, 2026 08:19

claude Bot reviewed May 12, 2026

DarkLight1337 approved these changes May 12, 2026

DarkLight1337 enabled auto-merge (squash) May 12, 2026 09:13

github-actions Bot added the `ready` label (ONLY add when PR is ready to merge/full CI is needed) May 12, 2026

ZJY0516 approved these changes May 12, 2026

DarkLight1337 merged commit fc8bf6e into vllm-project:main May 12, 2026
24 checks passed

weifang231 pushed a commit to weifang231/eb-vllm that referenced this pull request May 13, 2026

[CI] De-flake Language Models Test (Extended Generation) test_models(…

5253c6e

…False-False-5-32-bigcode/starcoder2-3b) (vllm-project#42392) Signed-off-by: haosdent <haosdent@gmail.com>

daniel-devlab mentioned this pull request May 14, 2026

[Bug]: Language Models Test (Extended Generation) test_models[False-False-5-32-bigcode/starcoder2-3b] test issue #37304

Open

1 task

mfylcek pushed a commit to mfylcek/vllm that referenced this pull request May 19, 2026

[CI] De-flake Language Models Test (Extended Generation) test_models(…

30690b1

…False-False-5-32-bigcode/starcoder2-3b) (vllm-project#42392) Signed-off-by: haosdent <haosdent@gmail.com>

jhu960213 pushed a commit to jhu960213/vllm that referenced this pull request May 20, 2026

[CI] De-flake Language Models Test (Extended Generation) test_models(…

362b365

…False-False-5-32-bigcode/starcoder2-3b) (vllm-project#42392) Signed-off-by: haosdent <haosdent@gmail.com>

h1t35h pushed a commit to h1t35h/vllm that referenced this pull request May 21, 2026

[CI] De-flake Language Models Test (Extended Generation) test_models(…

b84ed93

…False-False-5-32-bigcode/starcoder2-3b) (vllm-project#42392) Signed-off-by: haosdent <haosdent@gmail.com>

DarkLight1337 merged 1 commit into vllm-project:main from haosdent:fix-starcoder2-3b-code-prompt
