[CI] De-flake Language Models Test (Extended Generation) test_models(False-False-5-32-bigcode/starcoder2-3b) #42392

Merged
DarkLight1337 merged 1 commit into vllm-project:main from haosdent:fix-starcoder2-3b-code-prompt
May 12, 2026

@haosdent haosdent commented May 12, 2026

Purpose

Fix the long-standing CI failure for bigcode/starcoder2-3b in tests/models/language/generation/test_common.py on NVIDIA L4 GPUs (SM 8.9). Tracked in #37304 and #42336; #42336 narrows the regression window to 08bfedc15..2488d1dc, which contains #34644 (PyTorch 2.10 → 2.11).

Test1 is an open-ended natural-language prompt fed to a code-completion model. After ~8 matched tokens, starcoder2-3b wanders into a Jupyter-style markdown id and lands in a near-uniform digit-token logit region. HF↔vLLM bf16 drift accumulated over 30 layers reorders the top-K by about one logprob bit, so the `output_id_0 in logprobs_elem_1` check fails. This PR replaces Test1, for this model only, with a code prompt that keeps it on its training distribution; other models are unchanged.

Recent failing daily/nightly runs (all on L4): #65678, #65498, #65468, #65423, #65378, #65324, #65246, #65099, #65033, #64859, #64792.

Test Plan

Test Result

bigcode/starcoder2-3b fails models/language/generation/test_common.py
on L4 (SM 8.9). The failure is on Test1, an open-ended NL prompt fed to
a code-completion model. After 8 matched tokens the model wanders into
a Jupyter notebook markdown id and lands in a near-uniform digit-token
logit region; HF and vLLM disagree on which digit lands in top-5 by ~1
logprob bit, so `output_id_0 in logprobs_elem_1` fails.
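
The failure mode described above can be reproduced with a toy example. The following is a minimal sketch, not the vLLM test code: the helper names, the logit values, and the single-step drift of 2**-6 (the bf16 spacing for values in [2, 4)) are all assumptions chosen for illustration.

```python
# Toy illustration only: helper names and logit values are hypothetical.

BF16_STEP_NEAR_2 = 2 ** -6  # spacing of bf16 values in [2, 4) (7 mantissa bits)


def top_k_ids(logits, k):
    """Return the ids of the k largest logits, ties broken by lower id."""
    return sorted(range(len(logits)), key=lambda i: (-logits[i], i))[:k]


def greedy_in_other_top_k(logits_a, logits_b, k=5):
    """Is backend A's greedy token among backend B's top-k candidates?

    This mirrors the shape of a check like `output_id_0 in logprobs_elem_1`.
    """
    greedy_a = top_k_ids(logits_a, 1)[0]
    return greedy_a in top_k_ids(logits_b, k)


# Near-uniform region: ten digit-like tokens whose logits differ by 0.001.
flat_a = [2.0 + 0.001 * i for i in range(10)]
flat_b = list(flat_a)
flat_b[9] -= BF16_STEP_NEAR_2  # one step of drift on A's greedy token

# Sharp region: one token dominates, so the same drift changes nothing.
sharp_a = [0.0] * 10
sharp_a[3] = 8.0
sharp_b = list(sharp_a)
sharp_b[3] -= BF16_STEP_NEAR_2

print(greedy_in_other_top_k(flat_a, flat_b))    # False: token 9 left B's top-5
print(greedy_in_other_top_k(sharp_a, sharp_b))  # True: token 3 is still B's top-1
```

In the flat case a single step of drift is larger than the whole spread of the candidate logits, so the ranking is effectively arbitrary; in the sharp case the same drift is negligible.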

Replace Test1 only for bigcode/starcoder2-3b with a code prompt that
keeps the model on its training distribution and produces sharp top-1
logits at every position. Other models are unchanged.
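
A per-model prompt override of this kind might look roughly like the sketch below. Every name here (`DEFAULT_PROMPTS`, `PROMPT_OVERRIDES`, `prompts_for`), the placeholder prompt strings, and the assumption that Test1 sits at index 1 are hypothetical; the actual change in test_common.py may be structured differently.

```python
# Hypothetical sketch: names, prompt strings, and the Test1 index are
# illustrative assumptions, not the real vLLM test fixtures.

TEST1_INDEX = 1  # assumed position of Test1 in the shared prompt list

DEFAULT_PROMPTS = [
    "<shared prompt 0>",
    "<open-ended natural-language prompt (Test1)>",
    "<shared prompt 2>",
]

# Placeholder code-style prompt that keeps a code model on familiar input.
CODE_PROMPT = "def add(a, b):\n    "

# Only models listed here get a replacement; all others use the defaults.
PROMPT_OVERRIDES = {
    "bigcode/starcoder2-3b": {TEST1_INDEX: CODE_PROMPT},
}


def prompts_for(model, prompts=DEFAULT_PROMPTS, overrides=PROMPT_OVERRIDES):
    """Return a copy of the prompt list with any per-model replacements applied."""
    result = list(prompts)  # copy so the shared defaults are never mutated
    for index, replacement in overrides.get(model, {}).items():
        result[index] = replacement
    return result


print(prompts_for("bigcode/starcoder2-3b")[TEST1_INDEX] == CODE_PROMPT)
print(prompts_for("some-org/other-model") == DEFAULT_PROMPTS)
```

Keeping the override in a lookup keyed by model name leaves the shared prompt list untouched, so other parametrizations see exactly the same inputs as before.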

Verified locally on GB10: both parametrizations PASS in ~95s each.

Signed-off-by: haosdent <haosdent@gmail.com>

@gemini-code-assist gemini-code-assist Bot left a comment

Code Review

This pull request updates the common generation tests in tests/models/language/generation/test_common.py to specifically handle the bigcode/starcoder2-3b model. It replaces a natural language prompt with a code-based prompt to mitigate test instability caused by logit drift between Hugging Face and vLLM implementations. I have no feedback to provide as there were no review comments.

@haosdent haosdent changed the title from "[WIP][CI/Build] Replace Test1 with a code prompt for starcoder2-3b" to "[CI] De-flake Language Models Test (Extended Generation) test_models(False-False-5-32-bigcode/starcoder2-3b)" May 12, 2026
@haosdent haosdent marked this pull request as ready for review May 12, 2026 08:19

@claude claude Bot left a comment

Claude Code Review

This pull request is from a fork — automated review is disabled. A repository maintainer can comment @claude review to run a one-time review.

@haosdent
Contributor Author

@zou3519 @yzong-rh Could you help review this when you are available?
cc @ZJY0516

@DarkLight1337 DarkLight1337 enabled auto-merge (squash) May 12, 2026 09:13
@github-actions github-actions Bot added the `ready` label (ONLY add when PR is ready to merge/full CI is needed) May 12, 2026
Member

@ZJY0516 ZJY0516 left a comment

Thanks

@DarkLight1337 DarkLight1337 merged commit fc8bf6e into vllm-project:main May 12, 2026
24 checks passed
@yzong-rh
Contributor

Thank you!

weifang231 pushed a commit to weifang231/eb-vllm that referenced this pull request May 13, 2026
…False-False-5-32-bigcode/starcoder2-3b) (vllm-project#42392)

Signed-off-by: haosdent <haosdent@gmail.com>
mfylcek pushed a commit to mfylcek/vllm that referenced this pull request May 19, 2026
…False-False-5-32-bigcode/starcoder2-3b) (vllm-project#42392)

Signed-off-by: haosdent <haosdent@gmail.com>
jhu960213 pushed a commit to jhu960213/vllm that referenced this pull request May 20, 2026
…False-False-5-32-bigcode/starcoder2-3b) (vllm-project#42392)

Signed-off-by: haosdent <haosdent@gmail.com>
h1t35h pushed a commit to h1t35h/vllm that referenced this pull request May 21, 2026
…False-False-5-32-bigcode/starcoder2-3b) (vllm-project#42392)

Signed-off-by: haosdent <haosdent@gmail.com>