Harden Qwen3.5 CI test to detect regressions by shepark · Pull Request #1443 · vllm-project/vllm-gaudi

shepark · 2026-05-12T18:44:02Z

#1433 fixed a Qwen3.5 accuracy regression that was detected only when the prompt bucket batch size was large. Adding VLLM_PROMPT_BS_BUCKET_MAX=32 to the CI test covers that case.
Also tighten the passing threshold to better catch future regressions.

Signed-off-by: Seunghyuk Park <separk@habana.ai>

Copilot

Pull request overview

This PR hardens the Qwen3.5-35B-A3B GSM8K CI signal to better detect accuracy regressions that only appear at larger prompt batching, aligning coverage with the regression fixed in #1433.

Changes:

Increase CI coverage by running the Qwen3.5 GSM8K test with VLLM_PROMPT_BS_BUCKET_MAX=32.
Tighten the GSM8K exact-match pass threshold for the Qwen3.5-35B-A3B model card (0.75 → 0.9).
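
To illustrate why the tighter threshold matters, here is a minimal sketch of a pass/fail gate. The helper `passes_threshold`, the `>=` comparison, and the score 0.82 are hypothetical and not taken from the repository; only the two threshold values (0.75 and 0.9) come from this PR.

```python
def passes_threshold(exact_match: float, threshold: float) -> bool:
    """Return True when the measured exact-match score meets the threshold.

    Assumption: the real harness may compare differently (e.g. strict '>').
    """
    return exact_match >= threshold


OLD_THRESHOLD = 0.75  # value before this PR
NEW_THRESHOLD = 0.9   # value after this PR

# Hypothetical score for a partially regressed run (illustrative, not measured).
regressed_score = 0.82

# The old gate would let this run through; the new gate flags it.
print(passes_threshold(regressed_score, OLD_THRESHOLD))
print(passes_threshold(regressed_score, NEW_THRESHOLD))
```

Under these assumptions, any score that falls between the old and new thresholds would previously have passed CI unnoticed and now fails.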

Reviewed changes

Copilot reviewed 2 out of 2 changed files in this pull request and generated no comments.

File	Description
`tests/full_tests/model_cards/qwen3.5-35b-a3b.yaml`	Raises the required GSM8K exact-match threshold to better catch future accuracy regressions.
`tests/full_tests/ci_e2e_discoverable_tests.sh`	Sets `VLLM_PROMPT_BS_BUCKET_MAX=32` for the Qwen3.5 GSM8K CI run to exercise larger prompt batch bucketing behavior.
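
The actual change lives in a shell script, which is not shown here. As a rough sketch of the same idea in Python, the snippet below scopes `VLLM_PROMPT_BS_BUCKET_MAX=32` to a single child process rather than the whole environment. The helper `env_with_bucket_max` is hypothetical, and the child command is a placeholder that only echoes the variable; it does not launch the real test.

```python
import os
import subprocess
import sys


def env_with_bucket_max(bs_max: int) -> dict:
    """Copy the current environment and set the prompt batch-size bucket cap."""
    env = dict(os.environ)
    env["VLLM_PROMPT_BS_BUCKET_MAX"] = str(bs_max)
    return env


child_env = env_with_bucket_max(32)

# Placeholder child process: prints the variable it received.
# A real CI run would invoke the test command here instead.
result = subprocess.run(
    [sys.executable, "-c",
     "import os; print(os.environ['VLLM_PROMPT_BS_BUCKET_MAX'])"],
    env=child_env,
    capture_output=True,
    text=True,
    check=True,
)
print(result.stdout.strip())
```

Passing a copied environment keeps the override local to the one run, so other tests in the same session are unaffected.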

github-actions · 2026-05-13T14:41:30Z

✅ CI Passed

All checks passed successfully against the following vllm commit:
54f548e9e58087f0155e4e164e416ad7efdfde6d

github-actions · 2026-05-15T04:04:20Z

✅ CI Passed

All checks passed successfully against the following vllm commit:
54f548e9e58087f0155e4e164e416ad7efdfde6d

github-actions · 2026-05-15T19:55:38Z

✅ CI Passed

All checks passed successfully against the following vllm commit:
54f548e9e58087f0155e4e164e416ad7efdfde6d

#1433 fixed a Qwen3.5 accuracy regression that was detected only when the prompt bucket batch size was large. Adding VLLM_PROMPT_BS_BUCKET_MAX=32 to the CI test covers that case. Also tighten the passing threshold to better catch future regressions.

Signed-off-by: Seunghyuk Park <separk@habana.ai>
Co-authored-by: Agata Dobrzyniewicz <160237065+adobrzyn@users.noreply.github.com>
Co-authored-by: Libin Tang <libin.tang@intel.com>

Harden Qwen3.5 CI test to detect regressions

5a71917

Signed-off-by: Seunghyuk Park <separk@habana.ai>

Copilot AI review requested due to automatic review settings May 12, 2026 18:44

shepark requested review from PatrykWo, adobrzyn, afierka-intel, iboiko-habana, jbyczkow, kamil-kaczor, ksmusz, mgawarkiewicz-intel, michalkuligowski and xuechendi as code owners May 12, 2026 18:44

Copilot started reviewing on behalf of shepark May 12, 2026 18:44

Copilot AI reviewed May 12, 2026

github-actions Bot mentioned this pull request May 12, 2026

🚦 Team Review Dashboard #701

Open

Merge branch 'main' into shepark/qwen35_ci_test_update

7147674

adobrzyn approved these changes May 13, 2026

shepark and others added 2 commits May 13, 2026 07:58

Merge branch 'main' into shepark/qwen35_ci_test_update

dd77eca

Merge branch 'main' into shepark/qwen35_ci_test_update

7d6997f

Merge branch 'main' into shepark/qwen35_ci_test_update

43049e3

iboiko-habana merged commit 252970e into vllm-project:main May 18, 2026
2 checks passed

iboiko-habana merged 5 commits into vllm-project:main from shepark:shepark/qwen35_ci_test_update

5 participants
