Harden Qwen3.5 CI test to detect regressions #1443

Merged
iboiko-habana merged 5 commits into vllm-project:main from shepark:shepark/qwen35_ci_test_update on May 18, 2026

@shepark shepark commented May 12, 2026

#1433 fixed a Qwen3.5 accuracy regression that only shows up when the prompt bucket batch size is large. Adding VLLM_PROMPT_BS_BUCKET_MAX=32 to the CI test covers that case.
This PR also tightens the passing threshold to better catch future regressions.

Signed-off-by: Seunghyuk Park <separk@habana.ai>

Copilot AI left a comment

Pull request overview

This PR hardens the Qwen3.5-35B-A3B GSM8K CI test so that it detects accuracy regressions that only appear with large prompt batch buckets, matching the case fixed in #1433.

Changes:

  • Increase CI coverage by running the Qwen3.5 GSM8K test with VLLM_PROMPT_BS_BUCKET_MAX=32.
  • Tighten the GSM8K exact-match pass threshold for the Qwen3.5-35B-A3B model card (0.75 → 0.9).
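
To illustrate why the tighter threshold matters, here is a minimal Python sketch. It assumes the check is a simple "score >= threshold" comparison, which this PR does not confirm; the `passes` helper and both example scores are hypothetical and are not measured results. Only the 0.75 and 0.9 thresholds come from the summary above.

```python
def passes(exact_match: float, threshold: float) -> bool:
    """Return True when the score meets the threshold (assumed '>=' rule)."""
    return exact_match >= threshold


OLD_THRESHOLD = 0.75  # previous value, per the PR summary
NEW_THRESHOLD = 0.9   # tightened value, per the PR summary

# Hypothetical scores, invented purely for illustration.
healthy_score = 0.93
regressed_score = 0.80

# A partial accuracy drop slips past the old threshold...
old_verdict = passes(regressed_score, OLD_THRESHOLD)
# ...but is rejected by the tightened one.
new_verdict = passes(regressed_score, NEW_THRESHOLD)
# A healthy run still passes under the new threshold.
healthy_verdict = passes(healthy_score, NEW_THRESHOLD)

print(old_verdict, new_verdict, healthy_verdict)
```

Under these assumed numbers, a drop to 0.80 would have gone unnoticed at the old threshold and fails at the new one.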

Reviewed changes

Copilot reviewed 2 out of 2 changed files in this pull request and generated no comments.

| File | Description |
| --- | --- |
| tests/full_tests/model_cards/qwen3.5-35b-a3b.yaml | Raises the required GSM8K exact-match threshold to better catch future accuracy regressions. |
| tests/full_tests/ci_e2e_discoverable_tests.sh | Sets VLLM_PROMPT_BS_BUCKET_MAX=32 for the Qwen3.5 GSM8K CI run to exercise larger prompt batch bucketing behavior. |
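
The actual change lives in a shell script whose contents are not shown in this conversation. As a rough Python sketch of the same idea, the snippet below scopes VLLM_PROMPT_BS_BUCKET_MAX=32 to a single child process rather than the whole environment. The helper name `run_with_bucket_max` is hypothetical, and the child command is a stand-in that only echoes the variable; it is not the real GSM8K test command.

```python
import os
import subprocess
import sys


def run_with_bucket_max(cmd, bucket_max=32):
    """Run cmd with VLLM_PROMPT_BS_BUCKET_MAX set only for that process."""
    # Copy the parent environment and override a single variable.
    env = dict(os.environ, VLLM_PROMPT_BS_BUCKET_MAX=str(bucket_max))
    return subprocess.run(
        cmd, env=env, capture_output=True, text=True, check=True
    )


# Stand-in child command (assumption): prints the variable it was given.
child_cmd = [
    sys.executable,
    "-c",
    "import os; print(os.environ['VLLM_PROMPT_BS_BUCKET_MAX'])",
]
result = run_with_bucket_max(child_cmd)
print(result.stdout.strip())
```

Passing the variable through `env` keeps the setting local to one test run, so other tests in the same job are unaffected.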

@github-actions

✅ CI Passed

All checks passed successfully against the following vllm commit:
54f548e9e58087f0155e4e164e416ad7efdfde6d

@iboiko-habana iboiko-habana merged commit 252970e into vllm-project:main May 18, 2026
2 checks passed
iboiko-habana pushed a commit that referenced this pull request May 18, 2026
#1433 fixed a Qwen3.5 accuracy regression that was only detected when the prompt bucket batch size is large. Adding VLLM_PROMPT_BS_BUCKET_MAX=32 to the CI test covers that case.
Also tighten the passing threshold to better catch future regressions.

Signed-off-by: Seunghyuk Park <separk@habana.ai>
Co-authored-by: Agata Dobrzyniewicz <160237065+adobrzyn@users.noreply.github.com>
Co-authored-by: Libin Tang <libin.tang@intel.com>