[CI] Add Qwen3.5-0.8B hybrid smoke test and fix json parsing by ricky-chaoju · Pull Request #239 · vllm-project/vllm-metal

ricky-chaoju · 2026-04-07T10:21:32Z

Summary

- Add a Qwen3.5-0.8B smoke test alongside the existing Qwen3-0.6B test, covering the hybrid SDPA + GDN linear attention paged path end-to-end
- Fix `json.load` → `json.loads(strict=False)` for both smoke tests: responses containing newlines (e.g. Qwen3.5 output) cause `Invalid control character` with strict parsing
- Pin the model revision to `2fc06364715b967f1860aea9cf38778875588b17`
- Use a longer health check timeout for Qwen3.5 (`--retry 30 --retry-delay 10`)
- Use `--max-num-seqs 1` and `VLLM_METAL_MEMORY_FRACTION=0.8` for Qwen3.5 to fit within the CI runner's ~5GB Metal memory (hybrid models allocate GDN linear state per slot, so the default 256 slots would exceed the budget)
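
For readers unfamiliar with the `strict` flag mentioned in the JSON fix, here is a minimal sketch of the behaviour. The payload and the helper names `parse_strict` / `parse_lenient` are illustrative assumptions, not taken from the smoke test script.

```python
import json

# Illustrative payload (an assumption, not captured from the smoke test):
# the string value contains a raw, unescaped newline character.
raw = '{"text": "line one\nline two"}'


def parse_strict(body: str):
    """Default parsing: control characters inside strings are rejected."""
    return json.loads(body)


def parse_lenient(body: str):
    """strict=False accepts control characters such as newlines in strings."""
    return json.loads(body, strict=False)


try:
    parse_strict(raw)
except json.JSONDecodeError as exc:
    # Strict mode refuses the raw newline inside the string value.
    print(f"strict parse failed: {exc.msg}")

# Lenient mode returns the value with the newline preserved.
print(parse_lenient(raw)["text"])
```

The trade-off is that `strict=False` accepts input that is not strictly valid JSON, which seems acceptable for a smoke test that only needs to read the model's reply.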

Depends on #235 (merged) for the block_size translation fix.
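
As a rough illustration of what the longer health check window in the summary buys, the sketch below models one initial attempt plus up to 30 retries spaced 10 seconds apart, so roughly 300 seconds of waiting in the worst case. It assumes the flags behave like curl's `--retry` and `--retry-delay`; `wait_until_healthy` and the `probe` callable are hypothetical names, and the actual CI script is not shown in this PR conversation.

```python
import time
from typing import Callable


def wait_until_healthy(
    probe: Callable[[], bool],
    retries: int = 30,
    delay: float = 10.0,
    sleep: Callable[[float], None] = time.sleep,
) -> bool:
    """Call probe once, then retry up to `retries` more times.

    Sleeps `delay` seconds between attempts; returns True as soon as
    probe() succeeds, or False once all attempts are exhausted.
    """
    for attempt in range(retries + 1):
        if probe():
            return True
        if attempt < retries:
            sleep(delay)
    return False


# Usage with a fake probe that becomes healthy on its third call.
# A no-op sleep keeps the demonstration instant.
calls = {"n": 0}


def fake_probe() -> bool:
    calls["n"] += 1
    return calls["n"] >= 3


print(wait_until_healthy(fake_probe, sleep=lambda seconds: None))
```

Passing `sleep` as a parameter is only a convenience for testing the loop without real delays.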

Signed-off-by: RickyChen / 陳昭儒 <rickychen@infinirc.com>

WindChimeRan

Could you please refactor it to reduce duplicated code?

Signed-off-by: RickyChen / 陳昭儒 <rickychen@infinirc.com>

WindChimeRan

VLLM_METAL_MEMORY_FRACTION is set in multiple places at different levels of abstraction.

If 0.8 works, we can set it to 0.8 everywhere.

Signed-off-by: RickyChen / 陳昭儒 <rickychen@infinirc.com>


ricky-chaoju added 2 commits April 7, 2026 18:17

Add Qwen3.5-0.8B hybrid smoke test (SDPA + GDN paged path)

243b367

Signed-off-by: RickyChen / 陳昭儒 <rickychen@infinirc.com>

Reduce Qwen3.5 memory: max-num-seqs=1, fraction=0.8 for CI runner

e30fed1

Signed-off-by: RickyChen / 陳昭儒 <rickychen@infinirc.com>

WindChimeRan requested changes Apr 8, 2026


ricky-chaoju added 2 commits April 8, 2026 08:13

Refactor smoke tests: extract run_smoke_test to reduce duplication

5c8db3f

Signed-off-by: RickyChen / 陳昭儒 <rickychen@infinirc.com>

Fix unbound variable when extra_args is empty under set -eu

0ee44a4

Signed-off-by: RickyChen / 陳昭儒 <rickychen@infinirc.com>

ricky-chaoju requested a review from WindChimeRan April 8, 2026 00:48

WindChimeRan requested changes Apr 8, 2026


Unify VLLM_METAL_MEMORY_FRACTION=0.8 for all smoke tests

74ff643

Signed-off-by: RickyChen / 陳昭儒 <rickychen@infinirc.com>

ricky-chaoju requested a review from WindChimeRan April 8, 2026 01:22

WindChimeRan approved these changes Apr 8, 2026


WindChimeRan merged commit 6dbf01c into vllm-project:main Apr 8, 2026
5 checks passed

ricky-chaoju deleted the ci/add-qwen35-smoke-test branch April 8, 2026 03:31

WindChimeRan merged 5 commits into vllm-project:main from ricky-chaoju:ci/add-qwen35-smoke-test
