
[CI] Add Qwen3.5-0.8B hybrid smoke test and fix json parsing #239

Merged
WindChimeRan merged 5 commits into vllm-project:main from ricky-chaoju:ci/add-qwen35-smoke-test on Apr 8, 2026

@ricky-chaoju (Contributor) commented Apr 7, 2026

Summary

  • Add Qwen3.5-0.8B smoke test alongside the existing Qwen3-0.6B test, covering the hybrid SDPA + GDN linear attention paged path end-to-end
  • Fix json.load → json.loads(strict=False) in both smoke tests: responses containing unescaped newlines (e.g. Qwen3.5 output) fail strict parsing with "Invalid control character"
  • Pin model revision to 2fc06364715b967f1860aea9cf38778875588b17
  • Use a longer health-check timeout for Qwen3.5 (--retry 30 --retry-delay 10)
  • Use --max-num-seqs 1 and VLLM_METAL_MEMORY_FRACTION=0.8 for Qwen3.5 to fit within the CI runner's ~5GB of Metal memory (hybrid models allocate GDN linear-attention state per slot, so the default of 256 slots would exceed the budget)
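A minimal Python sketch of the parsing difference behind the json fix. `json.loads` parses a string (whereas `json.load` reads from a file object), and with the default `strict=True` it rejects control characters (code points 0-31, such as a literal newline) inside JSON strings; `strict=False` accepts them. The payload and the `parse_response` helper are hypothetical illustrations, not the actual smoke-test code or a real Qwen3.5 response.

```python
import json

# Hypothetical response body. The \n below becomes a real newline character
# inside the JSON string value, not the escaped form a JSON encoder would emit.
raw = '{"choices": [{"text": "Hello\nworld"}]}'


def parse_response(body: str) -> dict:
    # strict=False permits control characters (code points 0-31) inside strings.
    return json.loads(body, strict=False)


try:
    json.loads(raw)  # strict=True is the default
except json.JSONDecodeError as exc:
    print("strict parse failed:", exc.msg)

print(repr(parse_response(raw)["choices"][0]["text"]))  # 'Hello\nworld'
```

Relaxing `strict` only affects control characters inside string values; malformed JSON still raises.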
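A minimal Python sketch of the retry policy that `--retry 30 --retry-delay 10` expresses: one initial attempt plus up to 30 retries spaced 10 seconds apart, i.e. up to about 300 seconds of waiting before the health check gives up. The `wait_until_healthy` and `max_sleep_seconds` helpers and the `probe` callable are hypothetical stand-ins for the curl call in the workflow; the sleep function is injectable so the logic can be tested without actually waiting. Unlike curl, which only retries errors it classifies as transient, this sketch retries on any failed probe.

```python
import time
from typing import Callable


def wait_until_healthy(
    probe: Callable[[], bool],
    retries: int = 30,
    delay: float = 10.0,
    sleep: Callable[[float], None] = time.sleep,
) -> bool:
    """Call probe() up to 1 + retries times, sleeping `delay` seconds between calls."""
    for attempt in range(retries + 1):
        if probe():
            return True
        if attempt < retries:
            sleep(delay)  # pause before the next retry, like --retry-delay
    return False


def max_sleep_seconds(retries: int, delay: float) -> float:
    # Total time spent sleeping if every attempt fails; request time is extra.
    return retries * delay
```

For example, `max_sleep_seconds(30, 10)` returns 300, which is the extra headroom the larger hybrid model gets to load before the check fails.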

Depends on #235 (merged) for the block_size translation fix.
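A back-of-the-envelope Python sketch of why the slot count matters for hybrid models. It assumes each GDN layer keeps one fixed-size recurrent state matrix (key dim × value dim) per head for every sequence slot, so the footprint scales linearly with `--max-num-seqs`, and it assumes the memory fraction simply scales total device memory. The layer count, head count, head dimensions, and dtype are placeholders, not Qwen3.5-0.8B's actual configuration, and the helper names are hypothetical.

```python
GIB = 1024**3


def gdn_state_bytes(num_slots: int, num_gdn_layers: int, num_heads: int,
                    k_dim: int, v_dim: int, bytes_per_elem: int = 2) -> int:
    """Bytes of recurrent state: one (k_dim x v_dim) matrix per head, per layer, per slot.

    Convolution state and other buffers are ignored in this sketch.
    """
    return num_slots * num_gdn_layers * num_heads * k_dim * v_dim * bytes_per_elem


def metal_budget_bytes(total_bytes: int, fraction: float) -> int:
    # Assumption: the memory fraction is applied to total device memory.
    return int(total_bytes * fraction)


# Placeholder model shape; NOT the real Qwen3.5-0.8B configuration.
shape = dict(num_gdn_layers=18, num_heads=16, k_dim=128, v_dim=128, bytes_per_elem=2)

one_slot = gdn_state_bytes(1, **shape)         # 9 MiB with these placeholders
default_slots = gdn_state_bytes(256, **shape)  # 2.25 GiB with these placeholders
budget = metal_budget_bytes(5 * GIB, 0.8)      # about 4 GiB of a ~5 GiB runner
print(one_slot, default_slots, budget)
```

The only point carried over from the PR is the linear scaling: going from the default 256 slots to `--max-num-seqs 1` shrinks the state term 256-fold, leaving the rest of the budget for weights and the paged KV cache. Real figures depend on the model's config.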

Signed-off-by: RickyChen / 陳昭儒 <rickychen@infinirc.com>
WindChimeRan (Collaborator) left a comment

Could you please refactor it to reduce duplicated code?

Signed-off-by: RickyChen / 陳昭儒 <rickychen@infinirc.com>
WindChimeRan (Collaborator) left a comment

VLLM_METAL_MEMORY_FRACTION is spread across multiple places at different levels of abstraction.

If 0.8 works, we can use 0.8 everywhere.
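One way to act on both review comments, sketched in Python under stated assumptions: each smoke test is described by a small record, a single module-level memory fraction of 0.8 is shared by every run, and one helper builds the server command and environment. The `SmokeTest` type, `serve_command` and `serve_env` helpers, the Hugging Face model IDs, and the default health-check values are hypothetical and not taken from the repository. `--revision` and `--max-num-seqs` are standard `vllm serve` options, and `VLLM_METAL_MEMORY_FRACTION` and the pinned revision come from this PR (assigning that revision to the Qwen3.5 entry is an assumption).

```python
import os
from dataclasses import dataclass
from typing import Optional

# Single source of truth for the Metal memory fraction.
METAL_MEMORY_FRACTION = "0.8"


@dataclass(frozen=True)
class SmokeTest:
    model: str                       # assumed Hugging Face model ID
    revision: Optional[str] = None   # pinned revision, if any
    max_num_seqs: Optional[int] = None
    health_retries: int = 10         # placeholder default
    health_retry_delay: int = 5      # placeholder default, in seconds


def serve_command(test: SmokeTest) -> list[str]:
    """Build the `vllm serve` argv for one smoke test."""
    cmd = ["vllm", "serve", test.model]
    if test.revision:
        cmd += ["--revision", test.revision]
    if test.max_num_seqs is not None:
        cmd += ["--max-num-seqs", str(test.max_num_seqs)]
    return cmd


def serve_env(base: Optional[dict[str, str]] = None) -> dict[str, str]:
    """Copy the environment and apply the shared memory fraction."""
    env = dict(os.environ if base is None else base)
    env["VLLM_METAL_MEMORY_FRACTION"] = METAL_MEMORY_FRACTION
    return env


SMOKE_TESTS = [
    SmokeTest(model="Qwen/Qwen3-0.6B"),
    SmokeTest(
        model="Qwen/Qwen3.5-0.8B",  # assumed ID
        revision="2fc06364715b967f1860aea9cf38778875588b17",  # assumed to belong to this entry
        max_num_seqs=1,
        health_retries=30,
        health_retry_delay=10,
    ),
]
```

With this shape, adding a model means adding one `SmokeTest` entry, and the memory fraction can only differ between runs if someone edits the single constant. The health-check fields are carried as data for whatever polling step the workflow uses.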

Signed-off-by: RickyChen / 陳昭儒 <rickychen@infinirc.com>
@WindChimeRan WindChimeRan merged commit 6dbf01c into vllm-project:main Apr 8, 2026
5 checks passed
@ricky-chaoju ricky-chaoju deleted the ci/add-qwen35-smoke-test branch April 8, 2026 03:31
Alex-ai-future pushed a commit to Alex-ai-future/vllm-metal that referenced this pull request Apr 8, 2026

2 participants