[CI] Add Qwen3.5-0.8B hybrid smoke test and fix json parsing #239
Merged
WindChimeRan merged 5 commits into vllm-project:main on Apr 8, 2026
Signed-off-by: RickyChen / 陳昭儒 <rickychen@infinirc.com>
Signed-off-by: RickyChen / 陳昭儒 <rickychen@infinirc.com>
WindChimeRan
requested changes
Apr 8, 2026
Collaborator
WindChimeRan
left a comment
Could you please refactor it to reduce duplicated code?
Signed-off-by: RickyChen / 陳昭儒 <rickychen@infinirc.com>
Signed-off-by: RickyChen / 陳昭儒 <rickychen@infinirc.com>
WindChimeRan
requested changes
Apr 8, 2026
Signed-off-by: RickyChen / 陳昭儒 <rickychen@infinirc.com>
WindChimeRan
approved these changes
Apr 8, 2026
Alex-ai-future
pushed a commit
to Alex-ai-future/vllm-metal
that referenced
this pull request
Apr 8, 2026
…oject#239)

## Summary

- Add Qwen3.5-0.8B smoke test alongside the existing Qwen3-0.6B test, covering the hybrid SDPA + GDN linear attention paged path end-to-end
- Fix `json.load` → `json.loads(strict=False)` for both smoke tests: responses containing newlines (e.g. Qwen3.5 output) cause `Invalid control character` with strict parsing
- Pin model revision to `2fc06364715b967f1860aea9cf38778875588b17`
- Use longer health check timeout for Qwen3.5 (`--retry 30 --retry-delay 10`)
- Use `--max-num-seqs 1` and `VLLM_METAL_MEMORY_FRACTION=0.8` for Qwen3.5 to fit within the CI runner's ~5 GB Metal memory (hybrid models allocate GDN linear state per slot, so the default of 256 slots would exceed the budget)

Depends on vllm-project#235 (merged) for the block_size translation fix.

---------

Signed-off-by: RickyChen / 陳昭儒 <rickychen@infinirc.com>
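For readers unfamiliar with the retry settings, here is a rough Python illustration of what a health check with `--retry 30 --retry-delay 10` amounts to: one initial attempt plus up to 30 retries, 10 seconds apart, so roughly 300 seconds of waiting in the worst case. This is only a sketch and not the PR's CI script. The helper names `wait_until_healthy` and `http_probe`, and the URL in the usage comment, are assumptions made for illustration.

```python
import time
import urllib.error
import urllib.request
from typing import Callable


def http_probe(url: str, timeout: float = 5.0) -> bool:
    """Return True if a GET on `url` answers with HTTP 200 (hypothetical helper)."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return resp.status == 200
    except (urllib.error.URLError, OSError):
        # Server not up yet, connection refused, timeout, or non-2xx status.
        return False


def wait_until_healthy(
    probe: Callable[[], bool],
    retries: int = 30,
    delay: float = 10.0,
    sleep: Callable[[float], None] = time.sleep,
) -> bool:
    """Call `probe` once, then retry up to `retries` times, sleeping `delay` between attempts."""
    for attempt in range(retries + 1):
        if probe():
            return True
        if attempt < retries:
            sleep(delay)
    return False


# Example usage (the endpoint is an assumption, adjust to the server under test):
# ok = wait_until_healthy(lambda: http_probe("http://localhost:8000/health"))
```

Passing the probe and the sleep function as parameters keeps the retry logic testable without a running server.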
Summary
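- Add a Qwen3.5-0.8B smoke test alongside the existing Qwen3-0.6B test, covering the hybrid SDPA + GDN linear attention paged path end-to-end
- Fix `json.load` → `json.loads(strict=False)` for both smoke tests: responses containing newlines (e.g. Qwen3.5 output) cause `Invalid control character` with strict parsing
- Pin model revision to `2fc06364715b967f1860aea9cf38778875588b17`
- Use a longer health check timeout for Qwen3.5 (`--retry 30 --retry-delay 10`)
- Use `--max-num-seqs 1` and `VLLM_METAL_MEMORY_FRACTION=0.8` for Qwen3.5 to fit within the CI runner's ~5 GB Metal memory (hybrid models allocate GDN linear state per slot, so the default of 256 slots would exceed the budget)

Depends on #235 (merged) for the block_size translation fix.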
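The JSON parsing fix can be reproduced in isolation. The sketch below assumes a response body whose string value contains a raw, unescaped newline; the OpenAI-style `choices`/`message`/`content` shape is a made-up example and not the smoke test's actual payload. Python's default strict parsing rejects control characters inside JSON strings, while `strict=False` accepts them.

```python
import json

# Hypothetical response body. The "\n" below becomes a real newline character
# inside the JSON string value, which strict JSON parsing does not allow.
raw = '{"choices": [{"message": {"content": "line one\nline two"}}]}'

# Default (strict) parsing fails on the raw control character.
try:
    json.loads(raw)
    strict_error = None
except json.JSONDecodeError as exc:
    strict_error = exc.msg

# strict=False tolerates control characters inside strings.
parsed = json.loads(raw, strict=False)
content = parsed["choices"][0]["message"]["content"]

print("strict parse error:", strict_error)
print("lenient parse content:", repr(content))
```

Relaxing strictness only affects control characters inside string values; other malformed JSON is still rejected.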