[Model] Add Qwen3.5 hybrid model support#34131
liuchenbing2026 wants to merge 1 commit into vllm-project:main
Conversation
Code Review
This pull request introduces support for the Qwen3.5 hybrid model architecture. The implementation is well structured: it reuses components from Qwen3-Next where appropriate and adds new modules such as Qwen3_5GatedDeltaNet for the model's specific characteristics. The changes cover the model definition, its configuration, and registration within the vLLM framework. The code largely follows existing patterns, but I have identified one incorrect type hint that should be fixed for correctness and consistency.
```python
def get_mamba_state_shape_from_config(
    cls, vllm_config: "VllmConfig"
) -> tuple[tuple[int, int], tuple[int, int]]:
```
The return type hint for get_mamba_state_shape_from_config is incorrect. It is specified as tuple[tuple[int, int], tuple[int, int]], but the MambaStateShapeCalculator.gated_delta_net_state_shape function it calls returns a tuple where the second element is a 3-tuple (num_heads, head_v_dim, head_k_dim). The correct return type should be tuple[tuple[int, int], tuple[int, int, int]] to match the actual returned value and the base class IsHybrid.
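As a rough illustration of why the second element of the return type must be a 3-tuple, here is a sketch of the two per-layer state shapes a gated-delta-net layer caches. The dimension values below are made up for illustration and are not the actual Qwen3.5 config values:

```python
# Hypothetical dimensions for illustration only; real values come from the
# model config, and the shape logic lives in
# MambaStateShapeCalculator.gated_delta_net_state_shape.
num_k_heads = 16       # linear-attention key heads
num_v_heads = 32       # linear-attention value heads
head_k_dim = 128
head_v_dim = 128
conv_kernel_size = 4

# The causal-conv1d state is 2-D: (conv_dim, kernel_size - 1).
# conv_dim covers the projected q/k (num_k_heads each) and v channels.
conv_dim = head_k_dim * num_k_heads * 2 + head_v_dim * num_v_heads
conv_state_shape = (conv_dim, conv_kernel_size - 1)

# The recurrent (delta-rule) state is 3-D: (num_heads, head_v_dim, head_k_dim),
# hence tuple[int, int, int] in the return annotation.
recurrent_state_shape = (num_v_heads, head_v_dim, head_k_dim)

print(conv_state_shape)       # (8192, 3)
print(recurrent_state_shape)  # (32, 128, 128)
```

Because the two shapes have different ranks, annotating both as `tuple[int, int]` under-specifies the second one.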
```diff
 def get_mamba_state_shape_from_config(
     cls, vllm_config: "VllmConfig"
-) -> tuple[tuple[int, int], tuple[int, int]]:
+) -> tuple[tuple[int, int], tuple[int, int, int]]:
```
We already have another open PR for this: #34110
Add inference support for the Qwen3.5 hybrid architecture model.

New files:
- vllm/transformers_utils/configs/qwen3_5.py
- vllm/model_executor/models/qwen3_5.py

Modified files:
- Register Qwen3_5TextConfig in the config registry
- Register Qwen3_5ForCausalLM in the model registry
Force-pushed from ccf450b to c04143b
This pull request has merge conflicts that must be resolved before it can be merged.
Closing as superseded by #34110
## 📌 Description

Add test cases for Qwen3N and Qwen3.5 according to vllm-project/vllm#34131.

## 🔍 Related Issues

## 🚀 Pull Request Checklist

Thank you for contributing to FlashInfer! Before we review your pull request, please make sure the following items are complete.

### ✅ Pre-commit Checks

- [x] I have installed `pre-commit` by running `pip install pre-commit` (or used your preferred method).
- [x] I have installed the hooks with `pre-commit install`.
- [x] I have run the hooks manually with `pre-commit run --all-files` and fixed any reported issues.

> If you are unsure about how to set up `pre-commit`, see [the pre-commit documentation](https://pre-commit.com/).

## 🧪 Tests

- [x] Tests have been added or updated as needed.
- [ ] All tests are passing (`unittest`, etc.).

## Reviewer Notes

## Summary by CodeRabbit

* **Tests**
  * Expanded test coverage by adding additional head-configuration cases across multiple test scenarios to improve reliability and catch more edge cases.
  * No changes to test logic or public interfaces; only parameterized inputs were extended.
Add inference support for the Qwen3.5 hybrid architecture model, which interleaves full attention (transformer) and linear attention (Gated Delta Net) layers with dense MLP.
New files:
- vllm/transformers_utils/configs/qwen3_5.py
- vllm/model_executor/models/qwen3_5.py

Modified files:
- Register Qwen3_5TextConfig in the config registry
- Register Qwen3_5ForCausalLM in the model registry
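As a minimal sketch of what "interleaves full attention and linear attention layers" means for a hybrid stack, the snippet below generates a per-layer type list. The function name, parameters, and interval value are illustrative assumptions, not the actual Qwen3.5 config fields:

```python
# Hypothetical sketch of a hybrid layer layout; `full_attention_interval`
# is an assumed parameter name, not a real Qwen3.5 config attribute.
def layer_types(num_layers: int, full_attention_interval: int) -> list[str]:
    """Every `full_attention_interval`-th layer uses full (transformer)
    attention; the rest use linear attention (Gated Delta Net).
    Each layer is followed by a dense MLP either way."""
    return [
        "full_attention" if (i + 1) % full_attention_interval == 0
        else "linear_attention"
        for i in range(num_layers)
    ]

# 3 linear-attention layers, then 1 full-attention layer, repeated.
print(layer_types(8, 4))
```

At inference time, vLLM allocates different cache state per layer type (KV cache for full attention, conv plus recurrent state for the linear layers), which is why the model implements the hybrid state-shape hooks.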
Purpose
Test Plan
Test Result
Essential Elements of an Effective PR Description Checklist
supported_models.md and examples for a new model.