
[Model] Register Qwen3_5ForCausalLM and Qwen3_5MoeForCausalLM for text-only checkpoints#36289

Closed
aminsamir45 wants to merge 4 commits into vllm-project:main from aminsamir45:fix/register-qwen3-5-causal-lm

Conversation

@aminsamir45

Summary

Qwen3_5ForCausalLM and Qwen3_5MoeForCausalLM classes already exist in qwen3_5.py but are not registered in _TEXT_GENERATION_MODELS in the model registry. This means vLLM cannot load text-only Qwen3.5 checkpoints (e.g. Qwen/Qwen3.5-4B).

When a text-only checkpoint specifies architectures: ["Qwen3_5ForCausalLM"], the registry lookup fails and vLLM falls through to the VLM class Qwen3_5ForConditionalGeneration, which expects weights under language_model.model.layers.* instead of model.layers.*, causing a hard weight mismatch.
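The failure mode can be illustrated with a minimal, self-contained sketch (hypothetical key names, not vLLM's actual weight loader):

```python
# Minimal illustration of the prefix mismatch described above.
# A hypothetical subset of keys from a text-only Qwen3.5 checkpoint:
checkpoint_keys = {
    "model.layers.0.self_attn.q_proj.weight",
    "model.layers.0.mlp.gate_proj.weight",
}

# The VLM class Qwen3_5ForConditionalGeneration expects the language-model
# weights nested under a `language_model.` prefix instead:
expected_vlm_keys = {f"language_model.{k}" for k in checkpoint_keys}

# No checkpoint key matches any expected key, so loading fails outright.
missing = expected_vlm_keys - checkpoint_keys
print(f"{len(missing)} expected keys missing -> hard weight mismatch")
```

Because every expected key carries the extra prefix, the overlap between the two key sets is empty, which is why the mismatch is "hard" rather than partial.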

This completely blocks using vLLM with TRL's GRPOTrainer (colocate mode) for RL training on text-only Qwen3.5 models.

Fix

Two lines added to _TEXT_GENERATION_MODELS in registry.py:

"Qwen3_5ForCausalLM": ("qwen3_5", "Qwen3_5ForCausalLM"),
"Qwen3_5MoeForCausalLM": ("qwen3_5", "Qwen3_5MoeForCausalLM"),

The implementation classes are already fully functional — they just weren't wired into the registry.
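The registry mechanism can be sketched roughly as follows (a simplified stand-in for vLLM's actual registry, shown only to illustrate why the two missing entries cause the lookup to fail):

```python
# Simplified stand-in for vLLM's architecture registry: it maps an
# architecture name (from the checkpoint's config.json `architectures`
# field) to the (module, class name) pair that implements it.
_TEXT_GENERATION_MODELS = {
    "Qwen3_5ForCausalLM": ("qwen3_5", "Qwen3_5ForCausalLM"),
    "Qwen3_5MoeForCausalLM": ("qwen3_5", "Qwen3_5MoeForCausalLM"),
}

def resolve(architectures: list[str]) -> tuple[str, str]:
    """Return the first registered (module, class) for the given list."""
    for arch in architectures:
        if arch in _TEXT_GENERATION_MODELS:
            return _TEXT_GENERATION_MODELS[arch]
    raise ValueError(f"No registered model class for {architectures}")

print(resolve(["Qwen3_5ForCausalLM"]))
```

Without the two entries above, `resolve(["Qwen3_5ForCausalLM"])` finds nothing in this table, which is the point where vLLM falls through to the VLM class.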

Fixes #36275
Related: #36236

Test plan

  • Load a text-only Qwen3.5 checkpoint (e.g. Qwen/Qwen3.5-4B) and verify it resolves to Qwen3_5ForCausalLM instead of Qwen3_5ForConditionalGeneration
  • Verify weights load correctly at model.layers.* without the language_model. prefix mismatch
  • Verify existing VLM usage of Qwen3_5ForConditionalGeneration is unaffected

Made with Cursor

@mergify mergify bot added new-model Requests to new models qwen Related to Qwen models labels Mar 6, 2026
@aminsamir45 aminsamir45 force-pushed the fix/register-qwen3-5-causal-lm branch from d02d404 to 69c39ca Compare March 6, 2026 22:43
Contributor

@gemini-code-assist gemini-code-assist bot left a comment


Code Review

This pull request correctly registers Qwen3_5ForCausalLM and Qwen3_5MoeForCausalLM to enable loading text-only Qwen3.5 checkpoints. The change is straightforward and addresses the described issue. However, it appears to be missing a corresponding update to the test registry, which is a required step for adding new model architectures to ensure proper test coverage.

@github-actions

github-actions bot commented Mar 6, 2026

👋 Hi! Thank you for contributing to the vLLM project.

💬 Join our developer Slack at https://slack.vllm.ai to discuss your PR in #pr-reviews, coordinate on features in #feat- channels, or join special interest groups in #sig- channels.

Just a reminder: PRs do not trigger a full CI run by default. Instead, only fastcheck CI runs, which starts a small, essential subset of CI tests to quickly catch errors.

You can ask your reviewers to trigger select CI tests on top of fastcheck CI.

Once the PR is approved and ready to go, your PR reviewer(s) can run CI to test the changes comprehensively before merging.

To run CI, PR reviewers can either: Add ready label to the PR or enable auto-merge.

If you have any questions, please reach out to us on Slack at https://slack.vllm.ai.

🚀

…l registry

The Qwen3_5ForCausalLM and Qwen3_5MoeForCausalLM classes already exist
in qwen3_5.py but are not registered in the model registry, so vLLM
cannot load text-only Qwen3.5 checkpoints. When a text-only checkpoint
(e.g. Qwen3.5-4B) specifies architectures: ["Qwen3_5ForCausalLM"],
the registry lookup fails and vLLM falls through to the VLM class
Qwen3_5ForConditionalGeneration, which expects weights under
language_model.model.layers.* instead of model.layers.*, causing a
hard weight mismatch.

Changes:
- Add Qwen3_5ForCausalLM and Qwen3_5MoeForCausalLM to
  _TEXT_GENERATION_MODELS in registry.py
- Add IsHybrid mixin and get_mamba_state_dtype_from_config,
  get_mamba_state_shape_from_config, get_mamba_state_copy_func to
  Qwen3_5ForCausalLMBase so standalone text-only usage correctly
  configures the Gated DeltaNet state cache
- Add both architectures to MODELS_CONFIG_MAP in config.py so
  mamba_ssm_cache_dtype is auto-configured from the HF config
- Add test registry entries for CI coverage

Fixes vllm-project#36275

Signed-off-by: Samir Amin <aminsamir45@gmail.com>
Made-with: Cursor
@aminsamir45 aminsamir45 force-pushed the fix/register-qwen3-5-causal-lm branch from 69c39ca to efb2d60 Compare March 6, 2026 23:04
@ywang96
Member

ywang96 commented Mar 7, 2026

This means vLLM cannot load text-only Qwen3.5 checkpoints (e.g. Qwen/Qwen3.5-4B).

This is not true - afaik all Qwen3.5 checkpoints are natively multimodal: https://huggingface.co/Qwen/Qwen3.5-4B/blob/main/config.json#L3

Do you have a text-only checkpoint that's available on Hugging Face? Adding them to the list of model architectures will also require public checkpoints with Qwen3_5ForCausalLM and Qwen3_5MoeForCausalLM in their architectures field so that CI can test model loading.

Comment on lines +482 to +489
"Qwen3_5ForCausalLM": _HfExamplesInfo(
"Qwen/Qwen3.5-0.8B",
max_model_len=4096,
),
"Qwen3_5MoeForCausalLM": _HfExamplesInfo(
"Qwen/Qwen3.5-35B-A3B",
max_model_len=4096,
),
Member


Cursor is just wrong here... both models are registered as XXXForConditionalGeneration

Author


Tests have been removed

aminsamir45 and others added 2 commits March 8, 2026 14:36
The example models (Qwen/Qwen3.5-0.8B, Qwen/Qwen3.5-35B-A3B) are VLMs
that use Qwen3_5ForConditionalGeneration, not Qwen3_5ForCausalLM. The
test entries never exercised the text-only code path.

For text-only checkpoints produced by fine-tuning a Qwen3.5 VLM with
AutoModelForCausalLM, the officially supported path is to load via the
VLM class with language_model_only=True, which skips the vision pipeline
and loads only the LM backbone.

Signed-off-by: Samir Amin <aminsamir45@gmail.com>
@ywang96
Member

ywang96 commented Mar 8, 2026

There are no changes from this PR - do you still intend to keep it?

If the purpose is to provide guidance for people who run into the issue when they cannot initialize the model as a text-only model, we already have the --language-model-only flag, and it'd be great if you could enhance our documentation about it! Thanks!


Labels

new-model Requests to new models qwen Related to Qwen models


Development

Successfully merging this pull request may close these issues.

[Bug]: Qwen3.5 4b incompatibility

2 participants