
fix: add Qwen3.5 version gate in loader dispatch #4335

Merged
danielhanchen merged 5 commits into main from fix/qwen3_5-loader-dispatch on Mar 17, 2026

Conversation

@danielhanchen
Member

@danielhanchen danielhanchen commented Mar 16, 2026

Summary

Fixes #4188. Adds a Qwen3.5-specific error message when users try to load Qwen3.5 models on unsupported transformers versions, and corrects the FORCE_FLOAT32 comment.

Qwen3.5 requires transformers >= 5.2.0. Previously, users on older versions got a generic "does not recognize this architecture" error with no guidance. This intercepts that error in the AutoConfig exception handler and provides a clear upgrade path.
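A minimal sketch of the interception described above, under stated assumptions. `explain_config_error` and `parse_version` are hypothetical helpers written for illustration, and matching on the substrings "architecture" and "qwen3_5" is an assumed detection rule, not necessarily the logic merged into unsloth/models/loader.py. The reviewed snippet compares versions with `Version(...)`; a small tuple parser stands in here so the example runs without extra packages.

```python
MIN_TRANSFORMERS_FOR_QWEN3_5 = "5.2.0"  # threshold stated in this PR


def parse_version(text):
    """Parse '4.57.6' or '5.3.0.dev0' into a 3-int tuple (simplified stand-in for a real parser)."""
    parts = []
    for piece in text.split("."):
        digits = ""
        for ch in piece:
            if not ch.isdigit():
                break
            digits += ch
        if not digits:
            break
        parts.append(int(digits))
    while len(parts) < 3:
        parts.append(0)
    return tuple(parts[:3])


def explain_config_error(error, transformers_version):
    """Return a Qwen3.5-specific ImportError, or None to keep the generic handling."""
    message = str(error)
    is_architecture_error = "architecture" in message  # assumed detection rule
    is_qwen3_5 = "qwen3_5" in message                  # assumed detection rule
    too_old = parse_version(transformers_version) < parse_version(MIN_TRANSFORMERS_FOR_QWEN3_5)
    if is_architecture_error and is_qwen3_5 and too_old:
        return ImportError(
            f"Unsloth: Your transformers version of {transformers_version} does not support Qwen3.5.\n"
            f"The minimum required version is {MIN_TRANSFORMERS_FOR_QWEN3_5}.\n"
            f'Try `pip install --upgrade "transformers>={MIN_TRANSFORMERS_FOR_QWEN3_5}"`, then restart this session.'
        )
    return None


# Illustrative error text, modelled on the "does not recognize this architecture" message.
generic = ValueError("model type `qwen3_5` ... Transformers does not recognize this architecture")
specific = explain_config_error(generic, "4.57.6")
```

Inside an `except` block, a handler built this way would raise `specific` when it is not `None` and otherwise fall back to the existing generic message.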

No dedicated FastQwen3_5Model class is needed -- the unsloth compiler already applies fused CE automatically via apply_fused_lm_head for both Qwen3_5ForCausalLM and Qwen3_5ForConditionalGeneration through the generic FastModel fallback path.

Changes

unsloth/models/loader.py:

  • Add qwen3_5 check in AutoConfig.from_pretrained error handler (both FastLanguageModel and FastModel paths) to show a specific error message pointing users to transformers>=5.2.0
  • Update FORCE_FLOAT32 comment for qwen3_5: the (1+w) RMSNorm pattern does not overflow float16, because it computes in float32 internally; the real reason is that GDN layers produce NaN grad norms during float16 training
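To illustrate the corrected comment, here is a small sketch of a (1+w) RMSNorm that does its arithmetic in wide precision and rounds only the result to float16. It is not Qwen3_5RMSNorm itself: `to_float16` and `rms_norm_one_plus_w` are hypothetical helpers, Python floats stand in for float32, and the input values are made up. It shows only why an intermediate such as the squared activations cannot overflow float16 when the computation is upcast; it does not reproduce the GDN NaN gradients.

```python
import math
import struct


def to_float16(value):
    """Round to the nearest float16; struct raises OverflowError if the value is out of range."""
    return struct.unpack("<e", struct.pack("<e", value))[0]


def rms_norm_one_plus_w(x, w, eps=1e-6):
    """(1+w) RMSNorm with wide-precision internals; only the output is rounded to float16."""
    mean_square = sum(v * v for v in x) / len(x)  # never stored as float16
    inv_rms = 1.0 / math.sqrt(mean_square + eps)
    return [to_float16(v * inv_rms * (1.0 + wi)) for v, wi in zip(x, w)]


x = [300.0, 300.0, 300.0, 300.0]  # made-up activations; each value fits in float16
w = [0.5, 0.5, 0.5, 0.5]          # made-up weights

try:
    # 300 * 300 = 90000 exceeds the float16 maximum of 65504.
    # struct reports this as an exception; float16 hardware would produce inf instead.
    to_float16(x[0] * x[0])
    square_fits = True
except OverflowError:
    square_fits = False

normed = rms_norm_one_plus_w(x, w)
```

The squared activation does not fit in float16, yet the normalized output does, because the square is only ever held in the wider type.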

Test results

| Environment | Result |
| --- | --- |
| transformers 4.57.6: load Qwen3.5 | Clear error: "minimum required version is 5.2.0" |
| transformers 5.3.0: Qwen3.5-0.8B 4bit training, 200 steps | 1.55 GB peak, loss 1.50, no NaN |
| transformers 5.3.0: import unsloth | Works |

Supersedes #4331. Companion PR: unslothai/unsloth-zoo#552 (Conv dtype fix in compiler).

Qwen3.5 (model_type qwen3_5) only exists in transformers >= 5.0.0.
Without this gate, loading a Qwen3.5 model on transformers 4.x gives
an unhelpful generic error. This adds a clear version check before the
qwen3 dispatch to prevent substring misrouting and give a useful error
message pointing users to upgrade.

No dedicated FastQwen3_5Model is needed -- the compiler already applies
fused CE automatically via apply_fused_lm_head for both
Qwen3_5ForCausalLM and Qwen3_5ForConditionalGeneration. The generic
FastModel fallback path handles everything.

FORCE_FLOAT32 already has qwen3_5 on main.

Tested on transformers 5.3.0: Qwen3.5-0.8B 4bit, 1.38 GB peak memory.
Backwards compatible: import unsloth works on transformers 4.57.6.
@gemini-code-assist
Contributor

Summary of Changes

Hello, I'm Gemini Code Assist! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request enhances the robustness of the FastLanguageModel loader by introducing a version compatibility check specifically for Qwen3.5 models. The change prevents unhelpful errors when users attempt to load Qwen3.5 models with older transformers library versions, instead providing a clear message to upgrade. This ensures a smoother user experience and maintains stability across different transformers environments.

Highlights

  • Qwen3.5 Version Gate: Implemented a version check for Qwen3.5 models within the FastLanguageModel loader dispatch to ensure compatibility with transformers versions 5.0.0 and higher.
  • Improved Error Handling: Introduced a specific ImportError message that guides users to upgrade their transformers library if they attempt to load a Qwen3.5 model with an unsupported older version.
  • Qwen3.5 Model Handling: Confirmed that no dedicated FastQwen3_5Model class is required, as the unsloth compiler automatically applies fused CE via the generic FastModel fallback path for Qwen3.5 models.

Changelog

  • unsloth/models/loader.py
    • Added an elif condition to handle qwen3_5 model types.
    • Implemented a version check to ensure transformers is at least 5.0.0 for Qwen3.5.
    • Included an ImportError with upgrade instructions for incompatible transformers versions.


@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: d2dba3296a

Comment thread unsloth/models/loader.py Outdated
Comment on lines +618 to +622
    elif model_type == "qwen3_5":
        # Qwen3.5 only exists in transformers 5.x; the compiler handles
        # fused CE automatically via the generic FastModel fallback path,
        # so no dedicated FastQwen3_5Model is needed.
        if transformers_version < Version("5.0.0"):

P1: Assign dispatch model for qwen3_5 path

For model_type == "qwen3_5" on transformers >= 5.0.0, this branch performs only a version check and never sets dispatch_model, so execution later reaches dispatch_model.from_pretrained(...) and raises UnboundLocalError instead of loading the model. Before this change, qwen3_5 would fall through to the generic FastModel fallback; this new branch intercepts that flow and breaks successful loads in the supported version range.

Contributor

@gemini-code-assist gemini-code-assist Bot left a comment

Code Review

This pull request correctly identifies the need for a version gate for Qwen3.5 models to prevent unhelpful errors on older transformers versions. The added check ensures users have a compatible environment. However, the current implementation has a logical flaw that will cause a crash when a supported transformers version is used. My review includes a critical comment explaining the issue and suggesting a structural change to ensure the code correctly falls back to the generic model path as intended.

Comment thread unsloth/models/loader.py Outdated
Comment on lines +618 to +628
    elif model_type == "qwen3_5":
        # Qwen3.5 only exists in transformers 5.x; the compiler handles
        # fused CE automatically via the generic FastModel fallback path,
        # so no dedicated FastQwen3_5Model is needed.
        if transformers_version < Version("5.0.0"):
            raise ImportError(
                f"Unsloth: Your transformers version of {transformers_version} does not support Qwen3.5.\n"
                f"The minimum required version is 5.0.0.\n"
                f'Try `pip install --upgrade "transformers>=5.0.0"`\n'
                f"to obtain the latest transformers build, then restart this session."
            )

critical

There's a logical flow issue here that will cause a crash.

When model_type is qwen3_5 and the transformers version is sufficient, this elif block is entered. However, dispatch_model is not assigned a value. This will lead to an UnboundLocalError on line 715 when dispatch_model is used.

Your comment and the PR description correctly state that the intention is to use the generic FastModel fallback path (the else block at line 656). By adding this elif condition, you are preventing the code from ever reaching that else block for qwen3_5 models.

To fix this, this version check should be performed before the if/elif dispatch chain (which starts on line 562), and this elif block should be removed. This will allow the logic to correctly fall through to the generic else block for qwen3_5 models as intended.
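The failure mode both reviews describe can be reproduced in a few lines. This is a simplified sketch, not the loader code: `dispatch_buggy` and `dispatch_fixed` are hypothetical functions, the `version_ok` flag stands in for the transformers version comparison, and the returned strings stand in for the dispatched model classes.

```python
def dispatch_buggy(model_type, version_ok):
    """Version gate placed inside the chain: the qwen3_5 branch never assigns dispatch_model."""
    if model_type == "llama":
        dispatch_model = "FastLlamaModel"
    elif model_type == "qwen3_5":
        if not version_ok:
            raise ImportError("transformers is too old for Qwen3.5")
        # Bug: nothing is assigned on the supported-version path.
    else:
        dispatch_model = "FastModel"  # generic fallback
    return dispatch_model


def dispatch_fixed(model_type, version_ok):
    """Version gate placed before the chain, so qwen3_5 reaches the generic fallback."""
    if model_type == "qwen3_5" and not version_ok:
        raise ImportError("transformers is too old for Qwen3.5")
    if model_type == "llama":
        dispatch_model = "FastLlamaModel"
    else:
        dispatch_model = "FastModel"  # generic fallback
    return dispatch_model


try:
    dispatch_buggy("qwen3_5", version_ok=True)
    buggy_outcome = "loaded"
except UnboundLocalError:
    buggy_outcome = "UnboundLocalError"

fixed_outcome = dispatch_fixed("qwen3_5", version_ok=True)
```

Hoisting the gate keeps the dispatch chain responsible for exactly one thing, choosing a class, so every model type that passes the gate still ends with an assignment.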

@vitalis

vitalis commented Mar 17, 2026

@danielhanchen — one data point on the version gate.

>= 5.0.0 may be too conservative. We have a Qwen3.5-2B-Base training run currently executing on Kaggle T4 with transformers 4.53.x — the transformers.models.qwen3_5 module clearly exists in that release. The unsloth codebase itself uses 4.53.0 as the threshold for SUPPORTS_FALCON_H1 and SUPPORTS_GEMMA3N, which were added in the same transformers release window as Qwen3.5.

If the gate is >= 5.0.0, any user on transformers 4.53.x–4.57.x who tries to load Qwen3.5 gets a "please upgrade" error even though it would actually work. That is a subset of real Kaggle users today.

Happy to defer to your call — just wanted the evidence on record.

@vitalis

vitalis commented Mar 17, 2026

@danielhanchen — respectfully, #4331 supersedes this PR, not the other way around.

This PR adds 11 lines: a version gate and error message. #4331 adds everything this PR does plus RoPE patching, attention kernel replacements, explicit GDN layer handling, a FastQwen3_5Model class, and 626 lines of unit tests. A subset cannot supersede a superset.

Version gate: >= 5.0.0 is incorrect. As noted in my other comment on this PR, Qwen3.5-2B-Base is training on Kaggle T4 with transformers 4.53.x right now. The correct threshold is 4.53.0, consistent with SUPPORTS_FALCON_H1 and SUPPORTS_GEMMA3N in the same file. Setting 5.0.0 falsely blocks real users on 4.53.x–4.57.x.

The GDN argument: The compiler handles fused CE generically — agreed. But routing qwen3_5 to the generic FastModel fallback leaves RoPE and attention kernels unoptimised for ~50% of layers (the standard transformer attention layers). The other ~50% are GDN linear-attention layers that must NOT be patched (incompatible with unsloth's attention patches). #4331 makes this boundary explicit. The generic path has no awareness of it.

Would you consider closing this in favour of #4331? Happy to apply all 6 fixes from your #4334 review immediately.

fix: update FORCE_FLOAT32 comment for qwen3_5

The (1+w) RMSNorm pattern does not overflow float16 since Qwen3_5RMSNorm
computes in float32 internally. The actual reason FORCE_FLOAT32 is needed
is that Qwen3.5 GDN layers produce NaN grad norms during float16 training.
Updated the comment to reflect the real reason.

fix: move qwen3_5 version check before dispatch chain

The elif block intercepted qwen3_5 on transformers >= 5.0.0 without
setting dispatch_model, causing UnboundLocalError at line 715.

Move the version check before the if/elif dispatch chain so on
transformers >= 5.0.0 the model_type falls through to the generic
FastModel path as intended.

fix: qwen3_5 requires transformers >= 5.2.0, not 5.0.0

Checked all 5.x releases:
- 5.0.0: no qwen3_5 module
- 5.1.0: no qwen3_5 module
- 5.2.0: qwen3_5 available
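
A release check like the one listed above can be done without loading any model, by asking the import system whether a submodule can be located. This is a sketch of one way to do it, not necessarily how the releases were actually checked; `has_submodule` is a hypothetical helper. Note that `importlib.util.find_spec` imports the parent package as a side effect.

```python
import importlib.util


def has_submodule(dotted_name):
    """True if the import system can locate dotted_name; False if it or a parent package is missing."""
    try:
        return importlib.util.find_spec(dotted_name) is not None
    except ModuleNotFoundError:
        return False


# Standard-library names are used so the sketch runs anywhere.
found_existing = has_submodule("json.decoder")
found_missing = has_submodule("json.no_such_submodule")

# In an environment with transformers installed, the equivalent probe would be:
# has_submodule("transformers.models.qwen3_5")
```

Running the commented probe against each installed transformers release gives a direct yes or no for that release, independent of any model download.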

fix: move qwen3_5 version check into AutoConfig error handler

The previous version check at the dispatch chain was unreachable --
AutoConfig.from_pretrained fails first with a generic "does not
recognize this architecture" error on transformers < 5.2.0, so
execution never reached the check.

Move the qwen3_5-specific error message into the AutoConfig exception
handler where "architecture" errors are caught. This intercepts the
error before the generic message and gives users a clear upgrade path.

Also remove the now-redundant check before the dispatch chain.
Both FastLanguageModel and FastModel paths are covered.

Tested: transformers 4.57.6 shows the Qwen3.5-specific error,
transformers 5.3.0 loads and trains normally.

@danielhanchen
Member Author

@vitalis Thanks again for your contribution, but this PR should resolve it: Qwen3.5 is only available in transformers 5.2.0 and newer.

@danielhanchen danielhanchen merged commit 6912a15 into main Mar 17, 2026
5 checks passed
@danielhanchen danielhanchen deleted the fix/qwen3_5-loader-dispatch branch March 17, 2026 03:37
shibizhao pushed a commit to shibizhao/unsloth-npu that referenced this pull request Apr 7, 2026
Development

Successfully merging this pull request may close these issues.

[Bug] Extremely high CPU/VRAM usage and slow training with Qwen3.5

2 participants