fix: add Qwen3.5 version gate in loader dispatch by danielhanchen · Pull Request #4335 · unslothai/unsloth

danielhanchen · 2026-03-16T23:29:09Z

Summary

Fixes #4188. Adds a Qwen3.5-specific error message when users try to load Qwen3.5 models on unsupported transformers versions, and corrects the FORCE_FLOAT32 comment.

Qwen3.5 requires transformers >= 5.2.0. Previously, users on older versions got a generic "does not recognize this architecture" error with no guidance. This intercepts that error in the AutoConfig exception handler and provides a clear upgrade path.

No dedicated FastQwen3_5Model class is needed -- the unsloth compiler already applies fused CE automatically via apply_fused_lm_head for both Qwen3_5ForCausalLM and Qwen3_5ForConditionalGeneration through the generic FastModel fallback path.

Changes

unsloth/models/loader.py:

- Add a qwen3_5 check in the AutoConfig.from_pretrained error handler (both the FastLanguageModel and FastModel paths) that shows a specific error message pointing users to transformers>=5.2.0.
- Update the FORCE_FLOAT32 comment for qwen3_5: the (1+w) RMSNorm pattern does not overflow float16, because it computes in float32 internally. The real reason is that GDN layers produce NaN grad norms during float16 training.
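The error-handler change can be sketched roughly as follows. This is a minimal illustration and not the code in unsloth/models/loader.py: the helper names `parse_version` and `explain_config_error` are hypothetical, the version parser is deliberately simplified, and matching on the substrings "architecture" and "qwen3_5" in the exception text is an assumption about how the generic error could be recognised.

```python
MIN_QWEN3_5_TRANSFORMERS = (5, 2, 0)  # threshold stated in this PR


def parse_version(text):
    """Simplified parser (assumption): keep only the leading numeric parts."""
    parts = []
    for piece in text.split("."):
        if not piece.isdigit():
            break
        parts.append(int(piece))
    return tuple(parts)


def explain_config_error(error, transformers_version):
    """Return a Qwen3.5-specific ImportError when the generic
    "does not recognize this architecture" failure is caused by an
    old transformers release; otherwise return the error unchanged."""
    message = str(error)
    too_old = parse_version(transformers_version) < MIN_QWEN3_5_TRANSFORMERS
    if "architecture" in message and "qwen3_5" in message and too_old:
        return ImportError(
            f"Unsloth: Your transformers version of {transformers_version} "
            "does not support Qwen3.5.\n"
            "The minimum required version is 5.2.0.\n"
            'Try `pip install --upgrade "transformers>=5.2.0"`, '
            "then restart this session."
        )
    return error
```

In a real loader the returned exception would be raised from inside the `except` block that wraps the config load, so any unrelated failure keeps its original message.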

Test results

| Environment | Result |
| --- | --- |
| transformers 4.57.6: load Qwen3.5 | Clear error: "minimum required version is 5.2.0" |
| transformers 5.3.0: Qwen3.5-0.8B 4bit training 200 steps | 1.55 GB peak, loss 1.50, no NaN |
| transformers 5.3.0: import unsloth | works |

Supersedes #4331. Companion PR: unslothai/unsloth-zoo#552 (Conv dtype fix in compiler).

Qwen3.5 (model_type qwen3_5) only exists in transformers >= 5.0.0. Without this gate, loading a Qwen3.5 model on transformers 4.x gives an unhelpful generic error. This adds a clear version check before the qwen3 dispatch to prevent substring misrouting and give a useful error message pointing users to upgrade.

No dedicated FastQwen3_5Model is needed -- the compiler already applies fused CE automatically via apply_fused_lm_head for both Qwen3_5ForCausalLM and Qwen3_5ForConditionalGeneration. The generic FastModel fallback path handles everything. FORCE_FLOAT32 already has qwen3_5 on main.

Tested on transformers 5.3.0: Qwen3.5-0.8B 4bit, 1.38 GB peak memory. Backwards compatible: import unsloth works on transformers 4.57.6.
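The "substring misrouting" concern in this commit message can be illustrated with a small sketch. It assumes a dispatch that tests model types by substring, which the message implies but this page does not show; both routing functions and their return labels are hypothetical.

```python
def route_by_substring(model_type):
    # "qwen3" is a prefix of "qwen3_5", so a substring test also
    # captures Qwen3.5 checkpoints and sends them down the qwen3 path.
    if "qwen3" in model_type:
        return "qwen3 path"
    return "generic path"


def route_by_equality(model_type):
    # An exact comparison leaves qwen3_5 for the generic fallback.
    if model_type == "qwen3":
        return "qwen3 path"
    return "generic path"
```

With these definitions `route_by_substring("qwen3_5")` selects the qwen3 path, while `route_by_equality("qwen3_5")` falls through to the generic one.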

gemini-code-assist · 2026-03-16T23:29:21Z

Summary of Changes

Hello, I'm Gemini Code Assist¹! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request enhances the robustness of the FastLanguageModel loader by introducing a version compatibility check specifically for Qwen3.5 models. The change prevents unhelpful errors when users attempt to load Qwen3.5 models with older transformers library versions, instead providing a clear message to upgrade. This ensures a smoother user experience and maintains stability across different transformers environments.

Highlights

- Qwen3.5 Version Gate: Implemented a version check for Qwen3.5 models within the FastLanguageModel loader dispatch to ensure compatibility with transformers versions 5.0.0 and higher.
- Improved Error Handling: Introduced a specific ImportError message that guides users to upgrade their transformers library if they attempt to load a Qwen3.5 model with an unsupported older version.
- Qwen3.5 Model Handling: Confirmed that no dedicated FastQwen3_5Model class is required, as the unsloth compiler automatically applies fused CE via the generic FastModel fallback path for Qwen3.5 models.

Changelog

unsloth/models/loader.py
- Added an elif condition to handle qwen3_5 model types.
- Implemented a version check to ensure transformers is at least 5.0.0 for Qwen3.5.
- Included an ImportError with upgrade instructions for incompatible transformers versions.

Activity

Fixed issue [Bug] Extremely high CPU/VRAM usage and slow training with Qwen3.5 #4188.
Superseded pull request fix: Qwen3.5 OOM during training — add FastQwen3_5Model with fused CE loss #4331.
Verified Qwen3.5-0.8B 4bit training on transformers 5.3.0 with 1.38 GB peak memory and 1.88 loss.
Confirmed that importing unsloth works without crashing on transformers 4.57.6.


chatgpt-codex-connector

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: d2dba3296a

chatgpt-codex-connector · 2026-03-16T23:30:14Z

+        elif model_type == "qwen3_5":
+            # Qwen3.5 only exists in transformers 5.x; the compiler handles
+            # fused CE automatically via the generic FastModel fallback path,
+            # so no dedicated FastQwen3_5Model is needed.
+            if transformers_version < Version("5.0.0"):


Assign dispatch model for qwen3_5 path

For model_type == "qwen3_5" on transformers >= 5.0.0, this branch performs only a version check and never sets dispatch_model, so execution later reaches dispatch_model.from_pretrained(...) and raises UnboundLocalError instead of loading the model. Before this change, qwen3_5 would fall through to the generic FastModel fallback; this new branch intercepts that flow and breaks successful loads in the supported version range.

gemini-code-assist

Code Review

This pull request correctly identifies the need for a version gate for Qwen3.5 models to prevent unhelpful errors on older transformers versions. The added check ensures users have a compatible environment. However, the current implementation has a logical flaw that will cause a crash when a supported transformers version is used. My review includes a critical comment explaining the issue and suggesting a structural change to ensure the code correctly falls back to the generic model path as intended.

gemini-code-assist · 2026-03-16T23:31:35Z

+        elif model_type == "qwen3_5":
+            # Qwen3.5 only exists in transformers 5.x; the compiler handles
+            # fused CE automatically via the generic FastModel fallback path,
+            # so no dedicated FastQwen3_5Model is needed.
+            if transformers_version < Version("5.0.0"):
+                raise ImportError(
+                    f"Unsloth: Your transformers version of {transformers_version} does not support Qwen3.5.\n"
+                    f"The minimum required version is 5.0.0.\n"
+                    f'Try `pip install --upgrade "transformers>=5.0.0"`\n'
+                    f"to obtain the latest transformers build, then restart this session."
+                )


There's a logical flow issue here that will cause a crash.

When model_type is qwen3_5 and the transformers version is sufficient, this elif block is entered. However, dispatch_model is not assigned a value. This will lead to an UnboundLocalError on line 715 when dispatch_model is used.

Your comment and the PR description correctly state that the intention is to use the generic FastModel fallback path (the else block at line 656). By adding this elif condition, you are preventing the code from ever reaching that else block for qwen3_5 models.

To fix this, this version check should be performed before the if/elif dispatch chain (which starts on line 562), and this elif block should be removed. This will allow the logic to correctly fall through to the generic else block for qwen3_5 models as intended.
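The control-flow problem both reviews describe, and the suggested restructuring, can be reproduced in a few lines. This is a reduced sketch and not the loader itself: the function names are hypothetical, the string return values stand in for the real dispatch classes, and versions are plain tuples instead of `Version` objects.

```python
def dispatch_buggy(model_type, transformers_version):
    if model_type == "llama":
        dispatch_model = "FastLlamaModel"
    elif model_type == "qwen3_5":
        if transformers_version < (5, 0, 0):
            raise ImportError("transformers is too old for Qwen3.5")
        # Nothing is assigned here, and the else branch below is skipped.
    else:
        dispatch_model = "FastModel"  # generic fallback
    return dispatch_model  # UnboundLocalError for qwen3_5 on a new version


def dispatch_fixed(model_type, transformers_version):
    # Version gate runs before the chain, so qwen3_5 is never intercepted.
    if model_type == "qwen3_5" and transformers_version < (5, 0, 0):
        raise ImportError("transformers is too old for Qwen3.5")
    if model_type == "llama":
        dispatch_model = "FastLlamaModel"
    else:
        dispatch_model = "FastModel"  # qwen3_5 now reaches the fallback
    return dispatch_model
```

Calling `dispatch_buggy("qwen3_5", (5, 3, 0))` raises UnboundLocalError, whereas `dispatch_fixed` with the same arguments returns the generic fallback.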

vitalis · 2026-03-17T00:06:21Z

@danielhanchen — one data point on the version gate.

>= 5.0.0 may be too conservative. We have a Qwen3.5-2B-Base training run currently executing on Kaggle T4 with transformers 4.53.x — the transformers.models.qwen3_5 module clearly exists in that release. The unsloth codebase itself uses 4.53.0 as the threshold for SUPPORTS_FALCON_H1 and SUPPORTS_GEMMA3N, which were added in the same transformers release window as Qwen3.5.

If the gate is >= 5.0.0, any user on transformers 4.53.x–4.57.x who tries to load Qwen3.5 gets a "please upgrade" error even though it would actually work. That is a subset of real Kaggle users today.

Happy to defer to your call — just wanted the evidence on record.

vitalis · 2026-03-17T00:11:54Z

@danielhanchen — respectfully, #4331 supersedes this PR, not the other way around.

This PR adds 11 lines: a version gate and error message. #4331 adds everything this PR does plus RoPE patching, attention kernel replacements, explicit GDN layer handling, a FastQwen3_5Model class, and 626 lines of unit tests. A subset cannot supersede a superset.

Version gate: >= 5.0.0 is incorrect. As noted in my other comment on this PR, Qwen3.5-2B-Base is training on Kaggle T4 with transformers 4.53.x right now. The correct threshold is 4.53.0, consistent with SUPPORTS_FALCON_H1 and SUPPORTS_GEMMA3N in the same file. Setting 5.0.0 falsely blocks real users on 4.53.x–4.57.x.

The GDN argument: The compiler handles fused CE generically — agreed. But routing qwen3_5 to the generic FastModel fallback leaves RoPE and attention kernels unoptimised for ~50% of layers (the standard transformer attention layers). The other ~50% are GDN linear-attention layers that must NOT be patched (incompatible with unsloth's attention patches). #4331 makes this boundary explicit. The generic path has no awareness of it.

Would you consider closing this in favour of #4331? Happy to apply all 6 fixes from your #4334 review immediately.

The (1+w) RMSNorm pattern does not overflow float16 since Qwen3_5RMSNorm computes in float32 internally. The actual reason FORCE_FLOAT32 is needed is that Qwen3.5 GDN layers produce NaN grad norms during float16 training. Updated the comment to reflect the real reason.

The elif block intercepted qwen3_5 on transformers >= 5.0.0 without setting dispatch_model, causing UnboundLocalError at line 715. Move the version check before the if/elif dispatch chain so on transformers >= 5.0.0 the model_type falls through to the generic FastModel path as intended.

Checked all 5.x releases:
- 5.0.0: no qwen3_5 module
- 5.1.0: no qwen3_5 module
- 5.2.0: qwen3_5 available
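One way to repeat this kind of check in a given environment is to ask the import system whether the submodule exists. The sketch below uses only the standard library; the helper name `has_submodule` is hypothetical, and the source does not say how the releases were actually inspected.

```python
import importlib.util


def has_submodule(dotted_name):
    """Return True if the dotted module path can be located.

    find_spec imports the parent packages as a side effect and raises
    ModuleNotFoundError when a parent is missing, so that case is
    treated as "not available".
    """
    try:
        return importlib.util.find_spec(dotted_name) is not None
    except ModuleNotFoundError:
        return False
```

For example, `has_submodule("transformers.models.qwen3_5")` reports whether the installed transformers release ships the module, and returns False if transformers is not installed at all.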

The previous version check at the dispatch chain was unreachable -- AutoConfig.from_pretrained fails first with a generic "does not recognize this architecture" error on transformers < 5.2.0, so execution never reached the check. Move the qwen3_5-specific error message into the AutoConfig exception handler where "architecture" errors are caught. This intercepts the error before the generic message and gives users a clear upgrade path. Also remove the now-redundant check before the dispatch chain. Both FastLanguageModel and FastModel paths are covered. Tested: transformers 4.57.6 shows the Qwen3.5-specific error, transformers 5.3.0 loads and trains normally.

danielhanchen · 2026-03-17T03:37:38Z

@vitalis Thanks again for your contribution, but this PR should resolve it: Qwen3.5 is only available in transformers 5.2.0 and newer.

* fix: add Qwen3.5 version gate in loader dispatch (unslothai#4188)

  Qwen3.5 (model_type qwen3_5) only exists in transformers >= 5.0.0. Without this gate, loading a Qwen3.5 model on transformers 4.x gives an unhelpful generic error. This adds a clear version check before the qwen3 dispatch to prevent substring misrouting and give a useful error message pointing users to upgrade.

  No dedicated FastQwen3_5Model is needed -- the compiler already applies fused CE automatically via apply_fused_lm_head for both Qwen3_5ForCausalLM and Qwen3_5ForConditionalGeneration. The generic FastModel fallback path handles everything. FORCE_FLOAT32 already has qwen3_5 on main.

  Tested on transformers 5.3.0: Qwen3.5-0.8B 4bit, 1.38 GB peak memory. Backwards compatible: import unsloth works on transformers 4.57.6.

* fix: update FORCE_FLOAT32 comment for qwen3_5

  The (1+w) RMSNorm pattern does not overflow float16 since Qwen3_5RMSNorm computes in float32 internally. The actual reason FORCE_FLOAT32 is needed is that Qwen3.5 GDN layers produce NaN grad norms during float16 training. Updated the comment to reflect the real reason.

* fix: move qwen3_5 version check before dispatch chain

  The elif block intercepted qwen3_5 on transformers >= 5.0.0 without setting dispatch_model, causing UnboundLocalError at line 715. Move the version check before the if/elif dispatch chain so on transformers >= 5.0.0 the model_type falls through to the generic FastModel path as intended.

* fix: qwen3_5 requires transformers >= 5.2.0, not 5.0.0

  Checked all 5.x releases:
  - 5.0.0: no qwen3_5 module
  - 5.1.0: no qwen3_5 module
  - 5.2.0: qwen3_5 available

* fix: move qwen3_5 version check into AutoConfig error handler

  The previous version check at the dispatch chain was unreachable -- AutoConfig.from_pretrained fails first with a generic "does not recognize this architecture" error on transformers < 5.2.0, so execution never reached the check.

  Move the qwen3_5-specific error message into the AutoConfig exception handler where "architecture" errors are caught. This intercepts the error before the generic message and gives users a clear upgrade path. Also remove the now-redundant check before the dispatch chain. Both FastLanguageModel and FastModel paths are covered.

  Tested: transformers 4.57.6 shows the Qwen3.5-specific error, transformers 5.3.0 loads and trains normally.

---------

Co-authored-by: Daniel Han <danielhanchen@users.noreply.github.com>

danielhanchen requested a review from mmathew23 as a code owner March 16, 2026 23:29

chatgpt-codex-connector Bot reviewed Mar 16, 2026

gemini-code-assist Bot reviewed Mar 16, 2026

vitalis mentioned this pull request Mar 17, 2026

fix: Qwen3.5 OOM during training — add FastQwen3_5Model with fused CE loss #4331

Closed

danielhanchen added 4 commits March 17, 2026 01:21

fix: qwen3_5 requires transformers >= 5.2.0, not 5.0.0

ca326bf

Checked all 5.x releases:
- 5.0.0: no qwen3_5 module
- 5.1.0: no qwen3_5 module
- 5.2.0: qwen3_5 available

danielhanchen merged commit 6912a15 into main Mar 17, 2026
5 checks passed

danielhanchen deleted the fix/qwen3_5-loader-dispatch branch March 17, 2026 03:37

danielhanchen merged 5 commits into main from fix/qwen3_5-loader-dispatch
