fix: add Qwen3.5 version gate in loader dispatch #4335
Qwen3.5 (model_type qwen3_5) only exists in transformers >= 5.0.0. Without this gate, loading a Qwen3.5 model on transformers 4.x gives an unhelpful generic error. This adds a clear version check before the qwen3 dispatch to prevent substring misrouting and give a useful error message pointing users to upgrade.

No dedicated FastQwen3_5Model is needed -- the compiler already applies fused CE automatically via apply_fused_lm_head for both Qwen3_5ForCausalLM and Qwen3_5ForConditionalGeneration. The generic FastModel fallback path handles everything. FORCE_FLOAT32 already has qwen3_5 on main.

Tested on transformers 5.3.0: Qwen3.5-0.8B 4bit, 1.38 GB peak memory. Backwards compatible: import unsloth works on transformers 4.57.6.
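The substring misrouting mentioned in the description can be illustrated with a small sketch. Both functions and their route names are hypothetical stand-ins, not the loader's actual dispatch code; the only point is that a substring test on "qwen3" also matches "qwen3_5", so the more specific check has to run first.

```python
def route_by_substring(model_type: str) -> str:
    """Hypothetical dispatcher that matches on substrings only."""
    if "qwen3" in model_type:
        # "qwen3_5" contains "qwen3", so it is captured here too.
        return "qwen3"
    return "generic"


def route_specific_first(model_type: str) -> str:
    """Same hypothetical dispatcher, with the exact qwen3_5 check placed first."""
    if model_type == "qwen3_5":
        return "generic"
    if "qwen3" in model_type:
        return "qwen3"
    return "generic"


print(route_by_substring("qwen3_5"))
print(route_specific_first("qwen3_5"))
```

Checking the exact model type before any substring match keeps the ordering explicit, which is what the description means by placing the check before the qwen3 dispatch.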
Summary of Changes
This pull request enhances the robustness of the FastLanguageModel loader by introducing a version compatibility check specifically for Qwen3.5 models. The change prevents unhelpful errors when users attempt to load Qwen3.5 models with older transformers library versions, instead providing a clear message to upgrade. This ensures a smoother user experience and maintains stability across different transformers environments.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: d2dba3296a
    elif model_type == "qwen3_5":
        # Qwen3.5 only exists in transformers 5.x; the compiler handles
        # fused CE automatically via the generic FastModel fallback path,
        # so no dedicated FastQwen3_5Model is needed.
        if transformers_version < Version("5.0.0"):
Assign dispatch model for qwen3_5 path
For model_type == "qwen3_5" on transformers >= 5.0.0, this branch performs only a version check and never sets dispatch_model, so execution later reaches dispatch_model.from_pretrained(...) and raises UnboundLocalError instead of loading the model. Before this change, qwen3_5 would fall through to the generic FastModel fallback; this new branch intercepts that flow and breaks successful loads in the supported version range.
Code Review
This pull request correctly identifies the need for a version gate for Qwen3.5 models to prevent unhelpful errors on older transformers versions. The added check ensures users have a compatible environment. However, the current implementation has a logical flaw that will cause a crash when a supported transformers version is used. My review includes a critical comment explaining the issue and suggesting a structural change to ensure the code correctly falls back to the generic model path as intended.
    elif model_type == "qwen3_5":
        # Qwen3.5 only exists in transformers 5.x; the compiler handles
        # fused CE automatically via the generic FastModel fallback path,
        # so no dedicated FastQwen3_5Model is needed.
        if transformers_version < Version("5.0.0"):
            raise ImportError(
                f"Unsloth: Your transformers version of {transformers_version} does not support Qwen3.5.\n"
                f"The minimum required version is 5.0.0.\n"
                f'Try `pip install --upgrade "transformers>=5.0.0"`\n'
                f"to obtain the latest transformers build, then restart this session."
            )
There's a logical flow issue here that will cause a crash.
When model_type is qwen3_5 and the transformers version is sufficient, this elif block is entered. However, dispatch_model is not assigned a value. This will lead to an UnboundLocalError on line 715 when dispatch_model is used.
Your comment and the PR description correctly state that the intention is to use the generic FastModel fallback path (the else block at line 656). By adding this elif condition, you are preventing the code from ever reaching that else block for qwen3_5 models.
To fix this, this version check should be performed before the if/elif dispatch chain (which starts on line 562), and this elif block should be removed. This will allow the logic to correctly fall through to the generic else block for qwen3_5 models as intended.
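The control-flow problem described in this review can be reproduced in a few lines. This is a minimal sketch, not the code in loader.py: the function names are hypothetical, class names are represented as plain strings, and `parse_release` is a stdlib-only stand-in for the `Version` comparison used in the PR.

```python
def parse_release(version: str) -> tuple:
    """Turn a plain release string such as "5.3.0" into a comparable tuple."""
    return tuple(int(part) for part in version.split("."))


def dispatch_buggy(model_type: str, transformers_version: str) -> str:
    """Version check inside the elif chain: the qwen3_5 branch assigns nothing."""
    if model_type == "llama":
        dispatch_model = "FastLlamaModel"
    elif model_type == "qwen3_5":
        if parse_release(transformers_version) < (5, 0, 0):
            raise ImportError("transformers is too old for Qwen3.5")
        # On a new enough version nothing is assigned, and the else
        # branch below is skipped because this elif already matched.
    else:
        dispatch_model = "FastModel"
    return dispatch_model  # UnboundLocalError for qwen3_5 on new versions


def dispatch_fixed(model_type: str, transformers_version: str) -> str:
    """Version check hoisted above the chain, so qwen3_5 reaches the else branch."""
    if model_type == "qwen3_5" and parse_release(transformers_version) < (5, 0, 0):
        raise ImportError("transformers is too old for Qwen3.5")
    if model_type == "llama":
        dispatch_model = "FastLlamaModel"
    else:
        dispatch_model = "FastModel"
    return dispatch_model


try:
    dispatch_buggy("qwen3_5", "5.3.0")
except UnboundLocalError as error:
    print("buggy dispatch failed:", type(error).__name__)

print("fixed dispatch returned:", dispatch_fixed("qwen3_5", "5.3.0"))
```

Hoisting the check keeps the gate and the dispatch independent: the gate can only reject, and the chain alone decides which model class is used.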
@danielhanchen, one data point on the version gate.
If the gate is [...] Happy to defer to your call; just wanted the evidence on record.
@danielhanchen, respectfully, #4331 supersedes this PR, not the other way around. This PR adds 11 lines: a version gate and error message. #4331 adds everything this PR does plus RoPE patching, attention kernel replacements, explicit GDN layer handling, a [...]
Version gate: [...]
The GDN argument: The compiler handles fused CE generically, agreed. But routing [...]
Would you consider closing this in favour of #4331? Happy to apply all 6 fixes from your #4334 review immediately.
The (1+w) RMSNorm pattern does not overflow float16 since Qwen3_5RMSNorm computes in float32 internally. The actual reason FORCE_FLOAT32 is needed is that Qwen3.5 GDN layers produce NaN grad norms during float16 training. Updated the comment to reflect the real reason.
The elif block intercepted qwen3_5 on transformers >= 5.0.0 without setting dispatch_model, causing UnboundLocalError at line 715. Move the version check before the if/elif dispatch chain so on transformers >= 5.0.0 the model_type falls through to the generic FastModel path as intended.
Checked all 5.x releases:
- 5.0.0: no qwen3_5 module
- 5.1.0: no qwen3_5 module
- 5.2.0: qwen3_5 available
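The corrected threshold can be written as a small predicate. This is a sketch under assumptions: `supports_qwen3_5` and `parse_release` are hypothetical helpers, and the tuple comparison is a stdlib-only stand-in for the `Version` comparison in the PR that handles plain numeric release strings only.

```python
def parse_release(version: str) -> tuple:
    """Turn a plain release string such as "5.2.0" into a comparable tuple."""
    return tuple(int(part) for part in version.split("."))


# Minimum transformers release that ships the qwen3_5 module, per the check above.
MIN_QWEN3_5_TRANSFORMERS = parse_release("5.2.0")


def supports_qwen3_5(installed_version: str) -> bool:
    """Return True when the installed transformers release includes qwen3_5."""
    return parse_release(installed_version) >= MIN_QWEN3_5_TRANSFORMERS


for release in ("4.57.6", "5.0.0", "5.1.0", "5.2.0", "5.3.0"):
    print(release, supports_qwen3_5(release))
```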
The previous version check at the dispatch chain was unreachable -- AutoConfig.from_pretrained fails first with a generic "does not recognize this architecture" error on transformers < 5.2.0, so execution never reached the check. Move the qwen3_5-specific error message into the AutoConfig exception handler where "architecture" errors are caught. This intercepts the error before the generic message and gives users a clear upgrade path. Also remove the now-redundant check before the dispatch chain. Both FastLanguageModel and FastModel paths are covered. Tested: transformers 4.57.6 shows the Qwen3.5-specific error, transformers 5.3.0 loads and trains normally.
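The approach in this commit can be sketched without importing transformers. Everything here is illustrative: `fake_auto_config` is a hypothetical stand-in for AutoConfig.from_pretrained, its error text is only modeled on the "does not recognize this architecture" message quoted above, and the assumption that the message names the model type is mine rather than something the PR states.

```python
def parse_release(version: str) -> tuple:
    """Turn a plain release string such as "5.2.0" into a comparable tuple."""
    return tuple(int(part) for part in version.split("."))


def fake_auto_config(model_type: str, installed_version: str) -> dict:
    """Hypothetical stand-in for AutoConfig.from_pretrained."""
    known = {"llama", "qwen3"}
    if parse_release(installed_version) >= (5, 2, 0):
        known.add("qwen3_5")
    if model_type not in known:
        # Assumed wording, modeled on the generic error quoted in the PR.
        raise ValueError(
            f"model type `{model_type}`: Transformers does not recognize this architecture"
        )
    return {"model_type": model_type}


def load_config(model_type: str, installed_version: str) -> dict:
    """Intercept the generic architecture error and make it specific for qwen3_5."""
    try:
        return fake_auto_config(model_type, installed_version)
    except ValueError as error:
        message = str(error)
        if "architecture" in message and "qwen3_5" in message:
            raise ImportError(
                f"Unsloth: Your transformers version of {installed_version} "
                "does not support Qwen3.5.\n"
                "The minimum required version is 5.2.0."
            ) from error
        raise  # Any other architecture error keeps its original message.


try:
    load_config("qwen3_5", "4.57.6")
except ImportError as error:
    print(error)
```

Placing the check in the exception handler means it runs exactly where the failure occurs, which is why a check further down the dispatch chain was never reached.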
@vitalis Thanks for your contribution again, but this PR should resolve it - Qwen3.5 is only in 5.2.0 and newer
* fix: add Qwen3.5 version gate in loader dispatch (unslothai#4188)

Qwen3.5 (model_type qwen3_5) only exists in transformers >= 5.0.0. Without this gate, loading a Qwen3.5 model on transformers 4.x gives an unhelpful generic error. This adds a clear version check before the qwen3 dispatch to prevent substring misrouting and give a useful error message pointing users to upgrade.

No dedicated FastQwen3_5Model is needed -- the compiler already applies fused CE automatically via apply_fused_lm_head for both Qwen3_5ForCausalLM and Qwen3_5ForConditionalGeneration. The generic FastModel fallback path handles everything. FORCE_FLOAT32 already has qwen3_5 on main.

Tested on transformers 5.3.0: Qwen3.5-0.8B 4bit, 1.38 GB peak memory. Backwards compatible: import unsloth works on transformers 4.57.6.

* fix: update FORCE_FLOAT32 comment for qwen3_5

The (1+w) RMSNorm pattern does not overflow float16 since Qwen3_5RMSNorm computes in float32 internally. The actual reason FORCE_FLOAT32 is needed is that Qwen3.5 GDN layers produce NaN grad norms during float16 training. Updated the comment to reflect the real reason.

* fix: move qwen3_5 version check before dispatch chain

The elif block intercepted qwen3_5 on transformers >= 5.0.0 without setting dispatch_model, causing UnboundLocalError at line 715. Move the version check before the if/elif dispatch chain so on transformers >= 5.0.0 the model_type falls through to the generic FastModel path as intended.

* fix: qwen3_5 requires transformers >= 5.2.0, not 5.0.0

Checked all 5.x releases:
- 5.0.0: no qwen3_5 module
- 5.1.0: no qwen3_5 module
- 5.2.0: qwen3_5 available

* fix: move qwen3_5 version check into AutoConfig error handler

The previous version check at the dispatch chain was unreachable -- AutoConfig.from_pretrained fails first with a generic "does not recognize this architecture" error on transformers < 5.2.0, so execution never reached the check.

Move the qwen3_5-specific error message into the AutoConfig exception handler where "architecture" errors are caught. This intercepts the error before the generic message and gives users a clear upgrade path. Also remove the now-redundant check before the dispatch chain. Both FastLanguageModel and FastModel paths are covered.

Tested: transformers 4.57.6 shows the Qwen3.5-specific error, transformers 5.3.0 loads and trains normally.

---------

Co-authored-by: Daniel Han <danielhanchen@users.noreply.github.com>
Summary
Fixes #4188. Adds a Qwen3.5-specific error message when users try to load Qwen3.5 models on unsupported transformers versions, and corrects the FORCE_FLOAT32 comment.
Qwen3.5 requires transformers >= 5.2.0. Previously, users on older versions got a generic "does not recognize this architecture" error with no guidance. This intercepts that error in the AutoConfig exception handler and provides a clear upgrade path.

No dedicated FastQwen3_5Model class is needed -- the unsloth compiler already applies fused CE automatically via apply_fused_lm_head for both Qwen3_5ForCausalLM and Qwen3_5ForConditionalGeneration through the generic FastModel fallback path.

Changes

unsloth/models/loader.py:
- qwen3_5 check in AutoConfig.from_pretrained error handler (both FastLanguageModel and FastModel paths) to show a specific error message pointing users to transformers>=5.2.0
- FORCE_FLOAT32 comment corrected for qwen3_5: the (1+w) RMSNorm pattern does not overflow float16 (it computes in float32 internally); the real reason is that GDN layers produce NaN grad norms during float16 training

Test results
Supersedes #4331. Companion PR: unslothai/unsloth-zoo#552 (Conv dtype fix in compiler).