Fix loss function not patched for Qwen3.5 models by rycerzes · Pull Request #5442 · unslothai/unsloth

rycerzes · 2026-05-15T11:06:31Z

Qwen3.5 models (Qwen3_5ForConditionalGeneration) have a loss_type of "ForConditionalGeneration" rather than "ForCausalLM". patch_loss_functions only ever updated the "ForCausalLM" key in LOSS_MAPPING, so Qwen3.5 silently kept using HuggingFace's stock ForCausalLMLoss, which casts logits to fp32 and immediately OOMs on GPUs with ≤24GB of memory at Qwen3.5's vocab size of 248k tokens.
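To put rough numbers on the fp32 upcast described above, here is a back-of-the-envelope sketch of the logits tensor size. The 8,192-token count, the bf16 input dtype, and the rounded 248,000 vocabulary size are illustrative assumptions, not figures taken from this PR.

```python
def logits_bytes(num_tokens: int, vocab_size: int, bytes_per_element: int) -> int:
    """Size in bytes of a dense [num_tokens, vocab_size] logits tensor."""
    return num_tokens * vocab_size * bytes_per_element


VOCAB_SIZE = 248_000   # assumption: "248k" from the description, rounded
NUM_TOKENS = 8_192     # assumption: batch_size * sequence_length
BF16_BYTES = 2         # assumption: logits arrive in bf16
FP32_BYTES = 4

bf16_logits = logits_bytes(NUM_TOKENS, VOCAB_SIZE, BF16_BYTES)
fp32_copy = logits_bytes(NUM_TOKENS, VOCAB_SIZE, FP32_BYTES)

GIB = 2**30
print(f"bf16 logits: {bf16_logits / GIB:.2f} GiB")
print(f"fp32 copy:   {fp32_copy / GIB:.2f} GiB")
print(f"both alive:  {(bf16_logits + fp32_copy) / GIB:.2f} GiB")
```

Under these assumptions the fp32 copy alone is twice the size of the bf16 logits, before counting model weights, activations, or gradients.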

The fix is straightforward: instead of hardcoding "ForCausalLM", the patch now scans all entries in LOSS_MAPPING and replaces any that still point to the original ForCausalLMLoss with the Unsloth kernel. This means new model architectures with different loss_type names won't silently regress in the future.
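The sweep described above can be sketched roughly as follows. This is a minimal illustration, not the code merged in this PR: the mapping and the three loss functions are local stand-ins for `transformers.loss.loss_utils.LOSS_MAPPING` and its entries, and `patch_causal_lm_aliases` is a hypothetical helper name. It matches on `__name__`, as the diff quoted in the review does.

```python
from typing import Callable, Dict, List


# Stand-ins for the real loss functions (assumptions for illustration only).
def ForCausalLMLoss(*args, **kwargs):
    raise NotImplementedError


def ForMaskedLMLoss(*args, **kwargs):
    raise NotImplementedError


def UnslothForCausalLMLoss(*args, **kwargs):
    raise NotImplementedError


def patch_causal_lm_aliases(
    mapping: Dict[str, Callable],
    replacement: Callable,
    original_name: str = "ForCausalLMLoss",
) -> List[str]:
    """Replace every entry whose function is still named `original_name`."""
    patched = []
    # Iterate over a copy so the dict can be mutated safely inside the loop.
    for key, fn in list(mapping.items()):
        if getattr(fn, "__name__", "") == original_name:
            mapping[key] = replacement
            patched.append(key)
    return patched


# Stand-in for LOSS_MAPPING with one alias and one unrelated loss type.
LOSS_MAPPING = {
    "ForCausalLM": ForCausalLMLoss,
    "ForConditionalGeneration": ForCausalLMLoss,
    "ForMaskedLM": ForMaskedLMLoss,
}

patched_keys = patch_causal_lm_aliases(LOSS_MAPPING, UnslothForCausalLMLoss)
print(patched_keys)
```

Because the match is on the function rather than the key, the helper is idempotent: a second call finds nothing left to patch.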

On the test side, added a check that every mapping key pointing to ForCausalLMLoss gets patched (not just "ForCausalLM"), a check that unrelated loss types like masked-LM or detection aren't accidentally overwritten, and made sure tests clean up after themselves so patching in one test doesn't bleed into another.
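One way to keep patching from bleeding between tests is to snapshot the mapping and restore it afterwards. The sketch below assumes nothing about how the PR's tests actually do this; `restored_after`, `stock_loss`, and `patched_loss` are hypothetical names, and a plain dict stands in for LOSS_MAPPING.

```python
import contextlib
from typing import Callable, Dict, Iterator


@contextlib.contextmanager
def restored_after(mapping: Dict[str, Callable]) -> Iterator[Dict[str, Callable]]:
    """Yield `mapping`, then restore its original contents in place."""
    snapshot = dict(mapping)
    try:
        yield mapping
    finally:
        # Mutate in place so other references to the same dict see the restore.
        mapping.clear()
        mapping.update(snapshot)


def stock_loss():
    pass


def patched_loss():
    pass


mapping = {"ForCausalLM": stock_loss}

with restored_after(mapping):
    mapping["ForCausalLM"] = patched_loss   # simulate patching
    mapping["Extra"] = patched_loss         # simulate a key added mid-test
    inside = dict(mapping)

print(sorted(inside), sorted(mapping))
```

The same body could be wrapped in a test fixture so every test gets the restore automatically.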

Closes #5441, related to #4188.


gemini-code-assist

Code Review

This pull request addresses an OOM issue in models like Qwen3.5 by ensuring that all loss mapping keys aliased to ForCausalLMLoss are correctly patched with the Unsloth kernel. It also introduces regression tests to verify the patching logic and ensure unrelated loss types are not affected. The review feedback suggests improving the exception handling in the patching logic by catching specific exceptions and adding debug logging instead of using a broad, silent pass.

gemini-code-assist · 2026-05-15T11:08:44Z

+    try:
+        import transformers.loss.loss_utils as _lu
+
+        _unsloth_loss = _lu.LOSS_MAPPING.get("ForCausalLM")
+        if _unsloth_loss is not None:
+            _causal_lm_loss_name = "ForCausalLMLoss"
+            for _key, _fn in list(_lu.LOSS_MAPPING.items()):
+                if (
+                    _key != "ForCausalLM"
+                    and getattr(_fn, "__name__", "") == _causal_lm_loss_name


Avoid using broad exception handlers like except Exception:. While logging the error is an improvement over a silent pass, you should catch specific exceptions such as ModuleNotFoundError and AttributeError to avoid suppressing unrelated issues. This aligns with the repository's guidelines on exception handling and optional dependencies.

Suggested change

-    try:
-        import transformers.loss.loss_utils as _lu
-
-        _unsloth_loss = _lu.LOSS_MAPPING.get("ForCausalLM")
-        if _unsloth_loss is not None:
-            _causal_lm_loss_name = "ForCausalLMLoss"
-            for _key, _fn in list(_lu.LOSS_MAPPING.items()):
-                if (
-                    _key != "ForCausalLM"
-                    and getattr(_fn, "__name__", "") == _causal_lm_loss_name
+    try:
+        import transformers.loss.loss_utils as _lu
+
+        _unsloth_loss = _lu.LOSS_MAPPING.get("ForCausalLM")
+        if _unsloth_loss is not None:
+            _causal_lm_loss_name = "ForCausalLMLoss"
+            for _key, _fn in list(_lu.LOSS_MAPPING.items()):
+                if _key != "ForCausalLM" and getattr(_fn, "__name__", "") == _causal_lm_loss_name:
+                    _lu.LOSS_MAPPING[_key] = _unsloth_loss
+    except (ModuleNotFoundError, AttributeError) as e:
+        logger.debug(f"Unsloth: Failed to patch additional loss functions: {e}", exc_info=True)

References

When catching an ImportError for an optional dependency, prefer catching the more specific ModuleNotFoundError and check the module name to avoid suppressing unrelated import errors.

When handling exceptions, avoid broad except Exception: pass clauses. Instead, catch specific exceptions and log them (at least at a debug level) to aid in troubleshooting. If a failure is expected, log the specific exception type and its details.
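The two guidelines above can be illustrated with a small optional-import helper. This is a sketch under stated assumptions, not code from the repository: `import_optional` is a hypothetical name, and the logger is a plain standard-library logger.

```python
import importlib
import logging
from types import ModuleType
from typing import Optional

logger = logging.getLogger(__name__)


def import_optional(module_name: str) -> Optional[ModuleType]:
    """Import `module_name`, returning None only if that module is absent."""
    try:
        return importlib.import_module(module_name)
    except ModuleNotFoundError as exc:
        missing = exc.name or ""
        # Swallow the error only when the missing module is the one requested
        # or one of its parent packages; anything else is an unrelated failure.
        if missing != module_name and not module_name.startswith(missing + "."):
            raise
        logger.debug("Optional module %r not available: %s", module_name, exc)
        return None


print(import_optional("json") is not None)
print(import_optional("definitely_not_a_real_module_xyz") is None)
```

Checking `exc.name` means a broken import inside an installed package still raises instead of being mistaken for a missing optional dependency.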

Replace bare except Exception with the only two compatibility errors we actually care about so genuine bugs in the sweep surface. Drop the redundant _key != "ForCausalLM" guard since the __name__ predicate already excludes the patched entry (UnslothForCausalLMLoss != ForCausalLMLoss).


chatgpt-codex-connector

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 812c514ff8


chatgpt-codex-connector · 2026-05-19T07:55:10Z

+        cg_loss = lu.LOSS_MAPPING.get("ForConditionalGeneration")
+        assert cg_loss is unsloth_loss, (


Handle Transformers versions without conditional-generation loss

This assertion fails under supported Transformers releases that do not define LOSS_MAPPING['ForConditionalGeneration'] yet; for example the package constraint in pyproject.toml still allows transformers>=4.51.3, and 4.51.3's loss mapping only has ForCausalLM plus the older task keys. In that environment cg_loss is None, so this new drift test goes red even though there is no alias to patch. Please gate this check on the key being present, or test all existing keys whose original value was ForCausalLMLoss instead of requiring this newer key unconditionally.
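The second option suggested above, checking every key that originally pointed at ForCausalLMLoss rather than requiring one specific key, might look like the sketch below. All names are stand-ins: the dicts imitate an older and a newer LOSS_MAPPING, and `apply_patch` and `check_aliases_patched` are hypothetical helpers, not functions from the PR.

```python
from typing import Callable, Dict, List


# Stand-ins for the real loss functions (assumptions for illustration only).
def ForCausalLMLoss(*args, **kwargs):
    raise NotImplementedError


def ForMaskedLMLoss(*args, **kwargs):
    raise NotImplementedError


def UnslothForCausalLMLoss(*args, **kwargs):
    raise NotImplementedError


def apply_patch(mapping: Dict[str, Callable], unsloth_loss: Callable,
                original_name: str = "ForCausalLMLoss") -> Dict[str, Callable]:
    """Hypothetical patch: return a copy with every alias replaced."""
    return {
        key: unsloth_loss if getattr(fn, "__name__", "") == original_name else fn
        for key, fn in mapping.items()
    }


def check_aliases_patched(before: Dict[str, Callable], after: Dict[str, Callable],
                          unsloth_loss: Callable,
                          original_name: str = "ForCausalLMLoss") -> List[str]:
    """Assert that each key originally aliased to the stock loss was patched."""
    checked = []
    for key, fn in before.items():
        if getattr(fn, "__name__", "") == original_name:
            assert after[key] is unsloth_loss, f"{key} was not patched"
            checked.append(key)
    return checked


# Older mapping without the conditional-generation key, and a newer one with it.
older = {"ForCausalLM": ForCausalLMLoss, "ForMaskedLM": ForMaskedLMLoss}
newer = {**older, "ForConditionalGeneration": ForCausalLMLoss}

for before in (older, newer):
    after = apply_patch(before, UnslothForCausalLMLoss)
    print(check_aliases_patched(before, after, UnslothForCausalLMLoss))
```

The check passes on both mappings because it only inspects keys that exist, so it does not depend on which Transformers release introduced a given key.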


danielhanchen · 2026-05-19T10:57:46Z

Thanks for the PR!

* fix: patch loss functions for Qwen3_5ForConditionalGeneration to prevent OOM errors

* [pre-commit.ci] auto fixes from pre-commit.com hooks

  for more information, see https://pre-commit.ci

* Narrow except scope and simplify LOSS_MAPPING sweep

  Replace bare except Exception with the only two compatibility errors we actually care about so genuine bugs in the sweep surface. Drop the redundant _key != "ForCausalLM" guard since the __name__ predicate already excludes the patched entry (UnslothForCausalLMLoss != ForCausalLMLoss).

* [pre-commit.ci] auto fixes from pre-commit.com hooks

  for more information, see https://pre-commit.ci

---------

Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Daniel Han <danielhanchen@gmail.com>

fix: patch loss functions for Qwen3_5ForConditionalGeneration to prevent OOM errors

2f95f7b

rycerzes requested review from danielhanchen and rolandtannous as code owners May 15, 2026 11:06

[pre-commit.ci] auto fixes from pre-commit.com hooks

9e1e5bc

for more information, see https://pre-commit.ci

gemini-code-assist Bot reviewed May 15, 2026


This was referenced May 16, 2026

Patch every LOSS_MAPPING key aliased to ForCausalLMLoss unslothai/unsloth-zoo#656

Merged

[Bug] model.loss_function not patched for Qwen3_5ForConditionalGeneration, causes logits.float() OOM on ≤24GB GPU #5441

Closed

danielhanchen and others added 2 commits May 19, 2026 07:52

[pre-commit.ci] auto fixes from pre-commit.com hooks

812c514

for more information, see https://pre-commit.ci

chatgpt-codex-connector Bot reviewed May 19, 2026


danielhanchen merged commit 66cfbea into unslothai:main May 19, 2026
1 check passed

danielhanchen merged 4 commits into unslothai:main from rycerzes:fix/patch-loss-function-qwen3-5-conditional-generation
