[FIX] Move loss and n_items to logits device in fast_cross_entropy_loss loss for multi-GPU support by nole69 · Pull Request #4063 · unslothai/unsloth

nole69 · 2026-02-15T08:00:33Z

Continuation of @devchilll's PR #4059. PR #4059 resolves the RuntimeError at the masked_fill_ call in the chunked cross-entropy forward path, but a similar device-mismatch error still occurs downstream at the return loss.sum() / n_items call.

Fix: explicitly move loss and n_items to the same device as logits at the end of fast_cross_entropy_loss.

Completes the fix for #4041.

gemini-code-assist · 2026-02-15T08:00:45Z

Summary of Changes

Hello @nole69, I'm Gemini Code Assist! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request addresses a critical multi-GPU runtime error in the fast_cross_entropy_loss function. By explicitly moving the calculated loss and n_items to the correct device, it resolves an issue where these tensors could reside on different devices, leading to failures during the final summation and division. This enhances the robustness of the loss calculation for distributed training setups.

Highlights

- Multi-GPU Support: Ensured loss and n_items tensors are explicitly moved to the same device as logits to prevent runtime errors in multi-GPU environments, completing a previous fix for issue #4041 ([Bug] RuntimeError: expected self and mask to be on the same device, but got mask on cuda:0 and self on cuda:1).

Changelog

unsloth/kernels/cross_entropy_loss.py
- Captured the device of the logits tensor.
- Explicitly moved the loss and n_items tensors to the logits device before returning the final calculated loss.

gemini-code-assist

Code Review

The pull request correctly identifies and addresses a device mismatch issue in multi-GPU environments when calculating the final cross-entropy loss. However, the current implementation will crash if n_items is passed as a Python integer (which occurs in recent Transformers versions). Additionally, moving loss to the device is redundant as it is already allocated on the correct device within the kernel. The suggested change ensures the code is robust for both tensor and scalar inputs.
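The suggested change itself is not reproduced on this page, so the following is only a minimal sketch of the idea described in the review, not the merged code. It assumes a hypothetical helper named reduce_loss that stands in for the final reduction step of fast_cross_entropy_loss, and it runs on CPU for illustration. Following the review, only n_items is moved, and only when it is a tensor.

```python
import torch


def reduce_loss(loss, logits, labels, n_items=None):
    """Hypothetical stand-in for the final reduction step (not the merged code)."""
    device = logits.device
    if n_items is None:
        # Count the label positions that are not ignored (-100).
        n_items = torch.count_nonzero(labels != -100)
    if torch.is_tensor(n_items):
        # Only tensors have .to(); Python ints and floats are left untouched.
        n_items = n_items.to(device)
    return loss.sum() / n_items


# Toy inputs on CPU; on a multi-GPU setup logits may live on a different device.
logits = torch.zeros(4, 8)
labels = torch.tensor([1, 2, -100, 3])
per_token_loss = torch.tensor([1.0, 2.0, 0.0, 3.0])

from_labels = reduce_loss(per_token_loss, logits, labels)
from_tensor = reduce_loss(per_token_loss, logits, labels, torch.tensor(3))
from_int = reduce_loss(per_token_loss, logits, labels, 3)
```

All three calls take the same path through the reduction; the isinstance-style check is what lets a plain integer pass through without a device transfer.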

chatgpt-codex-connector

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 74a349e12f

chatgpt-codex-connector · 2026-02-15T08:03:03Z

    if n_items is None:
        n_items = torch.count_nonzero(labels != -100)
+    loss = loss.to(device)
+    n_items = n_items.to(device)


Handle scalar n_items before device transfer

fast_cross_entropy_loss now unconditionally calls n_items.to(device), but callers pass n_items straight from kwargs (unsloth/models/llama.py:1507-1515, unsloth/models/mistral.py:366-373) and that value can be a plain Python scalar from trainer plumbing; in that case this line raises AttributeError: 'int' object has no attribute 'to' and training fails before computing loss. This is a regression from the previous behavior, which accepted numeric n_items values without requiring tensor methods.
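The failure described above can be reproduced without any GPU, since it comes from calling a tensor method on a plain Python integer. The sketch below uses only the standard library; move_to_device is a hypothetical, duck-typed guard written for illustration and is not part of unsloth or PyTorch.

```python
def move_to_device(value, device):
    """Hypothetical guard: call .to(device) only on objects that provide it."""
    to = getattr(value, "to", None)
    return to(device) if callable(to) else value


n_items = 8  # a plain Python int, as trainer plumbing may supply

# The unconditional call fails because int has no .to() method.
try:
    n_items.to("cuda:1")
except AttributeError as exc:
    message = str(exc)

# The guarded call returns the scalar unchanged.
guarded = move_to_device(n_items, "cuda:1")
```

With real tensors, a check such as torch.is_tensor(n_items) would serve the same purpose as the duck-typed lookup used here.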

Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>

danielhanchen · 2026-02-15T09:09:05Z

Thanks

@gemini-code-assist

…ss loss for multi-GPU support (unslothai#4063) * bug fix for multi-GPU * Apply suggestion from @gemini-code-assist[bot] Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com> --------- Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>

bug fix for multi-GPU

74a349e

nole69 requested a review from danielhanchen as a code owner February 15, 2026 08:00

gemini-code-assist Bot reviewed Feb 15, 2026

Comment thread unsloth/kernels/cross_entropy_loss.py Outdated

chatgpt-codex-connector Bot reviewed Feb 15, 2026

Apply suggestion from @gemini-code-assist[bot]

5f68761

Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>

danielhanchen merged commit 56c8f96 into unslothai:main Feb 15, 2026
1 check passed

marksverdhei mentioned this pull request Mar 18, 2026

feat: Multi-GPU QLoRA with block-boundary model sharding heiervang-technologies/ht-unsloth#1

Closed

14 tasks

danielhanchen merged 2 commits into unslothai:main from nole69:fix/cross-entropy-device-mismatch-contd
