Fix: Move RoPE tensors to right devices (#2862)
Conversation
OK, your solution is fine, but moving to CPU will make things slower. We might have to replicate cos / sin on each GPU and make a tuple indexer.
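A per-device cache along those lines could look roughly like this. This is a hypothetical sketch of the "replicate cos/sin on each GPU, index by tuple" idea; the class name, constructor arguments, and methods are illustrative and not Unsloth's actual API:

```python
import torch

class RoPECache:
    """Sketch: replicate the RoPE cos/sin tensors once per device and
    index them as a tuple, so layers on a later pipeline stage never
    trigger a cross-GPU copy at runtime."""

    def __init__(self, cos: torch.Tensor, sin: torch.Tensor, devices):
        self.devices = tuple(devices)
        # One (cos, sin) replica per device, materialized up front.
        # This trades a little extra memory for zero copies per step.
        self.per_device = tuple(
            (cos.to(d), sin.to(d)) for d in self.devices
        )

    def __getitem__(self, device: torch.device):
        # Tuple indexer: look up the replica living on `device`.
        return self.per_device[self.devices.index(device)]
```

This is the memory/latency trade-off discussed above: replication costs one copy of cos/sin per GPU but removes per-layer transfers from the forward pass.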
I.e.: or something
I'm only moving what's needed. And for cos, sin, we're moving them directly between GPUs; there are no explicit CPU calls on our side. The reason I didn't replicate across GPUs was that I wanted to be lean on memory.
I think it's best to use a tuple and not a dict |
unsloth/models/cohere.py
next_decoder_cache = []
for idx, decoder_layer in enumerate(self.model.layers):
    decoder_device = decoder_layer.self_attn.q_proj.weight.device
    hidden_states, out_weight, position_ids = move_to_device(
Wait, we shouldn't need to move these tensors, right?
We're mostly using PP (pipeline parallelism), which means some layers are on GPU 0 and some on GPU 1. The inputs and these tensors don't move to the second GPU automatically; it has to be done explicitly.
Also, we do have a check inside move_to_device to ensure that we aren't moving tensors unnecessarily.
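That guard could be sketched like this. This is an assumption about the shape of the helper based on the call site in the diff above (it returns several tensors at once); the real move_to_device in the PR may differ:

```python
import torch

def move_to_device(target_device: torch.device, *tensors):
    """Sketch of the guard described above: only copy a tensor when it
    is not already on the target device, so repeated calls for layers
    that live on the same GPU are effectively free."""
    moved = []
    for t in tensors:
        if t is not None and t.device != target_device:
            # Direct device-to-device copy; no explicit CPU hop.
            t = t.to(target_device)
        moved.append(t)
    return tuple(moved)
```

With this check, calling the helper on every decoder layer is cheap: only the layers sitting on a different pipeline stage actually pay for a transfer.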
unsloth/models/llama.py
This means we can pass in a row of Q, but we need to
remember K and V, which are called the KV cache.
"""
if position_ids is not None:
This will be handled separately in another PR.
Force-pushed from 606c451 to 324b392.
Closing this, as this is handled in #2919.
Multi-GPU inference fails because some tensors are explicitly placed on GPU 0, and RoPE values sometimes end up on the wrong GPU. This PR fixes that.