RuntimeError: Expected all tensors to be on the same device, but found at least two devices, cpu and cuda:0! (when checking argument for argument mat2 in method wrapper_CUDA_bmm) #89

mcpaulgeorge · 2024-08-05T09:04:35Z

The server has three A10 GPUs (24 GB each), and I did not pass --multigpu.

mcpaulgeorge · 2024-08-05T09:20:28Z

shubhra · 2024-08-13T20:16:21Z

Hitting the same issue both with and without --multigpu.

SSshuishui · 2024-09-03T02:33:36Z

Hi there,
Changing 'LMClass.py' to load the model with
self.model = AutoModelForCausalLM.from_pretrained(args.model, config=config, device_map='auto', torch_dtype=torch.float16)
and to set
self._device = torch.device("cuda:0" if torch.cuda.is_available() else "cpu")
fixed this problem.
Then, in 'models/int_llama_layer.py', change the rotary embedding calls to:
cos, sin = self.rotary_emb(value_states, position_ids=position_ids)
query_states, key_states = apply_rotary_pos_emb(query_states, key_states, cos, sin)
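For context on that change: in transformers 4.44 (the version reported below), the Llama rotary module is called with position_ids and returns cos and sin, and value_states appears to be passed only so the module can pick a dtype and device. The pure-Python sketch below shows the math for a single head vector, assuming the Llama defaults (base 10000, the "rotate_half" layout that duplicates the frequencies). The function names rope_cos_sin and apply_rotary are hypothetical; this illustrates the idea and is not the library code.

```python
import math
from typing import List, Tuple


def rope_cos_sin(position: int, head_dim: int, base: float = 10000.0) -> Tuple[List[float], List[float]]:
    """cos/sin tables for one position, laid out like Llama's cat(freqs, freqs)."""
    half = head_dim // 2
    # inv_freq[j] = 1 / base^(2j / head_dim)
    inv_freq = [1.0 / (base ** (2 * j / head_dim)) for j in range(half)]
    angles = [position * f for f in inv_freq]
    emb = angles + angles  # duplicate so cos/sin cover the full head_dim
    return [math.cos(a) for a in emb], [math.sin(a) for a in emb]


def rotate_half(x: List[float]) -> List[float]:
    """Split x into halves [x1, x2] and return [-x2, x1]."""
    half = len(x) // 2
    return [-v for v in x[half:]] + x[:half]


def apply_rotary(x: List[float], cos: List[float], sin: List[float]) -> List[float]:
    """Element-wise x * cos + rotate_half(x) * sin."""
    rot = rotate_half(x)
    return [xi * c + ri * s for xi, ri, c, s in zip(x, rot, cos, sin)]


q = [1.0, 2.0, 3.0, 4.0]
cos0, sin0 = rope_cos_sin(position=0, head_dim=4)
cos5, sin5 = rope_cos_sin(position=5, head_dim=4)
print(apply_rotary(q, cos0, sin0))  # [1.0, 2.0, 3.0, 4.0]: position 0 is the identity
print(math.hypot(*apply_rotary(q, cos5, sin5)))  # same norm as q (sqrt(30))
```

Each pair (x[j], x[j + head_dim/2]) is rotated by the same angle, so the vector's norm is preserved and only relative positions affect query-key dot products.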
Finally, change the cache initialization to cache = {"i": 0, "attention_mask": None} and the Catcher class to:

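class Catcher(nn.Module):
    def __init__(self, module):
        super().__init__()
        self.module = module
        self.is_llama = False

    def forward(self, inp, **kwargs):
        # Record the input of the first decoder layer for the current sample
        inps[cache["i"]] = inp
        cache["i"] += 1
        # Left commented out: keeping it triggers an attention-mask size ValueError
        # cache["attention_mask"] = kwargs["attention_mask"]
        if self.is_llama:
            cache["position_ids"] = kwargs["position_ids"]
        # Abort the forward pass; only the captured inputs are needed
        raise ValueError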

Both changes are in 'quantize/omniquant.py'.
Hope this helps. My 'transformers' version is 4.44.2.
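The Catcher trick can be illustrated without torch. In the toy below, ToyModel and ToyLayer are hypothetical stand-ins and plain lists replace tensors. The wrapper replaces the first layer, records its input and keyword arguments, and raises ValueError to stop the forward pass early, which is the pattern the snippet above relies on. As in the edited Catcher, the attention mask is deliberately not stored. This is a sketch of the idea only.

```python
from typing import Any, Dict, List


class ToyLayer:
    """Hypothetical stand-in for a decoder layer: adds 1 to every element."""

    def __call__(self, x: List[float], **kwargs: Any) -> List[float]:
        return [v + 1 for v in x]


class ToyModel:
    """Hypothetical model: a toy embedding followed by the layers in order."""

    def __init__(self, n_layers: int) -> None:
        self.layers: List[Any] = [ToyLayer() for _ in range(n_layers)]

    def __call__(self, tokens: List[int]) -> List[float]:
        hidden = [float(t) * 10 for t in tokens]  # toy "embedding"
        position_ids = list(range(len(tokens)))
        for layer in self.layers:
            hidden = layer(hidden, position_ids=position_ids, attention_mask=None)
        return hidden


class Catcher:
    """Wraps the first layer, stores what it receives, then aborts the forward."""

    def __init__(self, module: Any, inps: List[Any], cache: Dict[str, Any]) -> None:
        self.module = module
        self.inps = inps
        self.cache = cache

    def __call__(self, inp: List[float], **kwargs: Any) -> None:
        self.inps[self.cache["i"]] = inp
        self.cache["i"] += 1
        self.cache["position_ids"] = kwargs.get("position_ids")
        raise ValueError  # stop here: later layers are not needed yet


model = ToyModel(n_layers=3)
calib = [[1, 2, 3], [4, 5, 6]]  # two calibration samples
inps: List[Any] = [None] * len(calib)
cache: Dict[str, Any] = {"i": 0, "attention_mask": None}

model.layers[0] = Catcher(model.layers[0], inps, cache)
for sample in calib:
    try:
        model(sample)
    except ValueError:
        pass  # expected: the Catcher interrupted the forward pass
model.layers[0] = model.layers[0].module  # restore the original layer

print(inps)  # [[10.0, 20.0, 30.0], [40.0, 50.0, 60.0]]
print(cache)
```

Raising an exception is a cheap way to skip every layer after the first, since only the first layer's inputs are needed to start layer-by-layer quantization.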

SSshuishui · 2024-09-11T13:21:27Z


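Without --multigpu, I also changed the per-layer quantization loop so that each layer, its inputs and the position ids are moved to the GPU that device_map='auto' assigned to that layer: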

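# Module-name -> device mapping produced by device_map='auto'
hf_device_map = model.hf_device_map
print(hf_device_map)

for i in range(len(layers)):
    logger.info(f"=== Start quantize layer {i} ===")
    print(f'================={i}==================')
    # Assumes each layer has its own entry mapped to a GPU index (not 'cpu' or 'disk')
    hf_device = f"cuda:{hf_device_map[f'{layer_name_prefix}.{i}']}"
    layer = layers[i].to(hf_device)
    inps = inps.to(hf_device)
    position_ids = position_ids.to(hf_device)
    # ... rest of the per-layer quantization loop unchanged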
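The lookup hf_device_map[f'{layer_name_prefix}.{i}'] assumes every decoder layer has its own entry and is mapped to a GPU index. As far as I know, accelerate can also produce coarser keys (for example a single '' entry when the whole model fits on one device) and the values 'cpu' or 'disk' for offloaded modules; in those cases that line would raise a KeyError or build an invalid 'cuda:cpu' string. A hedged, pure-Python sketch of a more tolerant lookup follows; the helper name layer_device and the example map are hypothetical, and "model.layers" is an assumed value for layer_name_prefix on Llama.

```python
from typing import Dict, Union

DeviceSpec = Union[int, str]


def layer_device(hf_device_map: Dict[str, DeviceSpec], module_name: str) -> str:
    """Return a torch-style device string for module_name.

    Walks from the full name up through its parents
    ('model.layers.3' -> 'model.layers' -> 'model' -> ''),
    so coarse device-map entries still match.
    """
    name = module_name
    while True:
        if name in hf_device_map:
            dev = hf_device_map[name]
            if isinstance(dev, int):
                return f"cuda:{dev}"  # integer entries are GPU indices
            if dev == "disk":
                raise ValueError(f"{module_name} is offloaded to disk")
            return str(dev)  # e.g. 'cpu'
        if name == "":
            raise KeyError(module_name)
        name = name.rpartition(".")[0]  # drop the last path component


# Hypothetical map for a model split across two GPUs
example_map = {
    "model.embed_tokens": 0,
    "model.layers.0": 0,
    "model.layers.1": 1,
    "model.norm": 1,
    "lm_head": 1,
}
prefix = "model.layers"  # assumed layer_name_prefix for Llama
for i in range(2):
    print(layer_device(example_map, f"{prefix}.{i}"))  # cuda:0, then cuda:1
print(layer_device({"": 0}, "model.layers.7"))  # cuda:0
```

Walking up the dotted name is what makes the coarse entries work: a layer inherits the device of the nearest ancestor listed in the map.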

If cache["attention_mask"] = kwargs["attention_mask"] is not commented out, I get this error: ValueError: Attention mask should be of size (1, 1, 2048, 2048), but is torch.Size([1, 1, 2048, 2049])
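A possible reading of the 2049, offered tentatively: in recent transformers versions such as 4.44.2, when no attention_mask tensor is passed, LlamaModel appears to build the causal mask with a target length of past_seen_tokens + sequence_length + 1, while the OmniQuant layer checks for exactly (bsz, 1, q_len, kv_seq_len). The stock Llama attention in those versions slices the mask to the key length before adding it to the scores. The pure-Python sketch below uses nested lists for a (1, 1, q, k) mask; build_causal_mask, shape and slice_to_kv_len are hypothetical helpers, and whether slicing is a safe drop-in for 'models/int_llama_layer.py' is untested.

```python
from typing import List

NEG_INF = float("-inf")  # stand-in for torch.finfo(dtype).min
Mask4D = List[List[List[List[float]]]]


def build_causal_mask(q_len: int, target_len: int) -> Mask4D:
    """(1, 1, q_len, target_len) mask: 0.0 where key j <= query i, else NEG_INF."""
    rows = [[0.0 if j <= i else NEG_INF for j in range(target_len)] for i in range(q_len)]
    return [[rows]]


def shape(mask: Mask4D) -> tuple:
    return (len(mask), len(mask[0]), len(mask[0][0]), len(mask[0][0][0]))


def slice_to_kv_len(mask: Mask4D, kv_len: int) -> Mask4D:
    """Equivalent of mask[:, :, :, :kv_len]: keep only the first kv_len key columns."""
    return [[[row[:kv_len] for row in head] for head in batch] for batch in mask]


q_len = kv_len = 4  # 2048 in the reported error; kept small here
mask = build_causal_mask(q_len, target_len=q_len + 1)
print(shape(mask))  # (1, 1, 4, 5): one column too many, like 2048 vs 2049
fixed = slice_to_kv_len(mask, kv_len)
print(shape(fixed))  # (1, 1, 4, 4)
```

In this toy the extra last column is masked for every query, so dropping it does not change which positions are attended.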
