feat(model parallelism): moving the labels to the same device as the logits for gpt2 and bart by kausmeows · Pull Request #22591 · huggingface/transformers

kausmeows · 2023-04-05T15:55:44Z

What does this PR do?

As suggested in #22561, this PR moves the labels to the same device as the logits they are compared to, for the BART and GPT-2 models.

The change follows the approach used in #22535.

lm_logits = self.lm_head(outputs[0])
lm_logits = lm_logits + self.final_logits_bias.to(lm_logits.device)

masked_lm_loss = None
if labels is not None:
    labels = labels.to(lm_logits.device)
    loss_fct = CrossEntropyLoss()
    masked_lm_loss = loss_fct(lm_logits.view(-1, self.config.vocab_size), labels.view(-1))
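For readers who want to try the pattern outside the model class, here is a minimal, self-contained sketch of the same idea. The helper name `lm_loss`, the toy sizes, and the device selection logic are assumptions made for illustration and are not part of the PR; the sketch falls back to the CPU when fewer than two GPUs are present, in which case the `.to(...)` call is a no-op.

```python
import torch
from torch import nn
from torch.nn import CrossEntropyLoss


def lm_loss(lm_logits: torch.Tensor, labels: torch.Tensor, vocab_size: int) -> torch.Tensor:
    """Cross-entropy over flattened logits, with labels moved to the logits' device."""
    # Same move as in the PR snippet; a no-op if both tensors already share a device.
    labels = labels.to(lm_logits.device)
    loss_fct = CrossEntropyLoss()
    return loss_fct(lm_logits.view(-1, vocab_size), labels.view(-1))


# Hypothetical toy sizes, chosen only for illustration.
vocab_size, hidden, batch, seq = 11, 8, 2, 5

# Put the head and the labels on different GPUs when two are available,
# otherwise keep everything on the CPU.
multi_gpu = torch.cuda.device_count() >= 2
head_device = torch.device("cuda:1" if multi_gpu else "cpu")
input_device = torch.device("cuda:0" if multi_gpu else "cpu")

lm_head = nn.Linear(hidden, vocab_size, bias=False).to(head_device)
hidden_states = torch.randn(batch, seq, hidden, device=head_device)
labels = torch.randint(0, vocab_size, (batch, seq), device=input_device)

loss = lm_loss(lm_head(hidden_states), labels, vocab_size)
print(loss.device, loss.item())
```

On a machine with two GPUs, removing the `labels.to(...)` line should make the loss call fail with a device mismatch, which is the situation the PR is meant to avoid.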

cc @sgugger, could you review this?

HuggingFaceDocBuilderDev · 2023-04-05T16:10:34Z

The documentation is not available anymore as the PR was closed or merged.

sgugger · 2023-04-05T17:26:15Z

Thanks a lot for your PR! Could you apply make fix-copies so that the models copied from BART or GPT-2 are auto-updated?

kausmeows · 2023-04-05T18:05:54Z

Hi, just did that!

sgugger

Thanks a lot!

kausmeows · 2023-04-05T18:38:10Z

All good! ✨

innat · 2023-04-05T19:33:54Z

Hi @kaustubh-s1, will this change fix model parallelism for GPT-2? I just tried it but got:

 File "/opt/conda/envs/gpt_neox/lib/python3.9/site-packages/torch/nn/functional.py", line 2515, in layer_norm
    return torch.layer_norm(input, normalized_shape, weight, bias, eps, torch.backends.cudnn.enabled)
RuntimeError: Expected all tensors to be on the same device, but found at least two devices, cuda:1 and cuda:0! (when checking argument for argument weight in method wrapper_CUDA__native_layer_norm)

P.S. My setup is almost the same as this one, with only the following differences:

import torch
from transformers import AutoModelForCausalLM


def get_parallel_model(model_name):
    # device_map='auto' lets Accelerate spread the layers across the available devices.
    model = AutoModelForCausalLM.from_pretrained(
        model_name,
        device_map='auto',
        torch_dtype=torch.float16,
        low_cpu_mem_usage=True
    )

    # 
    # setattr(model, 'model_parallel', True)
    # setattr(model, 'is_parallelizable', True)

    setattr(model, 'gradient_checkpointing', True)
    return model
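The traceback above complains about a layer norm weight rather than the labels, so one way to narrow it down is to check which device each parameter ended up on. The sketch below is only a diagnostic aid under that assumption; the helper `params_by_device` is hypothetical, and a toy `nn.Sequential` stands in for the real model so the snippet runs without downloading weights.

```python
from collections import defaultdict

import torch
from torch import nn


def params_by_device(model: nn.Module) -> dict:
    """Group parameter names by the device they currently live on."""
    groups = defaultdict(list)
    for name, param in model.named_parameters():
        groups[str(param.device)].append(name)
    return dict(groups)


# Toy stand-in for a real model (an assumption for illustration only).
toy = nn.Sequential(nn.Linear(4, 4), nn.LayerNorm(4))
placement = params_by_device(toy)
for device, names in placement.items():
    print(device, names)
```

Calling the same helper on the object returned by `get_parallel_model` would show whether a given layer norm's weight sits on a different GPU from the activations reaching it.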

kausmeows · 2023-04-06T06:54:59Z

Hi @innat. I think it should, but I don't have a multi-GPU setup, so I can't say for sure. I just followed the steps in #22535 to move the labels to the same device as the logits. Theoretically speaking, it should work.

Commits

c0d5df2 · feat(model parallelism): moving the labels to the same device as the logits for gpt2 and bart

6efbcd5 · apply make fix-copies

sgugger approved these changes Apr 5, 2023

sgugger merged commit 1564189 into huggingface:main Apr 5, 2023

kausmeows deleted the kaus branch April 5, 2023 18:40

xssChauhan mentioned this pull request Apr 5, 2023

Move labels to the same device as logits for LlamaForSequenceClassification and Blip2 #22596

Merged

innat mentioned this pull request Apr 7, 2023

Make all Transformer models compatible with model parallelism #22561

Closed

41 tasks

novice03 pushed a commit to novice03/transformers that referenced this pull request Jun 23, 2023

ef73be7 · feat(model parallelism): moving the labels to the same device as the logits for gpt2 and bart (huggingface#22591)

sgugger merged 2 commits into huggingface:main from kausmeows:kaus


4 participants
