feat(model parallelism): moving the labels to the same device as the logits for gpt2 and bart #22591

Merged
sgugger merged 2 commits into huggingface:main from kausmeows:kaus on Apr 5, 2023
@kausmeows

Contributor

What does this PR do?

As suggested in #22561, this PR moves the labels to the same device as the logits they are compared to, for the BART and GPT-2 models.

This change follows up on #22535.

lm_logits = self.lm_head(outputs[0])
lm_logits = lm_logits + self.final_logits_bias.to(lm_logits.device)

masked_lm_loss = None
if labels is not None:
    labels = labels.to(lm_logits.device)
    loss_fct = CrossEntropyLoss()
    masked_lm_loss = loss_fct(lm_logits.view(-1, self.config.vocab_size), labels.view(-1))
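
The same pattern can be tried outside the model as a minimal sketch. The helper name `lm_loss` and the toy shapes are assumptions made for illustration, not code from the PR; the snippet falls back to CPU when no GPU is available, in which case the `.to()` call is simply a no-op.

```python
import torch
from torch.nn import CrossEntropyLoss


def lm_loss(lm_logits: torch.Tensor, labels: torch.Tensor, vocab_size: int) -> torch.Tensor:
    """Cross-entropy over flattened logits, with labels moved to the logits' device."""
    # With a model split across devices, the logits come out on the device of
    # the last layer, while the labels usually still sit on the input device.
    labels = labels.to(lm_logits.device)
    loss_fct = CrossEntropyLoss()
    return loss_fct(lm_logits.view(-1, vocab_size), labels.view(-1))


# Toy shapes (assumed for illustration): batch 2, sequence length 3, vocab size 5.
vocab_size = 5
logits_device = torch.device("cuda:0" if torch.cuda.is_available() else "cpu")
lm_logits = torch.randn(2, 3, vocab_size, device=logits_device)
labels = torch.randint(0, vocab_size, (2, 3))  # deliberately created on the CPU
loss = lm_loss(lm_logits, labels, vocab_size)
```

Without the `labels.to(...)` line, the loss call would raise a device-mismatch error whenever the logits and labels live on different devices.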

cc @sgugger, could you please review this?

@HuggingFaceDocBuilderDev

HuggingFaceDocBuilderDev commented Apr 5, 2023

The documentation is not available anymore as the PR was closed or merged.

@sgugger

sgugger commented Apr 5, 2023

Collaborator

Thanks a lot for your PR! Could you apply make fix-copies so that the models copied from BART or GPT-2 are auto-updated?

@kausmeows

Contributor Author

Hi, just did that!

@sgugger sgugger left a comment

Collaborator

Thanks a lot!

@sgugger sgugger merged commit 1564189 into huggingface:main Apr 5, 2023
@kausmeows

Contributor Author

All good! ✨

@kausmeows kausmeows deleted the kaus branch April 5, 2023 18:40
@innat

innat commented Apr 5, 2023

Hi @kaustubh-s1, will this change fix model parallelism for gpt2? I just tried it but got:

 File "/opt/conda/envs/gpt_neox/lib/python3.9/site-packages/torch/nn/functional.py", line 2515, in layer_norm
    return torch.layer_norm(input, normalized_shape, weight, bias, eps, torch.backends.cudnn.enabled)
RuntimeError: Expected all tensors to be on the same device, but found at least two devices, cuda:1 and cuda:0! (when checking argument for argument weight in method wrapper_CUDA__native_layer_norm)

P.S. My setup is almost the same as this one, with only the following differences:

def get_parallel_model(model_name):
    model = AutoModelForCausalLM.from_pretrained(
        model_name,
        device_map='auto',
        torch_dtype=torch.float16,
        low_cpu_mem_usage=True
    )

    # 
    # setattr(model, 'model_parallel', True)
    # setattr(model, 'is_parallelizable', True)

    setattr(model, 'gradient_checkpointing', True)
    return model
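
The traceback above reports a LayerNorm weight on a different device from its input. One way to narrow such an error down is to list which device each parameter lives on. The following is a minimal sketch: `params_by_device` is a hypothetical helper, not part of transformers, and the small `toy` module is a stand-in for a real model.

```python
from collections import defaultdict

import torch
from torch import nn


def params_by_device(module: nn.Module) -> dict:
    """Group parameter names by the device they currently live on."""
    placement = defaultdict(list)
    for name, param in module.named_parameters():
        placement[str(param.device)].append(name)
    return dict(placement)


# Stand-in module (assumed for illustration); a real model would be passed instead.
toy = nn.Sequential(nn.Linear(4, 4), nn.LayerNorm(4))
placement = params_by_device(toy)
```

On a model spread over several GPUs, the resulting dictionary has one key per device, which makes it easier to see where a layer's weight and its incoming activations could end up apart.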

@kausmeows

Contributor Author

Hi @innat. It should, I think, but I don't have a multi-GPU setup, so I can't say for sure. I just followed the steps in #22535 to move the labels to the same device as the logits. In theory, it should work.

novice03 pushed a commit to novice03/transformers that referenced this pull request Jun 23, 2023