feat(model parallelism): moving the labels to the same device as the logits for gpt2 and bart#22591
Thanks a lot for your PR! Could you apply |
Hi, just did that! |
All good! ✨ |
Hi @kaustubh-s1, will this change fix model parallelism for gpt2? I've just tried it but got
P.S. My setup is almost the same as this, with only the following differences:
import torch
from transformers import AutoModelForCausalLM

def get_parallel_model(model_name):
    # device_map='auto' spreads the model's layers across the available devices.
    model = AutoModelForCausalLM.from_pretrained(
        model_name,
        device_map='auto',
        torch_dtype=torch.float16,
        low_cpu_mem_usage=True,
    )
    # setattr(model, 'model_parallel', True)
    # setattr(model, 'is_parallelizable', True)
    setattr(model, 'gradient_checkpointing', True)
    return model
|
Hi @innat. It should, I guess, but I don't have a multi-GPU setup, so I can't say for sure. I just followed the steps in #22535 to move the labels to the same device as the logits. Theoretically speaking, it should work. |
What does this PR do?
As suggested in #22561, this PR moves the labels to the same device as the logits they are compared to for the bart and gpt-2 models.
This change follows the approach referred to in #22535.
cc @sgugger, could you review this?
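
For readers skimming the thread, here is a minimal sketch of the idea behind the change, not the PR's actual diff. The function name `causal_lm_loss` and the toy tensor shapes are assumptions made for illustration; the only step taken from the description above is moving the labels to the logits' device before the loss is computed.

```python
import torch
import torch.nn.functional as F


def causal_lm_loss(lm_logits: torch.Tensor, labels: torch.Tensor) -> torch.Tensor:
    """Next-token cross-entropy with labels moved to the logits' device.

    Hypothetical helper for illustration; not the library's own function.
    """
    # With model parallelism the logits sit on the device that holds the last
    # layers, while the labels usually arrive on the input device.
    labels = labels.to(lm_logits.device)
    # Shift so that the logits at position t are scored against token t + 1.
    shift_logits = lm_logits[..., :-1, :].contiguous()
    shift_labels = labels[..., 1:].contiguous()
    return F.cross_entropy(
        shift_logits.view(-1, shift_logits.size(-1)),
        shift_labels.view(-1),
    )


# Toy shapes (assumed): batch of 2, sequence length 5, vocabulary of 11.
logits = torch.randn(2, 5, 11)
labels = torch.randint(0, 11, (2, 5))
loss = causal_lm_loss(logits, labels)
```

On a single device the `.to(...)` call is a no-op, so the sketch runs on CPU; it only matters when the logits and labels end up on different devices.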