[Trainer] Force is_model_parallel when model is loaded in multiple GPUs using accelerate
#22532
Changes from 3 commits
1ccad27
316a1ec
8a0ddb9
e394c09
d98c4c1
5eb72b4
| Original file line number | Diff line number | Diff line change |
|---|---|---|
|
|
@@ -370,6 +370,19 @@ def __init__( | |
| else: | ||
| self.is_model_parallel = False | ||
|
|
||
| if ( | ||
| getattr(model, "hf_device_map", None) is not None | ||
| and len(set(model.hf_device_map.values())) > 1 | ||
|
Collaborator
Maybe actually check the number of GPUs, because this could be one GPU and the CPU here.
Contributor
Author
Makes sense! Fixed in 5eb72b4
Contributor
Multi-device placement should only be on GPUs for naive pipelining to work, right? Offloading to CPU/disk won't work, will it?
Contributor
Author
I think offloading to CPU/disk won't work, yes. I am also unsure whether CPU/disk offload training works out of the box with …
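The suggestion in this thread (count only GPUs, not CPU or disk offload targets) could be sketched roughly as follows. This is an illustration of the idea, not necessarily what commit 5eb72b4 does. The helper names `count_accelerator_devices` and `spans_multiple_gpus` are hypothetical, and the sketch assumes `hf_device_map` values are GPU indices or the strings `"cpu"` and `"disk"`.

```python
from types import SimpleNamespace
from typing import Mapping, Union

Device = Union[int, str]


def count_accelerator_devices(device_map: Mapping[str, Device]) -> int:
    """Count distinct devices in the map, ignoring "cpu" and "disk" entries."""
    # Hypothetical helper: assumes offload targets are spelled "cpu" / "disk".
    return len({dev for dev in device_map.values() if dev not in ("cpu", "disk")})


def spans_multiple_gpus(model: object) -> bool:
    """Return True only if the model's device map uses more than one non-offload device."""
    device_map = getattr(model, "hf_device_map", None)
    return device_map is not None and count_accelerator_devices(device_map) > 1


# Stand-in objects instead of real models, purely for illustration.
one_gpu_plus_cpu = SimpleNamespace(hf_device_map={"embed": 0, "lm_head": "cpu"})
two_gpus = SimpleNamespace(hf_device_map={"embed": 0, "lm_head": 1})

print(spans_multiple_gpus(one_gpu_plus_cpu))
print(spans_multiple_gpus(two_gpus))
```

With a check like this, a model split across one GPU and the CPU would not be treated as model-parallel, which is the case raised in the first comment.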
||
| and not self.is_model_parallel | ||
| ): | ||
| self.is_model_parallel = True | ||
|
|
||
| # warn users | ||
| logger.warning( | ||
|
younesbelkada marked this conversation as resolved.
Outdated
|
||
| "You have loaded a model on multiple GPUs. `is_model_parallel` attribute will be force-set" | ||
| " to `True` to avoid any unexpected behavior such as device placement mismatching." | ||
| ) | ||
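A small sketch of how a message split over two source lines reaches the logger as a single string: adjacent string literals with no comma between them are concatenated by Python, whereas a comma between them would make the second literal a `%`-format argument of `logger.warning`. The logger name `is_model_parallel_demo` and the helper `build_multi_gpu_warning` are assumptions made for this example only.

```python
import io
import logging


def build_multi_gpu_warning() -> str:
    """Return the warning text as one string via implicit literal concatenation."""
    return (
        "You have loaded a model on multiple GPUs. `is_model_parallel` attribute will be force-set"
        " to `True` to avoid any unexpected behavior such as device placement mismatching."
    )


# Capture log output in memory so the result can be inspected.
stream = io.StringIO()
demo_logger = logging.getLogger("is_model_parallel_demo")  # hypothetical logger name
demo_logger.addHandler(logging.StreamHandler(stream))
demo_logger.setLevel(logging.WARNING)
demo_logger.propagate = False

demo_logger.warning(build_multi_gpu_warning())
logged = stream.getvalue()
print(logged, end="")
```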
|
|
||
| # At this stage the model is already loaded | ||
| if getattr(model, "is_loaded_in_8bit", False): | ||
| if getattr(model, "_is_int8_training_enabled", False): | ||
|
|
||