cp dataloader #3626
The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.

This issue has been automatically marked as stale because it has not had recent activity. If you think this still needs to be addressed please comment on this thread. Please note that issues that do not follow the contributing guidelines are likely to be ignored.
    process_index = process_index // submesh_tp_size
    num_processes = submesh_fsdp_size * submesh_dp_size

    if cp:
        process_index = 0
        num_processes = 1

Does using only one process break n-d parallelism? Maybe something like this?
    - process_index = process_index // submesh_tp_size
    - num_processes = submesh_fsdp_size * submesh_dp_size
    - if cp:
    -     process_index = 0
    -     num_processes = 1
    + process_index = process_index // (submesh_tp_size * submesh_cp_size)
    + num_processes = submesh_fsdp_size * submesh_dp_size // (submesh_tp_size * submesh_cp_size)
    + if cp:
    +     process_index = 0
    +     num_processes = 1

Indeed, we will have something like that. I just opened this PR so we don't forget about this, but we will upstream the changes to main in another PR once the n-d parallelism PR is finished.

This should cover this PR: #3682
What does this PR do?
Tries out context parallelism (CP) support for the dataloader. Make sure to set `dispatch_batches` to False and `split_batches` to False in the accelerate config.

cc @qgallouedec
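
For reviewers, here is a minimal standalone sketch of the index computation touched by this PR, mirroring the reviewed lines. The function name `dataloader_shard_info` and its signature are hypothetical and only for illustration; the actual code lives inline in the dataloader preparation logic. The comment on what `num_processes = 1` implies is an assumption based on the diff, not confirmed behavior.

```python
def dataloader_shard_info(
    process_index: int,
    submesh_tp_size: int,
    submesh_fsdp_size: int,
    submesh_dp_size: int,
    cp: bool = False,
) -> tuple[int, int]:
    """Return the (process_index, num_processes) pair used for dataloader sharding.

    Hypothetical helper: it only restates the lines under review.
    """
    # Ranks inside the same TP group map to the same dataloader shard index.
    process_index = process_index // submesh_tp_size
    num_processes = submesh_fsdp_size * submesh_dp_size

    if cp:
        # With CP enabled, this PR collapses everything to a single logical
        # process, so (presumably) the dataloader is not sharded across ranks.
        process_index = 0
        num_processes = 1

    return process_index, num_processes


if __name__ == "__main__":
    print(dataloader_shard_info(5, submesh_tp_size=2, submesh_fsdp_size=2, submesh_dp_size=2))
    print(dataloader_shard_info(5, submesh_tp_size=2, submesh_fsdp_size=2, submesh_dp_size=2, cp=True))
```

As raised in the review, hard-coding a single process whenever `cp` is set ignores the other mesh dimensions, which is why the follow-up work is expected to fold the CP size into the division instead.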