Summary:
Pull Request resolved: facebookresearch#955
We have users who can't train models with extremely large embeddings because we try to allocate space for them on the GPU.
With this diff, we add a training flag that users can set explicitly to keep the embedding layer on CPU even when the model is trained on GPUs. This is off by default because users need to be aware of the cost of moving tensors on and off the GPU.
Note that this only applies during training.
Also note that this does not work in a multi-GPU environment because of the way the weights are synced via NCCL.
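As a rough illustration of the idea, the sketch below keeps an embedding table on CPU and moves only the looked-up activations to the training device. The class and parameter names here are hypothetical, not the actual PyText implementation or flag name; it is a minimal sketch assuming a standard PyTorch setup:

```python
import torch
import torch.nn as nn


class CPUEmbedding(nn.Module):
    """Hypothetical sketch: the embedding table stays on CPU while the
    rest of the model can live on GPU. Each forward pass pays the cost
    of transferring the looked-up vectors to the output device."""

    def __init__(self, num_embeddings, embedding_dim, output_device="cuda"):
        super().__init__()
        # Parameters allocated on CPU and intentionally never moved.
        self.embedding = nn.Embedding(num_embeddings, embedding_dim)
        self.output_device = output_device

    def forward(self, token_ids):
        # Lookup happens on CPU; only the (much smaller) activation
        # tensor is copied to the GPU, not the full embedding table.
        embedded = self.embedding(token_ids.to("cpu"))
        return embedded.to(self.output_device)


# Usage: a 1M-row table stays in host RAM; activations go to the device.
layer = CPUEmbedding(1_000_000, 64, output_device="cpu")  # "cuda" on GPU hosts
out = layer(torch.tensor([[1, 2, 3], [4, 5, 6]]))
```

Note that, as described above, this pattern would conflict with multi-GPU training, since NCCL-based weight syncing expects all synced parameters to be on GPU.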
Differential Revision: D17114398
fbshipit-source-id: 840f37f77c70089137f2cf23a262dc503e5e2080