Improve moving data / model to GPU using torchtext #1245
Comments
@mateuszpieniak which version of lightning are you actually using?
@Borda Version == 0.6.0, but I see that there is some new release ^^ Let me check it for the new version.
moving the model to GPU and configures AMP straightaway.
@Borda Well, for version == 0.7.1 I have a slightly different issue. As I mentioned before, I build a vocabulary using the training dataset in
It creates a
because of running the sanity check for the validation set before training happens (or to be precise before I thought that I could disable this using
@Borda I am still not able to get torchtext iterators to work with pytorch lightning even on a single GPU, as @mateuszpieniak mentioned in his first Case above and in #226. The (very bad) workaround I have been using so far is to manually move the data to a GPU while creating the iterator (by passing the device to the torchtext iterator). This prevents me from utilizing multiple GPUs and also unnecessarily loads all the data onto the GPU at the beginning. Has there been any progress on this issue? Thanks
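For reference, a minimal sketch of that workaround with the legacy torchtext API; the file name, field names, and batch size here are illustrative and not from the original comment:

```python
import torch
from torchtext.data import BucketIterator, Field, TabularDataset

# build a small dataset and vocabulary; the file and field names are made up for illustration
text_field = Field(lower=True)
dataset = TabularDataset(path="train.csv", format="csv", fields=[("text", text_field)])
text_field.build_vocab(dataset)

# passing `device` makes torchtext place every batch tensor on that device up front,
# which is exactly the workaround (and its downside) described above
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
train_iter = BucketIterator(dataset, batch_size=32, device=device)
```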
Sorry, don't know torchtext well, but if I understand correctly, the problem is that the internal method that moves the data to GPU checks for some known types (tuple, list, dict), but when passing a custom batch object, it does not know how to do that. My pitch:

```python
def __transfer_data_to_device(...):
    batch = model.transfer_batch_to_device(batch, device)  # call the hook
    if batch is not None:  # for example, it returns None if the user did not override it
        return batch
    # otherwise continue with the built-in detection of tuples etc.
```

This allows anyone to implement their own transfers for custom objects.
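For illustration, a hedged sketch of how a user might override such a hook for a torchtext `Batch`, following the proposed semantics above; the class name is made up, and the hook that was eventually added may differ in name or signature:

```python
import torch
from pytorch_lightning import LightningModule


class TextClassifier(LightningModule):
    def transfer_batch_to_device(self, batch, device):
        # torchtext's Batch keeps every dataset field as an attribute and lists the
        # field names in `batch.fields`, so the tensor fields can be moved by hand
        if hasattr(batch, "fields"):
            for name in batch.fields:
                value = getattr(batch, name, None)
                if torch.is_tensor(value):
                    setattr(batch, name, value.to(device))
            return batch
        # returning None lets the Trainer fall back to its built-in handling of
        # tuples/lists/dicts/tensors, as described in the pitch above
        return None
```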
@awaelchli You understood case 1 correctly. Well, it should work 😃 I wasn't aware of the hook concept in PyTorchLightning. I'll let you know whether it works. Thanks!
Case 1) @awaelchli I ended up creating my custom Trainer and overriding a private method:
Case 2) I just build the dictionary in the
If we don't plan to support torchtext, then I think we can consider this issue closed.
Case 1: that certainly works. If we introduced a hook for that, then the user of PL would not need to modify the source code of Trainer. It is not meant to be modified. Case 2: If you want a tensor in your model to be moved with the model (when .to(..) is called on the model), then register it as a buffer:
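For example, a minimal sketch of the buffer approach; the class and attribute names are illustrative:

```python
import torch
import torch.nn as nn


class EmbeddingModel(nn.Module):
    def __init__(self, pretrained_vectors: torch.Tensor):
        super().__init__()
        # a registered buffer is part of the module's state and is moved together
        # with the model on .to(device) / .cuda(), unlike a plain tensor attribute
        self.register_buffer("vectors", pretrained_vectors)

    def forward(self, token_ids: torch.Tensor) -> torch.Tensor:
        return self.vectors[token_ids]
```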
I would not close it yet. @PyTorchLightning/core-contributors should torchtext be supported directly in PL, or do we provide a hook as I suggested, or both?
good ideas everyone!
@awaelchli i would prefer a hook since that will be generally useful and will likely reduce the long-term maintenance burden
@mateuszpieniak the model hook I added should make it easier to work with the Batch object in torchtext, but I have not actually tested it yet in a full example.
@awaelchli Super simple classification example with a bag of embeddings. It should work both on GPU & CPU. https://gist.github.com/mateuszpieniak/f290b3a727db7e94b9da0bd3bd2e33c1
great! i can start with that!
nice, @mateuszpieniak, or if you want to take it, @awaelchli may assist you :]
@Borda Sure, I will do this over the weekend unless @awaelchli already started to work on this :]
Not yet started, and I don't mind you taking it if you are interested.
Great discussion.
Of course it's possible... this is at the core of lightning... If you specifically mean for torchtext, then it sounds like there's some edge case it introduces that we are working on.
@mateuszpieniak we're preparing 0.8.2 now. Want to submit this PR?
Sorry, indeed I meant multi-GPU training with torchtext.
@williamFalcon @ceceu Sorry for the delay. I've just landed a PR, so @awaelchli please take a look. @ceceu I don't think it works for multi-GPU due to a different issue, #2350.
Thanks @mateuszpieniak, but it's sad to hear that.
we might be able to override / extend the scatter and gather in DataParallel to make use of the transfer data hook, but not sure if that's safe... I haven't had the time to look into that. I agree the hook right now is not very useful :(
@mateuszpieniak maybe it was a bit premature to add an extra implementation to move torchtext.Batch data to the device. It seems they are dropping the Batch class from torchtext; it is outdated.
@awaelchli Perhaps it was. The actual need for |
🚀 Feature
Improve moving data and model to GPU if torchtext is used.
Motivation
Case 1:
The `Batch` object generated by `torchtext.data.Iterator` doesn't follow the rules described here: https://github.com/PyTorchLightning/pytorch-lightning/blob/45d671a4a81788b9d97fd6b47763816926e58e95/pytorch_lightning/trainer/distrib_parts.py#L420
As a result, the data is not moved to the GPU. The `torchtext.data.Iterator` is returned by the `train_dataloader` method. Keep in mind that `torchtext.data.Iterator` has a `device` argument that is not properly utilized by pytorch-lightning. Partially reported here: #226
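To make the failure mode concrete, here is a simplified, hedged sketch of the kind of type-based transfer logic the linked code applies; it is not the actual Lightning source. A custom object like torchtext's `Batch` falls through every branch and stays on the CPU:

```python
import torch


def transfer_batch_to_gpu(batch, device):
    # tensors (and anything exposing .to) are moved directly
    if callable(getattr(batch, "to", None)):
        return batch.to(device)
    # sequences and mappings are handled recursively
    if isinstance(batch, (list, tuple)):
        return type(batch)(transfer_batch_to_gpu(item, device) for item in batch)
    if isinstance(batch, dict):
        return {key: transfer_batch_to_gpu(value, device) for key, value in batch.items()}
    # unknown custom objects (e.g. torchtext's Batch) are returned unchanged
    return batch
```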
Case 2:
Using torchtext you can read pre-trained embeddings and create an `nn.Embedding` object as follows (a sketch appears after this paragraph). The `nn.Embedding` is clearly dependent on `self.text_field.vocab`, and this in turn depends on the `dataset` that is used by `train_dataloader`. Currently, any part of the model that is not created fully in `__init__` of the `ptl.LightningModule` is not moved to the GPU. It still requires a global variable that determines a device, i.e. `DEVICE`. It makes `Trainer(n_gpus=...)` useless.
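A hedged sketch of what that setup looks like; the original snippet was not preserved, so the file name and vector choice here are illustrative:

```python
import torch.nn as nn
from torchtext.data import Field, TabularDataset

text_field = Field(lower=True)
train_dataset = TabularDataset(
    path="train.csv", format="csv", fields=[("text", text_field)]
)

# build_vocab attaches pre-trained vectors (here GloVe) to text_field.vocab.vectors
text_field.build_vocab(train_dataset, vectors="glove.6B.100d")

# the embedding layer is created from the vocabulary, so it depends on the dataset
embedding = nn.Embedding.from_pretrained(text_field.vocab.vectors)
```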
Pitch
I would like to not have to worry about moving data to the GPU when using torchtext combined with pytorch-lightning.