
Potential Memory Leak #278

Closed
Phoveran opened this issue Nov 5, 2021 · 4 comments

Comments


Phoveran commented Nov 5, 2021

I'm running examples/vision/anil_fc100.py unmodified, with:

  1. learn2learn 0.1.6 (installed via pip install learn2learn)
  2. PyTorch 1.10
  3. CUDA 11.4
  4. a single RTX 2080 Ti

and it crashes after 3 iterations with a CUDA out-of-memory error:

Iteration 0
Meta Train Error 1.496383372694254
Meta Train Accuracy 0.3362499892245978
Meta Valid Error 1.6015896834433079
Meta Valid Accuracy 0.2937499925028533
Meta Test Error 1.5358335226774216
Meta Test Accuracy 0.3562499899417162


Iteration 1
Meta Train Error 1.4316311068832874
Meta Train Accuracy 0.39999998873099685
Meta Valid Error 1.588501501828432
Meta Valid Accuracy 0.28749999054707587
Meta Test Error 1.45738809928298
Meta Test Accuracy 0.3774999915622175


Iteration 2
Meta Train Error 1.3444917295128107
Meta Train Accuracy 0.47624998819082975
Meta Valid Error 1.5741207413375378
Meta Valid Accuracy 0.28874999145045877
Meta Test Error 1.4722651988267899
Meta Test Accuracy 0.3599999900907278
Traceback (most recent call last):
  File "/home/stan/work/icml2022/test.py", line 207, in <module>
    main()
  File "/home/stan/work/icml2022/test.py", line 179, in main
    evaluation_error, evaluation_accuracy = fast_adapt(batch,
  File "/home/stan/work/icml2022/test.py", line 32, in fast_adapt
    data = features(data)
  File "/home/stan/tool/miniconda3/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1102, in _call_impl
    return forward_call(*input, **kwargs)
  File "/home/stan/tool/miniconda3/lib/python3.9/site-packages/learn2learn/vision/models/cnn4.py", line 247, in forward
    x = super(CNN4Backbone, self).forward(x)
  File "/home/stan/tool/miniconda3/lib/python3.9/site-packages/torch/nn/modules/container.py", line 141, in forward
    input = module(input)
  File "/home/stan/tool/miniconda3/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1102, in _call_impl
    return forward_call(*input, **kwargs)
  File "/home/stan/tool/miniconda3/lib/python3.9/site-packages/learn2learn/vision/models/cnn4.py", line 96, in forward
    x = self.relu(x)
  File "/home/stan/tool/miniconda3/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1102, in _call_impl
    return forward_call(*input, **kwargs)
  File "/home/stan/tool/miniconda3/lib/python3.9/site-packages/torch/nn/modules/activation.py", line 98, in forward
    return F.relu(input, inplace=self.inplace)
  File "/home/stan/tool/miniconda3/lib/python3.9/site-packages/torch/nn/functional.py", line 1299, in relu
    result = torch.relu(input)
RuntimeError: CUDA out of memory. Tried to allocate 14.00 MiB (GPU 0; 10.76 GiB total capacity; 9.25 GiB already allocated; 2.31 MiB free; 9.45 GiB reserved in total by PyTorch) If reserved memory is >> allocated memory try setting max_split_size_mb to avoid fragmentation.  See documentation for Memory Management and PYTORCH_CUDA_ALLOC_CON
Phoveran commented Nov 5, 2021

Also, I've checked: the memory usage increases linearly with the number of iterations.
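A common cause of memory that grows linearly with iterations (not confirmed to be the bug here) is holding on to loss tensors across iterations without detaching them, which keeps every iteration's computation graph and activations alive. A minimal CPU-only sketch with hypothetical names:

```python
import torch

def eval_step(model, x, y, detach):
    """One hypothetical evaluation step; the forward pass builds a graph."""
    loss = torch.nn.functional.mse_loss(model(x), y)
    # Returning the raw tensor keeps the whole graph (and its activations)
    # alive for as long as the caller holds the reference; detaching frees it.
    return loss.detach() if detach else loss

model = torch.nn.Linear(4, 1)
x, y = torch.randn(8, 4), torch.randn(8, 1)

leaky = eval_step(model, x, y, detach=False)  # graph retained
safe = eval_step(model, x, y, detach=True)    # graph freed

print(leaky.requires_grad, safe.requires_grad)  # True False
```

If such a reference is accumulated in a list or running sum every iteration, GPU memory grows until the allocator fails, which matches the linear growth described above.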

nightlessbaron (Contributor) commented

I have run the same code on [Colab Notebook], and it works fine for me.
Can you try decreasing the meta batch size to 16 and see if that works?

Phoveran commented Nov 5, 2021

> I have run the same code on [Colab Notebook], and it works fine for me. Can you try decreasing the meta batch size to 16 and see if that works?

Thanks for your timely reply!
I've decreased it all the way to 1, but it still doesn't work.
Maybe there's something wrong with my GPU; I'll test this on another machine and see if it works.

@Phoveran Phoveran closed this as completed Nov 7, 2021
pandeydeep9 commented

I am also getting a similar error. I installed learn2learn using "pip install learn2learn". When I run maml_miniimagenet.py (from learn2learn/examples/vision/maml_miniimagenet.py) with a batch size of 2 and shot = 1, I get the same error after 63 iterations. When I change to shot = 5, I get the error after 3 iterations.

Iteration 63
Meta Train Error 2.0417345762252808
Meta Train Accuracy 0.20000000298023224
Meta Valid Error 1.8002310991287231
Meta Valid Accuracy 0.20000000298023224
Traceback (most recent call last):
  File "/home/deep/Desktop/IMPLEMENTATION/MyTry/MetaSGD/mini_Temp_Test.py", line 156, in <module>
    main()
  File "/home/deep/Desktop/IMPLEMENTATION/MyTry/MetaSGD/mini_Temp_Test.py", line 106, in main
    evaluation_error.backward()
  File "/home/deep/.local/lib/python3.8/site-packages/torch/_tensor.py", line 307, in backward
    torch.autograd.backward(self, gradient, retain_graph, create_graph, inputs=inputs)
  File "/home/deep/.local/lib/python3.8/site-packages/torch/autograd/__init__.py", line 154, in backward
    Variable._execution_engine.run_backward(
RuntimeError: CUDA out of memory. Tried to allocate 20.00 MiB (GPU 0; 5.79 GiB total capacity; 3.60 GiB already allocated; 77.56 MiB free; 3.62 GiB reserved in total by PyTorch) If reserved memory is >> allocated memory try setting max_split_size_mb to avoid fragmentation. See documentation for Memory Management and PYTORCH_CUDA_ALLOC_CONF

However, if I comment out the meta-validation loss part (line 114-112 in this script), I don't get the memory leak problem. I wonder how the issue can be solved?
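One general PyTorch pattern that avoids retaining the evaluation graph (a sketch, not necessarily the fix for the learn2learn example script) is to run the evaluation forward pass under torch.no_grad() and accumulate the loss as a plain float via .item():

```python
import torch

model = torch.nn.Linear(4, 2)
data = torch.randn(16, 4)
targets = torch.randint(0, 2, (16,))

total_error = 0.0
# no_grad() stops autograd from recording the forward pass, so activations
# are freed immediately instead of accumulating across iterations
with torch.no_grad():
    for _ in range(3):
        logits = model(data)
        loss = torch.nn.functional.cross_entropy(logits, targets)
        total_error += loss.item()  # .item() yields a float, no graph attached

print(total_error)
```

Note that MAML-style meta-validation still needs gradients for the inner-loop adapt step, so there no_grad() can only wrap the parts that do not adapt; accumulating the final evaluation loss with .item() rather than keeping the tensor is the generally safe part of this pattern.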
