
Memfix #106

Merged 8 commits into main on Feb 3, 2024
Conversation

surajpaib (Collaborator)
Addresses #82

Lightning-AI/pytorch-lightning#15656 does not seem to fix the memory leak for us: memory usage keeps increasing during prediction.

Adding an explicit garbage-collection pass keeps memory usage constant throughout the prediction loop.

For good measure, I have also added code to clear `_predictions` in the trainer's predict loop, which accumulates outputs across batches. I figure the gc already clears this, but added it nonetheless.

dosubot added the size:XS label (This PR changes 0-9 lines, ignoring generated files) on Feb 2, 2024
surajpaib requested a review from ibro45 on February 2, 2024 at 19:37
ibro45 (Collaborator) commented Feb 2, 2024

Should we place it in `system.on_predict_batch_start()`? https://lightning.ai/docs/pytorch/stable/common/lightning_module.html#on-predict-batch-start

Two benefits:

  • If any callback tries to access `trainer.predict_loop._predictions` at batch end, it can still do so, because the data stays available until the start of the next batch. I'm not 100% sure about this one; if it doesn't hold, we can do it in `on_predict_batch_end()` instead.
  • It's nice to remove it from the `_base_step()` logic, where it's quite abrupt.

surajpaib (Collaborator, Author)

@ibro45 We can call it either on batch start or batch end; it makes no difference. I am quite certain nothing accesses `_predictions` except at the epoch level.

surajpaib (Collaborator, Author)

Do we want this by default in the LightningModule, as it is now, or do we want to move it to the writer?

I am leaning toward moving it to the writer, because the current behaviour would be surprising to anyone who plugs in their own writer and tries to save predictions at the end of the epoch.

surajpaib (Collaborator, Author)

It would be placed here instead.

ibro45 (Collaborator) commented Feb 2, 2024 via email

ibro45 (Collaborator) commented Feb 3, 2024

LGTM

surajpaib merged commit 2a3a451 into main on Feb 3, 2024 (4 of 7 checks passed) and deleted the memfix branch on February 3, 2024 at 03:22