Trainer.predict on really large dataset cause CPU out-of-memory #15656

Closed
junwang-wish opened this issue Nov 11, 2022 · 1 comment
Labels
needs triage Waiting to be triaged by maintainers

Comments

@junwang-wish

Bug description

Running Trainer.predict(model, datamodule) on a sufficiently large dataset causes a CPU out-of-memory error, because the result of every batch is appended to a list during prediction (this appears to happen even with return_predictions=False): https://github.com/Lightning-AI/lightning/blob/4e8cf85b0cd5128adcec3f3ad0f2254f417ae1ee/src/pytorch_lightning/loops/dataloader/prediction_loop.py#L103

What is the correct way of running prediction on a dataset that is orders of magnitude larger than CPU memory?

How to reproduce the bug

# Always return `None` from `predict_step` and track object growth with objgraph:
def predict_step(self, batch, batch_idx):
    import objgraph
    objgraph.show_growth(limit=3)
    return None

Error messages and logs


# The object count for type list increases at every prediction step, as shown below
list    11320        +1
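
A note on reading the output above: objgraph.show_growth reports changes in the number of objects per type, not bytes, so a `+1` for `list` means one more list object exists, whatever its size. The sketch below reproduces that kind of count with only the standard library. The `Prediction` class and `count_by_type` helper are hypothetical names used for illustration, not part of objgraph or Lightning.

```python
import gc
from collections import Counter


def count_by_type():
    """Count gc-tracked objects by type name (roughly what a growth report compares)."""
    return Counter(type(obj).__name__ for obj in gc.get_objects())


class Prediction:
    """Hypothetical stand-in for one batch of prediction results."""

    def __init__(self, values):
        self.values = values


kept = []  # anything still referenced here stays alive between steps
before = count_by_type()
for step in range(10):
    kept.append(Prediction([step] * 1000))
after = count_by_type()

# One retained object per step shows up as a growth of 10, regardless of
# how many bytes each object holds.
growth = after["Prediction"] - before["Prediction"]
print("Prediction growth:", growth)
```

A count that grows by one per step is a hint that something is being retained, but it does not by itself show that memory use is unbounded; a byte-level tool such as tracemalloc is needed for that.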

Environment


#- Lightning Component (e.g. Trainer, LightningModule, LightningApp, LightningWork, LightningFlow):
#- PyTorch Lightning Version (e.g., 1.5.0):
#- Lightning App Version (e.g., 0.5.2):
#- PyTorch Version (e.g., 1.10):
#- Python version (e.g., 3.9):
#- OS (e.g., Linux):
#- CUDA/cuDNN version:
#- GPU models and configuration:
#- How you installed Lightning(`conda`, `pip`, source):
#- Running environment of LightningApp (e.g. local, cloud):

More info

No response

@junwang-wish junwang-wish added the needs triage Waiting to be triaged by maintainers label Nov 11, 2022
@junwang-wish
Author

Sorry, setting return_predictions=False does actually fix the problem. I read too much into the objgraph.show_growth(limit=3) results.
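
For anyone landing here with the same question, the pattern behind this fix can be sketched in plain Python: skip accumulating outputs in memory and write each batch out as soon as it is produced. This is a toy loop, not Lightning's actual implementation; `run_predict`, `on_batch_end`, `batch_stream`, and the output file name are all hypothetical.

```python
import json
import os
import tempfile


def run_predict(batches, predict_step, return_predictions=True, on_batch_end=None):
    """Toy prediction loop (hypothetical, not Lightning's code)."""
    collected = [] if return_predictions else None
    for batch_idx, batch in enumerate(batches):
        output = predict_step(batch, batch_idx)
        if on_batch_end is not None:
            on_batch_end(output, batch_idx)  # e.g. stream the result to disk
        if collected is not None:
            collected.append(output)  # memory grows with the dataset size
    return collected


def batch_stream(num_batches, batch_size):
    """Lazily yield batches so the whole dataset never sits in memory."""
    for i in range(num_batches):
        yield list(range(i * batch_size, (i + 1) * batch_size))


def double(batch, batch_idx):
    """Stand-in for a model's predict_step."""
    return [2 * x for x in batch]


out_path = os.path.join(tempfile.mkdtemp(), "predictions.jsonl")
with open(out_path, "w") as sink:

    def write_line(output, batch_idx):
        sink.write(json.dumps({"batch_idx": batch_idx, "output": output}) + "\n")

    # Nothing is accumulated; each batch is written and then released.
    result = run_predict(
        batch_stream(5, 4), double, return_predictions=False, on_batch_end=write_line
    )

with open(out_path) as f:
    num_lines = sum(1 for _ in f)
print(result, num_lines)
```

In Lightning itself, the per-batch writing role is usually played by a callback; the library ships a BasePredictionWriter callback for this purpose, though its exact hook signature should be checked against the documentation for the installed version.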
