Trainer.predict on really large dataset cause CPU out-of-memory #15656

junwang-wish · 2022-11-11T20:18:22Z

Bug description

Trainer.predict(model, datamodule) runs out of CPU memory on sufficiently large datasets because the result of every step is appended to a list during prediction (this happens even when return_predictions=False is set): https://github.com/Lightning-AI/lightning/blob/4e8cf85b0cd5128adcec3f3ad0f2254f417ae1ee/src/pytorch_lightning/loops/dataloader/prediction_loop.py#L103

What is the correct way of running prediction on a dataset that is orders of magnitude larger than CPU memory?

How to reproduce the bug

# Always return `None` in `predict_step` and track your memory usage:
def predict_step(self, batch, batch_idx):
    import objgraph
    objgraph.show_growth(limit=3)
    return None

Error messages and logs


# The object count for type `list` increments at every prediction step, as shown below:
list    11320        +1
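
objgraph.show_growth reports how many objects of each type exist, not how many bytes they hold, so a count that rises by one per step does not by itself show a leak. As a cross-check, here is a minimal sketch using the standard library's tracemalloc to measure net bytes retained across steps. The names measure_growth, leaky_step, and clean_step are hypothetical and only illustrate the idea; tracemalloc only sees allocations made through Python's allocator, so memory that native libraries allocate for tensor storage may not appear in these numbers.

```python
import tracemalloc


def measure_growth(step, n_steps):
    """Return the net number of traced bytes retained after n_steps calls of step(i)."""
    tracemalloc.start()
    before, _peak = tracemalloc.get_traced_memory()
    for i in range(n_steps):
        step(i)
    after, _peak = tracemalloc.get_traced_memory()
    tracemalloc.stop()
    return after - before


kept = []


def leaky_step(i):
    # Hypothetical step that retains its output, like a loop collecting predictions.
    kept.append(bytearray(100_000))


def clean_step(i):
    # Hypothetical step whose output is dropped immediately after it is produced.
    bytearray(100_000)


leaky_bytes = measure_growth(leaky_step, 50)
clean_bytes = measure_growth(clean_step, 50)
print(f"retained with accumulation:    {leaky_bytes} bytes")
print(f"retained without accumulation: {clean_bytes} bytes")
```

If the retained byte count stays roughly flat as the number of steps grows, the growing object count is probably harmless bookkeeping rather than accumulated predictions.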

Environment


#- Lightning Component (e.g. Trainer, LightningModule, LightningApp, LightningWork, LightningFlow):
#- PyTorch Lightning Version (e.g., 1.5.0):
#- Lightning App Version (e.g., 0.5.2):
#- PyTorch Version (e.g., 1.10):
#- Python version (e.g., 3.9):
#- OS (e.g., Linux):
#- CUDA/cuDNN version:
#- GPU models and configuration:
#- How you installed Lightning (`conda`, `pip`, source):
#- Running environment of LightningApp (e.g. local, cloud):

More info

No response

junwang-wish · 2022-11-11T20:40:22Z

Sorry, setting return_predictions=False does actually fix the problem. I read too much into the objgraph.show_growth(limit=3) results.
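
To illustrate why return_predictions=False keeps memory bounded, and how predictions can still be saved when the dataset is larger than RAM, here is a framework-free toy loop. It is a sketch under stated assumptions, not Lightning's implementation: run_predict, make_writer, and batch_stream are hypothetical names, and JSON files stand in for whatever storage format suits the real outputs. In an actual Lightning run, the BasePredictionWriter callback plays the role of the per-batch writer shown here.

```python
import json
import tempfile
from pathlib import Path


def run_predict(batches, predict_step, return_predictions=True, on_batch_end=None):
    """Toy prediction loop (hypothetical; not Lightning's code).

    With return_predictions=True every output is kept in a list, so memory
    grows with the dataset. With False, each output is handed to the
    callback and then dropped, so only one batch is alive at a time.
    """
    collected = [] if return_predictions else None
    for batch_idx, batch in enumerate(batches):
        prediction = predict_step(batch, batch_idx)
        if on_batch_end is not None:
            on_batch_end(prediction, batch_idx)
        if collected is not None:
            collected.append(prediction)
    return collected


def make_writer(output_dir):
    """Return a callback that writes each batch's predictions to its own file."""
    output_dir = Path(output_dir)
    output_dir.mkdir(parents=True, exist_ok=True)

    def write(prediction, batch_idx):
        with open(output_dir / f"batch_{batch_idx:06d}.json", "w") as f:
            json.dump(prediction, f)

    return write


def batch_stream(n_batches, batch_size):
    """Lazily yield batches so the input is never fully in memory either."""
    for b in range(n_batches):
        yield [b * batch_size + i for i in range(batch_size)]


output_dir = Path(tempfile.mkdtemp())
result = run_predict(
    batch_stream(n_batches=5, batch_size=4),
    predict_step=lambda batch, batch_idx: [x * 2 for x in batch],
    return_predictions=False,
    on_batch_end=make_writer(output_dir),
)
written = sorted(p.name for p in output_dir.glob("batch_*.json"))
print(result, written)
```

Writing one file per batch keeps the sketch simple; for very many batches, appending to a single file or a sharded format would likely be more practical.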

junwang-wish added the "needs triage" (Waiting to be triaged by maintainers) label Nov 11, 2022

junwang-wish closed this as completed Nov 11, 2022

This was referenced Feb 2, 2024

Memfix project-lighter/lighter#106

Merged

CPU-Memory keeps accumulating during trainer.predict #19398

Open
