
Default LightningDataModule.__init__ isn't called #11225

Closed
rohitgr7 opened this issue Dec 22, 2021 · 4 comments
Labels
bug Something isn't working lightningdatamodule pl.LightningDataModule

Comments

@rohitgr7 (Contributor) commented Dec 22, 2021

🐛 Bug

To Reproduce

The default LightningDataModule.__init__ call was added in #9039 but removed in #10350. I'm not sure whether that was a mistake or intentional, but the following code fails on master while working fine on the latest patch release. If it was intentional, then I'd suggest we somehow enforce that users always call super().__init__() in their LightningDataModule.__init__ methods, because the resulting error isn't user-friendly.

Code:

import torch
from torch.utils.data import DataLoader, Dataset

from pytorch_lightning import LightningDataModule, LightningModule, Trainer


class RandomDataset(Dataset):
    def __init__(self, size, length):
        self.len = length
        self.data = torch.randn(length, size)

    def __getitem__(self, index):
        return self.data[index]

    def __len__(self):
        return self.len

class BoringData(LightningDataModule):
    def __init__(self):
        # super().__init__()  # intentionally omitted to reproduce the bug
        print('DataModule init')

    def test_dataloader(self):
        return DataLoader(RandomDataset(32, 100), batch_size=16)
    

class BoringModel(LightningModule):
    def __init__(self):
        super().__init__()
        self.layer = torch.nn.Linear(32, 2)

    def test_step(self, batch, batch_idx):
        logits = self.layer(batch)


model = BoringModel()
dm = BoringData()

trainer = Trainer(fast_dev_run=True)
trainer.test(model, datamodule=dm)

Error:

AttributeError: 'BoringData' object has no attribute 'prepare_data_per_node'
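The mechanism behind the error: LightningDataModule.__init__ is where prepare_data_per_node (among other attributes) gets set, so skipping it leaves the instance without them. A framework-free sketch of the same failure mode, with a hypothetical `Base` standing in for LightningDataModule:

```python
class Base:
    def __init__(self):
        # the base class is where the framework attribute gets set
        self.prepare_data_per_node = True


class Child(Base):
    def __init__(self):
        pass  # super().__init__() omitted, so the attribute is never set


child = Child()
print(hasattr(child, "prepare_data_per_node"))  # False
```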

Expected behavior

Environment

  • PyTorch Lightning Version (e.g., 1.5.0): master
  • PyTorch Version (e.g., 1.10): 1.10
  • Python version (e.g., 3.9): 3.9
  • OS (e.g., Linux): macos
  • CUDA/cuDNN version:
  • GPU models and configuration:
  • How you installed PyTorch (conda, pip, source):
  • If compiling from source, the output of torch.__config__.show():
  • Any other relevant information:

Additional context

cc @carmocca @awaelchli @Borda @ananthsub @ninginthecloud @jjenniferdai

@rohitgr7 rohitgr7 added bug Something isn't working lightningdatamodule pl.LightningDataModule labels Dec 22, 2021
@carmocca (Contributor) commented Dec 22, 2021

Forgetting to call super().__init__() is a classic bug in every Python codebase in the world. I don't think it is our job to improve the UX of the language itself. If Python devs chose not to raise a warning or anything like that, it's probably for a reason.

Also, we don't do the same for the LightningModule

@rohitgr7 (Contributor, Author) commented Dec 22, 2021

But for the LightningModule, nn.Module takes care of it by throwing the following error, which actually makes sense:

AttributeError: cannot assign module before Module.__init__() call
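nn.Module raises that error from its __setattr__ override: assigning a submodule before __init__ has populated the internal registries fails immediately with a clear message. A simplified pure-Python sketch of that pattern (not torch's actual implementation):

```python
class Module:
    def __init__(self):
        # registry that __setattr__ relies on; bypass the override to set it
        object.__setattr__(self, "_modules", {})

    def __setattr__(self, name, value):
        if isinstance(value, Module):
            modules = self.__dict__.get("_modules")
            if modules is None:
                # __init__ never ran, so fail with a pointed message
                raise AttributeError(
                    "cannot assign module before Module.__init__() call")
            modules[name] = value
        object.__setattr__(self, name, value)


class Broken(Module):
    def __init__(self):
        # no super().__init__() -> the assignment below raises
        self.layer = Module()
```

Constructing `Broken()` raises the AttributeError above, while a subclass that calls super().__init__() first works normally.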

This might also be a problem when using a dataclass, because the __init__ that dataclasses generate does not call super().__init__().
We have a failing example in our own repo due to this: https://github.com/PyTorchLightning/pytorch-lightning/blob/master/pl_examples/loop_examples/kfold.py

@awaelchli (Contributor) commented Dec 22, 2021

I agree with @carmocca

@rohitgr7 When converting the module to a dataclass, one has to add:

    def __post_init__(self):
        super().__init__() 

The Python docs for this: https://docs.python.org/3/library/dataclasses.html#post-init-processing
A dataclass does not call super().__init__(); that is left to the user.
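Put together, a dataclass-based datamodule needs the base-class call in __post_init__. A framework-free sketch, where the hypothetical `DataModuleBase` stands in for LightningDataModule:

```python
from dataclasses import dataclass


class DataModuleBase:
    def __init__(self):
        # stands in for the attributes LightningDataModule.__init__ sets
        self.prepare_data_per_node = True


@dataclass
class KFoldDataModule(DataModuleBase):
    num_folds: int = 5

    def __post_init__(self):
        # the generated __init__ does not call the base class; do it here
        super().__init__()


dm = KFoldDataModule()
print(dm.prepare_data_per_node, dm.num_folds)  # True 5
```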

@rohitgr7 (Contributor, Author) commented Dec 22, 2021
Yes, I understand that it's the responsibility of the user. But my only request was to provide a meaningful error in such a case, which I guess is not possible.
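For what it's worth, one pattern that can surface a friendlier message from the base class itself is a __getattr__ fallback, which Python only invokes when normal attribute lookup fails. A sketch of the idea (not something Lightning does; `DataModuleBase` is hypothetical):

```python
class DataModuleBase:
    def __init__(self):
        self.prepare_data_per_node = True

    def __getattr__(self, name):
        # reached only if normal lookup failed; hint at the likely cause
        raise AttributeError(
            f"{type(self).__name__!r} has no attribute {name!r}. "
            f"Did you forget to call super().__init__() in "
            f"{type(self).__name__}.__init__?"
        )


class Forgetful(DataModuleBase):
    def __init__(self):
        pass  # missing super().__init__()
```

Accessing `Forgetful().prepare_data_per_node` then raises an AttributeError that names the probable fix, while instances that did call super().__init__() are unaffected.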
