
LightningDeprecationWarning: DataModule.setup has already been called #9943

Closed
Renthal opened this issue Oct 15, 2021 · 22 comments
Labels: bug (Something isn't working), deprecation (Includes a deprecation)

Comments

@Renthal commented Oct 15, 2021

As of v1.4, properties such as has_setup_fit have been deprecated in DataModule and are set to be removed in v1.6.
However, the docs on latest still say "If you need information from the dataset to build your model, then run prepare_data() and setup() manually"
and show the code:

dm = MNISTDataModule()
dm.prepare_data()
dm.setup(stage="fit")

model = Model(num_classes=dm.num_classes, width=dm.width, vocab=dm.vocab)
trainer.fit(model, dm)

dm.setup(stage="test")
trainer.test(datamodule=dm)

This leads to the following warning being raised:

LightningDeprecationWarning: DataModule.setup has already been called, so it will not be called again. In v1.6 this behavior will change to always call DataModule.setup.

What is the recommended way to do this without ending up calling setup() twice in v1.6?

@Programmer-RD-AI (Contributor)

Hi,
Can you send the code that you are using? I will check the warning.

With best regards,
Ranuga

@Programmer-RD-AI (Contributor)

Hi, can you send MNISTDataModule()?

With best regards,
Ranuga

@Renthal (Author) commented Oct 15, 2021

This is literally the code shown in the docs (see link above), and the module is described there.
Nevertheless, here is an MWE to reproduce (but then again, it is straight out of the docs):

import os
from typing import Optional

import torch
from pytorch_lightning import LightningModule, Trainer, LightningDataModule
from torch.utils.data import DataLoader, Dataset


class RandomDataset(Dataset):

    def __init__(self, size, length):
        self.len = length
        self.data = torch.randn(length, size)

    def __getitem__(self, index):
        return self.data[index]

    def __len__(self):
        return self.len


class RandomDataModule(LightningDataModule):

    def setup(self, stage: Optional[str] = None, **kwargs):
        if stage == 'fit' or stage is None:
            self.dataset_size = 32

        if stage == 'test' or stage is None:
            pass

    def train_dataloader(self):
        return DataLoader(RandomDataset(self.dataset_size, 64), batch_size=2)

    def test_dataloader(self):
        return DataLoader(RandomDataset(self.dataset_size, 64), batch_size=2)


class BoringModel(LightningModule):

    def __init__(self, hidden_neurons: int):
        super().__init__()
        self.layer = torch.nn.Linear(hidden_neurons, 2)
        self.save_hyperparameters()

    def forward(self, x):
        return self.layer(x)

    def training_step(self, batch, batch_idx):
        loss = self(batch).sum()
        self.log("train_loss", loss)
        return {"loss": loss}

    def test_step(self, batch, batch_idx):
        loss = self(batch).sum()
        self.log("test_loss", loss)

    def configure_optimizers(self):
        return torch.optim.SGD(self.layer.parameters(), lr=0.1)


def run():
    dm = RandomDataModule()
    dm.prepare_data()
    dm.setup(stage="fit")

    model = BoringModel(hidden_neurons=dm.dataset_size)

    trainer = Trainer(
        default_root_dir=os.getcwd(),
        limit_train_batches=10,
        limit_val_batches=10,
        num_sanity_val_steps=0,
        max_epochs=2,
    )
    trainer.fit(model, datamodule=dm)

    dm.setup(stage="test")
    trainer.test(model, datamodule=dm)

if __name__ == '__main__':
    run()

@Programmer-RD-AI (Contributor)

OK, thank you.

With best regards,
Ranuga

@Renthal (Author) commented Oct 15, 2021

Yes, this MWE gets those too, but before those are displayed there is:

LightningDeprecationWarning: DataModule.prepare_data has already been called, so it will not be called again. In v1.6 this behavior will change to always call DataModule.prepare_data.

LightningDeprecationWarning: DataModule.setup has already been called, so it will not be called again. In v1.6 this behavior will change to always call DataModule.setup.

What is your point?

@Programmer-RD-AI (Contributor)

> Yes, this MWE gets those too, but before those are displayed there is:
>
> LightningDeprecationWarning: DataModule.prepare_data has already been called, so it will not be called again. In v1.6 this behavior will change to always call DataModule.prepare_data.
>
> LightningDeprecationWarning: DataModule.setup has already been called, so it will not be called again. In v1.6 this behavior will change to always call DataModule.setup.
>
> What is your point?

I don't understand. What do you mean?

With best regards,
Ranuga

@Renthal (Author) commented Oct 15, 2021

When I posted that, there was another message with some warnings printed, which has now been deleted. I was referring to that. If it was posted by mistake, please ignore my last message.

@Programmer-RD-AI (Contributor)

> When I posted that, there was another message with some warnings printed, which has now been deleted. I was referring to that. If it was posted by mistake, please ignore my last message.

Oh OK, I understand.

I just fixed this warning for now; I will check the other warnings later.

Sorry for the misunderstanding.

With best regards,
Ranuga

@carmocca carmocca added this to the v1.4.x milestone Oct 15, 2021
@carmocca carmocca added bug Something isn't working deprecation Includes a deprecation labels Oct 15, 2021
@carmocca (Contributor)

Related to #9939

@Programmer-RD-AI (Contributor)

Does this pull request help with the error?
#9945

With best regards,
Ranuga

@carmocca (Contributor)

@Programmer-RD-AI Your PR is not updated on master. Additionally, I believe the deprecation only appears in the bug-fix branch, so the fix needs to be applied to it directly, not to master.

The branch is https://github.com/PyTorchLightning/pytorch-lightning/tree/release/1.4.x

@Programmer-RD-AI (Contributor)

Hi, I created a new pull request: #9953

With best regards,
Ranuga

@Programmer-RD-AI (Contributor)

Hi, #9970 is the new PR.

@awaelchli awaelchli modified the milestones: v1.4.x, 1.5.x Nov 3, 2021
@carmocca (Contributor) commented Mar 1, 2022

Hi! The recommendation here is that prepare_data or setup check whether they've been called already before running expensive work, for example, checking whether a directory already exists before downloading inside prepare_data.
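
A minimal sketch of that idea (self.data_dir and download_dataset are hypothetical placeholders, not Lightning API):

import os

def prepare_data(self):
    # Skip the expensive download if the data directory already exists.
    if not os.path.isdir(self.data_dir):
        download_dataset(self.data_dir)  # hypothetical download helper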

You could also set and check your own self.has_setup_fit attribute to know whether it has been called already. For example:

def setup(self, stage=None):
    if not self.has_setup_fit and stage == "fit":
        expensive_work()
        self.has_setup_fit = True

Hope that helps! Feel free to ask any further questions.

@carmocca carmocca closed this as completed Mar 1, 2022
@Renthal (Author) commented Mar 1, 2022

Hasn't has_setup_fit been deprecated since v1.4?

@carmocca (Contributor) commented Mar 1, 2022

What was deprecated is relying on the library setting and checking has_setup_fit automatically because some users needed to run it twice.

However, you can still replicate the behaviour as I described in my previous comment, although you might need to change the variable name to avoid the collision.
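
For example, a minimal sketch with a renamed flag (the attribute name _setup_fit_done and the expensive_work call are just illustrations):

def __init__(self):
    super().__init__()
    self._setup_fit_done = False

def setup(self, stage=None):
    if stage == "fit" and not self._setup_fit_done:
        expensive_work()
        self._setup_fit_done = True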

@chanshing

When I use the solution suggested by @carmocca I get:

LightningDeprecationWarning: DataModule property `has_setup_fit` was deprecated in v1.4 and will be removed in v1.6.

I'm using 1.5.10.

@ananthsub (Contributor) commented Mar 9, 2022

@chanshing you can do this within your data module to ensure your setup logic runs only once per stage, no matter how many times the Trainer calls setup:

from typing import Dict, Optional

def __init__(self, ...) -> None:
    super().__init__()
    self._already_called: Dict[str, bool] = {}
    for stage in ("fit", "validate", "test", "predict"):
        self._already_called[stage] = False

def setup(self, stage: Optional[str] = None) -> None:
    # Skip if setup already ran for this stage.
    if stage and self._already_called[stage]:
        return
    # do your logic here
    self._already_called[stage] = True

@mahieyin-rahmun

I also ran into this issue, since my model depends on the dimensions of the data to be initialized. I followed the documentation but ended up with the aforementioned warning. It seems counter-intuitive to deprecate this property when the workaround is literally doing the same thing but with a different name.

@carmocca (Contributor)

We propose a different name to avoid the name collision, as it would trigger the same deprecation warning you are trying to hide.

Once the deprecation path is removed, you will be able to use has_setup_fit if that's your preference going forward.

@Renthal (Author) commented Mar 15, 2022

I think the main point was: why is it being deprecated if the expectation is that we do the exact same thing ourselves?

@ananthsub (Contributor)

> I think the main point was: why is it being deprecated if the expectation is that we do the exact same thing ourselves?

See the issue which deprecated these: #7301
These properties were deprecated in order to support users who do need these methods to run every time fit / validate / etc. is called. Before, this was not possible, as they were hardcoded to run only once by the Trainer.
