
Calling trainer.test() when using fast_dev_run throws confusing error #6615

Closed
ashleve opened this issue Mar 21, 2021 · 4 comments · Fixed by #6667
Labels: bug (Something isn't working), help wanted (Open to be worked on)

Comments


ashleve commented Mar 21, 2021

🐛 Bug

Calling trainer.test() when using fast_dev_run throws a confusing error:

Traceback (most recent call last):                                                                                                                                                                 
  File "main.py", line 89, in <module>
    trainer.test(test_dataloaders=test)
  File "/home/ash/miniconda3/envs/tmp/lib/python3.8/site-packages/pytorch_lightning/trainer/trainer.py", line 916, in test
    results = self.__test_using_best_weights(ckpt_path, test_dataloaders)
  File "/home/ash/miniconda3/envs/tmp/lib/python3.8/site-packages/pytorch_lightning/trainer/trainer.py", line 927, in __test_using_best_weights
    raise MisconfigurationException(
pytorch_lightning.utilities.exceptions.MisconfigurationException: ckpt_path is "best", but ModelCheckpoint is not configured to save the best model.

Please reproduce using the BoringModel

from pytorch_lightning import LightningModule
import torch
from torch.utils.data import DataLoader, Dataset
import pytorch_lightning as pl


class RandomDataset(Dataset):
    def __init__(self, size, num_samples):
        self.len = num_samples
        self.data = torch.randn(num_samples, size)

    def __getitem__(self, index):
        return self.data[index]

    def __len__(self):
        return self.len


class BoringModel(LightningModule):

    def __init__(self):
        super().__init__()
        self.layer = torch.nn.Linear(32, 2)

    def forward(self, x):
        return self.layer(x)

    def loss(self, batch, prediction):
        return torch.nn.functional.mse_loss(prediction, torch.ones_like(prediction))

    def training_step(self, batch, batch_idx):
        output = self.layer(batch)
        loss = self.loss(batch, output)
        return {"loss": loss}

    def training_step_end(self, training_step_outputs):
        return training_step_outputs

    def training_epoch_end(self, outputs) -> None:
        torch.stack([x["loss"] for x in outputs]).mean()

    def validation_step(self, batch, batch_idx):
        output = self.layer(batch)
        loss = self.loss(batch, output)
        return {"x": loss}

    def validation_epoch_end(self, outputs) -> None:
        torch.stack([x['x'] for x in outputs]).mean()

    def test_step(self, batch, batch_idx):
        output = self.layer(batch)
        loss = self.loss(batch, output)
        self.log('fake_test_acc', loss)
        return {"y": loss}

    def test_epoch_end(self, outputs) -> None:
        torch.stack([x["y"] for x in outputs]).mean()

    def configure_optimizers(self):
        optimizer = torch.optim.SGD(self.layer.parameters(), lr=0.1)
        lr_scheduler = torch.optim.lr_scheduler.StepLR(optimizer, step_size=1)
        return [optimizer], [lr_scheduler]


num_samples = 10000

train = RandomDataset(32, num_samples)
train = DataLoader(train, batch_size=32)

val = RandomDataset(32, num_samples)
val = DataLoader(val, batch_size=32)

test = RandomDataset(32, num_samples)
test = DataLoader(test, batch_size=32)

model = BoringModel()

trainer = pl.Trainer(
    fast_dev_run=True
)

trainer.fit(model, train, val)  # fast_dev_run runs a single batch per loop and saves no checkpoint

trainer.test(test_dataloaders=test)  # no model passed: Lightning tries to restore the "best" checkpoint and fails
ashleve added the bug and help wanted labels Mar 21, 2021

tchaton commented Mar 21, 2021

Dear @hobogalaxy,

This happens because trainer.fit did not generate a checkpoint (fast_dev_run disables checkpointing) and you don't provide a model to trainer.test, so there is no model to load.
This is expected behaviour, but I agree the MisconfigurationException could be more explicit about what's wrong.

Best,
T.C
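
For reference, a minimal workaround sketch based on the explanation above, reusing the names from the repro script: passing the fitted model explicitly makes trainer.test evaluate the in-memory weights instead of looking for a "best" checkpoint.

# Workaround sketch: the explicit model argument is the key change.
# fast_dev_run never saves a checkpoint, but with a model passed,
# trainer.test() evaluates the in-memory weights directly.
trainer = pl.Trainer(fast_dev_run=True)
trainer.fit(model, train, val)
trainer.test(model, test_dataloaders=test)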


ashleve commented Mar 21, 2021

@tchaton Thank you for the explanation. It is confusing, though, and it probably happens to many users. I think Lightning should detect the case when fast_dev_run is set to True and either:

  1. Throw a Python warning instead of an exception and simply skip testing, or
  2. Raise an exception that explicitly states you shouldn't call trainer.test() when using fast_dev_run (see the sketch below).
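
A minimal sketch of what option (2) could look like, assuming a guard at the top of Trainer.test; the method signature and the self.fast_dev_run attribute here are modelled on the traceback above, not on the actual fix in #6667:

from pytorch_lightning.utilities.exceptions import MisconfigurationException

# Sketch only (hypothetical placement inside Trainer): refuse a test run
# that would otherwise fail while trying to restore a nonexistent checkpoint.
def test(self, model=None, test_dataloaders=None, ckpt_path='best'):
    if model is None and self.fast_dev_run:
        raise MisconfigurationException(
            'You cannot call `trainer.test()` without a model when '
            '`fast_dev_run=True`: no checkpoint is saved during a dev run, '
            'so there is no "best" model to restore. Pass the model '
            'explicitly, e.g. `trainer.test(model, ...)`.'
        )
    ...  # normal test path continues here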


Borda commented Mar 23, 2021

@hobogalaxy mind sending a PR with one of your suggestions?

carmocca commented

I think I prefer (2) as it is more explicit
