
Trainer flag overfit_batches does not overwrite train dataloaders shuffle flag #2600

Closed
p-wein opened this issue Jul 13, 2020 · 3 comments · Fixed by #3501
Labels
bug · help wanted · won't fix

Comments

p-wein (Contributor) commented Jul 13, 2020

🐛 Bug

Setting the Trainer flag overfit_batches (e.g. overfit_batches=10) does not overwrite the shuffle flag set in the training dataloader, even though the warning reads:
UserWarning: You requested to overfit but enabled training dataloader shuffling. We are turning it off for you.

To Reproduce

Steps to reproduce the behavior:

  1. Create a LightningModule whose train_dataloader method builds its DataLoader with shuffle=True:

     def train_dataloader(self) -> loading.DataLoader:
         dataset = ProstateX(train=True)
         batch_transforms, gpu_transforms, sample_transforms = self.get_transformations()
         dataloader = loading.DataLoader(dataset,
                                         batch_size=self.hparams.tr_batch_size,
                                         batch_transforms=batch_transforms,
                                         shuffle=True,
                                         sample_transforms=sample_transforms,
                                         gpu_transforms=gpu_transforms,
                                         pseudo_batch_dim=True,
                                         num_workers=self.hparams.num_workers)
         return dataloader

(I use a rising DataLoader here, but the bug should also occur with plain PyTorch DataLoaders.)

  2. Create main.py with:

     mymodel = model.Model3D(cfg)
     trainer = pl.Trainer(gpus=1, precision=16, overfit_batches=10)
     trainer.fit(mymodel)
  3. Run main.py.
  4. Find out that your model does not converge.
  5. Set shuffle=False when creating the DataLoader in train_dataloader.
  6. See that your model now converges after some epochs.

(Alternatively, log the samples loaded by the dataloader and check whether they are the same each epoch; a minimal sketch of such a check follows below.)
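The check above can be done with a plain PyTorch DataLoader. The following is only a sketch with a toy dataset (the dataset, batch size, and number of epochs are made up for illustration), not code from this report:

    import torch
    from torch.utils.data import DataLoader, TensorDataset

    # Toy dataset of 100 samples, loaded with shuffle=True as in the bug report.
    dataset = TensorDataset(torch.arange(100).float().unsqueeze(1))
    loader = DataLoader(dataset, batch_size=10, shuffle=True)

    first_batches = []
    for epoch in range(3):
        # A fresh iterator mimics the start of a new training epoch.
        batch = next(iter(loader))[0]
        first_batches.append(batch.squeeze(1).tolist())
        print(f"epoch {epoch}: {first_batches[-1]}")

    # With shuffle=True the first batch differs between epochs, so overfit_batches
    # is not actually training on a fixed subset; with shuffle=False it is identical.
    print("identical across epochs:", all(b == first_batches[0] for b in first_batches))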

Code sample

Expected behavior

Either the model should also converge with shuffle=True, since the warning says shuffling got turned off for us (assuming the model converges with shuffle=False), or the warning should at least state that the user has to set shuffle=False themselves.
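Until then, a manual workaround is to gate the shuffle flag yourself inside train_dataloader. The sketch below reuses the names from the report (ProstateX, loading.DataLoader) and drops the transforms for brevity; self.hparams.overfitting is a hypothetical user-defined flag that would have to be set by hand alongside Trainer(overfit_batches=...), not something Lightning provides:

    def train_dataloader(self) -> loading.DataLoader:
        dataset = ProstateX(train=True)
        # Hypothetical hparam: only shuffle during normal training, never when overfitting.
        shuffle = not self.hparams.overfitting
        return loading.DataLoader(dataset,
                                  batch_size=self.hparams.tr_batch_size,
                                  shuffle=shuffle,
                                  num_workers=self.hparams.num_workers)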

Environment

  • CUDA:
    - GPU:
    - GeForce GTX 1080 Ti
    - available: True
    - version: 10.1
  • Packages:
    - numpy: 1.19.0
    - pyTorch_debug: False
    - pyTorch_version: 1.7.0.dev20200705+cu101
    - pytorch-lightning: 0.8.5
    - tensorboard: 2.2.2
    - tqdm: 4.47.0
  • System:
    - OS: Linux
    - architecture: 64bit
    - processor: x86_64
    - python: 3.7.7
    - version: #109-Ubuntu SMP Fri Jun 19 11:33:10 UTC 2020

Additional context

p-wein added the bug and help wanted labels on Jul 13, 2020
github-actions commented

Hi! Thanks for your contribution, great first issue!

p-wein changed the title from "Trainer flag overfit_batches does not overwrite train dataloaders shuffle flag as stated in warning." to "Trainer flag overfit_batches does not overwrite train dataloaders shuffle flag" on Jul 14, 2020

stale bot commented Sep 12, 2020

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.

denck007 commented

I am seeing the same issue when using --overfit_pct. From a comment in the code, I believe that option is to be removed in 1.0.0, but is it worth fixing anyway? The same code would fix the issue, just checking self.overfit_pct instead.
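For reference, one general way a trainer could actually enforce this, regardless of whether overfit_batches or overfit_pct triggered it, is to rebuild the user's DataLoader with a sequential sampler. This is only an illustrative sketch in plain PyTorch, not necessarily what the linked fix (#3501) does:

    from torch.utils.data import DataLoader, SequentialSampler

    def disable_shuffling(dataloader: DataLoader) -> DataLoader:
        # Re-create the loader with a SequentialSampler so every epoch iterates
        # over the same batches in the same order (shuffle defaults to False here).
        return DataLoader(dataloader.dataset,
                          batch_size=dataloader.batch_size,
                          sampler=SequentialSampler(dataloader.dataset),
                          num_workers=dataloader.num_workers,
                          collate_fn=dataloader.collate_fn,
                          drop_last=dataloader.drop_last)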
