Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix global seed missing bug with DDP [WIP] #3904

Merged
merged 1 commit into from
Oct 6, 2020
Merged

Fix global seed missing bug with DDP [WIP] #3904

merged 1 commit into from
Oct 6, 2020

Conversation

SeanNaren
Copy link
Contributor

What does this PR do?

Closes #3898.

PR review

Anyone in the community is free to review the PR once the tests have passed.
If we didn't discuss your PR in Github issues there's a high chance it will not be merged.

Did you have fun?

Make sure you had fun coding 🙃

@SeanNaren
Copy link
Contributor Author

Awaiting response on the issue to create a test to ensure case is fixed.

@SeanNaren SeanNaren self-assigned this Oct 6, 2020
@mergify mergify bot requested a review from a team October 6, 2020 15:12
Copy link
Contributor

@awaelchli awaelchli left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

but this doesn't set the seed if it is not None, I think you need to keep the assignment

@mergify mergify bot requested a review from a team October 6, 2020 15:47
@codecov
Copy link

codecov bot commented Oct 6, 2020

Codecov Report

Merging #3904 into master will decrease coverage by 2%.
The diff coverage is 0%.

@@           Coverage Diff           @@
##           master   #3904    +/-   ##
=======================================
- Coverage      87%     85%    -2%     
=======================================
  Files         117     117            
  Lines        9214    9502   +288     
=======================================
+ Hits         8053    8117    +64     
- Misses       1161    1385   +224     

@williamFalcon
Copy link
Contributor

@SeanNaren good catch!
want to look through the other env vars there to make sure they aren't set if None?

@SeanNaren
Copy link
Contributor Author

but this doesn't set the seed if it is not None, I think you need to keep the assignment

@awaelchli Shouldn't this already be set in env_copy since we're doing a copy?

want to look through the other env vars there to make sure they aren't set if None?

@williamFalcon Will do, I really want to add a test because I'm unsure how to reproduce this and what's the case where this pops up. Any ideas?

@williamFalcon
Copy link
Contributor

yeah, that's hard haha... i don't know where this would happen either. It looks like if you set an envvar and use popopen then it will crash?

likely because a user calls popen themselves which is fine. the test is very involved though, we can talk offline about it.

@williamFalcon
Copy link
Contributor

but this doesn't set the seed if it is not None, I think you need to keep the assignment

i don't understand what you mean by this

@williamFalcon williamFalcon merged commit e4a56fa into master Oct 6, 2020
@SeanNaren SeanNaren deleted the 3898 branch October 6, 2020 16:31
@awaelchli
Copy link
Contributor

Two cases:

  1. "GLOBAL_SEED" not in os.environ, then del os.environ["GLOBAL_SEED"] will throw KeyError.
  2. os.environ["GLOBAL_SEED"] = None, then this fix works but the env variable should not get added in the first place.

@awaelchli
Copy link
Contributor

As I said, if the user does not set the seed, running our mnist example with ddp gives

LOCAL_RANK: 0 - CUDA_VISIBLE_DEVICES: [2,3]
Traceback (most recent call last):
  File "pl_examples/basic_examples/mnist.py", line 112, in <module>
    cli_main()
  File "pl_examples/basic_examples/mnist.py", line 103, in cli_main
    trainer.fit(model, train_loader, val_loader)
  File "/home/adrian/repositories/pytorch-lightning/pytorch_lightning/trainer/trainer.py", line 482, in fit
    self.accelerator_backend.setup(model)
  File "/home/adrian/repositories/pytorch-lightning/pytorch_lightning/accelerators/ddp_backend.py", line 59, in setup
    self._call_children_scripts()
  File "/home/adrian/repositories/pytorch-lightning/pytorch_lightning/accelerators/ddp_backend.py", line 121, in _call_children_scripts
    del env_copy['PL_GLOBAL_SEED']
KeyError: 'PL_GLOBAL_SEED'

@SeanNaren
Copy link
Contributor Author

Thanks @awaelchli realised we were missing a check... has already been fixed in #3927

@Borda Borda added this to the 0.10.0 milestone Oct 7, 2020
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

TypeError: expected str, bytes or os.PathLike object, not NoneType
4 participants