TypeError: expected str, bytes or os.PathLike object, not NoneType #3898

Closed
rotabulo opened this issue Oct 6, 2020 · 3 comments · Fixed by #3904

rotabulo commented Oct 6, 2020

🐛 Bug

I am summarizing the source of the issue to speed up the fix.
After this line of code:
https://github.com/PyTorchLightning/pytorch-lightning/blob/90929fa4333e5136020e9f9dcb7c1133e4c290f3/pytorch_lightning/accelerators/ddp_backend.py#L119

env_copy['PL_GLOBAL_SEED'] is None. An environment variable set to None breaks subprocess.Popen here:
https://github.com/PyTorchLightning/pytorch-lightning/blob/90929fa4333e5136020e9f9dcb7c1133e4c290f3/pytorch_lightning/accelerators/ddp_backend.py#L127

My fix at the moment is to add

if env_copy['PL_GLOBAL_SEED'] is None:
    del env_copy['PL_GLOBAL_SEED']

after

https://github.com/PyTorchLightning/pytorch-lightning/blob/90929fa4333e5136020e9f9dcb7c1133e4c290f3/pytorch_lightning/accelerators/ddp_backend.py#L119
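For illustration, here is a minimal sketch of the failure and of a slightly more general form of the workaround, outside of Lightning. How PL_GLOBAL_SEED ends up as None is an assumption here (the value is simply assigned by hand), and `drop_unset` is a hypothetical helper name, not something from the library.

```python
import os
import subprocess
import sys


def drop_unset(env):
    """Return a copy of env without the entries whose value is None."""
    return {key: value for key, value in env.items() if value is not None}


env_copy = os.environ.copy()
# Assumption: stands in for the state reported above, where the seed
# variable was never set and ends up stored as None.
env_copy["PL_GLOBAL_SEED"] = None

# Popen has to encode every environment value, so a None value is rejected.
try:
    subprocess.Popen([sys.executable, "-c", "pass"], env=env_copy).wait()
    popen_failed = False
except TypeError:
    popen_failed = True

# With the None entries removed, the child process starts normally.
clean_env = drop_unset(env_copy)
returncode = subprocess.run([sys.executable, "-c", "pass"], env=clean_env).returncode
```

Filtering every None value rather than only PL_GLOBAL_SEED is a design choice for the sketch; the two-line check above is enough if that is the only variable that can be unset.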

Environment

* CUDA:
	- GPU:
	- available:         False
	- version:           10.2
* Packages:
	- numpy:             1.18.5
	- pyTorch_debug:     False
	- pyTorch_version:   1.6.0
	- pytorch-lightning: 0.10.0rc1
	- tqdm:              4.48.0
* System:
	- OS:                Linux
	- architecture:
		- 64bit
		- 
	- processor:         x86_64
	- python:            3.7.7
	- version:           #100-Ubuntu SMP Wed Apr 22 20:32:56 UTC 2020
rotabulo added the "bug" (Something isn't working) and "help wanted" (Open to be worked on) labels Oct 6, 2020

SeanNaren (Contributor)

Hey @rotabulo thanks for the report, could you describe the case/give a code example where PL_GLOBAL_SEED is None at this step?

sidml (Contributor) commented Jul 9, 2021

I got the same error while training, using pytorch_lightning 1.3.8 and torch 1.9.0+cu102.
It seems to happen at random, so I am not sure how to reproduce it.

Here's the PyTorch Lightning error message:

Traceback (most recent call last):
  File "train.py", line 203, in <module>
    trainer.fit(model)
  File "/home/sid/miniconda3/lib/python3.8/site-packages/pytorch_lightning/trainer/trainer.py", line 460, in fit
    self._run(model)
  File "/home/sid/miniconda3/lib/python3.8/site-packages/pytorch_lightning/trainer/trainer.py", line 758, in _run
    self.dispatch()
  File "/home/sid/miniconda3/lib/python3.8/site-packages/pytorch_lightning/trainer/trainer.py", line 799, in dispatch
    self.accelerator.start_training(self)
  File "/home/sid/miniconda3/lib/python3.8/site-packages/pytorch_lightning/accelerators/accelerator.py", line 96, in start_training
    self.training_type_plugin.start_training(trainer)
  File "/home/sid/miniconda3/lib/python3.8/site-packages/pytorch_lightning/plugins/training_type/training_type_plugin.py", line 144, in start_training
    self._results = trainer.run_stage()
  File "/home/sid/miniconda3/lib/python3.8/site-packages/pytorch_lightning/trainer/trainer.py", line 809, in run_stage
    return self.run_train()
  File "/home/sid/miniconda3/lib/python3.8/site-packages/pytorch_lightning/trainer/trainer.py", line 909, in run_train
    self.training_type_plugin.reconciliate_processes(traceback.format_exc())
  File "/home/sid/miniconda3/lib/python3.8/site-packages/pytorch_lightning/plugins/training_type/ddp.py", line 383, in reconciliate_processes
    torch.save(True, os.path.join(sync_dir, f"{self.global_rank}.pl"))
  File "/home/sid/miniconda3/lib/python3.8/posixpath.py", line 76, in join
    a = os.fspath(a)
TypeError: expected str, bytes or os.PathLike object, not NoneType
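
Note that this traceback ends in os.path.join inside reconciliate_processes, not in subprocess.Popen, so the message is the same but the call site differs from the original report. The last frames suggest that sync_dir was None when the path was built. A minimal sketch of that failure and of one possible guard follows; `rank_marker_path` is a hypothetical helper for illustration, not Lightning's code, and why sync_dir would be None is not established in this thread.

```python
import os


def rank_marker_path(sync_dir, global_rank):
    """Hypothetical helper: per-rank marker file path, or None without a sync dir."""
    if sync_dir is None:
        # Nothing to join against, so report "no path" instead of raising.
        return None
    return os.path.join(sync_dir, f"{global_rank}.pl")


# Joining against None raises the TypeError seen in the traceback.
try:
    os.path.join(None, "0.pl")
    message = ""
except TypeError as exc:
    message = str(exc)

print(message)
print(rank_marker_path(None, 0))
print(rank_marker_path("sync", 3))
```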

bw4sz commented Jul 22, 2021

Can we re-open this issue, or get some guidance on how to make it reproducible? I see this at random (a re-run does not reproduce it) with DDP on Slurm. Could it be too many workers?
