ValueError('signal only works in main thread') messages back for wandb sweeps in v1.5.6 #11118

Closed
garrett361 opened this issue Dec 17, 2021 · 4 comments · Fixed by #11124
Labels
bug Something isn't working

@garrett361

🐛 Bug

An issue raised in #10336 is back in a lesser form in pl v1.5.6.

After upgrading to v1.5.6, wandb hyperparameter sweeps performed in Colab notebooks raise a ValueError:

ValueError('signal only works in main thread')

Unlike in #10336, the error does not terminate the sweep, which still seems to sync up with wandb without issue, as far as I can tell.
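
For reference, the same class of error can be triggered outside of Lightning and wandb with the standard library alone. The sketch below is not taken from the notebook. It assumes that the message comes from `signal.signal` being called in a non-main thread, which is how CPython behaves; `register_handler` is a hypothetical name used only for illustration.

```python
import signal
import threading


def register_handler(errors):
    # Hypothetical stand-in for library code that installs a signal handler.
    # CPython only allows signal.signal() in the main thread, so calling it
    # from a worker thread raises ValueError.
    try:
        signal.signal(signal.SIGTERM, signal.SIG_DFL)
    except ValueError as exc:
        errors.append(exc)


errors = []
worker = threading.Thread(target=register_handler, args=(errors,))
worker.start()
worker.join()

# The exact wording of the message varies slightly between Python versions.
print(repr(errors[0]))
```

Because the exception is caught here, the script runs to completion, much as the sweep does in the report above.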

To Reproduce

Reproduced at the end of this minimally modified BoringModel Colab notebook: https://colab.research.google.com/drive/1QheDfu4G5QEUSnHWpvK7UZzdHysRQq4h?usp=sharing

Expected behavior

wandb sweeps should complete without the cited ValueError. This was previously fixed in pl v1.5.3 and v1.5.4.

Environment

  • CUDA:
    • GPU:
      • A100-SXM4-40GB
    • available: True
    • version: 11.1
  • Packages:
    • numpy: 1.19.5
    • pyTorch_debug: False
    • pyTorch_version: 1.10.0+cu111
    • pytorch-lightning: 1.5.6
    • tqdm: 4.62.3
  • System:
    • OS: Linux
    • architecture:
      • 64bit
    • processor: x86_64
    • python: 3.7.12
    • version: #1 SMP Sat Jun 5 09:50:34 PDT 2021

Additional context

@garrett361 garrett361 added the bug Something isn't working label Dec 17, 2021
@garrett361
Author

@tchaton

@awaelchli
Contributor

Hello @garrett361
The colab link you posted is currently locked.

@awaelchli
Contributor

Wild guess: in teardown when we call signal.signal, it should be guarded and not run outside the main thread. #11124
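
A minimal sketch of the kind of guard described above, using only the standard library. This is not the code from #11124; `restore_handler_if_main_thread` is a hypothetical helper, and the assumption is that teardown wants to re-install a previously saved handler.

```python
import signal
import threading


def restore_handler_if_main_thread(signum, handler):
    """Hypothetical helper: call signal.signal only from the main thread.

    Returns True if the handler was installed, False if the call was skipped.
    """
    if threading.current_thread() is threading.main_thread():
        signal.signal(signum, handler)
        return True
    # In any other thread signal.signal would raise ValueError, so skip it.
    return False


# Calling the helper from a worker thread is skipped instead of raising.
results = []
worker = threading.Thread(
    target=lambda: results.append(
        restore_handler_if_main_thread(signal.SIGTERM, signal.SIG_DFL)
    )
)
worker.start()
worker.join()
print(results)
```

The trade-off is that handlers are left untouched when teardown runs in a worker thread, which seems acceptable if they could not have been registered from that thread in the first place.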

@garrett361
Author

@awaelchli Oops, unlocked now, though a bit too late to be helpful. Thanks for the quick response!
