Error "cannot join thread before it is started" after upgrading to 3.0.0 #115

Open
loretoparisi opened this issue Dec 1, 2021 · 10 comments

Comments

@loretoparisi

Hello, I'm getting this threading error after upgrading to the latest version.
This is my full stack trace:
[Screenshot: Schermata 2021-12-01 alle 12 33 11]

@sampsyo
Member

sampsyo commented Dec 1, 2021

Huh, that's interesting! I don't see an immediate reason why #114 would have caused this, but perhaps @Bomme can see a reason?

Can you provide instructions to reproduce the problem? (That is, what code produced this crash?)

@loretoparisi
Author

Thanks, I will check the implementation. I'm not using it directly but through some other package, and I'm not sure which one. Let me dig into it.

@Bomme
Contributor

Bomme commented Dec 1, 2021

I don't think it's related to the latest changes. In the traceback in the screenshot, line 300 in ffdec.py is self.stderr_reader.join().
Since the latest changes, that call is actually on line 297.

@loretoparisi maybe you can provide a pip freeze output?
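
If a full pip freeze is too noisy, a narrower report may be enough. The following is a minimal sketch using only the standard library; the helper name `installed_versions` and the list of packages are assumptions chosen for illustration, not something requested in this thread.

```python
from importlib import metadata


def installed_versions(packages):
    """Return {name: version or None} for each distribution name."""
    versions = {}
    for name in packages:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None  # not installed in this environment
    return versions


if __name__ == "__main__":
    # Hypothetical selection of packages relevant to this report.
    report = installed_versions(["audioread", "librosa", "soundfile"])
    for name, version in report.items():
        print(f"{name}=={version}" if version else f"{name}: not installed")
```

Running it inside the same environment that produces the crash shows which audioread release is actually being imported.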

@sampsyo
Member

sampsyo commented Dec 1, 2021

Indeed—maybe it would also be worth trying the same code on an older version to check whether the crash is truly new?

@loretoparisi
Author

Thanks for helping, guys. It seems we use it through librosa, and I assume that pulls the PyPI version, so from what I can see it should in fact not be the latest one.
We will try to reproduce it. By the way, our first idea was that this could be related to a problem with the file system where the audio was located.

Since the error I see here is a threading error (join), what I'm not sure of is whether it originates inside your library because of I/O access issues and propagates outward, or whether the log just says that something external happened while the FFmpeg audio reader thread was running...
As soon as I can reproduce it, I will tell you more.

@DWhettam

Did you get anywhere with resolving this @loretoparisi?

I'm encountering the same issue using librosa to load audio from mp4 files.

@loretoparisi
Author

Nope, I moved to native ffmpeg.
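
The comment does not say what "native ffmpeg" looked like in practice. One possible reading, sketched below under that assumption, is to call the ffmpeg command-line tool directly and read raw PCM from its standard output. The function names `build_ffmpeg_cmd` and `decode_pcm`, the mono downmix, and the default sample rate are all hypothetical choices for illustration.

```python
import subprocess


def build_ffmpeg_cmd(path, sample_rate=22050, offset=None, duration=None):
    """Build an ffmpeg command that writes mono 16-bit PCM to stdout."""
    cmd = ["ffmpeg", "-nostdin", "-v", "error"]
    if offset is not None:
        cmd += ["-ss", str(offset)]  # seek in the input before decoding
    cmd += ["-i", path]
    if duration is not None:
        cmd += ["-t", str(duration)]  # limit the length of the output
    # Drop video, downmix to one channel, resample, emit raw s16le samples.
    cmd += ["-vn", "-ac", "1", "-ar", str(sample_rate), "-f", "s16le", "-"]
    return cmd


def decode_pcm(path, **kwargs):
    """Run ffmpeg (must be on PATH) and return the raw PCM bytes."""
    result = subprocess.run(
        build_ffmpeg_cmd(path, **kwargs),
        capture_output=True,
        check=True,  # raise CalledProcessError if ffmpeg exits non-zero
    )
    return result.stdout
```

The returned bytes can then be converted to an array, for example with `numpy.frombuffer(data, dtype="<i2")`, if NumPy is available.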

@Bomme
Contributor

Bomme commented Dec 12, 2023

@DWhettam can you please share a stacktrace of the error that you see?

@DWhettam

Sure. I get the "cannot join thread before it is started" error as well as "can't start new thread". I also get "Format not recognised." on the mp4 videos.

The error occurs on a different mp4 each time, and only when I read both the audio and the video from the mp4 file. If I load just the audio, or just the video, I don't have any issues, so I'm fairly certain there is nothing wrong with the files themselves. More specifically: if I call read_video (which uses pytorchvideo), I get no errors. If I call read_audio (which uses librosa), I get no errors. But if I call one after the other, I encounter the issue below. Any help would be greatly appreciated!

Traceback (most recent call last):
  File "/jmain02/home/J2AD001/wwp01/ddw69-wwp01/.conda/envs/repcount/lib/python3.11/site-packages/audioread/ffdec.py", line 308, in __del__
    self.close()  
  File "/jmain02/home/J2AD001/wwp01/ddw69-wwp01/.conda/envs/repcount/lib/python3.11/site-packages/audioread/ffdec.py", line 297, in close
    self.stderr_reader.join()
  File "/jmain02/home/J2AD001/wwp01/ddw69-wwp01/.conda/envs/repcount/lib/python3.11/threading.py", line 1107, in join
    raise RuntimeError("cannot join thread before it is started") 
RuntimeError: cannot join thread before it is started
Traceback (most recent call last):
  File "/jmain02/home/J2AD001/wwp01/ddw69-wwp01/RepetitionCounting/train_multimodal_w_eval_stats.py", line 508, in <module>
    for (i, data) in enumerate(dataloader):
  File "/jmain02/home/J2AD001/wwp01/ddw69-wwp01/.conda/envs/repcount/lib/python3.11/site-packages/torch/utils/data/dataloader.py", line 633, in __next__
    data = self._next_data()
           ^^^^^^^^^^^^^^^^^
  File "/jmain02/home/J2AD001/wwp01/ddw69-wwp01/.conda/envs/repcount/lib/python3.11/site-packages/torch/utils/data/dataloader.py", line 1325, in _next_data
    return self._process_data(data)
           ^^^^^^^^^^^^^^^^^^^^^^^^ 
  File "/jmain02/home/J2AD001/wwp01/ddw69-wwp01/.conda/envs/repcount/lib/python3.11/site-packages/torch/utils/data/dataloader.py", line 1371, in _process_data
    data.reraise()
  File "/jmain02/home/J2AD001/wwp01/ddw69-wwp01/.conda/envs/repcount/lib/python3.11/site-packages/torch/_utils.py", line 644, in reraise  
    raise exception
RuntimeError: Caught RuntimeError in DataLoader worker process 5.
Original Traceback (most recent call last):
  File "/jmain02/home/J2AD001/wwp01/ddw69-wwp01/.conda/envs/repcount/lib/python3.11/site-packages/librosa/core/audio.py", line 175, in load 
    y, sr_native = __soundfile_load(path, offset, duration, dtype)
                   ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/jmain02/home/J2AD001/wwp01/ddw69-wwp01/.conda/envs/repcount/lib/python3.11/site-packages/librosa/core/audio.py", line 208, in __soundfile_load 
    context = sf.SoundFile(path)
              ^^^^^^^^^^^^^^^^^^
  File "/jmain02/home/J2AD001/wwp01/ddw69-wwp01/.conda/envs/repcount/lib/python3.11/site-packages/soundfile.py", line 658, in __init__
    self._file = self._open(file, mode_int, closefd)
                 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/jmain02/home/J2AD001/wwp01/ddw69-wwp01/.conda/envs/repcount/lib/python3.11/site-packages/soundfile.py", line 1216, in _open
    raise LibsndfileError(err, prefix="Error opening {0!r}: ".format(self.name))
soundfile.LibsndfileError: Error opening '/raid/local_scratch/ddw69-wwp01/569411/countix_videos/sc5bsO7CYDs.mp4': Format not recognised.

During handling of the above exception, another exception occurred:
Traceback (most recent call last):
  File "/jmain02/home/J2AD001/wwp01/ddw69-wwp01/.conda/envs/repcount/lib/python3.11/site-packages/torch/utils/data/_utils/worker.py", line 308, in _worker_loop
    data = fetcher.fetch(index)
           ^^^^^^^^^^^^^^^^^^^^
  File "/jmain02/home/J2AD001/wwp01/ddw69-wwp01/.conda/envs/repcount/lib/python3.11/site-packages/torch/utils/data/_utils/fetch.py", line 51, in fetch
    data = [self.dataset[idx] for idx in possibly_batched_index]
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/jmain02/home/J2AD001/wwp01/ddw69-wwp01/.conda/envs/repcount/lib/python3.11/site-packages/torch/utils/data/_utils/fetch.py", line 51, in <listcomp>
    data = [self.dataset[idx] for idx in possibly_batched_index]
            ~~~~~~~~~~~~^^^^^
  File "/jmain02/home/J2AD001/wwp01/ddw69-wwp01/RepetitionCounting/dataloader_multimodal.py", line 340, in __getitem__
    audio = read_audio(video_name,start_crop,end_crop,self.add_noise)
            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/jmain02/home/J2AD001/wwp01/ddw69-wwp01/RepetitionCounting/dataloader_multimodal.py", line 113, in read_audio
    y, sr = librosa.load(video_filename, offset=start, duration=seg_duration)
            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/jmain02/home/J2AD001/wwp01/ddw69-wwp01/.conda/envs/repcount/lib/python3.11/site-packages/librosa/core/audio.py", line 183, in load
    y, sr_native = __audioread_load(path, offset, duration, dtype)
                   ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/jmain02/home/J2AD001/wwp01/ddw69-wwp01/.conda/envs/repcount/lib/python3.11/site-packages/decorator.py", line 232, in fun
    return caller(func, *(extras + args), **kw)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/jmain02/home/J2AD001/wwp01/ddw69-wwp01/.conda/envs/repcount/lib/python3.11/site-packages/librosa/util/decorators.py", line 59, in __wrapper
    return func(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^
  File "/jmain02/home/J2AD001/wwp01/ddw69-wwp01/.conda/envs/repcount/lib/python3.11/site-packages/librosa/core/audio.py", line 239, in __audioread_load
    reader = audioread.audio_open(path)
             ^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/jmain02/home/J2AD001/wwp01/ddw69-wwp01/.conda/envs/repcount/lib/python3.11/site-packages/audioread/__init__.py", line 127, in audio_open
    return BackendClass(path)
           ^^^^^^^^^^^^^^^^^^
  File "/jmain02/home/J2AD001/wwp01/ddw69-wwp01/.conda/envs/repcount/lib/python3.11/site-packages/audioread/ffdec.py", line 177, in __init__
    self.stderr_reader.start()
  File "/jmain02/home/J2AD001/wwp01/ddw69-wwp01/.conda/envs/repcount/lib/python3.11/threading.py", line 957, in start
    _start_new_thread(self._bootstrap, ())
RuntimeError: can't start new thread
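
The two RuntimeErrors in the traceback above look like one failure seen twice: start() fails in __init__, and the later close() (reached from __del__) then joins a thread that never started. The sketch below reproduces that pattern with the standard library only. `FailingThread` and `Reader` are hypothetical stand-ins, not audioread's actual classes, and the guarded variant is one possible mitigation rather than the library's fix.

```python
import threading


class FailingThread(threading.Thread):
    """Hypothetical stand-in for a thread the OS refuses to start."""

    def start(self):
        raise RuntimeError("can't start new thread")


class Reader:
    """Hypothetical reader that owns a helper thread."""

    def __init__(self, thread_cls=threading.Thread):
        self.stderr_reader = thread_cls(target=lambda: None)
        self.stderr_reader.start()  # may raise before the thread exists

    def close(self):
        # Unconditional join: raises if start() never succeeded.
        self.stderr_reader.join()

    def close_guarded(self):
        # ident stays None until the thread has actually been started.
        if self.stderr_reader.ident is not None:
            self.stderr_reader.join()


# Build the object by hand so it survives the failing __init__,
# which is the state a finalizer would see.
reader = Reader.__new__(Reader)
try:
    reader.__init__(FailingThread)
except RuntimeError as exc:
    start_error = str(exc)

try:
    reader.close()
except RuntimeError as exc:
    join_error = str(exc)

reader.close_guarded()  # skips the join, so nothing is raised
```

If this reading is right, the join error is only a symptom, and the question that matters is why the process could not start a new thread in the first place.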

@DWhettam

@Bomme by reducing the number of workers in my PyTorch DataLoader, I am able to run the code for much longer, although the error still occurs: instead of failing within the first epoch of training, it now fails in the sixth. I'm not sure how to interpret this, but it at least confirms that the "Format not recognised" part of the stack trace is a red herring.
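
A failure that arrives later with fewer workers would be consistent with threads or processes accumulating until some limit is reached, although nothing in this thread confirms that. One way to check is to log the number of live threads from inside each worker. The helper `thread_snapshot` below is a hypothetical sketch using only the standard library.

```python
import threading


def thread_snapshot():
    """Return (count, sorted names) of threads alive in this process."""
    threads = threading.enumerate()
    return len(threads), sorted(t.name for t in threads)


if __name__ == "__main__":
    count, names = thread_snapshot()
    print(f"{count} live thread(s): {names}")
```

Calling it every few hundred items from the dataset's `__getitem__`, together with `os.getpid()`, would show whether the count keeps growing inside a worker. A flat count would point instead at a limit outside the Python process, such as a per-user cap on processes and threads.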
