
Support Shar export of multi-channel, multi-source recording and cuts with start>0 #1053

Merged
pzelasko merged 2 commits into master from feature/multi-cut-multi-audio-source-shar
May 5, 2023


pzelasko
Collaborator

pzelasko commented May 4, 2023

@desh2608 can you please try this out before we merge? The tests pass, but I'd rather double-check.

pzelasko added this to the v1.15 milestone May 4, 2023
pzelasko requested a review from desh2608 May 4, 2023 14:43
pzelasko linked an issue May 4, 2023 that may be closed by this pull request
pzelasko merged commit 4c1202a into master May 5, 2023
pzelasko deleted the feature/multi-cut-multi-audio-source-shar branch May 5, 2023 12:34
desh2608
Collaborator

desh2608 commented May 5, 2023

Sorry, I found an issue when using more than one job (num_jobs > 1) for writing:

Shard progress: 0it [00:04, ?it/s]
concurrent.futures.process._RemoteTraceback:
"""
Traceback (most recent call last):
  File "/home/hltcoe/draj/.conda/envs/torch2/lib/python3.8/concurrent/futures/process.py", line 239, in _process_worker
    r = call_item.fn(*call_item.args, **call_item.kwargs)
  File "/exp/draj/mini_scale_2022/lhotse/lhotse/cut/set.py", line 3405, in _export_to_shar_single
    for cut in cuts:
  File "/exp/draj/mini_scale_2022/lhotse/lhotse/lazy.py", line 165, in values
    yield from self
  File "/exp/draj/mini_scale_2022/lhotse/lhotse/lazy.py", line 216, in __iter__
    yield from map(deserialize_item, self.source)
  File "/exp/draj/mini_scale_2022/lhotse/lhotse/lazy.py", line 186, in __iter__
    for line in f:
  File "/home/hltcoe/draj/.conda/envs/torch2/lib/python3.8/gzip.py", line 305, in read1
    return self._buffer.read1(size)
  File "/home/hltcoe/draj/.conda/envs/torch2/lib/python3.8/_compression.py", line 68, in readinto
    data = self.read(len(byte_view))
  File "/home/hltcoe/draj/.conda/envs/torch2/lib/python3.8/gzip.py", line 498, in read
    raise EOFError("Compressed file ended before the "
EOFError: Compressed file ended before the end-of-stream marker was reached
"""

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "local/prepare_chime6_ihm.py", line 83, in <module>
    prepare_chime6_ihm()
  File "local/prepare_chime6_ihm.py", line 69, in prepare_chime6_ihm
    shards = cuts.to_shar(
  File "/exp/draj/mini_scale_2022/lhotse/lhotse/cut/set.py", line 618, in to_shar
    partial_paths = f.result()
  File "/home/hltcoe/draj/.conda/envs/torch2/lib/python3.8/concurrent/futures/_base.py", line 437, in result
    return self.__get_result()
  File "/home/hltcoe/draj/.conda/envs/torch2/lib/python3.8/concurrent/futures/_base.py", line 389, in __get_result
    raise self._exception
EOFError: Compressed file ended before the end-of-stream marker was reached
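The EOFError at the bottom of this traceback comes from Python's standard gzip module, which raises it whenever a compressed stream ends before its end-of-stream marker. The thread does not pin down why the manifest read by the worker was incomplete, so the following is only a stdlib sketch that reproduces the same error by iterating line by line over a truncated gzip JSONL payload, roughly the way the lazy manifest reader in the traceback iterates `for line in f`. The manifest contents and the helper names (`make_manifest`, `count_lines`, `read_truncated`) are hypothetical and are not lhotse APIs.

```python
import gzip
import io
import json


def make_manifest(num_items: int = 1000) -> bytes:
    """Build a gzip-compressed JSONL payload (hypothetical contents, shaped like a *.jsonl.gz manifest)."""
    text = "".join(
        json.dumps({"id": f"cut-{i}", "start": 0.0}) + "\n" for i in range(num_items)
    )
    return gzip.compress(text.encode("utf-8"))


def count_lines(payload: bytes) -> int:
    """Iterate the gzip stream line by line and count the lines."""
    n = 0
    # gzip.open accepts an existing file object; "rt" wraps it in a text reader.
    with gzip.open(io.BytesIO(payload), "rt", encoding="utf-8") as f:
        for _ in f:
            n += 1
    return n


def read_truncated(payload: bytes) -> str:
    """Cut the compressed stream in half and return the message of the EOFError raised while reading it."""
    truncated = payload[: len(payload) // 2]
    try:
        count_lines(truncated)
    except EOFError as e:
        return str(e)
    return "no error"


if __name__ == "__main__":
    payload = make_manifest()
    print(count_lines(payload))
    print(read_truncated(payload))
```

Reading the complete payload succeeds, while the truncated one fails with the same "Compressed file ended before the end-of-stream marker was reached" message seen above. This only shows what the error means; it does not claim to be the root cause in `to_shar`.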

pzelasko
Collaborator Author

pzelasko commented May 5, 2023

OK, we'll resolve it in a follow-up PR.

Development

Successfully merging this pull request may close these issues.

Add support for in-memory cuts with multiple audio sources
2 participants