Computing Shar features in parallel (CPU and/or GPU) #1226
-
There’s a CLI utility for built-in feature extraction for Shar that you can use or adapt: lhotse/lhotse/bin/modes/shar.py (line 132 in b869488).
-
BTW, I am trying to add WavLM as a feature extractor. WavLM has an output resolution of 0.02 seconds, but the length of the features is not consistent with that formula.
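If WavLM uses the same convolutional waveform encoder as wav2vec 2.0 / HuBERT (an assumption here: kernel sizes 10,3,3,3,3,2,2 with strides 5,2,2,2,2,2,2 at 16 kHz), then the number of output frames follows the unpadded-convolution length arithmetic rather than a plain `duration / 0.02` division, which makes it come out a few frames short:

```python
import math

def wavlm_num_frames(num_samples: int) -> int:
    """Output length of a wav2vec2/HuBERT-style conv frontend
    (assumed to match WavLM): 7 unpadded 1D conv layers at 16 kHz."""
    kernels = [10, 3, 3, 3, 3, 2, 2]
    strides = [5, 2, 2, 2, 2, 2, 2]
    length = num_samples
    for k, s in zip(kernels, strides):
        # valid (unpadded) 1D convolution output length
        length = math.floor((length - k) / s) + 1
    return length

print(wavlm_num_frames(16000))  # -> 49
```

For 1 second of 16 kHz audio this gives 49 frames rather than the 50 that `duration / 0.02` would suggest, which matches the kind of mismatch described above.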
-
Hi all,
I am looking into a way to parallelize feature extraction in the sharded (Shar) format, in particular the approach shown in lhotse/examples/04-lhotse-shar.ipynb (line 125 in b869488), reproduced here for convenience.
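(A rough sketch of that single-process loop; the "fbank" field name, the lilcom compression, and the manifest path are illustrative assumptions rather than a verbatim copy of the notebook, and the exact SharWriter/attach_tensor calls should be checked against the tutorial.)

```python
from lhotse import CutSet, Fbank
from lhotse.shar import SharWriter

cuts = CutSet.from_file("data/librispeech_cuts_train.jsonl.gz")  # illustrative path
fbank = Fbank()

# Single-process loop: compute features for each cut and write them,
# together with the cut manifests, into sequentially numbered shards.
with SharWriter("data/shar", fields={"fbank": "lilcom"}, shard_size=1000) as writer:
    for cut in cuts:
        feats = cut.compute_features(fbank)
        cut = cut.attach_tensor(
            "fbank", feats, frame_shift=fbank.frame_shift, temporal_dim=0
        )
        writer.write(cut)
```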
I tried the following to parallelize over CPU: basically I create one writer per shard and dispatch several workers (using a PyTorch DataLoader as an easy workaround), each of which picks the correct writer via `self.writers[int(item / self.shard_size)]`. The problem is that this only works with a single worker in the DataLoader (`num_workers=1`); otherwise it fails with:
`RuntimeError: Uneven number of files in the tarfile (expected to iterate pairs of binary data + JSON metadata)`.
Here is the full script (basic example on mini-librispeech): pre_compute_sharded.zip
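The likely cause is that each DataLoader worker is a separate process holding its own copy of the dataset object (and hence of the open writers), so several processes append to the same tar files concurrently and the strict alternation of binary data and JSON metadata inside each tar is broken. One way to avoid sharing writers across processes is to split the CutSet into disjoint chunks and give each worker process its own SharWriter writing into its own subdirectory. A minimal sketch, assuming the same Fbank/attach_tensor pattern as above (the paths, the "fbank" field name, and the use of ProcessPoolExecutor instead of a DataLoader are illustrative):

```python
from concurrent.futures import ProcessPoolExecutor
from pathlib import Path

from lhotse import CutSet, Fbank
from lhotse.shar import SharWriter


def process_split(split_idx: int, cuts: CutSet, out_root: str = "data/shar") -> str:
    """Compute features for one disjoint chunk of the CutSet and write it
    into a per-worker subdirectory, so no two processes touch the same tar."""
    out_dir = Path(out_root) / f"split_{split_idx}"
    out_dir.mkdir(parents=True, exist_ok=True)
    fbank = Fbank()
    with SharWriter(out_dir, fields={"fbank": "lilcom"}, shard_size=1000) as writer:
        for cut in cuts:
            feats = cut.compute_features(fbank)
            cut = cut.attach_tensor(
                "fbank", feats, frame_shift=fbank.frame_shift, temporal_dim=0
            )
            writer.write(cut)
    return str(out_dir)


if __name__ == "__main__":
    cuts = CutSet.from_file("data/librispeech_cuts_train.jsonl.gz")
    num_jobs = 4
    splits = cuts.split(num_jobs)  # disjoint chunks, one per worker process
    with ProcessPoolExecutor(num_jobs) as ex:
        out_dirs = list(ex.map(process_split, range(num_jobs), splits))
    print("Wrote shards under:", out_dirs)
```

When reading the data back, the shard lists produced under each split_* directory can be concatenated and passed together to the Shar-reading side (e.g., via the `fields` argument of `CutSet.from_shar`), so the per-worker split should not matter at training time.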