Add multiprocessing to meeting simulation workflow #972

desh2608 · 2023-02-08T17:03:10Z

Changes in this PR

Refactor the meeting simulation workflow so that utterance group creation is handled by a common sampler.
Add an option to use multiple workers for mixing the sampled utterance groups. This can speed up simulation by roughly 2-3x.
Fix the `parallel_map` method.
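Here is a minimal sketch of the idea behind the second change. It assumes the sampled utterance groups are already materialized and that each group can be mixed independently. `mix_group`, `mix_all`, and the tuple layout are hypothetical and are not Lhotse's API. The standard-library `ProcessPoolExecutor` stands in for whatever parallel mapping the workflow actually uses.

```python
from concurrent.futures import ProcessPoolExecutor
from typing import List, Tuple

# Hypothetical utterance group: a list of (utterance_id, duration_in_seconds).
Group = List[Tuple[str, float]]


def mix_group(group: Group) -> float:
    # Placeholder for the real mixing step: here we only return the group's
    # total duration so the example stays self-contained.
    return sum(dur for _, dur in group)


def mix_all(groups: List[Group], num_jobs: int = 1, chunksize: int = 16) -> List[float]:
    if num_jobs <= 1:
        # Serial path: no process start-up or pickling cost.
        return [mix_group(g) for g in groups]
    # Worker processes receive groups in chunks of `chunksize`, which amortizes
    # inter-process communication overhead. Results keep the input order.
    with ProcessPoolExecutor(max_workers=num_jobs) as executor:
        return list(executor.map(mix_group, groups, chunksize=chunksize))
```

To fan the groups out to four processes, call `mix_all(groups, num_jobs=4)` from under an `if __name__ == "__main__":` guard. The guard is required on platforms that spawn worker processes rather than fork them.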

Recommendations for `num_jobs`

When using the `simulate()` method in the workflow, we recommend using 1 job when the number of source utterances is small (e.g., up to 50k). For larger inputs, this number can be scaled up gradually, but to no more than 4-8 jobs, to avoid slow-downs due to multiprocessing overhead.
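The recommendation above can be encoded as a small helper. This is a hypothetical sketch, not part of the workflow. The 50k threshold and the 8-job cap come from the text. The rule "double the jobs for every 10x growth beyond the threshold" is an illustrative assumption. With these defaults, 3.2M source utterances maps to 4 jobs, which is the setting reported later in this thread.

```python
import math


def choose_num_jobs(num_source_utts: int, small_threshold: int = 50_000, max_jobs: int = 8) -> int:
    # Small inputs: a single job avoids multiprocessing overhead entirely.
    if num_source_utts <= small_threshold:
        return 1
    # Assumed scaling rule (illustrative only): double the job count for
    # every 10x growth in input size beyond the threshold.
    ratio = num_source_utts / small_threshold
    jobs = 2 ** math.ceil(math.log10(ratio))
    # Cap the job count so process overhead does not outweigh the speed-up.
    return min(jobs, max_jobs)
```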


pzelasko

Very cool!

desh2608 · 2023-02-08T18:49:28Z


I'll keep this under WIP while I test it out on my actual simulation tasks.

desh2608 · 2023-02-08T20:04:27Z

lhotse/workflows/meeting_simulation/base.py

+        for spk_id in this_batch_spk_ids:
+            sampler = self.samplers[spk_id]
+            try:
+                this_batch = next(sampler)


@pzelasko An issue I am facing here is that the whole sampler gets exhausted after sampling just 1 batch. I'm not sure why this is happening. Could you take a look?

Okay, very basic mistake: I forgot to sort the cuts by speaker ID before `groupby`!
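The root cause is a property of Python's `itertools.groupby`: it only merges *consecutive* items with equal keys. The toy sketch below is not the workflow's code. It uses hypothetical `(speaker_id, cut_id)` tuples in place of real cuts, plus two hypothetical helpers, `group_by_speaker` and `draw_one_per_speaker`. It shows how unsorted input splits each speaker into several groups. It also shows how a dict of per-speaker iterators, as in the later "make samplers into dict" commit, lets you draw from a speaker and drop that speaker once its iterator is exhausted.

```python
from itertools import groupby
from operator import itemgetter
from typing import Dict, Iterator, List, Tuple

# Hypothetical stand-in for a cut: (speaker_id, cut_id).
Cut = Tuple[str, str]


def group_by_speaker(cuts: List[Cut], sort_first: bool = True) -> List[Tuple[str, List[str]]]:
    # groupby only merges consecutive items with the same key, so the input
    # must be sorted by that key to get exactly one group per speaker.
    if sort_first:
        cuts = sorted(cuts, key=itemgetter(0))
    return [(spk, [cid for _, cid in grp]) for spk, grp in groupby(cuts, key=itemgetter(0))]


def draw_one_per_speaker(samplers: Dict[str, Iterator[str]], spk_ids: List[str]) -> List[str]:
    # Draw the next cut from each requested speaker's iterator and delete a
    # speaker's entry once its iterator is exhausted (O(1) dict removal).
    batch = []
    for spk_id in spk_ids:
        sampler = samplers.get(spk_id)
        if sampler is None:
            continue
        try:
            batch.append(next(sampler))
        except StopIteration:
            del samplers[spk_id]
    return batch


cuts = [("spk1", "c1"), ("spk2", "c2"), ("spk1", "c3"), ("spk2", "c4")]
print(group_by_speaker(cuts, sort_first=False))
# [('spk1', ['c1']), ('spk2', ['c2']), ('spk1', ['c3']), ('spk2', ['c4'])]
print(group_by_speaker(cuts))
# [('spk1', ['c1', 'c3']), ('spk2', ['c2', 'c4'])]

samplers = {spk: iter(ids) for spk, ids in group_by_speaker(cuts)}
print(draw_one_per_speaker(samplers, ["spk1", "spk2"]))  # ['c1', 'c2']
```

Suppose the dict is built from the unsorted grouping instead. Later groups then overwrite earlier ones, so `spk1` keeps only `c3`, and every sampler runs dry after a single draw. This is one way to reproduce the symptom reported above.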

desh2608 · 2023-02-08T23:41:35Z

The simulation should be quite fast now. As an example, creating mixtures from ~3.2M source utterances (each used once) takes ~1h using 4 jobs.

desh2608 added 3 commits February 8, 2023 11:47

add mulitprocessing to meeting simulation workflow

585a689

minor changes to docstring

6a0e1ca

Merge branch 'master' of https://github.com/lhotse-speech/lhotse into sim_nj

d9a237d

pzelasko previously approved these changes Feb 8, 2023


pzelasko added this to the v1.13 milestone Feb 8, 2023

desh2608 changed the title ~~Add mulitprocessing to meeting simulation workflow~~ [WIP] Add mulitprocessing to meeting simulation workflow Feb 8, 2023

Merge branch 'master' into sim_nj

132c938

desh2608 changed the title ~~[WIP] Add mulitprocessing to meeting simulation workflow~~ [WIP] Add multiprocessing to meeting simulation workflow Feb 8, 2023

desh2608 added 2 commits February 8, 2023 14:58

make samplers into dict for fast sampling and removal

efaffcc

Merge branch 'sim_nj' of https://github.com/desh2608/lhotse into sim_nj

26a69bb

desh2608 dismissed pzelasko’s stale review via 26a69bb February 8, 2023 19:58

desh2608 commented Feb 8, 2023


desh2608 added 2 commits February 8, 2023 15:43

sort cuts before grouping

0ad87f8

fix failing test

01e1f6a

desh2608 changed the title ~~[WIP] Add multiprocessing to meeting simulation workflow~~ Add multiprocessing to meeting simulation workflow Feb 8, 2023

pzelasko approved these changes Feb 8, 2023


Merge branch 'master' into sim_nj

36d3996

desh2608 merged commit a418912 into lhotse-speech:master Feb 9, 2023

desh2608 deleted the sim_nj branch November 2, 2023 19:18
