Fix Silence Sampling Algorithm for ASR Multi-speaker Data Simulator #5897

stevehuang52 · 2023-01-31T20:30:40Z

Signed-off-by: stevehuang52 [email protected]

What does this PR do ?

Replace the previous silence insertion method with a new one that guarantees a close approximation to the specified mean_silence.

Collection: [ASR]

Adding Silence in ASR Data Simulator

Requirements:

Sentence durations in each session follow a negative-binomial (NB) distribution.
Silence ratios across sessions follow a Beta distribution (range [0, 1]).
Per-silence lengths should follow a long-tailed distribution, so a Gamma distribution is used.
Silence Uniformity: each speech sentence (overlaps combined) should be followed by some minimum silence.
Silence Variability: per-silence durations and sentence durations should be approximately independent (i.e., p-value close to 0).
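
The Silence Variability requirement can be checked empirically. The PR does not show how independence was measured, so the sketch below is only one possible check: it computes a plain Pearson correlation coefficient between sentence durations and the silences that follow them. The helper name `pearson_r` and the toy Gamma-distributed durations are assumptions made for illustration; a p-value would need a statistics library such as `scipy.stats.pearsonr`.

```python
import math
import random


def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


rng = random.Random(0)

# Toy data (assumption): durations in seconds drawn from Gamma distributions.
sentence_durs = [rng.gammavariate(2.0, 1.5) for _ in range(2000)]

# Case 1: silences drawn independently of the preceding sentence.
independent_sils = [rng.gammavariate(0.5, 1.0) for _ in range(2000)]

# Case 2: silences that are a fixed fraction of the preceding sentence,
# which is the kind of coupling the requirement is meant to avoid.
coupled_sils = [0.3 * d for d in sentence_durs]

r_independent = pearson_r(sentence_durs, independent_sils)
r_coupled = pearson_r(sentence_durs, coupled_sils)
print(f"independent: r={r_independent:+.3f}, coupled: r={r_coupled:+.3f}")
```

A coefficient near zero for the first case and near one for the second is what separates a de-correlated silence sampler from one that simply scales silence with sentence length.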

Parameters:

NUM_SESSIONS: number of sessions
MAX_SESS_DUR: maximum session duration
SAMPLING_RATE: sampling rate for audio
[NB_COUNT, NB_PROB]: parameters of the per-sentence duration (negative-binomial) distribution
SILENCE_RATIO_MEAN: mean of the target silence ratio across all sessions, in (0,1)
SILENCE_RATIO_VAR: spread of the target silence ratio across sessions (described as a std, but used as the variance of the Beta distribution in the algorithm below); set small values (e.g., 0.1) for a closer approximation to the mean, or larger values (e.g., 2.0) for more diversity in silence
PER_SILENCE_VAR: spread of individual silence lengths (described as a std, but used as the variance of the Gamma distribution in the algorithm below); defaults to 20, which achieves p-value=0.1 and de-correlates speech and silence lengths
[PER_SILENCE_MIN, PER_SILENCE_MAX]: min and max of per-silence duration in seconds; max=-1 means no constraint
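
The algorithm below turns SILENCE_RATIO_MEAN and SILENCE_RATIO_VAR into Beta shape parameters with the standard method-of-moments formulas. A minimal sketch of that conversion follows; the function name `beta_params` and its validation are assumptions for illustration, not code from the PR. One property worth noting: the formulas only yield positive shape parameters when the variance is strictly below mean * (1 - mean), so the sketch rejects anything else.

```python
def beta_params(mean, var):
    """Method-of-moments Beta(a, b) shape parameters for a given mean and variance.

    Hypothetical helper: raises ValueError when no Beta distribution has the
    requested moments, i.e. unless 0 < var < mean * (1 - mean).
    """
    if not 0.0 < mean < 1.0:
        raise ValueError("mean must lie strictly between 0 and 1")
    if not 0.0 < var < mean * (1.0 - mean):
        raise ValueError("var must lie strictly between 0 and mean * (1 - mean)")
    common = mean * (1.0 - mean) / var - 1.0
    # Algebraically identical to the a and b expressions in the pseudocode.
    return mean * common, (1.0 - mean) * common


a, b = beta_params(0.3, 0.01)
recovered_mean = a / (a + b)
recovered_var = a * b / ((a + b) ** 2 * (a + b + 1.0))
print(a, b, recovered_mean, recovered_var)
```

How the merged simulator handles out-of-range values is not shown in this PR description; rejecting them is simply the most conservative choice for a sketch.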

Algorithm:

MAX_SESSION_LEN = MAX_SESS_DUR * SAMPLING_RATE
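MIN_SILENCE_LEN = PER_SILENCE_MIN * SAMPLING_RATE
# PER_SILENCE_MAX = -1 means no upper bound other than the session length
MAX_SILENCE_LEN = MAX_SESSION_LEN if PER_SILENCE_MAX < 0 else min(PER_SILENCE_MAX * SAMPLING_RATE, MAX_SESSION_LEN)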

sessions = []
for i in range(NUM_SESSIONS):
    curr_session = []

    curr_sess_len = 0
    curr_speech_len = 0
    curr_silence_len = 0

    a = SILENCE_RATIO_MEAN ** 2 * (1 - SILENCE_RATIO_MEAN) / SILENCE_RATIO_VAR - SILENCE_RATIO_MEAN
    b = SILENCE_RATIO_MEAN * (1 - SILENCE_RATIO_MEAN) ** 2 / SILENCE_RATIO_VAR - (1 - SILENCE_RATIO_MEAN)
    sess_silence_mean = Beta(a, b).rvs()

    while curr_sess_len < MAX_SESSION_LEN:
        speech_len = NB(NB_COUNT, NB_PROB).rvs()
        sentence = build_sentence(speech_len, curr_sess_len, MAX_SESSION_LEN)
        
        curr_session += sentence
        curr_sess_len += len(sentence)
        curr_speech_len += len(sentence)

        if curr_sess_len >= MAX_SESSION_LEN:
            break
        
        # dynamically adjust silence mean to achieve the overall mean
        silence_mean = max(1, MIN_SILENCE_LEN, (sess_silence_mean * curr_sess_len - curr_silence_len) / (1 - sess_silence_mean))

        # sampling with large std to de-correlate with previous sentence length
        silence_len = Gamma(a=silence_mean ** 2 / PER_SILENCE_VAR, scale=PER_SILENCE_VAR / silence_mean).rvs()  

        # enforce valid length
        silence_len = min(max(MIN_SILENCE_LEN, silence_len), MAX_SILENCE_LEN, MAX_SESSION_LEN - curr_sess_len)
        
        silence = add_silence(silence_len)
        
        curr_session += silence
        curr_sess_len += silence_len
        curr_silence_len += silence_len

    sessions.append(curr_session)
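
The two less obvious expressions in the pseudocode above can be derived in a few lines. This is one reading of the pseudocode rather than a derivation given by the PR author, and the symbols are introduced here for illustration only: r is sess_silence_mean, L is curr_sess_len, S is curr_silence_len, x is the silence still needed, and μ and σ² are silence_mean and PER_SILENCE_VAR.

```latex
% Dynamic silence mean: choose x so that the running ratio equals r
% once x samples of silence are appended.
\frac{S + x}{L + x} = r
\;\Longrightarrow\; S + x = rL + rx
\;\Longrightarrow\; x = \frac{rL - S}{1 - r}

% Gamma parameterization: shape k and scale \theta chosen so that the
% draw has mean \mu and variance \sigma^2.
k = \frac{\mu^2}{\sigma^2}, \qquad \theta = \frac{\sigma^2}{\mu}
\;\Longrightarrow\; \mathbb{E}[X] = k\theta = \mu, \qquad
\operatorname{Var}[X] = k\theta^2 = \sigma^2
```

Under this reading, each silence is drawn around the amount that would bring the session back to its target ratio, while the Gamma variance controls how far individual silences scatter around that amount.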

Notes

The desired silence ratio is harder to achieve with shorter sessions:
- E.g., sess_len=20s, mean_silence=0.35, 500 hours -> actual ratio 0.24
- E.g., sess_len=120s, mean_silence=0.3, 100 hours -> actual ratio ~0.3
The silence-length distribution is approximately exponential, and reducing the silence ratio reduces the average silence length:
- E.g., mean_silence=0.1, std=0.05 -> mean per-silence length 0.2s
- E.g., mean_silence=0.2, std=0.5 -> mean per-silence length 2.4s
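
The effect of session length on the achieved ratio can be explored with a length-only version of the algorithm above. The following is a rough sketch under several assumptions that are not part of the PR: lengths are counted in abstract frames (100 frames per second is assumed in the example values), the NB draw is taken directly as a sentence length with an integer count, overlaps are ignored, no audio is built, and every function name is hypothetical. It will not reproduce the figures in the notes.

```python
import math
import random


def sample_nb(rng, count, prob):
    """Negative-binomial draw (failures before `count` successes).

    Built as a sum of geometric draws, so `count` must be an integer here.
    """
    total = 0
    for _ in range(count):
        u = 1.0 - rng.random()  # uniform on (0, 1], keeps log() finite
        total += int(math.log(u) / math.log(1.0 - prob))
    return total


def simulate_session(rng, max_len, ratio_mean, ratio_var, per_sil_var,
                     min_sil=0, max_sil=None, nb_count=2, nb_prob=0.01):
    """Return (speech_len, silence_len) in frames for one simulated session."""
    max_sil = max_len if max_sil is None else min(max_sil, max_len)
    # Method-of-moments Beta parameters; needs 0 < ratio_var < ratio_mean * (1 - ratio_mean).
    common = ratio_mean * (1.0 - ratio_mean) / ratio_var - 1.0
    # Sketch-only guard: keep the target below 1 so the division below stays finite.
    target = min(rng.betavariate(ratio_mean * common, (1.0 - ratio_mean) * common), 0.99)

    sess_len = speech_len = silence_len = 0
    while sess_len < max_len:
        # At least one frame of speech per turn, truncated at the session end.
        sentence = min(max(1, sample_nb(rng, nb_count, nb_prob)), max_len - sess_len)
        sess_len += sentence
        speech_len += sentence
        if sess_len >= max_len:
            break
        # Silence that would bring the running ratio back to the target.
        mean = max(1.0, min_sil, (target * sess_len - silence_len) / (1.0 - target))
        draw = rng.gammavariate(mean ** 2 / per_sil_var, per_sil_var / mean)
        silence = int(min(max(min_sil, draw), max_sil, max_len - sess_len))
        sess_len += silence
        silence_len += silence
    return speech_len, silence_len


def achieved_ratio(seed, num_sessions, max_len, **kwargs):
    """Overall silence ratio across `num_sessions` simulated sessions."""
    rng = random.Random(seed)
    totals = [simulate_session(rng, max_len, **kwargs) for _ in range(num_sessions)]
    return sum(sil for _, sil in totals) / (num_sessions * max_len)


params = dict(ratio_mean=0.3, ratio_var=0.01, per_sil_var=400.0)
short_run = achieved_ratio(0, 200, 2_000, **params)   # 20 s sessions at 100 frames/s
long_run = achieved_ratio(0, 200, 12_000, **params)   # 120 s sessions at 100 frames/s
print(f"short sessions: {short_run:.3f}, long sessions: {long_run:.3f}")
```

Because every session ends with truncated speech or silence, the last correction step is often cut short; comparing `short_run` and `long_run` for different settings is a quick way to see how much that truncation matters.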

tango4j · 2023-02-10T00:36:22Z

The notebook is tested and works with no problems. The new script was missing a license template, so I added one. I will approve as soon as it passes the tests.

tango4j

Notebooks and data simulation code both tested. Very nice work by stevehuang, thanks.

…ulator (NVIDIA#5897)

* fix silence insertioon Signed-off-by: stevehuang52 <[email protected]>
* update docs and tutorial Signed-off-by: stevehuang52 <[email protected]>
* update Signed-off-by: stevehuang52 <[email protected]>
* change to beta annd gamma distributions Signed-off-by: stevehuang52 <[email protected]>
* update Signed-off-by: stevehuang52 <[email protected]>
* fix typo Signed-off-by: stevehuang52 <[email protected]>
* Added silence vs overlap selector with overlap algo Signed-off-by: Taejin Park <[email protected]>
* [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci
* Function name change and fixes Signed-off-by: Taejin Park <[email protected]>
* [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci
* Update silence and overlap adding algorithm for better accuracy Signed-off-by: Taejin Park <[email protected]>
* [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci
* Recommended range for overlap mean Signed-off-by: Taejin Park <[email protected]>
* Changing yaml file default values Signed-off-by: Taejin Park <[email protected]>
* Fixed typos and errors in docstrings Signed-off-by: Taejin Park <[email protected]>
* [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci
* Fixed minor bugs and removed unused functions Signed-off-by: Taejin Park <[email protected]>
* Fixed minor bugs and removed unused imports Signed-off-by: Taejin Park <[email protected]>
* Added docstrings for newly updated overlap algos Signed-off-by: Taejin Park <[email protected]>
* Fixed non_silence_len_samples calculation, more accurate now Signed-off-by: Taejin Park <[email protected]>
* adding missing docstring for non_silence_len Signed-off-by: Taejin Park <[email protected]>
* removed ipdb lines Signed-off-by: Taejin Park <[email protected]>
* refactor and update Signed-off-by: stevehuang52 <[email protected]>
* updated logs for v1.1 Signed-off-by: Taejin Park <[email protected]>
* [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci
* Argument check update for mean=0 var=0 case Signed-off-by: Taejin Park <[email protected]>
* [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci
* fix typo Signed-off-by: stevehuang52 <[email protected]>
* update silence/overlap mean clipping Signed-off-by: stevehuang52 <[email protected]>
* Adding mean clipping Signed-off-by: Taejin Park <[email protected]>
* added 0 handling for ovl/sim_mean Signed-off-by: Taejin Park <[email protected]>
* [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci
* Tested on fisher and fixed the bug with string-speaker ID Signed-off-by: Taejin Park <[email protected]>
* update code for visualization Signed-off-by: stevehuang52 <[email protected]>
* refactor Signed-off-by: stevehuang52 <[email protected]>
* fix load_rttm Signed-off-by: stevehuang52 <[email protected]>
* Adding docstrings Signed-off-by: Taejin Park <[email protected]>
* Adding usage in the analysis script Signed-off-by: Taejin Park <[email protected]>
* [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci
* fix filename Signed-off-by: stevehuang52 <[email protected]>
* Added argument check for sentence length params Signed-off-by: Taejin Park <[email protected]>
* [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci
* Removed unnecessary NB torch sampling Signed-off-by: Taejin Park <[email protected]>
* [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci
* add build_synthetic_vad_manifest.py Signed-off-by: stevehuang52 <[email protected]>
* add check for non rttm files Signed-off-by: stevehuang52 <[email protected]>
* added docstrings Signed-off-by: Taejin Park <[email protected]>
* typo is fixed Signed-off-by: Taejin Park <[email protected]>
* License template was missing, added Signed-off-by: Taejin Park <[email protected]>
* add missing copyright and move script Signed-off-by: stevehuang52 <[email protected]>
* add missing comma Signed-off-by: stevehuang52 <[email protected]>

---------

Signed-off-by: stevehuang52 <[email protected]>
Signed-off-by: Taejin Park <[email protected]>
Co-authored-by: Taejin Park <[email protected]>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>

fix silence insertioon

1eb82ad

Signed-off-by: stevehuang52 <[email protected]>

stevehuang52 requested a review from tango4j January 31, 2023 20:30

github-actions bot added the ASR label Jan 31, 2023

tango4j and others added 4 commits January 31, 2023 12:46

Merge branch 'main' into fix_simulator_silence

d95a9cc

update docs and tutorial

7aed6da

Signed-off-by: stevehuang52 <[email protected]>

Merge branch 'fix_simulator_silence' of https://github.com/stevehuang…

2c94bd1

…52/NeMo into fix_simulator_silence

update

5318015

Signed-off-by: stevehuang52 <[email protected]>

tango4j changed the title ~~Fix Silence Insertion for ASR Data Simulator~~ Fix Silence Sampling Algorithm for ASR Multi-speaker Data Simulator Jan 31, 2023

stevehuang52 and others added 22 commits February 2, 2023 12:51

change to beta annd gamma distributions

5a2790a

Signed-off-by: stevehuang52 <[email protected]>

update

196e2f6

Signed-off-by: stevehuang52 <[email protected]>

fix typo

dd78c75

Signed-off-by: stevehuang52 <[email protected]>

Added silence vs overlap selector with overlap algo

0623fdd

Signed-off-by: Taejin Park <[email protected]>

[pre-commit.ci] auto fixes from pre-commit.com hooks

25e2e52

for more information, see https://pre-commit.ci

Function name change and fixes

42be03e

Signed-off-by: Taejin Park <[email protected]>

Function name change and fixes after merge

f343b78

Signed-off-by: Taejin Park <[email protected]>

[pre-commit.ci] auto fixes from pre-commit.com hooks

2ae8899

for more information, see https://pre-commit.ci

Update silence and overlap adding algorithm for better accuracy

4c1f3ba

Signed-off-by: Taejin Park <[email protected]>

[pre-commit.ci] auto fixes from pre-commit.com hooks

608b332

for more information, see https://pre-commit.ci

Recommended range for overlap mean

1257513

Signed-off-by: Taejin Park <[email protected]>

Merge branch 'fix_simulator_silence' of https://github.com/stevehuang…

ed78284

…52/NeMo into fix_simulator_silence

Changing yaml file default values

e100337

Signed-off-by: Taejin Park <[email protected]>

Fixed typos and errors in docstrings

2087695

Signed-off-by: Taejin Park <[email protected]>

[pre-commit.ci] auto fixes from pre-commit.com hooks

6bba63d

for more information, see https://pre-commit.ci

Fixed minor bugs and removed unused functions

8a0c78a

Signed-off-by: Taejin Park <[email protected]>

Fixed minor bugs and removed unused imports

a72da70

Signed-off-by: Taejin Park <[email protected]>

Merge branch 'fix_simulator_silence' of https://github.com/stevehuang…

89342af

…52/NeMo into fix_simulator_silence

Added docstrings for newly updated overlap algos

ffb48c4

Signed-off-by: Taejin Park <[email protected]>

Merge branch 'main' into fix_simulator_silence

bed1c06

Fixed non_silence_len_samples calculation, more accurate now

498eeb7

Signed-off-by: Taejin Park <[email protected]>

Merge branch 'fix_simulator_silence' of https://github.com/stevehuang…

33aff26

…52/NeMo into fix_simulator_silence

stevehuang52 and others added 18 commits February 7, 2023 16:43

refactor

222a47d

Signed-off-by: stevehuang52 <[email protected]>

fix load_rttm

32bf6c3

Signed-off-by: stevehuang52 <[email protected]>

Adding docstrings

7a42dd7

Signed-off-by: Taejin Park <[email protected]>

Adding usage in the analysis script

4f84b0b

Signed-off-by: Taejin Park <[email protected]>

[pre-commit.ci] auto fixes from pre-commit.com hooks

5a5194b

for more information, see https://pre-commit.ci

fix filename

2848a19

Signed-off-by: stevehuang52 <[email protected]>

Merge branch 'fix_simulator_silence' of https://github.com/stevehuang…

50ceae4

…52/NeMo into fix_simulator_silence

Added argument check for sentence length params

3db820c

Signed-off-by: Taejin Park <[email protected]>

[pre-commit.ci] auto fixes from pre-commit.com hooks

113baf2

for more information, see https://pre-commit.ci

Removed unnecessary NB torch sampling

de37ee6

Signed-off-by: Taejin Park <[email protected]>

Resolved conflict

e591adb

Signed-off-by: Taejin Park <[email protected]>

[pre-commit.ci] auto fixes from pre-commit.com hooks

46dce93

for more information, see https://pre-commit.ci

add build_synthetic_vad_manifest.py

ba18333

Signed-off-by: stevehuang52 <[email protected]>

add check for non rttm files

cf1427c

Signed-off-by: stevehuang52 <[email protected]>

added docstrings

dfb6b44

Signed-off-by: Taejin Park <[email protected]>

typo is fixed

ea88362

Signed-off-by: Taejin Park <[email protected]>

Merge branch 'main' into fix_simulator_silence

3f81f47

License template was missing, added

1f55fb6

Signed-off-by: Taejin Park <[email protected]>

stevehuang52 and others added 8 commits February 9, 2023 20:51

Merge branch 'main' into fix_simulator_silence

5600e33

Merge branch 'main' into fix_simulator_silence

735ab2a

add missing copyright and move script

37ab4e4

Signed-off-by: stevehuang52 <[email protected]>

Merge branch 'fix_simulator_silence' of https://github.com/stevehuang…

f445c2e

…52/NeMo into fix_simulator_silence

Merge branch 'main' into fix_simulator_silence

e040ccf

add missing comma

cfa60e6

Signed-off-by: stevehuang52 <[email protected]>

Merge branch 'fix_simulator_silence' of https://github.com/stevehuang…

4cf8d02

…52/NeMo into fix_simulator_silence

Merge branch 'main' into fix_simulator_silence

c7476dd

tango4j approved these changes Feb 10, 2023

stevehuang52 merged commit 63f6d44 into NVIDIA:main Feb 12, 2023
