Long-form audio speaker diarization OOM in clustering #7912

remenberl · 2023-11-20T02:24:34Z

Hi,

Thanks for the recent development of long-form audio speaker diarization in #7737. I recently ran it on a 4-hour-long audio file and observed an OOM on RAM (not VRAM).

Steps/Code to reproduce bug
It can be reproduced using this audio file: https://podwise-hh.s3.us-west-1.amazonaws.com/0ef8d4f5beb504fae9f12272a83db030ab29c92f4e033eb86bfefe3d1668c7cf.m4a

My telephonic config file is just the default, with the clustering/msdd part pasted below:

  clustering:
    parameters:
      oracle_num_speakers: False
      max_num_speakers: 8
      enhanced_count_thres: 80
      max_rp_threshold: 0.25
      sparse_search_volume: 30
      maj_vote_spk_count: False 
      chunk_cluster_count: 50
      embeddings_per_chunk: 10000
  msdd_model:
    model_path: diar_msdd_telephonic
    parameters:
      use_speaker_model_from_ckpt: True 
      infer_batch_size: 25
      sigmoid_threshold: [0.7] 
      seq_eval_mode: False
      split_infer: True
      diar_window_length: 50
      overlap_infer_spk_limit: 5

Expected behavior
The job stops after my screen prints the last iteration of "Extracting embeddings for Diarization", and the program's memory usage quickly climbs to nearly 64GB, up from under 20GB in the previous steps. FYI, here are the last lines printed before the process was killed.

[NeMo I 2023-11-19 20:54:29 clustering_diarizer:343] Extracting embeddings for Diarization
[NeMo I 2023-11-19 20:54:29 collections:445] Filtered duration for loading collection is  0.00 hours.
[NeMo I 2023-11-19 20:54:29 collections:446] Dataset loaded with 52949 items, total duration of  7.25 hours.
[NeMo I 2023-11-19 20:54:29 collections:448] # 52949 files loaded accounting to # 1 labels
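For a sense of scale, here is a rough back-of-the-envelope sketch of how much host RAM a single dense pairwise affinity matrix would need. It rests on assumptions that are not confirmed anywhere in this thread: that clustering builds a dense N x N float32 matrix, and that the 52949 items in the log are the segments being clustered. The helper name `dense_matrix_gib` is made up for illustration, and the sketch does not identify where the memory actually went.

```python
def dense_matrix_gib(n: int, bytes_per_value: int = 4) -> float:
    """Size in GiB of a dense n x n matrix (float32 by default)."""
    return n * n * bytes_per_value / 1024**3


if __name__ == "__main__":
    # Assumption: all 52949 logged items are compared against each other at once.
    print(f"full matrix: {dense_matrix_gib(52949):.2f} GiB")
    # Assumption: one chunk of embeddings_per_chunk = 10000 from the config above.
    print(f"one chunk:   {dense_matrix_gib(10000):.2f} GiB")
```

Under these assumptions a full matrix is about 10.4 GiB per copy while one 10000-embedding chunk is about 0.37 GiB, which is why it may be worth checking whether the chunked path is actually being taken.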

Environment overview (please complete the following information)

Environment location: Ubuntu 22.04 with 64GB RAM, 4090 GPU
Method of NeMo install: python -m pip install git+https://github.com/NVIDIA/NeMo.git@main#egg=nemo_toolkit[asr]

Environment details

If NVIDIA docker image is used you don't need to specify these.
Otherwise, please provide:

OS version: Ubuntu 22.04
PyTorch version: '2.0.1+cu117'
Python version: 3.10



nithinraok · 2023-11-21T06:19:28Z

Thanks for raising this issue. Could you please attach the full log here?

remenberl · 2023-11-21T16:01:59Z

Attached the log from running NeuralDiarizer.

nemo.log

erikqu · 2023-12-05T18:58:43Z

I'm also having this issue: it hangs at this step, with similar settings to the OP. Versions: NeMo 1.21.0, Python 3.10.

[NeMo I 2023-12-05 18:46:27 collections:302] Dataset loaded with 43 items, total duration of  0.60 hours.
[NeMo I 2023-12-05 18:46:27 collections:304] # 43 files loaded accounting to # 1 labels
vad: 100%|██████████| 43/43 [00:07<00:00,  6.06it/s]
[NeMo I 2023-12-05 18:46:34 clustering_diarizer:250] Generating predictions with overlapping input segments

tango4j · 2023-12-06T06:05:23Z

@remenberl
Thank you for uploading the samples. I will test it and get back to you.
Looking at the log you shared, I suppose 64GB of RAM is not enough to handle 4 hours of diarization in an offline manner.
I will confirm the RAM requirement for this sample after running it myself.
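
One way to check the host RAM requirement of a run is to read the process's peak resident set size once the workload finishes. The following is a minimal sketch using only the Python standard library on a Unix system; `peak_rss_gib` and `run_and_report` are hypothetical helper names, and the workload shown is a placeholder standing in for the actual diarization call.

```python
import resource
import sys


def peak_rss_gib() -> float:
    """Return the peak resident set size of this process in GiB."""
    peak = resource.getrusage(resource.RUSAGE_SELF).ru_maxrss
    # ru_maxrss is reported in kilobytes on Linux and in bytes on macOS.
    peak_bytes = peak if sys.platform == "darwin" else peak * 1024
    return peak_bytes / 1024**3


def run_and_report(workload) -> float:
    """Run workload() and return the process's peak RSS afterwards, in GiB."""
    workload()
    return peak_rss_gib()


if __name__ == "__main__":
    # Placeholder workload; the real call would be the diarization run.
    peak = run_and_report(lambda: sum(range(1_000_000)))
    print(f"peak host RAM so far: {peak:.2f} GiB")
```

The figure is only printed if the workload returns, so this is mainly useful on a machine with more RAM or on a shorter clip. It also covers only the current process, not any separate data-loading worker processes.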

erikqu · 2023-12-13T02:06:52Z

This worked for me after building from source and setting num_workers to 0.

github-actions · 2024-01-13T01:45:49Z

This issue is stale because it has been open for 30 days with no activity. Remove stale label or comment or this will be closed in 7 days.

github-actions · 2024-01-21T01:49:46Z

This issue was closed because it has been inactive for 7 days since being marked as stale.

remenberl added the bug label Nov 20, 2023

nithinraok assigned tango4j Nov 21, 2023

github-actions bot added the stale label Jan 13, 2024

github-actions bot closed this as not planned Jan 21, 2024
