
How to use speaker diarization without writing input to manifest and output to .rttm file ? #6271

Closed
dungnguyen98 opened this issue Mar 22, 2023 · 4 comments

@dungnguyen98

Thanks for your great repository. I have a question about speaker diarization.
I ran the inference steps from this tutorial: https://github.com/NVIDIA/NeMo/blob/main/tutorials/speaker_tasks/Speaker_Diarization_Inference.ipynb

According to this tutorial, the input audio is written to a manifest file and the output is saved in .rttm format under out_dir (the manifest file and out_dir are defined in diar_infer_telephonic.yaml). Is there a direct way to pass the input as audio metadata (path, duration, ...) and get the results back without writing the manifest and .rttm files?
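For context, the manifest the tutorial has you write is a JSON-lines file along these lines (the keys follow the tutorial notebook; the values here are only illustrative placeholders):

import json

# One JSON object per line, one line per audio file. Keys follow the
# Speaker_Diarization_Inference tutorial; most values are "unknown" placeholders.
meta = {
    "audio_filepath": "/path/to/audio.wav",
    "offset": 0,
    "duration": None,        # None: use the whole file
    "label": "infer",
    "text": "-",
    "num_speakers": None,    # None: let the diarizer estimate it
    "rttm_filepath": None,   # optional reference RTTM, used for scoring only
    "uem_filepath": None,
}
with open("input_manifest.json", "w") as f:
    f.write(json.dumps(meta) + "\n")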

@tango4j
Collaborator

tango4j commented Mar 23, 2023

You can perform speaker diarization without writing those manifest/RTTM files yourself by using the following feature:

#5945

Try following the example in the above PR.

@dungnguyen98
Author

Has this feature not been integrated into the NeMo library released via pip yet? I installed the newest version (1.16 at this point) and found that I cannot import NeuralDiarizer with "from nemo.collections.asr.models import NeuralDiarizer", although the main branch on GitHub can.

@dungnguyen98
Author

dungnguyen98 commented Mar 24, 2023

I have read the code of the NeuralDiarizer class, and the audio files still have to be written into a manifest file. When I run multiple processes, they overwrite this file, which leads to conflicts. I think I will need to customize the code for my use case with monkey patching :(
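One way to avoid the collision without monkey patching might be to give each process its own manifest and out_dir via the config-driven entry point used by NeMo's offline diarization scripts. A rough sketch, assuming the diarizer.manifest_filepath and diarizer.out_dir keys from diar_infer_telephonic.yaml (the model paths in that YAML still need to be filled in, and details may differ across NeMo versions):

import json
import os
import tempfile

from omegaconf import OmegaConf
from nemo.collections.asr.models.msdd_models import NeuralDiarizer

def diarize_in_private_dir(audio_filepath, cfg_path="diar_infer_telephonic.yaml"):
    # Hypothetical helper: each worker process gets its own manifest and out_dir,
    # so parallel runs no longer overwrite each other's files.
    work_dir = tempfile.mkdtemp(prefix="diar_")
    manifest_path = os.path.join(work_dir, "manifest.json")
    entry = {"audio_filepath": audio_filepath, "offset": 0, "duration": None,
             "label": "infer", "text": "-", "num_speakers": None,
             "rttm_filepath": None, "uem_filepath": None}
    with open(manifest_path, "w") as f:
        f.write(json.dumps(entry) + "\n")

    cfg = OmegaConf.load(cfg_path)
    cfg.diarizer.manifest_filepath = manifest_path  # per-process manifest
    cfg.diarizer.out_dir = work_dir                 # per-process output directory
    NeuralDiarizer(cfg=cfg).diarize()
    return work_dir  # predicted RTTMs end up under <work_dir>/pred_rttms/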

@magicse

magicse commented Nov 21, 2023

@dungnguyen98

import os
import torch
import pandas as pd
from nemo.collections.asr.models.msdd_models import NeuralDiarizer

device = "cuda" if torch.cuda.is_available() else "cpu"
print(device)

# Cache the pretrained model files in a local directory
model_path = "diarization_model"
os.environ["NEMO_CACHE_DIR"] = model_path

# Load the telephonic MSDD model and diarize a single file directly,
# without preparing a manifest or reading back an .rttm file yourself
test_model = NeuralDiarizer.from_pretrained("diar_msdd_telephonic").to(device)
annotation = test_model("denoised_vocals.wav", num_workers=0, batch_size=16)
rttm = annotation.to_rttm()

# Parse the RTTM string into a dataframe, merging consecutive segments of the
# same speaker (RTTM fields: start time at index 3, duration at 4, speaker at 7)
df = pd.DataFrame(columns=['start_time', 'end_time', 'speaker', 'text'])
lines = rttm.splitlines()
if len(lines) == 0:
    df.loc[0] = 0, 0, 'No speaker found', ''
else:
    split = lines[0].split()
    start_time, duration, prev_speaker = float(split[3]), float(split[4]), split[7]
    end_time = start_time + duration
    df.loc[0] = start_time, end_time, prev_speaker, ''

    for line in lines[1:]:
        split = line.split()
        start_time, duration, cur_speaker = float(split[3]), float(split[4]), split[7]
        end_time = start_time + duration
        if cur_speaker == prev_speaker:
            # Same speaker continues: extend the last row
            df.loc[df.index[-1], 'end_time'] = end_time
        else:
            # Speaker changed: start a new row
            df.loc[len(df)] = start_time, end_time, cur_speaker, ''
        prev_speaker = cur_speaker

print(df.to_string(index=False))
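If the returned annotation is a pyannote.core.Annotation (which to_rttm() suggests), you can also skip the RTTM string parsing and iterate the segments directly. A minimal sketch under that assumption:

import pandas as pd

# Assumes `annotation` above is a pyannote.core.Annotation, as to_rttm() suggests
rows = []
for segment, _, speaker in annotation.itertracks(yield_label=True):
    if rows and rows[-1]["speaker"] == speaker:
        # Same speaker continues: extend the previous segment
        rows[-1]["end_time"] = segment.end
    else:
        rows.append({"start_time": segment.start, "end_time": segment.end,
                     "speaker": speaker, "text": ""})

df2 = pd.DataFrame(rows, columns=["start_time", "end_time", "speaker", "text"])
print(df2.to_string(index=False))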
