Major Misidentification Issues in Diarization #248

andonagio · 2024-09-26T12:31:00Z

I recently tried to transcribe and diarize a meeting with six people. The transcription is spot-on, but there are a lot of misidentification issues in the diarization. For instance, this is two different people talking (a man and a woman):

    {
      "speaker": "SPEAKER_00",
      "timestamp": [
        42.22,
        48.54
      ],
      "text": " How are you?."
    },
    {
      "speaker": "SPEAKER_00",
      "timestamp": [
        50.26,
        51.64
      ],
      "text": " I'm fine. How are you?"
    },

And this is the same person talking:

    {
      "speaker": "SPEAKER_07",
      "timestamp": [
        52.08,
        53.3
      ],
      "text": "So let's get started."
    },
    {
      "speaker": "SPEAKER_04",
      "timestamp": [
        53.3,
        62.48
      ],
      "text": " Welcome to our meeting."
    },
    {
      "speaker": "SPEAKER_00",
      "timestamp": [
        62.96,
        64.36
      ],
      "text": " Here's the first item on our agenda."
    },

I'm not clear on whether this is a pyannote problem, an issue with my recording (made in a conference room on an iPhone), or a limitation of current diarization models in general. I get the same problem even if I specify the number of speakers on the command line.

Is there a way to improve the diarization? I'd welcome any insight the community has. Thanks!
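
One quick way to quantify the problem is to count the distinct speaker labels in the output and the time attributed to each. The sketch below is illustrative only: `summarize_speakers` is a hypothetical helper, the segments are copied from the excerpts above, and the "labels implied" figure assumes labels are numbered contiguously from `SPEAKER_00`, which the excerpts do not confirm.

```python
from collections import defaultdict


def summarize_speakers(segments):
    """Return {speaker_label: total_seconds} for a list of diarized segments."""
    totals = defaultdict(float)
    for seg in segments:
        start, end = seg["timestamp"]
        totals[seg["speaker"]] += end - start
    return dict(totals)


# Segments copied from the excerpts in this post.
segments = [
    {"speaker": "SPEAKER_00", "timestamp": [42.22, 48.54], "text": " How are you?."},
    {"speaker": "SPEAKER_00", "timestamp": [50.26, 51.64], "text": " I'm fine. How are you?"},
    {"speaker": "SPEAKER_07", "timestamp": [52.08, 53.3], "text": "So let's get started."},
    {"speaker": "SPEAKER_04", "timestamp": [53.3, 62.48], "text": " Welcome to our meeting."},
    {"speaker": "SPEAKER_00", "timestamp": [62.96, 64.36], "text": " Here's the first item on our agenda."},
]

totals = summarize_speakers(segments)

# Assumption: labels are zero-based and contiguous, so the highest index
# plus one is a lower bound on how many labels the diarizer produced.
highest_index = max(int(label.split("_")[1]) for label in totals)

for label in sorted(totals):
    print(f"{label}: {totals[label]:.2f} s")
print("labels implied:", highest_index + 1)
```

If that assumption holds, a `SPEAKER_07` label in a six-person meeting would mean the diarizer produced more labels than there were participants, which is worth checking against the full output file.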

timlac · 2024-09-26T16:16:29Z

I'm having the same experience. I recently tried the diarization on a file with two speakers (in Swedish): the transcription works great, but the diarization results are very poor.

anujbohra23 · 2024-10-01T06:07:49Z

Could you provide the code for the diarization process?

NicolasDrapier · 2024-10-01T09:01:02Z

Same issue here. I tried downgrading the pyannote model, but the results were no better.

dirkdiggler41 · 2024-10-04T20:20:23Z

The problem is pyannote itself. Could you try to run it by itself to see if you get a different result? From my experience, I did not get a better result.
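
For anyone who wants to try running pyannote by itself, here is a minimal sketch. It assumes pyannote.audio 3.x, the `pyannote/speaker-diarization-3.1` pipeline (the thread does not say which model was used), and a Hugging Face token with access to that model. The token argument name has varied between pyannote releases, so check the version you have installed. `format_turns` and `diarize_with_pyannote` are hypothetical helper names.

```python
def format_turns(turns):
    """Render (start, end, speaker) tuples as one line per turn."""
    return [f"{start:7.2f} {end:7.2f} {speaker}" for start, end, speaker in turns]


def diarize_with_pyannote(audio_path, hf_token, num_speakers=None):
    """Run a pretrained pyannote pipeline and return (start, end, speaker) tuples."""
    # Imported lazily so format_turns works without pyannote installed.
    from pyannote.audio import Pipeline

    # Assumption: the 3.1 pipeline and the 3.x `use_auth_token` argument.
    pipeline = Pipeline.from_pretrained(
        "pyannote/speaker-diarization-3.1", use_auth_token=hf_token
    )
    diarization = pipeline(audio_path, num_speakers=num_speakers)
    return [
        (turn.start, turn.end, speaker)
        for turn, _, speaker in diarization.itertracks(yield_label=True)
    ]


# Example call (placeholders, not run here):
# turns = diarize_with_pyannote("file.wav", "HFTOKEN", num_speakers=6)
# print("\n".join(format_turns(turns)))
```

Comparing these turns with the `speaker` fields in the insanely-fast-whisper output for the same file should show whether the mislabelling comes from the diarization itself or from how speakers are matched to transcript chunks.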

timlac · 2024-10-11T11:15:14Z

Does anyone know of any good alternatives for diarization that play nicely with insanely-fast-whisper?

tm-robinson · 2024-11-11T23:15:20Z

I have the same issue. During segmentation I also get this warning:

    utils/diarize.py:55: UserWarning: The given NumPy array is not writable, and PyTorch does not support non-writable tensors. This means writing to this tensor will result in undefined behaviour.

Not sure if it's related or not.

This is on macOS and seems to consistently happen across various recordings with 3-4 speakers. The command I used was:

    insanely-fast-whisper --file-name file.wav --language en --device-id mps --batch-size 4 --hf-token HFTOKEN

The code in https://github.com/MahmoudAshraf97/whisper-diarization is able to successfully diarize the same file, however it is much slower as it runs on the CPU rather than the GPU on my M1 Mac.
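
On the NumPy warning quoted above: PyTorch emits it when `torch.from_numpy` is given a read-only array, and it concerns later writes to the tensor rather than the values read from it. On its own it is therefore probably not the cause of the mislabelled speakers, though nothing in this thread confirms that. The sketch below shows one common way an array ends up read-only and how a copy makes it writable. It is an assumption that the array at `utils/diarize.py:55` is produced this way.

```python
import numpy as np

# Eight zero bytes standing in for decoded audio samples.
raw = bytes(8)

# np.frombuffer returns a view over the immutable bytes object,
# so the resulting array is read-only.
audio = np.frombuffer(raw, dtype=np.float32)
print("writeable before copy:", audio.flags.writeable)

# A copy owns its own memory and is writable; passing such a copy
# to torch.from_numpy would not trigger the warning.
writable_audio = audio.copy()
print("writeable after copy:", writable_audio.flags.writeable)

writable_audio[0] = 1.0  # allowed on the copy; the original is untouched
```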
