Major Misidentification Issues in Diarization #248

Open
andonagio opened this issue Sep 26, 2024 · 6 comments

Comments

@andonagio

andonagio commented Sep 26, 2024

I recently tried to transcribe and diarize a meeting with six people. The transcription is spot-on, but there are a lot of misidentification issues in the diarization. For instance, this is two different people talking (a man and a woman):

    {
      "speaker": "SPEAKER_00",
      "timestamp": [
        42.22,
        48.54
      ],
      "text": " How are you?."
    },
    {
      "speaker": "SPEAKER_00",
      "timestamp": [
        50.26,
        51.64
      ],
      "text": " I'm fine. How are you?"
    },

And this is the same person talking:

    {
      "speaker": "SPEAKER_07",
      "timestamp": [
        52.08,
        53.3
      ],
      "text": "So let's get started."
    },
    {
      "speaker": "SPEAKER_04",
      "timestamp": [
        53.3,
        62.48
      ],
      "text": " Welcome to our meeting."
    },
    {
      "speaker": "SPEAKER_00",
      "timestamp": [
        62.96,
        64.36
      ],
      "text": " Here's the first item on our agenda."
    },

I'm not clear on whether this is a pyannote problem, an issue with my recording (made in a conference room on an iPhone), or a limitation of current diarization models in general. I get the same problem even if I specify the number of speakers in the command line.

Is there a way to improve the diarization? I'd welcome any insight the community has. Thanks!
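One quick sanity check on output like the excerpts above is to tally the speaker labels. The following is a minimal sketch, not part of insanely-fast-whisper or pyannote: the function names `summarize_speakers` and `min_clusters_implied` are hypothetical, and the lower-bound logic assumes labels are zero-based and assigned contiguously (SPEAKER_00, SPEAKER_01, ...), which is an assumption rather than something confirmed in this thread.

```python
from collections import defaultdict


def summarize_speakers(chunks):
    """Count turns and total speaking time per speaker label."""
    summary = defaultdict(lambda: {"turns": 0, "seconds": 0.0})
    for chunk in chunks:
        start, end = chunk["timestamp"]
        entry = summary[chunk["speaker"]]
        entry["turns"] += 1
        entry["seconds"] += end - start
    return dict(summary)


def min_clusters_implied(chunks):
    """Lower bound on the number of clusters, taken from the highest label index.

    Assumes labels look like SPEAKER_07 and are zero-based and contiguous.
    """
    highest = max(int(chunk["speaker"].rsplit("_", 1)[1]) for chunk in chunks)
    return highest + 1


# The five chunks quoted in the report above.
chunks = [
    {"speaker": "SPEAKER_00", "timestamp": [42.22, 48.54], "text": " How are you?."},
    {"speaker": "SPEAKER_00", "timestamp": [50.26, 51.64], "text": " I'm fine. How are you?"},
    {"speaker": "SPEAKER_07", "timestamp": [52.08, 53.3], "text": "So let's get started."},
    {"speaker": "SPEAKER_04", "timestamp": [53.3, 62.48], "text": " Welcome to our meeting."},
    {"speaker": "SPEAKER_00", "timestamp": [62.96, 64.36], "text": " Here's the first item on our agenda."},
]

for label, stats in sorted(summarize_speakers(chunks).items()):
    print(f"{label}: {stats['turns']} turn(s), {stats['seconds']:.2f} s")

print("clusters implied by labels:", min_clusters_implied(chunks))  # 8
```

Under that assumption, a label as high as SPEAKER_07 points to at least eight clusters for a six-person meeting. If labels that high still appear when the speaker count is passed on the command line, it may be worth checking whether the count is actually reaching the diarization step.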

@timlac

timlac commented Sep 26, 2024

I'm having the same experience. I recently tried the diarization on a file with two speakers (in Swedish): the transcription works great, but the diarization results are very poor.

@anujbohra23

Could you share the code you used for the diarization process?

@NicolasDrapier

Same issue here. I tried downgrading the pyannote model, but the results were no better.

@dirkdiggler41

The problem is pyannote itself. Could you try running pyannote on its own to see whether you get a different result? In my experience, it did not give a better result.
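For anyone who wants to try pyannote on its own, here is a minimal sketch. It assumes pyannote.audio 3.x (where the token keyword is `use_auth_token`; other versions may differ), the `pyannote/speaker-diarization-3.1` checkpoint, and a Hugging Face token that has accepted the model's terms. The function names `diarize_standalone` and `turns_to_chunks` are hypothetical helpers, not part of either library.

```python
def turns_to_chunks(turns):
    """Convert (start, end, label) tuples into the JSON shape used in this thread."""
    return [
        {"speaker": label, "timestamp": [round(start, 2), round(end, 2)]}
        for start, end, label in turns
    ]


def diarize_standalone(audio_path, hf_token, num_speakers=None):
    """Run pyannote directly on an audio file, without any transcription step."""
    # Imported lazily so the helper above can be used without pyannote installed.
    from pyannote.audio import Pipeline

    pipeline = Pipeline.from_pretrained(
        "pyannote/speaker-diarization-3.1",  # assumed checkpoint
        use_auth_token=hf_token,  # keyword name in pyannote.audio 3.x
    )
    # num_speakers=None lets the pipeline estimate the speaker count itself.
    diarization = pipeline(audio_path, num_speakers=num_speakers)
    turns = [
        (turn.start, turn.end, speaker)
        for turn, _, speaker in diarization.itertracks(yield_label=True)
    ]
    return turns_to_chunks(turns)
```

Calling `diarize_standalone("file.wav", hf_token, num_speakers=6)` and comparing the returned labels with the ones in the transcription output should show whether the misidentification already happens inside pyannote or only after the speaker turns are merged with the transcript.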

@timlac

timlac commented Oct 11, 2024

Does anyone know of any good alternatives for diarization that play nicely with insanely-fast-whisper?

@tm-robinson

tm-robinson commented Nov 11, 2024

I have the same issue. During segmentation I also get this message:

    utils/diarize.py:55: UserWarning: The given NumPy array is not writable, and PyTorch does not support non-writable tensors. This means writing to this tensor will result in undefined behaviour.

I'm not sure whether it's related.

This is on macOS and seems to consistently happen across various recordings with 3-4 speakers. The command I used was:

    insanely-fast-whisper --file-name file.wav --language en --device-id mps --batch-size 4 --hf-token HFTOKEN

The code in https://github.com/MahmoudAshraf97/whisper-diarization is able to successfully diarize the same file; however, it is much slower because it runs on the CPU rather than the GPU on my M1 Mac.
