@nsepehr nsepehr commented Sep 29, 2025

Summary

This PR adds support for the max_speakers parameter to the Speechmatics STT plugin, allowing developers to limit the number of unique speakers detected during diarization.

Problem

Currently, when using the Speechmatics STT plugin with diarization enabled, there's no way to specify the maximum number of speakers. The transcription_config parameter (which is deprecated) accepts a speaker_diarization_config with max_speakers, but this value is not preserved when the plugin processes the configuration.

Solution

  • Added max_speakers as a direct parameter to the STT __init__ method
  • Updated the STTOptions dataclass to include the max_speakers field
  • Modified _process_config to include max_speakers in the speaker_diarization_config when sending to the Speechmatics API
  • Added proper handling for extracting max_speakers from the deprecated transcription_config parameter for backward compatibility
  • Updated documentation to explain the new parameter
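
The flow described in the list above can be sketched roughly as follows. This is a minimal, self-contained illustration, not the plugin's actual code: `STTOptionsSketch`, `resolve_max_speakers`, and `build_diarization_config` are hypothetical names, and representing the deprecated `transcription_config` as a plain dict is an assumption made to keep the example runnable.

```python
from dataclasses import dataclass
from typing import Any, Dict, Optional


@dataclass
class STTOptionsSketch:
    """Hypothetical stand-in for the plugin's STTOptions dataclass."""

    enable_diarization: bool = False
    max_speakers: Optional[int] = None


def resolve_max_speakers(
    explicit: Optional[int], deprecated_config: Optional[Dict[str, Any]]
) -> Optional[int]:
    """Prefer the direct parameter; otherwise fall back to the deprecated config."""
    if explicit is not None:
        return explicit
    if deprecated_config:
        # Assumed layout: {"speaker_diarization_config": {"max_speakers": N}}
        diarization = deprecated_config.get("speaker_diarization_config") or {}
        return diarization.get("max_speakers")
    return None


def build_diarization_config(opts: STTOptionsSketch) -> Dict[str, Any]:
    """Include max_speakers only when diarization is enabled and a value is set."""
    config: Dict[str, Any] = {}
    if opts.enable_diarization and opts.max_speakers is not None:
        config["max_speakers"] = opts.max_speakers
    return config
```

With this shape, a value supplied through the deprecated path survives processing instead of being dropped, which is the behaviour the Problem section describes as missing.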

Use Case

This parameter is particularly useful for scenarios where the number of participants is known in advance, such as:

  • Two-person interviews or conversations
  • Small group discussions with a fixed number of participants
  • Customer service calls (agent and customer)
  • Educational settings with known speaker counts

Testing

  • Tested locally with a multi-speaker agent implementation
  • Verified that the parameter is correctly passed to the Speechmatics API configuration
  • Confirmed backward compatibility with the deprecated transcription_config parameter

Example Usage

from livekit.plugins import speechmatics

stt = speechmatics.STT(
    language="en",
    enable_diarization=True,
    max_speakers=2,  # Limit diarization to at most 2 unique speakers
    diarization_sensitivity=0.5,
    speaker_active_format="@[{speaker_id}]: {text}",
)

Breaking Changes

None - this is a backward-compatible addition.

CLAassistant commented Sep 29, 2025

CLA assistant check
All committers have signed the CLA.

@nsepehr nsepehr force-pushed the feat/speechmatics-max-speakers branch 2 times, most recently from a3a7974 to c395ae5 Compare September 29, 2025 06:18
@nsepehr nsepehr force-pushed the feat/speechmatics-max-speakers branch from c395ae5 to 400f516 Compare September 30, 2025 05:25
- Added max_speakers parameter to STT __init__ method
- Updated STTOptions dataclass to include max_speakers field
- Modified _process_config to include max_speakers in speaker_diarization_config
- Added handling for extracting max_speakers from deprecated transcription_config
- Updated documentation to explain the new parameter
- Fixed compatibility with livekit-agents 1.2.6 (removed diarization from STTCapabilities)
- Updated minimum livekit-agents version to 1.2.6

This parameter allows limiting the number of unique speakers detected during
diarization, which is useful for scenarios with a known number of participants
(e.g., 2-person interviews, small group meetings with fixed participants).
@nsepehr nsepehr force-pushed the feat/speechmatics-max-speakers branch from 400f516 to 8400a67 Compare September 30, 2025 05:32
nsepehr and others added 3 commits October 1, 2025 17:29
Refactored _process_config to build all configuration parameters
upfront and pass them to TranscriptionConfig constructor, rather
than creating an instance and mutating it afterward.

This addresses review feedback to avoid assigning dict values
directly to dataclass fields and instead use proper dataclass
initialization patterns.

Also applied ruff formatting fixes.
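
As a rough illustration of the refactor this commit message describes, the sketch below collects the arguments first and passes them to the constructor in one call, rather than creating an instance and assigning fields afterward. `TranscriptionConfigSketch` is a hypothetical stand-in, not the Speechmatics SDK's `TranscriptionConfig`, and its fields and the `"speaker"` value are assumptions chosen for the example.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, Optional


@dataclass
class TranscriptionConfigSketch:
    """Hypothetical stand-in for the SDK's TranscriptionConfig."""

    language: str = "en"
    diarization: Optional[str] = None
    speaker_diarization_config: Dict[str, Any] = field(default_factory=dict)


def process_config(
    language: str, enable_diarization: bool, max_speakers: Optional[int]
) -> TranscriptionConfigSketch:
    # Build every constructor argument upfront.
    kwargs: Dict[str, Any] = {"language": language}
    if enable_diarization:
        kwargs["diarization"] = "speaker"  # assumed value for this sketch
        diarization_config: Dict[str, Any] = {}
        if max_speakers is not None:
            diarization_config["max_speakers"] = max_speakers
        kwargs["speaker_diarization_config"] = diarization_config
    # Single construction step; no mutation after the instance exists.
    return TranscriptionConfigSketch(**kwargs)
```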
@nsepehr nsepehr force-pushed the feat/speechmatics-max-speakers branch 3 times, most recently from 5b7e278 to da4d474 Compare October 8, 2025 03:47
- Simplify TranscriptionConfig initialization to use direct mutation
- Add mypy configuration for speechmatics module
- Fix type incompatibility issues with deprecated parameters
- Add type: ignore comments for untyped imports and decorators
- Remove unnecessary type: ignore comments where types are properly handled
@nsepehr nsepehr force-pushed the feat/speechmatics-max-speakers branch 2 times, most recently from 175a1db to 92433d0 Compare October 8, 2025 04:16
Incorporates essential type checking improvements while maintaining max_speakers:
- Import and use SpeakerDiarizationConfig dataclass instead of dict
- Fix additional_vocab to use dict format as per type annotation
- Improve handling of deprecated transcription_config parameter
- Add proper type conversion for AudioEncoding
- Simplify import statement in utils.py
- Apply ruff formatting
@nsepehr nsepehr force-pushed the feat/speechmatics-max-speakers branch from 92433d0 to fa814f6 Compare October 8, 2025 04:24
Contributor Author

@nsepehr nsepehr left a comment

Updated with requested changes and merged with #3599

speaker_sensitivity=self._stt_options.diarization_sensitivity,
prefer_current_speaker=self._stt_options.prefer_current_speaker,
# TODO: speakers field is not supported by SpeakerDiarizationConfig yet
# speakers={s.label: s.speaker_identifiers for s in self._stt_options.known_speakers},
Contributor

Can you use the dict for now, with a type: ignore to skip the type check?

Contributor Author

The SpeakerDiarizationConfig IS a dataclass provided by the Speechmatics SDK. We're using it correctly here. Whether you pass the dataclass or a dict, they serialize to the same JSON structure. The dataclass approach is cleaner and type-safe. I don't think we should change that.

Contributor

I think we should keep support for the speakers option by using a dict for now.

Contributor Author

OK, updated it to use a dict. Can you review it now, please?
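
The trade-off discussed in this thread can be illustrated with a small sketch. `DiarizationConfigSketch` is a hypothetical stand-in for the SDK's `SpeakerDiarizationConfig`, and its fields are assumptions. The point is only that a dataclass and a dict with the same keys serialize to the same JSON, while a dict can also carry a key (here `speakers`, shaped like the commented-out line above) that the dataclass does not declare.

```python
import json
from dataclasses import asdict, dataclass
from typing import Optional


@dataclass
class DiarizationConfigSketch:
    """Hypothetical stand-in for the SDK dataclass; it has no `speakers` field."""

    max_speakers: Optional[int] = None
    speaker_sensitivity: Optional[float] = None


typed = DiarizationConfigSketch(max_speakers=2, speaker_sensitivity=0.5)
as_dict = {"max_speakers": 2, "speaker_sensitivity": 0.5}

# Same keys and values, so both forms produce identical JSON.
same_json = json.dumps(asdict(typed), sort_keys=True) == json.dumps(
    as_dict, sort_keys=True
)

# A dict can carry a field the dataclass does not declare yet.
with_speakers = {**as_dict, "speakers": {"Alice": ["id-1"]}}

# The dataclass rejects the undeclared field at construction time.
try:
    DiarizationConfigSketch(max_speakers=2, speakers={"Alice": ["id-1"]})  # type: ignore[call-arg]
    dataclass_accepts_speakers = True
except TypeError:
    dataclass_accepts_speakers = False
```

This is why the dict form, with a type: ignore, keeps the `speakers` option usable until the typed config supports it.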

The Speechmatics API expects additional_vocab as a list of objects with
'content' and 'sounds_like' fields, not a dict. Updated to match API
requirements with type: ignore since SDK type annotation is misleading.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <[email protected]>
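
A minimal sketch of the list-of-objects shape this commit message describes. The `content` and `sounds_like` field names come from the message; `VocabEntry` and `to_additional_vocab` are hypothetical names, and omitting `sounds_like` when it is empty is an assumption rather than confirmed API behaviour.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, List


@dataclass
class VocabEntry:
    """Hypothetical container for one custom vocabulary item."""

    content: str
    sounds_like: List[str] = field(default_factory=list)


def to_additional_vocab(entries: List[VocabEntry]) -> List[Dict[str, Any]]:
    """Convert entries into a list of objects, one per vocabulary item."""
    payload: List[Dict[str, Any]] = []
    for entry in entries:
        item: Dict[str, Any] = {"content": entry.content}
        if entry.sounds_like:
            # Only include pronunciations when some were provided (assumption).
            item["sounds_like"] = list(entry.sounds_like)
        payload.append(item)
    return payload
```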
Contributor Author

@nsepehr nsepehr left a comment

Updated additional_vocab to match the format the API expects.

Contributor Author

nsepehr commented Oct 10, 2025

@longcw can you please review this so it can be landed for the next release 🙏

- Replace typed SpeakerDiarizationConfig with dict + type: ignore
- Add support for the speakers field from known_speakers
- Remove unused SpeakerDiarizationConfig import
- Maintain backward compatibility while allowing API evolution
@davidzhao davidzhao merged commit bf30045 into livekit:main Oct 12, 2025
9 checks passed
akshaym1shra pushed a commit to akshaym1shra/agents that referenced this pull request Oct 28, 2025
meetakshay99 added a commit to meetakshay99/agents that referenced this pull request Oct 30, 2025
* staging: (114 commits)
  Added min_confidence_threshold for deepgram flux.
  livekit-agents 1.2.15 (livekit#3658)
  fix livekit#3650 cartesia version backward compatibility (livekit#3651)
  Unprompted STT Reconnection at startup (livekit#3649)
  enable zero retention mode in elevenlabs (livekit#3647)
  fix: heartbeat (livekit#3648)
  feat: Integrate streaming endpoints for Sarvam APIs (livekit#3498)
  turn_detection: reduce max_endpointing_delay to 3s (livekit#3640)
  fix: exclude temperature parameter for gpt-5 and similar models (livekit#3573)
  add backwards compatibility for google's realtime model (livekit#3630)
  Align Google STT plugin with official documentation (livekit#3628)
  feat(speechmatics): add max_speakers parameter for speaker diarization (livekit#3524)
  fix(deepgram): send CloseStream message before closing TTS WebSocket (livekit#3608)
  chore: Remove duplicate docstring for `preemptive_generation` parameter in AgentSession (livekit#3624)
  Add RTZR(ReturnZero) STT Plugin for LiveKit Agents (livekit#3376)
  feat(telemetry/utils): add ttft reporting to LangFuse (livekit#3594)
  catch delete_room errors and disable delete_room_on_close by default (livekit#3600)
  lift google realtime api out of beta (livekit#3614)
  fix: lock pyav to <16 due to build issue (livekit#3593)
  Updating Cartesia Version (livekit#3570)
  ...
akshaym1shra pushed a commit to akshaym1shra/agents that referenced this pull request Nov 3, 2025
4 participants