feat(speechmatics): add max_speakers parameter for speaker diarization #3524
a3a7974 to
c395ae5
c395ae5 to
400f516
- Added max_speakers parameter to STT __init__ method
- Updated STTOptions dataclass to include max_speakers field
- Modified _process_config to include max_speakers in speaker_diarization_config
- Added handling for extracting max_speakers from deprecated transcription_config
- Updated documentation to explain the new parameter
- Fixed compatibility with livekit-agents 1.2.6 (removed diarization from STTCapabilities)
- Updated minimum livekit-agents version to 1.2.6

This parameter allows limiting the number of unique speakers detected during diarization, which is useful for scenarios with a known number of participants (e.g., 2-person interviews, small group meetings with fixed participants).
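The flow described in this commit can be sketched roughly as follows. This is a minimal, self-contained illustration and not the plugin's actual code: `DiarizationOptions`, `resolve_max_speakers`, and `build_diarization_config` are hypothetical names, the deprecated `transcription_config` is modelled as a plain dict, and the rule that an explicit argument wins over the deprecated value is an assumption.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Any


@dataclass
class DiarizationOptions:
    """Hypothetical stand-in for the diarization fields of the plugin's STTOptions."""

    enable_diarization: bool = False
    diarization_sensitivity: float | None = None
    prefer_current_speaker: bool | None = None
    max_speakers: int | None = None


def resolve_max_speakers(
    max_speakers: int | None,
    deprecated_transcription_config: dict[str, Any] | None,
) -> int | None:
    """Return the explicit value if given, else the one from the deprecated config.

    Assumption: the explicit argument takes precedence over the deprecated one.
    """
    if max_speakers is not None:
        return max_speakers
    legacy = (deprecated_transcription_config or {}).get("speaker_diarization_config") or {}
    return legacy.get("max_speakers")


def build_diarization_config(opts: DiarizationOptions) -> dict[str, Any] | None:
    """Build the speaker_diarization_config payload, leaving out unset fields."""
    if not opts.enable_diarization:
        return None
    candidates = {
        "speaker_sensitivity": opts.diarization_sensitivity,
        "prefer_current_speaker": opts.prefer_current_speaker,
        "max_speakers": opts.max_speakers,
    }
    return {key: value for key, value in candidates.items() if value is not None}


# Usage: a caller still passing the deprecated config keeps its max_speakers value.
legacy_config = {"speaker_diarization_config": {"max_speakers": 2}}
options = DiarizationOptions(
    enable_diarization=True,
    max_speakers=resolve_max_speakers(None, legacy_config),
)
print(build_diarization_config(options))
```

Leaving unset fields out of the payload, rather than sending nulls, is a design choice of this sketch only.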
400f516 to
8400a67
…peechmatics/stt.py Co-authored-by: Long Chen <[email protected]>
Refactored _process_config to build all configuration parameters upfront and pass them to TranscriptionConfig constructor, rather than creating an instance and mutating it afterward. This addresses review feedback to avoid assigning dict values directly to dataclass fields and instead use proper dataclass initialization patterns. Also applied ruff formatting fixes.
5b7e278 to
da4d474
- Simplify TranscriptionConfig initialization to use direct mutation
- Add mypy configuration for speechmatics module
- Fix type incompatibility issues with deprecated parameters
- Add type: ignore comments for untyped imports and decorators
- Remove unnecessary type: ignore comments where types are properly handled
175a1db to
92433d0
Incorporates essential type checking improvements while maintaining max_speakers:
- Import and use SpeakerDiarizationConfig dataclass instead of dict
- Fix additional_vocab to use dict format as per type annotation
- Improve handling of deprecated transcription_config parameter
- Add proper type conversion for AudioEncoding
- Simplify import statement in utils.py
- Apply ruff formatting
92433d0 to
fa814f6
Updated with requested changes and merged with #3599
speaker_sensitivity=self._stt_options.diarization_sensitivity,
prefer_current_speaker=self._stt_options.prefer_current_speaker,
# TODO: speakers field is not supported by SpeakerDiarizationConfig yet
# speakers={s.label: s.speaker_identifiers for s in self._stt_options.known_speakers},
Can you use the dict for now, with a type: ignore to skip the type check?
The SpeakerDiarizationConfig IS a dataclass provided by the Speechmatics SDK. We're using it correctly here. Whether you pass the dataclass or a dict, they serialize to the same JSON structure. The dataclass approach is cleaner and type-safe. I don't think we should change that.
I think we should keep supporting the speakers option by using a dict for now.
OK, updated it to use a dict. Can you review it now, please?
The Speechmatics API expects additional_vocab as a list of objects with 'content' and 'sounds_like' fields, not a dict. Updated to match API requirements with type: ignore since SDK type annotation is misleading. 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude <[email protected]>
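As an illustration of the list-of-objects shape this commit describes, the sketch below converts vocabulary entries into that payload. It is a hedged example rather than the plugin's code: `VocabEntry` and `to_additional_vocab` are hypothetical names, and dropping `sounds_like` when it is empty is an assumption of this sketch.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import Any


@dataclass
class VocabEntry:
    """Hypothetical stand-in for a vocabulary item held by the plugin."""

    content: str
    sounds_like: list[str] = field(default_factory=list)


def to_additional_vocab(entries: list[VocabEntry]) -> list[dict[str, Any]]:
    """Convert entries to a list of objects with 'content' and 'sounds_like' fields."""
    payload: list[dict[str, Any]] = []
    for entry in entries:
        item: dict[str, Any] = {"content": entry.content}
        # Assumption: 'sounds_like' is only included when alternatives were provided.
        if entry.sounds_like:
            item["sounds_like"] = list(entry.sounds_like)
        payload.append(item)
    return payload


vocab = to_additional_vocab(
    [
        VocabEntry(content="LiveKit", sounds_like=["live kit"]),
        VocabEntry(content="diarization"),
    ]
)
print(vocab)
```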
Updated additional_vocab to match the format the API expects.
@longcw, can you please review this so it can land in the next release? 🙏
- Replace typed SpeakerDiarizationConfig with dict + type: ignore
- Add support for the speakers field from known_speakers
- Remove unused SpeakerDiarizationConfig import
- Maintain backward compatibility while allowing API evolution
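The dict-based approach from this commit might look roughly like the sketch below. The `label` and `speaker_identifiers` field names come from the commented-out line in the review thread; `KnownSpeaker` and `build_speaker_diarization_config` are hypothetical stand-ins, and the real plugin assigns the resulting dict to the SDK config with a `type: ignore` comment, which this standalone example does not need.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Any


@dataclass
class KnownSpeaker:
    """Hypothetical stand-in for an entry of the plugin's known_speakers option."""

    label: str
    speaker_identifiers: list[str]


def build_speaker_diarization_config(
    max_speakers: int | None,
    known_speakers: list[KnownSpeaker],
) -> dict[str, Any]:
    """Build a plain dict so fields missing from the typed SDK class can still be sent."""
    config: dict[str, Any] = {}
    if max_speakers is not None:
        config["max_speakers"] = max_speakers
    if known_speakers:
        # Same mapping as the comprehension quoted in the review thread.
        config["speakers"] = {s.label: s.speaker_identifiers for s in known_speakers}
    return config


diarization = build_speaker_diarization_config(
    max_speakers=2,
    known_speakers=[KnownSpeaker(label="Alice", speaker_identifiers=["id-1", "id-2"])],
)
print(diarization)
```

A plain dict trades static type checking for the ability to pass fields the SDK dataclass does not declare yet, which is the trade-off discussed in the review.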
livekit#3524) Co-authored-by: Long Chen <[email protected]> Co-authored-by: Claude <[email protected]>
* staging: (114 commits) Added min_confidence_threshold for deepgram flux. livekit-agents 1.2.15 (livekit#3658) fix livekit#3650 cartesia version backward compatibility (livekit#3651) Unprompted STT Reconnection at startup (livekit#3649) enable zero retention mode in elevenlabs (livekit#3647) fix: heartbeat (livekit#3648) feat: Integrate streaming endpoints for Sarvam APIs (livekit#3498) turn_detection: reduce max_endpointing_delay to 3s (livekit#3640) fix: exclude temperature parameter for gpt-5 and similar models (livekit#3573) add backwards compatibility for google's realtime model (livekit#3630) Align Google STT plugin with official documentation (livekit#3628) feat(speechmatics): add max_speakers parameter for speaker diarization (livekit#3524) fix(deepgram): send CloseStream message before closing TTS WebSocket (livekit#3608) chore: Remove duplicate docstring for `preemptive_generation` parameter in AgentSession (livekit#3624) Add RTZR(ReturnZero) STT Plugin for LiveKit Agents (livekit#3376) feat(telemetry/utils): add ttft reporting to LangFuse (livekit#3594) catch delete_room errors and disable delete_room_on_close by default (livekit#3600) lift google realtime api out of beta (livekit#3614) fix: lock pyav to <16 due to build issue (livekit#3593) Updating Cartesia Version (livekit#3570) ...
Summary
This PR adds support for the `max_speakers` parameter to the Speechmatics STT plugin, allowing developers to limit the number of unique speakers detected during diarization.
Problem
Currently, when using the Speechmatics STT plugin with diarization enabled, there's no way to specify the maximum number of speakers. The `transcription_config` parameter (which is deprecated) accepts a `speaker_diarization_config` with `max_speakers`, but this value is not preserved when the plugin processes the configuration.
Solution
- Added `max_speakers` as a direct parameter to the STT `__init__` method
- Updated the `STTOptions` dataclass to include the `max_speakers` field
- Modified `_process_config` to include `max_speakers` in the `speaker_diarization_config` when sending to the Speechmatics API
- Added handling for extracting `max_speakers` from the deprecated `transcription_config` parameter for backward compatibility
Use Case
This parameter is particularly useful for scenarios where the number of participants is known in advance, such as:
- 2-person interviews
- Small group meetings with fixed participants
Testing
`transcription_config` parameter
Example Usage
Breaking Changes
None - this is a backward-compatible addition.