Description
Which SDK is this feature request for?
- speechmatics-rt (Real-Time SDK)
- speechmatics-batch (Batch SDK)
- Both SDKs
- General/Repository
Feature Request: Add support for known speakers in SpeakerDiarizationConfig
Summary
Request to add support for known speaker identification/enrollment in the Python SDK's SpeakerDiarizationConfig to enable persistent speaker identification across sessions.
Context
Currently, the SpeakerDiarizationConfig only supports:
- max_speakers
- speaker_sensitivity
- prefer_current_speaker
However, the API documentation mentions SpeakersResult as a preview feature, and the SDK code contains references to GET_SPEAKERS and SPEAKERS_RESULT message types (marked as "Internal, Speechmatics only").
Use Case
We're building voice AI applications where identifying specific speakers across sessions is critical, such as:
- Meeting transcription: Identifying recurring participants without having to re-match their voices to speaker labels each time
Currently, every time the same speakers in our system join a meeting, we have to match their identities to the diarized speaker labels again. Repeating this for every session is tedious and makes for a bad user experience. Being able to identify known speakers beforehand would be incredibly useful.
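To make the pain point concrete, here is a rough sketch of the per-session step we repeat today. The labels "S1" and "S2", the `relabel` helper, and the segment tuples are illustrative assumptions, not SDK types; the point is only that the label-to-name mapping has to be rebuilt for each session because the generic labels are not tied to a person.

```python
from typing import Dict, List, Tuple

# (speaker_label, text) pairs; a simplified stand-in for diarized transcript output.
Segment = Tuple[str, str]


def relabel(segments: List[Segment], label_to_name: Dict[str, str]) -> List[Segment]:
    """Replace generic diarization labels with names; unknown labels pass through."""
    return [(label_to_name.get(label, label), text) for label, text in segments]


# This mapping has to be re-established manually for every new session.
session_mapping = {"S1": "John", "S2": "Jane"}

segments = [("S1", "Hello everyone."), ("S2", "Hi John."), ("S3", "Morning.")]
named_segments = relabel(segments, session_mapping)
```

With known speaker support, the mapping step would ideally happen once at enrollment rather than once per meeting.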
Proposed Solution
Add a speakers field to SpeakerDiarizationConfig to support known speaker enrollment:
from dataclasses import dataclass
from typing import Dict, List, Optional

@dataclass
class SpeakerDiarizationConfig:
    max_speakers: Optional[int] = None
    speaker_sensitivity: Optional[float] = None
    prefer_current_speaker: Optional[bool] = None
    speakers: Optional[Dict[str, List[str]]] = None  # New field for known speakers

# Usage example:
config = SpeakerDiarizationConfig(
    max_speakers=2,
    speaker_sensitivity=0.5,
    speakers={
        "John": ["speaker_id_john_123"],  # Speaker name -> identifiers
        "Jane": ["speaker_id_jane_456"],
    },
)

Current Workaround
We tested whether the API would accept a speakers field even though it's not in the SDK:
config.speaker_diarization_config = {
    "speakers": {
        "John": ["speaker_id_john_123"],
        "Jane": ["speaker_id_jane_456"],
    }
}

But the API rejects it with:
Error: Additional property speakers is not allowed
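For reference, the payload shape we have in mind could be sketched as below. This is a self-contained stand-in, not the SDK's actual class: `ProposedSpeakerDiarizationConfig` and `to_payload` are hypothetical names, and the `speakers` field is the proposed addition that the API does not accept today.

```python
from dataclasses import asdict, dataclass
from typing import Dict, List, Optional


@dataclass
class ProposedSpeakerDiarizationConfig:
    """Hypothetical stand-in for the SDK config with the proposed field added."""

    max_speakers: Optional[int] = None
    speaker_sensitivity: Optional[float] = None
    prefer_current_speaker: Optional[bool] = None
    # Proposed field: speaker name -> list of speaker identifiers.
    speakers: Optional[Dict[str, List[str]]] = None

    def to_payload(self) -> dict:
        """Return only the fields that were explicitly set (hypothetical helper)."""
        return {key: value for key, value in asdict(self).items() if value is not None}


config = ProposedSpeakerDiarizationConfig(
    max_speakers=2,
    speakers={
        "John": ["speaker_id_john_123"],
        "Jane": ["speaker_id_jane_456"],
    },
)
payload = config.to_payload()
```

Omitting unset fields keeps the request identical to today's for anyone who does not use `speakers`, so the change would be backward compatible on the SDK side.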
Questions
- Is the SpeakersResult preview feature available for early access?
- Is there a timeline for when known speaker support will be added to the public API?
- Would you accept a PR to add this functionality to the SDK once the API supports it?
Related
- LiveKit agents integration would benefit from this: livekit/agents#3524
- Similar feature in competitors: AWS Transcribe's Speaker Enrollment, Azure Speech's Speaker Recognition
Environment
- speechmatics-rt version: 0.4.0
- Python version: 3.11
- Use case: Real-time transcription with speaker identification
Would love to hear if this is on the roadmap or if there's an alternative approach we should consider!
Priority/Impact
How important is this feature to you?
- Critical - blocking current work
- High - would significantly improve workflow
- Medium - nice to have improvement
- Low - minor enhancement