feat: groq stt / tts #2099

Merged
akshaydeo merged 1 commit into main from 03-16-feat_groq_stt___tts
Mar 17, 2026
Conversation

Member

@sammaji sammaji commented Mar 16, 2026

Summary

Adds speech synthesis and transcription support to the Groq provider.

Changes

  • Implemented Speech() method in Groq provider using OpenAI speech request handler
  • Implemented Transcription() method in Groq provider using OpenAI transcription request handler
  • Added Groq-specific voice mappings (troy, autumn, diana) in test utilities
  • Updated GetProviderDefaultFormat() to return "wav" format for Groq provider
  • Added provider-specific handling for Groq in speech synthesis tests to clear unsupported instructions parameter
  • Updated transcription tests to use provider-specific default formats instead of hardcoded "mp3"
  • Enabled transcription and speech synthesis test scenarios for Groq provider
  • Added test configuration for Groq transcription and speech synthesis models
  • Removed extraneous whitespace and formatting inconsistencies
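
As a rough illustration of the test-utility changes listed above, the sketch below shows one way a provider-to-format lookup and a Groq voice list could look. It is not the repository's code: the `Provider` type, the `defaultFormat` helper (standing in for `GetProviderDefaultFormat()`), and the `"mp3"` fallback for other providers are assumptions made for the example. Only the `"wav"` default for Groq and the voice names come from this PR.

```go
package main

import "fmt"

// Provider is a hypothetical stand-in for the repository's provider key type.
type Provider string

const (
	OpenAI Provider = "openai"
	Gemini Provider = "gemini"
	Groq   Provider = "groq"
)

// defaultFormat sketches the role of GetProviderDefaultFormat():
// Groq is treated like Gemini and gets "wav". The "mp3" fallback for
// every other provider is an assumption made for this example.
func defaultFormat(p Provider) string {
	switch p {
	case Gemini, Groq:
		return "wav"
	default:
		return "mp3"
	}
}

// groqVoices holds the Groq voice names mentioned in the PR description.
var groqVoices = []string{"troy", "autumn", "diana"}

func main() {
	for _, p := range []Provider{OpenAI, Gemini, Groq} {
		fmt.Printf("%s -> %s\n", p, defaultFormat(p))
	}
	fmt.Println("groq voices:", groqVoices)
}
```

Keeping the format in one lookup lets the audio generated for a test and the transcription request that consumes it agree on the same value.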

Type of change

  • Feature
  • Refactor

Affected areas

  • Core (Go)
  • Providers/Integrations

How to test

# Core/Transports
go version
go test ./...

# Test Groq speech synthesis and transcription specifically
go test ./core/providers/groq -v

Set up Groq API credentials and run the provider tests to validate speech synthesis and transcription functionality.

Screenshots/Recordings

N/A

Breaking changes

  • Yes
  • No

Related issues

Closes #2062

Security considerations

Uses existing OpenAI request handlers which maintain the same security patterns for API key handling and request validation.

Checklist

  • I read docs/contributing/README.md and followed the guidelines
  • I added/updated tests where appropriate
  • I updated documentation where needed
  • I verified builds succeed (Go and UI)
  • I verified the CI pipeline passes locally if applicable

Contributor

coderabbitai Bot commented Mar 16, 2026

No actionable comments were generated in the recent review. 🎉

ℹ️ Recent review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: c2f8cba2-c200-479c-8dbb-6dae88725913

📥 Commits

Reviewing files that changed from the base of the PR and between 25362e5 and 2f8d1a1.

📒 Files selected for processing (6)
  • core/internal/llmtests/speech_synthesis.go
  • core/internal/llmtests/transcription.go
  • core/internal/llmtests/utils.go
  • core/providers/groq/groq.go
  • core/providers/groq/groq_test.go
  • core/schemas/speech.go

📝 Walkthrough

Summary by CodeRabbit

  • New Features

    • Added speech synthesis and transcription support for the Groq provider, including default voice options.
  • Improvements

    • Audio format handling now adapts dynamically per provider for better compatibility.
    • Groq speech requests now align with provider-specific expectations.
  • API Updates

    • PronunciationDictionaryLocators in speech parameters is now optional (omitted from the JSON when empty).

Walkthrough

Adds Groq provider audio support and test coverage: Groq now routes speech and transcription requests to OpenAI-compatible audio endpoints, tests use provider-aware audio formats, Groq-specific voice/default-format mappings added, and a pronunciation JSON tag was made omitempty.

Changes

| Cohort / File(s) | Summary |
|---|---|
| Groq Provider Audio Support (core/providers/groq/groq.go, core/providers/groq/groq_test.go) | Implements Speech and Transcription by delegating to OpenAI-compatible audio endpoints (/v1/audio/speech, /v1/audio/transcriptions). Tests extended to enable transcription and speech synthesis and added model fields in test config. |
| LLM Test Logic (transcription / TTS) (core/internal/llmtests/transcription.go, core/internal/llmtests/speech_synthesis.go) | Transcription tests use provider-aware audio format via GetProviderDefaultFormat(...); speech synthesis tests clear Instructions for Groq provider in affected advanced test paths. Minor formatting tweaks. |
| Test Utilities (core/internal/llmtests/utils.go) | Adds Groq to default-format mapping (treated like Gemini -> wav) and new Groq voice mappings; minor formatting adjustments. |
| Schema (core/schemas/speech.go) | Adds omitempty to PronunciationDictionaryLocators JSON tag so the field is omitted when empty. |
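
To make the effect of the `omitempty` change concrete, here is a small self-contained sketch using Go's standard `encoding/json`. The `Locator` type, the two wrapper structs, and the JSON key name are hypothetical and chosen only for illustration. Only the field name `PronunciationDictionaryLocators` comes from the PR.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// Locator is a hypothetical element type used only for this example.
type Locator struct {
	ID string `json:"id"`
}

// withoutOmit models the field before the change (no omitempty).
type withoutOmit struct {
	PronunciationDictionaryLocators []Locator `json:"pronunciation_dictionary_locators"`
}

// withOmit models the field after the change (omitempty added).
type withOmit struct {
	PronunciationDictionaryLocators []Locator `json:"pronunciation_dictionary_locators,omitempty"`
}

// mustJSON marshals v and panics on error, which keeps the demo short.
func mustJSON(v any) string {
	b, err := json.Marshal(v)
	if err != nil {
		panic(err)
	}
	return string(b)
}

func main() {
	fmt.Println(mustJSON(withoutOmit{})) // {"pronunciation_dictionary_locators":null}
	fmt.Println(mustJSON(withOmit{}))    // {}
}
```

Without `omitempty`, a nil slice is serialized as an explicit `null`, which a provider that does not know the field may reject. With it, the key is left out of the request body entirely.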

Sequence Diagram

sequenceDiagram
    participant Test as Test Handler
    participant Groq as Groq Provider
    participant AudioAPI as OpenAI-Compatible<br/>Audio API

    rect rgba(100, 200, 100, 0.5)
    Note over Test,AudioAPI: Speech Synthesis (TTS) Flow
    Test->>Groq: Speech(ctx, request, key, headers)
    Groq->>Groq: adjust request (clear Instructions if Groq)
    Groq->>AudioAPI: POST /v1/audio/speech (via base URL + path)
    AudioAPI-->>Groq: audio response
    Groq-->>Test: Speech synthesis result
    end

    rect rgba(100, 150, 200, 0.5)
    Note over Test,AudioAPI: Transcription (STT) Flow
    Test->>Groq: Transcription(ctx, request, key, headers)
    Groq->>AudioAPI: POST /v1/audio/transcriptions (via base URL + path)
    AudioAPI-->>Groq: transcription response
    Groq-->>Test: transcribed text
    end

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~20 minutes

Poem

🐰 I hopped to code with a cheerful bound,

Groq now listens and speaks its sound,
WAV and voices queued in line,
Tests ensure the flows align,
Hooray — audio works just fine! 🎶

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

| Check name | Status | Explanation | Resolution |
|---|---|---|---|
| Docstring Coverage | ⚠️ Warning | Docstring coverage is 66.67% which is insufficient. The required threshold is 80.00%. | Write docstrings for the functions missing them to satisfy the coverage threshold. |

✅ Passed checks (4 passed)

| Check name | Status | Explanation |
|---|---|---|
| Title check | ✅ Passed | The title 'feat: groq stt / tts' clearly summarizes the main change: adding speech-to-text and text-to-speech support to the Groq provider. |
| Description check | ✅ Passed | The PR description follows the template structure, includes all required sections with appropriate detail, and clearly documents changes and testing instructions. |
| Linked Issues check | ✅ Passed | All objectives from issue #2062 are met: Groq Speech/TTS and Transcription/STT methods implemented, audio endpoints integrated, and framework-level support provided. |
| Out of Scope Changes check | ✅ Passed | Minor formatting changes and test utility updates are reasonable supporting changes for the core TTS/STT implementation and are not out of scope. |


Member Author

sammaji commented Mar 16, 2026

This stack of pull requests is managed by Graphite. Learn more about stacking.

@github-actions
Contributor

🧪 Test Suite Available

This PR can be tested by a repository admin.

Run tests for PR #2099

@sammaji sammaji marked this pull request as ready for review March 16, 2026 11:22
Contributor

@coderabbitai coderabbitai Bot left a comment

Actionable comments posted: 2

🧹 Nitpick comments (1)
core/providers/groq/groq.go (1)

181-181: Prefer provider.GetProviderKey() over hardcoded schemas.Groq in delegated calls.

Using provider.GetProviderKey() keeps provider identity handling consistent across aliases/custom keys and matches existing provider conventions.

♻️ Small consistency refactor
-        schemas.Groq,
+        provider.GetProviderKey(),
...
-        schemas.Groq,
+        provider.GetProviderKey(),

Based on learnings: "When handling unsupported operations across providers, avoid hardcoding provider constants (e.g., schemas.Bedrock). Use the provider.GetProviderKey() (or equivalent API) to obtain the actual provider key from configuration, ensuring errors and messages adapt to custom provider names."

Also applies to: 208-208
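
The reasoning behind this nitpick can be shown with a reduced example. Everything here is a hypothetical sketch: the `ModelProvider` type, the `customKey` field, and the `handle` function are stand-ins invented for illustration, and only the method name `GetProviderKey()` and the idea of preferring it over a hardcoded constant come from the review comment.

```go
package main

import "fmt"

// ModelProvider is a hypothetical stand-in for the provider key type.
type ModelProvider string

// Groq is the built-in key, analogous to the hardcoded constant.
const Groq ModelProvider = "groq"

// GroqProvider is a reduced, hypothetical provider. customKey models an
// alias or custom provider name set through configuration.
type GroqProvider struct {
	customKey ModelProvider
}

// GetProviderKey returns the configured key, falling back to the built-in one.
func (p *GroqProvider) GetProviderKey() ModelProvider {
	if p.customKey != "" {
		return p.customKey
	}
	return Groq
}

// handle stands in for a shared request handler that labels its errors
// with whatever provider key the caller passes in.
func handle(key ModelProvider) error {
	return fmt.Errorf("%s: upstream request failed", key)
}

func main() {
	p := &GroqProvider{customKey: "groq-eu"}
	fmt.Println(handle(Groq))               // hardcoded constant hides the alias
	fmt.Println(handle(p.GetProviderKey())) // configured key is reported
}
```

With the hardcoded constant, an error from a provider registered under a custom name would be attributed to plain `groq`. Passing the configured key keeps messages consistent with what the user set up.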

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@core/providers/groq/groq.go` at line 181, The code currently passes the
hardcoded constant schemas.Groq into delegated calls; replace those occurrences
with provider.GetProviderKey() so provider identity respects aliases/custom
keys—find usages where schemas.Groq is passed (e.g., the call around the Groq
provider delegation) and change the argument to provider.GetProviderKey(),
ensuring any error messages or unsupported-operation strings use
provider.GetProviderKey() as well.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@core/internal/llmtests/transcription.go`:
- Line 276: The transcription metadata still hardcodes Params.Format = "mp3"
while GenerateTTSAudioForTest now produces provider-default formats; update the
transcription request to use the provider default instead. Replace any hardcoded
`"mp3"` assignments to Params.Format in the same blocks where
GenerateTTSAudioForTest is called (refer to GenerateTTSAudioForTest(...,
GetProviderDefaultFormat(testConfig.Provider)) and testConfig.Provider) with
GetProviderDefaultFormat(testConfig.Provider) so the format used for
transcription matches the generated audio; apply this change at all four updated
call sites (lines corresponding to the GenerateTTSAudioForTest usages).

In `@core/providers/groq/groq.go`:
- Around line 172-173: Update the stale method comments that say "Speech is not
supported by the Groq provider." and similar for Transcription: locate the
GroqProvider.Speech and GroqProvider.Transcription method comments and replace
the misleading "not supported" text with concise, accurate descriptions of the
current behavior (e.g., indicate that Speech handles BifrostSpeechRequest and
returns BifrostSpeechResponse, and Transcription handles
BifrostTranscriptionRequest and returns BifrostTranscriptionResponse), ensuring
the comments reflect that these methods are implemented and noting any relevant
constraints or supported features.

---

Nitpick comments:
In `@core/providers/groq/groq.go`:
- Line 181: The code currently passes the hardcoded constant schemas.Groq into
delegated calls; replace those occurrences with provider.GetProviderKey() so
provider identity respects aliases/custom keys—find usages where schemas.Groq is
passed (e.g., the call around the Groq provider delegation) and change the
argument to provider.GetProviderKey(), ensuring any error messages or
unsupported-operation strings use provider.GetProviderKey() as well.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: b083916e-a096-4102-a502-3a5c8b95b9ff

📥 Commits

Reviewing files that changed from the base of the PR and between b7abaf2 and 75d1978.

📒 Files selected for processing (6)
  • core/internal/llmtests/speech_synthesis.go
  • core/internal/llmtests/transcription.go
  • core/internal/llmtests/utils.go
  • core/providers/groq/groq.go
  • core/providers/groq/groq_test.go
  • core/schemas/speech.go

Comment thread core/internal/llmtests/transcription.go Outdated
Comment thread core/providers/groq/groq.go Outdated
Member Author

@sammaji sammaji left a comment

@coderabbitai full review

Member Author

sammaji commented Mar 16, 2026

@coderabbitai full review

Contributor

coderabbitai Bot commented Mar 16, 2026

✅ Actions performed

Full review triggered.

Contributor

akshaydeo commented Mar 17, 2026

Merge activity

  • Mar 17, 9:02 AM UTC: A user started a stack merge that includes this pull request via Graphite.
  • Mar 17, 9:06 AM UTC: @akshaydeo merged this pull request with Graphite.

@akshaydeo akshaydeo merged commit 06cefd4 into main Mar 17, 2026
10 checks passed
@akshaydeo akshaydeo deleted the 03-16-feat_groq_stt___tts branch March 17, 2026 09:06
Development

Successfully merging this pull request may close these issues.

[Feature]: Add Transcription/STT support to Groq

2 participants