Add Inworld TTS plugin #179
Note
Other AI code review bot(s) detected
CodeRabbit has detected other AI code review bot(s) in this pull request and will avoid duplicating their findings in the review comments. This may lead to a less comprehensive review.
Walkthrough
Adds a new Inworld AI Text-to-Speech plugin under
Changes
Sequence Diagram(s)
sequenceDiagram
participant User
participant Agent
participant InworldTTS as Inworld TTS
participant InworldAPI as Inworld API
participant PyAV
User->>Agent: request TTS (text)
Agent->>InworldTTS: stream_audio(text)
activate InworldTTS
InworldTTS->>InworldAPI: POST /tts/v1/voice:stream (json)
activate InworldAPI
InworldAPI-->>InworldTTS: streaming JSON lines (audioContent base64 / events)
deactivate InworldAPI
loop per streamed chunk
InworldTTS->>InworldTTS: parse JSON, base64-decode audioContent
InworldTTS->>PyAV: decode WAV -> raw frames
PyAV-->>InworldTTS: raw audio frames
InworldTTS->>InworldTTS: resample/convert to 16-bit mono PCM
InworldTTS-->>Agent: yield PcmData chunk
end
deactivate InworldTTS
Agent->>User: play audio
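The per-chunk loop in the diagram (parse JSON, base64-decode audioContent) can be sketched with the standard library alone. This is a minimal illustration, not the plugin's code: iter_audio_chunks is a hypothetical helper, and looking for audioContent either at the top level or under a "result" key is an assumption, since the response envelope is not shown in this thread.

```python
import base64
import json
from typing import Iterable, Iterator


def iter_audio_chunks(lines: Iterable[str]) -> Iterator[bytes]:
    """Yield decoded audio bytes from a stream of JSON lines."""
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines between events
        event = json.loads(line)
        if not isinstance(event, dict):
            continue
        # Assumption: the payload sits at the top level or under "result".
        result = event.get("result")
        payload = result if isinstance(result, dict) else event
        audio_b64 = payload.get("audioContent")
        if audio_b64:
            yield base64.b64decode(audio_b64)


# Usage with synthetic lines (no network access needed).
sample_lines = [
    json.dumps({"result": {"audioContent": base64.b64encode(b"RIFF").decode()}}),
    "",
    json.dumps({"event": "done"}),
]
chunks = list(iter_audio_chunks(sample_lines))
```

Events without an audioContent field are skipped rather than treated as errors, which keeps the loop tolerant of non-audio events in the stream.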
Estimated code review effort🎯 3 (Moderate) | ⏱️ ~20 minutes
Pre-merge checks and finishing touches
❌ Failed checks (1 warning)
✅ Passed checks (2 passed)
Actionable comments posted: 3
🧹 Nitpick comments (2)
plugins/inworld/tests/test_tts.py (2)
13-15: Consider adding proper cleanup for the TTS client.
The TTS instance creates an httpx.AsyncClient that should be properly closed. The implementation provides __aexit__ for cleanup, but the fixture doesn't utilize it.
Consider using an async context manager or explicit teardown:
         @pytest.fixture
         async def tts(self) -> inworld.TTS:
    -        return inworld.TTS()
    +        tts = inworld.TTS()
    +        yield tts
    +        await tts.__aexit__(None, None, None)
Alternatively, document that tests should use the TTS instance as an async context manager where appropriate.
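The yield-then-teardown shape suggested above follows the same pattern as an async context manager. A minimal sketch, using only the standard library so it runs without pytest or API credentials; FakeTTS and tts_resource are hypothetical stand-ins, not part of the plugin:

```python
import asyncio
from contextlib import asynccontextmanager
from typing import AsyncIterator


class FakeTTS:
    """Hypothetical stand-in for inworld.TTS; only tracks open/closed state."""

    def __init__(self) -> None:
        self.closed = False

    async def close(self) -> None:
        self.closed = True


@asynccontextmanager
async def tts_resource() -> AsyncIterator[FakeTTS]:
    tts = FakeTTS()
    try:
        yield tts  # the test body runs here
    finally:
        await tts.close()  # teardown runs even if the body raises


async def demo() -> FakeTTS:
    async with tts_resource() as tts:
        assert not tts.closed
    return tts


closed_tts = asyncio.run(demo())
```

Wrapping the teardown in try/finally is the design choice that matters here: the client is closed even when a test fails partway through.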
17-19: Consider adding explicit assertions for test clarity.
The test relies on implicit exception-based failure detection. While manual_tts_to_wav will raise exceptions on errors, adding an explicit assertion on the returned file path would make the test's intent clearer.
         @pytest.mark.integration
         async def test_inworld_tts_convert_text_to_audio_manual_test(self, tts: inworld.TTS):
    -        await manual_tts_to_wav(tts, sample_rate=48000, channels=2)
    +        result = await manual_tts_to_wav(tts, sample_rate=48000, channels=2)
    +        assert result  # Ensure a file path was returned
📜 Review details
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
Disabled knowledge base sources:
- Linear integration is disabled by default for public repositories
You can enable these sources in your CodeRabbit configuration.
⛔ Files ignored due to path filters (1)
uv.lock is excluded by !**/*.lock
📒 Files selected for processing (9)
- agents-core/pyproject.toml (2 hunks)
- plugins/inworld/README.md (1 hunks)
- plugins/inworld/example/inworld_tts_example.py (1 hunks)
- plugins/inworld/example/pyproject.toml (1 hunks)
- plugins/inworld/pyproject.toml (1 hunks)
- plugins/inworld/tests/test_tts.py (1 hunks)
- plugins/inworld/vision_agents/plugins/inworld/__init__.py (1 hunks)
- plugins/inworld/vision_agents/plugins/inworld/tts.py (1 hunks)
- pyproject.toml (2 hunks)
🧰 Additional context used
🧬 Code graph analysis (4)
plugins/inworld/vision_agents/plugins/inworld/__init__.py (2)
- plugins/inworld/tests/test_tts.py (1): tts (14-15)
- plugins/inworld/vision_agents/plugins/inworld/tts.py (1): TTS (18-157)
plugins/inworld/tests/test_tts.py (3)
- agents-core/vision_agents/core/tts/manual_test.py (1): manual_tts_to_wav (84-135)
- agents-core/vision_agents/core/tts/testing.py (3): TTSSession (25-83), errors (69-70), speeches (65-66)
- plugins/inworld/vision_agents/plugins/inworld/tts.py (1): TTS (18-157)
plugins/inworld/example/inworld_tts_example.py (4)
- agents-core/vision_agents/core/edge/types.py (1): User (15-18)
- agents-core/vision_agents/core/agents/agents.py (3): Agent (74-1314), create_call (773-778), finish (595-628)
- agents-core/vision_agents/core/agents/agent_launcher.py (1): AgentLauncher (18-125)
- plugins/inworld/vision_agents/plugins/inworld/tts.py (1): TTS (18-157)
plugins/inworld/vision_agents/plugins/inworld/tts.py (2)
- plugins/inworld/tests/test_tts.py (1): tts (14-15)
- agents-core/vision_agents/core/utils/audio_queue.py (1): get (119-136)
🪛 markdownlint-cli2 (0.18.1)
plugins/inworld/README.md
45-45: Bare URL used
(MD034, no-bare-urls)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (5)
- GitHub Check: Cursor Bugbot
- GitHub Check: unit / Test "not integration"
- GitHub Check: unit / Ruff & mypy
- GitHub Check: unit / Test "not integration"
- GitHub Check: unit / Ruff & mypy
🔇 Additional comments (6)
plugins/inworld/vision_agents/plugins/inworld/__init__.py (1)
1-4: LGTM! Clean plugin initialization.
The import and export structure follows Python conventions and properly exposes the TTS class as the public API for the Inworld plugin.
plugins/inworld/tests/test_tts.py (2)
1-9: LGTM! Proper test setup for integration tests.
The imports and environment variable loading via load_dotenv() at module level are appropriate for integration tests that require API credentials.
21-31: LGTM! Well-structured integration test.
The test properly exercises the TTS session flow with explicit assertions on errors and speech output. The 15-second timeout is reasonable for an integration test that makes external API calls.
plugins/inworld/example/inworld_tts_example.py (3)
1-30: LGTM! Excellent documentation and setup.
The module docstring clearly explains the example's purpose and lists all required environment variables. The imports and initialization are well-organized.
33-44: LGTM! Well-configured agent with all necessary components.
The agent is properly configured with all required plugins (edge, TTS, STT, LLM, turn detection). The Inworld TTS will correctly read the API key from the INWORLD_API_KEY environment variable as documented.
67-68: LGTM! Proper launcher initialization.
The main entry point correctly uses AgentLauncher with the create_agent and join_call functions, wired through the cli() helper.
Actionable comments posted: 2
♻️ Duplicate comments (1)
plugins/inworld/example/inworld_tts_example.py (1)
57-64: Fix: Use async context manager syntax.
Line 57 uses with await agent.join(call): but must use async with await agent.join(call): since agent.join() returns an AgentSessionContextManager, which is an async context manager. Using synchronous with will cause runtime errors.
Apply this diff:
         # Have the agent join the call/room
    -    with await agent.join(call):
    +    async with await agent.join(call):
             logger.info("Joining call")
             logger.info("LLM ready")
             await asyncio.sleep(5)
             await agent.llm.simple_response(text="Tell me a story about a dragon.")
             await agent.finish()  # Run till the call ends
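If agent.join() does return an async-only context manager, as this comment states, the failure mode is easy to reproduce in isolation. The sketch below uses a hypothetical join_session() stand-in built with contextlib rather than the real AgentSessionContextManager, whose implementation is not shown in this thread:

```python
import asyncio
from contextlib import asynccontextmanager
from typing import AsyncIterator, Tuple


@asynccontextmanager
async def join_session() -> AsyncIterator[str]:
    """Hypothetical stand-in for the object returned by agent.join(call)."""
    yield "session"


async def demo() -> Tuple[bool, str]:
    sync_with_failed = False
    try:
        with join_session():  # wrong: the object has no __enter__/__exit__
            pass
    except (TypeError, AttributeError):
        # TypeError on Python 3.11+, AttributeError on older versions
        sync_with_failed = True

    async with join_session() as session:  # correct form
        entered = session
    return sync_with_failed, entered


result = asyncio.run(demo())
```

An object that implements only __aenter__ and __aexit__ is rejected by a plain with statement at runtime, which is the error the comment warns about.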
🧹 Nitpick comments (6)
plugins/inworld/vision_agents/plugins/inworld/tts.py (4)
25-57: Consider validating the temperature parameter.
The documentation states that temperature accepts values between 0 and 2, but there's no validation enforcing this constraint. Consider adding a check to raise a ValueError if the temperature is out of range.
Apply this diff to add validation:
     self.api_key = api_key
     self.voice_id = voice_id
     self.model_id = model_id
    +if not 0 <= temperature <= 2:
    +    raise ValueError(
    +        f"Temperature must be between 0 and 2, got {temperature}"
    +    )
     self.temperature = temperature
     self.base_url = INWORLD_API_BASE
59-70: Consider validating text length.
The docstring mentions a maximum of 2,000 characters for the text parameter, but there's no validation. Consider adding a check to provide a clearer error message if the limit is exceeded.
Apply this diff:
     async def stream_audio(
         self, text: str, *_, **__
     ) -> AsyncIterator[PcmData]:
         """
         Convert text to speech using Inworld AI API.

         Args:
             text: The text to convert to speech (max 2,000 characters).

         Returns:
             An async iterator of audio chunks as PcmData objects.
         """
    +    if len(text) > 2000:
    +        raise ValueError(
    +            f"Text exceeds maximum length of 2000 characters (got {len(text)})"
    +        )
         url = f"{self.base_url}/tts/v1/voice:stream"
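The two validation suggestions (temperature range and text length) can be exercised together outside the plugin. A minimal sketch, assuming the limits quoted in the review; validate_request, is_rejected, and MAX_TEXT_CHARS are hypothetical names introduced only for illustration:

```python
MAX_TEXT_CHARS = 2000  # limit quoted in the stream_audio docstring


def validate_request(text: str, temperature: float) -> None:
    """Raise ValueError for out-of-range inputs before any HTTP call is made."""
    if not 0 <= temperature <= 2:
        raise ValueError(f"Temperature must be between 0 and 2, got {temperature}")
    if len(text) > MAX_TEXT_CHARS:
        raise ValueError(
            f"Text exceeds maximum length of {MAX_TEXT_CHARS} characters "
            f"(got {len(text)})"
        )


def is_rejected(text: str, temperature: float) -> bool:
    """Return True when validate_request raises ValueError."""
    try:
        validate_request(text, temperature)
    except ValueError:
        return True
    return False


results = [
    is_rejected("hello", 1.0),     # valid input
    is_rejected("hello", 2.5),     # temperature too high
    is_rejected("x" * 2001, 1.0),  # text one character too long
    is_rejected("x" * 2000, 2),    # both exactly at the limit
]
```

Both bounds are inclusive, so a 2,000-character text at temperature 2 passes. Failing fast on the client side gives a clearer message than whatever error the API would return for the same input.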
128-145: Consider streaming frames individually.
The current implementation accumulates all frames from a WAV chunk into a single PcmData object before yielding (lines 130-135). For long audio segments, this could consume significant memory. Consider yielding each frame immediately after processing, or accumulating only a reasonable buffer.
Apply this diff to stream frames individually:
     with container:
         audio_stream = container.streams.audio[0]
    -    pcm: Optional[PcmData] = None
         for frame in container.decode(audio_stream):
             frame_pcm = PcmData.from_av_frame(frame)
    -        if pcm is None:
    -            pcm = frame_pcm
    -        else:
    -            pcm.append(frame_pcm)
    -
    -    if pcm:
    -        pcm = (
    -            pcm.resample(
    -                target_sample_rate=pcm.sample_rate,
    -                target_channels=1,
    -            )
    -            .to_int16()
    +        frame_pcm = (
    +            frame_pcm.resample(
    +                target_sample_rate=frame_pcm.sample_rate,
    +                target_channels=1,
    +            )
    +            .to_int16()
             )
    -        yield pcm
    +        yield frame_pcm
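The memory trade-off behind this suggestion can be illustrated without PyAV. In the sketch below, plain bytes objects stand in for decoded frames, and decode_accumulated and decode_streaming are hypothetical helpers that mirror the "before" and "after" shapes of the diff, not the plugin's actual code:

```python
from typing import Iterable, Iterator, List


def decode_accumulated(frames: Iterable[bytes]) -> Iterator[bytes]:
    """Join every frame into one buffer, then yield it once."""
    buffer = bytearray()
    for frame in frames:
        buffer.extend(frame)
    if buffer:
        yield bytes(buffer)


def decode_streaming(frames: Iterable[bytes]) -> Iterator[bytes]:
    """Yield each frame as soon as it is available."""
    for frame in frames:
        yield frame


# 100 fake frames of 960 bytes each (stand-ins for decoded audio frames).
frames: List[bytes] = [b"\x00\x01" * 480 for _ in range(100)]

accumulated = list(decode_accumulated(frames))
streamed = list(decode_streaming(frames))

largest_accumulated = max(len(chunk) for chunk in accumulated)
largest_streamed = max(len(chunk) for chunk in streamed)
```

The streaming variant never holds more than one frame at a time, at the cost of handing many small chunks to the consumer. Whether that is acceptable depends on how the downstream audio pipeline handles small PcmData objects, which this thread does not settle.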
138-144: Resampling with same sample rate is redundant.
Line 140 resamples with target_sample_rate=pcm.sample_rate, which is the same as the source sample rate. This operation is redundant unless the goal is only to convert to mono and int16. Consider only resampling if the rate differs, or clarify the intent.
plugins/inworld/example/inworld-audio-guide.md (2)
27-27: Minor style: Remove duplicate adverb.
The word "naturally" appears twice in the same sentence. Consider removing one occurrence or using a synonym for better readability.
Apply this diff:
    -2. **Insert non-verbal sounds naturally** where a human would naturally pause, breathe, or react.
    +2. **Insert non-verbal sounds naturally** where a human would pause, breathe, or react.
49-71: Add language identifiers to code blocks.
The example code blocks at lines 49, 54, 59, 64, and 69 are missing language identifiers. Adding text or markdown identifiers would improve rendering and syntax highlighting.
For example, change:
    -```
    +```text
     [happy] I'd be glad to help you with that! [breathe] Here's what you need to know...
     ```
Apply similar changes to the other code blocks.
📜 Review details
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
Disabled knowledge base sources:
- Linear integration is disabled by default for public repositories
You can enable these sources in your CodeRabbit configuration.
⛔ Files ignored due to path filters (1)
uv.lock is excluded by !**/*.lock
📒 Files selected for processing (4)
- plugins/inworld/example/inworld-audio-guide.md (1 hunks)
- plugins/inworld/example/inworld_tts_example.py (1 hunks)
- plugins/inworld/pyproject.toml (1 hunks)
- plugins/inworld/vision_agents/plugins/inworld/tts.py (1 hunks)
🧰 Additional context used
🧬 Code graph analysis (2)
plugins/inworld/example/inworld_tts_example.py (4)
- agents-core/vision_agents/core/edge/types.py (1): User (15-18)
- agents-core/vision_agents/core/agents/agents.py (3): Agent (74-1314), create_call (773-778), finish (595-628)
- agents-core/vision_agents/core/agents/agent_launcher.py (1): AgentLauncher (18-125)
- plugins/inworld/vision_agents/plugins/inworld/tts.py (1): TTS (18-166)
plugins/inworld/vision_agents/plugins/inworld/tts.py (2)
- plugins/inworld/tests/test_tts.py (1): tts (14-15)
- agents-core/vision_agents/core/utils/audio_queue.py (1): get (119-136)
🪛 LanguageTool
plugins/inworld/example/inworld-audio-guide.md
[style] ~27-~27: This adverb was used twice in the sentence. Consider removing one of them or replacing them with a synonym.
Context: ... sounds naturally** where a human would naturally pause, breathe, or react. 3. **Match e...
(ADVERB_REPETITION_PREMIUM)
🪛 markdownlint-cli2 (0.18.1)
plugins/inworld/example/inworld-audio-guide.md
49-49: Fenced code blocks should have a language specified
(MD040, fenced-code-language)
54-54: Fenced code blocks should have a language specified
(MD040, fenced-code-language)
59-59: Fenced code blocks should have a language specified
(MD040, fenced-code-language)
64-64: Fenced code blocks should have a language specified
(MD040, fenced-code-language)
69-69: Fenced code blocks should have a language specified
(MD040, fenced-code-language)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (5)
- GitHub Check: Cursor Bugbot
- GitHub Check: unit / Test "not integration"
- GitHub Check: unit / Ruff & mypy
- GitHub Check: unit / Ruff & mypy
- GitHub Check: unit / Test "not integration"
🔇 Additional comments (7)
plugins/inworld/vision_agents/plugins/inworld/tts.py (2)
1-16: LGTM!
The imports are well-organized and appropriate for the TTS implementation. The module-level logger and API base constant are properly defined.
153-166: LGTM!
The stop_audio no-op implementation is appropriate for this streaming API, and the close method properly cleans up the HTTP client resource.
plugins/inworld/pyproject.toml (2)
28-29: Clarify wheel packages configuration.
Line 29 specifies packages = [".", "vision_agents"], which includes the current directory (.) as a package. This is unusual and may include unintended files. Typically, you would only specify the package directories explicitly.
Verify this is intentional. If you only want to package the vision_agents module, use:
     [tool.hatch.build.targets.wheel]
    -packages = [".", "vision_agents"]
    +packages = ["vision_agents"]
13-17: Dependencies are tested together through the workspace lock file; verification complete.
The httpx>=0.27.0 and av>=10.0.0 constraints are verified compatible:
- Both packages are pinned in the workspace uv.lock: [email protected] and [email protected] (both exceed minimums)
- Both are actively imported in plugins/inworld/vision_agents/plugins/inworld/tts.py
- Being in the same workspace lock file means they are tested together in CI
- Minimum version requirements align with actual usage patterns
The workspace build system handles compatibility verification automatically.
plugins/inworld/example/inworld_tts_example.py (3)
1-31: LGTM!
The imports and module setup are well-organized. The docstring clearly describes the requirements and purpose of the example.
67-69: LGTM!
The main entry point correctly uses the AgentLauncher pattern with the CLI.
38-38: No action required: the @file reference syntax is supported.
The @inworld-audio-guide.md syntax in the instructions is fully supported by the system. The parse_instructions() function in agents-core/vision_agents/core/utils/utils.py explicitly handles this pattern, extracting file references and loading their contents into an Instructions object. This is documented with examples showing the exact same usage pattern, and active usage exists elsewhere in the codebase (e.g., agents-core/vision_agents/core/agents/agents.py line 84).
Likely an incorrect or invalid review comment.
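To make the @file pattern concrete, here is a rough sketch of how such references could be pulled out of an instruction string. It is not the parse_instructions() implementation, which is not shown in this thread; the regular expression and the find_file_references helper are assumptions made purely for illustration.

```python
import re
from typing import List

# Hypothetical pattern: "@" followed by a relative path ending in ".md".
FILE_REF = re.compile(r"@([\w./-]+?\.md)\b")


def find_file_references(instructions: str) -> List[str]:
    """Return the markdown file names referenced with an @ prefix."""
    return FILE_REF.findall(instructions)


refs = find_file_references(
    "Read @inworld-audio-guide.md before answering. Contact me @home."
)
```

A real implementation would also need to resolve each path and load the file contents, and may accept extensions other than .md.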
Note
Introduces an Inworld AI TTS plugin with streaming synthesis, docs/examples, tests, and project/workspace registration.
- vision_agents.plugins.inworld.TTS using Inworld API streaming; supports voice_id, model_id, temperature; decodes base64 WAV via av to PcmData; adds stream_audio, stop_audio, close.
- pyproject.toml, py.typed).
- README.md, audio guide, runnable example agent and example project.
- inworld optional extra and adds to all-plugins in agents-core/pyproject.toml.
- pyproject.toml.
Written by Cursor Bugbot for commit 7603776. This will update automatically on new commits.
Summary by CodeRabbit
New Features
Documentation
Tests
Chores