@Nash0x7E2 Nash0x7E2 commented Nov 13, 2025

Note

Introduces an Inworld AI TTS plugin with streaming synthesis, docs/examples, tests, and project/workspace registration.

  • Plugins/Inworld:
    • New TTS: Implements vision_agents.plugins.inworld.TTS using Inworld API streaming; supports voice_id, model_id, temperature; decodes base64 WAV via av to PcmData; adds stream_audio, stop_audio, close.
    • Packaging: Adds plugin package config (pyproject.toml, py.typed).
    • Docs/Examples: Adds README.md, audio guide, runnable example agent and example project.
  • Tests:
    • Integration tests validating streaming TTS and WAV output.
  • Core/Config:
    • Registers inworld optional extra and adds to all-plugins in agents-core/pyproject.toml.
    • Adds plugin to workspace sources/members in root pyproject.toml.

Written by Cursor Bugbot for commit 7603776. This will update automatically on new commits.

Summary by CodeRabbit

  • New Features

    • Added Inworld AI Text-to-Speech plugin with streaming audio synthesis and configurable voice/model/temperature.
  • Documentation

    • Added plugin README, example integration, and an audio-guide describing markup and response rules.
  • Tests

    • Added integration tests validating TTS streaming and audio output.
  • Chores

    • Registered the plugin in workspace and project configuration and added example project setup.

@Nash0x7E2 Nash0x7E2 requested a review from dangusev November 13, 2025 18:47
@Nash0x7E2 Nash0x7E2 self-assigned this Nov 13, 2025
coderabbitai bot commented Nov 13, 2025

Note

Other AI code review bot(s) detected

CodeRabbit has detected other AI code review bot(s) in this pull request and will avoid duplicating their findings in the review comments. This may lead to a less comprehensive review.

Walkthrough

Adds a new Inworld AI Text-to-Speech plugin under plugins/inworld: implementation, packaging, workspace registration, README, example, and integration tests; registers the plugin in the repo workspace and adds it as an optional plugin in agents-core.

Changes

  • Workspace & root config
    • Files: pyproject.toml, agents-core/pyproject.toml
    • Summary: Register vision-agents-plugins-inworld as a workspace source and add plugins/inworld to the workspace members; add the optional dependency group inworld = ["vision-agents-plugins-inworld"] and append "vision-agents-plugins-inworld" to all-plugins.
  • Plugin package manifest
    • Files: plugins/inworld/pyproject.toml
    • Summary: New Hatch project manifest for vision-agents-plugins-inworld with metadata, dependencies (vision-agents, httpx>=0.27.0, av>=10.0.0), Hatch VCS versioning, and dev dependencies.
  • Plugin code (TTS)
    • Files: plugins/inworld/vision_agents/plugins/inworld/tts.py, plugins/inworld/vision_agents/plugins/inworld/__init__.py
    • Summary: New TTS class implementing streaming TTS via the Inworld API using httpx.AsyncClient. It parses the JSON stream, decodes base64 audio, decodes and resamples the WAV data with PyAV to 16-bit mono PCM, and handles lifecycle (close) and stop_audio; __init__ exposes TTS.
  • Examples & docs
    • Files: plugins/inworld/README.md, plugins/inworld/example/inworld_tts_example.py, plugins/inworld/example/inworld-audio-guide.md, plugins/inworld/example/pyproject.toml
    • Summary: New README, example agent script and project manifest, and an audio-guidelines document describing emotion and non-verbal markup conventions.
  • Tests
    • Files: plugins/inworld/tests/test_tts.py
    • Summary: Integration tests and a fixture for Inworld TTS: a manual WAV conversion test and a streaming session test with a timeout and assertions.

Sequence Diagram(s)

sequenceDiagram
    participant User
    participant Agent
    participant InworldTTS as Inworld TTS
    participant InworldAPI as Inworld API
    participant PyAV

    User->>Agent: request TTS (text)
    Agent->>InworldTTS: stream_audio(text)
    activate InworldTTS
    InworldTTS->>InworldAPI: POST /tts/v1/voice:stream (json)
    activate InworldAPI
    InworldAPI-->>InworldTTS: streaming JSON lines (audioContent base64 / events)
    deactivate InworldAPI

    loop per streamed chunk
        InworldTTS->>InworldTTS: parse JSON, base64-decode audioContent
        InworldTTS->>PyAV: decode WAV -> raw frames
        PyAV-->>InworldTTS: raw audio frames
        InworldTTS->>InworldTTS: resample/convert to 16-bit mono PCM
        InworldTTS-->>Agent: yield PcmData chunk
    end
    deactivate InworldTTS
    Agent->>User: play audio

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~20 minutes

  • Focus areas:
    • plugins/inworld/vision_agents/plugins/inworld/tts.py: streaming parse loop, base64 handling, PyAV decoding/resampling, async HTTP lifecycle and error paths.
    • plugins/inworld/tests/test_tts.py: network-dependent integration tests, timeouts and environment-variable gating.
    • Workspace and agents-core/pyproject.toml: optional-dependencies and all-plugins aggregation correctness.

Possibly related PRs

Suggested labels

examples

Suggested reviewers

  • tschellenbach
  • d3xvn
  • maxkahan

Poem

I set the mouth to work, a small cold forge,
and iron syllables drip into a throat of wire;
the machine learns to ache with borrowed breath,
resampling memory into a single bone tone—
the voice arrives, precise as a shard.

Pre-merge checks and finishing touches

❌ Failed checks (1 warning)
  • Docstring Coverage: ⚠️ Warning. Docstring coverage is 45.45%, below the required threshold of 80.00%. Resolution: run @coderabbitai generate docstrings to improve docstring coverage.
✅ Passed checks (2 passed)
  • Description Check: ✅ Passed. Check skipped because CodeRabbit’s high-level summary is enabled.
  • Title check: ✅ Passed. The title 'Add Inworld TTS plugin' clearly and accurately summarizes the main change: introducing a new Text-to-Speech plugin for the Inworld AI service into the Vision Agents framework.
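For context on the coverage percentage, docstring coverage is generally the share of documentable definitions (the module, classes, and functions) that carry a docstring. The sketch below computes such a ratio with Python's ast module; it is an illustrative approximation, not CodeRabbit's actual algorithm, which may count definitions differently.

```python
import ast

SAMPLE = '''
"""Module docstring."""

class TTS:
    """Documented class."""

    async def stream_audio(self, text):
        return text

    async def close(self):
        """Documented method."""
'''


def docstring_coverage(source: str) -> float:
    """Return the percentage of module/class/function definitions with a docstring."""
    tree = ast.parse(source)
    nodes = [tree] + [
        node
        for node in ast.walk(tree)
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef, ast.ClassDef))
    ]
    documented = sum(1 for node in nodes if ast.get_docstring(node) is not None)
    return 100.0 * documented / len(nodes)


if __name__ == "__main__":
    # Module, class and close() are documented; stream_audio() is not: 3 of 4.
    print(f"{docstring_coverage(SAMPLE):.2f}%")  # 75.00%
```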

📜 Recent review details

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Disabled knowledge base sources:

  • Linear integration is disabled by default for public repositories

You can enable these sources in your CodeRabbit configuration.

📥 Commits

Reviewing files that changed from the base of the PR and between 9fc8b62 and 7603776.

📒 Files selected for processing (1)
  • plugins/inworld/README.md (1 hunks)
🚧 Files skipped from review as they are similar to previous changes (1)
  • plugins/inworld/README.md
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (4)
  • GitHub Check: unit / Test "not integration"
  • GitHub Check: unit / Ruff & mypy
  • GitHub Check: unit / Test "not integration"
  • GitHub Check: unit / Ruff & mypy

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 3

🧹 Nitpick comments (2)
plugins/inworld/tests/test_tts.py (2)

13-15: Consider adding proper cleanup for the TTS client.

The TTS instance creates an httpx.AsyncClient that should be properly closed. The implementation provides __aexit__ for cleanup, but the fixture doesn't utilize it.

Consider using an async context manager or explicit teardown:

 @pytest.fixture
 async def tts(self) -> inworld.TTS:
-    return inworld.TTS()
+    tts = inworld.TTS()
+    yield tts
+    await tts.__aexit__(None, None, None)

Alternatively, document that tests should use the TTS instance as an async context manager where appropriate.


17-19: Consider adding explicit assertions for test clarity.

The test relies on implicit exception-based failure detection. While manual_tts_to_wav will raise exceptions on errors, adding an explicit assertion on the returned file path would make the test's intent clearer.

 @pytest.mark.integration
 async def test_inworld_tts_convert_text_to_audio_manual_test(self, tts: inworld.TTS):
-    await manual_tts_to_wav(tts, sample_rate=48000, channels=2)
+    result = await manual_tts_to_wav(tts, sample_rate=48000, channels=2)
+    assert result  # Ensure a file path was returned
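For a stronger check than assert result, a test could verify that the returned path is a readable WAV file with at least one frame. assert_valid_wav is a hypothetical helper (not part of vision_agents) built only on the standard-library wave module, and it assumes manual_tts_to_wav returns a filesystem path, as the suggested assertion implies.

```python
import os
import tempfile
import wave
from typing import Optional


def assert_valid_wav(path: str, expected_channels: Optional[int] = None) -> None:
    """Hypothetical test helper: fail unless `path` is a non-empty PCM WAV file."""
    assert os.path.isfile(path), f"missing output file: {path}"
    with wave.open(path, "rb") as wav:
        assert wav.getnframes() > 0, "WAV file contains no audio frames"
        if expected_channels is not None:
            assert wav.getnchannels() == expected_channels


if __name__ == "__main__":
    # Self-check: write 10 ms of stereo silence at 48 kHz, then validate it.
    fd, path = tempfile.mkstemp(suffix=".wav")
    os.close(fd)
    with wave.open(path, "wb") as wav:
        wav.setnchannels(2)
        wav.setsampwidth(2)  # 16-bit samples
        wav.setframerate(48000)
        wav.writeframes(b"\x00\x00" * 2 * 480)  # 480 frames x 2 channels
    assert_valid_wav(path, expected_channels=2)
    os.remove(path)
```

In the integration test this would become assert_valid_wav(result, expected_channels=2), assuming the channels argument passed to manual_tts_to_wav controls the written file.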
📜 Review details

📥 Commits

Reviewing files that changed from the base of the PR and between 0250c39 and 9874e56.

⛔ Files ignored due to path filters (1)
  • uv.lock is excluded by !**/*.lock
📒 Files selected for processing (9)
  • agents-core/pyproject.toml (2 hunks)
  • plugins/inworld/README.md (1 hunks)
  • plugins/inworld/example/inworld_tts_example.py (1 hunks)
  • plugins/inworld/example/pyproject.toml (1 hunks)
  • plugins/inworld/pyproject.toml (1 hunks)
  • plugins/inworld/tests/test_tts.py (1 hunks)
  • plugins/inworld/vision_agents/plugins/inworld/__init__.py (1 hunks)
  • plugins/inworld/vision_agents/plugins/inworld/tts.py (1 hunks)
  • pyproject.toml (2 hunks)
🧰 Additional context used
🧬 Code graph analysis (4)
plugins/inworld/vision_agents/plugins/inworld/__init__.py (2)
plugins/inworld/tests/test_tts.py (1)
  • tts (14-15)
plugins/inworld/vision_agents/plugins/inworld/tts.py (1)
  • TTS (18-157)
plugins/inworld/tests/test_tts.py (3)
agents-core/vision_agents/core/tts/manual_test.py (1)
  • manual_tts_to_wav (84-135)
agents-core/vision_agents/core/tts/testing.py (3)
  • TTSSession (25-83)
  • errors (69-70)
  • speeches (65-66)
plugins/inworld/vision_agents/plugins/inworld/tts.py (1)
  • TTS (18-157)
plugins/inworld/example/inworld_tts_example.py (4)
agents-core/vision_agents/core/edge/types.py (1)
  • User (15-18)
agents-core/vision_agents/core/agents/agents.py (3)
  • Agent (74-1314)
  • create_call (773-778)
  • finish (595-628)
agents-core/vision_agents/core/agents/agent_launcher.py (1)
  • AgentLauncher (18-125)
plugins/inworld/vision_agents/plugins/inworld/tts.py (1)
  • TTS (18-157)
plugins/inworld/vision_agents/plugins/inworld/tts.py (2)
plugins/inworld/tests/test_tts.py (1)
  • tts (14-15)
agents-core/vision_agents/core/utils/audio_queue.py (1)
  • get (119-136)
🪛 markdownlint-cli2 (0.18.1)
plugins/inworld/README.md

45-45: Bare URL used

(MD034, no-bare-urls)

⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (5)
  • GitHub Check: Cursor Bugbot
  • GitHub Check: unit / Test "not integration"
  • GitHub Check: unit / Ruff & mypy
  • GitHub Check: unit / Test "not integration"
  • GitHub Check: unit / Ruff & mypy
🔇 Additional comments (6)
plugins/inworld/vision_agents/plugins/inworld/__init__.py (1)

1-4: LGTM! Clean plugin initialization.

The import and export structure follows Python conventions and properly exposes the TTS class as the public API for the Inworld plugin.

plugins/inworld/tests/test_tts.py (2)

1-9: LGTM! Proper test setup for integration tests.

The imports and environment variable loading via load_dotenv() at module level are appropriate for integration tests that require API credentials.


21-31: LGTM! Well-structured integration test.

The test properly exercises the TTS session flow with explicit assertions on errors and speech output. The 15-second timeout is reasonable for an integration test that makes external API calls.

plugins/inworld/example/inworld_tts_example.py (3)

1-30: LGTM! Excellent documentation and setup.

The module docstring clearly explains the example's purpose and lists all required environment variables. The imports and initialization are well-organized.


33-44: LGTM! Well-configured agent with all necessary components.

The agent is properly configured with all required plugins (edge, TTS, STT, LLM, turn detection). The Inworld TTS will correctly read the API key from the INWORLD_API_KEY environment variable as documented.


67-68: LGTM! Proper launcher initialization.

The main entry point correctly uses AgentLauncher with the create_agent and join_call functions, wired through the cli() helper.

@GetStream GetStream deleted a comment from coderabbitai bot Nov 13, 2025
@GetStream GetStream deleted a comment from coderabbitai bot Nov 13, 2025
@cursor cursor bot left a comment

This is the final PR Bugbot will review for you during this billing cycle

Your free Bugbot reviews will reset on December 7

Details

Your team is on the Bugbot Free tier. On this plan, Bugbot will review limited PRs each billing cycle for each member of your team.

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 2

♻️ Duplicate comments (1)
plugins/inworld/example/inworld_tts_example.py (1)

57-64: Fix: Use async context manager syntax.

Line 57 uses with await agent.join(call): but must use async with await agent.join(call): since agent.join() returns an AgentSessionContextManager, which is an async context manager. Using synchronous with will cause runtime errors.

Apply this diff:

     # Have the agent join the call/room
-    with await agent.join(call):
+    async with await agent.join(call):
         logger.info("Joining call")
         logger.info("LLM ready")
         
         await asyncio.sleep(5)
         await agent.llm.simple_response(text="Tell me a story about a dragon.")
         
         await agent.finish()  # Run till the call ends
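To see why the async form matters, here is a self-contained sketch with a hypothetical FakeSession that, like the object described in this comment, implements only the asynchronous protocol (__aenter__/__aexit__). The await first resolves the coroutine to the context-manager object, and async with then enters it. A plain with on such an object fails at runtime (a TypeError on Python 3.11+, an AttributeError on older versions). The join function is a hypothetical stand-in for agent.join(call).

```python
import asyncio
from typing import Tuple


class FakeSession:
    """Hypothetical session object that only supports `async with`."""

    def __init__(self) -> None:
        self.active = False

    async def __aenter__(self) -> "FakeSession":
        self.active = True
        return self

    async def __aexit__(self, exc_type, exc, tb) -> None:
        self.active = False  # e.g. leave the call/room on exit


async def join(call_id: str) -> FakeSession:
    # Hypothetical: mirrors `await agent.join(call)` returning a context manager.
    await asyncio.sleep(0)
    return FakeSession()


async def main() -> Tuple[bool, bool]:
    async with await join("demo-call") as session:
        inside = session.active
    return inside, session.active


if __name__ == "__main__":
    print(asyncio.run(main()))  # (True, False)
```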
🧹 Nitpick comments (6)
plugins/inworld/vision_agents/plugins/inworld/tts.py (4)

25-57: Consider validating the temperature parameter.

The documentation states that temperature accepts values between 0 and 2, but there's no validation enforcing this constraint. Consider adding a check to raise a ValueError if the temperature is out of range.

Apply this diff to add validation:

         self.api_key = api_key
         self.voice_id = voice_id
         self.model_id = model_id
+        if not 0 <= temperature <= 2:
+            raise ValueError(
+                f"Temperature must be between 0 and 2, got {temperature}"
+            )
         self.temperature = temperature
         self.base_url = INWORLD_API_BASE

59-70: Consider validating text length.

The docstring mentions a maximum of 2,000 characters for the text parameter, but there's no validation. Consider adding a check to provide a clearer error message if the limit is exceeded.

Apply this diff:

     async def stream_audio(
         self, text: str, *_, **__
     ) -> AsyncIterator[PcmData]:
         """
         Convert text to speech using Inworld AI API.
 
         Args:
             text: The text to convert to speech (max 2,000 characters).
 
         Returns:
             An async iterator of audio chunks as PcmData objects.
         """
+        if len(text) > 2000:
+            raise ValueError(
+                f"Text exceeds maximum length of 2000 characters (got {len(text)})"
+            )
         url = f"{self.base_url}/tts/v1/voice:stream"
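One subtlety to check before applying this: if stream_audio is an async generator (it yields PcmData chunks), the ValueError is not raised when the method is called, only when the caller starts iterating. The sketch below uses a hypothetical, simplified stream_audio to show that behaviour; the 2,000-character limit comes from the docstring quoted above.

```python
import asyncio
from typing import AsyncIterator

MAX_TEXT_CHARS = 2000  # limit quoted in the plugin's docstring


async def stream_audio(text: str) -> AsyncIterator[bytes]:
    """Hypothetical simplified generator that validates its input before streaming."""
    if len(text) > MAX_TEXT_CHARS:
        raise ValueError(
            f"Text exceeds maximum length of {MAX_TEXT_CHARS} characters (got {len(text)})"
        )
    yield text.encode()  # placeholder for real audio chunks


async def demo() -> str:
    gen = stream_audio("x" * 2001)  # no exception yet: the body has not started
    try:
        async for _ in gen:
            pass
    except ValueError as exc:
        return str(exc)
    return "no error"


if __name__ == "__main__":
    print(asyncio.run(demo()))  # Text exceeds maximum length of 2000 characters (got 2001)
```

If callers need the error at call time, the check can live in a plain (non-generator) method that validates first and then returns the generator.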

128-145: Consider streaming frames individually.

The current implementation accumulates all frames from a WAV chunk into a single PcmData object before yielding (lines 130-135). For long audio segments, this could consume significant memory. Consider yielding each frame immediately after processing, or accumulating only a reasonable buffer.

Apply this diff to stream frames individually:

                     with container:
                         audio_stream = container.streams.audio[0]
-                        pcm: Optional[PcmData] = None
                         for frame in container.decode(audio_stream):
                             frame_pcm = PcmData.from_av_frame(frame)
-                            if pcm is None:
-                                pcm = frame_pcm
-                            else:
-                                pcm.append(frame_pcm)
-
-                        if pcm:
-                            pcm = (
-                                pcm.resample(
-                                    target_sample_rate=pcm.sample_rate,
-                                    target_channels=1,
-                                )
-                                .to_int16()
+                            frame_pcm = (
+                                frame_pcm.resample(
+                                    target_sample_rate=frame_pcm.sample_rate,
+                                    target_channels=1,
+                                )
+                                .to_int16()
                             )
-                            yield pcm
+                            yield frame_pcm
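The stream-as-you-go idea can also be illustrated without PyAV. This sketch is a stand-in under stated assumptions, not the plugin's decoder: it reads a 16-bit PCM WAV chunk with the standard-library wave module, downmixes each block of frames to mono, and yields it immediately, so memory use is bounded by block_frames rather than by the length of the chunk. It assumes 16-bit little-endian samples, which is what WAV files with a sample width of 2 contain.

```python
import array
import io
import sys
import wave
from typing import Iterator


def iter_mono_int16(wav_bytes: bytes, block_frames: int = 960) -> Iterator[bytes]:
    """Yield mono 16-bit PCM blocks from an in-memory WAV, one block at a time."""
    with wave.open(io.BytesIO(wav_bytes), "rb") as wav:
        if wav.getsampwidth() != 2:
            raise ValueError("this sketch only handles 16-bit PCM WAV data")
        channels = wav.getnchannels()
        while True:
            raw = wav.readframes(block_frames)
            if not raw:
                break  # end of the WAV data
            samples = array.array("h", raw)
            if sys.byteorder == "big":
                samples.byteswap()  # WAV sample data is little-endian
            if channels > 1:
                # Average the interleaved channels into one mono sample per frame.
                samples = array.array(
                    "h",
                    (
                        sum(samples[i : i + channels]) // channels
                        for i in range(0, len(samples), channels)
                    ),
                )
            if sys.byteorder == "big":
                samples.byteswap()  # emit little-endian bytes again
            yield samples.tobytes()
```

The sketch never resamples and only downmixes when there is more than one channel, so no conversion work is done when the input already matches the target format.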

138-144: Resampling with same sample rate is redundant.

Line 140 resamples with target_sample_rate=pcm.sample_rate, which is the same as the source sample rate. This operation is redundant unless the goal is only to convert to mono and int16. Consider only resampling if the rate differs, or clarify the intent.

plugins/inworld/example/inworld-audio-guide.md (2)

27-27: Minor style: Remove duplicate adverb.

The word "naturally" appears twice in the same sentence. Consider removing one occurrence or using a synonym for better readability.

Apply this diff:

-2. **Insert non-verbal sounds naturally** where a human would naturally pause, breathe, or react.
+2. **Insert non-verbal sounds naturally** where a human would pause, breathe, or react.

49-71: Add language identifiers to code blocks.

The example code blocks at lines 49, 54, 59, 64, and 69 are missing language identifiers. Adding text or markdown identifiers would improve rendering and syntax highlighting.

For example, change:

-```
+```text
 [happy] I'd be glad to help you with that! [breathe] Here's what you need to know...
 ```

Apply similar changes to the other code blocks.

📜 Review details

📥 Commits

Reviewing files that changed from the base of the PR and between 9874e56 and 9fc8b62.

⛔ Files ignored due to path filters (1)
  • uv.lock is excluded by !**/*.lock
📒 Files selected for processing (4)
  • plugins/inworld/example/inworld-audio-guide.md (1 hunks)
  • plugins/inworld/example/inworld_tts_example.py (1 hunks)
  • plugins/inworld/pyproject.toml (1 hunks)
  • plugins/inworld/vision_agents/plugins/inworld/tts.py (1 hunks)
🧰 Additional context used
🧬 Code graph analysis (2)
plugins/inworld/example/inworld_tts_example.py (4)
agents-core/vision_agents/core/edge/types.py (1)
  • User (15-18)
agents-core/vision_agents/core/agents/agents.py (3)
  • Agent (74-1314)
  • create_call (773-778)
  • finish (595-628)
agents-core/vision_agents/core/agents/agent_launcher.py (1)
  • AgentLauncher (18-125)
plugins/inworld/vision_agents/plugins/inworld/tts.py (1)
  • TTS (18-166)
plugins/inworld/vision_agents/plugins/inworld/tts.py (2)
plugins/inworld/tests/test_tts.py (1)
  • tts (14-15)
agents-core/vision_agents/core/utils/audio_queue.py (1)
  • get (119-136)
🪛 LanguageTool
plugins/inworld/example/inworld-audio-guide.md

[style] ~27-~27: This adverb was used twice in the sentence. Consider removing one of them or replacing them with a synonym.
Context: ... sounds naturally** where a human would naturally pause, breathe, or react. 3. **Match e...

(ADVERB_REPETITION_PREMIUM)

🪛 markdownlint-cli2 (0.18.1)
plugins/inworld/example/inworld-audio-guide.md

49-49: Fenced code blocks should have a language specified

(MD040, fenced-code-language)


54-54: Fenced code blocks should have a language specified

(MD040, fenced-code-language)


59-59: Fenced code blocks should have a language specified

(MD040, fenced-code-language)


64-64: Fenced code blocks should have a language specified

(MD040, fenced-code-language)


69-69: Fenced code blocks should have a language specified

(MD040, fenced-code-language)

⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (5)
  • GitHub Check: Cursor Bugbot
  • GitHub Check: unit / Test "not integration"
  • GitHub Check: unit / Ruff & mypy
  • GitHub Check: unit / Ruff & mypy
  • GitHub Check: unit / Test "not integration"
🔇 Additional comments (7)
plugins/inworld/vision_agents/plugins/inworld/tts.py (2)

1-16: LGTM!

The imports are well-organized and appropriate for the TTS implementation. The module-level logger and API base constant are properly defined.


153-166: LGTM!

The stop_audio no-op implementation is appropriate for this streaming API, and the close method properly cleans up the HTTP client resource.

plugins/inworld/pyproject.toml (2)

28-29: Clarify wheel packages configuration.

Line 29 specifies packages = [".", "vision_agents"], which includes the current directory (.) as a package. This is unusual and may include unintended files. Typically, you would only specify the package directories explicitly.

Verify this is intentional. If you only want to package the vision_agents module, use:

 [tool.hatch.build.targets.wheel]
-packages = [".", "vision_agents"]
+packages = ["vision_agents"]

13-17: Dependencies are tested together through workspace lock file—verification complete.

The httpx>=0.27.0 and av>=10.0.0 constraints are verified compatible:

  • Both packages are pinned in the workspace uv.lock at versions that exceed the declared minimums
  • Both are actively imported in plugins/inworld/vision_agents/plugins/inworld/tts.py
  • Being in the same workspace lock file means they are tested together in CI
  • Minimum version requirements align with actual usage patterns

The workspace build system handles compatibility verification automatically.

plugins/inworld/example/inworld_tts_example.py (3)

1-31: LGTM!

The imports and module setup are well-organized. The docstring clearly describes the requirements and purpose of the example.


67-69: LGTM!

The main entry point correctly uses the AgentLauncher pattern with the CLI.


38-38: No action required—the @ file reference syntax is supported.

The @inworld-audio-guide.md syntax in the instructions is fully supported by the system. The parse_instructions() function in agents-core/vision_agents/core/utils/utils.py explicitly handles this pattern, extracting file references and loading their contents into an Instructions object. This is documented with examples showing the exact same usage pattern, and active usage exists elsewhere in the codebase (e.g., agents-core/vision_agents/core/agents/agents.py line 84).

Likely an incorrect or invalid review comment.
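The @-reference mechanism described here can be pictured with a small sketch. It is not the actual parse_instructions() implementation, whose signature and Instructions return type are not shown in this review. load_file_refs and its regex are hypothetical and only illustrate the idea: extract @name.md tokens from the instruction text and load each referenced file once. File contents come from a caller-supplied reader so the example runs without touching the filesystem.

```python
import re
from typing import Callable, Dict

# Hypothetical pattern: "@" followed by a markdown filename, e.g. @inworld-audio-guide.md
FILE_REF = re.compile(r"@([\w./-]+\.md)\b")


def load_file_refs(instructions: str, read_file: Callable[[str], str]) -> Dict[str, str]:
    """Return {referenced filename: contents}, loading each distinct file once."""
    names = dict.fromkeys(FILE_REF.findall(instructions))  # de-duplicate, keep order
    return {name: read_file(name) for name in names}


if __name__ == "__main__":
    files = {"inworld-audio-guide.md": "Use emotion and non-verbal tags sparingly."}
    loaded = load_file_refs("Follow the rules in @inworld-audio-guide.md.", files.__getitem__)
    print(list(loaded))  # ['inworld-audio-guide.md']
```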

@Nash0x7E2 Nash0x7E2 merged commit e2f7b1b into main Nov 13, 2025
7 checks passed
@Nash0x7E2 Nash0x7E2 deleted the feat/inworld-tts branch November 13, 2025 21:30