feat: support gemini v2v

TEN-framework · Dec 12, 2024 · 5c5527b
1 parent e6948b8

Showing 10 changed files with 847 additions and 1 deletion.
diff --git a/agents/ten_packages/extension/gemini_v2v_python/BUILD.gn b/agents/ten_packages/extension/gemini_v2v_python/BUILD.gn
@@ -0,0 +1,21 @@
+#
+#
+# Agora Real Time Engagement
+# Created by Wei Hu in 2022-11.
+# Copyright (c) 2024 Agora IO. All rights reserved.
+#
+#
+import("//build/feature/ten_package.gni")
+
+ten_package("gemini_v2v_python") {
+  package_kind = "extension"
+
+  resources = [
+    "__init__.py",
+    "addon.py",
+    "extension.py",
+    "log.py",
+    "manifest.json",
+    "property.json",
+  ]
+}
diff --git a/agents/ten_packages/extension/gemini_v2v_python/README.md b/agents/ten_packages/extension/gemini_v2v_python/README.md
@@ -0,0 +1,65 @@
+# openai_v2v_python
+
+An extension for integrating OpenAI's next-generation **multimodal** AI into your application, providing configurable AI-driven features such as conversational agents, task automation, and tool integration.
+
+## Features
+
+<!-- main features introduction -->
+
+- OpenAI **Multimodal** Integration: Leverage GPT **multimodal** models for voice-to-voice as well as text processing.
+- Configurable: Easily customize API keys, model settings, prompts, temperature, etc.
+- Async Queue Processing: Supports real-time message processing with task cancellation and prioritization.
+<!-- - Tool Support: Integrate external tools like image recognition via OpenAI's API. -->
+
+## API
+
+Refer to the `api` definition in [manifest.json](manifest.json) and the default values in [property.json](property.json).
+
+<!-- Additional API.md can be referred to if extra introduction needed -->
+
+| **Property**     | **Type**  | **Description**                                                              |
+|------------------|-----------|------------------------------------------------------------------------------|
+| `api_key`        | `string`  | API key for authenticating with OpenAI                                       |
+| `temperature`    | `float64` | Sampling temperature; higher values mean more randomness                     |
+| `model`          | `string`  | Model identifier (e.g., GPT-3.5, GPT-4)                                      |
+| `max_tokens`     | `int64`   | Maximum number of tokens to generate                                         |
+| `system_message` | `string`  | Default system message to send to the model                                  |
+| `voice`          | `string`  | Voice the OpenAI model speaks with, such as `alloy`, `echo`, `shimmer`, etc. |
+| `server_vad`     | `bool`    | Flag to enable or disable OpenAI's server-side VAD                           |
+| `language`       | `string`  | Language the OpenAI model responds in, such as `en-US`, `zh-CN`, etc.        |
+| `dump`           | `bool`    | Flag to enable or disable audio dump for debugging purposes                  |
+
+### Data Out:
+| **Name**       | **Property** | **Type**   | **Description**               |
+|----------------|--------------|------------|-------------------------------|
+| `text_data`    | `text`       | `string`   | Outgoing text data             |
+
+### Command Out:
+| **Name**       | **Description**                             |
+|----------------|---------------------------------------------|
+| `flush`        | Response after flushing the current state    |
+
+### Audio Frame In:
+| **Name**         | **Description**                           |
+|------------------|-------------------------------------------|
+| `pcm_frame`      | Audio frame input for voice processing    |
+
+### Audio Frame Out:
+| **Name**         | **Description**                           |
+|------------------|-------------------------------------------|
+| `pcm_frame`      | Audio frame output after voice processing |
+
+
+### Azure Support
+
+This extension also supports Azure OpenAI Service; the property settings are as follows:
+
+``` json
+{
+    "base_uri": "wss://xxx.openai.azure.com",
+    "path": "/openai/realtime?api-version=xxx&deployment=xxx",
+    "api_key": "xxx",
+    "model": "gpt-4o-realtime-preview",
+    "vendor": "azure"
+}
+```
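The Azure settings in the README above split the endpoint into a WebSocket `base_uri` and a request `path`. The following is a minimal sketch, assuming the extension simply concatenates those two fields into one URL; the helper `build_realtime_url` is hypothetical and does not appear in this commit.

```python
import json
from urllib.parse import urlsplit


def build_realtime_url(props: dict) -> str:
    """Join `base_uri` and `path` into a single WebSocket endpoint URL.

    Hypothetical helper: assumes the extension concatenates the two
    property fields; it is not code from this commit.
    """
    base_uri = props.get("base_uri", "")
    path = props.get("path", "")
    if not base_uri or not path:
        raise ValueError("both 'base_uri' and 'path' must be set")
    # The README's example uses wss://, so reject http(s) and other schemes.
    scheme = urlsplit(base_uri).scheme
    if scheme not in ("ws", "wss"):
        raise ValueError(f"expected a ws:// or wss:// base_uri, got {base_uri!r}")
    # Strip slashes at the join point so exactly one '/' separates the parts.
    return base_uri.rstrip("/") + "/" + path.lstrip("/")


if __name__ == "__main__":
    azure_props = json.loads(
        """
        {
            "base_uri": "wss://xxx.openai.azure.com",
            "path": "/openai/realtime?api-version=xxx&deployment=xxx",
            "api_key": "xxx",
            "model": "gpt-4o-realtime-preview",
            "vendor": "azure"
        }
        """
    )
    # prints wss://xxx.openai.azure.com/openai/realtime?api-version=xxx&deployment=xxx
    print(build_realtime_url(azure_props))
```

Normalizing the slash at the join point keeps the result valid whether or not `base_uri` ends in `/` or `path` starts with one.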
diff --git a/agents/ten_packages/extension/gemini_v2v_python/__init__.py b/agents/ten_packages/extension/gemini_v2v_python/__init__.py
@@ -0,0 +1,8 @@
+#
+#
+# Agora Real Time Engagement
+# Created by Wei Hu in 2024-08.
+# Copyright (c) 2024 Agora IO. All rights reserved.
+#
+#
+from . import addon
diff --git a/agents/ten_packages/extension/gemini_v2v_python/addon.py b/agents/ten_packages/extension/gemini_v2v_python/addon.py
@@ -0,0 +1,21 @@
+#
+#
+# Agora Real Time Engagement
+# Created by Wei Hu in 2024-08.
+# Copyright (c) 2024 Agora IO. All rights reserved.
+#
+#
+from ten import (
+    Addon,
+    register_addon_as_extension,
+    TenEnv,
+)
+
+
+@register_addon_as_extension("gemini_v2v_python")
+class GeminiRealtimeExtensionAddon(Addon):
+
+    def on_create_instance(self, ten_env: TenEnv, name: str, context) -> None:
+        from .extension import GeminiRealtimeExtension
+        ten_env.log_info("GeminiRealtimeExtensionAddon on_create_instance")
+        ten_env.on_create_instance_done(GeminiRealtimeExtension(name), context)
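In `addon.py`, the `@register_addon_as_extension("gemini_v2v_python")` decorator registers the addon under its package name, and `GeminiRealtimeExtension` is imported inside `on_create_instance` rather than at module top level, so the extension module is not imported until an instance is requested. The plain-Python sketch below imitates that decorator-plus-deferred-construction pattern; `ADDONS`, `register_addon`, `FakeTenEnv`, and `DemoAddon` are hypothetical stand-ins, not the `ten` package's API or the runtime's actual registration logic.

```python
from typing import Callable, Dict, List, Type


class FakeTenEnv:
    """Hypothetical stand-in for TenEnv: records calls instead of talking to a runtime."""

    def __init__(self) -> None:
        self.logs: List[str] = []
        self.created: List[object] = []

    def log_info(self, msg: str) -> None:
        self.logs.append(msg)

    def on_create_instance_done(self, instance: object, context: object) -> None:
        self.created.append(instance)


class Addon:
    """Hypothetical base class; subclasses build extension instances on demand."""

    def on_create_instance(self, ten_env: FakeTenEnv, name: str, context: object) -> None:
        raise NotImplementedError


# Hypothetical registry mapping an addon name to its single addon object.
ADDONS: Dict[str, Addon] = {}


def register_addon(name: str) -> Callable[[Type[Addon]], Type[Addon]]:
    """Class decorator: instantiate the addon once and file it under `name`."""

    def wrap(cls: Type[Addon]) -> Type[Addon]:
        ADDONS[name] = cls()
        return cls

    return wrap


@register_addon("gemini_v2v_python")
class DemoAddon(Addon):
    def on_create_instance(self, ten_env: FakeTenEnv, name: str, context: object) -> None:
        # Function-local import, mirroring `from .extension import ...` in addon.py:
        # it runs only when an instance is requested. SimpleNamespace is just a
        # stdlib placeholder for the real extension class.
        from types import SimpleNamespace as DemoExtension

        ten_env.log_info("DemoAddon on_create_instance")
        ten_env.on_create_instance_done(DemoExtension(name=name), context)


if __name__ == "__main__":
    env = FakeTenEnv()
    ADDONS["gemini_v2v_python"].on_create_instance(env, "v2v", None)
    # prints: DemoAddon on_create_instance -> v2v
    print(env.logs[0], "->", env.created[0].name)
```

Keeping the import function-local means importing the package (as `__init__.py` does with `from . import addon`) stays cheap, and any import error in the extension module surfaces only when an instance is created.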