Showing 10 changed files with 847 additions and 1 deletion.
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,21 @@ | ||
# | ||
# | ||
# Agora Real Time Engagement | ||
# Created by Wei Hu in 2022-11. | ||
# Copyright (c) 2024 Agora IO. All rights reserved. | ||
# | ||
# | ||
import("//build/feature/ten_package.gni") | ||
|
||
ten_package("gemini_v2v_python") { | ||
package_kind = "extension" | ||
|
||
resources = [ | ||
"__init__.py", | ||
"addon.py", | ||
"extension.py", | ||
"log.py", | ||
"manifest.json", | ||
"property.json", | ||
] | ||
} |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,65 @@ | ||
# openai_v2v_python | ||
|
||
An extension for integrating OpenAI's next-generation **Multimodal** AI into your application, providing configurable AI-driven features such as conversational agents, task automation, and tool integration. | ||
|
||
## Features | ||
|
||
<!-- main features introduction --> | ||
|
||
- OpenAI **Multimodal** Integration: Leverage GPT **Multimodal** models for voice-to-voice as well as text processing. | ||
- Configurable: Easily customize API keys, model settings, prompts, temperature, etc. | ||
- Async Queue Processing: Supports real-time message processing with task cancellation and prioritization. | ||
<!-- - Tool Support: Integrate external tools like image recognition via OpenAI's API. --> | ||
|
||
## API | ||
|
||
Refer to the `api` definition in [manifest.json](manifest.json) and the default values in [property.json](property.json). | ||
|
||
<!-- Additional API.md can be referred to if extra introduction needed --> | ||
|
||
| **Property** | **Type** | **Description** | | ||
|----------------------------|------------|-------------------------------------------| | ||
| `api_key` | `string` | API key for authenticating with OpenAI | | ||
| `temperature` | `float64` | Sampling temperature, higher values mean more randomness | | ||
| `model` | `string` | Model identifier (e.g., GPT-3.5, GPT-4) | | ||
| `max_tokens` | `int64` | Maximum number of tokens to generate | | ||
| `system_message` | `string` | Default system message to send to the model | | ||
| `voice` | `string` | Voice the OpenAI model speaks with, such as `alloy`, `echo`, `shimmer`, etc. | | ||
| `server_vad` | `bool` | Flag to enable or disable OpenAI's server-side VAD (voice activity detection) | | ||
| `language` | `string` | Language the OpenAI model responds in, such as `en-US`, `zh-CN`, etc. | | ||
| `dump` | `bool` | Flag to enable or disable audio dump for debugging purposes | | ||
|
||
### Data Out: | ||
| **Name** | **Property** | **Type** | **Description** | | ||
|----------------|--------------|------------|-------------------------------| | ||
| `text_data` | `text` | `string` | Outgoing text data | | ||
|
||
### Command Out: | ||
| **Name** | **Description** | | ||
|----------------|---------------------------------------------| | ||
| `flush` | Response after flushing the current state | | ||
|
||
### Audio Frame In: | ||
| **Name** | **Description** | | ||
|------------------|-------------------------------------------| | ||
| `pcm_frame` | Audio frame input for voice processing | | ||
|
||
### Audio Frame Out: | ||
| **Name** | **Description** | | ||
|------------------|-------------------------------------------| | ||
| `pcm_frame` | Audio frame output after voice processing | | ||
|
||
|
||
### Azure Support | ||
|
||
This extension also supports Azure OpenAI Service. The property settings are as follows: | ||
|
||
``` json | ||
{ | ||
"base_uri": "wss://xxx.openai.azure.com", | ||
"path": "/openai/realtime?api-version=xxx&deployment=xxx", | ||
"api_key": "xxx", | ||
"model": "gpt-4o-realtime-preview", | ||
"vendor": "azure" | ||
} | ||
``` |
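
As a minimal sketch of how the Azure properties above might be combined, the snippet below joins `base_uri` and `path` into one WebSocket URL and picks an auth header based on `vendor`. The helper name `build_realtime_endpoint` and the header choices (`api-key` for Azure, `Authorization: Bearer` otherwise) are assumptions for illustration; they are not taken from this extension's code.

```python
from urllib.parse import urlsplit


def build_realtime_endpoint(props: dict) -> tuple[str, dict]:
    """Return (url, headers) for a realtime WebSocket connection.

    Hypothetical helper: the header names used here are assumptions,
    not read from the extension's source.
    """
    base_uri = props["base_uri"].rstrip("/")
    path = props["path"]
    if not path.startswith("/"):
        path = "/" + path
    url = base_uri + path

    # Fail early on anything that is not a secure WebSocket URL.
    if urlsplit(url).scheme != "wss":
        raise ValueError(f"expected a wss:// URL, got {url!r}")

    if props.get("vendor") == "azure":
        headers = {"api-key": props["api_key"]}
    else:
        headers = {"Authorization": f"Bearer {props['api_key']}"}
    return url, headers


# Placeholder values copied from the JSON example; replace "xxx" with real settings.
url, headers = build_realtime_endpoint({
    "base_uri": "wss://xxx.openai.azure.com",
    "path": "/openai/realtime?api-version=xxx&deployment=xxx",
    "api_key": "xxx",
    "model": "gpt-4o-realtime-preview",
    "vendor": "azure",
})
print(url)
```

Keeping `base_uri` and `path` as separate properties lets the same code target either vendor by changing configuration only.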
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,8 @@ | ||
# | ||
# | ||
# Agora Real Time Engagement | ||
# Created by Wei Hu in 2024-08. | ||
# Copyright (c) 2024 Agora IO. All rights reserved. | ||
# | ||
# | ||
from . import addon |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,21 @@ | ||
# | ||
# | ||
# Agora Real Time Engagement | ||
# Created by Wei Hu in 2024-08. | ||
# Copyright (c) 2024 Agora IO. All rights reserved. | ||
# | ||
# | ||
from ten import ( | ||
Addon, | ||
register_addon_as_extension, | ||
TenEnv, | ||
) | ||
|
||
|
||
@register_addon_as_extension("gemini_v2v_python") | ||
class GeminiRealtimeExtensionAddon(Addon): | ||
|
||
def on_create_instance(self, ten_env: TenEnv, name: str, context) -> None: | ||
from .extension import GeminiRealtimeExtension | ||
ten_env.log_info("GeminiRealtimeExtensionAddon on_create_instance") | ||
ten_env.on_create_instance_done(GeminiRealtimeExtension(name), context) |