feat: support gemini v2v
plutoless committed Dec 12, 2024
1 parent e6948b8 commit 5c5527b
Showing 10 changed files with 847 additions and 1 deletion.
21 changes: 21 additions & 0 deletions agents/ten_packages/extension/gemini_v2v_python/BUILD.gn
@@ -0,0 +1,21 @@
#
#
# Agora Real Time Engagement
# Created by Wei Hu in 2022-11.
# Copyright (c) 2024 Agora IO. All rights reserved.
#
#
import("//build/feature/ten_package.gni")

ten_package("gemini_v2v_python") {
  package_kind = "extension"

  resources = [
    "__init__.py",
    "addon.py",
    "extension.py",
    "log.py",
    "manifest.json",
    "property.json",
  ]
}
65 changes: 65 additions & 0 deletions agents/ten_packages/extension/gemini_v2v_python/README.md
@@ -0,0 +1,65 @@
# openai_v2v_python

An extension that integrates OpenAI's next-generation **multimodal** AI into your application, providing configurable AI-driven features such as conversational agents, task automation, and tool integration.

## Features

<!-- main features introduction -->

- OpenAI **Multimodal** Integration: Leverage GPT **multimodal** models for voice-to-voice as well as text processing.
- Configurable: Easily customize API keys, model settings, prompts, temperature, etc.
- Async Queue Processing: Supports real-time message processing with task cancellation and prioritization.
<!-- - Tool Support: Integrate external tools like image recognition via OpenAI's API. -->

## API

Refer to the `api` definition in [manifest.json](manifest.json) and the default values in [property.json](property.json).

<!-- Additional API.md can be referred to if extra introduction needed -->

| **Property** | **Type** | **Description** |
|----------------------------|------------|-------------------------------------------|
| `api_key` | `string` | API key for authenticating with OpenAI |
| `temperature` | `float64` | Sampling temperature, higher values mean more randomness |
| `model` | `string` | Model identifier (e.g., GPT-3.5, GPT-4) |
| `max_tokens` | `int64` | Maximum number of tokens to generate |
| `system_message` | `string` | Default system message to send to the model |
| `voice` | `string` | Voice the OpenAI model speaks with, such as `alloy`, `echo`, `shimmer`, etc. |
| `server_vad` | `bool` | Flag to enable or disable OpenAI's server-side VAD (voice activity detection) |
| `language` | `string` | Language the OpenAI model responds in, such as `en-US`, `zh-CN`, etc. |
| `dump` | `bool` | Flag to enable or disable audio dumps for debugging |
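
As a rough illustration of how the property table above might be checked before it is passed to the extension, here is a minimal sketch. The mapping from `float64`/`int64` to Python's `float`/`int`, and the names `PROPERTY_TYPES` and `validate_properties`, are assumptions made for this example; they are not part of the extension's code.

```python
from typing import Any

# Property names and types copied from the table above.
# Assumption: float64 maps to float and int64 maps to int in Python.
PROPERTY_TYPES: dict[str, type] = {
    "api_key": str,
    "temperature": float,
    "model": str,
    "max_tokens": int,
    "system_message": str,
    "voice": str,
    "server_vad": bool,
    "language": str,
    "dump": bool,
}


def _matches(value: Any, expected: type) -> bool:
    """Check a value against an expected type, treating bool strictly."""
    if isinstance(value, bool):
        # bool is a subclass of int in Python, so handle it first.
        return expected is bool
    if expected is float:
        # Accept whole numbers such as 1 where a float is expected.
        return isinstance(value, (int, float))
    return isinstance(value, expected)


def validate_properties(props: dict[str, Any]) -> list[str]:
    """Return a list of problems; an empty list means nothing was flagged."""
    problems: list[str] = []
    for name, value in props.items():
        expected = PROPERTY_TYPES.get(name)
        if expected is None:
            problems.append(f"unknown property: {name}")
        elif not _matches(value, expected):
            problems.append(
                f"{name}: expected {expected.__name__}, got {type(value).__name__}"
            )
    return problems
```

Calling `validate_properties` on a dictionary loaded from `property.json` would return one message per mismatched or unrecognized key, which can be logged before the extension starts.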

### Data Out:
| **Name** | **Property** | **Type** | **Description** |
|----------------|--------------|------------|-------------------------------|
| `text_data` | `text` | `string` | Outgoing text data |

### Command Out:
| **Name** | **Description** |
|----------------|---------------------------------------------|
| `flush` | Response after flushing the current state |

### Audio Frame In:
| **Name** | **Description** |
|------------------|-------------------------------------------|
| `pcm_frame` | Audio frame input for voice processing |

### Audio Frame Out:
| **Name** | **Description** |
|------------------|-------------------------------------------|
| `pcm_frame` | Audio frame output after voice processing |


### Azure Support

This extension also supports Azure OpenAI Service. The property settings are as follows:

```json
{
  "base_uri": "wss://xxx.openai.azure.com",
  "path": "/openai/realtime?api-version=xxx&deployment=xxx",
  "api_key": "xxx",
  "model": "gpt-4o-realtime-preview",
  "vendor": "azure"
}
```
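
The settings above split the endpoint into `base_uri` and `path`. How the extension joins them is not shown here, so the following is only a sketch that assumes simple concatenation; `build_realtime_url` is a hypothetical helper, not a function from the extension.

```python
from urllib.parse import urlsplit


def build_realtime_url(props: dict[str, str]) -> str:
    """Join base_uri and path into one WebSocket URL (assumed behavior)."""
    base = props["base_uri"].rstrip("/")  # avoid a double slash at the join
    path = props["path"]
    if not path.startswith("/"):
        path = "/" + path
    url = base + path
    # The example settings use wss://, so reject anything that is not a WebSocket scheme.
    if urlsplit(url).scheme not in ("ws", "wss"):
        raise ValueError(f"expected a ws:// or wss:// URL, got: {url}")
    return url
```

Validating the scheme early surfaces a misconfigured `base_uri` (for example an `https://` address) before any connection attempt is made.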
8 changes: 8 additions & 0 deletions agents/ten_packages/extension/gemini_v2v_python/__init__.py
@@ -0,0 +1,8 @@
#
#
# Agora Real Time Engagement
# Created by Wei Hu in 2024-08.
# Copyright (c) 2024 Agora IO. All rights reserved.
#
#
from . import addon
21 changes: 21 additions & 0 deletions agents/ten_packages/extension/gemini_v2v_python/addon.py
@@ -0,0 +1,21 @@
#
#
# Agora Real Time Engagement
# Created by Wei Hu in 2024-08.
# Copyright (c) 2024 Agora IO. All rights reserved.
#
#
from ten import (
    Addon,
    register_addon_as_extension,
    TenEnv,
)


@register_addon_as_extension("gemini_v2v_python")
class GeminiRealtimeExtensionAddon(Addon):

    def on_create_instance(self, ten_env: TenEnv, name: str, context) -> None:
        from .extension import GeminiRealtimeExtension
        ten_env.log_info("GeminiRealtimeExtensionAddon on_create_instance")
        ten_env.on_create_instance_done(GeminiRealtimeExtension(name), context)