[Feature Request] OpenAI Realtime API #5672

Open
lloydzhou opened this issue Oct 15, 2024 · 6 comments
Labels
enhancement New feature or request


@lloydzhou
Member

🥰 Feature description

https://openai.com/index/introducing-the-realtime-api/

https://platform.openai.com/docs/api-reference/realtime

https://github.com/openai/openai-realtime-console/blob/main/readme/realtime-console-demo.png


🧐 Proposed solution

Logic

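  1. The Realtime API is accessed over a WebSocket.
  2. The API has built-in concepts such as sessions and conversations. A session can be configured with modalities, instructions, voice, input_audio_format, output_audio_format, turn_detection, input_audio_transcription, tools, etc., and supports function calling.
  3. Audio is uploaded with input_audio_buffer.append followed by input_audio_buffer.commit, and generation is then started with response.create (if turn_detection is enabled, the server handles this, so these calls do not have to be made manually).
  4. The client can send conversation.item.create to add context directly to the current conversation; for history items, set status=completed.
  5. conversation.item.truncate supports interruption: when the user cuts in, it truncates the assistant message's audio at the point playback was interrupted.
  6. Listen for response.audio.delta to receive base64-encoded audio data, and receive the matching text in parallel (response.audio_transcript.delta carries the transcript of the audio output; response.text.delta carries text output).
  7. Listen for response.output_item.added to find out whether the output is a function call, and for response.function_call_arguments.delta to receive the function call arguments. Or can the function call information be read directly from response.done?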
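A minimal TypeScript sketch of the client-side events in points 3 to 5, assuming the event names and field layout of the Realtime API reference linked above (beta, October 2024). The helper names (`appendAudio`, `commitAndRespond`, `historyItem`, `truncateAssistantAudio`) are hypothetical, and the WebSocket itself is left out; each event would be sent as one JSON text message.

```typescript
// Sketch only: event names follow the Realtime API reference (beta);
// all helper names here are hypothetical, not part of any SDK.
type ClientEvent = Record<string, unknown> & { type: string };

// Append one chunk of base64-encoded audio (pcm16 by default) to the input buffer.
function appendAudio(base64Audio: string): ClientEvent {
  return { type: "input_audio_buffer.append", audio: base64Audio };
}

// Manual mode (turn_detection disabled): commit the buffer, then ask for a response.
function commitAndRespond(): ClientEvent[] {
  return [{ type: "input_audio_buffer.commit" }, { type: "response.create" }];
}

// Replay a stored chat message into the conversation as a completed item.
// Assumption: user history uses "input_text" parts, assistant history uses "text".
function historyItem(role: "user" | "assistant", text: string): ClientEvent {
  return {
    type: "conversation.item.create",
    item: {
      type: "message",
      role,
      status: "completed",
      content: [{ type: role === "user" ? "input_text" : "text", text }],
    },
  };
}

// On barge-in, cut the assistant's audio at the point that was actually played.
function truncateAssistantAudio(itemId: string, playedMs: number): ClientEvent {
  return {
    type: "conversation.item.truncate",
    item_id: itemId,
    content_index: 0,
    audio_end_ms: Math.floor(playedMs),
  };
}

// Usage: each event would go out as ws.send(JSON.stringify(event)).
const outgoing: ClientEvent[] = [
  historyItem("user", "Hi"),
  historyItem("assistant", "Hello! How can I help?"),
  appendAudio("AAAA"), // placeholder base64 chunk
  ...commitAndRespond(),
];
console.log(outgoing.map((e) => e.type).join(","));
```

When server-side turn detection is enabled, only the append events are needed, since the server commits the buffer and starts the response on its own.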

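Interaction

  1. Possibly add a voice-chat page like the one in the official OpenAI client, calling the Realtime API directly.
  2. The voice interaction screen is full-screen by default and can be shrunk to the size of the input box (taking the input box's place). Both the voice input screen and the chat history page are kept (keeping chat history makes it possible to show intermediate results from plugin execution, e.g. an image generated by a plugin call mid-conversation, which voice alone cannot convey).
  3. The output of a voice call (the audio buffer) and the text received alongside it need to be persisted into sessions.
  4. Voice calls should support choosing the voice, audio format, turn detection mode, tools, etc. (these buttons should be kept, or re-laid out on the voice screen).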
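Persisting a voice turn (point 3) mostly means folding the streamed server events back into one record. The sketch below does that for the events named in the Logic section. The `PersistedTurn` shape, the `applyEvent` helper and the `get_weather` tool are hypothetical, and only the event fields used here are modeled. In a real client, `applyEvent` would be called from the WebSocket `onmessage` handler with `JSON.parse(message.data)`.

```typescript
// Hypothetical record for one assistant turn, to be saved into the chat session.
interface PersistedTurn {
  audioChunks: string[]; // base64 audio chunks, in arrival order
  transcript: string; // text that accompanied the audio (or text-only output)
  functionCalls: Map<string, { name: string; args: string }>; // keyed by call_id
  done: boolean;
}

// Loosely typed server event: only the fields used below are declared.
interface ServerEvent {
  type: string;
  delta?: string;
  call_id?: string;
  item?: { type?: string; name?: string; call_id?: string };
}

function newTurn(): PersistedTurn {
  return { audioChunks: [], transcript: "", functionCalls: new Map(), done: false };
}

function applyEvent(turn: PersistedTurn, ev: ServerEvent): void {
  switch (ev.type) {
    case "response.audio.delta":
      if (ev.delta) turn.audioChunks.push(ev.delta);
      break;
    case "response.audio_transcript.delta":
    case "response.text.delta":
      turn.transcript += ev.delta ?? "";
      break;
    case "response.output_item.added":
      // A new output item; register it if it is a function call.
      if (ev.item?.type === "function_call" && ev.item.call_id) {
        turn.functionCalls.set(ev.item.call_id, { name: ev.item.name ?? "", args: "" });
      }
      break;
    case "response.function_call_arguments.delta": {
      // Arguments stream in as JSON text fragments.
      const call = ev.call_id ? turn.functionCalls.get(ev.call_id) : undefined;
      if (call) call.args += ev.delta ?? "";
      break;
    }
    case "response.done":
      turn.done = true; // safe point to persist the turn
      break;
    default:
      // Other events (session.*, rate_limits.updated, ...) are ignored here.
      break;
  }
}

// Usage with a hand-written event stream.
const turn = newTurn();
const events: ServerEvent[] = [
  { type: "response.output_item.added", item: { type: "function_call", name: "get_weather", call_id: "call_1" } },
  { type: "response.function_call_arguments.delta", call_id: "call_1", delta: '{"city":' },
  { type: "response.function_call_arguments.delta", call_id: "call_1", delta: '"Paris"}' },
  { type: "response.audio.delta", delta: "AAAA" },
  { type: "response.audio_transcript.delta", delta: "Hello" },
  { type: "response.audio_transcript.delta", delta: " there" },
  { type: "response.done" },
];
for (const ev of events) applyEvent(turn, ev);
console.log(turn.transcript, JSON.parse(turn.functionCalls.get("call_1")?.args ?? "{}"));
```

Keying function calls by call_id keeps interleaved argument streams apart, and waiting for response.done before saving avoids persisting half-streamed audio or arguments.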

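Discussion

  1. realtime is a new model, but it is clearly not equivalent to the existing models. Where should it be placed?
  2. The Realtime API also supports setting modalities to text only, which disables audio (only audio is disabled; the model is still used through the full WebSocket-based interface).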
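The per-call options from the Interaction section (voice, audio format, turn detection, tools) and the text-only mode mentioned here all map onto a single session.update event. A sketch, assuming the session fields listed in the Logic section; the `RealtimeSettings` type and `sessionUpdate` helper are hypothetical UI glue, not part of the API or of this project, and the `whisper-1` transcription model is an assumption.

```typescript
// Hypothetical settings as they might be chosen in the voice UI.
interface RealtimeSettings {
  voice: string; // e.g. "alloy"
  audioFormat: "pcm16" | "g711_ulaw" | "g711_alaw";
  serverVad: boolean; // false => client commits the buffer and calls response.create
  textOnly: boolean; // true => modalities ["text"], no audio output
  tools: { name: string; description: string; parameters: object }[];
}

// Build the session.update event sent over the WebSocket after connecting.
function sessionUpdate(s: RealtimeSettings) {
  return {
    type: "session.update",
    session: {
      modalities: s.textOnly ? ["text"] : ["text", "audio"],
      voice: s.voice,
      input_audio_format: s.audioFormat,
      output_audio_format: s.audioFormat,
      input_audio_transcription: { model: "whisper-1" }, // assumption
      turn_detection: s.serverVad ? { type: "server_vad" } : null,
      tools: s.tools.map((t) => ({ type: "function", ...t })),
      tool_choice: "auto",
    },
  };
}

// Usage: text-only session with manual turn handling and no tools.
const example = sessionUpdate({
  voice: "alloy",
  audioFormat: "pcm16",
  serverVad: false,
  textOnly: true,
  tools: [],
});
console.log(JSON.stringify(example));
```

Setting turn_detection to null switches to manual mode, which is where the commit and response.create calls from the Logic section become necessary.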

📝 Additional information

Pricing
[image: Realtime API pricing]

@Dogtiti
Member

Dogtiti commented Nov 7, 2024

#5786

@Dogtiti
Member

Dogtiti commented Nov 11, 2024

Parameters configured in the settings panel:
[image: settings panel]

@coderabbitai coderabbitai bot mentioned this issue Nov 11, 2024
@Dogtiti
Member

Dogtiti commented Nov 11, 2024

@kitaev-chen

Is there a free model available for this? I had already spent $0.10 before even a minute of chatting.
