
[Feature] Voice input and output support #208

Closed

zpng opened this issue Feb 25, 2024 · 23 comments
Labels
enhancement (New feature or request)

Comments

@zpng commented Feb 25, 2024

To improve communication efficiency, we have set up an official QQ group and QQ channel. If you run into any problems while using or deploying the project, please ask in the group or channel first. Unless you have a reliably reproducible bug or a fairly creative feature suggestion, please do not post low-quality, meaningless issues here.

Click to join the official group chat

What feature do you want, or what suggestion do you have?
Support speech-to-text and text-to-speech.

Are there any similar products to refer to?
Something like the feature in this project: https://github.com/vual/ChatGPT-Next-Web-Pro
[screenshot]


zpng changed the title from [Feature] to [Feature] Voice input and output support on Feb 25, 2024
Hk-Gosuto added the enhancement label on Feb 27, 2024
@Hk-Gosuto (Owner)

OpenAI-based TTS is supported now; voice input will be added later when I have free time.

[screenshots]
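For context, a minimal sketch of how a browser client could request speech from OpenAI's `/v1/audio/speech` endpoint and play it back with the Web Audio API. This is an illustration only, not the project's actual implementation; the model and voice names (`tts-1`, `alloy`) are just standard OpenAI values.

```ts
// Sketch only: request speech from OpenAI's TTS endpoint and play it in the browser.
async function speak(text: string, apiKey: string): Promise<void> {
  const res = await fetch("https://api.openai.com/v1/audio/speech", {
    method: "POST",
    headers: {
      Authorization: `Bearer ${apiKey}`,
      "Content-Type": "application/json",
    },
    body: JSON.stringify({ model: "tts-1", voice: "alloy", input: text }),
  });
  if (!res.ok) {
    // If the key cannot use the TTS model, the body is a JSON error, not audio.
    throw new Error(`TTS request failed: ${res.status} ${await res.text()}`);
  }
  const audioData = await res.arrayBuffer(); // mp3 bytes by default
  const ctx = new AudioContext();
  const buffer = await ctx.decodeAudioData(audioData);
  const source = ctx.createBufferSource();
  source.buffer = buffer;
  source.connect(ctx.destination);
  source.start();
}
```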

@ofllm commented Mar 7, 2024

Clicking the voice button throws an error: "stack": "Error: Failed to execute 'decodeAudioData' on 'BaseAudioContext': Unable to decode audio data"
Is there anything else that needs to be configured besides the page settings?


@Hk-Gosuto (Owner)

> Clicking the voice button throws an error: "stack": "Error: Failed to execute 'decodeAudioData' on 'BaseAudioContext': Unable to decode audio data". Is there anything else that needs to be configured besides the page settings?

Please confirm whether your key can use OpenAI's tts model.
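One way to verify this (a hypothetical diagnostic snippet, not part of the project): call the TTS endpoint directly and inspect the response. If the key cannot use the tts model, the API returns a JSON error body rather than audio, and passing that body to `decodeAudioData` fails with exactly the "Unable to decode audio data" message above.

```ts
// Hypothetical check: does this API key have access to OpenAI's TTS model?
async function checkTtsAccess(apiKey: string): Promise<void> {
  const res = await fetch("https://api.openai.com/v1/audio/speech", {
    method: "POST",
    headers: {
      Authorization: `Bearer ${apiKey}`,
      "Content-Type": "application/json",
    },
    body: JSON.stringify({ model: "tts-1", voice: "alloy", input: "test" }),
  });
  console.log(res.status, res.headers.get("content-type"));
  // 200 + "audio/mpeg"        -> the key can use TTS
  // 4xx + "application/json"  -> read the error message instead of decoding audio
  if (!res.ok) console.log(await res.text());
}
```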


@ofllm commented Mar 7, 2024

> Please confirm whether your key can use OpenAI's tts model.

I called the API manually and found that the tts model does not work with my key. Thanks for the reply.


@zpng (Author) commented Mar 14, 2024

@Hk-Gosuto What does the finished voice-input feature look like in a demo? What model does the key need to support? The README says HTTPS access is required; does that mean the site needs an HTTPS domain name, and what is the reason for that?


@Hk-Gosuto (Owner)

Once you enable it in the settings, the send button turns into a voice-input button: click it, start speaking, then click again to stop when you are done.
Under the hood it uses the SpeechRecognition API, so no key is needed. For browser compatibility, see: https://developer.mozilla.org/en-US/docs/Web/API/SpeechRecognition
In most browsers the SpeechRecognition API requires HTTPS to work properly.

[screenshots]
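For anyone wondering what the browser-side part looks like, here is a minimal sketch of the Web Speech API usage described above. It is an illustration under standard Web Speech API assumptions (including the `webkit`-prefixed constructor most Chromium browsers still expose), not the project's exact code.

```ts
// Minimal sketch of browser speech-to-text with the Web Speech API.
// Requires a secure context (HTTPS or localhost) in most browsers.
const SpeechRecognitionImpl =
  (window as any).SpeechRecognition ?? (window as any).webkitSpeechRecognition;

const recognition = new SpeechRecognitionImpl();
recognition.lang = "zh-CN";         // recognition language
recognition.interimResults = false; // only return final results
recognition.continuous = false;     // stop after one utterance

recognition.onresult = (event: any) => {
  const transcript = event.results[0][0].transcript;
  console.log("Recognized:", transcript); // e.g. put this into the chat input
};
recognition.onerror = (event: any) => console.error("Recognition error:", event.error);

recognition.start(); // start listening; call recognition.stop() when done speaking
```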

@zpng (Author) commented Mar 14, 2024

> Once you enable it in the settings, the send button turns into a voice-input button: click it, start speaking, then click again to stop when you are done. Under the hood it uses the SpeechRecognition API, so no key is needed. For browser compatibility, see: https://developer.mozilla.org/en-US/docs/Web/API/SpeechRecognition In most browsers the SpeechRecognition API requires HTTPS to work properly.

Why isn't this implemented by calling OpenAI's API?


@Hk-Gosuto (Owner)

This one is free and the recognition quality is pretty good, so why use Whisper?


@zpng (Author) commented Mar 14, 2024

Oh, okay.


@Hk-Gosuto (Owner)

Try it in a few more scenarios. If it doesn't work well in complex ones, I'll consider adding Whisper support later.
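If Whisper support were added later, it would presumably go through OpenAI's `/v1/audio/transcriptions` endpoint. A rough sketch of that call (hypothetical, not project code; the recorded `audioBlob` is assumed to come from something like MediaRecorder):

```ts
// Rough sketch: transcribe recorded audio with OpenAI's Whisper API.
async function transcribe(audioBlob: Blob, apiKey: string): Promise<string> {
  const form = new FormData();
  form.append("file", audioBlob, "speech.webm");
  form.append("model", "whisper-1");

  const res = await fetch("https://api.openai.com/v1/audio/transcriptions", {
    method: "POST",
    headers: { Authorization: `Bearer ${apiKey}` },
    body: form,
  });
  if (!res.ok) throw new Error(`Transcription failed: ${res.status}`);
  const data = await res.json();
  return data.text; // the recognized text
}
```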


@jcr8745dqy100

When using OpenAI TTS, every request to speak sends a new TTS request. Could the audio be saved locally the first time, so listening to it again later doesn't waste another request?


@Hk-Gosuto (Owner)

> When using OpenAI TTS, every request to speak sends a new TTS request. Could the audio be saved locally the first time, so listening to it again later doesn't waste another request?

I'll see whether I can store the audio in IndexedDB. For now you can switch to Edge TTS, which doesn't incur any charges.
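As a rough sketch of that idea, the generated audio could be cached in IndexedDB keyed by the spoken text, so replaying a message skips the paid request. The store and function names here (`tts-cache`, `getCachedAudio`, `putCachedAudio`) are hypothetical, not the project's implementation.

```ts
// Hypothetical sketch: cache TTS audio blobs in IndexedDB keyed by the spoken text,
// so replaying a message does not trigger another paid TTS request.
function openTtsCache(): Promise<IDBDatabase> {
  return new Promise((resolve, reject) => {
    const req = indexedDB.open("tts-cache", 1);
    req.onupgradeneeded = () => req.result.createObjectStore("audio");
    req.onsuccess = () => resolve(req.result);
    req.onerror = () => reject(req.error);
  });
}

async function getCachedAudio(key: string): Promise<Blob | undefined> {
  const db = await openTtsCache();
  return new Promise((resolve, reject) => {
    const req = db.transaction("audio").objectStore("audio").get(key);
    req.onsuccess = () => resolve(req.result as Blob | undefined);
    req.onerror = () => reject(req.error);
  });
}

async function putCachedAudio(key: string, blob: Blob): Promise<void> {
  const db = await openTtsCache();
  return new Promise((resolve, reject) => {
    const tx = db.transaction("audio", "readwrite");
    tx.objectStore("audio").put(blob, key);
    tx.oncomplete = () => resolve();
    tx.onerror = () => reject(tx.error);
  });
}
```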

