Add support for recording and sending VoIP data #870

Closed
IoneGod opened this issue May 22, 2020 · 21 comments

Comments

@IoneGod

IoneGod commented May 22, 2020

Describe the project you are working on:
Multiplayer shooting game

Describe the problem or limitation you are having in your project:
I am trying to send sound packets over the network to other players to improve interaction in my multiplayer game.

Describe the feature / enhancement and how it helps to overcome the problem or limitation:
Adding a sound recorder that records sounds and saves them as temporary .wav files (or any other supported audio format) so they can be sent over the internet and other networks. It could be useful for other purposes as well.

Describe how your proposal will work, with code, pseudocode, mockups, and/or diagrams:
A SoundRecorder node would be able to save sounds such as voice data. This could be useful for sending greetings between players over a network instead of having to type a message during gameplay. Voice data would also be useful for authenticating users, to prevent data manipulation by hackers and give users better security.

If this enhancement will not be used often, can it be worked around with a few lines of script?:
A few lines of code wouldn't cut it.

Is there a reason why this should be core and not an add-on in the asset library?:
Different platforms have different sound recorder classes. A sound recorder node would give access to all of those platforms at once, without having to rewrite the code for each platform. A plugin wouldn't be able to manage that.

@Jummit

Jummit commented May 22, 2020

@Calinou
Member

Calinou commented May 22, 2020

Audio recording is already supported since Godot 3.1. That said, it may not work on all platforms due to bugs (see godotengine/godot#33184).

Calinou changed the title from "Sound Recorder" to "Add support for recording and sending VoIP data" on May 22, 2020
@nonchip

nonchip commented May 24, 2020

even though @Calinou reacted with a 😕 to my proposal over in the already mentioned lengthy discussion (without commenting, so not sure which part about it was confusing/bad, sorry, but i guess it might've been my not-so-professional wording as a result of reading through various overcomplicated proposed third party service bindings), i just wanna mention again, godot already successfully implements webrtc data channels for multiplayer on (afaik) all platforms, so it might be a good idea to just wrap the webrtc audio channels too and expose those to allow for VoIP.

@Calinou
Member

Calinou commented May 24, 2020

@nonchip WebRTC adds a significant amount of complexity on its own. I'd advise not relying on it unless you need to support HTML5 exports somehow. Most networked games don't need to support HTML5 exports, so I would prefer an easier to set up solution. (Do you have STUN/TURN servers at hand? To my knowledge, this is pretty much required for WebRTC.)

@nonchip

nonchip commented May 24, 2020

@Calinou good point, might be worthy to take into consideration for the html5 export of the voip solution though.
also about stun/turn you technically don't need it but you really want to because it's a pain to do it in any other way. i'm running a spreed WebRTC service myself, that has it all included, but is pretty much meant as a "go to that website and start talking" kinda thing, i looked into setting up stun/turn manually and decided my sanity was more important :P

@Wavesonics

Wavesonics commented May 28, 2020

I believe that in order to implement real time voice streaming, we're still missing one part of WebRTC: #813

That is of course if you want to do it using WebRTC

@Wavesonics

Wavesonics commented May 29, 2020

I've been digging into this a little more recently. I implemented a VOIP demo similar to the one @Jummit linked, and that highlighted some of the issues here to me.

I have some time right now where I could probably get a real solution implemented, so I thought I'd get some input on what a real solution would actually look like.

Here is the breakdown of the problem as I see it:

1) Recording audio:

This was solved in 3.1. Maybe it could be made a little more friendly with something like a SoundRecorder node as @iapps207 suggested. But even if not, it does work as it exists today.

2) Sending the data over the wire:

There's a variety of ways we could accomplish this, and we probably don't want to be too prescriptive here. But the problem with how my demo works and the cbarsugman demo is that they are not truly streaming the audio. They record, then send the whole audio buffer. It's simple, but pretty terrible for real time communication. So here are the options as I see them:

A) Send via existing network methods (rset or rpc argument). Depending on how this is configured, all data will pass through the server on the way to each client. This, as far as I can figure, will work like my existing demo and not be truly real time.
B) WebRTC MediaStreams: Not implemented yet (#813). This is very prescriptive, but it would be extremely easy to set up, and using STUN/TURN would allow peer-to-peer when available, saving lots of bandwidth on the server.
C) WebRTC without MediaStreams: I haven't gone too deep here yet, but I can't see why we couldn't just use the existing WebRTC data channel. This just puts the burden on the sender and receiver to properly handle the data as audio.
D) Some sort of lower level system based on existing Godot networking. I don't think this would require any new features, but it would have to be able to work alongside the existing high level multiplayer API in my opinion. I haven't tried to combine the high level API with low level networking, is it possible?
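
Whichever transport is chosen, the difference between "record, then send the whole buffer" and true streaming comes down to cutting the audio into small fixed-size frames and sending each one as soon as it is full. The following is a minimal Python sketch of that idea, not Godot code. The 48 kHz rate, 20 ms frame length, 16-bit mono format, and the 2-byte sequence-number header are all assumptions made for illustration.

```python
import struct

SAMPLE_RATE = 48_000       # assumed capture rate in Hz
FRAME_MS = 20              # assumed frame duration in milliseconds
BYTES_PER_SAMPLE = 2       # assumed 16-bit mono PCM
FRAME_BYTES = SAMPLE_RATE * FRAME_MS // 1000 * BYTES_PER_SAMPLE  # 1920 bytes

# Hypothetical packet header: one big-endian unsigned 16-bit sequence number.
HEADER = struct.Struct("!H")


def packetize(pcm: bytes, first_seq: int = 0):
    """Yield header + frame packets for every complete frame in pcm.

    A trailing partial frame is not emitted; the caller keeps it and
    prepends it to the next chunk of captured audio.
    """
    seq = first_seq
    for start in range(0, len(pcm) - FRAME_BYTES + 1, FRAME_BYTES):
        frame = pcm[start:start + FRAME_BYTES]
        yield HEADER.pack(seq & 0xFFFF) + frame
        seq += 1


def parse(packet: bytes):
    """Split a packet back into (sequence number, frame bytes)."""
    (seq,) = HEADER.unpack_from(packet)
    return seq, packet[HEADER.size:]
```

The sequence number lets a receiver on an unreliable transport detect loss and reordering. Each frame would normally be compressed before sending; that step is left out here.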

3) Encoding the data for transit:

All of this is a moot point at the moment, because the data returned from the Microphone API is (as it should be) a wav. Obviously the data will be far too large to use for a real-time VOIP application. And as it stands right now, there is no Audio encoder exposed to the scripting interface (or from my discussion with @Calinou even in the engine at all). So I think this is the first issue that must be addressed. From some research it looks like Opus is the best open codec for voice data, so I cloned their repo and have been poking around the docs.
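
For a rough sense of scale, the raw data rate of uncompressed PCM can be worked out directly. This sketch assumes 16-bit samples and uses a 24 kbit/s Opus target as a comparison point; both figures are assumptions for illustration, not values taken from this thread.

```python
def pcm_bitrate_kbps(sample_rate: int, channels: int, bits_per_sample: int) -> float:
    """Data rate of uncompressed PCM audio in kilobits per second."""
    return sample_rate * channels * bits_per_sample / 1000


# 44.1 kHz stereo, assuming 16-bit samples.
raw_stereo_kbps = pcm_bitrate_kbps(44_100, 2, 16)   # 1411.2
# The same signal reduced to mono.
raw_mono_kbps = pcm_bitrate_kbps(44_100, 1, 16)     # 705.6

# Assumed voice bitrate for an Opus stream; real targets vary by use case.
ASSUMED_OPUS_KBPS = 24
reduction = raw_stereo_kbps / ASSUMED_OPUS_KBPS     # about 58.8x
```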

My question to anyone here is: should we have libOpus in the engine itself for this purpose? I would certainly lean toward yes, but I can see this being only for VOIP, so maybe too specific of a use case.

I asked around on the opus IRC channel, and it looks like the higher level opus libs are specifically for file access, or http streams. So we'll probably be stuck with just the base libOpus...

Last thing to note: If we do go with opus, it has the added advantage of being the codec used by all browsers for WebRTC Media Streams. So it might pair nicely with an implementation of that.


I'm definitely looking for feedback/suggestions. Am I totally off the mark on anything here? Is this even a thing people are interested in?

@Wavesonics

Wavesonics commented Jun 2, 2020

Ok, I've taken the past few days to start figuring out libOpus.
What I've got here is a proof of concept, mainly a way for me to learn how Opus works, and how we might be able to integrate it with Godot.

Here is a proof of concept GDNative library wrapping libOpus:
libopus-gdnative

And here is a demo project using the gdnative library:
libopus-gdnative-demo (only compiled for windows x64 currently)


One issue I ran into is that libOpus only accepts a select few sample rates, and Godot's 44.1 kHz is not one of them. The closest Opus has is 48 kHz.
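
The constraint can be written down as a small check. Opus documents 8, 12, 16, 24 and 48 kHz as its supported sample rates, and frame durations of 2.5, 5, 10, 20, 40 and 60 ms. The helper below is a sketch; its name is hypothetical and it does not call libOpus.

```python
OPUS_SAMPLE_RATES = (8000, 12000, 16000, 24000, 48000)
OPUS_FRAME_MS = (2.5, 5, 10, 20, 40, 60)


def opus_frame_samples(sample_rate: int, frame_ms: float = 20) -> int:
    """Samples per channel in one Opus frame, or ValueError if unsupported."""
    if sample_rate not in OPUS_SAMPLE_RATES:
        raise ValueError(f"{sample_rate} Hz is not a sample rate Opus accepts")
    if frame_ms not in OPUS_FRAME_MS:
        raise ValueError(f"{frame_ms} ms is not an Opus frame duration")
    return int(sample_rate * frame_ms / 1000)
```

With these definitions, `opus_frame_samples(48000)` is 960, while `opus_frame_samples(44100)` raises `ValueError`, which is the situation described above.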

It looks like Godot has a bicubic resampler for playback: AudioStreamPlaybackResampled, but ideally we'd be able to resample the input from the Microphone. If that's possible with AudioStreamPlaybackResampled I haven't figured it out yet.

This was causing me some problems, until I found a great hack. I get 44.1 kHz audio from Godot's microphone API. Then I tell libOpus that this is in fact 48 kHz audio. The resulting compression is distorted due to this. Then on the decode side, libOpus produces the distorted 48 kHz audio, which I hand off to Godot, but I tell Godot it is actually 44.1 kHz, and thus it de-warps it xD
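
The arithmetic behind this relabelling trick can be sketched as follows. Nothing here touches libOpus; it only shows what the codec "believes" when 44.1 kHz samples are declared to be 48 kHz, and why the mislabelling cancels out on the receiving side. The function name and the 960-sample frame are assumptions for illustration.

```python
from fractions import Fraction

TRUE_RATE = 44_100      # rate the samples were really captured at
CLAIMED_RATE = 48_000   # rate the encoder is told

# Every frequency appears scaled by this factor from the codec's point of view.
warp = Fraction(CLAIMED_RATE, TRUE_RATE)   # 160/147, roughly 1.088


def apparent_hz(true_hz):
    """Frequency the codec believes it is encoding."""
    return true_hz * warp


# A frame the codec treats as 20 ms (960 samples at 48 kHz) really spans
# 960 / 44100 seconds of captured audio.
frame_samples = 960
real_frame_ms = Fraction(frame_samples * 1000, TRUE_RATE)   # about 21.77 ms

# Playing the decoded samples back at the true rate divides by the same
# factor, so the original pitch and speed are restored.
restored_hz = apparent_hz(1000) / warp
```

The codec still tunes its psychoacoustic decisions for the shifted spectrum, which is a plausible reason the encoded result is described as distorted.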

As great as that is, if we really wanted 1st class support for VOIP, I think we'd need the Microphone API to allow us to specify the sample rate, as well as mono VS stereo. There is no need for Stereo PCM data for a microphone. For VOIP anyway.
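
Until the capture side can deliver mono directly, halving the data by hand is straightforward. This is a minimal sketch assuming interleaved signed 16-bit stereo samples (left, right, left, right, ...); the function name is hypothetical and the layout is an assumption about the input, not a statement about Godot's API.

```python
from array import array


def downmix_to_mono(interleaved: array) -> array:
    """Average left and right samples of interleaved 16-bit stereo PCM."""
    if interleaved.typecode != "h" or len(interleaved) % 2:
        raise ValueError("expected an even-length array of signed 16-bit samples")
    left = interleaved[0::2]
    right = interleaved[1::2]
    # Floor division keeps the result in the 16-bit range.
    return array("h", ((l + r) // 2 for l, r in zip(left, right)))
```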

Lastly, as expected, the compression ratio is just fantastic. In simple demos, I was seeing greater than 100x size reduction over the raw PCM audio.

@Wavesonics

Wavesonics commented Jun 2, 2020

Just some notes here from looking around at solutions to the Sample Rate Conversion problem.

The most common one I've found is: libsamplerate
Which is C and looks good overall. The problem was that it used to be GPL, but it has since switched to the 2-clause BSD license (in 2016).

Here is a C++ library which is MIT license: r8brain

That might be a good option if any of this work was ever considered for inclusion in Godot.

Lots of info about sample rate libs here: https://ccrma.stanford.edu/~jos/resample/Free_Resampling_Software.html
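
To make the sample rate conversion problem concrete, here is a deliberately simple linear-interpolation resampler in Python. It is a sketch only: dedicated resampling libraries use band-limited filtering, which this does not, so expect audible aliasing on real audio. The function name is hypothetical.

```python
def resample_linear(samples, src_rate: int, dst_rate: int):
    """Resample a mono sequence of samples by linear interpolation."""
    if not samples:
        return []
    n_out = len(samples) * dst_rate // src_rate
    last = len(samples) - 1
    out = []
    for i in range(n_out):
        # Position of output sample i on the input time axis, kept exact
        # by splitting it into an integer index and a remainder.
        j, rem = divmod(i * src_rate, dst_rate)
        frac = rem / dst_rate
        a = samples[min(j, last)]
        b = samples[min(j + 1, last)]
        out.append(a + (b - a) * frac)
    return out
```

Converting 441 samples from 44100 Hz to 48000 Hz yields 480 samples, so 10 ms of input stays 10 ms of output.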

@Calinou
Member

Calinou commented Jun 2, 2020

Doesn't Opus include its own resampler? I read that somewhere on a forum while searching for a solution to this specific issue.

@Wavesonics

Wavesonics commented Jun 2, 2020

libOpus doesn't appear to? At least not that I could find looking through its docs.

I think its derivatives like opusenc or opusfile might.

@Wavesonics

I was discussing this with iFire on discord, and he apparently had a PR which not only added libOpus support, but added the next component which I have not yet addressed: being able to actually stream the decoded audio into Godot's audio system.

fire/godot@37ec390

He said it was rejected, but we didn't have time to get into the details.

If any of the core contributors have any insight into what was wrong with it, how or if it could be changed to be acceptable I'd love to discuss it!

Lastly, from my discussion with iFire, the existing AudioEffectRecord will probably not be ideal for streaming audio in its current form, as it does a large buffer re-allocation as part of it.


I'm going to pursue my current approach as a stop-gap (providing libOpus as a gdnative library), but that PR looks like a much more comprehensive starting point for providing true 1st class support for streaming VOIP.

@Wavesonics

@Calinou found this in their FAQ, maybe opus-tools is what you had read about?

How do I use 44.1 kHz or some other sampling rate not directly supported by Opus?

Tools which read or write Opus should inter-operate with other sampling rates by transparently performing sample rate conversion behind the scenes whenever necessary. In particular, software developers should not use Opus Custom for 44.1 kHz support, except in the very specific circumstances outlined above.

Note that it's generally preferable for a decoder to output at 48kHz, even when you know the original input was 44.1kHz. This is not only because you can skip resampling, but also because many cheaper audio interfaces have poor quality output for 44.1kHz.

The opus-tools package source code contains a small, high quality, high performance, BSD licensed resampler which can be used where resampling is required.

So maybe their small BSD licensed resampler would be a good option:
https://opus-codec.org/release/dev/2018/09/18/opus-tools-0_2.html

@Wavesonics

I polished up my addon and put it on the library here:
https://godotengine.org/asset-library/asset/650

It's certainly far from the real streaming solution we'd like to get to, but I'm using it to pretty good effect I think in my project.

The lag is obvious, but people seem to adjust to it pretty quickly. If you want to see what the experience is like, my project is here: https://github.com/FugitiveTheGame/Fugitive3D/

On the path to true streaming audio, the next biggest blocking factor is the need for direct access to an audio buffer inside Godot which we can write frames to directly. @fire 's PR that I linked above looks like a good starting point to me, but I honestly haven't dug into that part of the issue much yet.

Transport of the audio is still an issue, but much more solvable in various ways.

@IoneGod
Author

IoneGod commented Jun 25, 2020

@Calinou @Wavesonics @nonchip you guys have to see Vivox, made by Unity, which can be integrated into other game engines: https://unity.com/products/vivox

@Calinou
Member

Calinou commented Jun 25, 2020

@iapps207 Vivox is a proprietary library, which is therefore unsuitable for inclusion in Godot. Nothing prevents a third-party from publishing a module for it though.

@NEO97online

@Wavesonics have you made any discoveries on this front since then?

For reference, I found the relevant PR here with a little more details: godotengine/godot#35402
It includes some messages from @reduz on the topic, but that's about it.

Here's the original issue from @fire: #399

It seems they were closed in favor of a better implementation due to packet ordering and delays.

@Wavesonics

Wavesonics commented Oct 2, 2020

@auderer no, I haven't spent any time on this recently. The opus plugin I released on the asset store allows some simple forms of voip to work. But the road block to true voip now is lower level access to an audio buffer like that PR you linked provides. For opus in particular, we need to stream individual opus packets, decompress them and insert them directly into an audio stream buffer. As far as I'm aware that's not possible with the current audio system implementation.
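
On the receiving side, streaming individual packets usually means a small jitter buffer between the network and the decoder, so that late or reordered packets do not stall playback. The class below is a minimal Python sketch of that idea. `JitterBuffer` is a hypothetical name, the depth of 3 packets is an arbitrary assumption, and sequence-number wraparound is ignored for brevity.

```python
class JitterBuffer:
    """Reorders packets by sequence number and releases them one per tick."""

    def __init__(self, depth: int = 3):
        self.depth = depth      # packets to collect before playback starts
        self.pending = {}       # seq -> payload
        self.next_seq = None    # next sequence number to hand to the decoder
        self.started = False

    def push(self, seq: int, payload: bytes) -> None:
        """Store a packet from the network; drop it if its slot has passed."""
        if self.started and seq < self.next_seq:
            return
        self.pending[seq] = payload

    def pop(self):
        """Return the next payload in order.

        Returns None while still pre-buffering, or when the expected
        packet is missing and should be treated as lost.
        """
        if not self.started:
            if len(self.pending) < self.depth:
                return None
            self.started = True
            self.next_seq = min(self.pending)
        payload = self.pending.pop(self.next_seq, None)
        self.next_seq += 1
        return payload
```

A caller would invoke `pop()` once per frame period and pass the result to the decoder. For a `None` during playback, a decoder with packet loss concealment can synthesize a replacement frame instead of leaving a gap.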

@fire
Member

fire commented Jan 30, 2021

Announcement: godotengine/godot#45593 was approved for voip usage in #2013.

@Faless

Faless commented Jul 23, 2021

The facilities to capture and process audio have been implemented via godotengine/godot#45593. Supporting a specific VoIP protocol is outside the scope of Godot core. It is now possible to create addons that implement VoIP via the new AudioEffectCapture API.
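
The consumption pattern such an addon would likely follow is to poll a capture buffer and drain it in codec-sized frames. The sketch below models that pattern in plain Python. `CaptureRing` and `drain_fixed_frames` are hypothetical names, not Godot API, and the overflow policy (new frames are dropped when the buffer is full) is an assumption of this model.

```python
from collections import deque


class CaptureRing:
    """Toy stand-in for a bounded capture buffer filled by the audio thread."""

    def __init__(self, capacity_frames: int):
        self.capacity = capacity_frames
        self.frames = deque()
        self.discarded = 0

    def push(self, frames) -> None:
        """Producer side: append frames, counting any that do not fit."""
        for frame in frames:
            if len(self.frames) < self.capacity:
                self.frames.append(frame)
            else:
                self.discarded += 1

    def frames_available(self) -> int:
        return len(self.frames)

    def get_buffer(self, count: int) -> list:
        """Consumer side: remove and return exactly count frames, or nothing."""
        if count > len(self.frames):
            return []
        return [self.frames.popleft() for _ in range(count)]


def drain_fixed_frames(ring: CaptureRing, frame_size: int) -> list:
    """Pull as many complete frame_size chunks as are currently available."""
    chunks = []
    while ring.frames_available() >= frame_size:
        chunks.append(ring.get_buffer(frame_size))
    return chunks
```

Leftover frames stay in the buffer until the next poll, so every chunk handed to the encoder has the same length.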

@dreadpon

dreadpon commented Oct 18, 2023

@Faless
Sorry to bring it up again after all this time, but do the "facilities to capture and process audio [that] have been implemented via godotengine/godot#45593" also provide "the Microphone API to allow us to specify the sample rate, as well as mono VS stereo"?

Because afaik they don't.

I'm currently porting a project to 4.2, and was surprised to discover that my custom opus compression code performs slower than AudioEffectCapture fills up its buffer.
Now, I could of course lower the mix_rate in ProjectSettings, but that would affect all other audio we might use in the project, resulting in poor music/sfx quality.

While my compression code likely isn't ideal, I think reducing the amount of input data is one of the most important optimizations I can do. I surely don't need my input audio to be 44100/48000 Hz when I intend to lossy compress it anyway.

Edit:
After some debugging I realized I overstated the importance of mix rate in my particular case, but I think this concern is still valid and might become an issue if someone actually wants to do any sort of processing on the input audio data
(the problem in my case was related to PackedByteArray.resize(), it became quite a bit slower in Godot 4.x)
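
One general way to sidestep per-frame resize costs is to allocate a frame-sized buffer once and refill it in place. The sketch below shows the pattern in Python rather than GDScript; `FrameAssembler` is a hypothetical name, and whether the same approach removes the slowdown in Godot 4.x is an assumption that would need profiling.

```python
class FrameAssembler:
    """Collects arbitrary-sized chunks into fixed-size frames without resizing."""

    def __init__(self, frame_bytes: int):
        self.buf = bytearray(frame_bytes)   # allocated once, reused for every frame
        self.view = memoryview(self.buf)
        self.fill = 0                       # bytes currently held in buf

    def feed(self, chunk: bytes) -> list:
        """Copy chunk into the buffer and return any frames it completed."""
        done = []
        offset = 0
        while offset < len(chunk):
            n = min(len(chunk) - offset, len(self.buf) - self.fill)
            self.view[self.fill:self.fill + n] = chunk[offset:offset + n]
            self.fill += n
            offset += n
            if self.fill == len(self.buf):
                done.append(bytes(self.buf))  # snapshot of the finished frame
                self.fill = 0
        return done
```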
