Welcome to the GodotEngine GDExtension twovoip that applies the xiph/opus compression library (and optionally the xiph/rnnoise de-noiser and OVRLipSync viseme detector) to an audio stream of speech from the microphone. The starting point for this project was one-voip-godot-4.
Thanks to @ajlennon and @DmitriySalnikov for indefatigable work on the GitHub Actions that are successfully building this plugin across all six GodotEngine supported platforms.
There are reliability issues with the Godot microphone implementation due to slight inconsistencies
between the audio output and audio input frame rates that cannot be sustained by
a simple buffer implementation on some platforms. Fortunately Pull Request 100508 fixes them
by giving direct access to audio sample chunks via the AudioStreamPlaybackMicrophone.
An HTML5 demo is hosted at https://goatchurch.itch.io/twovoip-mqtt
The purpose of this demo is to test all the features so you can hear what the opus compression and noise cancelling settings do to a voice recording, as well as debug sample rate issues.
- Clone/Download this repository and open the project in the example/ directory in Godot 4.3.
- Go to assetlib, search for twovoip, and install it. (Now on version 3.3)
- Run the app.
- If the microphone is working, then you should see a waveform in the app like this:
If there is no response on MacOS, it could be this issue
Go to Project Settings (with Advanced Settings selected) -> Audio -> Driver -> Mix rate and set to 48000
The top section of the user interface has the PTT (Press to Talk) button and Vox button (Voice Activity Detection) where the activation threshold is given in the slider below it (on top of the waveform). Click on [De-noise] to hear how recordings sound with and without this feature.
There are two different mix_rate values in the GodotEngine that vary according to platform:
- audio/driver/mix_rate is available in the ProjectSettings and can be overridden for different platforms
- AudioServer.mix_rate is set by the platform
Additionally, an AudioStream can have its own mix_rate, and the resampling ratio that is applied internally on the data in the stream will be target_mix_rate/(AudioStream.mix_rate*AudioStreamPlayer.pitch_scale).
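The ratio above can be checked with a few lines of arithmetic. This is a minimal sketch in plain Python (not GDScript, so it runs outside Godot); the function name and the example rates are illustrative assumptions, and the formula is simply the one stated in the text, not a call into the engine.

```python
def resampling_ratio(target_mix_rate: float,
                     stream_mix_rate: float,
                     pitch_scale: float = 1.0) -> float:
    """Ratio as written in the text:
    target_mix_rate / (AudioStream.mix_rate * AudioStreamPlayer.pitch_scale)."""
    if stream_mix_rate <= 0 or pitch_scale <= 0:
        raise ValueError("rates and pitch_scale must be positive")
    return target_mix_rate / (stream_mix_rate * pitch_scale)


# A 44.1 kHz stream on a 48 kHz output needs resampling (ratio is not 1).
ratio_mismatch = resampling_ratio(48000, 44100)

# Matching rates need no resampling.
ratio_matched = resampling_ratio(48000, 48000)

# A pitch_scale of 48000/44100 cancels the same mismatch.
ratio_cancelled = resampling_ratio(48000, 44100, 48000 / 44100)
```

A ratio of exactly 1 means the stream data passes through unchanged; anything else means the engine is stretching or squeezing the samples on your behalf.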
All combinations are exposed in the TwoVoip plugin and the example project to help you work out what settings are correct. If you record and play back on the same system then wrong settings can cancel out, making it appear that a bad signal between different systems is due to the transmission. The common problems are playing a 48kHz stream at 44.1kHz, which will sound slow and off-key, or playing a decoded 44.1kHz stream from a network at 48kHz, which will result in small gaps between the packets that are being consumed too fast and can sound like analog radio static distortion (which ought to be impossible in a digital system).
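To put rough numbers on these two failure modes, here is a small sketch in plain Python. The function names are hypothetical, the 20 ms chunk length is an assumed example value, and the gap calculation assumes packets arrive in real time while being played back too fast, as described above.

```python
import math


def playback_speed(stream_rate_hz: float, playback_rate_hz: float) -> float:
    """Speed factor heard when samples made at stream_rate_hz
    are played out at playback_rate_hz."""
    return playback_rate_hz / stream_rate_hz


def pitch_shift_semitones(speed: float) -> float:
    """Pitch change, in semitones, caused by a given speed factor."""
    return 12.0 * math.log2(speed)


def gap_per_chunk_ms(chunk_ms: float, speed: float) -> float:
    """Silence left after each chunk when it is consumed faster than
    it arrives (zero when playback is not too fast)."""
    return max(0.0, chunk_ms - chunk_ms / speed)


# 48 kHz stream played at 44.1 kHz: slow and flat.
slow = playback_speed(48000, 44100)        # 0.91875
semitones = pitch_shift_semitones(slow)    # roughly one and a half semitones down

# 44.1 kHz stream played at 48 kHz: each 20 ms chunk runs out early.
fast = playback_speed(44100, 48000)
gap_ms = gap_per_chunk_ms(20.0, fast)      # 1.625 ms of silence per chunk
```

A gap of about 1.6 ms repeated fifty times a second is short enough to be heard as a buzz or crackle rather than as distinct dropouts.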
Because the Opus Compression and RNNoise libraries only work at certain sample rates (none of which is 44.1kHz) the AudioEffectOpusChunked class has an internal resampler, though this could have been implemented by setting the pitch_scale to 0.91875=44100/48000. Similarly on the output the AudioStreamOpusChunked class also has a resampler that could be made redundant by tinkering with the pitch_scale and mix_rate. The properties of these classes are controlled by the frame size and sample rate instead of sample time to make it clear that these all relate to known fixed width arrays of floating point values. In fact the entire library can operate independently of the audio system and just on these Packed Arrays.
This section controls all the settings for the Opus compression in terms of frame duration, sample rate and bit rate. The purpose of the resampler definition on the middle line is to match the sample rate told to the Opus compression library.
Use the [Play] button in the Recording Playback section to hear up to 10 seconds of the last recording you made by holding down the PTT (either manually or automatically by the Vox). The Bytes per second for the audio compression is shown here and is recalculated from the uncompressed recording whenever you change the Opus settings.
Finally, there is an MQTT transmission section to push audio packets over the network via a broker on a topic. Click the [Connect] button to go online while a friend does the same on another computer and you should be able to talk to one another over the internet (don't forget to use the PTT button). Several presets are given for convenience, and it will automatically use websockets if you are operating from HTML5.
MQTT is a lightweight protocol implemented in another GodotEngine GDExtension https://godotengine.org/asset-library/asset/1993 and described here. Its publish and subscribe system, together with retained and last will messages, provides a simple basis for each player to track who is joining or leaving the network. There is a line of text containing a mosquitto_sub command that you can copy into your terminal window to watch the data fly by.
If you are familiar with the Godot Audio system, the following minimal use case of this plugin should make sense:
As outlined in the docs, create an AudioStreamPlayer with stream=AudioStreamMicrophone, set it to Autoplay, and ensure your ProjectSettings have audio/driver/enable_input set to true.
Set its bus to a new bus called "MicrophoneBus" which should be Muted to stop it creating a feedback loop to the output. Add an effect OpusChunked to the MicrophoneBus. This will only be an option if the twovoip addon is installed.
Assuming that AudioEffectOpusChunked is the first effect on the bus, you can get a reference to it with:
var microphoneidx = AudioServer.get_bus_index("MicrophoneBus")
var opuschunked : AudioEffectOpusChunked = AudioServer.get_bus_effect(microphoneidx, 0)
Now you can consume and transmit the byte chunks with the following code:
func _process(delta):
    var prepend = PackedByteArray()
    while opuschunked.chunk_available():
        var opusdata : PackedByteArray = opuschunked.read_opus_packet(prepend)
        opuschunked.drop_chunk()
        transmit(opusdata)
At the other end you can decode the opus packets into an AudioStreamPlayer whose stream is set to an AudioStreamOpusChunked.
var audiostreamopuschunked : AudioStreamOpusChunked = $AudioStreamPlayer.stream
var opuspacketsbuffer = [ ] # append incoming packets to this list
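func _process(delta):
    # Stop when the stream is full or there are no packets left to decode
    while audiostreamopuschunked.chunk_space_available() and not opuspacketsbuffer.is_empty():
        audiostreamopuschunked.push_opus_packet(opuspacketsbuffer.pop_front(), 0, 0)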
Opus packets don't have any context, so if you want to number them so they can be reordered
if they get out of order in the particular network data channel you are using, you can use the prepend
array to splice an index value into a header.
Then prefixbyteslength needs to be the same length as this header so it can be split off
on its way to the decoder.
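The header idea can be sketched independently of the plugin. The following plain Python example assumes a hypothetical 2-byte little-endian sequence number as the prepended header (the layout is an illustration, not something the plugin prescribes) and shows how the receiver can strip the header and restore the order. Sequence wrap-around at 65536 is deliberately ignored to keep the sketch short.

```python
import struct

HEADER_FORMAT = "<H"  # hypothetical header: one unsigned 16-bit index
PREFIX_BYTES_LENGTH = struct.calcsize(HEADER_FORMAT)  # 2 bytes


def make_prepend(sequence_number: int) -> bytes:
    """Bytes to prepend to an opus packet before sending it."""
    return struct.pack(HEADER_FORMAT, sequence_number & 0xFFFF)


def split_packet(packet: bytes) -> tuple[int, bytes]:
    """Split a received packet into (sequence_number, opus_payload)."""
    (sequence_number,) = struct.unpack_from(HEADER_FORMAT, packet, 0)
    return sequence_number, packet[PREFIX_BYTES_LENGTH:]


def reorder(packets: list[bytes]) -> list[bytes]:
    """Return the opus payloads sorted by their sequence number."""
    return [payload for _, payload in sorted(split_packet(p) for p in packets)]


# Stand-in payloads; real ones would come from the encoder.
payloads = [b"aaa", b"bbb", b"ccc"]
sent = [make_prepend(i) + p for i, p in enumerate(payloads)]
received = [sent[2], sent[0], sent[1]]  # arrived out of order
restored = reorder(received)            # [b"aaa", b"bbb", b"ccc"]
```

In this layout the value passed as prefixbyteslength would be PREFIX_BYTES_LENGTH, so the decoder skips exactly the bytes that the sender prepended.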
The forward error correction flag, fec, can be set to 1 if the previous packet is missing.
If you want to attach only native Godot classes to the audio buses and audio streams
you can do the same thing as above
using the corresponding AudioEffectCapture and AudioStreamGeneratorPlayback objects to
handle the audio chunks in the form of PackedVector2Arrays while running these two external classes in isolation, like
audioopuschunkedeffect.chunk_to_opus_packet(prefixbytes, audiosamples, denoise)
and:
var audiostreamgeneratorplayback = $AudioStreamPlayer.get_stream_playback()
# Decode packets only while the generator has room and packets are waiting
while audiostreamgeneratorplayback.get_frames_available() > audiostreamopuschunked.audiosamplesize and not opuspacketsbuffer.is_empty():
    var audiochunk = audiostreamopuschunked.opus_packet_to_chunk(opuspacketsbuffer.pop_front(), prefixbyteslength, fec)
    audiostreamgeneratorplayback.push_buffer(audiochunk)
The chunk_max() function is for implementing a Vox (Voice Activity Detection) feature
so that you can save processor cycles by dropping chunks before Opus-encoding them.
Or you can use denoise_resampled_chunk() (which requires resampling to 48kHz) to read a
speech probability, or optionally measure chunk_max() post de-noising.
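The Vox decision itself is a simple threshold test. The sketch below is plain Python and rests on two assumptions that are not taken from the plugin: that the peak measure is the largest absolute sample in the chunk, and that a short hang time (counted in chunks) is wanted so the ends of words are not cut off. The class and parameter names are hypothetical.

```python
def chunk_peak(chunk: list[tuple[float, float]]) -> float:
    """Largest absolute sample in a chunk of (left, right) pairs."""
    return max((max(abs(left), abs(right)) for left, right in chunk), default=0.0)


class Vox:
    """Threshold gate that stays open for hang_chunks chunks after speech."""

    def __init__(self, threshold: float, hang_chunks: int) -> None:
        self.threshold = threshold
        self.hang_chunks = hang_chunks
        self.remaining = 0  # chunks still to pass after the last loud one

    def should_transmit(self, chunk: list[tuple[float, float]]) -> bool:
        if chunk_peak(chunk) >= self.threshold:
            self.remaining = self.hang_chunks
            return True
        if self.remaining > 0:
            self.remaining -= 1
            return True
        return False


loud = [(0.5, -0.2), (0.1, 0.3)]
quiet = [(0.01, 0.0), (-0.02, 0.01)]
vox = Vox(threshold=0.1, hang_chunks=1)
decisions = [vox.should_transmit(c) for c in (quiet, loud, quiet, quiet)]
# decisions == [False, True, True, False]
```

Only the chunks for which the gate returns True need to be passed on to the encoder.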
The opus compression and denoiser features need the chunks to be sent to them in order
because they use the state recorded from earlier audio samples to provide context and improve the performance
of the current chunk. Use flush_opus_encoder() if you anticipate a gap from the previous chunk
(e.g. the PTT was off for a period and there was no processing).
The undrop_chunk() function can roll back the chunk buffer by some milliseconds
so you can avoid clipping at the start of a speech sequence.
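The reason a roll-back helps is that a Vox only triggers after speech has already started. The plain Python sketch below illustrates the general pre-roll idea with a bounded history of recent chunks; it is not a description of how undrop_chunk() is implemented, and the class and method names are hypothetical.

```python
from collections import deque


class PreRollBuffer:
    """Remembers the most recent chunks that were not transmitted."""

    def __init__(self, max_chunks: int) -> None:
        # Oldest chunks fall off automatically once max_chunks is reached.
        self.history = deque(maxlen=max_chunks)

    def push_untransmitted(self, chunk) -> None:
        """Record a chunk that the Vox decided not to send."""
        self.history.append(chunk)

    def roll_back(self) -> list:
        """Return the remembered chunks, oldest first, and forget them."""
        chunks = list(self.history)
        self.history.clear()
        return chunks


preroll = PreRollBuffer(max_chunks=2)
for name in ("c1", "c2", "c3"):   # stand-ins for real audio chunks
    preroll.push_untransmitted(name)

# Speech detected: send the remembered chunks before the current one.
recovered = preroll.roll_back()   # ["c2", "c3"]
```

With 20 ms chunks, remembering two chunks recovers up to 40 ms of audio from before the trigger point.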
There are three submodules in this repository.
godot-cpp contains the header files and class definitions required to build a compiled GDExtension object that can dynamically link to the GodotEngine at runtime.
opus is the opus voice compression and decompression library from xiph.org that generally takes an array of 960 pairs of floats representing 20ms of stereo audio samples at 48kHz and returns 20 to 30 bytes of compressed data for that chunk.
noise-suppression-for-voice contains a copy of the xiph/rnnoise code in its external/rnnoise directory with the all-important CMakeLists.txt file that makes it possible
to compile it on all the different platforms.
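The Opus figures quoted in the list above (960 stereo frames per 20ms at 48kHz, 20 to 30 bytes per packet) imply the following sizes and data rates. This is a plain Python arithmetic sketch; the function names are hypothetical, and 4 bytes per sample is an assumption that the floats are 32-bit.

```python
def frames_per_chunk(sample_rate_hz: int, frame_duration_ms: int) -> int:
    """Number of stereo frames in one chunk."""
    return sample_rate_hz * frame_duration_ms // 1000


def raw_bytes_per_chunk(frames: int, channels: int = 2,
                        bytes_per_sample: int = 4) -> int:
    """Uncompressed size of one chunk (assumes 32-bit float samples)."""
    return frames * channels * bytes_per_sample


def bytes_per_second(packet_bytes: int, frame_duration_ms: int) -> float:
    """Data rate when one packet of packet_bytes is sent per chunk."""
    return packet_bytes * 1000 / frame_duration_ms


frames = frames_per_chunk(48000, 20)   # 960
raw = raw_bytes_per_chunk(frames)      # 7680
low = bytes_per_second(20, 20)         # 1000.0
high = bytes_per_second(30, 20)        # 1500.0
```

Under these assumptions each 7680-byte chunk shrinks to a few tens of bytes, which is why the packets fit comfortably into small network messages.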
The sequence of commands to build the system locally
nix-shell -p scons cmake ninja autoreconfHook # if you are on nix
scons apply_patches # optional
scons build_opus # build opus using cmake
scons build_rnnoise # build rnnoise using cmake
scons # build this library
cp addons/twovoip/libs/*.so example/addons/twovoip/libs/
To compile for another platform like web, the commands are
scons apply_patches
scons platform=web target=template_release build_opus
scons platform=web target=template_release build_rnnoise
scons platform=web target=template_release
The optional OVRLipSync viseme detector is a highly speculative component that takes advantage of the chunking feature in the OpusChunked effect,
but it is currently closed source and distributed as a library only for Windows, Android and Mac.
There is no Linux version.
The GitHub Actions workflow compiles a version for the available platforms
with scons lipsync=yes and creates an addons/twovoip_lipsync directory that can be copied into a project.
Download the OVRLipSync libraries from https://developer.oculus.com/documentation/native/audio-ovrlipsync-native/ and unzip them into the top level of this project as the OVRLipSyncNative directory. There is a stub include file for Linux that allows this GDExtension to compile without this library.
On Windows you may need to copy the OVRLipSyncNative/Lib/Win64/OVRLipSync.dll file to the same directory
as your GodotEngine.exe so that it finds and links it.
For the addon to work correctly, twovoip_lipsync and twovoip cannot be used in the same project.
The build system is defined by the flake.nix file
- makes a result directory that needs to be copied into addons
nix build
cp result/addons/twovoip/*so addons/twovoip
- android version:
nix build .#android
cp result/addons/twovoip/*so addons/twovoip
On Windows:
Use Visual Studio 2022 Community Edition with the CMake option to open the opus directory, convert the cmake script to an sln, and then compile.
cd ../..
python -m SCons