DOA error correction through feedback of the speech enhancement quality.
Three ROS2 nodes are provided:

- `demucs`: a variation of the original Demucs Denoiser model that uses a location-based strategy for target selection. It subscribes to the `jackaudio` topic published by the `beamform2` ROS2 node, and publishes the `jackaudio_filtered` topic, which carries the speech enhanced from the beamforming output.
- `online_sqa`: a SQUIM-based online speech quality estimator. It subscribes to the `jackaudio_filtered` topic and publishes the `SDR` topic, which carries the speech quality estimate of the enhanced speech.
- `doaoptimizer`: it aims to optimize the speech quality by correcting the direction of arrival (DOA) that is fed to the beamformer, using a variation of the Adam optimizer. It subscribes to the `SDR` topic and publishes the `theta` topic, which carries the corrected DOA.

The `theta` topic is then subscribed to by the `beamform2` node, closing the feedback loop.
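
The feedback loop described above can be illustrated without ROS2. The following is a minimal sketch, not the code of `doaoptimizer`: it uses the textbook Adam update (the node uses a variation of it), it replaces the whole `beamform2` → `demucs` → `online_sqa` chain with a hypothetical `toy_sdr` function, and it assumes the SDR gradient can be approximated by probing two nearby angles. The names `toy_sdr`, `correct_doa` and `probe_deg` are made up for this example.

```python
import math


def toy_sdr(theta_deg, true_doa_deg=20.0):
    """Hypothetical stand-in for the beamformer + enhancer + quality estimator:
    the SDR (in dB) peaks when the steered angle equals the true DOA."""
    return 20.0 - 0.05 * (theta_deg - true_doa_deg) ** 2


def correct_doa(measure_sdr, init_doa=0.0, eta=0.3, steps=300,
                probe_deg=1.0, beta1=0.9, beta2=0.999, eps=1e-8):
    """Gradient ascent on the SDR using standard Adam moment estimates.

    Assumption: the gradient is estimated with a central finite difference,
    i.e. two SDR measurements per step. A real node only receives one SDR
    estimate per published DOA, so it would need a different estimator.
    """
    theta = init_doa
    m = 0.0  # first moment (running mean of the gradient)
    v = 0.0  # second moment (running mean of the squared gradient)
    for t in range(1, steps + 1):
        grad = (measure_sdr(theta + probe_deg)
                - measure_sdr(theta - probe_deg)) / (2.0 * probe_deg)
        m = beta1 * m + (1.0 - beta1) * grad
        v = beta2 * v + (1.0 - beta2) * grad * grad
        m_hat = m / (1.0 - beta1 ** t)  # bias-corrected first moment
        v_hat = v / (1.0 - beta2 ** t)  # bias-corrected second moment
        # Plus sign: the SDR is maximized, not minimized.
        theta += eta * m_hat / (math.sqrt(v_hat) + eps)
    return theta


if __name__ == "__main__":
    corrected = correct_doa(toy_sdr, init_doa=0.0, eta=0.3)
    print(f"corrected DOA: {corrected:.2f} degrees")
```

Here `init_doa` and `eta` play the same role as the hyperparameters of the same name listed for `doaoptimizer`; everything else is illustrative.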

- Install and configure `jackaudio`.
- Clone and compile the `beamform2` ROS2 node.
- Configure the `beamform_config.yaml` of `beamform2` so that it matches your microphone setup.
- Configure the `rosjack_config.yaml` of `beamform2` so that:
  - Its output is fed through a ROS2 topic: `output_type` should be either `0` or `2`.
  - Its sampling rate matches the one that `demucs` was trained with: `ros_output_sample_rate` should be `16000`.
- Install the Python requirements of all the nodes: `pip install -r requirements.txt`
- Create a ROS2 package, place the `demucs`, `doaoptimizer` and `online_sqa` directories inside the package's `src` directory, and run `colcon build`.
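
The two `rosjack_config.yaml` constraints listed above can be checked with a few lines of Python. This is only a sketch: it works on a plain dictionary and makes no assumption about how the YAML file is laid out or loaded, and the function name `check_rosjack_settings` is made up for this example.

```python
def check_rosjack_settings(settings):
    """Return a list of problems found in a dict of rosjack settings.

    Only the two values required by this setup are checked:
    output_type must be 0 or 2, and ros_output_sample_rate must be 16000.
    """
    problems = []
    if settings.get("output_type") not in (0, 2):
        problems.append("output_type should be 0 or 2 so that the output "
                        "is fed through a ROS2 topic")
    if settings.get("ros_output_sample_rate") != 16000:
        problems.append("ros_output_sample_rate should be 16000 to match "
                        "the rate demucs was trained with")
    return problems


if __name__ == "__main__":
    # Values as they might appear after loading the configuration file.
    for problem in check_rosjack_settings(
            {"output_type": 1, "ros_output_sample_rate": 48000}):
        print("config problem:", problem)
```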

- Start the `jackaudio` server.
- Start the `phase` beamformer from `beamform2`: `ros2 launch beamform2 phase.launch`
- Run `demucs`: `ros2 run demucs demucs`
- Start `jack_write` from `beamform2` to listen to the result: `ros2 launch beamform2 rosjack_write.launch`
- Run `online_sqa`: `ros2 run online_sqa online_sqa`
- Run `doaoptimizer`: `ros2 run doaoptimizer doaoptimizer`

The `jackaudio_filtered` topic provides the DOA-corrected enhanced speech.

All of the following hyperparameters can be set using the `--ros-args -p` argument, for example:

`ros2 run module submodule --ros-args -p hyperparameter1:=value1 -p hyperparameter2:=value2`

Here is the list of modules and their hyperparameters:

- `demucs`:
  - `input_length`: length (in seconds) of the input window (default: 0.512). A longer window gives better quality at the cost of a longer response time.
- `online_sqa`:
  - `hop_secs`: time hop (in seconds) between SDR estimates (default: 1.5).
  - `win_len_secs`: length (in seconds) of the input window (default: 3.0).
  - `smooth_weight`: smoothing weight applied to the SDR estimate output (default: 0.9).
- `doaoptimizer`:
  - `init_doa`: initial DOA estimate (in degrees) of the source of interest (default: 0.0).
  - `eta`: adaptation rate of the Adam variation optimizer (default: 0.3).
  - `wait_for_sdr`: time (in seconds) to wait for an SDR estimate after a new DOA correction is published (default: 1.5). It is highly recommended to use the same value as the `hop_secs` hyperparameter of `online_sqa`.
  - `opt_correction`: use the new optimization mechanism (default: True). This is explained in the following section.
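
To get a feel for the time-based hyperparameters above, the sketch below converts them to sample counts at the 16000 Hz rate required earlier, and shows one plausible reading of `smooth_weight` as an exponential moving average. The smoothing formula is an assumption made for illustration, not taken from `online_sqa`, and the function names are made up.

```python
SAMPLE_RATE = 16000  # Hz, the value required for ros_output_sample_rate


def seconds_to_samples(seconds, sample_rate=SAMPLE_RATE):
    """Convert a duration in seconds to a whole number of samples."""
    return round(seconds * sample_rate)


def smooth_sdr(previous, new, smooth_weight=0.9):
    """Assumed smoothing rule: smooth_weight is the weight kept on the
    previous estimate, the remainder goes to the new estimate."""
    return smooth_weight * previous + (1.0 - smooth_weight) * new


if __name__ == "__main__":
    print(seconds_to_samples(0.512))  # demucs input_length: 8192 samples
    print(seconds_to_samples(1.5))    # online_sqa hop_secs: 24000 samples
    print(seconds_to_samples(3.0))    # online_sqa win_len_secs: 48000 samples
```

With the defaults, consecutive `online_sqa` windows overlap by half (3.0 s windows every 1.5 s).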

The `doaoptimizer` module can run using the original optimization mechanism, which is based on a variation of the Adam optimizer. It can also run using a new optimizer correction mechanism. The new mechanism is used by default, but the original mechanism can be selected by running:

`ros2 run doaoptimizer doaoptimizer --ros-args -p opt_correction:=False`
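
When several hyperparameters are overridden at once, it can be convenient to assemble the command line in a script. The helper below is a small sketch that only builds the argument list in the `--ros-args -p name:=value` form shown above; the function name `ros2_run_command` is made up for this example, and launching the command is left to the caller.

```python
import shlex


def ros2_run_command(package, executable, params=None):
    """Build the argument list for `ros2 run`, adding one `-p name:=value`
    pair per hyperparameter override."""
    cmd = ["ros2", "run", package, executable]
    if params:
        cmd.append("--ros-args")
        for name, value in params.items():
            cmd += ["-p", f"{name}:={value}"]
    return cmd


if __name__ == "__main__":
    cmd = ros2_run_command(
        "doaoptimizer", "doaoptimizer",
        {"init_doa": 0.0, "eta": 0.3, "opt_correction": False},
    )
    # Print a shell-ready version of the command.
    print(shlex.join(cmd))
```

The returned list can be passed directly to `subprocess.run` once the ROS2 environment has been sourced.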
If you end up using this software, please credit it as:
@article{rascon2024direction,
title={Direction of Arrival Correction through Speech Quality Feedback},
author={Rascon, Caleb},
journal={Digital Signal Processing},
pages={104960},
year={2024},
publisher={Elsevier}
}
You can also have a look at its arXiv version:
@article{rascon2024direction,
title={Direction of Arrival Correction through Speech Quality Feedback},
author={Rascon, Caleb},
journal={arXiv preprint arXiv:2408.07234},
year={2024}
}