This repository contains a universal transcoder tool. It generates a psychoacoustically motivated transcoding matrix to transform any input format into any other given format or speaker layout, maximizing the preservation of spatial information.
More details about the theoretical foundation and mathematical procedures behind this code can be found in the paper "Universal Spatial Audio Transcoder" (AES paper, arXiv preprint).
A. Sagasti, D. Scaini, and D. Arteaga, “Universal spatial audio transcoder,” presented at the AES 156th Convention, Madrid (2024). https://aes2.org/publications/elibrary-page/?id=22517
```bibtex
@inproceedings{usat_2024,
  title = {Universal Spatial Audio Transcoder},
  author = {Amaia Sagasti and Davide Scaini and Daniel Arteaga},
  year = {2024},
  booktitle = {AES 156th Convention, Madrid},
  eprint = {2405.04471},
  archivePrefix = {arXiv},
  primaryClass = {cs.SD},
  url = {https://aes2.org/publications/elibrary-page/?id=22517}
}
```
The folder `paper/saved_results` contains all data and plots corresponding to the results in the paper.
Python 3.8 or newer is recommended.

You also need the GEOS package. To install it on macOS with Homebrew:

```bash
brew install geos
```
One way to use `universal_transcoder` is to create a virtual environment:

```bash
python3.8 -m venv venv
source ./venv/bin/activate
```

and install `universal_transcoder` with:

```bash
pip install -e .
```
Alternatively, you can install the dependencies from the `requirements.txt` file:

```bash
pip install -r requirements.txt
```
To try out this code, you can run the four scripts that reproduce the examples shown in the paper: `ex1_5OAto704.py`, `ex2_704to5OA.py`, `ex3_502to301irr.py` and `ex4_ObjectTo50.py` (found in the `paper` folder).
The code implementing the USAT algorithms is located in the folder `universal_transcoder`.
To run the tests:

```bash
python -m pytest tests
```
USAT is an algorithm capable of calculating, through an optimization based on psychoacoustic effects, the optimized transcoding matrix `T_optimised`
that transcodes a defined input format of
A dictionary like the one below is passed as input to the function `optimize()` inside `calculations/optimization.py`, which outputs the optimized transcoding matrix.
```python
dictionary = {
    "input_matrix_optimization": input_matrix_optimization,  # Input matrix that encodes in input format (LxM) **
    "cloud_optimization": cloud_optimization,  # Cloud of points sampling the sphere (L) **
    "output_layout": output_layout,  # Output (real/virtual) layout of speakers to decode to (P) **
    "Dspk": Dspk,  # Decoding matrix from output format to layout of speakers (PxN) ***
    "coefficients": {  # Weights of the terms of the cost function **
        "energy": 5,
        "radial_intensity": 2,
        "transverse_intensity": 1,
        "pressure": 0,
        "radial_velocity": 0,
        "transverse_velocity": 0,
        "in_phase_quad": 10000,
        "symmetry_quad": 0,
        "in_phase_lin": 0,
        "symmetry_lin": 0.0,
        "total_gains_lin": 0,
        "total_gains_quad": 0,
        "sparsity_quad": 0.01,
        "sparsity_lin": 0.001,
    },
    "directional_weights": 1,  # Weights of the directions sampling the sphere (1xL)
    "show_results": show_results,  # Flag to show results **
    "save_results": save_results,  # Flag to save results **
    "results_file_name": "ex3_50to301irr_USAT",  # Name of the folder where results are saved
    "input_matrix_plots": input_matrix_plots,  # Auxiliary matrix that encodes in input format (L'xM)
    "cloud_plots": cloud_plots,  # Auxiliary cloud of points sampling the sphere for plotting (L')
    "T_initial": T_initial,  # Starting point of the optimization
}
```
`**` mandatory inputs

`***` mandatory only if the output format is layout independent, like Ambisonics
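The mandatory-input rules above can be checked before calling `optimize()`. The helper below, `missing_usat_keys`, is hypothetical and not part of `universal_transcoder`; it is only a sketch that encodes the `**` and `***` notes, using placeholder values instead of real arrays.

```python
# Hypothetical helper, not part of universal_transcoder: lists the entries
# marked ** (and *** when the output format is layout independent) that are
# missing from a USAT configuration dictionary.
MANDATORY_KEYS = (
    "input_matrix_optimization",
    "cloud_optimization",
    "output_layout",
    "coefficients",
    "show_results",
    "save_results",
)


def missing_usat_keys(config: dict, layout_independent_output: bool) -> list:
    """Return the mandatory keys absent from `config` (empty list if complete)."""
    missing = [key for key in MANDATORY_KEYS if key not in config]
    # "Dspk" is only required when the output format is layout independent.
    if layout_independent_output and "Dspk" not in config:
        missing.append("Dspk")
    return missing


# Placeholder values only; real runs use numpy arrays and MyCoordinates objects.
config = {key: None for key in MANDATORY_KEYS}
print(missing_usat_keys(config, layout_independent_output=False))  # []
print(missing_usat_keys(config, layout_independent_output=True))   # ['Dspk']
```

Passing `layout_independent_output=True` for an Ambisonics target flags the missing `Dspk` entry early instead of failing inside the optimization.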
This dictionary is passed as input to the function `optimize()` in the module `universal_transcoder/calculations/optimization.py`. The following sections explain each entry of the dictionary in more detail.
`cloud_optimization` and `cloud_plots`: variables containing sets of points sampling the sphere.
These variables are instances of the class `MyCoordinates`, a subclass of `pyfar.Coordinates`. They can be generated using the functions in `auxiliars/get_cloud_points.py`.
- The data stored in key `"cloud_optimization"` corresponds to the set of $L$ points sampling the sphere over which the optimization is performed. If the output format or speaker layout is 2D, the most appropriate cloud is a 2D set of points, which can be generated using `get_equi_circumference_points()` (from `auxiliars/get_cloud_points.py`). If instead the output format or layout to which we aim to decode is 3D, it is more appropriate to generate a 3D set of sampling points (`get_sphere_points()`, `get_equi_t_design_points()`, `get_equi_fibonacci_sphere_points()` or `get_all_sphere_points()` from `auxiliars/get_cloud_points.py`). Any other function that generates points in the same format can also be used. For the optimization, a set of points equally distributed in terms of energy across the sphere is recommended.
- The key `"cloud_plots"` holds the set of $L'$ points sampling the sphere that are shown in the plots if either the `"save_results"` or the `"show_results"` key is active. If `"cloud_plots"` is not defined but `"save_results"` or `"show_results"` is active, the program uses `"cloud_optimization"` for the plots. As above, `"cloud_plots"` should match the dimensionality (2D or 3D) of the output layout.
```python
from universal_transcoder.auxiliars.get_cloud_points import get_equi_circumference_points, get_sphere_points, get_all_sphere_points

cloud_2D = get_equi_circumference_points(10)  # 2D cloud (horizontal plane)
cloud_3D = get_sphere_points(8)  # 3D cloud
```
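The repository's functions return `MyCoordinates` objects. To illustrate what "equally distributed" sampling looks like, here is a numpy-only sketch that returns plain Cartesian unit vectors instead. The names `circle_points` and `fibonacci_sphere` are hypothetical, and the Fibonacci construction is the standard golden-angle spiral, which is not necessarily what `get_equi_fibonacci_sphere_points()` does internally.

```python
import numpy as np


def circle_points(n: int) -> np.ndarray:
    """n equally spaced unit vectors on the horizontal plane, shape (n, 3)."""
    azimuth = 2 * np.pi * np.arange(n) / n
    return np.stack([np.cos(azimuth), np.sin(azimuth), np.zeros(n)], axis=1)


def fibonacci_sphere(n: int) -> np.ndarray:
    """n nearly uniformly distributed unit vectors on the sphere, shape (n, 3)."""
    i = np.arange(n) + 0.5
    z = 1 - 2 * i / n  # uniform steps in z give equal-area bands on the sphere
    golden_angle = np.pi * (3 - np.sqrt(5))
    azimuth = golden_angle * i
    r = np.sqrt(1 - z**2)  # radius of the horizontal circle at height z
    return np.stack([r * np.cos(azimuth), r * np.sin(azimuth), z], axis=1)


cloud_2D_sketch = circle_points(10)
cloud_3D_sketch = fibonacci_sphere(200)
print(cloud_2D_sketch.shape, cloud_3D_sketch.shape)  # (10, 3) (200, 3)
```

Spacing the points uniformly in $z$ gives each point roughly the same solid angle, which is the property recommended above for `cloud_optimization`.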
`input_matrix_optimization` and `input_matrix_plots`: variables containing the gains that encode each direction of a cloud of points into a specific audio format. They are one of the main inputs to the program, because they describe the input format that we aim to decode to a speaker layout.
These variables are `numpy.ndarray` objects. There are auxiliary functions to compute them for the most common spatial audio formats in `auxiliars/get_input_channels.py`.
- The data stored in key `"input_matrix_optimization"` contains the gains that encode the $L$ directions of the cloud into the input audio format of $M$ channels: an array of size $L\times M$.
- The data stored in key `"input_matrix_plots"` contains the gains that encode the $L'$ directions of the plotting cloud into the same input format of $M$ channels: an array of size $L'\times M$. It is only used for plotting. If `cloud_plots` is not defined but `save_results` or `show_results` is active, the program uses `cloud_optimization` and `input_matrix_optimization` for the plots.
NOTE: Both variables must be generated with the same encoder: different sets of points, same encoder.
```python
from universal_transcoder.auxiliars.get_cloud_points import get_all_sphere_points, get_sphere_points
from universal_transcoder.auxiliars.get_input_channels import get_input_channels_ambisonics

order = 1
# Clouds
cloud_optimization = get_sphere_points(8, False)
cloud_plots = get_all_sphere_points(5, False)
# Input matrices: same encoder, different clouds
input_matrix_optimization = get_input_channels_ambisonics(cloud_optimization, order)
input_matrix_plots = get_input_channels_ambisonics(cloud_plots, order)
```
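Inside the repository, `get_input_channels_ambisonics()` builds these matrices. To make the $L\times M$ shape concrete, here is a numpy-only sketch of a first-order encoder applied to two different clouds. The function name `foa_encoding_matrix`, the ACN/SN3D convention and the axis orientation (x front, y left, z up) are assumptions and may differ from what the repository uses.

```python
import numpy as np


def foa_encoding_matrix(directions: np.ndarray) -> np.ndarray:
    """First-order Ambisonics gains for the unit vectors in `directions` (L x 3).

    Assumes ACN channel order (W, Y, Z, X) and SN3D normalization, with x pointing
    to the front, y to the left and z up. Returns an L x M matrix with M = 4.
    """
    x, y, z = directions[:, 0], directions[:, 1], directions[:, 2]
    return np.stack([np.ones_like(x), y, z, x], axis=1)


# Two clouds, one encoder: both matrices share the same M = 4 columns.
cloud_optimization = np.array([[1.0, 0.0, 0.0],   # front
                               [0.0, 1.0, 0.0],   # left
                               [0.0, 0.0, 1.0]])  # up
cloud_plots = np.array([[0.0, -1.0, 0.0],   # right
                        [-1.0, 0.0, 0.0]])  # back
input_matrix_optimization = foa_encoding_matrix(cloud_optimization)  # shape (3, 4)
input_matrix_plots = foa_encoding_matrix(cloud_plots)                # shape (2, 4)
print(input_matrix_optimization[0])  # [1. 0. 0. 1.]
```

Because both matrices come from the same encoder, column $m$ means the same input channel in each, which is what the NOTE above requires.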
`output_layout`: variable containing the set of directions of the speakers to which we aim to decode (real speakers when decoding to a speaker layout, or virtual speakers when transcoding to a layout-independent format, like Ambisonics), ($P$ directions).
This variable is a `MyCoordinates` object, generated manually using the methods of `MyCoordinates`. For example, the method `mult_points()` generates the variable `layout` from an array of size $P\times 3$, with one row per speaker:
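```python
import numpy as np
from universal_transcoder.auxiliars.my_coordinates import MyCoordinates  # assumed module path

# 5.0 layout: one row per speaker
layout = MyCoordinates.mult_points(
    np.array(
        [
            (-120, 0, 1),
            (-30, 0, 1),
            (0, 0, 1),
            (30, 0, 1),
            (120, 0, 1),
        ]
    )
)
```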
`coefficients`: these variables set the weights of the different terms of the cost function, giving more or less importance to the different psychoacoustic effects. Set a coefficient to 0 to deactivate the corresponding term.
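Several coefficient keys (`pressure`, `energy`, `radial_velocity`, `transverse_velocity`, `radial_intensity`, `transverse_intensity`) refer to classic Gerzon-style quantities. The sketch below computes them for a single virtual source from the speaker gains using the textbook definitions; it is an illustration only, and the exact expressions and cost terms in USAT (including the in-phase, symmetry, total-gain and sparsity terms, which are not covered here) may differ. The function name `gerzon_metrics` is hypothetical.

```python
import numpy as np


def gerzon_metrics(gains: np.ndarray, speakers: np.ndarray, direction: np.ndarray) -> dict:
    """Textbook Gerzon-style quantities for one virtual source.

    gains:     real speaker gains, shape (P,); pressure must be nonzero
    speakers:  unit vectors pointing to the P speakers, shape (P, 3)
    direction: unit vector of the intended source direction, shape (3,)
    """
    pressure = gains.sum()
    energy = np.sum(gains**2)
    velocity = gains @ speakers / pressure     # velocity vector
    intensity = gains**2 @ speakers / energy   # energy (intensity) vector
    radial_v = velocity @ direction
    radial_i = intensity @ direction
    return {
        "pressure": pressure,
        "energy": energy,
        "radial_velocity": radial_v,
        "transverse_velocity": np.linalg.norm(velocity - radial_v * direction),
        "radial_intensity": radial_i,
        "transverse_intensity": np.linalg.norm(intensity - radial_i * direction),
    }


# Phantom source panned equally between speakers at +30 and -30 degrees azimuth.
az = np.radians([30.0, -30.0])
speakers = np.stack([np.cos(az), np.sin(az), np.zeros(2)], axis=1)
gains = np.array([1.0, 1.0]) / np.sqrt(2)
front = np.array([1.0, 0.0, 0.0])
metrics = gerzon_metrics(gains, speakers, front)
print(round(metrics["radial_intensity"], 3))      # 0.866 (= cos 30 deg)
print(round(metrics["transverse_intensity"], 3))  # 0.0
```

A radial intensity below 1 reflects the spread of a phantom source between two speakers; raising the `radial_intensity` coefficient pushes the optimization to keep this value close to 1.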
`directional_weights`: this variable sets the weights given to the different directions of the cloud of points, making it possible to give more importance in the optimization to certain zones (for example, points close to the output speakers) at the expense of others.
This variable is a `numpy.array` of size $1\times L$.
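As a sketch of how directional weights change the influence of each direction, the example below boosts the directions that lie close to a speaker and then takes a weighted average of a per-direction cost. The helper names `proximity_weights` and `weighted_average`, the 15-degree threshold, the boost factor and the cost values are all made up for illustration; USAT's own weighting and aggregation may differ.

```python
import numpy as np


def proximity_weights(cloud: np.ndarray, speakers: np.ndarray,
                      max_angle_deg: float = 15.0, boost: float = 2.0) -> np.ndarray:
    """Weight `boost` for cloud points within `max_angle_deg` of any speaker, 1 elsewhere.

    cloud: (L, 3) unit vectors; speakers: (P, 3) unit vectors. Returns shape (L,).
    """
    cos_to_nearest = np.max(cloud @ speakers.T, axis=1)
    near = cos_to_nearest >= np.cos(np.radians(max_angle_deg))
    return np.where(near, boost, 1.0)


def weighted_average(per_direction_cost: np.ndarray, weights) -> float:
    """Weighted mean of an (L,) cost; `weights` may be a scalar, (L,) or (1, L)."""
    w = np.broadcast_to(np.ravel(np.asarray(weights, dtype=float)), per_direction_cost.shape)
    return float(np.sum(w * per_direction_cost) / np.sum(w))


azimuths = np.radians([0, 90, 180, 270])
cloud = np.stack([np.cos(azimuths), np.sin(azimuths), np.zeros(4)], axis=1)
speakers = np.array([[1.0, 0.0, 0.0]])        # a single frontal speaker
weights = proximity_weights(cloud, speakers)  # [2. 1. 1. 1.]
cost = np.array([0.0, 1.0, 1.0, 1.0])         # made-up per-direction error
print(weighted_average(cost, 1))        # 0.75 (uniform weights)
print(weighted_average(cost, weights))  # 0.6
```

A scalar weight such as the `1` in the example dictionary behaves like uniform weights, so every direction counts equally.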
If the key `show_results` is active, the system shows plots and prints some logs to the terminal.

If the key `save_results` is active, the system stores all resulting plots and logs in the folder `/saved_results`, inside a new folder named as set in the key `results_file_name`.
`T_initial`: this variable is optional and defines the initial point of the optimization. Its size must be
`Dspk`: this variable represents the decoding matrix of shape $P\times N$ that decodes from the output format ($N$ channels) to the `output_layout` of $P$ speakers. It is not needed when the output format already constitutes a set of $N$ speakers (`Dspk=1`, $N=P$). However, it is mandatory when the desired output format is speaker independent, like Ambisonics.
The function `get_ambisonics_decoder_matrix()` in `universal_transcoder/auxiliars/get_decoder_matrices.py` computes such a matrix. It is used in `ex2_704to5OA.py`, an example of transcoding to Ambisonics (output format: 5th-order Ambisonics):
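```python
from universal_transcoder.auxiliars.get_decoder_matrices import get_ambisonics_decoder_matrix
Dspk = get_ambisonics_decoder_matrix(order, output_layout, "pseudo")
```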
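To illustrate what a $P\times N$ decoder looks like, the sketch below builds a first-order pseudo-inverse (mode-matching) decoder with numpy. The assumption that `"pseudo"` selects this kind of decoder in `get_ambisonics_decoder_matrix()` is not confirmed here, and the helper names, the ACN/SN3D encoder and the octahedral test layout are all hypothetical.

```python
import numpy as np


def foa_gains(directions: np.ndarray) -> np.ndarray:
    """Assumed first-order ACN/SN3D encoding gains (W, Y, Z, X), shape (P, 4)."""
    x, y, z = directions.T
    return np.stack([np.ones_like(x), y, z, x], axis=1)


def pseudo_inverse_decoder(speakers: np.ndarray) -> np.ndarray:
    """Mode-matching decoder D (P x N): re-encoding its speaker feeds returns the input."""
    Y = foa_gains(speakers)     # P x N encoding gains of the speaker directions
    return np.linalg.pinv(Y.T)  # Moore-Penrose inverse of the N x P matrix -> P x N


# Octahedral layout: 6 well-spread speakers, enough for first order (N = 4).
octahedron = np.array([[1, 0, 0], [-1, 0, 0], [0, 1, 0],
                       [0, -1, 0], [0, 0, 1], [0, 0, -1]], dtype=float)
Dspk_sketch = pseudo_inverse_decoder(octahedron)
print(Dspk_sketch.shape)                                              # (6, 4)
print(np.allclose(foa_gains(octahedron).T @ Dspk_sketch, np.eye(4)))  # True
```

The identity check only holds when the layout spans all Ambisonic components; a horizontal-only layout, for instance, cannot reproduce the Z channel at first order.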