Design and Evaluation of a Global Workspace Agent Embodied in a Realistic Multimodal Environment | Soundspaces - Habitat-lab - Habitat-sim setup
- Introduction
- General guidelines for reproduction
- System Specifications
- Habitat-lab Stable 0.2.2
- Habitat-sim
- SoundSpaces 2.0 (can be skipped)
- SAVi for Global Workspace Agents experiments
- > Global Workspace Agent <
Link to the paper: Design and evaluation of a global workspace agent embodied in a realistic multimodal environment, Frontiers in Computational Neuroscience
Please cite it as:
@article{dossa2024gw_agent,
author={Dossa, Rousslan Fernand Julien and Arulkumaran, Kai and Juliani, Arthur and Sasai, Shuntaro and Kanai, Ryota},
title={Design and evaluation of a global workspace agent embodied in a realistic multimodal environment},
journal={Frontiers in Computational Neuroscience},
volume={18},
year={2024},
url={https://www.frontiersin.org/articles/10.3389/fncom.2024.1352685},
doi={10.3389/fncom.2024.1352685},
issn={1662-5188},
}
- Soundspaces currently at commit fb68e410a4a1388e2d63279e6b92b6f082371fec
- While `habitat-lab` and `habitat-sim` recommend using Python 3.7 at least, this procedure goes as far as Python 3.9 for better compatibility with more recent Torch libraries.
- `habitat-lab` is built at version 0.2.2.
- `habitat-sim` is built from commit 80f8e31140eaf50fe6c5ab488525ae1bdf250bd9.
- Ubuntu 20.04
- SuperMicro X11DAI-N Motherboard
- Intel(R) Xeon(R) Silver 4216 CPU @ 2.10GHz | 2 sockets * 16 cores per socket * 2 threads per core
- NVIDIA RTX 3090 GPU
- NVIDIA 515.57 (straight from NVIDIA's website)
- CUDA Toolkit 11.7 (also from NVIDIA's website)
conda create -n ss-hab-headless-py39 python=3.9 cmake=3.14.0 -y
conda activate ss-hab-headless-py39
pip install pytest-xdist
pip install rsatoolbox # Neural activity pattern analysis
pip install seaborn matplotlib # For plots
git clone --branch stable https://github.com/facebookresearch/habitat-lab.git # Currently @ 0f454f62e41050bc90ca468c62db35d7484923ff
cd habitat-lab
# pip install -e .
# While installing deps., pip threw an error that a conflict due to TF 2.1.0 was preventing some deps from installing.
# Manually ran `pip install tensorflow-gpu==2.1.0` just to be sure.
# Additionally, install all the other deps for dev purposes:
pip install -r requirements.txt
python setup.py develop --all # So far so good
# Leave the directory for the subsequent steps
cd ..
- According to SoundSpaces' docs, using the sound simulator requires building with the `--audio` flag for sound support.
- Building with `--with-cuda` requires the CUDA toolkit to be installed and accessible through the following environment variables:
  - `PATH` contains `/usr/local/cuda-11.7/bin` or similar
  - `LD_LIBRARY_PATH` contains `/usr/local/cuda-11.7/lib64` or similar
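A minimal sketch of setting those variables for the current shell, assuming CUDA 11.7 is installed under `/usr/local/cuda-11.7` (adjust the path to your installation):

```bash
# Point the build at the CUDA 11.7 toolkit (hypothetical install location)
export PATH=/usr/local/cuda-11.7/bin:$PATH
export LD_LIBRARY_PATH=/usr/local/cuda-11.7/lib64:$LD_LIBRARY_PATH
```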
# Make sure all system-level deps are installed.
sudo apt-get update || true
sudo apt-get install -y --no-install-recommends libjpeg-dev libglm-dev libgl1-mesa-glx libegl1-mesa-dev mesa-utils xorg-dev freeglut3-dev # Ubuntu
# Clone habitat-sim repository
git clone https://github.com/facebookresearch/habitat-sim.git # Current @ 80f8e31140eaf50fe6c5ab488525ae1bdf250bd9
cd habitat-sim
# Checkout the commit suggested by ChanganVR
git checkout 80f8e31140eaf50fe6c5ab488525ae1bdf250bd9
# Build habitat-sim with audio and headless support
python setup.py install --with-cuda --bullet --audio --headless # compilation goes brrrr...
pip install hypothesis # For the tests mainly
Additional building instructions
Getting the test scene_datasets:
Some tests (`tests/test_physics.py`, `tests/test_simulator.py`) require the "habitat test scenes" and "habitat test objects" datasets. The provided download tool is broken, so download them manually.
To get the `habitat-test-scenes`, run the following from the `habitat-sim` root folder:
python src_python/habitat_sim/utils/datasets_download.py --uids habitat_test_scenes
Getting the (test / example) habitat objects:
python src_python/habitat_sim/utils/datasets_download.py --uids habitat_example_objects
With this, `pytest tests/test_physics.py` should have a 100% success rate.
Interactive testing
This assumes `habitat-sim` was built with display support.
python examples/viewer.py --scene data/scene_datasets/habitat-test-scenes/skokloster-castle.glb
Note: in case it returns a `ModuleNotFoundError` for `examples.settings`, edit `examples/viewer.py` and remove the `examples.` prefix from the relevant import line.
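A hypothetical one-liner for that edit, assuming the offending line starts with `from examples.settings import` (editing the file by hand works just as well):

```bash
# Strip the `examples.` prefix from the settings import in viewer.py
sed -i 's/from examples\.settings import/from settings import/' examples/viewer.py
```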
Non-interactive testing
With a compiled `habitat-sim`, this fails to run from the `habitat-sim` directory, returning a `free(): invalid pointer` error.
It should work if run from the `habitat-lab` directory instead.
python examples/example.py --scene data/scene_datasets/habitat-test-scenes/skokloster-castle.glb
Getting the ReplicaCAD dataset
python src_python/habitat_sim/utils/datasets_download.py --uids replica_cad_dataset
Running the physics interaction simulation:
python examples/viewer.py --dataset data/replica_cad/replicaCAD.scene_dataset_config.json --scene apt_1
Other notes
- To get `tests/test_controls.py` to pass, `pip install hypothesis` if it is not installed already.
- When `habitat-sim` is manually compiled, it seems that commands such as `python -m habitat_sim.utils.datasets_download` do not work. Instead, use `cd /path/to/habitat-sim && python src_python/habitat_sim/utils/datasets_download.py`. This allegedly does not happen if `habitat-sim` was installed through `conda`.
To check that both `habitat-lab` and `habitat-sim` work with each other:
cd ../habitat-lab
python examples/example.py
It should say "... ran for 200 steps" or something similar.
git clone https://github.com/facebookresearch/sound-spaces.git # Currently @ fb68e410a4a1388e2d63279e6b92b6f082371fec
cd sound-spaces
git checkout fb68e410a4a1388e2d63279e6b92b6f082371fec
pip install -e .
Requires access to the `download_mp.py` tool from the official Matterport3D project.
See https://github.com/facebookresearch/habitat-lab#matterport3d
mkdir -p data/scene_datasets
mkdir -p data/versioned_data/mp3d
python /path/to/download_mp.py --task habitat -o /path/to/sound-spaces/data/versioned_data/mp3d
This will download a ZIP file to `/path/to/sound-spaces/data/versioned_data/mp3d/v1/tasks/mp3d_habitat.zip`.
Unzip this file to obtain `/path/to/sound-spaces/data/versioned_data/mp3d/v1/tasks/mp3d`.
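For example, assuming the standard `unzip` tool is available:

```bash
# Unzip in place so that the extracted mp3d folder sits next to the archive
cd /path/to/sound-spaces/data/versioned_data/mp3d/v1/tasks
unzip mp3d_habitat.zip
```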
This folder should contain files like `17DRP5sb8fy`, `1LXtFkjw3qL`, etc.
Make it so that `/path/to/sound-spaces/data/scene_datasets/mp3d` points to `/path/to/sound-spaces/data/versioned_data/mp3d/v1/tasks/mp3d`. For example:
ln -s /path/to/sound-spaces/data/versioned_data/mp3d/v1/tasks/mp3d /path/to/sound-spaces/data/scene_datasets/mp3d
Some additional metadata that is intertwined with other datasets and features of soundspaces is also required:
# From /path/to/soundspaces/data, run:
wget http://dl.fbaipublicfiles.com/SoundSpaces/metadata.tar.xz && tar xvf metadata.tar.xz # 1M
wget http://dl.fbaipublicfiles.com/SoundSpaces/sounds.tar.xz && tar xvf sounds.tar.xz #13M
wget http://dl.fbaipublicfiles.com/SoundSpaces/datasets.tar.xz && tar xvf datasets.tar.xz #77M
wget http://dl.fbaipublicfiles.com/SoundSpaces/pretrained_weights.tar.xz && tar xvf pretrained_weights.tar.xz
# This heavy file can be ignored for Sound Spaces 2.0
# wget http://dl.fbaipublicfiles.com/SoundSpaces/binaural_rirs.tar && tar xvf binaural_rirs.tar # 867G
The SS 2.0 commands provided in the README are based on the `mp3d` dataset.
If you try to run the interactive mode command later on, it will likely throw an error about `data/metadata/default/...` being absent.
This requires the following tweak:
- In `sound-spaces/data/metadata`: `ln -s mp3d default`
Download the material config from https://github.com/facebookresearch/rlr-audio-propagation/blob/main/RLRAudioPropagationPkg/data/mp3d_material_config.json and put it in `/path/to/sound-spaces/data/`.
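Put together, the two fixes above look something like the sketch below. The `raw.githubusercontent.com` URL is an assumption derived from the blob link above; downloading through the browser works just as well.

```bash
# Make the mp3d metadata the default (avoids the data/metadata/default/... error)
cd /path/to/sound-spaces/data/metadata
ln -s mp3d default

# Fetch the Matterport3D material config into /path/to/sound-spaces/data/
cd /path/to/sound-spaces/data
wget https://raw.githubusercontent.com/facebookresearch/rlr-audio-propagation/main/RLRAudioPropagationPkg/data/mp3d_material_config.json
```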
For a machine with a display, and with `habitat-sim` not built with the `--headless` flag:
python scripts/interactive_mode.py
Note: This worked on the internal graphics of the motherboard, but not on the RTX 3090 GPU. It might work on an older, or not so fancy GPU. In any case, interactive mode is not that important for the RL use case.
python ss_baselines/av_nav/run.py --exp-config ss_baselines/av_nav/config/audionav/mp3d/train_telephone/audiogoal_depth_ddppo.yaml --model-dir data/models/ss2/mp3d/dav_nav CONTINUOUS True
python ss_baselines/av_nav/run.py --exp-config ss_baselines/av_nav/config/audionav/mp3d/train_telephone/audiogoal_depth.yaml --model-dir data/models/ss2/mp3d/dav_nav_ppo/ CONTINUOUS True
This is done using the test configurations suggested in the `sound-spaces/ss_baselines/av_nav/README.md` file.
python ss_baselines/av_nav/run.py --run-type eval --exp-config ss_baselines/av_nav/config/audionav/mp3d/test_telephone/audiogoal_depth.yaml EVAL_CKPT_PATH_DIR data/models/ss2/mp3d/dav_nav/data/ckpt.100.pth CONTINUOUS True
Missing `audiogoal` in `observations` error
Running the command above will probably produce an error about the `audiogoal` field missing in the `observations` dictionary.
The fix is to change the base task configuration with the following:
TASK:
TYPE: AudioNav
SUCCESS_DISTANCE: 1.0
# Original
# SENSORS: ['SPECTROGRAM_SENSOR']
# GOAL_SENSOR_UUID: spectrogram
# For eval support
SENSORS: ['AUDIOGOAL_SENSOR', 'SPECTROGRAM_SENSOR']
GOAL_SENSOR_UUID: spectrogram # audiogoal
Basically, this adds the `AUDIOGOAL_SENSOR` to the config, which in turn generates the corresponding field in the agent's observations.
TypeError: write_gif() got an unexpected keyword argument 'verbose'
The best guess is some mismatch between the moviepy version that the bundled tensorboard expects and the one that is actually installed.
Current versions are `torch==1.12.0`, installed from conda according to the official PyTorch website, and `moviepy==2.0.0.dev2`, installed from PyPI.
A workaround was to edit the `make_video` function in `/path/to/venv/lib/python3.9/site-packages/torch/utils/tensorboard/summary.py` to handle the case where `moviepy` does not support the `verbose` argument:
def make_video(tensor, fps):
    try:
        import moviepy  # noqa: F401
    except ImportError:
        print("add_video needs package moviepy")
        return
    try:
        from moviepy import editor as mpy
    except ImportError:
        print(
            "moviepy is installed, but can't import moviepy.editor.",
            "Some packages could be missing [imageio, requests]",
        )
        return
    import tempfile

    t, h, w, c = tensor.shape

    # encode sequence of images into gif string
    clip = mpy.ImageSequenceClip(list(tensor), fps=fps)

    filename = tempfile.NamedTemporaryFile(suffix=".gif", delete=False).name

    try:  # newer versions of moviepy use the logger argument instead of progress_bar.
        clip.write_gif(filename, verbose=False, logger=None)
    except TypeError:
        try:  # older versions of moviepy do not support the progress_bar argument.
            clip.write_gif(filename, verbose=False, progress_bar=False)
        except TypeError:
            try:  # in case the verbose argument is also not supported
                clip.write_gif(filename, verbose=False)
            except TypeError:
                clip.write_gif(filename)
SS 2.0 supports the `RGB_SENSOR` and `DEPTH_SENSOR` for the agent's visual perception.
For acoustic perception, it supports the `SPECTROGRAM_SENSOR` and the `AUDIOGOAL_SENSOR`; the latter returns the raw waveforms in the `observations` field that the `env.step()` method returns.
A demonstration is given in the `sound-spaces/env_test.ipynb` notebook in this repository.
Unfortunately, Tensorboard does not seem to support logging of video with incorporated audio.
WANDB, however, is capable of doing so, but its logging step will be out of sync with the actual training step (the Tensorboard logging step) of the agent.
The `torch` install that comes with the dependencies should work by default on something like a GTX 1080 Ti.
However, because it relies on CUDA 10.2, it cannot be used with an RTX 3090, for example (`CUDA Error: no kernel image is available for execution on the device ...`).
Training on an RTX 3090, as of 2022-07-21, thus requires upgrading to a `torch` version that supports CUDA 11.
conda install pytorch torchvision torchaudio cudatoolkit=11.3 -c pytorch
This will install `torch==1.12.0`.
CUDA 11.6 unfortunately seems to create too many conflicts with other dependencies, leaving conda solving the environment ad infinitum.
Documentation here: https://nvidia.github.io/apex/optimizers.html Github here: https://github.com/NVIDIA/apex
git clone https://github.com/NVIDIA/apex
cd apex
pip install -v --disable-pip-version-check --no-cache-dir --global-option="--cpp_ext" --global-option="--cuda_ext" ./
In case it throws a CUDA-related error, make sure that the CUDA version used to compile PyTorch matches the one on the system. If not, override the last command with something like:
CUDA_HOME=/usr/local/cuda-11.3 pip install -v --disable-pip-version-check --no-cache-dir --global-option="--cpp_ext" --global-option="--cuda_ext" ./
Three main features might be of interest for this project:
- It removes the limitation of only having a ringing telephone; namely, it adds 21 objects with their distinct sounds.
- The sound is not continuous over the whole episode, but of variable length instead. This supposedly forces the agent to learn the association between the category of the object and its acoustic features.
- The agent can also be probed for the scene, on top of the target object's category.
On top of all the steps above for the default SoundSpaces installation:
- Run `python scripts/cache_observations.py`. Note that since we only use the mp3d dataset for now, this requires editing that script to comment out line 105 so that it skips the `replica` dataset, whose proper installation was skipped in the steps above. Once this is done, add a symbolic link so that the training scripts can find the file at the expected path:
ln -s /path/to/sound-spaces/data/scene_observations/mp3d /path/to/sound-spaces/data/scene_observations/default
- Run the scripts as per the README in the SAVi folder:
a. First, pre-train the goal label predictor:
python ss_baselines/savi/pretraining/audiogoal_trainer.py --run-type train --model-dir data/models/savi --predict-label
This step seems to require the huge binaural RIRs dataset that was skipped in the sound-spaces dataset acquisition section earlier.
wget http://dl.fbaipublicfiles.com/SoundSpaces/binaural_rirs.tar && tar xvf binaural_rirs.tar # 867G
b. Then, train without pre-training, using the continuous simulator:
python ss_baselines/savi/run.py --exp-config ss_baselines/savi/config/semantic_audionav/savi.yaml --model-dir data/models/savi CONTINUOUS True
Additional notes
- The pretrained weights can be found in `/path/to/sound-spaces/data/pretrained_weights`, assuming they were properly downloaded in the dataset acquisition phase.
A simplified PPO + GRU implementation that exposes the core of the algorithm, as well as the interactions with the environment.
To run the training code from the `ppo` folder, you need to add a link to the `data` folder from `sound-spaces` for the environments to be properly created.
cd /path/to/ss-hab/ppo
ln -s ../sound-spaces/data .
# Individual deps. install
pip install wandb # 0.16.2
pip install nvsmi # 0.4.2, for experiments' GPU usage configuration
# One liner install
pip install wandb nvsmi
The agent architectures are located in ppo/models.py
Either execute the following script, or refer to the Jupyter Notebook of the same name for a more interactive configuration and inspection process.
# --total-steps: target for the number of steps in the dataset
# --num-envs: how many envs to use in parallel (note that each env has a large memory cost)
python SAVI_Oracle_DataCollection.py --total-steps 500000 --num-envs 10
Assuming a dataset was collected and stored under `/path/to/ss-hab/ppo/SAVI_Oracle_Dataset_v0` (or set appropriately with `--dataset-path` when running the script):
python ppo_bc.py --agent-type gw --gw-size 64 # or `gru` for baseline agents
The dataset path and the SAVI environment configs can be found in the `ppo_bc.py` argparser config section; update them to match the collected dataset.
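For example, an invocation combining the flags mentioned in this README might look like the following (check the argparser section of `ppo_bc.py` for the exhaustive and authoritative list of flags):

```bash
# Behavior cloning of a GW agent on the collected oracle dataset
python ppo_bc.py --agent-type gw --gw-size 64 \
    --dataset-path /path/to/ss-hab/ppo/SAVI_Oracle_Dataset_v0
```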
The sweeps were conducted using the Weights & Biases (WANDB) sweep utility.
The WANDB configs for the search are stored in the ss-hab/ppo/wandb-sweeps
folder.
The usage is documented in the ss-hab/ppo/wandb-sweeps/ppo_bc_rev1_sweep.sh
script (not runnable).
For example, the hyperparameter search for the `GW 128` variant is done by creating the configuration `wandb-sweeps/ppo_bc_rev1_sweep__gw_128.yml`, then running the following command to create the WandB sweep:
wandb sweep --project "ss-hab-bc-revised-sweep" ppo_bc_rev1_sweep__gw_128.yml
This will return a Sweep ID, such as `dosssman/ss-hab-bc-revised-sweep/altdwxen`.
Then, a WandB agent can be instantiated using said Sweep ID as follows:
wandb agent dosssman/ss-hab-bc-revised-sweep/altdwxen --count 10
Multiple agents can be run in parallel, each taking care of a run of the sweep.
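For instance, a sketch for spreading two agents over a multi-GPU machine (the Sweep ID is taken from the example above; pinning one agent per GPU via `CUDA_VISIBLE_DEVICES` is an assumption about the setup):

```bash
# Each agent pulls runs from the same sweep; pin one agent per GPU
CUDA_VISIBLE_DEVICES=0 wandb agent dosssman/ss-hab-bc-revised-sweep/altdwxen --count 5 &
CUDA_VISIBLE_DEVICES=1 wandb agent dosssman/ss-hab-bc-revised-sweep/altdwxen --count 5 &
wait
```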
We leverage the WANDB sweep utility to manage the execution of runs across diverse machines.
The configurations for each agent are stored under `ss-hab/ppo/wandb-sweeps-finals`, and their execution is documented in the script in that folder.
Similarly to the hyperparameter sweep, the final configuration for the `GW 128` variant can be found in `wandb-sweeps-finals/ppo_bc_rev1_final__gw_128.yml`.
The same procedure as for the hyperparameter sweep described above applies.
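As a sketch, reusing the project name from the sweep example above (the actual project name for the final runs may differ):

```bash
# Create the sweep from the final config, then attach an agent to the returned Sweep ID
wandb sweep --project "ss-hab-bc-revised-sweep" wandb-sweeps-finals/ppo_bc_rev1_final__gw_128.yml
wandb agent <returned-sweep-id>
```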
The IQM-based performance plots were generated using the `SAVI_PerfPlots_IQM.ipynb` Jupyter Notebook.
The attention weight, probing, and broadcast importance analyses were generated by the `SAVI_Analysis_Revised.ipynb` notebook.
This will use RGB + Spectrogram as input for the agent, create a timestamped TensorBoard folder automatically and log training metrics as well as video, with and without audio.
python ppo_av_nav.py
To train SAVi agents, load the appropriate configuration file:
python ppo_av_nav.py --config-path env_configs/savi/<savi_config_file>.yaml
Example training commands are documented in the `ppo/runs.sh` file.
The process used to collect samples for Behavior Cloning and, more generally, trajectory inspection is located in `ppo/ppo_collect_dataset.py`.
Just pass the path to the trained RL agent weights, using the same configuration as during training, and it will collect a number of steps hardcoded in the script.
Once a dataset is collected under a folder such as `ppo_gru_dset_2022_09_21__750000_STEPS`, pass it with `--dataset-path` to the `ppo_bc.py` script.
python ppo_bc.py --dataset-path ppo_gru_dset_2022_09_21__750000_STEPS
Example training commands are documented in the `ppo/runs_bc.sh` file.
Past this point, there might be an `RLRAudioPropagationChannelLayoutType`-related error when trying to run the interactive mode.
If the fork of sound-spaces was used (`git clone https://github.com/dosssman/sound-spaces.git --branch ss2-tweaks`), then skip ahead to the "Finally testing SS2" section.
Otherwise, this requires a workaround in the soundspaces simulator.
Namely, comment out or delete line 125 of `sound-spaces/soundspaces/simulator_continuous.py` and add:
import habitat_sim._ext.habitat_sim_bindings as hsim_bindings
channel_layout.channelType = hsim_bindings.RLRAudioPropagationChannelLayoutType.Binaural
instead.
This workaround is adapted from `habitat-sim/examples/tutorials/audio_agent.py`.
The reason is that the soundspaces authors use a version of `habitat-sim` more recent than v0.2.2, in which the `habitat_sim.sensor.RLRAudioPropagationChannelLayoutType` object is properly defined.
Since we clone [email protected], however, we revert to using the `hsim_bindings` directly.