Design and Evaluation of a Global Workspace Agent Embodied in a Realistic Multimodal Environment | Soundspaces - Habitat-lab - Habitat-sim setup

Introduction

Link to the paper: Design and evaluation of a global workspace agent embodied in a realistic multimodal environment, Frontiers in Computational Neuroscience

Please cite it as:

@article{dossa2024gw_agent,
	author={Dossa, Rousslan Fernand Julien and Arulkumaran, Kai and Juliani, Arthur and Sasai, Shuntaro and Kanai, Ryota},
	title={Design and evaluation of a global workspace agent embodied in a realistic multimodal environment},
	journal={Frontiers in Computational Neuroscience},
	volume={18},
	year={2024},
	url={https://www.frontiersin.org/articles/10.3389/fncom.2024.1352685},
	doi={10.3389/fncom.2024.1352685},
	issn={1662-5188},
}

General guidelines for reproduction (as of 2022-07-21)

  • Soundspaces currently at commit fb68e410a4a1388e2d63279e6b92b6f082371fec
  • While habitat-lab and habitat-sim recommend using at least Python 3.7, this procedure uses Python 3.9 for better compatibility with more recent Torch libraries.
  • habitat-lab is built at version 0.2.2
  • habitat-sim is built from the commit 80f8e31140eaf50fe6c5ab488525ae1bdf250bd9.

System specifications

  • Ubuntu 20.04
  • SuperMicro X11DAI-N Motherboard
  • Intel(R) Xeon(R) Silver 4216 CPU @ 2.10GHz | 2 Sockets * 16 Cores per socket * 2 Threads per core
  • NVIDIA RTX 3090 GPU
  • NVIDIA driver 515.57 (straight from NVIDIA's website)
  • CUDA Toolkit 11.7 (also from NVIDIA's website)
conda create -n ss-hab-headless-py39 python=3.9 cmake=3.14.0 -y
conda activate ss-hab-headless-py39
pip install pytest-xdist
pip install rsatoolbox # Neural activity pattern analysis
pip install seaborn matplotlib # For plots

Habitat-lab Stable 0.2.2

git clone --branch stable https://github.com/facebookresearch/habitat-lab.git # Currently @ 0f454f62e41050bc90ca468c62db35d7484923ff
cd habitat-lab
# pip install -e .
# While installing deps., pip threw an error that a conflict due to TF 2.1.0 was preventing installation of all the deps.
# Manually ran `pip install tensorflow-gpu==2.1.0` just to be sure
# Additionally, install all the other deps for dev purposes
pip install -r requirements.txt
python setup.py develop --all # So far so good

# Leave the directory for the subsequent steps
cd ..

Habitat-sim

  • According to SoundSpaces' docs, using the sound simulator requires building with the --audio flag for sound support.

  • Building --with-cuda requires the CUDA toolkit to be installed and accessible through the following environment variables (a quick sanity check is sketched below):

    • PATH contains /usr/local/cuda-11.7/bin or similar
    • LD_LIBRARY_PATH contains /usr/local/cuda-11.7/lib64
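
A minimal sanity check for these variables (a sketch; the paths assume CUDA 11.7 installed under /usr/local, adjust to your setup):

import os

# Check that the CUDA 11.7 toolkit is reachable through the environment
# variables listed above; the exact paths are assumptions.
cuda_bin = "/usr/local/cuda-11.7/bin"
cuda_lib = "/usr/local/cuda-11.7/lib64"
assert cuda_bin in os.environ.get("PATH", ""), f"{cuda_bin} missing from PATH"
assert cuda_lib in os.environ.get("LD_LIBRARY_PATH", ""), f"{cuda_lib} missing from LD_LIBRARY_PATH"
print("CUDA toolkit paths look good.")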
# Makes sure all system level deps are installed.
sudo apt-get update || true
sudo apt-get install -y --no-install-recommends libjpeg-dev libglm-dev libgl1-mesa-glx libegl1-mesa-dev mesa-utils xorg-dev freeglut3-dev # Ubuntu
# Clone habitat-sim repository
git clone https://github.com/facebookresearch/habitat-sim.git # Current @ 80f8e31140eaf50fe6c5ab488525ae1bdf250bd9
cd habitat-sim
# Checkout the commit suggested by ChanganVR
git checkout 80f8e31140eaf50fe6c5ab488525ae1bdf250bd9
# Build habitat-sim with audio and headless support
python setup.py install --with-cuda --bullet --audio --headless # compilation goes brrrr...
pip install hypothesis # For the tests mainly
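
After the build, a quick import check can confirm the module is usable (a minimal sketch; __version__ and cuda_enabled are assumed to be exposed at this commit, as they are in recent habitat-sim releases):

import habitat_sim

# Post-build smoke test; attribute availability is an assumption for this exact commit.
print("habitat_sim", habitat_sim.__version__)
print("CUDA enabled:", habitat_sim.cuda_enabled)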

Additional building instructions

Preliminary testing

Getting the test scene_datasets:

Some tests (tests/test_physics.py, tests/test_simulator.py) require the "habitat test scenes" and "habitat test objects" datasets. The provided download tool is broken when habitat-sim is compiled manually (see Other notes below), so download them manually by invoking the script directly.

To get the habitat-test-scenes, from the habitat-sim root folder:

python src_python/habitat_sim/utils/datasets_download.py --uids habitat_test_scenes

Getting the (test / example) habitat objects:

python src_python/habitat_sim/utils/datasets_download.py --uids habitat_example_objects

With this, pytest tests/test_physics.py should have 100% success rate.

Interactive testing

This assumes habitat-sim was built with display support.

python examples/viewer.py --scene data/scene_datasets/habitat-test-scenes/skokloster-castle.glb

Note: In case it returns a ModuleNotFoundError for examples.settings, edit examples/viewer.py and remove the examples. prefix from the relevant import line.

Non-interactive testing

With a manually compiled habitat-sim, the following fails with a free(): invalid pointer error. It should work if run from the habitat-lab directory instead of habitat-sim.

python examples/example.py --scene data/scene_datasets/habitat-test-scenes/skokloster-castle.glb

Acquiring datasets necessary for simulations

Getting the ReplicaCAD dataset

python src_python/habitat_sim/utils/datasets_download.py --uids replica_cad_dataset

Running the physics interaction simulation:

python examples/viewer.py --dataset data/replica_cad/replicaCAD.scene_dataset_config.json --scene apt_1

Other notes

  • To get tests/test_controls.py to pass, pip install hypothesis if not already installed.
  • When habitat-sim is manually compiled, commands such as python -m habitat_sim.utils.datasets_download do not work. Instead use cd /path/to/habitat-sim && python src_python/habitat_sim/utils/datasets_download.py. This allegedly does not happen if habitat-sim was installed through conda.

To check that both habitat-lab and habitat-sim work with each other:

cd ../habitat-lab
python examples/example.py

It should say "... ran for 200 steps" or something similar.

Soundspaces 2.0 (Unused)

git clone https://github.com/facebookresearch/sound-spaces.git # Currently @ fb68e410a4a1388e2d63279e6b92b6f082371fec
cd sound-spaces
git checkout fb68e410a4a1388e2d63279e6b92b6f082371fec
pip install -e .

Downloading the scene_datasets for Sound Spaces 2.0 habitat audio visual simulations

Requires access to the download_mp.py tool from the official Matterport3D project. See https://github.com/facebookresearch/habitat-lab#matterport3d

mkdir -p data/scene_datasets
mkdir -p data/versioned_data/mp3d
python /path/to/download_mp.py --task habitat -o /path/to/sound-spaces/data/versioned_data/mp3d

This will download a ZIP file into /path/to/sound-spaces/data/versioned_data/mp3d/v1/tasks/mp3d_habitat.zip

Unzip this file to obtain /path/to/sound-spaces/data/versioned_data/mp3d/v1/tasks/mp3d. This folder should contain scene directories such as 17DRP5sb8fy, 1LXtFkjw3qL, etc.

Make it so that /path/to/sound-spaces/data/scene_datasets/mp3d points to /path/to/sound-spaces/data/versioned_data/mp3d/v1/tasks/mp3d. For example:

ln -s /path/to/sound-spaces/data/versioned_data/mp3d/v1/tasks/mp3d /path/to/sound-spaces/data/scene_datasets/mp3d

Some additional metadata that is intertwined with other datasets and features of soundspaces is also required:

# From /path/to/soundspaces/data, run:
wget http://dl.fbaipublicfiles.com/SoundSpaces/metadata.tar.xz && tar xvf metadata.tar.xz # 1M
wget http://dl.fbaipublicfiles.com/SoundSpaces/sounds.tar.xz && tar xvf sounds.tar.xz #13M
wget http://dl.fbaipublicfiles.com/SoundSpaces/datasets.tar.xz && tar xvf datasets.tar.xz #77M
wget http://dl.fbaipublicfiles.com/SoundSpaces/pretrained_weights.tar.xz && tar xvf pretrained_weights.tar.xz
# This heavy file can be ignored for Sound Spaces 2.0
# wget http://dl.fbaipublicfiles.com/SoundSpaces/binaural_rirs.tar && tar xvf binaural_rirs.tar # 867G

The SS 2.0 commands provided in the README are based on the mp3d dataset. If you try to run the interactive mode command later on, it will likely throw an error about data/metadata/default/... being absent.

This will require the following tweak:

  • in sound-spaces/data/metadata: ln -s mp3d default

Downloading mp3d_material_config.json

Download from the following link: https://github.com/facebookresearch/rlr-audio-propagation/blob/main/RLRAudioPropagationPkg/data/mp3d_material_config.json and put it in /path/to/soundspaces/data/.

Testing SS2.0 in interactive mode

This requires a machine with a display, and habitat-sim built without the --headless flag.

python scripts/interactive_mode.py

Note: This worked on the motherboard's integrated graphics, but not on the RTX 3090 GPU. It might work on an older, less fancy GPU. In any case, interactive mode is not that important for the RL use case.

[New] Training continuous navigation agent DDPPO baseline

python ss_baselines/av_nav/run.py --exp-config ss_baselines/av_nav/config/audionav/mp3d/train_telephone/audiogoal_depth_ddppo.yaml --model-dir data/models/ss2/mp3d/dav_nav CONTINUOUS True

Training continuous navigation PPO baseline

python ss_baselines/av_nav/run.py --exp-config ss_baselines/av_nav/config/audionav/mp3d/train_telephone/audiogoal_depth.yaml --model-dir data/models/ss2/mp3d/dav_nav_ppo/ CONTINUOUS True

Evaluating the trained agent

This is done using the test configurations suggested in the sound-spaces/ss_baselines/av_nav/README.md file.

python ss_baselines/av_nav/run.py --run-type eval --exp-config ss_baselines/av_nav/config/audionav/mp3d/test_telephone/audiogoal_depth.yaml EVAL_CKPT_PATH_DIR data/models/ss2/mp3d/dav_nav/data/ckpt.100.pth CONTINUOUS True

Missing audiogoal in observations error

Running the command above will probably throw an error about the audiogoal field missing in the observations dictionary. The fix is to change the base task configuration as follows:

TASK:
  TYPE: AudioNav
  SUCCESS_DISTANCE: 1.0

  # Original
  # SENSORS: ['SPECTROGRAM_SENSOR']
  # GOAL_SENSOR_UUID: spectrogram

  # For eval support
  SENSORS: ['AUDIOGOAL_SENSOR', 'SPECTROGRAM_SENSOR']
  GOAL_SENSOR_UUID: spectrogram # audiogoal

Basically, this adds the AUDIOGOAL_SENSOR to the config, which in turn generates the corresponding field in the agent's observations.

TypeError: write_gif() got an unexpected keyword argument 'verbose'

The best guess is some form of mismatch between the moviepy version that the bundled tensorboard expects and the one that is actually installed. Current versions are torch==1.12.0, installed from conda according to the official pytorch website, and moviepy==2.0.0.dev2, installed from PyPI.

A workaround was to edit make_video in /path/to/venv/lib/python3.9/site-packages/torch/utils/tensorboard/summary.py to fall back gracefully across moviepy versions that do not support the logger, progress_bar, or verbose arguments:

def make_video(tensor, fps):
    try:
        import moviepy  # noqa: F401
    except ImportError:
        print("add_video needs package moviepy")
        return
    try:
        from moviepy import editor as mpy
    except ImportError:
        print(
            "moviepy is installed, but can't import moviepy.editor.",
            "Some packages could be missing [imageio, requests]",
        )
        return
    import tempfile

    t, h, w, c = tensor.shape

    # encode sequence of images into gif string
    clip = mpy.ImageSequenceClip(list(tensor), fps=fps)

    filename = tempfile.NamedTemporaryFile(suffix=".gif", delete=False).name
    try:  # newer versions of moviepy use the logger argument instead of progress_bar
        clip.write_gif(filename, verbose=False, logger=None)
    except TypeError:
        try:  # older versions of moviepy use the progress_bar argument instead
            clip.write_gif(filename, verbose=False, progress_bar=False)
        except TypeError:
            try:  # in case the progress_bar argument is not supported either
                clip.write_gif(filename, verbose=False)
            except TypeError:  # in case verbose is also not supported
                clip.write_gif(filename)

Generating audio and video from SS2.0 trajectories

SS2.0 supports RGB_SENSOR and DEPTH_SENSOR for the agent's visual perception. For acoustic perception, it supports the SPECTROGRAM_SENSOR and the AUDIOGOAL_SENSOR; the latter returns the waveforms that are initially generated in the observations field of the env.step() method's returns.

A demonstration is given in the sound-spaces/env_test.ipynb notebook in this repository. Unfortunately, Tensorboard does not seem to support logging of video with incorporated audio. WANDB is capable of doing so, but the logging step will be out of sync with the actual training step (the Tensorboard logging step) of the agent.
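
For reference, a minimal sketch of logging such a clip with WANDB (trajectory.mp4 and the project name are placeholders; the clip is assumed to be rendered from the env frames and AUDIOGOAL waveforms):

import wandb

# Log a trajectory video with its audio track; WANDB keeps the audio,
# whereas Tensorboard's video logging does not.
wandb.init(project="ss-hab")  # placeholder project name
wandb.log({"trajectory_video": wandb.Video("trajectory.mp4", format="mp4")})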

Torch

The torch install that comes with the dependencies should work by default on something like a GTX 1080 Ti. However, because it relies on CUDA 10.2, it cannot be used with an RTX 3090, for example (CUDA Error: no kernel image is available for execution on the device ...). Training on an RTX 3090, as of 2022-07-21, thus requires upgrading to a torch version that supports CUDA 11:

conda install pytorch torchvision torchaudio cudatoolkit=11.3 -c pytorch

This will install torch==1.12.0.
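
A quick check that the upgraded build actually drives the GPU (a minimal sketch; a CUDA 10.2 build fails at the last line with the "no kernel image" error on an RTX 3090):

import torch

# Verify the installed build and that its kernels run on the device.
print(torch.__version__, torch.version.cuda)
assert torch.cuda.is_available()
print(torch.cuda.get_device_name(0), torch.cuda.get_device_capability(0))
print((torch.ones(8, device="cuda") * 2).sum().item())  # prints 16.0 if the kernels run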

CUDA 11.6 unfortunately seems to create too many conflicts with other dependencies, with conda solving the environment ad infinitum; hence the cudatoolkit=11.3 pin above.

APEX as an alternative for Pytorch optimizers

Documentation: https://nvidia.github.io/apex/optimizers.html
GitHub: https://github.com/NVIDIA/apex

git clone https://github.com/NVIDIA/apex
cd apex
pip install -v --disable-pip-version-check --no-cache-dir --global-option="--cpp_ext" --global-option="--cuda_ext" ./

In case it throws a CUDA-related error, make sure that the CUDA version used to compile Pytorch matches the one installed on the system. If not, override the last command with something like:

CUDA_HOME=/usr/local/cuda-11.3 pip install -v --disable-pip-version-check --no-cache-dir --global-option="--cpp_ext" --global-option="--cuda_ext" ./
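
Once built, the APEX optimizers can be used as drop-in replacements for their torch counterparts; a minimal sketch with FusedAdam (the model and learning rate are placeholders, not the ones used in this project):

import torch
from apex.optimizers import FusedAdam

# FusedAdam as a drop-in replacement for torch.optim.Adam (requires CUDA tensors).
model = torch.nn.Linear(32, 4).cuda()  # placeholder model
optimizer = FusedAdam(model.parameters(), lr=2.5e-4)
loss = model(torch.randn(8, 32, device="cuda")).pow(2).mean()
loss.backward()
optimizer.step()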

SAVi for Global Workspace Agents experiments

Three main features might be of interest for this project:

  • it removes the limitation of only having a ringing telephone; namely, it adds 21 objects with their distinct sounds
  • the sound is not continuous over the whole episode, but of variable length instead. This supposedly forces the agent to learn the association between the category of the object and its acoustic features.
  • the agent can also be probed for the scene category, on top of the target object's category

Additional setup on top of SoundSpaces 1.0

On top of all the above steps so far for the default SoundSpaces installation,

  1. Run python scripts/cache_observations.py. Since we only use the mp3d dataset for now, this requires editing that script to comment out line 105 so that it skips the replica dataset, whose installation was skipped in the steps above. Once this is done, add a symbolic link so that the training scripts can find the file at the expected path:
ln -s /path/to/sound-spaces/data/scene_observations/mp3d /path/to/sound-spaces/data/scene_observations/default
  2. Run the scripts as per the README in the SAVi folder:

a. First, pre-train the goal label predictor:

python ss_baselines/savi/pretraining/audiogoal_trainer.py --run-type train --model-dir data/models/savi --predict-label

This step seems to require the huge binaural RIRs dataset that was skipped back in the sound-spaces dataset acquisition section earlier:

wget http://dl.fbaipublicfiles.com/SoundSpaces/binaural_rirs.tar && tar xvf binaural_rirs.tar # 867G

b. Then, train without pre-training, on the continuous simulator:

python ss_baselines/savi/run.py --exp-config ss_baselines/savi/config/semantic_audionav/savi.yaml --model-dir data/models/savi CONTINUOUS True

Additional notes

  • The pretrained weights can be found in /path/to/sound-spaces/data/pretrained_weights, assuming they were properly downloaded in the dataset acquisition phase.

> Global Workspace Agent <

A simplified PPO + GRU implementation that exposes the core of the algorithm, as well as the interactions with the environment.

To run the training code from the ppo folder, you need to add a link to the data folder from sound-spaces so that the environments can be created properly.

cd /path/to/ss-hab/ppo
ln -s ../sound-spaces/data .

Additional dependencies

# Individual deps. install
pip install wandb # 0.16.2
pip install nvsmi # 0.4.2, for experiments' GPU usage configuration

# One liner install
pip install wandb nvsmi
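
For example, nvsmi can be used to inspect GPU memory before assigning runs to devices (a minimal sketch; the attribute names follow the nvsmi package's documented GPU objects):

import nvsmi

# List GPUs with their memory usage to pick a device for the next run.
for gpu in nvsmi.get_gpus():
    print(gpu.id, gpu.name, f"{gpu.mem_used}/{gpu.mem_total} MB used")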

The agent architectures are located in ppo/models.py

Behavior Cloning on SAVI

Collecting the dataset with an Oracle

Either execute the following script, or refer to the Jupyter Notebook of the same name for a more interactive configuration and inspection process.

# --total-steps: target for the number of steps in the dataset
# --num-envs: how many envs to use in parallel (note that each env has a large memory cost)
python SAVI_Oracle_DataCollection.py --total-steps 500000 --num-envs 10

Training the Behavior Cloning agent on the collected dataset

Assuming a dataset was collected and stored under /path/to/ss-hab/ppo/SAVI_Oracle_Dataset_v0 (or set appropriately with --dataset-path when running the script):

python ppo_bc.py --agent-type gw --gw-size 64 # or `gru` for baseline agents

The dataset path and SAVI environment configs can be found in the ppo_bc.py argparse config section and should be updated to match the collected dataset.

Hyperparameter sweeps

The sweeps were conducted using the Weights & Biases (WandB) sweep utility. The WandB configs for the search are stored in the ss-hab/ppo/wandb-sweeps folder. Their usage is documented in the ss-hab/ppo/wandb-sweeps/ppo_bc_rev1_sweep.sh script (not directly runnable).

For example, the hyperparameter search for the GW 128 variant is done by creating the configuration wandb-sweeps/ppo_bc_rev1_sweep__gw_128.yml, then running the following command to create the WandB sweep:

wandb sweep --project "ss-hab-bc-revised-sweep" ppo_bc_rev1_sweep__gw_128.yml

This would return the Sweep ID, such as: dosssman/ss-hab-bc-revised-sweep/altdwxen.

Then, a WandB agent can be instantiated using said Sweep ID as follows:

wandb agent dosssman/ss-hab-bc-revised-sweep/altdwxen --count 10

Multiple agents can be run in parallel, each taking care of a share of the sweep's runs.
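
For instance, several agents can be launched from one machine (a minimal sketch; the sweep ID is the example one from above):

import subprocess

# Launch two WandB agents in parallel, each pulling runs from the sweep.
SWEEP_ID = "dosssman/ss-hab-bc-revised-sweep/altdwxen"
agents = [subprocess.Popen(["wandb", "agent", SWEEP_ID, "--count", "10"]) for _ in range(2)]
for agent in agents:
    agent.wait()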

Final runs for revision

We leverage the WandB sweep utility to manage the execution of runs across diverse machines. The configurations for each agent are stored under ss-hab/ppo/wandb-sweeps-finals, and their execution is documented in the script in that folder.

Similarly to the hyperparameter sweep, the final configuration for the GW 128 variant can be found in wandb-sweeps-finals/ppo_bc_rev1_final__gw_128.yml. The same procedure as for the hyperparameter sweep described above applies.

Analysis

The IQM-based performance plots were generated using the SAVI_PerfPlots_IQM.ipynb Jupyter Notebook.

The attention weights, probing results, and broadcast importance weights were generated by the SAVI_Analysis_Revised.ipynb notebook.

Training RL agents on SoundSpaces AvNav (deprecated)

This will use RGB + Spectrogram as input for the agent, create a timestamped TensorBoard folder automatically and log training metrics as well as video, with and without audio.

python ppo_av_nav.py

To train SAVi agents, load the appropriate configuration file:

python ppo_av_nav.py --config-path env_configs/savi/<savi_config_file>.yaml

Example training commands are documented in the ppo/runs.sh file.

Collecting dataset

The process used to collect samples for Behavior Cloning, and more generally for trajectory inspection, is located in ppo/ppo_collect_dataset.py. Just pass the path to the trained RL agent weights, using the same configuration as during training, and it will collect a number of steps hardcoded in the script.

Training Behavior Cloning (BC) agents

Once a dataset is collected under a folder such as ppo_gru_dset_2022_09_21__750000_STEPS, pass it with --dataset-path to the ppo_bc.py script:

python ppo_bc.py --dataset-path ppo_gru_dset_2022_09_21__750000_STEPS

Example training commands are documented in the ppo/runs_bc.sh file.


[OUTDATED as of 2022-07-21] RLRAudioPropagationChannelLayoutType error workaround

Past this point, there might be some RLRAudioPropagationChannelLayoutType related error when trying to run the interactive mode.

If this fork of soundspaces was used: git clone https://github.com/dosssman/sound-spaces.git --branch ss2-tweaks, then skip ahead to the Testing SS2.0 in interactive mode section.

Otherwise, this will require a workaround in the soundspaces simulator.

Namely, comment out or delete line 125 of sound-spaces/soundspaces/simulator_continuous.py and add:

import habitat_sim._ext.habitat_sim_bindings as hsim_bindings
channel_layout.channelType = hsim_bindings.RLRAudioPropagationChannelLayoutType.Binaural

instead.

This workaround is adapted from habitat-sim/examples/tutorials/audio_agent.py.

The reason is that the soundspaces authors are using a version of habitat-sim more recent than v2.2.0, in which the habitat_sim.sensor.RLRAudioPropagationChannelLayoutType object is properly defined. Since we clone [email protected], however, we revert to using the hsim_bindings directly.
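
A version-agnostic variant of the same workaround (a sketch; channel_layout comes from the surrounding simulator code at that line):

# Prefer the public enum when the habitat-sim build defines it,
# otherwise fall back to the raw bindings as above.
try:
    from habitat_sim.sensor import RLRAudioPropagationChannelLayoutType as LayoutType
except ImportError:
    import habitat_sim._ext.habitat_sim_bindings as hsim_bindings
    LayoutType = hsim_bindings.RLRAudioPropagationChannelLayoutType
channel_layout.channelType = LayoutType.Binaural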
