Memory networks in BenchMARL
BenchMARL release paired with TorchRL 0.6.0
Highlights
RNNs in BenchMARL
We now support RNNs as models!
We have implemented GRU and LSTM models and added them to the library.
These can be used in the policy or in the critic (both from local agent inputs and from global centralized inputs). They are also compatible with any parameter sharing choice.
We have benchmarked GRU on a multi-agent version of the repeat_previous task: it solves the task, while a plain MLP fails.
In contrast to traditional RNN implementations, we do not do any time padding. Instead, we loop over the time dimension of the sequence while reading the `is_init` flag. This approach is slower, but leads to better results.
As always, you can chain them with as many other models as you desire (CNN, GNN, ...).
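The time-unrolling approach described above can be sketched in plain PyTorch. This is a hypothetical helper for illustration, not the library's internal implementation: it loops over the time dimension and zeroes the hidden state wherever the `is_init` flag marks the start of a new episode, so no sequence padding is needed.

```python
import torch
from torch import nn


def unroll_rnn(cell: nn.GRUCell, obs: torch.Tensor, is_init: torch.Tensor) -> torch.Tensor:
    """Unroll a recurrent cell over time without padding.

    obs:     [batch, time, features]
    is_init: [batch, time, 1] boolean flags marking episode starts
    returns: [batch, time, hidden]
    """
    batch, time, _ = obs.shape
    h = obs.new_zeros(batch, cell.hidden_size)
    outputs = []
    for t in range(time):
        # Reset the hidden state for sequences that restart at this step
        h = torch.where(is_init[:, t], torch.zeros_like(h), h)
        h = cell(obs[:, t], h)
        outputs.append(h)
    return torch.stack(outputs, dim=1)
```

Because the hidden state is reset at every `is_init`, the outputs after a reset do not depend on anything that happened before it.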
Simplified model reloading and evaluation
We have added some useful tools, described here.
In particular, we have added `experiment.evaluate()`, as well as command-line tools like `benchmarl/evaluate.py` and `benchmarl/resume.py` that just take the path to a checkpoint file.
You can now reload models from a hydra run without passing all the config again: the scripts will automatically find the configs you used in the hydra folders.
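The mechanics behind this can be sketched as follows: hydra saves the resolved configuration of every run in a `.hydra` folder inside the run directory, so a checkpoint path alone is enough to locate it. This is a hypothetical helper for illustration, not BenchMARL's actual function:

```python
from pathlib import Path


def find_hydra_config(checkpoint_file: str) -> Path:
    """Given a checkpoint path like <run_dir>/checkpoints/checkpoint_100.pt,
    return the path of the hydra config saved alongside the run.

    Hydra stores the resolved config of each run in
    <run_dir>/.hydra/config.yaml, which is what makes it possible to
    reload an experiment without passing the config again.
    """
    run_dir = Path(checkpoint_file).parent.parent
    return run_dir / ".hydra" / "config.yaml"
```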
Better logging of episode metrics
BenchMARL will now consider an episode done when the global task done is set. This allows agents to terminate early, as long as the global done is eventually set (e.g., via `all()` over the agent dones).
Here is an overview of how episode metrics are computed: BenchMARL looks at the global done (always assumed to be set), which can usually be computed using `any` or `all` over the single-agent dones. In all cases, the global done is what is used to compute the episode reward.
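For instance, with per-agent done flags, the two common reductions look like this (illustrative values, not library code):

```python
# Per-agent done flags at one step (illustrative)
agent_dones = [True, False, True]

# Global done via "any": the episode ends as soon as one agent is done
global_done_any = any(agent_dones)

# Global done via "all": the episode ends only when every agent is done
global_done_all = all(agent_dones)
```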
We log `episode_reward` min, mean, and max over episodes at three different levels:
- agent (disabled by default, can be turned on manually)
- group, averaged over agents in the group
- global, averaged over agents in groups and over groups
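As an illustration of the group and global levels (hypothetical group names and reward values):

```python
# Per-agent episode rewards, keyed by agent group (hypothetical values)
episode_rewards = {
    "predators": [10.0, 12.0],  # two agents in this group
    "prey": [4.0],              # one agent in this group
}

# Group level: average over the agents in each group
group_mean = {g: sum(r) / len(r) for g, r in episode_rewards.items()}

# Global level: average over agents in groups and over groups
global_mean = sum(group_mean.values()) / len(group_mean)
```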
What's Changed
- [BugFix] Fix collect with grad by @matteobettini in #114
- [Refactor] Update values headed to deprecation by @matteobettini in #118
- [Docs] Citation by @matteobettini in #119
- [Model] GRU and general RNN compatibility by @matteobettini in #116
- [Model] LSTM by @matteobettini in #120
- [BugFix, Feature] GNN `position_key` and `velocity_key` not in `observation_spec` by @matteobettini in #125
- [BugFix] Small fix to multi-group eval and add wandb `project_name` by @matteobettini in #126
- [Feature] Improved experiment reloading and evaluation by @matteobettini in #127
- [BugFix] GNN position and velocity key by @matteobettini in #132
- [BugFix] More flexible episode_reward computation in logger by @matteobettini in #136
- [Feature] RNNs in SAC losses by @matteobettini in #139
- [Version] 1.3.0 by @matteobettini in #140
Full Changelog: 1.2.1...1.3.0