[Feature Request] Implementation of Lattice exploration (Chiappa et al., NeurIPS 2023) #1829
🚀 Feature
I propose adding to Stable Baselines 3 (SB3) an option to use Lattice exploration, an action-noise method that some colleagues and I presented in a NeurIPS paper last year (Chiappa et al., NeurIPS 2023). Lattice injects noise into the policy network before the last dense layer, which makes the action distribution a multivariate Gaussian with a full covariance matrix. It can improve the performance of SAC and PPO in high-dimensional environments with many actuators. In particular, we have used it successfully in the musculoskeletal simulation library MyoSuite, where we benchmarked it together with recurrent PPO and obtained good results.
We also tested it together with SAC in the common PyBullet locomotion environments, where it is especially competitive in Humanoid.
It also powered our winning solution to the NeurIPS MyoChallenge 2023.
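To illustrate why noise injected before the last dense layer gives a full covariance matrix, here is a minimal NumPy sketch. It is not the code from the paper or the fork: the layer sizes, the fixed per-unit standard deviation `sigma`, and the names `W`, `x`, and `sample_actions` are assumptions made for illustration, and the published method may include further terms (for example, learned noise scales). With independent latent noise of variance `sigma**2`, a linear output layer `W` yields action covariance `W @ diag(sigma**2) @ W.T`, which generally has non-zero off-diagonal entries.

```python
import numpy as np

rng = np.random.default_rng(0)
latent_dim, action_dim = 8, 3  # illustrative sizes, not taken from the paper

# Stand-ins for the last dense layer of a policy network (assumed values).
W = rng.normal(size=(action_dim, latent_dim))
b = np.zeros(action_dim)

x = rng.normal(size=latent_dim)    # latent features for a single state
sigma = np.full(latent_dim, 0.5)   # per-unit latent noise std (assumed fixed)


def sample_actions(n: int) -> np.ndarray:
    """Perturb the latent features with independent Gaussian noise,
    then apply the final linear layer."""
    eps = rng.normal(size=(n, latent_dim)) * sigma
    return (x + eps) @ W.T + b


# Covariance implied by pushing diagonal latent noise through W.
analytic_cov = W @ np.diag(sigma**2) @ W.T

# Empirical covariance of sampled actions, for comparison.
empirical_cov = np.cov(sample_actions(200_000), rowvar=False)

print(np.round(analytic_cov, 3))
print(np.round(empirical_cov, 3))
```

The two printed matrices should agree up to sampling error, and the off-diagonal entries show that action dimensions are correlated through the shared latent noise, in contrast to the independent per-action noise of a diagonal Gaussian policy.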
Motivation
It would be easier for SB3 users to test Lattice in their environment of choice if it were part of the library, rather than having to install a separate package or download another repository. The change does not break any current behavior of the library, since the feature is purely incremental.
Pitch
I have tried my best to integrate Lattice into SB3 while modifying the codebase as little as possible. In the branch feature/lattice of this fork of SB3, I have implemented Lattice for SAC and PPO. It can be enabled by setting the argument "use_lattice=True" and passing additional hyperparameters in a dictionary called "lattice_kwargs". It seems to work correctly when called from the configuration files of the SB3 zoo. I would invite an SB3 developer to check whether the integration I propose follows the library's guidelines and spirit. If you have no major concerns, I would be happy to prepare a pull request!
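As a rough sketch of how the two options might be assembled, assuming (this issue does not say so) that they are passed through `policy_kwargs` like other policy options such as `net_arch`: the helper `lattice_policy_kwargs` below is hypothetical and not part of SB3 or the fork, and the contents of "lattice_kwargs" are left empty because the available hyperparameters are not listed here.

```python
from typing import Any, Dict, Optional


def lattice_policy_kwargs(
    use_lattice: bool = True,
    lattice_kwargs: Optional[Dict[str, Any]] = None,
    **other: Any,
) -> Dict[str, Any]:
    """Hypothetical helper: merge the proposed Lattice options into a
    policy_kwargs-style dictionary alongside other policy options."""
    if not use_lattice and lattice_kwargs:
        raise ValueError("lattice_kwargs was given but use_lattice is False")
    merged: Dict[str, Any] = dict(other)
    merged["use_lattice"] = use_lattice
    if use_lattice:
        # Copy so that later edits by the caller do not affect the config.
        merged["lattice_kwargs"] = dict(lattice_kwargs or {})
    return merged


policy_kwargs = lattice_policy_kwargs(net_arch=[256, 256])
print(policy_kwargs)

# With the fork installed, and ASSUMING the options are read from
# policy_kwargs (not confirmed in this issue), usage might look like:
#   model = PPO("MlpPolicy", env, policy_kwargs=policy_kwargs)
```

If the fork instead expects "use_lattice" as a direct argument of the algorithm constructor, the same dictionary entries would simply be passed there.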
Alternatives
Alternatively, Lattice could become part of the SB3 contrib repository. However, I don't see a way to do that without creating entirely new algorithms (e.g., LatticePPO, LatticeSAC, …), which is, in my opinion, excessive, given that only relatively limited changes to the original algorithms are needed to enable this option.
Additional context
No response
Checklist