-
Notifications
You must be signed in to change notification settings - Fork 1.7k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[Feature Request] TRPO needed #467
Comments
Hello,
TRPO is planned (see roadmap #1), however there is no clear deadline, especially as PPO is usually a competitive alternative to ti. We would welcome a PR that adds it ;) (and that must replicate previous results, as we do for the contrib repo: https://github.com/Stable-Baselines-Team/stable-baselines3-contrib/blob/master/CONTRIBUTING.md) |
Thanks, I see. |
Hi, I am actually in the process of implementing TRPO and was wondering how/where to discuss certain design choices? One question I am wondering about is the ActorCriticPolicy design. Currently the Adam optimizer is set to optimize both actor/critic params but in TRPO we would want self.optimizer to just update the critic. This can be done with a custom policy but seems wasteful given how much code repetition there would be. Thanks! |
@jdchang1 Hey! Adding new algos is always welcome! I recommend looking at contrib repo for a place to add the algorithm. There you are free to do more ad-hoc solutions without fear of breaking or bloating up the core repository (this one), so you can create a new version of the ActorCriticPolicy that does not update both parts. Feel free to open a new issue discussing adding TRPO there :) |
closing in fabor of Stable-Baselines-Team/stable-baselines3-contrib#38 |
🚀 Feature
I found there is no TRPO in this repo, however, in https://github.com/hill-a/stable-baselines, it has. Will you add it?
Motivation
Pitch
Add TRPO from https://github.com/hill-a/stable-baselines
### Checklist
The text was updated successfully, but these errors were encountered: