recurrent policy implementation in ppo [feature-request] #18
Comments
Hello, this is planned, but for v1.1+ (so not for at least 1 or 2 months). In the meantime, you can always use frame-stacking if you need to account for history (most of the time it yields competitive results). This feature will need extra care: users want it, but it may complicate the codebase, and it is also an open research question.
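To make the frame-stacking suggestion concrete, here is a minimal, library-independent sketch. The `FrameStack` class and its method names are hypothetical and written only for illustration; they are not the Stable-Baselines3 API. The idea is to keep the last `k` observations and hand them to the policy as a single observation.

```python
from collections import deque


class FrameStack:
    """Hypothetical helper: keep the last `k` observations as one stacked observation."""

    def __init__(self, k):
        self.k = k
        # A bounded deque drops the oldest frame automatically on append.
        self.frames = deque(maxlen=k)

    def reset(self, first_obs):
        # At episode start there is no history yet, so pad with copies
        # of the first observation to keep the stacked size constant.
        self.frames.clear()
        for _ in range(self.k):
            self.frames.append(first_obs)
        return self.stacked()

    def step(self, obs):
        # Push the newest observation; the oldest one falls out of the window.
        self.frames.append(obs)
        return self.stacked()

    def stacked(self):
        # Oldest frame first, newest frame last.
        return list(self.frames)


stack = FrameStack(k=3)
print(stack.reset(0))  # three copies of the first observation
print(stack.step(1))   # window now ends with the new observation
```

In Stable-Baselines3 itself, the `VecFrameStack` wrapper plays this role for vectorized environments, so a hand-written helper like this one is usually not needed in practice.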
Related #160
@pushkalkatara Having worked on getting recurrent networks to work with DDPG, TD3 and SAC (https://arxiv.org/pdf/2110.12628.pdf), I think one important question is whether you want the recurrence to take into account (1) the entire history or (2) just a short window of it. As araffin mentioned, if your problem is (2), then frame-stacking would be the easier option. In fact, there isn't a simple solution that allows both (1) and (2) to be implemented together, so before diving into coding we should reflect on our actual needs :D.
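The difference between (1) and (2) can be seen in a toy sketch. Everything here is an assumption made for illustration: the function names are hypothetical, and the recurrent part is a single scalar tanh cell with hand-picked weights, not an LSTM and not the method from the linked paper. A cue that appears only at the first step falls outside a 3-frame window, while a state carried forward from the start of the episode can still reflect it.

```python
import math
from collections import deque


def window_view(observations, k):
    """What a k-frame stack exposes after the last step (hypothetical helper)."""
    return list(deque(observations, maxlen=k))


def recurrent_state(observations, w_h=0.9, w_x=0.5):
    """Toy scalar recurrent cell; the weights are arbitrary illustrative values."""
    h = 0.0
    for x in observations:
        # The new state depends on the previous state, so early inputs
        # can keep influencing it after they have left any fixed window.
        h = math.tanh(w_h * h + w_x * x)
    return h


with_cue = [1.0, 0.0, 0.0, 0.0, 0.0, 0.0]     # cue only at the first step
without_cue = [0.0, 0.0, 0.0, 0.0, 0.0, 0.0]  # no cue at all

# The 3-frame windows are identical, so a stacked policy cannot tell them apart.
print(window_view(with_cue, 3) == window_view(without_cue, 3))

# The recurrent states differ, so the cue is still visible in the state.
print(recurrent_state(with_cue), recurrent_state(without_cue))
```

This only shows that a recurrent state can in principle retain older information. Whether a trained network actually does so depends on the task and on training.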
I have a very experimental version of recurrent PPO in an SB3 contrib branch, based on the SB2/cleanRL implementations: Stable-Baselines-Team/stable-baselines3-contrib#53 Use it at your own risk :p
Hi, is a CNN-LSTM based policy implementation planned for PPO anytime soon?