recurrent policy implementation in ppo [feature-request] #18

Closed
pushkalkatara opened this issue May 12, 2020 · 4 comments · Fixed by Stable-Baselines-Team/stable-baselines3-contrib#53
Labels
enhancement New feature or request help wanted Help from contributors is welcomed

@pushkalkatara

Hi, is a CNN-LSTM based policy implementation planned for PPO anytime soon?

@araffin araffin added the enhancement New feature or request label May 12, 2020
@araffin
Member

araffin commented May 12, 2020

Hello,
Please take a look at the roadmap: #1

It is planned, but for v1.1+ (so not for at least 1 or 2 months). In the meantime, you can always use frame-stacking if you need to account for history (most of the time it yields competitive results).

This feature will need extra care: it may complicate the codebase, it is wanted by users, and it is also an open research question.
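
For readers unfamiliar with the frame-stacking workaround mentioned above, here is a minimal pure-Python sketch of the idea: the policy receives the last `n_stack` observations instead of only the latest one. The `FrameStacker` class and its method names are hypothetical and written only for illustration, and padding the history with the first observation on reset is a choice made for this sketch. In SB3 itself, the `VecFrameStack` wrapper from `stable_baselines3.common.vec_env` plays this role for vectorized environments.

```python
from collections import deque


class FrameStacker:
    """Keep the last `n_stack` observations and expose them as one tuple."""

    def __init__(self, n_stack):
        self.n_stack = n_stack
        self.frames = deque(maxlen=n_stack)

    def reset(self, first_obs):
        # Assumption for this sketch: pad the history with the first observation.
        self.frames.clear()
        for _ in range(self.n_stack):
            self.frames.append(first_obs)
        return tuple(self.frames)

    def step(self, obs):
        # Appending to a full deque drops the oldest frame automatically.
        self.frames.append(obs)
        return tuple(self.frames)


stacker = FrameStacker(n_stack=4)
stacked = stacker.reset(0)
for obs in range(1, 7):
    stacked = stacker.step(obs)
print(stacked)  # (3, 4, 5, 6)
```

The stacked tuple is what a feed-forward policy would see at each step, so anything older than `n_stack` steps is no longer visible to it.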

@Miffyli Miffyli added this to the v1.2 milestone Jun 15, 2020
@araffin araffin added the help wanted Help from contributors is welcomed label Mar 18, 2021
@araffin
Member

araffin commented May 10, 2021

Related #160

@zhihanyang2022

zhihanyang2022 commented Nov 14, 2021

@pushkalkatara Having worked on getting recurrent networks to work with DDPG, TD3 and SAC (https://arxiv.org/pdf/2110.12628.pdf), I think one important question is: do you want to apply recurrence to take into account (1) the entire history or (2) just a short window of it? As araffin mentioned, if your problem is (2), then stacking would be an easier option. In fact, there isn't a simple solution that allows both (1) and (2) to be implemented together, so before diving into coding we should reflect on our actual needs :D.
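
To make the difference between (1) and (2) concrete, here is a toy sketch, not taken from the linked paper. It assumes a hypothetical memory task where a non-zero cue appears at the first step and must be recalled at the end of the episode. `window_agent` only sees the last few observations (as with stacking), while `recurrent_agent` carries a state across the whole episode; its hand-written update rule stands in for what a recurrent network would have to learn.

```python
from collections import deque


def window_agent(observations, window=3):
    """Answer using only the last `window` observations (stacking style)."""
    recent = deque(maxlen=window)
    for obs in observations:
        recent.append(obs)
    # The cue is recoverable only if it is still inside the window.
    cues = [o for o in recent if o != 0]
    return cues[0] if cues else None


def recurrent_agent(observations):
    """Answer from a state that is carried across every step of the episode."""
    state = None  # stands in for a recurrent hidden state
    for obs in observations:
        if obs != 0:
            state = obs  # hand-written update; a real network would learn this
    return state


# Hypothetical memory task: a cue at t=0, followed by nine blank steps.
episode = [7] + [0] * 9
print(window_agent(episode))     # None
print(recurrent_agent(episode))  # 7
```

If the cue always falls inside the window (for example `episode = [7, 0]`), both agents return 7, which is the case where stacking is the easier option.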

@araffin
Member

araffin commented Nov 25, 2021

I have a very experimental version of recurrent PPO in an SB3 contrib branch, based on the SB2/cleanRL implementations: Stable-Baselines-Team/stable-baselines3-contrib#53

Use it at your own risk :p
(I will try to continue to work on it but help is welcome too)


4 participants