Recurrent policy #210

Closed
yiwc opened this issue Nov 1, 2020 · 9 comments
Labels
duplicate: This issue or pull request already exists

Comments

yiwc commented Nov 1, 2020

Is there an easy way to implement a recurrent policy in the SB3 beta right now?

I notice there is no policy such as an LSTM yet, but we urgently need one.

Thanks!

araffin added the duplicate label Nov 1, 2020
Collaborator

Miffyli commented Nov 1, 2020

Closing as duplicate of #1 and #160.

LSTM/Recurrent support is planned for v1.1. Adding support for them is not as trivial as in supervised learning so it will require modifications all around the algorithms.
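
As a rough illustration of why recurrence touches the algorithms themselves: a recurrent policy carries a hidden state from step to step, so collected rollouts have to record where episodes start and reset the state there. The sketch below is a toy in plain Python, not SB3 code; `rollout_hidden_states`, `decay`, and the scalar "memory" are hypothetical names chosen only for illustration.

```python
from typing import List, Sequence


def rollout_hidden_states(
    observations: Sequence[float],
    episode_starts: Sequence[bool],
    decay: float = 0.5,
) -> List[float]:
    """Toy recurrent memory: h_t = decay * h_{t-1} + obs_t.

    The hidden state is reset to zero whenever a new episode starts,
    so information never leaks across episode boundaries.
    """
    h = 0.0
    states = []
    for obs, start in zip(observations, episode_starts):
        if start:
            h = 0.0  # forget everything from the previous episode
        h = decay * h + obs
        states.append(h)
    return states


# Two episodes of two steps each, concatenated in one rollout.
states = rollout_hidden_states([1.0, 1.0, 1.0, 1.0], [True, False, True, False])
print(states)  # [1.0, 1.5, 1.0, 1.5]
```

Without the `episode_starts` flags, the second episode would begin with leftover memory from the first, which is one reason a plain feed-forward rollout buffer is not enough for a recurrent policy.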

sycz00 commented Dec 1, 2021

@Miffyli
Hey, I hope it's okay to post this in this thread rather than one of the others.
I know that recurrent policies are not available in SB3 at the moment. I am working on a project for which I need some sort of recurrent structure, since it's a POMDP environment. My question is: do you think it could work to use a TCN at the end of my custom feature extractor to learn long-term dependencies? As far as I understand, a TCN does not need BPTT.
Greetings :)

Collaborator

Miffyli commented Dec 1, 2021

@sycz00 A TCN (temporal convolutional network, I assume?) could work for that case. There is now also experimental LSTM support for PPO here. You need to experiment to find out what works for your case :)
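
To make the TCN idea concrete: a TCN stacks causal, dilated 1D convolutions, so each output depends only on a fixed window of past inputs and can be trained with ordinary backpropagation over that window instead of BPTT through a hidden state. The plain-Python sketch below is only an illustration under assumptions (one convolution per dilation level, zero padding before the start of the sequence); `receptive_field` and `causal_dilated_conv` are hypothetical helper names, not part of SB3.

```python
from typing import List, Sequence


def receptive_field(kernel_size: int, dilations: Sequence[int]) -> int:
    """Time steps (current one included) visible to a stack of causal
    convolutions, assuming one convolution layer per dilation value."""
    return 1 + (kernel_size - 1) * sum(dilations)


def causal_dilated_conv(
    x: Sequence[float], weights: Sequence[float], dilation: int
) -> List[float]:
    """Causal 1D convolution: y[t] uses only x[t], x[t-d], x[t-2d], ...

    Indices before the start of the sequence count as zero padding.
    """
    out = []
    for t in range(len(x)):
        acc = 0.0
        for k, w in enumerate(weights):
            idx = t - k * dilation
            if idx >= 0:
                acc += w * x[idx]
        out.append(acc)
    return out


dilations = [1, 2, 4, 8]
print(receptive_field(kernel_size=3, dilations=dilations))  # 31

# Push a single impulse through the stack: it influences exactly as many
# future steps as the receptive field, and never any earlier step.
signal = [0.0] * 5 + [1.0] + [0.0] * 40
for d in dilations:
    signal = causal_dilated_conv(signal, [1.0, 1.0, 1.0], d)
affected = [t for t, v in enumerate(signal) if v != 0.0]
print(affected[0], affected[-1])  # 5 35
```

In practice this means the observation passed to the feature extractor must already contain the last N frames (for example via frame stacking), with N at least as large as the receptive field, since the network itself keeps no memory between steps.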

sycz00 commented Dec 1, 2021

Yes, that's what I mean. Wow, that's amazing, thank you. I'm going to look into this.
It seems not to be working so far? thu-ml/tianshou#486

sycz00 commented Dec 1, 2021

pip install git+https://github.com/Stable-Baselines-Team/stable-baselines3-contrib
Does this not include the ppo_lstm branch?

Collaborator

Miffyli commented Dec 1, 2021

@sycz00 You need to check out the feat/ppo-lstm branch, not ppo_lstm.

sycz00 commented Dec 1, 2021

@Miffyli Thanks:)

It looks like MultiInputPolicy is not available yet? I ask because I have a dict observation space. For now I am trying it with just one image.

Member

araffin commented Dec 1, 2021

> It looks like MultiInputPolicy is not available yet? I ask because I have a dict observation space. For now I am trying it with just one image.

It is not (yet), but you can add support for it; it is mostly a matter of copy-pasting what is in the replay buffer code...

sycz00 commented Dec 1, 2021

Okay, good :) Does the LSTM only exist for the actor? If yes, why? :)

4 participants