Recurrent policy #210

yiwc · 2020-11-01T03:29:12Z

Is there an easy way to implement a recurrent policy in the beta SB3 now?

I notice there is no such policy (e.g. an LSTM one), but we urgently need it.

Thanks!

Miffyli · 2020-11-01T14:20:16Z

Closing as duplicate of #1 and #160.

LSTM/Recurrent support is planned for v1.1. Adding support for them is not as trivial as in supervised learning so it will require modifications all around the algorithms.

sycz00 · 2021-12-01T14:54:37Z

@Miffyli
Hey, I hope it's okay to post this in this thread rather than one of the others.
I know that recurrent policies are not available in SB3 at the moment. I am working on a project that needs some sort of recurrent structure, since the environment is a POMDP. My question is: do you think it could work to use a TCN at the end of my custom feature extractor to learn long-term dependencies? As far as I understand, a TCN does not need BPTT.
Greetings :)

Miffyli · 2021-12-01T14:57:49Z

@sycz00 A TCN (temporal convolutional network, I assume?) could work for that case. There is now also experimental LSTM support for PPO here. You need to experiment to find out what works for your case :)
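One thing worth checking before trying the TCN route: a purely convolutional extractor only sees a fixed window of past steps, so the observation passed to it has to contain at least that much history (for example via frame stacking). The sketch below estimates that window for a stack of causal dilated convolutions. The function name `tcn_receptive_field` and the `convs_per_level` parameter are illustrative assumptions, not part of SB3 or any TCN library.

```python
from typing import Sequence


def tcn_receptive_field(
    kernel_size: int,
    dilations: Sequence[int],
    convs_per_level: int = 1,
) -> int:
    """Return how many time steps (current step included) one output depends on.

    Each causal convolution with kernel size k and dilation d extends the
    visible history by (k - 1) * d steps. `convs_per_level` is a hypothetical
    knob for architectures that apply several convolutions per dilation level.
    """
    if kernel_size < 1 or convs_per_level < 1:
        raise ValueError("kernel_size and convs_per_level must be >= 1")
    if any(d < 1 for d in dilations):
        raise ValueError("every dilation must be >= 1")
    return 1 + convs_per_level * (kernel_size - 1) * sum(dilations)


if __name__ == "__main__":
    # Four levels with dilations doubling at each level: 1, 2, 4, 8.
    dilations = [2 ** i for i in range(4)]
    print(tcn_receptive_field(kernel_size=3, dilations=dilations))  # 31
```

If the dependencies in the POMDP span more steps than this number, the TCN cannot capture them no matter how it is trained, so the kernel size, the dilations, or the length of the stacked history would need to grow.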

sycz00 · 2021-12-01T14:59:06Z

Yes, that's what I mean. Wow, that's amazing, thank you. I am going to look into this.
It seems not to be working so far? thu-ml/tianshou#486

sycz00 · 2021-12-01T15:35:40Z

pip install git+https://github.com/Stable-Baselines-Team/stable-baselines3-contrib
does not contain the ppo_lstm branch?

Miffyli · 2021-12-01T15:42:09Z

@sycz00 You need to checkout feat/ppo-lstm branch, not ppo_lstm.
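For reference, pip can install straight from a branch by appending `@<branch>` to the `git+` URL. The sketch below builds that command for the branch named above; it assumes the `feat/ppo-lstm` branch still exists in the repository, and the helper names `vcs_requirement` and `pip_install_command` are made up for illustration.

```python
import subprocess
import sys
from typing import List

REPO_URL = "https://github.com/Stable-Baselines-Team/stable-baselines3-contrib"
BRANCH = "feat/ppo-lstm"  # branch name taken from the comment above


def vcs_requirement(repo_url: str, branch: str) -> str:
    """Build a pip requirement string that pins a git branch."""
    return f"git+{repo_url}@{branch}"


def pip_install_command(repo_url: str, branch: str) -> List[str]:
    """Build the argument list for running pip with the current interpreter."""
    return [sys.executable, "-m", "pip", "install", vcs_requirement(repo_url, branch)]


def install_from_branch(repo_url: str, branch: str) -> None:
    """Run the install; needs network access and git on the PATH."""
    subprocess.check_call(pip_install_command(repo_url, branch))


if __name__ == "__main__":
    # Only print the command here; call install_from_branch() to actually run it.
    print(" ".join(pip_install_command(REPO_URL, BRANCH)))
```

The same requirement string works directly on the command line after `pip install`.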

sycz00 · 2021-12-01T17:58:08Z

@Miffyli Thanks:)

It looks like MultiInputPolicy is not available yet? I ask because I have a dict observation space. For now I am trying it with just one image.

araffin · 2021-12-01T17:59:59Z

> It looks like MultiInputPolicy is not available yet? I ask because I have a dict observation space. For now I am trying it with just one image.

It is not (yet), but you can add support for it. It is mostly a matter of copy-pasting what is in the replay buffer code...

sycz00 · 2021-12-01T18:04:03Z

Okay, good :) Does the LSTM exist only for the actor? If so, why? :)

araffin added the duplicate label Nov 1, 2020

Miffyli closed this as completed Nov 1, 2020

RajS999 mentioned this issue Mar 26, 2021

Implementing RNN Structure AI4Finance-Foundation/FinRL#195

Closed
