Dreamer v1 / v2 [Model-based RL] #345

dosssman · 2023-01-10T05:06:46Z

Description

Adds Dreamer-v1 and Dreamer-v2 algorithm simplified.
Tested mainly on Atari.
Documentation WIP
Uses Truncated Backpropagation Through Time (TBPTT) as motivated in [https://arxiv.org/pdf/2210.13383.pdf](EVALUATING LONG-TERM MEMORY IN 3D MAZES)
Hopefully building toward JAX variant for even faster training, as well as Director algorithm.

Types of changes

Bug fix
New feature
New algorithm
Documentation

Checklist:

I've read the CONTRIBUTION guide (required).
I have ensured pre-commit run --all-files passes (required).
I have updated the documentation and previewed the changes via mkdocs serve.
I have updated the tests accordingly (if applicable).

If you are adding new algorithm variants or your change could result in performance difference, you may need to (re-)run tracked experiments. See #137 as an example PR.

…et, but still WIP

…ory sampling

…loss computation, still have to check that it works as expecte dthough

…ogress on the ACtor critci functionality

…l update step

…training faster ?

… by around 5 times !

…_runs.sh

…s's noe hot categorical straight through instead of hte custom implementation; addded ETA and UPS related trainng stats logging

…e jsik's instead of danijar though, as the former was tested more than enough in our experiments

…tch size trade-off investigation experimetns

…at least

…stigataon for Pong

vercel · 2023-01-10T05:06:50Z

The latest updates on your projects. Learn more about Vercel for Git ↗︎

Name	Status	Preview	Comments	Updated
cleanrl	✅ Ready (Inspect)	Visit Preview	💬 Add your feedback	Jan 23, 2023 at 11:51PM (UTC)

vwxyzjn · 2023-01-11T14:45:06Z

FWIW: https://www.reddit.com/r/reinforcementlearning/comments/108t325/dreamv3_mastering_diverse_domains_through_world/

dosssman · 2023-01-13T03:56:43Z

Preliminary results for Atari Pong and Breakout available on WANDB (did not want to clutter the cleanrl project too much), but will add runs once we have convene on a more or less final structure for the code.

While the code in dreamerv2_atari.py is pretty much done and working, I was thinking of asking for review once I finished up the documentations to justify some of the design choices.
However, would appreciate some early feedback especially on the parts that do not match CleanRL's coding style.

Otherwise, will try to get done with the docs and adding the baseline comparison plots ASAP.
Thanks.

…d not pan out much

sai-prasanna · 2023-01-26T10:03:41Z

@dosssman How can I help? I have experience with dreamer and have ported a replica of dreamerv2 to torch. I would very much love having a cleanRL impl of it (or preferably dreamerv3 for future experimentation).

dosssman · 2023-01-26T13:35:53Z

@sai-prasanna Thanks a lot for chimming in.

Right now, there is a functional implementation of v1 and v2 here, albeit not necessarily as simple as what CleanRL aims to provide..

Would greatly appreciate another pair of eyes going over it, if even summarily, to

Point out some parts that are not really cleanrl-ish, because of some design choices owing to the MbRL nature of the algorithm
Point out the parts that might be hard to understand

Currently working on the documentation while accounting to the difference in implementation compared to baseline.
For example, this implementation uses Truncated Back Propagation Through Time which samples batch of sequential trajectories (I went for TBPTT over the default version because in the subsequent memory-maze paper they have shown that TBPTT does work better than sampling non-contiguous batch of trajectories for the training. Furthermore, it just feels more logical to do.

A rough outline of the documentation that expands a bit more on the design choice is available here.
Apologies in advance for the roughness though.

Once we can convene on a standard for MbRL methods, or at least Dreamer type agents, we can use this as basis to extend toward Dreamer v3, hopefully, which feels more like a pack of practical implementation details and well generalizing hyper parameter sets, without changing much of the underlying logic of the algorithm.

Overall, I think Dreamer v1 and v2 are probably better to understand the different parts of the algorithms than Dreamer v3.
The latter is more oriented toward juicing out performance over tasks by using practical / implementation tweaks and methods, which I think are orthogonal to the core theory underlying the algorithm.
Of course, it will be a nice addition latter, on top of v1 and v2.

Another thing that might be worth doing is also adding support for DMC / Mujoco task support based on dreamer_atari.py.
The required changes would essential be to add support for the continuous control environment itself, as well as changing adapting the ActorHead method here.
The rest should be working out of the box.

dosssman · 2023-01-26T13:46:33Z

Another aspect that could probably benefit from improving it would be the slow training speed of the algorithm as is. Did my best to cut out most bottleneck, but training still takes a long time namely owing to the for loop over the batch length for the dynamics estimation by the RSSM (GRU cell).

Would be nice to find a way to a) either improve the training speed in Pytorch (JIT, functorch, etc... ?), or 2) port it to JAX once we have convened on the final Pytorch version. Last option c) would deviate from the original hyperparameters by using shorter batch length T=20 instead of T-50 (default) to reduce the RNN related bottleneck while still getting good enough results thanks to the TBPTT. Did some preliminary tests on Atari Pong, and using B = 50 and T = 20 does not seem to hurt that much.

I think this it is important to reduce the training time of this algorithm to make it more affordable to experiment with.

dosssman · 2023-01-28T06:29:09Z

In case getting the code under this fork / branch running poses some problems, here is how I setup the environment.
Probably best to checkout the latest commit of this PR's branch.

For recent GPU (RTX family and CUDA >=11.6):

conda create -n cleanrl-mbrl-dreamer python=3.9 -y
conda activate cleanrl-mbrl-dreamer
# Poetry install inside conda
pip install poetry
export PYTHON_KEYRING_BACKEND=keyring.backends.null.Keyring
# Atari support
poetry install --with atari
# Depending on the GPU used, overriding Pytorch and CUDA version can help train faster
# Namely, for GTX 1080 Ti and similar use Torch 1.10.2 and CUDA 10.2 instead of 11.3
# conda install pytorch=1.10.2 torchvision torchaudio cudatoolkit=10.2 -c pytorch -y
# Jupyter kernel support
pip install ipykernel
# Video logging support with TensorboardX
pip install moviepy torchinfo

For older GPUs such as GTX 1080 Ti:

# Older python 3.8 + CUDA 10.2 and Pytorch 1.10.2 for compat ?
conda create -n cleanrl-mbrl-cleanrl-10.2 python=3.8 -y
conda activate ...
poetry install --with atari
conda install pytorch=1.10.2 torchvision torchaudio cudatoolkit=10.2 -c pytorch -y
pip install moviepy

Hope it helps.

sai-prasanna · 2023-01-30T20:11:53Z

@dosssman Thanks for your well thought out, detailed plan for action! I will ease my way into these tasks starting from a review of the existing code soon. After that I will take a stab on atari/continuous action space support.

Yep, TBTT makes sense for long horizon credit assignment, for short horizons, using zeros as hidden start state potentially acts as a regularizer. But so long as there is no performance difference, I think TBTT as is a the best default.

I agree your point on dreamer v3. It's going to be purely few changes to reward scaling, hyper parameters, and value function implementation (they use a distributional RL type value network prediction).

sai-prasanna · 2023-02-17T09:57:41Z

@dosssman Sorry, I couldn't do as even the little I planned to. Taking a high level look on the code, I am not sure if encapsulating the training code in world model and the actor-critic is "clean-rl"-like. But that's purely from comparing with model-free algorithms where there is only a single training block without too much abstractions.

If we are going for a single training code block type thing we should simply the world model & Actor-Critic into dumb torch modules. And extract the train/imagine functions out, either into a single train function or small pure functions.

But if you think the original dreamer approach is cleaner, then there isn't much to do.

dosssman · 2023-02-17T11:42:18Z

@sai-prasanna No worries. I know life get sin the way haha.
Thanks a lot for the feedback. WIll try to make a draft of the variant with the training and imagine functions as a block.
Might probably be easier to understand, especially for the actor critic part.

The current Dreamer like approach is actually based on some of my research projects, where I need to easily swap different World Models and Actor Critic type, but this is probably not that relevant in this case.

dosssman added 30 commits November 24, 2022 19:20

Preliminary work for Dreamer v1 / v2: seting up the env and the datas…

70fe0df

…et, but still WIP

Added Atari env, Iterable dataset with TBTT support for batch traject…

372bca1

…ory sampling

Dreamer Atari with general training outline

1febb97

Further flesh out of the world model, up to the forwrad pass and the …

de8f2d4

…loss computation, still have to check that it works as expecte dthough

World model pretty mcuh done, with video generation

b95ff6f

Dreamer atari: added actor critic to the Dremaer agent

7f997f0

Added sample_action and imagine method to the world model, further pr…

e5dfa89

…ogress on the ACtor critci functionality

Essentially working dreamer.atari.py

de4054b

Dreamer atari a bit of clean up, queue up benchmark runs

915db6b

Slight tweaks of dreamer_atari.py

c4cd412

Dreamer atari training stast logging mechanism tweaked

d7d7712

Further tweaks of the logging in dremaer_atari.py

a4be29c

fruther flesh out of the logging and training logic, how often to cal…

c3bff68

…l update step

Further tweak of the train_every

3fdd0a4

Tweaked the dreamer_runs.sh benchmarking script

ff860fa

Fixed world model loss computation missing KL_loss scaled term

4c99839

Queued up runs with B 250 and T 10: shorter sequence could help make …

2619922

…training faster ?

Removed the notebook variant of dreamer_atari

2562e53

Added dreamer atari variant with AMP support, improving traiing speed…

5866325

… by around 5 times !

Added more Dreamer atari AMP experimetns with num-envs = 4 to dreamer…

1fc5211

…_runs.sh

Added more Dreamer atari AMP experimetns with num-envs = 4 to dreamer…

52ff455

…_runs.sh

Fix gradient scaling befor grad norm clipping uncorrecctly implemented

98ad4e0

Queued up num envs = 1 dreamer_atari_ampfix

fa5d4c8

Added runs for dreamer_atari with more frequent training

1371fe9

Dreamer Atari Amp small tweak

fe011b5

dreamer atari small tweaks

63691d3

Queud up dreamer_runs with with train every 8 steps instead of 4

7bd1efd

further tweaks to the default Dreamer Atari: using Torch distribution…

151d351

…s's noe hot categorical straight through instead of hte custom implementation; addded ETA and UPS related trainng stats logging

Made slow value a default setting, using baseline = value(s_list) lik…

18e96b8

…e jsik's instead of danijar though, as the former was tested more than enough in our experiments

Just added ac omment

22151ba

dosssman added 10 commits December 14, 2022 13:08

Fxied some hyper parameters

9f1f83b

Requeud Pong Dreamer runs experimetns: basseline, bathc length and ba…

57462d8

…tch size trade-off investigation experimetns

Removed torchinfo usage to stop the gitbot testing failures, for now …

8f7dd12

…at least

Queued up and commented out Dreamer runs on Breakout

44f2c77

Queud upa nd comment out longer runs for Breakout task, num-envs inve…

9e23123

…stigataon for Pong

Adding preliminary documentation of the Dreamer

a0cbe95

Added Breakout runs with batch length of 20

26aa8ea

Fixed Breakout with shorter batch length

66c3c62

Added design choices explanation to the Dreamer docs

42039cd

more writing of the docs for the Dreamer implementation

54de809

vercel bot deployed to Preview January 10, 2023 05:07 View deployment

dosssman self-assigned this Jan 10, 2023

Removed deprecated dreamer_atari_amp and _thdbern variants as they di…

d5e718e

…d not pan out much

vercel bot deployed to Preview January 13, 2023 05:42 View deployment

Queued up and commented out the dreamer_runs.sh for Montezuma Revenge

c4fa178

vercel bot deployed to Preview January 23, 2023 23:42 View deployment

dosssman added 2 commits January 24, 2023 08:50

Fixed the dreamer_runs.sh script for Montezuma experimetns

16001e8

Fixed the dreamer_runs.sh script for Montezuma experimetns

876ab53

vercel bot deployed to Preview January 23, 2023 23:51 View deployment

vwxyzjn closed this Nov 28, 2023

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Dreamer v1 / v2 [Model-based RL] #345

Dreamer v1 / v2 [Model-based RL] #345

dosssman commented Jan 10, 2023 •

edited

Loading

vercel bot commented Jan 10, 2023 •

edited

Loading

vwxyzjn commented Jan 11, 2023

dosssman commented Jan 13, 2023 •

edited

Loading

sai-prasanna commented Jan 26, 2023

dosssman commented Jan 26, 2023 •

edited

Loading

dosssman commented Jan 26, 2023

dosssman commented Jan 28, 2023 •

edited

Loading

sai-prasanna commented Jan 30, 2023 •

edited

Loading

sai-prasanna commented Feb 17, 2023 •

edited

Loading

dosssman commented Feb 17, 2023 •

edited

Loading

Dreamer v1 / v2 [Model-based RL] #345

Dreamer v1 / v2 [Model-based RL] #345

Conversation

dosssman commented Jan 10, 2023 • edited Loading

Description

Types of changes

Checklist:

vercel bot commented Jan 10, 2023 • edited Loading

vwxyzjn commented Jan 11, 2023

dosssman commented Jan 13, 2023 • edited Loading

sai-prasanna commented Jan 26, 2023

dosssman commented Jan 26, 2023 • edited Loading

dosssman commented Jan 26, 2023

dosssman commented Jan 28, 2023 • edited Loading

sai-prasanna commented Jan 30, 2023 • edited Loading

sai-prasanna commented Feb 17, 2023 • edited Loading

dosssman commented Feb 17, 2023 • edited Loading

dosssman commented Jan 10, 2023 •

edited

Loading

vercel bot commented Jan 10, 2023 •

edited

Loading

dosssman commented Jan 13, 2023 •

edited

Loading

dosssman commented Jan 26, 2023 •

edited

Loading

dosssman commented Jan 28, 2023 •

edited

Loading

sai-prasanna commented Jan 30, 2023 •

edited

Loading

sai-prasanna commented Feb 17, 2023 •

edited

Loading

dosssman commented Feb 17, 2023 •

edited

Loading