Dreamer v1 / v2 [Model-based RL] #345
Conversation
…et, but still WIP
…loss computation, still have to check that it works as expected though
…ogress on the Actor Critic functionality
…training faster?
… by around 5 times!
…s's one hot categorical straight through instead of the custom implementation; added ETA and UPS related training stats logging
…e jsik's instead of danijar though, as the former was tested more than enough in our experiments
…tch size trade-off investigation experiments
…stigation for Pong
Preliminary results for Atari Pong and Breakout are available on WANDB (did not want to clutter the cleanrl project too much), but will add runs once we have converged on a more or less final structure for the code. Otherwise, will try to get done with the docs and adding the baseline comparison plots ASAP.
…d not pan out much
@dosssman How can I help? I have experience with Dreamer and have ported a replica of DreamerV2 to torch. I would very much love to have a CleanRL impl of it (or preferably DreamerV3 for future experimentation).
@sai-prasanna Thanks a lot for chiming in. Right now, there is a functional implementation of v1 and v2 here, albeit not necessarily as simple as what CleanRL aims to provide. Would greatly appreciate another pair of eyes going over it, even if only summarily, to …
Currently working on the documentation while accounting for the differences in implementation compared to the baseline. A rough outline of the documentation that expands a bit more on the design choices is available here. Once we can converge on a standard for MBRL methods, or at least Dreamer-type agents, we can hopefully use this as a basis to extend toward Dreamer v3, which feels more like a pack of practical implementation details and well-generalizing hyperparameter sets, without changing much of the underlying logic of the algorithm. Overall, I think Dreamer v1 and v2 are probably better for understanding the different parts of the algorithm than Dreamer v3. Another thing that might be worth doing is adding support for DMC / Mujoco tasks based on …
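In case it helps frame the DMC discussion, below is a rough, untested sketch of what a minimal dm_control-to-Gym pixel wrapper could look like. The class name, default domain/task, and image size are illustrative assumptions, not what the branch actually uses.

```python
import numpy as np
import gym
from dm_control import suite

class DMCPixelEnv(gym.Env):
    """Hypothetical pixel-observation wrapper around a dm_control suite task."""

    def __init__(self, domain="walker", task="walk", size=(64, 64), camera_id=0):
        self._env = suite.load(domain_name=domain, task_name=task)
        self._size, self._camera_id = size, camera_id
        spec = self._env.action_spec()
        self.action_space = gym.spaces.Box(spec.minimum, spec.maximum, dtype=np.float32)
        self.observation_space = gym.spaces.Box(0, 255, (*size, 3), dtype=np.uint8)

    def _render(self):
        return self._env.physics.render(*self._size, camera_id=self._camera_id)

    def reset(self):
        self._env.reset()
        return self._render()

    def step(self, action):
        time_step = self._env.step(action)
        return self._render(), time_step.reward, time_step.last(), {}
```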
Another aspect that could probably benefit from improvement is the slow training speed of the algorithm as is. Did my best to cut out most bottlenecks, but training still takes a long time, mainly owing to the for loop over the batch length for the dynamics estimation by the RSSM (GRU cell). Would be nice to find a way to a) improve the training speed in Pytorch (JIT, functorch, etc.?), or b) port it to JAX once we have converged on the final Pytorch version. A last option c) would deviate from the original hyperparameters by using a shorter batch length T=20 instead of T=50 (default) to reduce the RNN-related bottleneck while still getting good enough results thanks to TBPTT. Did some preliminary tests on Atari Pong, and using B=50 and T=20 does not seem to hurt that much. I think it is important to reduce the training time of this algorithm to make it more affordable to experiment with.
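For context on the bottleneck, here is a minimal sketch of the kind of sequential rollout involved. The shapes and variable names are illustrative, and the real RSSM mixes the stochastic state, action, and embedding rather than a single input tensor, but the structural point (a GRUCell stepped T times in Python) is the same.

```python
import torch
import torch.nn as nn

B, T = 50, 20                     # batch size x batch length (T=20 instead of the default 50)
input_dim, deter_dim = 1024, 200  # illustrative sizes

gru = nn.GRUCell(input_dim, deter_dim)
inputs = torch.randn(T, B, input_dim)    # per-step features (stoch state + action in the real model)
h = torch.zeros(B, deter_dim)            # TBPTT: each sampled batch starts from a zero hidden state

deters = []
for t in range(T):                       # sequential over T: this Python loop is the main bottleneck
    h = gru(inputs[t], h)
    deters.append(h)
deters = torch.stack(deters)             # (T, B, deter_dim), consumed by the posterior and heads
```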
In case getting the code under this fork / branch running poses some problems, here is how I set up the environment. For recent GPUs (RTX family and CUDA >=11.6):

conda create -n cleanrl-mbrl-dreamer python=3.9 -y
conda activate cleanrl-mbrl-dreamer
# Poetry install inside conda
pip install poetry
export PYTHON_KEYRING_BACKEND=keyring.backends.null.Keyring
# Atari support
poetry install --with atari
# Depending on the GPU used, overriding Pytorch and CUDA version can help train faster
# Namely, for GTX 1080 Ti and similar use Torch 1.10.2 and CUDA 10.2 instead of 11.3
# conda install pytorch=1.10.2 torchvision torchaudio cudatoolkit=10.2 -c pytorch -y
# Jupyter kernel support
pip install ipykernel
# Video logging support with TensorboardX
pip install moviepy torchinfo

For older GPUs such as GTX 1080 Ti:

# Older python 3.8 + CUDA 10.2 and Pytorch 1.10.2 for compat?
conda create -n cleanrl-mbrl-cleanrl-10.2 python=3.8 -y
conda activate ...
poetry install --with atari
conda install pytorch=1.10.2 torchvision torchaudio cudatoolkit=10.2 -c pytorch -y
pip install moviepy

Hope it helps.
@dosssman Thanks for your well thought out, detailed plan of action! I will ease my way into these tasks starting from a review of the existing code soon. After that I will take a stab at Atari / continuous action space support. Yep, TBPTT makes sense for long-horizon credit assignment; for short horizons, using zeros as the hidden start state potentially acts as a regularizer. But so long as there is no performance difference, I think TBPTT as is is the best default. I agree with your point on Dreamer v3. It's going to be purely a few changes to reward scaling, hyperparameters, and the value function implementation (they use a distributional-RL-type value network prediction).
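For reference, here is a hedged sketch of the two-hot / symlog idea behind that distributional value prediction; the bin count and range are made-up defaults for illustration, not DreamerV3's exact settings.

```python
import torch

# Bins live in symlog space.
bins = torch.linspace(-20.0, 20.0, 255)

def symlog(x):
    return torch.sign(x) * torch.log1p(torch.abs(x))

def two_hot(returns, bins):
    """Encode scalar returns as weights over the two nearest bins (soft cross-entropy targets)."""
    y = symlog(returns).clamp(bins[0], bins[-1])
    idx = torch.searchsorted(bins, y).clamp(1, len(bins) - 1)   # index of the right neighbour
    lo, hi = bins[idx - 1], bins[idx]
    w_hi = (y - lo) / (hi - lo)
    target = torch.zeros(*y.shape, len(bins))
    target.scatter_(-1, (idx - 1).unsqueeze(-1), (1.0 - w_hi).unsqueeze(-1))
    target.scatter_(-1, idx.unsqueeze(-1), w_hi.unsqueeze(-1))
    return target   # cross-entropy target for the critic's categorical logits

# e.g. lambda-returns of shape (B,) -> (B, 255) soft targets
targets = two_hot(torch.randn(8) * 10.0, bins)
```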
@dosssman Sorry, I couldn't do even the little I planned to. Taking a high-level look at the code, I am not sure if encapsulating the training code in the world model and the actor-critic classes is "clean-rl"-like. But that's purely from comparing with model-free algorithms, where there is only a single training block without too many abstractions. If we are going for a single training code block type thing, we should simplify the world model & actor-critic into dumb torch modules and extract the train/imagine functions out, either into a single train function or small pure functions. But if you think the original Dreamer approach is cleaner, then there isn't much to do.
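To make the suggestion concrete, a rough sketch of that "dumb modules + free functions" layout could look like the following; all names are made up for illustration and the training bodies are elided.

```python
import torch.nn as nn

class WorldModel(nn.Module):
    """Holds the networks only (encoder, RSSM, decoder, reward head); no training logic."""
    def __init__(self, encoder, rssm, decoder, reward_head):
        super().__init__()
        self.encoder, self.rssm = encoder, rssm
        self.decoder, self.reward_head = decoder, reward_head

class ActorCritic(nn.Module):
    """Holds the actor and critic networks only."""
    def __init__(self, actor, critic):
        super().__init__()
        self.actor, self.critic = actor, critic

def train_world_model(wm: WorldModel, optimizer, batch):
    # reconstruction / reward / KL losses computed inline, single readable block
    ...

def train_actor_critic(wm: WorldModel, ac: ActorCritic, optimizer, start_states):
    # imagination rollout + actor and critic updates, also inline
    ...
```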
@sai-prasanna No worries. I know life gets in the way haha. The current Dreamer-like approach is actually based on some of my research projects, where I need to easily swap different world models and actor-critic types, but this is probably not that relevant in this case.
Description

Types of changes

Checklist:
- pre-commit run --all-files passes (required).
- Documentation updated and previewed via mkdocs serve.

If you are adding new algorithm variants or your change could result in performance difference, you may need to (re-)run tracked experiments. See #137 as an example PR.
- Tracked experiments run with the --capture-video flag toggled on (required).
- Documentation added and previewed via mkdocs serve.