SAC-discrete implementation #270
Conversation
Is there any benchmark run for this? How does it perform?
The results of the original paper are in #266. There are quite a number of games where its performance at 100k steps doesn't differ much from a random agent, e.g. Frostbite; I don't think it makes much sense to evaluate those. Right now I've just run it a couple of times manually to check that the code actually works. A modified version of this codebase has been able to solve Catcher (PyGame) and some simple Minigrid environments. In general, I'm going to try to find some good hyperparameters and then run it on a few environments where performance actually differs from a random agent. I don't know for sure when I'll have time for that, though... Once that's done, I'll post a report here. I may also run it for 200k steps to verify the results. After that I plan on starting with some docs :)
Posting an update:

I ran some experiments; here's a link to a report with some results. The entropy regularization coefficient [...]. This implementation doesn't quite match the results of the paper, which might be due to not using an evaluation mode (i.e. a deterministic policy); if desired, I can implement a test loop that evaluates a deterministic policy. The performance does match the values reported in another implementation I found. Are there any other experiments you'd like me to run, e.g. specific environments or more seeds?
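For reference, such a deterministic test loop could look roughly like the sketch below. This is only an illustration under assumptions: `actor`, `eval_env`, `device`, and an actor that returns action logits are placeholders, not necessarily how this PR structures things.

```python
import torch

# Hypothetical sketch of deterministic evaluation for a discrete SAC agent.
# `actor` is assumed to map observations to action logits; `eval_env` uses the
# classic gym API (reset() -> obs, step() -> obs, reward, done, info).
def evaluate_deterministic(actor, eval_env, device, num_episodes=10):
    returns = []
    for _ in range(num_episodes):
        obs, done, episode_return = eval_env.reset(), False, 0.0
        while not done:
            with torch.no_grad():
                logits = actor(torch.as_tensor(obs, dtype=torch.float32, device=device).unsqueeze(0))
                action = torch.argmax(logits, dim=-1)  # greedy action instead of sampling
            obs, reward, done, _ = eval_env.step(action.item())
            episode_return += reward
        returns.append(episode_return)
    return sum(returns) / len(returns)
```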
Are there ways I can do this without xvfb? If everything's fine, I'm gonna start writing docs :)
Hi @timoklein, thank you! The experiments look very interesting.
Unless it makes a huge difference in the reported results, I wouldn't worry about it.
Thanks for running the environments with three random seeds; that's great. Out of personal interest, I would like to see the results in Pong, Breakout, and BeamRider, which are the three environments that we commonly use to benchmark other algorithms (see example here).
If this is an issue, feel free to run the experiments without xvfb. The video recordings are nice to have, but I am not requiring every run to have them. For example, it's not practical to obtain videos in EnvPool, so I didn't do it there. That said, please save the model at the end of training so that we can visualize the agents later. Feel free to use the following code:

```python
torch.save(agent.state_dict(), f"models/{run_name}/agent.pt")
if args.prod_mode:
    wandb.save(f"models/{run_name}/agent.pt", base_path=f"models/{run_name}", policy="now")
```
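Loading the weights back for visualization later would then be the mirror image. A minimal sketch, assuming an `Actor` class and `envs` object as defined in the training script:

```python
# Sketch of restoring the saved agent for later visualization; `Actor`, `envs`,
# and `run_name` are placeholders for whatever the training script defines.
import torch

agent = Actor(envs)
agent.load_state_dict(torch.load(f"models/{run_name}/agent.pt", map_location="cpu"))
agent.eval()
```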
@timoklein Sometimes the target entropy may just be very high and hard to reach, and the loss can explode (as alpha will grow and grow), so I usually tune the coefficient a bit (which is 0.98 by default in the code right now). Some discussion about it:
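To make the role of that coefficient concrete, here is a rough, self-contained sketch of how the target entropy and the automatic alpha tuning typically fit together in discrete SAC. The variable names and dummy tensors are illustrative, not the PR's exact code:

```python
import torch
import torch.nn.functional as F

num_actions = 6                # e.g. the size of a small Atari action space
target_entropy_scale = 0.98    # the coefficient discussed above
# The maximum entropy of a discrete policy is log(|A|); the target is a fraction of it.
target_entropy = -target_entropy_scale * torch.log(1 / torch.tensor(num_actions, dtype=torch.float32))

log_alpha = torch.zeros(1, requires_grad=True)
a_optimizer = torch.optim.Adam([log_alpha], lr=3e-4)

# Dummy policy outputs standing in for the actor's forward pass on a batch.
logits = torch.randn(32, num_actions)
action_probs = F.softmax(logits, dim=-1)
log_probs = F.log_softmax(logits, dim=-1)

# If target_entropy is set too high to ever be reached, minimizing this loss keeps
# increasing alpha, which is the exploding behaviour described above.
alpha_loss = (action_probs.detach() * (-log_alpha.exp() * (log_probs + target_entropy).detach())).mean()
a_optimizer.zero_grad()
alpha_loss.backward()
a_optimizer.step()
alpha = log_alpha.exp().item()
```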
From my point of view, this is done now. I ran the new experiments and updated all plots and results. I also deleted all runs that aren't used for the final results from the Open RL Benchmark project. The docs should also work.
Great contribution!
Minor: the title for the Pong figure is not the right one.
Thanks, good catch!
Anything more for me to do here? If this is blocking, I can just remove the citation so that the CI error is gone.
Everything LGTM. Feel free to merge after you have resolved the minor target-entropy-scale issue. You should already have contributor access. Thanks so much for this contribution, and sorry for the delay.
> I can just remove the citation so that the CI error is gone.

Please keep the citation and just add ignore words to the pre-commit config instead:
cleanrl/.pre-commit-config.yaml (line 38 in 3f5535c):

```yaml
- --ignore-words-list=nd,reacher,thist,ths,magent
```
No problem. It's been a fun experience and I learned a lot. Looking forward to contributing more in the future!
@vwxyzjn "Ba" should be correctly added to the pre-commit config: cleanrl/.pre-commit-config.yaml (line 38 in c3fc57d).
I'm going to fix it, but it might be Sunday before I get around to doing it. EDIT: Probably a capitalization issue: codespell-project/codespell#2137
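If it really is a casing issue, one possible resolution (an assumption, not the confirmed fix) is to make sure the ignore list contains the form codespell actually compares against, e.g.:

```yaml
# Hypothetical edit: append the flagged word to the existing ignore list. Whether
# codespell wants "ba" or "Ba" here depends on how it normalizes case (see the linked issue).
- --ignore-words-list=nd,reacher,thist,ths,magent,ba
```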
Description
Adds the SAC-discrete algorithm as discussed in #266.
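For context, the defining trick of the discrete variant is that expectations over the action distribution can be computed exactly instead of relying on the reparameterization trick. A rough, self-contained sketch of the resulting critic target and policy loss (illustrative names and dummy tensors, not the PR's exact code):

```python
import torch

# Illustrative sketch of the SAC-discrete losses (Christodoulou, 2019).
# Random tensors stand in for network outputs on a batch of transitions.
batch, n_actions, gamma, alpha = 32, 6, 0.99, 0.2
rewards, dones = torch.randn(batch), torch.zeros(batch)
q1, q2 = torch.randn(batch, n_actions), torch.randn(batch, n_actions)      # online critics Q(s, .)
q1_t, q2_t = torch.randn(batch, n_actions), torch.randn(batch, n_actions)  # target critics Q(s', .)
probs = torch.softmax(torch.randn(batch, n_actions), dim=-1)               # pi(. | s)
next_probs = torch.softmax(torch.randn(batch, n_actions), dim=-1)          # pi(. | s')
log_probs, next_log_probs = torch.log(probs + 1e-8), torch.log(next_probs + 1e-8)

# Critic target: exact expectation over next actions instead of a sampled next action.
with torch.no_grad():
    next_v = (next_probs * (torch.min(q1_t, q2_t) - alpha * next_log_probs)).sum(dim=-1)
    q_target = rewards + gamma * (1.0 - dones) * next_v

# Policy loss: expectation over actions of (alpha * log pi - min Q).
policy_loss = (probs * (alpha * log_probs - torch.min(q1, q2))).sum(dim=-1).mean()

# The critic loss is then the MSE between the Q-values of the taken actions and q_target.
```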
Types of changes
Checklist:

- `pre-commit run --all-files` passes (required).
- Documentation changes have been previewed via `mkdocs serve`.

If you are adding new algorithms or your change could result in a performance difference, you may need to (re-)run tracked experiments. See #137 as an example PR.

- Applicable experiments have been tracked with the `--capture-video` flag toggled on (required).
- The added documentation has been previewed via `mkdocs serve`.
- Learning curves have been added (with `width=500` and `height=300`).