
SAC discrete #266

Closed · 2 tasks done
timoklein opened this issue Aug 26, 2022 · 3 comments

@timoklein
Collaborator

Hey there!

I've used this repo's SAC code as a starting point for an implementation of SAC-discrete (paper) for a project of mine. If you're interested, I'd be willing to contribute it to CleanRL.

The differences from SAC for continuous action spaces aren't too big (a sketch of the main change is below), and I can start from a working implementation, so this shouldn't take too long.

What do you think?
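For context, the core change is that the policy becomes categorical, so the expectation over actions (and hence the entropy term) can be computed exactly instead of via the reparameterization trick. Here is a minimal sketch of the resulting actor loss, assuming PyTorch and hypothetical tensors of shape (batch_size, num_actions) for the actor logits and the two critics' Q-values; this is only illustrative, not the exact code I'd submit:

```python
import torch
import torch.nn.functional as F


def discrete_sac_actor_loss(logits, q1_values, q2_values, alpha):
    # Categorical policy: exact probabilities and log-probabilities over all actions.
    probs = F.softmax(logits, dim=-1)
    log_probs = F.log_softmax(logits, dim=-1)
    # Clipped double-Q, as in continuous SAC.
    min_q = torch.min(q1_values, q2_values)
    # The expectation over actions is taken in closed form instead of
    # sampling reparameterized actions as in continuous SAC.
    return (probs * (alpha * log_probs - min_q)).sum(dim=-1).mean()
```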

Checklist

@vwxyzjn
Owner

vwxyzjn commented Aug 28, 2022

Hi @timoklein, thanks for being interested in submitting a contribution! SAC discrete indeed sounds like an interesting addition to CleanRL. I just glanced at the paper and would recommend prototyping a sac_atari.py to work with Atari games as done in the paper.

I was a bit surprised to see that the algorithm performs poorly on Pong. Do you have any insight on this? Maybe this is some implementation details stuff... CC @dosssman, who was the main contributor to CleanRL's SAC implementation.
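If it helps as a starting point, a sac_atari.py could plausibly reuse the standard DeepMind-style Atari preprocessing used elsewhere in the repo. A rough sketch under that assumption, built from stable_baselines3's Atari wrappers and gym's observation wrappers (env_id and seed are placeholder arguments):

```python
import gym
from stable_baselines3.common.atari_wrappers import (
    ClipRewardEnv,
    EpisodicLifeEnv,
    FireResetEnv,
    MaxAndSkipEnv,
    NoopResetEnv,
)


def make_env(env_id, seed):
    env = gym.make(env_id)
    env = gym.wrappers.RecordEpisodeStatistics(env)
    # DeepMind-style preprocessing: no-op starts, frame skipping, life-loss episodes.
    env = NoopResetEnv(env, noop_max=30)
    env = MaxAndSkipEnv(env, skip=4)
    env = EpisodicLifeEnv(env)
    if "FIRE" in env.unwrapped.get_action_meanings():
        env = FireResetEnv(env)
    env = ClipRewardEnv(env)
    # 84x84 grayscale observations, stacked over 4 frames.
    env = gym.wrappers.ResizeObservation(env, (84, 84))
    env = gym.wrappers.GrayScaleObservation(env)
    env = gym.wrappers.FrameStack(env, 4)
    env.action_space.seed(seed)
    return env
```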

@timoklein
Collaborator Author

> I just glanced at the paper and would recommend prototyping a sac_atari.py to work with Atari games as done in the paper.

Getting to work on it!

> I was a bit surprised to see that the algorithm performs poorly on Pong. Do you have any insight on this? Maybe this is some implementation details stuff... CC @dosssman, who was the main contributor to CleanRL's SAC implementation.

For reference, here are the reported results in the paper:
[Figure: table of SAC-discrete results reported in the paper]

In my opinion, the poor results on Pong are due to the evaluation scheme. Evaluation at 100k time steps on Atari is a very tough setting for a "standard" model-free RL algorithm (some newer methods like CURL or DrQ may perform better there). Rainbow also doesn't improve over a random agent in this setting. We should therefore focus the evaluation on games where meaningful improvements over a random baseline can be made, e.g. Seaquest, James Bond, or Road Runner.

@vwxyzjn
Owner

vwxyzjn commented Nov 28, 2023

Closed by #270

vwxyzjn closed this as completed Nov 28, 2023