
SAC discrete #266

Closed · 2 tasks done
timoklein opened this issue Aug 26, 2022 · 3 comments

@timoklein
Collaborator

Hey there!

I've used this repo's SAC code as a starting point for an implementation of SAC-discrete (paper) for a project of mine. If you're interested, I'd be willing to contribute it to CleanRL.

The differences from SAC for continuous action spaces aren't too big (a sketch of the main change is below), and I can start from a working implementation, so this shouldn't take too long.

What do you think?
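For context, the core change is that the policy becomes categorical, so the expectation over actions (and hence the entropy term) can be computed exactly instead of via the reparameterization trick. Here is a minimal sketch of the resulting actor loss, assuming PyTorch and hypothetical tensors of shape (batch_size, num_actions) for the actor logits and the two critics' Q-values; this is only illustrative, not the exact code I'd submit:

```python
import torch
import torch.nn.functional as F


def discrete_sac_actor_loss(logits, q1_values, q2_values, alpha):
    # Categorical policy: exact probabilities and log-probabilities over all actions.
    probs = F.softmax(logits, dim=-1)
    log_probs = F.log_softmax(logits, dim=-1)
    # Clipped double-Q, as in continuous SAC.
    min_q = torch.min(q1_values, q2_values)
    # The expectation over actions is taken in closed form instead of
    # sampling reparameterized actions as in continuous SAC.
    return (probs * (alpha * log_probs - min_q)).sum(dim=-1).mean()
```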

Checklist

@vwxyzjn
Owner

vwxyzjn commented Aug 28, 2022

Hi @timoklein, thanks for being interested in submitting a contribution! SAC discrete indeed sounds like an interesting addition to CleanRL. I just glanced at the paper and would recommend prototyping a sac_atari.py to work with Atari games as done in the paper.

I was a bit surprised to see that the algorithm performs poorly on Pong. Do you have any insight on this? Maybe this is some implementation details stuff... CC @dosssman, who was the main contributor to CleanRL's SAC implementation.
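If it helps as a starting point, a sac_atari.py could plausibly reuse the standard DeepMind-style Atari preprocessing used elsewhere in the repo. A rough sketch under that assumption, built from stable_baselines3's Atari wrappers and gym's observation wrappers (env_id and seed are placeholder arguments):

```python
import gym
from stable_baselines3.common.atari_wrappers import (
    ClipRewardEnv,
    EpisodicLifeEnv,
    FireResetEnv,
    MaxAndSkipEnv,
    NoopResetEnv,
)


def make_env(env_id, seed):
    env = gym.make(env_id)
    env = gym.wrappers.RecordEpisodeStatistics(env)
    # DeepMind-style preprocessing: no-op starts, frame skipping, life-loss episodes.
    env = NoopResetEnv(env, noop_max=30)
    env = MaxAndSkipEnv(env, skip=4)
    env = EpisodicLifeEnv(env)
    if "FIRE" in env.unwrapped.get_action_meanings():
        env = FireResetEnv(env)
    env = ClipRewardEnv(env)
    # 84x84 grayscale observations, stacked over 4 frames.
    env = gym.wrappers.ResizeObservation(env, (84, 84))
    env = gym.wrappers.GrayScaleObservation(env)
    env = gym.wrappers.FrameStack(env, 4)
    env.action_space.seed(seed)
    return env
```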

@timoklein
Collaborator Author

> I just glanced at the paper and would recommend prototyping a sac_atari.py to work with Atari games as done in the paper.

Getting to work on it!

> I was a bit surprised to see that the algorithm performs poorly on Pong. Do you have any insight on this? Maybe this is some implementation details stuff... CC @dosssman, who was the main contributor to CleanRL's SAC implementation.

For reference, here are the reported results in the paper:
[Figure: table of SAC-discrete results reported in the paper]

In my opinion, the poor results on Pong are due to the evaluation scheme. Evaluation at 100k time steps on Atari is a very tough setting for a "standard" model-free RL algorithm (some newer methods like CURL or DrQ may perform better there). Rainbow also doesn't improve over a random agent in this setting. We should therefore focus the evaluation on games where meaningful improvements over a random baseline can be made, e.g. Seaquest, James Bond, or Road Runner.

@vwxyzjn
Owner

vwxyzjn commented Nov 28, 2023

Closed by #270

vwxyzjn closed this as completed Nov 28, 2023