Not sample efficient enough #2

muupan · 2016-05-08T09:11:30Z

According to Figure 6 in the paper, their A3C needs only 20 epochs (20 million steps) to reach an average score of around 400 on Breakout. My current implementation needs more.

muupan · 2016-05-10T09:42:20Z

Following the authors' feedback, it's now only slightly worse than theirs.

miyosuda · 2016-05-10T13:24:23Z

@muupan
Thank you for sharing your implementation and settings, with such great results!

Your wiki helps a lot, and I'm going to try your settings.

Let me ask you about a couple of things that aren't covered in the wiki.

There is loss normalization code for when a sequence terminates in the middle:

https://github.com/muupan/async-rl/blob/master/a3c.py#L113-L118

Are you using this now?

There is also some action-skipping code in ALE#initialize():

https://github.com/muupan/async-rl/blob/master/ale.py#L146-L149

What is this for?

I'm also going to adjust my parameters as described in your wiki. Thanks!!

muupan · 2016-05-10T13:41:09Z

No, I don't use the loss normalization now.
The action skipping in ALE#initialize() is called "no-op max" in the Nature DQN paper. It adds some randomness to the initial states.
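
As a rough illustration of the idea (a sketch, not the code in ale.py), no-op max could look something like the following. The `step` callable, the `apply_noop_starts` name, the no-op action index of 0, and the inclusive 0..max range are all assumptions made for the example; the Nature DQN paper lists a no-op max of 30.

```python
import random


def apply_noop_starts(step, max_noops=30, noop_action=0, rng=random):
    """Take a random number of no-op actions at the start of an episode.

    `step` is a hypothetical callable that advances the emulator by one
    action; it stands in for whatever the environment wrapper exposes.
    Returns the number of no-ops that were taken.
    """
    # Assumption: the count is drawn uniformly from 0..max_noops inclusive.
    n_noops = rng.randint(0, max_noops)
    for _ in range(n_noops):
        step(noop_action)
    return n_noops


if __name__ == "__main__":
    taken = []  # records every action passed to the fake emulator
    n = apply_noop_starts(taken.append, rng=random.Random(0))
    print(f"took {n} no-op actions before handing control to the agent")
```

Because the number of no-ops differs between episodes, the agent starts from a slightly different state each time, even though the emulator itself is deterministic.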

miyosuda · 2016-05-10T14:05:01Z

I see. Thank you!
