Require HEAD version of universe #22
base: master
Conversation
Without it, visualizations will not work properly; see issue openai/universe-starter-agent#133.
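For reference, a minimal sketch of what "require the HEAD version of universe" could look like as a pip requirements line (the PR diff is not shown here, so the exact form of the pin is an assumption):

```
# requirements.txt (illustrative): install universe from the repository HEAD
# rather than a released version, so the visualization fixes are included.
git+https://github.com/openai/universe.git#egg=universe
```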
Oh, it has been updated -- thanks for spotting it. Are you sure that changing the version of universe doesn't break other things?
I'm not 100% sure what you are referring to; my feeling is that … The universe commit which fixes visualizations for Python 2.7 is from March 2017 :) However, it seems that …
Unfortunately, I'm not sure about that. To be sure I'd need to reproduce your Mario and Doom results - that would take some effort. Let me tell you why I came to … So, I tried to run …
Let me ask you a possibly "difficult" question, sorry for that :) But it would shed a lot of light: did you make sure …
Alright, let me answer your "difficult" question, since I myself went through these steps 1.5 years back. I guess universe-starter-agent has a correct implementation of A3C, but definitely with quite a few design changes, e.g., an unshared optimizer across workers, different hyper-parameters (input size, learning rate, etc.), and a different network architecture. I first "tuned" it to make sure I could reproduce the ATARI results to some extent (note: it's quite hard to replicate the original paper's results because they use Torch and the initialization was different -- training is sensitive to this). I could get close to the results for Breakout and a few other games in the "non-shared optimizer" scenario (see the original A3C paper's supplementary material), but did not get exactly the same numbers because of differences in initialization, TensorFlow vs. Torch, etc. By "tuning" above I mean: changing the architecture, changing the loss equation to a mean loss rather than a total loss, changing hyper-parameters, and so on.

The main goal of our curiosity-driven work was to explore policy generalization across environments and levels, e.g., novel levels in Mario and novel mazes in VizDoom. Since generalization is not possible to test in ATARI games, we decided to use VizDoom, and I then "tuned" (see the definition of tuning above) the baseline A3C on VizDoom to get the best reward on the DoomMyWayHome dense-reward game (it could reach the maximum reward -- see the results in the curiosity paper). All the hyper-parameters you see in noreward-rl are based on that tuning. I never tuned my curiosity model (except the coefficient between the forward and inverse loss); all its hyper-parameters were taken from the best-performing baseline A3C on the DoomMyWayHome dense-reward game. Hence, all the comparisons were fair, and if anything to the advantage of the baseline A3C. Hope it answers your question! :-)
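As a rough illustration of two design choices mentioned above (a mean rather than total loss, and a single coefficient trading off the forward and inverse curiosity losses), here is a toy numpy sketch; the arrays and the `BETA` value are placeholders, not the repository's actual code:

```python
import numpy as np

# Toy stand-ins for real network outputs over a rollout of 20 steps.
log_probs = np.random.randn(20)    # log pi(a_t | s_t)
advantages = np.random.randn(20)   # estimated advantages A_t

# "Total" policy-gradient loss: summed over the rollout, so its scale grows
# with rollout length and interacts with the learning rate.
pg_loss_total = -np.sum(log_probs * advantages)

# "Mean" policy-gradient loss: averaged, so the gradient scale does not
# depend on rollout length (the change described in the comment above).
pg_loss_mean = -np.mean(log_probs * advantages)

# Curiosity (ICM) loss: one coefficient trades off the forward-model and
# inverse-model terms; per the comment above, this was the only curiosity
# hyper-parameter that was tuned. BETA = 0.2 is illustrative only.
forward_loss, inverse_loss = 0.8, 0.3   # placeholder scalar losses
BETA = 0.2
icm_loss = BETA * forward_loss + (1.0 - BETA) * inverse_loss
```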
It does, thanks a lot :) Because you mentioned that the original A3C work used Torch (which I didn't know), I googled more and found the hyperparams they used. Also good to know about … Would you expect …
You can try tuning the hyper-parameters of the state-predictor version of curiosity from the …
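If one wanted to follow this suggestion, a simple sweep could look like the sketch below; the script name and flag names are hypothetical, not the actual noreward-rl command line:

```python
import itertools

# Hypothetical hyper-parameter sweep for the state-predictor curiosity model.
# "train.py", "--lr" and "--forward-weight" are placeholders; substitute the
# real entry point and options from the repository.
learning_rates = [1e-4, 3e-4, 1e-3]
forward_weights = [0.1, 0.2, 0.5]

for lr, fw in itertools.product(learning_rates, forward_weights):
    cmd = ["python", "train.py", "--lr", str(lr), "--forward-weight", str(fw)]
    print("would run:", " ".join(cmd))
```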