Require HEAD version of universe #22
base: master
Conversation
Without it, visualizations will not work properly; see issue openai/universe-starter-agent#133.
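For reference, a minimal sketch of what "require the HEAD version of universe" could look like as a pip requirements line (the PR diff is not shown here, so the exact form of the pin is an assumption):

```
# requirements.txt (illustrative): install universe from the repository HEAD
# rather than a released version, so the visualization fixes are included.
git+https://github.com/openai/universe.git#egg=universe
```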
Oh, it has been updated -- thanks for spotting it. Are you sure that changing the version of universe doesn't break other things?
I'm not 100% sure what you are referring to; my feeling is that … The universe commit which fixes visualizations for Python 2.7 is from March 2017 :) However, it seems that …
Unfortunately, I'm not sure about that. To be sure I'd need to reproduce your Mario and Doom results - that would take some effort. Let me tell you why I came to … So, I tried to run …
Let me ask you a possibly "difficult" question, sorry for that :) But it would shed a lot of light: did you make sure …
Alright, let me answer your "difficult" question, since I myself went through these steps 1.5 years back. I guess universe-starter-agent has a correct implementation of A3C, but definitely with quite a few design changes, e.g., an unshared optimizer across workers, different hyper-parameters (input size, learning rate, etc.), and a different network architecture. I first "tuned" it to make sure I could reproduce the ATARI results to some extent (note: it's quite hard to replicate the original paper's results because they use Torch and the initialization was different -- training is sensitive to this). I could get close to the results for Breakout and a few other games in the "non-shared optimizer" scenario (see the original A3C paper's supplementary material), but did not get exactly the same numbers because of differences in initialization, TensorFlow vs. Torch, etc. By "tuning" above I mean: changing the architecture, changing the loss equation to a mean loss rather than a total loss, changing hyper-parameters, and so on.

The main goal of our curiosity-driven work was to explore policy generalization across environments and levels, e.g., novel levels in Mario and novel mazes in VizDoom. Since generalization is not possible to test in ATARI games, we decided to use VizDoom, and I then "tuned" (see the definition of tuning above) the baseline A3C on VizDoom to get the best reward on the DoomMyWayHome dense-reward game (it could reach the maximum reward -- see the results in the curiosity paper). All the hyper-parameters you see in noreward-rl are based on that tuning. I never tuned my curiosity model (except the coefficient between the forward and inverse loss); all its hyper-parameters were taken from the best-performing baseline A3C on the DoomMyWayHome dense-reward game. Hence, all the comparisons were fair, and if anything to the advantage of the baseline A3C. Hope it answers your question! :-)
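As a rough illustration of two design choices mentioned above (a mean rather than total loss, and a single coefficient trading off the forward and inverse curiosity losses), here is a toy numpy sketch; the arrays and the `BETA` value are placeholders, not the repository's actual code:

```python
import numpy as np

# Toy stand-ins for real network outputs over a rollout of 20 steps.
log_probs = np.random.randn(20)    # log pi(a_t | s_t)
advantages = np.random.randn(20)   # estimated advantages A_t

# "Total" policy-gradient loss: summed over the rollout, so its scale grows
# with rollout length and interacts with the learning rate.
pg_loss_total = -np.sum(log_probs * advantages)

# "Mean" policy-gradient loss: averaged, so the gradient scale does not
# depend on rollout length (the change described in the comment above).
pg_loss_mean = -np.mean(log_probs * advantages)

# Curiosity (ICM) loss: one coefficient trades off the forward-model and
# inverse-model terms; per the comment above, this was the only curiosity
# hyper-parameter that was tuned. BETA = 0.2 is illustrative only.
forward_loss, inverse_loss = 0.8, 0.3   # placeholder scalar losses
BETA = 0.2
icm_loss = BETA * forward_loss + (1.0 - BETA) * inverse_loss
```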
It does, thanks a lot :) Because you mentioned that the original A3C work used Torch (which I didn't know), I googled more and found the hyperparams they used. Also good to know about … Would you expect …
You can try tuning the hyper-parameters of the state-predictor version of curiosity from the …
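If one wanted to follow this suggestion, a simple sweep could look like the sketch below; the script name and flag names are hypothetical, not the actual noreward-rl command line:

```python
import itertools

# Hypothetical hyper-parameter sweep for the state-predictor curiosity model.
# "train.py", "--lr" and "--forward-weight" are placeholders; substitute the
# real entry point and options from the repository.
learning_rates = [1e-4, 3e-4, 1e-3]
forward_weights = [0.1, 0.2, 0.5]

for lr, fw in itertools.product(learning_rates, forward_weights):
    cmd = ["python", "train.py", "--lr", str(lr), "--forward-weight", str(fw)]
    print("would run:", " ".join(cmd))
```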