
README asset #5

Open
jcwleo opened this issue Nov 17, 2018 · 18 comments
@jcwleo
Owner

jcwleo commented Nov 17, 2018

[image]

@jcwleo
Owner Author

jcwleo commented Nov 20, 2018

[image: 2018-11-20 11 13 35]

@jcwleo
Owner Author

jcwleo commented Nov 20, 2018

[image]

@jcwleo
Owner Author

jcwleo commented Jan 5, 2019

[image]

@kslazarev
Contributor

https://github.com/jcwleo/random-network-distillation-pytorch/blob/master/config.conf
Is this the latest config for getting results similar to those in the images above?

I see the last pull request is about normalization; would UseNorm = True improve reward_per_epi or the speed of convergence? And what about UseNoisyNet, when is it better to use it?

@jcwleo
Owner Author

jcwleo commented Jan 11, 2019

@kslazarev
Hi, I used that config, but with NumEnv set to 128 and MaxStepPerEpisode set to 4500.
The paper's authors did not report using advantage normalization or NoisyNet,
so I disabled those options.
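
For anyone reproducing this, below is a minimal sketch of applying those overrides to config.conf with Python's configparser. Only the keys NumEnv, MaxStepPerEpisode, UseNorm, and UseNoisyNet come from this thread; the file path, section handling, and everything else are assumptions rather than the repo's actual loading code.

```python
# Sketch only: write the settings discussed above into an INI-style config.conf.
# This is NOT the repository's own config loader; section handling is assumed.
import configparser

config = configparser.ConfigParser()
config.optionxform = str  # preserve key case such as 'NumEnv' when writing back
config.read('config.conf')

# Fall back to DEFAULT if the file keeps everything in its default section.
section = config.sections()[0] if config.sections() else 'DEFAULT'

config[section]['NumEnv'] = '128'              # parallel envs used for the plots above
config[section]['MaxStepPerEpisode'] = '4500'
config[section]['UseNorm'] = 'False'           # advantage normalization off, as in the paper
config[section]['UseNoisyNet'] = 'False'       # NoisyNet off, as in the paper

with open('config.conf', 'w') as f:
    config.write(f)
```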

@kslazarev
Contributor

kslazarev commented Jan 11, 2019

Result with the config in the master branch.

MontezumaRevengeNoFrameskip-v4
[image: 2019-01-11 15 05 01 GMT+03:00]

Right now I've set NumEnv = 128 and MaxStepPerEpisode = 4500.
I'll attach the result when I reach 1200-2000 updates.

@kslazarev
Contributor

@jcwleo I see a difference in the x-axis scale between the reward_per_epi and reward_per_rollout plots.
On your MontezumaRevengeNoFrameskip-v4 image they are 1.200k and 12.00k (a 10x difference),
but on my in-progress image they are 200 and 600 (a 3x difference). Do I need to change an additional option in the config?
[image: 2019-01-11 21 58 49 GMT+03:00]

@kslazarev
Contributor

kslazarev commented Jan 11, 2019

Or does the x-axis scale (global_update and sample_episode) depend on player survival/experience, so that the scales will match at later updates?

@jcwleo
Owner Author

jcwleo commented Jan 11, 2019

@kslazarev per_rollout and per_epi are not on the same scale. per_rollout counts global updates (each call to agent.train_model()), while per_epi reports the episode info of a single one of the parallel envs.
If one episode's total step count is 1024 and Num_step (the rollout size) is 128, the two x-axes differ by a factor of 8.
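
To make that ratio concrete, here is a tiny sketch of the arithmetic, assuming (as described above) that one global update consumes one rollout of Num_step transitions from each parallel env; the variable names are illustrative, not taken from the repo.

```python
# Illustrative sketch only: why global_update (per_rollout) and sample_episode
# (per_epi) advance at different rates, using the numbers from the comment above.
num_step = 128        # rollout size per env per global update (Num_step)
episode_len = 1024    # example total steps of one episode in a single env

# While one env finishes a single episode, the trainer performs this many
# global updates, so the per_rollout x-axis runs ~8x ahead of the per_epi axis.
updates_per_episode = episode_len // num_step
print(updates_per_episode)  # -> 8
```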

@kslazarev
Contributor

kslazarev commented Jan 11, 2019

@jcwleo Yes, correct. I have a few more small questions about the code. What is the appropriate way to ask them: a new issue for every question, or should I keep asking in this issue?

@jcwleo
Owner Author

jcwleo commented Jan 12, 2019

@kslazarev Please create a separate issue for each question. :)

@kslazarev
Contributor

kslazarev commented Jan 13, 2019

NumEnv = 128 and MaxStepPerEpisode = 4500
[image: 2019-01-13 7 40 16 GMT+03:00]

Looks similar to the README. With NumEnv = 128 I stopped the process because it started using swap.

@xiaioding

Hello, can you tell me how many GPUs you used and how long it took to see this result?

@kslazarev
Contributor

Hello. Not fast. I don't remember exactly; it was 1 or 2 NVIDIA 1080 Ti cards.

@xiaioding

@kslazarev Excuse me, I'm using one 3090 with 2 envs and have run for more than 2 hours, but the reward is still 0. Is this normal? I didn't load a pre-trained model.

@kslazarev
Contributor

That was 3 years ago. I can't help much; I don't remember exactly what could have caused the problem.

@xiaioding

@kslazarev Ok, thanks

@jcwleo
Owner Author

jcwleo commented Apr 17, 2023

@kslazarev
Thank you for answering on my behalf.
