Add rnd_ppo.py documentation and refactor #127
Comments
Oh wow, this is really nice! How long did the experiment take?
Almost 11 days with envpool and 1080.
Oh wow, that’s taking a really long time. I think given the insane amount of computing required, running it for three random seeds might not be necessary…
Yeah, I also don't want to spend so much time running it lol.
rnd_ppo.py is a bit dated, and I recommend refactoring it to match the style of the other PPO scripts, which would include:
- Rename rnd_ppo.py to ppo_rnd.py.
- Use from gym.wrappers.normalize import RunningMeanStd instead of implementing it ourselves (note the implementation might be a bit different).
- Use a make_env function like cleanrl/cleanrl/ppo_atari.py, lines 88 to 103 in 0b3f8ea.
- [...] (ProbsVisualizationWrapper)
- Use def get_value and def get_action_and_value for the Agent class (cleanrl/cleanrl/rnd_ppo.py, lines 706 to 708 in 0b3f8ea).
- [...] curiosity_reward instead? (cleanrl/cleanrl/rnd_ppo.py, line 848 in 0b3f8ea)
- Rename total_reward_per_env to curiosity_return (cleanrl/cleanrl/rnd_ppo.py, line 854 in 0b3f8ea).
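To make the RunningMeanStd suggestion above more concrete, here is a rough, stdlib-only sketch of the kind of running statistics such a helper tracks, using the standard parallel mean/variance merge. The class name ScalarRunningMeanStd and its methods are hypothetical and written only for this illustration. It is not the gym class, which works on arrays and may differ in details.

```python
class ScalarRunningMeanStd:
    """Hypothetical scalar running mean/variance tracker (illustration only)."""

    def __init__(self, epsilon=1e-4):
        self.mean = 0.0
        self.var = 1.0
        # A small prior count avoids division by zero before the first update.
        self.count = epsilon

    def update(self, batch):
        """Merge a batch of scalar samples into the running statistics."""
        batch = list(batch)
        n = len(batch)
        if n == 0:
            return
        batch_mean = sum(batch) / n
        batch_var = sum((x - batch_mean) ** 2 for x in batch) / n

        # Parallel merge of (mean, var, count) with the batch moments.
        delta = batch_mean - self.mean
        total = self.count + n
        m2 = (
            self.var * self.count
            + batch_var * n
            + delta ** 2 * self.count * n / total
        )
        self.mean += delta * n / total
        self.var = m2 / total
        self.count = total

    def normalize(self, x, eps=1e-8):
        """Scale a sample by the current running mean and standard deviation."""
        return (x - self.mean) / (self.var + eps) ** 0.5


if __name__ == "__main__":
    rms = ScalarRunningMeanStd()
    rms.update([1.0, 2.0, 3.0])
    rms.update([4.0, 5.0])
    print(rms.mean, rms.var)
```

Feeding the data in several batches gives the same statistics as one combined update (up to floating-point error), so the tracker can be updated once per batch of collected data. That is also one place where a hand-rolled version and a library version can quietly diverge, for example in the choice of prior count.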
Overall I suggest selecting ppo_atari.py and rnd_ppo.py and using Compare Selected in VS Code to see the file difference, then minimizing that difference.

Types of changes

Checklist:
- pre-commit run --all-files passes (required).

If you are adding new algorithms or your change could result in performance difference, you may need to (re-)run tracked experiments.
- [...] --capture-video flag toggled on (required).
- [...] mkdocs serve.
- [...] width=500 and height=300).