Add rnd_ppo.py documentation and refactor #127

Closed
16 of 28 tasks
Tracked by #115
vwxyzjn opened this issue Feb 28, 2022 · 5 comments · Fixed by #151
Comments

vwxyzjn (Owner) commented Feb 28, 2022

rnd_ppo.py is a bit dated, and I recommend refactoring it to match the style of the other PPO variants, which would include:

  • change the name from rnd_ppo.py to ppo_rnd.py
  • use from gym.wrappers.normalize import RunningMeanStd instead of implementing it ourselves (note the implementations might differ slightly); see the RunningMeanStd sketch after this list.
  • create a make_env function like

    def make_env(env_id, seed, idx, capture_video, run_name):
        def thunk():
            env = gym.make(env_id)
            env = gym.wrappers.RecordEpisodeStatistics(env)
            if capture_video:
                if idx == 0:
                    env = gym.wrappers.RecordVideo(env, f"videos/{run_name}")
            env = NoopResetEnv(env, noop_max=30)
            env = MaxAndSkipEnv(env, skip=4)
            env = EpisodicLifeEnv(env)
            if "FIRE" in env.unwrapped.get_action_meanings():
                env = FireResetEnv(env)
            env = ClipRewardEnv(env)
            env = gym.wrappers.ResizeObservation(env, (84, 84))
            env = gym.wrappers.GrayScaleObservation(env)
            env = gym.wrappers.FrameStack(env, 4)
            env.seed(seed)
            env.action_space.seed(seed)
            env.observation_space.seed(seed)
            return env

        return thunk
  • remove the visualization (i.e., ProbsVisualizationWrapper)
  • use def get_value and def get_action_and_value for the Agent class (see the Agent sketch after this list)
  • remove

    cleanrl/cleanrl/rnd_ppo.py

    Lines 706 to 708 in 0b3f8ea

    class Flatten(nn.Module):
        def forward(self, input):
            return input.view(input.size(0), -1)
  • maybe log the average curiosity_reward instead?
    f"global_step={global_step}, episodic_return={info['episode']['r']}, curiosity_reward={curiosity_rewards[step][idx]}"
  • rename total_reward_per_env to curiosity_return
    total_reward_per_env = np.array(
  • Add an SPS (steps per second) metric (see the SPS logging sketch after this list).
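
For the RunningMeanStd item, here is a minimal sketch of how gym's implementation could be reused to normalize the intrinsic rewards; the array shapes and variable names (curiosity_rewards, reward_rms, etc.) are placeholders for illustration, not the final code:

    import numpy as np
    from gym.wrappers.normalize import RunningMeanStd

    reward_rms = RunningMeanStd()  # scalar running mean/variance of the curiosity signal

    # placeholder rollout of intrinsic rewards with shape (num_steps, num_envs)
    curiosity_rewards = np.random.rand(128, 8)

    # update the running statistics, then divide by the running std as RND does
    reward_rms.update(curiosity_rewards.flatten())
    normalized_curiosity_rewards = curiosity_rewards / np.sqrt(reward_rms.var + 1e-8)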
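
For the get_value / get_action_and_value item, a rough sketch of the interface ppo_atari.py's Agent exposes, adapted here for RND's two value heads; the layer sizes and the critic_ext / critic_int head names are assumptions for illustration, not the final implementation:

    import torch
    import torch.nn as nn
    from torch.distributions import Categorical


    class Agent(nn.Module):
        def __init__(self, envs):
            super().__init__()
            # shared CNN trunk for 84x84x4 Atari observations (shapes assumed)
            self.network = nn.Sequential(
                nn.Conv2d(4, 32, 8, stride=4), nn.ReLU(),
                nn.Conv2d(32, 64, 4, stride=2), nn.ReLU(),
                nn.Conv2d(64, 64, 3, stride=1), nn.ReLU(),
                nn.Flatten(),  # replaces the custom Flatten module
                nn.Linear(64 * 7 * 7, 512), nn.ReLU(),
            )
            self.actor = nn.Linear(512, envs.single_action_space.n)
            self.critic_ext = nn.Linear(512, 1)  # extrinsic value head
            self.critic_int = nn.Linear(512, 1)  # intrinsic (curiosity) value head

        def get_value(self, x):
            hidden = self.network(x / 255.0)
            return self.critic_ext(hidden), self.critic_int(hidden)

        def get_action_and_value(self, x, action=None):
            hidden = self.network(x / 255.0)
            logits = self.actor(hidden)
            probs = Categorical(logits=logits)
            if action is None:
                action = probs.sample()
            return (
                action,
                probs.log_prob(action),
                probs.entropy(),
                self.critic_ext(hidden),
                self.critic_int(hidden),
            )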
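
For the SPS item, the other scripts write throughput to TensorBoard once per update; the loop below is only a stand-in for the real training loop, and the run name and step counts are illustrative:

    import time

    from torch.utils.tensorboard import SummaryWriter

    writer = SummaryWriter("runs/ppo_rnd_example")  # illustrative run name
    start_time = time.time()
    global_step = 0

    for update in range(10):      # stand-in for the training loop
        global_step += 128 * 8    # num_steps * num_envs per update (assumed)
        # steps per second since the start of training
        sps = int(global_step / (time.time() - start_time))
        writer.add_scalar("charts/SPS", sps, global_step)

    writer.close()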

Overall, I suggest selecting ppo_atari.py and rnd_ppo.py and using Compare Selected in VS Code to see the file difference, then minimizing that difference:

[screenshot: VS Code Compare Selected view of ppo_atari.py vs rnd_ppo.py]

Types of changes

  • Bug fix
  • New feature
  • New algorithm
  • Documentation

Checklist:

  • I've read the CONTRIBUTION guide (required).
  • I have ensured pre-commit run --all-files passes (required).
  • I have updated the documentation accordingly.
  • I have updated the tests accordingly (if applicable).

If you are adding new algorithms or your change could result in performance difference, you may need to (re-)run tracked experiments.

  • I have contacted @vwxyzjn to obtain access to the openrlbenchmark W&B team (required).
  • I have tracked applicable experiments in openrlbenchmark/cleanrl with --capture-video flag toggled on (required).
  • I have updated the documentation and previewed the changes via mkdocs serve.
    • I have explained note-worthy implementation details.
    • I have explained the logged metrics.
    • I have added links to the original paper and related papers (if applicable).
    • I have added links to the PR related to the algorithm.
    • I have created a table comparing my results against those from reputable sources (i.e., the original paper or other reference implementation).
    • I have added the learning curves (in PNG format with width=500 and height=300).
    • I have added links to the tracked experiments.
  • I have updated the tests accordingly (if applicable).
vwxyzjn mentioned this issue on Feb 28, 2022
vwxyzjn changed the title from rnd_ppo.py to Add rnd_ppo.py documentation and refactor on Feb 28, 2022
yooceii linked a pull request on Apr 3, 2022 that will close this issue
vwxyzjn mentioned this issue on Jun 1, 2022
yooceii (Collaborator) commented Jul 26, 2022

[screenshot of the finished run's results]
Finally got a finished run, and it looks close to the result in their blog post:
[screenshot of the corresponding result from the blog post]

vwxyzjn (Owner, Author) commented Jul 26, 2022

Oh wow, this is really nice! How long did the experiment take?

yooceii (Collaborator) commented Jul 27, 2022

Almost 11 days with envpool and a 1080.

vwxyzjn (Owner, Author) commented Jul 31, 2022

Oh wow, that’s a really long time. Given the insane amount of compute required, I think running it for three random seeds might not be necessary…

yooceii (Collaborator) commented Jul 31, 2022

Yeah, I also don't want to spend so much time running it lol.
