Add rnd_ppo.py documentation and refactor #127

Closed
16 of 28 tasks
Tracked by #115
vwxyzjn opened this issue Feb 28, 2022 · 5 comments · Fixed by #151
Comments

vwxyzjn (Owner) commented Feb 28, 2022

rnd_ppo.py is a bit dated, and I recommend refactoring it to match the style of the other PPO variants, which would include:

  • change the name from rnd_ppo.py to ppo_rnd.py
  • use from gym.wrappers.normalize import RunningMeanStd instead of implementing it ourselves (note the implementations might differ slightly); see the RunningMeanStd sketch after this list.
  • create a make_env function like

    def make_env(env_id, seed, idx, capture_video, run_name):
        def thunk():
            env = gym.make(env_id)
            env = gym.wrappers.RecordEpisodeStatistics(env)
            if capture_video:
                if idx == 0:
                    env = gym.wrappers.RecordVideo(env, f"videos/{run_name}")
            env = NoopResetEnv(env, noop_max=30)
            env = MaxAndSkipEnv(env, skip=4)
            env = EpisodicLifeEnv(env)
            if "FIRE" in env.unwrapped.get_action_meanings():
                env = FireResetEnv(env)
            env = ClipRewardEnv(env)
            env = gym.wrappers.ResizeObservation(env, (84, 84))
            env = gym.wrappers.GrayScaleObservation(env)
            env = gym.wrappers.FrameStack(env, 4)
            env.seed(seed)
            env.action_space.seed(seed)
            env.observation_space.seed(seed)
            return env

        return thunk
  • remove the visualization (i.e., ProbsVisualizationWrapper)
  • use def get_value and def get_action_and_value for the Agent class (see the Agent sketch after this list)
  • remove

    cleanrl/cleanrl/rnd_ppo.py

    Lines 706 to 708 in 0b3f8ea

    class Flatten(nn.Module):
        def forward(self, input):
            return input.view(input.size(0), -1)
  • maybe log the average curiosity_reward instead?
    f"global_step={global_step}, episodic_return={info['episode']['r']}, curiosity_reward={curiosity_rewards[step][idx]}"
  • rename total_reward_per_env to curiosity_return
    total_reward_per_env = np.array(
  • Add an SPS (steps per second) metric (see the SPS logging sketch after this list).
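
For the RunningMeanStd item, here is a minimal sketch of how gym's implementation could be reused to normalize the intrinsic rewards; the array shapes and variable names (curiosity_rewards, reward_rms, etc.) are placeholders for illustration, not the final code:

    import numpy as np
    from gym.wrappers.normalize import RunningMeanStd

    reward_rms = RunningMeanStd()  # scalar running mean/variance of the curiosity signal

    # placeholder rollout of intrinsic rewards with shape (num_steps, num_envs)
    curiosity_rewards = np.random.rand(128, 8)

    # update the running statistics, then divide by the running std as RND does
    reward_rms.update(curiosity_rewards.flatten())
    normalized_curiosity_rewards = curiosity_rewards / np.sqrt(reward_rms.var + 1e-8)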
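
For the get_value / get_action_and_value item, a rough sketch of the interface ppo_atari.py's Agent exposes, adapted here for RND's two value heads; the layer sizes and the critic_ext / critic_int head names are assumptions for illustration, not the final implementation:

    import torch
    import torch.nn as nn
    from torch.distributions import Categorical


    class Agent(nn.Module):
        def __init__(self, envs):
            super().__init__()
            # shared CNN trunk for 84x84x4 Atari observations (shapes assumed)
            self.network = nn.Sequential(
                nn.Conv2d(4, 32, 8, stride=4), nn.ReLU(),
                nn.Conv2d(32, 64, 4, stride=2), nn.ReLU(),
                nn.Conv2d(64, 64, 3, stride=1), nn.ReLU(),
                nn.Flatten(),  # replaces the custom Flatten module
                nn.Linear(64 * 7 * 7, 512), nn.ReLU(),
            )
            self.actor = nn.Linear(512, envs.single_action_space.n)
            self.critic_ext = nn.Linear(512, 1)  # extrinsic value head
            self.critic_int = nn.Linear(512, 1)  # intrinsic (curiosity) value head

        def get_value(self, x):
            hidden = self.network(x / 255.0)
            return self.critic_ext(hidden), self.critic_int(hidden)

        def get_action_and_value(self, x, action=None):
            hidden = self.network(x / 255.0)
            logits = self.actor(hidden)
            probs = Categorical(logits=logits)
            if action is None:
                action = probs.sample()
            return (
                action,
                probs.log_prob(action),
                probs.entropy(),
                self.critic_ext(hidden),
                self.critic_int(hidden),
            )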
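
For the SPS item, the other scripts write throughput to TensorBoard once per update; the loop below is only a stand-in for the real training loop, and the run name and step counts are illustrative:

    import time

    from torch.utils.tensorboard import SummaryWriter

    writer = SummaryWriter("runs/ppo_rnd_example")  # illustrative run name
    start_time = time.time()
    global_step = 0

    for update in range(10):      # stand-in for the training loop
        global_step += 128 * 8    # num_steps * num_envs per update (assumed)
        # steps per second since the start of training
        sps = int(global_step / (time.time() - start_time))
        writer.add_scalar("charts/SPS", sps, global_step)

    writer.close()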

Overall, I suggest selecting ppo_atari.py and rnd_ppo.py and using Compare Selected in VS Code to see the file difference, then minimizing that difference:

[screenshot: VS Code Compare Selected view of ppo_atari.py vs rnd_ppo.py]

Types of changes

  • Bug fix
  • New feature
  • New algorithm
  • Documentation

Checklist:

  • I've read the CONTRIBUTION guide (required).
  • I have ensured pre-commit run --all-files passes (required).
  • I have updated the documentation accordingly.
  • I have updated the tests accordingly (if applicable).

If you are adding new algorithms or your change could result in performance difference, you may need to (re-)run tracked experiments.

  • I have contacted @vwxyzjn to obtain access to the openrlbenchmark W&B team (required).
  • I have tracked applicable experiments in openrlbenchmark/cleanrl with --capture-video flag toggled on (required).
  • I have updated the documentation and previewed the changes via mkdocs serve.
    • I have explained note-worthy implementation details.
    • I have explained the logged metrics.
    • I have added links to the original paper and related papers (if applicable).
    • I have added links to the PR related to the algorithm.
    • I have created a table comparing my results against those from reputable sources (i.e., the original paper or other reference implementation).
    • I have added the learning curves (in PNG format with width=500 and height=300).
    • I have added links to the tracked experiments.
  • I have updated the tests accordingly (if applicable).
vwxyzjn mentioned this issue on Feb 28, 2022
vwxyzjn changed the title from rnd_ppo.py to Add rnd_ppo.py documentation and refactor on Feb 28, 2022
yooceii linked a pull request on Apr 3, 2022 that will close this issue
vwxyzjn mentioned this issue on Jun 1, 2022
yooceii (Collaborator) commented Jul 26, 2022

[screenshot of the finished run's results]
Finally got a finished run, and it looks close to the result in their blog post:
[screenshot of the corresponding result from the blog post]

vwxyzjn (Owner, Author) commented Jul 26, 2022

Oh wow, this is really nice! How long did the experiment take?

yooceii (Collaborator) commented Jul 27, 2022

Almost 11 days with envpool and a 1080.

vwxyzjn (Owner, Author) commented Jul 31, 2022

Oh wow, that’s a really long time. Given the insane amount of compute required, I think running it for three random seeds might not be necessary…

yooceii (Collaborator) commented Jul 31, 2022

Yeah, I also don't want to spend so much time running it lol.
