PPO Tutorial #2156
Conversation
Some editorial suggestions. Let me know if you have any questions.
# -------------
#
# The PPO loss can be directly imported from torchrl for convenience using the
# :class:`ClipPPOLoss` class. This is the easiest way of utilising PPO:
Suggested change:
- # :class:`ClipPPOLoss` class. This is the easiest way of utilising PPO:
+ # :class:`ClipPPOLoss` class. This is the easiest way of utilizing PPO:
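For context, a minimal sketch of what this step could look like, assuming the `policy_module` and `value_module` actor/critic modules built earlier in the tutorial; the exact keyword names (`actor`/`critic` vs. `actor_network`/`critic_network`, `entropy_coef`) have varied across torchrl releases:

```python
# Sketch: constructing the PPO loss with torchrl's ClipPPOLoss.
# `policy_module` and `value_module` are assumed to be the actor and value
# operator built earlier in the tutorial; argument names may differ by version.
from torchrl.objectives import ClipPPOLoss

loss_module = ClipPPOLoss(
    actor=policy_module,   # probabilistic actor (the policy)
    critic=value_module,   # state-value network
    clip_epsilon=0.2,      # PPO clipping parameter
    entropy_bonus=True,    # add an entropy term to encourage exploration
    entropy_coef=1e-4,     # weight of the entropy term
)
```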
# is a value that reflects an expectancy over the return value while dealing with
# the bias / variance tradeoff.
# To compute the advantage, one just needs to (1) build the advantage module, which
# utilises our value operator, and (2) pass each batch of data through it before each
Suggested change:
- # utilises our value operator, and (2) pass each batch of data through it before each
+ # utilizes our value operator, and (2) pass each batch of data through it before each
# Training loop
# -------------
# We now have all the pieces needed to code our training loop.
# The steps are quite easy: collect data, compute advantage, loop over the collected
Suggested change:
- # The steps are quite easy: collect data, compute advantage, loop over the collected
+ # The steps include:
+ #
+ # * Collect data
+ # * Compute advantage
+ # * Loop over the collected data to compute loss values
+ # * Backpropagate
+ # * Optimize
+ # * Repeat
# -------------
# We now have all the pieces needed to code our training loop.
# The steps are quite easy: collect data, compute advantage, loop over the collected
# data to compute loss values, backpropagate, optimize and repeat.
Suggested change:
- # data to compute loss values, backpropagate, optimize and repeat.
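As a rough sketch of the loop those steps describe (collect data, compute advantage, iterate over the collected data to compute losses, backpropagate, optimize, repeat), assuming the `collector`, `replay_buffer`, `advantage_module`, `loss_module`, `optim`, and hyperparameter names defined earlier in the tutorial:

```python
# Sketch of the training loop; names such as `collector`, `replay_buffer`,
# `advantage_module`, `loss_module`, `optim`, `num_epochs`, `frames_per_batch`,
# `sub_batch_size`, `max_grad_norm`, and `device` are assumptions taken from
# the tutorial's setup.
import torch

for i, tensordict_data in enumerate(collector):   # collect data
    for _ in range(num_epochs):
        advantage_module(tensordict_data)          # compute advantage
        replay_buffer.extend(tensordict_data.reshape(-1).cpu())
        for _ in range(frames_per_batch // sub_batch_size):
            subdata = replay_buffer.sample(sub_batch_size)
            loss_vals = loss_module(subdata.to(device))
            loss = (
                loss_vals["loss_objective"]
                + loss_vals["loss_critic"]
                + loss_vals["loss_entropy"]
            )
            loss.backward()                        # backpropagate
            torch.nn.utils.clip_grad_norm_(loss_module.parameters(), max_grad_norm)
            optim.step()                           # optimize
            optim.zero_grad()
```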
plt.show()

######################################################################
# Next steps
We need a Conclusion section. I think you can rewrite this section a bit to become the conclusion section. For example:
In this tutorial, we have learned:
1.
2.
3.
If you want to experiment with this tutorial a bit more, you can apply the following modifications:
* This algorithm ...
* From a logging perspective .....
For more information about TorchRL, go to <link to the TorchRL docs>
@svekars this is ready for review
8165c0c to d9ad408
Introduces a PPO tutorial using TorchRL.