PPO Tutorial #2156

Merged · 55 commits · Mar 15, 2023
Conversation

vmoens (Contributor) commented Dec 14, 2022

Introduces a PPO tutorial using TorchRL.

vmoens marked this pull request as draft on December 14, 2022 20:47
netlify bot commented Dec 14, 2022

Deploy Preview for pytorch-tutorials-preview ready!

Latest commit: 847cd91
Latest deploy log: https://app.netlify.com/sites/pytorch-tutorials-preview/deploys/64124c3043e77f0008b85128
Deploy Preview: https://deploy-preview-2156--pytorch-tutorials-preview.netlify.app

svekars (Contributor) left a comment

Some editorial suggestions. Let me know if you have any questions.

intermediate_source/reinforcement_ppo.py — 5 review threads (outdated, resolved)
# -------------
#
# The PPO loss can be directly imported from torchrl for convenience using the
# :class:`ClipPPOLoss` class. This is the easiest way of utilising PPO:

Suggested change
# :class:`ClipPPOLoss` class. This is the easiest way of utilising PPO:
# :class:`ClipPPOLoss` class. This is the easiest way of utilizing PPO:
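For readers skimming the thread, here is a minimal sketch of how the loss referenced above can be built with :class:`ClipPPOLoss`; the `policy_module`/`value_module` names and the hyperparameter values are illustrative placeholders rather than the tutorial's exact code:

```python
from torchrl.objectives import ClipPPOLoss

# policy_module and value_module stand in for the actor and value operator
# built earlier in the tutorial; the hyperparameters are illustrative defaults.
loss_module = ClipPPOLoss(
    policy_module,              # actor network
    value_module,               # critic / value network
    clip_epsilon=0.2,           # PPO clipping parameter
    entropy_bonus=True,
    entropy_coef=1e-4,
    critic_coef=1.0,
    loss_critic_type="smooth_l1",
)

# Calling the loss module on a batch (a TensorDict) returns the individual loss
# terms, which are summed before backpropagation:
#   loss_vals = loss_module(batch)
#   loss = loss_vals["loss_objective"] + loss_vals["loss_critic"] + loss_vals["loss_entropy"]
```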

# is a value that reflects an expectancy over the return value while dealing with
# the bias / variance tradeoff.
# To compute the advantage, one just needs to (1) build the advantage module, which
# utilises our value operator, and (2) pass each batch of data through it before each

Suggested change
# utilises our value operator, and (2) pass each batch of data through it before each
# utilizes our value operator, and (2) pass each batch of data through it before each
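As a pointer for readers, a minimal sketch of step (1), building the advantage module with TorchRL's GAE estimator; `value_module` and the gamma/lambda values are placeholders, not the tutorial's exact settings:

```python
from torchrl.objectives.value import GAE

# value_module stands in for the value operator built earlier in the tutorial;
# gamma (discount) and lmbda (GAE parameter) are illustrative values.
advantage_module = GAE(
    gamma=0.99,
    lmbda=0.95,
    value_network=value_module,
    average_gae=True,
)

# Step (2): pass each collected batch (a TensorDict) through the module before
# an optimization epoch; it writes the "advantage" entry in place:
#   advantage_module(tensordict_data)
```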

# Training loop
# -------------
# We now have all the pieces needed to code our training loop.
# The steps are quite easy: collect data, compute advantage, loop over the collected

Suggested change
# The steps are quite easy: collect data, compute advantage, loop over the collected
# The steps include:
#
# * Collect data
# * Compute advantage
# * Loop over the collected data to compute loss values
# * Backpropagate
# * Optimize
# * Repeat

# -------------
# We now have all the pieces needed to code our training loop.
# The steps are quite easy: collect data, compute advantage, loop over the collected
# data to compute loss values, backpropagate, optimize and repeat.

Suggested change
# data to compute loss values, backpropagate, optimize and repeat.
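The two threads above cover the training-loop description; a bare-bones skeleton of such a loop is sketched below. All names (`collector`, `advantage_module`, `replay_buffer`, `loss_module`, `optim`) and the loop constants are placeholders standing in for objects created earlier in the tutorial, not the tutorial's exact code:

```python
# Illustrative outline of the loop described above: collect data, compute the
# advantage, iterate over the batch to compute losses, backpropagate, optimize.
for tensordict_data in collector:                    # collect data
    advantage_module(tensordict_data)                # compute advantage in place
    replay_buffer.extend(tensordict_data.reshape(-1).cpu())
    for _ in range(num_epochs):                      # loop over the collected data
        for _ in range(frames_per_batch // sub_batch_size):
            subdata = replay_buffer.sample(sub_batch_size)
            loss_vals = loss_module(subdata)
            loss = (
                loss_vals["loss_objective"]
                + loss_vals["loss_critic"]
                + loss_vals["loss_entropy"]
            )
            loss.backward()                          # backpropagate
            optim.step()                             # optimize
            optim.zero_grad()
    collector.update_policy_weights_()               # sync the collector's copy of the policy
```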

plt.show()

######################################################################
# Next steps

We need a Conclusion section. I think you can rewrite this section a bit to become the conclusion section. For example:

In this tutorial, we have learned:
1.
2.
3.

If you want to experiment with this tutorial a bit more, you can apply the following modifications: 

* This algorithm ...

* From a logging perspective .....

For more information about TorchRL, go to <link to the TorchRL docs>

vmoens changed the title from [WIP] PPO Tutorial to PPO Tutorial on Feb 12, 2023
vmoens marked this pull request as ready for review on February 12, 2023 17:01
vmoens (Contributor, Author) commented Feb 12, 2023

@svekars this is ready for review

svekars changed the base branch from main to 2.0-RC-TEST on February 24, 2023 20:49
vmoens force-pushed the ppo_tutorial branch 4 times, most recently from 8165c0c to d9ad408 on March 7, 2023 14:11
svekars changed the base branch from 2.0-RC-TEST to main on March 15, 2023 16:02
svekars merged commit edf145d into pytorch:main on Mar 15, 2023