Add `ddpg_continuous_action.py` docs #137
This pull request is being automatically deployed with Vercel (learn more). 🔍 Inspect: https://vercel.com/vwxyzjn/cleanrl/CCjtcjP4CWC3iPu2nuVKoyG9EuMD
Hey @yooceii @dosssman, the PR is ready for review. Could you take a look at https://cleanrl-git-ddpg-docs-vwxyzjn.vercel.app/rl-algorithms/ddpg/? Thank you.
docs/rl-algorithms/ddpg.md
Outdated
* `losses/qf1_loss`: the MSE between the Q values at timestep $t$ and the target Q values at timestep $t+1$; minimizing it reduces the temporal-difference error.
* `losses/actor_loss`: implemented as `-qf1(data.observations, actor(data.observations)).mean()`; it is the *negative* average Q value computed from 1) the observations and 2) the actions the actor produces for those observations. By minimizing `actor_loss`, the optimizer updates the actor's parameters using the following gradient (Lillicrap et al., 2016, Equation 6)[^1]:

$$ \mathbb{E}_{s_{t} \sim \rho^{\beta}}\left[\left.\nabla_{a} Q\left(s, a \mid \theta^{Q}\right)\right|_{s=s_{t}, a=\mu\left(s_{t}\right)} \left.\nabla_{\theta^{\mu}} \mu\left(s \mid \theta^{\mu}\right)\right|_{s=s_{t}}\right] $$
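
To make the two metrics concrete, here is a minimal scalar sketch in plain Python. It is not the code in `ddpg_continuous_action.py`: the toy critic `q_value`, the linear `actor`, the reward and `gamma` terms in the TD target, and all parameter names are assumptions chosen for illustration, and terminal-state handling is left out. It computes a TD-style MSE for the critic and checks that the chain-rule gradient of the negated mean Q value matches a finite-difference estimate.

```python
# Toy, scalar illustration with hypothetical names (not CleanRL's implementation).
# Assumed critic: Q(s, a) = w * s * a - 0.5 * a**2; assumed actor: mu(s) = theta * s.

def q_value(s, a, w):
    return w * s * a - 0.5 * a * a

def actor(s, theta):
    return theta * s

def qf1_loss(batch, w, w_target, theta_target, gamma):
    """MSE between Q(s_t, a_t) and the target r + gamma * Q'(s_{t+1}, mu'(s_{t+1}))."""
    total = 0.0
    for s, a, r, s_next in batch:
        next_a = actor(s_next, theta_target)  # action from the target actor
        td_target = r + gamma * q_value(s_next, next_a, w_target)
        total += (q_value(s, a, w) - td_target) ** 2
    return total / len(batch)

def actor_loss(states, theta, w):
    """Negative mean Q value of the actions the actor picks for these states."""
    return -sum(q_value(s, actor(s, theta), w) for s in states) / len(states)

def actor_loss_grad(states, theta, w):
    """Chain rule: -mean( dQ/da at a=mu(s)  *  dmu/dtheta )."""
    total = 0.0
    for s in states:
        a = actor(s, theta)
        dq_da = w * s - a   # partial derivative of the toy Q with respect to a
        dmu_dtheta = s      # derivative of the linear actor with respect to theta
        total += dq_da * dmu_dtheta
    return -total / len(states)

states = [0.5, -1.0, 2.0]
theta, w = 0.3, 1.5

analytic = actor_loss_grad(states, theta, w)
eps = 1e-6
numeric = (actor_loss(states, theta + eps, w) - actor_loss(states, theta - eps, w)) / (2 * eps)

critic_loss = qf1_loss([(1.0, 0.5, 1.0, 2.0)], w, w_target=1.5, theta_target=0.3, gamma=0.99)
```

In an autograd framework the chain rule in `actor_loss_grad` is not written by hand; calling backward on the negated mean Q value produces the same product of the two derivatives.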
What's the
You're absolutely right. Thanks for the catch; I have just fixed it.
Hey @yooceii, thanks for reviewing the PR :) Let me know if there are other issues.
Given that all of @yooceii's comments are addressed, I am merging the PR as is, but I am happy to open follow-up PRs if anything else is needed.
This PR adds docs for `ddpg_continuous_action.py`.

Checklist for `ddpg_continuous_action.py`:
* `pre-commit run --all-files` passes (required).
* `--capture-video` flag toggled on (required).
* `mkdocs serve`.
* `width=500` and `height=300`).