Hi ShangtongZhang,

I'm confused about the adaptive KL divergence you use in your code to update the actor model (in the case of two separate actor and critic models). Your code uses both the clipped objective and the adaptive approx-KL check, and the actor model is updated only if $\text{approx-kl} \le 1.5 \times \text{target-kl}$. After reading the PPO paper, my understanding is that the adaptive KL belongs to TRPO instead, because TRPO has the constraint in Equation 4. With both constraints (clipping and adaptive KL) active at once, the actor is very hard to update.
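To make the combination described above concrete, here is a minimal sketch of a clipped surrogate gated by an approx-KL check. It is not the repository's actual code: the function names, the `clip_eps=0.2` default, and the sample-mean KL estimate are all assumptions made for illustration.

```python
def clipped_surrogate(ratios, advantages, clip_eps=0.2):
    """Mean of min(r * A, clip(r, 1 - eps, 1 + eps) * A) over the samples."""
    total = 0.0
    for r, a in zip(ratios, advantages):
        clipped_r = max(1.0 - clip_eps, min(r, 1.0 + clip_eps))
        total += min(r * a, clipped_r * a)
    return total / len(ratios)


def approx_kl(old_log_probs, new_log_probs):
    """Sample estimate of KL(old || new): mean of (log pi_old - log pi_new)."""
    diffs = [o - n for o, n in zip(old_log_probs, new_log_probs)]
    return sum(diffs) / len(diffs)


def actor_should_update(kl, target_kl):
    """Gate from the question: update only while approx-kl <= 1.5 * target-kl."""
    return kl <= 1.5 * target_kl
```

In this sketch the KL estimate only decides whether the actor step happens; it never enters the objective itself, which is what distinguishes it from a KL penalty term.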
In my view, you are using $L^{CLIP}$ and the TRPO-style $L^{KLPEN}$ at the same time, whereas $L^{KLPEN}$ should be constructed as:
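
    surr = ratio * advantage

    # Adapt the KL coefficient based on the measured KL after the update
    if klafter <= 0.66 * target_kl:
        kl_coef /= 2
    elif klafter > 1.5 * target_kl:
        kl_coef *= 2
    else:
        print("KL is close enough")

    # Penalized objective (to be maximized; negate it if the optimizer minimizes)
    actor_loss = surr - kl_coef * klafter
    # Backwarding the actor loss ...
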
Once the KL coefficient $\beta$ has been computed, it is used to compute the loss and the gradient in Equation 8.
Also, only one of $L^{KLPEN}$ or $L^{CLIP}$ should be used in training, not both.
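The adaptive-coefficient rule and the penalized objective discussed above can be sketched as two small functions. This is only an illustration under stated assumptions: the names `adapt_kl_coef` and `kl_penalty_objective` are hypothetical, the inputs are plain Python lists rather than tensors, and the lower threshold is written as `target_kl / 1.5`, as in the PPO paper, which the `0.66 * target_kl` in the snippet above approximates.

```python
def adapt_kl_coef(kl_coef, kl, target_kl):
    """Halve beta when KL is well below target, double it when well above."""
    if kl < target_kl / 1.5:
        return kl_coef / 2.0
    if kl > target_kl * 1.5:
        return kl_coef * 2.0
    return kl_coef  # KL is close enough to the target; keep beta unchanged


def kl_penalty_objective(ratios, advantages, kl, kl_coef):
    """Mean of ratio * advantage minus beta * KL (an objective to maximize)."""
    surr = sum(r * a for r, a in zip(ratios, advantages)) / len(ratios)
    return surr - kl_coef * kl


# Hypothetical usage: adapt beta from the last measured KL, then score a batch.
beta = adapt_kl_coef(1.0, kl=0.05, target_kl=0.01)
objective = kl_penalty_objective([1.0, 1.0], [2.0, 4.0], kl=0.5, kl_coef=beta)
```

Because the penalty is part of the objective, no separate clipping term or early-stopping gate appears here, which matches the point that only one of the two objectives is used.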