Double constraints for updating actor? #109

KhoiDOO · 2023-03-07T03:47:06Z

Hi ShangtongZhang,

I'm confused about the adaptive KL divergence that you use in your code to update the actor model (the case with separate actor and critic models). Your code applies both the clipped objective and the adaptive approximate KL check: the actor model is updated only if $\text{approx-kl} \le 1.5 \times \text{target-kl}$. After reading the PPO paper, my understanding is that an adaptive KL term belongs to TRPO instead, because TRPO has a KL constraint (Equation 4). With both constraints in place, clipping and the adaptive KL check, it is very hard for the actor to be updated.
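Roughly, the check I'm describing looks like this. It is a minimal sketch, not your code: I'm assuming `approx-kl` is the batch mean of `old_log_prob - new_log_prob` for actions sampled from the old policy, and the function names are hypothetical.

```python
import math
from typing import Sequence


def approx_kl(old_log_probs: Sequence[float], new_log_probs: Sequence[float]) -> float:
    """Sample estimate of KL(pi_old || pi_new) for actions drawn from pi_old:
    mean of log pi_old(a|s) - log pi_new(a|s). Assumed estimator, not confirmed."""
    diffs = [old - new for old, new in zip(old_log_probs, new_log_probs)]
    return sum(diffs) / len(diffs)


def should_update_actor(kl: float, target_kl: float) -> bool:
    """The actor step is taken only while the approximate KL stays within 1.5 * target_kl."""
    return kl <= 1.5 * target_kl


# Hypothetical log-probabilities of the sampled actions under the old and new policies
old_lp = [math.log(0.6), math.log(0.5), math.log(0.4)]
new_lp = [math.log(0.5), math.log(0.45), math.log(0.4)]

kl = approx_kl(old_lp, new_lp)
print(f"approx_kl={kl:.4f}, update actor: {should_update_actor(kl, target_kl=0.01)}")
```

This gate sits on top of the clipped loss, which is why I see it as a second constraint: even when the clipped gradient is non-zero, the actor step is skipped once the KL estimate exceeds the threshold.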

From my point of view, you are using $L^{CLIP}$ and the TRPO-style KL term $L^{KLPEN}$ at the same time. If the KL term is intended, $L^{KLPEN}$ should be constructed as:

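```python
# L^KLPEN with an adaptive KL coefficient (beta), as in Equation 8
surr = ratio * advantage

# Adapt beta using the KL measured after the previous policy update
if klafter < target_kl / 1.5:
    kl_coef /= 2
elif klafter > target_kl * 1.5:
    kl_coef *= 2
# otherwise the KL is close enough to the target; keep kl_coef unchanged

# kl: current KL(pi_old || pi_theta) per sample, differentiable w.r.t. the actor
# L^KLPEN is maximized, so minimize its negative
actor_loss = -(surr - kl_coef * kl).mean()

# Backpropagate the actor loss ...
```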
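A self-contained version of the same rule, with the $\beta$ update pulled into its own function. The names are hypothetical and plain Python lists stand in for tensors; in a real implementation the loss would be computed on tensors so it can be backpropagated.

```python
from typing import Sequence


def update_kl_coef(kl_coef: float, kl: float, target_kl: float) -> float:
    """Adaptive beta rule: halve beta if the KL is well below target, double it if well above."""
    if kl < target_kl / 1.5:
        return kl_coef / 2
    if kl > target_kl * 1.5:
        return kl_coef * 2
    return kl_coef  # KL close enough to the target: keep beta unchanged


def klpen_loss(ratios: Sequence[float], advantages: Sequence[float],
               kls: Sequence[float], kl_coef: float) -> float:
    """Negative of L^KLPEN: mean of -(r_t * A_t - beta * KL_t), to be minimized."""
    terms = [r * a - kl_coef * k for r, a, k in zip(ratios, advantages, kls)]
    return -sum(terms) / len(terms)


beta = 1.0
beta = update_kl_coef(beta, kl=0.03, target_kl=0.01)  # KL too large, so beta doubles
print(beta)
print(klpen_loss([1.1, 0.9], [1.0, -1.0], [0.02, 0.01], beta))
```

Note that here the KL only enters as a soft penalty scaled by $\beta$; there is no clipping and no hard gate on the update.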

After calculating the KL coefficient $\beta$ this way, it is used to compute the loss and its gradient as in Equation 8.
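For reference, this is my reading of Equation 8 and the $\beta$ update rule in the PPO paper, where $r_t(\theta)$ is the probability ratio and $d_{\text{targ}}$ is the target KL:

$$
L^{KLPEN}(\theta) = \hat{\mathbb{E}}_t\left[ r_t(\theta)\,\hat{A}_t - \beta\,\mathrm{KL}\left[\pi_{\theta_{\text{old}}}(\cdot \mid s_t),\ \pi_\theta(\cdot \mid s_t)\right] \right],
\qquad
r_t(\theta) = \frac{\pi_\theta(a_t \mid s_t)}{\pi_{\theta_{\text{old}}}(a_t \mid s_t)}
$$

$$
d = \hat{\mathbb{E}}_t\left[\mathrm{KL}\left[\pi_{\theta_{\text{old}}}(\cdot \mid s_t),\ \pi_\theta(\cdot \mid s_t)\right]\right],
\qquad
\beta \leftarrow
\begin{cases}
\beta / 2 & \text{if } d < d_{\text{targ}} / 1.5 \\
2\beta & \text{if } d > 1.5\, d_{\text{targ}} \\
\beta & \text{otherwise}
\end{cases}
$$

In the paper, the updated $\beta$ is used for the next policy update.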

Also, in the paper $L^{KLPEN}$ and $L^{CLIP}$ are alternatives: only one of them is used in training, not both.
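To make the "one or the other" point concrete, here is the clipped objective on its own, with no KL term and no KL gate. It is a minimal sketch: `clip_loss` is a hypothetical name, `eps=0.2` is the clipping value reported in the paper, and plain lists stand in for tensors.

```python
from typing import Sequence


def clip_loss(ratios: Sequence[float], advantages: Sequence[float], eps: float = 0.2) -> float:
    """Negative of L^CLIP: mean of -min(r*A, clip(r, 1-eps, 1+eps)*A), to be minimized."""
    terms = []
    for r, a in zip(ratios, advantages):
        clipped_r = min(max(r, 1.0 - eps), 1.0 + eps)  # clip the ratio to [1-eps, 1+eps]
        terms.append(min(r * a, clipped_r * a))  # pessimistic (lower) bound of the two
    return -sum(terms) / len(terms)


# Ratio 1.5 with a positive advantage is capped at 1.2; ratio 0.5 with a negative
# advantage is pushed to 0.8, which gives the lower (more pessimistic) value.
print(clip_loss([1.5, 0.5], [1.0, -1.0]))
```

With this objective the clipping alone limits how far the policy moves, so a separate KL penalty or KL check is not part of $L^{CLIP}$ as defined in the paper.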
