I am running the HalfCheetah environment with MBPO (SAC plus 3 neural networks for the dynamics model), and my training loss increases along with the reward.
I don't have the intuition to interpret this.
Why does the training loss of model-based policy optimization increase?
I can share my wandb logs.
Thanks