Hi Simon,

I am looking at your implementation of the PPO model. After going through the code a couple of times, I think that although you create two policy instances, the second one is created with the reuse parameter set, so the two policies effectively share the same weights and are identical.
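To illustrate the concern, here is a minimal TF 1.x sketch (the one-layer policy and scope name are hypothetical, not taken from your code) showing that building a second policy under the same variable scope with reuse=True binds it to the first policy's variables, so only one set of weights exists:

```python
import tensorflow as tf

def policy_logits(obs, reuse):
    # With reuse=True this binds to the variables the first call created,
    # instead of creating a second, independent set.
    with tf.variable_scope("pi", reuse=reuse):
        return tf.layers.dense(obs, 4, name="fc")

obs = tf.placeholder(tf.float32, [None, 8])
pi_new = policy_logits(obs, reuse=False)  # creates pi/fc/kernel, pi/fc/bias
pi_old = policy_logits(obs, reuse=True)   # reuses those SAME variables

# Only one set of weights exists in the graph:
print([v.name for v in tf.trainable_variables(scope="pi")])
# -> ['pi/fc/kernel:0', 'pi/fc/bias:0']
```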
Furthermore, I have not seen any code that transfers the weights from the current policy to the old one, unlike OpenAI's implementation, which does this:
```python
# OpenAI's sync op: copy each of the current policy's (pi) variables
# into the corresponding old-policy (oldpi) variable.
assign_old_eq_new = U.function(
    [], [],
    updates=[tf.assign(oldv, newv)
             for (oldv, newv) in zipsame(oldpi.get_variables(), pi.get_variables())]
)
```
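For comparison, here is a hedged sketch (TF 1.x; the scope names "pi" and "oldpi" are assumptions for illustration) of how such a sync could be written without the baselines U.function helper, if each policy had its own variables:

```python
import tensorflow as tf

# Collect each policy's variables by scope; sorting by name keeps the
# two lists aligned pair-wise.
pi_vars = sorted(tf.global_variables(scope="pi"), key=lambda v: v.name)
oldpi_vars = sorted(tf.global_variables(scope="oldpi"), key=lambda v: v.name)

# A single op that copies every current-policy weight into the old policy.
assign_old_eq_new = tf.group(
    *[tf.assign(oldv, newv) for oldv, newv in zip(oldpi_vars, pi_vars)]
)

# Typically run once per PPO iteration, before the policy update:
# sess.run(assign_old_eq_new)
```

Without a step like this, the probability ratio pi/oldpi stays at 1 and the PPO clipping has no effect.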
Could you please confirm whether this is indeed the case? Thanks!