Hi Simon,

I am looking at your implementation of the PPO model. After going through the code a couple of times, I think that although you create two policy instances, the second one is created with the reuse parameter set, so the two policies effectively share the same weights and are identical.
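To illustrate the concern, here is a minimal TF 1.x sketch (the one-layer policy and scope name are hypothetical, not taken from your code) showing that building a second policy under the same variable scope with reuse=True binds it to the first policy's variables, so only one set of weights exists:

```python
import tensorflow as tf

def policy_logits(obs, reuse):
    # With reuse=True this binds to the variables the first call created,
    # instead of creating a second, independent set.
    with tf.variable_scope("pi", reuse=reuse):
        return tf.layers.dense(obs, 4, name="fc")

obs = tf.placeholder(tf.float32, [None, 8])
pi_new = policy_logits(obs, reuse=False)  # creates pi/fc/kernel, pi/fc/bias
pi_old = policy_logits(obs, reuse=True)   # reuses those SAME variables

# Only one set of weights exists in the graph:
print([v.name for v in tf.trainable_variables(scope="pi")])
# -> ['pi/fc/kernel:0', 'pi/fc/bias:0']
```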
Furthermore, I have not seen any code that transfers the weights from the current policy to the old one, unlike OpenAI's implementation, which does this:
```python
# OpenAI's sync op: copy each of the current policy's (pi) variables
# into the corresponding old-policy (oldpi) variable.
assign_old_eq_new = U.function(
    [], [],
    updates=[tf.assign(oldv, newv)
             for (oldv, newv) in zipsame(oldpi.get_variables(), pi.get_variables())]
)
```
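For comparison, here is a hedged sketch (TF 1.x; the scope names "pi" and "oldpi" are assumptions for illustration) of how such a sync could be written without the baselines U.function helper, if each policy had its own variables:

```python
import tensorflow as tf

# Collect each policy's variables by scope; sorting by name keeps the
# two lists aligned pair-wise.
pi_vars = sorted(tf.global_variables(scope="pi"), key=lambda v: v.name)
oldpi_vars = sorted(tf.global_variables(scope="oldpi"), key=lambda v: v.name)

# A single op that copies every current-policy weight into the old policy.
assign_old_eq_new = tf.group(
    *[tf.assign(oldv, newv) for oldv, newv in zip(oldpi_vars, pi_vars)]
)

# Typically run once per PPO iteration, before the policy update:
# sess.run(assign_old_eq_new)
```

Without a step like this, the probability ratio pi/oldpi stays at 1 and the PPO clipping has no effect.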
Could you please confirm whether this is indeed the case? Thanks!