Off-policy? #17

gxc-nb · 2024-11-05T15:28:21Z

I noticed that the original paper describes MPO as an off-policy algorithm, but the logic of your implementation appears to be on-policy: after the target model is updated, the sample pool (replay buffer) is also replaced and data collection restarts from scratch. Is my understanding correct?
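To make the distinction concrete, here is a minimal sketch of the two buffer-handling strategies I mean. It is not taken from this repository: `collect`, `versions_in_buffer`, and every parameter are hypothetical names used only for illustration, and the environment rollout is replaced by a stand-in that tags each transition with the version of the policy that produced it.

```python
from collections import deque


def collect(policy_version, n):
    # Stand-in for environment rollouts (hypothetical): each transition
    # records which policy version generated it.
    return [{"policy_version": policy_version, "step": i} for i in range(n)]


def versions_in_buffer(iterations, clear_each_iteration, per_iter=50, capacity=1000):
    """Return the policy versions present in the buffer at the last update."""
    buffer = deque(maxlen=capacity)
    for version in range(iterations):
        if clear_each_iteration:
            # On-policy-style: discard everything gathered by older policies.
            buffer.clear()
        buffer.extend(collect(version, per_iter))
        # The policy update for this iteration would sample from `buffer` here.
    return sorted({t["policy_version"] for t in buffer})


print(versions_in_buffer(5, clear_each_iteration=True))   # [4]
print(versions_in_buffer(5, clear_each_iteration=False))  # [0, 1, 2, 3, 4]
```

With clearing, each update only sees data from the most recent policy, which is what I would call on-policy behavior. Without clearing, updates also reuse transitions from older policies, which is what I understand by off-policy.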
