I took a look at the current code. The Webots-driven OpenAI Gym environment seems correct, but I don't follow the rest. What do we expect the agent to learn from constant forward motion? Why are we discretizing everything (and so coarsely)? What are we trying to learn here? We want to adapt or correct a user policy, not learn another one that can replace it.
I'd strongly suggest we revisit the approaches we spent time studying. Both https://arxiv.org/pdf/1802.01744 and https://arxiv.org/pdf/2004.05097 are quite clear on what they are doing. There is sample code for both too, see https://github.com/rddy/deepassist and https://github.com/cbschaff/rsa.
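To make the "correct, don't replace" point concrete: in the residual-style setup from the second paper (and the rsa repo), the agent only learns a bounded correction that is added to the user's action, so the user's command stays the base behavior. A minimal sketch of that structure, assuming continuous actions; the class and policy names here are illustrative, not taken from either repo:

```python
import numpy as np

class ResidualAssistant:
    """Combine a fixed user policy with a learned corrective residual.

    The user's action is kept as the base command; the assistant only
    adds a clipped correction, so it can nudge but never fully override.
    """

    def __init__(self, user_policy, residual_policy, max_correction=0.2):
        self.user_policy = user_policy          # e.g. teleop input
        self.residual_policy = residual_policy  # learned correction
        self.max_correction = max_correction    # bound on the nudge

    def act(self, obs):
        user_action = np.asarray(self.user_policy(obs), dtype=float)
        correction = np.asarray(self.residual_policy(obs, user_action),
                                dtype=float)
        # Clip the correction so the user stays in control.
        correction = np.clip(correction,
                             -self.max_correction, self.max_correction)
        return user_action + correction

# Example: user drives straight; residual adds a small steering fix
# proportional to a (hypothetical) lane-offset observation.
user = lambda obs: [1.0, 0.0]                   # constant forward
residual = lambda obs, a: [0.0, 0.05 * obs[0]]  # small steering correction
assistant = ResidualAssistant(user, residual)
print(assistant.act(np.array([2.0])))           # forward plus clipped steer
```

Training would then optimize `residual_policy` (e.g. with an off-the-shelf RL algorithm) while `user_policy` stays fixed, which is the structure both papers argue for rather than learning a standalone replacement policy.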