I took a look at the current code. The Webots-driven OpenAI Gym environment seems correct, but I don't follow the rest. What do we expect the agent to learn from constant forward motion? Why are we discretizing everything (and so coarsely)? What are we trying to learn here? We want to adapt or correct a user policy, not learn another one that can replace it.
I'd strongly suggest we revisit the approaches we spent time studying. Both https://arxiv.org/pdf/1802.01744 and https://arxiv.org/pdf/2004.05097 are quite clear on what they are doing. There is sample code for both too, see https://github.com/rddy/deepassist and https://github.com/cbschaff/rsa.
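To make the "correct, don't replace" point concrete: in the residual-style setup from the second paper (and the rsa repo), the agent only learns a bounded correction that is added to the user's action, so the user's command stays the base behavior. A minimal sketch of that structure, assuming continuous actions; the class and policy names here are illustrative, not taken from either repo:

```python
import numpy as np

class ResidualAssistant:
    """Combine a fixed user policy with a learned corrective residual.

    The user's action is kept as the base command; the assistant only
    adds a clipped correction, so it can nudge but never fully override.
    """

    def __init__(self, user_policy, residual_policy, max_correction=0.2):
        self.user_policy = user_policy          # e.g. teleop input
        self.residual_policy = residual_policy  # learned correction
        self.max_correction = max_correction    # bound on the nudge

    def act(self, obs):
        user_action = np.asarray(self.user_policy(obs), dtype=float)
        correction = np.asarray(self.residual_policy(obs, user_action),
                                dtype=float)
        # Clip the correction so the user stays in control.
        correction = np.clip(correction,
                             -self.max_correction, self.max_correction)
        return user_action + correction

# Example: user drives straight; residual adds a small steering fix
# proportional to a (hypothetical) lane-offset observation.
user = lambda obs: [1.0, 0.0]                   # constant forward
residual = lambda obs, a: [0.0, 0.05 * obs[0]]  # small steering correction
assistant = ResidualAssistant(user, residual)
print(assistant.act(np.array([2.0])))           # forward plus clipped steer
```

Training would then optimize `residual_policy` (e.g. with an off-the-shelf RL algorithm) while `user_policy` stays fixed, which is the structure both papers argue for rather than learning a standalone replacement policy.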