Hello,
I was carefully studying the code for the panda reach task and two questions came to mind:
- The `observation` vector returned by the environment contains the `position` of the robot's end-effector. I wonder whether it would also work if the `observation` consisted of the robot's `joint angles` instead of the end-effector position. Theoretically, the agent should be able to learn either way. Or not? (A rough sketch of what I mean follows after this list.)
- The `reward` is calculated from the distance between the target and the end-effector, or, in `sparse` mode, it consists only of zeros and ones depending on whether `distance < distance_threshold`. But with a sparse reward, a plain DDPG, PPO, or SAC agent will fail to learn. How do you train the agent using the sparse reward? Did you use hindsight experience replay (HER) from SB3? (See the second sketch below for the kind of setup I have in mind.)
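For the first question, this is roughly what I mean by feeding joint angles instead of the end-effector position. It is only a sketch: the `unwrapped.robot` attribute and the `get_joint_angle` method are my guesses at panda-gym's internals, so the names may need adapting to the actual version.

```python
import numpy as np
import gymnasium as gym


class JointAngleObs(gym.ObservationWrapper):
    """Swap the end-effector position in obs["observation"] for joint angles.

    Hypothetical sketch: `unwrapped.robot` and `get_joint_angle(joint=i)`
    are assumptions about panda-gym's internals, not the confirmed API.
    """

    NUM_JOINTS = 7  # Panda arm joints, ignoring the two gripper fingers

    def __init__(self, env):
        super().__init__(env)
        # Keep the goal spaces untouched so goal-based (HER) training still works.
        self.observation_space = gym.spaces.Dict(
            {
                **env.observation_space.spaces,
                "observation": gym.spaces.Box(
                    low=-np.inf, high=np.inf,
                    shape=(self.NUM_JOINTS,), dtype=np.float32,
                ),
            }
        )

    def observation(self, obs):
        robot = self.env.unwrapped.robot  # assumed attribute on the task env
        angles = np.array(
            [robot.get_joint_angle(joint=i) for i in range(self.NUM_JOINTS)],
            dtype=np.float32,
        )
        new_obs = dict(obs)
        new_obs["observation"] = angles  # goals stay in Cartesian space
        return new_obs
```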
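For the second question, this is the kind of setup I had in mind with SB3's `HerReplayBuffer`. The environment ID and the `reward_type` keyword are what I believe the current panda-gym exposes; they may differ between versions.

```python
import gymnasium as gym
import panda_gym  # registers the Panda environments
from stable_baselines3 import SAC
from stable_baselines3 import HerReplayBuffer

# Sparse-reward reach task (env ID may vary across panda-gym versions).
env = gym.make("PandaReach-v3", reward_type="sparse")

# Off-policy agent with hindsight experience replay: failed episodes are
# relabelled with the goals that were actually achieved, so the replay
# buffer still contains informative transitions despite the sparse reward.
model = SAC(
    "MultiInputPolicy",
    env,
    replay_buffer_class=HerReplayBuffer,
    replay_buffer_kwargs=dict(
        n_sampled_goal=4,
        goal_selection_strategy="future",
    ),
    verbose=1,
)
model.learn(total_timesteps=20_000)
```

As far as I understand, HER only combines with off-policy algorithms such as DDPG, TD3, or SAC, so PPO would be ruled out in the sparse-reward setting anyway.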
Thanks