Clarification about the observation or system state returned by the task class #96

@wilhem

Description

Hello,

I was carefully studying the code for the panda reach task, and two questions came to mind:

  1. The observation vector returned by the environment contains the position of the robot's end-effector. I wonder whether it would also work if the observation consisted of the robot's joint angles instead of the end-effector position. In theory the agent should still be able to learn, shouldn't it? (See the wrapper sketch below.)
  2. The reward is computed from the distance between the target and the end-effector, or, in sparse mode, it is binary, indicating only whether the distance is below distance_threshold. But with a sparse reward, a vanilla DDPG, PPO, or SAC agent will typically fail to learn. How do you train the agent with the sparse reward? Did you use the hindsight experience replay (HER) from SB3? (See the training sketch below.)
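
For question 1, this is the kind of change I have in mind, written as a gymnasium ObservationWrapper. It is only a sketch: the env.unwrapped.robot attribute, the get_joint_angle(joint=i) accessor, and the joint bounds are my guesses at the API, not something I verified against the code.

```python
import numpy as np
import gymnasium as gym


class JointAngleObs(gym.ObservationWrapper):
    """Replace the end-effector position in the 'observation' entry of the
    dict observation with the 7 arm joint angles."""

    N_JOINTS = 7  # Panda arm joints, ignoring the gripper

    def __init__(self, env):
        super().__init__(env)
        spaces = dict(self.env.observation_space.spaces)
        # Loose, illustrative bounds; the real per-joint limits differ.
        spaces["observation"] = gym.spaces.Box(
            low=-np.pi, high=np.pi, shape=(self.N_JOINTS,), dtype=np.float32
        )
        self.observation_space = gym.spaces.Dict(spaces)

    def observation(self, obs):
        robot = self.env.unwrapped.robot  # assumed attribute name
        angles = np.array(
            [robot.get_joint_angle(joint=i) for i in range(self.N_JOINTS)],
            dtype=np.float32,
        )
        return {**obs, "observation": angles}
```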

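For question 2, this is how I assume HER would be wired up with SB3: it is implemented there as a replay-buffer class, so it only applies to off-policy algorithms (SAC, DDPG, TD3), not PPO. A minimal sketch, assuming panda-gym v3 and a recent SB3; the env id and hyperparameters are just illustrative:

```python
import gymnasium as gym
import panda_gym  # noqa: F401  (registers the Panda* environments)

from stable_baselines3 import SAC, HerReplayBuffer

env = gym.make("PandaReach-v3")  # sparse reward by default, if I read the code right

model = SAC(
    "MultiInputPolicy",  # required for dict observations (observation/achieved_goal/desired_goal)
    env,
    replay_buffer_class=HerReplayBuffer,
    replay_buffer_kwargs=dict(
        n_sampled_goal=4,                  # relabeled goals per real transition
        goal_selection_strategy="future",  # pick substitute goals from later in the same episode
    ),
    verbose=1,
)
model.learn(total_timesteps=20_000)
```
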
Thanks
