Hello,
I was carefully studying the code for the panda reach task and two questions came to mind:
- The `observation` vector returned by the environment contains the `position` of the robot's end-effector. I wonder whether it would also work if the `observation` consisted of the robot's `joint angles` instead of the end-effector position. Theoretically, the agent should be able to learn either way. Or not? (A rough sketch of what I mean follows after this list.)
- The `reward` is calculated from the distance between the target and the end-effector, or, in `sparse` mode, it consists only of zeros and ones depending on whether `distance < distance_threshold`. But with a sparse reward, a plain DDPG, PPO, or SAC agent will fail to learn. How do you train the agent using the sparse reward? Did you use hindsight experience replay (HER) from SB3? (See the second sketch below for the kind of setup I have in mind.)
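For the first question, this is roughly what I mean by feeding joint angles instead of the end-effector position. It is only a sketch: the `unwrapped.robot` attribute and the `get_joint_angle` method are my guesses at panda-gym's internals, so the names may need adapting to the actual version.

```python
import numpy as np
import gymnasium as gym


class JointAngleObs(gym.ObservationWrapper):
    """Swap the end-effector position in obs["observation"] for joint angles.

    Hypothetical sketch: `unwrapped.robot` and `get_joint_angle(joint=i)`
    are assumptions about panda-gym's internals, not the confirmed API.
    """

    NUM_JOINTS = 7  # Panda arm joints, ignoring the two gripper fingers

    def __init__(self, env):
        super().__init__(env)
        # Keep the goal spaces untouched so goal-based (HER) training still works.
        self.observation_space = gym.spaces.Dict(
            {
                **env.observation_space.spaces,
                "observation": gym.spaces.Box(
                    low=-np.inf, high=np.inf,
                    shape=(self.NUM_JOINTS,), dtype=np.float32,
                ),
            }
        )

    def observation(self, obs):
        robot = self.env.unwrapped.robot  # assumed attribute on the task env
        angles = np.array(
            [robot.get_joint_angle(joint=i) for i in range(self.NUM_JOINTS)],
            dtype=np.float32,
        )
        new_obs = dict(obs)
        new_obs["observation"] = angles  # goals stay in Cartesian space
        return new_obs
```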
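For the second question, this is the kind of setup I had in mind with SB3's `HerReplayBuffer`. The environment ID and the `reward_type` keyword are what I believe the current panda-gym exposes; they may differ between versions.

```python
import gymnasium as gym
import panda_gym  # registers the Panda environments
from stable_baselines3 import SAC
from stable_baselines3 import HerReplayBuffer

# Sparse-reward reach task (env ID may vary across panda-gym versions).
env = gym.make("PandaReach-v3", reward_type="sparse")

# Off-policy agent with hindsight experience replay: failed episodes are
# relabelled with the goals that were actually achieved, so the replay
# buffer still contains informative transitions despite the sparse reward.
model = SAC(
    "MultiInputPolicy",
    env,
    replay_buffer_class=HerReplayBuffer,
    replay_buffer_kwargs=dict(
        n_sampled_goal=4,
        goal_selection_strategy="future",
    ),
    verbose=1,
)
model.learn(total_timesteps=20_000)
```

As far as I understand, HER only combines with off-policy algorithms such as DDPG, TD3, or SAC, so PPO would be ruled out in the sparse-reward setting anyway.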
Thanks