Added RollBall env #366
Conversation
Can you provide the exact command-line script for PPO that I can just copy and paste?
Thank you.
Updated everything in accordance with your recommendations. Use the following command to run PPO:
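The actual command was posted inline and is not preserved here. A representative invocation of the ManiSkill baseline PPO script might look like the sketch below; the script path and flag names are assumptions based on the baseline's typical interface, not the command from this thread.

```shell
# Hypothetical invocation of the baseline PPO script; adjust the path
# and flags to match the actual example script in the repo.
python examples/baselines/ppo/ppo.py \
    --env_id="RollBall-v1" \
    --num_envs=512 \
    --num_steps=60
```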
Comment on the diff:
<video preload="auto" controls="True" width="100%">
    <source src="https://github.com/haosulab/ManiSkill/raw/main/figures/environment_demos/RollBall-v1.mp4" type="video/mp4">
I just checked the video. How did you generate it? It looks like the first frame is from another episode. (I am asking how you generated it because, if you used our tools, this could be a bug in one of them.)
I think I found where the bug could be: when I save the trajectory from PPO with `--evaluate`, its reset params are empty, so that could be the root cause. Is there a way to prevent this?
Ah, so you are running the PPO code in the repo directly. I will check this.
I am still training the environment @guru-narayana to check that it solves in a reasonable time (the PPO script uses 100M steps, which is a lot; if it solves in about an hour on a 3080 I will merge this in anyway). I recommend tuning the reward function a bit more (a staged reward function may work better). Another point: the reward gets the agent to be 0.05 m behind the ball, but this can be suboptimal, as you need to hit the ball at an angle.
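The geometry behind this suggestion can be sketched in a few lines. The helper below is hypothetical (it is not from the ManiSkill codebase): it places the agent 0.05 m behind the ball *along the ball-to-goal line*, so that pushing from that point sends the ball toward the goal rather than at an arbitrary angle.

```python
import numpy as np

def pre_hit_position(ball_pos: np.ndarray, goal_pos: np.ndarray,
                     offset: float = 0.05) -> np.ndarray:
    """Hypothetical helper: the point `offset` meters behind the ball,
    on the line from the goal through the ball, so that pushing the
    ball from this point sends it toward the goal."""
    to_goal = goal_pos - ball_pos
    direction = to_goal / np.linalg.norm(to_goal)  # unit vector ball -> goal
    return ball_pos - offset * direction           # step *behind* the ball

# Example: ball at the origin, goal 1 m along +x,
# so the agent should stand 0.05 m along -x.
target = pre_hit_position(np.array([0.0, 0.0]), np.array([1.0, 0.0]))
```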
Actually, for the example PPO script, did you mean to write 100_000_000? It says 100_00_000.
I made the suggested modifications, and now the agent needs to be 0.05 m behind the ball and along the direction of the goal to get reward. I also made the reward function staged. Please use this modified command to test the environment.
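A staged reward of the kind described might look like the following sketch. This is illustrative only; the function names, weights, and thresholds are assumptions, not the actual RollBall-v1 implementation. Stage 1 densely rewards reaching the pre-hit pose behind the ball along the goal direction; stage 2 activates only once the agent is roughly in place, and rewards the ball's progress toward the goal.

```python
import numpy as np

def staged_reward(agent_pos, ball_pos, goal_pos, offset=0.05):
    """Illustrative staged reward (not the actual env code).

    Stage 1: move the agent to a point `offset` m behind the ball,
             along the goal-to-ball line.
    Stage 2: once roughly in place, reward the ball getting closer
             to the goal.
    """
    direction = (goal_pos - ball_pos) / np.linalg.norm(goal_pos - ball_pos)
    pre_hit = ball_pos - offset * direction

    reach_dist = np.linalg.norm(agent_pos - pre_hit)
    reward = 1.0 - np.tanh(5.0 * reach_dist)   # stage 1: dense reaching term

    if reach_dist < 0.01:                      # stage 2 gate: agent in place
        goal_dist = np.linalg.norm(ball_pos - goal_pos)
        reward += 1.0 - np.tanh(goal_dist)     # stage 2: push-to-goal term

    return reward
```

Gating the second term on the reaching distance is what makes the reward "staged": the agent cannot collect pushing reward until it has lined up behind the ball.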
100_00_000 was correct previously, but please now use the command in my recent comment for execution.
Furthermore, can you merge in the main branch? It seems you may have used a version of the main branch that had a small bug with the ManiSkillVectorEnv (apologies for that). Things should now run faster and correctly. Otherwise, I can verify this task works correctly; my only small concern is the need for a delayed boolean.
Done
A simple task where the objective is to push and roll a ball to a goal region at the other end of the table.
When testing with baseline PPO, please use a `max_steps` of 60; the ball takes time to roll.