Added RollBall env #366

Merged
merged 9 commits into haosulab:main on Jun 13, 2024
Conversation

guru-narayana (Contributor, Author)

A simple task where the objective is to push and roll a ball to a goal region at the other end of the table.
When testing with the baseline PPO, please use max_steps of 60; the ball takes time to roll.

@StoneT2000 (Member) left a comment

Can you provide the exact command-line script for PPO that I can just copy and paste?

Thank you.

@guru-narayana (Contributor, Author)

Updated everything in accordance with your recommendations.

Use the following command to run PPO:

python examples/baselines/ppo/ppo.py --env_id="RollBall-v1"  --num_envs=1024 --update_epochs=8 --num_minibatches=32 --seed=100 --total_timesteps=100_00_000 --eval_freq=8 --num-steps=60 --num_eval_steps=60 --gamma 0.95


[Video: RollBall-v1 environment demo, https://github.com/haosulab/ManiSkill/raw/main/figures/environment_demos/RollBall-v1.mp4]
@StoneT2000 (Member) commented Jun 5, 2024

I just checked the video. How did you generate it? It looks like the first frame is from another episode. (Asking how you generated it because, if you used our tools, this could be a bug in one of them.)
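For context, demo videos like this are typically produced with ManiSkill's RecordEpisode wrapper; a sketch under that assumption (the output path and kwargs shown are illustrative):

```python
import gymnasium as gym
import mani_skill.envs
from mani_skill.utils.wrappers import RecordEpisode

env = gym.make("RollBall-v1", obs_mode="state", render_mode="rgb_array")
# Saves one .mp4 per episode; a stale render buffer carried across resets
# would appear as a first frame belonging to another episode.
env = RecordEpisode(env, output_dir="videos", save_video=True,
                    save_trajectory=False)

env.reset(seed=0)
for _ in range(60):
    env.step(env.action_space.sample())
env.close()
```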

@guru-narayana (Contributor, Author)

I think I found where the bug could be: when I save the trajectory from PPO with "--evaluate", its reset params are empty, so that could be the root cause. Is there a way to prevent this?

@StoneT2000 (Member)

Ah, so you are running the PPO code in the repo directly. I will check this.

@StoneT2000 (Member)

@guru-narayana I am still training the environment to check that it solves in a reasonable time (the PPO script uses 100M steps, which is a lot; if it solves in about an hour on a 3080, I will merge this in anyway).

I recommend trying to tune the reward function a bit more (maybe a staged reward function can work better). Another point is that the reward gets the agent to be 0.05m behind the ball, but this can be suboptimal as you need to hit the ball at an angle.

@StoneT2000 (Member)

Actually, for the example PPO script, did you mean to write 100_000_000? It says 100_00_000.

@guru-narayana (Contributor, Author) commented Jun 11, 2024

> @guru-narayana I am still training the environment to check that it solves in a reasonable time (the PPO script uses 100M steps, which is a lot; if it solves in about an hour on a 3080, I will merge this in anyway).
>
> I recommend trying to tune the reward function a bit more (maybe a staged reward function can work better). Another point is that the reward gets the agent to be 0.05m behind the ball, but this can be suboptimal as you need to hit the ball at an angle.

I made the suggested modifications: the agent now needs to be 0.05 m behind the ball, along the direction to the goal, to receive the reaching reward. I also made the reward function staged.
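A minimal sketch of what such a staged reward can look like (names, distances, and weights here are illustrative assumptions, not the exact reward merged in this PR):

```python
import numpy as np

def staged_reward(tcp_pos, ball_pos, goal_pos):
    """Stage 1: reach a point 0.05 m behind the ball along the
    ball-to-goal direction. Stage 2: roll the ball toward the goal."""
    to_goal = goal_pos - ball_pos
    goal_dir = to_goal / (np.linalg.norm(to_goal) + 1e-8)

    # Pushing from 0.05 m behind the ball, opposite the goal direction,
    # sends the ball toward the goal rather than off at an angle.
    hit_pos = ball_pos - 0.05 * goal_dir

    reach_dist = np.linalg.norm(tcp_pos - hit_pos)
    reward = 1.0 - np.tanh(5.0 * reach_dist)  # stage 1: reaching

    if reach_dist < 0.01:  # stage 2 unlocks once the agent is in place
        ball_goal_dist = np.linalg.norm(ball_pos - goal_pos)
        reward += 1.0 - np.tanh(ball_goal_dist)
    return reward
```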

Please use this modified command to test the environment:
python ManiSkill/examples/baselines/ppo/ppo.py --env_id="RollBall-v1" --num_eval_envs=8 --num_envs=1024 --update_epochs=8 --num_minibatches=32 --total_timesteps=20_000_000 --eval_freq=10 --num-steps=80 --num_eval_steps=80 --gamma=0.95

@guru-narayana (Contributor, Author) commented Jun 11, 2024

> Actually, for the example PPO script, did you mean to write 100_000_000? It says 100_00_000.

100_00_000 (i.e. 10M) was correct previously, but please now use the command in my recent comment instead.

@StoneT2000 (Member)

Furthermore, can you merge in the main branch? It seems you may have used a version of the main branch that had a small bug with ManiSkillVectorEnv (apologies for that). Things should now run faster and correctly. Otherwise I can verify this task works correctly; my only small concern is the need for a delayed boolean.

@guru-narayana (Contributor, Author)

> Furthermore, can you merge in the main branch? It seems you may have used a version of the main branch that had a small bug with ManiSkillVectorEnv (apologies for that). Things should now run faster and correctly. Otherwise I can verify this task works correctly; my only small concern is the need for a delayed boolean.

Done

@StoneT2000 merged commit 144b5b6 into haosulab:main on Jun 13, 2024