Half-Cheetah and Ant reach high rewards (1000+) but get stuck in a state and don't walk #1718
Comments
Thanks for the report. Please attach a zip file with the modified ARS files.
Thanks a lot for your reply and help. Please find attached the zip file.
See the modifications in this repo, in particular the numpy np.clip operations and the episode length. Ideally we would make a locomotion environment that is 100% compatible with the original Gym MuJoCo envs (and drop the Roboschool version).
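For reference, the clipping amounts to bounding each action before stepping the environment. A minimal sketch against the explore loop from the issue body (variable names are the poster's, not necessarily the exact code from the linked repo):

```python
import numpy as np

# Clip the policy output to the valid action range before stepping.
# The PyBullet locomotion envs expect actions in [-1, 1].
action = policy.evaluate(state, delta, direction)
action = np.clip(action, -1.0, +1.0)
state, reward, done, _ = env.step(action)
```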
I have similar problems when training the cheetah and ant with PPO. I will check your solution and see if there is a more generic way to make the envs work in PyBullet. Is it because the simulation does not pick up some parameters from the XML file?
Oh, I see that you are just clipping to [-1, 1]. Why does that make sense? Is it just to keep the actions from overloading the motors, i.e. to keep them within the proper range of force, velocity, or position?
Thanks a lot, Erwin! I've just relaunched the training after adding np.clip(..., -1.0, +1.0). I also noticed that pybullet==2.0.0 was just released, so I've upgraded as well. Let's see if it works now. @benelot let me know if that fixes the issue on your side. Will keep you posted.
Note that ESTool trains the PyBullet Ant and Half-Cheetah fine. You need to use the clipping for both training and test/rollout. You need to clip because you cannot apply arbitrarily large actions; the action space is in the range [-1, 1].
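One quick way to confirm the expected range is to inspect the env's action space directly; a small check (illustrative):

```python
import gym
import pybullet_envs  # registers the *BulletEnv-v0 environments

env = gym.make('HalfCheetahBulletEnv-v0')
# For the PyBullet locomotion envs, both bounds should be +/-1.0
# in every action dimension.
print(env.action_space.low)
print(env.action_space.high)
```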
Hello, I clipped the actions for both training and test, but unfortunately I am getting the same results on the Half-Cheetah: the reward reaches 1000 but the agent still gets stuck, not moving. Please find attached a zip folder containing:
Maybe I am still missing something, but it must be a tiny detail, since it works very well with MuJoCo (even with no clipping). Would you mind having a quick look at it? I'm close, but another pair of eyes might be helpful :)
Hello, I forced exploration over the full episode by setting done to False here: Now it works and the agent walks just fine. Will try HumanoidFlagrunHarderBulletEnv-v0 tonight.
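For anyone hitting the same thing: the change amounts to ignoring the env's early-termination signal during training rollouts, so the agent cannot learn to exploit a premature done. Whether done is overridden inside the env source or in the training loop, the effect is similar; here is a training-loop version of the explore function from the issue body (the force_full_episode flag is hypothetical, added here for illustration):

```python
def explore(env, normalizer, policy, direction=None, delta=None,
            force_full_episode=True):
    state = env.reset()
    done = False
    num_plays = 0.
    sum_rewards = 0
    while not done and num_plays < hp.episode_length:
        normalizer.observe(state)
        state = normalizer.normalize(state)
        action = np.clip(policy.evaluate(state, delta, direction), -1.0, 1.0)
        state, reward, done, _ = env.step(action)
        if force_full_episode:
            done = False  # keep rolling out until hp.episode_length is reached
        reward = max(min(reward, 1), -1)
        sum_rewards += reward
        num_plays += 1
    return sum_rewards
```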
Hi, Hadelin2p,
Stable-Baselines has implementations of A2C and PPO that train HalfCheetahBulletEnv-v0 fine, check out:
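A minimal training sketch with Stable-Baselines (assuming the TF1-based stable-baselines package of that era; the timestep count and save path are arbitrary):

```python
import gym
import pybullet_envs  # noqa: F401 -- registers the Bullet envs

from stable_baselines import PPO2
from stable_baselines.common.policies import MlpPolicy

env = gym.make('HalfCheetahBulletEnv-v0')
model = PPO2(MlpPolicy, env, verbose=1)
model.learn(total_timesteps=2000000)
model.save('ppo2_halfcheetah')
```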
Hello,
I trained ARS (from the paper here: https://arxiv.org/pdf/1803.07055.pdf) on the Half-Cheetah and Ant environments. It works very well with MuJoCo but not with PyBullet. In PyBullet the reward keeps increasing up to 1000+, but the agent cannot walk: at some point it gets stuck in a state and stops moving. The same happens with the Ant and the Humanoid. Would you have any idea what could be wrong? I'd highly appreciate your help on this. Please find the code below. Kind regards.
```python
# AI 2018

# Importing the libraries
import os
import numpy as np
import gym
from gym import wrappers
import pybullet_envs

# Setting the hyperparameters
class Hp():
    ...  # body not shown in the extracted post

# Normalizing the states
class Normalizer():
    ...  # body not shown in the extracted post

# Building the AI
class Policy():
    ...  # body not shown in the extracted post

# Exploring the policy in one specific direction and over one episode
def explore(env, normalizer, policy, direction=None, delta=None):
    state = env.reset()
    done = False
    num_plays = 0.
    sum_rewards = 0
    while not done and num_plays < hp.episode_length:
        normalizer.observe(state)
        state = normalizer.normalize(state)
        action = policy.evaluate(state, delta, direction)
        state, reward, done, _ = env.step(action)
        reward = max(min(reward, 1), -1)
        sum_rewards += reward
        num_plays += 1
    return sum_rewards

# Training the AI
def train(env, policy, normalizer, hp):
    ...  # body not shown in the extracted post

# Running the main code
def mkdir(base, name):
    path = os.path.join(base, name)
    if not os.path.exists(path):
        os.makedirs(path)
    return path

work_dir = mkdir('exp', 'brs')
monitor_dir = mkdir(work_dir, 'monitor')
hp = Hp()
np.random.seed(hp.seed)
env = gym.make(hp.env_name)
env = wrappers.Monitor(env, monitor_dir, force=True)
nb_inputs = env.observation_space.shape[0]
nb_outputs = env.action_space.shape[0]
policy = Policy(nb_inputs, nb_outputs)
normalizer = Normalizer(nb_inputs)
train(env, policy, normalizer, hp)
```
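For context on the elided Policy class: in the ARS paper linked above, the policy is just a linear map from the normalized state to the action, perturbed by a random direction delta during exploration. A sketch of what such a class typically looks like in ARS implementations (illustrative, not the poster's exact code; hp.noise and hp.nb_directions are hypothetical hyperparameter names):

```python
class Policy():
    def __init__(self, input_size, output_size):
        # Linear policy weights, initialized to zero as in the ARS paper.
        self.theta = np.zeros((output_size, input_size))

    def evaluate(self, state, delta=None, direction=None):
        # action = (theta +/- noise * delta) @ state
        if direction is None:
            return self.theta.dot(state)
        elif direction == 'positive':
            return (self.theta + hp.noise * delta).dot(state)
        else:  # 'negative'
            return (self.theta - hp.noise * delta).dot(state)

    def sample_deltas(self):
        # One Gaussian perturbation per sampled search direction.
        return [np.random.randn(*self.theta.shape)
                for _ in range(hp.nb_directions)]
```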