The code base accompanying the paper with the above title, accepted at CoRL 2020 and to be published in PMLR. The preprint of the submitted work is available here, and the supplementary video can be viewed here.
To install the package and its dependencies, run the following command inside the folder:
python -m pip install .
The code base was tested with gym (0.17.2) and pybullet (2.8.2) on Python 3.6.9. It is expected to work with future versions of these packages as well, although they have not been tested.
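If you want to check which versions are installed locally, a small helper like the one below can be used. This is an optional sketch, not part of the repository:

```python
# Optional sanity check for the tested dependency versions.
# Not part of the repository; just a quick local helper.
import importlib

def installed_version(pkg):
    """Return the package's __version__ string, or None if it is not installed."""
    try:
        module = importlib.import_module(pkg)
        return getattr(module, "__version__", "unknown")
    except ImportError:
        return None

for pkg in ("gym", "pybullet"):
    print(pkg, installed_version(pkg))
```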
The following tables present the evaluation of our controller on different inclines across multiple orientations. The limitations (depicted by ❌) are due to factors such as the kinematic limits, the robot dimensions (height-to-width ratio), and the dynamics of the robot.
Orientation\Elevation | -13° | -11° | -9° | -7° | -5° | 5° | 7° | 9° | 11° | 13° |
---|---|---|---|---|---|---|---|---|---|---|
0° | ✔️ | ✔️ | ✔️ | ✔️ | ✔️ | ✔️ | ✔️ | ✔️ | ✔️ | ❌ |
30° | ✔️ | ✔️ | ✔️ | ✔️ | ✔️ | ✔️ | ✔️ | ✔️ | ✔️ | ❌ |
60° | ✔️ | ✔️ | ✔️ | ✔️ | ✔️ | ✔️ | ✔️ | ✔️ | ✔️ | ❌ |
90° | ❌ | ✔️ | ✔️ | ✔️ | ✔️ | ✔️ | ✔️ | ✔️ | ✔️ | ❌ |
Orientation\Elevation | -15° | -13° | -11° | -9° | 9° | 11° | 13° | 15° |
---|---|---|---|---|---|---|---|---|
0° | ✔️ | ✔️ | ✔️ | ✔️ | ✔️ | ✔️ | ✔️ | ✔️ |
30° | ✔️ | ✔️ | ✔️ | ✔️ | ✔️ | ✔️ | ✔️ | ✔️ |
45° | ❌ | ❌ | ✔️ | ✔️ | ✔️ | ✔️ | ✔️ | ✔️ |
Orientation\Elevation | -15° | -13° | -11° | -9° | 9° | 11° | 13° | 15° |
---|---|---|---|---|---|---|---|---|
0° | ✔️ | ✔️ | ✔️ | ✔️ | ✔️ | ✔️ | ✔️ | ✔️ |
30° | ✔️ | ✔️ | ✔️ | ✔️ | ✔️ | ✔️ | ✔️ | ✔️ |
45° | ✔️ | ✔️ | ✔️ | ✔️ | ✔️ | ✔️ | ✔️ | ❌ |
As explained in the paper, we take a guided learning approach wherein the role of the initial policy is crucial. To train your own initial policy, run the following command:
python create_initial_policy.py --policyName filename --robotName Stoch2
This saves the initial policy as filename.npy in the initial_policies folder. This file is to be loaded later as the initial policy when you want to train your own policy. However, a few initial policies are already present in the same folder and can be used directly to start the ARS training.
Parameter | About | type |
---|---|---|
--policyName | name of the initial policy | str |
--robotName | name of the robot (Stoch2/Laikago/HyQ) | str |
Note: The initial policies are by default saved in the initial_policies folder.
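Since the policy is linear (as described in the paper) and stored as a plain .npy array, loading and querying it is straightforward. The sketch below is illustrative; the file name and the state/action dimensions here are placeholders, not the repository's actual values:

```python
# Illustrative sketch of loading a saved linear policy and computing an action.
# The dimensions and file name are placeholders, not the repository's values.
import numpy as np

def load_policy(path):
    """Load a policy weight matrix saved with np.save."""
    return np.load(path)

def linear_action(weights, state):
    """A linear policy simply maps the state to an action: a = W s."""
    return weights @ state

# Example with a dummy 2x3 weight matrix:
W = np.ones((2, 3))
s = np.array([1.0, 2.0, 3.0])
print(linear_action(W, s))  # -> [6. 6.]
```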
This is where the ARS training starts, with the initial policy trained in the previous step.
python trainStoch2_policy.py
The above command starts the training for Stoch2 with the default settings and the best hyperparameters observed so far. However, the following parameters can be customized as desired. The policies for Laikago and HyQ can be trained similarly by running the scripts trainLaikago_policy.py and trainHyQ_policy.py respectively.
Parameter | About | type |
---|---|---|
--render | flag for rendering | bool |
--policy | initial policy to start the training with | str |
--logdir | Directory root to log policy files (npy) | str |
--lr | learning rate | float |
--noise | amount of random noise to be added to the weights | float |
--msg | any message accompanying the training | str |
--curi_learn | Number of learning iterations before changing the curriculum | int |
--eval_step | Number of policy iterations before a policy update | int |
--episode_length | Horizon of an episode | int |
--domain_Rand | randomize the dynamics of the environment while training | int (only 0 or 1) |
--anti_clock_ori | Anti-clockwise orientation / clockwise orientation | bool |
For example,
python trainStoch2_policy.py --lr 0.05 --noise 0.04 --logdir testDir --policy init_policy_Stoch2.npy --msg "Training with some parameters" --episode_length 400
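For intuition, the core update behind flags such as --lr and --noise can be sketched as a single step of Augmented Random Search (ARS). This is a minimal, generic version, not the repository's exact implementation, and `evaluate` stands in for an episode rollout returning total reward:

```python
# Minimal sketch of one Augmented Random Search (ARS) update step.
# Hyperparameter names mirror the training flags, but values are illustrative.
import numpy as np

def ars_step(weights, evaluate, lr=0.05, noise=0.04, n_directions=8, rng=None):
    rng = rng if rng is not None else np.random.default_rng(0)
    # Sample random perturbation directions for the weight matrix.
    deltas = [rng.standard_normal(weights.shape) for _ in range(n_directions)]
    # Evaluate the perturbed policy in both directions for each sampled delta.
    diffs = [evaluate(weights + noise * d) - evaluate(weights - noise * d)
             for d in deltas]
    # Move the weights along the reward-weighted average of the directions.
    step = sum(df * d for df, d in zip(diffs, deltas))
    return weights + (lr / n_directions) * step
```

Iterating this step with rewards obtained from simulated rollouts is, at a high level, what the training scripts do.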
Note:
- The initial policies are by default loaded from the initial_policies folder, and the log directory is saved inside the experiments folder.
- Domain randomization has not yet been tested with the robots Laikago and HyQ.
- There are a few other minor parameters which need not be changed for the training; for more info about the parameters, run
python trainStoch2_policy.py --help
To run a policy in the default conditions, use the following command.
python testStoch2_policy.py
The policies for Laikago and HyQ can be tested similarly by running the scripts testLaikago_policy.py and testHyQ_policy.py respectively. The following test parameters can be changed while testing the policy:
Parameter | About | type | Allowed values | unit |
---|---|---|---|---|
--PolicyDir | directory of the policy to be tested (best policies are loaded by default) | str | (check the experiments folder) | - |
--Stairs | load staircase | bool | True or False | unitless |
--WedgeIncline | the elevation angle of the wedge | int | 0,5,7,9,11,13,15 | Degrees(°) |
--WedgeOrientation | the yaw angle of the wedge about the world z-axis | float | -90.0 to 90.0 | Degrees(°) |
--EpisodeLength | number of gait steps of an episode | int | 0 to inf | number of steps |
--MotorStrength | maximum motor strength that could be applied | float | 5.0 to 8.0 | NewtonMetre(Nm) |
--FrictionCoeff | coefficient of friction to be set | float | 0.55 to 0.80 | unitless |
--FrontMass | mass to be loaded to the front half of the body | float | 0.0 to 0.15 | Kilograms(Kg) |
--BackMass | mass to be loaded to the rear half of the body | float | 0.0 to 0.15 | Kilograms(Kg) |
--RandomTest | flag to activate random sampling | bool | True or False | unitless |
--seed | seed for random sampling | int | - | unitless |
--PerturbForce | perturbation force applied perpendicular to the heading direction of the robot | float | -120 to 120 | Newton(N) |
--AddImuNoise | flag to add noise in IMU readings | bool | True or False | unitless |
Thus, for a
- custom test:
python testStoch2_policy.py --PolicyDir 23July3 --WedgeIncline 11 --WedgeOrientation 15 --FrontMass 0.1 --FrictionCoeff 0.6
- random test:
python testStoch2_policy.py --PolicyDir 23July3 --RandomTest True --seed 100
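To illustrate what a seeded random test does conceptually, the sketch below samples a configuration from the allowed ranges in the table above. The exact sampling scheme in testStoch2_policy.py may differ; the parameter names are taken from the table, but this helper itself is hypothetical:

```python
# Illustrative, seeded sampling over the allowed test ranges from the table.
# The actual script's sampling scheme may differ; this helper is hypothetical.
import random

def sample_test_config(seed):
    rng = random.Random(seed)
    return {
        "WedgeIncline": rng.choice([5, 7, 9, 11, 13, 15]),
        "WedgeOrientation": rng.uniform(-90.0, 90.0),
        "MotorStrength": rng.uniform(5.0, 8.0),
        "FrictionCoeff": rng.uniform(0.55, 0.80),
        "FrontMass": rng.uniform(0.0, 0.15),
        "BackMass": rng.uniform(0.0, 0.15),
    }

print(sample_test_config(100))
```

Seeding makes a random test reproducible: the same --seed always yields the same sampled conditions.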
To run a policy on a staircase of fixed dimensions, use the following command. As of now, this is only available for Stoch2.
python testStoch2_policy.py --Stairs True
To run a policy on an arbitrary slope track, use the following command. As of now, this is only available for Stoch2.
python arbitary_slope_test.py
Note:
- The test policies are by default loaded from the path experiments/given_logdir_name/iterations/best_policy.npy; if no directory is specified, the best policy pre-trained by us is loaded.
- In our method we only train for positive roll and negative pitch conditions of the support plane; the trained policy is able to generalize to other conditions too.
- Our environment does not fully support training in the downhill case, but you can evaluate a policy in downhill conditions.
- Features such as stairs, randomizable domain parameters, and arbitrary slopes have not yet been added to testLaikago_policy.py and testHyQ_policy.py.