forked from LasseMay/test_pandoc_ci
Update Acrobot Simulation Leaderboard
Showing 16 changed files with 15,027 additions and 0 deletions.
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,4 @@ | ||
# Average-Reward Entropy Advantage Policy Optimisation Controller | ||
|
||
This controller uses a policy trained with Average-Reward Entropy Advantage Policy Optimisation (AR-EAPO). AR-EAPO extends the model-free, maximum entropy reinforcement learning algorithm [EAPO](https://arxiv.org/abs/2407.18143) to the average-reward setting. | ||
|
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,2 @@ | ||
Swingup Success,Swingup Time [s],Energy [J],Torque Cost[N²m²],Torque Smoothness [Nm],Velocity Cost [m²/s²],RealAI Score | ||
1.0,1.394000000000001,8.322843658448338,1.5169970532884,0.008076883429364525,117.96157720290056,0.6326131852346231 |
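The values in this scores file appear to match the AR-EAPO row of the leaderboard CSV later in this commit, rounded to two or three decimals. A minimal sketch of that rounding follows; the per-column precision in `DECIMALS` is inferred from the leaderboard rows rather than taken from any tool in the repository, and the helper name `round_scores` is hypothetical.

```python
import csv
import io

# The two lines of the AR-EAPO scores file shown above.
SCORES_CSV = (
    "Swingup Success,Swingup Time [s],Energy [J],Torque Cost[N²m²],"
    "Torque Smoothness [Nm],Velocity Cost [m²/s²],RealAI Score\n"
    "1.0,1.394000000000001,8.322843658448338,1.5169970532884,"
    "0.008076883429364525,117.96157720290056,0.6326131852346231\n"
)

# Assumed precision per column, inferred from the leaderboard rows.
DECIMALS = {
    "Swingup Time [s]": 2,
    "Energy [J]": 2,
    "Torque Cost[N²m²]": 2,
    "Torque Smoothness [Nm]": 3,
    "Velocity Cost [m²/s²]": 2,
    "RealAI Score": 3,
}


def round_scores(csv_text):
    """Parse a one-row scores CSV and round each metric for display."""
    row = next(csv.DictReader(io.StringIO(csv_text)))
    return {name: round(float(row[name]), digits) for name, digits in DECIMALS.items()}


rounded = round_scores(SCORES_CSV)
print(rounded)
```

For this file the sketch yields 1.39, 8.32, 1.52, 0.008, 117.96 and 0.633, which are the figures listed for AR-EAPO in the leaderboard.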
5,001 changes: 5,001 additions & 0 deletions
data/acrobot/simulation_v2/ar_eapo/sim_swingup.csv
Large diffs are not rendered by default.
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,11 @@ | ||
# Evolutionary SAC | ||
|
||
## Trajectory Learning, Optimization and Stabilization | ||
|
||
This controller uses a trajectory dictated by the policy learned in the following way: | ||
|
||
1. SAC training with loose surrogate reward | ||
2. SAC training with stricter surrogate reward | ||
3. SNES training with challenge reward + injected noise in the action | ||
|
||
The controller uses the final policy network |
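Step 3 above names SNES (Separable Natural Evolution Strategies). The README does not show the training code, so the following is only a generic, pure-Python sketch of an SNES update loop on a toy quadratic loss. The function name `snes_minimise`, the toy loss, the population size and the learning rates are illustrative assumptions, not the authors' setup. In the pipeline described above, the loss would presumably be the negative challenge reward of a policy rollout with noise injected into the action.

```python
import math
import random


def snes_minimise(loss, mean, sigma, generations=200, popsize=20, seed=0):
    """Minimise `loss` with a basic SNES loop (diagonal Gaussian search distribution)."""
    rng = random.Random(seed)
    dim = len(mean)
    mean, sigma = list(mean), list(sigma)
    lr_mean = 1.0
    lr_sigma = (3.0 + math.log(dim)) / (5.0 * math.sqrt(dim))

    # Rank-based utilities: the best sample gets the largest weight,
    # and the weights sum to zero.
    raw = [max(0.0, math.log(popsize / 2.0 + 1.0) - math.log(k))
           for k in range(1, popsize + 1)]
    total = sum(raw)
    utilities = [r / total - 1.0 / popsize for r in raw]

    for _ in range(generations):
        # Sample standard-normal perturbations and the matching candidates.
        noise = [[rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(popsize)]
        candidates = [[mean[i] + sigma[i] * s[i] for i in range(dim)] for s in noise]
        # Sort sample indices from lowest (best) to highest loss.
        order = sorted(range(popsize), key=lambda k: loss(candidates[k]))

        grad_mean = [0.0] * dim
        grad_sigma = [0.0] * dim
        for rank, k in enumerate(order):
            u = utilities[rank]
            for i in range(dim):
                grad_mean[i] += u * noise[k][i]
                grad_sigma[i] += u * (noise[k][i] ** 2 - 1.0)

        # Natural-gradient step on the mean, multiplicative step on sigma.
        for i in range(dim):
            mean[i] += lr_mean * sigma[i] * grad_mean[i]
            sigma[i] *= math.exp(0.5 * lr_sigma * grad_sigma[i])
    return mean, sigma


def sphere(x):
    """Toy stand-in objective: squared distance from the origin."""
    return sum(v * v for v in x)


start = [2.0, -1.0, 0.5]
best, spread = snes_minimise(sphere, start, [1.0, 1.0, 1.0])
print(sphere(start), sphere(best))
```

Because SNES only needs loss values for sampled parameters, it can optimise a non-differentiable objective directly, which is one plausible reason to use it for the final stage.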
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,2 @@ | ||
Swingup Success,Swingup Time [s],Energy [J],Torque Cost[N²m²],Torque Smoothness [Nm],Velocity Cost [m²/s²],RealAI Score | ||
1.0,0.9640000000000007,9.260615537277895,2.713043878883556,0.0300476622396909,96.56281902349662,0.5240572289076553 |
5,001 changes: 5,001 additions & 0 deletions
data/acrobot/simulation_v2/evolsac/sim_swingup.csv
Large diffs are not rendered by default.
Empty file.
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,2 @@ | ||
Swingup Success,Swingup Time [s],Energy [J],Torque Cost[N²m²],Torque Smoothness [Nm],Velocity Cost [m²/s²],RealAI Score | ||
1.0,1.446000000000001,19.42529671831582,3.220376833800871,0.09690809960262078,253.58977826055238,0.3164769436997448 |
5,001 changes: 5,001 additions & 0 deletions
data/acrobot/simulation_v2/mcpilco/sim_swingup.csv
Large diffs are not rendered by default.
3 changes: 3 additions & 0 deletions
leaderboards/acrobot_simulation_performance_leaderboard_v2.csv
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -1,4 +1,7 @@ | ||
Controller,Short Controller Description,Swingup Success,Swingup Time [s],Energy [J],Torque Cost[N²m²],Torque Smoothness [Nm],Velocity Cost [m²/s²],RealAI Score,Username,Data | ||
[mcpilco](https://github.com/dfki-ric-underactuated-lab/real_ai_gym_leaderboard/tree/main/data/acrobot/simulation_v2/mcpilco/README.md),Swingup trained with MBRL algorithm MC-PILCO + stabilization with LQR.,1/1,1.45,19.43,3.22,0.097,253.59,0.316,turcato-niccolo,[data](https://github.com/dfki-ric-underactuated-lab/real_ai_gym_leaderboard/tree/main/data/acrobot/simulation_v2/mcpilco/sim_swingup.csv) [plot](https://github.com/dfki-ric-underactuated-lab/real_ai_gym_leaderboard/tree/main/data/acrobot/simulation_v2/mcpilco/timeseries.png) [video](https://github.com/dfki-ric-underactuated-lab/real_ai_gym_leaderboard/tree/main/data/acrobot/simulation_v2/mcpilco/sim_video.gif) | ||
[TVLQR](https://github.com/dfki-ric-underactuated-lab/real_ai_gym_leaderboard/tree/main/data/acrobot/simulation_v2/ilqr_tvlqr/README.md),Stabilization of iLQR trajectory with time-varying LQR.,1/1,4.05,10.43,1.87,0.016,105.83,0.504,fwiebe,[data](https://github.com/dfki-ric-underactuated-lab/real_ai_gym_leaderboard/tree/main/data/acrobot/simulation_v2/ilqr_tvlqr/sim_swingup.csv) [plot](https://github.com/dfki-ric-underactuated-lab/real_ai_gym_leaderboard/tree/main/data/acrobot/simulation_v2/ilqr_tvlqr/timeseries.png) [video](https://github.com/dfki-ric-underactuated-lab/real_ai_gym_leaderboard/tree/main/data/acrobot/simulation_v2/ilqr_tvlqr/sim_video.gif) | ||
[AR-EAPO](https://github.com/dfki-ric-underactuated-lab/real_ai_gym_leaderboard/tree/main/data/acrobot/simulation_v2/ar_eapo/README.md),Policy trained with average reward maximum entropy RL,1/1,1.39,8.32,1.52,0.008,117.96,0.633,rnilva,[data](https://github.com/dfki-ric-underactuated-lab/real_ai_gym_leaderboard/tree/main/data/acrobot/simulation_v2/ar_eapo/sim_swingup.csv) [plot](https://github.com/dfki-ric-underactuated-lab/real_ai_gym_leaderboard/tree/main/data/acrobot/simulation_v2/ar_eapo/timeseries.png) [video](https://github.com/dfki-ric-underactuated-lab/real_ai_gym_leaderboard/tree/main/data/acrobot/simulation_v2/ar_eapo/sim_video.gif) | ||
[iLQR Riccati Gains](https://github.com/dfki-ric-underactuated-lab/real_ai_gym_leaderboard/tree/main/data/acrobot/simulation_v2/ilqr_riccati_lqr/README.md),Stabilization of iLQR trajectory with Riccati gains. Top stabilization with LQR.,1/1,4.04,10.55,1.98,0.067,106.49,0.396,fwiebe,[data](https://github.com/dfki-ric-underactuated-lab/real_ai_gym_leaderboard/tree/main/data/acrobot/simulation_v2/ilqr_riccati_lqr/sim_swingup.csv) [plot](https://github.com/dfki-ric-underactuated-lab/real_ai_gym_leaderboard/tree/main/data/acrobot/simulation_v2/ilqr_riccati_lqr/timeseries.png) [video](https://github.com/dfki-ric-underactuated-lab/real_ai_gym_leaderboard/tree/main/data/acrobot/simulation_v2/ilqr_riccati_lqr/sim_video.gif) | ||
[evolsac](https://github.com/dfki-ric-underactuated-lab/real_ai_gym_leaderboard/tree/main/data/acrobot/simulation_v2/evolsac/README.md),Evolutionary SAC for both swingup and stabilisation,1/1,0.96,9.26,2.71,0.03,96.56,0.524,AlbertoSinigaglia,[data](https://github.com/dfki-ric-underactuated-lab/real_ai_gym_leaderboard/tree/main/data/acrobot/simulation_v2/evolsac/sim_swingup.csv) [plot](https://github.com/dfki-ric-underactuated-lab/real_ai_gym_leaderboard/tree/main/data/acrobot/simulation_v2/evolsac/timeseries.png) [video](https://github.com/dfki-ric-underactuated-lab/real_ai_gym_leaderboard/tree/main/data/acrobot/simulation_v2/evolsac/sim_video.gif) | ||
[iLQR MPC stabilization](https://github.com/dfki-ric-underactuated-lab/real_ai_gym_leaderboard/tree/main/data/acrobot/simulation_v2/ilqr_ilqrmpc_lqr/README.md),Online optimization with iterative LQR. Stabilization of iLQR trajectory. Top stabilization with LQR.,1/1,4.86,11.54,2.68,0.096,110.4,0.345,fwiebe,[data](https://github.com/dfki-ric-underactuated-lab/real_ai_gym_leaderboard/tree/main/data/acrobot/simulation_v2/ilqr_ilqrmpc_lqr/sim_swingup.csv) [plot](https://github.com/dfki-ric-underactuated-lab/real_ai_gym_leaderboard/tree/main/data/acrobot/simulation_v2/ilqr_ilqrmpc_lqr/timeseries.png) [video](https://github.com/dfki-ric-underactuated-lab/real_ai_gym_leaderboard/tree/main/data/acrobot/simulation_v2/ilqr_ilqrmpc_lqr/sim_video.gif) |
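The rows in the leaderboard CSV above are not ordered by score. A minimal sketch for ranking them, assuming a higher RealAI Score is better; the embedded CSV is an abbreviated copy of the rows above (controller name, swingup time and score only), and the helper name `rank_by_score` is hypothetical.

```python
import csv
import io

# Abbreviated copy of the leaderboard rows: only three columns are kept.
LEADERBOARD_CSV = """Controller,Swingup Time [s],RealAI Score
mcpilco,1.45,0.316
TVLQR,4.05,0.504
AR-EAPO,1.39,0.633
iLQR Riccati Gains,4.04,0.396
evolsac,0.96,0.524
iLQR MPC stabilization,4.86,0.345
"""


def rank_by_score(csv_text):
    """Return the leaderboard rows sorted by RealAI Score, highest first."""
    rows = list(csv.DictReader(io.StringIO(csv_text)))
    return sorted(rows, key=lambda r: float(r["RealAI Score"]), reverse=True)


ranking = rank_by_score(LEADERBOARD_CSV)
for place, row in enumerate(ranking, start=1):
    print(f"{place}. {row['Controller']}: {row['RealAI Score']}")
```

Under that assumption AR-EAPO (0.633) ranks first, followed by evolsac (0.524) and TVLQR (0.504).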