Update Acrobot Simulation Leaderboard
fwiebe committed Aug 27, 2024
1 parent e561135 commit 609b43c
Showing 16 changed files with 15,027 additions and 0 deletions.
4 changes: 4 additions & 0 deletions data/acrobot/simulation_v2/ar_eapo/README.md
@@ -0,0 +1,4 @@
# Average-Reward Entropy Advantage Policy Optimisation Controller

This controller uses a policy trained with Average-Reward Entropy Advantage Policy Optimisation (AR-EAPO). AR-EAPO is an extension of the model-free, maximum entropy reinforcement learning algorithm [EAPO](https://arxiv.org/abs/2407.18143), applied in an average-reward setting.
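
For orientation, one common way to write an average-reward maximum entropy objective is sketched below. The symbols (reward $r$, temperature $\tau$, policy entropy $\mathcal{H}$, gain $\rho_\tau$) are notation assumed here for illustration and are not taken from this README; the exact formulation used by AR-EAPO may differ.

```latex
\rho_\tau(\pi) \;=\; \lim_{T \to \infty} \frac{1}{T}\,
\mathbb{E}_{\pi}\!\left[\, \sum_{t=0}^{T-1}
\Big( r(s_t, a_t) + \tau\, \mathcal{H}\big(\pi(\cdot \mid s_t)\big) \Big) \right]
```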

2 changes: 2 additions & 0 deletions data/acrobot/simulation_v2/ar_eapo/scores.csv
@@ -0,0 +1,2 @@
Swingup Success,Swingup Time [s],Energy [J],Torque Cost[N²m²],Torque Smoothness [Nm],Velocity Cost [m²/s²],RealAI Score
1.0,1.394000000000001,8.322843658448338,1.5169970532884,0.008076883429364525,117.96157720290056,0.6326131852346231
5,001 changes: 5,001 additions & 0 deletions data/acrobot/simulation_v2/ar_eapo/sim_swingup.csv

Large diffs are not rendered by default.

Binary file added data/acrobot/simulation_v2/ar_eapo/sim_video.gif
Binary file added data/acrobot/simulation_v2/ar_eapo/timeseries.png
11 changes: 11 additions & 0 deletions data/acrobot/simulation_v2/evolsac/README.md
@@ -0,0 +1,11 @@
# Evolutionary SAC

## Trajectory Learning, Optimization and Stabilization

This controller uses a trajectory dictated by the policy learned in the following way:

1. SAC training with loose surrogate reward
2. SAC training with stricter surrogate reward
3. SNES training with challenge reward + injected noise in the action

The controller uses the final policy network.
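
As a rough illustration of step 3 only, the following is a minimal, pure-Python sketch of a separable natural evolution strategy (SNES) update with noise injected into the action. It is not the authors' implementation: the quadratic `noisy_reward`, the `target` vector, the population size, and the noise level are all hypothetical stand-ins for the challenge reward and the policy parameters, and the SAC stages (steps 1 and 2) are not shown.

```python
import math
import random


def snes_utilities(n):
    """Rank-based utilities (best rank first) that sum to zero."""
    raw = [max(0.0, math.log(n / 2 + 1) - math.log(k)) for k in range(1, n + 1)]
    total = sum(raw)
    return [r / total - 1.0 / n for r in raw]


def noisy_reward(theta, target, rng, action_noise_std):
    """Hypothetical stand-in for the challenge reward.

    The 'action' is the parameter vector plus injected Gaussian noise;
    the reward is the negative squared distance of the action to `target`.
    """
    total = 0.0
    for th, tg in zip(theta, target):
        action = th + rng.gauss(0.0, action_noise_std)
        total -= (action - tg) ** 2
    return total


def snes(target, generations=200, popsize=20, action_noise_std=0.01, seed=0):
    """Run SNES on the toy reward and return the search mean and step sizes."""
    rng = random.Random(seed)
    dim = len(target)
    mu = [0.0] * dim      # search mean (would be the policy parameters)
    sigma = [1.0] * dim   # per-dimension step sizes
    eta_mu = 1.0
    eta_sigma = (3 + math.log(dim)) / (5 * math.sqrt(dim))
    utilities = snes_utilities(popsize)

    for _ in range(generations):
        # Sample standard-normal perturbations and evaluate the candidates.
        samples = [[rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(popsize)]
        rewards = [
            noisy_reward(
                [m + sg * s for m, sg, s in zip(mu, sigma, row)],
                target, rng, action_noise_std,
            )
            for row in samples
        ]
        # Indices of the candidates, best reward first.
        order = sorted(range(popsize), key=lambda k: rewards[k], reverse=True)

        # Natural-gradient updates, one dimension at a time.
        for j in range(dim):
            grad_mu = sum(u * samples[k][j] for u, k in zip(utilities, order))
            grad_sigma = sum(
                u * (samples[k][j] ** 2 - 1.0) for u, k in zip(utilities, order)
            )
            mu[j] += eta_mu * sigma[j] * grad_mu
            sigma[j] *= math.exp(0.5 * eta_sigma * grad_sigma)
    return mu, sigma


if __name__ == "__main__":
    target = [1.0, -2.0, 0.5]
    mu, sigma = snes(target)
    error = math.sqrt(sum((m - t) ** 2 for m, t in zip(mu, target)))
    print("distance of search mean to target:", error)
```

In the actual controller the search mean would presumably be initialised from the SAC policy of step 2 rather than from zeros; that initialisation is an assumption here.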
2 changes: 2 additions & 0 deletions data/acrobot/simulation_v2/evolsac/scores.csv
@@ -0,0 +1,2 @@
Swingup Success,Swingup Time [s],Energy [J],Torque Cost[N²m²],Torque Smoothness [Nm],Velocity Cost [m²/s²],RealAI Score
1.0,0.9640000000000007,9.260615537277895,2.713043878883556,0.0300476622396909,96.56281902349662,0.5240572289076553
5,001 changes: 5,001 additions & 0 deletions data/acrobot/simulation_v2/evolsac/sim_swingup.csv

Large diffs are not rendered by default.

Binary file added data/acrobot/simulation_v2/evolsac/sim_video.gif
Binary file added data/acrobot/simulation_v2/evolsac/timeseries.png
Empty file.
2 changes: 2 additions & 0 deletions data/acrobot/simulation_v2/mcpilco/scores.csv
@@ -0,0 +1,2 @@
Swingup Success,Swingup Time [s],Energy [J],Torque Cost[N²m²],Torque Smoothness [Nm],Velocity Cost [m²/s²],RealAI Score
1.0,1.446000000000001,19.42529671831582,3.220376833800871,0.09690809960262078,253.58977826055238,0.3164769436997448
5,001 changes: 5,001 additions & 0 deletions data/acrobot/simulation_v2/mcpilco/sim_swingup.csv

Large diffs are not rendered by default.

Binary file added data/acrobot/simulation_v2/mcpilco/sim_video.gif
Binary file added data/acrobot/simulation_v2/mcpilco/timeseries.png
@@ -1,4 +1,7 @@
Controller,Short Controller Description,Swingup Success,Swingup Time [s],Energy [J],Torque Cost[N²m²],Torque Smoothness [Nm],Velocity Cost [m²/s²],RealAI Score,Username,Data
[mcpilco](https://github.com/dfki-ric-underactuated-lab/real_ai_gym_leaderboard/tree/main/data/acrobot/simulation_v2/mcpilco/README.md),Swingup trained with MBRL algorithm MC-PILCO + stabilization with LQR.,1/1,1.45,19.43,3.22,0.097,253.59,0.316,turcato-niccolo,[data](https://github.com/dfki-ric-underactuated-lab/real_ai_gym_leaderboard/tree/main/data/acrobot/simulation_v2/mcpilco/sim_swingup.csv) [plot](https://github.com/dfki-ric-underactuated-lab/real_ai_gym_leaderboard/tree/main/data/acrobot/simulation_v2/mcpilco/timeseries.png) [video](https://github.com/dfki-ric-underactuated-lab/real_ai_gym_leaderboard/tree/main/data/acrobot/simulation_v2/mcpilco/sim_video.gif)
[TVLQR](https://github.com/dfki-ric-underactuated-lab/real_ai_gym_leaderboard/tree/main/data/acrobot/simulation_v2/ilqr_tvlqr/README.md),Stabilization of iLQR trajectory with time-varying LQR.,1/1,4.05,10.43,1.87,0.016,105.83,0.504,fwiebe,[data](https://github.com/dfki-ric-underactuated-lab/real_ai_gym_leaderboard/tree/main/data/acrobot/simulation_v2/ilqr_tvlqr/sim_swingup.csv) [plot](https://github.com/dfki-ric-underactuated-lab/real_ai_gym_leaderboard/tree/main/data/acrobot/simulation_v2/ilqr_tvlqr/timeseries.png) [video](https://github.com/dfki-ric-underactuated-lab/real_ai_gym_leaderboard/tree/main/data/acrobot/simulation_v2/ilqr_tvlqr/sim_video.gif)
[AR-EAPO](https://github.com/dfki-ric-underactuated-lab/real_ai_gym_leaderboard/tree/main/data/acrobot/simulation_v2/ar_eapo/README.md),Policy trained with average reward maximum entropy RL,1/1,1.39,8.32,1.52,0.008,117.96,0.633,rnilva,[data](https://github.com/dfki-ric-underactuated-lab/real_ai_gym_leaderboard/tree/main/data/acrobot/simulation_v2/ar_eapo/sim_swingup.csv) [plot](https://github.com/dfki-ric-underactuated-lab/real_ai_gym_leaderboard/tree/main/data/acrobot/simulation_v2/ar_eapo/timeseries.png) [video](https://github.com/dfki-ric-underactuated-lab/real_ai_gym_leaderboard/tree/main/data/acrobot/simulation_v2/ar_eapo/sim_video.gif)
[iLQR Riccati Gains](https://github.com/dfki-ric-underactuated-lab/real_ai_gym_leaderboard/tree/main/data/acrobot/simulation_v2/ilqr_riccati_lqr/README.md),Stabilization of iLQR trajectory with Riccati gains. Top stabilization with LQR.,1/1,4.04,10.55,1.98,0.067,106.49,0.396,fwiebe,[data](https://github.com/dfki-ric-underactuated-lab/real_ai_gym_leaderboard/tree/main/data/acrobot/simulation_v2/ilqr_riccati_lqr/sim_swingup.csv) [plot](https://github.com/dfki-ric-underactuated-lab/real_ai_gym_leaderboard/tree/main/data/acrobot/simulation_v2/ilqr_riccati_lqr/timeseries.png) [video](https://github.com/dfki-ric-underactuated-lab/real_ai_gym_leaderboard/tree/main/data/acrobot/simulation_v2/ilqr_riccati_lqr/sim_video.gif)
[evolsac](https://github.com/dfki-ric-underactuated-lab/real_ai_gym_leaderboard/tree/main/data/acrobot/simulation_v2/evolsac/README.md),Evolutionary SAC for both swingup and stabilisation,1/1,0.96,9.26,2.71,0.03,96.56,0.524,AlbertoSinigaglia,[data](https://github.com/dfki-ric-underactuated-lab/real_ai_gym_leaderboard/tree/main/data/acrobot/simulation_v2/evolsac/sim_swingup.csv) [plot](https://github.com/dfki-ric-underactuated-lab/real_ai_gym_leaderboard/tree/main/data/acrobot/simulation_v2/evolsac/timeseries.png) [video](https://github.com/dfki-ric-underactuated-lab/real_ai_gym_leaderboard/tree/main/data/acrobot/simulation_v2/evolsac/sim_video.gif)
[iLQR MPC stabilization](https://github.com/dfki-ric-underactuated-lab/real_ai_gym_leaderboard/tree/main/data/acrobot/simulation_v2/ilqr_ilqrmpc_lqr/README.md),Online optimization with iterative LQR. Stabilization of iLQR trajectory. Top stabilization with LQR.,1/1,4.86,11.54,2.68,0.096,110.4,0.345,fwiebe,[data](https://github.com/dfki-ric-underactuated-lab/real_ai_gym_leaderboard/tree/main/data/acrobot/simulation_v2/ilqr_ilqrmpc_lqr/sim_swingup.csv) [plot](https://github.com/dfki-ric-underactuated-lab/real_ai_gym_leaderboard/tree/main/data/acrobot/simulation_v2/ilqr_ilqrmpc_lqr/timeseries.png) [video](https://github.com/dfki-ric-underactuated-lab/real_ai_gym_leaderboard/tree/main/data/acrobot/simulation_v2/ilqr_ilqrmpc_lqr/sim_video.gif)
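
The leaderboard rows above appear to be the per-controller `scores.csv` values rounded to a fixed number of decimals. The sketch below reproduces the AR-EAPO cells from its `scores.csv` row. The helper `leaderboard_cells`, the `DECIMALS` table, and the reading of `Swingup Success` as a success fraction over `n_trials` are assumptions inferred from comparing the two files in this commit, not the repository's actual tooling.

```python
import csv
import io

# Contents of data/acrobot/simulation_v2/ar_eapo/scores.csv as added in this commit.
SCORES_CSV = (
    "Swingup Success,Swingup Time [s],Energy [J],Torque Cost[N²m²],"
    "Torque Smoothness [Nm],Velocity Cost [m²/s²],RealAI Score\n"
    "1.0,1.394000000000001,8.322843658448338,1.5169970532884,"
    "0.008076883429364525,117.96157720290056,0.6326131852346231\n"
)

# Assumed decimal places per column, inferred from the leaderboard rows.
DECIMALS = {
    "Swingup Time [s]": 2,
    "Energy [J]": 2,
    "Torque Cost[N²m²]": 2,
    "Torque Smoothness [Nm]": 3,
    "Velocity Cost [m²/s²]": 2,
    "RealAI Score": 3,
}


def leaderboard_cells(scores_text, n_trials=1):
    """Turn the single data row of a scores.csv into leaderboard cell strings."""
    row = next(csv.DictReader(io.StringIO(scores_text)))
    # Assumption: 'Swingup Success' is the fraction of successful trials.
    successes = round(float(row["Swingup Success"]) * n_trials)
    cells = [f"{successes}/{n_trials}"]
    for column, places in DECIMALS.items():
        cells.append(str(round(float(row[column]), places)))
    return cells


if __name__ == "__main__":
    print(",".join(leaderboard_cells(SCORES_CSV)))
    # prints: 1/1,1.39,8.32,1.52,0.008,117.96,0.633
```

Using `str(round(...))` rather than a fixed-width format matches entries such as `0.03` and `110.4` in the table, which carry no trailing zeros.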