Update Pendubot Simulation Performance Leaderboard
fwiebe committed Aug 27, 2024
1 parent 609b43c commit a2f212c
Showing 16 changed files with 15,027 additions and 0 deletions.
4 changes: 4 additions & 0 deletions data/pendubot/simulation_v2/ar_eapo/README.md
@@ -0,0 +1,4 @@
# Average-Reward Entropy Advantage Policy Optimisation Controller

This controller uses a policy trained with Average-Reward Entropy Advantage Policy Optimisation (AR-EAPO). AR-EAPO extends the model-free, maximum-entropy reinforcement learning algorithm [EAPO](https://arxiv.org/abs/2407.18143) to the average-reward setting.
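To make "average-reward setting" concrete, here is a minimal sketch of generalized advantage estimation without a discount factor, where the reward rate is subtracted from each reward (differential TD errors). It is a generic illustration, not the AR-EAPO implementation: the function name is hypothetical, the reward rate is crudely estimated as the batch mean, and the entropy term is simply folded into the reward, which does not reproduce EAPO's own advantage decomposition.

```python
import numpy as np


def differential_gae(rewards, values, next_values, entropies=None, tau=0.0, lam=0.95):
    """Average-reward (undiscounted) GAE sketch; returns (advantages, rho)."""
    r = np.asarray(rewards, dtype=float)
    if entropies is not None:
        # Assumption: maximum-entropy variant via a temperature-scaled entropy bonus.
        r = r + tau * np.asarray(entropies, dtype=float)
    v = np.asarray(values, dtype=float)
    v_next = np.asarray(next_values, dtype=float)
    rho = r.mean()  # crude estimate of the average reward rate
    deltas = r - rho + v_next - v  # differential TD errors (no discount factor)
    advantages = np.zeros_like(deltas)
    running = 0.0
    for t in reversed(range(len(deltas))):
        running = deltas[t] + lam * running  # lambda-weighted sum of future TD errors
        advantages[t] = running
    return advantages, rho
```

Subtracting the reward rate instead of discounting is what keeps the advantages bounded in a continuing task such as holding the pendubot upright.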

2 changes: 2 additions & 0 deletions data/pendubot/simulation_v2/ar_eapo/scores.csv
@@ -0,0 +1,2 @@
Swingup Success,Swingup Time [s],Energy [J],Torque Cost[N²m²],Torque Smoothness [Nm],Velocity Cost [m²/s²],RealAI Score
1.0,1.1500000000000008,7.719128043253907,2.426981034454814,0.010329143078101377,64.32603787855236,0.6588462186786841
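As a worked check on how these raw values relate to the rounded AR-EAPO row in the leaderboard, the sketch below rounds each column of `scores.csv`. The per-column precisions are inferred from the published rows, the "1/1" format assumes a single simulated run, and `to_leaderboard_cells` is a hypothetical helper; the script the leaderboard actually uses is not part of this commit.

```python
import csv
import io

# Assumption: decimal places per column, inferred from the leaderboard rows.
PRECISION = {
    "Swingup Time [s]": 2,
    "Energy [J]": 2,
    "Torque Cost[N²m²]": 2,
    "Torque Smoothness [Nm]": 3,
    "Velocity Cost [m²/s²]": 2,
    "RealAI Score": 3,
}


def to_leaderboard_cells(scores_csv_text, n_runs=1):
    """Turn the single data row of a scores.csv into rounded leaderboard cells."""
    row = next(csv.DictReader(io.StringIO(scores_csv_text)))
    success = float(row["Swingup Success"])
    cells = [f"{round(success * n_runs)}/{n_runs}"]  # e.g. "1/1"
    for column, digits in PRECISION.items():
        cells.append(str(round(float(row[column]), digits)))
    return cells


scores = (
    "Swingup Success,Swingup Time [s],Energy [J],Torque Cost[N²m²],"
    "Torque Smoothness [Nm],Velocity Cost [m²/s²],RealAI Score\n"
    "1.0,1.1500000000000008,7.719128043253907,2.426981034454814,"
    "0.010329143078101377,64.32603787855236,0.6588462186786841\n"
)
print(to_leaderboard_cells(scores))
# ['1/1', '1.15', '7.72', '2.43', '0.01', '64.33', '0.659']
```

The printed cells match the AR-EAPO leaderboard entry (1/1, 1.15, 7.72, 2.43, 0.01, 64.33, 0.659).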
5,001 changes: 5,001 additions & 0 deletions data/pendubot/simulation_v2/ar_eapo/sim_swingup.csv

Large diffs are not rendered by default.

11 changes: 11 additions & 0 deletions data/pendubot/simulation_v2/evolsac/README.md
@@ -0,0 +1,11 @@
# Evolutionary SAC

## Trajectory Learning, Optimization and Stabilization

This controller follows a trajectory generated by a policy that was learned in three stages:

1. SAC training with a loose surrogate reward
2. SAC training with a stricter surrogate reward
3. SNES training with the challenge reward and noise injected into the actions

The controller uses the final policy network.
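Stage 3 refines the policy with SNES (Separable Natural Evolution Strategies). The sketch below shows a standard SNES loop with rank-based utilities on a toy problem. The quadratic objective and `TARGET` are hypothetical stand-ins for the challenge reward and the policy-network weights, and the Gaussian noise added to the "action" only illustrates the idea of noise injection; it is not the evolsac training code.

```python
import numpy as np


def snes_utilities(popsize):
    """Rank-based fitness shaping used by NES; index 0 is the best sample."""
    ranks = np.arange(1, popsize + 1)
    raw = np.maximum(0.0, np.log(popsize / 2 + 1) - np.log(ranks))
    return raw / raw.sum() - 1.0 / popsize


def snes_maximize(fitness, mu, sigma, iterations=500, seed=0):
    """SNES with a diagonal Gaussian search distribution N(mu, diag(sigma^2))."""
    rng = np.random.default_rng(seed)
    mu = np.asarray(mu, dtype=float).copy()
    sigma = np.asarray(sigma, dtype=float).copy()
    d = mu.size
    popsize = 4 + int(3 * np.log(d))  # default population size
    eta_mu = 1.0
    eta_sigma = (3 + np.log(d)) / (5 * np.sqrt(d))  # default step size for sigma
    utilities = snes_utilities(popsize)
    for _ in range(iterations):
        z = rng.standard_normal((popsize, d))
        samples = mu + sigma * z
        scores = np.array([fitness(x, rng) for x in samples])
        z_sorted = z[np.argsort(-scores)]  # best sample first
        grad_mu = utilities @ z_sorted
        grad_sigma = utilities @ (z_sorted ** 2 - 1)
        mu = mu + eta_mu * sigma * grad_mu
        sigma = sigma * np.exp(0.5 * eta_sigma * grad_sigma)
    return mu, sigma


# Hypothetical toy objective: the optimum is TARGET, and each evaluated
# "action" is perturbed by injected Gaussian noise.
TARGET = np.array([1.0, -2.0, 0.5])


def noisy_fitness(params, rng, noise_std=0.01):
    action = params + noise_std * rng.standard_normal(params.shape)
    return -np.sum((action - TARGET) ** 2)


mu, sigma = snes_maximize(noisy_fitness, mu=np.zeros(3), sigma=np.ones(3))
print(mu)
```

Because SNES only ranks whole-episode returns, it can optimize a reward that is not differentiable or is corrupted by noise, which is one plausible reason to use it as a final fine-tuning stage after SAC.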
2 changes: 2 additions & 0 deletions data/pendubot/simulation_v2/evolsac/scores.csv
@@ -0,0 +1,2 @@
Swingup Success,Swingup Time [s],Energy [J],Torque Cost[N²m²],Torque Smoothness [Nm],Velocity Cost [m²/s²],RealAI Score
1.0,0.7060000000000005,9.827938629838249,4.373127468653795,0.013972766079172845,58.15285978325633,0.5959660563516244
5,001 changes: 5,001 additions & 0 deletions data/pendubot/simulation_v2/evolsac/sim_swingup.csv

Large diffs are not rendered by default.

Empty file.
2 changes: 2 additions & 0 deletions data/pendubot/simulation_v2/mcpilco/scores.csv
@@ -0,0 +1,2 @@
Swingup Success,Swingup Time [s],Energy [J],Torque Cost[N²m²],Torque Smoothness [Nm],Velocity Cost [m²/s²],RealAI Score
1.0,1.1380000000000008,8.346170584799614,2.424529125027399,0.05389399646663416,114.2579014428372,0.4797920906741808
5,001 changes: 5,001 additions & 0 deletions data/pendubot/simulation_v2/mcpilco/sim_swingup.csv

Large diffs are not rendered by default.

@@ -1,4 +1,7 @@
Controller,Short Controller Description,Swingup Success,Swingup Time [s],Energy [J],Torque Cost[N²m²],Torque Smoothness [Nm],Velocity Cost [m²/s²],RealAI Score,Username,Data
[mcpilco](https://github.com/dfki-ric-underactuated-lab/real_ai_gym_leaderboard/tree/main/data/pendubot/simulation_v2/mcpilco/README.md),Swingup trained with MBRL algorithm MC-PILCO + stabilization with LQR.,1/1,1.14,8.35,2.42,0.054,114.26,0.48,turcato-niccolo,[data](https://github.com/dfki-ric-underactuated-lab/real_ai_gym_leaderboard/tree/main/data/pendubot/simulation_v2/mcpilco/sim_swingup.csv) [plot](https://github.com/dfki-ric-underactuated-lab/real_ai_gym_leaderboard/tree/main/data/pendubot/simulation_v2/mcpilco/timeseries.png) [video](https://github.com/dfki-ric-underactuated-lab/real_ai_gym_leaderboard/tree/main/data/pendubot/simulation_v2/mcpilco/sim_video.gif)
[AR-EAPO](https://github.com/dfki-ric-underactuated-lab/real_ai_gym_leaderboard/tree/main/data/pendubot/simulation_v2/ar_eapo/README.md),Policy trained with average reward maximum entropy RL,1/1,1.15,7.72,2.43,0.01,64.33,0.659,rnilva,[data](https://github.com/dfki-ric-underactuated-lab/real_ai_gym_leaderboard/tree/main/data/pendubot/simulation_v2/ar_eapo/sim_swingup.csv) [plot](https://github.com/dfki-ric-underactuated-lab/real_ai_gym_leaderboard/tree/main/data/pendubot/simulation_v2/ar_eapo/timeseries.png) [video](https://github.com/dfki-ric-underactuated-lab/real_ai_gym_leaderboard/tree/main/data/pendubot/simulation_v2/ar_eapo/sim_video.gif)
[iLQR Riccati Gains](https://github.com/dfki-ric-underactuated-lab/real_ai_gym_leaderboard/tree/main/data/pendubot/simulation_v2/ilqr_riccati_lqr/README.md),Stabilization of iLQR trajectory with Riccati gains. Top stabilization with LQR.,1/1,4.13,9.53,1.25,0.005,211.34,0.536,fwiebe,[data](https://github.com/dfki-ric-underactuated-lab/real_ai_gym_leaderboard/tree/main/data/pendubot/simulation_v2/ilqr_riccati_lqr/sim_swingup.csv) [plot](https://github.com/dfki-ric-underactuated-lab/real_ai_gym_leaderboard/tree/main/data/pendubot/simulation_v2/ilqr_riccati_lqr/timeseries.png) [video](https://github.com/dfki-ric-underactuated-lab/real_ai_gym_leaderboard/tree/main/data/pendubot/simulation_v2/ilqr_riccati_lqr/sim_video.gif)
[evolsac](https://github.com/dfki-ric-underactuated-lab/real_ai_gym_leaderboard/tree/main/data/pendubot/simulation_v2/evolsac/README.md),Evolutionary SAC for both swingup and stabilisation,1/1,0.71,9.83,4.37,0.014,58.15,0.596,AlbertoSinigaglia,[data](https://github.com/dfki-ric-underactuated-lab/real_ai_gym_leaderboard/tree/main/data/pendubot/simulation_v2/evolsac/sim_swingup.csv) [plot](https://github.com/dfki-ric-underactuated-lab/real_ai_gym_leaderboard/tree/main/data/pendubot/simulation_v2/evolsac/timeseries.png) [video](https://github.com/dfki-ric-underactuated-lab/real_ai_gym_leaderboard/tree/main/data/pendubot/simulation_v2/evolsac/sim_video.gif)
[TVLQR](https://github.com/dfki-ric-underactuated-lab/real_ai_gym_leaderboard/tree/main/data/pendubot/simulation_v2/ilqr_tvlqr_lqr/README.md),Stabilization of iLQR trajectory with time-varying LQR.,1/1,4.13,9.53,1.26,0.007,211.12,0.526,fwiebe,[data](https://github.com/dfki-ric-underactuated-lab/real_ai_gym_leaderboard/tree/main/data/pendubot/simulation_v2/ilqr_tvlqr_lqr/sim_swingup.csv) [plot](https://github.com/dfki-ric-underactuated-lab/real_ai_gym_leaderboard/tree/main/data/pendubot/simulation_v2/ilqr_tvlqr_lqr/timeseries.png) [video](https://github.com/dfki-ric-underactuated-lab/real_ai_gym_leaderboard/tree/main/data/pendubot/simulation_v2/ilqr_tvlqr_lqr/sim_video.gif)
[iLQR MPC stabilization](https://github.com/dfki-ric-underactuated-lab/real_ai_gym_leaderboard/tree/main/data/pendubot/simulation_v2/ilqr_ilqrmpc_lqr/README.md),Online optimization with iterative LQR. Stabilization of iLQR trajectory. Top stabilization with LQR.,1/1,4.12,9.91,1.77,0.083,211.98,0.353,fwiebe,[data](https://github.com/dfki-ric-underactuated-lab/real_ai_gym_leaderboard/tree/main/data/pendubot/simulation_v2/ilqr_ilqrmpc_lqr/sim_swingup.csv) [plot](https://github.com/dfki-ric-underactuated-lab/real_ai_gym_leaderboard/tree/main/data/pendubot/simulation_v2/ilqr_ilqrmpc_lqr/timeseries.png) [video](https://github.com/dfki-ric-underactuated-lab/real_ai_gym_leaderboard/tree/main/data/pendubot/simulation_v2/ilqr_ilqrmpc_lqr/sim_video.gif)
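The leaderboard file does not state how its rows are ordered. One way to read it is to sort by RealAI Score, as in this minimal sketch. The helper names are hypothetical, and the demo table keeps only the Controller and RealAI Score columns copied from the rows above; the same function also works on the full CSV because it only looks up those two columns.

```python
import csv
import io
import re

LINK = re.compile(r"^\[([^\]]+)\]\(")  # matches the "[name](" prefix of a markdown link


def controller_name(cell):
    """Reduce a markdown link such as [name](url) to its link text."""
    match = LINK.match(cell)
    return match.group(1) if match else cell


def rank_by_realai(csv_text):
    """Return (controller, RealAI Score) pairs, best score first."""
    rows = csv.DictReader(io.StringIO(csv_text))
    ranked = [(controller_name(r["Controller"]), float(r["RealAI Score"])) for r in rows]
    return sorted(ranked, key=lambda item: item[1], reverse=True)


table = """Controller,RealAI Score
mcpilco,0.48
AR-EAPO,0.659
iLQR Riccati Gains,0.536
evolsac,0.596
TVLQR,0.526
iLQR MPC stabilization,0.353
"""
for name, score in rank_by_realai(table):
    print(f"{score:.3f}  {name}")
```

With these values AR-EAPO (0.659) ranks first, followed by evolsac (0.596), while iLQR MPC stabilization (0.353) ranks last.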
