Update Acrobot Simulation Leaderboard
fwiebe committed Sep 3, 2024
1 parent d412ed5 commit f398b99
Showing 6 changed files with 5,006 additions and 0 deletions.
2 changes: 2 additions & 0 deletions data/acrobot/simulation_v2/history_sac/README.md
@@ -0,0 +1,2 @@
# History-based Soft Actor-Critic
This controller uses a policy trained with a modified version of [Soft Actor-Critic](https://arxiv.org/abs/1801.01290), a model-free, maximum-entropy reinforcement learning algorithm. The model learns latent dynamics from temporal data.
2 changes: 2 additions & 0 deletions data/acrobot/simulation_v2/history_sac/scores.csv
@@ -0,0 +1,2 @@
Swingup Success,Swingup Time [s],Energy [J],Torque Cost[N²m²],Torque Smoothness [Nm],Velocity Cost [m²/s²],RealAI Score
1.0,1.0000000000000007,8.077749161960673,1.8037118196867141,0.009546752841345449,88.4965605525382,0.6552699844147534
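
The raw values in `scores.csv` appear in rounded form in the History SAC leaderboard row of this commit. The sketch below shows one way that mapping could be reproduced. The per-column precision in `DECIMALS` and the helper name `summarize` are assumptions inferred from the visible leaderboard rows, not taken from the repository's own tooling.

```python
import csv
import io

# Contents of scores.csv exactly as added in this commit.
SCORES_CSV = """Swingup Success,Swingup Time [s],Energy [J],Torque Cost[N²m²],Torque Smoothness [Nm],Velocity Cost [m²/s²],RealAI Score
1.0,1.0000000000000007,8.077749161960673,1.8037118196867141,0.009546752841345449,88.4965605525382,0.6552699844147534
"""

# Assumed display precision per column, inferred from the leaderboard rows
# (e.g. 0.097 for smoothness, 0.316 for the score); hypothetical, not official.
DECIMALS = {
    "Swingup Time [s]": 2,
    "Energy [J]": 2,
    "Torque Cost[N²m²]": 2,
    "Torque Smoothness [Nm]": 3,
    "Velocity Cost [m²/s²]": 2,
    "RealAI Score": 3,
}


def summarize(csv_text):
    """Read the single data row and round each metric for display."""
    row = next(csv.DictReader(io.StringIO(csv_text)))
    return {col: round(float(row[col]), nd) for col, nd in DECIMALS.items()}


summary = summarize(SCORES_CSV)
for column, value in summary.items():
    print(f"{column}: {value}")
```

Under these assumptions the rounded values agree with the History SAC entry in the leaderboard file: 1.0, 8.08, 1.8, 0.01, 88.5 and 0.655.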
5,001 changes: 5,001 additions & 0 deletions data/acrobot/simulation_v2/history_sac/sim_swingup.csv

Large diffs are not rendered by default.

@@ -1,5 +1,6 @@
Controller,Short Controller Description,Swingup Success,Swingup Time [s],Energy [J],Torque Cost[N²m²],Torque Smoothness [Nm],Velocity Cost [m²/s²],RealAI Score,Username,Data
[mcpilco](https://github.com/dfki-ric-underactuated-lab/real_ai_gym_leaderboard/tree/main/data/acrobot/simulation_v2/mcpilco/README.md),Swingup trained with MBRL algorithm MC-PILCO + stabilization with LQR.,1/1,1.45,19.43,3.22,0.097,253.59,0.316,turcato-niccolo,[data](https://github.com/dfki-ric-underactuated-lab/real_ai_gym_leaderboard/tree/main/data/acrobot/simulation_v2/mcpilco/sim_swingup.csv) [plot](https://github.com/dfki-ric-underactuated-lab/real_ai_gym_leaderboard/tree/main/data/acrobot/simulation_v2/mcpilco/timeseries.png) [video](https://github.com/dfki-ric-underactuated-lab/real_ai_gym_leaderboard/tree/main/data/acrobot/simulation_v2/mcpilco/sim_video.gif)
[History SAC](https://github.com/dfki-ric-underactuated-lab/real_ai_gym_leaderboard/tree/main/data/acrobot/simulation_v2/history_sac/README.md),SAC using custom model architecture to encode system dynamics.,1/1,1.0,8.08,1.8,0.01,88.5,0.655,tfaust,[data](https://github.com/dfki-ric-underactuated-lab/real_ai_gym_leaderboard/tree/main/data/acrobot/simulation_v2/history_sac/sim_swingup.csv) [plot](https://github.com/dfki-ric-underactuated-lab/real_ai_gym_leaderboard/tree/main/data/acrobot/simulation_v2/history_sac/timeseries.png) [video](https://github.com/dfki-ric-underactuated-lab/real_ai_gym_leaderboard/tree/main/data/acrobot/simulation_v2/history_sac/sim_video.gif)
[TVLQR](https://github.com/dfki-ric-underactuated-lab/real_ai_gym_leaderboard/tree/main/data/acrobot/simulation_v2/ilqr_tvlqr/README.md),Stabilization of iLQR trajectory with time-varying LQR.,1/1,4.05,10.43,1.87,0.016,105.83,0.504,fwiebe,[data](https://github.com/dfki-ric-underactuated-lab/real_ai_gym_leaderboard/tree/main/data/acrobot/simulation_v2/ilqr_tvlqr/sim_swingup.csv) [plot](https://github.com/dfki-ric-underactuated-lab/real_ai_gym_leaderboard/tree/main/data/acrobot/simulation_v2/ilqr_tvlqr/timeseries.png) [video](https://github.com/dfki-ric-underactuated-lab/real_ai_gym_leaderboard/tree/main/data/acrobot/simulation_v2/ilqr_tvlqr/sim_video.gif)
[AR-EAPO](https://github.com/dfki-ric-underactuated-lab/real_ai_gym_leaderboard/tree/main/data/acrobot/simulation_v2/ar_eapo/README.md),Policy trained with average reward maximum entropy RL,1/1,1.39,8.32,1.52,0.008,117.96,0.633,rnilva,[data](https://github.com/dfki-ric-underactuated-lab/real_ai_gym_leaderboard/tree/main/data/acrobot/simulation_v2/ar_eapo/sim_swingup.csv) [plot](https://github.com/dfki-ric-underactuated-lab/real_ai_gym_leaderboard/tree/main/data/acrobot/simulation_v2/ar_eapo/timeseries.png) [video](https://github.com/dfki-ric-underactuated-lab/real_ai_gym_leaderboard/tree/main/data/acrobot/simulation_v2/ar_eapo/sim_video.gif)
[iLQR Riccati Gains](https://github.com/dfki-ric-underactuated-lab/real_ai_gym_leaderboard/tree/main/data/acrobot/simulation_v2/ilqr_riccati_lqr/README.md),Stabilization of iLQR trajectory with Riccati gains. Top stabilization with LQR.,1/1,4.04,10.55,1.98,0.067,106.49,0.396,fwiebe,[data](https://github.com/dfki-ric-underactuated-lab/real_ai_gym_leaderboard/tree/main/data/acrobot/simulation_v2/ilqr_riccati_lqr/sim_swingup.csv) [plot](https://github.com/dfki-ric-underactuated-lab/real_ai_gym_leaderboard/tree/main/data/acrobot/simulation_v2/ilqr_riccati_lqr/timeseries.png) [video](https://github.com/dfki-ric-underactuated-lab/real_ai_gym_leaderboard/tree/main/data/acrobot/simulation_v2/ilqr_riccati_lqr/sim_video.gif)
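
The rows in this hunk are not ordered by score. As a rough illustration, the sketch below ranks only the five controllers visible here by their RealAI Score, assuming that a higher score is better. The names `VISIBLE_SCORES` and `rank` are hypothetical, and the full leaderboard file may contain further rows that this hunk does not show.

```python
# RealAI Scores copied from the leaderboard rows visible in this hunk.
VISIBLE_SCORES = {
    "mcpilco": 0.316,
    "History SAC": 0.655,
    "TVLQR": 0.504,
    "AR-EAPO": 0.633,
    "iLQR Riccati Gains": 0.396,
}


def rank(scores):
    """Return (controller, score) pairs sorted from highest to lowest score."""
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


ranking = rank(VISIBLE_SCORES)
for place, (name, score) in enumerate(ranking, start=1):
    print(f"{place}. {name}: {score:.3f}")
```

Among these five rows, the newly added History SAC entry has the highest score (0.655), followed by AR-EAPO (0.633).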