Update Acrobot Simulation Leaderboard

dfki-ric-underactuated-lab · Sep 3, 2024 · f398b99
1 parent d412ed5
Showing 6 changed files with 5,006 additions and 0 deletions.
diff --git a/data/acrobot/simulation_v2/history_sac/README.md b/data/acrobot/simulation_v2/history_sac/README.md
@@ -0,0 +1,2 @@
+# History-based Soft Actor-Critic
+This controller uses a policy trained on an altered version of the model-free, maximum entropy-based Soft Actor-Critic Reinforcement Learning algorithm [Soft Actor-Critic](https://arxiv.org/abs/1801.01290). The model learns latent dynamics from temporal data.
diff --git a/data/acrobot/simulation_v2/history_sac/scores.csv b/data/acrobot/simulation_v2/history_sac/scores.csv
@@ -0,0 +1,2 @@
+Swingup Success,Swingup Time [s],Energy [J],Torque Cost[N²m²],Torque Smoothness [Nm],Velocity Cost [m²/s²],RealAI Score
+1.0,1.0000000000000007,8.077749161960673,1.8037118196867141,0.009546752841345449,88.4965605525382,0.6552699844147534
diff --git a/data/acrobot/simulation_v2/history_sac/sim_swingup.csv b/data/acrobot/simulation_v2/history_sac/sim_swingup.csv
diff --git a/data/acrobot/simulation_v2/history_sac/sim_video.gif b/data/acrobot/simulation_v2/history_sac/sim_video.gif
diff --git a/data/acrobot/simulation_v2/history_sac/timeseries.png b/data/acrobot/simulation_v2/history_sac/timeseries.png
diff --git a/leaderboards/acrobot_simulation_performance_leaderboard_v2.csv b/leaderboards/acrobot_simulation_performance_leaderboard_v2.csv
@@ -1,5 +1,6 @@
 Controller,Short Controller Description,Swingup Success,Swingup Time [s],Energy [J],Torque Cost[N²m²],Torque Smoothness [Nm],Velocity Cost [m²/s²],RealAI Score,Username,Data
 [mcpilco](https://github.com/dfki-ric-underactuated-lab/real_ai_gym_leaderboard/tree/main/data/acrobot/simulation_v2/mcpilco/README.md),Swingup trained with MBRL algorithm MC-PILCO + stabilization with LQR.,1/1,1.45,19.43,3.22,0.097,253.59,0.316,turcato-niccolo,[data](https://github.com/dfki-ric-underactuated-lab/real_ai_gym_leaderboard/tree/main/data/acrobot/simulation_v2/mcpilco/sim_swingup.csv) [plot](https://github.com/dfki-ric-underactuated-lab/real_ai_gym_leaderboard/tree/main/data/acrobot/simulation_v2/mcpilco/timeseries.png) [video](https://github.com/dfki-ric-underactuated-lab/real_ai_gym_leaderboard/tree/main/data/acrobot/simulation_v2/mcpilco/sim_video.gif)
+[History SAC](https://github.com/dfki-ric-underactuated-lab/real_ai_gym_leaderboard/tree/main/data/acrobot/simulation_v2/history_sac/README.md),SAC using custom model architecture to encode system dynamics.,1/1,1.0,8.08,1.8,0.01,88.5,0.655,tfaust,[data](https://github.com/dfki-ric-underactuated-lab/real_ai_gym_leaderboard/tree/main/data/acrobot/simulation_v2/history_sac/sim_swingup.csv) [plot](https://github.com/dfki-ric-underactuated-lab/real_ai_gym_leaderboard/tree/main/data/acrobot/simulation_v2/history_sac/timeseries.png) [video](https://github.com/dfki-ric-underactuated-lab/real_ai_gym_leaderboard/tree/main/data/acrobot/simulation_v2/history_sac/sim_video.gif)
 [TVLQR](https://github.com/dfki-ric-underactuated-lab/real_ai_gym_leaderboard/tree/main/data/acrobot/simulation_v2/ilqr_tvlqr/README.md),Stabilization of iLQR trajectory with time-varying LQR.,1/1,4.05,10.43,1.87,0.016,105.83,0.504,fwiebe,[data](https://github.com/dfki-ric-underactuated-lab/real_ai_gym_leaderboard/tree/main/data/acrobot/simulation_v2/ilqr_tvlqr/sim_swingup.csv) [plot](https://github.com/dfki-ric-underactuated-lab/real_ai_gym_leaderboard/tree/main/data/acrobot/simulation_v2/ilqr_tvlqr/timeseries.png) [video](https://github.com/dfki-ric-underactuated-lab/real_ai_gym_leaderboard/tree/main/data/acrobot/simulation_v2/ilqr_tvlqr/sim_video.gif)
 [AR-EAPO](https://github.com/dfki-ric-underactuated-lab/real_ai_gym_leaderboard/tree/main/data/acrobot/simulation_v2/ar_eapo/README.md),Policy trained with average reward maximum entropy RL,1/1,1.39,8.32,1.52,0.008,117.96,0.633,rnilva,[data](https://github.com/dfki-ric-underactuated-lab/real_ai_gym_leaderboard/tree/main/data/acrobot/simulation_v2/ar_eapo/sim_swingup.csv) [plot](https://github.com/dfki-ric-underactuated-lab/real_ai_gym_leaderboard/tree/main/data/acrobot/simulation_v2/ar_eapo/timeseries.png) [video](https://github.com/dfki-ric-underactuated-lab/real_ai_gym_leaderboard/tree/main/data/acrobot/simulation_v2/ar_eapo/sim_video.gif)
 [iLQR Riccati Gains](https://github.com/dfki-ric-underactuated-lab/real_ai_gym_leaderboard/tree/main/data/acrobot/simulation_v2/ilqr_riccati_lqr/README.md),Stabilization of iLQR trajectory with Riccati gains. Top stabilization with LQR.,1/1,4.04,10.55,1.98,0.067,106.49,0.396,fwiebe,[data](https://github.com/dfki-ric-underactuated-lab/real_ai_gym_leaderboard/tree/main/data/acrobot/simulation_v2/ilqr_riccati_lqr/sim_swingup.csv) [plot](https://github.com/dfki-ric-underactuated-lab/real_ai_gym_leaderboard/tree/main/data/acrobot/simulation_v2/ilqr_riccati_lqr/timeseries.png) [video](https://github.com/dfki-ric-underactuated-lab/real_ai_gym_leaderboard/tree/main/data/acrobot/simulation_v2/ilqr_riccati_lqr/sim_video.gif)
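
The relationship between the raw `scores.csv` row and the rounded `History SAC` leaderboard row can be checked with a short script. This is only a sketch and not the repository's own tooling: the per-column rounding (`DECIMALS`) is inferred from the values shown in the diff, the `1/1` success format is assumed to mean one successful run out of one trial, and the names `summarize`, `DECIMALS`, and `OTHER_SCORES` are hypothetical.

```python
import csv
import io

# Row added to scores.csv in this commit (copied from the diff above).
SCORES_CSV = """Swingup Success,Swingup Time [s],Energy [J],Torque Cost[N²m²],Torque Smoothness [Nm],Velocity Cost [m²/s²],RealAI Score
1.0,1.0000000000000007,8.077749161960673,1.8037118196867141,0.009546752841345449,88.4965605525382,0.6552699844147534
"""

# Assumption: decimals per column, inferred from the leaderboard rows in the diff.
DECIMALS = {
    "Swingup Time [s]": 2,
    "Energy [J]": 2,
    "Torque Cost[N²m²]": 2,
    "Torque Smoothness [Nm]": 3,
    "Velocity Cost [m²/s²]": 2,
    "RealAI Score": 3,
}


def summarize(scores_csv: str) -> dict:
    """Round the first data row of a scores.csv string to leaderboard precision."""
    row = next(csv.DictReader(io.StringIO(scores_csv)))
    # Assumption: "1/1" means one success out of a single trial.
    summary = {"Swingup Success": f"{int(float(row['Swingup Success']))}/1"}
    for column, digits in DECIMALS.items():
        summary[column] = round(float(row[column]), digits)
    return summary


summary = summarize(SCORES_CSV)

# RealAI Scores of the four existing leaderboard rows, copied from the diff.
OTHER_SCORES = {
    "mcpilco": 0.316,
    "TVLQR": 0.504,
    "AR-EAPO": 0.633,
    "iLQR Riccati Gains": 0.396,
}

# Rank all five controllers by RealAI Score, highest first.
all_scores = {**OTHER_SCORES, "History SAC": summary["RealAI Score"]}
ranking = sorted(all_scores.items(), key=lambda item: item[1], reverse=True)

for name, score in ranking:
    print(f"{name}: {score}")
```

Under these assumptions the rounded values match the added leaderboard row, and `History SAC` has the highest RealAI Score of the five listed controllers.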