Update Pendubot Simulation Performance Leaderboard

dfki-ric-underactuated-lab · Aug 27, 2024 · a2f212c
1 parent 609b43c

Showing 16 changed files with 15,027 additions and 0 deletions.
diff --git a/data/pendubot/simulation_v2/ar_eapo/README.md b/data/pendubot/simulation_v2/ar_eapo/README.md
@@ -0,0 +1,4 @@
+# Average-Reward Entropy Advantage Policy Optimisation Controller
+
+This controller uses a policy trained with Average-Reward Entropy Advantage Policy Optimisation (AR-EAPO). AR-EAPO is an extension of the model-free, maximum-entropy reinforcement learning algorithm [EAPO](https://arxiv.org/abs/2407.18143), applied in an average-reward setting.
+
diff --git a/data/pendubot/simulation_v2/ar_eapo/scores.csv b/data/pendubot/simulation_v2/ar_eapo/scores.csv
@@ -0,0 +1,2 @@
+Swingup Success,Swingup Time [s],Energy [J],Torque Cost[N²m²],Torque Smoothness [Nm],Velocity Cost [m²/s²],RealAI Score
+1.0,1.1500000000000008,7.719128043253907,2.426981034454814,0.010329143078101377,64.32603787855236,0.6588462186786841
diff --git a/data/pendubot/simulation_v2/ar_eapo/sim_swingup.csv b/data/pendubot/simulation_v2/ar_eapo/sim_swingup.csv
diff --git a/data/pendubot/simulation_v2/ar_eapo/sim_video.gif b/data/pendubot/simulation_v2/ar_eapo/sim_video.gif
diff --git a/data/pendubot/simulation_v2/ar_eapo/timeseries.png b/data/pendubot/simulation_v2/ar_eapo/timeseries.png
diff --git a/data/pendubot/simulation_v2/evolsac/README.md b/data/pendubot/simulation_v2/evolsac/README.md
@@ -0,0 +1,11 @@
+# Evolutionary SAC
+
+## Trajectory Learning, Optimization and Stabilization
+
+This controller uses a trajectory dictated by the policy learned in the following way:
+
+  1. SAC training with loose surrogate reward
+  2. SAC training with stricter surrogate reward
+  3. SNES training with challenge reward + injected noise in the action
+
+The controller uses the final policy network.
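Step 3 above presumably refers to SNES in the sense of Separable Natural Evolution Strategies. As a rough illustration of that kind of update, the sketch below runs an SNES loop on a toy quadratic loss in pure Python. The objective, population size, learning rates, and every name here (`snes_minimise`, `sphere`, `start`) are assumptions made for illustration. The challenge reward, the action-noise injection, and the policy parameterisation used by this controller are not part of this commit and are not reproduced.

```python
import math
import random


def snes_minimise(loss, mu, sigma, generations=200, popsize=20, seed=0):
    """Minimise `loss` with a basic Separable NES loop (illustrative only)."""
    rng = random.Random(seed)
    dim = len(mu)
    mu, sigma = list(mu), list(sigma)
    eta_mu = 1.0
    eta_sigma = (3 + math.log(dim)) / (5 * math.sqrt(dim))

    # Rank-based utilities: best candidate first, weights sum to zero.
    raw = [max(0.0, math.log(popsize / 2 + 1) - math.log(k))
           for k in range(1, popsize + 1)]
    total = sum(raw)
    utils = [r / total - 1.0 / popsize for r in raw]

    for _ in range(generations):
        # Sample standard-normal perturbations and the matching candidates.
        noise = [[rng.gauss(0.0, 1.0) for _ in range(dim)]
                 for _ in range(popsize)]
        cands = [[mu[i] + sigma[i] * s[i] for i in range(dim)] for s in noise]

        # Sort candidate indices from lowest to highest loss.
        order = sorted(range(popsize), key=lambda k: loss(cands[k]))

        for i in range(dim):
            g_mu = sum(u * noise[k][i] for u, k in zip(utils, order))
            g_sigma = sum(u * (noise[k][i] ** 2 - 1.0)
                          for u, k in zip(utils, order))
            mu[i] += eta_mu * sigma[i] * g_mu                 # mean update
            sigma[i] *= math.exp(0.5 * eta_sigma * g_sigma)   # step-size update
    return mu, sigma


def sphere(x):
    """Toy objective standing in for a negated reward."""
    return sum(v * v for v in x)


start = [3.0, -2.0]
best, spread = snes_minimise(sphere, start, [1.0, 1.0])
print("loss before:", sphere(start), "loss after:", sphere(best))
```

In a policy-search setting the vector `mu` would hold the policy network weights and `loss` would be the negated episode reward, which is why the method needs no gradients of the reward itself.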
diff --git a/data/pendubot/simulation_v2/evolsac/scores.csv b/data/pendubot/simulation_v2/evolsac/scores.csv
@@ -0,0 +1,2 @@
+Swingup Success,Swingup Time [s],Energy [J],Torque Cost[N²m²],Torque Smoothness [Nm],Velocity Cost [m²/s²],RealAI Score
+1.0,0.7060000000000005,9.827938629838249,4.373127468653795,0.013972766079172845,58.15285978325633,0.5959660563516244
diff --git a/data/pendubot/simulation_v2/evolsac/sim_swingup.csv b/data/pendubot/simulation_v2/evolsac/sim_swingup.csv
diff --git a/data/pendubot/simulation_v2/evolsac/sim_video.gif b/data/pendubot/simulation_v2/evolsac/sim_video.gif
diff --git a/data/pendubot/simulation_v2/evolsac/timeseries.png b/data/pendubot/simulation_v2/evolsac/timeseries.png
diff --git a/data/pendubot/simulation_v2/mcpilco/README.md b/data/pendubot/simulation_v2/mcpilco/README.md
diff --git a/data/pendubot/simulation_v2/mcpilco/scores.csv b/data/pendubot/simulation_v2/mcpilco/scores.csv
@@ -0,0 +1,2 @@
+Swingup Success,Swingup Time [s],Energy [J],Torque Cost[N²m²],Torque Smoothness [Nm],Velocity Cost [m²/s²],RealAI Score
+1.0,1.1380000000000008,8.346170584799614,2.424529125027399,0.05389399646663416,114.2579014428372,0.4797920906741808
diff --git a/data/pendubot/simulation_v2/mcpilco/sim_swingup.csv b/data/pendubot/simulation_v2/mcpilco/sim_swingup.csv
diff --git a/data/pendubot/simulation_v2/mcpilco/sim_video.gif b/data/pendubot/simulation_v2/mcpilco/sim_video.gif
diff --git a/data/pendubot/simulation_v2/mcpilco/timeseries.png b/data/pendubot/simulation_v2/mcpilco/timeseries.png
diff --git a/leaderboards/pendubot_simulation_performance_leaderboard_v2.csv b/leaderboards/pendubot_simulation_performance_leaderboard_v2.csv
@@ -1,4 +1,7 @@
 Controller,Short Controller Description,Swingup Success,Swingup Time [s],Energy [J],Torque Cost[N²m²],Torque Smoothness [Nm],Velocity Cost [m²/s²],RealAI Score,Username,Data
+[mcpilco](https://github.com/dfki-ric-underactuated-lab/real_ai_gym_leaderboard/tree/main/data/pendubot/simulation_v2/mcpilco/README.md),Swingup trained with MBRL algorithm MC-PILCO + stabilization with LQR.,1/1,1.14,8.35,2.42,0.054,114.26,0.48,turcato-niccolo,[data](https://github.com/dfki-ric-underactuated-lab/real_ai_gym_leaderboard/tree/main/data/pendubot/simulation_v2/mcpilco/sim_swingup.csv) [plot](https://github.com/dfki-ric-underactuated-lab/real_ai_gym_leaderboard/tree/main/data/pendubot/simulation_v2/mcpilco/timeseries.png) [video](https://github.com/dfki-ric-underactuated-lab/real_ai_gym_leaderboard/tree/main/data/pendubot/simulation_v2/mcpilco/sim_video.gif)
+[AR-EAPO](https://github.com/dfki-ric-underactuated-lab/real_ai_gym_leaderboard/tree/main/data/pendubot/simulation_v2/ar_eapo/README.md),Policy trained with average reward maximum entropy RL,1/1,1.15,7.72,2.43,0.01,64.33,0.659,rnilva,[data](https://github.com/dfki-ric-underactuated-lab/real_ai_gym_leaderboard/tree/main/data/pendubot/simulation_v2/ar_eapo/sim_swingup.csv) [plot](https://github.com/dfki-ric-underactuated-lab/real_ai_gym_leaderboard/tree/main/data/pendubot/simulation_v2/ar_eapo/timeseries.png) [video](https://github.com/dfki-ric-underactuated-lab/real_ai_gym_leaderboard/tree/main/data/pendubot/simulation_v2/ar_eapo/sim_video.gif)
 [iLQR Riccati Gains](https://github.com/dfki-ric-underactuated-lab/real_ai_gym_leaderboard/tree/main/data/pendubot/simulation_v2/ilqr_riccati_lqr/README.md),Stabilization of iLQR trajectory with Riccati gains. Top stabilization with LQR.,1/1,4.13,9.53,1.25,0.005,211.34,0.536,fwiebe,[data](https://github.com/dfki-ric-underactuated-lab/real_ai_gym_leaderboard/tree/main/data/pendubot/simulation_v2/ilqr_riccati_lqr/sim_swingup.csv) [plot](https://github.com/dfki-ric-underactuated-lab/real_ai_gym_leaderboard/tree/main/data/pendubot/simulation_v2/ilqr_riccati_lqr/timeseries.png) [video](https://github.com/dfki-ric-underactuated-lab/real_ai_gym_leaderboard/tree/main/data/pendubot/simulation_v2/ilqr_riccati_lqr/sim_video.gif)
+[evolsac](https://github.com/dfki-ric-underactuated-lab/real_ai_gym_leaderboard/tree/main/data/pendubot/simulation_v2/evolsac/README.md),Evolutionary SAC for both swingup and stabilisation,1/1,0.71,9.83,4.37,0.014,58.15,0.596,AlbertoSinigaglia,[data](https://github.com/dfki-ric-underactuated-lab/real_ai_gym_leaderboard/tree/main/data/pendubot/simulation_v2/evolsac/sim_swingup.csv) [plot](https://github.com/dfki-ric-underactuated-lab/real_ai_gym_leaderboard/tree/main/data/pendubot/simulation_v2/evolsac/timeseries.png) [video](https://github.com/dfki-ric-underactuated-lab/real_ai_gym_leaderboard/tree/main/data/pendubot/simulation_v2/evolsac/sim_video.gif)
 [TVLQR](https://github.com/dfki-ric-underactuated-lab/real_ai_gym_leaderboard/tree/main/data/pendubot/simulation_v2/ilqr_tvlqr_lqr/README.md),Stabilization of iLQR trajectory with time-varying LQR.,1/1,4.13,9.53,1.26,0.007,211.12,0.526,fwiebe,[data](https://github.com/dfki-ric-underactuated-lab/real_ai_gym_leaderboard/tree/main/data/pendubot/simulation_v2/ilqr_tvlqr_lqr/sim_swingup.csv) [plot](https://github.com/dfki-ric-underactuated-lab/real_ai_gym_leaderboard/tree/main/data/pendubot/simulation_v2/ilqr_tvlqr_lqr/timeseries.png) [video](https://github.com/dfki-ric-underactuated-lab/real_ai_gym_leaderboard/tree/main/data/pendubot/simulation_v2/ilqr_tvlqr_lqr/sim_video.gif)
 [iLQR MPC stabilization](https://github.com/dfki-ric-underactuated-lab/real_ai_gym_leaderboard/tree/main/data/pendubot/simulation_v2/ilqr_ilqrmpc_lqr/README.md),Online optimization with iterative LQR. Stabilization of iLQR trajectory. Top stabilization with LQR.,1/1,4.12,9.91,1.77,0.083,211.98,0.353,fwiebe,[data](https://github.com/dfki-ric-underactuated-lab/real_ai_gym_leaderboard/tree/main/data/pendubot/simulation_v2/ilqr_ilqrmpc_lqr/sim_swingup.csv) [plot](https://github.com/dfki-ric-underactuated-lab/real_ai_gym_leaderboard/tree/main/data/pendubot/simulation_v2/ilqr_ilqrmpc_lqr/timeseries.png) [video](https://github.com/dfki-ric-underactuated-lab/real_ai_gym_leaderboard/tree/main/data/pendubot/simulation_v2/ilqr_ilqrmpc_lqr/sim_video.gif)
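The rounded RealAI Score values in the three added leaderboard rows can be cross-checked against the raw `scores.csv` rows from this commit. The following is a minimal sketch, assuming the three raw rows are pasted inline. The extra `Controller` column, the `RAW_SCORES` constant, and the `rank_by_score` helper are hypothetical and not part of the repository.

```python
import csv
import io

# Raw rows copied from the three scores.csv files added in this commit.
# The leading "Controller" column is added here for illustration only.
RAW_SCORES = """Controller,Swingup Success,Swingup Time [s],Energy [J],Torque Cost[N²m²],Torque Smoothness [Nm],Velocity Cost [m²/s²],RealAI Score
ar_eapo,1.0,1.1500000000000008,7.719128043253907,2.426981034454814,0.010329143078101377,64.32603787855236,0.6588462186786841
evolsac,1.0,0.7060000000000005,9.827938629838249,4.373127468653795,0.013972766079172845,58.15285978325633,0.5959660563516244
mcpilco,1.0,1.1380000000000008,8.346170584799614,2.424529125027399,0.05389399646663416,114.2579014428372,0.4797920906741808
"""


def rank_by_score(text):
    """Return (controller, score rounded to 3 decimals), best score first."""
    rows = list(csv.DictReader(io.StringIO(text)))
    rows.sort(key=lambda r: float(r["RealAI Score"]), reverse=True)
    return [(r["Controller"], round(float(r["RealAI Score"]), 3)) for r in rows]


ranking = rank_by_score(RAW_SCORES)
for name, score in ranking:
    print(f"{name}: {score}")
```

Among the three new entries this places AR-EAPO first, evolsac second, and mcpilco third, matching the rounded scores 0.659, 0.596, and 0.48 shown in the leaderboard rows.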