Commit 005ec17: merged master

DenSumy committed Nov 24, 2023, merging 2 parents 825bdb1 + a5d788a.
Showing 176 changed files with 4,709 additions and 1,956 deletions.
265 changes: 163 additions & 102 deletions README.md


32 changes: 32 additions & 0 deletions docs/DEEPMIND_ENVPOOL.md
@@ -0,0 +1,32 @@
# Deepmind Control (https://github.com/deepmind/dm_control)

* I could not find any existing PPO benchmark for deepmind_control, so this is a first version only; it will be updated later.

## How to run:
* **Humanoid (Stand, Walk or Run)**
```
poetry install -E envpool
poetry run pip install torch==1.11.0+cu113 torchvision==0.12.0+cu113 -f https://download.pytorch.org/whl/cu113/torch_stable.html
poetry run python runner.py --train --file rl_games/configs/dm_control/humanoid_walk.yaml
```

## Results:

* No tuning; I just ran it on a couple of envs.
* I used 4000 epochs (~32M steps) for almost all envs except Humanoid Run, but a few million steps were enough for most of the envs.
* DeepMind used fairly unusual reward and training rules. A simple reward transformation, log(reward + 1), achieves the best scores faster.
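
The log-reward idea above can be sketched as a small Gym-style wrapper (a minimal illustration, assuming a Gym-style `step` API; `LogRewardWrapper` and the environment are hypothetical names, not code from this repo):

```python
import math

def transform_reward(reward):
    """Compress the raw per-step reward with log(reward + 1).

    dm_control rewards are bounded per step, so reward + 1 >= 1
    and the transformed reward stays non-negative.
    """
    return math.log(reward + 1.0)

class LogRewardWrapper:
    """Gym-style wrapper applying the transformation on every step."""

    def __init__(self, env):
        self.env = env

    def reset(self, **kwargs):
        return self.env.reset(**kwargs)

    def step(self, action):
        obs, reward, done, info = self.env.step(action)
        return obs, transform_reward(reward), done, info
```

Because log(x + 1) is concave, the transformation boosts small rewards relative to large ones, which flattens the otherwise steep reward landscape during early training.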

| Env | Rewards |
| ------------- | ------------- |
| Ball In Cup Catch | 938 |
| Cartpole Balance | 988 |
| Cheetah Run | 685 |
| Fish Swim | 600 |
| Hopper Stand | 557 |
| Humanoid Stand | 653 |
| Humanoid Walk | 621 |
| Humanoid Run | 200 |
| Pendulum Swingup | 706 |
| Walker Stand | 907 |
| Walker Walk | 917 |
| Walker Run | 702 |
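
Training can also be driven from Python instead of the `runner.py` CLI, as the notebooks in this commit do. A minimal sketch of that pattern (the config dict below is an illustrative stub, not a full rl_games config; the commented `Runner` calls mirror the notebook cells and assume rl_games is installed):

```python
# Build/override a nested config dict, as the notebook cells do.
# rl_games YAML configs nest settings under 'params' -> 'config'.
config = {'params': {'config': {}}}
config['params']['config']['full_experiment_name'] = 'humanoid_walk'
config['params']['config']['max_epochs'] = 4000  # ~32M steps, per the notes above

# With rl_games installed, the dict is handed to the Runner
# (these calls mirror the notebook cells in this commit):
# from rl_games.torch_runner import Runner
# runner = Runner()
# runner.load(config)
# agent = runner.create_player()
# agent.restore('runs/humanoid_walk/nn/checkpoint.pth')  # hypothetical checkpoint path
```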
492 changes: 24 additions & 468 deletions notebooks/brax_training.ipynb


35 changes: 9 additions & 26 deletions notebooks/mujoco_envpool_training.ipynb
@@ -44,33 +44,27 @@
"metadata": {},
"outputs": [],
"source": [
"!nvidia-smi -L"
"!pip show rl-games"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"id": "6qvHCGgpxrvZ"
},
"metadata": {},
"outputs": [],
"source": [
"%load_ext tensorboard"
"!nvidia-smi -L"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"colab": {
"base_uri": "https://localhost:8080/"
},
"id": "GFv1FDtJyC0z",
"outputId": "4082ccf2-139d-415a-c832-8b39f622e899"
"id": "6qvHCGgpxrvZ"
},
"outputs": [],
"source": [
"!pip show rl-games"
"%load_ext tensorboard"
]
},
{
@@ -367,17 +361,6 @@
"%tensorboard --logdir 'runs/'"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"id": "fyvlWdM_abGR"
},
"outputs": [],
"source": [
"from rl_games.torch_runner import Runner"
]
},
{
"cell_type": "code",
"execution_count": null,
@@ -500,9 +483,10 @@
"outputs": [],
"source": [
"import yaml\n",
"from rl_games.torch_runner import Runner\n",
"\n",
"config = walker_config\n",
"config['params']['config']['full_experiment_name'] = 'mujoco'\n",
"config['params']['config']['full_experiment_name'] = 'Walker2d_mujoco'\n",
"config['params']['config']['max_epochs'] = 500\n",
"config['params']['config']['horizon_length'] = 512\n",
"config['params']['config']['num_actors'] = 8\n",
@@ -531,11 +515,10 @@
"config = player_walker_config\n",
"config['params']['config']['player']['render'] = False\n",
"config['params']['config']['player']['games_num'] = 2\n",
" \n",
"runner = Runner()\n",
"\n",
"runner.load(config)\n",
"agent = runner.create_player()\n",
"agent.restore('runs/mujoco/nn/Walker2d-v4.pth')"
"agent.restore('runs/Walker2d_mujoco/nn/Walker2d-v4.pth')"
]
},
{
180 changes: 0 additions & 180 deletions notebooks/train_and_export_onnx_example.ipynb

This file was deleted.
