Hi! I'm having trouble reproducing the results for the single-environment run_and_gun scenario. I run `python run_single.py --scenario run_and_gun` and get the following output (after 15 epochs, the training success is still 0):
2024-04-10 07:32:16.450011: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA
To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags.
2024-04-10 07:32:16.630269: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /usr/lib64-nvidia
2024-04-10 07:32:16.630319: I tensorflow/compiler/xla/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine.
2024-04-10 07:32:17.626895: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /usr/lib64-nvidia
2024-04-10 07:32:17.627104: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /usr/lib64-nvidia
2024-04-10 07:32:17.627124: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly.
2024-04-10 07:32:26.142719: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /usr/local/lib/python3.10/dist-packages/cv2/../../lib64:/usr/lib64-nvidia
2024-04-10 07:32:26.142959: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcublas.so.11'; dlerror: libcublas.so.11: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /usr/local/lib/python3.10/dist-packages/cv2/../../lib64:/usr/lib64-nvidia
2024-04-10 07:32:26.143142: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcublasLt.so.11'; dlerror: libcublasLt.so.11: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /usr/local/lib/python3.10/dist-packages/cv2/../../lib64:/usr/lib64-nvidia
2024-04-10 07:32:26.143322: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcufft.so.10'; dlerror: libcufft.so.10: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /usr/local/lib/python3.10/dist-packages/cv2/../../lib64:/usr/lib64-nvidia
2024-04-10 07:32:26.765469: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcusparse.so.11'; dlerror: libcusparse.so.11: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /usr/local/lib/python3.10/dist-packages/cv2/../../lib64:/usr/lib64-nvidia
2024-04-10 07:32:26.769994: W tensorflow/core/common_runtime/gpu/gpu_device.cc:1934] Cannot dlopen some GPU libraries. Please make sure the missing libraries mentioned above are installed properly if you would like to use GPU. Follow the guide at https://www.tensorflow.org/install/gpu for how to download and setup the required libraries for your platform.
Skipping registering GPU devices...
2024-04-10 07:32:26.770554: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA
To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags.
Saving config:
{
"activation": "lrelu",
"agent_policy_exploration": false,
"alpha": "auto",
"augment": false,
"augmentation": null,
"batch_size": 128,
"buffer_type": "fifo",
"cl_method": null,
"cl_reg_coef": 0.0,
"clipnorm": null,
"envs": [
"default"
],
"episodic_batch_size": 0,
"episodic_mem_per_task": 0,
"episodic_memory_from_buffer": true,
"exploration_kind": null,
"frame_height": 84,
"frame_skip": 4,
"frame_stack": 4,
"frame_width": 84,
"gamma": 0.99,
"gpu": null,
"group_id": "default_group",
"hidden_sizes": [
256,
256
],
"hide_task_id": false,
"log_every": 1000,
"logger_output": [
"tsv",
"tensorboard"
],
"lr": 0.001,
"lr_decay": "linear",
"lr_decay_rate": 0.1,
"lr_decay_steps": 100000,
"model_path": null,
"multihead_archs": true,
"n_updates": 50,
"no_test": false,
"num_repeats": 1,
"packnet_retrain_steps": 0,
"penalty_ammo_used": -0.1,
"penalty_death": -1.0,
"penalty_health_dtc": -1.0,
"penalty_health_has": -5.0,
"penalty_health_hg": -0.01,
"penalty_lava": -0.1,
"penalty_passivity": -0.1,
"penalty_projectile": -0.01,
"random_order": false,
"record": false,
"record_every": 100,
"regularize_critic": false,
"render": false,
"render_sleep": 0.0,
"replay_size": 50000,
"reset_buffer_on_task_change": true,
"reset_critic_on_task_change": false,
"reset_optimizer_on_task_change": true,
"resolution": null,
"reward_delivery": 30.0,
"reward_frame_survived": 0.01,
"reward_health_has": 5.0,
"reward_health_hg": 15.0,
"reward_kill_chain": 5.0,
"reward_kill_dtc": 1.0,
"reward_kill_rag": 5.0,
"reward_on_platform": 0.1,
"reward_platform_reached": 1.0,
"reward_scaler_pitfall": 0.1,
"reward_scaler_traversal": 0.001,
"reward_switch_pressed": 15.0,
"reward_weapon_ad": 15.0,
"save_freq_epochs": 25,
"scenarios": [
"run_and_gun"
],
"seed": 0,
"sequence": null,
"sparse_rewards": false,
"start_from": 0,
"start_steps": 10000,
"steps_per_env": 200000,
"target_output_std": 0.089,
"test": true,
"test_envs": [],
"test_episodes": 3,
"test_only": false,
"update_after": 5000,
"update_every": 500,
"use_layer_norm": true,
"use_lstm": false,
"variable_queue_length": 5,
"vcl_first_task_kl": false,
"video_folder": "videos",
"with_wandb": false
}
Logging data to ./logs/default_group/2024_04_10__07_32_36_eK6l57
/usr/local/lib/python3.10/dist-packages/gym/core.py:317: DeprecationWarning: WARN: Initializing wrapper in old step API which returns one bool instead of two. It is recommended to set new_step_api=True to use new step API. This will be the default behaviour in future.
deprecation(
/usr/local/lib/python3.10/dist-packages/gymnasium/core.py:311: UserWarning: WARN: env.num_tasks to get variables from other wrappers is deprecated and will be removed in v1.0, to get this variable you can do env.unwrapped.num_tasks for environment variables or env.get_wrapper_attr('num_tasks') that will search the reminding wrappers.
logger.warn(
/usr/local/lib/python3.10/dist-packages/gymnasium/core.py:311: UserWarning: WARN: env.get_active_env to get variables from other wrappers is deprecated and will be removed in v1.0, to get this variable you can do env.unwrapped.get_active_env for environment variables or env.get_wrapper_attr('get_active_env') that will search the reminding wrappers.
logger.warn(
2024-04-10 07:32:37 - Observations shape: (4, 84, 84, 3)
2024-04-10 07:32:37 - Actions shape: 12
/usr/local/lib/python3.10/dist-packages/gymnasium/core.py:311: UserWarning: WARN: env.task_id to get variables from other wrappers is deprecated and will be removed in v1.0, to get this variable you can do env.unwrapped.task_id for environment variables or env.get_wrapper_attr('task_id') that will search the reminding wrappers.
logger.warn(
/usr/local/lib/python3.10/dist-packages/gymnasium/core.py:311: UserWarning: WARN: env.cur_seq_idx to get variables from other wrappers is deprecated and will be removed in v1.0, to get this variable you can do env.unwrapped.cur_seq_idx for environment variables or env.get_wrapper_attr('cur_seq_idx') that will search the reminding wrappers.
logger.warn(
/usr/local/lib/python3.10/dist-packages/gymnasium/core.py:311: UserWarning: WARN: env.name to get variables from other wrappers is deprecated and will be removed in v1.0, to get this variable you can do env.unwrapped.name for environment variables or env.get_wrapper_attr('name') that will search the reminding wrappers.
logger.warn(
2024-04-10 07:32:39 - Episode 1 duration: 0.9786. Buffer capacity: 0.63% (313/50000)
/usr/local/lib/python3.10/dist-packages/gymnasium/core.py:311: UserWarning: WARN: env.get_statistics to get variables from other wrappers is deprecated and will be removed in v1.0, to get this variable you can do env.unwrapped.get_statistics for environment variables or env.get_wrapper_attr('get_statistics') that will search the reminding wrappers.
logger.warn(
/usr/local/lib/python3.10/dist-packages/gymnasium/core.py:311: UserWarning: WARN: env.clear_episode_statistics to get variables from other wrappers is deprecated and will be removed in v1.0, to get this variable you can do env.unwrapped.clear_episode_statistics for environment variables or env.get_wrapper_attr('clear_episode_statistics') that will search the reminding wrappers.
logger.warn(
2024-04-10 07:32:40 - Episode 2 duration: 0.9469. Buffer capacity: 1.25% (626/50000)
2024-04-10 07:32:41 - Episode 3 duration: 0.9312. Buffer capacity: 1.88% (939/50000)
My apologies for completely missing this issue. In case it's still relevant: from the logs I can't see anything fundamentally wrong, apart from the fact that TensorFlow was unable to register a GPU (either because none is present or because CUDA is not properly set up). That slows training down considerably; the logs suggest a single policy update takes roughly two minutes, which is normally a matter of seconds on a GPU. Also, it may take more than 15 epochs before the agent learns any meaningful behavior. I'd suggest making sure CUDA is installed properly so the GPU can be used, and then trying again.
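To confirm whether TensorFlow can see a GPU at all, a one-liner like `python -c "import tensorflow as tf; print(tf.config.list_physical_devices('GPU'))"` (empty list = CPU-only) is usually enough. As a TensorFlow-independent sketch, you can also try to `dlopen` the CUDA runtime directly with `ctypes`; the library names below are taken from the warnings in the logs above, and this is just an illustrative check, not part of the repo:

```python
import ctypes


def cuda_runtime_available(names=("libcudart.so.11.0", "libcudart.so")):
    """Try to dlopen a CUDA runtime library; True if any of them loads.

    This mirrors the mechanism behind TensorFlow's
    "Could not load dynamic library 'libcudart.so.11.0'" warnings.
    """
    for name in names:
        try:
            ctypes.CDLL(name)
            return True
        except OSError:
            # Library not on the loader path (check LD_LIBRARY_PATH).
            continue
    return False


if __name__ == "__main__":
    # On a CUDA-less machine this prints False, consistent with the
    # dso_loader warnings in the logs above.
    print("CUDA runtime found:", cuda_runtime_available())
```

If this prints `False`, the dso_loader warnings are expected and training will fall back to the (much slower) CPU path.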