
Cannot reproduce the result for run_and_gun #1

Open
ruipengZ opened this issue Apr 10, 2024 · 1 comment
@ruipengZ
Hi! I'm having trouble reproducing the result for the single environment run_and_gun. I run `python run_single.py --scenario run_and_gun` and get the output below; after 15 epochs, training success is still 0:

2024-04-10 07:32:16.450011: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA
To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags.
2024-04-10 07:32:16.630269: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /usr/lib64-nvidia
2024-04-10 07:32:16.630319: I tensorflow/compiler/xla/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine.
2024-04-10 07:32:17.626895: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /usr/lib64-nvidia
2024-04-10 07:32:17.627104: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /usr/lib64-nvidia
2024-04-10 07:32:17.627124: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly.
2024-04-10 07:32:26.142719: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /usr/local/lib/python3.10/dist-packages/cv2/../../lib64:/usr/lib64-nvidia
2024-04-10 07:32:26.142959: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcublas.so.11'; dlerror: libcublas.so.11: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /usr/local/lib/python3.10/dist-packages/cv2/../../lib64:/usr/lib64-nvidia
2024-04-10 07:32:26.143142: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcublasLt.so.11'; dlerror: libcublasLt.so.11: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /usr/local/lib/python3.10/dist-packages/cv2/../../lib64:/usr/lib64-nvidia
2024-04-10 07:32:26.143322: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcufft.so.10'; dlerror: libcufft.so.10: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /usr/local/lib/python3.10/dist-packages/cv2/../../lib64:/usr/lib64-nvidia
2024-04-10 07:32:26.765469: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcusparse.so.11'; dlerror: libcusparse.so.11: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /usr/local/lib/python3.10/dist-packages/cv2/../../lib64:/usr/lib64-nvidia
2024-04-10 07:32:26.769994: W tensorflow/core/common_runtime/gpu/gpu_device.cc:1934] Cannot dlopen some GPU libraries. Please make sure the missing libraries mentioned above are installed properly if you would like to use GPU. Follow the guide at https://www.tensorflow.org/install/gpu for how to download and setup the required libraries for your platform.
Skipping registering GPU devices...
2024-04-10 07:32:26.770554: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA
To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags.
Saving config:
{
"activation": "lrelu",
"agent_policy_exploration": false,
"alpha": "auto",
"augment": false,
"augmentation": null,
"batch_size": 128,
"buffer_type": "fifo",
"cl_method": null,
"cl_reg_coef": 0.0,
"clipnorm": null,
"envs": [
"default"
],
"episodic_batch_size": 0,
"episodic_mem_per_task": 0,
"episodic_memory_from_buffer": true,
"exploration_kind": null,
"frame_height": 84,
"frame_skip": 4,
"frame_stack": 4,
"frame_width": 84,
"gamma": 0.99,
"gpu": null,
"group_id": "default_group",
"hidden_sizes": [
256,
256
],
"hide_task_id": false,
"log_every": 1000,
"logger_output": [
"tsv",
"tensorboard"
],
"lr": 0.001,
"lr_decay": "linear",
"lr_decay_rate": 0.1,
"lr_decay_steps": 100000,
"model_path": null,
"multihead_archs": true,
"n_updates": 50,
"no_test": false,
"num_repeats": 1,
"packnet_retrain_steps": 0,
"penalty_ammo_used": -0.1,
"penalty_death": -1.0,
"penalty_health_dtc": -1.0,
"penalty_health_has": -5.0,
"penalty_health_hg": -0.01,
"penalty_lava": -0.1,
"penalty_passivity": -0.1,
"penalty_projectile": -0.01,
"random_order": false,
"record": false,
"record_every": 100,
"regularize_critic": false,
"render": false,
"render_sleep": 0.0,
"replay_size": 50000,
"reset_buffer_on_task_change": true,
"reset_critic_on_task_change": false,
"reset_optimizer_on_task_change": true,
"resolution": null,
"reward_delivery": 30.0,
"reward_frame_survived": 0.01,
"reward_health_has": 5.0,
"reward_health_hg": 15.0,
"reward_kill_chain": 5.0,
"reward_kill_dtc": 1.0,
"reward_kill_rag": 5.0,
"reward_on_platform": 0.1,
"reward_platform_reached": 1.0,
"reward_scaler_pitfall": 0.1,
"reward_scaler_traversal": 0.001,
"reward_switch_pressed": 15.0,
"reward_weapon_ad": 15.0,
"save_freq_epochs": 25,
"scenarios": [
"run_and_gun"
],
"seed": 0,
"sequence": null,
"sparse_rewards": false,
"start_from": 0,
"start_steps": 10000,
"steps_per_env": 200000,
"target_output_std": 0.089,
"test": true,
"test_envs": [],
"test_episodes": 3,
"test_only": false,
"update_after": 5000,
"update_every": 500,
"use_layer_norm": true,
"use_lstm": false,
"variable_queue_length": 5,
"vcl_first_task_kl": false,
"video_folder": "videos",
"with_wandb": false
}
Logging data to ./logs/default_group/2024_04_10__07_32_36_eK6l57
/usr/local/lib/python3.10/dist-packages/gym/core.py:317: DeprecationWarning: WARN: Initializing wrapper in old step API which returns one bool instead of two. It is recommended to set new_step_api=True to use new step API. This will be the default behaviour in future.
deprecation(
/usr/local/lib/python3.10/dist-packages/gymnasium/core.py:311: UserWarning: WARN: env.num_tasks to get variables from other wrappers is deprecated and will be removed in v1.0, to get this variable you can do env.unwrapped.num_tasks for environment variables or env.get_wrapper_attr('num_tasks') that will search the reminding wrappers.
logger.warn(
/usr/local/lib/python3.10/dist-packages/gymnasium/core.py:311: UserWarning: WARN: env.get_active_env to get variables from other wrappers is deprecated and will be removed in v1.0, to get this variable you can do env.unwrapped.get_active_env for environment variables or env.get_wrapper_attr('get_active_env') that will search the reminding wrappers.
logger.warn(
2024-04-10 07:32:37 - Observations shape: (4, 84, 84, 3)
2024-04-10 07:32:37 - Actions shape: 12
/usr/local/lib/python3.10/dist-packages/gymnasium/core.py:311: UserWarning: WARN: env.task_id to get variables from other wrappers is deprecated and will be removed in v1.0, to get this variable you can do env.unwrapped.task_id for environment variables or env.get_wrapper_attr('task_id') that will search the reminding wrappers.
logger.warn(
/usr/local/lib/python3.10/dist-packages/gymnasium/core.py:311: UserWarning: WARN: env.cur_seq_idx to get variables from other wrappers is deprecated and will be removed in v1.0, to get this variable you can do env.unwrapped.cur_seq_idx for environment variables or env.get_wrapper_attr('cur_seq_idx') that will search the reminding wrappers.
logger.warn(
/usr/local/lib/python3.10/dist-packages/gymnasium/core.py:311: UserWarning: WARN: env.name to get variables from other wrappers is deprecated and will be removed in v1.0, to get this variable you can do env.unwrapped.name for environment variables or env.get_wrapper_attr('name') that will search the reminding wrappers.
logger.warn(
2024-04-10 07:32:39 - Episode 1 duration: 0.9786. Buffer capacity: 0.63% (313/50000)
/usr/local/lib/python3.10/dist-packages/gymnasium/core.py:311: UserWarning: WARN: env.get_statistics to get variables from other wrappers is deprecated and will be removed in v1.0, to get this variable you can do env.unwrapped.get_statistics for environment variables or env.get_wrapper_attr('get_statistics') that will search the reminding wrappers.
logger.warn(
/usr/local/lib/python3.10/dist-packages/gymnasium/core.py:311: UserWarning: WARN: env.clear_episode_statistics to get variables from other wrappers is deprecated and will be removed in v1.0, to get this variable you can do env.unwrapped.clear_episode_statistics for environment variables or env.get_wrapper_attr('clear_episode_statistics') that will search the reminding wrappers.
logger.warn(
2024-04-10 07:32:40 - Episode 2 duration: 0.9469. Buffer capacity: 1.25% (626/50000)
2024-04-10 07:32:41 - Episode 3 duration: 0.9312. Buffer capacity: 1.88% (939/50000)


| train/actions/0 | 80 |
| train/actions/1 | 71 |
| train/actions/2 | 77 |
| train/actions/3 | 87 |
| train/actions/4 | 79 |
| train/actions/5 | 69 |
| train/actions/6 | 97 |
| train/actions/7 | 85 |
| train/actions/8 | 92 |
| train/actions/9 | 89 |
| train/actions/10 | 81 |
| train/actions/11 | 93 |
| epoch | 1 |
| learning_rate | 0.001 |
| train/return/avg | 22.3 |
| train/return/std | 4.89 |
| train/return/max | 29 |
| train/return/min | 17.5 |
| train/ep_length | 313 |
| total_env_steps | 1e+03 |
| current_task_steps | 1e+03 |
| buffer_capacity | 1.25 |
| train/loss_kl | nan |
| train/loss_pi | nan |
| train/loss_q1 | nan |
| train/loss_q2 | nan |
| train/episodes | 2 |
| train/alpha/0 | nan |
| train/loss_reg | nan |
| train/health | 100 |
| train/kills | 2.33 |
| train/ammo | 69.7 |
| train/movement | 34 |
| train/hits_taken | 0 |
| train/success | 0 |
| walltime | 3.12 |
Scalar logging time: 0.0392305850982666
Flushed tensorboard in 0.00 seconds


Wrote to output file in 0.00 seconds
2024-04-10 07:32:42 - Time elapsed for logging: 0.04345273971557617
2024-04-10 07:32:42 - Episode 4 duration: 0.6291. Buffer capacity: 2.50% (1252/50000)
2024-04-10 07:32:43 - Episode 5 duration: 0.7800. Buffer capacity: 3.13% (1565/50000)
2024-04-10 07:32:44 - Episode 6 duration: 0.7743. Buffer capacity: 3.76% (1878/50000)


| train/actions/0 | 71 |
| train/actions/1 | 73 |
| train/actions/2 | 89 |
| train/actions/3 | 84 |
| train/actions/4 | 83 |
| train/actions/5 | 94 |
| train/actions/6 | 84 |
| train/actions/7 | 84 |
| train/actions/8 | 78 |
| train/actions/9 | 85 |
| train/actions/10 | 93 |
| train/actions/11 | 82 |
| epoch | 2 |
| learning_rate | 0.001 |
| train/return/avg | 17.9 |
| train/return/std | 6.1 |
| train/return/max | 25 |
| train/return/min | 10.1 |
| train/ep_length | 313 |
| total_env_steps | 2e+03 |
| current_task_steps | 2e+03 |
| buffer_capacity | 3.13 |
| train/loss_kl | nan |
| train/loss_pi | nan |
| train/loss_q1 | nan |
| train/loss_q2 | nan |
| train/episodes | 5 |
| train/alpha/0 | nan |
| train/loss_reg | nan |
| train/health | 100 |
| train/kills | 2 |
| train/ammo | 70.7 |
| train/movement | 25.4 |
| train/hits_taken | 0 |
| train/success | 0 |
| walltime | 5.67 |
Scalar logging time: 0.022348403930664062
Flushed tensorboard in 0.00 seconds


Wrote to output file in 0.00 seconds
2024-04-10 07:32:44 - Time elapsed for logging: 0.023654937744140625
2024-04-10 07:32:45 - Episode 7 duration: 0.4842. Buffer capacity: 4.38% (2191/50000)
2024-04-10 07:32:45 - Episode 8 duration: 0.7810. Buffer capacity: 5.01% (2504/50000)
2024-04-10 07:32:46 - Episode 9 duration: 0.8011. Buffer capacity: 5.63% (2817/50000)


| train/actions/0 | 94 |
| train/actions/1 | 90 |
| train/actions/2 | 69 |
| train/actions/3 | 77 |
| train/actions/4 | 76 |
| train/actions/5 | 72 |
| train/actions/6 | 83 |
| train/actions/7 | 89 |
| train/actions/8 | 72 |
| train/actions/9 | 94 |
| train/actions/10 | 85 |
| train/actions/11 | 99 |
| epoch | 3 |
| learning_rate | 0.001 |
| train/return/avg | 14.4 |
| train/return/std | 6.22 |
| train/return/max | 19.5 |
| train/return/min | 5.64 |
| train/ep_length | 313 |
| total_env_steps | 3e+03 |
| current_task_steps | 3e+03 |
| buffer_capacity | 5.01 |
| train/loss_kl | nan |
| train/loss_pi | nan |
| train/loss_q1 | nan |
| train/loss_q2 | nan |
| train/episodes | 8 |
| train/alpha/0 | nan |
| train/loss_reg | nan |
| train/health | 100 |
| train/kills | 1.33 |
| train/ammo | 68.7 |
| train/movement | 24.8 |
| train/hits_taken | 0 |
| train/success | 0 |
| walltime | 8.24 |
Scalar logging time: 0.020737409591674805
Flushed tensorboard in 0.00 seconds


Wrote to output file in 0.00 seconds
2024-04-10 07:32:47 - Time elapsed for logging: 0.021990299224853516
2024-04-10 07:32:47 - Episode 10 duration: 0.3172. Buffer capacity: 6.26% (3130/50000)
2024-04-10 07:32:48 - Episode 11 duration: 0.7862. Buffer capacity: 6.89% (3443/50000)
2024-04-10 07:32:49 - Episode 12 duration: 0.8000. Buffer capacity: 7.51% (3756/50000)


| train/actions/0 | 97 |
| train/actions/1 | 80 |
| train/actions/2 | 95 |
| train/actions/3 | 83 |
| train/actions/4 | 83 |
| train/actions/5 | 98 |
| train/actions/6 | 82 |
| train/actions/7 | 73 |
| train/actions/8 | 72 |
| train/actions/9 | 70 |
| train/actions/10 | 81 |
| train/actions/11 | 86 |
| epoch | 4 |
| learning_rate | 0.001 |
| train/return/avg | 15.1 |
| train/return/std | 8.24 |
| train/return/max | 23.3 |
| train/return/min | 3.8 |
| train/ep_length | 313 |
| total_env_steps | 4e+03 |
| current_task_steps | 4e+03 |
| buffer_capacity | 6.89 |
| train/loss_kl | nan |
| train/loss_pi | nan |
| train/loss_q1 | nan |
| train/loss_q2 | nan |
| train/episodes | 11 |
| train/alpha/0 | nan |
| train/loss_reg | nan |
| train/health | 100 |
| train/kills | 1.67 |
| train/ammo | 70 |
| train/movement | 21.6 |
| train/hits_taken | 0 |
| train/success | 0 |
| walltime | 10.8 |
Scalar logging time: 0.020810604095458984
Flushed tensorboard in 0.00 seconds


Wrote to output file in 0.00 seconds
2024-04-10 07:32:49 - Time elapsed for logging: 0.022226572036743164
2024-04-10 07:32:49 - Episode 13 duration: 0.1740. Buffer capacity: 8.14% (4069/50000)
2024-04-10 07:32:50 - Episode 14 duration: 0.7823. Buffer capacity: 8.76% (4382/50000)
2024-04-10 07:32:51 - Episode 15 duration: 0.7740. Buffer capacity: 9.39% (4695/50000)


| train/actions/0 | 97 |
| train/actions/1 | 89 |
| train/actions/2 | 72 |
| train/actions/3 | 76 |
| train/actions/4 | 95 |
| train/actions/5 | 89 |
| train/actions/6 | 81 |
| train/actions/7 | 79 |
| train/actions/8 | 77 |
| train/actions/9 | 67 |
| train/actions/10 | 102 |
| train/actions/11 | 76 |
| epoch | 5 |
| learning_rate | 0.001 |
| train/return/avg | 18.7 |
| train/return/std | 1.31 |
| train/return/max | 20.2 |
| train/return/min | 17 |
| train/ep_length | 313 |
| total_env_steps | 5e+03 |
| current_task_steps | 5e+03 |
| buffer_capacity | 8.76 |
| train/loss_kl | nan |
| train/loss_pi | nan |
| train/loss_q1 | nan |
| train/loss_q2 | nan |
| train/episodes | 14 |
| train/alpha/0 | nan |
| train/loss_reg | nan |
| train/health | 100 |
| train/kills | 2 |
| train/ammo | 67 |
| train/movement | 27.8 |
| train/hits_taken | 0 |
| train/success | 0 |
| walltime | 13.4 |
Scalar logging time: 0.027970314025878906
Flushed tensorboard in 0.00 seconds


Wrote to output file in 0.00 seconds
2024-04-10 07:32:52 - Time elapsed for logging: 0.02982330322265625
2024-04-10 07:35:13 - Time elapsed for a policy update: 140.75769782066345
2024-04-10 07:35:13 - Episode 16 duration: 140.7804. Buffer capacity: 10.02% (5008/50000)
2024-04-10 07:35:14 - Episode 17 duration: 0.9168. Buffer capacity: 10.64% (5321/50000)
2024-04-10 07:37:31 - Time elapsed for a policy update: 137.16376042366028
2024-04-10 07:37:32 - Episode 18 duration: 138.0552. Buffer capacity: 11.27% (5634/50000)
2024-04-10 07:37:32 - Episode 19 duration: 0.7878. Buffer capacity: 11.89% (5947/50000)


| train/actions/0 | 90 |
| train/actions/1 | 85 |
| train/actions/2 | 82 |
| train/actions/3 | 79 |
| train/actions/4 | 85 |
| train/actions/5 | 83 |
| train/actions/6 | 94 |
| train/actions/7 | 77 |
| train/actions/8 | 84 |
| train/actions/9 | 89 |
| train/actions/10 | 76 |
| train/actions/11 | 76 |
| epoch | 6 |
| learning_rate | 0.000997 |
| train/return/avg | 13.4 |
| train/return/std | 11.5 |
| train/return/max | 33.3 |
| train/return/min | 5.59 |
| train/ep_length | 313 |
| total_env_steps | 6e+03 |
| current_task_steps | 6e+03 |
| buffer_capacity | 11 |
| train/loss_kl | 0 |
| train/loss_pi | 3.27 |
| train/loss_q1 | 0.833 |
| train/loss_q2 | 0.939 |
| train/episodes | 17.5 |
| train/alpha/0 | 2.53 |
| train/loss_reg | 0 |
| train/health | 100 |
| train/kills | 1.5 |
| train/ammo | 67 |
| train/movement | 19.1 |
| train/hits_taken | 0 |
| train/success | 0 |
| walltime | 294 |
Scalar logging time: 0.023244857788085938
Flushed tensorboard in 0.00 seconds


Wrote to output file in 0.00 seconds
2024-04-10 07:37:33 - Time elapsed for logging: 0.03106379508972168
2024-04-10 07:39:47 - Time elapsed for a policy update: 134.37921953201294
2024-04-10 07:39:48 - Episode 20 duration: 135.0360. Buffer capacity: 12.52% (6260/50000)
2024-04-10 07:42:00 - Time elapsed for a policy update: 131.47774195671082
2024-04-10 07:42:00 - Episode 21 duration: 132.2652. Buffer capacity: 13.15% (6573/50000)
2024-04-10 07:42:01 - Episode 22 duration: 0.7792. Buffer capacity: 13.77% (6886/50000)


| train/actions/0 | 91 |
| train/actions/1 | 87 |
| train/actions/2 | 85 |
| train/actions/3 | 79 |
| train/actions/4 | 76 |
| train/actions/5 | 103 |
| train/actions/6 | 82 |
| train/actions/7 | 84 |
| train/actions/8 | 72 |
| train/actions/9 | 86 |
| train/actions/10 | 68 |
| train/actions/11 | 87 |
| epoch | 7 |
| learning_rate | 0.000995 |
| train/return/avg | 12.1 |
| train/return/std | 1.31 |
| train/return/max | 14 |
| train/return/min | 10.9 |
| train/ep_length | 313 |
| total_env_steps | 7e+03 |
| current_task_steps | 7e+03 |
| buffer_capacity | 13.1 |
| train/loss_kl | 0 |
| train/loss_pi | 6.3 |
| train/loss_q1 | 0.264 |
| train/loss_q2 | 0.262 |
| train/episodes | 21 |
| train/alpha/0 | 2.15 |
| train/loss_reg | 0 |
| train/health | 100 |
| train/kills | 0.667 |
| train/ammo | 71 |
| train/movement | 28.3 |
| train/hits_taken | 0 |
| train/success | 0 |
| walltime | 563 |
Scalar logging time: 0.022394180297851562
Flushed tensorboard in 0.00 seconds


Wrote to output file in 0.00 seconds
2024-04-10 07:42:01 - Time elapsed for logging: 0.032239437103271484
2024-04-10 07:44:14 - Time elapsed for a policy update: 132.7520670890808
2024-04-10 07:44:14 - Episode 23 duration: 133.2505. Buffer capacity: 14.40% (7199/50000)
2024-04-10 07:46:28 - Time elapsed for a policy update: 133.12892532348633
2024-04-10 07:46:28 - Episode 24 duration: 133.9228. Buffer capacity: 15.02% (7512/50000)
2024-04-10 07:46:29 - Episode 25 duration: 0.9700. Buffer capacity: 15.65% (7825/50000)


| train/actions/0 | 87 |
| train/actions/1 | 78 |
| train/actions/2 | 83 |
| train/actions/3 | 79 |
| train/actions/4 | 85 |
| train/actions/5 | 78 |
| train/actions/6 | 78 |
| train/actions/7 | 89 |
| train/actions/8 | 105 |
| train/actions/9 | 92 |
| train/actions/10 | 76 |
| train/actions/11 | 70 |
| epoch | 8 |
| learning_rate | 0.000992 |
| train/return/avg | 18.8 |
| train/return/std | 6.76 |
| train/return/max | 28.2 |
| train/return/min | 12.8 |
| train/ep_length | 313 |
| total_env_steps | 8e+03 |
| current_task_steps | 8e+03 |
| buffer_capacity | 15 |
| train/loss_kl | 0 |
| train/loss_pi | 8.42 |
| train/loss_q1 | 0.357 |
| train/loss_q2 | 0.344 |
| train/episodes | 24 |
| train/alpha/0 | 1.84 |
| train/loss_reg | 0 |
| train/health | 100 |
| train/kills | 2.33 |
| train/ammo | 68.7 |
| train/movement | 22.8 |
| train/hits_taken | 0 |
| train/success | 0 |
| walltime | 831 |
Scalar logging time: 0.02210068702697754
Flushed tensorboard in 0.00 seconds


Wrote to output file in 0.00 seconds
2024-04-10 07:46:30 - Time elapsed for logging: 0.0294036865234375
2024-04-10 07:48:41 - Time elapsed for a policy update: 131.84115433692932
2024-04-10 07:48:42 - Episode 26 duration: 132.1760. Buffer capacity: 16.28% (8138/50000)
2024-04-10 07:48:43 - Episode 27 duration: 0.7789. Buffer capacity: 16.90% (8451/50000)
2024-04-10 07:50:55 - Time elapsed for a policy update: 131.83530712127686
2024-04-10 07:50:55 - Episode 28 duration: 132.6131. Buffer capacity: 17.53% (8764/50000)


| train/actions/0 | 86 |
| train/actions/1 | 85 |
| train/actions/2 | 100 |
| train/actions/3 | 86 |
| train/actions/4 | 70 |
| train/actions/5 | 67 |
| train/actions/6 | 83 |
| train/actions/7 | 75 |
| train/actions/8 | 83 |
| train/actions/9 | 89 |
| train/actions/10 | 94 |
| train/actions/11 | 82 |
| epoch | 9 |
| learning_rate | 0.000989 |
| train/return/avg | 21 |
| train/return/std | 10 |
| train/return/max | 33.5 |
| train/return/min | 8.93 |
| train/ep_length | 313 |
| total_env_steps | 9e+03 |
| current_task_steps | 9e+03 |
| buffer_capacity | 16.9 |
| train/loss_kl | 0 |
| train/loss_pi | 10.2 |
| train/loss_q1 | 0.317 |
| train/loss_q2 | 0.314 |
| train/episodes | 27 |
| train/alpha/0 | 1.58 |
| train/loss_reg | 0 |
| train/health | 100 |
| train/kills | 2 |
| train/ammo | 68 |
| train/movement | 35.3 |
| train/hits_taken | 0 |
| train/success | 0 |
| walltime | 1.1e+03 |
Scalar logging time: 0.02165365219116211
Flushed tensorboard in 0.00 seconds


Wrote to output file in 0.00 seconds
2024-04-10 07:50:56 - Time elapsed for logging: 0.0289609432220459
2024-04-10 07:53:06 - Time elapsed for a policy update: 130.4822416305542
2024-04-10 07:53:07 - Episode 29 duration: 130.6657. Buffer capacity: 18.15% (9077/50000)
2024-04-10 07:53:07 - Episode 30 duration: 0.7674. Buffer capacity: 18.78% (9390/50000)
2024-04-10 07:55:20 - Time elapsed for a policy update: 132.6010184288025
2024-04-10 07:55:21 - Episode 31 duration: 133.3647. Buffer capacity: 19.41% (9703/50000)


| train/actions/0 | 77 |
| train/actions/1 | 81 |
| train/actions/2 | 90 |
| train/actions/3 | 69 |
| train/actions/4 | 81 |
| train/actions/5 | 74 |
| train/actions/6 | 95 |
| train/actions/7 | 93 |
| train/actions/8 | 74 |
| train/actions/9 | 82 |
| train/actions/10 | 81 |
| train/actions/11 | 103 |
| epoch | 10 |
| learning_rate | 0.000987 |
| train/return/avg | 11.8 |
| train/return/std | 6.47 |
| train/return/max | 20.6 |
| train/return/min | 5.28 |
| train/ep_length | 313 |
| total_env_steps | 1e+04 |
| current_task_steps | 1e+04 |
| buffer_capacity | 18.8 |
| train/loss_kl | 0 |
| train/loss_pi | 11.8 |
| train/loss_q1 | 0.467 |
| train/loss_q2 | 0.47 |
| train/episodes | 30 |
| train/alpha/0 | 1.37 |
| train/loss_reg | 0 |
| train/health | 100 |
| train/kills | 1 |
| train/ammo | 68.3 |
| train/movement | 21.7 |
| train/hits_taken | 0 |
| train/success | 0 |
| walltime | 1.36e+03 |
Scalar logging time: 0.023178577423095703
Flushed tensorboard in 0.00 seconds


Wrote to output file in 0.00 seconds
2024-04-10 07:55:21 - Time elapsed for logging: 0.03100419044494629
2024-04-10 07:57:33 - Time elapsed for a policy update: 131.05696892738342
2024-04-10 07:57:33 - Episode 32 duration: 131.4975. Buffer capacity: 20.03% (10016/50000)
2024-04-10 07:57:36 - Episode 33 duration: 2.5918. Buffer capacity: 20.66% (10329/50000)
2024-04-10 07:59:47 - Time elapsed for a policy update: 130.77035212516785
2024-04-10 07:59:48 - Episode 34 duration: 132.6114. Buffer capacity: 21.28% (10642/50000)
2024-04-10 07:59:50 - Episode 35 duration: 1.8083. Buffer capacity: 21.91% (10955/50000)


| train/actions/0 | 94 |
| train/actions/1 | 102 |
| train/actions/2 | 97 |
| train/actions/3 | 70 |
| train/actions/4 | 80 |
| train/actions/5 | 71 |
| train/actions/6 | 88 |
| train/actions/7 | 80 |
| train/actions/8 | 82 |
| train/actions/9 | 88 |
| train/actions/10 | 73 |
| train/actions/11 | 75 |
| epoch | 11 |
| learning_rate | 0.000984 |
| train/return/avg | 20.7 |
| train/return/std | 14.4 |
| train/return/max | 45 |
| train/return/min | 7.37 |
| train/ep_length | 313 |
| total_env_steps | 1.1e+04 |
| current_task_steps | 1.1e+04 |
| buffer_capacity | 21 |
| train/loss_kl | 0 |
| train/loss_pi | 13.3 |
| train/loss_q1 | 0.412 |
| train/loss_q2 | 0.434 |
| train/episodes | 33.5 |
| train/alpha/0 | 1.19 |
| train/loss_reg | 0 |
| train/health | 100 |
| train/kills | 2.5 |
| train/ammo | 69.5 |
| train/movement | 26.4 |
| train/hits_taken | 0 |
| train/success | 0 |
| walltime | 1.63e+03 |
Scalar logging time: 0.024907588958740234
Flushed tensorboard in 0.00 seconds


Wrote to output file in 0.00 seconds
2024-04-10 07:59:51 - Time elapsed for logging: 0.0372774600982666
2024-04-10 08:02:04 - Time elapsed for a policy update: 132.96354508399963
2024-04-10 08:02:05 - Episode 36 duration: 134.5055. Buffer capacity: 22.54% (11268/50000)
2024-04-10 08:04:21 - Time elapsed for a policy update: 134.33584332466125
2024-04-10 08:04:21 - Episode 37 duration: 136.1606. Buffer capacity: 23.16% (11581/50000)
2024-04-10 08:04:23 - Episode 38 duration: 1.8366. Buffer capacity: 23.79% (11894/50000)


| train/actions/0 | 80 |
| train/actions/1 | 90 |
| train/actions/2 | 77 |
| train/actions/3 | 81 |
| train/actions/4 | 78 |
| train/actions/5 | 76 |
| train/actions/6 | 73 |
| train/actions/7 | 88 |
| train/actions/8 | 88 |
| train/actions/9 | 73 |
| train/actions/10 | 108 |
| train/actions/11 | 88 |
| epoch | 12 |
| learning_rate | 0.000981 |
| train/return/avg | 18.5 |
| train/return/std | 10 |
| train/return/max | 31 |
| train/return/min | 6.42 |
| train/ep_length | 313 |
| total_env_steps | 1.2e+04 |
| current_task_steps | 1.2e+04 |
| buffer_capacity | 23.2 |
| train/loss_kl | 0 |
| train/loss_pi | 14.6 |
| train/loss_q1 | 0.557 |
| train/loss_q2 | 0.606 |
| train/episodes | 37 |
| train/alpha/0 | 1.04 |
| train/loss_reg | 0 |
| train/health | 100 |
| train/kills | 2 |
| train/ammo | 68.7 |
| train/movement | 27.3 |
| train/hits_taken | 0 |
| train/success | 0 |
| walltime | 1.91e+03 |
Scalar logging time: 0.021389484405517578
Flushed tensorboard in 0.00 seconds


Wrote to output file in 0.00 seconds
2024-04-10 08:04:24 - Time elapsed for logging: 0.02859330177307129
2024-04-10 08:06:33 - Time elapsed for a policy update: 129.65999293327332
2024-04-10 08:06:35 - Episode 39 duration: 130.8193. Buffer capacity: 24.41% (12207/50000)
2024-04-10 08:08:47 - Time elapsed for a policy update: 130.4491686820984
2024-04-10 08:08:47 - Episode 40 duration: 132.8615. Buffer capacity: 25.04% (12520/50000)
2024-04-10 08:08:49 - Episode 41 duration: 1.7971. Buffer capacity: 25.67% (12833/50000)


| train/actions/0 | 62 |
| train/actions/1 | 94 |
| train/actions/2 | 80 |
| train/actions/3 | 103 |
| train/actions/4 | 79 |
| train/actions/5 | 89 |
| train/actions/6 | 81 |
| train/actions/7 | 86 |
| train/actions/8 | 90 |
| train/actions/9 | 71 |
| train/actions/10 | 82 |
| train/actions/11 | 83 |
| epoch | 13 |
| learning_rate | 0.000978 |
| train/return/avg | 14.3 |
| train/return/std | 1.2 |
| train/return/max | 16 |
| train/return/min | 13.1 |
| train/ep_length | 313 |
| total_env_steps | 1.3e+04 |
| current_task_steps | 1.3e+04 |
| buffer_capacity | 25 |
| train/loss_kl | 0 |
| train/loss_pi | 15.8 |
| train/loss_q1 | 0.486 |
| train/loss_q2 | 0.625 |
| train/episodes | 40 |
| train/alpha/0 | 0.917 |
| train/loss_reg | 0 |
| train/health | 100 |
| train/kills | 1.33 |
| train/ammo | 72 |
| train/movement | 24.6 |
| train/hits_taken | 0 |
| train/success | 0 |
| walltime | 2.17e+03 |
Scalar logging time: 0.020668745040893555
Flushed tensorboard in 0.00 seconds


Wrote to output file in 0.00 seconds
2024-04-10 08:08:50 - Time elapsed for logging: 0.027668476104736328
2024-04-10 08:11:00 - Time elapsed for a policy update: 129.80614304542542
2024-04-10 08:11:01 - Episode 42 duration: 131.0184. Buffer capacity: 26.29% (13146/50000)
2024-04-10 08:11:03 - Episode 43 duration: 2.2282. Buffer capacity: 26.92% (13459/50000)
2024-04-10 08:13:13 - Time elapsed for a policy update: 129.6457862854004
2024-04-10 08:13:15 - Episode 44 duration: 131.4255. Buffer capacity: 27.54% (13772/50000)


| train/actions/0 | 73 |
| train/actions/1 | 70 |
| train/actions/2 | 92 |
| train/actions/3 | 91 |
| train/actions/4 | 72 |
| train/actions/5 | 83 |
| train/actions/6 | 73 |
| train/actions/7 | 74 |
| train/actions/8 | 89 |
| train/actions/9 | 100 |
| train/actions/10 | 88 |
| train/actions/11 | 95 |
| epoch | 14 |
| learning_rate | 0.000976 |
| train/return/avg | 16 |
| train/return/std | 7.33 |
| train/return/max | 25.7 |
| train/return/min | 7.94 |
| train/ep_length | 313 |
| total_env_steps | 1.4e+04 |
| current_task_steps | 1.4e+04 |
| buffer_capacity | 26.9 |
| train/loss_kl | 0 |
| train/loss_pi | 16.9 |
| train/loss_q1 | 0.337 |
| train/loss_q2 | 0.536 |
| train/episodes | 43 |
| train/alpha/0 | 0.808 |
| train/loss_reg | 0 |
| train/health | 100 |
| train/kills | 1.33 |
| train/ammo | 69.3 |
| train/movement | 30 |
| train/hits_taken | 0 |
| train/success | 0 |
| walltime | 2.44e+03 |
Scalar logging time: 0.021115779876708984
Flushed tensorboard in 0.00 seconds


Wrote to output file in 0.00 seconds
2024-04-10 08:13:16 - Time elapsed for logging: 0.028461217880249023
2024-04-10 08:15:27 - Time elapsed for a policy update: 130.677344083786
2024-04-10 08:15:28 - Episode 45 duration: 131.3143. Buffer capacity: 28.17% (14085/50000)
2024-04-10 08:15:29 - Episode 46 duration: 1.8413. Buffer capacity: 28.80% (14398/50000)
2024-04-10 08:17:40 - Time elapsed for a policy update: 129.83444237709045
2024-04-10 08:17:41 - Episode 47 duration: 131.6774. Buffer capacity: 29.42% (14711/50000)


| train/actions/0 | 88 |
| train/actions/1 | 80 |
| train/actions/2 | 93 |
| train/actions/3 | 82 |
| train/actions/4 | 80 |
| train/actions/5 | 59 |
| train/actions/6 | 91 |
| train/actions/7 | 80 |
| train/actions/8 | 71 |
| train/actions/9 | 97 |
| train/actions/10 | 102 |
| train/actions/11 | 77 |
| epoch | 15 |
| learning_rate | 0.000973 |
| train/return/avg | 14 |
| train/return/std | 12.2 |
| train/return/max | 30.9 |
| train/return/min | 2.97 |
| train/ep_length | 313 |
| total_env_steps | 1.5e+04 |
| current_task_steps | 1.5e+04 |
| buffer_capacity | 28.8 |
| train/loss_kl | 0 |
| train/loss_pi | 17.8 |
| train/loss_q1 | 0.212 |
| train/loss_q2 | 0.333 |
| train/episodes | 46 |
| train/alpha/0 | 0.715 |
| train/loss_reg | 0 |
| train/health | 100 |
| train/kills | 1 |
| train/ammo | 66.3 |
| train/movement | 28.9 |
| train/hits_taken | 0 |
| train/success | 0 |
| walltime | 2.7e+03 |
Scalar logging time: 0.03137946128845215
Flushed tensorboard in 0.00 seconds


Wrote to output file in 0.00 seconds
2024-04-10 08:17:43 - Time elapsed for logging: 0.04452013969421387
2024-04-10 08:19:53 - Time elapsed for a policy update: 129.8362638950348
2024-04-10 08:19:53 - Episode 48 duration: 129.9821. Buffer capacity: 30.05% (15024/50000)
2024-04-10 08:19:55 - Episode 49 duration: 1.8883. Buffer capacity: 30.67% (15337/50000)
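A side note on the config and log above (a quick stdlib-only sanity check, not code from the repo): the `nan` entries for the losses and `train/alpha/0` through epoch 5 are expected rather than a bug, because with `update_after: 5000` no gradient updates happen before 5000 environment steps, and with `log_every: 1000` each logged epoch covers 1000 steps. The episode length (313) is read off the `train/ep_length` rows:

```python
# Figures taken from the pasted config and log above.
update_after = 5000   # config: no gradient updates before this many env steps
log_every = 1000      # config: one logged "epoch" per 1000 env steps
episode_len = 313     # log: train/ep_length

# First episode whose cumulative step count crosses update_after:
first_update_episode = next(
    ep for ep in range(1, 1000) if ep * episode_len >= update_after
)
print(first_update_episode)   # 16, matching "Episode 16" above

# Epoch (1-indexed) in which that first update lands:
first_update_epoch = first_update_episode * episode_len // log_every + 1
print(first_update_epoch)     # 6: losses are nan through epoch 5, as logged
```

So learning genuinely only starts around epoch 6, which is consistent with the logged tables.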

@TTomilin (Owner) commented Aug 7, 2024

My apologies, I completely missed this issue. In case it's still relevant: from the logs I can't see anything fundamentally wrong, apart from TensorFlow failing to register a GPU (either because there isn't one, or because CUDA isn't properly set up). This slows the process down a lot. You can see from the logs that a single policy update takes ~2 minutes, which is normally a matter of seconds on a GPU. Moreover, it may take more than 15 epochs before the agent learns any meaningful behavior. I'd suggest making sure CUDA is properly installed so a GPU can be used, and then trying again.
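To put a number on that slowdown, here is a small stdlib-only sketch that scrapes the pasted log for the update timings (the regex just matches the log lines above; nothing here is part of the repo's API). As a separate check, `tf.config.list_physical_devices('GPU')` should return a non-empty list on a working CUDA setup:

```python
import re

# A few "policy update" lines copied verbatim from the log in this issue.
log = """\
2024-04-10 07:35:13 - Time elapsed for a policy update: 140.75769782066345
2024-04-10 07:37:31 - Time elapsed for a policy update: 137.16376042366028
2024-04-10 07:39:47 - Time elapsed for a policy update: 134.37921953201294
2024-04-10 07:42:00 - Time elapsed for a policy update: 131.47774195671082
"""

# Pull out the elapsed-seconds value from each matching line.
times = [float(x) for x in re.findall(r"policy update: ([0-9.]+)", log)]
avg = sum(times) / len(times)
print(f"avg update: {avg:.0f} s over {len(times)} updates")  # ~136 s per update
```

At roughly two minutes per update on CPU versus seconds on GPU, reaching the step counts where the agent starts succeeding would take a very long time in this setup.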
