[rllib] Autoregressive action distributions #5304
Conversation
Test FAILed.
I think that's due to a -inf value showing up for the log probabilities, likely due to a tf.log(0.) for some action_prob=0. Not sure if it's a bug introduced in the refactoring or numerical instability.
Eric
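A minimal sketch of that suspected failure mode, assuming the sampled-action log-probability is taken as tf.log(action_prob); the names and the clamp below are illustrative assumptions, not RLlib's actual code or a fix from this PR:

```python
import tensorflow as tf  # TF 1.x style, matching the logs below

# If the policy ever assigns probability 0 to the sampled action,
# tf.log produces -inf, and any loss term built from it degenerates
# to inf/nan -- consistent with the `entropy: .nan`, `kl: .nan`, and
# `policy_loss: .nan` entries in the training result further down.
action_prob = tf.constant([0.5, 0.0, 0.25])
logp = tf.log(action_prob)        # [-0.693, -inf, -1.386]
loss = -tf.reduce_mean(logp)      # inf once the -inf enters the mean

# A generic guard (an assumption, not this PR's fix): clamp the
# probability away from zero before taking the log.
safe_logp = tf.log(tf.clip_by_value(action_prob, 1e-10, 1.0))

with tf.Session() as sess:
    print(sess.run([logp, loss, safe_logp]))
```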
…On Mon, Jul 29, 2019 at 7:59 PM yangshanchao ***@***.***> wrote:
Hi, @ericl <https://github.com/ericl>, thanks for your work. I compiled
ray from your project source, and I found that something goes wrong when
running the examples, namely cartpole_lstm.py and custom_keras_rnn_model.py.
The error logs are shown below:
/home/noone/anaconda3/envs/lab/bin/python /home/noone/Documents/New_TF/ray-autoregressive/python/ray/rllib/examples/cartpole_lstm.py
2019-07-30 10:57:13,442 INFO node.py:498 -- Process STDOUT and STDERR is being redirected to /tmp/ray/session_2019-07-30_10-57-13_442445_18362/logs.
2019-07-30 10:57:13,546 INFO services.py:409 -- Waiting for redis server at 127.0.0.1:18959 to respond...
2019-07-30 10:57:13,694 INFO services.py:409 -- Waiting for redis server at 127.0.0.1:14994 to respond...
2019-07-30 10:57:13,699 INFO services.py:809 -- Starting Redis shard with 10.0 GB max memory.
2019-07-30 10:57:13,746 INFO node.py:512 -- Process STDOUT and STDERR is being redirected to /tmp/ray/session_2019-07-30_10-57-13_442445_18362/logs.
2019-07-30 10:57:13,746 WARNING services.py:1301 -- Warning: Capping object memory store to 20.0GB. To increase this further, specify `object_store_memory` when calling ray.init() or ray start.
2019-07-30 10:57:13,747 INFO services.py:1475 -- Starting the Plasma object store with 20.0 GB memory using /dev/shm.
== Status ==
Using FIFO scheduling algorithm.
Resources requested: 0/12 CPUs, 0/1 GPUs
Memory usage on this node: 21.3/67.5 GB
2019-07-30 10:57:13,837 INFO trial_runner.py:176 -- Starting a new experiment.
WARNING: Logging before flag parsing goes to stderr.
W0730 10:57:15.464846 139644543887168 deprecation.py:323] From /home/noone/anaconda3/envs/lab/lib/python3.6/site-packages/tensorflow/python/compat/v2_compat.py:65: disable_resource_variables (from tensorflow.python.ops.variable_scope) is deprecated and will be removed in a future version.
Instructions for updating:
non-resource variables are not supported in the long term
2019-07-30 10:57:15,670 WARNING signature.py:108 -- The function with_updates has a **kwargs argument, which is currently not supported.
2019-07-30 10:57:15,670 WARNING logger.py:227 -- Could not instantiate <class 'ray.tune.logger.TFLogger'> - skipping.
2019-07-30 10:57:15,670 ERROR log_sync.py:34 -- Log sync requires cluster to be setup with `ray up`.
== Status ==
Using FIFO scheduling algorithm.
Resources requested: 3/12 CPUs, 0/1 GPUs
Memory usage on this node: 21.8/67.5 GB
Result logdir: /home/noone/ray_results/PPO
Number of trials: 1 ({'RUNNING': 1})
RUNNING trials:
- PPO_cartpole_stateless_0: RUNNING
(pid=18408) WARNING: Logging before flag parsing goes to stderr.
(pid=18408) W0730 10:57:16.966650 140193080096576 deprecation.py:323] From /home/noone/anaconda3/envs/lab/lib/python3.6/site-packages/tensorflow/python/compat/v2_compat.py:65: disable_resource_variables (from tensorflow.python.ops.variable_scope) is deprecated and will be removed in a future version.
(pid=18408) Instructions for updating:
(pid=18408) non-resource variables are not supported in the long term
(pid=18408) 2019-07-30 10:57:17,170 INFO rollout_worker.py:319 -- Creating policy evaluation worker 0 on CPU (please ignore any CUDA init errors)
(pid=18408) 2019-07-30 10:57:17.170824: I tensorflow/core/platform/cpu_feature_guard.cc:142] Your CPU supports instructions that this TensorFlow binary was not compiled to use: AVX2 FMA
(pid=18408) 2019-07-30 10:57:17.174655: I tensorflow/stream_executor/platform/default/dso_loader.cc:42] Successfully opened dynamic library libcuda.so.1
(pid=18408) 2019-07-30 10:57:17.177082: E tensorflow/stream_executor/cuda/cuda_driver.cc:318] failed call to cuInit: CUDA_ERROR_NO_DEVICE: no CUDA-capable device is detected
(pid=18408) 2019-07-30 10:57:17.177110: I tensorflow/stream_executor/cuda/cuda_diagnostics.cc:169] retrieving CUDA diagnostic information for host: noone-will-not-die
(pid=18408) 2019-07-30 10:57:17.177117: I tensorflow/stream_executor/cuda/cuda_diagnostics.cc:176] hostname: noone-will-not-die
(pid=18408) 2019-07-30 10:57:17.177160: I tensorflow/stream_executor/cuda/cuda_diagnostics.cc:200] libcuda reported version is: 430.14.0
(pid=18408) 2019-07-30 10:57:17.177179: I tensorflow/stream_executor/cuda/cuda_diagnostics.cc:204] kernel reported version is: 430.14.0
(pid=18408) 2019-07-30 10:57:17.177184: I tensorflow/stream_executor/cuda/cuda_diagnostics.cc:310] kernel version seems to match DSO: 430.14.0
(pid=18408) 2019-07-30 10:57:17.200560: I tensorflow/core/platform/profile_utils/cpu_utils.cc:94] CPU Frequency: 3696000000 Hz
(pid=18408) 2019-07-30 10:57:17.200991: I tensorflow/compiler/xla/service/service.cc:168] XLA service 0x5572a2d2f200 executing computations on platform Host. Devices:
(pid=18408) 2019-07-30 10:57:17.201010: I tensorflow/compiler/xla/service/service.cc:175] StreamExecutor device (0): <undefined>, <undefined>
(pid=18408) W0730 10:57:17.206105 140193080096576 deprecation.py:323] From /home/noone/Documents/New_TF/ray-autoregressive/python/ray/rllib/models/tf/fcnet_v1.py:48: dense (from tensorflow.python.layers.core) is deprecated and will be removed in a future version.
(pid=18408) Instructions for updating:
(pid=18408) Use keras.layers.dense instead.
(pid=18408) W0730 10:57:17.439087 140193080096576 deprecation.py:323] From /home/noone/Documents/New_TF/ray-autoregressive/python/ray/rllib/models/tf/lstm_v1.py:47: LSTMCell.__init__ (from tensorflow.python.ops.rnn_cell_impl) is deprecated and will be removed in a future version.
(pid=18408) Instructions for updating:
(pid=18408) This class is equivalent as tf.keras.layers.LSTMCell, and will be replaced by that in Tensorflow 2.0.
(pid=18408) W0730 10:57:17.439558 140193080096576 deprecation.py:323] From /home/noone/Documents/New_TF/ray-autoregressive/python/ray/rllib/models/tf/lstm_v1.py:71: dynamic_rnn (from tensorflow.python.ops.rnn) is deprecated and will be removed in a future version.
(pid=18408) Instructions for updating:
(pid=18408) Please use `keras.layers.RNN(cell)`, which is equivalent to this API
(pid=18408) W0730 10:57:17.488314 140193080096576 deprecation.py:506] From /home/noone/anaconda3/envs/lab/lib/python3.6/site-packages/tensorflow/python/ops/rnn_cell_impl.py:957: calling Zeros.__init__ (from tensorflow.python.ops.init_ops) with dtype is deprecated and will be removed in a future version.
(pid=18408) Instructions for updating:
(pid=18408) Call initializer instance with the dtype argument instead of passing it to the constructor
(pid=18408) W0730 10:57:17.965165 140193080096576 deprecation.py:323] From /home/noone/anaconda3/envs/lab/lib/python3.6/site-packages/tensorflow/python/ops/rnn.py:244: add_dispatch_support.<locals>.wrapper (from tensorflow.python.ops.array_ops) is deprecated and will be removed in a future version.
(pid=18408) Instructions for updating:
(pid=18408) Use tf.where in 2.0, which has the same broadcast rule as np.where
(pid=18408) W0730 10:57:17.986760 140193080096576 deprecation.py:323] From /home/noone/Documents/New_TF/ray-autoregressive/python/ray/rllib/models/tf/tf_action_dist.py:81: multinomial (from tensorflow.python.ops.random_ops) is deprecated and will be removed in a future version.
(pid=18408) Instructions for updating:
(pid=18408) Use `tf.random.categorical` instead.
(pid=18408) 2019-07-30 10:57:18.022115: W tensorflow/compiler/jit/mark_for_compilation_pass.cc:1483] (One-time warning): Not using XLA:CPU for cluster because envvar TF_XLA_FLAGS=--tf_xla_cpu_global_jit was not set. If you want XLA:CPU, either set that envvar, or use experimental_jit_scope to enable XLA:CPU. To confirm that XLA is active, pass --vmodule=xla_compilation_cache=1 (as a proper command-line flag, not via TF_XLA_FLAGS) or set the envvar XLA_FLAGS=--xla_hlo_profile.
(pid=18408) 2019-07-30 10:57:18,065 INFO dynamic_tf_policy.py:333 -- Initializing loss function with dummy input:
(pid=18408)
(pid=18408) { 'action_prob': <tf.Tensor 'default_policy/action_prob:0' shape=(?,) dtype=float32>,
(pid=18408) 'actions': <tf.Tensor 'default_policy/actions:0' shape=(?,) dtype=int64>,
(pid=18408) 'advantages': <tf.Tensor 'default_policy/advantages:0' shape=(?,) dtype=float32>,
(pid=18408) 'behaviour_logits': <tf.Tensor 'default_policy/behaviour_logits:0' shape=(?, 2) dtype=float32>,
(pid=18408) 'dones': <tf.Tensor 'default_policy/dones:0' shape=(?,) dtype=bool>,
(pid=18408) 'new_obs': <tf.Tensor 'default_policy/new_obs:0' shape=(?, 2) dtype=float32>,
(pid=18408) 'obs': <tf.Tensor 'default_policy/observation:0' shape=(?, 2) dtype=float32>,
(pid=18408) 'prev_actions': <tf.Tensor 'default_policy/action:0' shape=(?,) dtype=int64>,
(pid=18408) 'prev_rewards': <tf.Tensor 'default_policy/prev_reward:0' shape=(?,) dtype=float32>,
(pid=18408) 'rewards': <tf.Tensor 'default_policy/rewards:0' shape=(?,) dtype=float32>,
(pid=18408) 'seq_lens': <tf.Tensor 'default_policy/seq_lens_1:0' shape=(?,) dtype=int32>,
(pid=18408) 'state_in_0': <tf.Tensor 'default_policy/state_in_0:0' shape=(?, 256) dtype=float32>,
(pid=18408) 'state_in_1': <tf.Tensor 'default_policy/state_in_1:0' shape=(?, 256) dtype=float32>,
(pid=18408) 'state_out_0': <tf.Tensor 'default_policy/state_out_0:0' shape=(?, 256) dtype=float32>,
(pid=18408) 'state_out_1': <tf.Tensor 'default_policy/state_out_1:0' shape=(?, 256) dtype=float32>,
(pid=18408) 'value_targets': <tf.Tensor 'default_policy/value_targets:0' shape=(?,) dtype=float32>,
(pid=18408) 'vf_preds': <tf.Tensor 'default_policy/vf_preds:0' shape=(?,) dtype=float32>}
(pid=18408)
(pid=18408) W0730 10:57:18.073951 140193080096576 deprecation.py:506] From /home/noone/Documents/New_TF/ray-autoregressive/python/ray/rllib/models/tf/tf_action_dist.py:68: calling reduce_max_v1 (from tensorflow.python.ops.math_ops) with keep_dims is deprecated and will be removed in a future version.
(pid=18408) Instructions for updating:
(pid=18408) keep_dims is deprecated, use keepdims instead
(pid=18408) W0730 10:57:18.077138 140193080096576 deprecation.py:506] From /home/noone/Documents/New_TF/ray-autoregressive/python/ray/rllib/models/tf/tf_action_dist.py:73: calling reduce_sum_v1 (from tensorflow.python.ops.math_ops) with keep_dims is deprecated and will be removed in a future version.
(pid=18408) Instructions for updating:
(pid=18408) keep_dims is deprecated, use keepdims instead
(pid=18408) 2019-07-30 10:57:18,902 INFO rollout_worker.py:742 -- Built policy map: {'default_policy': <ray.rllib.policy.tf_policy_template.PPOTFPolicy object at 0x7f803be310f0>}
(pid=18408) 2019-07-30 10:57:18,902 INFO rollout_worker.py:743 -- Built preprocessor map: {'default_policy': <ray.rllib.models.preprocessors.NoPreprocessor object at 0x7f803be2ee80>}
(pid=18408) 2019-07-30 10:57:18,903 INFO rollout_worker.py:356 -- Built filter map: {'default_policy': <ray.rllib.utils.filter.NoFilter object at 0x7f803be2ecc0>}
(pid=18408) 2019-07-30 10:57:18,924 INFO multi_gpu_optimizer.py:93 -- LocalMultiGPUOptimizer devices ['/cpu:0']
(pid=18407) WARNING: Logging before flag parsing goes to stderr.
(pid=18407) W0730 10:57:20.316230 140478414169920 deprecation.py:323] From /home/noone/anaconda3/envs/lab/lib/python3.6/site-packages/tensorflow/python/compat/v2_compat.py:65: disable_resource_variables (from tensorflow.python.ops.variable_scope) is deprecated and will be removed in a future version.
(pid=18407) Instructions for updating:
(pid=18407) non-resource variables are not supported in the long term
(pid=18414) WARNING: Logging before flag parsing goes to stderr.
(pid=18414) W0730 10:57:20.306662 139887666513728 deprecation.py:323] From /home/noone/anaconda3/envs/lab/lib/python3.6/site-packages/tensorflow/python/compat/v2_compat.py:65: disable_resource_variables (from tensorflow.python.ops.variable_scope) is deprecated and will be removed in a future version.
(pid=18414) Instructions for updating:
(pid=18414) non-resource variables are not supported in the long term
(pid=18407) 2019-07-30 10:57:20,521 INFO rollout_worker.py:319 -- Creating policy evaluation worker 1 on CPU (please ignore any CUDA init errors)
(pid=18407) 2019-07-30 10:57:20.531500: I tensorflow/core/platform/cpu_feature_guard.cc:142] Your CPU supports instructions that this TensorFlow binary was not compiled to use: AVX2 FMA
(pid=18407) 2019-07-30 10:57:20.535347: I tensorflow/stream_executor/platform/default/dso_loader.cc:42] Successfully opened dynamic library libcuda.so.1
(pid=18407) 2019-07-30 10:57:20.538141: E tensorflow/stream_executor/cuda/cuda_driver.cc:318] failed call to cuInit: CUDA_ERROR_NO_DEVICE: no CUDA-capable device is detected
(pid=18407) 2019-07-30 10:57:20.538183: I tensorflow/stream_executor/cuda/cuda_diagnostics.cc:169] retrieving CUDA diagnostic information for host: noone-will-not-die
(pid=18407) 2019-07-30 10:57:20.538190: I tensorflow/stream_executor/cuda/cuda_diagnostics.cc:176] hostname: noone-will-not-die
(pid=18407) 2019-07-30 10:57:20.538259: I tensorflow/stream_executor/cuda/cuda_diagnostics.cc:200] libcuda reported version is: 430.14.0
(pid=18407) 2019-07-30 10:57:20.538279: I tensorflow/stream_executor/cuda/cuda_diagnostics.cc:204] kernel reported version is: 430.14.0
(pid=18407) 2019-07-30 10:57:20.538284: I tensorflow/stream_executor/cuda/cuda_diagnostics.cc:310] kernel version seems to match DSO: 430.14.0
(pid=18407) 2019-07-30 10:57:20.539690: I tensorflow/core/platform/profile_utils/cpu_utils.cc:94] CPU Frequency: 3696000000 Hz
(pid=18407) 2019-07-30 10:57:20.540061: I tensorflow/compiler/xla/service/service.cc:168] XLA service 0x564917f83240 executing computations on platform Host. Devices:
(pid=18407) 2019-07-30 10:57:20.540077: I tensorflow/compiler/xla/service/service.cc:175] StreamExecutor device (0): <undefined>, <undefined>
(pid=18407) W0730 10:57:20.544504 140478414169920 deprecation.py:323] From /home/noone/Documents/New_TF/ray-autoregressive/python/ray/rllib/models/tf/fcnet_v1.py:48: dense (from tensorflow.python.layers.core) is deprecated and will be removed in a future version.
(pid=18407) Instructions for updating:
(pid=18407) Use keras.layers.dense instead.
(pid=18414) 2019-07-30 10:57:20,519 INFO rollout_worker.py:319 -- Creating policy evaluation worker 2 on CPU (please ignore any CUDA init errors)
(pid=18414) 2019-07-30 10:57:20.529410: I tensorflow/core/platform/cpu_feature_guard.cc:142] Your CPU supports instructions that this TensorFlow binary was not compiled to use: AVX2 FMA
(pid=18414) 2019-07-30 10:57:20.533353: I tensorflow/stream_executor/platform/default/dso_loader.cc:42] Successfully opened dynamic library libcuda.so.1
(pid=18414) 2019-07-30 10:57:20.536063: E tensorflow/stream_executor/cuda/cuda_driver.cc:318] failed call to cuInit: CUDA_ERROR_NO_DEVICE: no CUDA-capable device is detected
(pid=18414) 2019-07-30 10:57:20.536094: I tensorflow/stream_executor/cuda/cuda_diagnostics.cc:169] retrieving CUDA diagnostic information for host: noone-will-not-die
(pid=18414) 2019-07-30 10:57:20.536100: I tensorflow/stream_executor/cuda/cuda_diagnostics.cc:176] hostname: noone-will-not-die
(pid=18414) 2019-07-30 10:57:20.536162: I tensorflow/stream_executor/cuda/cuda_diagnostics.cc:200] libcuda reported version is: 430.14.0
(pid=18414) 2019-07-30 10:57:20.536182: I tensorflow/stream_executor/cuda/cuda_diagnostics.cc:204] kernel reported version is: 430.14.0
(pid=18414) 2019-07-30 10:57:20.536188: I tensorflow/stream_executor/cuda/cuda_diagnostics.cc:310] kernel version seems to match DSO: 430.14.0
(pid=18414) 2019-07-30 10:57:20.556584: I tensorflow/core/platform/profile_utils/cpu_utils.cc:94] CPU Frequency: 3696000000 Hz
(pid=18414) 2019-07-30 10:57:20.557034: I tensorflow/compiler/xla/service/service.cc:168] XLA service 0x559e99221020 executing computations on platform Host. Devices:
(pid=18414) 2019-07-30 10:57:20.557054: I tensorflow/compiler/xla/service/service.cc:175] StreamExecutor device (0): <undefined>, <undefined>
(pid=18414) W0730 10:57:20.561483 139887666513728 deprecation.py:323] From /home/noone/Documents/New_TF/ray-autoregressive/python/ray/rllib/models/tf/fcnet_v1.py:48: dense (from tensorflow.python.layers.core) is deprecated and will be removed in a future version.
(pid=18414) Instructions for updating:
(pid=18414) Use keras.layers.dense instead.
(pid=18407) W0730 10:57:20.769307 140478414169920 deprecation.py:323] From /home/noone/Documents/New_TF/ray-autoregressive/python/ray/rllib/models/tf/lstm_v1.py:47: LSTMCell.__init__ (from tensorflow.python.ops.rnn_cell_impl) is deprecated and will be removed in a future version.
(pid=18407) Instructions for updating:
(pid=18407) This class is equivalent as tf.keras.layers.LSTMCell, and will be replaced by that in Tensorflow 2.0.
(pid=18407) W0730 10:57:20.769812 140478414169920 deprecation.py:323] From /home/noone/Documents/New_TF/ray-autoregressive/python/ray/rllib/models/tf/lstm_v1.py:71: dynamic_rnn (from tensorflow.python.ops.rnn) is deprecated and will be removed in a future version.
(pid=18407) Instructions for updating:
(pid=18407) Please use `keras.layers.RNN(cell)`, which is equivalent to this API
(pid=18407) W0730 10:57:20.819773 140478414169920 deprecation.py:506] From /home/noone/anaconda3/envs/lab/lib/python3.6/site-packages/tensorflow/python/ops/rnn_cell_impl.py:957: calling Zeros.__init__ (from tensorflow.python.ops.init_ops) with dtype is deprecated and will be removed in a future version.
(pid=18407) Instructions for updating:
(pid=18407) Call initializer instance with the dtype argument instead of passing it to the constructor
(pid=18414) W0730 10:57:20.788882 139887666513728 deprecation.py:323] From /home/noone/Documents/New_TF/ray-autoregressive/python/ray/rllib/models/tf/lstm_v1.py:47: LSTMCell.__init__ (from tensorflow.python.ops.rnn_cell_impl) is deprecated and will be removed in a future version.
(pid=18414) Instructions for updating:
(pid=18414) This class is equivalent as tf.keras.layers.LSTMCell, and will be replaced by that in Tensorflow 2.0.
(pid=18414) W0730 10:57:20.789384 139887666513728 deprecation.py:323] From /home/noone/Documents/New_TF/ray-autoregressive/python/ray/rllib/models/tf/lstm_v1.py:71: dynamic_rnn (from tensorflow.python.ops.rnn) is deprecated and will be removed in a future version.
(pid=18414) Instructions for updating:
(pid=18414) Please use `keras.layers.RNN(cell)`, which is equivalent to this API
(pid=18414) W0730 10:57:20.842777 139887666513728 deprecation.py:506] From /home/noone/anaconda3/envs/lab/lib/python3.6/site-packages/tensorflow/python/ops/rnn_cell_impl.py:957: calling Zeros.__init__ (from tensorflow.python.ops.init_ops) with dtype is deprecated and will be removed in a future version.
(pid=18414) Instructions for updating:
(pid=18414) Call initializer instance with the dtype argument instead of passing it to the constructor
(pid=18407) W0730 10:57:21.309464 140478414169920 deprecation.py:323] From /home/noone/anaconda3/envs/lab/lib/python3.6/site-packages/tensorflow/python/ops/rnn.py:244: add_dispatch_support.<locals>.wrapper (from tensorflow.python.ops.array_ops) is deprecated and will be removed in a future version.
(pid=18407) Instructions for updating:
(pid=18407) Use tf.where in 2.0, which has the same broadcast rule as np.where
(pid=18407) W0730 10:57:21.330303 140478414169920 deprecation.py:323] From /home/noone/Documents/New_TF/ray-autoregressive/python/ray/rllib/models/tf/tf_action_dist.py:81: multinomial (from tensorflow.python.ops.random_ops) is deprecated and will be removed in a future version.
(pid=18407) Instructions for updating:
(pid=18407) Use `tf.random.categorical` instead.
(pid=18407) 2019-07-30 10:57:21.365374: W tensorflow/compiler/jit/mark_for_compilation_pass.cc:1483] (One-time warning): Not using XLA:CPU for cluster because envvar TF_XLA_FLAGS=--tf_xla_cpu_global_jit was not set. If you want XLA:CPU, either set that envvar, or use experimental_jit_scope to enable XLA:CPU. To confirm that XLA is active, pass --vmodule=xla_compilation_cache=1 (as a proper command-line flag, not via TF_XLA_FLAGS) or set the envvar XLA_FLAGS=--xla_hlo_profile.
(pid=18414) W0730 10:57:21.335618 139887666513728 deprecation.py:323] From /home/noone/anaconda3/envs/lab/lib/python3.6/site-packages/tensorflow/python/ops/rnn.py:244: add_dispatch_support.<locals>.wrapper (from tensorflow.python.ops.array_ops) is deprecated and will be removed in a future version.
(pid=18414) Instructions for updating:
(pid=18414) Use tf.where in 2.0, which has the same broadcast rule as np.where
(pid=18414) W0730 10:57:21.356825 139887666513728 deprecation.py:323] From /home/noone/Documents/New_TF/ray-autoregressive/python/ray/rllib/models/tf/tf_action_dist.py:81: multinomial (from tensorflow.python.ops.random_ops) is deprecated and will be removed in a future version.
(pid=18414) Instructions for updating:
(pid=18414) Use `tf.random.categorical` instead.
(pid=18407) 2019-07-30 10:57:21,412 INFO dynamic_tf_policy.py:333 -- Initializing loss function with dummy input:
(pid=18407)
(pid=18407) { 'action_prob': <tf.Tensor 'default_policy/action_prob:0' shape=(?,) dtype=float32>,
(pid=18407) 'actions': <tf.Tensor 'default_policy/actions:0' shape=(?,) dtype=int64>,
(pid=18407) 'advantages': <tf.Tensor 'default_policy/advantages:0' shape=(?,) dtype=float32>,
(pid=18407) 'behaviour_logits': <tf.Tensor 'default_policy/behaviour_logits:0' shape=(?, 2) dtype=float32>,
(pid=18407) 'dones': <tf.Tensor 'default_policy/dones:0' shape=(?,) dtype=bool>,
(pid=18407) 'new_obs': <tf.Tensor 'default_policy/new_obs:0' shape=(?, 2) dtype=float32>,
(pid=18407) 'obs': <tf.Tensor 'default_policy/observation:0' shape=(?, 2) dtype=float32>,
(pid=18407) 'prev_actions': <tf.Tensor 'default_policy/action:0' shape=(?,) dtype=int64>,
(pid=18407) 'prev_rewards': <tf.Tensor 'default_policy/prev_reward:0' shape=(?,) dtype=float32>,
(pid=18407) 'rewards': <tf.Tensor 'default_policy/rewards:0' shape=(?,) dtype=float32>,
(pid=18407) 'seq_lens': <tf.Tensor 'default_policy/seq_lens_1:0' shape=(?,) dtype=int32>,
(pid=18407) 'state_in_0': <tf.Tensor 'default_policy/state_in_0:0' shape=(?, 256) dtype=float32>,
(pid=18407) 'state_in_1': <tf.Tensor 'default_policy/state_in_1:0' shape=(?, 256) dtype=float32>,
(pid=18407) 'state_out_0': <tf.Tensor 'default_policy/state_out_0:0' shape=(?, 256) dtype=float32>,
(pid=18407) 'state_out_1': <tf.Tensor 'default_policy/state_out_1:0' shape=(?, 256) dtype=float32>,
(pid=18407) 'value_targets': <tf.Tensor 'default_policy/value_targets:0' shape=(?,) dtype=float32>,
(pid=18407) 'vf_preds': <tf.Tensor 'default_policy/vf_preds:0' shape=(?,) dtype=float32>}
(pid=18407)
(pid=18407) W0730 10:57:21.420790 140478414169920 deprecation.py:506] From /home/noone/Documents/New_TF/ray-autoregressive/python/ray/rllib/models/tf/tf_action_dist.py:68: calling reduce_max_v1 (from tensorflow.python.ops.math_ops) with keep_dims is deprecated and will be removed in a future version.
(pid=18407) Instructions for updating:
(pid=18407) keep_dims is deprecated, use keepdims instead
(pid=18407) W0730 10:57:21.423832 140478414169920 deprecation.py:506] From /home/noone/Documents/New_TF/ray-autoregressive/python/ray/rllib/models/tf/tf_action_dist.py:73: calling reduce_sum_v1 (from tensorflow.python.ops.math_ops) with keep_dims is deprecated and will be removed in a future version.
(pid=18407) Instructions for updating:
(pid=18407) keep_dims is deprecated, use keepdims instead
(pid=18414) 2019-07-30 10:57:21.394307: W tensorflow/compiler/jit/mark_for_compilation_pass.cc:1483] (One-time warning): Not using XLA:CPU for cluster because envvar TF_XLA_FLAGS=--tf_xla_cpu_global_jit was not set. If you want XLA:CPU, either set that envvar, or use experimental_jit_scope to enable XLA:CPU. To confirm that XLA is active, pass --vmodule=xla_compilation_cache=1 (as a proper command-line flag, not via TF_XLA_FLAGS) or set the envvar XLA_FLAGS=--xla_hlo_profile.
(pid=18414) W0730 10:57:21.449012 139887666513728 deprecation.py:506] From /home/noone/Documents/New_TF/ray-autoregressive/python/ray/rllib/models/tf/tf_action_dist.py:68: calling reduce_max_v1 (from tensorflow.python.ops.math_ops) with keep_dims is deprecated and will be removed in a future version.
(pid=18414) Instructions for updating:
(pid=18414) keep_dims is deprecated, use keepdims instead
(pid=18414) W0730 10:57:21.452185 139887666513728 deprecation.py:506] From /home/noone/Documents/New_TF/ray-autoregressive/python/ray/rllib/models/tf/tf_action_dist.py:73: calling reduce_sum_v1 (from tensorflow.python.ops.math_ops) with keep_dims is deprecated and will be removed in a future version.
(pid=18414) Instructions for updating:
(pid=18414) keep_dims is deprecated, use keepdims instead
(pid=18408) W0730 10:57:21.600737 140193080096576 deprecation.py:323] From /home/noone/Documents/New_TF/ray-autoregressive/python/ray/rllib/policy/tf_policy.py:572: Variable.load (from tensorflow.python.ops.variables) is deprecated and will be removed in a future version.
(pid=18408) Instructions for updating:
(pid=18408) Prefer Variable.assign which has equivalent behavior in 2.X.
(pid=18407) W0730 10:57:22.294743 140478414169920 deprecation.py:323] From /home/noone/Documents/New_TF/ray-autoregressive/python/ray/rllib/policy/tf_policy.py:572: Variable.load (from tensorflow.python.ops.variables) is deprecated and will be removed in a future version.
(pid=18407) Instructions for updating:
(pid=18407) Prefer Variable.assign which has equivalent behavior in 2.X.
(pid=18407) 2019-07-30 10:57:22,332 INFO rollout_worker.py:451 -- Generating sample batch of size 200
(pid=18407) 2019-07-30 10:57:22,332 INFO sampler.py:304 -- Raw obs from env: { 0: { 'agent0': np.ndarray((2,), dtype=float64, min=-0.027, max=-0.006, mean=-0.017)}}
(pid=18407) 2019-07-30 10:57:22,332 INFO sampler.py:305 -- Info return from env: {0: {'agent0': None}}
(pid=18407) 2019-07-30 10:57:22,333 INFO sampler.py:403 -- Preprocessed obs: np.ndarray((2,), dtype=float64, min=-0.027, max=-0.006, mean=-0.017)
(pid=18407) 2019-07-30 10:57:22,333 INFO sampler.py:407 -- Filtered obs: np.ndarray((2,), dtype=float64, min=-0.027, max=-0.006, mean=-0.017)
(pid=18407) 2019-07-30 10:57:22,333 INFO sampler.py:521 -- Inputs to compute_actions():
(pid=18407)
(pid=18407) { 'default_policy': [ { 'data': { 'agent_id': 'agent0',
(pid=18407) 'env_id': 0,
(pid=18407) 'info': None,
(pid=18407) 'obs': np.ndarray((2,), dtype=float64, min=-0.027, max=-0.006, mean=-0.017),
(pid=18407) 'prev_action': np.ndarray((), dtype=int64, min=0.0, max=0.0, mean=0.0),
(pid=18407) 'prev_reward': 0.0,
(pid=18407) 'rnn_state': [ np.ndarray((256,), dtype=float32, min=0.0, max=0.0, mean=0.0),
(pid=18407) np.ndarray((256,), dtype=float32, min=0.0, max=0.0, mean=0.0)]},
(pid=18407) 'type': 'PolicyEvalData'}]}
(pid=18407)
(pid=18407) 2019-07-30 10:57:22,333 INFO tf_run_builder.py:92 -- Executing TF run without tracing. To dump TF timeline traces to disk, set the TF_TIMELINE_DIR environment variable.
(pid=18414) W0730 10:57:22.315722 139887666513728 deprecation.py:323] From /home/noone/Documents/New_TF/ray-autoregressive/python/ray/rllib/policy/tf_policy.py:572: Variable.load (from tensorflow.python.ops.variables) is deprecated and will be removed in a future version.
(pid=18414) Instructions for updating:
(pid=18414) Prefer Variable.assign which has equivalent behavior in 2.X.
(pid=18407) 2019-07-30 10:57:22,381 INFO sampler.py:548 -- Outputs of compute_actions():
(pid=18407)
(pid=18407) { 'default_policy': ( np.ndarray((1,), dtype=int64, min=1.0, max=1.0, mean=1.0),
(pid=18407) [ np.ndarray((1, 256), dtype=float32, min=-0.018, max=0.015, mean=-0.0),
(pid=18407) np.ndarray((1, 256), dtype=float32, min=-0.009, max=0.007, mean=-0.0)],
(pid=18407) { 'action_prob': np.ndarray((1,), dtype=float32, min=0.5, max=0.5, mean=0.5),
(pid=18407) 'behaviour_logits': np.ndarray((1, 2), dtype=float32, min=-0.0, max=0.0, mean=0.0),
(pid=18407) 'vf_preds': np.ndarray((1,), dtype=float32, min=-0.001, max=-0.001, mean=-0.001)})}
(pid=18407)
(pid=18407) 2019-07-30 10:57:22,401 INFO sample_batch_builder.py:161 -- Trajectory fragment after postprocess_trajectory():
(pid=18407)
(pid=18407) { 'agent0': { 'data': { 'action_prob': np.ndarray((16,), dtype=float32, min=0.5, max=0.5, mean=0.5),
(pid=18407) 'actions': np.ndarray((16,), dtype=int64, min=0.0, max=1.0, mean=0.375),
(pid=18407) 'advantages': np.ndarray((16,), dtype=float32, min=1.083, max=14.855, mean=8.117),
(pid=18407) 'agent_index': np.ndarray((16,), dtype=int64, min=0.0, max=0.0, mean=0.0),
(pid=18407) 'behaviour_logits': np.ndarray((16, 2), dtype=float32, min=-0.001, max=0.0, mean=-0.0),
(pid=18407) 'dones': np.ndarray((16,), dtype=bool, min=0.0, max=1.0, mean=0.062),
(pid=18407) 'eps_id': np.ndarray((16,), dtype=int64, min=1907260751.0, max=1907260751.0, mean=1907260751.0),
(pid=18407) 'infos': np.ndarray((16,), dtype=object, head={}),
(pid=18407) 'new_obs': np.ndarray((16, 2), dtype=float32, min=-0.161, max=0.219, mean=-0.001),
(pid=18407) 'obs': np.ndarray((16, 2), dtype=float32, min=-0.142, max=0.184, mean=-0.004),
(pid=18407) 'prev_actions': np.ndarray((16,), dtype=int64, min=0.0, max=1.0, mean=0.312),
(pid=18407) 'prev_rewards': np.ndarray((16,), dtype=float32, min=0.0, max=1.0, mean=0.938),
(pid=18407) 'rewards': np.ndarray((16,), dtype=float32, min=1.0, max=1.0, mean=1.0),
(pid=18407) 'state_in_0': np.ndarray((16, 256), dtype=float32, min=-0.297, max=0.309, mean=-0.002),
(pid=18407) 'state_in_1': np.ndarray((16, 256), dtype=float32, min=-0.145, max=0.155, mean=-0.001),
(pid=18407) 'state_out_0': np.ndarray((16, 256), dtype=float32, min=-0.344, max=0.363, mean=-0.002),
(pid=18407) 'state_out_1': np.ndarray((16, 256), dtype=float32, min=-0.167, max=0.181, mean=-0.001),
(pid=18407) 't': np.ndarray((16,), dtype=int64, min=0.0, max=15.0, mean=7.5),
(pid=18407) 'unroll_id': np.ndarray((16,), dtype=int64, min=0.0, max=0.0, mean=0.0),
(pid=18407) 'value_targets': np.ndarray((16,), dtype=float32, min=1.0, max=14.854, mean=8.089),
(pid=18407) 'vf_preds': np.ndarray((16,), dtype=float32, min=-0.083, max=-0.001, mean=-0.028)},
(pid=18407) 'type': 'SampleBatch'}}
(pid=18407)
(pid=18407) 2019-07-30 10:57:22,556 INFO rollout_worker.py:485 -- Completed sample batch:
(pid=18407)
(pid=18407) { 'data': { 'action_prob': np.ndarray((200,), dtype=float32, min=0.5, max=0.5, mean=0.5),
(pid=18407) 'actions': np.ndarray((200,), dtype=int64, min=0.0, max=1.0, mean=0.55),
(pid=18407) 'advantages': np.ndarray((200,), dtype=float32, min=0.934, max=37.016, mean=11.662),
(pid=18407) 'agent_index': np.ndarray((200,), dtype=int64, min=0.0, max=0.0, mean=0.0),
(pid=18407) 'behaviour_logits': np.ndarray((200, 2), dtype=float32, min=-0.001, max=0.001, mean=0.0),
(pid=18407) 'dones': np.ndarray((200,), dtype=bool, min=0.0, max=1.0, mean=0.045),
(pid=18407) 'eps_id': np.ndarray((200,), dtype=int64, min=301938920.0, max=1907260751.0, mean=1179643498.465),
(pid=18407) 'infos': np.ndarray((200,), dtype=object, head={}),
(pid=18407) 'new_obs': np.ndarray((200, 2), dtype=float32, min=-0.235, max=0.219, mean=-0.013),
(pid=18407) 'obs': np.ndarray((200, 2), dtype=float32, min=-0.208, max=0.196, mean=-0.012),
(pid=18407) 'prev_actions': np.ndarray((200,), dtype=int64, min=0.0, max=1.0, mean=0.52),
(pid=18407) 'prev_rewards': np.ndarray((200,), dtype=float32, min=0.0, max=1.0, mean=0.95),
(pid=18407) 'rewards': np.ndarray((200,), dtype=float32, min=1.0, max=1.0, mean=1.0),
(pid=18407) 'state_in_0': np.ndarray((200, 256), dtype=float32, min=-0.33, max=0.349, mean=0.0),
(pid=18407) 'state_in_1': np.ndarray((200, 256), dtype=float32, min=-0.157, max=0.173, mean=0.0),
(pid=18407) 'state_out_0': np.ndarray((200, 256), dtype=float32, min=-0.367, max=0.41, mean=0.0),
(pid=18407) 'state_out_1': np.ndarray((200, 256), dtype=float32, min=-0.177, max=0.201, mean=0.0),
(pid=18407) 't': np.ndarray((200,), dtype=int64, min=0.0, max=45.0, mean=11.825),
(pid=18407) 'unroll_id': np.ndarray((200,), dtype=int64, min=0.0, max=0.0, mean=0.0),
(pid=18407) 'value_targets': np.ndarray((200,), dtype=float32, min=1.0, max=37.018, mean=11.66),
(pid=18407) 'vf_preds': np.ndarray((200,), dtype=float32, min=-0.101, max=0.066, mean=-0.001)},
(pid=18407) 'type': 'SampleBatch'}
(pid=18407)
(pid=18409) WARNING: Logging before flag parsing goes to stderr.
(pid=18409) W0730 10:57:24.763863 140488989505280 deprecation.py:323] From /home/noone/anaconda3/envs/lab/lib/python3.6/site-packages/tensorflow/python/compat/v2_compat.py:65: disable_resource_variables (from tensorflow.python.ops.variable_scope) is deprecated and will be removed in a future version.
(pid=18409) Instructions for updating:
(pid=18409) non-resource variables are not supported in the long term
(pid=18411) WARNING: Logging before flag parsing goes to stderr.
(pid=18411) W0730 10:57:24.746196 139662328850176 deprecation.py:323] From /home/noone/anaconda3/envs/lab/lib/python3.6/site-packages/tensorflow/python/compat/v2_compat.py:65: disable_resource_variables (from tensorflow.python.ops.variable_scope) is deprecated and will be removed in a future version.
(pid=18411) Instructions for updating:
(pid=18411) non-resource variables are not supported in the long term
(pid=18403) WARNING: Logging before flag parsing goes to stderr.
(pid=18403) W0730 10:57:24.916916 140590455617280 deprecation.py:323] From /home/noone/anaconda3/envs/lab/lib/python3.6/site-packages/tensorflow/python/compat/v2_compat.py:65: disable_resource_variables (from tensorflow.python.ops.variable_scope) is deprecated and will be removed in a future version.
(pid=18403) Instructions for updating:
(pid=18403) non-resource variables are not supported in the long term
(pid=18413) WARNING: Logging before flag parsing goes to stderr.
(pid=18413) W0730 10:57:24.874173 139748529788672 deprecation.py:323] From /home/noone/anaconda3/envs/lab/lib/python3.6/site-packages/tensorflow/python/compat/v2_compat.py:65: disable_resource_variables (from tensorflow.python.ops.variable_scope) is deprecated and will be removed in a future version.
(pid=18413) Instructions for updating:
(pid=18413) non-resource variables are not supported in the long term
(pid=18405) WARNING: Logging before flag parsing goes to stderr.
(pid=18405) W0730 10:57:24.884049 140014912661248 deprecation.py:323] From /home/noone/anaconda3/envs/lab/lib/python3.6/site-packages/tensorflow/python/compat/v2_compat.py:65: disable_resource_variables (from tensorflow.python.ops.variable_scope) is deprecated and will be removed in a future version.
(pid=18405) Instructions for updating:
(pid=18405) non-resource variables are not supported in the long term
(pid=18406) WARNING: Logging before flag parsing goes to stderr.
(pid=18406) W0730 10:57:24.923823 139687792867072 deprecation.py:323] From /home/noone/anaconda3/envs/lab/lib/python3.6/site-packages/tensorflow/python/compat/v2_compat.py:65: disable_resource_variables (from tensorflow.python.ops.variable_scope) is deprecated and will be removed in a future version.
(pid=18406) Instructions for updating:
(pid=18406) non-resource variables are not supported in the long term
(pid=18412) WARNING: Logging before flag parsing goes to stderr.
(pid=18412) W0730 10:57:24.944961 139744931452672 deprecation.py:323] From /home/noone/anaconda3/envs/lab/lib/python3.6/site-packages/tensorflow/python/compat/v2_compat.py:65: disable_resource_variables (from tensorflow.python.ops.variable_scope) is deprecated and will be removed in a future version.
(pid=18412) Instructions for updating:
(pid=18412) non-resource variables are not supported in the long term
(pid=18410) WARNING: Logging before flag parsing goes to stderr.
(pid=18410) W0730 10:57:24.940662 139820158273280 deprecation.py:323] From /home/noone/anaconda3/envs/lab/lib/python3.6/site-packages/tensorflow/python/compat/v2_compat.py:65: disable_resource_variables (from tensorflow.python.ops.variable_scope) is deprecated and will be removed in a future version.
(pid=18410) Instructions for updating:
(pid=18410) non-resource variables are not supported in the long term
(pid=18404) WARNING: Logging before flag parsing goes to stderr.
(pid=18404) W0730 10:57:25.000994 140114953090816 deprecation.py:323] From /home/noone/anaconda3/envs/lab/lib/python3.6/site-packages/tensorflow/python/compat/v2_compat.py:65: disable_resource_variables (from tensorflow.python.ops.variable_scope) is deprecated and will be removed in a future version.
(pid=18404) Instructions for updating:
(pid=18404) non-resource variables are not supported in the long term
(pid=18408) 2019-07-30 10:57:25,777 INFO tf_policy.py:357 -- Optimizing variable <tf.Variable 'default_policy/default_model/fc1/kernel:0' shape=(2, 256) dtype=float32_ref>
(pid=18408) 2019-07-30 10:57:25,777 INFO tf_policy.py:357 -- Optimizing variable <tf.Variable 'default_policy/default_model/fc1/bias:0' shape=(256,) dtype=float32_ref>
(pid=18408) 2019-07-30 10:57:25,777 INFO tf_policy.py:357 -- Optimizing variable <tf.Variable 'default_policy/default_model/fc2/kernel:0' shape=(256, 256) dtype=float32_ref>
(pid=18408) 2019-07-30 10:57:25,777 INFO tf_policy.py:357 -- Optimizing variable <tf.Variable 'default_policy/default_model/fc2/bias:0' shape=(256,) dtype=float32_ref>
(pid=18408) 2019-07-30 10:57:25,777 INFO tf_policy.py:357 -- Optimizing variable <tf.Variable 'default_policy/default_model/rnn/lstm_cell/kernel:0' shape=(512, 1024) dtype=float32_ref>
(pid=18408) 2019-07-30 10:57:25,777 INFO tf_policy.py:357 -- Optimizing variable <tf.Variable 'default_policy/default_model/rnn/lstm_cell/bias:0' shape=(1024,) dtype=float32_ref>
(pid=18408) 2019-07-30 10:57:25,777 INFO tf_policy.py:357 -- Optimizing variable <tf.Variable 'default_policy/default_model/action/w:0' shape=(256, 2) dtype=float32_ref>
(pid=18408) 2019-07-30 10:57:25,777 INFO tf_policy.py:357 -- Optimizing variable <tf.Variable 'default_policy/default_model/action/b:0' shape=(2,) dtype=float32_ref>
(pid=18408) 2019-07-30 10:57:25,777 INFO tf_policy.py:357 -- Optimizing variable <tf.Variable 'default_policy/value_function/value_function/w:0' shape=(256, 1) dtype=float32_ref>
(pid=18408) 2019-07-30 10:57:25,777 INFO tf_policy.py:357 -- Optimizing variable <tf.Variable 'default_policy/value_function/value_function/b:0' shape=(1,) dtype=float32_ref>
(pid=18408) 2019-07-30 10:57:25,798 INFO tf_policy.py:549 -- Padded input for RNN:
(pid=18408)
(pid=18408) { 'features': [ np.ndarray((5800,), dtype=float64, min=0.0, max=1.0, mean=0.324),
(pid=18408) np.ndarray((5800,), dtype=float64, min=0.0, max=1.0, mean=0.657),
(pid=18408) np.ndarray((5800, 2), dtype=float64, min=-0.316, max=0.325, mean=0.002),
(pid=18408) np.ndarray((5800,), dtype=float64, min=0.0, max=0.501, mean=0.345),
(pid=18408) np.ndarray((5800,), dtype=float64, min=0.0, max=1.0, mean=0.341),
(pid=18408) np.ndarray((5800,), dtype=float64, min=-1.325, max=3.515, mean=-0.0),
(pid=18408) np.ndarray((5800, 2), dtype=float64, min=-0.001, max=0.001, mean=-0.0),
(pid=18408) np.ndarray((5800,), dtype=float64, min=0.0, max=40.104, mean=7.999),
(pid=18408) np.ndarray((5800,), dtype=float64, min=-0.141, max=0.14, mean=-0.001)],
(pid=18408) 'initial_states': [ np.ndarray((290, 256), dtype=float32, min=-0.43, max=0.462, mean=-0.0),
(pid=18408) np.ndarray((290, 256), dtype=float32, min=-0.216, max=0.216, mean=-0.0)],
(pid=18408) 'max_seq_len': 20,
(pid=18408) 'seq_lens': np.ndarray((290,), dtype=int64, min=1.0, max=20.0, mean=13.793)}
(pid=18408)
(pid=18408) 2019-07-30 10:57:25,799 INFO multi_gpu_impl.py:146 -- Training on concatenated sample batches:
(pid=18408)
(pid=18408) { 'inputs': [ np.ndarray((5800,), dtype=float64, min=0.0, max=1.0, mean=0.324),
(pid=18408) np.ndarray((5800,), dtype=float64, min=0.0, max=1.0, mean=0.657),
(pid=18408) np.ndarray((5800, 2), dtype=float64, min=-0.316, max=0.325, mean=0.002),
(pid=18408) np.ndarray((5800,), dtype=float64, min=0.0, max=0.501, mean=0.345),
(pid=18408) np.ndarray((5800,), dtype=float64, min=0.0, max=1.0, mean=0.341),
(pid=18408) np.ndarray((5800,), dtype=float64, min=-1.325, max=3.515, mean=-0.0),
(pid=18408) np.ndarray((5800, 2), dtype=float64, min=-0.001, max=0.001, mean=-0.0),
(pid=18408) np.ndarray((5800,), dtype=float64, min=0.0, max=40.104, mean=7.999),
(pid=18408) np.ndarray((5800,), dtype=float64, min=-0.141, max=0.14, mean=-0.001)],
(pid=18408) 'placeholders': [ <tf.Tensor 'default_policy/action:0' shape=(?,) dtype=int64>,
(pid=18408) <tf.Tensor 'default_policy/prev_reward:0' shape=(?,) dtype=float32>,
(pid=18408) <tf.Tensor 'default_policy/observation:0' shape=(?, 2) dtype=float32>,
(pid=18408) <tf.Tensor 'default_policy/action_prob:0' shape=(?,) dtype=float32>,
(pid=18408) <tf.Tensor 'default_policy/actions:0' shape=(?,) dtype=int64>,
(pid=18408) <tf.Tensor 'default_policy/advantages:0' shape=(?,) dtype=float32>,
(pid=18408) <tf.Tensor 'default_policy/behaviour_logits:0' shape=(?, 2) dtype=float32>,
(pid=18408) <tf.Tensor 'default_policy/value_targets:0' shape=(?,) dtype=float32>,
(pid=18408) <tf.Tensor 'default_policy/vf_preds:0' shape=(?,) dtype=float32>,
(pid=18408) <tf.Tensor 'default_policy/Placeholder:0' shape=(?, 256) dtype=float32>,
(pid=18408) <tf.Tensor 'default_policy/Placeholder_1:0' shape=(?, 256) dtype=float32>,
(pid=18408) <tf.Tensor 'default_policy/seq_lens:0' shape=(?,) dtype=int32>],
(pid=18408) 'state_inputs': [ np.ndarray((290, 256), dtype=float32, min=-0.43, max=0.462, mean=-0.0),
(pid=18408) np.ndarray((290, 256), dtype=float32, min=-0.216, max=0.216, mean=-0.0),
(pid=18408) np.ndarray((290,), dtype=int64, min=1.0, max=20.0, mean=13.793)]}
(pid=18408)
(pid=18408) 2019-07-30 10:57:25,799 INFO multi_gpu_impl.py:191 -- Divided 290 rollout sequences, each of length 20, among 1 devices.
Result for PPO_cartpole_stateless_0:
custom_metrics: {}
date: 2019-07-30_10-57-29
done: false
episode_len_mean: 21.015873015873016
episode_reward_max: 55.0
episode_reward_mean: 21.015873015873016
episode_reward_min: 8.0
episodes_this_iter: 189
episodes_total: 189
experiment_id: 9ffde182bcfe4609bfc8d6ad77aa4703
hostname: noone-will-not-die
info:
grad_time_ms: 3127.938
learner:
default_policy:
cur_kl_coeff: 0.20000000298023224
cur_lr: 4.999999873689376e-05
entropy: .nan
entropy_coeff: 0.0
kl: .nan
policy_loss: .nan
total_loss: .nan
vf_explained_var: -1.0
vf_loss: .nan
load_time_ms: 72.371
num_steps_sampled: 4000
num_steps_trained: 5760
sample_time_ms: 3620.068
update_time_ms: 509.852
iterations_since_restore: 1
node_ip: 10.170.34.144
num_healthy_workers: 2
off_policy_estimator: {}
pid: 18408
policy_reward_mean: {}
sampler_perf:
mean_env_wait_ms: 0.09694922668843232
mean_inference_ms: 1.4001183253509777
mean_processing_ms: 0.17961225506642745
time_since_restore: 7.389065742492676
time_this_iter_s: 7.389065742492676
time_total_s: 7.389065742492676
timestamp: 1564455449
timesteps_since_restore: 4000
timesteps_this_iter: 4000
timesteps_total: 4000
training_iteration: 1
trial_id: bc9a7408
== Status ==
Using FIFO scheduling algorithm.
Resources requested: 3/12 CPUs, 0/1 GPUs
Memory usage on this node: 24.3/67.5 GB
Result logdir: /home/noone/ray_results/PPO
Number of trials: 1 ({'RUNNING': 1})
RUNNING trials:
- PPO_cartpole_stateless_0: RUNNING, [3 CPUs, 0 GPUs], [pid=18408], 7 s, 1 iter, 4000 ts, 21 rew
2019-07-30 10:57:29,064 ERROR trial_runner.py:550 -- Error processing event.
Traceback (most recent call last):
File "/home/noone/Documents/New_TF/ray-autoregressive/python/ray/tune/trial_runner.py", line 498, in _process_trial
result = self.trial_executor.fetch_result(trial)
File "/home/noone/Documents/New_TF/ray-autoregressive/python/ray/tune/ray_trial_executor.py", line 342, in fetch_result
result = ray.get(trial_future[0])
File "/home/noone/Documents/New_TF/ray-autoregressive/python/ray/worker.py", line 2246, in get
raise value
ray.exceptions.RayTaskError: ray_PPO:train() (pid=18408, host=noone-will-not-die)
File "/home/noone/Documents/New_TF/ray-autoregressive/python/ray/rllib/agents/trainer.py", line 369, in train
raise e
File "/home/noone/Documents/New_TF/ray-autoregressive/python/ray/rllib/agents/trainer.py", line 358, in train
result = Trainable.train(self)
File "/home/noone/Documents/New_TF/ray-autoregressive/python/ray/tune/trainable.py", line 171, in train
result = self._train()
File "/home/noone/Documents/New_TF/ray-autoregressive/python/ray/rllib/agents/trainer_template.py", line 126, in _train
fetches = self.optimizer.step()
File "/home/noone/Documents/New_TF/ray-autoregressive/python/ray/rllib/optimizers/multi_gpu_optimizer.py", line 140, in step
self.num_envs_per_worker, self.train_batch_size)
File "/home/noone/Documents/New_TF/ray-autoregressive/python/ray/rllib/optimizers/rollout.py", line 29, in collect_samples
next_sample = ray_get_and_free(fut_sample)
File "/home/noone/Documents/New_TF/ray-autoregressive/python/ray/rllib/utils/memory.py", line 33, in ray_get_and_free
result = ray.get(object_ids)
ray.exceptions.RayTaskError: ray_RolloutWorker:sample() (pid=18414, host=noone-will-not-die)
File "/home/noone/anaconda3/envs/lab/lib/python3.6/site-packages/tensorflow/python/client/session.py", line 1356, in _do_call
return fn(*args)
File "/home/noone/anaconda3/envs/lab/lib/python3.6/site-packages/tensorflow/python/client/session.py", line 1341, in _run_fn
options, feed_dict, fetch_list, target_list, run_metadata)
File "/home/noone/anaconda3/envs/lab/lib/python3.6/site-packages/tensorflow/python/client/session.py", line 1429, in _call_tf_sessionrun
run_metadata)
tensorflow.python.framework.errors_impl.InvalidArgumentError: Received a label value of 2 which is outside the valid range of [0, 2). Label values: 2
[[{{node default_policy/SparseSoftmaxCrossEntropyWithLogits/SparseSoftmaxCrossEntropyWithLogits}}]]
During handling of the above exception, another exception occurred:
ray_RolloutWorker:sample() (pid=18414, host=noone-will-not-die)
File "/home/noone/Documents/New_TF/ray-autoregressive/python/ray/rllib/utils/tf_run_builder.py", line 48, in get
self.feed_dict, os.environ.get("TF_TIMELINE_DIR"))
File "/home/noone/Documents/New_TF/ray-autoregressive/python/ray/rllib/utils/tf_run_builder.py", line 94, in run_timeline
fetches = sess.run(ops, feed_dict=feed_dict)
File "/home/noone/anaconda3/envs/lab/lib/python3.6/site-packages/tensorflow/python/client/session.py", line 950, in run
run_metadata_ptr)
File "/home/noone/anaconda3/envs/lab/lib/python3.6/site-packages/tensorflow/python/client/session.py", line 1173, in _run
feed_dict_tensor, options, run_metadata)
File "/home/noone/anaconda3/envs/lab/lib/python3.6/site-packages/tensorflow/python/client/session.py", line 1350, in _do_run
run_metadata)
File "/home/noone/anaconda3/envs/lab/lib/python3.6/site-packages/tensorflow/python/client/session.py", line 1370, in _do_call
raise type(e)(node_def, op, message)
tensorflow.python.framework.errors_impl.InvalidArgumentError: Received a label value of 2 which is outside the valid range of [0, 2). Label values: 2
[[node default_policy/SparseSoftmaxCrossEntropyWithLogits/SparseSoftmaxCrossEntropyWithLogits (defined at /Documents/New_TF/ray-autoregressive/python/ray/rllib/models/tf/tf_action_dist.py:54) ]]
Errors may have originated from an input operation.
Input Source operations connected to node default_policy/SparseSoftmaxCrossEntropyWithLogits/SparseSoftmaxCrossEntropyWithLogits:
default_policy/default_model_1/add (defined at /Documents/New_TF/ray-autoregressive/python/ray/rllib/models/tf/misc.py:69)
Original stack trace for 'default_policy/SparseSoftmaxCrossEntropyWithLogits/SparseSoftmaxCrossEntropyWithLogits':
File "/Documents/New_TF/ray-autoregressive/python/ray/workers/default_worker.py", line 98, in <module>
ray.worker.global_worker.main_loop()
File "/Documents/New_TF/ray-autoregressive/python/ray/rllib/evaluation/rollout_worker.py", line 334, in __init__
self._build_policy_map(policy_dict, policy_config)
File "/Documents/New_TF/ray-autoregressive/python/ray/rllib/evaluation/rollout_worker.py", line 738, in _build_policy_map
policy_map[name] = cls(obs_space, act_space, merged_conf)
File "/Documents/New_TF/ray-autoregressive/python/ray/rllib/policy/tf_policy_template.py", line 144, in __init__
obs_include_prev_action_reward=obs_include_prev_action_reward)
File "/Documents/New_TF/ray-autoregressive/python/ray/rllib/policy/dynamic_tf_policy.py", line 178, in __init__
action_prob = self.action_dist.sampled_action_prob()
File "/Documents/New_TF/ray-autoregressive/python/ray/rllib/models/tf/tf_action_dist.py", line 41, in sampled_action_prob
return tf.exp(self.logp(self.sample_op))
File "/Documents/New_TF/ray-autoregressive/python/ray/rllib/models/tf/tf_action_dist.py", line 54, in logp
logits=self.inputs, labels=tf.cast(x, tf.int32))
File "/anaconda3/envs/lab/lib/python3.6/site-packages/tensorflow/python/ops/nn_ops.py", line 3338, in sparse_softmax_cross_entropy_with_logits
precise_logits, labels, name=name)
File "/anaconda3/envs/lab/lib/python3.6/site-packages/tensorflow/python/ops/gen_nn_ops.py", line 11815, in sparse_softmax_cross_entropy_with_logits
labels=labels, name=name)
File "/anaconda3/envs/lab/lib/python3.6/site-packages/tensorflow/python/framework/op_def_library.py", line 788, in _apply_op_helper
op_def=op_def)
File "/anaconda3/envs/lab/lib/python3.6/site-packages/tensorflow/python/util/deprecation.py", line 507, in new_func
return func(*args, **kwargs)
File "/anaconda3/envs/lab/lib/python3.6/site-packages/tensorflow/python/framework/ops.py", line 3296, in create_op
op_def=op_def)
File "/anaconda3/envs/lab/lib/python3.6/site-packages/tensorflow/python/framework/ops.py", line 1692, in __init__
self._traceback = tf_stack.extract_stack()
During handling of the above exception, another exception occurred:
ray_RolloutWorker:sample() (pid=18414, host=noone-will-not-die)
File "/home/noone/Documents/New_TF/ray-autoregressive/python/ray/rllib/evaluation/rollout_worker.py", line 453, in sample
batches = [self.input_reader.next()]
File "/home/noone/Documents/New_TF/ray-autoregressive/python/ray/rllib/evaluation/sampler.py", line 56, in next
batches = [self.get_data()]
File "/home/noone/Documents/New_TF/ray-autoregressive/python/ray/rllib/evaluation/sampler.py", line 97, in get_data
item = next(self.rollout_provider)
File "/home/noone/Documents/New_TF/ray-autoregressive/python/ray/rllib/evaluation/sampler.py", line 321, in _env_runner
active_episodes)
File "/home/noone/Documents/New_TF/ray-autoregressive/python/ray/rllib/evaluation/sampler.py", line 544, in _do_policy_eval
eval_results[k] = builder.get(v)
File "/home/noone/Documents/New_TF/ray-autoregressive/python/ray/rllib/utils/tf_run_builder.py", line 53, in get
self.fetches, self.feed_dict))
ValueError: Error fetching: [<tf.Tensor 'default_policy/Squeeze:0' shape=(?,) dtype=int64>, <tf.Tensor 'default_policy/default_model_1/rnn/while/Exit_3:0' shape=(?, 256) dtype=float32>, <tf.Tensor 'default_policy/default_model_1/rnn/while/Exit_4:0' shape=(?, 256) dtype=float32>, {'action_prob': <tf.Tensor 'default_policy/Exp:0' shape=(?,) dtype=float32>, 'vf_preds': <tf.Tensor 'default_policy/value_function/Reshape:0' shape=(?,) dtype=float32>, 'behaviour_logits': <tf.Tensor 'default_policy/default_model_1/add:0' shape=(?, 2) dtype=float32>}], feed_dict={<tf.Tensor 'default_policy/observation:0' shape=(?, 2) dtype=float32>: [array([-0.01396689, 0.04697164])], <tf.Tensor 'default_policy/seq_lens:0' shape=(?,) dtype=int32>: array([1.]), <tf.Tensor 'default_policy/action:0' shape=(?,) dtype=int64>: [0], <tf.Tensor 'default_policy/prev_reward:0' shape=(?,) dtype=float32>: [1.0], <tf.Tensor 'default_policy/PlaceholderWithDefault:0' shape=() dtype=bool>: False, <tf.Tensor 'default_policy/Placeholder:0' shape=(?, 256) dtype=float32>: [array([ 2.34208116e-03, 2.55136266e-02, 3.10041010e-04, 9.96489171e-03,
7.67951598e-03, 1.36089586e-02, 3.93364485e-03, 2.33557466e-02,
-2.86707655e-05, -2.02056170e-02, -2.16141902e-02, 1.11253373e-02,
1.11075882e-02, 1.64756086e-04, 5.24371862e-03, 9.52905789e-03,
6.41135685e-03, 1.19743608e-02, -2.50140689e-02, 2.69174203e-03,
-1.46168256e-02, -1.42096751e-03, 8.91837385e-03, 1.54695939e-03,
1.35066248e-02, 3.65241454e-03, -4.80578281e-04, -8.75154696e-03,
-1.37766516e-02, -2.54603452e-03, 1.32534420e-04, 1.43313948e-02,
-9.28479526e-03, 1.54519873e-02, -2.06236504e-02, 5.41525614e-03,
-5.89275034e-04, 1.17849987e-02, 2.13536602e-02, -1.13948854e-02,
6.57908805e-03, 2.68336590e-02, 7.36262556e-03, -6.47960976e-03,
-5.87534113e-03, -1.23023894e-02, -3.09356488e-03, -4.59006289e-03,
-5.72994212e-03, 8.42755754e-03, -7.12607848e-03, 1.62174199e-02,
-3.50747630e-02, 1.19702499e-02, 7.45555107e-03, 2.46695825e-03,
-9.18773748e-03, -1.69583615e-02, -2.05076821e-02, 1.43612996e-02,
... <feed-dict value dump elided: two 256-element float32 LSTM state arrays fed to the default_policy placeholders, e.g. <tf.Tensor 'default_policy/Placeholder_1:0' shape=(?, 256) dtype=float32>> ...}
Traceback (most recent call last):
File "/home/noone/Documents/New_TF/ray-autoregressive/python/ray/rllib/examples/cartpole_lstm.py", line 190, in <module>
"lstm_use_prev_action_reward": args.use_prev_action_reward,
File "/home/noone/Documents/New_TF/ray-autoregressive/python/ray/tune/tune.py", line 262, in run
raise TuneError("Trials did not complete", errored_trials)
ray.tune.error.TuneError: ('Trials did not complete', [PPO_cartpole_stateless_0])
== Status ==
Using FIFO scheduling algorithm.
Resources requested: 0/12 CPUs, 0/1 GPUs
Memory u
This is a bit hacky, but if you are defining a custom model, you can save the input dict in your forward() as, e.g., model.last_input_dict. Then you can access model.last_input_dict from your action distribution.
This should be addressed once #5164 is merged, since that PR allows custom action distributions to specify their own output size. Let me know if these work. |
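To make the workaround concrete, here is a minimal sketch assuming a TFModelV2 subclass (the class and attribute names are illustrative, not from this PR; variable registration and the value function are omitted for brevity):

import tensorflow as tf
from ray.rllib.models.tf.tf_modelv2 import TFModelV2

class StashInputsModel(TFModelV2):
    """Sketch: a custom model whose forward() saves its input dict."""

    def __init__(self, obs_space, action_space, num_outputs, model_config,
                 name):
        super(StashInputsModel, self).__init__(
            obs_space, action_space, num_outputs, model_config, name)
        self.hidden = tf.keras.layers.Dense(num_outputs, name="hidden")
        self.last_input_dict = None

    def forward(self, input_dict, state, seq_lens):
        # Stash the raw inputs; the action distribution holds a reference
        # to this model (self.model), so it can later read
        # self.model.last_input_dict (e.g., to recover an action mask).
        self.last_input_dict = input_dict
        return self.hidden(input_dict["obs"]), state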
Test FAILed. |
Hi @ericl, sorry to trouble you again. The autoregressive action distribution works well, but I think there is still one small functional issue: how to fetch the action logits of each sub-action, instead of getting the hidden state back as the behaviour_logits when we run testing. This would be very helpful for debugging. Here is what I have tried.

1. Add the logits directly as an attribute of the model

I noticed the fetch function and tried extending it:

def vf_preds_and_logits_fetches(policy):
    """Adds value function and logits outputs to experience batches."""
    return {
        SampleBatch.VF_PREDS: policy.value_function,
        BEHAVIOUR_LOGITS: policy.model_out,
        'action_mask': policy.model.last_input_dict,
        # 'ac_1_logits': policy.model.a1_logits,
        # 'ac_2_logits': policy.model.a2_logits,
    }

But it turns out that the BinaryAutoregressiveOutput distribution depends on the hidden state and a1_input as its inputs, so this fails since there is no input for them.

2. Trying to call the action_model directly

Based on the first failure, I tried to feed the fetched behaviour_logits into the action_model myself:

# Line 204
a_action, p_state, info = agent.compute_action(
    a_obs,
    state=agent_states[agent_id],
    prev_action=prev_actions[agent_id],
    prev_reward=prev_rewards[agent_id],
    policy_id=policy_id, full_fetch=True)
input_size = 256
agent.get_policy(policy_id).model.action_model(
    [info['behaviour_logits'][None],
     np.zeros((1, input_size), dtype=np.float32)])

But this raises a shape-mismatch error, where model_1 is the ParametricActionsModel, the hidden state size is 128, and the a1_input size is 256 (so 384 == 128 + 256).

Do you have any thoughts on fetching the action logits? Training is not going well, and I hope watching the logits will help me debug a little. Sorry for disturbing you again. Best wishes, Shanchao |
Test FAILed. |
@yangysc I would probably add some tf.Prints inside the action distribution object itself, since you want to capture the logit outputs during the sampling process. It might also be possible to assign to self.model inside the action distribution object to capture the right conditioned output. |
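For example, a sketch of that debugging approach applied to the a1 branch of the example distribution reviewed below (the _a1_distribution method name is an assumption; tf.Print is the TF 1.x identity op that dumps its arguments to stderr):

import tensorflow as tf
from ray.rllib.models.tf.tf_action_dist import Categorical

# Inside the custom ActionDistribution:
def _a1_distribution(self):
    BATCH = tf.shape(self.inputs)[0]
    a1_logits, _ = self.model.action_model(
        [self.inputs, tf.zeros((BATCH, 1))])
    # Identity op with a side effect: prints the logits every time the
    # sampling ops actually run, without changing the graph semantics.
    a1_logits = tf.Print(a1_logits, [a1_logits], message="a1 logits: ")
    return Categorical(a1_logits)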
Test FAILed. |
Test FAILed. |
Test PASSed. |
ModelCatalog.register_custom_model("autoregressive_model",
                                   AutoregressiveActionsModel)
ModelCatalog.register_custom_action_dist("binary_autoreg_output",
                                         BinaryAutoregressiveOutput)
why make this explicit?
We could auto register it but that would be more effort for sure.
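For context, this is roughly how the registered names get referenced from a trainer config. A hedged sketch: the custom_action_dist model option is what #5164 adds (until it merges, this PR injects the distribution via the override_action_dist hack noted in the description), and CorrelatedActionsEnv stands in for the example's env class:

from ray import tune

tune.run(
    "PPO",
    config={
        "env": CorrelatedActionsEnv,  # assumed env from this example
        "model": {
            "custom_model": "autoregressive_model",
            "custom_action_dist": "binary_autoreg_output",
        },
    })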
BATCH = tf.shape(self.inputs)[0]
a1_logits, _ = self.model.action_model(
    [self.inputs, tf.zeros((BATCH, 1))])
a1_dist = Categorical(a1_logits)
would this have issues? Like somehow adding nodes to the tf graph over and over?
It's fine since this entire thing is only called once in graph mode.
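For completeness, the conditioned half of the distribution would look roughly like this, a sketch mirroring the a1 snippet above (encoding a1 as a float column vector for the action model's second input is an assumption):

def _a2_distribution(self, a1):
    # Condition the second action head on the sampled first action a1.
    a1_vec = tf.expand_dims(tf.cast(a1, tf.float32), 1)
    _, a2_logits = self.model.action_model([self.inputs, a1_vec])
    return Categorical(a2_logits)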
# and state of each episode (i.e., for multiagent). You can do
# whatever is needed here, e.g., MCTS rollouts.
return action_batch

class BinaryAutoregressiveOutput(ActionDistribution):
why not use literalinclude so that this doesn't go out of sync?
Looks good though we should probably add a check to make sure the TF graph doesn't change over time...
What do these changes do?
Right now the custom action distribution is injected via an override_action_dist hack; this will be removed once #5164 merges.
Related issue number
Closes #4939
Closes #5419
Linter
I've run scripts/format.sh to lint the changes in this PR.