ValueError in training #6

Open
Jay-Vim-Lv opened this issue Sep 28, 2023 · 6 comments

@Jay-Vim-Lv

Hi, when I tried to replicate your code, I ran into some issues. I cannot figure out where the problem is or how to solve it. Could you help me?
My environment is built as you recommend; the system is Ubuntu 18.04 LTS.
There are 2 GPUs: a 1080 Ti and a Titan X.
In the code, I only modified 'num_workers' and 'batch_size' in the YAML file to match my hardware.
When I run `python light_malib/main_pbt.py --config light_malib/expr/gr_football/expr_10_vs_10_psro.yaml`, it produces the following error message:
```
(/media/lxd/880AA9210AA90CEE/anaconda_envs/lzw_GRF) lxd@lxd-T630:/media/lxd/0A7AE0627AE04BCF/lzw/football_game/DB-Football$ python light_malib/main_pbt.py --config light_malib/expr/gr_football/expr_10_vs_10_psro.yaml
[2023-09-28 09:18:34,036][WARNING] No active cluster detected, will create local ray instance.
[2023-09-28 09:18:44,991][WARNING] ============== Cluster Info ==============
{'node_ip_address': '192.168.1.109', 'raylet_ip_address': '192.168.1.109', 'redis_address': '192.168.1.109:6379', 'object_store_address': '/tmp/ray/session_2023-09-28_09-18-34_037912_47469/sockets/plasma_store', 'raylet_socket_name': '/tmp/ray/session_2023-09-28_09-18-34_037912_47469/sockets/raylet', 'webui_url': None, 'session_dir': '/tmp/ray/session_2023-09-28_09-18-34_037912_47469', 'metrics_export_port': 55494, 'node_id': 'a8211a7e16deb107246a6dfd4b68c7d43f1a31ddb9fdba7c482c3b64'}
[2023-09-28 09:18:44,993][WARNING] * cluster resources:
{'accelerator_type:G': 1.0, 'GPU': 2.0, 'object_store_memory': 17054784307.0, 'memory': 34109568615.0, 'node:192.168.1.109': 1.0, 'CPU': 48.0}
[2023-09-28 09:18:44,993][WARNING] this worker ip: 192.168.1.109
[2023-09-28 09:18:44,994][WARNING] Automatically set master ip to local ip address: 192.168.1.109
[2023-09-28 09:18:46,480][INFO] AgentManager initialized
[2023-09-28 09:18:46,514][WARNING] use meta solver type: nash
[2023-09-28 09:18:46,991][INFO] PBTRunner psro initialized
[2023-09-28 09:18:46,991][INFO] PolicyFactory_agent_0_default new policy ctr starts at -1
[2023-09-28 09:18:46,995][WARNING] use model type: gr_football.built_in_11
(pid=47592) [2023-09-28 09:18:49,787][INFO] DataServer initialized
(pid=47595) [2023-09-28 09:18:49,798][INFO] PolicyServer initialized
[2023-09-28 09:18:50,411][INFO] Load initial policy built_in_11 from light_malib/trained_models/gr_football/11_vs_11/built_in
[2023-09-28 09:18:50,426][WARNING] use model type: gr_football.basic_11
[2023-09-28 09:18:50,479][WARNING] agent_0: agent_0-default-0 is initialized from random
[2023-09-28 09:18:50,479][WARNING] policy agent_0-default-0 uses custom_config: {'gamma': 1.0, 'use_cuda': False, 'use_dueling': False, 'preprocess_mode': 'flatten', 'use_q_head': False, 'ppo_epoch': 5, 'num_mini_batch': 1, 'return_mode': 'new_gae', 'gae': {'gae_lambda': 0.95}, 'vtrace': {'clip_rho_threshold': 1.0, 'clip_pg_rho_threshold': 100.0}, 'use_rnn': False, 'rnn_layer_num': 1, 'rnn_data_chunk_length': 16, 'use_feature_normalization': True, 'use_popart': True, 'popart_beta': 0.99999, 'entropy_coef': 0.0, 'clip_param': 0.2, 'use_modified_mappo': False}
[2023-09-28 09:18:50,523][WARNING] after initialization:

<A agent_0>
policy_ids:
['built_in_11', 'agent_0-default-0']
populations:
<P __all__> policy_ids:['built_in_11', 'agent_0-default-0']
<P default> policy_ids:['built_in_11', 'agent_0-default-0']

[2023-09-28 09:18:50,524][WARNING] Evaluation rollouts (num: 50) for 3 policy combinations: [{'agent_0': {'built_in_11': 1.0}, 'agent_1': {'built_in_11': 1.0}}, {'agent_0': {'built_in_11': 1.0}, 'agent_1': {'agent_0-default-0': 1.0}}, {'agent_0': {'agent_0-default-0': 1.0}, 'agent_1': {'agent_0-default-0': 1.0}}]
(pid=47611) [2023-09-28 09:18:51,072][INFO] TrainingManager initialized
(pid=47610) [2023-09-28 09:18:51,149][INFO] RolloutManager initialized
(pid=47606) [2023-09-28 09:19:02,415][INFO] DataPrefetcher initialized
(pid=47599) [2023-09-28 09:19:02,593][INFO] trainer_1 (local rank: 1) initialized
(pid=47609) [2023-09-28 09:19:02,603][INFO] trainer_0 (local rank: 0) initialized
Elo = dict_items([('built_in_11', 1015.631846603239), ('agent_0-default-0', 984.368153396761)])
[2023-09-28 09:30:57,920][INFO] policy_data: [('built_in_11', 'built_in_11'):{'payoff': 5.551115123125783e-17, 'score': 0.5, 'win': 0.28, 'lose': 0.28, 'my_goal': 0.43, 'goal_diff': 0.0}],[('built_in_11', 'agent_0-default-0'):{'payoff': 1.0, 'score': 1.0, 'win': 1.0, 'lose': 0.0, 'my_goal': 3.883116883116883, 'goal_diff': 3.883116883116883}],[('agent_0-default-0', 'built_in_11'):{'payoff': -1.0, 'score': 0.0, 'win': 0.0, 'lose': 1.0, 'my_goal': 0.0, 'goal_diff': -3.883116883116883}],[('agent_0-default-0', 'agent_0-default-0'):{'payoff': 0.0, 'score': 0.5, 'win': 0.25, 'lose': 0.25, 'my_goal': 0.42, 'goal_diff': 0.0}],
(pid=47605) /media/lxd/0A7AE0627AE04BCF/lzw/football_game/DB-Football/light_malib/monitor/monitor.py:59: UserWarning: Starting a Matplotlib GUI outside of the main thread will likely fail.
(pid=47605) fig = plt.figure()
(pid=47605) /media/lxd/0A7AE0627AE04BCF/lzw/football_game/DB-Football/light_malib/monitor/monitor.py:63: UserWarning: FixedFormatter should only be used together with FixedLocator
(pid=47605) ax.set_xticklabels([""] + xpid, rotation=90)
(pid=47605) /media/lxd/0A7AE0627AE04BCF/lzw/football_game/DB-Football/light_malib/monitor/monitor.py:64: UserWarning: FixedFormatter should only be used together with FixedLocator
(pid=47605) ax.set_yticklabels([""] + ypid)
[2023-09-28 09:30:58,519][INFO] payoff table:
+-------------+---------------+-------------+
| | built_in_11 | default-0 |
+=============+===============+=============+
| built_in_11 | +0 | +100 |
+-------------+---------------+-------------+
| default-0 | -100 | +0 |
+-------------+---------------+-------------+
[2023-09-28 09:30:58,520][INFO] default-0's top 10 worst opponents are:
+-------------+----------+
| policy_id | payoff |
+=============+==========+
| built_in_11 | -100.00 |
+-------------+----------+
| default-0 | +0.00 |
+-------------+----------+
[2023-09-28 09:31:10,202][WARNING] agent_0: agent_0-default-1 is initialized from last best policy agent_0-default-0
[2023-09-28 09:31:10,203][WARNING] policy agent_0-default-1 uses custom_config: {'gamma': 1.0, 'use_cuda': False, 'use_dueling': False, 'preprocess_mode': 'flatten', 'use_q_head': False, 'ppo_epoch': 5, 'num_mini_batch': 1, 'return_mode': 'new_gae', 'gae': {'gae_lambda': 0.95}, 'vtrace': {'clip_rho_threshold': 1.0, 'clip_pg_rho_threshold': 100.0}, 'use_rnn': False, 'rnn_layer_num': 1, 'rnn_data_chunk_length': 16, 'use_feature_normalization': True, 'use_popart': True, 'popart_beta': 0.99999, 'entropy_coef': 0.0, 'clip_param': 0.2, 'use_modified_mappo': False}
[2023-09-28 09:31:10,223][WARNING] ********** Generation[0] Agent[agent_0] START **********
[2023-09-28 09:31:10,223][INFO] training_desc: TrainingDesc(agent_id='agent_0', policy_id='agent_0-default-1', policy_distributions={'agent_0': {'agent_0-default-1': 1.0}, 'agent_1': OrderedDict([('built_in_11', 0.99999), ('agent_0-default-0', 1e-05)])}, share_policies=True, sync=False, stopper=<light_malib.framework.scheduler.stopper.common.win_rate_stopper.WinRateStopper object at 0x7f815d04d790>, kwargs={})
(pid=47592) [2023-09-28 09:31:10,243][WARNING] table_cfgs:DataServer uses {'capacity': 1000, 'sampler_type': 'lumrf', 'sample_max_usage': 10000, 'rate_limiter_cfg': {'min_size': 8}}
(pid=47592) [2023-09-28 09:31:10,248][INFO] DataServer created data table agent_0-default-1
(pid=47610) [2023-09-28 09:31:10,281][INFO] Rollout 1
(pid=47599) [2023-09-28 09:31:10,431][INFO] local_rank: 1 cuda_visible_devices:1
(pid=47609) [2023-09-28 09:31:10,405][INFO] local_rank: 0 cuda_visible_devices:0
(pid=47599) [2023-09-28 09:31:12,242][WARNING] trainer_1 reset to training_task TrainingDesc(agent_id='agent_0', policy_id='agent_0-default-1', policy_distributions={'agent_0': {'agent_0-default-1': 1.0}, 'agent_1': OrderedDict([('built_in_11', 0.99999), ('agent_0-default-0', 1e-05)])}, share_policies=True, sync=False, stopper=<light_malib.framework.scheduler.stopper.common.win_rate_stopper.WinRateStopper object at 0x7fd2166e3e20>, kwargs={'cfg': {'distributed': {'resources': {'num_cpus': 1, 'num_gpus': 1, 'resources': {'node:192.168.1.109': 0.01}}}, 'optimizer': 'Adam', 'actor_lr': 0.0005, 'critic_lr': 0.0005, 'opti_eps': 1e-05, 'weight_decay': 0.0, 'lr_decay': False, 'lr_decay_epoch': 2000}})
(pid=47609) [2023-09-28 09:31:12,229][WARNING] trainer_0 reset to training_task TrainingDesc(agent_id='agent_0', policy_id='agent_0-default-1', policy_distributions={'agent_0': {'agent_0-default-1': 1.0}, 'agent_1': OrderedDict([('built_in_11', 0.99999), ('agent_0-default-0', 1e-05)])}, share_policies=True, sync=False, stopper=<light_malib.framework.scheduler.stopper.common.win_rate_stopper.WinRateStopper object at 0x7f9099940400>, kwargs={'cfg': {'distributed': {'resources': {'num_cpus': 1, 'num_gpus': 1, 'resources': {'node:192.168.1.109': 0.01}}}, 'optimizer': 'Adam', 'actor_lr': 0.0005, 'critic_lr': 0.0005, 'opti_eps': 1e-05, 'weight_decay': 0.0, 'lr_decay': False, 'lr_decay_epoch': 2000}})
(pid=47609) /media/lxd/0A7AE0627AE04BCF/lzw/football_game/DB-Football/light_malib/algorithm/mappo/trainer.py:53: UserWarning: The given NumPy array is not writable, and PyTorch does not support non-writable tensors. This means writing to this tensor will result in undefined behavior. You may want to copy the array to protect its data or make it writable before converting it to a tensor. This type of warning will be suppressed for the rest of this program. (Triggered internally at /opt/conda/conda-bld/pytorch_1682343964576/work/torch/csrc/utils/tensor_numpy.cpp:206.)
(pid=47609) value = torch.FloatTensor(value)
(pid=47599) /media/lxd/0A7AE0627AE04BCF/lzw/football_game/DB-Football/light_malib/algorithm/mappo/trainer.py:53: UserWarning: The given NumPy array is not writable, and PyTorch does not support non-writable tensors. This means writing to this tensor will result in undefined behavior. You may want to copy the array to protect its data or make it writable before converting it to a tensor. This type of warning will be suppressed for the rest of this program. (Triggered internally at /opt/conda/conda-bld/pytorch_1682343964576/work/torch/csrc/utils/tensor_numpy.cpp:206.)
(pid=47599) value = torch.FloatTensor(value)
(pid=47610) [2023-09-28 09:32:56,022][WARNING] save the best model(average reward:-5092.5,average win:0.0)
(pid=47610) [2023-09-28 09:32:56,081][INFO] Rollout 2
(pid=47610) [2023-09-28 09:34:40,549][WARNING] save the best model(average reward:-3465.0,average win:0.0)
(pid=47610) [2023-09-28 09:34:40,601][INFO] Rollout 3
(pid=47611) 2023-09-28 09:35:41,233 ERROR worker.py:79 -- Unhandled error (suppress with RAY_IGNORE_UNHANDLED_ERRORS=1): ray::DistributedTrainer.optimize() (pid=47599, ip=192.168.1.109, repr=<light_malib.training.distributed_trainer.DistributedTrainer object at 0x7fd2166e3d60>)
(pid=47611) File "/media/lxd/0A7AE0627AE04BCF/lzw/football_game/DB-Football/light_malib/training/distributed_trainer.py", line 200, in optimize
(pid=47611) training_info = self.trainer.optimize(batch)
(pid=47611) File "/media/lxd/0A7AE0627AE04BCF/lzw/football_game/DB-Football/light_malib/algorithm/mappo/trainer.py", line 94, in optimize
(pid=47611) tmp_opt_result = self.loss(mini_batch)
(pid=47611) File "/media/lxd/0A7AE0627AE04BCF/lzw/football_game/DB-Football/light_malib/algorithm/common/loss_func.py", line 70, in call
(pid=47611) return tensor_cast(
(pid=47611) File "/media/lxd/0A7AE0627AE04BCF/lzw/football_game/DB-Football/light_malib/utils/general.py", line 110, in wrap
(pid=47611) rets = func(*new_args, **kwargs)
(pid=47611) File "/media/lxd/0A7AE0627AE04BCF/lzw/football_game/DB-Football/light_malib/algorithm/mappo/loss.py", line 143, in loss_compute
(pid=47611) values, action_log_probs, dist_entropy = self._evaluate_actions(
(pid=47611) File "/media/lxd/0A7AE0627AE04BCF/lzw/football_game/DB-Football/light_malib/algorithm/mappo/loss.py", line 270, in _evaluate_actions
(pid=47611) dist = torch.distributions.Categorical(logits=logits)
(pid=47611) File "/media/lxd/880AA9210AA90CEE/anaconda_envs/lzw_GRF/lib/python3.9/site-packages/torch/distributions/categorical.py", line 66, in init
(pid=47611) super().init(batch_shape, validate_args=validate_args)
(pid=47611) File "/media/lxd/880AA9210AA90CEE/anaconda_envs/lzw_GRF/lib/python3.9/site-packages/torch/distributions/distribution.py", line 62, in init
(pid=47611) raise ValueError(
(pid=47611) ValueError: Expected parameter logits (Tensor of shape (40000, 19)) of distribution Categorical(logits: torch.Size([40000, 19])) to satisfy the constraint IndependentConstraint(Real(), 1), but found invalid values:
(pid=47611) tensor([[nan, nan, nan, ..., nan, nan, nan],
(pid=47611) [nan, nan, nan, ..., nan, nan, nan],
(pid=47611) [nan, nan, nan, ..., nan, nan, nan],
(pid=47611) ...,
(pid=47611) [nan, nan, nan, ..., nan, nan, nan],
(pid=47611) [nan, nan, nan, ..., nan, nan, nan],
(pid=47611) [nan, nan, nan, ..., nan, nan, nan]], device='cuda:0',
(pid=47611) grad_fn=)
(pid=47610) [2023-09-28 09:35:41,283][INFO] Saving model agent_0 agent_0-default-1 3 to /media/lxd/0A7AE0627AE04BCF/lzw/football_game/DB-Football/./logs/gr_football/10_vs_10_psro/2023-09-28-09-18-44/agent_0/agent_0-default-1/3
Traceback (most recent call last):
File "/media/lxd/0A7AE0627AE04BCF/lzw/football_game/DB-Football/light_malib/main_pbt.py", line 126, in
main()
File "/media/lxd/0A7AE0627AE04BCF/lzw/football_game/DB-Football/light_malib/main_pbt.py", line 114, in main
runner.run()
File "/media/lxd/0A7AE0627AE04BCF/lzw/football_game/DB-Football/light_malib/framework/pbt_runner.py", line 106, in run
ray.get(training_task_ref)
File "/media/lxd/880AA9210AA90CEE/anaconda_envs/lzw_GRF/lib/python3.9/site-packages/ray/_private/client_mode_hook.py", line 105, in wrapper
return func(*args, **kwargs)
File "/media/lxd/880AA9210AA90CEE/anaconda_envs/lzw_GRF/lib/python3.9/site-packages/ray/worker.py", line 1625, in get
raise value.as_instanceof_cause()
ray.exceptions.RayTaskError(ValueError): ray::TrainingManager.train() (pid=47611, ip=192.168.1.109, repr=<light_malib.training.training_manager.TrainingManager object at 0x7f2ff2ba04f0>)
File "/media/lxd/0A7AE0627AE04BCF/lzw/football_game/DB-Football/light_malib/utils/decorator.py", line 22, in wrapper
return func(self, *args, **kwargs)
File "/media/lxd/0A7AE0627AE04BCF/lzw/football_game/DB-Football/light_malib/training/training_manager.py", line 146, in train
statistics_list = ray.get(
ray.exceptions.RayTaskError(ValueError): ray::DistributedTrainer.optimize() (pid=47609, ip=192.168.1.109, repr=<light_malib.training.distributed_trainer.DistributedTrainer object at 0x7f8bbeab0d60>)
File "/media/lxd/0A7AE0627AE04BCF/lzw/football_game/DB-Football/light_malib/training/distributed_trainer.py", line 200, in optimize
training_info = self.trainer.optimize(batch)
File "/media/lxd/0A7AE0627AE04BCF/lzw/football_game/DB-Football/light_malib/algorithm/mappo/trainer.py", line 94, in optimize
tmp_opt_result = self.loss(mini_batch)
File "/media/lxd/0A7AE0627AE04BCF/lzw/football_game/DB-Football/light_malib/algorithm/common/loss_func.py", line 70, in call
return tensor_cast(
File "/media/lxd/0A7AE0627AE04BCF/lzw/football_game/DB-Football/light_malib/utils/general.py", line 110, in wrap
rets = func(*new_args, **kwargs)
File "/media/lxd/0A7AE0627AE04BCF/lzw/football_game/DB-Football/light_malib/algorithm/mappo/loss.py", line 143, in loss_compute
values, action_log_probs, dist_entropy = self._evaluate_actions(
File "/media/lxd/0A7AE0627AE04BCF/lzw/football_game/DB-Football/light_malib/algorithm/mappo/loss.py", line 270, in _evaluate_actions
dist = torch.distributions.Categorical(logits=logits)
File "/media/lxd/880AA9210AA90CEE/anaconda_envs/lzw_GRF/lib/python3.9/site-packages/torch/distributions/categorical.py", line 66, in init
super().init(batch_shape, validate_args=validate_args)
File "/media/lxd/880AA9210AA90CEE/anaconda_envs/lzw_GRF/lib/python3.9/site-packages/torch/distributions/distribution.py", line 62, in init
raise ValueError(
ValueError: Expected parameter logits (Tensor of shape (40000, 19)) of distribution Categorical(logits: torch.Size([40000, 19])) to satisfy the constraint IndependentConstraint(Real(), 1), but found invalid values:
tensor([[nan, nan, nan, ..., nan, nan, nan],
[nan, nan, nan, ..., nan, nan, nan],
[nan, nan, nan, ..., nan, nan, nan],
...,
[nan, nan, nan, ..., nan, nan, nan],
[nan, nan, nan, ..., nan, nan, nan],
[nan, nan, nan, ..., nan, nan, nan]], device='cuda:0',
grad_fn=)
```

I am not sure whether it was a hardware issue, so I tried training with just one Titan X, but it still produced the following error message:

```
(/media/lxd/880AA9210AA90CEE/anaconda_envs/lzw_GRF) lxd@lxd-T630:/media/lxd/0A7AE0627AE04BCF/lzw/football_game/DB-Football$ python light_malib/main_pbt.py --config light_malib/expr/gr_football/expr_10_vs_10_psro.yaml
[2023-09-28 09:55:44,004][WARNING] No active cluster detected, will create local ray instance.
[2023-09-28 09:55:52,920][WARNING] ============== Cluster Info ==============
{'node_ip_address': '192.168.1.109', 'raylet_ip_address': '192.168.1.109', 'redis_address': '192.168.1.109:6379', 'object_store_address': '/tmp/ray/session_2023-09-28_09-55-44_005995_37830/sockets/plasma_store', 'raylet_socket_name': '/tmp/ray/session_2023-09-28_09-55-44_005995_37830/sockets/raylet', 'webui_url': None, 'session_dir': '/tmp/ray/session_2023-09-28_09-55-44_005995_37830', 'metrics_export_port': 58593, 'node_id': '0b4c8573ddd5462ff763c6db9c7b0cd22dbe01d81d14b7398a7e5ece'}
[2023-09-28 09:55:52,923][WARNING] * cluster resources:
{'object_store_memory': 17818028851.0, 'GPU': 2.0, 'accelerator_type:G': 1.0, 'node:192.168.1.109': 1.0, 'memory': 35636057703.0, 'CPU': 48.0}
[2023-09-28 09:55:52,923][WARNING] this worker ip: 192.168.1.109
[2023-09-28 09:55:52,924][WARNING] Automatically set master ip to local ip address: 192.168.1.109
[2023-09-28 09:55:54,333][INFO] AgentManager initialized
[2023-09-28 09:55:54,366][WARNING] use meta solver type: nash
[2023-09-28 09:55:54,844][INFO] PBTRunner psro initialized
[2023-09-28 09:55:54,845][INFO] PolicyFactory_agent_0_default new policy ctr starts at -1
[2023-09-28 09:55:54,849][WARNING] use model type: gr_football.built_in_11
(pid=37950) [2023-09-28 09:55:57,624][INFO] PolicyServer initialized
(pid=37956) [2023-09-28 09:55:57,675][INFO] DataServer initialized
[2023-09-28 09:55:58,195][INFO] Load initial policy built_in_11 from light_malib/trained_models/gr_football/11_vs_11/built_in
[2023-09-28 09:55:58,210][WARNING] use model type: gr_football.basic_11
[2023-09-28 09:55:58,257][WARNING] agent_0: agent_0-default-0 is initialized from random
[2023-09-28 09:55:58,257][WARNING] policy agent_0-default-0 uses custom_config: {'gamma': 1.0, 'use_cuda': False, 'use_dueling': False, 'preprocess_mode': 'flatten', 'use_q_head': False, 'ppo_epoch': 5, 'num_mini_batch': 1, 'return_mode': 'new_gae', 'gae': {'gae_lambda': 0.95}, 'vtrace': {'clip_rho_threshold': 1.0, 'clip_pg_rho_threshold': 100.0}, 'use_rnn': False, 'rnn_layer_num': 1, 'rnn_data_chunk_length': 16, 'use_feature_normalization': True, 'use_popart': True, 'popart_beta': 0.99999, 'entropy_coef': 0.0, 'clip_param': 0.2, 'use_modified_mappo': False}
[2023-09-28 09:55:58,286][WARNING] after initialization:

<A agent_0>
policy_ids:
['built_in_11', 'agent_0-default-0']
populations:
<P __all__> policy_ids:['built_in_11', 'agent_0-default-0']
<P default> policy_ids:['built_in_11', 'agent_0-default-0']

[2023-09-28 09:55:58,287][WARNING] Evaluation rollouts (num: 50) for 3 policy combinations: [{'agent_0': {'built_in_11': 1.0}, 'agent_1': {'built_in_11': 1.0}}, {'agent_0': {'built_in_11': 1.0}, 'agent_1': {'agent_0-default-0': 1.0}}, {'agent_0': {'agent_0-default-0': 1.0}, 'agent_1': {'agent_0-default-0': 1.0}}]
(pid=37940) [2023-09-28 09:55:58,899][INFO] TrainingManager initialized
(pid=37954) [2023-09-28 09:55:58,891][INFO] RolloutManager initialized
(pid=37970) [2023-09-28 09:56:08,109][INFO] trainer_0 (local rank: 0) initialized
(pid=37957) [2023-09-28 09:56:08,385][INFO] DataPrefetcher initialized
Elo = dict_items([('built_in_11', 1015.3241542955467), ('agent_0-default-0', 984.6758457044533)])
[2023-09-28 10:07:43,192][INFO] policy_data: [('built_in_11', 'built_in_11'):{'payoff': 0.0, 'score': 0.5, 'win': 0.27, 'lose': 0.27, 'my_goal': 0.5, 'goal_diff': 0.0}],[('built_in_11', 'agent_0-default-0'):{'payoff': 0.9807692307692307, 'score': 0.9903846153846154, 'win': 0.9807692307692308, 'lose': 0.0, 'my_goal': 4.035256410256411, 'goal_diff': 4.035256410256411}],[('agent_0-default-0', 'built_in_11'):{'payoff': -0.9807692307692308, 'score': 0.009615384615384616, 'win': 0.0, 'lose': 0.9807692307692308, 'my_goal': 0.0, 'goal_diff': -4.035256410256411}],[('agent_0-default-0', 'agent_0-default-0'):{'payoff': 5.551115123125783e-17, 'score': 0.5, 'win': 0.29000000000000004, 'lose': 0.29000000000000004, 'my_goal': 0.44, 'goal_diff': 0.0}],
(pid=37960) /media/lxd/0A7AE0627AE04BCF/lzw/football_game/DB-Football/light_malib/monitor/monitor.py:59: UserWarning: Starting a Matplotlib GUI outside of the main thread will likely fail.
(pid=37960) fig = plt.figure()
(pid=37960) /media/lxd/0A7AE0627AE04BCF/lzw/football_game/DB-Football/light_malib/monitor/monitor.py:63: UserWarning: FixedFormatter should only be used together with FixedLocator
(pid=37960) ax.set_xticklabels([""] + xpid, rotation=90)
(pid=37960) /media/lxd/0A7AE0627AE04BCF/lzw/football_game/DB-Football/light_malib/monitor/monitor.py:64: UserWarning: FixedFormatter should only be used together with FixedLocator
(pid=37960) ax.set_yticklabels([""] + ypid)
[2023-09-28 10:07:43,815][INFO] payoff table:
+-------------+---------------+-------------+
| | built_in_11 | default-0 |
+=============+===============+=============+
| built_in_11 | +0 | +98 |
+-------------+---------------+-------------+
| default-0 | -98 | +0 |
+-------------+---------------+-------------+
[2023-09-28 10:07:43,816][INFO] default-0's top 10 worst opponents are:
+-------------+----------+
| policy_id | payoff |
+=============+==========+
| built_in_11 | -98.08 |
+-------------+----------+
| default-0 | +0.00 |
+-------------+----------+
[2023-09-28 10:07:56,080][WARNING] agent_0: agent_0-default-1 is initialized from last best policy agent_0-default-0
[2023-09-28 10:07:56,081][WARNING] policy agent_0-default-1 uses custom_config: {'gamma': 1.0, 'use_cuda': False, 'use_dueling': False, 'preprocess_mode': 'flatten', 'use_q_head': False, 'ppo_epoch': 5, 'num_mini_batch': 1, 'return_mode': 'new_gae', 'gae': {'gae_lambda': 0.95}, 'vtrace': {'clip_rho_threshold': 1.0, 'clip_pg_rho_threshold': 100.0}, 'use_rnn': False, 'rnn_layer_num': 1, 'rnn_data_chunk_length': 16, 'use_feature_normalization': True, 'use_popart': True, 'popart_beta': 0.99999, 'entropy_coef': 0.0, 'clip_param': 0.2, 'use_modified_mappo': False}
[2023-09-28 10:07:56,107][WARNING] ********** Generation[0] Agent[agent_0] START **********
[2023-09-28 10:07:56,107][INFO] training_desc: TrainingDesc(agent_id='agent_0', policy_id='agent_0-default-1', policy_distributions={'agent_0': {'agent_0-default-1': 1.0}, 'agent_1': OrderedDict([('built_in_11', 0.99999), ('agent_0-default-0', 1e-05)])}, share_policies=True, sync=False, stopper=<light_malib.framework.scheduler.stopper.common.win_rate_stopper.WinRateStopper object at 0x7fd043b98ac0>, kwargs={})
(pid=37956) [2023-09-28 10:07:56,125][WARNING] table_cfgs:DataServer uses {'capacity': 1000, 'sampler_type': 'lumrf', 'sample_max_usage': 10000, 'rate_limiter_cfg': {'min_size': 8}}
(pid=37956) [2023-09-28 10:07:56,129][INFO] DataServer created data table agent_0-default-1
(pid=37954) [2023-09-28 10:07:56,159][INFO] Rollout 1
(pid=37970) [2023-09-28 10:07:56,375][INFO] local_rank: 0 cuda_visible_devices:0
(pid=37970) [2023-09-28 10:07:57,988][WARNING] trainer_0 reset to training_task TrainingDesc(agent_id='agent_0', policy_id='agent_0-default-1', policy_distributions={'agent_0': {'agent_0-default-1': 1.0}, 'agent_1': OrderedDict([('built_in_11', 0.99999), ('agent_0-default-0', 1e-05)])}, share_policies=True, sync=False, stopper=<light_malib.framework.scheduler.stopper.common.win_rate_stopper.WinRateStopper object at 0x7fb385f97460>, kwargs={'cfg': {'distributed': {'resources': {'num_cpus': 1, 'num_gpus': 1, 'resources': {'node:192.168.1.109': 0.01}}}, 'optimizer': 'Adam', 'actor_lr': 0.0005, 'critic_lr': 0.0005, 'opti_eps': 1e-05, 'weight_decay': 0.0, 'lr_decay': False, 'lr_decay_epoch': 2000}})
(pid=37970) /media/lxd/0A7AE0627AE04BCF/lzw/football_game/DB-Football/light_malib/algorithm/mappo/trainer.py:53: UserWarning: The given NumPy array is not writable, and PyTorch does not support non-writable tensors. This means writing to this tensor will result in undefined behavior. You may want to copy the array to protect its data or make it writable before converting it to a tensor. This type of warning will be suppressed for the rest of this program. (Triggered internally at /opt/conda/conda-bld/pytorch_1682343964576/work/torch/csrc/utils/tensor_numpy.cpp:206.)
(pid=37970) value = torch.FloatTensor(value)
(pid=37954) [2023-09-28 10:09:29,829][WARNING] save the best model(average reward:-5103.75,average win:0.0)
(pid=37954) [2023-09-28 10:09:29,896][INFO] Rollout 2
(pid=37954) [2023-09-28 10:11:04,900][WARNING] save the best model(average reward:-3472.5,average win:0.0)
(pid=37954) [2023-09-28 10:11:04,950][INFO] Rollout 3
(pid=37954) [2023-09-28 10:12:38,904][WARNING] save the best model(average reward:-2661.875,average win:0.0)
(pid=37954) [2023-09-28 10:12:38,938][INFO] Rollout 4
(pid=37954) [2023-09-28 10:14:12,399][WARNING] save the best model(average reward:-2166.5,average win:0.0)
(pid=37954) [2023-09-28 10:14:12,440][INFO] Rollout 5
(pid=37960) Exception ignored in: <function Image.__del__ at 0x7f7c80696550>
(pid=37960) Traceback (most recent call last):
(pid=37960) File "/media/lxd/880AA9210AA90CEE/anaconda_envs/lzw_GRF/lib/python3.9/tkinter/__init__.py", line 4016, in __del__
(pid=37960) self.tk.call('image', 'delete', self.name)
(pid=37960) RuntimeError: main thread is not in main loop
(pid=37960) Exception ignored in: <function Variable.__del__ at 0x7f7c806dec10>
(pid=37960) Traceback (most recent call last):
(pid=37960) File "/media/lxd/880AA9210AA90CEE/anaconda_envs/lzw_GRF/lib/python3.9/tkinter/__init__.py", line 351, in __del__
(pid=37960) if self._tk.getboolean(self._tk.call("info", "exists", self._name)):
(pid=37960) RuntimeError: main thread is not in main loop
(pid=37970) [2023-09-28 10:15:54,407][WARNING] queue is full. May have bugs in training.
(pid=37960) Exception ignored in: <function Variable.__del__ at 0x7f7c806dec10>
(pid=37960) Traceback (most recent call last):
(pid=37960) File "/media/lxd/880AA9210AA90CEE/anaconda_envs/lzw_GRF/lib/python3.9/tkinter/__init__.py", line 351, in __del__
(pid=37960) if self._tk.getboolean(self._tk.call("info", "exists", self._name)):
(pid=37960) RuntimeError: main thread is not in main loop
(pid=37960) Exception ignored in: <function Variable.__del__ at 0x7f7c806dec10>
(pid=37960) Traceback (most recent call last):
(pid=37960) File "/media/lxd/880AA9210AA90CEE/anaconda_envs/lzw_GRF/lib/python3.9/tkinter/__init__.py", line 351, in __del__
(pid=37960) if self._tk.getboolean(self._tk.call("info", "exists", self._name)):
(pid=37960) RuntimeError: main thread is not in main loop
(pid=37960) Exception ignored in: <function Variable.__del__ at 0x7f7c806dec10>
(pid=37960) Traceback (most recent call last):
(pid=37960) File "/media/lxd/880AA9210AA90CEE/anaconda_envs/lzw_GRF/lib/python3.9/tkinter/__init__.py", line 351, in __del__
(pid=37960) if self._tk.getboolean(self._tk.call("info", "exists", self._name)):
(pid=37960) RuntimeError: main thread is not in main loop
(pid=37954) [2023-09-28 10:15:57,987][WARNING] save the best model(average reward:-1838.75,average win:0.0)
(pid=37954) [2023-09-28 10:15:58,037][INFO] Rollout 6
(pid=37954) [2023-09-28 10:17:20,960][WARNING] save the best model(average reward:-1609.642857142857,average win:0.0)
(pid=37954) [2023-09-28 10:17:21,004][INFO] Rollout 7
(pid=37954) [2023-09-28 10:18:54,245][WARNING] save the best model(average reward:-1433.125,average win:0.0)
(pid=37954) [2023-09-28 10:18:54,289][INFO] Rollout 8
(pid=37954) [2023-09-28 10:20:04,518][INFO] Saving model agent_0 agent_0-default-1 8 to /media/lxd/0A7AE0627AE04BCF/lzw/football_game/DB-Football/./logs/gr_football/10_vs_10_psro/2023-09-28-09-55-52/agent_0/agent_0-default-1/8
Traceback (most recent call last):
File "/media/lxd/0A7AE0627AE04BCF/lzw/football_game/DB-Football/light_malib/main_pbt.py", line 126, in
main()
File "/media/lxd/0A7AE0627AE04BCF/lzw/football_game/DB-Football/light_malib/main_pbt.py", line 114, in main
runner.run()
File "/media/lxd/0A7AE0627AE04BCF/lzw/football_game/DB-Football/light_malib/framework/pbt_runner.py", line 106, in run
ray.get(training_task_ref)
File "/media/lxd/880AA9210AA90CEE/anaconda_envs/lzw_GRF/lib/python3.9/site-packages/ray/_private/client_mode_hook.py", line 105, in wrapper
return func(*args, **kwargs)
File "/media/lxd/880AA9210AA90CEE/anaconda_envs/lzw_GRF/lib/python3.9/site-packages/ray/worker.py", line 1625, in get
raise value.as_instanceof_cause()
ray.exceptions.RayTaskError(ValueError): ray::TrainingManager.train() (pid=37940, ip=192.168.1.109, repr=<light_malib.training.training_manager.TrainingManager object at 0x7efa6f4cd4c0>)
File "/media/lxd/0A7AE0627AE04BCF/lzw/football_game/DB-Football/light_malib/utils/decorator.py", line 22, in wrapper
return func(self, *args, **kwargs)
File "/media/lxd/0A7AE0627AE04BCF/lzw/football_game/DB-Football/light_malib/training/training_manager.py", line 146, in train
statistics_list = ray.get(
ray.exceptions.RayTaskError(ValueError): ray::DistributedTrainer.optimize() (pid=37970, ip=192.168.1.109, repr=<light_malib.training.distributed_trainer.DistributedTrainer object at 0x7fae7d95fd90>)
File "/media/lxd/0A7AE0627AE04BCF/lzw/football_game/DB-Football/light_malib/training/distributed_trainer.py", line 200, in optimize
training_info = self.trainer.optimize(batch)
File "/media/lxd/0A7AE0627AE04BCF/lzw/football_game/DB-Football/light_malib/algorithm/mappo/trainer.py", line 94, in optimize
tmp_opt_result = self.loss(mini_batch)
File "/media/lxd/0A7AE0627AE04BCF/lzw/football_game/DB-Football/light_malib/algorithm/common/loss_func.py", line 70, in call
return tensor_cast(
File "/media/lxd/0A7AE0627AE04BCF/lzw/football_game/DB-Football/light_malib/utils/general.py", line 110, in wrap
rets = func(*new_args, **kwargs)
File "/media/lxd/0A7AE0627AE04BCF/lzw/football_game/DB-Football/light_malib/algorithm/mappo/loss.py", line 143, in loss_compute
values, action_log_probs, dist_entropy = self._evaluate_actions(
File "/media/lxd/0A7AE0627AE04BCF/lzw/football_game/DB-Football/light_malib/algorithm/mappo/loss.py", line 270, in _evaluate_actions
dist = torch.distributions.Categorical(logits=logits)
File "/media/lxd/880AA9210AA90CEE/anaconda_envs/lzw_GRF/lib/python3.9/site-packages/torch/distributions/categorical.py", line 66, in init
super().init(batch_shape, validate_args=validate_args)
File "/media/lxd/880AA9210AA90CEE/anaconda_envs/lzw_GRF/lib/python3.9/site-packages/torch/distributions/distribution.py", line 62, in init
raise ValueError(
ValueError: Expected parameter logits (Tensor of shape (80000, 19)) of distribution Categorical(logits: torch.Size([80000, 19])) to satisfy the constraint IndependentConstraint(Real(), 1), but found invalid values:
tensor([[nan, nan, nan, ..., nan, nan, nan],
[nan, nan, nan, ..., nan, nan, nan],
[nan, nan, nan, ..., nan, nan, nan],
...,
[nan, nan, nan, ..., nan, nan, nan],
[nan, nan, nan, ..., nan, nan, nan],
[nan, nan, nan, ..., nan, nan, nan]], device='cuda:0',
grad_fn=)
```
Do you know why this happened?
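
For anyone hitting the same trace, here is a minimal, self-contained reproduction of the check that raises this ValueError, plus an illustrative guard (an assumption of mine, not code from the repo) that could sit in front of the `torch.distributions.Categorical(logits=logits)` call in `_evaluate_actions` to fail earlier with a clearer message. By the time the distribution is built the logits already contain NaN, which usually points to diverging training rather than to torch itself:

```python
import torch

# Minimal reproduction: torch.distributions validates its parameters, and NaN
# logits raise exactly the ValueError shown in the tracebacks above.
bad_logits = torch.full((4, 19), float("nan"))
try:
    torch.distributions.Categorical(logits=bad_logits, validate_args=True)
except ValueError as err:
    print("reproduced:", str(err).splitlines()[0])

# Illustrative guard (assumption, not repo code): check the logits before building
# the distribution so the run stops with a hint about the likely cause (learning
# rate, reward scale, missing gradient clipping, ...).
def safe_categorical(logits: torch.Tensor) -> torch.distributions.Categorical:
    if not torch.isfinite(logits).all():
        raise RuntimeError("non-finite policy logits; training has likely diverged")
    return torch.distributions.Categorical(logits=logits)
```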

@YanSong97
Collaborator

Hi Jay,

What worker_num and batch size did you use? Have you tried different values?

@Jay-Vim-Lv
Author

num_workers = 20 or 30
batch_size = 8, 32, or other values
Nothing else has been changed.

@qyh-stbz

I'm also getting the same error.

@ZHQ-air

ZHQ-air commented Nov 7, 2023

Hi, I have also encountered a similar problem to Jay-Vim-Lv's. Do you know how to solve it? The error output is as follows:

```
(light-malib) zhq@zhq-Taitan:~/Doctor/AI_Innovation/DecisionMaking/DB-Football_v1.1$ python main_pbt.py --config light_malib/expr/gr_football/expr_5_vs_5_psro.yaml
[2023-11-07 16:02:59,921][WARNING] No active cluster detected, will create local ray instance.
[2023-11-07 16:03:01,223][WARNING] ============== Cluster Info ==============
{'node_ip_address': '10.1.80.147', 'raylet_ip_address': '10.1.80.147', 'redis_address': '10.1.80.147:6379', 'object_store_address': '/tmp/ray/session_2023-11-07_16-02-59_922304_687030/sockets/plasma_store', 'raylet_socket_name': '/tmp/ray/session_2023-11-07_16-02-59_922304_687030/sockets/raylet', 'webui_url': None, 'session_dir': '/tmp/ray/session_2023-11-07_16-02-59_922304_687030', 'metrics_export_port': 60763, 'node_id': '2841381510c4b1ba545ad7dcb7719998de2b9228147bcb839aa9b7d0'}
[2023-11-07 16:03:01,227][WARNING] * cluster resources:
{'accelerator_type:G': 1.0, 'memory': 37538726708.0, 'GPU': 1.0, 'CPU': 12.0, 'object_store_memory': 18769363353.0, 'node:10.1.80.147': 1.0}
[2023-11-07 16:03:01,228][WARNING] this worker ip: 10.1.80.147
[2023-11-07 16:03:01,232][WARNING] Automatically set master ip to local ip address: 10.1.80.147
[2023-11-07 16:03:01,747][INFO] AgentManager initialized
[2023-11-07 16:03:01,754][WARNING] use meta solver type: nash
[2023-11-07 16:03:01,839][INFO] PBTRunner psro initialized
[2023-11-07 16:03:01,839][INFO] PolicyFactory_agent_0_default new policy ctr starts at -1
[2023-11-07 16:03:01,840][WARNING] use model type: gr_football.built_in_5
(pid=687111) [2023-11-07 16:03:02,545][INFO] DataServer initialized
(pid=687117) [2023-11-07 16:03:02,596][INFO] PolicyServer initialized
[2023-11-07 16:03:02,694][INFO] Load initial policy built_in_5 from light_malib/trained_models/gr_football/5_vs_5/built_in
[2023-11-07 16:03:02,696][WARNING] use model type: gr_football.basic_5
[2023-11-07 16:03:02,704][WARNING] agent_0: agent_0-default-0 is initialized from random
[2023-11-07 16:03:02,704][WARNING] policy agent_0-default-0 uses custom_config: {'gamma': 1.0, 'use_cuda': False, 'use_dueling': False, 'preprocess_mode': 'flatten', 'use_q_head': False, 'ppo_epoch': 5, 'num_mini_batch': 1, 'return_mode': 'new_gae', 'gae': {'gae_lambda': 0.95}, 'vtrace': {'clip_rho_threshold': 1.0, 'clip_pg_rho_threshold': 100.0}, 'use_rnn': False, 'rnn_layer_num': 1, 'rnn_data_chunk_length': 16, 'use_feature_normalization': True, 'use_popart': True, 'popart_beta': 0.99999, 'entropy_coef': 0.0, 'clip_param': 0.2, 'use_modified_mappo': False}
[2023-11-07 16:03:02,716][WARNING] after initialization:

<A agent_0>
policy_ids:
['built_in_5', 'agent_0-default-0']
populations:
<P __all__> policy_ids:['built_in_5', 'agent_0-default-0']<P default> policy_ids:['built_in_5', 'agent_0-default-0']

[2023-11-07 16:03:02,716][WARNING] Evaluation rollouts (num: 5) for 3 policy combinations: [{'agent_0': {'built_in_5': 1.0}, 'agent_1': {'built_in_5': 1.0}}, {'agent_0': {'built_in_5': 1.0}, 'agent_1': {'agent_0-default-0': 1.0}}, {'agent_0': {'agent_0-default-0': 1.0}, 'agent_1': {'agent_0-default-0': 1.0}}]
(pid=687118) [2023-11-07 16:03:02,857][INFO] RolloutManager initialized
(pid=687114) [2023-11-07 16:03:03,011][INFO] TrainingManager initialized
(pid=687108) [2023-11-07 16:03:04,067][INFO] trainer_0 (local rank: 0) initialized
(pid=687107) [2023-11-07 16:03:04,142][INFO] DataPrefetcher initialized
Elo = dict_items([('agent_0-default-0', 984.368153396761), ('built_in_5', 1015.631846603239)])
[2023-11-07 16:04:22,723][INFO] policy_data: [('built_in_5', 'built_in_5'):{'payoff': 0.0, 'score': 0.5, 'win': 0.1, 'lose': 0.1, 'my_goal': 0.2, 'goal_diff': 0.0}],[('built_in_5', 'agent_0-default-0'):{'payoff': 1.0, 'score': 1.0, 'win': 1.0, 'lose': 0.0, 'my_goal': 1.5, 'goal_diff': 1.5}],[('agent_0-default-0', 'built_in_5'):{'payoff': -1.0, 'score': 0.0, 'win': 0.0, 'lose': 1.0, 'my_goal': 0.0, 'goal_diff': -1.5}],[('agent_0-default-0', 'agent_0-default-0'):{'payoff': 0.0, 'score': 0.5, 'win': 0.4, 'lose': 0.4, 'my_goal': 0.5, 'goal_diff': 0.0}],
(pid=687113) /home/zhq/Doctor/AI_Innovation/DecisionMaking/DB-Football_v1.1/light_malib/monitor/monitor.py:66: UserWarning: FixedFormatter should only be used together with FixedLocator
(pid=687113)   ax.set_xticklabels([""] + xpid, rotation=90)
(pid=687113) /home/zhq/Doctor/AI_Innovation/DecisionMaking/DB-Football_v1.1/light_malib/monitor/monitor.py:67: UserWarning: FixedFormatter should only be used together with FixedLocator
(pid=687113)   ax.set_yticklabels([""] + ypid)
[2023-11-07 16:04:22,839][INFO] payoff table:
+------------+--------------+-------------+
|            |   built_in_5 |   default-0 |
+============+==============+=============+
| built_in_5 |           +0 |        +100 |
+------------+--------------+-------------+
| default-0  |         -100 |          +0 |
+------------+--------------+-------------+
[2023-11-07 16:04:22,839][INFO] default-0's top 10 worst opponents are:
+-------------+----------+
| policy_id   |   payoff |
+=============+==========+
| built_in_5  |  -100.00 |
+-------------+----------+
| default-0   |    +0.00 |
+-------------+----------+
[2023-11-07 16:04:28,836][WARNING] agent_0: agent_0-default-1 is initialized from last best policy agent_0-default-0
[2023-11-07 16:04:28,836][WARNING] policy agent_0-default-1 uses custom_config: {'gamma': 1.0, 'use_cuda': False, 'use_dueling': False, 'preprocess_mode': 'flatten', 'use_q_head': False, 'ppo_epoch': 5, 'num_mini_batch': 1, 'return_mode': 'new_gae', 'gae': {'gae_lambda': 0.95}, 'vtrace': {'clip_rho_threshold': 1.0, 'clip_pg_rho_threshold': 100.0}, 'use_rnn': False, 'rnn_layer_num': 1, 'rnn_data_chunk_length': 16, 'use_feature_normalization': True, 'use_popart': True, 'popart_beta': 0.99999, 'entropy_coef': 0.0, 'clip_param': 0.2, 'use_modified_mappo': False}
[2023-11-07 16:04:28,843][WARNING] ********** Generation[0] Agent[agent_0] START **********
[2023-11-07 16:04:28,843][INFO] training_desc: TrainingDesc(agent_id='agent_0', policy_id='agent_0-default-1', policy_distributions={'agent_0': {'agent_0-default-1': 1.0}, 'agent_1': OrderedDict([('built_in_5', 0.99999), ('agent_0-default-0', 1e-05)])}, share_policies=True, sync=False, stopper=<light_malib.framework.scheduler.stopper.common.win_rate_stopper.WinRateStopper object at 0x7fe0f9dfe970>, kwargs={})
(pid=687111) [2023-11-07 16:04:28,870][WARNING] table_cfgs:DataServer uses {'capacity': 1000, 'sampler_type': 'lumrf', 'sample_max_usage': 10000, 'rate_limiter_cfg': {'min_size': 24}}
(pid=687111) [2023-11-07 16:04:28,872][INFO] DataServer created data table agent_0-default-1
(pid=687118) [2023-11-07 16:04:28,879][INFO] Rollout 1
(pid=687108) [2023-11-07 16:04:28,890][INFO] local_rank: 0 cuda_visible_devices:0
(pid=687108) [2023-11-07 16:04:30,187][WARNING] trainer_0 reset to training_task TrainingDesc(agent_id='agent_0', policy_id='agent_0-default-1', policy_distributions={'agent_0': {'agent_0-default-1': 1.0}, 'agent_1': OrderedDict([('built_in_5', 0.99999), ('agent_0-default-0', 1e-05)])}, share_policies=True, sync=False, stopper=<light_malib.framework.scheduler.stopper.common.win_rate_stopper.WinRateStopper object at 0x7f135e001dc0>, kwargs={'cfg': {'distributed': {'resources': {'num_cpus': 1, 'num_gpus': 1, 'resources': {'node:10.1.80.147': 0.01}}}, 'optimizer': 'Adam', 'actor_lr': 0.0005, 'critic_lr': 0.0005, 'opti_eps': 1e-05, 'weight_decay': 0.0, 'lr_decay': False, 'lr_decay_epoch': 2000}})
(pid=687118) [2023-11-07 16:04:55,473][WARNING] save the best model(average reward:-5020.0,average win:0.0)
(pid=687118) [2023-11-07 16:04:55,495][INFO] Rollout 2
(pid=687118) [2023-11-07 16:05:21,904][WARNING] save the best model(average reward:-3370.6666666666665,average win:0.0)
(pid=687118) [2023-11-07 16:05:21,925][INFO] Rollout 3
(pid=687118) [2023-11-07 16:05:53,902][WARNING] save the best model(average reward:-2539.0,average win:0.0)
(pid=687118) [2023-11-07 16:05:53,923][INFO] Rollout 4
(pid=687118) [2023-11-07 16:06:24,837][WARNING] save the best model(average reward:-2048.0,average win:0.0)
(pid=687118) [2023-11-07 16:06:24,856][INFO] Rollout 5
(pid=687118) [2023-11-07 16:06:55,875][WARNING] save the best model(average reward:-1714.0,average win:0.0)
(pid=687118) [2023-11-07 16:06:55,894][INFO] Rollout 6
(pid=687118) [2023-11-07 16:07:26,569][WARNING] save the best model(average reward:-1477.142857142857,average win:0.0)
(pid=687118) [2023-11-07 16:07:26,605][INFO] Rollout 7
(pid=687118) [2023-11-07 16:07:57,201][WARNING] save the best model(average reward:-1300.5,average win:0.0)
(pid=687118) [2023-11-07 16:07:57,220][INFO] Rollout 8
(pid=687118) [2023-11-07 16:08:28,308][WARNING] save the best model(average reward:-1162.6666666666667,average win:0.0)
(pid=687118) [2023-11-07 16:08:28,330][INFO] Rollout 9
(pid=687118) [2023-11-07 16:08:58,670][WARNING] save the best model(average reward:-1054.0,average win:0.0)
(pid=687118) [2023-11-07 16:08:58,688][INFO] Rollout 10
(pid=687118) [2023-11-07 16:09:30,212][WARNING] save the best model(average reward:-960.7272727272727,average win:0.0)
(pid=687118) [2023-11-07 16:09:30,234][INFO] Rollout 11
(pid=687118) [2023-11-07 16:10:00,340][WARNING] save the best model(average reward:-883.6666666666666,average win:0.0)
(pid=687118) [2023-11-07 16:10:00,362][INFO] Rollout 12
(pid=687118) [2023-11-07 16:10:32,308][WARNING] save the best model(average reward:-818.1538461538462,average win:0.0)
(pid=687118) [2023-11-07 16:10:32,333][INFO] Rollout 13
(pid=687118) [2023-11-07 16:11:04,471][WARNING] save the best model(average reward:-762.2857142857143,average win:0.0)
(pid=687118) [2023-11-07 16:11:04,495][INFO] Rollout 14
(pid=687118) [2023-11-07 16:11:34,548][WARNING] save the best model(average reward:-713.6,average win:0.0)
(pid=687118) [2023-11-07 16:11:34,572][INFO] Rollout 15
(pid=687118) [2023-11-07 16:12:05,414][WARNING] save the best model(average reward:-672.25,average win:0.0)
(pid=687118) [2023-11-07 16:12:05,435][INFO] Rollout 16
(pid=687118) [2023-11-07 16:12:35,794][WARNING] save the best model(average reward:-635.5294117647059,average win:0.0)
(pid=687118) [2023-11-07 16:12:35,812][INFO] Rollout 17
(pid=687118) [2023-11-07 16:13:05,773][WARNING] save the best model(average reward:-602.8888888888889,average win:0.0)
(pid=687118) [2023-11-07 16:13:05,796][INFO] Rollout 18
(pid=687118) [2023-11-07 16:13:36,861][WARNING] save the best model(average reward:-574.3157894736842,average win:0.0)
(pid=687118) [2023-11-07 16:13:36,877][INFO] Rollout 19
(pid=687118) [2023-11-07 16:14:07,636][WARNING] save the best model(average reward:-547.4,average win:0.0)
(pid=687118) [2023-11-07 16:14:07,653][INFO] Rollout 20
(pid=687118) [2023-11-07 16:14:38,884][WARNING] save the best model(average reward:-48.8,average win:0.0)
(pid=687118) [2023-11-07 16:14:38,905][INFO] Rollout 21
(pid=687118) [2023-11-07 16:15:08,913][WARNING] save the best model(average reward:-48.6,average win:0.0)
(pid=687118) [2023-11-07 16:15:08,931][INFO] Rollout 22
(pid=687118) [2023-11-07 16:15:38,800][WARNING] save the best model(average reward:-48.2,average win:0.0)
(pid=687118) [2023-11-07 16:15:38,820][INFO] Rollout 23
(pid=687118) [2023-11-07 16:16:07,657][INFO] Rollout 24
(pid=687118) [2023-11-07 16:16:38,027][WARNING] save the best model(average reward:-47.0,average win:0.0)
(pid=687118) [2023-11-07 16:16:38,044][INFO] Rollout 25
(pid=687118) [2023-11-07 16:17:07,691][INFO] Rollout 26
.
.
.
(pid=687118) [2023-11-07 18:14:16,224][INFO] Rollout 264
(pid=687118) [2023-11-07 18:14:47,594][INFO] Rollout 265
Traceback (most recent call last):
  File "main_pbt.py", line 126, in <module>
    main()
  File "main_pbt.py", line 114, in main
    runner.run()
  File "/home/zhq/Doctor/AI_Innovation/DecisionMaking/DB-Football_v1.1/light_malib/framework/pbt_runner.py", line 111, in run
    ray.get(training_task_ref)
  File "/home/zhq/software_tools/anaconda3/envs/light-malib/lib/python3.8/site-packages/ray/_private/client_mode_hook.py", line 105, in wrapper
    return func(*args, **kwargs)
  File "/home/zhq/software_tools/anaconda3/envs/light-malib/lib/python3.8/site-packages/ray/worker.py", line 1625, in get
    raise value.as_instanceof_cause()
ray.exceptions.RayTaskError(ValueError): ray::TrainingManager.train() (pid=687114, ip=10.1.80.147, repr=<light_malib.training.training_manager.TrainingManager object at 0x7fd1fa036340>)
  File "/home/zhq/Doctor/AI_Innovation/DecisionMaking/DB-Football_v1.1/light_malib/utils/decorator.py", line 22, in wrapper
    return func(self, *args, **kwargs)
  File "/home/zhq/Doctor/AI_Innovation/DecisionMaking/DB-Football_v1.1/light_malib/training/training_manager.py", line 146, in train
    statistics_list = ray.get(
ray.exceptions.RayTaskError(ValueError): ray::DistributedTrainer.optimize() (pid=687108, ip=10.1.80.147, repr=<light_malib.training.distributed_trainer.DistributedTrainer object at 0x7f135e001b80>)
  File "/home/zhq/Doctor/AI_Innovation/DecisionMaking/DB-Football_v1.1/light_malib/training/distributed_trainer.py", line 200, in optimize
    training_info = self.trainer.optimize(batch)
  File "/home/zhq/Doctor/AI_Innovation/DecisionMaking/DB-Football_v1.1/light_malib/algorithm/mappo/trainer.py", line 94, in optimize
    tmp_opt_result = self.loss(mini_batch)
  File "/home/zhq/Doctor/AI_Innovation/DecisionMaking/DB-Football_v1.1/light_malib/algorithm/common/loss_func.py", line 70, in __call__
    return tensor_cast(
  File "/home/zhq/Doctor/AI_Innovation/DecisionMaking/DB-Football_v1.1/light_malib/utils/general.py", line 110, in wrap
    rets = func(*new_args, **kwargs)
  File "/home/zhq/Doctor/AI_Innovation/DecisionMaking/DB-Football_v1.1/light_malib/algorithm/mappo/loss.py", line 143, in loss_compute
    values, action_log_probs, dist_entropy = self._evaluate_actions(
  File "/home/zhq/Doctor/AI_Innovation/DecisionMaking/DB-Football_v1.1/light_malib/algorithm/mappo/loss.py", line 270, in _evaluate_actions
    dist = torch.distributions.Categorical(logits=logits)
  File "/home/zhq/software_tools/anaconda3/envs/light-malib/lib/python3.8/site-packages/torch/distributions/categorical.py", line 66, in __init__
    super(Categorical, self).__init__(batch_shape, validate_args=validate_args)
  File "/home/zhq/software_tools/anaconda3/envs/light-malib/lib/python3.8/site-packages/torch/distributions/distribution.py", line 56, in __init__
    raise ValueError(
ValueError: Expected parameter logits (Tensor of shape (96000, 19)) of distribution Categorical(logits: torch.Size([96000, 19])) to satisfy the constraint IndependentConstraint(Real(), 1), but found invalid values:
tensor([[nan, nan, nan,  ..., nan, nan, nan],
        [nan, nan, nan,  ..., nan, nan, nan],
        [nan, nan, nan,  ..., nan, nan, nan],
        ...,
        [nan, nan, nan,  ..., nan, nan, nan],
        [nan, nan, nan,  ..., nan, nan, nan],
        [nan, nan, nan,  ..., nan, nan, nan]], device='cuda:0',
       grad_fn=<SubBackward0>)
```



@YanSong97
Collaborator

Hi, I have just uploaded a demo config. Feel free to try it out.

Also, my local PyTorch version is 1.13.0 and I cannot reproduce this error. Which PyTorch version are you using?
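
For reference, the installed version can be checked with:

```python
import torch

print(torch.__version__)  # prints e.g. 1.13.0, the version where this error did not reproduce
```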

@ZHQ-air

ZHQ-air commented Nov 11, 2023

Thank you very much for your response. The error no longer occurs when I use expr_10_vs_10_psro.yaml with batch_size=100 and num_workers=5.
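
For later readers, here is a small sketch of applying those working values without hand-editing the file. It assumes PyYAML and simply rewrites any key named num_workers or batch_size, since the exact nesting of the experiment YAML is not shown in this thread; the output filename is hypothetical:

```python
import yaml

def override_keys(node, updates):
    """Recursively set any matching key inside a nested dict/list structure."""
    if isinstance(node, dict):
        for key, value in node.items():
            if key in updates:
                node[key] = updates[key]
            else:
                override_keys(value, updates)
    elif isinstance(node, list):
        for item in node:
            override_keys(item, updates)

# Load the experiment config, apply the settings reported to work here
# (num_workers=5, batch_size=100), and save a copy to launch with.
with open("light_malib/expr/gr_football/expr_10_vs_10_psro.yaml") as f:
    cfg = yaml.safe_load(f)

override_keys(cfg, {"num_workers": 5, "batch_size": 100})

with open("expr_10_vs_10_psro_small.yaml", "w") as f:  # hypothetical output name
    yaml.safe_dump(cfg, f)
```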
