@Jiahao000
Hello, after setting up the environment as described in the README, I get the following error when executing the run command. How can I run it correctly?
[W socket.cpp:426] [c10d] The server socket has failed to bind to [::]:29500 (errno: 98 - Address already in use).
[W socket.cpp:426] [c10d] The server socket has failed to bind to 0.0.0.0:29500 (errno: 98 - Address already in use).
[E socket.cpp:462] [c10d] The server socket has failed to listen on any local network address.
Traceback (most recent call last):
  File "/home/gjj/anaconda3/envs/mosaicfusion/bin/torchrun", line 33, in <module>
    sys.exit(load_entry_point('torch==2.0.1', 'console_scripts', 'torchrun')())
  File "/home/gjj/anaconda3/envs/mosaicfusion/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 346, in wrapper
    return f(*args, **kwargs)
  File "/home/gjj/anaconda3/envs/mosaicfusion/lib/python3.8/site-packages/torch/distributed/run.py", line 794, in main
    run(args)
  File "/home/gjj/anaconda3/envs/mosaicfusion/lib/python3.8/site-packages/torch/distributed/run.py", line 785, in run
    elastic_launch(
  File "/home/gjj/anaconda3/envs/mosaicfusion/lib/python3.8/site-packages/torch/distributed/launcher/api.py", line 134, in __call__
    return launch_agent(self._config, self._entrypoint, list(args))
  File "/home/gjj/anaconda3/envs/mosaicfusion/lib/python3.8/site-packages/torch/distributed/launcher/api.py", line 241, in launch_agent
    result = agent.run()
  File "/home/gjj/anaconda3/envs/mosaicfusion/lib/python3.8/site-packages/torch/distributed/elastic/metrics/api.py", line 129, in wrapper
    result = f(*args, **kwargs)
  File "/home/gjj/anaconda3/envs/mosaicfusion/lib/python3.8/site-packages/torch/distributed/elastic/agent/server/api.py", line 723, in run
    result = self._invoke_run(role)
  File "/home/gjj/anaconda3/envs/mosaicfusion/lib/python3.8/site-packages/torch/distributed/elastic/agent/server/api.py", line 858, in _invoke_run
    self._initialize_workers(self._worker_group)
  File "/home/gjj/anaconda3/envs/mosaicfusion/lib/python3.8/site-packages/torch/distributed/elastic/metrics/api.py", line 129, in wrapper
    result = f(*args, **kwargs)
  File "/home/gjj/anaconda3/envs/mosaicfusion/lib/python3.8/site-packages/torch/distributed/elastic/agent/server/api.py", line 692, in _initialize_workers
    self._rendezvous(worker_group)
  File "/home/gjj/anaconda3/envs/mosaicfusion/lib/python3.8/site-packages/torch/distributed/elastic/metrics/api.py", line 129, in wrapper
    result = f(*args, **kwargs)
  File "/home/gjj/anaconda3/envs/mosaicfusion/lib/python3.8/site-packages/torch/distributed/elastic/agent/server/api.py", line 546, in _rendezvous
    store, group_rank, group_world_size = spec.rdzv_handler.next_rendezvous()
  File "/home/gjj/anaconda3/envs/mosaicfusion/lib/python3.8/site-packages/torch/distributed/elastic/rendezvous/static_tcp_rendezvous.py", line 55, in next_rendezvous
    self._store = TCPStore(  # type: ignore[call-arg]
RuntimeError: The server socket has failed to listen on any local network address. The server socket has failed to bind to [::]:29500 (errno: 98 - Address already in use). The server socket has failed to bind to 0.0.0.0:29500 (errno: 98 - Address already in use).
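For context, `errno: 98 - Address already in use` on port 29500 usually means another process (often a previous or still-running `torchrun` job) is already listening on that port. The snippet below is only a minimal sketch for checking this on the local machine. The helper names `port_is_free` and `find_free_port` are made up for illustration and are not part of PyTorch or of this repository.

```python
import socket


def port_is_free(port: int, host: str = "127.0.0.1") -> bool:
    """Return True if a TCP socket can currently be bound to (host, port)."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as s:
        try:
            s.bind((host, port))
        except OSError:
            # Typically errno 98 (EADDRINUSE) on Linux when the port is taken.
            return False
        return True


def find_free_port(host: str = "127.0.0.1") -> int:
    """Ask the OS for an unused TCP port by binding to port 0."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as s:
        s.bind((host, 0))
        return s.getsockname()[1]


if __name__ == "__main__":
    # 29500 is the port named in the error message above.
    print("29500 free:", port_is_free(29500))
    print("candidate port:", find_free_port())
```

If 29500 turns out to be busy, one option is to pass a different port to the launcher with `torchrun --master_port=<port> ...`, keeping the rest of the run command unchanged. Note that a port reported as free here can still be taken by another process before `torchrun` starts, so this is a diagnostic aid rather than a guarantee.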