Run error #2

Open
meilijunxi opened this issue Dec 27, 2024 · 0 comments
@Jiahao000
Hello, after setting up the environment as described in the README, I get the following error when I execute the run command. How can I run it correctly?

[W socket.cpp:426] [c10d] The server socket has failed to bind to [::]:29500 (errno: 98 - Address already in use).
[W socket.cpp:426] [c10d] The server socket has failed to bind to 0.0.0.0:29500 (errno: 98 - Address already in use).
[E socket.cpp:462] [c10d] The server socket has failed to listen on any local network address.
Traceback (most recent call last):
  File "/home/gjj/anaconda3/envs/mosaicfusion/bin/torchrun", line 33, in <module>
    sys.exit(load_entry_point('torch==2.0.1', 'console_scripts', 'torchrun')())
  File "/home/gjj/anaconda3/envs/mosaicfusion/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 346, in wrapper
    return f(*args, **kwargs)
  File "/home/gjj/anaconda3/envs/mosaicfusion/lib/python3.8/site-packages/torch/distributed/run.py", line 794, in main
    run(args)
  File "/home/gjj/anaconda3/envs/mosaicfusion/lib/python3.8/site-packages/torch/distributed/run.py", line 785, in run
    elastic_launch(
  File "/home/gjj/anaconda3/envs/mosaicfusion/lib/python3.8/site-packages/torch/distributed/launcher/api.py", line 134, in __call__
    return launch_agent(self._config, self._entrypoint, list(args))
  File "/home/gjj/anaconda3/envs/mosaicfusion/lib/python3.8/site-packages/torch/distributed/launcher/api.py", line 241, in launch_agent
    result = agent.run()
  File "/home/gjj/anaconda3/envs/mosaicfusion/lib/python3.8/site-packages/torch/distributed/elastic/metrics/api.py", line 129, in wrapper
    result = f(*args, **kwargs)
  File "/home/gjj/anaconda3/envs/mosaicfusion/lib/python3.8/site-packages/torch/distributed/elastic/agent/server/api.py", line 723, in run
    result = self._invoke_run(role)
  File "/home/gjj/anaconda3/envs/mosaicfusion/lib/python3.8/site-packages/torch/distributed/elastic/agent/server/api.py", line 858, in _invoke_run
    self._initialize_workers(self._worker_group)
  File "/home/gjj/anaconda3/envs/mosaicfusion/lib/python3.8/site-packages/torch/distributed/elastic/metrics/api.py", line 129, in wrapper
    result = f(*args, **kwargs)
  File "/home/gjj/anaconda3/envs/mosaicfusion/lib/python3.8/site-packages/torch/distributed/elastic/agent/server/api.py", line 692, in _initialize_workers
    self._rendezvous(worker_group)
  File "/home/gjj/anaconda3/envs/mosaicfusion/lib/python3.8/site-packages/torch/distributed/elastic/metrics/api.py", line 129, in wrapper
    result = f(*args, **kwargs)
  File "/home/gjj/anaconda3/envs/mosaicfusion/lib/python3.8/site-packages/torch/distributed/elastic/agent/server/api.py", line 546, in _rendezvous
    store, group_rank, group_world_size = spec.rdzv_handler.next_rendezvous()
  File "/home/gjj/anaconda3/envs/mosaicfusion/lib/python3.8/site-packages/torch/distributed/elastic/rendezvous/static_tcp_rendezvous.py", line 55, in next_rendezvous
    self._store = TCPStore( # type: ignore[call-arg]
RuntimeError: The server socket has failed to listen on any local network address. The server socket has failed to bind to [::]:29500 (errno: 98 - Address already in use). The server socket has failed to bind to 0.0.0.0:29500 (errno: 98 - Address already in use).
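
The traceback above says that port 29500, which `torchrun` uses by default for its rendezvous `TCPStore`, is already bound by another process on this machine. A likely cause is an earlier launch that is still running, though the log does not confirm that. The following is a minimal diagnostic sketch using only the Python standard library. The helper names `is_port_in_use` and `find_free_port` are hypothetical and not part of this repository or of PyTorch.

```python
import socket

# Port named in the error message above (torchrun's default master port).
DEFAULT_MASTER_PORT = 29500


def is_port_in_use(port: int, host: str = "0.0.0.0") -> bool:
    """Hypothetical helper: True if binding to (host, port) fails."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as s:
        try:
            s.bind((host, port))
        except OSError:
            # Covers errno 98 (EADDRINUSE), the error shown in the log.
            return True
    return False


def find_free_port() -> int:
    """Hypothetical helper: ask the OS for a currently unused TCP port."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as s:
        s.bind(("", 0))  # port 0 lets the kernel choose a free port
        return s.getsockname()[1]


if __name__ == "__main__":
    if is_port_in_use(DEFAULT_MASTER_PORT):
        print(f"Port {DEFAULT_MASTER_PORT} is busy; a free alternative is {find_free_port()}")
    else:
        print(f"Port {DEFAULT_MASTER_PORT} is currently free")
```

If the default port turns out to be busy, one option is to stop the process that holds it. Another is to pass a different port to `torchrun` via its `--master_port` option, keeping the rest of the README's run command unchanged. The port returned by `find_free_port` is only free at the moment of the check, so another process could still claim it before `torchrun` starts.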
