[Bug]: RuntimeError with tensor_parallel_size > 1 in Process Bootstrapping Phase #5637
Comments
We are seeing the same error for LLM, LLMEngine, and AsyncLLMEngine. Interestingly, we find that wrapping everything in the python script within an `if __name__ == '__main__':` guard lets the run proceed.
Note that there is a CUDA IPC Tensor Error and a Leaked Semaphore Objects error at the end, but at least the text generation task can be successfully completed. This might be related to unprotected usage of Python's concurrent.futures library, although the only place I could find vLLM using this library is here.
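For reference, a minimal sketch of that guard-based workaround (the model path, prompt, and sampling settings below are placeholders, not taken from this thread):

```python
from vllm import LLM, SamplingParams

def main():
    # Building the engine inside main() means that worker processes
    # started by vLLM do not re-execute this code when they re-import the module.
    llm = LLM(
        model="meta-llama/Meta-Llama-3-8B-Instruct",  # placeholder model
        tensor_parallel_size=2,
    )
    outputs = llm.generate(["Hello, my name is"], SamplingParams(max_tokens=64))
    for out in outputs:
        print(out.outputs[0].text)

if __name__ == "__main__":
    main()
```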
Just checked. For v0.4.3, the default backend (ray) works, albeit with a small error.
cc @njhill
Hi folks, can you try setting the environment variable `VLLM_WORKER_MULTIPROC_METHOD=fork`?
I also encountered a similar error to the one above; adding this environment variable resolved it.
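For anyone reading along, one way to apply this from inside a script rather than the shell is a sketch like the following, assuming the variable only needs to be set before the engine is constructed (the model path is a placeholder):

```python
import os

# Must be set before the vLLM engine is created so that the worker
# start method is read from the environment.
os.environ["VLLM_WORKER_MULTIPROC_METHOD"] = "fork"

from vllm import LLM

if __name__ == "__main__":
    llm = LLM(
        model="meta-llama/Meta-Llama-3-8B-Instruct",  # placeholder model
        tensor_parallel_size=2,
    )
```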
I attempted to set the environment variable `export VLLM_WORKER_MULTIPROC_METHOD=fork` as suggested and reran the vLLM application. Unfortunately, I'm still encountering errors. This time, a KeyError occurred in the multiprocessing.resource_tracker module, indicating a potential issue with process management under the fork start method. The traceback highlights a removal operation on a missing key in a resource cache. Here's the relevant part of the error message:
(llm) z5327441@k091:/scratch/pbs.5466392.kman.restech.unsw.edu.au $ export VLLM_WORKER_MULTIPROC_METHOD=fork
(llm) z5327441@k091:/scratch/pbs.5466392.kman.restech.unsw.edu.au $ python test.py
2024-06-19 20:27:18,261 INFO worker.py:1568 -- Connecting to existing Ray cluster at address: 10.197.40.91:6379...
2024-06-19 20:27:18,269 INFO worker.py:1744 -- Connected to Ray cluster. View the dashboard at 10.197.40.91:8265
INFO 06-19 20:27:18 config.py:623] Defaulting to use mp for distributed inference
INFO 06-19 20:27:18 llm_engine.py:161] Initializing an LLM engine (v0.5.0.post1) with config: model='/scratch/pbs.5466392.kman.restech.unsw.edu.au/models/Meta-Llama-3-8B-Instruct', speculative_config=None, tokenizer='/scratch/pbs.5466392.kman.restech.unsw.edu.au/models/Meta-Llama-3-8B-Instruct', skip_tokenizer_init=False, tokenizer_mode=auto, revision=None, rope_scaling=None, rope_theta=None, tokenizer_revision=None, trust_remote_code=False, dtype=torch.bfloat16, max_seq_len=8192, download_dir=None, load_format=LoadFormat.AUTO, tensor_parallel_size=2, disable_custom_all_reduce=False, quantization=None, enforce_eager=False, kv_cache_dtype=auto, quantization_param_path=None, device_config=cuda, decoding_config=DecodingConfig(guided_decoding_backend='outlines'), seed=0, served_model_name=/scratch/pbs.5466392.kman.restech.unsw.edu.au/models/Meta-Llama-3-8B-Instruct)
Special tokens have been added in the vocabulary, make sure the associated word embeddings are fine-tuned or trained.
(VllmWorkerProcess pid=3599698) INFO 06-19 20:27:20 multiproc_worker_utils.py:215] Worker ready; awaiting tasks
(VllmWorkerProcess pid=3599698) INFO 06-19 20:27:20 utils.py:637] Found nccl from library libnccl.so.2
INFO 06-19 20:27:20 utils.py:637] Found nccl from library libnccl.so.2
INFO 06-19 20:27:20 pynccl.py:63] vLLM is using nccl==2.20.5
(VllmWorkerProcess pid=3599698) INFO 06-19 20:27:20 pynccl.py:63] vLLM is using nccl==2.20.5
INFO 06-19 20:27:20 custom_all_reduce_utils.py:170] generating GPU P2P access cache in /home/z5327441/.config/vllm/gpu_p2p_access_cache_for_7,6,5,4,3,2,1,0.json
2024-06-19 20:27:22,430 INFO worker.py:1568 -- Connecting to existing Ray cluster at address: 10.197.40.91:6379...
2024-06-19 20:27:22,438 INFO worker.py:1744 -- Connected to Ray cluster. View the dashboard at 10.197.40.91:8265
2024-06-19 20:27:22,475 INFO worker.py:1568 -- Connecting to existing Ray cluster at address: 10.197.40.91:6379...
2024-06-19 20:27:22,483 INFO worker.py:1744 -- Connected to Ray cluster. View the dashboard at 10.197.40.91:8265
INFO 06-19 20:27:22 config.py:623] Defaulting to use mp for distributed inference
INFO 06-19 20:27:22 llm_engine.py:161] Initializing an LLM engine (v0.5.0.post1) with config: model='/scratch/pbs.5466392.kman.restech.unsw.edu.au/models/Meta-Llama-3-8B-Instruct', speculative_config=None, tokenizer='/scratch/pbs.5466392.kman.restech.unsw.edu.au/models/Meta-Llama-3-8B-Instruct', skip_tokenizer_init=False, tokenizer_mode=auto, revision=None, rope_scaling=None, rope_theta=None, tokenizer_revision=None, trust_remote_code=False, dtype=torch.bfloat16, max_seq_len=8192, download_dir=None, load_format=LoadFormat.AUTO, tensor_parallel_size=2, disable_custom_all_reduce=False, quantization=None, enforce_eager=False, kv_cache_dtype=auto, quantization_param_path=None, device_config=cuda, decoding_config=DecodingConfig(guided_decoding_backend='outlines'), seed=0, served_model_name=/scratch/pbs.5466392.kman.restech.unsw.edu.au/models/Meta-Llama-3-8B-Instruct)
INFO 06-19 20:27:22 config.py:623] Defaulting to use mp for distributed inference
INFO 06-19 20:27:22 llm_engine.py:161] Initializing an LLM engine (v0.5.0.post1) with config: model='/scratch/pbs.5466392.kman.restech.unsw.edu.au/models/Meta-Llama-3-8B-Instruct', speculative_config=None, tokenizer='/scratch/pbs.5466392.kman.restech.unsw.edu.au/models/Meta-Llama-3-8B-Instruct', skip_tokenizer_init=False, tokenizer_mode=auto, revision=None, rope_scaling=None, rope_theta=None, tokenizer_revision=None, trust_remote_code=False, dtype=torch.bfloat16, max_seq_len=8192, download_dir=None, load_format=LoadFormat.AUTO, tensor_parallel_size=2, disable_custom_all_reduce=False, quantization=None, enforce_eager=False, kv_cache_dtype=auto, quantization_param_path=None, device_config=cuda, decoding_config=DecodingConfig(guided_decoding_backend='outlines'), seed=0, served_model_name=/scratch/pbs.5466392.kman.restech.unsw.edu.au/models/Meta-Llama-3-8B-Instruct)
Special tokens have been added in the vocabulary, make sure the associated word embeddings are fine-tuned or trained.
Special tokens have been added in the vocabulary, make sure the associated word embeddings are fine-tuned or trained.
(VllmWorkerProcess pid=3600016) INFO 06-19 20:27:23 multiproc_worker_utils.py:215] Worker ready; awaiting tasks
(VllmWorkerProcess pid=3600019) INFO 06-19 20:27:23 multiproc_worker_utils.py:215] Worker ready; awaiting tasks
INFO 06-19 20:27:23 utils.py:637] Found nccl from library libnccl.so.2
(VllmWorkerProcess pid=3600016) INFO 06-19 20:27:23 utils.py:637] Found nccl from library libnccl.so.2
(VllmWorkerProcess pid=3600016) INFO 06-19 20:27:23 pynccl.py:63] vLLM is using nccl==2.20.5
INFO 06-19 20:27:23 pynccl.py:63] vLLM is using nccl==2.20.5
INFO 06-19 20:27:23 utils.py:637] Found nccl from library libnccl.so.2
(VllmWorkerProcess pid=3600019) INFO 06-19 20:27:23 utils.py:637] Found nccl from library libnccl.so.2
INFO 06-19 20:27:23 pynccl.py:63] vLLM is using nccl==2.20.5
(VllmWorkerProcess pid=3600019) INFO 06-19 20:27:23 pynccl.py:63] vLLM is using nccl==2.20.5
Traceback (most recent call last):
File "/scratch/pbs.5466392.kman.restech.unsw.edu.au/miniforge3/envs/llm/lib/python3.10/multiprocessing/resource_tracker.py", line 209, in main
cache[rtype].remove(name)
KeyError: '/psm_deecb814'
INFO 06-19 20:27:24 custom_all_reduce_utils.py:170] generating GPU P2P access cache in /home/z5327441/.config/vllm/gpu_p2p_access_cache_for_7,6,5,4,3,2,1,0.json
[rank0]: Traceback (most recent call last):
[rank0]: File "<string>", line 1, in <module>
[rank0]: File "/scratch/pbs.5466392.kman.restech.unsw.edu.au/miniforge3/envs/llm/lib/python3.10/multiprocessing/spawn.py", line 116, in spawn_main
[rank0]: exitcode = _main(fd, parent_sentinel)
[rank0]: File "/scratch/pbs.5466392.kman.restech.unsw.edu.au/miniforge3/envs/llm/lib/python3.10/multiprocessing/spawn.py", line 125, in _main
[rank0]: prepare(preparation_data)
[rank0]: File "/scratch/pbs.5466392.kman.restech.unsw.edu.au/miniforge3/envs/llm/lib/python3.10/multiprocessing/spawn.py", line 236, in prepare
[rank0]: _fixup_main_from_path(data['init_main_from_path'])
[rank0]: File "/scratch/pbs.5466392.kman.restech.unsw.edu.au/miniforge3/envs/llm/lib/python3.10/multiprocessing/spawn.py", line 287, in _fixup_main_from_path
[rank0]: main_content = runpy.run_path(main_path,
[rank0]: File "/scratch/pbs.5466392.kman.restech.unsw.edu.au/miniforge3/envs/llm/lib/python3.10/runpy.py", line 289, in run_path
[rank0]: return _run_module_code(code, init_globals, run_name,
[rank0]: File "/scratch/pbs.5466392.kman.restech.unsw.edu.au/miniforge3/envs/llm/lib/python3.10/runpy.py", line 96, in _run_module_code
[rank0]: _run_code(code, mod_globals, init_globals,
[rank0]: File "/scratch/pbs.5466392.kman.restech.unsw.edu.au/miniforge3/envs/llm/lib/python3.10/runpy.py", line 86, in _run_code
[rank0]: exec(code, run_globals)
[rank0]: File "/scratch/pbs.5466392.kman.restech.unsw.edu.au/test.py", line 13, in <module>
[rank0]: llm = LLM(
[rank0]: File "/scratch/pbs.5466392.kman.restech.unsw.edu.au/miniforge3/envs/llm/lib/python3.10/site-packages/vllm/entrypoints/llm.py", line 144, in __init__
[rank0]: self.llm_engine = LLMEngine.from_engine_args(
[rank0]: File "/scratch/pbs.5466392.kman.restech.unsw.edu.au/miniforge3/envs/llm/lib/python3.10/site-packages/vllm/engine/llm_engine.py", line 363, in from_engine_args
[rank0]: engine = cls(
[rank0]: File "/scratch/pbs.5466392.kman.restech.unsw.edu.au/miniforge3/envs/llm/lib/python3.10/site-packages/vllm/engine/llm_engine.py", line 223, in __init__
[rank0]: self.model_executor = executor_class(
[rank0]: File "/scratch/pbs.5466392.kman.restech.unsw.edu.au/miniforge3/envs/llm/lib/python3.10/site-packages/vllm/executor/distributed_gpu_executor.py", line 25, in __init__
[rank0]: super().__init__(*args, **kwargs)
[rank0]: File "/scratch/pbs.5466392.kman.restech.unsw.edu.au/miniforge3/envs/llm/lib/python3.10/site-packages/vllm/executor/executor_base.py", line 41, in __init__
[rank0]: self._init_executor()
[rank0]: File "/scratch/pbs.5466392.kman.restech.unsw.edu.au/miniforge3/envs/llm/lib/python3.10/site-packages/vllm/executor/multiproc_gpu_executor.py", line 65, in _init_executor
[rank0]: self._run_workers("init_device")
[rank0]: File "/scratch/pbs.5466392.kman.restech.unsw.edu.au/miniforge3/envs/llm/lib/python3.10/site-packages/vllm/executor/multiproc_gpu_executor.py", line 119, in _run_workers
[rank0]: driver_worker_output = driver_worker_method(*args, **kwargs)
[rank0]: File "/scratch/pbs.5466392.kman.restech.unsw.edu.au/miniforge3/envs/llm/lib/python3.10/site-packages/vllm/worker/worker.py", line 115, in init_device
[rank0]: init_worker_distributed_environment(self.parallel_config, self.rank,
[rank0]: File "/scratch/pbs.5466392.kman.restech.unsw.edu.au/miniforge3/envs/llm/lib/python3.10/site-packages/vllm/worker/worker.py", line 357, in init_worker_distributed_environment
[rank0]: ensure_model_parallel_initialized(parallel_config.tensor_parallel_size,
[rank0]: File "/scratch/pbs.5466392.kman.restech.unsw.edu.au/miniforge3/envs/llm/lib/python3.10/site-packages/vllm/distributed/parallel_state.py", line 655, in ensure_model_parallel_initialized
[rank0]: initialize_model_parallel(tensor_model_parallel_size,
[rank0]: File "/scratch/pbs.5466392.kman.restech.unsw.edu.au/miniforge3/envs/llm/lib/python3.10/site-packages/vllm/distributed/parallel_state.py", line 616, in initialize_model_parallel
[rank0]: _TP = GroupCoordinator(
[rank0]: File "/scratch/pbs.5466392.kman.restech.unsw.edu.au/miniforge3/envs/llm/lib/python3.10/site-packages/vllm/distributed/parallel_state.py", line 157, in __init__
[rank0]: self.ca_comm = CustomAllreduce(
[rank0]: File "/scratch/pbs.5466392.kman.restech.unsw.edu.au/miniforge3/envs/llm/lib/python3.10/site-packages/vllm/distributed/device_communicators/custom_all_reduce.py", line 174, in __init__
[rank0]: if not _can_p2p(rank, world_size):
[rank0]: File "/scratch/pbs.5466392.kman.restech.unsw.edu.au/miniforge3/envs/llm/lib/python3.10/site-packages/vllm/distributed/device_communicators/custom_all_reduce.py", line 78, in _can_p2p
[rank0]: if not gpu_p2p_access_check(rank, i):
[rank0]: File "/scratch/pbs.5466392.kman.restech.unsw.edu.au/miniforge3/envs/llm/lib/python3.10/site-packages/vllm/distributed/device_communicators/custom_all_reduce_utils.py", line 174, in gpu_p2p_access_check
[rank0]: cache[f"{_i}->{_j}"] = can_actually_p2p(_i, _j)
[rank0]: File "/scratch/pbs.5466392.kman.restech.unsw.edu.au/miniforge3/envs/llm/lib/python3.10/site-packages/vllm/distributed/device_communicators/custom_all_reduce_utils.py", line 123, in can_actually_p2p
[rank0]: pi.start()
[rank0]: File "/scratch/pbs.5466392.kman.restech.unsw.edu.au/miniforge3/envs/llm/lib/python3.10/multiprocessing/process.py", line 121, in start
[rank0]: self._popen = self._Popen(self)
[rank0]: File "/scratch/pbs.5466392.kman.restech.unsw.edu.au/miniforge3/envs/llm/lib/python3.10/multiprocessing/context.py", line 288, in _Popen
[rank0]: return Popen(process_obj)
[rank0]: File "/scratch/pbs.5466392.kman.restech.unsw.edu.au/miniforge3/envs/llm/lib/python3.10/multiprocessing/popen_spawn_posix.py", line 32, in __init__
[rank0]: super().__init__(process_obj)
[rank0]: File "/scratch/pbs.5466392.kman.restech.unsw.edu.au/miniforge3/envs/llm/lib/python3.10/multiprocessing/popen_fork.py", line 19, in __init__
[rank0]: self._launch(process_obj)
[rank0]: File "/scratch/pbs.5466392.kman.restech.unsw.edu.au/miniforge3/envs/llm/lib/python3.10/multiprocessing/popen_spawn_posix.py", line 42, in _launch
[rank0]: prep_data = spawn.get_preparation_data(process_obj._name)
[rank0]: File "/scratch/pbs.5466392.kman.restech.unsw.edu.au/miniforge3/envs/llm/lib/python3.10/multiprocessing/spawn.py", line 154, in get_preparation_data
[rank0]: _check_not_importing_main()
[rank0]: File "/scratch/pbs.5466392.kman.restech.unsw.edu.au/miniforge3/envs/llm/lib/python3.10/multiprocessing/spawn.py", line 134, in _check_not_importing_main
[rank0]: raise RuntimeError('''
[rank0]: RuntimeError:
[rank0]: An attempt has been made to start a new process before the
[rank0]: current process has finished its bootstrapping phase.
[rank0]: This probably means that you are not using fork to start your
[rank0]: child processes and you have forgotten to use the proper idiom
[rank0]: in the main module:
[rank0]: if __name__ == '__main__':
[rank0]: freeze_support()
[rank0]: ...
[rank0]: The "freeze_support()" line can be omitted if the program
[rank0]: is not going to be frozen to produce an executable.
Traceback (most recent call last):
File "/scratch/pbs.5466392.kman.restech.unsw.edu.au/miniforge3/envs/llm/lib/python3.10/multiprocessing/resource_tracker.py", line 209, in main
cache[rtype].remove(name)
KeyError: '/psm_2038dbcf'
INFO 06-19 20:27:24 custom_all_reduce_utils.py:170] generating GPU P2P access cache in /home/z5327441/.config/vllm/gpu_p2p_access_cache_for_7,6,5,4,3,2,1,0.json
[rank0]: Traceback (most recent call last):
[rank0]: File "<string>", line 1, in <module>
[rank0]: File "/scratch/pbs.5466392.kman.restech.unsw.edu.au/miniforge3/envs/llm/lib/python3.10/multiprocessing/spawn.py", line 116, in spawn_main
[rank0]: exitcode = _main(fd, parent_sentinel)
[rank0]: File "/scratch/pbs.5466392.kman.restech.unsw.edu.au/miniforge3/envs/llm/lib/python3.10/multiprocessing/spawn.py", line 125, in _main
[rank0]: prepare(preparation_data)
[rank0]: File "/scratch/pbs.5466392.kman.restech.unsw.edu.au/miniforge3/envs/llm/lib/python3.10/multiprocessing/spawn.py", line 236, in prepare
[rank0]: _fixup_main_from_path(data['init_main_from_path'])
[rank0]: File "/scratch/pbs.5466392.kman.restech.unsw.edu.au/miniforge3/envs/llm/lib/python3.10/multiprocessing/spawn.py", line 287, in _fixup_main_from_path
[rank0]: main_content = runpy.run_path(main_path,
[rank0]: File "/scratch/pbs.5466392.kman.restech.unsw.edu.au/miniforge3/envs/llm/lib/python3.10/runpy.py", line 289, in run_path
[rank0]: return _run_module_code(code, init_globals, run_name,
[rank0]: File "/scratch/pbs.5466392.kman.restech.unsw.edu.au/miniforge3/envs/llm/lib/python3.10/runpy.py", line 96, in _run_module_code
[rank0]: _run_code(code, mod_globals, init_globals,
[rank0]: File "/scratch/pbs.5466392.kman.restech.unsw.edu.au/miniforge3/envs/llm/lib/python3.10/runpy.py", line 86, in _run_code
[rank0]: exec(code, run_globals)
[rank0]: File "/scratch/pbs.5466392.kman.restech.unsw.edu.au/test.py", line 13, in <module>
[rank0]: llm = LLM(
[rank0]: File "/scratch/pbs.5466392.kman.restech.unsw.edu.au/miniforge3/envs/llm/lib/python3.10/site-packages/vllm/entrypoints/llm.py", line 144, in __init__
[rank0]: self.llm_engine = LLMEngine.from_engine_args(
[rank0]: File "/scratch/pbs.5466392.kman.restech.unsw.edu.au/miniforge3/envs/llm/lib/python3.10/site-packages/vllm/engine/llm_engine.py", line 363, in from_engine_args
[rank0]: engine = cls(
[rank0]: File "/scratch/pbs.5466392.kman.restech.unsw.edu.au/miniforge3/envs/llm/lib/python3.10/site-packages/vllm/engine/llm_engine.py", line 223, in __init__
[rank0]: self.model_executor = executor_class(
[rank0]: File "/scratch/pbs.5466392.kman.restech.unsw.edu.au/miniforge3/envs/llm/lib/python3.10/site-packages/vllm/executor/distributed_gpu_executor.py", line 25, in __init__
[rank0]: super().__init__(*args, **kwargs)
[rank0]: File "/scratch/pbs.5466392.kman.restech.unsw.edu.au/miniforge3/envs/llm/lib/python3.10/site-packages/vllm/executor/executor_base.py", line 41, in __init__
[rank0]: self._init_executor()
[rank0]: File "/scratch/pbs.5466392.kman.restech.unsw.edu.au/miniforge3/envs/llm/lib/python3.10/site-packages/vllm/executor/multiproc_gpu_executor.py", line 65, in _init_executor
[rank0]: self._run_workers("init_device")
[rank0]: File "/scratch/pbs.5466392.kman.restech.unsw.edu.au/miniforge3/envs/llm/lib/python3.10/site-packages/vllm/executor/multiproc_gpu_executor.py", line 119, in _run_workers
[rank0]: driver_worker_output = driver_worker_method(*args, **kwargs)
[rank0]: File "/scratch/pbs.5466392.kman.restech.unsw.edu.au/miniforge3/envs/llm/lib/python3.10/site-packages/vllm/worker/worker.py", line 115, in init_device
[rank0]: init_worker_distributed_environment(self.parallel_config, self.rank,
[rank0]: File "/scratch/pbs.5466392.kman.restech.unsw.edu.au/miniforge3/envs/llm/lib/python3.10/site-packages/vllm/worker/worker.py", line 357, in init_worker_distributed_environment
[rank0]: ensure_model_parallel_initialized(parallel_config.tensor_parallel_size,
[rank0]: File "/scratch/pbs.5466392.kman.restech.unsw.edu.au/miniforge3/envs/llm/lib/python3.10/site-packages/vllm/distributed/parallel_state.py", line 655, in ensure_model_parallel_initialized
[rank0]: initialize_model_parallel(tensor_model_parallel_size,
[rank0]: File "/scratch/pbs.5466392.kman.restech.unsw.edu.au/miniforge3/envs/llm/lib/python3.10/site-packages/vllm/distributed/parallel_state.py", line 616, in initialize_model_parallel
[rank0]: _TP = GroupCoordinator(
[rank0]: File "/scratch/pbs.5466392.kman.restech.unsw.edu.au/miniforge3/envs/llm/lib/python3.10/site-packages/vllm/distributed/parallel_state.py", line 157, in __init__
[rank0]: self.ca_comm = CustomAllreduce(
[rank0]: File "/scratch/pbs.5466392.kman.restech.unsw.edu.au/miniforge3/envs/llm/lib/python3.10/site-packages/vllm/distributed/device_communicators/custom_all_reduce.py", line 174, in __init__
[rank0]: if not _can_p2p(rank, world_size):
[rank0]: File "/scratch/pbs.5466392.kman.restech.unsw.edu.au/miniforge3/envs/llm/lib/python3.10/site-packages/vllm/distributed/device_communicators/custom_all_reduce.py", line 78, in _can_p2p
[rank0]: if not gpu_p2p_access_check(rank, i):
[rank0]: File "/scratch/pbs.5466392.kman.restech.unsw.edu.au/miniforge3/envs/llm/lib/python3.10/site-packages/vllm/distributed/device_communicators/custom_all_reduce_utils.py", line 174, in gpu_p2p_access_check
[rank0]: cache[f"{_i}->{_j}"] = can_actually_p2p(_i, _j)
[rank0]: File "/scratch/pbs.5466392.kman.restech.unsw.edu.au/miniforge3/envs/llm/lib/python3.10/site-packages/vllm/distributed/device_communicators/custom_all_reduce_utils.py", line 123, in can_actually_p2p
[rank0]: pi.start()
[rank0]: File "/scratch/pbs.5466392.kman.restech.unsw.edu.au/miniforge3/envs/llm/lib/python3.10/multiprocessing/process.py", line 121, in start
[rank0]: self._popen = self._Popen(self)
[rank0]: File "/scratch/pbs.5466392.kman.restech.unsw.edu.au/miniforge3/envs/llm/lib/python3.10/multiprocessing/context.py", line 288, in _Popen
[rank0]: return Popen(process_obj)
[rank0]: File "/scratch/pbs.5466392.kman.restech.unsw.edu.au/miniforge3/envs/llm/lib/python3.10/multiprocessing/popen_spawn_posix.py", line 32, in __init__
[rank0]: super().__init__(process_obj)
[rank0]: File "/scratch/pbs.5466392.kman.restech.unsw.edu.au/miniforge3/envs/llm/lib/python3.10/multiprocessing/popen_fork.py", line 19, in __init__
[rank0]: self._launch(process_obj)
[rank0]: File "/scratch/pbs.5466392.kman.restech.unsw.edu.au/miniforge3/envs/llm/lib/python3.10/multiprocessing/popen_spawn_posix.py", line 42, in _launch
[rank0]: prep_data = spawn.get_preparation_data(process_obj._name)
[rank0]: File "/scratch/pbs.5466392.kman.restech.unsw.edu.au/miniforge3/envs/llm/lib/python3.10/multiprocessing/spawn.py", line 154, in get_preparation_data
[rank0]: _check_not_importing_main()
[rank0]: File "/scratch/pbs.5466392.kman.restech.unsw.edu.au/miniforge3/envs/llm/lib/python3.10/multiprocessing/spawn.py", line 134, in _check_not_importing_main
[rank0]: raise RuntimeError('''
[rank0]: RuntimeError:
[rank0]: An attempt has been made to start a new process before the
[rank0]: current process has finished its bootstrapping phase.
[rank0]: This probably means that you are not using fork to start your
[rank0]: child processes and you have forgotten to use the proper idiom
[rank0]: in the main module:
[rank0]: if __name__ == '__main__':
[rank0]: freeze_support()
[rank0]: ...
[rank0]: The "freeze_support()" line can be omitted if the program
[rank0]: is not going to be frozen to produce an executable.
*** SIGTERM received at time=1718792844 on cpu 39 ***
PC: @ 0x14ee9615645c (unknown) pthread_cond_wait@@GLIBC_2.3.2
@ 0x14ee9615acf0 (unknown) (unknown)
[2024-06-19 20:27:24,847 E 3600016 3599783] logging.cc:343: *** SIGTERM received at time=1718792844 on cpu 39 ***
[2024-06-19 20:27:24,847 E 3600016 3599783] logging.cc:343: PC: @ 0x14ee9615645c (unknown) pthread_cond_wait@@GLIBC_2.3.2
[2024-06-19 20:27:24,847 E 3600016 3599783] logging.cc:343: @ 0x14ee9615acf0 (unknown) (unknown)
*** SIGTERM received at time=1718792844 on cpu 37 ***
PC: @ 0x14bb61b8c45c (unknown) pthread_cond_wait@@GLIBC_2.3.2
@ 0x14bb61b90cf0 (unknown) (unknown)
[2024-06-19 20:27:24,892 E 3600019 3599784] logging.cc:343: *** SIGTERM received at time=1718792844 on cpu 37 ***
[2024-06-19 20:27:24,892 E 3600019 3599784] logging.cc:343: PC: @ 0x14bb61b8c45c (unknown) pthread_cond_wait@@GLIBC_2.3.2
[2024-06-19 20:27:24,892 E 3600019 3599784] logging.cc:343: @ 0x14bb61b90cf0 (unknown) (unknown)
@zixuzixu these are two separate issues.
For the latest commit in the main branch (#5648), adding the environment variable still produces the error.
For #5669, it works with or without the environment variable.
This is expected. #5669 contains fixes for the error you mentioned.
After building from the latest code in #5669, everything is working now! (I faced some challenges with setting the g++ version and the cudatoolkit version without sudo permissions, but I managed to resolve them.) Thank you so much for your help! I appreciate your efforts. Should I close this issue?
You can keep it open until #5669 is merged. I have to fix some test failures there.
Hi. Encountering the same issue. Is there a workaround? |
There might be three ways to fix this currently. The issue is fixed in the latest pull request but not in the latest build.

1. Downgrade to v0.4.3:

pip install vllm==0.4.3

2. Build from the pull request (#5669):

git clone git@github.com:vllm-project/vllm.git
cd vllm
gh pr checkout 5669
pip install .

3. Wrap your script's entry point in `if __name__ == '__main__':`, as discussed earlier in this thread.

The issue has been addressed in the latest pull request, but it hasn't been included in a build yet.
@zixuzixu Thank you. |
Without docker, vllm 0.4.3 works for me. |
Wrapping with main works for me! |
Any update on how to do this? |
@dipta007 Wrap your code under an `if __name__ == '__main__':` guard.
Your current environment
🐛 Describe the bug
Description
When setting `tensor_parallel_size` to a value greater than 1, the program gets stuck and raises a `RuntimeError` related to the bootstrapping phase of new processes. This issue does not occur when using version v0.4.3, but persists in versions v0.5.0.post1 and v0.5.0.

Steps to Reproduce
Install version v0.5.0.post1 or v0.5.0 of the library.
Run the following Python script with `tensor_parallel_size` set to 2:

python load.py --model_path $JOBFS/fine_tuned_models/checkpoint-1857
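The script itself did not come through in the report above; a minimal sketch of what a `load.py` of this shape might look like, assuming it parses `--model_path` and builds the engine at module level (all names besides `--model_path` are assumptions):

```python
import argparse
from vllm import LLM, SamplingParams

parser = argparse.ArgumentParser()
parser.add_argument("--model_path", type=str, required=True)
args = parser.parse_args()

# Constructing the engine at module level (outside a __main__ guard) is the
# pattern that can trigger the bootstrapping RuntimeError: workers started
# with the spawn method re-import this module and hit this line again.
llm = LLM(model=args.model_path, tensor_parallel_size=2)
outputs = llm.generate(["Hello"], SamplingParams(max_tokens=32))
print(outputs[0].outputs[0].text)
```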
Expected Behavior
The program should run without any issues regarding process bootstrapping, similar to how it behaves with version v0.4.3.
Observed Behavior
The program raises the following `RuntimeError` when `tensor_parallel_size` is set to 2:

Environment
Additional Context
Reverting to version v0.4.3 resolves the issue. It appears there might be a change in how processes are handled in newer versions that could be causing this error.