Your current environment
collect_env.txt
How would you like to use vllm
model: meta-llama/Llama3-8B-Instruct
quantization: none
tensor_parallel_size: 2
GPUs: 2xA30
vllm: 0.6.4.post1 (also tried with 0.5.4 and 0.5.0)
Strongly related to this issue: #6152
I can't run the script with multiple GPUs (it works on a single GPU). The following error occurs:
(VllmWorkerProcess pid=13824) ERROR 11-20 12:20:01 multiproc_worker_utils.py:226] RuntimeError: Cannot re-initialize CUDA in forked subprocess. To use CUDA with multiprocessing, you must use the 'spawn' start method
I tried setting the env variable to "spawn" and also tried the latest vllm version:
os.environ["VLLM_WORKER_MULTIPROC_METHOD"]="spawn"
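(For context, I set it at the very top of the script, before vllm is imported; whether that placement matters is my assumption, the intent is just to have it in place before the worker processes are created.)

import os

# Assumption: setting this before importing vllm ensures the workers
# are started with "spawn" instead of "fork".
os.environ["VLLM_WORKER_MULTIPROC_METHOD"] = "spawn"

from vllm import LLM  # imported only after the variable is set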
The problem is that the calling script then runs the script again as a child process, which is not the desired behavior. And, as expected, the following error is thrown:
================================================================================
RuntimeError:
An attempt has been made to start a new process before the
current process has finished its bootstrapping phase.

This probably means that you are not using fork to start your
child processes and you have forgotten to use the proper idiom
in the main module:

    if __name__ == '__main__':
        freeze_support()
        ...

The "freeze_support()" line can be omitted if the program
is not going to be frozen to produce an executable.
================================================================================
The code that loads the LLM is along these lines:
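(A minimal sketch rather than my full script; the model path, dtype, and tensor_parallel_size are taken from the engine log below, everything else is simplified.)

from vllm import LLM


def load_llm():
    # Mirrors the engine config in the log below: TP=2, bfloat16, seed=0.
    return LLM(
        model="models/meta-llama/Meta-Llama-3.1-8B-Instruct",
        tensor_parallel_size=2,
        dtype="bfloat16",
        seed=0,
    )


if __name__ == "__main__":
    # The guard that the freeze_support() traceback above is asking for.
    llm = load_llm()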
and for inference:
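(Again only a sketch; the prompt and sampling parameters here are placeholders, not my real ones. llm is the object created in the loading snippet above.)

from vllm import SamplingParams

sampling_params = SamplingParams(temperature=0.7, max_tokens=256)

# llm is the LLM instance created above.
outputs = llm.generate(["Hello, my name is"], sampling_params)
for output in outputs:
    print(output.outputs[0].text)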
More info:
++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
INFO 11-20 13:28:39 llm_engine.py:249] Initializing an LLM engine (v0.6.4.post1) with config: model='models/meta-llama/Meta-Llama-3.1-8B-Instruct', speculative_config=None, tokenizer='models/meta-llama/Meta-Llama-3.1-8B-Instruct', skip_tokenizer_init=False, tokenizer_mode=auto, revision=None, override_neuron_config=None, tokenizer_revision=None, trust_remote_code=False, dtype=torch.bfloat16, max_seq_len=8192, download_dir=None, load_format=LoadFormat.AUTO, tensor_parallel_size=2, pipeline_parallel_size=1, disable_custom_all_reduce=False, quantization=None, enforce_eager=False, kv_cache_dtype=auto, quantization_param_path=None, device_config=cuda, decoding_config=DecodingConfig(guided_decoding_backend='outlines'), observability_config=ObservabilityConfig(otlp_traces_endpoint=None, collect_model_forward_time=False, collect_model_execute_time=False), seed=0, served_model_name=models/meta-llama/Meta-Llama-3.1-8B-Instruct, num_scheduler_steps=1, chunked_prefill_enabled=False multi_step_stream_outputs=True, enable_prefix_caching=False, use_async_output_proc=True, use_cached_outputs=False, chat_template_text_format=string, mm_processor_kwargs=None, pooler_config=None)
++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
The same issue occurs on other GPUs as well (V100). What is the problem here?
I attach the results of the collect_env.py script.