[Usage]: Can't use vllm on a multi-GPU node #10474

Open
1 task done
4k1s opened this issue Nov 20, 2024 · 0 comments
Labels: usage How to use vllm

4k1s commented Nov 20, 2024

Your current environment

collect_env.txt

How would you like to use vllm

model: meta-llama/Llama3-8B-Instruct
quantization: none
tensor_parallel_size: 2
GPUs: 2xA30
vllm: 0.6.4.post1 (also tried with 0.5.4 and 0.5.0)
strongly related to this issue: #6152

Can't run the script with multiple GPUs (it works on a single GPU). The following error occurs:

(VllmWorkerProcess pid=13824) ERROR 11-20 12:20:01 multiproc_worker_utils.py:226] RuntimeError: Cannot re-initialize CUDA in forked subprocess. To use CUDA with multiprocessing, you must use the 'spawn' start method

I tried setting the environment variable to "spawn" and also tried the latest vllm version:
os.environ["VLLM_WORKER_MULTIPROC_METHOD"]="spawn"
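(Exporting the variable in the shell before launching should presumably have the same effect, e.g.:

export VLLM_WORKER_MULTIPROC_METHOD=spawn
python run_inference.py  # run_inference.py is just a placeholder for my script's name
)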

The problem is that with spawn the script is run again as a child process, which is not the desired behavior. And, as expected, the following error is thrown:

================================================================================
RuntimeError:
An attempt has been made to start a new process before the
current process has finished its bootstrapping phase.

    This probably means that you are not using fork to start your
    child processes and you have forgotten to use the proper idiom
    in the main module:

        if __name__ == '__main__':
            freeze_support()
            ...

    The "freeze_support()" line can be omitted if the program
    is not going to be frozen to produce an executable.

================================================================================
The code that loads the LLM is:

llm = LLM(model=model_path, device="auto", dtype="auto", max_model_len=8192, gpu_memory_utilization=GMU, tensor_parallel_size=gpu_units)

and for inference

sampling_params = SamplingParams(temperature=temperature, max_tokens=max_new_tokens, top_p=0.95)
results = llm.generate(prompt, sampling_params)
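
For reference, here is a minimal sketch of the layout the multiprocessing error message seems to ask for, with the entry point guarded so that spawn's re-import of the main module does not re-run the whole script. The concrete values for GMU, gpu_units, temperature, max_new_tokens and the prompt below are only stand-ins for the ones used in my script:

import os

# request spawn instead of fork for vLLM's worker processes
os.environ["VLLM_WORKER_MULTIPROC_METHOD"] = "spawn"

from vllm import LLM, SamplingParams

def main():
    model_path = "models/meta-llama/Meta-Llama-3.1-8B-Instruct"
    GMU, gpu_units = 0.9, 2                 # placeholder values
    temperature, max_new_tokens = 0.7, 512  # placeholder values

    llm = LLM(model=model_path, device="auto", dtype="auto",
              max_model_len=8192, gpu_memory_utilization=GMU,
              tensor_parallel_size=gpu_units)

    sampling_params = SamplingParams(temperature=temperature,
                                     max_tokens=max_new_tokens, top_p=0.95)
    results = llm.generate(["Hello, my name is"], sampling_params)
    print(results[0].outputs[0].text)

# everything runs under the __main__ guard, as the RuntimeError suggests
if __name__ == "__main__":
    main()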

More info:
++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
INFO 11-20 13:28:39 llm_engine.py:249] Initializing an LLM engine (v0.6.4.post1) with config: model='models/meta-llama/Meta-Llama-3.1-8B-Instruct', speculative_config=None, tokenizer='models/meta-llama/Meta-Llama-3.1-8B-Instruct', skip_tokenizer_init=False, tokenizer_mode=auto, revision=None, override_neuron_config=None, tokenizer_revision=None, trust_remote_code=False, dtype=torch.bfloat16, max_seq_len=8192, download_dir=None, load_format=LoadFormat.AUTO, tensor_parallel_size=2, pipeline_parallel_size=1, disable_custom_all_reduce=False, quantization=None, enforce_eager=False, kv_cache_dtype=auto, quantization_param_path=None, device_config=cuda, decoding_config=DecodingConfig(guided_decoding_backend='outlines'), observability_config=ObservabilityConfig(otlp_traces_endpoint=None, collect_model_forward_time=False, collect_model_execute_time=False), seed=0, served_model_name=models/meta-llama/Meta-Llama-3.1-8B-Instruct, num_scheduler_steps=1, chunked_prefill_enabled=False multi_step_stream_outputs=True, enable_prefix_caching=False, use_async_output_proc=True, use_cached_outputs=False, chat_template_text_format=string, mm_processor_kwargs=None, pooler_config=None)
++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
The same issue occurs on different GPUs (V100). What is the problem here?

I attach the output of the collect_env.py script.

Before submitting a new issue...

  • Make sure you already searched for relevant issues, and asked the chatbot living at the bottom right corner of the documentation page, which can answer lots of frequently asked questions.