🐛 Describe the bug
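When following the guide on running vLLM with Docker, a ZeroDivisionError is raised.
The cause is that the vLLM launcher file uses torch.cuda.is_available instead of calling torch.cuda.is_available(). A bare function object is always truthy, so the if condition always evaluates as true, even when no GPU is available. The launcher then takes the GPU branch, tensor_parallel_size ends up as 0 (as the traceback below shows), and vLLM's verify_with_parallel_config fails with a modulo by zero.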
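To illustrate why the missing parentheses matter, here is a minimal sketch that needs neither PyTorch nor a GPU. The names is_available, branch_taken and branch_taken_fixed are hypothetical stand-ins, not part of TorchServe or PyTorch; is_available simply returns False to mimic a CPU-only container.

```python
def is_available() -> bool:
    """Hypothetical stand-in for torch.cuda.is_available on a machine without a GPU."""
    return False


def branch_taken(check) -> str:
    # Buggy pattern: tests the function object itself, which is always truthy.
    if check:
        return "gpu"
    return "cpu"


def branch_taken_fixed(check) -> str:
    # Fixed pattern: calls the function and tests its return value.
    if check():
        return "gpu"
    return "cpu"


print(bool(is_available))                # True
print(branch_taken(is_available))        # gpu
print(branch_taken_fixed(is_available))  # cpu
```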
Error logs
2024-12-10T13:37:00,743 [INFO ] W-9000-model_1.0-stdout MODEL_LOG - Traceback (most recent call last):
2024-12-10T13:37:00,743 [INFO ] W-9000-model_1.0-stdout MODEL_LOG - File "/home/venv/lib/python3.9/site-packages/ts/model_service_worker.py", line 301, in <module>
2024-12-10T13:37:00,743 [INFO ] W-9000-model_1.0-stdout MODEL_LOG - worker.run_server()
2024-12-10T13:37:00,744 [INFO ] W-9000-model_1.0-stdout MODEL_LOG - File "/home/venv/lib/python3.9/site-packages/ts/model_service_worker.py", line 266, in run_server
2024-12-10T13:37:00,747 [INFO ] W-9000-model_1.0-stdout MODEL_LOG - self.handle_connection_async(cl_socket)
2024-12-10T13:37:00,745 [INFO ] epollEventLoopGroup-5-1 org.pytorch.serve.wlm.AsyncWorkerThread - 9000 Worker disconnected. WORKER_STARTED
2024-12-10T13:37:00,748 [INFO ] W-9000-model_1.0-stdout MODEL_LOG - File "/home/venv/lib/python3.9/site-packages/ts/model_service_worker.py", line 220, in handle_connection_async
2024-12-10T13:37:00,749 [INFO ] W-9000-model_1.0-stdout MODEL_LOG - service, result, code = self.load_model(msg)
2024-12-10T13:37:00,750 [INFO ] W-9000-model_1.0-stdout MODEL_LOG - File "/home/venv/lib/python3.9/site-packages/ts/model_service_worker.py", line 133, in load_model
2024-12-10T13:37:00,750 [INFO ] W-9000-model_1.0-stdout MODEL_LOG - service = model_loader.load(
2024-12-10T13:37:00,750 [INFO ] W-9000-model_1.0-stdout MODEL_LOG - File "/home/venv/lib/python3.9/site-packages/ts/model_loader.py", line 143, in load
2024-12-10T13:37:00,751 [INFO ] W-9000-model_1.0-stdout MODEL_LOG - initialize_fn(service.context)
2024-12-10T13:37:00,751 [INFO ] W-9000-model_1.0-stdout MODEL_LOG - File "/home/venv/lib/python3.9/site-packages/ts/torch_handler/vllm_handler.py", line 47, in initialize
2024-12-10T13:37:00,752 [INFO ] W-9000-model_1.0-stdout MODEL_LOG - self.vllm_engine = AsyncLLMEngine.from_engine_args(vllm_engine_config)
2024-12-10T13:37:00,753 [INFO ] W-9000-model_1.0-stdout MODEL_LOG - File "/home/venv/lib/python3.9/site-packages/vllm/engine/async_llm_engine.py", line 568, in from_engine_args
2024-12-10T13:37:00,754 [INFO ] W-9000-model_1.0-stdout MODEL_LOG - engine_config = engine_args.create_engine_config()
2024-12-10T13:37:00,755 [INFO ] W-9000-model_1.0-stdout MODEL_LOG - File "/home/venv/lib/python3.9/site-packages/vllm/engine/arg_utils.py", line 1030, in create_engine_config
2024-12-10T13:37:00,755 [INFO ] W-9000-model_1.0-stdout MODEL_LOG - return EngineConfig(
2024-12-10T13:37:00,756 [INFO ] W-9000-model_1.0-stdout MODEL_LOG - File "<string>", line 14, in __init__
2024-12-10T13:37:00,757 [INFO ] W-9000-model_1.0-stdout MODEL_LOG - File "/home/venv/lib/python3.9/site-packages/vllm/config.py", line 1872, in __post_init__
2024-12-10T13:37:00,757 [INFO ] W-9000-model_1.0-stdout MODEL_LOG - self.model_config.verify_with_parallel_config(self.parallel_config)
2024-12-10T13:37:00,758 [INFO ] W-9000-model_1.0-stdout MODEL_LOG - File "/home/venv/lib/python3.9/site-packages/vllm/config.py", line 407, in verify_with_parallel_config
2024-12-10T13:37:00,758 [INFO ] W-9000-model_1.0-stdout MODEL_LOG - if total_num_attention_heads % tensor_parallel_size != 0:
2024-12-10T13:37:00,759 [INFO ] W-9000-model_1.0-stdout MODEL_LOG - ZeroDivisionError: integer division or modulo by zero
2024-12-10T13:37:00,749 [ERROR] W-9000-model_1.0 org.pytorch.serve.wlm.AsyncWorkerThread - Failed to send request to backend
Installation instructions
Yes, I am using Docker, following the instructions mentioned here.
Instead of the original model_id, I am using mistralai/Pixtral-12B-2409.
Command:
docker run --rm -ti --shm-size 10g -e HUGGING_FACE_HUB_TOKEN=$token -p 8089:8080 -v data:/data ts/vllm --model_id mistralai/Pixtral-12B-2409 --disable_token_auth
Model Packaging
https://github.com/pytorch/serve/tree/master?tab=readme-ov-file#-quick-start-llm-deployment-with-docker
config.properties
No response
Versions
Python version: 3.9 (64-bit runtime)
Python executable: /home/venv/bin/python
Versions of relevant python libraries:
captum==0.6.0
numpy==1.26.4
nvgpu==0.10.0
pillow==10.3.0
psutil==5.9.8
requests==2.32.3
sentencepiece==0.2.0
torch==2.4.0+cu121
torch-model-archiver-nightly==2024.10.15
torch-workflow-archiver-nightly==2024.10.15
torchaudio==2.4.0+cu121
torchserve-nightly==2024.10.15
torchvision==0.19.0+cu121
transformers==4.47.0
wheel==0.42.0
torch==2.4.0+cu121
**Warning: torchtext not present ..
torchvision==0.19.0+cu121
torchaudio==2.4.0+cu121
Java Version:
OS: Ubuntu 20.04.6 LTS
GCC version: (Ubuntu 9.4.0-1ubuntu1~20.04.2) 9.4.0
Clang version: N/A
CMake version: N/A
Environment:
library_path (LD_/DYLD_): /usr/local/nvidia/lib:/usr/local/nvidia/lib64
Repro instructions
git clone https://github.com/pytorch/serve.git
cd serve
docker build --pull . -f docker/Dockerfile.vllm -t ts/vllm
docker run --rm -ti --shm-size 10g -e HUGGING_FACE_HUB_TOKEN=$token -p 8089:8080 -v data:/data ts/vllm --model_id mistralai/Pixtral-12B-2409 --disable_token_auth
Possible Solution
Instead of testing torch.cuda.is_available at Line 83 and Line 93, the launcher should call it as torch.cuda.is_available().
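The report does not quote the launcher code around Line 83 and Line 93, so the following is only a sketch of the suspected failure path, assuming the GPU branch sets tensor_parallel_size from torch.cuda.device_count(). The functions no_gpu_available, no_gpu_count, tp_size_buggy, tp_size_fixed and heads_divisible, and the head count of 32, are hypothetical, chosen so the example runs without PyTorch or a GPU.

```python
def no_gpu_available() -> bool:
    """Hypothetical stand-in for torch.cuda.is_available in a CPU-only container."""
    return False


def no_gpu_count() -> int:
    """Hypothetical stand-in for torch.cuda.device_count in a CPU-only container."""
    return 0


def tp_size_buggy(is_available, device_count) -> int:
    # Assumed shape of the buggy launcher logic: is_available is never called,
    # so the GPU branch is always taken and the device count (0 here) is used.
    return device_count() if is_available else 1


def tp_size_fixed(is_available, device_count) -> int:
    # Call the check, and defensively fall back to 1 if no devices are counted.
    if is_available() and device_count() > 0:
        return device_count()
    return 1


def heads_divisible(total_num_attention_heads: int, tensor_parallel_size: int) -> bool:
    # Same modulo as the failing line in vllm/config.py; a size of 0 raises ZeroDivisionError.
    return total_num_attention_heads % tensor_parallel_size == 0


NUM_HEADS = 32  # hypothetical attention-head count, for illustration only

print(tp_size_buggy(no_gpu_available, no_gpu_count))  # 0
print(tp_size_fixed(no_gpu_available, no_gpu_count))  # 1

try:
    heads_divisible(NUM_HEADS, tp_size_buggy(no_gpu_available, no_gpu_count))
except ZeroDivisionError as exc:
    print("ZeroDivisionError:", exc)

print(heads_divisible(NUM_HEADS, tp_size_fixed(no_gpu_available, no_gpu_count)))  # True
```

The extra device_count() > 0 guard in tp_size_fixed is a defensive choice, not something proposed in this report; calling torch.cuda.is_available() alone should already select the non-GPU path here.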