Description
System Info
tensorrt-llm:release-0.20.0
Who can help?
Error in the Disaggregated Serving example
Information
- The official example scripts
- My own modified scripts
Tasks
- An officially supported task in the examples folder (such as GLUE/SQuAD, ...)
- My own task or dataset (give details below)
Reproduction
Following https://github.com/NVIDIA/TensorRT-LLM/tree/main/examples/disaggregated, I reproduced the requirements of this example exactly, using the same model. My GPUs are NVIDIA L20 (46 GB).
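For reference, the disaggregated launcher reads a YAML config along these lines (a sketch of the shape used by the example; the field names and values are assumptions reconstructed from the server commands shown below, and may differ slightly across versions):

hostname: localhost
port: 8000
backend: pytorch
context_servers:
  num_instances: 2
  urls:
    - "localhost:8001"
    - "localhost:8002"
generation_servers:
  num_instances: 1
  urls:
    - "localhost:8003"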
Expected behavior
Successful execution
Actual behavior
My command:
curl http://localhost:8000/v1/completions -H "Content-Type: application/json" -d '{
"model": "TinyLlama/TinyLlama-1.1B-Chat-v1.0",
"prompt": "NVIDIA is a great company because",
"max_tokens": 16,
"temperature": 0
}' -w "\n"
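The same request can also be issued from Python (a minimal sketch using the requests library; the endpoint and payload are taken verbatim from the curl call above):

import requests

# POST the completion request to the disaggregated server's
# OpenAI-compatible endpoint, mirroring the curl command above.
resp = requests.post(
    "http://localhost:8000/v1/completions",
    json={
        "model": "TinyLlama/TinyLlama-1.1B-Chat-v1.0",
        "prompt": "NVIDIA is a great company because",
        "max_tokens": 16,
        "temperature": 0,
    },
)
print(resp.status_code, resp.json())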
Shell history and output:
root@56acd428936d:/app/models# jobs -l
[1] 64414 Running CUDA_VISIBLE_DEVICES=0 trtllm-serve TinyLlama-1.1B-Chat-v1.0 --host localhost --port 8001 --backend pytorch --extra_llm_api_options config/context_extra-llm-api-config.yml &> log_ctx_0 &
[2]- 64415 Running CUDA_VISIBLE_DEVICES=1 trtllm-serve TinyLlama-1.1B-Chat-v1.0 --host localhost --port 8002 --backend pytorch --extra_llm_api_options config/context_extra-llm-api-config.yml &> log_ctx_1 &
[4]+ 66403 Running CUDA_VISIBLE_DEVICES=2 trtllm-serve TinyLlama-1.1B-Chat-v1.0 --host localhost --port 8003 --backend pytorch --extra_llm_api_options config/gen_extra-llm-api-config.yml &> log_gen_0 &
root@56acd428936d:/app/models# trtllm-serve disaggregated -c config/disagg_config.yaml
2025-07-07 09:03:34,710 - INFO - flashinfer.jit: Prebuilt kernels not found, using JIT backend
[TensorRT-LLM] TensorRT-LLM version: 0.20.0
INFO: Started server process [66764]
INFO: Waiting for application startup.
INFO:root:Waiting for context and generation servers to be ready
INFO: Application startup complete.
INFO: Uvicorn running on http://localhost:8000 (Press CTRL+C to quit)
INFO:root:Sending request to ctx server: http://localhost:8001
INFO:root:Sending request to gen server: http://localhost:8003
ERROR:root:Received failed response {'object': 'error', 'message': 'Failed during generation: init(): incompatible constructor arguments. The following argument types are supported:\n 1. tensorrt_llm.bindings.executor.ContextPhaseParams(arg0: list[int], arg1: int, arg2: Optional[bytes], arg3: Optional[list[int]])\n\nInvoked with: None, None, None, None', 'type': 'BadRequestError', 'param': None, 'code': 400}
ERROR:root:400, message='Bad Request', url='http://localhost:8003/v1/completions'
Traceback (most recent call last):
File "/usr/local/lib/python3.12/dist-packages/tensorrt_llm/serve/openai_disagg_server.py", line 140, in openai_completion
return await self._process_generation_server_request(gen_req, ctx_response)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/usr/local/lib/python3.12/dist-packages/tensorrt_llm/serve/openai_disagg_server.py", line 214, in _process_generation_server_request
gen_response = await self.send_completion_request(gen_server, gen_req)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/usr/local/lib/python3.12/dist-packages/tensorrt_llm/serve/openai_disagg_server.py", line 281, in send_completion_request
return await self.send_request(url, request, "/v1/completions", CompletionResponse, self.create_completion_generator)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/usr/local/lib/python3.12/dist-packages/tensorrt_llm/serve/openai_disagg_server.py", line 277, in send_request
response.raise_for_status()
File "/usr/local/lib/python3.12/dist-packages/aiohttp/client_reqrep.py", line 1161, in raise_for_status
raise ClientResponseError(
aiohttp.client_exceptions.ClientResponseError: 400, message='Bad Request', url='http://localhost:8003/v1/completions'
INFO: 127.0.0.1:56126 - "POST /v1/completions HTTP/1.1" 500 Internal Server Error
Additional notes
The client-side error response was:
{"detail":"Internal server error 400, message='Bad Request', url='http://localhost:8003/v1/completions'"}
root@56acd428936d:/app/models# ls
InternVL2_5-8B Qwen3-0.6B Qwen3-30B-A3B Qwen3-4B TinyLlama-1.1B-Chat-v1.0 config log_ctx_0 log_ctx_1 log_gen_0 qwen
root@56acd428936d:/app/models# tail -f log_gen_0
Encountered an exception: Failed during generation: init(): incompatible constructor arguments. The following argument types are supported:
1. tensorrt_llm.bindings.executor.ContextPhaseParams(arg0: list[int], arg1: int, arg2: Optional[bytes], arg3: Optional[list[int]])
Invoked with: None, None, None, None
INFO: 127.0.0.1:37166 - "POST /v1/completions HTTP/1.1" 400 Bad Request
[07/07/2025-09:03:46] [TRT-LLM] [I] Thread await_response_thread stopped.
[07/07/2025-09:03:46] [TRT-LLM] [I] Thread proxy_dispatch_stats_thread stopped.
[07/07/2025-09:03:46] [TRT-LLM] [I] Thread proxy_dispatch_kv_cache_events_thread stopped.
[07/07/2025-09:03:46] [TRT-LLM] [I] Thread dispatch_stats_thread stopped.
[07/07/2025-09:03:46] [TRT-LLM] [I] Thread dispatch_kv_cache_events_thread stopped.
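The failure is the generation server rejecting the forwarded request because every context-phase field is None. Judging from the pybind signature printed in the log, here is a minimal sketch of the mismatch (the import path comes from the error message itself; the argument meanings are my assumption based on the disaggregated flow: first tokens, request id, opaque transceiver state, draft tokens):

from tensorrt_llm.bindings import executor as trtllm_executor

# Signature reported in the error:
#   ContextPhaseParams(arg0: list[int], arg1: int,
#                      arg2: Optional[bytes], arg3: Optional[list[int]])

# A well-formed call matching that signature:
ok = trtllm_executor.ContextPhaseParams([42], 0, None, None)

# What the gen server effectively received -- all context-phase fields
# missing -- which reproduces the TypeError seen in log_gen_0:
try:
    trtllm_executor.ContextPhaseParams(None, None, None, None)
except TypeError as e:
    print(e)  # init(): incompatible constructor arguments ...

This suggests the context server's response reached the disaggregated router without the context-phase metadata it is supposed to carry on to the generation server.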