- "num_prompts": [4, 16, 40],
- "max_concurrency": [1, 4, 10],
+ "num_prompts": [4, 16, 40, 64],
+ "max_concurrency": [1, 4, 10, 16],
What's the theoretical maximum?
In async_chunk mode, chunked prefill is not currently supported, so the theoretical maximum supported concurrency is 26 (65536 / 2500, rounded down) for an input length of 2500.
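The arithmetic behind that figure can be sketched as follows. This is only an illustration: the function name and the reading of 65536 as a total token budget are assumptions, while the two numbers themselves come from the comment above.

```python
# Hypothetical helper; only the values 65536 and 2500 come from the thread.
TOKEN_BUDGET = 65536  # assumed to be the total token budget available at once
INPUT_LEN = 2500      # input length used in the benchmark


def max_concurrency(token_budget: int, input_len: int) -> int:
    """Number of whole inputs that fit in the budget when prefill cannot be chunked."""
    if input_len <= 0:
        raise ValueError("input_len must be positive")
    # Without chunked prefill each request needs its full input to fit,
    # so partial requests do not count: use floor division.
    return token_budget // input_len


print(max_concurrency(TOKEN_BUDGET, INPUT_LEN))  # 26
```

Under that assumption, the new benchmark value of 16 stays below the bound of 26.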
# === Pipeline-wide engine settings (applied uniformly to every stage) ===
trust_remote_code: bool = True
distributed_executor_backend: str = "mp"
Could you check whether this default `mp` value for `distributed_executor_backend` is applied to every stage? If so, it degrades the UX, because vLLM already has complete initialization logic for `distributed_executor_backend`. How about removing the default and letting each stage choose its own backend, even in the default case? cc @lishunyang12
I completely agree with your point. I think we can discuss whether we need to set a default value for the distributed_executor_backend parameter.
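A minimal sketch of the alternative being discussed, assuming the default is dropped and the choice is deferred to the engine. `StageEngineArgs` and `resolve_backend` are hypothetical names for illustration, and the fallback rule is a deliberate simplification, not vLLM's actual selection logic (which considers more conditions, such as Ray).

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class StageEngineArgs:
    """Hypothetical per-stage settings; not the real config class."""
    trust_remote_code: bool = True
    # None means "not configured": leave the decision to the engine.
    distributed_executor_backend: Optional[str] = None


def resolve_backend(configured: Optional[str], world_size: int) -> str:
    """Illustrative stand-in for an engine-side fallback (simplified assumption)."""
    if configured is not None:
        # An explicit per-stage choice always wins.
        return configured
    # Assumed rule: single-process executor for one worker, multiprocessing otherwise.
    return "uni" if world_size == 1 else "mp"


args = StageEngineArgs()
print(resolve_backend(args.distributed_executor_backend, world_size=1))  # uni
print(resolve_backend(args.distributed_executor_backend, world_size=4))  # mp
```

With a `None` default, a single-worker stage is no longer forced onto `mp`, and a stage that sets the field explicitly still gets what it asked for.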
BLOCKING:
9c901b8 to 71dd056
I will submit a separate PR to remove the default values from the config file.
71dd056 to e9aee80
Signed-off-by: amy-why-3459 <wuhaiyan17@huawei.com>
e9aee80 to 85abddb
@gcanlin @hsliuustc0106 I think this PR is ready and can be merged.
Signed-off-by: amy-why-3459 <wuhaiyan17@huawei.com>
Purpose
In the vLLM implementation, if `distributed_executor_backend` is not configured, it is selected based on `world_size`.
Test Plan
Test Result