
Update vLLM version support to include 0.14.0 and 0.14.1#5214

Merged
qgallouedec merged 2 commits into main from vllm-0.14 on Mar 4, 2026

Conversation

@qgallouedec (Member)

Summary

Extend TRL’s vLLM support to 0.14.0 and 0.14.1.

Changes

vLLM 0.14.0 introduced a breaking change: data parallelism (DP) for dense models now raises an error (see vllm-project/vllm#30739).

Reproducer and traceback
$ trl vllm-serve --model Qwen/Qwen2.5-1.5B --data_parallel_size 2
INFO:     Started server process [859382]
INFO:     Waiting for application startup.
Process Process-1:
Traceback (most recent call last):
  File "/fsx/qgallouedec/miniconda3/envs/trl/lib/python3.13/multiprocessing/process.py", line 313, in _bootstrap
    self.run()
    ~~~~~~~~^^
  File "/fsx/qgallouedec/miniconda3/envs/trl/lib/python3.13/multiprocessing/process.py", line 108, in run
    self._target(*self._args, **self._kwargs)
    ~~~~~~~~~~~~^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/fsx/qgallouedec/trl/trl/scripts/vllm_serve.py", line 352, in llm_worker
    llm = LLM(
        model=script_args.model,
    ...<15 lines>...
        logprobs_mode="processed_logprobs",
    )
  File "/fsx/qgallouedec/miniconda3/envs/trl/lib/python3.13/site-packages/vllm/entrypoints/llm.py", line 338, in __init__
    self.llm_engine = LLMEngine.from_engine_args(
                      ~~~~~~~~~~~~~~~~~~~~~~~~~~^
        engine_args=engine_args, usage_context=UsageContext.LLM_CLASS
        ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
    )
    ^
  File "/fsx/qgallouedec/miniconda3/envs/trl/lib/python3.13/site-packages/vllm/v1/engine/llm_engine.py", line 168, in from_engine_args
    vllm_config = engine_args.create_engine_config(usage_context)
  File "/fsx/qgallouedec/miniconda3/envs/trl/lib/python3.13/site-packages/vllm/engine/arg_utils.py", line 1584, in create_engine_config
    parallel_config = ParallelConfig(
        pipeline_parallel_size=self.pipeline_parallel_size,
    ...<37 lines>...
        _api_process_rank=self._api_process_rank,
    )
  File "/fsx/qgallouedec/miniconda3/envs/trl/lib/python3.13/site-packages/pydantic/_internal/_dataclasses.py", line 121, in __init__
    s.__pydantic_validator__.validate_python(ArgsKwargs(args, kwargs), self_instance=s)
    ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
pydantic_core._pydantic_core.ValidationError: 1 validation error for ParallelConfig
  Value error, Offline data parallel mode is not supported/useful for dense models. [type=value_error, input_value=ArgsKwargs((), {'pipeline...'_api_process_rank': 0}), input_type=ArgsKwargs]
    For further information visit https://errors.pydantic.dev/2.12/v/value_error
Process Process-2: (fails with an identical traceback)
ERROR:    Traceback (most recent call last):
  File "/fsx/qgallouedec/miniconda3/envs/trl/lib/python3.13/site-packages/starlette/routing.py", line 694, in lifespan
    async with self.lifespan_context(app) as maybe_state:
               ~~~~~~~~~~~~~~~~~~~~~^^^^^
  File "/fsx/qgallouedec/miniconda3/envs/trl/lib/python3.13/contextlib.py", line 214, in __aenter__
    return await anext(self.gen)
           ^^^^^^^^^^^^^^^^^^^^^
  File "/fsx/qgallouedec/trl/trl/scripts/vllm_serve.py", line 451, in lifespan
    msg = connection.recv()
  File "/fsx/qgallouedec/miniconda3/envs/trl/lib/python3.13/multiprocessing/connection.py", line 250, in recv
    buf = self._recv_bytes()
  File "/fsx/qgallouedec/miniconda3/envs/trl/lib/python3.13/multiprocessing/connection.py", line 430, in _recv_bytes
    buf = self._recv(4)
  File "/fsx/qgallouedec/miniconda3/envs/trl/lib/python3.13/multiprocessing/connection.py", line 399, in _recv
    raise EOFError
EOFError

ERROR:    Application startup failed. Exiting.
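On vLLM >= 0.14, the multi-GPU path for dense models is tensor parallelism rather than DP. Assuming `--tensor_parallel_size` is the corresponding `trl vllm-serve` flag (a hedged sketch mirroring the reproducer above, not a command taken from this PR):

```shell
# Fails on vLLM >= 0.14.0 for dense models:
#   trl vllm-serve --model Qwen/Qwen2.5-1.5B --data_parallel_size 2
# Shard the model across the same two GPUs with tensor parallelism instead:
trl vllm-serve --model Qwen/Qwen2.5-1.5B --tensor_parallel_size 2
```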

As I understand it, the vLLM team now considers DP scaling for dense models always detrimental to performance, which is surprising given my earlier benchmark. In any case, I recommend aligning with vLLM's recommendation and discouraging DP scaling for dense models even where it is still possible (vllm < 0.14).
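That recommendation could be enforced on the TRL side with a small version gate, so the failure surfaces as a clear message instead of a pydantic ValidationError deep inside the worker processes. A minimal sketch, not code from this PR; the helper name `dp_allowed` and the hard 0.14 cutoff are my own assumptions:

```python
def dp_allowed(vllm_version: str, data_parallel_size: int) -> bool:
    """Return False when the requested setup would be rejected:
    vLLM >= 0.14.0 errors out on DP > 1 for dense models.
    """
    if data_parallel_size <= 1:
        return True  # no data parallelism requested, always fine
    # Compare only the (major, minor) components of the version string.
    major, minor = (int(part) for part in vllm_version.split(".")[:2])
    return (major, minor) < (0, 14)
```

A caller could turn a False result into an early ValueError suggesting tensor parallelism instead, or merely emit a warning on older vLLM versions where DP still works but is discouraged.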

Tests

$ pytest tests/test_vllm_client_server.py
========================================== test session starts ==========================================
platform linux -- Python 3.13.11, pytest-9.0.2, pluggy-1.6.0
rootdir: /fsx/qgallouedec/trl
configfile: pyproject.toml
plugins: rerunfailures-15.1, anyio-4.12.1, xdist-3.8.0, datadir-1.8.0, cov-7.0.0
collected 37 items                                                                                      

tests/test_vllm_client_server.py ...............x............ssssss...                            [100%]

========================= 30 passed, 6 skipped, 1 xfailed in 425.60s (0:07:05) ==========================

@qgallouedec changed the title from "vllm 0.14" to "Update vLLM version support to include 0.14.0 and 0.14.1" on Mar 3, 2026
Comment on lines +539 to +558
def test_generate_with_params(self):
    prompts = ["Hello, AI!", "Tell me a joke"]
    completion_ids = self.client.generate(prompts, n=2, repetition_penalty=0.9, temperature=0.8, max_tokens=32)[
        "completion_ids"
    ]

    # Check that the output is a list
    assert isinstance(completion_ids, list)

    # Check that the number of generated sequences is 2 times the number of prompts
    assert len(completion_ids) == 2 * len(prompts)

    # Check that the generated sequences are lists of integers
    for seq in completion_ids:
        assert all(isinstance(tok, int) for tok in seq)

    # Check that the length of the generated sequences is less than or equal to 32
    for seq in completion_ids:
        assert len(seq) <= 32

qgallouedec (Member, Author):

Not specific to vLLM 0.14, but I realized this test case was missing.

@chatgpt-codex-connector (bot) left a comment:

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 59252637ef


Comment on lines +668 to +687
(the same test_generate_with_params as in the previous comment, added to a second test class)

@albertvillanova (Member) left a comment:

Thanks.

@qgallouedec merged commit 8f635b6 into main on Mar 4, 2026
13 checks passed
@qgallouedec deleted the vllm-0.14 branch on March 4, 2026 at 21:20