Skip to content

Comments

[rollout] fix: Add "non_block" argument compatibility to collective_rpc()#3934

Merged
wuxibin89 merged 3 commits intoverl-project:mainfrom
tpx818:main
Oct 28, 2025
Merged

[rollout] fix: Add "non_block" argument compatibility to collective_rpc()#3934
wuxibin89 merged 3 commits intoverl-project:mainfrom
tpx818:main

Conversation

@kevssim
Copy link
Contributor

@kevssim kevssim commented Oct 28, 2025

What does this PR do?

This PR fixes a TypeError that occurs when newer versions of vLLM (v0.11+) attempt to call ExternalZeroMQDistributedExecutor.collective_rpc.

The issue stems from a recent vLLM update (vllm-project/vllm#24219) that added the keyword argument non_block to the Executor.collective_rpc interface. Since the verl implementation of collective_rpc did not define this parameter, calling it with non_block=True resulted in the error: TypeError: ExternalZeroMQDistributedExecutor.collective_rpc() got an unexpected keyword argument 'non_block'.

By using **extra_kwargs in the function signature, we ensure compatibility with both legacy and modern vLLM interfaces without affecting the existing ZeroMQ non-blocking logic.

Checklist Before Submitting

Important

Please check all the following items before requesting a review, otherwise the reviewer might deprioritize this PR for review.

Copy link
Contributor

@gemini-code-assist gemini-code-assist bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Review

This pull request aims to fix a TypeError in collective_rpc when used with newer versions of vLLM by adding **extra_kwargs to the method signature. While this correctly resolves the signature mismatch, the implementation has a critical flaw: the arguments captured by **extra_kwargs are not passed on to the remote worker. This will cause any new keyword arguments to be silently ignored, potentially leading to incorrect behavior. I've left a critical comment with a suggested fix to properly merge these arguments before they are sent.

timeout: Optional[float] = None,
args: tuple = (),
kwargs: Optional[dict[str, Any]] = None,
**extra_kwargs
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

critical

While adding **extra_kwargs fixes the TypeError, the collected arguments are then discarded, as they are not used in the function body. This means any new keyword arguments from vllm (like non_block, but also any others) will be silently ignored instead of being passed to the remote worker via the pickled message. This is a critical issue as it can lead to unexpected behavior if the remote method relies on these arguments.

The pull request description mentions not wanting to affect the existing non-blocking logic. You can achieve this while still correctly handling other arguments by merging extra_kwargs and then explicitly handling or removing non_block.

I recommend updating the function body to merge the arguments:

    all_kwargs = kwargs or {}
    all_kwargs.update(extra_kwargs)
    
    # Now `all_kwargs` contains all keyword arguments.
    # You can handle `non_block` if needed, for example, by ignoring it:
    all_kwargs.pop("non_block", None)

    # ...
    
    message = pickle.dumps((sent_method, args, all_kwargs))
    # ...

This ensures forward compatibility with any other keyword arguments that might be added in the future, while still addressing the immediate TypeError. Since the function body is not in this diff, I cannot provide a direct code suggestion.

@huaiyizhao
Copy link
Contributor

Thanks, it works!

@huaiyizhao
Copy link
Contributor

verl is TOO DEEPLY coupled with inference and training backend using its hybrid engine. I think we need a better tradeoff between compatibility and efficiency.

@ji-huazhong ji-huazhong self-requested a review October 28, 2025 07:22
@ji-huazhong ji-huazhong changed the title [rollout]fix: Add "non_block" argument compatibility to collective_rpc() [rollout] fix: Add "non_block" argument compatibility to collective_rpc() Oct 28, 2025
@wuxibin89 wuxibin89 merged commit e7be886 into verl-project:main Oct 28, 2025
71 of 74 checks passed
wangboxiong320 pushed a commit to wangboxiong320/verl that referenced this pull request Nov 1, 2025
…pc() (verl-project#3934)

### What does this PR do?

This PR fixes a `TypeError` that occurs when newer versions of vLLM
(v0.11+) attempt to call
`ExternalZeroMQDistributedExecutor.collective_rpc`.

The issue stems from a recent vLLM update
(vllm-project/vllm#24219) that added the keyword
argument `non_block` to the `Executor.collective_rpc` interface. Since
the `verl` implementation of `collective_rpc` did not define this
parameter, calling it with `non_block=True` resulted in the error:
`TypeError: ExternalZeroMQDistributedExecutor.collective_rpc() got an
unexpected keyword argument 'non_block'`.

By using `**extra_kwargs` in the function signature, we ensure
compatibility with both legacy and modern vLLM interfaces without
affecting the existing ZeroMQ non-blocking logic.

### Checklist Before Submitting

> [!IMPORTANT]
> Please check all the following items before requesting a review,
otherwise the reviewer might deprioritize this PR for review.

- [x] Read the [Contribute
Guide](https://github.com/volcengine/verl/blob/main/CONTRIBUTING.md).
- [x] Apply [pre-commit
checks](https://github.com/volcengine/verl/blob/main/CONTRIBUTING.md#code-linting-and-formatting):
`pre-commit install && pre-commit run --all-files --show-diff-on-failure
--color=always`
- [ ] Add / Update [the
documentation](https://github.com/volcengine/verl/tree/main/docs).
- [ ] Add unit or end-to-end test(s) to [the CI
workflow](https://github.com/volcengine/verl/tree/main/.github/workflows)
to cover all the code. If not feasible, explain why: ...
- [ ] Once your PR is ready for CI, send a message in [the `ci-request`
channel](https://verl-project.slack.com/archives/C091TCESWB1) in [the
`verl` Slack
workspace](https://join.slack.com/t/verl-project/shared_invite/zt-3855yhg8g-CTkqXu~hKojPCmo7k_yXTQ).
(If not accessible, please try [the Feishu group
(飞书群)](https://applink.larkoffice.com/client/chat/chatter/add_by_link?link_token=772jd4f1-cd91-441e-a820-498c6614126a).)

---------

Co-authored-by: weikaiwen <weikaiwen1998@126.com>
NenoL2001 pushed a commit to NenoL2001/verl that referenced this pull request Nov 3, 2025
…pc() (verl-project#3934)

### What does this PR do?

This PR fixes a `TypeError` that occurs when newer versions of vLLM
(v0.11+) attempt to call
`ExternalZeroMQDistributedExecutor.collective_rpc`.

The issue stems from a recent vLLM update
(vllm-project/vllm#24219) that added the keyword
argument `non_block` to the `Executor.collective_rpc` interface. Since
the `verl` implementation of `collective_rpc` did not define this
parameter, calling it with `non_block=True` resulted in the error:
`TypeError: ExternalZeroMQDistributedExecutor.collective_rpc() got an
unexpected keyword argument 'non_block'`.

By using `**extra_kwargs` in the function signature, we ensure
compatibility with both legacy and modern vLLM interfaces without
affecting the existing ZeroMQ non-blocking logic.

### Checklist Before Submitting

> [!IMPORTANT]
> Please check all the following items before requesting a review,
otherwise the reviewer might deprioritize this PR for review.

- [x] Read the [Contribute
Guide](https://github.com/volcengine/verl/blob/main/CONTRIBUTING.md).
- [x] Apply [pre-commit
checks](https://github.com/volcengine/verl/blob/main/CONTRIBUTING.md#code-linting-and-formatting):
`pre-commit install && pre-commit run --all-files --show-diff-on-failure
--color=always`
- [ ] Add / Update [the
documentation](https://github.com/volcengine/verl/tree/main/docs).
- [ ] Add unit or end-to-end test(s) to [the CI
workflow](https://github.com/volcengine/verl/tree/main/.github/workflows)
to cover all the code. If not feasible, explain why: ...
- [ ] Once your PR is ready for CI, send a message in [the `ci-request`
channel](https://verl-project.slack.com/archives/C091TCESWB1) in [the
`verl` Slack
workspace](https://join.slack.com/t/verl-project/shared_invite/zt-3855yhg8g-CTkqXu~hKojPCmo7k_yXTQ).
(If not accessible, please try [the Feishu group
(飞书群)](https://applink.larkoffice.com/client/chat/chatter/add_by_link?link_token=772jd4f1-cd91-441e-a820-498c6614126a).)

---------

Co-authored-by: weikaiwen <weikaiwen1998@126.com>
chenjiaoAngel added a commit to chenjiaoAngel/verl that referenced this pull request Nov 14, 2025
…pc() (verl-project#3934)

### What does this PR do?

This PR fixes a `TypeError` that occurs when newer versions of vLLM
(v0.11+) attempt to call
`ExternalZeroMQDistributedExecutor.collective_rpc`.

The issue stems from a recent vLLM update
(vllm-project/vllm#24219) that added the keyword
argument `non_block` to the `Executor.collective_rpc` interface. Since
the `verl` implementation of `collective_rpc` did not define this
parameter, calling it with `non_block=True` resulted in the error:
`TypeError: ExternalZeroMQDistributedExecutor.collective_rpc() got an
unexpected keyword argument 'non_block'`.

By using `**extra_kwargs` in the function signature, we ensure
compatibility with both legacy and modern vLLM interfaces without
affecting the existing ZeroMQ non-blocking logic.

### Checklist Before Submitting

> [!IMPORTANT]
> Please check all the following items before requesting a review,
otherwise the reviewer might deprioritize this PR for review.

- [x] Read the [Contribute
Guide](https://github.com/volcengine/verl/blob/main/CONTRIBUTING.md).
- [x] Apply [pre-commit
checks](https://github.com/volcengine/verl/blob/main/CONTRIBUTING.md#code-linting-and-formatting):
`pre-commit install && pre-commit run --all-files --show-diff-on-failure
--color=always`
- [ ] Add / Update [the
documentation](https://github.com/volcengine/verl/tree/main/docs).
- [ ] Add unit or end-to-end test(s) to [the CI
workflow](https://github.com/volcengine/verl/tree/main/.github/workflows)
to cover all the code. If not feasible, explain why: ...
- [ ] Once your PR is ready for CI, send a message in [the `ci-request`
channel](https://verl-project.slack.com/archives/C091TCESWB1) in [the
`verl` Slack
workspace](https://join.slack.com/t/verl-project/shared_invite/zt-3855yhg8g-CTkqXu~hKojPCmo7k_yXTQ).
(If not accessible, please try [the Feishu group
(飞书群)](https://applink.larkoffice.com/client/chat/chatter/add_by_link?link_token=772jd4f1-cd91-441e-a820-498c6614126a).)

---------

Co-authored-by: weikaiwen <weikaiwen1998@126.com>
chenhaiq pushed a commit to The-Hierophant/verl-1 that referenced this pull request Nov 18, 2025
…pc() (verl-project#3934)

### What does this PR do?

This PR fixes a `TypeError` that occurs when newer versions of vLLM
(v0.11+) attempt to call
`ExternalZeroMQDistributedExecutor.collective_rpc`.

The issue stems from a recent vLLM update
(vllm-project/vllm#24219) that added the keyword
argument `non_block` to the `Executor.collective_rpc` interface. Since
the `verl` implementation of `collective_rpc` did not define this
parameter, calling it with `non_block=True` resulted in the error:
`TypeError: ExternalZeroMQDistributedExecutor.collective_rpc() got an
unexpected keyword argument 'non_block'`.

By using `**extra_kwargs` in the function signature, we ensure
compatibility with both legacy and modern vLLM interfaces without
affecting the existing ZeroMQ non-blocking logic.

### Checklist Before Submitting

> [!IMPORTANT]
> Please check all the following items before requesting a review,
otherwise the reviewer might deprioritize this PR for review.

- [x] Read the [Contribute
Guide](https://github.com/volcengine/verl/blob/main/CONTRIBUTING.md).
- [x] Apply [pre-commit
checks](https://github.com/volcengine/verl/blob/main/CONTRIBUTING.md#code-linting-and-formatting):
`pre-commit install && pre-commit run --all-files --show-diff-on-failure
--color=always`
- [ ] Add / Update [the
documentation](https://github.com/volcengine/verl/tree/main/docs).
- [ ] Add unit or end-to-end test(s) to [the CI
workflow](https://github.com/volcengine/verl/tree/main/.github/workflows)
to cover all the code. If not feasible, explain why: ...
- [ ] Once your PR is ready for CI, send a message in [the `ci-request`
channel](https://verl-project.slack.com/archives/C091TCESWB1) in [the
`verl` Slack
workspace](https://join.slack.com/t/verl-project/shared_invite/zt-3855yhg8g-CTkqXu~hKojPCmo7k_yXTQ).
(If not accessible, please try [the Feishu group
(飞书群)](https://applink.larkoffice.com/client/chat/chatter/add_by_link?link_token=772jd4f1-cd91-441e-a820-498c6614126a).)

---------

Co-authored-by: weikaiwen <weikaiwen1998@126.com>
NenoL2001 pushed a commit to NenoL2001/verl that referenced this pull request Nov 26, 2025
…pc() (verl-project#3934)

### What does this PR do?

This PR fixes a `TypeError` that occurs when newer versions of vLLM
(v0.11+) attempt to call
`ExternalZeroMQDistributedExecutor.collective_rpc`.

The issue stems from a recent vLLM update
(vllm-project/vllm#24219) that added the keyword
argument `non_block` to the `Executor.collective_rpc` interface. Since
the `verl` implementation of `collective_rpc` did not define this
parameter, calling it with `non_block=True` resulted in the error:
`TypeError: ExternalZeroMQDistributedExecutor.collective_rpc() got an
unexpected keyword argument 'non_block'`.

By using `**extra_kwargs` in the function signature, we ensure
compatibility with both legacy and modern vLLM interfaces without
affecting the existing ZeroMQ non-blocking logic.

### Checklist Before Submitting

> [!IMPORTANT]
> Please check all the following items before requesting a review,
otherwise the reviewer might deprioritize this PR for review.

- [x] Read the [Contribute
Guide](https://github.com/volcengine/verl/blob/main/CONTRIBUTING.md).
- [x] Apply [pre-commit
checks](https://github.com/volcengine/verl/blob/main/CONTRIBUTING.md#code-linting-and-formatting):
`pre-commit install && pre-commit run --all-files --show-diff-on-failure
--color=always`
- [ ] Add / Update [the
documentation](https://github.com/volcengine/verl/tree/main/docs).
- [ ] Add unit or end-to-end test(s) to [the CI
workflow](https://github.com/volcengine/verl/tree/main/.github/workflows)
to cover all the code. If not feasible, explain why: ...
- [ ] Once your PR is ready for CI, send a message in [the `ci-request`
channel](https://verl-project.slack.com/archives/C091TCESWB1) in [the
`verl` Slack
workspace](https://join.slack.com/t/verl-project/shared_invite/zt-3855yhg8g-CTkqXu~hKojPCmo7k_yXTQ).
(If not accessible, please try [the Feishu group
(飞书群)](https://applink.larkoffice.com/client/chat/chatter/add_by_link?link_token=772jd4f1-cd91-441e-a820-498c6614126a).)

---------

Co-authored-by: weikaiwen <weikaiwen1998@126.com>
albertimff pushed a commit to albertimff/verl that referenced this pull request Dec 1, 2025
…pc() (verl-project#3934)

### What does this PR do?

This PR fixes a `TypeError` that occurs when newer versions of vLLM
(v0.11+) attempt to call
`ExternalZeroMQDistributedExecutor.collective_rpc`.

The issue stems from a recent vLLM update
(vllm-project/vllm#24219) that added the keyword
argument `non_block` to the `Executor.collective_rpc` interface. Since
the `verl` implementation of `collective_rpc` did not define this
parameter, calling it with `non_block=True` resulted in the error:
`TypeError: ExternalZeroMQDistributedExecutor.collective_rpc() got an
unexpected keyword argument 'non_block'`.

By using `**extra_kwargs` in the function signature, we ensure
compatibility with both legacy and modern vLLM interfaces without
affecting the existing ZeroMQ non-blocking logic.

### Checklist Before Submitting

> [!IMPORTANT]
> Please check all the following items before requesting a review,
otherwise the reviewer might deprioritize this PR for review.

- [x] Read the [Contribute
Guide](https://github.com/volcengine/verl/blob/main/CONTRIBUTING.md).
- [x] Apply [pre-commit
checks](https://github.com/volcengine/verl/blob/main/CONTRIBUTING.md#code-linting-and-formatting):
`pre-commit install && pre-commit run --all-files --show-diff-on-failure
--color=always`
- [ ] Add / Update [the
documentation](https://github.com/volcengine/verl/tree/main/docs).
- [ ] Add unit or end-to-end test(s) to [the CI
workflow](https://github.com/volcengine/verl/tree/main/.github/workflows)
to cover all the code. If not feasible, explain why: ...
- [ ] Once your PR is ready for CI, send a message in [the `ci-request`
channel](https://verl-project.slack.com/archives/C091TCESWB1) in [the
`verl` Slack
workspace](https://join.slack.com/t/verl-project/shared_invite/zt-3855yhg8g-CTkqXu~hKojPCmo7k_yXTQ).
(If not accessible, please try [the Feishu group
(飞书群)](https://applink.larkoffice.com/client/chat/chatter/add_by_link?link_token=772jd4f1-cd91-441e-a820-498c6614126a).)

---------

Co-authored-by: weikaiwen <weikaiwen1998@126.com>
TimurTaepov pushed a commit to giorgossideris/verl that referenced this pull request Dec 20, 2025
…pc() (verl-project#3934)

### What does this PR do?

This PR fixes a `TypeError` that occurs when newer versions of vLLM
(v0.11+) attempt to call
`ExternalZeroMQDistributedExecutor.collective_rpc`.

The issue stems from a recent vLLM update
(vllm-project/vllm#24219) that added the keyword
argument `non_block` to the `Executor.collective_rpc` interface. Since
the `verl` implementation of `collective_rpc` did not define this
parameter, calling it with `non_block=True` resulted in the error:
`TypeError: ExternalZeroMQDistributedExecutor.collective_rpc() got an
unexpected keyword argument 'non_block'`.

By using `**extra_kwargs` in the function signature, we ensure
compatibility with both legacy and modern vLLM interfaces without
affecting the existing ZeroMQ non-blocking logic.

### Checklist Before Submitting

> [!IMPORTANT]
> Please check all the following items before requesting a review,
otherwise the reviewer might deprioritize this PR for review.

- [x] Read the [Contribute
Guide](https://github.com/volcengine/verl/blob/main/CONTRIBUTING.md).
- [x] Apply [pre-commit
checks](https://github.com/volcengine/verl/blob/main/CONTRIBUTING.md#code-linting-and-formatting):
`pre-commit install && pre-commit run --all-files --show-diff-on-failure
--color=always`
- [ ] Add / Update [the
documentation](https://github.com/volcengine/verl/tree/main/docs).
- [ ] Add unit or end-to-end test(s) to [the CI
workflow](https://github.com/volcengine/verl/tree/main/.github/workflows)
to cover all the code. If not feasible, explain why: ...
- [ ] Once your PR is ready for CI, send a message in [the `ci-request`
channel](https://verl-project.slack.com/archives/C091TCESWB1) in [the
`verl` Slack
workspace](https://join.slack.com/t/verl-project/shared_invite/zt-3855yhg8g-CTkqXu~hKojPCmo7k_yXTQ).
(If not accessible, please try [the Feishu group
(飞书群)](https://applink.larkoffice.com/client/chat/chatter/add_by_link?link_token=772jd4f1-cd91-441e-a820-498c6614126a).)

---------

Co-authored-by: weikaiwen <weikaiwen1998@126.com>
vyomakesh0728 added a commit to vyomakesh0728/verl that referenced this pull request Jan 22, 2026
…pc() (verl-project#3934)

### What does this PR do?

This PR fixes a `TypeError` that occurs when newer versions of vLLM
(v0.11+) attempt to call
`ExternalZeroMQDistributedExecutor.collective_rpc`.

The issue stems from a recent vLLM update
(vllm-project/vllm#24219) that added the keyword
argument `non_block` to the `Executor.collective_rpc` interface. Since
the `verl` implementation of `collective_rpc` did not define this
parameter, calling it with `non_block=True` resulted in the error:
`TypeError: ExternalZeroMQDistributedExecutor.collective_rpc() got an
unexpected keyword argument 'non_block'`.

By using `**extra_kwargs` in the function signature, we ensure
compatibility with both legacy and modern vLLM interfaces without
affecting the existing ZeroMQ non-blocking logic.

### Checklist Before Submitting

> [!IMPORTANT]
> Please check all the following items before requesting a review,
otherwise the reviewer might deprioritize this PR for review.

- [x] Read the [Contribute
Guide](https://github.com/volcengine/verl/blob/main/CONTRIBUTING.md).
- [x] Apply [pre-commit
checks](https://github.com/volcengine/verl/blob/main/CONTRIBUTING.md#code-linting-and-formatting):
`pre-commit install && pre-commit run --all-files --show-diff-on-failure
--color=always`
- [ ] Add / Update [the
documentation](https://github.com/volcengine/verl/tree/main/docs).
- [ ] Add unit or end-to-end test(s) to [the CI
workflow](https://github.com/volcengine/verl/tree/main/.github/workflows)
to cover all the code. If not feasible, explain why: ...
- [ ] Once your PR is ready for CI, send a message in [the `ci-request`
channel](https://verl-project.slack.com/archives/C091TCESWB1) in [the
`verl` Slack
workspace](https://join.slack.com/t/verl-project/shared_invite/zt-3855yhg8g-CTkqXu~hKojPCmo7k_yXTQ).
(If not accessible, please try [the Feishu group
(飞书群)](https://applink.larkoffice.com/client/chat/chatter/add_by_link?link_token=772jd4f1-cd91-441e-a820-498c6614126a).)

---------

Co-authored-by: weikaiwen <weikaiwen1998@126.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants