
Conversation

@njhill (Member) commented Oct 16, 2025

Avoid serialization + copy overhead when the pickled payload contains large buffers by exploiting PEP 574 out-of-band buffer support (https://peps.python.org/pep-0574/).

This is beneficial, for example, when broadcasting bitmask arrays with the multiproc executor.
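The mechanism can be sketched with plain `pickle` (a hedged illustration, not vLLM's actual code; the bytearray stands in for a bitmask array):

```python
import pickle

# Stand-in for a large bitmask array. Wrapping it in PickleBuffer lets
# pickle protocol 5 (PEP 574) hand it to buffer_callback instead of
# copying it into the pickle stream.
big = bytearray(b"\x01" * 1024)
payload = {"mask": pickle.PickleBuffer(big)}

buffers: list[pickle.PickleBuffer] = []
meta = pickle.dumps(payload, protocol=5, buffer_callback=buffers.append)

# The metadata stream stays tiny; the buffer travels out-of-band and
# could be sent as extra ZMQ frames or placed directly in shared memory.
frames = [meta] + [bytes(b) for b in buffers]  # simulate a multipart send
restored = pickle.loads(frames[0], buffers=frames[1:])
assert bytes(restored["mask"]) == bytes(big)
```

Note that only objects exposing `PickleBuffer` views (e.g. numpy arrays under protocol 5) take the out-of-band path; ordinary `bytes` are still serialized in-band.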

Benchmark using structured outputs + multiproc executor:

canhazgpu run --gpus 1 -- vllm serve Qwen/Qwen3-1.7B --uvicorn-log-level=error --no-enable-prefix-caching --distributed-executor-backend mp

python3 benchmarks/benchmark_serving_structured_output.py --backend vllm --model Qwen/Qwen3-1.7B --structured-output-ratio 0.8 --request-rate 120 --max-concurrency 800 --num-prompts 5000 --json-schema-path ./test3.json  --output-len 128

Before:

============ Serving Benchmark Result ============
Successful requests:                     5000      
Maximum request concurrency:             800       
Request rate configured (RPS):           120.00    
Benchmark duration (s):                  83.96     
Total input tokens:                      2405000   
Total generated tokens:                  639936    
Request throughput (req/s):              59.55     
Output token throughput (tok/s):         7621.48   
Total Token throughput (tok/s):          36264.42  
---------------Time to First Token----------------
Mean TTFT (ms):                          327.90    
Median TTFT (ms):                        296.12    
P99 TTFT (ms):                           674.63    
-----Time per Output Token (excl. 1st token)------
Mean TPOT (ms):                          96.30     
Median TPOT (ms):                        102.40    
P99 TPOT (ms):                           106.44    
---------------Inter-token Latency----------------
Mean ITL (ms):                           96.54     
Median ITL (ms):                         92.78     
P99 ITL (ms):                            251.71    
==================================================
correct_rate(%) 99.6 

After:

============ Serving Benchmark Result ============
Successful requests:                     5000      
Maximum request concurrency:             800       
Request rate configured (RPS):           120.00    
Benchmark duration (s):                  71.28     
Total input tokens:                      2405000   
Total generated tokens:                  639937    
Request throughput (req/s):              70.15     
Output token throughput (tok/s):         8977.91   
Total Token throughput (tok/s):          42718.52  
---------------Time to First Token----------------
Mean TTFT (ms):                          251.35    
Median TTFT (ms):                        236.60    
P99 TTFT (ms):                           586.14    
-----Time per Output Token (excl. 1st token)------
Mean TPOT (ms):                          80.32     
Median TPOT (ms):                        86.29     
P99 TPOT (ms):                           90.48     
---------------Inter-token Latency----------------
Mean ITL (ms):                           80.32     
Median ITL (ms):                         78.74     
P99 ITL (ms):                            198.15    
==================================================
correct_rate(%) 99.58 

Avoid serialization + copy overhead when pickled payload contains large buffers.

This is beneficial for example when broadcasting bitmask arrays with the multiproc executor.

Signed-off-by: Nick Hill <[email protected]>
@gemini-code-assist (bot) left a comment


Code Review

This pull request introduces a performance optimization for shared memory broadcasting by leveraging pickle's out-of-band (OOB) buffer support. This avoids serialization and copy overhead for large buffers, which is a great improvement. The implementation correctly adapts both shared memory and ZMQ communication paths to handle multipart data. However, I've identified a critical bug in the size calculation for shared memory writes that could lead to a buffer overflow and crash the writer process. A fix is suggested to address this.

@chatgpt-codex-connector (bot) left a comment


💡 Codex Review

Here are some automated review suggestions for this pull request.


@njhill added the `ready` label (ONLY add when PR is ready to merge/full CI is needed) on Oct 16, 2025
Signed-off-by: Nick Hill <[email protected]>
Signed-off-by: Nick Hill <[email protected]>
@russellb (Member) left a comment


This is really nice!

One minor question and one minor suggestion, otherwise lgtm!

      n_local_reader,  # number of local readers through shared memory
      local_reader_ranks: list[int] | None = None,
-     max_chunk_bytes: int = 1024 * 1024 * 10,
+     max_chunk_bytes: int = 1024 * 1024 * 24,  # 24MiB
@russellb (Member):

just curious, what led to this change? What's special about 24 MB?

@njhill (Member, Author):

It's just to ensure that the largest bitmasks are covered; I observed that they could be up to ~20MiB.

@russellb (Member):

got it. It would be helpful to leave a comment in the code to explain where the number came from, and that it's "big enough based on observation" versus some observed technical limitation of the shared memory pathway.
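A hedged sketch of what such a comment could say (the constant name is illustrative; the ~20MiB figure comes from the discussion above):

```python
# Default chunk size for the shared-memory message queue. 24 MiB is
# "big enough based on observation" rather than a technical limit of
# the shared-memory pathway: structured-output bitmask payloads were
# observed to reach ~20 MiB, so this leaves some headroom.
DEFAULT_MAX_CHUNK_BYTES = 1024 * 1024 * 24  # 24 MiB
```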

break

def enqueue(self, obj, timeout: float | None = None):
"""Write to message queue with optional timeout (in seconds)"""
@russellb (Member):

I think an expanded docstring here that gives an overview of the encoding format would be helpful here.

@vllm-bot vllm-bot merged commit ab81379 into vllm-project:main Oct 17, 2025
50 of 52 checks passed
@njhill njhill deleted the oob-pickle branch October 17, 2025 03:44
Zhuul pushed a commit to Zhuul/vllm that referenced this pull request Oct 17, 2025
njhill added a commit to njhill/vllm that referenced this pull request Oct 17, 2025
This is a follow-on to PRs vllm-project#26737 and vllm-project#26961 to add some clarifying comments that were suggested in review.

Signed-off-by: Nick Hill <[email protected]>
lywa1998 pushed a commit to lywa1998/vllm that referenced this pull request Oct 20, 2025
albertoperdomo2 pushed a commit to albertoperdomo2/vllm that referenced this pull request Oct 23, 2025
alhridoy pushed a commit to alhridoy/vllm that referenced this pull request Oct 24, 2025
xuebwang-amd pushed a commit to xuebwang-amd/vllm that referenced this pull request Oct 24, 2025
xuebwang-amd pushed a commit to xuebwang-amd/vllm that referenced this pull request Oct 24, 2025
0xrushi pushed a commit to 0xrushi/vllm that referenced this pull request Oct 26, 2025
0xrushi pushed a commit to 0xrushi/vllm that referenced this pull request Oct 26, 2025
rtourgeman pushed a commit to rtourgeman/vllm that referenced this pull request Nov 10, 2025
Zhathw pushed a commit to Zhathw/vllm that referenced this pull request Nov 12, 2025
devpatelio pushed a commit to SumanthRH/vllm that referenced this pull request Nov 29, 2025