
[BugFix] SharedMemoryConnector: only use shared memory if message size is over threshold #1643

Closed
NickCao wants to merge 2 commits into vllm-project:main from NickCao:shm-multi-node


Conversation

@NickCao
Contributor

@NickCao NickCao commented Mar 3, 2026

Purpose

This allows #939 to be used across multiple nodes.

Test Plan

Start three gpu nodes:

# on master node 0
vllm serve --omni --port 8091 --stage-id 0 \
  Qwen/Qwen2.5-Omni-3B \
  --omni-master-address "<master ip>" --omni-master-port 8092

# on worker node 1
vllm serve --omni --headless --stage-id 1 \
  Qwen/Qwen2.5-Omni-3B \
  --omni-master-address "<master ip>" --omni-master-port 8092

# on worker node 2
vllm serve --omni --headless --stage-id 2 \
  Qwen/Qwen2.5-Omni-3B \
  --omni-master-address "<master ip>" --omni-master-port 8092

Run test query:

curl http://localhost:8091/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "messages": [
      {
        "role": "user",
        "content": [
          { "type": "text", "text": "What is inside this image?" },
          {
            "type": "image_url",
            "image_url": {
              "url": "https://http.cat/100"
            }
          }
        ]
      }
    ]
  }'

Test Result

An audio response is successfully generated.



@hsliuustc0106
Collaborator

@wuhang2014 @natureofnature PTAL

 size = len(payload)

-if True:
+if size > self.threshold:
Contributor

@natureofnature natureofnature Mar 4, 2026


On the sender side, if size <= self.threshold, we need to update metadata = {"inline_bytes": payload, "size": size}, but on the receiver side the metadata might NOT be passed to the get function. In that case the receiver tries to read the data from shared memory, which does not exist.
I think you need to fix this to keep the sender and receiver consistent in both the with-metadata and without-metadata paths.
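For context, a minimal sender-side sketch of the branch being discussed (an assumption for illustration, not the actual connector code; _write_shm is a hypothetical helper that copies the payload into a named shared-memory segment and returns its name):

def put(self, key, payload):
    size = len(payload)
    if size > self.threshold:
        # Large payload: copy it into shared memory and reference it by name.
        shm_name = self._write_shm(key, payload)  # hypothetical helper
        return {"shm": shm_name, "size": size}
    # Small payload: carry the bytes inline in the metadata; the receiver can only
    # reconstruct the message if this metadata actually reaches its get() call.
    return {"inline_bytes": payload, "size": size}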

Contributor Author


We should probably just drop the without-metadata path. Other connectors may require the use of metadata, and leaving the choice of whether to pass the metadata up to the caller is not ideal.

Contributor

@natureofnature natureofnature Mar 4, 2026


@R2-Y PTAL, is it possible to remove the non-metadata path for the async chunk function and use metadata in all models and modes?

@wuhang2014
Contributor

First of all, I'm not quite sure whether SharedMemoryConnector works in a multi-node deployment. @NickCao @natureofnature

@natureofnature
Contributor

natureofnature commented Mar 4, 2026

> First of all, I'm not quite sure whether SharedMemoryConnector works in a multi-node deployment. @NickCao @natureofnature

No, it only works on a single node. For multi-node deployment, we can currently use the mooncake store connector. @wuhang2014

(Supplementary information: currently, in the Bagel/Qwen3 Omni case, when the KV cache transfer manager and async chunk transfer are used, metadata is not carried, and SharedMemoryConnector does not work even when inline mode is set.)

@NickCao
Contributor Author

NickCao commented Mar 4, 2026

> First of all, I'm not quite sure whether SharedMemoryConnector works in a multi-node deployment. @NickCao @natureofnature

> No, it only works on a single node. For multi-node deployment, we can currently use the mooncake store connector. @wuhang2014

It does work for multi-node; I've tested this using the stage-based CLI from #939 on a cluster of three EC2 instances. It works by forcing the threshold to sys.maxsize, thus sending all messages inline, without actually using shm.

@natureofnature
Contributor

> First of all, I'm not quite sure whether SharedMemoryConnector works in a multi-node deployment. @NickCao @natureofnature

> No, it only works on a single node. For multi-node deployment, we can currently use the mooncake store connector. @wuhang2014

> It does work for multi-node; I've tested this using the stage-based CLI from #939 on a cluster of three EC2 instances. It works by forcing the threshold to sys.maxsize, thus sending all messages inline, without actually using shm.

You’re right that multi-node can work when shm_threshold_bytes=sys.maxsize, but in that mode payload transfer is effectively inline over the stage transport (ZMQ queue), not shared memory. So this is a compatibility workaround rather than the intended high-performance path. For multi-node deployments, we recommend a network connector (e.g., MooncakeTransferEngineConnector / MooncakeStoreConnector / YuanrongConnector). @NickCao
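As a rough illustration of the workaround, assuming the threshold is exposed via the shm_threshold_bytes setting named above (the exact configuration surface is not shown in this thread):

import sys

# With the threshold set to the largest possible int, no payload ever exceeds it,
# so every message travels inline with its metadata over the stage transport
# instead of via /dev/shm.
shm_threshold_bytes = sys.maxsize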

@NickCao
Contributor Author

NickCao commented Mar 4, 2026

> First of all, I'm not quite sure whether SharedMemoryConnector works in a multi-node deployment. @NickCao @natureofnature

> No, it only works on a single node. For multi-node deployment, we can currently use the mooncake store connector. @wuhang2014

> It does work for multi-node; I've tested this using the stage-based CLI from #939 on a cluster of three EC2 instances. It works by forcing the threshold to sys.maxsize, thus sending all messages inline, without actually using shm.

> You’re right that multi-node can work when shm_threshold_bytes=sys.maxsize, but in that mode payload transfer is effectively inline over the stage transport (ZMQ queue), not shared memory. So this is a compatibility workaround rather than the intended high-performance path. For multi-node deployments, we recommend a network connector (e.g., MooncakeTransferEngineConnector / MooncakeStoreConnector / YuanrongConnector). @NickCao

I'm aware that this may degrade performance, but it's still useful for development and testing?

@NickCao
Contributor Author

NickCao commented Mar 4, 2026

And for single-node deployments, this can reduce the overhead for small messages (which I suppose is the reason the threshold exists in the first place).

@natureofnature
Contributor

> And for single-node deployments, this can reduce the overhead for small messages (which I suppose is the reason the threshold exists in the first place).

Yes, that's exactly why it's there.

@natureofnature
Contributor

> First of all, I'm not quite sure whether SharedMemoryConnector works in a multi-node deployment. @NickCao @natureofnature

> No, it only works on a single node. For multi-node deployment, we can currently use the mooncake store connector. @wuhang2014

> It does work for multi-node; I've tested this using the stage-based CLI from #939 on a cluster of three EC2 instances. It works by forcing the threshold to sys.maxsize, thus sending all messages inline, without actually using shm.

> You’re right that multi-node can work when shm_threshold_bytes=sys.maxsize, but in that mode payload transfer is effectively inline over the stage transport (ZMQ queue), not shared memory. So this is a compatibility workaround rather than the intended high-performance path. For multi-node deployments, we recommend a network connector (e.g., MooncakeTransferEngineConnector / MooncakeStoreConnector / YuanrongConnector). @NickCao

> I'm aware that this may degrade performance, but it's still useful for development and testing?

@NickCao Currently, forcing metadata to always be set would require some refactoring of chunk_transfer_adapter and kv_transfer_manager. To merge this PR, I suggest: Step 1: apply the threshold check; in the get function, default metadata to {} and fall back to the SHM path when metadata does not contain "inline_bytes"; add a warning log when inline data is expected but missing, so users are aware of the potential mismatch. Step 2 (maybe a follow-up PR): refactor chunk_transfer_adapter and kv_transfer_manager to propagate put() metadata to get() in all scenarios, ensuring sender and receiver are always consistent. @princepride @R2-Y What are your ideas?

@NickCao
Contributor Author

NickCao commented Mar 4, 2026

> In the get function, default metadata to {} and fall back to the SHM path when metadata does not contain "inline_bytes".

This fallback already happens when metadata is missing.

> Add a warning log when inline data is expected but missing, so users are aware of the potential mismatch.

The only way to detect this is to check whether the expected file exists in /dev/shm?

@natureofnature
Contributor

natureofnature commented Mar 4, 2026

> In the get function, default metadata to {} and fall back to the SHM path when metadata does not contain "inline_bytes".

> This fallback already happens when metadata is missing.

> Add a warning log when inline data is expected but missing, so users are aware of the potential mismatch.

> The only way to detect this is to check whether the expected file exists in /dev/shm?

I mean something like:

def get(self, ..., metadata=None):
    if metadata is None:
        metadata = {}
    if "inline_bytes" in metadata:
        ...  # inline path: the payload is carried in the metadata
    elif "shm" in metadata:
        ...  # read the data from the shared memory segment named in metadata["shm"]
    else:
        ...  # fallback: read from shared memory using the key
        logger.warning("No inline key ...")  # e.g. when the read times out

By the way, in the Bagel/Qwen3 Omni case, when the KV cache transfer manager and async chunk transfer are used, metadata is not carried; that's why I said shared memory does not work for multi-node deployment right now. And perhaps for these models you even need users to set the threshold to a low value to make them work, until metadata propagation is in place.

@NickCao NickCao force-pushed the shm-multi-node branch 2 times, most recently from 0729286 to d1773ff on March 4, 2026 at 17:02
@NickCao
Contributor Author

NickCao commented Mar 4, 2026

Added an error log for this scenario; this is a hard error, not a warning, since it simply does not work under this configuration: multi-node, shm connector, metadata not passed.

A follow-up PR would make the KV cache transfer manager and async chunk transfer compatible with this.
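A rough sketch of the receiver-side behavior described here, assuming the connector falls back to reading a named shared-memory segment when no inline bytes are present (_read_shm is a hypothetical helper, not the actual connector API):

def get(self, key, metadata=None):
    metadata = metadata or {}
    if "inline_bytes" in metadata:
        # Inline path: the payload travelled with the metadata.
        return metadata["inline_bytes"]
    try:
        # Fallback: read the payload from the shared-memory segment for this key.
        return self._read_shm(key)  # hypothetical helper
    except FileNotFoundError:
        # Multi-node + shm connector + no metadata: there is nothing to read locally,
        # so this configuration is reported as a hard error rather than a warning.
        logger.error("No shared memory segment for key %s and no inline bytes provided", key)
        raise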

@R2-Y
Contributor

R2-Y commented Mar 5, 2026

We need to use put_key and get_key in shared memory to control the order in which chunk data are sent and received. It seems difficult to correctly transmit chunk metadata using an inline method; inline will lead to disorder of the received chunks. @NickCao @natureofnature

@NickCao
Contributor Author

NickCao commented Mar 5, 2026

> We need to use put_key and get_key in shared memory to control the order in which chunk data are sent and received. It seems difficult to correctly transmit chunk metadata using an inline method; inline will lead to disorder of the received chunks. @NickCao @natureofnature

Does this also apply to single node? That would mean we should instead drop the inline code path completely.

@R2-Y
Contributor

R2-Y commented Mar 5, 2026

> We need to use put_key and get_key in shared memory to control the order in which chunk data are sent and received. It seems difficult to correctly transmit chunk metadata using an inline method; inline will lead to disorder of the received chunks. @NickCao @natureofnature

> Does this also apply to single node? That would mean we should instead drop the inline code path completely.

Yes, it also applies to single node.

@natureofnature
Contributor

I also have some concerns, which depend on the future architecture: if it fully follows async mode and the orchestrator sends request metadata to multiple stages at a time (async stage execution), inline mode's advantage disappears: the following stages cannot get the payload through the inline path when they only receive the metadata from the orchestrator.
(For example, in the current async chunk mode, Qwen3 Omni stage 2 does not wait for stage 1's result.)

NickCao added 2 commits March 10, 2026 11:10
…e is over threshold

Signed-off-by: Nick Cao <ncao@redhat.com>
…fallback path

Signed-off-by: Nick Cao <ncao@redhat.com>
@NickCao NickCao marked this pull request as draft March 10, 2026 15:12
@NickCao
Contributor Author

NickCao commented Mar 10, 2026

Marking as draft due to concerns about compatibility between inline data and async/chunked KV transfer.

@NickCao NickCao mentioned this pull request Mar 10, 2026
@NickCao NickCao closed this Apr 9, 2026