[BugFix] SharedMemoryConnector: only use shared memory if message size is over threshold #1643
NickCao wants to merge 2 commits into
Conversation
```diff
  size = len(payload)

- if True:
+ if size > self.threshold:
```
On the sender side, if `size <= self.threshold`, we need to set `metadata = {"inline_bytes": payload, "size": size}`, but on the receiver side the metadata might NOT be passed to the `get` function. In that case the receiver tries to read data from shared memory, which does not exist.
I think you need to fix this to keep sender and receiver consistent in both the with- and without-metadata paths.
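For illustration, a minimal sketch of a sender-side `put()` that keeps the two paths consistent by always returning metadata (`self.threshold`, `_write_to_shm`, and the exact metadata keys are assumptions drawn from this thread, not the connector's real API):

```python
def put(self, key, payload):
    # Hypothetical sketch, not the connector's actual code: pick the path by
    # size and always return metadata describing which path was taken.
    size = len(payload)
    if size > self.threshold:
        self._write_to_shm(key, payload)  # assumed helper writing /dev/shm
        return {"shm": key, "size": size}
    # Small message: carry the bytes inline, so the receiver never has to
    # look for a shm segment that was never created.
    return {"inline_bytes": payload, "size": size}
```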
We should probably just drop the without-metadata path. Other connectors may require metadata, and leaving the choice of whether to pass metadata to the caller is not ideal.
@R2-Y PTAL, is it possible to remove the non-metadata path for the async chunk function and use metadata in all models and modes?
First of all, I'm not quite sure whether this connector works across multiple nodes?
No, it only works on a single node. For multi-node deployment, we can currently use the mooncake store connector. @wuhang2014 (Supplementary information: currently, in the Bagel/Qwen3 omni case, when the KV cache transfer manager and async chunk transfer are used, metadata is not carried, so SharedMemoryConnector does not work even when inline mode is set.)
It does work multi-node; I've tested this using the stage-based CLI from #939 on a cluster of three EC2 instances. It works by forcing the threshold to `sys.maxsize`, thus sending all messages inline without actually using shm.
You’re right that multi-node can work when `shm_threshold_bytes=sys.maxsize`, but in that mode payload transfer is effectively inline over the stage transport (ZMQ queue), not shared memory. So this is a compatibility workaround rather than the intended high-performance path. For multi-node deployments, we recommend a network connector (e.g., MooncakeTransferEngineConnector / MooncakeStoreConnector / YuanrongConnector). @NickCao
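As a concrete example of the workaround being discussed (the `shm_threshold_bytes` key comes from this thread; the config dict shape is an assumption):

```python
import sys

# Assumed configuration shape: with the threshold at sys.maxsize, no message
# is ever "over threshold", so everything travels inline over the stage
# transport (ZMQ queue) instead of through /dev/shm.
kv_connector_config = {"shm_threshold_bytes": sys.maxsize}
```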
I'm aware that this may degrade performance, but isn't it still useful for development and testing?
And for single-node deployments, this can reduce the overhead for small messages (which I suppose is the reason the threshold exists in the first place).
Yes, that's exactly why it's there.
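To make the overhead point concrete, a rough standard-library micro-benchmark of segment setup cost for small messages (numbers will vary by system; this is a sketch, not a claim about the connector's internals):

```python
import time
from multiprocessing import shared_memory

payload = b"x" * 256  # a small message, well under any sensible threshold

start = time.perf_counter()
for _ in range(1000):
    # Every shm transfer pays for segment creation, copy, close, and unlink.
    seg = shared_memory.SharedMemory(create=True, size=len(payload))
    seg.buf[: len(payload)] = payload
    seg.close()
    seg.unlink()
elapsed = time.perf_counter() - start
print(f"1000 small shm round trips: {elapsed * 1e3:.1f} ms")
```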
@NickCao Currently, forcing metadata to always be set would require some refactoring of chunk_transfer_adapter and kv_transfer_manager. To merge this PR, I suggest:

Step 1: Apply the threshold check. In the get function, default metadata to `{}` and fall back to the SHM path when metadata does not contain `"inline_bytes"`. Add a warning log when inline data is expected but missing, so users are aware of the potential mismatch.

Step 2 (maybe a follow-up PR): Refactor chunk_transfer_adapter and kv_transfer_manager to propagate `put()` metadata to `get()` in all scenarios, ensuring sender and receiver are always consistent.

@princepride @R2-Y What are your thoughts?
This fallback already happens when metadata is missing.
The only way to detect this is to check if the expected file does not exist in /dev/shm? |
I mean something like:

```python
def get(self, key, metadata=None):
    if metadata is None:
        metadata = {}
    if "inline_bytes" in metadata:
        ...  # inline path
    elif "shm" in metadata:
        ...  # get data from shared memory with the name metadata["shm"]
    else:
        ...  # fallback: get data from shm using key
        logger.warning("No inline key ...")  # when timeout
```

By the way, in the Bagel/Qwen3 omni case, when the KV cache transfer manager and async chunk transfer are used, metadata is not carried; that's why I said shared memory does not work for multi-node deployment now. And perhaps for these models you even need users to set the threshold to a low value to make them work, before making them propagate the metadata.
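On the question above about detecting the mismatch by checking /dev/shm: a minimal standard-library probe could look like this (a sketch; the connector's segment-naming scheme is an assumption):

```python
from multiprocessing import shared_memory

def shm_segment_exists(name: str) -> bool:
    """Return True if a named shared memory segment currently exists."""
    try:
        seg = shared_memory.SharedMemory(name=name, create=False)
    except FileNotFoundError:
        return False
    seg.close()
    return True
```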
Force-pushed from 0729286 to d1773ff
Added an error log for this scenario. This is a hard error, not a warning, since it simply does not work under this configuration: multi-node, shm connector, metadata not passed. A follow-up PR would make the KV cache transfer manager and async chunk transfer compatible with this.
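A sketch of what that receiver-side hard error could look like (helper names, the exception type, and the log text are assumptions, not this PR's exact code):

```python
def get(self, key, metadata=None):
    # Hypothetical sketch of the hard error described above; helper names
    # and the message text are illustrative, not this PR's exact code.
    metadata = metadata or {}
    if "inline_bytes" in metadata:
        return metadata["inline_bytes"]
    try:
        return self._read_from_shm(key)  # assumed helper reading /dev/shm
    except FileNotFoundError:
        # Unsupported configuration: multi-node + shm connector + no
        # metadata; the segment only ever existed on the sender's node.
        logger.error(
            "SharedMemoryConnector: no shm segment for key %s and no inline "
            "metadata; this configuration does not work",
            key,
        )
        raise
```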
We need to use put_key and get_key in shared memory to control the order in which chunk data are sent and received. It seems difficult to transmit chunk metadata correctly using an inline method; inline transfer will lead to out-of-order receipt of chunks. @NickCao @natureofnature
Does this also apply to single node? That would mean we should instead drop the inline code path completely.
Yes, it also applies to single node.
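To illustrate the ordering concern, a sketch of how sequence-numbered keys let key-based put/get enforce receive order (entirely hypothetical; not the adapter's actual key scheme):

```python
def send_chunks(connector, request_id: str, chunks: list) -> None:
    # Keys encode the chunk index, so the receiver can drain them in order.
    for idx, chunk in enumerate(chunks):
        connector.put(f"{request_id}/chunk-{idx}", chunk)

def recv_chunks(connector, request_id: str, num_chunks: int) -> list:
    # One blocking get per key restores order; inline delivery riding on an
    # unordered metadata channel would not give this guarantee.
    return [connector.get(f"{request_id}/chunk-{idx}") for idx in range(num_chunks)]
```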
I also have some concerns, which depend on the future architecture: if it fully follows async mode and the orchestrator sends request metadata to multiple stages at once (async stage execution), inline mode's advantage disappears, since the following stages cannot get the payload through the inline path when they receive the metadata from the orchestrator.
…e is over threshold Signed-off-by: Nick Cao <ncao@redhat.com>
…fallback path Signed-off-by: Nick Cao <ncao@redhat.com>
Marking as draft due to concerns about compatibility between inline data and async/chunked KV transfer.
Purpose
This allows #939 to be used across multiple nodes.
Test Plan
Start three GPU nodes:
Run a test query:
Test Result
An audio response is successfully generated.