[Feature] Bagel: Support tp+cfg parallel using mooncake transfer engine connector#2705
|
Codex usage limits have been reached for code reviews. Please check with the admins of this repo to increase the limits by adding credits. |
|
Draft PR - ready for full review when draft status removed. This PR is substantial (>1000 LOC / >10 files). Please run L3 tests locally and paste the results in the PR description: Test Result section should include:
|
|
@princepride PTAL |
|
Please resolve conflicts first |
389df75 to 087f3d7
…and Bagel/engine integration

fix: build per-stage sampling_params_list and pass cfg_text/img_scale in serving_chat

KV Transfer Manager (rank-aware TP):
- Embed from_rank/to_rank into connector keys for per-rank addressing
- Rank mapping for heterogeneous TP (M:N) topologies
- Sender-side slice and receiver-side merge hooks for KV head redistribution
- Per-rank ZMQ port calculation using KV_RANK_PORT_STRIDE
- receive_multi_kv_cache_distributed() for pulling from multiple sender ranks
- Deduplicate serialization: shared _build_tensors_desc/_build_header_bytes and _populate_caches helpers (~91 lines saved)
- Replace traceback.print_exc() with logger.exception()
- Remove dead from_model_config alias and get_connector() wrapper

CFG Distribution (kv_transfer_manager):
- _discover_cfg_branch_roles() auto-detects branch roles from sampling_params
- _build_cfg_rank_local_payloads() partitions branch KVs across CFG ranks
- Generic contract: cfg_active_branch, cfg_branch_roles, cfg_branch_past_key_values, cfg_branch_kv_metadata

OmniSamplingParams (data.py):
- Add generic CFG fields for model-agnostic branch contract

Bagel pipeline:
- Read generic cfg_branch_* fields first, fall back to legacy cfg_text/cfg_img fields for backward compatibility

GroupCoordinator:
- Fix send_object/recv_object assert: rank_in_group instead of global rank
- Initialize self.shm_broadcaster = None

Engine TP auto-inference (async_omni_engine.py):
- Add _tp_size_for_stage, _inject_inferred_kv_tp_topology helpers
- Auto-infer from_tp/to_tp for adjacent stages in OmniKVCacheConfig
- Use local _inject_kv_stage_info with TP topology support

stage_engine_core_client.py:
- Document rank-0 base port and KV_RANK_PORT_STRIDE adjustment

cfg scales parameter pass:
- Pass cfg_text/img_scale in serving_chat to make the cfg scale controllable

Tests:
- test_tp_rank_aware (rank-aware keys, hetero TP merge/slice, CFG leader distribution, payload application, multi-source receive)
- test_async_omni_engine_stage_init (TP auto-inference)

Signed-off-by: natureofnature <wzliu@connect.hku.hk>
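For readers skimming the commit message, the rank-aware addressing items (rank-embedded keys, M:N rank mapping, per-rank ports) can be pictured with a minimal sketch. This is an illustration under stated assumptions, not the PR's code: the helper names `source_ranks_for`, `rank_port`, and `make_key`, the key layout, the stride value, and the `base + rank * stride` formula are all hypothetical. Only the constant name `KV_RANK_PORT_STRIDE` and the from_rank/to_rank concept come from the commit message.

```python
# Hypothetical sketch of rank-aware KV addressing; names and values are assumptions.

KV_RANK_PORT_STRIDE = 16  # assumed value; the real constant's value is not shown in this thread


def source_ranks_for(to_rank: int, from_tp: int, to_tp: int) -> list[int]:
    """Return the sender TP ranks whose KV heads a given receiver rank needs."""
    if from_tp <= 0 or to_tp <= 0:
        raise ValueError("TP sizes must be positive")
    if not 0 <= to_rank < to_tp:
        raise ValueError("to_rank out of range")
    if from_tp % to_tp != 0 and to_tp % from_tp != 0:
        raise ValueError("this sketch only handles evenly divisible TP sizes")
    if from_tp >= to_tp:
        # Fewer receivers than senders: each receiver merges a contiguous
        # group of sender shards.
        ratio = from_tp // to_tp
        return list(range(to_rank * ratio, (to_rank + 1) * ratio))
    # More receivers than senders: several receivers read the same sender
    # shard and each keeps only its slice of the KV heads.
    ratio = to_tp // from_tp
    return [to_rank // ratio]


def rank_port(base_port: int, rank: int) -> int:
    """Derive a per-rank port from the rank-0 base port (assumed formula)."""
    return base_port + rank * KV_RANK_PORT_STRIDE


def make_key(request_id: str, from_rank: int, to_rank: int) -> str:
    """Embed sender and receiver ranks in a transfer key (assumed layout)."""
    return f"{request_id}:r{from_rank}->r{to_rank}"


if __name__ == "__main__":
    # TP4 sender stage feeding a TP2 receiver stage.
    for to_rank in range(2):
        for from_rank in source_ranks_for(to_rank, from_tp=4, to_tp=2):
            print(make_key("req-0", from_rank, to_rank), rank_port(5000, from_rank))
```

In the homogeneous case (for example TP2 to TP2) the mapping reduces to one sender per receiver, so no slicing or merging of KV heads is needed.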
087f3d7 to bef2b86
|
@codex review |
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: bef2b86c21
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
Signed-off-by: natureofnature <wzliu@connect.hku.hk>
|
@codex review |
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: ce20e23582
Signed-off-by: natureofnature <wzliu@connect.hku.hk>
|
@codex review |
|
Codex Review: Didn't find any major issues. Already looking forward to the next diff. |
…ne connector (vllm-project#2705)
Signed-off-by: natureofnature <wzliu@connect.hku.hk>
Co-authored-by: Hongsheng Liu <liuhongsheng4@huawei.com>
Progress
Purpose
Taking image-to-image (i2i) with TP2 (AR) -> TP2 (DiT) over shared memory as an example (50ae1de):
The output image looks like this:
Test Plan
2.1 Text to image
2.2 Image to Image
Prompt: Let the woman wear a white dress
Image:
Test Result
Before this PR:
Essential Elements of an Effective PR Description Checklist
supported_models.md and examples for a new model. Please run mkdocs serve to sync the documentation editions to ./docs.
BEFORE SUBMITTING, PLEASE READ https://github.com/vllm-project/vllm-omni/blob/main/CONTRIBUTING.md (anything written below this line will be removed by GitHub Actions)