Skip to content

[feature] mooncake support pcp/dcp in common conditions#5224

Merged
wangxiyuan merged 8 commits intovllm-project:mainfrom
wangxiaochao6:main
Dec 31, 2025
Merged

[feature] mooncake support pcp/dcp in common conditions#5224
wangxiyuan merged 8 commits intovllm-project:mainfrom
wangxiaochao6:main

Conversation

@wangxiaochao6
Copy link
Copy Markdown
Contributor

@wangxiaochao6 wangxiaochao6 commented Dec 22, 2025

What this PR does / why we need it?

  1. This PR is proposed to support complicated pcp/dcp parallelisms in Prefill and Decode nodes in Mooncake, such as Prefill: TP8/PCP2DCP8 and Decode: TP8/DCP4/DP2, which is not supported now. We establish the link mappings to transfer KVCache between prefill and decode nodes. The main function is realized in Function of _get_kv_split_metadata in Mooncake_connector.py
  2. After a prefill rank is pulled KVCache by a decode rank, the decode rank will send DONE_RECVING_MSG to the prefill rank and the prefill rank will free its KVCache blocks. If a prefill rank is pulled KVCache more than one time by several decode ranks and it surely could happen in complicated pcp/dcp parallelisms, it will cause the prefill rank free its KVCache blocks for several times, which could cause memory issue. This PR solve this issue by counting the times of prefill rank would be pulled KVCache and in the last time, it will free the prefill rank KVCache blocks. The related code is in Function of run_busy_loop in Mooncake_connector.py
  3. If a prefill rank is not pulled KVCache by any decode ranks, the first rank in decode node will send "DONE_RECVING_MSG" to free its blocks. The related code is in Function of _send_done_signal_to_free_remote_port in Mooncake_connector.py

How was this patch tested?

This PR is tested in many pcp/dcp parallelisms, and the accuracy are all correct.
MLA model:
Prefill node: TP8/DP2, Decode node: TP8/DP2
Prefill node: TP8/PCP2/DCP8, Decode node: TP8/DP2
Prefill node: TP8/PCP2/DCP8, Decode node: TP8/DCP4/DP2
Prefill node: TP8/PCP2/DCP4, Decode node: TP4/DCP2/DP4
Prefill node: TP8/PCP2/DCP2, Decode node: TP4/DCP4/DP4
Prefill node: TP8/PCP2, Decode node: TP4/DCP2

GQA model:
Prefill node: TP8/DP2, Decode node: TP8/DP2
Prefill node: TP8/PCP2/DCP2, Decode node: TP8/DP2
Prefill node: TP8/PCP2/DCP2, Decode node: TP8/DCP2/DP2
Prefill node: TP8/PCP2/DCP2, Decode node: TP4/DP4
Prefill node: TP16/DCP2/PCP1, Decode node: TP8/DCP2/DP2

Signed-off-by: wangxiaochao <w00642655@china.huawei.com>
Copy link
Copy Markdown
Contributor

@gemini-code-assist gemini-code-assist bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Review

This pull request introduces support for prefill and decode context parallelism (PCP/DCP) in the Mooncake KV cache connector. The changes involve updating data structures like ReqMeta, modifying the logic in KVCacheSendingThread and KVCacheRecvingThread for new synchronization mechanisms, and significantly refactoring _get_kv_split_metadata in MooncakeConnectorWorker to handle complex block and port mappings required for context parallelism. My review identified a critical bug in the KVCacheSendingThread where an incorrect port is used for message synchronization, which could lead to deadlocks in a distributed environment. I have provided a code suggestion to correct this issue. The other changes appear to correctly implement the new feature, though they are quite complex.

Comment on lines +256 to +264
if remote_port_send_num is not None:
if request_id not in self.port_send_num:
self.port_send_num[request_id] = 0
self.port_send_num[request_id] += 1
if self.port_send_num[request_id] >= \
remote_port_send_num[self.side_channel_port]:
self.task_tracker.update_done_task_count(
request_id)
del self.port_send_num[request_id]
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

critical

The code uses self.side_channel_port to look up the expected number of messages in remote_port_send_num. However, self.side_channel_port is the base port, and this thread is listening on a specific handshake_port which is derived from the base port and a device index. Using the base port as a key is incorrect and will likely lead to a KeyError or incorrect logic when multiple devices are used, potentially causing deadlocks.

The correct port for this thread should be used. The handshake_port is calculated in the run method but is not accessible in run_busy_loop. A proper fix would involve making handshake_port a member of the class (e.g., self.handshake_port) so it can be accessed here. As a temporary fix that fits within the changed lines, you can recalculate it here.

Suggested change
if remote_port_send_num is not None:
if request_id not in self.port_send_num:
self.port_send_num[request_id] = 0
self.port_send_num[request_id] += 1
if self.port_send_num[request_id] >= \
remote_port_send_num[self.side_channel_port]:
self.task_tracker.update_done_task_count(
request_id)
del self.port_send_num[request_id]
if remote_port_send_num is not None:
device_index = self.pp_rank * self.tp_size + self.tp_rank + self.pcp_rank * self.prefill_tp_size
handshake_port = self.side_channel_port + device_index
if request_id not in self.port_send_num:
self.port_send_num[request_id] = 0
self.port_send_num[request_id] += 1
if self.port_send_num[request_id] >= \
remote_port_send_num[handshake_port]:
self.task_tracker.update_done_task_count(
request_id)
del self.port_send_num[request_id]

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

it has been revised

@github-actions
Copy link
Copy Markdown
Contributor

👋 Hi! Thank you for contributing to the vLLM Ascend project. The following points will speed up your PR merge:‌‌

  • A PR should do only one thing, smaller PRs enable faster reviews.
  • Every PR should include unit tests and end-to-end tests ‌to ensure it works and is not broken by other future PRs.
  • Write the commit message by fulfilling the PR description to help reviewer and future developers understand.

If CI fails, you can run linting and testing checks locally according Contributing and Testing.

Signed-off-by: wangxiaochao <w00642655@china.huawei.com>
@LCAIZJ
Copy link
Copy Markdown
Collaborator

LCAIZJ commented Dec 22, 2025

Could you describe the problem background this PR aims to address and the related design of the solution?

@wangxiaochao6
Copy link
Copy Markdown
Contributor Author

wangxiaochao6 commented Dec 24, 2025

Could you describe the problem background this PR aims to address and the related design of the solution?

The PR function and test description are provided.

@wangxiaochao6 wangxiaochao6 reopened this Dec 24, 2025
wangxiaochao added 5 commits December 24, 2025 19:33
Signed-off-by: wangxiaochao <w00642655@china.huawei.com>
Signed-off-by: wangxiaochao <w00642655@china.huawei.com>
Signed-off-by: wangxiaochao <w00642655@china.huawei.com>
Signed-off-by: wangxiaochao <w00642655@china.huawei.com>
Signed-off-by: wangxiaochao <w00642655@china.huawei.com>
@wangxiyuan
Copy link
Copy Markdown
Collaborator

@LCAIZJ Please review and merge this if it's fine.

@wangxiyuan wangxiyuan merged commit a539ae7 into vllm-project:main Dec 31, 2025
19 checks passed
845473182 pushed a commit to 845473182/vllm-ascend that referenced this pull request Dec 31, 2025
…to FIA_rebase

* 'main' of https://github.com/vllm-project/vllm-ascend:
  [feature] mooncake support pcp/dcp in common conditions (vllm-project#5224)
  [Bugfix] Fix mm_merge (vllm-project#5249)
  [Main2Main] Upgrade vllm commit to 1230 (vllm-project#5495)
  [Feature] Refactor PCP &DCP related code (vllm-project#5214)
  [main][test] Refactor the mtp and eagle test case (vllm-project#5326)
  [smoke][bugfix] moe_init_routing_v2 active_expert_range use int type (vllm-project#5521)
  [2/N] Upgrade nightly doc (vllm-project#5534)
  [Doc] Add new contributors. (vllm-project#5537)
  [3/N][Nightly] Move ops tests to nightly (vllm-project#5538)
wangyibo1005 pushed a commit to wangyibo1005/vllm-ascend that referenced this pull request Dec 31, 2025
…#5224)

### What this PR does / why we need it?
1. This PR is proposed to support complicated pcp/dcp parallelisms in
Prefill and Decode nodes in Mooncake, such as Prefill: TP8/PCP2DCP8 and
Decode: TP8/DCP4/DP2, which is not supported now. We establish the link
mappings to transfer KVCache between prefill and decode nodes. The main
function is realized in Function of `_get_kv_split_metadata` in
Mooncake_connector.py
2. After a prefill rank is pulled KVCache by a decode rank, the decode
rank will send `DONE_RECVING_MSG` to the prefill rank and the prefill
rank will free its KVCache blocks. If a prefill rank is pulled KVCache
more than one time by several decode ranks and it surely could happen in
complicated pcp/dcp parallelisms, it will cause the prefill rank free
its KVCache blocks for several times, which could cause memory issue.
This PR solve this issue by counting the times of prefill rank would be
pulled KVCache and in the last time, it will free the prefill rank
KVCache blocks. The related code is in Function of `run_busy_loop` in
Mooncake_connector.py
3. If a prefill rank is not pulled KVCache by any decode ranks, the
first rank in decode node will send "DONE_RECVING_MSG" to free its
blocks. The related code is in Function of
`_send_done_signal_to_free_remote_port` in Mooncake_connector.py

### How was this patch tested?
This PR is tested in many pcp/dcp parallelisms, and the accuracy are all
correct.
MLA model:
Prefill node:  TP8/DP2, Decode node: TP8/DP2
Prefill node:  TP8/PCP2/DCP8, Decode node: TP8/DP2
Prefill node:  TP8/PCP2/DCP8, Decode node: TP8/DCP4/DP2
Prefill node:  TP8/PCP2/DCP4, Decode node: TP4/DCP2/DP4
Prefill node:  TP8/PCP2/DCP2, Decode node: TP4/DCP4/DP4
Prefill node:  TP8/PCP2, Decode node: TP4/DCP2

GQA model:
Prefill node:  TP8/DP2, Decode node: TP8/DP2
Prefill node:  TP8/PCP2/DCP2, Decode node: TP8/DP2
Prefill node:  TP8/PCP2/DCP2, Decode node: TP8/DCP2/DP2
Prefill node:  TP8/PCP2/DCP2, Decode node: TP4/DP4
Prefill node:  TP16/DCP2/PCP1, Decode node: TP8/DCP2/DP2


- vLLM version: release/v0.13.0
- vLLM main:
vllm-project/vllm@ad32e3e
- Co-author by: Daishixun dsxtsteven@sina.com

---------

Signed-off-by: wangxiaochao <w00642655@china.huawei.com>
Co-authored-by: wangxiaochao <w00642655@china.huawei.com>
Co-authored-by: Jade Zheng <zheng.shoujian@outlook.com>
fuzhihong699 pushed a commit to fuzhihong699/vllm-ascend that referenced this pull request Dec 31, 2025
…#5224)

1. This PR is proposed to support complicated pcp/dcp parallelisms in
Prefill and Decode nodes in Mooncake, such as Prefill: TP8/PCP2DCP8 and
Decode: TP8/DCP4/DP2, which is not supported now. We establish the link
mappings to transfer KVCache between prefill and decode nodes. The main
function is realized in Function of `_get_kv_split_metadata` in
Mooncake_connector.py
2. After a prefill rank is pulled KVCache by a decode rank, the decode
rank will send `DONE_RECVING_MSG` to the prefill rank and the prefill
rank will free its KVCache blocks. If a prefill rank is pulled KVCache
more than one time by several decode ranks and it surely could happen in
complicated pcp/dcp parallelisms, it will cause the prefill rank free
its KVCache blocks for several times, which could cause memory issue.
This PR solve this issue by counting the times of prefill rank would be
pulled KVCache and in the last time, it will free the prefill rank
KVCache blocks. The related code is in Function of `run_busy_loop` in
Mooncake_connector.py
3. If a prefill rank is not pulled KVCache by any decode ranks, the
first rank in decode node will send "DONE_RECVING_MSG" to free its
blocks. The related code is in Function of
`_send_done_signal_to_free_remote_port` in Mooncake_connector.py

This PR is tested in many pcp/dcp parallelisms, and the accuracy are all
correct.
MLA model:
Prefill node:  TP8/DP2, Decode node: TP8/DP2
Prefill node:  TP8/PCP2/DCP8, Decode node: TP8/DP2
Prefill node:  TP8/PCP2/DCP8, Decode node: TP8/DCP4/DP2
Prefill node:  TP8/PCP2/DCP4, Decode node: TP4/DCP2/DP4
Prefill node:  TP8/PCP2/DCP2, Decode node: TP4/DCP4/DP4
Prefill node:  TP8/PCP2, Decode node: TP4/DCP2

GQA model:
Prefill node:  TP8/DP2, Decode node: TP8/DP2
Prefill node:  TP8/PCP2/DCP2, Decode node: TP8/DP2
Prefill node:  TP8/PCP2/DCP2, Decode node: TP8/DCP2/DP2
Prefill node:  TP8/PCP2/DCP2, Decode node: TP4/DP4
Prefill node:  TP16/DCP2/PCP1, Decode node: TP8/DCP2/DP2

- vLLM version: release/v0.13.0
- vLLM main:
vllm-project/vllm@ad32e3e
- Co-author by: Daishixun dsxtsteven@sina.com

---------

Signed-off-by: wangxiaochao <w00642655@china.huawei.com>
Co-authored-by: wangxiaochao <w00642655@china.huawei.com>
Co-authored-by: Jade Zheng <zheng.shoujian@outlook.com>
Rozwel-dx pushed a commit to Rozwel-dx/vllm-ascend that referenced this pull request Jan 8, 2026
…#5224)

### What this PR does / why we need it?
1. This PR is proposed to support complicated pcp/dcp parallelisms in
Prefill and Decode nodes in Mooncake, such as Prefill: TP8/PCP2DCP8 and
Decode: TP8/DCP4/DP2, which is not supported now. We establish the link
mappings to transfer KVCache between prefill and decode nodes. The main
function is realized in Function of `_get_kv_split_metadata` in
Mooncake_connector.py
2. After a prefill rank is pulled KVCache by a decode rank, the decode
rank will send `DONE_RECVING_MSG` to the prefill rank and the prefill
rank will free its KVCache blocks. If a prefill rank is pulled KVCache
more than one time by several decode ranks and it surely could happen in
complicated pcp/dcp parallelisms, it will cause the prefill rank free
its KVCache blocks for several times, which could cause memory issue.
This PR solve this issue by counting the times of prefill rank would be
pulled KVCache and in the last time, it will free the prefill rank
KVCache blocks. The related code is in Function of `run_busy_loop` in
Mooncake_connector.py
3. If a prefill rank is not pulled KVCache by any decode ranks, the
first rank in decode node will send "DONE_RECVING_MSG" to free its
blocks. The related code is in Function of
`_send_done_signal_to_free_remote_port` in Mooncake_connector.py

### How was this patch tested?
This PR is tested in many pcp/dcp parallelisms, and the accuracy are all
correct.
MLA model:
Prefill node:  TP8/DP2, Decode node: TP8/DP2
Prefill node:  TP8/PCP2/DCP8, Decode node: TP8/DP2
Prefill node:  TP8/PCP2/DCP8, Decode node: TP8/DCP4/DP2
Prefill node:  TP8/PCP2/DCP4, Decode node: TP4/DCP2/DP4
Prefill node:  TP8/PCP2/DCP2, Decode node: TP4/DCP4/DP4
Prefill node:  TP8/PCP2, Decode node: TP4/DCP2

GQA model:
Prefill node:  TP8/DP2, Decode node: TP8/DP2
Prefill node:  TP8/PCP2/DCP2, Decode node: TP8/DP2
Prefill node:  TP8/PCP2/DCP2, Decode node: TP8/DCP2/DP2
Prefill node:  TP8/PCP2/DCP2, Decode node: TP4/DP4
Prefill node:  TP16/DCP2/PCP1, Decode node: TP8/DCP2/DP2


- vLLM version: release/v0.13.0
- vLLM main:
vllm-project/vllm@ad32e3e
- Co-author by: Daishixun dsxtsteven@sina.com

---------

Signed-off-by: wangxiaochao <w00642655@china.huawei.com>
Co-authored-by: wangxiaochao <w00642655@china.huawei.com>
Co-authored-by: Jade Zheng <zheng.shoujian@outlook.com>
ZRJ026 pushed a commit to ZRJ026/vllm-ascend that referenced this pull request Feb 28, 2026
…#5224)

### What this PR does / why we need it?
1. This PR is proposed to support complicated pcp/dcp parallelisms in
Prefill and Decode nodes in Mooncake, such as Prefill: TP8/PCP2DCP8 and
Decode: TP8/DCP4/DP2, which is not supported now. We establish the link
mappings to transfer KVCache between prefill and decode nodes. The main
function is realized in Function of `_get_kv_split_metadata` in
Mooncake_connector.py
2. After a prefill rank is pulled KVCache by a decode rank, the decode
rank will send `DONE_RECVING_MSG` to the prefill rank and the prefill
rank will free its KVCache blocks. If a prefill rank is pulled KVCache
more than one time by several decode ranks and it surely could happen in
complicated pcp/dcp parallelisms, it will cause the prefill rank free
its KVCache blocks for several times, which could cause memory issue.
This PR solve this issue by counting the times of prefill rank would be
pulled KVCache and in the last time, it will free the prefill rank
KVCache blocks. The related code is in Function of `run_busy_loop` in
Mooncake_connector.py
3. If a prefill rank is not pulled KVCache by any decode ranks, the
first rank in decode node will send "DONE_RECVING_MSG" to free its
blocks. The related code is in Function of
`_send_done_signal_to_free_remote_port` in Mooncake_connector.py

### How was this patch tested?
This PR is tested in many pcp/dcp parallelisms, and the accuracy are all
correct.
MLA model:
Prefill node:  TP8/DP2, Decode node: TP8/DP2
Prefill node:  TP8/PCP2/DCP8, Decode node: TP8/DP2
Prefill node:  TP8/PCP2/DCP8, Decode node: TP8/DCP4/DP2
Prefill node:  TP8/PCP2/DCP4, Decode node: TP4/DCP2/DP4
Prefill node:  TP8/PCP2/DCP2, Decode node: TP4/DCP4/DP4
Prefill node:  TP8/PCP2, Decode node: TP4/DCP2

GQA model:
Prefill node:  TP8/DP2, Decode node: TP8/DP2
Prefill node:  TP8/PCP2/DCP2, Decode node: TP8/DP2
Prefill node:  TP8/PCP2/DCP2, Decode node: TP8/DCP2/DP2
Prefill node:  TP8/PCP2/DCP2, Decode node: TP4/DP4
Prefill node:  TP16/DCP2/PCP1, Decode node: TP8/DCP2/DP2

- vLLM version: release/v0.13.0
- vLLM main:
vllm-project/vllm@ad32e3e
- Co-author by: Daishixun dsxtsteven@sina.com

---------

Signed-off-by: wangxiaochao <w00642655@china.huawei.com>
Co-authored-by: wangxiaochao <w00642655@china.huawei.com>
Co-authored-by: Jade Zheng <zheng.shoujian@outlook.com>
Signed-off-by: zrj026 <zhangrunjiang026@gmail.com>
maoxx241 pushed a commit to maoxx241/vllm-ascend that referenced this pull request Mar 2, 2026
…#5224)

### What this PR does / why we need it?
1. This PR is proposed to support complicated pcp/dcp parallelisms in
Prefill and Decode nodes in Mooncake, such as Prefill: TP8/PCP2DCP8 and
Decode: TP8/DCP4/DP2, which is not supported now. We establish the link
mappings to transfer KVCache between prefill and decode nodes. The main
function is realized in Function of `_get_kv_split_metadata` in
Mooncake_connector.py
2. After a prefill rank is pulled KVCache by a decode rank, the decode
rank will send `DONE_RECVING_MSG` to the prefill rank and the prefill
rank will free its KVCache blocks. If a prefill rank is pulled KVCache
more than one time by several decode ranks and it surely could happen in
complicated pcp/dcp parallelisms, it will cause the prefill rank free
its KVCache blocks for several times, which could cause memory issue.
This PR solve this issue by counting the times of prefill rank would be
pulled KVCache and in the last time, it will free the prefill rank
KVCache blocks. The related code is in Function of `run_busy_loop` in
Mooncake_connector.py
3. If a prefill rank is not pulled KVCache by any decode ranks, the
first rank in decode node will send "DONE_RECVING_MSG" to free its
blocks. The related code is in Function of
`_send_done_signal_to_free_remote_port` in Mooncake_connector.py

### How was this patch tested?
This PR is tested in many pcp/dcp parallelisms, and the accuracy are all
correct.
MLA model:
Prefill node:  TP8/DP2, Decode node: TP8/DP2
Prefill node:  TP8/PCP2/DCP8, Decode node: TP8/DP2
Prefill node:  TP8/PCP2/DCP8, Decode node: TP8/DCP4/DP2
Prefill node:  TP8/PCP2/DCP4, Decode node: TP4/DCP2/DP4
Prefill node:  TP8/PCP2/DCP2, Decode node: TP4/DCP4/DP4
Prefill node:  TP8/PCP2, Decode node: TP4/DCP2

GQA model:
Prefill node:  TP8/DP2, Decode node: TP8/DP2
Prefill node:  TP8/PCP2/DCP2, Decode node: TP8/DP2
Prefill node:  TP8/PCP2/DCP2, Decode node: TP8/DCP2/DP2
Prefill node:  TP8/PCP2/DCP2, Decode node: TP4/DP4
Prefill node:  TP16/DCP2/PCP1, Decode node: TP8/DCP2/DP2


- vLLM version: release/v0.13.0
- vLLM main:
vllm-project/vllm@ad32e3e
- Co-author by: Daishixun dsxtsteven@sina.com

---------

Signed-off-by: wangxiaochao <w00642655@china.huawei.com>
Co-authored-by: wangxiaochao <w00642655@china.huawei.com>
Co-authored-by: Jade Zheng <zheng.shoujian@outlook.com>
ZRJ026 pushed a commit to ZRJ026/vllm-ascend that referenced this pull request Mar 4, 2026
…#5224)

### What this PR does / why we need it?
1. This PR is proposed to support complicated pcp/dcp parallelisms in
Prefill and Decode nodes in Mooncake, such as Prefill: TP8/PCP2DCP8 and
Decode: TP8/DCP4/DP2, which is not supported now. We establish the link
mappings to transfer KVCache between prefill and decode nodes. The main
function is realized in Function of `_get_kv_split_metadata` in
Mooncake_connector.py
2. After a prefill rank is pulled KVCache by a decode rank, the decode
rank will send `DONE_RECVING_MSG` to the prefill rank and the prefill
rank will free its KVCache blocks. If a prefill rank is pulled KVCache
more than one time by several decode ranks and it surely could happen in
complicated pcp/dcp parallelisms, it will cause the prefill rank free
its KVCache blocks for several times, which could cause memory issue.
This PR solve this issue by counting the times of prefill rank would be
pulled KVCache and in the last time, it will free the prefill rank
KVCache blocks. The related code is in Function of `run_busy_loop` in
Mooncake_connector.py
3. If a prefill rank is not pulled KVCache by any decode ranks, the
first rank in decode node will send "DONE_RECVING_MSG" to free its
blocks. The related code is in Function of
`_send_done_signal_to_free_remote_port` in Mooncake_connector.py

### How was this patch tested?
This PR is tested in many pcp/dcp parallelisms, and the accuracy are all
correct.
MLA model:
Prefill node:  TP8/DP2, Decode node: TP8/DP2
Prefill node:  TP8/PCP2/DCP8, Decode node: TP8/DP2
Prefill node:  TP8/PCP2/DCP8, Decode node: TP8/DCP4/DP2
Prefill node:  TP8/PCP2/DCP4, Decode node: TP4/DCP2/DP4
Prefill node:  TP8/PCP2/DCP2, Decode node: TP4/DCP4/DP4
Prefill node:  TP8/PCP2, Decode node: TP4/DCP2

GQA model:
Prefill node:  TP8/DP2, Decode node: TP8/DP2
Prefill node:  TP8/PCP2/DCP2, Decode node: TP8/DP2
Prefill node:  TP8/PCP2/DCP2, Decode node: TP8/DCP2/DP2
Prefill node:  TP8/PCP2/DCP2, Decode node: TP4/DP4
Prefill node:  TP16/DCP2/PCP1, Decode node: TP8/DCP2/DP2

- vLLM version: release/v0.13.0
- vLLM main:
vllm-project/vllm@ad32e3e
- Co-author by: Daishixun dsxtsteven@sina.com

---------

Signed-off-by: wangxiaochao <w00642655@china.huawei.com>
Co-authored-by: wangxiaochao <w00642655@china.huawei.com>
Co-authored-by: Jade Zheng <zheng.shoujian@outlook.com>
Signed-off-by: zrj026 <zhangrunjiang026@gmail.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

ready read for review ready-for-test start test by label for PR

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants