[DOC]Add Memcache Usage Guide #6476
Conversation
Signed-off-by: 房建伟 <fangjianwei@fangjianweideMacBook-Air.local>
Summary of Changes

Hello @DreamerLeader, I'm Gemini Code Assist! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed. This pull request introduces a new, detailed documentation section that guides users through integrating and using Memcache as a KV Pool backend. It covers installation, configuration, environment setup, and practical examples for several deployment scenarios, so users can effectively leverage Memcache for improved KV caching.
👋 Hi! Thank you for contributing to the vLLM Ascend project. The following points will speed up your PR merge:
If CI fails, you can run linting and testing checks locally according to Contributing and Testing.
Code Review
This pull request adds a comprehensive guide on using Memcache as a KV Pool backend to the documentation. The guide covers installation and configuration and provides detailed examples for both disaggregated and colocation scenarios. While this is a great addition, I found a couple of critical issues in the example scripts where the JSON configurations are invalid. These syntax errors will cause the example commands to fail and need to be fixed; a quick way to check the configs is sketched after the snippets below.
| "tp_size": 4 | ||
| } | ||
| } | ||
| } , |
| "tp_size": 8 | ||
| } | ||
| } | ||
| }, |
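One quick way to catch these syntax errors before running the examples is to pipe the embedded JSON through a validator. This is only a convenience sketch; the config string below is a stand-in, not the guide's actual config:

```bash
# Paste the JSON that the example passes to --kv-transfer-config or --additional_config
# between the single quotes; json.tool exits non-zero and reports the position of any syntax error.
echo '{"kv_connector": "MultiConnector", "kv_role": "kv_consumer"}' | python3 -m json.tool
```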
Signed-off-by: 房建伟 <fangjianwei@fangjianweideMacBook-Air.local>
```bash
--gpu-memory-utilization 0.9 \
--max-num_seqs 20 \
--no-enable-prefix-caching \
--additional_config='{"ascend_scheduler_config":{"enabled":false}, "enable_shared_expert_dp":false, "chunked_prefill_for_mla":true}' \
```
"chunked_prefill_for_mla":true is deprecated, we can remove this.
| "kv_role": "kv_consumer", | ||
| "kv_connector_extra_config":{ | ||
| "backend": "memcache", | ||
| "mooncake_rpc_port":"1" |
Try using `"lookup_rpc_port"`, since `"mooncake_rpc_port"` will be removed in the future.
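If the rename is applied, this hunk would presumably read as follows (same backend and port value; only the key changes):

```json
"kv_connector_extra_config": {
    "backend": "memcache",
    "lookup_rpc_port": "1"
}
```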
```bash
--max-num_seqs 28 \
--max-model-len 16384 \
--max-num-batched-tokens 16384 \
--additional_config='{"ascend_scheduler_config":{"enabled":false}, "enable_shared_expert_dp":false, "torchair_graph_config":{"enabled":false},"chunked_prefill_for_mla":true}' \
```
Consider removing `"torchair_graph_config": {"enabled": false}` and `"chunked_prefill_for_mla": true`, since both are deprecated.
```bash
--served-model-name dsv3 \
--trust-remote-code \
--enforce-eager \
-dp 2 \
```
Consider using `--data-parallel-size` and `--tensor-parallel-size` for better clarity.
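Spelled out, the line in question might look like the sketch below; the tensor-parallel value of 4 is only a placeholder, since the original snippet does not show it:

```bash
--data-parallel-size 2 \
--tensor-parallel-size 4 \
```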
```bash
--model=xxxxxxxxxxxxxxxx/DeepSeek \
--served-model-name dsv3 \
--trust-remote-code \
-dp 2 \
```
Same here: consider changing to `--data-parallel-size` and `--tensor-parallel-size`.
```bash
--gpu-memory-utilization 0.9 \
--max-num_seqs 20 \
--no-enable-prefix-caching \
--additional_config='{"ascend_scheduler_config":{"enabled":false}, "enable_shared_expert_dp":false, "chunked_prefill_for_mla":true}' \
```
"chunked_prefill_for_mla":true is no longer needed
| "kv_connector": "MultiConnector", | ||
| "kv_role": "kv_consumer", | ||
| "kv_connector_extra_config": { | ||
| "use_layerwise": false, |
move "use_layerwise": false into Ascend store connector config
| "kv_role": "kv_producer", | ||
| "kv_connector_extra_config":{ | ||
| "backend": "memcache", | ||
| "mooncake_rpc_port":"0" |
same here, "lookup_rpc_port"
```bash
--max-num_seqs 28 \
--speculative-config '{"num_speculative_tokens": 1, "method":"deepseek_mtp"}' \
--enable_expert_parallel \
--additional_config='{"ascend_scheduler_config":{"enabled":false}, "enable_shared_expert_dp":false, "torchair_graph_config":{"enabled": false},"chunked_prefill_for_mla":true}' \
```
remove "torchair_graph_config":{"enabled": false},"chunked_prefill_for_mla":true}'
| "kv_connector": "MultiConnector", | ||
| "kv_role": "kv_consumer", | ||
| "kv_connector_extra_config": { | ||
| "use_layerwise": false, |
move "use_layerwise" in kv pool config
Signed-off-by: 房建伟 <fangjianwei@fangjianweideMacBook-Air.local>
This pull request has conflicts; please resolve them before we can evaluate the pull request.
Signed-off-by: 房建伟 <fangjianwei@fangjianweideMacBook-Air.local>
LGTM
…to qwen3next_rebase * 'main' of https://github.com/vllm-project/vllm-ascend:
- [Feat] 310p support MoE W8A8 quantizaition (vllm-project#6641)
- [TEST]add a qwen3-30b acc case with mooncake mempool (vllm-project#6244)
- [MOE Refactor] Remove QuantType in prepare_finalize.py (vllm-project#6534)
- [EPLB] Avoiding eplb's dependency on a specified model (vllm-project#6528)
- [Doc][Misc] Restructure tutorial documentation (vllm-project#6501)
- implement batch invariant with ascendc (vllm-project#6590)
- [Refact]Refact MLA/SFA weight prefetch to consist with moe weight prefetch (vllm-project#6629)
- [Misc] upgrade to vllm main (vllm-project#6646)
- [main][Docs] Fix spelling errors across documentation (vllm-project#6649)
- [bugfix]Fix no attribute 'data' when MLAPO is enable (vllm-project#6601)
- [DOC]Add Memcache Usage Guide (vllm-project#6476)
- [main][bugfix] Fix spec acceptance rate problem in vllm_0.15.0 (vllm-project#6606)
- [Test][LoRA] Add e2e test for base model inference (vllm-project#6624)
- [refactor]Optimized the kvcache usage of Deepseek v3.2 (vllm-project#6610)
- [Feat](sfa,dcp) support dcp for sfa (vllm-project#6563)
- [BugFix] Add support for rotary_dim parameter when using partial rope in rotary_embedding (vllm-project#6581)
- [fix bug] fix tensor mismatch bug in sigmoid operate test case (vllm-project#6619)
- [Kernel]: Optimize DispatchFFNCombine performance (vllm-project#6468)
- [MISC] Clean up useless env USE_OPTIMIZED_MODEL (vllm-project#6618)
### What this PR does / why we need it?
### Does this PR introduce _any_ user-facing change?
### How was this patch tested?
- vLLM version: v0.14.1
- vLLM main: vllm-project/vllm@dc917cc
---------
Signed-off-by: 房建伟 <fangjianwei@fangjianweideMacBook-Air.local>
Co-authored-by: 房建伟 <fangjianwei@fangjianweideMacBook-Air.local>
Co-authored-by: Pz1116 <zpbzpb123123@gmail.com>