[DOC]Add Memcache Usage Guide#6476

Merged
LCAIZJ merged 13 commits into vllm-project:main from DreamerLeader:0202
Feb 9, 2026

Conversation

@DreamerLeader
Contributor

@DreamerLeader DreamerLeader commented Feb 2, 2026

What this PR does / why we need it?

Does this PR introduce any user-facing change?

How was this patch tested?

Signed-off-by: 房建伟 <fangjianwei@fangjianweideMacBook-Air.local>
@github-actions github-actions Bot added the documentation (Improvements or additions to documentation) label Feb 2, 2026
@gemini-code-assist
Contributor

Summary of Changes

Hello @DreamerLeader, I'm Gemini Code Assist! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request introduces a new, detailed documentation section that guides users through the process of integrating and utilizing Memcache as a KV Pool backend. It provides essential information on installation, configuration, environment setup, and practical examples for various deployment scenarios, ensuring users can effectively leverage Memcache for improved KV caching.

Highlights

  • New Memcache Usage Guide: A comprehensive guide has been added detailing how to use Memcache as a KV Pool backend, covering installation, configuration, and operational procedures.
  • Detailed Configuration Instructions: The documentation now includes explicit steps for configuring Memcache's mmc-meta.conf and mmc-local.conf files, including TLS settings and key parameters like meta_service_url and local_service.dram.size.
  • Environment Variable Setup: Instructions for setting up necessary Memcache and MemFabric environment variables, such as MMC_META_CONFIG_PATH, are now provided.
  • Operational Examples for Scenarios: Concrete examples are given for running Memcache in both PD Disaggregation (prefill and decode nodes) and Colocation scenarios, tailored for different hardware series (800I A2/A3 and 800T A2/A3).
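The environment setup the guide describes can be sketched as a short shell snippet. The path below is a hypothetical placeholder (only the variable name MMC_META_CONFIG_PATH comes from the guide); substitute your actual install location:

```shell
# Hypothetical config path -- substitute your actual Memcache install location.
export MMC_META_CONFIG_PATH=/etc/memcache/mmc-meta.conf

# Sanity-check that the variable is set before launching the service.
if [ -n "$MMC_META_CONFIG_PATH" ]; then
    echo "meta config: $MMC_META_CONFIG_PATH"
fi
```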



@github-actions
Contributor

github-actions Bot commented Feb 2, 2026

👋 Hi! Thank you for contributing to the vLLM Ascend project. The following points will speed up your PR merge:

  • A PR should do only one thing; smaller PRs enable faster reviews.
  • Every PR should include unit tests and end-to-end tests to ensure it works and is not broken by future PRs.
  • Fill out the PR description and write a clear commit message to help reviewers and future developers understand the change.

If CI fails, you can run linting and testing checks locally according to Contributing and Testing.

Contributor

@gemini-code-assist gemini-code-assist Bot left a comment


Code Review

This pull request adds a comprehensive guide on using Memcache as a KV Pool backend in the documentation. The guide covers installation, configuration, and provides detailed examples for both disaggregated and colocation scenarios. While this is a great addition, I've found a couple of critical issues in the example scripts where the JSON configurations are invalid. These syntax errors will cause the example commands to fail and need to be fixed.

"tp_size": 4
}
}
} ,
Contributor


critical

This line has a trailing comma, which makes the JSON configuration invalid. This will cause the shell command to fail. Please remove the comma. I'd also recommend formatting the JSON for better readability.

Suggested change
} ,
}
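Since these configurations are passed inline to the shell command, one quick way to catch a trailing comma before launching is to pipe the payload through a JSON parser. A minimal sketch (the fragment is illustrative, not the full config):

```shell
# A trailing comma after the closing brace makes the payload invalid JSON.
printf '%s' '{"tp_size": 4},' | python3 -m json.tool >/dev/null 2>&1 || echo "invalid"

# With the comma removed, the same fragment parses cleanly.
printf '%s' '{"tp_size": 4}' | python3 -m json.tool >/dev/null 2>&1 && echo "valid"
```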

"tp_size": 8
}
}
},
Contributor


critical

This line has a trailing comma, which makes the JSON configuration invalid. This will cause the shell command to fail. Please remove the comma. I'd also recommend formatting the JSON for better readability.

Suggested change
},
}

Signed-off-by: 房建伟 <fangjianwei@fangjianweideMacBook-Air.local>
--gpu-memory-utilization 0.9 \
--max-num_seqs 20 \
--no-enable-prefix-caching \
--additional_config='{"ascend_scheduler_config":{"enabled":false}, "enable_shared_expert_dp":false, "chunked_prefill_for_mla":true}' \
Collaborator


"chunked_prefill_for_mla": true is deprecated; we can remove this.

"kv_role": "kv_consumer",
"kv_connector_extra_config":{
"backend": "memcache",
"mooncake_rpc_port":"1"
Collaborator


Try using "lookup_rpc_port", since "mooncake_rpc_port" will be removed in the future.
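Assuming "lookup_rpc_port" is a drop-in rename for "mooncake_rpc_port" (the reviewer does not spell out the full fragment), the updated config would look like:

```json
{
  "kv_role": "kv_consumer",
  "kv_connector_extra_config": {
    "backend": "memcache",
    "lookup_rpc_port": "1"
  }
}
```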

--max-num_seqs 28 \
--max-model-len 16384 \
--max-num-batched-tokens 16384 \
--additional_config='{"ascend_scheduler_config":{"enabled":false}, "enable_shared_expert_dp":false, "torchair_graph_config":{"enabled":false},"chunked_prefill_for_mla":true}' \
Collaborator


Consider removing "torchair_graph_config":{"enabled":false}, and "chunked_prefill_for_mla":true since both are deprecated.

--served-model-name dsv3 \
--trust-remote-code \
--enforce-eager \
-dp 2 \
Collaborator


Consider using --data-parallel-size and --tensor-parallel-size for better clarity.
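Spelled out with the long-form flags, the command fragment would read roughly as follows (the tensor-parallel value is illustrative, since only -dp 2 appears in the diff; the model path placeholder is kept as-is):

```shell
vllm serve xxxxxxxxxxxxxxxx/DeepSeek \
    --served-model-name dsv3 \
    --trust-remote-code \
    --enforce-eager \
    --data-parallel-size 2 \
    --tensor-parallel-size 4
```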

--model=xxxxxxxxxxxxxxxx/DeepSeek \
--served-model-name dsv3 \
--trust-remote-code \
-dp 2 \
Collaborator


Same here; consider changing to --data-parallel-size and --tensor-parallel-size.

--gpu-memory-utilization 0.9 \
--max-num_seqs 20 \
--no-enable-prefix-caching \
--additional_config='{"ascend_scheduler_config":{"enabled":false}, "enable_shared_expert_dp":false, "chunked_prefill_for_mla":true}' \
Collaborator


"chunked_prefill_for_mla":true is no longer needed

"kv_connector": "MultiConnector",
"kv_role": "kv_consumer",
"kv_connector_extra_config": {
"use_layerwise": false,
Collaborator


Move "use_layerwise": false into the Ascend store connector config.
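A sketch of the suggested restructuring. The "connectors" list is how MultiConnector nests per-connector configs in vLLM; the connector name "AscendStoreConnector" is a hypothetical placeholder here, since the actual name is not shown in the diff:

```json
{
  "kv_connector": "MultiConnector",
  "kv_role": "kv_consumer",
  "kv_connector_extra_config": {
    "connectors": [
      {
        "kv_connector": "AscendStoreConnector",
        "kv_connector_extra_config": {
          "use_layerwise": false
        }
      }
    ]
  }
}
```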

"kv_role": "kv_producer",
"kv_connector_extra_config":{
"backend": "memcache",
"mooncake_rpc_port":"0"
Collaborator


Same here: use "lookup_rpc_port".

--max-num_seqs 28 \
--speculative-config '{"num_speculative_tokens": 1, "method":"deepseek_mtp"}' \
--enable_expert_parallel \
--additional_config='{"ascend_scheduler_config":{"enabled":false}, "enable_shared_expert_dp":false, "torchair_graph_config":{"enabled": false},"chunked_prefill_for_mla":true}' \
Collaborator


Remove "torchair_graph_config":{"enabled": false} and "chunked_prefill_for_mla":true, since both are deprecated.

"kv_connector": "MultiConnector",
"kv_role": "kv_consumer",
"kv_connector_extra_config": {
"use_layerwise": false,
Collaborator


Move "use_layerwise" into the KV pool config.

Signed-off-by: 房建伟 <fangjianwei@fangjianweideMacBook-Air.local>
@DreamerLeader DreamerLeader requested a review from Pz1116 February 2, 2026 07:17
房建伟 and others added 3 commits February 2, 2026 15:19
Signed-off-by: 房建伟 <fangjianwei@fangjianweideMacBook-Air.local>
Signed-off-by: 房建伟 <fangjianwei@fangjianweideMacBook-Air.local>
Pz1116 and others added 2 commits February 3, 2026 09:25
Signed-off-by: 房建伟 <fangjianwei@fangjianweideMacBook-Air.local>
Signed-off-by: 房建伟 <fangjianwei@fangjianweideMacBook-Air.local>
@whx-sjtu whx-sjtu added the ready (read for review) label Feb 4, 2026
Collaborator

@Pz1116 Pz1116 left a comment


LGTM

@github-actions
Contributor

github-actions Bot commented Feb 4, 2026

This pull request has conflicts, please resolve those before we can evaluate the pull request.

Pz1116 and others added 4 commits February 4, 2026 17:49
Signed-off-by: 房建伟 <fangjianwei@fangjianweideMacBook-Air.local>
Signed-off-by: 房建伟 <fangjianwei@fangjianweideMacBook-Air.local>
Signed-off-by: 房建伟 <fangjianwei@fangjianweideMacBook-Air.local>
Signed-off-by: 房建伟 <fangjianwei@fangjianweideMacBook-Air.local>
@Pz1116
Collaborator

Pz1116 commented Feb 9, 2026

@LCAIZJ @wangxiyuan

@LCAIZJ
Collaborator

LCAIZJ commented Feb 9, 2026

LGTM

@LCAIZJ LCAIZJ merged commit 905f076 into vllm-project:main Feb 9, 2026
11 checks passed
845473182 pushed a commit to 845473182/vllm-ascend that referenced this pull request Feb 11, 2026
…to qwen3next_rebase

* 'main' of https://github.com/vllm-project/vllm-ascend:
  [Feat] 310p support MoE W8A8 quantizaition (vllm-project#6641)
  [TEST]add a qwen3-30b acc case with mooncake mempool (vllm-project#6244)
  [MOE Refactor] Remove QuantType in prepare_finalize.py (vllm-project#6534)
  [EPLB] Avoiding eplb's dependency on a specified model (vllm-project#6528)
  [Doc][Misc] Restructure tutorial documentation (vllm-project#6501)
  implement batch invariant with ascendc (vllm-project#6590)
  [Refact]Refact MLA/SFA weight prefetch to consist with moe weight prefetch (vllm-project#6629)
  [Misc] upgrade to vllm main (vllm-project#6646)
  [main][Docs] Fix spelling errors across documentation (vllm-project#6649)
  [bugfix]Fix no attribute 'data' when MLAPO is enable  (vllm-project#6601)
  [DOC]Add Memcache Usage Guide (vllm-project#6476)
  [main][bugfix] Fix spec acceptance rate problem in vllm_0.15.0 (vllm-project#6606)
  [Test][LoRA] Add e2e test for base model inference (vllm-project#6624)
  [refactor]Optimized the kvcache usage of Deepseek v3.2 (vllm-project#6610)
  [Feat](sfa,dcp) support dcp for sfa (vllm-project#6563)
  [BugFix] Add support for rotary_dim parameter when using partial rope in rotary_embedding (vllm-project#6581)
  [fix bug] fix tensor mismatch bug in sigmoid operate test case (vllm-project#6619)
  [Kernel]: Optimize DispatchFFNCombine performance (vllm-project#6468)
  [MISC] Clean up useless env USE_OPTIMIZED_MODEL (vllm-project#6618)
chenchuw886 pushed a commit to chenchuw886/vllm-ascend that referenced this pull request Feb 12, 2026
### What this PR does / why we need it?

### Does this PR introduce _any_ user-facing change?

### How was this patch tested?

- vLLM version: v0.14.1
- vLLM main:
vllm-project/vllm@dc917cc

---------

Signed-off-by: 房建伟 <fangjianwei@fangjianweideMacBook-Air.local>
Co-authored-by: 房建伟 <fangjianwei@fangjianweideMacBook-Air.local>
Co-authored-by: Pz1116 <zpbzpb123123@gmail.com>
Signed-off-by: momochenchuw <chenchuw@huawei.com>
@wangxiyuan wangxiyuan mentioned this pull request Feb 24, 2026
ZRJ026 pushed a commit to ZRJ026/vllm-ascend that referenced this pull request Feb 28, 2026
Signed-off-by: 房建伟 <fangjianwei@fangjianweideMacBook-Air.local>
Co-authored-by: 房建伟 <fangjianwei@fangjianweideMacBook-Air.local>
Co-authored-by: Pz1116 <zpbzpb123123@gmail.com>
Signed-off-by: zrj026 <zhangrunjiang026@gmail.com>
maoxx241 pushed a commit to maoxx241/vllm-ascend that referenced this pull request Mar 2, 2026
Signed-off-by: 房建伟 <fangjianwei@fangjianweideMacBook-Air.local>
Co-authored-by: 房建伟 <fangjianwei@fangjianweideMacBook-Air.local>
Co-authored-by: Pz1116 <zpbzpb123123@gmail.com>
ZRJ026 pushed a commit to ZRJ026/vllm-ascend that referenced this pull request Mar 4, 2026
Signed-off-by: 房建伟 <fangjianwei@fangjianweideMacBook-Air.local>
Co-authored-by: 房建伟 <fangjianwei@fangjianweideMacBook-Air.local>
Co-authored-by: Pz1116 <zpbzpb123123@gmail.com>
Signed-off-by: zrj026 <zhangrunjiang026@gmail.com>
LCAIZJ pushed a commit to LCAIZJ/vllm-ascend that referenced this pull request Mar 7, 2026
Signed-off-by: 房建伟 <fangjianwei@fangjianweideMacBook-Air.local>
Co-authored-by: 房建伟 <fangjianwei@fangjianweideMacBook-Air.local>
Co-authored-by: Pz1116 <zpbzpb123123@gmail.com>
@DreamerLeader DreamerLeader deleted the 0202 branch April 15, 2026 08:26

Labels

documentation (Improvements or additions to documentation), ready (read for review)

4 participants