
[WIP] Add support for prefix caching in mamba_kv_cache#3

Closed
Copilot wants to merge 1 commit into main from copilot/add-prefix-caching-support

Conversation


Copilot AI commented Mar 10, 2026

The cloud agent has started processing "I switched to cloud; please answer again" and will update this pull request as work progresses.

Original prompt

I switched to cloud; please answer again.

[Chronological Review: The conversation began with the user requesting to add support for prefix caching in the mamba_kv_cache. The user indicated a desire to understand the thought process behind this implementation before discussing it step by step. The user then provided attachments related to the vllm-ascend project, which is relevant to the task at hand. The user later mentioned a change to "云" (cloud), suggesting a shift in context or approach, but did not elaborate further.]

[Intent Mapping: The user explicitly requested to "增加对于mamba_kv_cache的prefix caching的支持" (add support for prefix caching in mamba_kv_cache) and asked to understand the thought process behind it before proceeding with a step-by-step discussion.]

[Technical Inventory: The conversation references the vllm-ascend repository, which is part of the vllm-project. The user is interested in the prefix caching feature related to mamba_kv_cache, indicating a focus on caching mechanisms within the project.]

[Code Archaeology: No specific files or code changes have been discussed yet, but the user has provided context about the repository and its relevance to the task.]

[Progress Assessment: The user has initiated the conversation by outlining their goal but has not yet begun the implementation or detailed discussion of the prefix caching feature.]

[Context Validation: The user’s request for understanding the thought process behind the prefix caching implementation is crucial for continuing the discussion effectively.]

[Recent Commands Analysis: The most recent commands involved the user providing attachments related to the vllm-ascend project. The specific commands were to attach the repository information, which includes the repository name, owner, and branch details. The user also indicated a change in context with "我改成云了" (I changed it to cloud), but did not provide further details.]

1. Conversation Overview:
  • Primary Objectives: The user requested to "增加对于mamba_kv_cache的prefix caching的支持" (add support for prefix caching in mamba_kv_cache) and to understand the thought process behind this implementation.
  • Session Context: The conversation began with the user outlining their goal, followed by providing repository attachments relevant to the task. The user later indicated a change in context but did not elaborate.
  • User Intent Evolution: The user's intent has remained focused on understanding and implementing prefix caching support, with a recent mention of a change to "云" (cloud).
2. Technical Foundation:
  • Repository: vllm-ascend
  • Owner: vllm-project
  • Current Branch: main
3. Codebase Status:
  • No specific files or code changes have been discussed yet, as the conversation is still in the planning phase.
4. Problem Resolution:
  • Issues Encountered: No technical problems have been identified yet.
  • Solutions Implemented: None at this stage.
  • Debugging Context: No ongoing troubleshooting efforts have been mentioned.
  • Lessons Learned: No insights have been documented yet.
5. Progress Tracking:
  • Completed Tasks: The user has articulated their goal and provided context.
  • Partially Complete Work: The discussion on the implementation of prefix caching is pending.
  • Validated Outcomes: No features or code have been confirmed working yet.
6. Active Work State:
  • Current Focus: The user is focused on understanding the implementation of prefix caching in mamba_kv_cache.
  • Recent Context: The user provided repository attachments and indicated a change in context.
  • Working Code: No specific code snippets have been discussed yet.
  • Immediate Context: The user was preparing to discuss the thought process behind prefix caching when the token budget was exceeded.
7. Recent Operations:
  • Last Agent Commands: The user attached information about the vllm-ascend repository.
  • Tool Results Summary: The attachment included details about the repository name, owner, and branch.
  • Pre-Summary State: The agent was actively discussing the user's request for prefix caching support when the token budget was exceeded.
  • Operation Context: The user's commands were executed to provide context for the upcoming discussion on prefix caching.
8. Continuation Plan:
  • Pending Task 1: Discuss the thought process behind implementing prefix caching in mamba_kv_cache.
  • Pending Task 2: Explore specific implementation steps and code changes needed for the prefix caching feature.
  • Priority Information: Understanding the thought process is critical for the next steps.
  • Next Action: Engage in a detailed discussion about the prefix caching implementation approach as requested by the user.
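To ground the pending discussion, here is a minimal sketch of why prefix caching for a Mamba-style layer differs from attention KV caching: Mamba keeps one fixed-size recurrent state per sequence, so prefix reuse means checkpointing that state at block boundaries and restoring the checkpoint for the longest cached block-aligned prefix. All class and method names below are hypothetical, not the vllm-ascend API.

```python
import hashlib

class MambaPrefixCache:
    """Toy prefix cache for a recurrent (Mamba-style) state.

    Checkpoints the state snapshot at block boundaries only, then
    restores the longest block-aligned cached prefix on lookup.
    """

    def __init__(self, block_size=16):
        self.block_size = block_size
        self._states = {}  # prefix hash -> recurrent state snapshot

    @staticmethod
    def _key(tokens):
        return hashlib.sha256(repr(tuple(tokens)).encode()).hexdigest()

    def save(self, tokens, state):
        # Only checkpoint at block boundaries (APC-style reuse granularity).
        if tokens and len(tokens) % self.block_size == 0:
            self._states[self._key(tokens)] = state

    def longest_prefix(self, tokens):
        # Scan block-aligned prefixes from longest to shortest.
        n = (len(tokens) // self.block_size) * self.block_size
        for end in range(n, 0, -self.block_size):
            state = self._states.get(self._key(tokens[:end]))
            if state is not None:
                return end, state  # resume after this many cached tokens
        return 0, None
```

Usage: after a cache hit, the model recomputes only the suffix tokens starting from the restored state, instead of reusing per-token KV blocks as an attention cache would.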

Created from VS Code



Copilot stopped work on behalf of lHrHenry233 due to an error March 10, 2026 09:03
@lHrHenry233 lHrHenry233 deleted the copilot/add-prefix-caching-support branch March 10, 2026 09:04
lHrHenry233 pushed a commit that referenced this pull request Apr 10, 2026
…(v3.1)

- Port upstream _causal_conv1d_fwd_kernel as NPU Triton kernel
  - Handles initial/final/intermediate conv state in-kernel
  - Supports APC block boundary state writes
  - NPU adaptations: removed .cache_modifier, kept debug_barrier
- Rewrite causal_conv1d_fn to dispatch to new Triton kernel
- Rewrite gdn.py conv1d path: split decode/prefill like upstream
  - Decode: causal_conv1d_update_npu with block params
  - Prefill: causal_conv1d_fn with APC params (new kernel)
- Fix SSM #6: _build_initial_state only zeros prefill sequences
- Fix SSM #7: _write_final_states adds slot >= 0 validation
- Fix SSM #8: _scatter_intermediate_states adds unaligned offset
- Update all 36 UTs to pass with new num_computed_tokens_all field
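The decode path described above processes one token at a time against a rolling convolution window. A plain Python sketch of that update step (the function name only loosely mirrors causal_conv1d_update_npu; all shapes and names are illustrative, not the NPU kernel signature):

```python
import numpy as np

def causal_conv1d_update_ref(x_t, conv_state, weight):
    """Single-token decode update for a causal depthwise conv1d.

    x_t:        (dim,) activations for the new token
    conv_state: (dim, width - 1) rolling window of previous inputs
    weight:     (dim, width) depthwise filter
    Returns (y_t, new_conv_state).
    """
    # Convolve the stored window plus the new token.
    window = np.concatenate([conv_state, x_t[:, None]], axis=1)  # (dim, width)
    y_t = np.sum(window * weight, axis=1)
    # Shift the window left by one and append the new token for the next step.
    new_state = np.concatenate([conv_state[:, 1:], x_t[:, None]], axis=1)
    return y_t, new_state
```

Each decode step thus reads and rewrites only the (dim, width-1) state, which is why the decode and prefill paths are dispatched separately.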

Alignment status vs upstream #26807:
  #1 conv1d prefill kernel:     FIXED (kernel ported)
  #3 causal_conv1d_fn params:   FIXED (rewritten)
  #4 intermediate conv state:   FIXED (kernel internal)
  #6 SSM zeroing scope:         FIXED
  #7 _write_final_states guard: FIXED
  #8 SSM scatter alignment:     FIXED
  #9 causal_conv1d_fn signature: FIXED
  #2 decode pre-copy:           KEEP (NPU needs it)
  #5 SSM decode index:          OK (correct approach)
  #10 conv layout hardcoded:    DEFERRED

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
