[0.13.0][Bugfix] Resolved memory deallocation failure in the pooling layer under re-computation workloads. #6056
Signed-off-by: fems14 <1804143737@qq.com>
Code Review
This pull request addresses a memory deallocation failure by introducing preempted_req_ids into AscendConnectorMetadata and refining how stored requests are tracked and decremented. The changes correctly propagate preempted_req_ids from the scheduler to the worker. However, there are type-safety and logic issues in how AscendConnectorMetadata is handled and how stored_requests is checked, which could lead to runtime errors or incorrect behavior.
```diff
     assert self.connector_worker is not None
     done_sending, done_recving = self.connector_worker.get_finished(
-        finished_req_ids)
+        finished_req_ids, self._get_connector_metadata())
```
The `self._get_connector_metadata()` method is inherited from KVConnectorBase_V1, but `get_finished` in KVPoolWorker now explicitly expects an AscendConnectorMetadata object, which carries preempted_req_ids. If the inherited `_get_connector_metadata()` returns a generic KVConnectorMetadata (which it likely does, since AscendConnectorMetadata is a specific implementation), accessing `meta.preempted_req_ids` in KVPoolWorker will raise an AttributeError. To ensure type safety and correct behavior, AscendStoreConnector should override `_get_connector_metadata` to explicitly return an AscendConnectorMetadata instance that contains the preempted_req_ids supplied by the scheduler.
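A minimal sketch of the override being suggested. The class and attribute names below mirror those discussed in the review but are stand-ins, not the actual vllm-ascend source:

```python
# Hypothetical sketch: these classes stand in for vLLM's KVConnectorMetadata
# hierarchy; names mirror the review discussion, not the real source tree.
from dataclasses import dataclass, field


@dataclass
class KVConnectorMetadata:
    """Generic base metadata (stand-in for the KVConnectorBase_V1 type)."""


@dataclass
class AscendConnectorMetadata(KVConnectorMetadata):
    # Request IDs preempted by the scheduler, consumed by the worker side.
    preempted_req_ids: set[str] = field(default_factory=set)


class AscendStoreConnector:
    def __init__(self) -> None:
        self._connector_metadata: KVConnectorMetadata = KVConnectorMetadata()

    def _get_connector_metadata(self) -> AscendConnectorMetadata:
        # The override narrows the return type so the worker can access
        # meta.preempted_req_ids without risking an AttributeError.
        meta = self._connector_metadata
        assert isinstance(meta, AscendConnectorMetadata), (
            "scheduler must bind AscendConnectorMetadata before the worker "
            "reads it")
        return meta
```

The `isinstance` assertion fails fast at the binding site rather than surfacing later as an AttributeError deep inside the worker's `get_finished` path.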
```python
if req_id not in self.stored_requests:
    self.request_queue.task_done()
    return
```
The check if req_id not in self.stored_requests: might not behave as expected. self.stored_requests is a defaultdict(int). If a request's count has been decremented to 0 by dec_stored_request but the key has not been explicitly removed by delete_finished_stored_request, the req_id will still be present in self.stored_requests (with a value of 0). In such a case, this condition would be False, and the method would proceed to process a request that has no pending blocks to store, potentially leading to incorrect behavior or resource issues. Consider checking if self.stored_requests.get(req_id, 0) <= 0: to correctly identify requests that are logically finished from the sending perspective.
```diff
-if req_id not in self.stored_requests:
-    self.request_queue.task_done()
-    return
+if self.stored_requests.get(req_id, 0) <= 0:
+    self.request_queue.task_done()
+    return
```
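The defaultdict pitfall described above can be demonstrated in isolation. This is a minimal illustration with a hypothetical `stored_requests` counter; the request IDs are made up:

```python
from collections import defaultdict

# Why `req_id not in stored_requests` is too weak a guard: after a count is
# decremented back to 0, the key remains present until explicitly deleted.
stored_requests: defaultdict[str, int] = defaultdict(int)

stored_requests["req-1"] += 1   # one block pending for req-1
stored_requests["req-1"] -= 1   # block sent; count is back to 0

# Membership still succeeds, so the original guard would NOT fire here,
# and the worker would keep processing a logically finished request.
assert "req-1" in stored_requests
assert stored_requests["req-1"] == 0

# The suggested guard treats both "drained to 0" and "never tracked" as done.
# Note: .get() does not insert a default entry, unlike defaultdict indexing.
assert stored_requests.get("req-1", 0) <= 0
assert stored_requests.get("req-2", 0) <= 0
assert "req-2" not in stored_requests
```

Using `.get(req_id, 0)` also avoids the side effect of `defaultdict.__getitem__`, which would silently insert a zero entry for an unknown request ID.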
…lm-ascend into FIA_v0.13.0

* 'releases/v0.13.0' of https://github.com/vllm-project/vllm-ascend:
  [0.13.0][Bugfix] Fix setting of `speculative_config.enforce_eager` for dsv32 (vllm-project#5958)
  [v0.13.0][Bugfix] Fix XliteModelRunner init failed when aclgraph is enabled (vllm-project#5887)
  [0.13.0][Bugfix] Fixed an problem related to embeddings sharing (vllm-project#5972)
  [Bugfix] Fixed precision issues caused by pooled request pooling (vllm-project#6057)
  [0.13.0][Bugfix] fix pcp aclgraph qwen FIA bug (vllm-project#6038)
  [0.13.0][cherry-pick][bugfix] fix bug of triton mrope (vllm-project#6009)
  【0.13.0】【bugfix】Resolved memory deallocation failure in the pooling layer under re-computation workloads. (vllm-project#6056)
What this PR does / why we need it?
[releases/v0.13.0] Resolved a double-free memory vulnerability in the pooling layer under re-computation scenarios.
PR for main branch: #6045
Does this PR introduce any user-facing change?
How was this patch tested?