【0.13.0】【bugfix】Resolved memory deallocation failure in the pooling layer under re-computation workloads. #6056

Merged
zzzzwwjj merged 1 commit into vllm-project:releases/v0.13.0 from fems14:0.13.0 on Jan 20, 2026
Conversation

@fems14 (Contributor) commented Jan 20, 2026

What this PR does / why we need it?

[releases/v0.13.0] Resolved a double-free memory vulnerability in the pooling layer under re-computation scenarios.
PR for main branch: #6045

Does this PR introduce any user-facing change?

How was this patch tested?

Signed-off-by: fems14 <1804143737@qq.com>
@fems14 fems14 changed the title 【main】【bugfix】Resolved memory deallocation failure in the pooling layer under re-computation workloads. 【0.13.0】【bugfix】Resolved memory deallocation failure in the pooling layer under re-computation workloads. Jan 20, 2026
@gemini-code-assist bot left a comment
Code Review

This pull request addresses a memory deallocation failure by introducing `preempted_req_ids` into `AscendConnectorMetadata` and refining how stored requests are tracked and decremented. The changes correctly propagate `preempted_req_ids` from the scheduler to the worker. However, there are critical type-safety and logic issues in how `AscendConnectorMetadata` is handled and how `stored_requests` is checked, which could lead to runtime errors or incorrect behavior.

```diff
 assert self.connector_worker is not None
 done_sending, done_recving = self.connector_worker.get_finished(
-    finished_req_ids)
+    finished_req_ids, self._get_connector_metadata())
```
critical

The `self._get_connector_metadata()` method is inherited from `KVConnectorBase_V1`. The `get_finished` method in `KVPoolWorker` now explicitly expects an `AscendConnectorMetadata` object, which includes `preempted_req_ids`. If the inherited `_get_connector_metadata()` returns a generic `KVConnectorMetadata` (which it likely does, as `AscendConnectorMetadata` is a specific implementation), accessing `meta.preempted_req_ids` in `KVPoolWorker` will raise an `AttributeError`. To ensure type safety and correct functionality, `AscendStoreConnector` should override `_get_connector_metadata` to explicitly return an `AscendConnectorMetadata` instance that contains the `preempted_req_ids` from the scheduler.
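A minimal sketch of the override the bot is suggesting. The class and field names below are stand-ins modeled on the names in this discussion, not the actual vllm-ascend definitions; the point is narrowing the return type so workers can safely read `preempted_req_ids`:

```python
from dataclasses import dataclass, field

# Hypothetical stand-ins for the classes named in the review.
@dataclass
class KVConnectorMetadata:
    """Generic base metadata type returned by the inherited accessor."""
    pass

@dataclass
class AscendConnectorMetadata(KVConnectorMetadata):
    """Ascend-specific metadata carrying the preempted request ids."""
    preempted_req_ids: set = field(default_factory=set)

class AscendStoreConnector:
    def __init__(self):
        self._connector_metadata: KVConnectorMetadata = KVConnectorMetadata()

    def bind_connector_metadata(self, metadata: KVConnectorMetadata) -> None:
        # In the real connector this is populated from the scheduler side.
        self._connector_metadata = metadata

    def _get_connector_metadata(self) -> AscendConnectorMetadata:
        # Narrow the type so callers can read preempted_req_ids safely;
        # fail fast if the wrong metadata class was bound.
        meta = self._connector_metadata
        assert isinstance(meta, AscendConnectorMetadata), type(meta)
        return meta
```

With this override, `get_finished` can access `meta.preempted_req_ids` without risking an `AttributeError` at runtime; a mis-bound generic metadata object fails the assertion immediately instead.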

Comment on lines +137 to +139
```python
if req_id not in self.stored_requests:
    self.request_queue.task_done()
    return
```
high

The check `if req_id not in self.stored_requests:` might not behave as expected. `self.stored_requests` is a `defaultdict(int)`. If a request's count has been decremented to 0 by `dec_stored_request` but the key has not been explicitly removed by `delete_finished_stored_request`, the `req_id` will still be present in `self.stored_requests` (with a value of 0). In that case the condition is `False`, and the method proceeds to process a request that has no pending blocks to store, potentially leading to incorrect behavior or resource issues. Consider checking `if self.stored_requests.get(req_id, 0) <= 0:` to correctly identify requests that are logically finished from the sending perspective.

Suggested change

```diff
-if req_id not in self.stored_requests:
+if self.stored_requests.get(req_id, 0) <= 0:
     self.request_queue.task_done()
     return
```
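The pitfall the reviewer describes can be reproduced in a few lines. This is a standalone illustration of `defaultdict(int)` semantics, using a toy `stored_requests` dict in place of the connector's field:

```python
from collections import defaultdict

# stored_requests counts pending blocks per request id.
stored_requests = defaultdict(int)

# Store two blocks for a request, then decrement both
# (mimicking dec_stored_request-style bookkeeping).
stored_requests["req-1"] += 2
stored_requests["req-1"] -= 2

# The key survives at value 0, so a bare membership test is misleading:
print("req-1" in stored_requests)            # True, even though the count is 0
print(stored_requests.get("req-1", 0) <= 0)  # True: correctly flags it as done
```

Because `in` only tests key presence, a request decremented to zero still passes the original guard; the `.get(req_id, 0) <= 0` form checks the count itself.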

@zzzzwwjj zzzzwwjj merged commit 3ddfbfe into vllm-project:releases/v0.13.0 Jan 20, 2026
4 of 5 checks passed
845473182 pushed a commit to 845473182/vllm-ascend that referenced this pull request Jan 21, 2026
…lm-ascend into FIA_v0.13.0

* 'releases/v0.13.0' of https://github.com/vllm-project/vllm-ascend:
  [0.13.0][Bugfix] Fix setting of `speculative_config.enforce_eager` for dsv32 (vllm-project#5958)
  [v0.13.0][Bugfix] Fix XliteModelRunner init failed when aclgraph is enabled (vllm-project#5887)
  [0.13.0][Bugfix] Fixed an problem related to embeddings sharing (vllm-project#5972)
  [Bugfix]Fixed precision issues caused by pooled request pooling (vllm-project#6057)
  [0.13.0][Bugfix] fix pcp aclgraph qwen FIA bug (vllm-project#6038)
  [0.13.0][cherry-pick][bugfix] fix bug of triton mrope (vllm-project#6009)
  【0.13.0】【bugfix】Resolved memory deallocation failure in the pooling layer under re-computation workloads. (vllm-project#6056)
starmountain1997 pushed a commit to starmountain1997/vllm-ascend that referenced this pull request Jan 31, 2026
…ayer under re-computation workloads. (vllm-project#6056)

### What this PR does / why we need it?
[releases/v0.13.0] Resolved a double-free memory vulnerability in the
pooling layer under re-computation scenarios.
PR for main branch:
vllm-project#6045

### Does this PR introduce _any_ user-facing change?

### How was this patch tested?

Signed-off-by: fems14 <1804143737@qq.com>
tangtiangu pushed a commit to tangtiangu/jiusi-vllm-ascend that referenced this pull request Feb 24, 2026
…ayer under re-computation workloads. (vllm-project#6056)

2 participants