[0.13.0][Bugfix] Resolved memory deallocation failure in the pooling layer under re-computation workloads. #6056
Signed-off-by: fems14 <1804143737@qq.com>
Code Review
This pull request addresses a memory deallocation failure by introducing preempted_req_ids into AscendConnectorMetadata and refining how stored requests are tracked and decremented. The changes correctly propagate preempted_req_ids from the scheduler to the worker. However, there are type-safety and logic issues in how AscendConnectorMetadata is handled and how stored_requests is checked, which could lead to runtime errors or incorrect behavior.
```diff
     assert self.connector_worker is not None
     done_sending, done_recving = self.connector_worker.get_finished(
-        finished_req_ids)
+        finished_req_ids, self._get_connector_metadata())
```
The `self._get_connector_metadata()` method is inherited from KVConnectorBase_V1, but `get_finished` in KVPoolWorker now explicitly expects an AscendConnectorMetadata object, which carries preempted_req_ids. If the inherited `_get_connector_metadata()` returns a generic KVConnectorMetadata (which it likely does, since AscendConnectorMetadata is a specific implementation), accessing `meta.preempted_req_ids` in KVPoolWorker will raise an AttributeError. To ensure type safety and correct behavior, AscendStoreConnector should override `_get_connector_metadata` to explicitly return an AscendConnectorMetadata instance that contains the preempted_req_ids supplied by the scheduler.
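A minimal sketch of the override being suggested. The class and attribute names below mirror those discussed in the review but are stand-ins, not the actual vllm-ascend source:

```python
# Hypothetical sketch: these classes stand in for vLLM's KVConnectorMetadata
# hierarchy; names mirror the review discussion, not the real source tree.
from dataclasses import dataclass, field


@dataclass
class KVConnectorMetadata:
    """Generic base metadata (stand-in for the KVConnectorBase_V1 type)."""


@dataclass
class AscendConnectorMetadata(KVConnectorMetadata):
    # Request IDs preempted by the scheduler, consumed by the worker side.
    preempted_req_ids: set[str] = field(default_factory=set)


class AscendStoreConnector:
    def __init__(self) -> None:
        self._connector_metadata: KVConnectorMetadata = KVConnectorMetadata()

    def _get_connector_metadata(self) -> AscendConnectorMetadata:
        # The override narrows the return type so the worker can access
        # meta.preempted_req_ids without risking an AttributeError.
        meta = self._connector_metadata
        assert isinstance(meta, AscendConnectorMetadata), (
            "scheduler must bind AscendConnectorMetadata before the worker "
            "reads it")
        return meta
```

The `isinstance` assertion fails fast at the binding site rather than surfacing later as an AttributeError deep inside the worker's `get_finished` path.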
```python
if req_id not in self.stored_requests:
    self.request_queue.task_done()
    return
```
The check if req_id not in self.stored_requests: might not behave as expected. self.stored_requests is a defaultdict(int). If a request's count has been decremented to 0 by dec_stored_request but the key has not been explicitly removed by delete_finished_stored_request, the req_id will still be present in self.stored_requests (with a value of 0). In such a case, this condition would be False, and the method would proceed to process a request that has no pending blocks to store, potentially leading to incorrect behavior or resource issues. Consider checking if self.stored_requests.get(req_id, 0) <= 0: to correctly identify requests that are logically finished from the sending perspective.
```diff
-if req_id not in self.stored_requests:
-    self.request_queue.task_done()
-    return
+if self.stored_requests.get(req_id, 0) <= 0:
+    self.request_queue.task_done()
+    return
```
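The defaultdict pitfall described above can be demonstrated in isolation. This is a minimal illustration with a hypothetical `stored_requests` counter; the request IDs are made up:

```python
from collections import defaultdict

# Why `req_id not in stored_requests` is too weak a guard: after a count is
# decremented back to 0, the key remains present until explicitly deleted.
stored_requests: defaultdict[str, int] = defaultdict(int)

stored_requests["req-1"] += 1   # one block pending for req-1
stored_requests["req-1"] -= 1   # block sent; count is back to 0

# Membership still succeeds, so the original guard would NOT fire here,
# and the worker would keep processing a logically finished request.
assert "req-1" in stored_requests
assert stored_requests["req-1"] == 0

# The suggested guard treats both "drained to 0" and "never tracked" as done.
# Note: .get() does not insert a default entry, unlike defaultdict indexing.
assert stored_requests.get("req-1", 0) <= 0
assert stored_requests.get("req-2", 0) <= 0
assert "req-2" not in stored_requests
```

Using `.get(req_id, 0)` also avoids the side effect of `defaultdict.__getitem__`, which would silently insert a zero entry for an unknown request ID.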
…lm-ascend into FIA_v0.13.0

* 'releases/v0.13.0' of https://github.com/vllm-project/vllm-ascend:
  [0.13.0][Bugfix] Fix setting of `speculative_config.enforce_eager` for dsv32 (vllm-project#5958)
  [v0.13.0][Bugfix] Fix XliteModelRunner init failed when aclgraph is enabled (vllm-project#5887)
  [0.13.0][Bugfix] Fixed an problem related to embeddings sharing (vllm-project#5972)
  [Bugfix] Fixed precision issues caused by pooled request pooling (vllm-project#6057)
  [0.13.0][Bugfix] fix pcp aclgraph qwen FIA bug (vllm-project#6038)
  [0.13.0][cherry-pick][bugfix] fix bug of triton mrope (vllm-project#6009)
  【0.13.0】【bugfix】Resolved memory deallocation failure in the pooling layer under re-computation workloads. (vllm-project#6056)
What this PR does / why we need it?
[releases/v0.13.0] Resolved a double-free memory vulnerability in the pooling layer under re-computation scenarios.
PR for main branch: #6045
Does this PR introduce any user-facing change?
How was this patch tested?