[Bugfix][Async][Connector] avoid vllm-side double free during async scheduling + request abort + async KV cache transfer#33377
Conversation
Signed-off-by: KuntaiDu <kuntai@uchicago.edu>
…tionality Signed-off-by: KuntaiDu <kuntai@uchicago.edu>
Code Review
This pull request introduces a bugfix for the LMCache multi-process connector to prevent potential double-free and memory leak issues, particularly with asynchronous scheduling. The changes involve tracking new requests per step. While the overall logic is sound and includes backward compatibility checks in some places, it misses these checks in others, which could lead to runtime errors with older versions of the lmcache library. I've identified two critical areas where adding hasattr guards would improve robustness and prevent potential crashes.
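The `hasattr` guard the review asks for can be sketched as follows. This is an illustrative pattern only, assuming hypothetical names: `lmcache_engine` and `track_new_requests` are stand-ins, not the actual lmcache API.

```python
# Hypothetical sketch of the backward-compatibility guard suggested in the
# review. `lmcache_engine` and `track_new_requests` are illustrative names,
# not the real lmcache interface.

def notify_new_requests(lmcache_engine, new_request_ids):
    # Older lmcache releases may not expose the per-step tracking hook,
    # so probe for it before calling to avoid an AttributeError at runtime.
    if hasattr(lmcache_engine, "track_new_requests"):
        lmcache_engine.track_new_requests(new_request_ids)
    # Otherwise fall back silently: older versions never tracked
    # per-step new requests, so there is nothing to do.
```

With this guard, running against an older lmcache version degrades gracefully instead of crashing.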
vllm/distributed/kv_transfer/kv_connector/v1/lmcache_mp_connector.py
Hi @KuntaiDu, the pre-commit checks have failed. Please run:
uv pip install pre-commit
pre-commit install
pre-commit run --all-files
Then, commit the changes and push to your branch.
We should also check whether it's expected/reasonable that
Signed-off-by: KuntaiDu <kuntai@uchicago.edu>
This behavior is only triggered when async scheduling is enabled; I guess it is a race-condition-related issue. I'll spend some time on this.
…ust simply check scheduleroutput new requests Signed-off-by: KuntaiDu <kuntai@uchicago.edu>
I have found the bug in scheduler.py @njhill @orozery (cc @ApostaC). I have shipped the fix.
…ead of just checking it is None Signed-off-by: KuntaiDu <kuntai@uchicago.edu>
Signed-off-by: KuntaiDu <kuntai@uchicago.edu>
Signed-off-by: KuntaiDu <kuntai@uchicago.edu>
…vllm into kuntai-fix-double-free
Signed-off-by: KuntaiDu <kuntai@uchicago.edu>
Signed-off-by: KuntaiDu <kuntai@uchicago.edu>
I feel like it's better to ensure that every request will be sent via
Thanks @KuntaiDu for spotting this bug!
Looks like it was introduced in #29987 which added a new flow for aborting requests in between Scheduler.schedule and Scheduler.update_from_output.
BTW I don't see how this bug relates to async scheduling.
I was about to suggest adding a test_scheduler.py unit test.
However, I see that test_scheduler.py is completely lacking tests exercising the delay_free_blocks (i.e. async saves) path, so there is some groundwork to do beforehand.
cc @njhill
…cheduling + request abort + async KV cache transfer (vllm-project#33377) Signed-off-by: KuntaiDu <kuntai@uchicago.edu> Signed-off-by: Pai <416932041@qq.com>
Thanks @KuntaiDu @orozery @ApostaC for the fix and analysis, the final fix also LGTM! I think why we didn't see this in other cases (e.g. NIXL connector) is that we return
…cheduling + request abort + async KV cache transfer (vllm-project#33377) Signed-off-by: KuntaiDu <kuntai@uchicago.edu> Signed-off-by: felix01.yu <felix01.yu@vipshop.com>
…cheduling + request abort + async KV cache transfer (vllm-project#33377) Signed-off-by: KuntaiDu <kuntai@uchicago.edu>
Purpose
This PR fixes a vLLM-side double-request-free bug that occurs with async scheduling + KV cache transfer + request abort.
Bug description: when a request is aborted, the same request may enter `get_finished` twice and enter `_free_blocks` twice, resulting in a double free.

Reason: the existing logic handles request abort by setting the request to `None` and then skipping `None` requests. However, with async KV cache transfer, the request won't be set to `None` because it is still finalizing the KV cache transfer; we need to use `request.is_finished()` to check for this type of request.

Test Plan
End-to-end testing with lm-eval
Test Result
lm-eval finishes normally with async scheduling enabled.
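The scenario described in the Purpose section can be sketched with a minimal, self-contained example. This is illustrative only, not vLLM's actual scheduler code; the class names and the `_free_blocks` bookkeeping are simplified stand-ins, but the guard (`req is None or req.is_finished()`) mirrors the fix described above.

```python
# Illustrative sketch, not vLLM's real scheduler. With async KV transfer,
# an aborted request is NOT set to None in the request table, so the skip
# check must also consult is_finished() to avoid freeing blocks twice.

class Request:
    def __init__(self, req_id):
        self.req_id = req_id
        self._finished = False

    def is_finished(self):
        return self._finished


class Scheduler:
    def __init__(self):
        self.requests = {}   # req_id -> Request (or None after a plain abort)
        self.freed = []      # records _free_blocks calls for demonstration

    def _free_blocks(self, req_id):
        # A double free would show up as the same req_id appearing twice.
        self.freed.append(req_id)

    def finish_request(self, req_id):
        req = self.requests.get(req_id)
        # Buggy version checked only `req is None`; an aborted request that
        # is still finalizing an async KV transfer stays in the table, so
        # the second entry into this path must be caught via is_finished().
        if req is None or req.is_finished():
            return
        req._finished = True
        self._free_blocks(req_id)
```

Calling `finish_request` twice for the same request (as happens when an abort races with the async-transfer completion) now frees the blocks exactly once.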
Essential Elements of an Effective PR Description Checklist
Update `supported_models.md` and `examples` for a new model.