[Bugfix][Async][Connector] avoid vllm-side double free during async scheduling + request abort + async KV cache transfer#33377
Conversation
Signed-off-by: KuntaiDu <kuntai@uchicago.edu>
…tionality Signed-off-by: KuntaiDu <kuntai@uchicago.edu>
Code Review
This pull request introduces a bugfix for the LMCache multi-process connector to prevent potential double-free and memory leak issues, particularly with asynchronous scheduling. The changes involve tracking new requests per step. While the overall logic is sound and includes backward compatibility checks in some places, it misses these checks in others, which could lead to runtime errors with older versions of the lmcache library. I've identified two critical areas where adding hasattr guards would improve robustness and prevent potential crashes.
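The `hasattr` guard the review asks for can be sketched as follows. This is an illustrative pattern only, assuming hypothetical names: `lmcache_engine` and `track_new_requests` are stand-ins, not the actual lmcache API.

```python
# Hypothetical sketch of the backward-compatibility guard suggested in the
# review. `lmcache_engine` and `track_new_requests` are illustrative names,
# not the real lmcache interface.

def notify_new_requests(lmcache_engine, new_request_ids):
    # Older lmcache releases may not expose the per-step tracking hook,
    # so probe for it before calling to avoid an AttributeError at runtime.
    if hasattr(lmcache_engine, "track_new_requests"):
        lmcache_engine.track_new_requests(new_request_ids)
    # Otherwise fall back silently: older versions never tracked
    # per-step new requests, so there is nothing to do.
```

With this guard, running against an older lmcache version degrades gracefully instead of crashing.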
vllm/distributed/kv_transfer/kv_connector/v1/lmcache_mp_connector.py
Hi @KuntaiDu, the pre-commit checks have failed. Please run:
uv pip install pre-commit
pre-commit install
pre-commit run --all-files
Then, commit the changes and push to your branch.
We should also check whether it's expected/reasonable that
Signed-off-by: KuntaiDu <kuntai@uchicago.edu>
This behavior is only triggered when async scheduling is enabled; I guess it is a race-condition-related issue. I'll spend some time on this.
…ust simply check scheduleroutput new requests Signed-off-by: KuntaiDu <kuntai@uchicago.edu>
I have found the bug in scheduler.py @njhill @orozery (cc @ApostaC). I have shipped the fix.
…ead of just checking it is None Signed-off-by: KuntaiDu <kuntai@uchicago.edu>
Signed-off-by: KuntaiDu <kuntai@uchicago.edu>
Signed-off-by: KuntaiDu <kuntai@uchicago.edu>
…vllm into kuntai-fix-double-free
Signed-off-by: KuntaiDu <kuntai@uchicago.edu>
Signed-off-by: KuntaiDu <kuntai@uchicago.edu>
I feel like it's better to ensure that every request will be sent via
Thanks @KuntaiDu for spotting this bug!
Looks like it was introduced in #29987 which added a new flow for aborting requests in between Scheduler.schedule and Scheduler.update_from_output.
BTW I don't see how this bug relates to async scheduling.
I was about to suggest adding a test_scheduler.py unit test.
However, I see that test_scheduler.py is completely lacking tests exercising the delay_free_blocks (i.e. async saves) path, so there is some groundwork to do beforehand.
cc @njhill
…cheduling + request abort + async KV cache transfer (vllm-project#33377) Signed-off-by: KuntaiDu <kuntai@uchicago.edu> Signed-off-by: Pai <416932041@qq.com>
Thanks @KuntaiDu @orozery @ApostaC for the fix and analysis, the final fix also LGTM! I think why we didn't see this in other cases (e.g. NIXL connector) is that we return
…cheduling + request abort + async KV cache transfer (vllm-project#33377) Signed-off-by: KuntaiDu <kuntai@uchicago.edu> Signed-off-by: felix01.yu <felix01.yu@vipshop.com>
…cheduling + request abort + async KV cache transfer (vllm-project#33377) Signed-off-by: KuntaiDu <kuntai@uchicago.edu>
Purpose
This PR fixes a vLLM-side double-request-free bug that occurs with async scheduling + KV cache transfer + request abort.
Bug description: when a request is aborted, the same request may enter `get_finished` twice and enter `_free_blocks` twice, resulting in a double free.

Reason: the existing logic handles request abort by setting the request to `None` and then skipping `None` requests. However, with async KV cache transfer, the request won't be set to `None` because it is still finalizing the KV cache transfer; we need to use `request.is_finished()` to check for this type of request.

Test Plan
End-to-end testing with lm-eval
Test Result
lm-eval finishes normally with async scheduling enabled.
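The scenario described in the Purpose section can be sketched with a minimal, self-contained example. This is illustrative only, not vLLM's actual scheduler code; the class names and the `_free_blocks` bookkeeping are simplified stand-ins, but the guard (`req is None or req.is_finished()`) mirrors the fix described above.

```python
# Illustrative sketch, not vLLM's real scheduler. With async KV transfer,
# an aborted request is NOT set to None in the request table, so the skip
# check must also consult is_finished() to avoid freeing blocks twice.

class Request:
    def __init__(self, req_id):
        self.req_id = req_id
        self._finished = False

    def is_finished(self):
        return self._finished


class Scheduler:
    def __init__(self):
        self.requests = {}   # req_id -> Request (or None after a plain abort)
        self.freed = []      # records _free_blocks calls for demonstration

    def _free_blocks(self, req_id):
        # A double free would show up as the same req_id appearing twice.
        self.freed.append(req_id)

    def finish_request(self, req_id):
        req = self.requests.get(req_id)
        # Buggy version checked only `req is None`; an aborted request that
        # is still finalizing an async KV transfer stays in the table, so
        # the second entry into this path must be caught via is_finished().
        if req is None or req.is_finished():
            return
        req._finished = True
        self._free_blocks(req_id)
```

Calling `finish_request` twice for the same request (as happens when an abort races with the async-transfer completion) now frees the blocks exactly once.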
Essential Elements of an Effective PR Description Checklist
Update `supported_models.md` and `examples` for a new model.