
[Bugfix][Async][Connector] avoid vllm-side double free during async scheduling + request abort + async KV cache transfer#33377

Merged
DarkLight1337 merged 14 commits into vllm-project:main from
KuntaiDu:kuntai-fix-double-free
Feb 3, 2026

Conversation

@KuntaiDu
Collaborator

@KuntaiDu KuntaiDu commented Jan 29, 2026

Purpose

This PR fixes a vLLM-side double-free bug that occurs with the combination of async scheduling, KV cache transfer, and request abort.

Bug description: when a request is aborted, the same request may enter get_finished twice and then enter _free_blocks twice, resulting in a double free.

Reason: the existing logic handles request abort by setting the request to None and then skipping None requests. However, with async KV cache transfer the request is not set to None because it is still finalizing the KV cache transfer, so we also need request.is_finished() to detect this type of request.
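The fixed check can be sketched as follows. This is a minimal illustration with simplified stand-ins for vLLM's request and scheduler types (`Request` and `should_skip` here are hypothetical names, not the actual implementation):

```python
# Minimal sketch of the fixed skip condition. `Request` is a
# simplified stand-in for vLLM's request type, not the real class.
class Request:
    def __init__(self, req_id: str, finished: bool = False):
        self.req_id = req_id
        self._finished = finished

    def is_finished(self) -> bool:
        return self._finished


def should_skip(request) -> bool:
    # Old check: only `request is None`. With async KV cache transfer,
    # an aborted request stays in the request table (not None) while
    # its transfer finalizes, so it must also be skipped once finished.
    return request is None or request.is_finished()


requests = {"a": Request("a"), "b": Request("b", finished=True)}
assert should_skip(requests.get("missing"))  # aborted and already removed
assert should_skip(requests.get("b"))        # aborted, KV transfer pending
assert not should_skip(requests.get("a"))    # still running: process it
```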

Test Plan

End-to-end testing with lm-eval

Test Result

The lm-eval can normally finish with async scheduling.


Essential Elements of an Effective PR Description Checklist
  • The purpose of the PR, such as "Fix some issue (link existing issues this PR will resolve)".
  • The test plan, such as providing test command.
  • The test results, such as pasting the results comparison before and after, or e2e results
  • (Optional) The necessary documentation update, such as updating supported_models.md and examples for a new model.
  • (Optional) Release notes update. If your change is user facing, please update the release notes draft in the Google Doc.

Signed-off-by: KuntaiDu <kuntai@uchicago.edu>
…tionality

Signed-off-by: KuntaiDu <kuntai@uchicago.edu>
@KuntaiDu KuntaiDu changed the title [Bugfix][Connector][LMCache] avoid [Bugfix][Async][LMCache] avoid vllm-side double free during async scheduling + LMCache Jan 29, 2026
@mergify mergify bot added bug Something isn't working kv-connector labels Jan 29, 2026
Contributor

@gemini-code-assist gemini-code-assist bot left a comment


Code Review

This pull request introduces a bugfix for the LMCache multi-process connector to prevent potential double-free and memory leak issues, particularly with asynchronous scheduling. The changes involve tracking new requests per step. While the overall logic is sound and includes backward compatibility checks in some places, it misses these checks in others, which could lead to runtime errors with older versions of the lmcache library. I've identified two critical areas where adding hasattr guards would improve robustness and prevent potential crashes.

@mergify

mergify bot commented Jan 30, 2026

Hi @KuntaiDu, the pre-commit checks have failed. Please run:

uv pip install pre-commit
pre-commit install
pre-commit run --all-files

Then, commit the changes and push to your branch.

For future commits, pre-commit will run automatically on changed files before each commit.

Tip

Is mypy or markdownlint failing?
mypy and markdownlint are run differently in CI. If the failure is related to either of these checks, please use the following commands to run them locally:
# For mypy (substitute "3.10" with the failing version if needed)
pre-commit run --hook-stage manual mypy-3.10
# For markdownlint
pre-commit run --hook-stage manual markdownlint

@njhill
Member

njhill commented Jan 30, 2026

We should also check whether it's expected/reasonable that get_finished can be called twice...

Signed-off-by: KuntaiDu <kuntai@uchicago.edu>
@KuntaiDu
Collaborator Author

We should also check whether it's expected/reasonable that get_finished can be called twice...

This behavior is only triggered when async scheduling is enabled, so I suspect it is a race-condition-related issue. I'll spend some time on this.


…ust simply check scheduleroutput new requests

Signed-off-by: KuntaiDu <kuntai@uchicago.edu>

@orozery
Collaborator

orozery commented Jan 30, 2026

@njhill Speaking of aborted requests with an async connector, there is a scheduler bug in the handling of aborted requests that are being async-loaded by a connector.
#32255 aims to fix that.
It is probably unrelated to what @KuntaiDu is witnessing, though, as the request is not freed twice in that bug case.

@KuntaiDu
Collaborator Author

KuntaiDu commented Jan 30, 2026

I have found the bug @njhill @orozery (cc @ApostaC )

in scheduler.py

        for req_id, num_tokens_scheduled in num_scheduled_tokens.items():
            assert num_tokens_scheduled > 0
            if failed_kv_load_req_ids and req_id in failed_kv_load_req_ids:
                # skip failed or rescheduled requests from KV load failure
                continue
            request = self.requests.get(req_id)
            if request is None:
                # ^ This check alone is not sufficient for connectors:
                # an aborted request still stays in the queue while its
                # async KV transfer finalizes, so it should be
                # `request is None or request.is_finished()`.
                #
                # The request is already finished. This can happen if the
                # request is aborted while the model is executing it (e.g.,
                # in pipeline parallelism).
                continue

I have shipped the fix.

…ead of just checking it is None

Signed-off-by: KuntaiDu <kuntai@uchicago.edu>
Signed-off-by: KuntaiDu <kuntai@uchicago.edu>
@KuntaiDu KuntaiDu changed the title [Bugfix][Async][LMCache] avoid vllm-side double free during async scheduling + LMCache [Bugfix][Async][LMCache] avoid vllm-side double free during async scheduling + request abort Jan 30, 2026
@KuntaiDu KuntaiDu changed the title [Bugfix][Async][LMCache] avoid vllm-side double free during async scheduling + request abort [Bugfix][Async][Connector] avoid vllm-side double free during async scheduling + request abort + async KV cache transfer Jan 30, 2026
Signed-off-by: KuntaiDu <kuntai@uchicago.edu>
Signed-off-by: KuntaiDu <kuntai@uchicago.edu>
@ApostaC
Collaborator

ApostaC commented Jan 30, 2026

We should also check whether it's expected/reasonable that get_finished can be called twice...

I feel like it's better to ensure that every request is sent via get_finished exactly once (or at most once). Otherwise the connector may need to do extra deduplication, maintaining an extra data structure and incurring some overhead. What do you guys think?
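If deduplication were pushed into the connector instead, it would need roughly this kind of guard. This is a hypothetical sketch; `DedupConnector` and its method signature are illustrative names, not vLLM's actual connector API:

```python
# Hypothetical sketch of connector-side deduplication of finished
# request IDs; names are illustrative, not vLLM's actual API.
class DedupConnector:
    def __init__(self) -> None:
        # Extra data structure the connector would have to maintain.
        self._already_reported: set[str] = set()

    def get_finished(self, finished_req_ids: set[str]) -> set[str]:
        # Report each request ID at most once, even if the scheduler
        # hands us the same request twice.
        fresh = finished_req_ids - self._already_reported
        self._already_reported |= fresh
        return fresh


conn = DedupConnector()
assert conn.get_finished({"r1", "r2"}) == {"r1", "r2"}
assert conn.get_finished({"r1", "r3"}) == {"r3"}  # "r1" suppressed
```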

@KuntaiDu KuntaiDu added the ready ONLY add when PR is ready to merge/full CI is needed label Jan 30, 2026
Collaborator

@orozery orozery left a comment


Thanks @KuntaiDu for spotting this bug!

Looks like it was introduced in #29987, which added a new flow for aborting requests between Scheduler.schedule and Scheduler.update_from_output.
BTW, I don't see how this bug relates to async scheduling.

I was about to suggest adding a test_scheduler.py unit test.
However, I see that test_scheduler.py is completely lacking tests exercising the delay_free_blocks (i.e., async saves) path, so there is some groundwork to do beforehand.

cc @njhill

@DarkLight1337 DarkLight1337 merged commit fbb3cf6 into vllm-project:main Feb 3, 2026
41 of 42 checks passed
PiratePai pushed a commit to PiratePai/epd_shm that referenced this pull request Feb 3, 2026
…cheduling + request abort + async KV cache transfer (vllm-project#33377)

Signed-off-by: KuntaiDu <kuntai@uchicago.edu>
Signed-off-by: Pai <416932041@qq.com>
@njhill
Member

njhill commented Feb 3, 2026

Thanks @KuntaiDu @orozery @ApostaC for the fix and analysis, the final fix also LGTM!

I think the reason we didn't see this in other cases (e.g. the NIXL connector) is that we return False (i.e. don't async save/send) if the finish reason is ABORTED. But it seems reasonable in the general case that a connector may want to async-save aborted requests.
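The behavior described above can be sketched roughly as follows; `FinishReason` and `request_finished` here are simplified stand-ins for the actual connector interface, not its real signatures:

```python
from enum import Enum, auto


class FinishReason(Enum):
    # Simplified stand-in for vLLM's finish-reason values.
    STOPPED = auto()
    LENGTH = auto()
    ABORTED = auto()


def request_finished(finish_reason: FinishReason) -> bool:
    # Sketch only: returning False means "no async save/send pending",
    # so the scheduler frees the request's blocks immediately and the
    # request never re-enters get_finished. A connector that declines
    # to async-save aborted requests would take this branch on abort.
    return finish_reason is not FinishReason.ABORTED


assert request_finished(FinishReason.STOPPED) is True
assert request_finished(FinishReason.ABORTED) is False
```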

gameofdimension pushed a commit to gameofdimension/vllm that referenced this pull request Feb 5, 2026
…cheduling + request abort + async KV cache transfer (vllm-project#33377)

Signed-off-by: KuntaiDu <kuntai@uchicago.edu>
Signed-off-by: felix01.yu <felix01.yu@vipshop.com>
ItzDEXX pushed a commit to ItzDEXX/vllm that referenced this pull request Feb 19, 2026
…cheduling + request abort + async KV cache transfer (vllm-project#33377)

Signed-off-by: KuntaiDu <kuntai@uchicago.edu>

Labels

bug Something isn't working kv-connector ready ONLY add when PR is ready to merge/full CI is needed v1
