Skip to content

fix(mooncake): address review comments on HBM leak fix#1

Merged
machov merged 461 commits intofix/vllm-35943-mooncake-hbm-leakfrom
copilot/fix-mooncake-hbm-leak-temp
Mar 10, 2026
Merged

fix(mooncake): address review comments on HBM leak fix#1
machov merged 461 commits intofix/vllm-35943-mooncake-hbm-leakfrom
copilot/fix-mooncake-hbm-leak-temp

Conversation

Copy link

Copilot AI commented Mar 10, 2026

Three correctness issues in the MooncakeConnectorWorker race-condition fixes from the original HBM leak PR.

Changes

  • Consolidate reqs_to_recv mutations into receiver_loop: Move self.reqs_to_recv.update() from start_load_kv() (main thread) into _start_load_kv() (runs in receiver_loop). All reads and writes to reqs_to_recv now happen in a single event loop — no cross-thread race, no defensive copies needed.

  • Remove list() copies in fetch_finished_recving_reqs: With reqs_to_recv access confined to receiver_loop, the copies guarding against concurrent iteration are unnecessary.

  • Fix silent drop of err_reqs on ERROR response: process_pulling_result() was called after the ERROR early-return, so failed requests never reached finished_recving_reqs and were never cleaned up by the scheduler.

# Before: err_reqs silently dropped
if response.status == MooncakeXferResponseStatus.ERROR:
    return                            # process_pulling_result never called
self.process_pulling_result(response, pull_metas)

# After: err_reqs always forwarded to finished_recving_reqs
self.process_pulling_result(response, pull_metas)
if response.status == MooncakeXferResponseStatus.ERROR:
    return

Warning

Firewall rules blocked me from connecting to one or more addresses (expand for details)

I tried to connect to the following addresses, but was blocked by firewall rules:

  • https://api.github.com/repos/machov/vllm
    • Triggering command: /usr/bin/curl curl -si -H Authorization: token ****** REDACTED (http block)
  • https://api.github.com/repos/machov/vllm/git/refs/heads/copilot%2Ffix-mooncake-hbm-leak-temp
    • Triggering command: /usr/bin/curl curl -s -X PATCH -H Authorization: token ****** -H Accept: application/vnd.github.v3+json REDACTED -d {"sha": "b786e59514ecddb92c51a4891f42e2f62a05e476", "force": true} (http block)
    • Triggering command: /usr/bin/curl curl -sv -X PATCH -H Authorization: token ****** -H Accept: application/vnd.github.v3+json -H Content-Type: application/json REDACTED -d {"sha": "b786e59514ecddb92c51a4891f42e2f62a05e476", "force": true} (http block)

If you need me to access, download, or install something from one of these locations, you can either:


✨ Let Copilot coding agent set things up for you — coding agent works faster and does higher quality work when set up for your repo.

rasmith and others added 27 commits March 10, 2026 22:54
…ica kernels execute on same device as tensor (vllm-project#34985)

Signed-off-by: Randall Smith <Randall.Smith@amd.com>
…project#34270)

Signed-off-by: Elizabeth Thomas <email2eliza@gmail.com>
Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
Co-authored-by: Michael Goin <mgoin64@gmail.com>
… tests (vllm-project#35265)

Signed-off-by: Andreas Karatzas <akaratza@amd.com>
Signed-off-by: Andreas Karatzas <akaratza@amd.com>
…roject#29941)

Signed-off-by: Ming Yang <minos.future@gmail.com>
Signed-off-by: Michael Goin <mgoin64@gmail.com>
Co-authored-by: Michael Goin <mgoin64@gmail.com>
Signed-off-by: Yongye Zhu <zyy1102000@gmail.com>
…ect#33807)

Signed-off-by: mgoin <mgoin64@gmail.com>
Co-authored-by: Robert Shaw <114415538+robertgshaw2-redhat@users.noreply.github.com>
…35338)

Signed-off-by: Lucas Wilkinson <lwilkins@redhat.com>
…consumer NVIDIA GPUs (vllm-project#33992)

Signed-off-by: Seungmin Kim <8457324+ehfd@users.noreply.github.com>
Signed-off-by: Andrew Mello <19512127+88plug@users.noreply.github.com>
Co-authored-by: 88plug <19512127+88plug@users.noreply.github.com>
Co-authored-by: Michael Goin <mgoin64@gmail.com>
Signed-off-by: LopezCastroRoberto <rocastro@redhat.com>
Signed-off-by: hujiaxin <524446785@qq.com>
Signed-off-by: Emilie1001 <79921183+Emilie1001@users.noreply.github.com>
Co-authored-by: Emilie1001 <79921183+Emilie1001@users.noreply.github.com>
Co-authored-by: Isotr0py <mozf@mail2.sysu.edu.cn>
…t#34109)

Signed-off-by: hjjq <50634613+hjjq@users.noreply.github.com>
Signed-off-by: wzhao18 <wzhao18.sz@gmail.com>
Co-authored-by: wzhao18 <wzhao18.sz@gmail.com>
Co-authored-by: Wei Zhao <51183510+wzhao18@users.noreply.github.com>
Signed-off-by: Woosuk Kwon <woosuk@inferact.ai>
…ect#34890)

Signed-off-by: Fadi Arafeh <fadi.arafeh@arm.com>
Co-authored-by: Tyler Michael Smith <tlrmchlsmth@gmail.com>
…roject#28672)

Signed-off-by: jasonlizhengjian <jasonlizhengjian@gmail.com>
Signed-off-by: Jason Li <jasonlizhengjian@gmail.com>
Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
Co-authored-by: Luka Govedič <ProExpertProg@users.noreply.github.com>
vllm-project#34887)

Signed-off-by: Daniele Trifirò <dtrifiro@redhat.com>
Co-authored-by: Nicolò Lucchesi <nlucches@redhat.com>
…rmony.py (vllm-project#35339)

Signed-off-by: sfeng33 <4florafeng@gmail.com>
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
…IGC_ForceOCLSIMDWidth=16` (vllm-project#35298)

Signed-off-by: Ofir Zafrir <ofir.zafrir@intel.com>
Co-authored-by: Kunshang Ji <kunshang.ji@intel.com>
Signed-off-by: chzhang <chaojun.zhang@intel.com>
Co-authored-by: Kunshang Ji <kunshang.ji@intel.com>
… before compilation config init (vllm-project#34848)

Signed-off-by: Gregory Shtrasberg <Gregory.Shtrasberg@amd.com>
Signed-off-by: zjy0516 <riverclouds.zhu@qq.com>
…oject#35220)

Signed-off-by: Sophie du Couédic <sop@zurich.ibm.com>
Co-authored-by: Cyrus Leung <cyrus.tl.leung@gmail.com>
…4le (vllm-project#35081)

Signed-off-by: Akash kaothalkar <akash.kaothalkar@ibm.com>
Co-authored-by: Akash kaothalkar <akash.kaothalkar@ibm.com>
ZhongsJie and others added 20 commits March 10, 2026 22:55
…35835)

Signed-off-by: ZhongsJie <zhongsjie@gmail.com>
Co-authored-by: Chauncey <chaunceyjiang@gmail.com>
…ithout penalties (vllm-project#35654)

Signed-off-by: kkt-cohere <komal@cohere.com>
…oject#35539)

Signed-off-by: Netanel Haber <58652339+netanel-haber@users.noreply.github.com>
Signed-off-by: Andrii Skliar <askliar@nvidia.com>
Signed-off-by: Lucas Wilkinson <lwilkins@redhat.com>
Signed-off-by: Matthew Bonanni <mbonanni@redhat.com>
Signed-off-by: Lucas Wilkinson <LucasWilkinson@users.noreply.github.com>
Signed-off-by: wangxiyuan <wangxiyuan1007@gmail.com>
Signed-off-by: Andrii <askliar@nvidia.com>
Co-authored-by: Netanel Haber <58652339+netanel-haber@users.noreply.github.com>
Co-authored-by: Andrii Skliar <askliar@oci-nrt-cs-001-vscode-01.cm.cluster>
Co-authored-by: Andrii <askliar@nvidia.com>
Co-authored-by: root <root@pool0-03748.cm.cluster>
Co-authored-by: Roger Wang <hey@rogerw.io>
Co-authored-by: root <root@pool0-02416.cm.cluster>
Co-authored-by: Lucas Wilkinson <LucasWilkinson@users.noreply.github.com>
Co-authored-by: Matthew Bonanni <mbonanni@redhat.com>
Co-authored-by: Tyler Michael Smith <tyler@neuralmagic.com>
Co-authored-by: wangxiyuan <wangxiyuan1007@gmail.com>
Co-authored-by: root <root@pool0-04880.cm.cluster>
Signed-off-by: Joe Runde <Joseph.Runde@ibm.com>
Signed-off-by: Travis Johnson <tsjohnso@us.ibm.com>
Signed-off-by: Nick Hill <nickhill123@gmail.com>
Co-authored-by: Travis Johnson <tsjohnso@us.ibm.com>
Co-authored-by: Nick Hill <nickhill123@gmail.com>
…lm-project#35869)

Signed-off-by: Nathan Price <nathan@abridge.com>
Signed-off-by: Isotr0py <mozf@mail2.sysu.edu.cn>
Co-authored-by: Isotr0py <mozf@mail2.sysu.edu.cn>
…mpty_cache` (vllm-project#30681)

Signed-off-by: Kunshang Ji <kunshang.ji@intel.com>
Signed-off-by: Kunshang Ji <jikunshang95@gmail.com>
Co-authored-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
Signed-off-by: Kunshang Ji <kunshang.ji@intel.com>
…m-project#34094) (vllm-project#34571)

Signed-off-by: haosdent <haosdent@gmail.com>
Co-authored-by: zjy0516 <riverclouds.zhu@qq.com>
Signed-off-by: Thomas Pouget-Abadie <thomaspou@microsoft.com>
Signed-off-by: pougetat <thomas.pougetabadie@gmail.com>
Co-authored-by: Thomas Pouget-Abadie <thomaspou@microsoft.com>
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
Signed-off-by: Taneem Ibrahim <taneem.ibrahim@gmail.com>
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
Co-authored-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
…cks in LRU/ARC `prepare_store()` (vllm-project#35846)

Signed-off-by: Ronen Schaffer <ronen.schaffer@ibm.com>
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
…uests

Fixes vllm-project#35943

When KV transfer fails (due to P timeout, crash, or network interruption),
D-side requests remained stuck in WAITING_FOR_REMOTE_KVS state permanently,
causing progressive HBM leak that eventually stalls the decode engine.

Root cause:
1. PullReqMeta.expire_time was never set (unlike P-side SendBlockMeta.expire_time)
2. Transfer errors were silently dropped in process_pulling_result()
3. ZMQ timeout exceptions in receive_kv_from_single_worker() didn't notify scheduler

Solution:
- Set PullReqMeta.expire_time in receive_kv() method (mirrors P-side pattern)
- Add timeout checking in fetch_finished_recving_reqs() to handle expired requests
- Handle err_reqs in process_pulling_result() by adding to finished_recving_reqs
- Handle ZMQ timeout exceptions by marking requests as finished for cleanup

This ensures stuck requests are properly cleaned up and their KV cache blocks
are released, preventing the progressive memory leak.

Tested: Syntax validation passes, no breaking API changes
Signed-off-by: machov <mv1742@nyu.edu>
- Fix critical race condition in fetch_finished_recving_reqs that could cause
  RuntimeError due to concurrent dictionary modification during iteration
- Fix race condition in receive_kv where expire_time was being set on all
  requests to the same value, potentially causing premature timeouts

Addresses code review feedback from gemini-code-assist on PR vllm-project#36014
- Move self.reqs_to_recv.update() into _start_load_kv() so all
  reqs_to_recv operations happen in the same receiver_loop coroutine,
  eliminating race conditions (dtcccc comment at line 1272)
- Remove unnecessary list() copies in fetch_finished_recving_reqs since
  all access to reqs_to_recv is now in the same event loop (dtcccc
  comment at line 984)
- Call process_pulling_result() before ERROR status check so that err_reqs
  are always added to finished_recving_reqs for scheduler cleanup (dtcccc
  comment at line 1111)
- Simplify expired request deletion (no defensive check needed in same loop)

Co-authored-by: machov <43248948+machov@users.noreply.github.com>
Copilot AI changed the title [WIP] Fix issues in mooncake HBM leak PR fix(mooncake): address review feedback on HBM leak fix Mar 10, 2026
@machov machov changed the base branch from main to fix/vllm-35943-mooncake-hbm-leak March 10, 2026 23:00
Copilot AI and others added 2 commits March 10, 2026 23:04
- Move self.reqs_to_recv.update() into _start_load_kv() so all
  reqs_to_recv operations happen in the same receiver_loop, no race
- Remove unnecessary list() copies in fetch_finished_recving_reqs
  (no need to copy after moving, per reviewer comment)
- Call process_pulling_result() before ERROR status check so err_reqs
  are always added to finished_recving_reqs for scheduler cleanup

Co-authored-by: machov <43248948+machov@users.noreply.github.com>
Copilot AI changed the title fix(mooncake): address review feedback on HBM leak fix fix(mooncake): address review comments on HBM leak fix Mar 10, 2026
@machov machov marked this pull request as ready for review March 10, 2026 23:42
@machov machov merged commit 9ddee97 into fix/vllm-35943-mooncake-hbm-leak Mar 10, 2026
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.