fix(mooncake): address review comments on HBM leak fix#1
Merged
machov merged 461 commits intofix/vllm-35943-mooncake-hbm-leakfrom Mar 10, 2026
Merged
Conversation
…ica kernels execute on same device as tensor (vllm-project#34985) Signed-off-by: Randall Smith <Randall.Smith@amd.com>
…project#34270) Signed-off-by: Elizabeth Thomas <email2eliza@gmail.com> Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com> Co-authored-by: Michael Goin <mgoin64@gmail.com>
… tests (vllm-project#35265) Signed-off-by: Andreas Karatzas <akaratza@amd.com>
Signed-off-by: Andreas Karatzas <akaratza@amd.com>
…roject#29941) Signed-off-by: Ming Yang <minos.future@gmail.com> Signed-off-by: Michael Goin <mgoin64@gmail.com> Co-authored-by: Michael Goin <mgoin64@gmail.com>
…lm-project#34936) Signed-off-by: mgoin <mgoin64@gmail.com>
Signed-off-by: Yongye Zhu <zyy1102000@gmail.com>
…ect#33807) Signed-off-by: mgoin <mgoin64@gmail.com> Co-authored-by: Robert Shaw <114415538+robertgshaw2-redhat@users.noreply.github.com>
…35338) Signed-off-by: Lucas Wilkinson <lwilkins@redhat.com>
…consumer NVIDIA GPUs (vllm-project#33992) Signed-off-by: Seungmin Kim <8457324+ehfd@users.noreply.github.com> Signed-off-by: Andrew Mello <19512127+88plug@users.noreply.github.com> Co-authored-by: 88plug <19512127+88plug@users.noreply.github.com> Co-authored-by: Michael Goin <mgoin64@gmail.com>
Signed-off-by: LopezCastroRoberto <rocastro@redhat.com>
Signed-off-by: hujiaxin <524446785@qq.com> Signed-off-by: Emilie1001 <79921183+Emilie1001@users.noreply.github.com> Co-authored-by: Emilie1001 <79921183+Emilie1001@users.noreply.github.com> Co-authored-by: Isotr0py <mozf@mail2.sysu.edu.cn>
…t#34109) Signed-off-by: hjjq <50634613+hjjq@users.noreply.github.com> Signed-off-by: wzhao18 <wzhao18.sz@gmail.com> Co-authored-by: wzhao18 <wzhao18.sz@gmail.com> Co-authored-by: Wei Zhao <51183510+wzhao18@users.noreply.github.com>
Signed-off-by: Woosuk Kwon <woosuk@inferact.ai>
…ect#34890) Signed-off-by: Fadi Arafeh <fadi.arafeh@arm.com> Co-authored-by: Tyler Michael Smith <tlrmchlsmth@gmail.com>
…roject#28672) Signed-off-by: jasonlizhengjian <jasonlizhengjian@gmail.com> Signed-off-by: Jason Li <jasonlizhengjian@gmail.com> Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com> Co-authored-by: Luka Govedič <ProExpertProg@users.noreply.github.com>
vllm-project#34887) Signed-off-by: Daniele Trifirò <dtrifiro@redhat.com> Co-authored-by: Nicolò Lucchesi <nlucches@redhat.com>
…rmony.py (vllm-project#35339) Signed-off-by: sfeng33 <4florafeng@gmail.com>
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
…IGC_ForceOCLSIMDWidth=16` (vllm-project#35298) Signed-off-by: Ofir Zafrir <ofir.zafrir@intel.com> Co-authored-by: Kunshang Ji <kunshang.ji@intel.com>
Signed-off-by: chzhang <chaojun.zhang@intel.com> Co-authored-by: Kunshang Ji <kunshang.ji@intel.com>
… before compilation config init (vllm-project#34848) Signed-off-by: Gregory Shtrasberg <Gregory.Shtrasberg@amd.com>
vllm-project#35250) Signed-off-by: c0de128 <kevin.mckay@outlook.com>
…t#35283) Signed-off-by: KrxGu <krishom70@gmail.com>
Signed-off-by: zjy0516 <riverclouds.zhu@qq.com>
…oject#35220) Signed-off-by: Sophie du Couédic <sop@zurich.ibm.com> Co-authored-by: Cyrus Leung <cyrus.tl.leung@gmail.com>
…4le (vllm-project#35081) Signed-off-by: Akash kaothalkar <akash.kaothalkar@ibm.com> Co-authored-by: Akash kaothalkar <akash.kaothalkar@ibm.com>
…35835) Signed-off-by: ZhongsJie <zhongsjie@gmail.com> Co-authored-by: Chauncey <chaunceyjiang@gmail.com>
Signed-off-by: damaozi <1811866786@qq.com>
…ithout penalties (vllm-project#35654) Signed-off-by: kkt-cohere <komal@cohere.com>
…oject#35539) Signed-off-by: Netanel Haber <58652339+netanel-haber@users.noreply.github.com> Signed-off-by: Andrii Skliar <askliar@nvidia.com> Signed-off-by: Lucas Wilkinson <lwilkins@redhat.com> Signed-off-by: Matthew Bonanni <mbonanni@redhat.com> Signed-off-by: Lucas Wilkinson <LucasWilkinson@users.noreply.github.com> Signed-off-by: wangxiyuan <wangxiyuan1007@gmail.com> Signed-off-by: Andrii <askliar@nvidia.com> Co-authored-by: Netanel Haber <58652339+netanel-haber@users.noreply.github.com> Co-authored-by: Andrii Skliar <askliar@oci-nrt-cs-001-vscode-01.cm.cluster> Co-authored-by: Andrii <askliar@nvidia.com> Co-authored-by: root <root@pool0-03748.cm.cluster> Co-authored-by: Roger Wang <hey@rogerw.io> Co-authored-by: root <root@pool0-02416.cm.cluster> Co-authored-by: Lucas Wilkinson <LucasWilkinson@users.noreply.github.com> Co-authored-by: Matthew Bonanni <mbonanni@redhat.com> Co-authored-by: Tyler Michael Smith <tyler@neuralmagic.com> Co-authored-by: wangxiyuan <wangxiyuan1007@gmail.com> Co-authored-by: root <root@pool0-04880.cm.cluster>
Signed-off-by: Joe Runde <Joseph.Runde@ibm.com> Signed-off-by: Travis Johnson <tsjohnso@us.ibm.com> Signed-off-by: Nick Hill <nickhill123@gmail.com> Co-authored-by: Travis Johnson <tsjohnso@us.ibm.com> Co-authored-by: Nick Hill <nickhill123@gmail.com>
…lm-project#35869) Signed-off-by: Nathan Price <nathan@abridge.com> Signed-off-by: Isotr0py <mozf@mail2.sysu.edu.cn> Co-authored-by: Isotr0py <mozf@mail2.sysu.edu.cn>
…nsupported (vllm-project#35893) Signed-off-by: Li <chuali@amd.com>
…mpty_cache` (vllm-project#30681) Signed-off-by: Kunshang Ji <kunshang.ji@intel.com> Signed-off-by: Kunshang Ji <jikunshang95@gmail.com> Co-authored-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
Signed-off-by: Kunshang Ji <kunshang.ji@intel.com>
…m-project#34094) (vllm-project#34571) Signed-off-by: haosdent <haosdent@gmail.com> Co-authored-by: zjy0516 <riverclouds.zhu@qq.com>
Signed-off-by: Thomas Pouget-Abadie <thomaspou@microsoft.com> Signed-off-by: pougetat <thomas.pougetabadie@gmail.com> Co-authored-by: Thomas Pouget-Abadie <thomaspou@microsoft.com> Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
Signed-off-by: Taneem Ibrahim <taneem.ibrahim@gmail.com> Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com> Co-authored-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
…cks in LRU/ARC `prepare_store()` (vllm-project#35846) Signed-off-by: Ronen Schaffer <ronen.schaffer@ibm.com>
…m-project#35656) Signed-off-by: raghavan <oneraghavan@gmail.com>
…ucer (vllm-project#34783) Signed-off-by: Qi Wang <qiwa@nvidia.com>
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
…uests Fixes vllm-project#35943 When KV transfer fails (due to P timeout, crash, or network interruption), D-side requests remained stuck in WAITING_FOR_REMOTE_KVS state permanently, causing progressive HBM leak that eventually stalls the decode engine. Root cause: 1. PullReqMeta.expire_time was never set (unlike P-side SendBlockMeta.expire_time) 2. Transfer errors were silently dropped in process_pulling_result() 3. ZMQ timeout exceptions in receive_kv_from_single_worker() didn't notify scheduler Solution: - Set PullReqMeta.expire_time in receive_kv() method (mirrors P-side pattern) - Add timeout checking in fetch_finished_recving_reqs() to handle expired requests - Handle err_reqs in process_pulling_result() by adding to finished_recving_reqs - Handle ZMQ timeout exceptions by marking requests as finished for cleanup This ensures stuck requests are properly cleaned up and their KV cache blocks are released, preventing the progressive memory leak. Tested: Syntax validation passes, no breaking API changes Signed-off-by: machov <mv1742@nyu.edu>
Signed-off-by: machov <mv1742@nyu.edu>
- Fix critical race condition in fetch_finished_recving_reqs that could cause RuntimeError due to concurrent dictionary modification during iteration - Fix race condition in receive_kv where expire_time was being set on all requests to the same value, potentially causing premature timeouts Addresses code review feedback from gemini-code-assist on PR vllm-project#36014
- Move self.reqs_to_recv.update() into _start_load_kv() so all reqs_to_recv operations happen in the same receiver_loop coroutine, eliminating race conditions (dtcccc comment at line 1272) - Remove unnecessary list() copies in fetch_finished_recving_reqs since all access to reqs_to_recv is now in the same event loop (dtcccc comment at line 984) - Call process_pulling_result() before ERROR status check so that err_reqs are always added to finished_recving_reqs for scheduler cleanup (dtcccc comment at line 1111) - Simplify expired request deletion (no defensive check needed in same loop) Co-authored-by: machov <43248948+machov@users.noreply.github.com>
Copilot
AI
changed the title
[WIP] Fix issues in mooncake HBM leak PR
fix(mooncake): address review feedback on HBM leak fix
Mar 10, 2026
- Move self.reqs_to_recv.update() into _start_load_kv() so all reqs_to_recv operations happen in the same receiver_loop, no race - Remove unnecessary list() copies in fetch_finished_recving_reqs (no need to copy after moving, per reviewer comment) - Call process_pulling_result() before ERROR status check so err_reqs are always added to finished_recving_reqs for scheduler cleanup Co-authored-by: machov <43248948+machov@users.noreply.github.com>
…mp' into copilot/fix-mooncake-hbm-leak-temp
Copilot
AI
changed the title
fix(mooncake): address review feedback on HBM leak fix
fix(mooncake): address review comments on HBM leak fix
Mar 10, 2026
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Three correctness issues in the
MooncakeConnectorWorkerrace-condition fixes from the original HBM leak PR.Changes
Consolidate
reqs_to_recvmutations intoreceiver_loop: Moveself.reqs_to_recv.update()fromstart_load_kv()(main thread) into_start_load_kv()(runs inreceiver_loop). All reads and writes toreqs_to_recvnow happen in a single event loop — no cross-thread race, no defensive copies needed.Remove
list()copies infetch_finished_recving_reqs: Withreqs_to_recvaccess confined toreceiver_loop, the copies guarding against concurrent iteration are unnecessary.Fix silent drop of
err_reqson ERROR response:process_pulling_result()was called after theERRORearly-return, so failed requests never reachedfinished_recving_reqsand were never cleaned up by the scheduler.Warning
Firewall rules blocked me from connecting to one or more addresses (expand for details)
I tried to connect to the following addresses, but was blocked by firewall rules:
https://api.github.com/repos/machov/vllm/usr/bin/curl curl -si -H Authorization: token ****** REDACTED(http block)https://api.github.com/repos/machov/vllm/git/refs/heads/copilot%2Ffix-mooncake-hbm-leak-temp/usr/bin/curl curl -s -X PATCH -H Authorization: token ****** -H Accept: application/vnd.github.v3+json REDACTED -d {"sha": "b786e59514ecddb92c51a4891f42e2f62a05e476", "force": true}(http block)/usr/bin/curl curl -sv -X PATCH -H Authorization: token ****** -H Accept: application/vnd.github.v3+json -H Content-Type: application/json REDACTED -d {"sha": "b786e59514ecddb92c51a4891f42e2f62a05e476", "force": true}(http block)If you need me to access, download, or install something from one of these locations, you can either:
✨ Let Copilot coding agent set things up for you — coding agent works faster and does higher quality work when set up for your repo.