fix(mooncake): address review comments on HBM leak fix by Copilot · Pull Request #1 · machov/vllm

Copilot · 2026-03-10T22:47:50Z

Three correctness issues in the MooncakeConnectorWorker race-condition fixes from the original HBM leak PR.

Changes

Consolidate reqs_to_recv mutations into receiver_loop: Move self.reqs_to_recv.update() from start_load_kv() (main thread) into _start_load_kv() (runs in receiver_loop). All reads and writes to reqs_to_recv now happen in a single event loop — no cross-thread race, no defensive copies needed.
Remove list() copies in fetch_finished_recving_reqs: With reqs_to_recv access confined to receiver_loop, the copies guarding against concurrent iteration are unnecessary.
Fix silent drop of err_reqs on ERROR response: process_pulling_result() was called after the ERROR early-return, so failed requests never reached finished_recving_reqs and were never cleaned up by the scheduler.

# Before: err_reqs silently dropped
if response.status == MooncakeXferResponseStatus.ERROR:
    return                            # process_pulling_result never called
self.process_pulling_result(response, pull_metas)

# After: err_reqs always forwarded to finished_recving_reqs
self.process_pulling_result(response, pull_metas)
if response.status == MooncakeXferResponseStatus.ERROR:
    return

Warning

Firewall rules blocked me from connecting to one or more addresses (expand for details)

I tried to connect to the following addresses, but was blocked by firewall rules:

https://api.github.com/repos/machov/vllm
- Triggering command: /usr/bin/curl curl -si -H Authorization: token ****** REDACTED (http block)
https://api.github.com/repos/machov/vllm/git/refs/heads/copilot%2Ffix-mooncake-hbm-leak-temp
- Triggering command: /usr/bin/curl curl -s -X PATCH -H Authorization: token ****** -H Accept: application/vnd.github.v3+json REDACTED -d {"sha": "b786e59514ecddb92c51a4891f42e2f62a05e476", "force": true} (http block)
- Triggering command: /usr/bin/curl curl -sv -X PATCH -H Authorization: token ****** -H Accept: application/vnd.github.v3+json -H Content-Type: application/json REDACTED -d {"sha": "b786e59514ecddb92c51a4891f42e2f62a05e476", "force": true} (http block)

If you need me to access, download, or install something from one of these locations, you can either:

Configure Actions setup steps to set up my environment, which run before the firewall is enabled
Add the appropriate URLs or hosts to the custom allowlist in this repository's Copilot coding agent settings (admins only)

✨ Let Copilot coding agent set things up for you — coding agent works faster and does higher quality work when set up for your repo.

…ica kernels execute on same device as tensor (vllm-project#34985) Signed-off-by: Randall Smith <Randall.Smith@amd.com>

…project#34270) Signed-off-by: Elizabeth Thomas <email2eliza@gmail.com> Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com> Co-authored-by: Michael Goin <mgoin64@gmail.com>

… tests (vllm-project#35265) Signed-off-by: Andreas Karatzas <akaratza@amd.com>

Signed-off-by: Andreas Karatzas <akaratza@amd.com>

…roject#29941) Signed-off-by: Ming Yang <minos.future@gmail.com> Signed-off-by: Michael Goin <mgoin64@gmail.com> Co-authored-by: Michael Goin <mgoin64@gmail.com>

…lm-project#34936) Signed-off-by: mgoin <mgoin64@gmail.com>

Signed-off-by: Yongye Zhu <zyy1102000@gmail.com>

…ect#33807) Signed-off-by: mgoin <mgoin64@gmail.com> Co-authored-by: Robert Shaw <114415538+robertgshaw2-redhat@users.noreply.github.com>

…35338) Signed-off-by: Lucas Wilkinson <lwilkins@redhat.com>

…consumer NVIDIA GPUs (vllm-project#33992) Signed-off-by: Seungmin Kim <8457324+ehfd@users.noreply.github.com> Signed-off-by: Andrew Mello <19512127+88plug@users.noreply.github.com> Co-authored-by: 88plug <19512127+88plug@users.noreply.github.com> Co-authored-by: Michael Goin <mgoin64@gmail.com>

Signed-off-by: LopezCastroRoberto <rocastro@redhat.com>

Signed-off-by: hujiaxin <524446785@qq.com> Signed-off-by: Emilie1001 <79921183+Emilie1001@users.noreply.github.com> Co-authored-by: Emilie1001 <79921183+Emilie1001@users.noreply.github.com> Co-authored-by: Isotr0py <mozf@mail2.sysu.edu.cn>

…t#34109) Signed-off-by: hjjq <50634613+hjjq@users.noreply.github.com> Signed-off-by: wzhao18 <wzhao18.sz@gmail.com> Co-authored-by: wzhao18 <wzhao18.sz@gmail.com> Co-authored-by: Wei Zhao <51183510+wzhao18@users.noreply.github.com>

Signed-off-by: Woosuk Kwon <woosuk@inferact.ai>

…ect#34890) Signed-off-by: Fadi Arafeh <fadi.arafeh@arm.com> Co-authored-by: Tyler Michael Smith <tlrmchlsmth@gmail.com>

…roject#28672) Signed-off-by: jasonlizhengjian <jasonlizhengjian@gmail.com> Signed-off-by: Jason Li <jasonlizhengjian@gmail.com> Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com> Co-authored-by: Luka Govedič <ProExpertProg@users.noreply.github.com>

vllm-project#34887) Signed-off-by: Daniele Trifirò <dtrifiro@redhat.com> Co-authored-by: Nicolò Lucchesi <nlucches@redhat.com>

…rmony.py (vllm-project#35339) Signed-off-by: sfeng33 <4florafeng@gmail.com>

Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>

…IGC_ForceOCLSIMDWidth=16` (vllm-project#35298) Signed-off-by: Ofir Zafrir <ofir.zafrir@intel.com> Co-authored-by: Kunshang Ji <kunshang.ji@intel.com>

Signed-off-by: chzhang <chaojun.zhang@intel.com> Co-authored-by: Kunshang Ji <kunshang.ji@intel.com>

… before compilation config init (vllm-project#34848) Signed-off-by: Gregory Shtrasberg <Gregory.Shtrasberg@amd.com>

vllm-project#35250) Signed-off-by: c0de128 <kevin.mckay@outlook.com>

…t#35283) Signed-off-by: KrxGu <krishom70@gmail.com>

Signed-off-by: zjy0516 <riverclouds.zhu@qq.com>

…oject#35220) Signed-off-by: Sophie du Couédic <sop@zurich.ibm.com> Co-authored-by: Cyrus Leung <cyrus.tl.leung@gmail.com>

…4le (vllm-project#35081) Signed-off-by: Akash kaothalkar <akash.kaothalkar@ibm.com> Co-authored-by: Akash kaothalkar <akash.kaothalkar@ibm.com>

…35835) Signed-off-by: ZhongsJie <zhongsjie@gmail.com> Co-authored-by: Chauncey <chaunceyjiang@gmail.com>

Signed-off-by: damaozi <1811866786@qq.com>

…ithout penalties (vllm-project#35654) Signed-off-by: kkt-cohere <komal@cohere.com>

…oject#35539) Signed-off-by: Netanel Haber <58652339+netanel-haber@users.noreply.github.com> Signed-off-by: Andrii Skliar <askliar@nvidia.com> Signed-off-by: Lucas Wilkinson <lwilkins@redhat.com> Signed-off-by: Matthew Bonanni <mbonanni@redhat.com> Signed-off-by: Lucas Wilkinson <LucasWilkinson@users.noreply.github.com> Signed-off-by: wangxiyuan <wangxiyuan1007@gmail.com> Signed-off-by: Andrii <askliar@nvidia.com> Co-authored-by: Netanel Haber <58652339+netanel-haber@users.noreply.github.com> Co-authored-by: Andrii Skliar <askliar@oci-nrt-cs-001-vscode-01.cm.cluster> Co-authored-by: Andrii <askliar@nvidia.com> Co-authored-by: root <root@pool0-03748.cm.cluster> Co-authored-by: Roger Wang <hey@rogerw.io> Co-authored-by: root <root@pool0-02416.cm.cluster> Co-authored-by: Lucas Wilkinson <LucasWilkinson@users.noreply.github.com> Co-authored-by: Matthew Bonanni <mbonanni@redhat.com> Co-authored-by: Tyler Michael Smith <tyler@neuralmagic.com> Co-authored-by: wangxiyuan <wangxiyuan1007@gmail.com> Co-authored-by: root <root@pool0-04880.cm.cluster>

Signed-off-by: Joe Runde <Joseph.Runde@ibm.com> Signed-off-by: Travis Johnson <tsjohnso@us.ibm.com> Signed-off-by: Nick Hill <nickhill123@gmail.com> Co-authored-by: Travis Johnson <tsjohnso@us.ibm.com> Co-authored-by: Nick Hill <nickhill123@gmail.com>

…lm-project#35869) Signed-off-by: Nathan Price <nathan@abridge.com> Signed-off-by: Isotr0py <mozf@mail2.sysu.edu.cn> Co-authored-by: Isotr0py <mozf@mail2.sysu.edu.cn>

…nsupported (vllm-project#35893) Signed-off-by: Li <chuali@amd.com>

…mpty_cache` (vllm-project#30681) Signed-off-by: Kunshang Ji <kunshang.ji@intel.com> Signed-off-by: Kunshang Ji <jikunshang95@gmail.com> Co-authored-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>

Signed-off-by: Kunshang Ji <kunshang.ji@intel.com>

…m-project#34094) (vllm-project#34571) Signed-off-by: haosdent <haosdent@gmail.com> Co-authored-by: zjy0516 <riverclouds.zhu@qq.com>

Signed-off-by: Thomas Pouget-Abadie <thomaspou@microsoft.com> Signed-off-by: pougetat <thomas.pougetabadie@gmail.com> Co-authored-by: Thomas Pouget-Abadie <thomaspou@microsoft.com> Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>

Signed-off-by: Taneem Ibrahim <taneem.ibrahim@gmail.com> Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com> Co-authored-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>

…cks in LRU/ARC `prepare_store()` (vllm-project#35846) Signed-off-by: Ronen Schaffer <ronen.schaffer@ibm.com>

…m-project#35656) Signed-off-by: raghavan <oneraghavan@gmail.com>

…ucer (vllm-project#34783) Signed-off-by: Qi Wang <qiwa@nvidia.com>

Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>

…uests Fixes vllm-project#35943 When KV transfer fails (due to P timeout, crash, or network interruption), D-side requests remained stuck in WAITING_FOR_REMOTE_KVS state permanently, causing progressive HBM leak that eventually stalls the decode engine. Root cause: 1. PullReqMeta.expire_time was never set (unlike P-side SendBlockMeta.expire_time) 2. Transfer errors were silently dropped in process_pulling_result() 3. ZMQ timeout exceptions in receive_kv_from_single_worker() didn't notify scheduler Solution: - Set PullReqMeta.expire_time in receive_kv() method (mirrors P-side pattern) - Add timeout checking in fetch_finished_recving_reqs() to handle expired requests - Handle err_reqs in process_pulling_result() by adding to finished_recving_reqs - Handle ZMQ timeout exceptions by marking requests as finished for cleanup This ensures stuck requests are properly cleaned up and their KV cache blocks are released, preventing the progressive memory leak. Tested: Syntax validation passes, no breaking API changes Signed-off-by: machov <mv1742@nyu.edu>

Signed-off-by: machov <mv1742@nyu.edu>

- Fix critical race condition in fetch_finished_recving_reqs that could cause RuntimeError due to concurrent dictionary modification during iteration - Fix race condition in receive_kv where expire_time was being set on all requests to the same value, potentially causing premature timeouts Addresses code review feedback from gemini-code-assist on PR vllm-project#36014

- Move self.reqs_to_recv.update() into _start_load_kv() so all reqs_to_recv operations happen in the same receiver_loop coroutine, eliminating race conditions (dtcccc comment at line 1272) - Remove unnecessary list() copies in fetch_finished_recving_reqs since all access to reqs_to_recv is now in the same event loop (dtcccc comment at line 984) - Call process_pulling_result() before ERROR status check so that err_reqs are always added to finished_recving_reqs for scheduler cleanup (dtcccc comment at line 1111) - Simplify expired request deletion (no defensive check needed in same loop) Co-authored-by: machov <43248948+machov@users.noreply.github.com>

- Move self.reqs_to_recv.update() into _start_load_kv() so all reqs_to_recv operations happen in the same receiver_loop, no race - Remove unnecessary list() copies in fetch_finished_recving_reqs (no need to copy after moving, per reviewer comment) - Call process_pulling_result() before ERROR status check so err_reqs are always added to finished_recving_reqs for scheduler cleanup Co-authored-by: machov <43248948+machov@users.noreply.github.com>

…mp' into copilot/fix-mooncake-hbm-leak-temp

Copilot AI assigned Copilot and machov Mar 10, 2026

Copilot started work on behalf of machov March 10, 2026 22:47 View session

rasmith and others added 27 commits March 10, 2026 22:54

[CI][AMD][BugFix] Add torch.cuda.set_device to test_punica_ops so pun…

2293a37

…ica kernels execute on same device as tensor (vllm-project#34985) Signed-off-by: Randall Smith <Randall.Smith@amd.com>

fix(mxfp4): Disable monolithic path for TRITON backend with EP (vllm-…

738b96a

…project#34270) Signed-off-by: Elizabeth Thomas <email2eliza@gmail.com> Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com> Co-authored-by: Michael Goin <mgoin64@gmail.com>

[ROCm][CI] Extending attention backend coverage for Eagle spec decode…

cd25b3d

… tests (vllm-project#35265) Signed-off-by: Andreas Karatzas <akaratza@amd.com>

[ROCm][CI] Amending deletion of AMD mirror (vllm-project#35322)

ef0f159

Signed-off-by: Andreas Karatzas <akaratza@amd.com>

[offloader] v2: Hide weight onloading latency via prefetching (vllm-p…

c336df5

…roject#29941) Signed-off-by: Ming Yang <minos.future@gmail.com> Signed-off-by: Michael Goin <mgoin64@gmail.com> Co-authored-by: Michael Goin <mgoin64@gmail.com>

[UX] Add --performance-mode {balanced,interactivity,throughput} (vl…

37380fe

…lm-project#34936) Signed-off-by: mgoin <mgoin64@gmail.com>

[MoE Refactor] MXFP4 Cutlass Experts to MK (vllm-project#34542)

f111a25

Signed-off-by: Yongye Zhu <zyy1102000@gmail.com>

[UX] Add --moe-backend arg for explicit kernel selection (vllm-proj…

5fd835f

…ect#33807) Signed-off-by: mgoin <mgoin64@gmail.com> Co-authored-by: Robert Shaw <114415538+robertgshaw2-redhat@users.noreply.github.com>

[Bugfix] Fix AttributeError in SMControlContextManager (vllm-project#…

47ff73c

…35338) Signed-off-by: Lucas Wilkinson <lwilkins@redhat.com>

[BugFix] Fix fp4 quant kernel on CUDA 12.8 (vllm-project#35210)

8f17711

Signed-off-by: LopezCastroRoberto <rocastro@redhat.com>

[Model Runner V2] Add coding style guide (vllm-project#35325)

9592d56

Signed-off-by: Woosuk Kwon <woosuk@inferact.ai>

[CPU][Feat] Enable KleidiAI INT8_W4A8 for all input dtypes (vllm-proj…

e5fd874

…ect#34890) Signed-off-by: Fadi Arafeh <fadi.arafeh@arm.com> Co-authored-by: Tyler Michael Smith <tlrmchlsmth@gmail.com>

[BugFix] anthropic/serving_messages: fix tool call arguments streaming (

511a835

vllm-project#34887) Signed-off-by: Daniele Trifirò <dtrifiro@redhat.com> Co-authored-by: Nicolò Lucchesi <nlucches@redhat.com>

[Misc][Harmony] Move Responses API only harmony utils to responses/ha…

f3896c4

…rmony.py (vllm-project#35339) Signed-off-by: sfeng33 <4florafeng@gmail.com>

[Benchmark] Simplify SLA scan (vllm-project#35306)

88377c4

Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>

[BugFix][XPU] Fix speculative decoding on Intel XPU due to bug with `…

0958e99

…IGC_ForceOCLSIMDWidth=16` (vllm-project#35298) Signed-off-by: Ofir Zafrir <ofir.zafrir@intel.com> Co-authored-by: Kunshang Ji <kunshang.ji@intel.com>

[XPU][8/N] Fix kernel bugs in XPU LoRA and MOE LORA (vllm-project#34115)

6804eb1

Signed-off-by: chzhang <chaojun.zhang@intel.com> Co-authored-by: Kunshang Ji <kunshang.ji@intel.com>

[ROCm] Add extra step in config initialization to populate custom ops…

a0e8a41

… before compilation config init (vllm-project#34848) Signed-off-by: Gregory Shtrasberg <Gregory.Shtrasberg@amd.com>

[Bugfix][Hardware][AMD] Gate FP4 ops on gfx950 to prevent MI300X crash (

93aab51

vllm-project#35250) Signed-off-by: c0de128 <kevin.mckay@outlook.com>

[Test] Add tests for n parameter in chat completions API (vllm-projec…

be8c308

…t#35283) Signed-off-by: KrxGu <krishom70@gmail.com>

[Model] Ring 2.5 (vllm-project#35102)

e737194

Signed-off-by: zjy0516 <riverclouds.zhu@qq.com>

[Benchmarks] Plot benchmark timeline and requests statistics (vllm-pr…

c5634f8

…oject#35220) Signed-off-by: Sophie du Couédic <sop@zurich.ibm.com> Co-authored-by: Cyrus Leung <cyrus.tl.leung@gmail.com>

[Hardware][Powerpc]Enable prefix caching and chunked prefill for ppc6…

c8b1737

…4le (vllm-project#35081) Signed-off-by: Akash kaothalkar <akash.kaothalkar@ibm.com> Co-authored-by: Akash kaothalkar <akash.kaothalkar@ibm.com>

ZhongsJie and others added 20 commits March 10, 2026 22:55

[BugFix] Support tool_choice=none in the Anthropic API (vllm-project#…

cf2ffc1

…35835) Signed-off-by: ZhongsJie <zhongsjie@gmail.com> Co-authored-by: Chauncey <chaunceyjiang@gmail.com>

[Bugfix] Improve engine ready timeout error message (vllm-project#35616)

f908bd7

Signed-off-by: damaozi <1811866786@qq.com>

[cohere][fix][spec-decode]: fix crash when allowed_token_ids is set w…

9368492

…ithout penalties (vllm-project#35654) Signed-off-by: kkt-cohere <komal@cohere.com>

[Bugfix] Add missing dynamic_arg_dims for Qwen3-ASR torch.compile (vl…

4141910

…lm-project#35869) Signed-off-by: Nathan Price <nathan@abridge.com> Signed-off-by: Isotr0py <mozf@mail2.sysu.edu.cn> Co-authored-by: Isotr0py <mozf@mail2.sysu.edu.cn>

[ROCm][Bugfix] Fall back from CK MXFP4 MoE when GEMM dimensions are u…

0cb3f8a

…nsupported (vllm-project#35893) Signed-off-by: Li <chuali@amd.com>

[XPU] bump vllm-xpu-kernels to v0.1.3 (vllm-project#35984)

a17bfad

Signed-off-by: Kunshang Ji <kunshang.ji@intel.com>

[Bugfix] Cap FULL decode cudagraph sizes for Mamba/hybrid models (vll…

bff37a4

…m-project#34094) (vllm-project#34571) Signed-off-by: haosdent <haosdent@gmail.com> Co-authored-by: zjy0516 <riverclouds.zhu@qq.com>

[Bugfix][CPUOffloadingManager] Prevent eviction of already-stored blo…

22629cf

…cks in LRU/ARC `prepare_store()` (vllm-project#35846) Signed-off-by: Ronen Schaffer <ronen.schaffer@ibm.com>

[Bugfix][Model] Fix FP8 k_scale/v_scale not loaded for Qwen3-MoE (vll…

fe04cde

…m-project#35656) Signed-off-by: raghavan <oneraghavan@gmail.com>

[BugFix] Fix implicit and incorrect assumption on ECConnector is_prod…

f33f999

…ucer (vllm-project#34783) Signed-off-by: Qi Wang <qiwa@nvidia.com>

[Bugfix] Make kaldi_native_fbank optional (vllm-project#35996)

72faf65

Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>

fix: address CI feedback - line length and missing metadata attribute

01cedd6

Signed-off-by: machov <mv1742@nyu.edu>

Copilot AI changed the title ~~[WIP] Fix issues in mooncake HBM leak PR~~ fix(mooncake): address review feedback on HBM leak fix Mar 10, 2026

Copilot finished work on behalf of machov March 10, 2026 22:57

machov changed the base branch from main to fix/vllm-35943-mooncake-hbm-leak March 10, 2026 23:00

Copilot started work on behalf of machov March 10, 2026 23:01 View session

Copilot AI and others added 2 commits March 10, 2026 23:04

Merge remote-tracking branch 'origin/copilot/fix-mooncake-hbm-leak-te…

a833249

…mp' into copilot/fix-mooncake-hbm-leak-temp

Copilot AI changed the title ~~fix(mooncake): address review feedback on HBM leak fix~~ fix(mooncake): address review comments on HBM leak fix Mar 10, 2026

Copilot finished work on behalf of machov March 10, 2026 23:15

machov marked this pull request as ready for review March 10, 2026 23:42

machov merged commit 9ddee97 into fix/vllm-35943-mooncake-hbm-leak Mar 10, 2026

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

fix(mooncake): address review comments on HBM leak fix#1

fix(mooncake): address review comments on HBM leak fix#1
machov merged 461 commits intofix/vllm-35943-mooncake-hbm-leakfrom
copilot/fix-mooncake-hbm-leak-temp

Copilot AI commented Mar 10, 2026 •

edited

Loading

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

20 participants

Conversation

Copilot AI commented Mar 10, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Changes

I tried to connect to the following addresses, but was blocked by firewall rules:

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

20 participants

Copilot AI commented Mar 10, 2026 •

edited

Loading