[Core][KV] Retain prefix-cache across hybrid SWA+Full via `is_pinned` blocks by jhaotingc · Pull Request #40676 · vllm-project/vllm

jhaotingc · 2026-04-23T04:30:34Z

Purpose

Hybrid SWA + full-attention models (e.g. Gemma-3/4) get near-0% cross-request prefix-cache reuse once the prefix working set grows: as the sliding window advances, SWA layers drop out-of-window blocks and free them, the freed blocks rejoin the FIFO free queue, and they are recycled before the next request can reuse the shared prefix — even when the KV cache still has spare capacity.

This PR adds opt-in SWA prefix-cache pinning behind a single boolean knob VLLM_PIN_SWA_TOKENS (default false). When enabled, each SWA window-drop PINS the current sliding-window blocks (one window per chunk) instead of freeing them, so the contiguous anchor a future request needs to hit the SWA prefix cache stays resident and is evicted last.

Implementation: an is_pinned flag plus a second pinned_block_queue tier in BlockPool; SlidingWindowManager.remove_skipped_blocks owns the pin policy while the base manager stays pinning-agnostic; pinned blocks remain registered in the prefix-cache hash map so they stay hittable; and BlockPool.demote_n releases the oldest pinned blocks (best-effort) under allocation pressure so the scheduler never stalls. Full-attention layers are unchanged.
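
The two-tier layout described above can be illustrated with a toy model. This is a minimal sketch, not vLLM's BlockPool: `ToyBlock`, `ToyBlockPool`, and the use of `OrderedDict` as a stand-in for the queue type are assumptions made for illustration. Only the names `is_pinned`, `ref_cnt`, `get_new_blocks`, `free_blocks`, `touch`, and `demote_n` come from this PR.

```python
from collections import OrderedDict
from dataclasses import dataclass


@dataclass
class ToyBlock:
    block_id: int
    ref_cnt: int = 0         # live users only
    is_pinned: bool = False  # retention hint, independent of ref_cnt


class ToyBlockPool:
    """Toy two-tier pool: a regular free tier and a pinned tier."""

    def __init__(self, num_blocks: int) -> None:
        self.blocks = [ToyBlock(i) for i in range(num_blocks)]
        # OrderedDict keeps oldest-first order and removes by key in O(1).
        self.free_tier = OrderedDict((b.block_id, b) for b in self.blocks)
        self.pinned_tier = OrderedDict()

    def get_new_blocks(self, n: int) -> list:
        # Allocation draws from the regular free tier only.
        if n > len(self.free_tier):
            raise ValueError("not enough free blocks")
        out = []
        for _ in range(n):
            _, block = self.free_tier.popitem(last=False)
            block.ref_cnt = 1
            out.append(block)
        return out

    def touch(self, block: ToyBlock) -> None:
        # A cache hit on an idle block removes it from whichever tier holds it.
        # is_pinned is left unchanged so the pin survives a later release.
        if block.ref_cnt == 0:
            tier = self.pinned_tier if block.is_pinned else self.free_tier
            del tier[block.block_id]
        block.ref_cnt += 1

    def free_blocks(self, blocks: list) -> None:
        # Route each block that becomes idle to the tier matching its flag.
        for block in blocks:
            block.ref_cnt -= 1
            if block.ref_cnt == 0:
                tier = self.pinned_tier if block.is_pinned else self.free_tier
                tier[block.block_id] = block

    def demote_n(self, n: int) -> int:
        # Best effort: move up to n of the oldest pinned blocks to the free tier.
        moved = 0
        while moved < n and self.pinned_tier:
            _, block = self.pinned_tier.popitem(last=False)
            block.is_pinned = False
            self.free_tier[block.block_id] = block
            moved += 1
        return moved
```

In this sketch the caller sets `is_pinned` before calling `free_blocks`, which mirrors the description that the sliding-window manager owns the pin policy while the pool only routes blocks by flag.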

Env var	Default	Purpose
`VLLM_PIN_SWA_TOKENS`	`false`	On/off switch for SWA prefix-cache pinning. When enabled, each SWA window-drop pins the current sliding-window blocks (one window per chunk) instead of freeing them.
`VLLM_PIN_MIN_DROP_SIZE`	`16`	Minimum drop size (in blocks) required to pin; filters out small decode-step drops.
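
How the two knobs might interact on a single window-drop can be sketched as follows. This is an illustrative assumption, not the code in `SlidingWindowManager.remove_skipped_blocks`: the function name `split_dropped_blocks`, its parameters, and the choice to pin the most recent `window_blocks` entries of the drop are hypothetical.

```python
def split_dropped_blocks(dropped, window_blocks, min_drop_size=16):
    """Return (to_pin, to_free) for one SWA window-drop.

    dropped is ordered oldest-first; window_blocks is the number of
    blocks spanning one sliding window (hypothetical parameter).
    """
    if len(dropped) < min_drop_size or window_blocks <= 0:
        # Small decode-step drops are freed outright, never pinned.
        return [], list(dropped)
    to_pin = dropped[-window_blocks:]   # most recent window of the drop
    to_free = dropped[:-window_blocks]  # older, out-of-window blocks
    return to_pin, to_free


# A 32-block prefill drop with a 4-block window pins the last 4 blocks.
to_pin, to_free = split_dropped_blocks(list(range(32)), window_blocks=4)
```

With `min_drop_size=0` even a 3-block decode-step drop would be pinned, which is the case the ablation below probes.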

Test Plan

Unit tests: tests/v1/core/test_prefix_caching.py (SWA block release, admission gating, full-sequence admission).
A/B serving on Gemma-4-31B-IT, TP4, H200, conc=1, OSL=400, prefix caching on, sweeping the number of distinct ~28k-token prefixes (the working-set size) with only VLLM_PIN_SWA_TOKENS differing. KV cache = 1.47M tokens.
VLLM_PIN_MIN_DROP_SIZE ablation (16 vs 0) at 30 prefixes.
Accuracy: GSM8K 5-shot and SCBench RepoQA, pinning on vs off.
Lint: full pre-commit run --all-files.

Test Result

Unit tests: 61 passed.

Prefix-working-set scaling — TTFT avg and output throughput (— = not run):

Prefixes (total input)	TTFT (ms), main	TTFT (ms), PR	tok/s, main	tok/s, PR
15 (0.43M, fits cache)	403	404	73.3	73.3
30 (0.85M)	1992	440	56.8	72.8
50 (1.42M)	—	458	—	74.2
60 (1.70M)	—	452	—	72.7

At 15 prefixes the working set fits the cache, nothing is evicted, and ON == OFF within noise (TTFT +0.6 ms, throughput identical) — pinning adds no measurable overhead when it is not needed.
Upstream main loses SWA reuse as early as 20 prefixes and re-prefills the full ~28k prefix per request (TTFT 403 → 1990 ms, 73 → 57 tok/s), while pinning (ON) keeps TTFT ~440–458 ms and ~73 tok/s through 60 prefixes (1.70M tokens, above the 1.47M cache). At 30 prefixes that is −78% TTFT and +28% throughput. Decode is unaffected throughout (ITL 12.66 ms in every run).
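
The relative changes quoted for 30 prefixes follow directly from the table row. A short check, using only the four table values (the variable names are illustrative):

```python
# Values from the 30-prefix row of the table above.
ttft_main_ms, ttft_pr_ms = 1992, 440
tput_main, tput_pr = 56.8, 72.8

ttft_change = (ttft_pr_ms - ttft_main_ms) / ttft_main_ms
tput_change = (tput_pr - tput_main) / tput_main

print(f"TTFT change: {ttft_change:+.0%}")        # TTFT change: -78%
print(f"Throughput change: {tput_change:+.0%}")  # Throughput change: +28%
```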

With pinning, the amount of prefix cache a server can retain grows by roughly max-num-batched-tokens / window_size. In this case (8k / 1k), about 8x more prefix cache can be stored on a server.

VLLM_PIN_MIN_DROP_SIZE ablation (ON, 30 prefixes): 16 vs 0 is perf-neutral — TTFT 440.0 vs 442.3 ms, throughput 72.8 vs 72.9 tok/s (within noise). The filter only matters under real pressure, where =0 pins unique decode-tail blocks and adds demotion churn; 16 is the safe default.

Accuracy is unchanged (pinning only changes which KV blocks are reused, not the computation): GSM8K 5-shot identical within noise on vs off (0.7127 / 0.7043 vs 0.7157 / 0.7043, flexible / strict), SCBench RepoQA Pass@1 73.0% on both.

Essential Elements of an Effective PR Description Checklist

The purpose of the PR, such as "Fix some issue (link existing issues this PR will resolve)".
The test plan, such as providing test command.
The test results, such as pasting the results comparison before and after, or e2e results
(Optional) The necessary documentation update, such as updating supported_models.md and examples for a new model.

gemini-code-assist

Code Review

This pull request implements a prefix-cache pinning mechanism to enhance cache retention, managed through new environment variables. It introduces a pinned tier for free blocks that are only demoted to the regular free queue under memory pressure. The review feedback suggests refactoring the pinned_free_deque to use the existing FreeKVCacheBlockQueue infrastructure. This change would allow for O(1) block removal during touch operations and more efficient batch processing in the demote_n and free_blocks methods, avoiding potential performance bottlenecks and stale entries associated with the current deque implementation.

gemini-code-assist · 2026-04-23T04:35:22Z

+        # Oldest-first deque of blocks at ref_cnt=0 AND is_pinned=True.
+        # These blocks are prefix-cache retention candidates. They are
+        # NOT drained by get_new_blocks directly; demote_n() under
+        # pressure flips is_pinned=False and moves them to free_block_queue.
+        from collections import deque as _deque
+
+        self.pinned_free_deque: _deque = _deque()


Using a collections.deque for pinned_free_deque introduces a potential memory leak and performance bottleneck. Since touch (line 414) skips removing blocks from this deque to avoid O(N) complexity, the deque can accumulate a large number of stale entries (blocks that have been re-activated or even re-freed into the regular queue). In a long-running server, this deque could grow significantly, and demote_n would have to iterate through many stale entries.

Instead, you should leverage the existing O(1) doubly linked list infrastructure. By using another instance of FreeKVCacheBlockQueue, you can achieve O(1) removal in touch without stale entries, and O(1) batch operations in demote_n and free_blocks, all while reusing the prev_free_block and next_free_block pointers already present in KVCacheBlock (since a block is either in the regular free queue, the pinned queue, or active).

Suggested change

-        # Oldest-first deque of blocks at ref_cnt=0 AND is_pinned=True.
-        # These blocks are prefix-cache retention candidates. They are
-        # NOT drained by get_new_blocks directly; demote_n() under
-        # pressure flips is_pinned=False and moves them to free_block_queue.
-        from collections import deque as _deque
-
-        self.pinned_free_deque: _deque = _deque()
+        # Oldest-first queue of blocks at ref_cnt=0 AND is_pinned=True.
+        # These blocks are prefix-cache retention candidates. They are
+        # NOT drained by get_new_blocks directly; demote_n() under
+        # pressure flips is_pinned=False and moves them to free_block_queue.
+        # Reuses the same linked-list pointers as free_block_queue.
+        from vllm.v1.core.kv_cache_utils import FreeKVCacheBlockQueue
+
+        self.pinned_block_queue = FreeKVCacheBlockQueue([])

Done. __init__ now constructs an empty FreeKVCacheBlockQueue for the pinned tier. The invariant that a block lives in exactly one of free_block_queue (is_pinned=False) or pinned_block_queue (is_pinned=True) is already enforced by the is_pinned routing in free_blocks(), so the shared prev_free_block/next_free_block pointers on KVCacheBlock are safe.
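
The reason a pointer-based queue avoids stale entries can be shown with a small intrusive doubly linked list. This is a generic sketch of the data structure, not vLLM's `FreeKVCacheBlockQueue`: `Node` and `LinkedQueue` are hypothetical names, and the real class has a different interface.

```python
class Node:
    """A block that carries its own prev/next pointers (intrusive list)."""

    def __init__(self, block_id: int) -> None:
        self.block_id = block_id
        self.prev = None
        self.next = None


class LinkedQueue:
    """FIFO queue with O(1) removal of an arbitrary member."""

    def __init__(self) -> None:
        # Sentinel head/tail remove the empty-list special cases.
        self.head = Node(-1)
        self.tail = Node(-1)
        self.head.next = self.tail
        self.tail.prev = self.head
        self.size = 0

    def append(self, node: Node) -> None:
        last = self.tail.prev
        last.next = node
        node.prev, node.next = last, self.tail
        self.tail.prev = node
        self.size += 1

    def remove(self, node: Node) -> None:
        # O(1): only the two neighbours are rewired, no scan is needed.
        if node.prev is None or node.next is None:
            raise ValueError("node is not in a queue")
        node.prev.next = node.next
        node.next.prev = node.prev
        node.prev = node.next = None
        self.size -= 1

    def popleft(self) -> Node:
        if self.size == 0:
            raise IndexError("pop from empty queue")
        node = self.head.next
        self.remove(node)
        return node

    def ids(self) -> list:
        out, cur = [], self.head.next
        while cur is not self.tail:
            out.append(cur.block_id)
            cur = cur.next
        return out
```

Because each node has a single pair of pointers, it can belong to only one such queue at a time, which is why the one-tier-per-block invariant in the reply above matters.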

gemini-code-assist · 2026-04-23T04:35:22Z

+            # ref_cnt=0 means this block is in some free tier (regular
+            # queue if is_pinned=False, pinned_free_deque if is_pinned=True).
            if block.ref_cnt == 0 and not block.is_null:
-                self.free_block_queue.remove(block)
+                if block.is_pinned:
+                    # Stale entries are common after demote_n; avoid O(n)
+                    # removal by leaving the stale entry in place — demote_n
+                    # will skip it on the next pop.
+                    pass
+                else:
+                    self.free_block_queue.remove(block)


With the suggested refactor to use FreeKVCacheBlockQueue for pinned blocks, touch can now perform O(1) removal for both regular and pinned tiers. This eliminates the need for stale entries and keeps the data structures clean. Additionally, it is safer to explicitly reset is_pinned to False when a block is activated via touch to ensure its state is consistent with its active status.

            # ref_cnt=0 means this block is in some free tier (regular
            # queue if is_pinned=False, pinned_block_queue if is_pinned=True).
            if block.ref_cnt == 0 and not block.is_null:
                if block.is_pinned:
                    self.pinned_block_queue.remove(block)
                    block.is_pinned = False
                else:
                    self.free_block_queue.remove(block)

Done. touch now calls self.pinned_block_queue.remove(block) (O(1) via prev/next pointers) when the block is in the pinned tier, eliminating stale-entry accumulation.

… review)

Replace collections.deque with a second FreeKVCacheBlockQueue instance (pinned_block_queue) for the ref_cnt=0 && is_pinned=True tier. This addresses Copilot review comments on PR vllm-project#40676:

- touch() now does O(1) remove() from either queue via the block prev/next pointers; no more stale-entry accumulation in the pinned deque.
- demote_n() uses batched popleft_n + append_n instead of a per-block loop that updated tail pointers on every iteration.
- free_blocks() batches both tiers with append_n for consistency.

Invariant: a block is in exactly one of free_block_queue (is_pinned=False) or pinned_block_queue (is_pinned=True), never both -- the prev/next pointers on KVCacheBlock only support one linked list at a time. This is already guaranteed by the is_pinned routing in free_blocks().

Semantics unchanged: touch() leaves is_pinned untouched so a later free_blocks() can re-route to the pinned tier when still a retention candidate. Pins survive cache-hit-then-release cycles.

Validated on Gemma-4-31B-it TP=4 H200 48-prefix sweep (28k ISL):

- Warmup (cold) TTFT avg: 1364 ms
- Sweep (warm) TTFT avg: 305 ms (4.47x faster, p99 12.85x faster)
- Full prefix-cache hit confirmed on 2nd pass; no hangs at pool limit.

Signed-off-by: Jhao-Ting Chen <jhaotingc@nvidia.com>

jhaotingc · 2026-04-24T02:10:13Z

@claude review

claude

Claude Code Review

This pull request is from a fork — automated review is disabled. A repository maintainer can comment @claude review to run a one-time review.

jhaotingc · 2026-04-24T22:43:23Z

@claude review

roikoren755 · 2026-05-04T10:29:06Z

-            # Cannot allocate new blocks
-            return None
+        num_free_blocks = self.block_pool.get_num_free_blocks()
+        if num_blocks_to_allocate > num_free_blocks:


Don't we want to guard here with `and envs.VLLM_PIN_PREFIX_BLOCKS` as well?

Fixed. Thank you!
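
The guard discussed in this thread can be sketched in isolation. This is a hedged illustration rather than the code in `allocate_slots`: `TinyPool` and `can_allocate` are hypothetical, and the boolean `pinning_enabled` stands in for the `envs.VLLM_PIN_PREFIX_BLOCKS` check.

```python
from collections import deque


class TinyPool:
    """Hypothetical stand-in exposing only what the admission gate needs."""

    def __init__(self, free: int, pinned: int) -> None:
        self.free = deque(range(free))
        self.pinned = deque(range(free, free + pinned))

    def get_num_free_blocks(self) -> int:
        return len(self.free)

    def demote_n(self, n: int) -> int:
        # Best effort: release up to n of the oldest pinned blocks.
        moved = min(n, len(self.pinned))
        for _ in range(moved):
            self.free.append(self.pinned.popleft())
        return moved


def can_allocate(pool: TinyPool, num_blocks_to_allocate: int,
                 pinning_enabled: bool) -> bool:
    num_free = pool.get_num_free_blocks()
    if num_blocks_to_allocate > num_free:
        if pinning_enabled:
            # Opt-in only: users who never enabled pinning skip this path.
            pool.demote_n(num_blocks_to_allocate - num_free)
        if num_blocks_to_allocate > pool.get_num_free_blocks():
            return False  # corresponds to returning None in the diff
    return True
```

With the flag off the function reduces to the original "cannot allocate" check, which is the no-op behaviour the reviewer asked for.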

roikoren755 · 2026-05-04T10:32:17Z

+        # Route ref_cnt==0 blocks to the correct tier; batch both.
+        regular_free: list[KVCacheBlock] = []
+        pinned_free: list[KVCacheBlock] = []
+        for block in blocks_list:


Not sure which is better (or if it actually impacts performance), but have you tried profiling doing two list comprehensions, instead of a loop and appending to lists?

Changed to list comprehensions. Thank you!
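
For reference, the two-comprehension form might look like the sketch below. `Blk` and `route_freed` are hypothetical names used only for illustration; the list names `regular_free` and `pinned_free` are taken from the diff above.

```python
from dataclasses import dataclass


@dataclass
class Blk:  # hypothetical minimal block
    block_id: int
    ref_cnt: int
    is_pinned: bool = False


def route_freed(blocks):
    """Split blocks whose ref_cnt reached zero into (regular, pinned)."""
    idle = [b for b in blocks if b.ref_cnt == 0]
    regular_free = [b for b in idle if not b.is_pinned]
    pinned_free = [b for b in idle if b.is_pinned]
    return regular_free, pinned_free


blocks = [Blk(0, 0), Blk(1, 0, True), Blk(2, 1, True), Blk(3, 0)]
regular_free, pinned_free = route_freed(blocks)
```

Block 2 appears in neither list because it still has a live user. Whether this form is faster than a single loop with two appends would need profiling, as the reviewer notes.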

roikoren755 · 2026-05-04T10:36:30Z

+        for b in to_pin:
+            b.is_pinned = True


Can do this inside the loop above, before (or after) the to_pin.append call

roikoren755 · 2026-05-04T10:38:45Z

+            from vllm.v1.kv_cache_interface import SlidingWindowSpec
+
+            if isinstance(self.kv_cache_spec, SlidingWindowSpec):
+                pin_blocks = envs.VLLM_PIN_SWA_TOKENS // self.block_size


Have you looked at how this feature affects Mamba-hybrid models? I'm wondering if we could generalize this, in a way such that adding fully-fledged Mamba support won't require changes in this file, for example

Mamba already has three cache modes: 'none', 'align', and 'full'.
In 'full' mode, it keeps the last Mamba state of every chunk, so with an 8k chunk size and a 64k ISL it keeps the state at every 8k boundary.
In 'align' mode, it keeps only some of the chunk states (with an 8k chunk size and a 64k ISL, it may keep an arbitrary subset of the 8k-boundary states).
In 'none' mode, it keeps only the very last state.

In other words, sliding-window pinning frees the out-of-window (OOW) blocks earlier than the last-window blocks, whereas Mamba already caches only the chunk-edge states and stores no intermediate states. So I don't think this generalizes to Mamba.

mergify · 2026-05-23T09:17:00Z

This pull request has merge conflicts that must be resolved before it can be
merged. Please rebase the PR, @jhaotingc.

https://docs.github.com/en/pull-requests/collaborating-with-pull-requests/working-with-forks/syncing-a-fork


@roikoren755

… paths

After rebasing the prefix-cache pinning series (vllm-project#40676) onto upstream/main, two newly-added code paths needed wiring before the pin/demote mechanism could actually function under pressure, and a round of reviewer feedback applied.

Rebase fixes (engine deadlocked without these):

- kv_cache_manager.py: the upstream-added `full_sequence_must_fit` admission gate in allocate_slots returns None without giving the pinned tier a chance to release. Add a VLLM_PIN_PREFIX_BLOCKS-guarded demote_n call inside that branch so the existing pressure-recovery logic engages before admission is refused.
- single_type_kv_cache_manager.py: the upstream-added SlidingWindowManager._cache_block_mask elides older SWA-segment blocks from the prefix-cache hash map ('they get dropped anyway, never serve a hit'). That defeats VLLM_PIN_SWA_TOKENS, whose entire purpose is to keep those blocks alive for future hits. Short-circuit the mask to None when either pin flag is set.

PR review (@roikoren755):

- kv_cache_manager.py: guard the lower allocate_slots demote site with envs.VLLM_PIN_PREFIX_BLOCKS so it is a no-op for users who do not opt in.
- block_pool.py: refactor the free_blocks routing from a single loop with two appends into two filtered list comprehensions for readability.
- single_type_kv_cache_manager.py: move 'block.is_pinned = True' inline with the SWA pin-loop append instead of a second pass over to_pin afterwards.
- single_type_kv_cache_manager.py: TODO comment noting that the SWA drop-and-pin hook should ideally live on the SingleTypeKVCacheManager base (or a per-spec capability interface) so future Mamba-hybrid support does not need to edit this file.

Operator UX (answering 'will pinning help my workload?'):

- kv_cache_manager.py: at engine init, when VLLM_PIN_PREFIX_BLOCKS is set, log a one-line startup hint with the active pin env vars, the pool capacity in blocks/tokens, and a rule-of-thumb estimate of how many ~25k-token prefixes fit. Pinning delivers a win when the unique-prefix working set fits in ~80% of the pool; beyond that demote_n thrashes and hit rate collapses.

Validated on 4xH200 with gemma-4-31B-IT, TRITON_ATTN, 30 prefixes conc=1: TTFT 1970 ms -> 499 ms (3.95x), KV usage 0% -> 64% post-warmup, sweep hit rate 0.22% -> 83.7%.

pre-commit run -a clean.

Signed-off-by: Jhao-Ting Chen <jhaotingc@nvidia.com>

mergify · 2026-05-28T01:58:22Z

Hi @jhaotingc, the pre-commit checks have failed. Please run:

uv pip install "pre-commit>=4.5.1"
pre-commit install
pre-commit run --all-files

Then, commit the changes and push to your branch.

For future commits, pre-commit will run automatically on changed files before each commit.

Tip

Is mypy failing?

mypy is run differently in CI. If the failure is related to this check, please use the following command to run it locally:

# For mypy (substitute "3.10" with the failing version if needed)
pre-commit run --hook-stage manual mypy-3.10

mergify · 2026-05-29T03:11:04Z

Hi @jhaotingc, the pre-commit checks have failed. Please run:

uv pip install "pre-commit>=4.5.1"
pre-commit install
pre-commit run --all-files

Then, commit the changes and push to your branch.

For future commits, pre-commit will run automatically on changed files before each commit.

Tip

Is mypy failing?

mypy is run differently in CI. If the failure is related to this check, please use the following command to run it locally:

# For mypy (substitute "3.10" with the failing version if needed)
pre-commit run --hook-stage manual mypy-3.10

Add opt-in pinning to preserve prefix-cache blocks through FIFO free-queue recycling in hybrid models like Gemma4.

Without this, the last-window SWA blocks and full-attention blocks from completed requests are returned to the FIFO free queue and get recycled, evicting their hashes long before they would otherwise expire. This limits practical prefix-cache retention to ~20 requests even though the pool has room for ~170.

Mechanism via ref_cnt manipulation:

- VLLM_PIN_PREFIX_BLOCKS=1: allocated blocks start at ref_cnt=2. At SWA-DROP for out-of-window blocks, decrement ref_cnt by 2 so they fully release and rejoin the free queue. At end-of-request free, decrement by 1, leaving blocks at ref_cnt=1, pinned with hash intact and not in free queue but still reachable via cached_block_hash_to_block lookup.
- VLLM_PIN_SWA_TOKENS=N: at each SWA-DROP, pin the most-recent N // block_size blocks being dropped (ref_cnt 2 to 1) while fully freeing older blocks. This preserves chunk-boundary positions inside the shared prefix range, enabling SWA 64-contig cache-hit scan to succeed on future matching requests.
- VLLM_PIN_MIN_DROP_SIZE=16: skip pinning when a SWA-DROP releases fewer than this many blocks. Decode-step drops carry unique-tail hashes with no prefix-match value; unconditional pinning bloats the pinned set until the pool is exhausted and new requests stall.

Net effect for 60 prefix x 25k token workload on TP=4 bf16:

- Per-prefix steady-state footprint: ~1,100 blocks (Full plus SWA last-window)
- Pool of 189,245 blocks fits 60 prefixes comfortably
- SWA prefix-cache hit rate: ~90% on cached prefixes, up from ~0%

Files:

- envs.py: declare and parse VLLM_PIN_PREFIX_BLOCKS, VLLM_PIN_SWA_TOKENS, VLLM_PIN_MIN_DROP_SIZE
- block_pool.py: conditional ref_cnt=2 init; ref_cnt_delta param on free_blocks
- single_type_kv_cache_manager.py: per-block pin-vs-free split in remove_skipped_blocks gated on drop-size threshold

Signed-off-by: Jhao-Ting Chen <jhaotingc@nvidia.com>
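
The capacity figures in this commit message can be cross-checked with simple arithmetic. The sketch below uses only the numbers quoted above and treats the ~1,100-block footprint as exact, which is an approximation:

```python
pool_blocks = 189_245
blocks_per_prefix = 1_100  # approximate per-prefix footprint from the message
num_prefixes = 60

working_set = num_prefixes * blocks_per_prefix
max_prefixes = pool_blocks // blocks_per_prefix

print(working_set)                         # 66000
print(f"{working_set / pool_blocks:.0%}")  # 35%
print(max_prefixes)                        # 172
```

The 60-prefix working set occupies roughly a third of the pool, and the pool could hold about 172 such prefixes, consistent with the "room for ~170" estimate.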

…demotion

Replaces the ref_cnt=2 pinning hack (from prior commit c302f5f) with an explicit is_pinned field on KVCacheBlock, and folds in pressure-based release of pinned blocks so the scheduler never stalls when pins exceed pool capacity.

Motivation
----------

The ref_cnt=2 approach overloaded ref_cnt=1 to mean either live user OR pinned prefix-cache block. That ambiguity created four loopholes:

1. Shared block + SWA-DROP delta=2: when R2 cache-hits a pinned block from R1 (ref_cnt 1 to 2 via touch) and later SWA-DROPs it via the to_free path, delta=2 undoes both R2 touch and R1 pin at once.
2. Auto-track captured non-pin ref_cnt=1 transitions.
3. Unpin then cache hit then SWA-DROP created negative ref_cnt.
4. Pin status was lost across cache-hit-then-SWA-DROP cycles.

Loophole 3 was the hang source: negative-ref_cnt blocks satisfy neither the ref_cnt==0 nor the ref_cnt==1 branch and leak permanently. The pool shrinks with each affected block until no admission can succeed.

Redesign
--------

- KVCacheBlock gains is_pinned: bool (default False).
- ref_cnt is strictly the live-user count. All deltas are 1.
- BlockPool has pinned_free_deque for (ref_cnt=0, is_pinned=True) blocks.
- get_new_blocks pops from free_block_queue only; pinned_free_deque is drained only via demote_n under pressure.
- free_blocks routes ref_cnt-zero blocks to free_block_queue or pinned_free_deque based on is_pinned. All deltas are 1.
- touch updates ref_cnt but leaves is_pinned unchanged, so pins survive cache-hit-then-release cycles.
- SWA-DROP sets is_pinned=True on to_pin candidates before calling free_blocks; to_free blocks keep their prior is_pinned value.
- kv_cache_manager.free marks all non-null remaining blocks as is_pinned=True before releasing them, protecting the Full-attention prefix and the SWA last-window.

Pressure-based release
----------------------

BlockPool.demote_n(n) flips is_pinned=False on the oldest pinned entries and moves them to free_block_queue. Hashes survive until _maybe_evict_cached_block fires on physical reuse, so demoted blocks remain cache-hit candidates until recycled.

demote_n is invoked from two admission gates in kv_cache_manager so the scheduler cannot stall:

- can_fit_full_sequence: fires when the scheduler reserves the full ISL and would reject the request before allocate_slots is called.
- allocate_slots first admission check (capped budget) and second check (actual demand): both hook demote_n before returning None.

Files
-----

- kv_cache_utils.py: is_pinned field on KVCacheBlock.
- block_pool.py: pinned_free_deque, demote_n, ref_cnt=1 alloc init, free_blocks routes by is_pinned (delta=1 always; null-block skipped to keep strict ref_cnt >= 0 invariant for real blocks), touch preserves is_pinned across pinned-tier stale entries.
- single_type_kv_cache_manager.py: SWA-DROP flags is_pinned before free_blocks; to_pin and to_free both use delta=1.
- kv_cache_manager.py: end-of-request free marks non-null blocks as is_pinned; pressure hooks in can_fit_full_sequence and allocate_slots (both the admission-budget and actual-demand checks).

Signed-off-by: Jhao-Ting Chen <jhaotingc@nvidia.com>
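
Loopholes 1 and 3 can be replayed as plain arithmetic on ref_cnt. The sketch below is a reconstruction under stated assumptions, not code from either commit: the helper `replay` is hypothetical, and modelling "unpin" in the old scheme as a decrement of 1 is an assumption, since the commit message does not spell that step out.

```python
def replay(deltas, start=0):
    """Apply signed ref_cnt deltas and return the value after each step."""
    trace, ref_cnt = [], start
    for delta in deltas:
        ref_cnt += delta
        trace.append(ref_cnt)
    return trace


# Old scheme: allocate at 2, end-of-request free is -1, SWA-DROP is -2,
# a cache hit (touch) is +1. UNPIN = -1 is an assumed model of unpinning.
ALLOC, END_FREE, UNPIN, TOUCH, SWA_DROP = +2, -1, -1, +1, -2

# Loophole 1: R2's SWA-DROP removes both its own touch and R1's pin.
loophole_1 = replay([ALLOC, END_FREE, TOUCH, SWA_DROP])
print(loophole_1)  # [2, 1, 2, 0]

# Loophole 3: unpin, then cache hit, then SWA-DROP goes negative.
loophole_3 = replay([ALLOC, END_FREE, UNPIN, TOUCH, SWA_DROP])
print(loophole_3)  # [2, 1, 0, 1, -1]

# New scheme: every delta is 1 and pin state lives in is_pinned instead.
new_scheme = replay([+1, -1, +1, -1])
print(new_scheme)  # [1, 0, 1, 0]
```

Once every delta has magnitude 1 and each release matches one acquire, ref_cnt cannot drop below zero, which is the invariant the redesign relies on.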

- Pinned tier backed by FreeKVCacheBlockQueue; oldest entries released via demote_n wired into the admission gates.
- SWA pin logic lives in SlidingWindowManager; base remove_skipped_blocks stays pinning-agnostic. Dead can_fit_full_sequence removed.
- free_blocks fast-paths when pinning is off; lint/format fixes.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Signed-off-by: Jhao-Ting Chen <jhaotingc@nvidia.com>

Add VLLM_PIN_SWA_TOKENS (bool, off by default). When enabled, each SWA drop pins the current sliding-window blocks into a separate tier instead of freeing them, so the contiguous anchor a future request needs to hit the SWA prefix cache stays resident and is evicted last. Pinned blocks are demoted best-effort, oldest-first, under allocation pressure. Improves prefix-cache reuse for shared-prefix traffic.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Signed-off-by: Jhao-Ting Chen <jhaotingc@nvidia.com>

mergify · 2026-06-03T03:46:43Z

This pull request has merge conflicts that must be resolved before it can be
merged. Please rebase the PR, @jhaotingc.

https://docs.github.com/en/pull-requests/collaborating-with-pull-requests/working-with-forks/syncing-a-fork

jhaotingc · 2026-06-04T16:27:53Z

Closing as a duplicate of #43447.

mergify Bot added the v1 label Apr 23, 2026

gemini-code-assist Bot reviewed Apr 23, 2026

View reviewed changes

jhaotingc marked this pull request as ready for review April 24, 2026 22:43

jhaotingc requested review from ApostaC, WoosukKwon, alexm-redhat, heheda12345, njhill, orozery, robertgshaw2-redhat and ywang96 as code owners April 24, 2026 22:43

claude Bot reviewed Apr 24, 2026

View reviewed changes

roikoren755 reviewed May 4, 2026

View reviewed changes

mergify Bot added the needs-rebase label May 23, 2026

jhaotingc force-pushed the jhaotingc/gemma-4-swa-flush-OOW-first branch from ec6bd2f to 1f5a2c0 Compare May 28, 2026 01:53

mergify Bot removed the needs-rebase label May 28, 2026

jhaotingc force-pushed the jhaotingc/gemma-4-swa-flush-OOW-first branch 2 times, most recently from 4a0573e to e6c18d1 Compare May 29, 2026 22:45

jhaotingc and others added 4 commits May 29, 2026 20:20

jhaotingc force-pushed the jhaotingc/gemma-4-swa-flush-OOW-first branch from e6c18d1 to 9ff6c1a Compare May 30, 2026 05:41

mergify Bot added the needs-rebase label Jun 3, 2026

jhaotingc closed this Jun 4, 2026
