[HiCache][HA 1/N] Support HiCache storage runtime attach/detach#15892

Merged
xiezhq-hermann merged 26 commits into sgl-project:main from alphabetc1:feat/hicache_store_runtime_attach_detach
Jan 27, 2026

Conversation


@alphabetc1 alphabetc1 commented Dec 26, 2025

Motivation

Previously, the HiCache storage backend could only be configured at process startup. Changing the backend meant restarting the whole server, which hurts availability and makes operations clumsy.

This PR adds runtime attach/detach so operators can enable, disable, or switch the L3 storage backend without restarting. This is especially useful for:

  • Dynamic enable/switch of HiCache storage
    Turn HiCache storage on or off, or switch backends, on the fly based on load, cost, or debugging needs.

  • Fault tolerance (failover)
    Production backends can go bad (timeouts, misconfig, partial outages). With runtime detach, you can quickly stop sending traffic to a broken backend and avoid repeated IO failures impacting the serving path. Then you can attach a healthy backend to restore service, improving resilience and reducing MTTR.

  • Hot upgrade (switchover)
    Backends often need upgrades or migrations (new cluster, protocol changes, config updates). With runtime attach/detach, you can perform a controlled switchover when traffic is low, without restarting the server, making “hot” transitions safer and easier to operate.

Modifications

The control path is:

  1. HTTP Server (python/sglang/srt/entrypoints/http_server.py)
    • Exposes PUT /hicache/storage-backend, DELETE /hicache/storage-backend, GET /hicache/storage-backend
  2. TokenizerManager (python/sglang/srt/managers/tokenizer_communicator_mixin.py)
    • Sends the request to the Scheduler via _Communicator
  3. Scheduler (python/sglang/srt/managers/scheduler.py)
    • Performs a strict idle check
    • Calls tree_cache.attach_storage_backend(...) / detach_storage_backend(...)
  4. HiRadixCache (python/sglang/srt/mem_cache/hiradix_cache.py)
    • Parses storage_backend_extra_config_json (supports both backend config and prefetch knobs)
    • Calls cache_controller.attach_storage_backend(...) / detach_storage_backend(...)
  5. HiCacheController (python/sglang/srt/managers/cache_controller.py)
    • Creates/destroys the storage backend instance (via StorageBackendFactory)
    • Starts/stops backend background threads at runtime (prefetch/backup)

On the Scheduler side, this PR adds a strict idle-state check, _is_idle_for_hicache_storage_op(), which requires all of the following:

  • _is_no_request() is True
  • waiting_queue is empty
  • grammar_queue is empty (if enabled)
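
The three conditions above amount to a single predicate. A minimal sketch follows; the class and attribute names here are illustrative assumptions, not the actual Scheduler fields:

```python
# Hedged sketch of the strict idle check; attribute names are assumptions.
class SchedulerSketch:
    def __init__(self):
        self.running_reqs = []    # in-flight requests (illustrative stand-in)
        self.waiting_queue = []   # requests waiting to be scheduled
        self.grammar_queue = []   # requests waiting on grammar compilation

    def _is_no_request(self) -> bool:
        return not self.running_reqs

    def _is_idle_for_hicache_storage_op(self) -> bool:
        # Attach/detach is only allowed when all three conditions hold.
        return (
            self._is_no_request()
            and not self.waiting_queue
            and not self.grammar_queue
        )
```

If any queue is non-empty, the operation is rejected and the client gets a failure response instead of racing in-flight requests.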

HiCacheController adds runtime operations:

  • attach_storage_backend(...): create the backend, register host buffers, and start prefetch/backup threads
  • detach_storage_backend(): stop prefetch/backup threads and release the backend (best-effort close)
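
As a rough sketch of how the controller sequences these two operations (class name, fields, and thread bodies are illustrative assumptions; the real code goes through StorageBackendFactory and registers host buffers):

```python
import threading


class ControllerSketch:
    """Illustrative attach/detach sequencing; not the actual HiCacheController."""

    def __init__(self):
        self.storage_backend = None
        self._stop = threading.Event()
        self._threads = []

    def attach_storage_backend(self, backend_name, extra_config=None):
        if self.storage_backend is not None:
            self.detach_storage_backend()  # switching = detach old, then attach new
        # Real code: StorageBackendFactory.create_backend(...) + host buffer registration.
        self.storage_backend = {"name": backend_name, "config": extra_config or {}}
        self._stop.clear()
        for name in ("prefetch", "backup"):
            # Stand-in workers: they just block until the stop event is set.
            t = threading.Thread(target=self._stop.wait, name=name, daemon=True)
            t.start()
            self._threads.append(t)
        return True, f"attached {backend_name}"

    def detach_storage_backend(self):
        # Stop background threads first so no IO races the backend teardown.
        self._stop.set()
        for t in self._threads:
            t.join(timeout=5)
        self._threads.clear()
        self.storage_backend = None  # best-effort close in the real code
        return True, "detached"
```

The key ordering property: threads are started only after the backend exists, and stopped before it is released.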

New/exposed HTTP APIs

  • Attach: PUT /hicache/storage-backend
  • Detach: DELETE /hicache/storage-backend
  • Status: GET /hicache/storage-backend

Flow Diagram

Attach

sequenceDiagram
  participant C as Client
  participant H as HTTP Server
  participant A as Auth Middleware
  participant T as TokenizerManager
  participant Q as _Communicator/ZMQ
  participant S as Scheduler
  participant R as HiRadixCache
  participant CC as HiCacheController
  participant SB as StorageBackend

  C->>H: PUT /hicache/storage-backend (json body)
  H->>A: auth check (ADMIN_OPTIONAL)
  A-->>H: allow/deny
  H->>H: admin_api_key configured?
  H->>T: attach_hicache_storage(...)
  T->>Q: send AttachHiCacheStorageReqInput
  Q->>S: dispatch by type
  S->>S: _is_idle_for_hicache_storage_op?
  S->>R: tree_cache.attach_storage_backend(...)
  R->>CC: cache_controller.attach_storage_backend(...)
  CC->>SB: StorageBackendFactory.create_backend(...)
  SB-->>CC: backend instance
  CC-->>R: threads started + flags set
  R-->>S: ok,msg
  S-->>Q: AttachHiCacheStorageReqOutput (per-rank)
  Q-->>T: merge results
  T-->>H: success + message
  H-->>C: 200/400

Detach

sequenceDiagram
  participant C as Client
  participant H as HTTP Server
  participant A as Auth Middleware
  participant T as TokenizerManager
  participant Q as _Communicator/ZMQ
  participant S as Scheduler
  participant R as HiRadixCache
  participant CC as HiCacheController
  participant SB as StorageBackend

  C->>H: DELETE /hicache/storage-backend
  H->>A: auth check (ADMIN_OPTIONAL)
  A-->>H: allow/deny
  H->>H: admin_api_key configured?
  H->>T: detach_hicache_storage()
  T->>Q: send DetachHiCacheStorageReqInput
  Q->>S: dispatch by type
  S->>S: _is_idle_for_hicache_storage_op?
  S->>R: tree_cache.detach_storage_backend()
  R->>CC: cache_controller.detach_storage_backend()
  CC->>SB: stop threads + close backend
  SB-->>CC: cleanup done
  CC-->>R: flags reset
  R-->>S: ok,msg
  S-->>Q: DetachHiCacheStorageReqOutput (per-rank)
  Q-->>T: merge results
  T-->>H: success + message
  H-->>C: 200/400

Accuracy Tests

Benchmarking and Profiling

  • UT
    python3 -m pytest test/srt/hicache/test_hicache_storage_runtime_attach_detach.py -v
  • E2E manual flow
# launch sglang with hierarchical-cache/admin-api-key enabled
export SGLANG_HICACHE_FILE_BACKEND_STORAGE_DIR=/root/code/tmp/sglang_hicache_file_test

python -m sglang.launch_server \
  --model-path /root/models/Meta-Llama-3.1-8B-Instruct \
  --host 0.0.0.0 --port 30000 \
  --enable-hierarchical-cache \
  --mem-fraction-static 0.3 \
  --page-size 64 \
  --hicache-ratio 2 \
  --admin-api-key 123 \
  --served-model-name test

# attach/update hicache storage
curl -s -X PUT http://127.0.0.1:30000/hicache/storage-backend \
  -H 'Authorization: Bearer 123' \
  -H 'Content-Type: application/json' \
  -d '{
    "hicache_storage_backend": "file"
  }'
 
# or, attach hicache storage with prefetch_policy/extra_config_json/write_policy
curl -s -X PUT http://127.0.0.1:30000/hicache/storage-backend \
  -H 'Authorization: Bearer 123' \
  -H 'Content-Type: application/json' \
  -d '{
    "hicache_storage_backend": "file",
    "hicache_storage_backend_extra_config_json": "{}",
    "hicache_storage_prefetch_policy": "wait_complete",
    "hicache_write_policy": "write_back"
  }'

# detach hicache storage
curl -s -X DELETE http://127.0.0.1:30000/hicache/storage-backend -H 'Authorization: Bearer 123'

# check current status
curl -s -X GET http://127.0.0.1:30000/hicache/storage-backend -H 'Authorization: Bearer 123'

Checklist

@github-actions github-actions bot added documentation Improvements or additions to documentation hicache Hierarchical Caching for SGLang labels Dec 26, 2025
@gemini-code-assist
Contributor

Summary of Changes

Hello @alphabetc1, I'm Gemini Code Assist! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request delivers a crucial enhancement to SGLang's HiCache by enabling dynamic management of its L3 storage backend. Previously, changing storage configurations necessitated a full server restart, impacting availability. With this change, operators can now attach, detach, or switch storage backends on the fly, facilitating dynamic scaling, improving fault tolerance through quick failover, and simplifying hot upgrades. The implementation ensures operational safety by enforcing a strict idle-state requirement before any storage modification, and it exposes these capabilities via new, intuitive HTTP administration endpoints.

Highlights

  • Runtime HiCache Storage Management: Introduces the ability to attach, detach, and switch HiCache L3 storage backends at runtime without requiring a server restart, significantly improving operational flexibility and availability.
  • New HTTP Admin APIs: Adds new HTTP endpoints: POST /attach_hicache_storage_backend, POST /detach_hicache_storage_backend, and GET /hicache_storage_backend for managing and querying the storage backend status.
  • Strict Idle-State Check: Implements a strict idle-state check in the Scheduler to ensure that attach/detach operations only occur when no requests are running or queued, preventing consistency issues.
  • Dynamic Thread Management: The HiCacheController now includes dedicated mechanisms to start and stop storage-related background threads (prefetch/backup) dynamically during attach and detach operations.
  • Comprehensive Testing and Documentation: Includes a new E2E smoke test to validate the runtime attach/detach functionality and dedicated documentation explaining the feature, its architecture, and usage.



@gemini-code-assist gemini-code-assist bot left a comment


Code Review

This pull request introduces a valuable feature for runtime management of HiCache storage, allowing operators to attach and detach storage backends without server restarts. The implementation is robust, with careful attention to thread safety, error handling, and state consistency, particularly in the cache_controller.py. The addition of comprehensive documentation and an end-to-end test is also commendable. I have one suggestion to improve maintainability by refactoring some duplicated code.

Comment on lines +501 to +515
self.tp_world_size = torch.distributed.get_world_size(group=self.tp_group)
if self.tp_world_size > 1:
    group_ranks = torch.distributed.get_process_group_ranks(self.tp_group)
    self.prefetch_tp_group = torch.distributed.new_group(
        group_ranks, backend="gloo"
    )

self.page_get_func = self._generic_page_get
self.page_set_func = self._generic_page_set
if (self.storage_backend_type in ["hf3fs", "mooncake", "eic"]) or (
    self.storage_backend_type == "dynamic"
    and bool(self.storage_config.extra_config.get("interface_v1", 0))
):
    self.page_get_func = self._page_get_zero_copy
    self.page_set_func = self._page_set_zero_copy
Contributor

medium

There's some code duplication between this new attach_storage_backend method and the existing __init__ method.
Specifically:

  • The logic for creating prefetch_tp_group (lines 501-506).
  • The logic for selecting page_get_func and page_set_func (lines 508-515).

To improve maintainability and reduce redundancy, consider extracting these blocks into private helper methods. For example:

def _create_prefetch_tp_group(self):
    self.tp_world_size = torch.distributed.get_world_size(group=self.tp_group)
    if self.tp_world_size > 1:
        group_ranks = torch.distributed.get_process_group_ranks(self.tp_group)
        self.prefetch_tp_group = torch.distributed.new_group(
            group_ranks, backend="gloo"
        )
    else:
        self.prefetch_tp_group = None

def _select_page_transfer_funcs(self):
    self.page_get_func = self._generic_page_get
    self.page_set_func = self._generic_page_set
    if (self.storage_backend_type in ["hf3fs", "mooncake", "eic"]) or (
        self.storage_backend_type == "dynamic"
        and bool(self.storage_config.extra_config.get("interface_v1", 0))
    ):
        self.page_get_func = self._page_get_zero_copy
        self.page_set_func = self._page_set_zero_copy

Then you can call these helpers from both __init__ and attach_storage_backend.

@alphabetc1
Collaborator Author

TODO:
For endpoints that modify internal state, we may need an additional layer of authorization, for example as in #15908.

@alphabetc1
Collaborator Author

@xiezhq-hermann Hi, sorry to bother you — could you help review this PR? thanks

@xiezhq-hermann
Collaborator

Thank you @alphabetc1 for the PR, I quite like this feature, and I am wondering whether it would be possible to refactor the existing storage backend initialization to use the same attach and detach interfaces as well. For example, if the user specifies a storage backend, it implicitly attaches that backend, and when the process shuts down it automatically detaches it. While the current PR does not change the existing execution path, there is duplication and there are potential maintenance issues in the long run. Let me know your thoughts, and thanks again : )

@alphabetc1
Collaborator Author

> Thank you @alphabetc1 for the PR, I quite like this feature, and I am wondering whether it would be possible to refactor the existing storage backend initialization to use the same attach and detach interfaces as well. For example, if the user specifies a storage backend, it implicitly attaches that backend, and when the process shuts down it automatically detaches it. While the current PR does not change the existing execution path, there is duplication and there are potential maintenance issues in the long run. Let me know your thoughts, and thanks again : )

Thanks for the review and suggestion!
I totally agree, that’s also how I was thinking about it. I’ll update this PR to refactor the existing storage backend init to use the same attach/detach interfaces.

@alphabetc1 alphabetc1 force-pushed the feat/hicache_store_runtime_attach_detach branch from 1295769 to fab4275 Compare December 30, 2025 02:53
@alphabetc1
Collaborator Author

cc @xiezhq-hermann

@stmatengss
Collaborator

This is a very useful PR. It can support model updates and fault tolerance.

@stmatengss
Collaborator

If the CI still fails, merge main and rerun it.

@alphabetc1
Collaborator Author

/rerun-failed-ci

1 similar comment
@alphabetc1
Collaborator Author

alphabetc1 commented Jan 16, 2026

/rerun-failed-ci

@alphabetc1
Collaborator Author

alphabetc1 commented Jan 17, 2026

/rerun-failed-ci 4

@alphabetc1
Collaborator Author

alphabetc1 commented Jan 18, 2026

/rerun-failed-ci 1

# }'
@app.api_route("/hicache/storage-backend", methods=["PUT"])
@auth_level(AuthLevel.ADMIN_OPTIONAL)
async def attach_hicache_storage_backend(obj: AttachHiCacheStorageReqInput):
Collaborator Author

switched to a more RESTful API, cc @slin1237 @stmatengss

Collaborator

Complies with router standard. LGTM.

@alphabetc1
Collaborator Author

alphabetc1 commented Jan 20, 2026

/rerun-failed-ci

@alphabetc1
Collaborator Author

alphabetc1 commented Jan 20, 2026

/rerun-failed-ci 1

@alphabetc1 alphabetc1 changed the title [HiCache]: Support HiCache storage runtime attach/detach [HiCache][HA 1/N] Support HiCache storage runtime attach/detach Jan 22, 2026

@app.api_route("/clear_hicache_storage_backend", methods=["GET", "POST"])
@auth_level(AuthLevel.ADMIN_OPTIONAL)
async def clear_hicache_storage_backend_deprecated():
Collaborator

nice!!!

@xiezhq-hermann xiezhq-hermann merged commit fd3b179 into sgl-project:main Jan 27, 2026
405 of 424 checks passed
@alphabetc1 alphabetc1 deleted the feat/hicache_store_runtime_attach_detach branch February 6, 2026 08:09

Labels

documentation (Improvements or additions to documentation), hicache (Hierarchical Caching for SGLang), high priority, run-ci


5 participants