[HiCache][HA 1/N] Support HiCache storage runtime attach/detach #15892
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,132 @@ | ||
| # Runtime Attach/Detach HiCache Storage Backend (No Restart) | ||
|
|
||
| This document explains how to **dynamically attach/detach the HiCache L3 storage backend at runtime** (e.g., `mooncake` / `hf3fs` / `nixl` / `file` / `aibrix` / `eic`) while **SGLang is already running and serving traffic**, without restarting the process. | ||
|
|
||
| For safety and consistency, the current implementation **strictly requires** these operations to happen only when the service is **idle**: | ||
|
|
||
| - **No running requests** | ||
| - **No waiting/queued requests** | ||
|
|
||
| If the idle condition is not met, the API will fail fast (HTTP 400) and **will not modify** the current service state. | ||
|
|
||
| --- | ||
|
|
||
| ## 1. Background and implementation overview | ||
|
|
||
| ### 1.1 Architecture / control path | ||
|
|
||
| The control path is: | ||
|
|
||
| 1. **HTTP Server** (`python/sglang/srt/entrypoints/http_server.py`) | ||
|    - Exposes `PUT /hicache/storage-backend`, `DELETE /hicache/storage-backend`, `GET /hicache/storage-backend` | ||
| 2. **TokenizerManager** (`python/sglang/srt/managers/tokenizer_communicator_mixin.py`) | ||
|    - Sends the request to the Scheduler via `_Communicator` | ||
| 3. **Scheduler** (`python/sglang/srt/managers/scheduler.py`) | ||
|    - Performs a **strict idle check** | ||
|    - Calls `tree_cache.attach_storage_backend(...)` / `detach_storage_backend(...)` | ||
| 4. **HiRadixCache** (`python/sglang/srt/mem_cache/hiradix_cache.py`) | ||
|    - Parses `hicache_storage_backend_extra_config_json` (supports both backend config and prefetch knobs) | ||
|    - Calls `cache_controller.attach_storage_backend(...)` / `detach_storage_backend(...)` | ||
| 5. **HiCacheController** (`python/sglang/srt/managers/cache_controller.py`) | ||
|    - Creates/destroys the storage backend instance (via `StorageBackendFactory`) | ||
|    - Starts/stops backend background threads at runtime (prefetch/backup) | ||
|
|
||
| --- | ||
|
|
||
| ## 2. Idle-state requirement (strict) | ||
|
|
||
| The Scheduler uses a dedicated, stricter idle check, `_is_idle_for_hicache_storage_op()`, which requires all of the following: | ||
|
|
||
| - `_is_no_request()` is true (covers running/overlap/pp/disagg and other active states) | ||
| - `waiting_queue` is empty | ||
| - `grammar_queue` is empty (if the grammar backend is enabled) | ||
|
|
||
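The three conditions above can be sketched as a single predicate. This is a minimal illustration, not the Scheduler's code: `SchedulerState` and `is_idle_for_storage_op` are hypothetical names, `no_request` stands in for the result of `_is_no_request()`, and a `grammar_queue` of `None` is an assumed way to model a disabled grammar backend.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class SchedulerState:
    """Hypothetical stand-in for the scheduler fields named in the list above."""

    no_request: bool = True  # stands in for the result of _is_no_request()
    waiting_queue: List[str] = field(default_factory=list)
    grammar_queue: Optional[List[str]] = None  # None models "grammar backend disabled"


def is_idle_for_storage_op(state: SchedulerState) -> bool:
    """Return True only when every idle condition holds."""
    if not state.no_request:
        return False  # something is still running
    if state.waiting_queue:
        return False  # requests are queued
    if state.grammar_queue is not None and state.grammar_queue:
        return False  # grammar backend enabled and still has pending work
    return True
```

Under these assumptions a single queued request is enough to make the predicate false, which matches the fail-fast behavior described in this section.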
| If the condition is not met, attach/detach returns an error like: | ||
|
|
||
| - `Reject attach: scheduler is not idle. #queue-req=... #running-req=...` | ||
|
|
||
| > Tip: before switching, drain upstream traffic and wait for the server to become idle, then call attach/detach. | ||
|
|
||
| ### 2.1 DP (data parallel) semantics | ||
|
|
||
| When `dp_size > 1`, the tokenizer dispatches the request to **all DP scheduler instances** and aggregates their responses: | ||
|
|
||
| - The final `success` is **true only if all DP ranks return success** | ||
| - The final `message` concatenates messages from all DP ranks | ||
|
|
||
| This is intended to prevent “silent partial success”, but it also means you may see: | ||
|
|
||
| - Overall **failure** even though **some ranks already succeeded** | ||
|
|
||
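The aggregation rule can be illustrated with a small sketch. `RankResult` and `aggregate_dp_results` are hypothetical names, and the `[dpN]` prefix used when joining messages is an assumption made for readability; only the "all ranks must succeed" and "messages are concatenated" semantics come from the text above.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class RankResult:
    """Hypothetical per-rank reply; only success/message come from the text."""

    success: bool
    message: str = ""


def aggregate_dp_results(results: List[RankResult]) -> RankResult:
    """Combine per-rank replies: success only if every rank succeeded."""
    # An empty reply list is treated as failure (an assumption of this sketch).
    success = bool(results) and all(r.success for r in results)
    # Concatenate the non-empty messages, tagging each with its rank index.
    message = " ".join(
        f"[dp{i}] {r.message}" for i, r in enumerate(results) if r.message
    )
    return RankResult(success=success, message=message)
```

With one rank reporting success and another reporting "not idle", the aggregate is a failure whose message contains both replies, which is the partial-success situation described above.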
| Currently there is **no automatic partial rollback** across DP ranks (see TODO in code). Operationally: | ||
|
|
||
| - Prefer to keep backend config identical across ranks | ||
| - If attach fails, immediately call detach (best-effort/idempotent), fix config, then retry attach | ||
|
|
||
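The recovery advice above can be sketched as a small wrapper. This is only an illustration under stated assumptions: `attach` and `detach` are caller-supplied callables (for example thin wrappers around the HTTP calls in section 3) that return a `(success, message)` pair, and `attach_with_cleanup` is a hypothetical helper name.

```python
from typing import Callable, Tuple

Result = Tuple[bool, str]  # (success, message)


def attach_with_cleanup(
    attach: Callable[[], Result],
    detach: Callable[[], Result],
) -> Result:
    """Try to attach; on failure, issue a best-effort detach before returning."""
    ok, msg = attach()
    if ok:
        return True, msg
    # Some DP ranks may already have attached, so detach everywhere.
    # The detach result is ignored because the call is best-effort/idempotent.
    detach()
    return False, msg
```

The caller is expected to fix the configuration and call the helper again; the sketch deliberately does not retry on its own, because retrying with the same broken config would fail the same way.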
| --- | ||
|
|
||
| ## 3. How to use (HTTP Admin API) | ||
|
|
||
| The examples below assume your SGLang HTTP server is at `http://127.0.0.1:30000` and was started with `--admin-api-key`; without an admin API key, these endpoints return HTTP 400. | ||
|
|
||
| ### 3.1 Query current storage backend status | ||
|
|
||
| ```bash | ||
| curl -s http://127.0.0.1:30000/hicache/storage-backend | ||
| ``` | ||
|
|
||
| Example response: | ||
|
|
||
| ```json | ||
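| { | ||
|   "hicache_storage_backend": "mooncake", | ||
|   "hicache_storage_backend_extra_config": "{\"master_server_address\":\"127.0.0.1:50051\", ...}", | ||
|   "hicache_storage_prefetch_policy": "timeout", | ||
|   "hicache_write_policy": "write_through" | ||
| } | ||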
| ``` | ||
|
|
||
| ### 3.2 Attach (enable) a storage backend | ||
|
|
||
| Minimal attach, backend name only: | ||
|
|
||
| ```bash | ||
| curl -s -X PUT http://127.0.0.1:30000/hicache/storage-backend \ | ||
| -H 'Content-Type: application/json' \ | ||
| -d '{ | ||
| "hicache_storage_backend": "mooncake" | ||
| }' | ||
| ``` | ||
|
|
||
| Attach with backend configuration and prefetch settings: | ||
|
|
||
| ```bash | ||
| curl -s -X PUT http://127.0.0.1:30000/hicache/storage-backend \ | ||
| -H 'Content-Type: application/json' \ | ||
| -d '{ | ||
| "hicache_storage_backend": "mooncake", | ||
| "hicache_storage_backend_extra_config_json": "{\"master_server_address\":\"127.0.0.1:50051\",\"protocol\":\"tcp\",\"global_segment_size\":\"4gb\",\"prefetch_threshold\":256}", | ||
| "hicache_storage_prefetch_policy": "timeout" | ||
| }' | ||
| ``` | ||
|
|
||
| Notes: | ||
|
|
||
| - `hicache_storage_backend_extra_config_json` can include both: | ||
|   - **Backend configuration** (e.g., Mooncake master/metadata/protocol, etc.) | ||
|   - **Prefetch configuration** (`prefetch_threshold`, `prefetch_timeout_base`, `prefetch_timeout_per_ki_token`, `hicache_storage_pass_prefix_keys`) | ||
|
|
||
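Because `hicache_storage_backend_extra_config_json` is a JSON document carried as a string inside the JSON request body, hand-escaping it is error-prone. A minimal sketch of building the body programmatically follows; `build_attach_payload` is a hypothetical helper, while the field names are the ones accepted by the attach endpoint in this PR.

```python
import json
from typing import Any, Dict, Optional


def build_attach_payload(
    backend: str,
    extra_config: Optional[Dict[str, Any]] = None,
    prefetch_policy: Optional[str] = None,
    write_policy: Optional[str] = None,
) -> str:
    """Return the JSON request body for PUT /hicache/storage-backend."""
    body: Dict[str, Any] = {"hicache_storage_backend": backend}
    if extra_config is not None:
        # The extra config is itself JSON, serialized once here so that the
        # outer json.dumps call escapes it as an ordinary string value.
        body["hicache_storage_backend_extra_config_json"] = json.dumps(extra_config)
    if prefetch_policy is not None:
        body["hicache_storage_prefetch_policy"] = prefetch_policy
    if write_policy is not None:
        body["hicache_write_policy"] = write_policy
    return json.dumps(body)
```

For example, `build_attach_payload("mooncake", {"prefetch_threshold": 256}, prefetch_policy="timeout")` yields a body whose extra-config field is a string that decodes back to the original dictionary.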
| ### 3.3 Detach (disable) the storage backend | ||
|
|
||
| ```bash | ||
| curl -s -X DELETE http://127.0.0.1:30000/hicache/storage-backend | ||
| ``` | ||
|
|
||
| Notes: | ||
|
|
||
| - Detach only makes SGLang **stop using** the L3 storage backend and stops prefetch/backup threads | ||
| - It **does not automatically delete** data stored in Mooncake/HF3FS (or other remote backends) | ||
|
|
||
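The status, attach, and detach calls can also be issued from Python with only the standard library. The sketch below is illustrative: `build_request` and `send` are hypothetical helpers, the base URL is the one assumed in this section, and any authentication header must be supplied by the caller through `headers`, since the header format for the admin API key is not shown here.

```python
import json
import urllib.error
import urllib.request
from typing import Dict, Optional, Tuple

BASE_URL = "http://127.0.0.1:30000"  # assumed address, as in the curl examples
PATH = "/hicache/storage-backend"


def build_request(
    method: str,
    body: Optional[dict] = None,
    headers: Optional[Dict[str, str]] = None,
    base_url: str = BASE_URL,
) -> urllib.request.Request:
    """Build a GET (status), PUT (attach), or DELETE (detach) request."""
    data = json.dumps(body).encode("utf-8") if body is not None else None
    req = urllib.request.Request(base_url + PATH, data=data, method=method)
    if data is not None:
        req.add_header("Content-Type", "application/json")
    for name, value in (headers or {}).items():
        req.add_header(name, value)
    return req


def send(req: urllib.request.Request, timeout: float = 10.0) -> Tuple[int, str]:
    """Send the request and return (status_code, response_text)."""
    try:
        with urllib.request.urlopen(req, timeout=timeout) as resp:
            return resp.status, resp.read().decode("utf-8")
    except urllib.error.HTTPError as err:
        # Rejections (e.g. HTTP 400 when the scheduler is not idle) land here.
        return err.code, err.read().decode("utf-8")
```

A typical sequence would be `send(build_request("GET"))` to inspect the current state, then `send(build_request("PUT", {"hicache_storage_backend": "file"}))` to attach, and `send(build_request("DELETE"))` to detach.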
| --- | ||
|
|
||
| ## 4. Behavior and caveats | ||
|
|
||
| - **No restart required**: attach/detach switches in-process at runtime | ||
| - **Must be idle**: otherwise the request is rejected to avoid consistency issues | ||
| - **Host KV layout constraints still apply**: for example, Mooncake still requires layouts like `page_first/page_first_direct/page_head`; if the server's HiCache host-memory layout does not satisfy the backend requirements, attach will fail with an error | ||
| - **Observability**: | ||
|   - After attach, `server_args.hicache_storage_backend*` is updated on both the tokenizer and scheduler sides | ||
|   - If metrics are enabled, attach will create a storage metrics collector in `HiRadixCache` on demand |
| Original file line number | Diff line number | Diff line change |
|---|---|---|
|
|
@@ -93,6 +93,7 @@ | |
| from sglang.srt.function_call.function_call_parser import FunctionCallParser | ||
| from sglang.srt.managers.io_struct import ( | ||
|     AbortReq, | ||
|     AttachHiCacheStorageReqInput, | ||
|     CheckWeightsReqInput, | ||
|     CloseSessionReqInput, | ||
|     ConfigureLoggingReq, | ||
|
|
@@ -693,6 +694,22 @@ async def flush_cache(): | |
|
|
||
| @app.api_route("/clear_hicache_storage_backend", methods=["GET", "POST"]) | ||
| @auth_level(AuthLevel.ADMIN_OPTIONAL) | ||
| async def clear_hicache_storage_backend_deprecated(): | ||
|     """Deprecated: use POST /hicache/storage-backend/clear.""" | ||
|     ret = await _global_state.tokenizer_manager.clear_hicache_storage() | ||
|     return Response( | ||
|         content=( | ||
|             "Deprecated endpoint. Use POST /hicache/storage-backend/clear.\n" | ||
|             "Hierarchical cache storage backend cleared.\n" | ||
|         ), | ||
|         status_code=200 if ret.success else HTTPStatus.BAD_REQUEST, | ||
|     ) | ||
|
|
||
|
|
||
| # example usage: | ||
| # curl -s -X POST http://127.0.0.1:30000/hicache/storage-backend/clear | ||
| @app.api_route("/hicache/storage-backend/clear", methods=["POST"]) | ||
| @auth_level(AuthLevel.ADMIN_OPTIONAL) | ||
| async def clear_hicache_storage_backend(): | ||
|     """Clear the hierarchical cache storage backend.""" | ||
|     ret = await _global_state.tokenizer_manager.clear_hicache_storage() | ||
|
|
@@ -702,6 +719,89 @@ async def clear_hicache_storage_backend(): | |
| ) | ||
|
|
||
|
|
||
| # example usage: | ||
| # curl -s -X PUT http://127.0.0.1:30000/hicache/storage-backend \ | ||
| # -H 'Content-Type: application/json' \ | ||
| # -d '{ | ||
| # "hicache_storage_backend": "file", | ||
| # "hicache_storage_backend_extra_config_json": "{}", | ||
| # "hicache_storage_prefetch_policy": "timeout", | ||
| # "hicache_write_policy": "write_through" | ||
| # }' | ||
| @app.api_route("/hicache/storage-backend", methods=["PUT"]) | ||
| @auth_level(AuthLevel.ADMIN_OPTIONAL) | ||
| async def attach_hicache_storage_backend(obj: AttachHiCacheStorageReqInput): | ||
|
Collaborator
Author
switched to a more RESTful API, cc @slin1237 @stmatengss
Collaborator
Complies with router standard. LGTM. |
||
|     """Attach (enable) HiCache storage backend at runtime. | ||
|
|
||
|     Only allowed when there are NO running / queued requests. | ||
|     """ | ||
|     if not _global_state.tokenizer_manager.server_args.admin_api_key: | ||
|         return _admin_api_key_missing_response() | ||
|
|
||
|     ret = await _global_state.tokenizer_manager.attach_hicache_storage( | ||
|         hicache_storage_backend=obj.hicache_storage_backend, | ||
|         hicache_storage_backend_extra_config_json=obj.hicache_storage_backend_extra_config_json, | ||
|         hicache_storage_prefetch_policy=obj.hicache_storage_prefetch_policy, | ||
|         hicache_write_policy=obj.hicache_write_policy, | ||
|     ) | ||
|     msg = getattr(ret, "message", "") | ||
|     return Response( | ||
|         content=( | ||
|             ( | ||
|                 "HiCache storage backend attached.\n" | ||
|                 if ret.success | ||
|                 else "Failed to attach HiCache storage backend.\n" | ||
|             ) | ||
|             + (msg + "\n" if msg else "") | ||
|         ), | ||
|         status_code=200 if ret.success else HTTPStatus.BAD_REQUEST, | ||
|     ) | ||
|
|
||
|
|
||
| # example usage: | ||
| # curl -s -X DELETE http://127.0.0.1:30000/hicache/storage-backend | ||
| @app.api_route("/hicache/storage-backend", methods=["DELETE"]) | ||
| @auth_level(AuthLevel.ADMIN_OPTIONAL) | ||
| async def detach_hicache_storage_backend(): | ||
|     """Detach (disable) HiCache storage backend at runtime. | ||
|
|
||
|     Only allowed when there are NO running / queued requests. | ||
|     """ | ||
|     if not _global_state.tokenizer_manager.server_args.admin_api_key: | ||
|         return _admin_api_key_missing_response() | ||
|
|
||
|     ret = await _global_state.tokenizer_manager.detach_hicache_storage() | ||
|     msg = getattr(ret, "message", "") | ||
|     return Response( | ||
|         content=( | ||
|             ( | ||
|                 "HiCache storage backend detached.\n" | ||
|                 if ret.success | ||
|                 else "Failed to detach HiCache storage backend.\n" | ||
|             ) | ||
|             + (msg + "\n" if msg else "") | ||
|         ), | ||
|         status_code=200 if ret.success else HTTPStatus.BAD_REQUEST, | ||
|     ) | ||
|
|
||
|
|
||
| # example usage: | ||
| # curl -s http://127.0.0.1:30000/hicache/storage-backend | ||
| @app.get("/hicache/storage-backend") | ||
| @auth_level(AuthLevel.ADMIN_OPTIONAL) | ||
| async def hicache_storage_backend_status(): | ||
|     """Get current HiCache storage backend status (tokenizer-side view).""" | ||
|     if not _global_state.tokenizer_manager.server_args.admin_api_key: | ||
|         return _admin_api_key_missing_response() | ||
|
|
||
|     return { | ||
|         "hicache_storage_backend": _global_state.tokenizer_manager.server_args.hicache_storage_backend, | ||
|         "hicache_storage_backend_extra_config": _global_state.tokenizer_manager.server_args.hicache_storage_backend_extra_config, | ||
|         "hicache_storage_prefetch_policy": _global_state.tokenizer_manager.server_args.hicache_storage_prefetch_policy, | ||
|         "hicache_write_policy": _global_state.tokenizer_manager.server_args.hicache_write_policy, | ||
|     } | ||
|
Collaborator
Consider some other status, such as
Collaborator
Author
done |
||
|
|
||
|
|
||
| @app.api_route("/start_profile", methods=["GET", "POST"]) | ||
| @auth_level(AuthLevel.ADMIN_OPTIONAL) | ||
| async def start_profile_async(obj: Optional[ProfileReqInput] = None): | ||
|
|
@@ -1489,6 +1589,27 @@ def _create_error_response(e): | |
| ) | ||
|
|
||
|
|
||
| # FIXME: In theory we should configure ADMIN_FORCE for some entrypoints, but doing so | ||
| # would currently cause all endpoints to go through add_api_key_middleware | ||
| # (even when neither api-key nor admin-api-key is configured). | ||
| # | ||
| # For now, we simulate ADMIN_FORCE by explicitly checking the admin API key parameter. | ||
| # Once the auth wiring is refactored so ADMIN_FORCE only affects the intended | ||
| # admin endpoints, we should switch this logic to use ADMIN_FORCE directly. | ||
| def _admin_api_key_missing_response( | ||
|     status_code: HTTPStatus = HTTPStatus.BAD_REQUEST, | ||
| ) -> ORJSONResponse: | ||
|     return ORJSONResponse( | ||
|         content={ | ||
|             "error": ( | ||
|                 "This endpoint requires admin API key, but this server was started " | ||
|                 "without one (admin-api-key). Restart with --admin-api-key to enable." | ||
|             ) | ||
|         }, | ||
|         status_code=status_code, | ||
|     ) | ||
|
|
||
|
|
||
| # Minimal 32x32 black PNG (base64, GLM4v requires at least 32x32 sized image) | ||
| MINIMUM_PNG_PICTURE_BASE64 = "iVBORw0KGgoAAAANSUhEUgAAACAAAAAgCAYAAABzenr0AAAACXBIWXMAAA7EAAAOxAGVKw4bAAAAbUlEQVRYhe3VsQ2AMAxE0Y/lIgNQULD/OqyCMgCihCKSG4yRuKuiNH6JLsoEbMACOGBcua9HOR7Y6w6swBwMy0qLTpkeI77qdEBpBFAHBBDAGH8WrwJKI4AAegUCfAKgEgpQDvh3CR3oQCuav58qlAw73kKCSgAAAABJRU5ErkJggg==" | ||
|
|
||
|
|
||
nice!!!