
[Utils] Make request dump robust to unpicklable server_args and large meta_info#24767

Merged
ByronHsu merged 2 commits into main from byron/upstream-dump-request-fixes
May 10, 2026

Conversation

@ByronHsu
Collaborator

@ByronHsu ByronHsu commented May 9, 2026

Motivation

Two issues with /configure_logging request dumps surfaced under --trust-remote-code + MoE models:

  1. Pickle fails silently. ServerArgs.get_model_config() lazily attaches a ModelConfig whose hf_config lives under the dynamic transformers_modules.* namespace and isn't safely picklable. Both _dump_data_to_file and dump_requests_before_crash then leave an empty/corrupt .pkl, and the request data is lost.
  2. Dumps balloon. With --enable-routing-replay, each finished request stashes a base64-encoded routed_experts tensor in meta_info; the same goes for hidden_states under --return-hidden-states. Neither is used by the replay tooling.

Modifications

  • Pickle safety: wrap the dump in try/except in both _dump_data_to_file and dump_requests_before_crash; on failure, retry with server_args=None so request data is still persisted.
  • meta_info key filtering: new dump_requests_exclude_meta_keys field on TokenizerManager and ConfigureLoggingReq, defaulting to ["routed_experts", "hidden_states"]. Strips those keys from meta_info via a shallow copy in dump_requests (does not mutate the original out_dict).
  • CLI: python -m sglang.srt.managers.configure_logging --dump-requests-exclude-meta-keys 'a,b,c' (empty string keeps all). Pass [] to /configure_logging to restore previous behavior.

Checklist

  • Format your code according to the Code Formatting with Pre-Commit.
  • Update documentation / docstrings as needed.
  • For reviewers: if you intend to acknowledge my contribution, please do so by including Co-authored-by: bingyuhsu <byronhsu1230@gmail.com> in the commit message after the PR is merged.

Three related improvements to the /configure_logging request dump
pipeline that surfaced when running with --trust-remote-code and MoE
models that emit large per-request meta_info blobs:

1. Pickle safety. ServerArgs.get_model_config() lazily attaches the
   resolved ModelConfig back onto the ServerArgs instance. With
   --trust-remote-code, that ModelConfig holds an hf_config whose
   class lives under the dynamic transformers_modules.<repo> namespace,
   which is not safely picklable (pickle's class identity round-trip
   fails when the dynamic module is re-exec'd). Wrap the pickle.dump
   in try/except in both _dump_data_to_file and dump_requests_before_crash;
   on failure, retry with server_args=None so the request data still
   gets persisted instead of leaving an empty/corrupt file.

2. meta_info key filtering. Request dumps grow rapidly when the server
   runs MoE models with --enable-routing-replay (each finished request
   stashes a base64-encoded routed_experts tensor in meta_info).
   hidden_states is similarly bulky when --return-hidden-states is on.
   Add a configurable list dump_requests_exclude_meta_keys on the
   tokenizer manager and ConfigureLoggingReq, defaulting to
   ["routed_experts", "hidden_states"]. Filter those keys out of
   meta_info via a shallow copy in dump_requests so the original
   out_dict (still referenced by the response path / observers) is
   not mutated.

3. CLI surface. Surface the new option in the configure_logging CLI
   as --dump-requests-exclude-meta-keys 'a,b,c' (empty string keeps
   all). Existing callers that don't pass the flag get the smaller
   dumps for free.
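The comma-separated flag in point 3 could be parsed along these lines (hypothetical helper name; the key property is that an empty string yields an empty exclude list, i.e. keep all meta_info keys):

```python
def parse_exclude_meta_keys(raw: str) -> list:
    # '' -> [] (exclude nothing, keep all meta_info keys)
    # 'a, b,c' -> ['a', 'b', 'c']
    return [key for key in (part.strip() for part in raw.split(",")) if key]
```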

Pass an empty list to /configure_logging to restore the previous
behavior.
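The shallow-copy filtering in point 2 might look roughly like this (a simplified sketch; the real site is `dump_requests` on the tokenizer manager, and the helper name here is an assumption):

```python
def filter_meta_info(out_dict, exclude_keys=("routed_experts", "hidden_states")):
    # Copy the outer dict and rebuild meta_info without the excluded keys,
    # leaving the original out_dict (still referenced by the response
    # path / observers) untouched.
    meta = out_dict.get("meta_info")
    if not meta or not exclude_keys:
        return out_dict
    filtered = dict(out_dict)
    filtered["meta_info"] = {k: v for k, v in meta.items() if k not in exclude_keys}
    return filtered
```

Because only the outer dict and `meta_info` are copied, the remaining values (text, token ids) are shared, so the dump-side filtering adds negligible memory overhead.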

Co-authored-by: Cursor <cursoragent@cursor.com>

@ByronHsu ByronHsu changed the title Make request dump robust to unpicklable server_args and large meta_info [Dump] Make request dump robust to unpicklable server_args and large meta_info May 9, 2026
@ByronHsu ByronHsu changed the title [Dump] Make request dump robust to unpicklable server_args and large meta_info [Utils] Make request dump robust to unpicklable server_args and large meta_info May 9, 2026
@ByronHsu
Collaborator Author

ByronHsu commented May 9, 2026

/tag-and-run-ci

@ByronHsu
Collaborator Author

ByronHsu commented May 9, 2026

/tag-and-rerun-ci

@github-actions github-actions Bot added the run-ci label May 9, 2026
Comment thread on python/sglang/srt/managers/io_struct.py (outdated):
Remove descriptive comments that restate what the code does; keep
only concise "why" notes on the non-obvious pickle fallback.

Co-authored-by: Cursor <cursoragent@cursor.com>
@ByronHsu ByronHsu merged commit 1e6c6d1 into main May 10, 2026
43 of 83 checks passed
@ByronHsu ByronHsu deleted the byron/upstream-dump-request-fixes branch May 10, 2026 04:41
ByronHsu added a commit that referenced this pull request May 10, 2026
…lable server_args and large meta_info (#24902)

Co-authored-by: Byron Hsu <byron@periodiclabs.ai>
Co-authored-by: Cursor <cursoragent@cursor.com>
ltcs11 added a commit to ltcs11/sglang that referenced this pull request May 11, 2026
* main: (87 commits)
  [Fix] Disable FlashInfer allreduce fusion under deterministic inference (sgl-project#24629)
  fix: STANDALONE spec-decode hidden-size mismatch crash (sgl-project#24217)
  Followup fix for Custom AR V2 in non NVL scenarios (sgl-project#24742)
  Fix reduce_scatterv producer contract for SUM_LEN (sgl-project#24785)
  [NPU]Documentation update for communications quantization feature (sgl-project#24668)
  [Session R3] Add routed_experts_start_len for absolute routing slice control (sgl-project#24851)
  [Model] Add MiniCPM-V 4.6 support (sgl-project#24855)
  Support Intern-S2-Preview (sgl-project#24875)
  [PD] Unify dsv4 dispatch with swa (sgl-project#24888)
  Optimize MHC pipeline: DeepGemm, fused norm, fused hc_head (sgl-project#24775)
  Fix PD bootstrap failure handling (sgl-project#24772)
  [Spec] Cleanup idle stub and shape-check patterns (sgl-project#24881)
  [Bug] Add dsv4 state_type branch to mooncake disaggregation (sgl-project#24878)
  [Spec V1] Split draft-extend phase from `EagleDraftInput` into new `EagleDraftExtendInput` (sgl-project#24859)
  [Gemma4] Optimize Gemm4 with fused Q/K/V RMSNorm + per-expert FP8 ckpt loader (sgl-project#24696)
  [spec decoding] support kimi-k2.5-eagle3-mla (sgl-project#24826)
  [SPEC V2] fix: skip stale state updates in spec-v2 overlap (sgl-project#23456)
  [RL] Call torch.cuda.empty_cache() for `in-place` pause mode to avoid OOM (sgl-project#24854)
  [diffusion] CI: add cache-dit CI tests (sgl-project#19213)
  [Utils] Make request dump robust to unpicklable server_args and large meta_info (sgl-project#24767)
  ...

# Conflicts:
#	python/sglang/srt/utils/common.py