[Utils] Make request dump robust to unpicklable server_args and large meta_info#24767
Merged
Three related improvements to the /configure_logging request dump pipeline that surfaced when running with --trust-remote-code and MoE models that emit large per-request meta_info blobs:

1. Pickle safety. ServerArgs.get_model_config() lazily attaches the resolved ModelConfig back onto the ServerArgs instance. With --trust-remote-code, that ModelConfig holds an hf_config whose class lives under the dynamic transformers_modules.<repo> namespace, which is not safely picklable (pickle's class identity round-trip fails when the dynamic module is re-exec'd). Wrap the pickle.dump in try/except in both _dump_data_to_file and dump_requests_before_crash; on failure, retry with server_args=None so the request data still gets persisted instead of leaving an empty/corrupt file.

2. meta_info key filtering. Request dumps grow rapidly when the server runs MoE models with --enable-routing-replay (each finished request stashes a base64-encoded routed_experts tensor in meta_info). hidden_states is similarly bulky when --return-hidden-states is on. Add a configurable list dump_requests_exclude_meta_keys on the tokenizer manager and ConfigureLoggingReq, defaulting to ["routed_experts", "hidden_states"]. Filter those keys out of meta_info via a shallow copy in dump_requests so the original out_dict (still referenced by the response path / observers) is not mutated.

3. CLI surface. Surface the new option in the configure_logging CLI as --dump-requests-exclude-meta-keys 'a,b,c' (empty string keeps all).

Existing callers that don't pass the flag get the smaller dumps for free. Pass an empty list to /configure_logging to restore the previous behavior.

Co-authored-by: Cursor <cursoragent@cursor.com>
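The pickle fallback described in item 1 can be sketched roughly as follows. This is a minimal illustration, not the code from the PR: the helper name `dump_with_fallback` and the `{"server_args": ..., "requests": ...}` payload layout are hypothetical. It also serializes in memory with `pickle.dumps` before writing, whereas the PR wraps `pickle.dump`; that is a choice made here so a failed first attempt cannot leave a partial file behind.

```python
import pickle
from pathlib import Path
from typing import Any, Optional


def dump_with_fallback(path: Path, requests: list, server_args: Optional[Any]) -> bool:
    """Write a request dump to `path`.

    Returns True if server_args was stored, False if it had to be dropped.
    The payload layout is an assumption made for this sketch.
    """
    try:
        # First attempt: include server_args alongside the request data.
        payload = pickle.dumps({"server_args": server_args, "requests": requests})
        kept = True
    except Exception:
        # server_args (or something it references) is not picklable.
        # Retry without it so the request data is still persisted.
        payload = pickle.dumps({"server_args": None, "requests": requests})
        kept = False
    path.write_bytes(payload)
    return kept
```

Catching `Exception` rather than only `pickle.PicklingError` is deliberate in this sketch, since unpicklable objects can also raise `TypeError` or `AttributeError` depending on what they contain.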
Collaborator
Author
/tag-and-run-ci
Collaborator
Author
/tag-and-rerun-ci
ispobock reviewed May 10, 2026
Remove descriptive comments that restate what the code does; keep only concise "why" notes on the non-obvious pickle fallback. Co-authored-by: Cursor <cursoragent@cursor.com>
ispobock approved these changes May 10, 2026
ByronHsu added a commit that referenced this pull request May 10, 2026
…lable server_args and large meta_info (#24902) Co-authored-by: Byron Hsu <byron@periodiclabs.ai> Co-authored-by: Cursor <cursoragent@cursor.com>
ltcs11 added a commit to ltcs11/sglang that referenced this pull request May 11, 2026
* main: (87 commits)
  [Fix] Disable FlashInfer allreduce fusion under deterministic inference (sgl-project#24629)
  fix: STANDALONE spec-decode hidden-size mismatch crash (sgl-project#24217)
  Followup fix for Custom AR V2 in non NVL scenarios (sgl-project#24742)
  Fix reduce_scatterv producer contract for SUM_LEN (sgl-project#24785)
  [NPU]Documentation update for communications quantization feature (sgl-project#24668)
  [Session R3] Add routed_experts_start_len for absolute routing slice control (sgl-project#24851)
  [Model] Add MiniCPM-V 4.6 support (sgl-project#24855)
  Support Intern-S2-Preview (sgl-project#24875)
  [PD] Unify dsv4 dispatch with swa (sgl-project#24888)
  Optimize MHC pipeline: DeepGemm, fused norm, fused hc_head (sgl-project#24775)
  Fix PD bootstrap failure handling (sgl-project#24772)
  [Spec] Cleanup idle stub and shape-check patterns (sgl-project#24881)
  [Bug] Add dsv4 state_type branch to mooncake disaggregation (sgl-project#24878)
  [Spec V1] Split draft-extend phase from `EagleDraftInput` into new `EagleDraftExtendInput` (sgl-project#24859)
  [Gemma4] Optimize Gemm4 with fused Q/K/V RMSNorm + per-expert FP8 ckpt loader (sgl-project#24696)
  [spec decoding] support kimi-k2.5-eagle3-mla (sgl-project#24826)
  [SPEC V2] fix: skip stale state updates in spec-v2 overlap (sgl-project#23456)
  [RL] Call torch.cuda.empty_cache() for `in-place` pause mode to avoid OOM (sgl-project#24854)
  [diffusion] CI: add cache-dit CI tests (sgl-project#19213)
  [Utils] Make request dump robust to unpicklable server_args and large meta_info (sgl-project#24767)
  ...

# Conflicts:
#	python/sglang/srt/utils/common.py
Motivation
Two issues with `/configure_logging` request dumps surfaced under `--trust-remote-code` + MoE models:
- `ServerArgs.get_model_config()` lazily attaches a `ModelConfig` whose `hf_config` lives under the dynamic `transformers_modules.*` namespace and isn't safely picklable. Both `_dump_data_to_file` and `dump_requests_before_crash` then leave an empty/corrupt `.pkl` and the request data is lost.
- With `--enable-routing-replay`, each finished request stashes a base64-encoded `routed_experts` tensor in `meta_info`; the same goes for `hidden_states` under `--return-hidden-states`. Neither is used by the replay tooling.
Modifications
- Pickle safety: wrap `pickle.dump` in try/except in `_dump_data_to_file` and `dump_requests_before_crash`; on failure, retry with `server_args=None` so request data is still persisted.
- `meta_info` key filtering: new `dump_requests_exclude_meta_keys` field on `TokenizerManager` and `ConfigureLoggingReq`, defaulting to `["routed_experts", "hidden_states"]`. Strips those keys from `meta_info` via a shallow copy in `dump_requests` (does not mutate the original `out_dict`).
- CLI: `python -m sglang.srt.managers.configure_logging --dump-requests-exclude-meta-keys 'a,b,c'` (empty string keeps all). Pass `[]` to `/configure_logging` to restore previous behavior.
Checklist
`Co-authored-by: bingyuhsu <byronhsu1230@gmail.com>` in the commit message after the PR is merged.
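To make the `meta_info` filtering and the comma-separated CLI value concrete, here is a small sketch under stated assumptions. The function names `parse_exclude_keys` and `filter_meta_info` are hypothetical and are not taken from the PR; only the default key list and the "shallow copy, do not mutate `out_dict`" behavior come from the description above.

```python
from typing import Any, Dict, List

# Default taken from the PR description.
DEFAULT_EXCLUDE_META_KEYS = ["routed_experts", "hidden_states"]


def parse_exclude_keys(raw: str) -> List[str]:
    """Turn 'a,b,c' into ['a', 'b', 'c']; an empty string yields [] (keep all keys)."""
    return [key.strip() for key in raw.split(",") if key.strip()]


def filter_meta_info(out_dict: Dict[str, Any], exclude_keys: List[str]) -> Dict[str, Any]:
    """Return a view of out_dict suitable for dumping, without the excluded meta keys.

    The original out_dict and its meta_info are left untouched, because the
    response path may still hold references to them.
    """
    meta_info = out_dict.get("meta_info")
    if not exclude_keys or not isinstance(meta_info, dict):
        return out_dict  # nothing to strip
    filtered = dict(out_dict)  # shallow copy of the outer dict
    filtered["meta_info"] = {
        key: value for key, value in meta_info.items() if key not in exclude_keys
    }
    return filtered


out = {
    "text": "hi",
    "meta_info": {"id": "r1", "routed_experts": "AAAA", "hidden_states": [[0.1]]},
}
dumped = filter_meta_info(out, DEFAULT_EXCLUDE_META_KEYS)
```

With the defaults, `dumped["meta_info"]` keeps only `id`, while `out["meta_info"]` still carries all three keys. Passing an empty list returns the input unchanged, which corresponds to the "restore previous behavior" case.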