
[Utils] Make request dump robust to unpicklable server_args and large meta_info#24767

Merged
ByronHsu merged 2 commits into main from byron/upstream-dump-request-fixes
May 10, 2026

Conversation

@ByronHsu
Collaborator

@ByronHsu ByronHsu commented May 9, 2026

Motivation

Two issues with /configure_logging request dumps surfaced under --trust-remote-code + MoE models:

  1. Pickle fails silently. ServerArgs.get_model_config() lazily attaches a ModelConfig whose hf_config lives under the dynamic transformers_modules.* namespace and isn't safely picklable. Both _dump_data_to_file and dump_requests_before_crash then leave an empty/corrupt .pkl, and the request data is lost.
  2. Dumps balloon. With --enable-routing-replay, each finished request stashes a base64-encoded routed_experts tensor in meta_info; the same goes for hidden_states under --return-hidden-states. Neither is used by the replay tooling.

Modifications

  • Pickle safety: wrap the dump in try/except in both _dump_data_to_file and dump_requests_before_crash; on failure, retry with server_args=None so request data is still persisted.
  • meta_info key filtering: new dump_requests_exclude_meta_keys field on TokenizerManager and ConfigureLoggingReq, defaulting to ["routed_experts", "hidden_states"]. Strips those keys from meta_info via a shallow copy in dump_requests (does not mutate the original out_dict).
  • CLI: python -m sglang.srt.managers.configure_logging --dump-requests-exclude-meta-keys 'a,b,c' (empty string keeps all). Pass [] to /configure_logging to restore previous behavior.

Checklist

  • Format your code according to the Code Formatting with Pre-Commit.
  • Update documentation / docstrings as needed.
  • For reviewers: if you intend to acknowledge my contribution, please do so by including Co-authored-by: bingyuhsu <byronhsu1230@gmail.com> in the commit message after the PR is merged.

Three related improvements to the /configure_logging request dump
pipeline that surfaced when running with --trust-remote-code and MoE
models that emit large per-request meta_info blobs:

1. Pickle safety. ServerArgs.get_model_config() lazily attaches the
   resolved ModelConfig back onto the ServerArgs instance. With
   --trust-remote-code, that ModelConfig holds an hf_config whose
   class lives under the dynamic transformers_modules.<repo> namespace,
   which is not safely picklable (pickle's class identity round-trip
   fails when the dynamic module is re-exec'd). Wrap the pickle.dump
   in try/except in both _dump_data_to_file and dump_requests_before_crash;
   on failure, retry with server_args=None so the request data still
   gets persisted instead of leaving an empty/corrupt file.

2. meta_info key filtering. Request dumps grow rapidly when the server
   runs MoE models with --enable-routing-replay (each finished request
   stashes a base64-encoded routed_experts tensor in meta_info).
   hidden_states is similarly bulky when --return-hidden-states is on.
   Add a configurable list dump_requests_exclude_meta_keys on the
   tokenizer manager and ConfigureLoggingReq, defaulting to
   ["routed_experts", "hidden_states"]. Filter those keys out of
   meta_info via a shallow copy in dump_requests so the original
   out_dict (still referenced by the response path / observers) is
   not mutated.

3. CLI surface. Surface the new option in the configure_logging CLI
   as --dump-requests-exclude-meta-keys 'a,b,c' (empty string keeps
   all). Existing callers that don't pass the flag get the smaller
   dumps for free.
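The comma-separated flag in point 3 could be parsed along these lines (hypothetical helper name; the key property is that an empty string yields an empty exclude list, i.e. keep all meta_info keys):

```python
def parse_exclude_meta_keys(raw: str) -> list:
    # '' -> [] (exclude nothing, keep all meta_info keys)
    # 'a, b,c' -> ['a', 'b', 'c']
    return [key for key in (part.strip() for part in raw.split(",")) if key]
```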

Pass an empty list to /configure_logging to restore the previous
behavior.
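The shallow-copy filtering in point 2 might look roughly like this (a simplified sketch; the real site is `dump_requests` on the tokenizer manager, and the helper name here is an assumption):

```python
def filter_meta_info(out_dict, exclude_keys=("routed_experts", "hidden_states")):
    # Copy the outer dict and rebuild meta_info without the excluded keys,
    # leaving the original out_dict (still referenced by the response
    # path / observers) untouched.
    meta = out_dict.get("meta_info")
    if not meta or not exclude_keys:
        return out_dict
    filtered = dict(out_dict)
    filtered["meta_info"] = {k: v for k, v in meta.items() if k not in exclude_keys}
    return filtered
```

Because only the outer dict and `meta_info` are copied, the remaining values (text, token ids) are shared, so the dump-side filtering adds negligible memory overhead.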

Co-authored-by: Cursor <cursoragent@cursor.com>

@ByronHsu ByronHsu changed the title Make request dump robust to unpicklable server_args and large meta_info [Dump] Make request dump robust to unpicklable server_args and large meta_info May 9, 2026
@ByronHsu ByronHsu changed the title [Dump] Make request dump robust to unpicklable server_args and large meta_info [Utils] Make request dump robust to unpicklable server_args and large meta_info May 9, 2026
@ByronHsu
Collaborator Author

ByronHsu commented May 9, 2026

/tag-and-run-ci

@ByronHsu
Collaborator Author

ByronHsu commented May 9, 2026

/tag-and-rerun-ci

@github-actions github-actions Bot added the run-ci label May 9, 2026
Comment thread on python/sglang/srt/managers/io_struct.py (outdated):
Remove descriptive comments that restate what the code does; keep
only concise "why" notes on the non-obvious pickle fallback.

Co-authored-by: Cursor <cursoragent@cursor.com>
@ByronHsu ByronHsu merged commit 1e6c6d1 into main May 10, 2026
43 of 83 checks passed
@ByronHsu ByronHsu deleted the byron/upstream-dump-request-fixes branch May 10, 2026 04:41
ByronHsu added a commit that referenced this pull request May 10, 2026
…lable server_args and large meta_info (#24902)

Co-authored-by: Byron Hsu <byron@periodiclabs.ai>
Co-authored-by: Cursor <cursoragent@cursor.com>
ltcs11 added a commit to ltcs11/sglang that referenced this pull request May 11, 2026
* main: (87 commits)
  [Fix] Disable FlashInfer allreduce fusion under deterministic inference (sgl-project#24629)
  fix: STANDALONE spec-decode hidden-size mismatch crash (sgl-project#24217)
  Followup fix for Custom AR V2 in non NVL scenarios (sgl-project#24742)
  Fix reduce_scatterv producer contract for SUM_LEN (sgl-project#24785)
  [NPU]Documentation update for communications quantization feature (sgl-project#24668)
  [Session R3] Add routed_experts_start_len for absolute routing slice control (sgl-project#24851)
  [Model] Add MiniCPM-V 4.6 support (sgl-project#24855)
  Support Intern-S2-Preview (sgl-project#24875)
  [PD] Unify dsv4 dispatch with swa (sgl-project#24888)
  Optimize MHC pipeline: DeepGemm, fused norm, fused hc_head (sgl-project#24775)
  Fix PD bootstrap failure handling (sgl-project#24772)
  [Spec] Cleanup idle stub and shape-check patterns (sgl-project#24881)
  [Bug] Add dsv4 state_type branch to mooncake disaggregation (sgl-project#24878)
  [Spec V1] Split draft-extend phase from `EagleDraftInput` into new `EagleDraftExtendInput` (sgl-project#24859)
  [Gemma4] Optimize Gemm4 with fused Q/K/V RMSNorm + per-expert FP8 ckpt loader (sgl-project#24696)
  [spec decoding] support kimi-k2.5-eagle3-mla (sgl-project#24826)
  [SPEC V2] fix: skip stale state updates in spec-v2 overlap (sgl-project#23456)
  [RL] Call torch.cuda.empty_cache() for `in-place` pause mode to avoid OOM (sgl-project#24854)
  [diffusion] CI: add cache-dit CI tests (sgl-project#19213)
  [Utils] Make request dump robust to unpicklable server_args and large meta_info (sgl-project#24767)
  ...

# Conflicts:
#	python/sglang/srt/utils/common.py