fix: resolve tensor file overwrite between target and draft models by yaya159456 · Pull Request #21694 · sgl-project/sglang

yaya159456 · 2026-03-30T14:05:35Z

In Eagle mode, tensor files generated by the target and draft models share the same output paths, leading to unintended overwriting.

This change separates the file outputs to prevent conflicts.
This change only affects file output paths and does not impact model computation or performance.
Fixes #21721

Motivation

Fix the issue where tensor dump files from the target and draft models overwrite each other in Eagle mode due to sharing the same output directory.

Modifications

- Add role-based subdirectories for tensor dump outputs in Eagle mode:
  - Append "draft" for draft workers
  - Append "target" for target workers
- Keep the original behavior unchanged for non-Eagle modes
- Ensure compatibility with existing debug tensor dump configurations
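
The path selection listed under Modifications can be sketched roughly as follows. This is a minimal illustration, not the code in this PR: the helper name `resolve_dump_folder` and its `is_eagle` / `is_draft_worker` parameters are hypothetical stand-ins for whatever the model runner actually checks.

```python
import os
from typing import Optional


def resolve_dump_folder(
    base_folder: Optional[str],
    is_eagle: bool,
    is_draft_worker: bool,
) -> Optional[str]:
    """Return the folder tensor dumps should go to (hypothetical helper)."""
    if base_folder is None:
        # Tensor dumping is disabled, so there is nothing to resolve.
        return None
    if not is_eagle:
        # Non-Eagle modes keep the configured folder unchanged.
        return base_folder
    # In Eagle mode each role gets its own subdirectory so files cannot collide.
    role = "draft" if is_draft_worker else "target"
    return os.path.join(base_folder, role)
```

Under these assumptions, a draft worker and a target worker configured with the same base folder resolve to two different directories, while a non-Eagle run still writes to the base folder itself.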

Accuracy Tests

N/A (no impact on model outputs)

Speed Tests and Profiling

N/A (no impact on performance)

Checklist

Format your code according to the "Format code with pre-commit" guide.
Add unit tests (not applicable, as this change only affects file output paths).
Update documentation (not applicable, no user-facing changes).
Provide accuracy and speed benchmark results (not applicable, no impact on model outputs or performance).
Follow the SGLang code style guidance.

Review and Merge Process

Ping Merge Oncalls to start the process. See the PR Merge Process.
Get approvals from CODEOWNERS and other reviewers.
Trigger CI tests with comments or contact authorized users to do so.
- Common commands include /tag-and-rerun-ci, /tag-run-ci-label, /rerun-failed-ci
After green CI and required approvals, ask Merge Oncalls or people with Write permission to merge the PR.

gemini-code-assist

Code Review

This pull request updates the model runner to support separate debug tensor dump directories for draft and target workers when the Eagle speculative algorithm is active. The review feedback suggests refactoring the implementation to reduce code duplication by determining the specific dump path before calling the hook registration function.
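
The refactoring suggested in the review (resolve the dump path first, then register the hook from a single call site) might look something like the sketch below. The names `setup_tensor_dump` and `register_hook` are hypothetical; the real hook registration function in `model_runner.py` is not shown in this conversation, so a plain callback stands in for it.

```python
import os
from typing import Callable, List, Optional


def setup_tensor_dump(
    dump_folder: Optional[str],
    is_eagle: bool,
    is_draft_worker: bool,
    register_hook: Callable[[str], None],
) -> None:
    """Resolve the dump path once, then register the hook in one place."""
    if dump_folder is None:
        return
    if is_eagle:
        # Pick the role-specific subdirectory before registering anything.
        role = "draft" if is_draft_worker else "target"
        dump_folder = os.path.join(dump_folder, role)
    # Single call site, regardless of worker role.
    register_hook(dump_folder)


# Usage: record which folders would be registered for each worker role.
registered: List[str] = []
setup_tensor_dump("dumps", True, False, registered.append)  # target worker
setup_tensor_dump("dumps", True, True, registered.append)   # draft worker
```

The point of this shape is that the registration call appears only once, so any later change to its arguments has to be made in a single place.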

python/sglang/srt/model_executor/model_runner.py

…share the same output paths, leading to unintended overwriting. This change separates the file outputs to prevent conflicts. Specifically, it appends distinct subdirectories ("draft" and "target") to the configured dump path based on the worker role. This change only affects file output paths and does not impact model computation or performance.

yaya159456 · 2026-03-31T01:59:17Z

/tag-and-rerun-ci

yaya159456 · 2026-03-31T02:04:00Z

Hi, CI is currently blocked due to missing run-ci label.
Could someone help trigger it? Thanks!

yaya159456 · 2026-03-31T02:35:02Z

@google-gemini review

kpham-sgl · 2026-03-31T02:58:19Z

/tag-and-rerun-ci

kpham-sgl · 2026-03-31T02:59:09Z

python/sglang/srt/model_executor/model_runner.py

            f"mem usage={self.weight_load_mem_usage:.2f} GB."
        )
        if self.server_args.debug_tensor_dump_output_folder is not None:
+            dump_folder = self.server_args.debug_tensor_dump_output_folder


nit: it may be worth documenting this behavior in the help docstring for self.server_args.debug_tensor_dump_output_folder

Good suggestion, thanks!
I've added clarification to the help docstring regarding the behavior in Eagle mode.
In addition, I submitted a PR to update the documentation in sgl-project.github.io to make this behavior more explicit:
sgl-project/sgl-project.github.io#26
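
For readers unfamiliar with how such a help docstring is attached to a server argument, here is a generic `argparse` sketch. The flag spelling is assumed from the attribute name `debug_tensor_dump_output_folder`, and the help wording is illustrative only; it is not the text that was added in this PR.

```python
import argparse

parser = argparse.ArgumentParser()
parser.add_argument(
    "--debug-tensor-dump-output-folder",  # assumed spelling, derived from the attribute name
    type=str,
    default=None,
    help=(
        "Folder for debug tensor dumps. "
        "Illustrative wording: in Eagle mode, dumps are written to "
        "'draft' and 'target' subfolders of this path."
    ),
)

# argparse converts dashes to underscores for the attribute name.
args = parser.parse_args(["--debug-tensor-dump-output-folder", "/tmp/dumps"])
```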

yaya159456 · 2026-04-07T13:50:22Z

Hi, just a gentle follow-up on this PR 😊
It’s linked to the issue above. When you have a chance, could you please take a look?
Thanks a lot for your time and help!

kpham-sgl

LGTM. Can you wait for CI to pass before merging, @yaya159456?

yaya159456 · 2026-04-10T09:39:05Z

Thanks for the review! 🙏

I have a quick question — when you mentioned waiting for CI to pass, are those CI checks visible to me on this PR? I do see some failing checks, so I’m not sure if I’m expected to fix them, or if I should just wait for CI to complete and further reviews before merging.

Also, I’m not entirely sure if there’s anything else I should take care of before this PR can be closed.

Thanks for the clarification!

kpham-sgl · 2026-04-10T17:51:03Z

No action needed from your side. Just hang tight until the CI is fully green (we are working on it)

fix: resolve tensor file overwrite between target and draft models

2cb370c

yaya159456 requested review from Fridge003, Ying1123, hnyls2002, ispobock and merrymercy as code owners March 30, 2026 14:05

gemini-code-assist bot reviewed Mar 30, 2026

yaya159456 mentioned this pull request Mar 31, 2026

[Bug] In Eagle mode, tensor files generated by the target and draft models share the same output paths, leading to unintended overwriting. #21721

Open

5 tasks

chore: trigger CI rerun

761039d

yaya159456 force-pushed the fix_eagle_fileoverwrite branch from 7ccb83a to 761039d Compare March 31, 2026 02:40

github-actions bot added the run-ci label Mar 31, 2026

kpham-sgl approved these changes Mar 31, 2026

kpham-sgl self-assigned this Mar 31, 2026

docs: clarify tensor dump output behavior in Eagle mode

f5b3b6a

yaya159456 mentioned this pull request Mar 31, 2026

docs: clarify tensor dump output behavior in Eagle mode sgl-project/sgl-project.github.io#26

Open

yaya159456 requested a review from kpham-sgl March 31, 2026 03:54

yaya159456 mentioned this pull request Mar 31, 2026

docs: clarify tensor dump output behavior in Eagle mode sgl-project/sgl-project.github.io#27

Open

kpham-sgl approved these changes Apr 7, 2026

kpham-sgl added 2 commits April 7, 2026 11:34

Merge branch 'main' into fix_eagle_fileoverwrite

3c2829d

Merge branch 'main' into fix_eagle_fileoverwrite

a89b336

kpham-sgl added 2 commits April 10, 2026 10:51

Merge branch 'main' into fix_eagle_fileoverwrite

3a3807e

Merge branch 'main' into fix_eagle_fileoverwrite

36c862f

yaya159456 wants to merge 8 commits into sgl-project:main from yaya159456:fix_eagle_fileoverwrite
