[diffusion][hot fix] fix torch.compile graph break caused by torch._dynamo.disable #18336
Merged
Collaborator: only diffusion is affected, bypassing
Collaborator: Brilliant! We should be more careful about performance improvements and regressions, and build a more mature system and tooling to track them automatically. For diffusion models, the thresholds in the PR test are loose (so that PR merges are not blocked), so this is an important issue that deserves serious attention. cc @dougyster

Motivation
PR14717 decorated the JIT kernel with @torch._dynamo.disable, which introduced a torch.compile graph break. The graph break increases CPU overhead and, in certain models (e.g. QwenImage, 263 ms -> 287 ms per step), leads to GPU bubbles.
This PR fixes the issue by decorating the kernel with torch.library.custom_op, following the approach described in the PyTorch documentation.

Modifications
Accuracy Tests
Compare images generated with this PR against those generated at 669a9bd (before PR 14717) for the following models:
Qwen
hunyuan video
Wan-AI/Wan2.2-T2V-A14B-Diffusers
Benchmarking and Profiling
Benchmark on H200. Each model was run at three commits: before regression (669a9bd), after regression (4739f2e), and after fix (this PR).
QwenImage
Wan2.2 (no regression at 4739f2e)
Wan-AI/Wan2.1-T2V-1.3B-Diffusers (no regression at 4739f2e)
hunyuan (no regression at 4739f2e)