Reduce time it takes to import SGLang #12510

raayandhar · 2025-11-02T04:04:28Z

Motivation

Importing SGLang modules takes a long time (#10492). I have been looking into what dominates the import time and whether there are simple ways to reduce it. From the original issue, we want to reduce:

time python -c "from sglang.srt.managers.scheduler import Scheduler"

which is what I have focused my efforts on. However, I think there are things we can do to reduce the time for other imports as well. This is more of a V1 to get feedback from the community and experts.

Modifications

There are some heavy imports. For example, importing the quantization methods at the module level is heavy. By moving some imports to the function level (the only place they are used), we can reduce module import time. However, I can see how this can easily become an antipattern. In fact, it can hurt performance if the import sits inside a function that is called a lot. I tried to only do this in functions that we expect to run once or a small number of times, but I can understand the argument against this kind of code. I also don't think all the changes to hf_transformers_utils.py help, so I will be taking a deeper look, since those changes are a bit invasive.
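A minimal sketch of the function-level import pattern described above. It uses the standard-library decimal module as a stand-in for a heavy dependency; the function name to_cents is hypothetical and does not come from this PR.

```python
import sys


def to_cents(amount: str) -> int:
    """Convert a decimal string such as "12.34" to an integer number of cents."""
    # Deferred import: the module is loaded the first time this function runs,
    # not when the enclosing module is imported.
    from decimal import Decimal

    return int(Decimal(amount) * 100)


cents = to_cents("12.34")
# After the first call the module is cached in sys.modules.
loaded = "decimal" in sys.modules
```

The trade-off is the one raised in the paragraph above: the import statement now executes on every call, so the pattern fits functions that run rarely better than functions on a hot path.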

Accuracy Tests

These changes should not affect model outputs.

Benchmarking and Profiling

Running for i in {1..100}; do (time python -B -c "from sglang.srt.managers.scheduler import Scheduler") 2>&1 | grep "^real"; done | python calc_avg.py (calc_avg.py)
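The linked calc_avg.py is not reproduced in this thread. The following is only a sketch of what such a script might look like, assuming the input is bash time output in the form "real 0m8.308s" and that the standard deviation is the sample standard deviation. All function names are hypothetical.

```python
import re
import statistics

# Matches bash `time` output such as "real\t0m8.308s".
REAL_LINE = re.compile(r"^real\s+(\d+)m([\d.]+)s")


def parse_real_seconds(lines):
    """Return the elapsed seconds for every 'real' line in the input."""
    seconds = []
    for line in lines:
        match = REAL_LINE.match(line.strip())
        if match:
            minutes, secs = match.groups()
            seconds.append(int(minutes) * 60 + float(secs))
    return seconds


def summarize(seconds):
    """Compute run count, mean, median, sample std dev, min, and max."""
    return {
        "Number of runs": len(seconds),
        "Mean": statistics.mean(seconds),
        "Median": statistics.median(seconds),
        # stdev needs at least two data points.
        "Std Dev": statistics.stdev(seconds) if len(seconds) > 1 else 0.0,
        "Min": min(seconds),
        "Max": max(seconds),
    }


def report(lines):
    """Format the summary as text, one statistic per line."""
    stats = summarize(parse_real_seconds(lines))
    out = [f"Number of runs: {stats['Number of runs']}"]
    for key in ("Mean", "Median", "Std Dev", "Min", "Max"):
        out.append(f"{key + ':':<10}{stats[key]:.3f}s")
    return "\n".join(out)
```

To use a script like this at the end of the pipeline above, it would finish with print(report(sys.stdin)) after importing sys.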

With these changes:

===========Timing Statistics============                                                                            
Number of runs: 100                                                                                                 
Mean:     8.308s                                                                                                    
Median:   8.236s                                                                                                    
Std Dev:  0.297s                                                                                                    
Min:      7.999s                                                                                                    
Max:      9.329s                                                                                                    
========================================

Compared to top-of-main:

===========Timing Statistics============                                                                            
Number of runs: 100                                                                                                 
Mean:     9.836s                                                                                                    
Median:   9.790s                                                                                                    
Std Dev:  0.332s                                                                                                    
Min:      8.655s                                                                                                    
Max:      11.801s                                                                                                   
========================================

So we have a ~1.5 second improvement in the mean (9.836s to 8.308s). Not the best, so I am going to keep working on it. I mostly targeted the time spent creating the ModelConfig object. The difference so far is largely from removing the quantization import:

Without these changes import_sglang_tom.log

With these changes import_sglang_new.log

Machine:

AMD EPYC 7343 16-Core Processor
L40S GPU

Checklist

Format your code according to the Format code with pre-commit.
Add unit tests according to the Run and add unit tests. (N/A?)
Update documentation according to Write documentations. (N/A?)
Provide accuracy and speed benchmark results according to Test the accuracy and Benchmark the speed. -- will try to do this, although I am GPU poor.

raayandhar · 2025-11-02T04:04:45Z

I will continue working on this; there are more improvements to be made.

raayandhar · 2025-11-02T04:38:55Z

python/sglang/srt/utils/hf_transformers_utils.py

-for name, cls in _CONFIG_REGISTRY.items():
-    with contextlib.suppress(ValueError):
-        AutoConfig.register(name, cls)
+def _register_custom_configs():


I think that a lot of the changes in this file do not really improve the times, and they are overly invasive. However, the transformers-related code accounts for a large share of the import time. I think it may be impossible to remove, since we initialize many transformers-related objects in the scheduler.


raayandhar · 2025-11-06T04:31:25Z

python/sglang/srt/disaggregation/decode.py


    def add(self, req: Req, is_retracted: bool = False) -> None:
        """Add a request to the pending queue."""
+        from sglang.srt.managers.schedule_batch import RequestStage


This is a very repetitive pattern, but the from sglang.srt.managers.schedule_batch import is somewhat expensive. RequestStage is just an enum, so in my opinion we don't want to pay for the massive import just to get an enum. Maybe we can move this enum to a different file, or find some other solution that does the same thing, so we can import it at the top level.

raayandhar · 2025-11-06T04:33:45Z

python/sglang/srt/utils/hf_transformers_utils.py

-for name, cls in _CONFIG_REGISTRY.items():
-    with contextlib.suppress(ValueError):
-        AutoConfig.register(name, cls)
+def _register_custom_configs():


OK, so moving the config registration into a function that runs later does massively improve the times. This, together with moving the _CUSTOMIZED_MM_PROCESSOR import, drastically improves the import time.

Signed-off-by: Raayan Dhar [email protected] <[email protected]>

raayandhar · 2025-11-06T06:34:13Z

python/sglang/srt/layers/moe/__init__.py

 )
+from sglang.utils import LazyImport
+
+MoeRunner = LazyImport("sglang.srt.layers.moe.moe_runner.runner", "MoeRunner")


I think this (LazyImport(...)) is something we can apply to many other files as well. I'm not sure if there's a downside?
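To make the trade-off concrete, here is a sketch of how a lazy proxy of this kind can work. It is not the sglang.utils.LazyImport implementation; the class name LazyAttr is hypothetical, and the standard-library fractions module stands in for a heavy one.

```python
import importlib


class LazyAttr:
    """Resolve module_name.attr_name on first use instead of at import time."""

    def __init__(self, module_name: str, attr_name: str):
        self._module_name = module_name
        self._attr_name = attr_name
        self._target = None

    def _load(self):
        # Import the module and fetch the attribute only once.
        if self._target is None:
            module = importlib.import_module(self._module_name)
            self._target = getattr(module, self._attr_name)
        return self._target

    def __getattr__(self, name):
        # Only called when normal lookup fails, so the private fields set in
        # __init__ never reach this method.
        return getattr(self._load(), name)

    def __call__(self, *args, **kwargs):
        # Calling the proxy constructs (or calls) the real object.
        return self._load()(*args, **kwargs)


# Nothing is imported here; the import happens on the first call below.
Fraction = LazyAttr("fractions", "Fraction")
half = Fraction(1, 2)
```

One possible downside of this design: the name is bound to a proxy object rather than to the class itself, so code that needs the real class object (subclassing, isinstance checks, annotations evaluated at runtime) may not behave as it would with a normal import.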

raayandhar · 2025-11-06T06:48:34Z

python/sglang/srt/managers/scheduler.py

        disable_overlap_schedule: bool,
        offload_tags: set[str],
    ):
+        from sglang.srt.layers.moe import TboDPAttentionPreparer


Maybe it's not worth lazy-loading this one here. It saves 600 ms of import time, but this will get called a lot more in the event loop. I'm not sure how much that costs, though, because the import may be cached after the first call.
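On the caching question: Python stores imported modules in sys.modules, so a repeated function-level import does not execute the module body again. The sketch below measures the remaining per-call cost, using the standard-library json module as a stand-in; the function name is hypothetical and the number printed depends on the machine.

```python
import sys
import timeit


def uses_deferred_import():
    # After the first call this statement is a lookup in sys.modules plus a
    # local name binding; the module body is not executed again.
    import json

    return json


first = uses_deferred_import()
second = uses_deferred_import()
# Both calls return the same cached module object.
same_module = first is second and first is sys.modules["json"]

calls = 100_000
per_call = timeit.timeit(uses_deferred_import, number=calls) / calls
print(f"cached import cost per call: {per_call * 1e9:.0f} ns")
```

Whether that per-call cost matters in the event loop would need to be measured against the actual function that contains the import.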

raayandhar · 2025-11-06T07:06:48Z

Hi experts,

At this point, looking at the profiling, there has been some pretty good improvement in the times. Looking at time python -X importtime -c "from sglang.srt.managers.scheduler import Scheduler" 2> import_sglang.log, we started at ~6000 ms overall and are now down to ~4000 ms; see below:
Top-of-main, newly updated (import_sglang_tom.log):

This is the improved version, with my changes (import_sglang_improved.log)

I think it's best to just click the image and click again to see clearly. But in words, we see an improvement of around 33%. In the improved version, most of what's left are transformers / torch imports that are basically unavoidable (without some extremely invasive changes). Otherwise, a lot of the other imports have shrunk massively, e.g. model_config from 4300 ms to 550 ms. You can see the logs for more details. I also have a version that is far more aggressively optimized (just to see what's possible) and improves it even further, but those changes are really invasive and impractical.
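For readers who want to rank the entries in a log like import_sglang.log without the images, here is a small sketch that parses -X importtime output. The log reports times in microseconds, in lines of the form "import time: self | cumulative | package". The function names are hypothetical.

```python
import re

# Matches data rows such as "import time:       250 |        350 |   pkg".
# The header row has "self [us]" instead of a number, so it does not match.
ROW = re.compile(r"^import time:\s+(\d+)\s+\|\s+(\d+)\s+\|\s+(\S+)")


def parse_importtime(lines):
    """Return (module, self_us, cumulative_us) for each data row."""
    rows = []
    for line in lines:
        match = ROW.match(line)
        if match:
            self_us, cumulative_us, name = match.groups()
            rows.append((name, int(self_us), int(cumulative_us)))
    return rows


def slowest(rows, n=10):
    """Return the n rows with the largest cumulative import time."""
    return sorted(rows, key=lambda row: row[2], reverse=True)[:n]


def format_rows(rows):
    """Render rows with cumulative time converted to milliseconds."""
    return "\n".join(f"{cum / 1000:10.1f} ms  {name}" for name, _, cum in rows)
```

A typical use would be print(format_rows(slowest(parse_importtime(open("import_sglang.log"))))). Cumulative time includes nested imports, so a parent package and its children should not be summed together.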

The changes are largely just moving imports into functions so they are lazy-loaded, or moving imports so they only run during type checking. As I've commented earlier, not all of the changes are very pretty. My rationale is the following: nearly all of the functions that I moved imports into will probably only run very intermittently, or even just once at object init (e.g. a lot of the functions in hf_transformers_utils.py are this way). So the lazy loading may look a bit uglier, but otherwise we reap good benefits for effectively no downside. I left some more comments with other thoughts on how best to do this.

Also, this is largely specific to the import path and object in the issue (Scheduler). I think these changes should help other paths as well (e.g. the changes to hf_transformers_utils.py should be useful, among others). If there are other paths that should be targeted, let me know and I will work on them. Furthermore, I think there's a lot of free lunch in two types of changes:

1. For imports that are only used as types, moving them under an if TYPE_CHECKING block seems to have no downside. A lot of the code already does this, but some parts apparently don't, since I was able to find such changes along this path.
2. Using the LazyImport class when possible. This issue has been described before (Slow import #606), and LazyImport is only used in sglang/__init__.py. It seems like there's no downside to using it (but I could be misunderstanding, please let me know), so we could be using it more broadly.

So these two changes could be used more broadly than just this path.
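A minimal sketch of the first kind of change, using the standard-library decimal module as a stand-in for an expensive dependency that is only needed for annotations. The function name format_price is hypothetical.

```python
from typing import TYPE_CHECKING

if TYPE_CHECKING:
    # Seen by static type checkers only; never executed at runtime.
    from decimal import Decimal


def format_price(value: "Decimal") -> str:
    """Format a price with two decimal places."""
    # The annotation is a string, so Decimal does not need to exist at runtime.
    return f"{value:.2f}"
```

The limit of this technique is that a name imported under TYPE_CHECKING is not defined at runtime, so it can only appear in annotations (quoted, or with postponed evaluation), not in isinstance checks or other executed code.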

At this point I'm going to open this up for review. I'm not sure this exactly tackles what the original issue was getting at, so I'd appreciate any clarification on what direction this PR should take. I appreciate the time reviewers take to look at this PR!

hnyls2002 self-assigned this Nov 3, 2025

raayandhar force-pushed the reduce-import-time branch from aff6929 to 9adc75a Compare November 4, 2025 07:31

raayandhar added 2 commits November 5, 2025 18:46

efforts done towards reducing import time

c55a6e3

Signed-off-by: Raayan Dhar [email protected] <[email protected]>

remove comment

6ed8399

Signed-off-by: Raayan Dhar [email protected] <[email protected]>

raayandhar force-pushed the reduce-import-time branch from 9adc75a to 6ed8399 Compare November 6, 2025 02:47

raayandhar added 2 commits November 5, 2025 19:15

moving configs into _register() improves time

d976b8b

Signed-off-by: Raayan Dhar [email protected] <[email protected]>

bigger improvements, reaching 7.5s-8s

7eeea38

Signed-off-by: Raayan Dhar [email protected] <[email protected]>

more improvements, down to 3900 ms

193544d

Signed-off-by: Raayan Dhar [email protected] <[email protected]>

raayandhar marked this pull request as ready for review November 6, 2025 07:09

raayandhar requested review from BBuf, ByronHsu, Edwardf0t1, HaiShaw, Ying1123, ch-wan, fzyzcjy, hnyls2002, ispobock, kushanam, merrymercy, xiezhq-hermann and zhyncs as code owners November 6, 2025 07:09
