[router] PD Router Simplification and Reorganization #8838
Merged
zhyncs approved these changes Aug 6, 2025
0cd8b5f to
25fddd6
narutolhy pushed a commit to narutolhy/sglang that referenced this pull request Aug 17, 2025
MahmoudAshraf97 pushed a commit to MahmoudAshraf97/sglang that referenced this pull request Sep 8, 2025
Motivation
The PD (Prefill-Decode) router in SGLang's Rust router implementation had significant architectural complexity that made it difficult to maintain and extend. The existing implementation used a multi-layer type conversion system (OpenAI → PD types → JSON) with redundant wrapper methods. This resulted in over 1800 lines of unnecessary code, and the manual field mapping was error-prone: each new field required updates in multiple places.
Key problems addressed:
- request_adapter.rs
- impl blocks with inconsistent organization
Modifications
This PR simplifies and reorganizes the PD router architecture:
1. Direct JSON Manipulation (Simplification)
- bootstrap_injector.rs with intelligent batch detection and field injection
- Removed the ToPdRequest trait and adapter pattern entirely
2. Direct Implementation Architecture (Reorganization)
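To illustrate the direct JSON manipulation from item 1, here is a minimal sketch of batch detection plus field injection on an untyped request. It is not the code in bootstrap_injector.rs. The `Json` enum is a std-only stand-in for a JSON value type such as `serde_json::Value`, and the function names, the `text` field used for batch detection, and the `bootstrap_host` / `bootstrap_port` / `bootstrap_room` field names are assumptions made for the example.

```rust
use std::collections::BTreeMap;

// Std-only stand-in for a JSON value (assumption: the real router
// would operate on a library type such as serde_json::Value).
#[derive(Debug, Clone, PartialEq)]
enum Json {
    Num(i64),
    Str(String),
    Array(Vec<Json>),
    Object(BTreeMap<String, Json>),
}

// Hypothetical batch detection: a request is treated as a batch when its
// "text" field is an array; the batch size is the length of that array.
fn batch_size(req: &Json) -> Option<usize> {
    match req {
        Json::Object(map) => match map.get("text") {
            Some(Json::Array(items)) => Some(items.len()),
            _ => None,
        },
        _ => None,
    }
}

// Inject bootstrap fields straight into the request object, without
// converting it into an intermediate typed request first.
// Field names are assumptions for illustration.
fn inject_bootstrap(
    req: &mut Json,
    host: &str,
    port: i64,
    first_room: i64,
) -> Result<(), String> {
    let size = batch_size(req);
    let map = match req {
        Json::Object(map) => map,
        _ => return Err("request must be a JSON object".to_string()),
    };
    match size {
        // Batch request: one bootstrap entry per batch item.
        Some(n) => {
            let hosts = vec![Json::Str(host.to_string()); n];
            let ports = vec![Json::Num(port); n];
            let rooms: Vec<Json> = (0..n as i64)
                .map(|i| Json::Num(first_room + i))
                .collect();
            map.insert("bootstrap_host".to_string(), Json::Array(hosts));
            map.insert("bootstrap_port".to_string(), Json::Array(ports));
            map.insert("bootstrap_room".to_string(), Json::Array(rooms));
        }
        // Single request: scalar bootstrap fields.
        None => {
            map.insert("bootstrap_host".to_string(), Json::Str(host.to_string()));
            map.insert("bootstrap_port".to_string(), Json::Num(port));
            map.insert("bootstrap_room".to_string(), Json::Num(first_room));
        }
    }
    Ok(())
}
```

Because the request stays an untyped object, any field the client sends passes through untouched, so supporting a new request field needs no change to the injection code. That is the maintenance cost the Motivation section attributes to the old adapter layer.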
3. Code Reduction
- request_adapter.rs: 1400+ lines → 0 lines (100% reduction)
- pd_types.rs: 500 lines → 60 lines (88% reduction)
- pd_router.rs: Removed 370+ lines of wrapper methods and duplicates
4. Files Modified
- src/routers/bootstrap_injector.rs (direct JSON field injection)
- src/routers/request_adapter.rs (removed adapter layer)
- src/routers/pd_types.rs (removed intermediate types)
- src/routers/pd_router.rs (direct trait implementations)
Benchmark & Profiling
Before
Benchmark:
After
Benchmark:
Performance improvements achieved through architectural simplification:
Benchmark Validation
Two llama3.3 70B on one H100 node
Checklist