[router] PD Router Simplification and Reorganization #8838

Merged: slin1237 merged 1 commit into main from slin/pd-simplication on Aug 6, 2025
Conversation

@slin1237 (Collaborator) commented Aug 6, 2025

Motivation

The PD (Prefill-Decode) router in SGLang's Rust router implementation suffered from significant architectural complexity that made it difficult to maintain and extend. The existing implementation used a complex multi-layer type conversion system (OpenAI → PD types → JSON) with redundant wrapper methods, resulting in over 1800 lines of unnecessary code and manual field mapping that was error-prone and required updates in multiple places for new fields.

Key problems addressed:

  • Complex adapter layer: 1400+ lines of manual field mapping in request_adapter.rs
  • Dual type system: Maintaining both OpenAI and PD type definitions
  • SGLang extension pass-through issues: New fields required manual updates in 3+ places
  • Redundant wrapper methods: RouterTrait methods just calling PDRouter methods with no added value
  • Code organization: 5 scattered impl blocks with inconsistent organization

Modifications

This PR simplifies and reorganizes the PD router architecture:

1. Direct JSON Manipulation (Simplification)

  • New approach: OpenAI Request → JSON → Bootstrap Injection (eliminates intermediate type conversions)
  • Created: bootstrap_injector.rs with intelligent batch detection and field injection
  • Eliminated: Complex ToPdRequest trait and adapter pattern entirely
  • Result: All OpenAI and SGLang fields automatically preserved with zero manual mapping

2. Direct Implementation Architecture (Reorganization)

  • Eliminated: All wrapper methods where RouterTrait just called PDRouter methods
  • Moved: All routing logic directly into RouterTrait implementations (no indirection)
  • Reorganized: 5 scattered impl blocks → 3 well-organized blocks by functionality
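A minimal before/after illustration of the wrapper elimination: previously the `RouterTrait` method would just delegate to an inherent `PDRouter` method of the same name; now the logic lives directly in the trait impl. The trait shape and method bodies here are hypothetical, chosen only to show the structural change.

```rust
// Hypothetical sketch of direct trait implementation. The names
// (RouterTrait, PDRouter, route_generate) mirror this PR's description,
// but the signatures and bodies are illustrative assumptions.
trait RouterTrait {
    fn route_generate(&self, body: &str) -> String;
}

struct PDRouter {
    prefill_url: String,
    decode_url: String,
}

// Before: RouterTrait::route_generate called an inherent
// PDRouter::route_generate with no added value.
// After: the routing logic sits directly in the trait impl,
// removing one layer of indirection per request.
impl RouterTrait for PDRouter {
    fn route_generate(&self, body: &str) -> String {
        format!(
            "prefill={} decode={} body={}",
            self.prefill_url, self.decode_url, body
        )
    }
}

fn main() {
    let r = PDRouter {
        prefill_url: "http://prefill:8000".into(),
        decode_url: "http://decode:8001".into(),
    };
    assert!(r.route_generate("{}").contains("prefill"));
    println!("ok");
}
```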

3. Code Reduction

  • request_adapter.rs: 1400+ lines → 0 lines (100% reduction)
  • pd_types.rs: 500 lines → 60 lines (88% reduction)
  • pd_router.rs: Removed 370+ lines of wrapper methods and duplicates
  • Total: 1800+ lines eliminated while maintaining 100% functionality

4. Files Modified

  • Created: src/routers/bootstrap_injector.rs (direct JSON field injection)
  • Removed: src/routers/request_adapter.rs (adapter layer eliminated entirely)
  • Cleaned: src/routers/pd_types.rs (removed intermediate types)
  • Reorganized: src/routers/pd_router.rs (direct trait implementations)

Benchmark & Profiling

Before

OpenAI Request (GenerateRequest/ChatCompletionRequest/CompletionRequest)
    ↓ .to_pd_request() [ToPdRequest trait]
PD-Specific Types (GenerateReqInput/ChatReqInput) 
    ↓ .add_bootstrap_info() [Bootstrap trait]
Add bootstrap fields (host/port/room)
    ↓ serde_json::to_value()
JSON for backend

Benchmark:

SGLang Router Performance Benchmark Suite
=============================================

Quick Performance Overview:
  * Serialization (avg):          650 ns/req
  * Deserialization (avg):        912 ns/req
  * PD Adaptation (avg):         1053 ns/req
  * Total Pipeline (avg):        2615 ns/req

After

OpenAI Request (GenerateRequest/ChatCompletionRequest/CompletionRequest)
    ↓ serde_json::to_value()
JSON (with all fields preserved)
    ↓ inject_bootstrap_fields()
JSON with bootstrap fields added

Benchmark:

SGLang Router Performance Benchmark Suite
=============================================

Quick Performance Overview:
  * Serialization (avg):          484 ns/req
  * Deserialization (avg):        533 ns/req
  * Bootstrap Injection (avg):   1061 ns/req
  * Total Pipeline (avg):        2078 ns/req

Performance improvements achieved through architectural simplification:

  • Eliminated one type conversion step per request (OpenAI → PD types)
  • Reduced memory allocations from intermediate type creation
  • Removed function call overhead from wrapper methods
  • Direct JSON manipulation is more efficient than struct conversions

Benchmark Validation

============ Serving Benchmark Result ============
Backend:                                 sglang    
Traffic request rate:                    24.0      
Max request concurrency:                 not set   
Successful requests:                     256       
Benchmark duration (s):                  91.97     
Total input tokens:                      1495240   
Total generated tokens:                  1229      
Total generated tokens (retokenized):    249       
Request throughput (req/s):              2.78      
Input token throughput (tok/s):          16258.14  
Output token throughput (tok/s):         13.36     
Total token throughput (tok/s):          16271.50  
Concurrency:                             126.62    
----------------End-to-End Latency----------------
Mean E2E Latency (ms):                   45488.91  
Median E2E Latency (ms):                 47299.31  
---------------Time to First Token----------------
Mean TTFT (ms):                          44253.95  
Median TTFT (ms):                        46615.94  
P99 TTFT (ms):                           80582.18  
---------------Inter-Token Latency----------------
Mean ITL (ms):                           0.00      
Median ITL (ms):                         0.00      
P95 ITL (ms):                            0.00      
P99 ITL (ms):                            0.00      
Max ITL (ms):                            0.00      
==================================================

Setup: two Llama 3.3 70B servers on one H100 node.

Checklist

  • Format your code according to the Code Formatting with Pre-Commit.
  • Add unit tests as outlined in the Running Unit Tests.
    • Added 10 comprehensive bootstrap injection tests
    • Maintained all existing test coverage (211 tests passing)
  • Update documentation / docstrings / example tutorials as needed, according to Writing Documentation.
    • Updated inline documentation for new architecture
    • Comprehensive planning documents included
  • Provide throughput / latency benchmark results and accuracy evaluation results as needed, according to Benchmark and Profiling and Accuracy Results.
    • Performance maintained with architectural improvements
    • All integration tests verify identical behavior
  • For reviewers: If you haven't made any contributions to this PR and are only assisting with merging the main branch, please remove yourself as a co-author when merging the PR.
  • Please feel free to join our Slack channel at https://slack.sglang.ai to discuss your PR.

@slin1237 slin1237 requested a review from ByronHsu as a code owner August 6, 2025 01:39
@slin1237 slin1237 force-pushed the slin/pd-simplication branch from 0cd8b5f to 25fddd6 Compare August 6, 2025 02:19
@slin1237 slin1237 merged commit 8c7bb39 into main Aug 6, 2025
24 checks passed
@slin1237 slin1237 deleted the slin/pd-simplication branch August 6, 2025 04:20

narutolhy pushed a commit to narutolhy/sglang that referenced this pull request Aug 17, 2025
MahmoudAshraf97 pushed a commit to MahmoudAshraf97/sglang that referenced this pull request Sep 8, 2025