Add sglang_router support for expert distribution recording #43

Closed
ishandhanani wants to merge 1 commit into main from ishan/rcrd
@ishandhanani
Owner

Summary

This PR adds support for using sglang_router.launch_router instead of the dynamo stack (NATS/ETCD/dynamo.frontend). This is the foundation for enabling expert distribution recording functionality.

Key Changes

1. Configuration Layer

  • Added use_sglang_router field to BackendConfig schema
  • Users can now set use_sglang_router: true in their YAML configs
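
A minimal sketch of how such a flag might be modeled, assuming a plain dataclass. The real BackendConfig schema is not shown in this PR, so the class body, the `from_dict` helper, and the type check are all hypothetical:

```python
from dataclasses import dataclass, field
from typing import Any, Dict


@dataclass
class BackendConfig:
    """Hypothetical stand-in for the real BackendConfig schema."""

    # Defaults to False so existing YAML configs keep using the dynamo stack.
    use_sglang_router: bool = False
    sglang_config: Dict[str, Any] = field(default_factory=dict)

    @classmethod
    def from_dict(cls, raw: Dict[str, Any]) -> "BackendConfig":
        """Build a config from the parsed `backend:` mapping of a YAML file."""
        flag = raw.get("use_sglang_router", False)
        if not isinstance(flag, bool):
            raise TypeError("use_sglang_router must be a boolean")
        return cls(
            use_sglang_router=flag,
            sglang_config=dict(raw.get("sglang_config", {})),
        )
```

Defaulting the field to False is a design assumption: it keeps configs that never mention the router on their current code path.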

2. Worker Command Building

  • Updated command.py to use sglang.launch_server when either use_profiling or use_sglang_router is enabled
  • Workers properly configured with --port 30001 and disaggregation settings
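
The module selection described above can be sketched as follows. The function names are hypothetical and the actual command.py is not reproduced here. Attaching `--port` only on the sglang.launch_server path is an assumption:

```python
from typing import List


def worker_module(use_profiling: bool, use_sglang_router: bool) -> str:
    """Pick the Python module used to launch a worker."""
    if use_profiling or use_sglang_router:
        return "sglang.launch_server"
    return "dynamo.sglang"


def build_worker_command(
    use_profiling: bool,
    use_sglang_router: bool,
    port: int = 30001,  # port value taken from the PR description
) -> List[str]:
    """Return the worker launch command as an argv list (illustrative only)."""
    module = worker_module(use_profiling, use_sglang_router)
    cmd = ["python3", "-m", module]
    if module == "sglang.launch_server":
        # Assumption: the explicit port is only added on this path.
        cmd += ["--port", str(port)]
    return cmd
```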

3. Router Infrastructure

  • New setup_sglang_router() function in infrastructure.py
  • Supports multiple prefill and decode worker groups
  • Constructs router command with multiple --prefill and --decode endpoints
  • Router runs on port 8000 (compatible with existing nginx setup)
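
As an illustration of the command construction, here is a sketch that builds an argv list with one `--prefill` and one `--decode` entry per worker leader. This is not the actual `setup_sglang_router()` body. The `--pd-disaggregation`, `--host`, and `--port` flags, and the convention of passing the bootstrap port after each prefill URL, are assumptions about the router CLI. The port numbers come from the Architecture section of this PR:

```python
from typing import List, Sequence


def build_router_command(
    prefill_ips: Sequence[str],
    decode_ips: Sequence[str],
    prefill_port: int = 30000,
    decode_port: int = 30001,
    bootstrap_port: int = 30001,
    router_port: int = 8000,
) -> List[str]:
    """Build a launch_router argv list (hypothetical helper, flags assumed)."""
    if not prefill_ips or not decode_ips:
        raise ValueError("need at least one prefill and one decode worker")

    cmd = [
        "python3", "-m", "sglang_router.launch_router",
        "--pd-disaggregation",          # assumed flag
        "--host", "0.0.0.0",            # assumed flag
        "--port", str(router_port),     # assumed flag
    ]
    for ip in prefill_ips:
        # Assumed form: prefill URL followed by its bootstrap port.
        cmd += ["--prefill", f"http://{ip}:{prefill_port}", str(bootstrap_port)]
    for ip in decode_ips:
        cmd += ["--decode", f"http://{ip}:{decode_port}"]
    return cmd
```

Returning a list rather than a shell string avoids quoting issues if the command is later passed to `subprocess.run`.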

4. SLURM Template Updates

  • Modified disaggregated job template to conditionally launch router
  • Automatically collects all prefill/decode worker leader IPs
  • Skips NATS/ETCD/dynamo.frontend when using sglang_router
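
The conditional behavior of the template can be pictured with the Python sketch below. The real logic lives in the SLURM job template, which this PR description does not show. Both helper names are hypothetical, and treating the first node of each worker group as its leader is an assumption:

```python
from typing import List, Sequence


def services_to_launch(use_sglang_router: bool) -> List[str]:
    """Return the frontend-side services the job should start."""
    if use_sglang_router:
        # Router replaces the whole dynamo stack.
        return ["sglang_router"]
    return ["nats", "etcd", "dynamo.frontend"]


def leader_ips(worker_groups: Sequence[Sequence[str]]) -> List[str]:
    """Collect one leader IP per worker group (assumed: first node leads)."""
    return [nodes[0] for nodes in worker_groups if nodes]
```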

5. CLI Integration

  • Added router worker type to worker_setup.py
  • Added --use-sglang-router flag support throughout the stack
  • Proper validation for router-specific arguments
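
A sketch of what the router worker type and its validation could look like with argparse. Only `--use-sglang-router` and the router worker type are named in this PR. The `--worker-type`, `--prefill-ips`, and `--decode-ips` option names and the specific validation rules are hypothetical:

```python
import argparse
from typing import List, Optional


def build_parser() -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser(prog="worker_setup")
    parser.add_argument(
        "--worker-type",
        choices=["prefill", "decode", "router"],
        required=True,
    )
    parser.add_argument("--use-sglang-router", action="store_true")
    # Hypothetical router-specific arguments.
    parser.add_argument("--prefill-ips", nargs="*", default=[])
    parser.add_argument("--decode-ips", nargs="*", default=[])
    return parser


def parse_args(argv: Optional[List[str]] = None) -> argparse.Namespace:
    """Parse arguments and enforce router-specific requirements."""
    parser = build_parser()
    args = parser.parse_args(argv)
    if args.worker_type == "router":
        if not args.use_sglang_router:
            parser.error("--worker-type router requires --use-sglang-router")
        if not args.prefill_ips or not args.decode_ips:
            parser.error("router needs --prefill-ips and --decode-ips")
    return args
```

`parser.error` prints a usage message and exits with status 2, so a misconfigured router invocation fails before any process is launched.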

6. Example Configuration

  • Created examples/sglang-router-example.yaml with complete working example

Architecture

When use_sglang_router: true:

  • Router launched on first prefill node (port 8000)
  • Router connects to all prefill workers (port 30000, bootstrap: 30001)
  • Router connects to all decode workers (port 30001)
  • Workers use sglang.launch_server instead of dynamo.sglang
  • No NATS/ETCD/dynamo.frontend overhead

Usage

backend:
  use_sglang_router: true
  sglang_config:
    prefill:
      disaggregation_mode: "prefill"
      disaggregation_bootstrap_port: 30001
      # ... other config
    decode:
      disaggregation_mode: "decode"
      disaggregation_bootstrap_port: 30001
      # ... other config

Testing

  • Schema validation works correctly
  • Template generation includes proper router setup
  • CLI arguments properly propagated through the stack

Next Steps

This PR lays the foundation for Phase 2: expert distribution recording, which will add:

  • enable_expert_recording configuration option
  • HTTP API triggers for recording (start/dump)
  • Expert distribution analysis tools

🤖 Generated with Claude Code

Co-Authored-By: Claude <noreply@anthropic.com>

@weireweire
Collaborator

2025-12-09 23:14:14| root: Etcd not ready yet, retrying in 2 seconds... (attempt 47/1000)
This doesn't seem to be working for me. We should also rebase onto main.

@ishandhanani
Owner Author

#55
