feat: support SGLang in pre-deployment sweeping #2360

tedzhouhk · 2025-08-07T22:52:34Z

Summary by CodeRabbit

New Features
- Added SGLang backend support across profiler and deployments; selectable via --backend sglang.
- Introduced SGLang-specific config handling, including model/port detection and TP size updates.
Improvements
- Enhanced KV‑cache log parsing and per-deployment tracking for more accurate resource reporting.
Deployment
- Updated SGLang manifests to support disaggregated prefill/decode workers and shell-based command execution.
- Added compatibility mappings for SGLang worker component names.
Chores
- Bumped SGLang runtime image to a newer version.
- Updated profiling job to pass the selected backend.

copy-pr-bot · 2025-08-07T22:52:37Z

This pull request requires additional validation before any workflows can run on NVIDIA's runners.

Pull request vetters can view their responsibilities here.

Contributors can view more details about this message here.

coderabbitai · 2025-08-11T17:57:16Z

Walkthrough

Adds SGLang backend support across profiler tooling and deployment configs: introduces backend-specific worker names, per-deployment naming, dynamic log paths, an SGLang config modifier, CLI backend flag, shell command wrappers, image updates, and a Grove-disabling annotation in deployment creation.

Changes

Cohort / File(s)	Summary
Profiler backend integration `benchmarks/profiler/profile_sla.py`, `benchmarks/profiler/utils/config.py`, `benchmarks/profiler/deploy/profile_sla_job.yaml`	Adds --backend (supports vllm, sglang); introduces per-deployment names in DynamoDeploymentClient usage; dynamic decode log path via WORKER_COMPONENT_NAMES; new SGLangConfigModifier with convert/set_tp/getters/log parsing; job YAML passes --backend sglang.
Deployment client annotation `benchmarks/profiler/utils/dynamo_deployment.py`	Sets metadata.annotations to disable Grove: nvidia.com/enable-grove: "false" during deployment creation.
Planner worker-name registry `components/planner/src/dynamo/planner/defaults.py`	Adds SGLangComponentName; registers "sglang" in WORKER_COMPONENT_NAMES with prefill/decode worker K8s names and endpoints.
SGLang deploy specs (agg, router, disagg) `components/backends/sglang/deploy/agg.yaml`, `components/backends/sglang/deploy/agg_router.yaml`, `components/backends/sglang/deploy/disagg.yaml`	Switches to explicit /bin/sh -c wrappers; splits/adjusts args for python module invocations; updates runtime image tag; defines disaggregation commands for prefill/decode workers (decode_worker, worker).

Sequence Diagram(s)

sequenceDiagram
  participant CLI as profile_sla.py
  participant Config as SGLangConfigModifier
  participant Planner as WORKER_COMPONENT_NAMES
  participant Deploy as DynamoDeploymentClient
  participant K8s as Kubernetes

  CLI->>Config: convert_config(target=prefill/decode)
  Config-->>CLI: transformed spec (sglang)
  CLI->>Deploy: create_deployment(deployment_name, spec, backend=sglang)
  Deploy->>K8s: Apply CR with annotations (disable Grove)
  K8s-->>Deploy: Deployment ready
  CLI->>Planner: resolve decode worker K8s name
  Planner-->>CLI: SGLangDecodeWorker
  CLI->>K8s: Read logs for KV cache (per-deployment path)
  CLI-->>CLI: Run SLA profiling with prefill/decode clients
  CLI->>Deploy: cleanup

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~20 minutes

Possibly related PRs

refactor: break profile_sla into different files; feat: support vllm_v1 #1588: Parallel changes to profiler plumbing and planner defaults adding backend mappings and CONFIG_MODIFIERS.
fix(k8s): sglang disagg now uses decode worker #2206: Adjusts SGLang decode invocation to dynamo.sglang.decode_worker in disagg.yaml, matching this PR’s changes.
feat: add sglang disagg deployment examples #2137: Updates SGLang deployment YAMLs (agg, agg_router, disagg) and worker naming similar to this PR.

Poem

A rabbit taps keys with gentle might,
SGLang hops in, backend bright.
Pods awake with /bin/sh cheer,
Logs now named so each is clear.
Grove is hushed, KV in sight—
Benchmarks bound, we profile right.
Hop, deploy, and test take flight! 🐇✨

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

🪧 Tips

Chat

There are 3 ways to chat with CodeRabbit:

Review comments: Directly reply to a review comment made by CodeRabbit. Example:
- I pushed a fix in commit <commit_id>, please review it.
- Explain this complex logic.
- Open a follow-up GitHub issue for this discussion.
Files and specific lines of code (under the "Files changed" tab): Tag @coderabbitai in a new review comment at the desired location with your query. Examples:
- @coderabbitai explain this code block.
PR comments: Tag @coderabbitai in a new PR comment to ask questions about the PR branch. For the best results, please provide a very specific query, as very limited context is provided in this mode. Examples:
- @coderabbitai gather interesting stats about this repository and render them as a table. Additionally, render a pie chart showing the language distribution in the codebase.
- @coderabbitai read src/utils.ts and explain its main purpose.
- @coderabbitai read the files in the src/scheduler package and generate a class diagram using mermaid and a README in the markdown format.

Support

Need help? Create a ticket on our support page for assistance with any issues or questions.

CodeRabbit Commands (Invoked using PR comments)

@coderabbitai pause to pause the reviews on a PR.
@coderabbitai resume to resume the paused reviews.
@coderabbitai review to trigger an incremental review. This is useful when automatic reviews are disabled for the repository.
@coderabbitai full review to do a full review from scratch and review all the files again.
@coderabbitai summary to regenerate the summary of the PR.
@coderabbitai generate sequence diagram to generate a sequence diagram of the changes in this PR.
@coderabbitai resolve resolve all the CodeRabbit review comments.
@coderabbitai configuration to show the current CodeRabbit configuration for the repository.
@coderabbitai help to get help.

Other keywords and placeholders

Add @coderabbitai ignore anywhere in the PR description to prevent this PR from being reviewed.
Add @coderabbitai summary to generate the high-level summary at a specific location in the PR description.
Add @coderabbitai anywhere in the PR title to generate the title automatically.

Documentation and Community

Visit our Documentation for detailed information on how to use CodeRabbit.
Join our Discord Community to get help, request features, and share feedback.
Follow us on X/Twitter for updates and announcements.

coderabbitai

Actionable comments posted: 5

📜 Review details

Configuration used: .coderabbit.yaml
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 10f4302 and f216393.

📒 Files selected for processing (8)

benchmarks/profiler/deploy/profile_sla_job.yaml (1 hunks)
benchmarks/profiler/profile_sla.py (8 hunks)
benchmarks/profiler/utils/config.py (3 hunks)
benchmarks/profiler/utils/dynamo_deployment.py (1 hunks)
components/backends/sglang/deploy/agg.yaml (1 hunks)
components/backends/sglang/deploy/agg_router.yaml (1 hunks)
components/backends/sglang/deploy/disagg.yaml (3 hunks)
components/planner/src/dynamo/planner/defaults.py (1 hunks)

🧰 Additional context used

🧠 Learnings (1)

📚 Learning: 2025-07-28T17:00:07.968Z

Learnt from: biswapanda
PR: ai-dynamo/dynamo#2137
File: components/backends/sglang/deploy/agg_router.yaml:0-0
Timestamp: 2025-07-28T17:00:07.968Z
Learning: In components/backends/sglang/deploy/agg_router.yaml, the clear_namespace command is intentionally designed to block the router from starting if it fails (using &&). This is a deliberate design decision where namespace clearing is a critical prerequisite and the router should not start with an uncleared namespace.

Applied to files:

components/backends/sglang/deploy/agg_router.yaml

🔇 Additional comments (12)

components/backends/sglang/deploy/agg_router.yaml (1)

92-94: Consistent shell wrapper pattern applied

The addition of the shell wrapper for SGLangDecodeWorker follows the same pattern as the Frontend container, ensuring consistency across components.

components/backends/sglang/deploy/agg.yaml (1)

92-94: Shell wrapper consistently applied

The shell wrapper addition maintains consistency with the Frontend container and other SGLang deployment configurations.

components/planner/src/dynamo/planner/defaults.py (1)

85-92: Verify component naming asymmetry is intentional

The SGLangComponentName defines prefill_worker_component_name = "worker" and decode_worker_component_name = "decode". This differs from the VllmComponentName pattern where prefill uses "prefill" and decode uses "backend".

Please confirm this asymmetric naming (worker/decode) aligns with SGLang's architectural design and component naming conventions.

benchmarks/profiler/deploy/profile_sla_job.yaml (1)

37-38: LGTM! Clean backend specification for SGLang.

The hardcoded --backend sglang parameter correctly enables SGLang backend selection for this specific job configuration. The placement maintains logical argument ordering.

benchmarks/profiler/profile_sla.py (8)

24-24: LGTM! Essential import for backend-specific worker name resolution.

The import of WORKER_COMPONENT_NAMES enables dynamic log path construction based on the selected backend, which is crucial for SGLang support.

145-145: LGTM! Per-deployment naming for prefill configuration.

Adding deployment_name parameter enables proper per-deployment resource tracking and cleanup, derived appropriately from the config metadata.

253-253: LGTM! Consistent per-deployment naming for decode configuration.

The deployment_name parameter maintains consistency with the prefill configuration approach.

268-268: Excellent dynamic log path construction for backend compatibility.

The dynamic path construction using WORKER_COMPONENT_NAMES[args.backend].decode_worker_k8s_name.lower() elegantly handles backend-specific worker naming differences between vLLM and SGLang. This replaces hardcoded paths with flexible, backend-aware resolution.

403-403: LGTM! Per-deployment naming for selected prefill interpolation.

Consistent application of the deployment_name pattern for interpolation profiling.

495-498: Clean parameter organization for decode interpolation.

The parameter ordering maintains consistency with other DynamoDeploymentClient initializations while adding the required deployment_name.

513-513: LGTM! Consistent dynamic log path for interpolation phase.

The dynamic log path construction maintains consistency with the earlier decode profiling phase, ensuring proper backend-specific worker log access.

598-599: LGTM! CLI backend support expansion.

The addition of "sglang" to the choices list with updated help text properly extends the CLI interface to support the new backend option.

benchmarks/profiler/utils/config.py

benchmarks/profiler/utils/dynamo_deployment.py

components/backends/sglang/deploy/disagg.yaml

benchmarks/profiler/deploy/profile_sla_job.yaml

components/backends/sglang/deploy/disagg.yaml

Co-authored-by: coderabbitai[bot] <136622811+coderabbitai[bot]@users.noreply.github.com> Signed-off-by: Hongkuan Zhou <[email protected]>

…/sglang-sweep

into hzhou/sglang-sweep

pull-request-size bot added the size/L label Aug 7, 2025

github-actions bot added the feat label Aug 7, 2025

tedzhouhk marked this pull request as ready for review August 11, 2025 17:48

tedzhouhk requested review from a team, GuanLuo, PeaBrane, alec-flowers, biswapanda, grahamking, ishandhanani, jthomson04, kkranen, nnshah1, paulhendricks, piotrm-nvidia, ptarasiewiczNV, rmccorm4, ryanolson, tanmayv25 and tmonty12 as code owners August 11, 2025 17:48

coderabbitai bot reviewed Aug 11, 2025

View reviewed changes

hhzhang16 reviewed Aug 11, 2025

View reviewed changes

benchmarks/profiler/deploy/profile_sla_job.yaml Outdated Show resolved Hide resolved

hhzhang16 reviewed Aug 11, 2025

View reviewed changes

components/backends/sglang/deploy/disagg.yaml Show resolved Hide resolved

tedzhouhk force-pushed the hzhou/sglang-sweep branch from 384e3c1 to b8ae707 Compare August 11, 2025 20:23

tedzhouhk added 4 commits August 11, 2025 13:24

stage

953deeb

update config

ca34d5f

update config

c1aa62a

bug fix

6bf85e7

tedzhouhk added 3 commits August 11, 2025 13:24

bug fix

c8f6da5

update doc

96c301b

unique index

f41fdae

tedzhouhk force-pushed the hzhou/sglang-sweep branch from b8ae707 to f41fdae Compare August 11, 2025 20:24

copy-pr-bot bot temporarily deployed to GITLAB August 11, 2025 20:25 Inactive

copy-pr-bot bot temporarily deployed to GITLAB August 11, 2025 20:28 Inactive

Update benchmarks/profiler/utils/dynamo_deployment.py

0ea8ff8

Co-authored-by: coderabbitai[bot] <136622811+coderabbitai[bot]@users.noreply.github.com> Signed-off-by: Hongkuan Zhou <[email protected]>

copy-pr-bot bot temporarily deployed to GITLAB August 11, 2025 20:38 Inactive

Update benchmarks/profiler/utils/config.py

65dc113

Co-authored-by: coderabbitai[bot] <136622811+coderabbitai[bot]@users.noreply.github.com> Signed-off-by: Hongkuan Zhou <[email protected]>

copy-pr-bot bot temporarily deployed to GITLAB August 11, 2025 20:38 Inactive

pc

322d996

copy-pr-bot bot temporarily deployed to GITLAB August 11, 2025 20:42 Inactive

copy-pr-bot bot temporarily deployed to GITLAB August 11, 2025 20:45 Inactive

Merge branch 'main' into hzhou/sglang-sweep

ba0ae8c

copy-pr-bot bot temporarily deployed to GITLAB August 11, 2025 21:01 Inactive

copy-pr-bot bot temporarily deployed to GITLAB August 11, 2025 21:07 Inactive

hhzhang16 approved these changes Aug 11, 2025

View reviewed changes

ishandhanani approved these changes Aug 11, 2025

View reviewed changes

ishandhanani mentioned this pull request Aug 11, 2025

[SGLANG]: Backend Roadmap/Improvements #2261

Open

19 tasks

tedzhouhk added 2 commits August 11, 2025 16:07

Merge branch 'main' of https://github.com/ai-dynamo/dynamo into hzhou…

831dc3b

…/sglang-sweep

Merge branch 'hzhou/sglang-sweep' of https://github.com/ai-dynamo/dynamo

aab839a

into hzhou/sglang-sweep

copy-pr-bot bot temporarily deployed to GITLAB August 11, 2025 23:07 Inactive

copy-pr-bot bot temporarily deployed to GITLAB August 11, 2025 23:08 Inactive

tedzhouhk merged commit b4189c6 into main Aug 12, 2025
14 of 15 checks passed

tedzhouhk deleted the hzhou/sglang-sweep branch August 12, 2025 16:57

This was referenced Aug 15, 2025

fix: fix examples #2459

Merged

fix: Standardize namespace environment variable to DYN_NAMESPACE #2485

Closed

feat: use consistent small models across all deploy examples #2573

Merged

coderabbitai bot mentioned this pull request Sep 4, 2025

test: add test for pre-deployment script #2857

Merged

coderabbitai bot mentioned this pull request Sep 23, 2025

feat: support MoE model in SLA Planner Sglang #3185

Merged

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

feat: support SGLang in pre-deployment sweeping #2360

feat: support SGLang in pre-deployment sweeping #2360

Uh oh!

tedzhouhk commented Aug 7, 2025 •

edited by coderabbitai bot

Loading

Uh oh!

copy-pr-bot bot commented Aug 7, 2025

Uh oh!

coderabbitai bot commented Aug 11, 2025

Chat

Support

CodeRabbit Commands (Invoked using PR comments)

Other keywords and placeholders

Documentation and Community

Uh oh!

coderabbitai bot left a comment

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

4 participants

feat: support SGLang in pre-deployment sweeping #2360

feat: support SGLang in pre-deployment sweeping #2360

Uh oh!

Conversation

tedzhouhk commented Aug 7, 2025 • edited by coderabbitai bot Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Summary by CodeRabbit

Uh oh!

copy-pr-bot bot commented Aug 7, 2025

Uh oh!

coderabbitai bot commented Aug 11, 2025

Walkthrough

Changes

Sequence Diagram(s)

Estimated code review effort

Possibly related PRs

Poem

Chat

Support

CodeRabbit Commands (Invoked using PR comments)

Other keywords and placeholders

Documentation and Community

Uh oh!

coderabbitai bot left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

4 participants

tedzhouhk commented Aug 7, 2025 •

edited by coderabbitai bot

Loading