Skip to content

Conversation

@tedzhouhk
Copy link
Contributor

@tedzhouhk tedzhouhk commented Aug 7, 2025

Summary by CodeRabbit

  • New Features
    • Added SGLang backend support across profiler and deployments; selectable via --backend sglang.
    • Introduced SGLang-specific config handling, including model/port detection and TP size updates.
  • Improvements
    • Enhanced KV‑cache log parsing and per-deployment tracking for more accurate resource reporting.
  • Deployment
    • Updated SGLang manifests to support disaggregated prefill/decode workers and shell-based command execution.
    • Added compatibility mappings for SGLang worker component names.
  • Chores
    • Bumped SGLang runtime image to a newer version.
    • Updated profiling job to pass the selected backend.

@copy-pr-bot
Copy link

copy-pr-bot bot commented Aug 7, 2025

This pull request requires additional validation before any workflows can run on NVIDIA's runners.

Pull request vetters can view their responsibilities here.

Contributors can view more details about this message here.

@coderabbitai
Copy link
Contributor

coderabbitai bot commented Aug 11, 2025

Walkthrough

Adds SGLang backend support across profiler tooling and deployment configs: introduces backend-specific worker names, per-deployment naming, dynamic log paths, an SGLang config modifier, CLI backend flag, shell command wrappers, image updates, and a Grove-disabling annotation in deployment creation.

Changes

Cohort / File(s) Summary
Profiler backend integration
benchmarks/profiler/profile_sla.py, benchmarks/profiler/utils/config.py, benchmarks/profiler/deploy/profile_sla_job.yaml
Adds --backend (supports vllm, sglang); introduces per-deployment names in DynamoDeploymentClient usage; dynamic decode log path via WORKER_COMPONENT_NAMES; new SGLangConfigModifier with convert/set_tp/getters/log parsing; job YAML passes --backend sglang.
Deployment client annotation
benchmarks/profiler/utils/dynamo_deployment.py
Sets metadata.annotations to disable Grove: nvidia.com/enable-grove: "false" during deployment creation.
Planner worker-name registry
components/planner/src/dynamo/planner/defaults.py
Adds SGLangComponentName; registers "sglang" in WORKER_COMPONENT_NAMES with prefill/decode worker K8s names and endpoints.
SGLang deploy specs (agg, router, disagg)
components/backends/sglang/deploy/agg.yaml, components/backends/sglang/deploy/agg_router.yaml, components/backends/sglang/deploy/disagg.yaml
Switches to explicit /bin/sh -c wrappers; splits/adjusts args for python module invocations; updates runtime image tag; defines disaggregation commands for prefill/decode workers (decode_worker, worker).

Sequence Diagram(s)

sequenceDiagram
  participant CLI as profile_sla.py
  participant Config as SGLangConfigModifier
  participant Planner as WORKER_COMPONENT_NAMES
  participant Deploy as DynamoDeploymentClient
  participant K8s as Kubernetes

  CLI->>Config: convert_config(target=prefill/decode)
  Config-->>CLI: transformed spec (sglang)
  CLI->>Deploy: create_deployment(deployment_name, spec, backend=sglang)
  Deploy->>K8s: Apply CR with annotations (disable Grove)
  K8s-->>Deploy: Deployment ready
  CLI->>Planner: resolve decode worker K8s name
  Planner-->>CLI: SGLangDecodeWorker
  CLI->>K8s: Read logs for KV cache (per-deployment path)
  CLI-->>CLI: Run SLA profiling with prefill/decode clients
  CLI->>Deploy: cleanup
Loading

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~20 minutes

Possibly related PRs

Poem

A rabbit taps keys with gentle might,
SGLang hops in, backend bright.
Pods awake with /bin/sh cheer,
Logs now named so each is clear.
Grove is hushed, KV in sight—
Benchmarks bound, we profile right.
Hop, deploy, and test take flight! 🐇✨


Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share
🪧 Tips

Chat

There are 3 ways to chat with CodeRabbit:

  • Review comments: Directly reply to a review comment made by CodeRabbit. Example:
    • I pushed a fix in commit <commit_id>, please review it.
    • Explain this complex logic.
    • Open a follow-up GitHub issue for this discussion.
  • Files and specific lines of code (under the "Files changed" tab): Tag @coderabbitai in a new review comment at the desired location with your query. Examples:
    • @coderabbitai explain this code block.
  • PR comments: Tag @coderabbitai in a new PR comment to ask questions about the PR branch. For the best results, please provide a very specific query, as very limited context is provided in this mode. Examples:
    • @coderabbitai gather interesting stats about this repository and render them as a table. Additionally, render a pie chart showing the language distribution in the codebase.
    • @coderabbitai read src/utils.ts and explain its main purpose.
    • @coderabbitai read the files in the src/scheduler package and generate a class diagram using mermaid and a README in the markdown format.

Support

Need help? Create a ticket on our support page for assistance with any issues or questions.

CodeRabbit Commands (Invoked using PR comments)

  • @coderabbitai pause to pause the reviews on a PR.
  • @coderabbitai resume to resume the paused reviews.
  • @coderabbitai review to trigger an incremental review. This is useful when automatic reviews are disabled for the repository.
  • @coderabbitai full review to do a full review from scratch and review all the files again.
  • @coderabbitai summary to regenerate the summary of the PR.
  • @coderabbitai generate sequence diagram to generate a sequence diagram of the changes in this PR.
  • @coderabbitai resolve resolve all the CodeRabbit review comments.
  • @coderabbitai configuration to show the current CodeRabbit configuration for the repository.
  • @coderabbitai help to get help.

Other keywords and placeholders

  • Add @coderabbitai ignore anywhere in the PR description to prevent this PR from being reviewed.
  • Add @coderabbitai summary to generate the high-level summary at a specific location in the PR description.
  • Add @coderabbitai anywhere in the PR title to generate the title automatically.

Documentation and Community

  • Visit our Documentation for detailed information on how to use CodeRabbit.
  • Join our Discord Community to get help, request features, and share feedback.
  • Follow us on X/Twitter for updates and announcements.

Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 5

📜 Review details

Configuration used: .coderabbit.yaml
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 10f4302 and f216393.

📒 Files selected for processing (8)
  • benchmarks/profiler/deploy/profile_sla_job.yaml (1 hunks)
  • benchmarks/profiler/profile_sla.py (8 hunks)
  • benchmarks/profiler/utils/config.py (3 hunks)
  • benchmarks/profiler/utils/dynamo_deployment.py (1 hunks)
  • components/backends/sglang/deploy/agg.yaml (1 hunks)
  • components/backends/sglang/deploy/agg_router.yaml (1 hunks)
  • components/backends/sglang/deploy/disagg.yaml (3 hunks)
  • components/planner/src/dynamo/planner/defaults.py (1 hunks)
🧰 Additional context used
🧠 Learnings (1)
📚 Learning: 2025-07-28T17:00:07.968Z
Learnt from: biswapanda
PR: ai-dynamo/dynamo#2137
File: components/backends/sglang/deploy/agg_router.yaml:0-0
Timestamp: 2025-07-28T17:00:07.968Z
Learning: In components/backends/sglang/deploy/agg_router.yaml, the clear_namespace command is intentionally designed to block the router from starting if it fails (using &&). This is a deliberate design decision where namespace clearing is a critical prerequisite and the router should not start with an uncleared namespace.

Applied to files:

  • components/backends/sglang/deploy/agg_router.yaml
🔇 Additional comments (12)
components/backends/sglang/deploy/agg_router.yaml (1)

92-94: Consistent shell wrapper pattern applied

The addition of the shell wrapper for SGLangDecodeWorker follows the same pattern as the Frontend container, ensuring consistency across components.

components/backends/sglang/deploy/agg.yaml (1)

92-94: Shell wrapper consistently applied

The shell wrapper addition maintains consistency with the Frontend container and other SGLang deployment configurations.

components/planner/src/dynamo/planner/defaults.py (1)

85-92: Verify component naming asymmetry is intentional

The SGLangComponentName defines prefill_worker_component_name = "worker" and decode_worker_component_name = "decode". This differs from the VllmComponentName pattern where prefill uses "prefill" and decode uses "backend".

Please confirm this asymmetric naming (worker/decode) aligns with SGLang's architectural design and component naming conventions.

benchmarks/profiler/deploy/profile_sla_job.yaml (1)

37-38: LGTM! Clean backend specification for SGLang.

The hardcoded --backend sglang parameter correctly enables SGLang backend selection for this specific job configuration. The placement maintains logical argument ordering.

benchmarks/profiler/profile_sla.py (8)

24-24: LGTM! Essential import for backend-specific worker name resolution.

The import of WORKER_COMPONENT_NAMES enables dynamic log path construction based on the selected backend, which is crucial for SGLang support.


145-145: LGTM! Per-deployment naming for prefill configuration.

Adding deployment_name parameter enables proper per-deployment resource tracking and cleanup, derived appropriately from the config metadata.


253-253: LGTM! Consistent per-deployment naming for decode configuration.

The deployment_name parameter maintains consistency with the prefill configuration approach.


268-268: Excellent dynamic log path construction for backend compatibility.

The dynamic path construction using WORKER_COMPONENT_NAMES[args.backend].decode_worker_k8s_name.lower() elegantly handles backend-specific worker naming differences between vLLM and SGLang. This replaces hardcoded paths with flexible, backend-aware resolution.


403-403: LGTM! Per-deployment naming for selected prefill interpolation.

Consistent application of the deployment_name pattern for interpolation profiling.


495-498: Clean parameter organization for decode interpolation.

The parameter ordering maintains consistency with other DynamoDeploymentClient initializations while adding the required deployment_name.


513-513: LGTM! Consistent dynamic log path for interpolation phase.

The dynamic log path construction maintains consistency with the earlier decode profiling phase, ensuring proper backend-specific worker log access.


598-599: LGTM! CLI backend support expansion.

The addition of "sglang" to the choices list with updated help text properly extends the CLI interface to support the new backend option.

Co-authored-by: coderabbitai[bot] <136622811+coderabbitai[bot]@users.noreply.github.com>
Signed-off-by: Hongkuan Zhou <[email protected]>
Co-authored-by: coderabbitai[bot] <136622811+coderabbitai[bot]@users.noreply.github.com>
Signed-off-by: Hongkuan Zhou <[email protected]>
@tedzhouhk tedzhouhk merged commit b4189c6 into main Aug 12, 2025
14 of 15 checks passed
@tedzhouhk tedzhouhk deleted the hzhou/sglang-sweep branch August 12, 2025 16:57
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants