
[draft]support sglang native disagg and profiling. #55

Merged
ishandhanani merged 1 commit into ishandhanani:main from weireweire:support-native-disag-profile on Dec 17, 2025

Conversation


@weireweire weireweire commented Dec 11, 2025

Enable with:

backend:
  use_sglang_router: true
profiling:
  type: "nsys"
  isl: 128000
  osl: 8000
  concurrency: 256
  start_step: 10
  stop_step: 30
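For reference, the start_step/stop_step values above drive the /start_profile request that profile.sh later sends to each worker; a minimal sketch of the payload construction (the helper name is illustrative, not from this PR):

```python
def build_start_profile_payload(start_step: int, stop_step: int, activities: list[str]) -> dict:
    """Mirror the curl payload in profile.sh: num_steps spans stop_step - start_step."""
    return {
        "start_step": start_step,
        "num_steps": stop_step - start_step,
        "activities": activities,
    }

# With the config above: start_step=10, stop_step=30 -> num_steps=20
payload = build_start_profile_payload(10, 30, ["CPU", "GPU", "MEM"])
```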

Summary by CodeRabbit

  • New Features

    • Optional sglang router deployment path for disaggregated serving
    • Unified profiling orchestration across leader/worker nodes with per-IP start signaling
    • Profiler activities extended to include memory (MEM)
  • Improvements

    • Simplified, unified profiling configuration model
    • Environment-driven multi-worker readiness with local fallback
    • Conditional multi-frontend behavior and router-mode gating for setup and sequencing


@weireweire changed the title from "support sglang native disagg and profiling." to "[draft]support sglang native disagg and profiling." on Dec 11, 2025

coderabbitai bot commented Dec 11, 2025

Walkthrough

Adds optional sglang router support and threads a new use_sglang_router flag through job templates, backend, and worker setup. Profiling flow is unified and orchestrated across leader IPs, profiling waits and endpoints are made environment-driven, and worker setup skips ETCD/frontend bootstrap when router mode is enabled.

Changes

• Profiling Orchestration (scripts/profiling/profile.sh):
  Read head_node/head_port from env, add PREFILL_IPS/DECODE_IPS-driven waits and a wait_all_workers_ready helper, orchestrate start_profile per IP, add MEM to ACTIVITIES, gate bench_serving/lm_eval to PROFILING_MODE=="prefill", and fall back to waiting on 127.0.0.1:30000 for compatibility.
• Job Template & Router Launch (scripts/templates/job_script_template_disagg.j2):
  Add use_sglang_router conditional paths: launch the sglang router, collect leader IPs for prefill/decode, build ROUTER_ARGS, add --use-sglang-router to worker args, unify profiling orchestration across leaders, and adjust frontend/multi-frontend gating.
• Worker Setup CLI (scripts/worker_setup.py):
  Add the --use-sglang-router CLI flag and thread it through the setup_prefill_worker, setup_decode_worker, and setup_aggregated_worker calls.
• Worker Setup Implementation (scripts/worker_setup/worker.py):
  Add a use_sglang_router: bool = False parameter to the prefill/decode/aggregated setup functions, skip ETCD/frontend waits when true, add a hook to run the setup script after wheel install, and preserve the SGLang command invocation.
• Worker Command Construction (scripts/worker_setup/command.py):
  Remove the profiling-time guard that omitted disaggregation-mode; the flag is now emitted when present, even during profiling.
• Backend SLURM & Templating (src/srtctl/backends/sglang.py):
  Always emit the disaggregation-mode flag, use the unified profiling_cfg for prefill/decode envs, expose the use_sglang_router template variable, and make minor formatting adjustments.
• Schema Changes (src/srtctl/core/schema.py):
  Rename ProfilingPhaseConfig → ProfilingConfig, add type: ProfilingType, collapse per-phase prefill/decode into a single ProfilingConfig, and add BackendConfig.use_sglang_router: bool = False.

Sequence Diagram(s)

sequenceDiagram
    participant Orchestrator as Profiling Orchestrator (profile.sh)
    participant Router as sglang Router (optional)
    participant Worker as Worker (prefill/decode nodes)
    participant ETCD as ETCD/NATS

    Note over Orchestrator,Worker: Start profiling run
    Orchestrator->>Router: (if use_sglang_router) launch router with ROUTER_ARGS
    Orchestrator->>Worker: collect PREFILL_IPS / DECODE_IPS
    Orchestrator->>Worker: wait_all_workers_ready (port 30000)
    alt router mode
        Worker-->>ETCD: skip ETCD/frontend bootstrap (use_sglang_router=true)
    else traditional mode
        Worker->>ETCD: wait for ETCD/frontend readiness
    end
    Orchestrator->>Worker: start_profile_on_worker (per-IP start_profile)
    Orchestrator->>Orchestrator: run bench_serving / lm_eval (only if PROFILING_MODE=="prefill")
    Worker->>Orchestrator: profile logs / status
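The multi-IP wait in the diagram can be sketched as follows (probe_worker is a stand-in for the real readiness check, which curls port 30000 on each worker):

```shell
#!/usr/bin/env bash
# Split a comma-separated IP list and probe each entry, skipping empties
# (tolerates trailing commas, matching the behavior described in the review).
wait_all_workers_ready() {
    local ip_list="$1"
    local ips ip
    IFS=',' read -ra ips <<< "$ip_list"
    for ip in "${ips[@]}"; do
        [[ -z "$ip" ]] && continue   # skip empty entries from trailing commas
        probe_worker "$ip"           # hypothetical check, e.g. curl "http://$ip:30000/health"
    done
}

# Demo with a stubbed probe; prints one "ready:" line per IP
probe_worker() { echo "ready: $1"; }
wait_all_workers_ready "10.0.0.1,10.0.0.2,"
```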

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~45 minutes

  • Files touching control flow across templates, backend, and scripts require cross-file consistency checks.
  • Pay special attention to:
    • scripts/profiling/profile.sh — new multi-IP waits and orchestration logic.
    • scripts/templates/job_script_template_disagg.j2 — router launch and ROUTER_ARGS correctness.
    • scripts/worker_setup/worker.py & command.py — ETCD wait gating and flag propagation.
    • src/srtctl/core/schema.py — schema rename and structure change for profiling (backwards compatibility).


Poem

A rabbit scurries, flags in tow, 🐇
Routers hum and profilers grow,
Workers skip the waiting dance,
Leaders join the profiling trance,
Logs bloom bright where endpoints flow.

Pre-merge checks and finishing touches

❌ Failed checks (1 warning)
• Docstring Coverage ⚠️ Warning: docstring coverage is 54.55%, below the required threshold of 80.00%. Run @coderabbitai generate docstrings to improve coverage.
✅ Passed checks (2 passed)
• Description Check ✅ Passed: check skipped because CodeRabbit's high-level summary is enabled.
• Title Check ✅ Passed: the title '[draft]support sglang native disagg and profiling' directly aligns with the main changes: introducing sglang router support and profiling orchestration across multiple files.

@coderabbitai coderabbitai bot left a comment
Actionable comments posted: 2

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (2)
scripts/worker_setup/worker.py (2)

109-122: Frontend may start unnecessarily when router is enabled.

The frontend startup block (lines 110-122) only checks need_frontend which doesn't account for use_sglang_router. When the router is enabled, the frontend shouldn't be started since the router handles traffic routing. Consider gating this block:

-    if need_frontend:
+    if need_frontend and not use_sglang_router:
         logging.info("Starting frontend in traditional mode (after dynamo installation)")

220-233: Frontend may start unnecessarily in aggregated worker when router is enabled.

Similar to the prefill worker, the frontend startup block doesn't check use_sglang_router. Apply the same fix:

-    if need_frontend:
+    if need_frontend and not use_sglang_router:
         logging.info("Starting frontend in traditional mode (after dynamo installation)")
🧹 Nitpick comments (1)
scripts/profiling/profile.sh (1)

99-118: Consider checking intermediate exit codes.

If bench_serving fails but lm_eval succeeds, the final exit_code will be 0 (success). If bench_serving results are important, consider capturing and checking its exit code:

     python3 -m sglang.bench_serving \
     ...
     --warmup-request 5
+    bench_exit=$?
+    if [ $bench_exit -ne 0 ]; then
+        echo "Warning: bench_serving failed with exit code $bench_exit"
+    fi

     pip install lm-eval tenacity > /dev/null
📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 286427c and 3d4449b.

📒 Files selected for processing (7)
  • scripts/profiling/profile.sh (3 hunks)
  • scripts/templates/job_script_template_disagg.j2 (7 hunks)
  • scripts/worker_setup.py (4 hunks)
  • scripts/worker_setup/command.py (0 hunks)
  • scripts/worker_setup/worker.py (3 hunks)
  • src/srtctl/backends/sglang.py (4 hunks)
  • src/srtctl/core/schema.py (2 hunks)
💤 Files with no reviewable changes (1)
  • scripts/worker_setup/command.py
🧰 Additional context used
🧬 Code graph analysis (2)
scripts/worker_setup/worker.py (2)
scripts/worker_setup/infrastructure.py (1)
  • setup_head_prefill_node (14-32)
scripts/worker_setup/utils.py (1)
  • wait_for_etcd (53-70)
src/srtctl/backends/sglang.py (1)
tests/test_command_generation.py (1)
  • test_profiling_mode (232-268)
🪛 Ruff (0.14.8)
scripts/worker_setup/worker.py

97-97: Avoid specifying long messages outside the exception class

(TRY003)


101-101: Avoid specifying long messages outside the exception class

(TRY003)


160-160: Avoid specifying long messages outside the exception class

(TRY003)


208-208: Avoid specifying long messages outside the exception class

(TRY003)


212-212: Avoid specifying long messages outside the exception class

(TRY003)

🔇 Additional comments (13)
scripts/worker_setup.py (2)

98-102: LGTM!

The new --use-sglang-router CLI argument is properly defined with store_true action, consistent with the existing --multiple-frontends-enabled flag pattern.


184-227: LGTM!

The use_sglang_router argument is correctly propagated to all three worker setup functions, maintaining consistency across prefill, decode, and aggregated worker paths.

src/srtctl/core/schema.py (2)

173-183: LGTM!

The simplified ProfilingConfig with unified fields for both prefill and decode aligns well with the PD disaggregation model where a single profiling specification drives both sides.


237-242: LGTM!

The use_sglang_router option is appropriately added to BackendConfig with a sensible default of False for backward compatibility. The updated comment clearly explains its purpose.

src/srtctl/backends/sglang.py (2)

299-302: LGTM!

Using the same profiling_cfg for both prefill_profile_env and decode_profile_env is consistent with the unified ProfilingConfig schema change, where a single profiling specification drives both sides in PD disaggregation mode.


327-327: LGTM!

The use_sglang_router template variable is correctly exposed with a sensible default of False, enabling the template to conditionally configure router-based deployments.

scripts/worker_setup/worker.py (2)

93-101: LGTM!

The conditional logic correctly bypasses NATS/ETCD infrastructure setup when use_sglang_router is enabled. The preserved logging in the non-router path maintains good observability.


158-160: LGTM!

The decode worker correctly skips etcd connectivity checks when the sglang router is enabled, consistent with the prefill worker behavior.

scripts/profiling/profile.sh (2)

45-56: LGTM!

The conditional wait logic correctly differentiates between prefill and decode profiling modes. Prefill waits for the router (to generate traffic) while decode only needs the local server. Both modes wait for the local server at port 30000, ensuring the profiling endpoint is ready.


95-96: LGTM!

Targeting the local server at 127.0.0.1:30000 for /start_profile is correct for both prefill and decode modes, as profiling needs to be triggered on the local sglang instance.

scripts/templates/job_script_template_disagg.j2 (3)

60-60: LGTM!

The conditional correctly gates the multiple frontend architecture block to run only when enable_multiple_frontends is true AND use_sglang_router is false, preventing conflicts between the two routing approaches.


186-196: LGTM!

The WORKER_ARGS construction correctly adds --use-sglang-router when the flag is enabled, and ensures --multiple-frontends-enabled is mutually exclusive with the router flag.


339-367: LGTM!

The router launch block correctly collects leader IPs for prefill and decode workers, constructs appropriate router arguments with --pd-disaggregation, and launches the router on port 8001. This aligns with the profiling script's expectation.

Comment thread scripts/templates/job_script_template_disagg.j2 Outdated
Comment on lines +145 to 146
# Always pass disaggregation-mode so profiling runs in PD mode

⚠️ Potential issue | 🔴 Critical

🧩 Analysis chain

🏁 Script executed:

#!/bin/bash
# Check the test expectations for disaggregation-mode
echo "=== Test file context around disaggregation-mode ==="
rg -n -B 5 -A 5 "disaggregation-mode" tests/test_command_generation.py | head -50

Repository: ishandhanani/srt-slurm



🏁 Script executed:

#!/bin/bash
# Check the actual code in sglang.py around lines 145-146 and surrounding context
echo "=== Code context in sglang.py around lines 140-160 ==="
sed -n '140,160p' src/srtctl/backends/sglang.py



🏁 Script executed:

#!/bin/bash
# Search for all disaggregation-mode references in sglang.py to understand the full context
echo "=== All disaggregation-mode references in sglang.py ==="
rg -n -B 2 -A 2 "disaggregation-mode" src/srtctl/backends/sglang.py



🏁 Script executed:

#!/bin/bash
# Search for all profiling-related code and disaggregation-mode handling
echo "=== Profiling-specific code in sglang.py ==="
rg -n "profiling" src/srtctl/backends/sglang.py | head -30



🏁 Script executed:

#!/bin/bash
# Check if there's a test at line 263 or if it's in a different test
echo "=== Line count and test structure ==="
wc -l tests/test_command_generation.py
echo ""
echo "=== Test functions in the file ==="
rg -n "^def test_" tests/test_command_generation.py



🏁 Script executed:

#!/bin/bash
# Look for any assertions that disaggregation-mode should NOT be present
echo "=== Assertions checking disaggregation-mode NOT present ==="
rg -n "assert.*disaggregation-mode.*not" tests/test_command_generation.py



🏁 Script executed:

#!/bin/bash
# Check context around line 266 - which test and which cmd
echo "=== Context around line 266 ==="
sed -n '255,270p' tests/test_command_generation.py



🏁 Script executed:

#!/bin/bash
# Check the entire test_profiling_mode function
echo "=== test_profiling_mode function ==="
sed -n '232,270p' tests/test_command_generation.py



🏁 Script executed:

#!/bin/bash
# Get the full _get_sglang_config_flags method to see if profiling logic exists
echo "=== Full _get_sglang_config_flags method ==="
sed -n '130,165p' src/srtctl/backends/sglang.py



🏁 Script executed:

#!/bin/bash
# Check if there's a separate method handling profiling config differently
echo "=== Search for method that might handle profiling config ==="
rg -n "_get_profiling_config|_filter_config|_get_config_flags" src/srtctl/backends/sglang.py



Profiling test will fail: disaggregation-mode not filtered during profiling mode.

The profiling_type variable is retrieved at line 139 but never used. The code currently adds all config flags unconditionally, including disaggregation-mode. However, test_profiling_mode() at line 232 explicitly expects --disaggregation-mode to NOT appear in the command when profiling is enabled (assert "--disaggregation-mode" not in cmd). The comment at line 145 suggests intent to handle this, but the implementation is missing.

Add conditional logic to skip disaggregation-mode when profiling_type != "none".

🤖 Prompt for AI Agents
In src/srtctl/backends/sglang.py around lines 145-146 (profiling_type is read at
line 139), the code currently unconditionally adds the --disaggregation-mode
flag which breaks test_profiling_mode; update the logic where config flags are
appended so that when profiling_type != "none" the code skips adding the
disaggregation-mode flag (i.e., only append --disaggregation-mode when
profiling_type == "none"), leaving all other flags unchanged.

@weireweire force-pushed the support-native-disag-profile branch from 3d4449b to 3d1f0c4 on Dec 17, 2025 at 08:29
@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 2

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
scripts/worker_setup/worker.py (1)

110-122: Gate frontend startup by use_sglang_router flag.

Line 110's frontend startup check only tests need_frontend, but should also check use_sglang_router. When using the sglang router, individual workers should not start their own frontend processes. The fix is to change line 110 from:

if need_frontend:

to:

if need_frontend and not use_sglang_router:

This same pattern exists in the aggregated worker setup function (line 221) and should be fixed identically. The infrastructure setup correctly gates on use_sglang_router at lines 93 and 204, so the frontend startup should follow the same pattern.

♻️ Duplicate comments (1)
src/srtctl/backends/sglang.py (1)

145-146: Profiling test will fail: disaggregation-mode not filtered during profiling mode.

The comment says "Always pass disaggregation-mode" but test_profiling_mode() explicitly asserts --disaggregation-mode should NOT appear in the command when profiling is enabled. The profiling_type variable is retrieved at line 139 but never used to filter the flag.

Add conditional logic to skip disaggregation-mode when profiling_type != "none":

         for key, value in sorted(config.items()):
             # Convert underscores to hyphens
             flag_name = key.replace("_", "-")

-            # Always pass disaggregation-mode so profiling runs in PD mode
+            # Skip disaggregation-mode when profiling is enabled (profiling uses launch_server directly)
+            if flag_name == "disaggregation-mode" and profiling_type != "none":
+                continue

             if isinstance(value, bool):
🧹 Nitpick comments (2)
scripts/profiling/profile.sh (1)

156-161: Exit code may not reflect actual profiling status.

The exit_code captures the exit status of the last command run (lm_eval), but if PROFILING_MODE != "prefill", no traffic generation commands run and exit_code will be 0 regardless of the profiling outcome. Consider explicitly setting exit_code=0 for the decode path or capturing the curl exit status.

scripts/templates/job_script_template_disagg.j2 (1)

356-362: Consider adding error handling for empty IP arrays.

If PREFILL_LEADER_IPS or DECODE_LEADER_IPS arrays are empty due to misconfiguration, the router will launch without proper worker URLs. Consider adding validation.

+# Validate that we have at least one prefill and decode worker
+if [ ${#PREFILL_LEADER_IPS[@]} -eq 0 ] || [ ${#DECODE_LEADER_IPS[@]} -eq 0 ]; then
+    echo "Error: Cannot launch router without both prefill and decode workers"
+    exit 1
+fi
+
 ROUTER_ARGS="--pd-disaggregation"
 for ip in "${PREFILL_LEADER_IPS[@]}"; do
📜 Review details

Configuration used: defaults

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 3d4449b and 3d1f0c4.

📒 Files selected for processing (7)
  • scripts/profiling/profile.sh (4 hunks)
  • scripts/templates/job_script_template_disagg.j2 (8 hunks)
  • scripts/worker_setup.py (4 hunks)
  • scripts/worker_setup/command.py (0 hunks)
  • scripts/worker_setup/worker.py (3 hunks)
  • src/srtctl/backends/sglang.py (4 hunks)
  • src/srtctl/core/schema.py (2 hunks)
💤 Files with no reviewable changes (1)
  • scripts/worker_setup/command.py
🚧 Files skipped from review as they are similar to previous changes (1)
  • src/srtctl/core/schema.py
🧰 Additional context used
🧬 Code graph analysis (1)
scripts/worker_setup/worker.py (2)
scripts/worker_setup/infrastructure.py (1)
  • setup_head_prefill_node (14-32)
scripts/worker_setup/utils.py (1)
  • wait_for_etcd (53-70)
🪛 Ruff (0.14.8)
scripts/worker_setup/worker.py

97-97: Avoid specifying long messages outside the exception class

(TRY003)


101-101: Avoid specifying long messages outside the exception class

(TRY003)


160-160: Avoid specifying long messages outside the exception class

(TRY003)


208-208: Avoid specifying long messages outside the exception class

(TRY003)


212-212: Avoid specifying long messages outside the exception class

(TRY003)

🔇 Additional comments (16)
scripts/profiling/profile.sh (4)

9-10: LGTM: Environment-driven configuration for head node.

The parameterization of head_node and head_port with sensible defaults (127.0.0.1 and 8000) enables flexible deployment across different environments.


45-58: LGTM: Multi-worker readiness orchestration.

The wait_all_workers_ready function properly handles comma-separated IP lists and iterates through workers to ensure all are ready before proceeding. The empty string check prevents issues with trailing commas or empty entries.


61-73: Verify backward compatibility when both IP arrays are empty but PROFILING_MODE is not "prefill".

When PROFILING_MODE is not "prefill", the router readiness check is skipped (lines 61-64). If the IP arrays are also empty, the code falls back to waiting for 127.0.0.1:30000. This seems correct for decode-only profiling, but verify this matches the expected behavior for standalone decode profiling scenarios.


135-154: Traffic generation correctly gated to prefill mode only.

This is correct — only the prefill profiling job should generate traffic through the router. Decode workers only need to trigger /start_profile without generating additional traffic.

scripts/worker_setup.py (2)

98-103: LGTM: New CLI flag for sglang router mode.

The --use-sglang-router flag is properly defined with a clear help message explaining its purpose (skipping NATS/ETCD/frontend bootstrap).


184-227: LGTM: Flag correctly propagated to all worker setup functions.

The use_sglang_router flag is consistently passed to setup_prefill_worker, setup_decode_worker, and setup_aggregated_worker at the end of their argument lists, matching the updated function signatures in worker.py.

src/srtctl/backends/sglang.py (2)

282-302: LGTM: Unified profiling configuration.

Using a single profiling_cfg for both prefill and decode environments is cleaner and ensures consistency. The build_env_str helper handles optional fields correctly.


327-327: LGTM: New template variable for sglang router.

The use_sglang_router variable is correctly sourced from backend_config with a sensible default of False.

scripts/worker_setup/worker.py (3)

87-101: LGTM: Conditional etcd/frontend setup for prefill workers.

When use_sglang_router is True, the etcd wait and head node setup are correctly bypassed since the sglang router handles coordination instead of NATS/ETCD.


153-160: LGTM: Conditional etcd wait for decode workers.

The etcd wait is correctly skipped when use_sglang_router is True for decode workers.


198-212: LGTM: Conditional etcd/frontend setup for aggregated workers.

Consistent with prefill worker handling — etcd wait and head node setup are bypassed when using sglang router.

scripts/templates/job_script_template_disagg.j2 (5)

61-61: LGTM: Consistent gating for multiple frontends vs sglang router.

All frontend-related blocks are correctly gated with enable_multiple_frontends and not use_sglang_router, ensuring mutual exclusivity between the two routing approaches.

Also applies to: 146-146, 192-192, 213-213, 323-323


187-191: LGTM: Router flag propagation to worker arguments.

The --use-sglang-router flag is correctly appended to WORKER_ARGS when use_sglang_router is enabled.


339-368: LGTM: Sglang router launch logic.

The router launch block correctly:

  1. Collects leader IPs for both prefill and decode workers
  2. Constructs ROUTER_ARGS with --pd-disaggregation and worker URLs
  3. Launches the router on node 0 with proper logging

388-419: LGTM: Unified profiling orchestration.

The profiling flow is properly consolidated:

  1. Collects leader IPs for all prefill and decode workers
  2. Passes them as comma-separated strings (PROFILE_PREFILL_IPS, PROFILE_DECODE_IPS) to the orchestrator
  3. Runs a single profiling orchestrator on the first prefill leader

This aligns well with the changes in profile.sh that consume these environment variables.


5-5: Verify --segment is a valid SBATCH directive for your SLURM version.

The #SBATCH --segment={{ total_nodes }} directive does not appear in standard SLURM documentation and is not recognized across common SLURM installations. This is either a non-standard extension, a typo, or will be ignored by SLURM. Verify this is supported on your system, or consider using standard node allocation directives like --nodes.

Comment on lines +112 to +132
start_profile_on_worker() {
local ip="$1"
if [[ -z "${ip}" ]]; then
return
fi
echo "Starting profiling on http://${ip}:30000"
curl -X POST "http://${ip}:30000/start_profile" -H "Content-Type: application/json" -d "{\"start_step\": \"$PROFILE_START_STEP\", \"num_steps\": $((PROFILE_STOP_STEP-PROFILE_START_STEP)), \"activities\": $ACTIVITIES}"
}

if [[ "${#PREFILL_IPS[@]}" -gt 0 || "${#DECODE_IPS[@]}" -gt 0 ]]; then
for ip in "${PREFILL_IPS[@]}"; do
start_profile_on_worker "${ip}"
done
for ip in "${DECODE_IPS[@]}"; do
start_profile_on_worker "${ip}"
done
else
# Fallback to local single-node profiling
echo "Starting profiling on local server http://127.0.0.1:30000"
curl -X POST http://127.0.0.1:30000/start_profile -H "Content-Type: application/json" -d "{\"start_step\": \"$PROFILE_START_STEP\", \"num_steps\": $((PROFILE_STOP_STEP-PROFILE_START_STEP)), \"activities\": $ACTIVITIES}"
fi
⚠️ Potential issue | 🟡 Minor

Potential issue: start_step is passed as a string while num_steps is an integer.

In the JSON payload at line 118, start_step is quoted as a string (\"$PROFILE_START_STEP\") while num_steps is computed as an integer. This inconsistency may cause issues if the server expects consistent types.

-    curl -X POST "http://${ip}:30000/start_profile" -H "Content-Type: application/json" -d "{\"start_step\": \"$PROFILE_START_STEP\", \"num_steps\": $((PROFILE_STOP_STEP-PROFILE_START_STEP)), \"activities\": $ACTIVITIES}"
+    curl -X POST "http://${ip}:30000/start_profile" -H "Content-Type: application/json" -d "{\"start_step\": $PROFILE_START_STEP, \"num_steps\": $((PROFILE_STOP_STEP-PROFILE_START_STEP)), \"activities\": $ACTIVITIES}"

Also update line 131 for consistency:

-    curl -X POST http://127.0.0.1:30000/start_profile -H "Content-Type: application/json" -d "{\"start_step\": \"$PROFILE_START_STEP\", \"num_steps\": $((PROFILE_STOP_STEP-PROFILE_START_STEP)), \"activities\": $ACTIVITIES}"
+    curl -X POST http://127.0.0.1:30000/start_profile -H "Content-Type: application/json" -d "{\"start_step\": $PROFILE_START_STEP, \"num_steps\": $((PROFILE_STOP_STEP-PROFILE_START_STEP)), \"activities\": $ACTIVITIES}"
🤖 Prompt for AI Agents
In scripts/profiling/profile.sh around lines 112 to 132, the JSON payloads send
start_step as a quoted string while num_steps is an integer; change both curl -d
payloads (the one in start_profile_on_worker and the local fallback) to pass
start_step unquoted so it is a numeric value (i.e., remove the surrounding
quotes from $PROFILE_START_STEP) and keep num_steps computed as
$((PROFILE_STOP_STEP-PROFILE_START_STEP)); ensure ACTIVITIES remains valid JSON
(or pre-serialize it) so the overall payload types are consistent.

Comment on lines +416 to +418
srun --nodes=1 --ntasks=1 $ENROOT_ARGS --jobid $SLURM_JOB_ID -w $PROFILE_ORCHESTRATOR_NODE \
--output=${LOG_DIR}/profile_all.out --overlap \
bash -c "PROFILING_MODE=prefill PROFILE_PREFILL_IPS=${PREFILL_LEADER_IPS_STR} PROFILE_DECODE_IPS=${DECODE_LEADER_IPS_STR} {% endraw %}{% if profiler == 'torch' %}SGLANG_TORCH_PROFILER_DIR=/logs/profiles {% endif %}{{ prefill_profile_env }}{% raw %} /scripts/profiling/profile.sh $PREFILL_WORKERS $DECODE_WORKERS $PREFILL_GPUS $DECODE_GPUS $TOTAL_GPUS" &
⚠️ Potential issue | 🟠 Major

Missing HEAD_NODE and HEAD_PORT environment variables for router-based profiling.

The profile.sh script reads HEAD_NODE and HEAD_PORT from environment (lines 9-10), but this srun command doesn't set them. When using sglang router, the profiling script needs to know the router endpoint to wait for readiness and send traffic.

-    bash -c "PROFILING_MODE=prefill PROFILE_PREFILL_IPS=${PREFILL_LEADER_IPS_STR} PROFILE_DECODE_IPS=${DECODE_LEADER_IPS_STR} {% endraw %}{% if profiler == 'torch' %}SGLANG_TORCH_PROFILER_DIR=/logs/profiles {% endif %}{{ prefill_profile_env }}{% raw %} /scripts/profiling/profile.sh $PREFILL_WORKERS $DECODE_WORKERS $PREFILL_GPUS $DECODE_GPUS $TOTAL_GPUS" &
+    bash -c "PROFILING_MODE=prefill HEAD_NODE=${nodes[0]} HEAD_PORT=8000 PROFILE_PREFILL_IPS=${PREFILL_LEADER_IPS_STR} PROFILE_DECODE_IPS=${DECODE_LEADER_IPS_STR} {% endraw %}{% if profiler == 'torch' %}SGLANG_TORCH_PROFILER_DIR=/logs/profiles {% endif %}{{ prefill_profile_env }}{% raw %} /scripts/profiling/profile.sh $PREFILL_WORKERS $DECODE_WORKERS $PREFILL_GPUS $DECODE_GPUS $TOTAL_GPUS" &

Note: ${nodes[0]} is the router node. You may need to resolve its IP using get_node_ip or use the hostname directly if DNS resolution works in the container.

