
[draft]support sglang native disagg and profiling. #55

Merged
ishandhanani merged 1 commit into ishandhanani:main from weireweire:support-native-disag-profile on Dec 17, 2025

Conversation


@weireweire weireweire commented Dec 11, 2025

Enable with:

backend:
  use_sglang_router: true
profiling:
  type: "nsys"
  isl: 128000
  osl: 8000
  concurrency: 256
  start_step: 10
  stop_step: 30
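For reference, the start_step/stop_step values above drive the /start_profile request that profile.sh later sends to each worker; a minimal sketch of the payload construction (the helper name is illustrative, not from this PR):

```python
def build_start_profile_payload(start_step: int, stop_step: int, activities: list[str]) -> dict:
    """Mirror the curl payload in profile.sh: num_steps spans stop_step - start_step."""
    return {
        "start_step": start_step,
        "num_steps": stop_step - start_step,
        "activities": activities,
    }

# With the config above: start_step=10, stop_step=30 -> num_steps=20
payload = build_start_profile_payload(10, 30, ["CPU", "GPU", "MEM"])
```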

Summary by CodeRabbit

  • New Features

    • Optional sglang router deployment path for disaggregated serving
    • Unified profiling orchestration across leader/worker nodes with per-IP start signaling
    • Profiler activities extended to include memory (MEM)
  • Improvements

    • Simplified, unified profiling configuration model
    • Environment-driven multi-worker readiness with local fallback
    • Conditional multi-frontend behavior and router-mode gating for setup and sequencing


@weireweire changed the title from "support sglang native disagg and profiling." to "[draft]support sglang native disagg and profiling." on Dec 11, 2025

coderabbitai bot commented Dec 11, 2025

Walkthrough

Adds optional sglang router support and threads a new use_sglang_router flag through job templates, backend, and worker setup. Profiling flow is unified and orchestrated across leader IPs, profiling waits and endpoints are made environment-driven, and worker setup skips ETCD/frontend bootstrap when router mode is enabled.

Changes

• Profiling Orchestration (scripts/profiling/profile.sh):
  Read head_node/head_port from env, add PREFILL_IPS/DECODE_IPS-driven waits and a wait_all_workers_ready helper, orchestrate start_profile per IP, add MEM to ACTIVITIES, gate bench_serving/lm_eval to PROFILING_MODE=="prefill", and fall back to waiting on 127.0.0.1:30000 for compatibility.
• Job Template & Router Launch (scripts/templates/job_script_template_disagg.j2):
  Add use_sglang_router conditional paths: launch the sglang router, collect leader IPs for prefill/decode, build ROUTER_ARGS, add --use-sglang-router to worker args, unify profiling orchestration across leaders, and adjust frontend/multi-frontend gating.
• Worker Setup CLI (scripts/worker_setup.py):
  Add the --use-sglang-router CLI flag and thread it through the setup_prefill_worker, setup_decode_worker, and setup_aggregated_worker calls.
• Worker Setup Implementation (scripts/worker_setup/worker.py):
  Add a use_sglang_router: bool = False parameter to the prefill/decode/aggregated setup functions, skip ETCD/frontend waits when true, add a hook to run the setup script after wheel install, and preserve the SGLang command invocation.
• Worker Command Construction (scripts/worker_setup/command.py):
  Remove the profiling-time guard that omitted disaggregation-mode; the flag is now emitted when present, even during profiling.
• Backend SLURM & Templating (src/srtctl/backends/sglang.py):
  Always emit the disaggregation-mode flag, use the unified profiling_cfg for prefill/decode envs, expose the use_sglang_router template variable, and make minor formatting adjustments.
• Schema Changes (src/srtctl/core/schema.py):
  Rename ProfilingPhaseConfig → ProfilingConfig, add type: ProfilingType, collapse per-phase prefill/decode into a single ProfilingConfig, and add BackendConfig.use_sglang_router: bool = False.

Sequence Diagram(s)

sequenceDiagram
    participant Orchestrator as Profiling Orchestrator (profile.sh)
    participant Router as sglang Router (optional)
    participant Worker as Worker (prefill/decode nodes)
    participant ETCD as ETCD/NATS

    Note over Orchestrator,Worker: Start profiling run
    Orchestrator->>Router: (if use_sglang_router) launch router with ROUTER_ARGS
    Orchestrator->>Worker: collect PREFILL_IPS / DECODE_IPS
    Orchestrator->>Worker: wait_all_workers_ready (port 30000)
    alt router mode
        Worker-->>ETCD: skip ETCD/frontend bootstrap (use_sglang_router=true)
    else traditional mode
        Worker->>ETCD: wait for ETCD/frontend readiness
    end
    Orchestrator->>Worker: start_profile_on_worker (per-IP start_profile)
    Orchestrator->>Orchestrator: run bench_serving / lm_eval (only if PROFILING_MODE=="prefill")
    Worker->>Orchestrator: profile logs / status
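The multi-IP wait in the diagram can be sketched as follows (probe_worker is a stand-in for the real readiness check, which curls port 30000 on each worker):

```shell
#!/usr/bin/env bash
# Split a comma-separated IP list and probe each entry, skipping empties
# (tolerates trailing commas, matching the behavior described in the review).
wait_all_workers_ready() {
    local ip_list="$1"
    local ips ip
    IFS=',' read -ra ips <<< "$ip_list"
    for ip in "${ips[@]}"; do
        [[ -z "$ip" ]] && continue   # skip empty entries from trailing commas
        probe_worker "$ip"           # hypothetical check, e.g. curl "http://$ip:30000/health"
    done
}

# Demo with a stubbed probe; prints one "ready:" line per IP
probe_worker() { echo "ready: $1"; }
wait_all_workers_ready "10.0.0.1,10.0.0.2,"
```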

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~45 minutes

  • Files touching control flow across templates, backend, and scripts require cross-file consistency checks.
  • Pay special attention to:
    • scripts/profiling/profile.sh — new multi-IP waits and orchestration logic.
    • scripts/templates/job_script_template_disagg.j2 — router launch and ROUTER_ARGS correctness.
    • scripts/worker_setup/worker.py & command.py — ETCD wait gating and flag propagation.
    • src/srtctl/core/schema.py — schema rename and structure change for profiling (backwards compatibility).


Poem

A rabbit scurries, flags in tow, 🐇
Routers hum and profilers grow,
Workers skip the waiting dance,
Leaders join the profiling trance,
Logs bloom bright where endpoints flow.

Pre-merge checks and finishing touches

❌ Failed checks (1 warning)
• Docstring Coverage ⚠️ Warning: docstring coverage is 54.55%, below the required threshold of 80.00%. Run @coderabbitai generate docstrings to improve coverage.
✅ Passed checks (2 passed)
• Description Check ✅ Passed: check skipped because CodeRabbit's high-level summary is enabled.
• Title Check ✅ Passed: the title '[draft]support sglang native disagg and profiling' directly aligns with the main changes: introducing sglang router support and profiling orchestration across multiple files.

@coderabbitai coderabbitai bot left a comment
Actionable comments posted: 2

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (2)
scripts/worker_setup/worker.py (2)

109-122: Frontend may start unnecessarily when router is enabled.

The frontend startup block (lines 110-122) only checks need_frontend which doesn't account for use_sglang_router. When the router is enabled, the frontend shouldn't be started since the router handles traffic routing. Consider gating this block:

-    if need_frontend:
+    if need_frontend and not use_sglang_router:
         logging.info("Starting frontend in traditional mode (after dynamo installation)")

220-233: Frontend may start unnecessarily in aggregated worker when router is enabled.

Similar to the prefill worker, the frontend startup block doesn't check use_sglang_router. Apply the same fix:

-    if need_frontend:
+    if need_frontend and not use_sglang_router:
         logging.info("Starting frontend in traditional mode (after dynamo installation)")
🧹 Nitpick comments (1)
scripts/profiling/profile.sh (1)

99-118: Consider checking intermediate exit codes.

If bench_serving fails but lm_eval succeeds, the final exit_code will be 0 (success). If bench_serving results are important, consider capturing and checking its exit code:

     python3 -m sglang.bench_serving \
     ...
     --warmup-request 5
+    bench_exit=$?
+    if [ $bench_exit -ne 0 ]; then
+        echo "Warning: bench_serving failed with exit code $bench_exit"
+    fi

     pip install lm-eval tenacity > /dev/null
📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 286427c and 3d4449b.

📒 Files selected for processing (7)
  • scripts/profiling/profile.sh (3 hunks)
  • scripts/templates/job_script_template_disagg.j2 (7 hunks)
  • scripts/worker_setup.py (4 hunks)
  • scripts/worker_setup/command.py (0 hunks)
  • scripts/worker_setup/worker.py (3 hunks)
  • src/srtctl/backends/sglang.py (4 hunks)
  • src/srtctl/core/schema.py (2 hunks)
💤 Files with no reviewable changes (1)
  • scripts/worker_setup/command.py
🧰 Additional context used
🧬 Code graph analysis (2)
scripts/worker_setup/worker.py (2)
scripts/worker_setup/infrastructure.py (1)
  • setup_head_prefill_node (14-32)
scripts/worker_setup/utils.py (1)
  • wait_for_etcd (53-70)
src/srtctl/backends/sglang.py (1)
tests/test_command_generation.py (1)
  • test_profiling_mode (232-268)
🪛 Ruff (0.14.8)
scripts/worker_setup/worker.py

97-97: Avoid specifying long messages outside the exception class

(TRY003)


101-101: Avoid specifying long messages outside the exception class

(TRY003)


160-160: Avoid specifying long messages outside the exception class

(TRY003)


208-208: Avoid specifying long messages outside the exception class

(TRY003)


212-212: Avoid specifying long messages outside the exception class

(TRY003)

🔇 Additional comments (13)
scripts/worker_setup.py (2)

98-102: LGTM!

The new --use-sglang-router CLI argument is properly defined with store_true action, consistent with the existing --multiple-frontends-enabled flag pattern.


184-227: LGTM!

The use_sglang_router argument is correctly propagated to all three worker setup functions, maintaining consistency across prefill, decode, and aggregated worker paths.

src/srtctl/core/schema.py (2)

173-183: LGTM!

The simplified ProfilingConfig with unified fields for both prefill and decode aligns well with the PD disaggregation model where a single profiling specification drives both sides.


237-242: LGTM!

The use_sglang_router option is appropriately added to BackendConfig with a sensible default of False for backward compatibility. The updated comment clearly explains its purpose.

src/srtctl/backends/sglang.py (2)

299-302: LGTM!

Using the same profiling_cfg for both prefill_profile_env and decode_profile_env is consistent with the unified ProfilingConfig schema change, where a single profiling specification drives both sides in PD disaggregation mode.


327-327: LGTM!

The use_sglang_router template variable is correctly exposed with a sensible default of False, enabling the template to conditionally configure router-based deployments.

scripts/worker_setup/worker.py (2)

93-101: LGTM!

The conditional logic correctly bypasses NATS/ETCD infrastructure setup when use_sglang_router is enabled. The preserved logging in the non-router path maintains good observability.


158-160: LGTM!

The decode worker correctly skips etcd connectivity checks when the sglang router is enabled, consistent with the prefill worker behavior.

scripts/profiling/profile.sh (2)

45-56: LGTM!

The conditional wait logic correctly differentiates between prefill and decode profiling modes. Prefill waits for the router (to generate traffic) while decode only needs the local server. Both modes wait for the local server at port 30000, ensuring the profiling endpoint is ready.


95-96: LGTM!

Targeting the local server at 127.0.0.1:30000 for /start_profile is correct for both prefill and decode modes, as profiling needs to be triggered on the local sglang instance.

scripts/templates/job_script_template_disagg.j2 (3)

60-60: LGTM!

The conditional correctly gates the multiple frontend architecture block to run only when enable_multiple_frontends is true AND use_sglang_router is false, preventing conflicts between the two routing approaches.


186-196: LGTM!

The WORKER_ARGS construction correctly adds --use-sglang-router when the flag is enabled, and ensures --multiple-frontends-enabled is mutually exclusive with the router flag.


339-367: LGTM!

The router launch block correctly collects leader IPs for prefill and decode workers, constructs appropriate router arguments with --pd-disaggregation, and launches the router on port 8001. This aligns with the profiling script's expectation.

Comment thread scripts/templates/job_script_template_disagg.j2 Outdated
Comment on lines +145 to 146
# Always pass disaggregation-mode so profiling runs in PD mode

⚠️ Potential issue | 🔴 Critical

🧩 Analysis chain

🏁 Script executed:

#!/bin/bash
# Check the test expectations for disaggregation-mode
echo "=== Test file context around disaggregation-mode ==="
rg -n -B 5 -A 5 "disaggregation-mode" tests/test_command_generation.py | head -50

Repository: ishandhanani/srt-slurm



🏁 Script executed:

#!/bin/bash
# Check the actual code in sglang.py around lines 145-146 and surrounding context
echo "=== Code context in sglang.py around lines 140-160 ==="
sed -n '140,160p' src/srtctl/backends/sglang.py



🏁 Script executed:

#!/bin/bash
# Search for all disaggregation-mode references in sglang.py to understand the full context
echo "=== All disaggregation-mode references in sglang.py ==="
rg -n -B 2 -A 2 "disaggregation-mode" src/srtctl/backends/sglang.py



🏁 Script executed:

#!/bin/bash
# Search for all profiling-related code and disaggregation-mode handling
echo "=== Profiling-specific code in sglang.py ==="
rg -n "profiling" src/srtctl/backends/sglang.py | head -30



🏁 Script executed:

#!/bin/bash
# Check if there's a test at line 263 or if it's in a different test
echo "=== Line count and test structure ==="
wc -l tests/test_command_generation.py
echo ""
echo "=== Test functions in the file ==="
rg -n "^def test_" tests/test_command_generation.py



🏁 Script executed:

#!/bin/bash
# Look for any assertions that disaggregation-mode should NOT be present
echo "=== Assertions checking disaggregation-mode NOT present ==="
rg -n "assert.*disaggregation-mode.*not" tests/test_command_generation.py



🏁 Script executed:

#!/bin/bash
# Check context around line 266 - which test and which cmd
echo "=== Context around line 266 ==="
sed -n '255,270p' tests/test_command_generation.py



🏁 Script executed:

#!/bin/bash
# Check the entire test_profiling_mode function
echo "=== test_profiling_mode function ==="
sed -n '232,270p' tests/test_command_generation.py



🏁 Script executed:

#!/bin/bash
# Get the full _get_sglang_config_flags method to see if profiling logic exists
echo "=== Full _get_sglang_config_flags method ==="
sed -n '130,165p' src/srtctl/backends/sglang.py



🏁 Script executed:

#!/bin/bash
# Check if there's a separate method handling profiling config differently
echo "=== Search for method that might handle profiling config ==="
rg -n "_get_profiling_config|_filter_config|_get_config_flags" src/srtctl/backends/sglang.py



Profiling test will fail: disaggregation-mode not filtered during profiling mode.

The profiling_type variable is retrieved at line 139 but never used. The code currently adds all config flags unconditionally, including disaggregation-mode. However, test_profiling_mode() at line 232 explicitly expects --disaggregation-mode to NOT appear in the command when profiling is enabled (assert "--disaggregation-mode" not in cmd). The comment at line 145 suggests intent to handle this, but the implementation is missing.

Add conditional logic to skip disaggregation-mode when profiling_type != "none".

🤖 Prompt for AI Agents
In src/srtctl/backends/sglang.py around lines 145-146 (profiling_type is read at
line 139), the code currently unconditionally adds the --disaggregation-mode
flag which breaks test_profiling_mode; update the logic where config flags are
appended so that when profiling_type != "none" the code skips adding the
disaggregation-mode flag (i.e., only append --disaggregation-mode when
profiling_type == "none"), leaving all other flags unchanged.

@weireweire force-pushed the support-native-disag-profile branch from 3d4449b to 3d1f0c4 on Dec 17, 2025 at 08:29
@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 2

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
scripts/worker_setup/worker.py (1)

110-122: Gate frontend startup by use_sglang_router flag.

Line 110's frontend startup check only tests need_frontend, but should also check use_sglang_router. When using the sglang router, individual workers should not start their own frontend processes. The fix is to change line 110 from:

if need_frontend:

to:

if need_frontend and not use_sglang_router:

This same pattern exists in the aggregated worker setup function (line 221) and should be fixed identically. The infrastructure setup correctly gates on use_sglang_router at lines 93 and 204, so the frontend startup should follow the same pattern.

♻️ Duplicate comments (1)
src/srtctl/backends/sglang.py (1)

145-146: Profiling test will fail: disaggregation-mode not filtered during profiling mode.

The comment says "Always pass disaggregation-mode" but test_profiling_mode() explicitly asserts --disaggregation-mode should NOT appear in the command when profiling is enabled. The profiling_type variable is retrieved at line 139 but never used to filter the flag.

Add conditional logic to skip disaggregation-mode when profiling_type != "none":

         for key, value in sorted(config.items()):
             # Convert underscores to hyphens
             flag_name = key.replace("_", "-")

-            # Always pass disaggregation-mode so profiling runs in PD mode
+            # Skip disaggregation-mode when profiling is enabled (profiling uses launch_server directly)
+            if flag_name == "disaggregation-mode" and profiling_type != "none":
+                continue

             if isinstance(value, bool):
🧹 Nitpick comments (2)
scripts/profiling/profile.sh (1)

156-161: Exit code may not reflect actual profiling status.

The exit_code captures the exit status of the last command run (lm_eval), but if PROFILING_MODE != "prefill", no traffic generation commands run and exit_code will be 0 regardless of the profiling outcome. Consider explicitly setting exit_code=0 for the decode path or capturing the curl exit status.

scripts/templates/job_script_template_disagg.j2 (1)

356-362: Consider adding error handling for empty IP arrays.

If PREFILL_LEADER_IPS or DECODE_LEADER_IPS arrays are empty due to misconfiguration, the router will launch without proper worker URLs. Consider adding validation.

+# Validate that we have at least one prefill and decode worker
+if [ ${#PREFILL_LEADER_IPS[@]} -eq 0 ] || [ ${#DECODE_LEADER_IPS[@]} -eq 0 ]; then
+    echo "Error: Cannot launch router without both prefill and decode workers"
+    exit 1
+fi
+
 ROUTER_ARGS="--pd-disaggregation"
 for ip in "${PREFILL_LEADER_IPS[@]}"; do
📜 Review details

Configuration used: defaults

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 3d4449b and 3d1f0c4.

📒 Files selected for processing (7)
  • scripts/profiling/profile.sh (4 hunks)
  • scripts/templates/job_script_template_disagg.j2 (8 hunks)
  • scripts/worker_setup.py (4 hunks)
  • scripts/worker_setup/command.py (0 hunks)
  • scripts/worker_setup/worker.py (3 hunks)
  • src/srtctl/backends/sglang.py (4 hunks)
  • src/srtctl/core/schema.py (2 hunks)
💤 Files with no reviewable changes (1)
  • scripts/worker_setup/command.py
🚧 Files skipped from review as they are similar to previous changes (1)
  • src/srtctl/core/schema.py
🧰 Additional context used
🧬 Code graph analysis (1)
scripts/worker_setup/worker.py (2)
scripts/worker_setup/infrastructure.py (1)
  • setup_head_prefill_node (14-32)
scripts/worker_setup/utils.py (1)
  • wait_for_etcd (53-70)
🪛 Ruff (0.14.8)
scripts/worker_setup/worker.py

97-97: Avoid specifying long messages outside the exception class

(TRY003)


101-101: Avoid specifying long messages outside the exception class

(TRY003)


160-160: Avoid specifying long messages outside the exception class

(TRY003)


208-208: Avoid specifying long messages outside the exception class

(TRY003)


212-212: Avoid specifying long messages outside the exception class

(TRY003)

🔇 Additional comments (16)
scripts/profiling/profile.sh (4)

9-10: LGTM: Environment-driven configuration for head node.

The parameterization of head_node and head_port with sensible defaults (127.0.0.1 and 8000) enables flexible deployment across different environments.


45-58: LGTM: Multi-worker readiness orchestration.

The wait_all_workers_ready function properly handles comma-separated IP lists and iterates through workers to ensure all are ready before proceeding. The empty string check prevents issues with trailing commas or empty entries.


61-73: Verify backward compatibility when both IP arrays are empty but PROFILING_MODE is not "prefill".

When PROFILING_MODE is not "prefill", the router readiness check is skipped (lines 61-64). If the IP arrays are also empty, the code falls back to waiting for 127.0.0.1:30000. This seems correct for decode-only profiling, but verify this matches the expected behavior for standalone decode profiling scenarios.


135-154: Traffic generation correctly gated to prefill mode only.

This is correct — only the prefill profiling job should generate traffic through the router. Decode workers only need to trigger /start_profile without generating additional traffic.

scripts/worker_setup.py (2)

98-103: LGTM: New CLI flag for sglang router mode.

The --use-sglang-router flag is properly defined with a clear help message explaining its purpose (skipping NATS/ETCD/frontend bootstrap).


184-227: LGTM: Flag correctly propagated to all worker setup functions.

The use_sglang_router flag is consistently passed to setup_prefill_worker, setup_decode_worker, and setup_aggregated_worker at the end of their argument lists, matching the updated function signatures in worker.py.

src/srtctl/backends/sglang.py (2)

282-302: LGTM: Unified profiling configuration.

Using a single profiling_cfg for both prefill and decode environments is cleaner and ensures consistency. The build_env_str helper handles optional fields correctly.


327-327: LGTM: New template variable for sglang router.

The use_sglang_router variable is correctly sourced from backend_config with a sensible default of False.

scripts/worker_setup/worker.py (3)

87-101: LGTM: Conditional etcd/frontend setup for prefill workers.

When use_sglang_router is True, the etcd wait and head node setup are correctly bypassed since the sglang router handles coordination instead of NATS/ETCD.


153-160: LGTM: Conditional etcd wait for decode workers.

The etcd wait is correctly skipped when use_sglang_router is True for decode workers.


198-212: LGTM: Conditional etcd/frontend setup for aggregated workers.

Consistent with prefill worker handling — etcd wait and head node setup are bypassed when using sglang router.

scripts/templates/job_script_template_disagg.j2 (5)

61-61: LGTM: Consistent gating for multiple frontends vs sglang router.

All frontend-related blocks are correctly gated with enable_multiple_frontends and not use_sglang_router, ensuring mutual exclusivity between the two routing approaches.

Also applies to: 146-146, 192-192, 213-213, 323-323


187-191: LGTM: Router flag propagation to worker arguments.

The --use-sglang-router flag is correctly appended to WORKER_ARGS when use_sglang_router is enabled.


339-368: LGTM: Sglang router launch logic.

The router launch block correctly:

  1. Collects leader IPs for both prefill and decode workers
  2. Constructs ROUTER_ARGS with --pd-disaggregation and worker URLs
  3. Launches the router on node 0 with proper logging

388-419: LGTM: Unified profiling orchestration.

The profiling flow is properly consolidated:

  1. Collects leader IPs for all prefill and decode workers
  2. Passes them as comma-separated strings (PROFILE_PREFILL_IPS, PROFILE_DECODE_IPS) to the orchestrator
  3. Runs a single profiling orchestrator on the first prefill leader

This aligns well with the changes in profile.sh that consume these environment variables.


5-5: Verify --segment is a valid SBATCH directive for your SLURM version.

The #SBATCH --segment={{ total_nodes }} directive does not appear in standard SLURM documentation and is not recognized across common SLURM installations. This is either a non-standard extension, a typo, or will be ignored by SLURM. Verify this is supported on your system, or consider using standard node allocation directives like --nodes.

Comment on lines +112 to +132
start_profile_on_worker() {
local ip="$1"
if [[ -z "${ip}" ]]; then
return
fi
echo "Starting profiling on http://${ip}:30000"
curl -X POST "http://${ip}:30000/start_profile" -H "Content-Type: application/json" -d "{\"start_step\": \"$PROFILE_START_STEP\", \"num_steps\": $((PROFILE_STOP_STEP-PROFILE_START_STEP)), \"activities\": $ACTIVITIES}"
}

if [[ "${#PREFILL_IPS[@]}" -gt 0 || "${#DECODE_IPS[@]}" -gt 0 ]]; then
for ip in "${PREFILL_IPS[@]}"; do
start_profile_on_worker "${ip}"
done
for ip in "${DECODE_IPS[@]}"; do
start_profile_on_worker "${ip}"
done
else
# Fallback to local single-node profiling
echo "Starting profiling on local server http://127.0.0.1:30000"
curl -X POST http://127.0.0.1:30000/start_profile -H "Content-Type: application/json" -d "{\"start_step\": \"$PROFILE_START_STEP\", \"num_steps\": $((PROFILE_STOP_STEP-PROFILE_START_STEP)), \"activities\": $ACTIVITIES}"
fi
⚠️ Potential issue | 🟡 Minor

Potential issue: start_step is passed as a string while num_steps is an integer.

In the JSON payload at line 118, start_step is quoted as a string (\"$PROFILE_START_STEP\") while num_steps is computed as an integer. This inconsistency may cause issues if the server expects consistent types.

-    curl -X POST "http://${ip}:30000/start_profile" -H "Content-Type: application/json" -d "{\"start_step\": \"$PROFILE_START_STEP\", \"num_steps\": $((PROFILE_STOP_STEP-PROFILE_START_STEP)), \"activities\": $ACTIVITIES}"
+    curl -X POST "http://${ip}:30000/start_profile" -H "Content-Type: application/json" -d "{\"start_step\": $PROFILE_START_STEP, \"num_steps\": $((PROFILE_STOP_STEP-PROFILE_START_STEP)), \"activities\": $ACTIVITIES}"

Also update line 131 for consistency:

-    curl -X POST http://127.0.0.1:30000/start_profile -H "Content-Type: application/json" -d "{\"start_step\": \"$PROFILE_START_STEP\", \"num_steps\": $((PROFILE_STOP_STEP-PROFILE_START_STEP)), \"activities\": $ACTIVITIES}"
+    curl -X POST http://127.0.0.1:30000/start_profile -H "Content-Type: application/json" -d "{\"start_step\": $PROFILE_START_STEP, \"num_steps\": $((PROFILE_STOP_STEP-PROFILE_START_STEP)), \"activities\": $ACTIVITIES}"
🤖 Prompt for AI Agents
In scripts/profiling/profile.sh around lines 112 to 132, the JSON payloads send
start_step as a quoted string while num_steps is an integer; change both curl -d
payloads (the one in start_profile_on_worker and the local fallback) to pass
start_step unquoted so it is a numeric value (i.e., remove the surrounding
quotes from $PROFILE_START_STEP) and keep num_steps computed as
$((PROFILE_STOP_STEP-PROFILE_START_STEP)); ensure ACTIVITIES remains valid JSON
(or pre-serialize it) so the overall payload types are consistent.

Comment on lines +416 to +418
srun --nodes=1 --ntasks=1 $ENROOT_ARGS --jobid $SLURM_JOB_ID -w $PROFILE_ORCHESTRATOR_NODE \
--output=${LOG_DIR}/profile_all.out --overlap \
bash -c "PROFILING_MODE=prefill PROFILE_PREFILL_IPS=${PREFILL_LEADER_IPS_STR} PROFILE_DECODE_IPS=${DECODE_LEADER_IPS_STR} {% endraw %}{% if profiler == 'torch' %}SGLANG_TORCH_PROFILER_DIR=/logs/profiles {% endif %}{{ prefill_profile_env }}{% raw %} /scripts/profiling/profile.sh $PREFILL_WORKERS $DECODE_WORKERS $PREFILL_GPUS $DECODE_GPUS $TOTAL_GPUS" &
⚠️ Potential issue | 🟠 Major

Missing HEAD_NODE and HEAD_PORT environment variables for router-based profiling.

The profile.sh script reads HEAD_NODE and HEAD_PORT from environment (lines 9-10), but this srun command doesn't set them. When using sglang router, the profiling script needs to know the router endpoint to wait for readiness and send traffic.

-    bash -c "PROFILING_MODE=prefill PROFILE_PREFILL_IPS=${PREFILL_LEADER_IPS_STR} PROFILE_DECODE_IPS=${DECODE_LEADER_IPS_STR} {% endraw %}{% if profiler == 'torch' %}SGLANG_TORCH_PROFILER_DIR=/logs/profiles {% endif %}{{ prefill_profile_env }}{% raw %} /scripts/profiling/profile.sh $PREFILL_WORKERS $DECODE_WORKERS $PREFILL_GPUS $DECODE_GPUS $TOTAL_GPUS" &
+    bash -c "PROFILING_MODE=prefill HEAD_NODE=${nodes[0]} HEAD_PORT=8000 PROFILE_PREFILL_IPS=${PREFILL_LEADER_IPS_STR} PROFILE_DECODE_IPS=${DECODE_LEADER_IPS_STR} {% endraw %}{% if profiler == 'torch' %}SGLANG_TORCH_PROFILER_DIR=/logs/profiles {% endif %}{{ prefill_profile_env }}{% raw %} /scripts/profiling/profile.sh $PREFILL_WORKERS $DECODE_WORKERS $PREFILL_GPUS $DECODE_GPUS $TOTAL_GPUS" &

Note: ${nodes[0]} is the router node. You may need to resolve its IP using get_node_ip or use the hostname directly if DNS resolution works in the container.

