[draft] Support sglang native disagg and profiling #55
Conversation
Walkthrough: Adds optional sglang router support and threads a new `use_sglang_router` flag through worker setup, the disaggregated job template, and the profiling scripts.
Sequence Diagram(s)

```mermaid
sequenceDiagram
    participant Orchestrator as Profiling Orchestrator (profile.sh)
    participant Router as sglang Router (optional)
    participant Worker as Worker (prefill/decode nodes)
    participant ETCD as ETCD/NATS
    Note over Orchestrator,Worker: Start profiling run
    Orchestrator->>Router: (if use_sglang_router) launch router with ROUTER_ARGS
    Orchestrator->>Worker: collect PREFILL_IPS / DECODE_IPS
    Orchestrator->>Worker: wait_all_workers_ready (port 30000)
    alt router mode
        Worker-->>ETCD: skip ETCD/frontend bootstrap (use_sglang_router=true)
    else traditional mode
        Worker->>ETCD: wait for ETCD/frontend readiness
    end
    Orchestrator->>Worker: start_profile_on_worker (per-IP start_profile)
    Orchestrator->>Orchestrator: run bench_serving / lm_eval (only if PROFILING_MODE=="prefill")
    Worker->>Orchestrator: profile logs / status
```
Estimated code review effort: 🎯 4 (Complex) | ⏱️ ~45 minutes
Pre-merge checks: ❌ failed checks (1 warning), ✅ passed checks (2 passed)
Actionable comments posted: 2
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (2)
scripts/worker_setup/worker.py (2)
109-122: Frontend may start unnecessarily when router is enabled.

The frontend startup block (lines 110-122) only checks `need_frontend`, which doesn't account for `use_sglang_router`. When the router is enabled, the frontend shouldn't be started, since the router handles traffic routing. Consider gating this block:

```diff
-if need_frontend:
+if need_frontend and not use_sglang_router:
     logging.info("Starting frontend in traditional mode (after dynamo installation)")
```
220-233: Frontend may start unnecessarily in aggregated worker when router is enabled.

Similar to the prefill worker, the frontend startup block doesn't check `use_sglang_router`. Apply the same fix:

```diff
-if need_frontend:
+if need_frontend and not use_sglang_router:
     logging.info("Starting frontend in traditional mode (after dynamo installation)")
```
🧹 Nitpick comments (1)
scripts/profiling/profile.sh (1)
99-118: Consider checking intermediate exit codes.

If `bench_serving` fails but `lm_eval` succeeds, the final `exit_code` will be 0 (success). If `bench_serving` results are important, consider capturing and checking its exit code:

```diff
 python3 -m sglang.bench_serving \
   ... \
   --warmup-request 5
+bench_exit=$?
+if [ $bench_exit -ne 0 ]; then
+    echo "Warning: bench_serving failed with exit code $bench_exit"
+fi

 pip install lm-eval tenacity > /dev/null
```
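The same idea generalizes to every step in the run. Below is a minimal sketch of per-step exit-code tracking; `true` and `false` are placeholders standing in for the real `bench_serving` and `lm_eval` invocations, and `run_step` is a hypothetical helper, not part of profile.sh:

```shell
# Sketch: run each step, record its exit code, and surface the worst
# one at the end instead of only reporting the last command's status.
overall=0

run_step() {
  "$@"
  local rc=$?
  if [ "$rc" -ne 0 ]; then
    # Warn immediately but keep going so later steps still run
    echo "Warning: step '$1' failed with exit code $rc" >&2
    overall=$rc
  fi
}

run_step true    # stands in for bench_serving
run_step false   # stands in for lm_eval
echo "final exit code: $overall"
```

Exiting with `$overall` at the end of the script would then propagate the first failure to the SLURM job status.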
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (7)
- scripts/profiling/profile.sh (3 hunks)
- scripts/templates/job_script_template_disagg.j2 (7 hunks)
- scripts/worker_setup.py (4 hunks)
- scripts/worker_setup/command.py (0 hunks)
- scripts/worker_setup/worker.py (3 hunks)
- src/srtctl/backends/sglang.py (4 hunks)
- src/srtctl/core/schema.py (2 hunks)
💤 Files with no reviewable changes (1)
- scripts/worker_setup/command.py
🧰 Additional context used
🧬 Code graph analysis (2)
scripts/worker_setup/worker.py (2)
scripts/worker_setup/infrastructure.py (1)
- setup_head_prefill_node (14-32)
scripts/worker_setup/utils.py (1)
- wait_for_etcd (53-70)

src/srtctl/backends/sglang.py (1)

tests/test_command_generation.py (1)
- test_profiling_mode (232-268)
🪛 Ruff (0.14.8)
scripts/worker_setup/worker.py
- 97-97: Avoid specifying long messages outside the exception class (TRY003)
- 101-101: Avoid specifying long messages outside the exception class (TRY003)
- 160-160: Avoid specifying long messages outside the exception class (TRY003)
- 208-208: Avoid specifying long messages outside the exception class (TRY003)
- 212-212: Avoid specifying long messages outside the exception class (TRY003)
🔇 Additional comments (13)
scripts/worker_setup.py (2)
98-102: LGTM! The new `--use-sglang-router` CLI argument is properly defined with `store_true` action, consistent with the existing `--multiple-frontends-enabled` flag pattern.
184-227: LGTM! The `use_sglang_router` argument is correctly propagated to all three worker setup functions, maintaining consistency across prefill, decode, and aggregated worker paths.

src/srtctl/core/schema.py (2)
173-183: LGTM! The simplified `ProfilingConfig` with unified fields for both prefill and decode aligns well with the PD disaggregation model, where a single profiling specification drives both sides.
237-242: LGTM! The `use_sglang_router` option is appropriately added to `BackendConfig` with a sensible default of `False` for backward compatibility. The updated comment clearly explains its purpose.

src/srtctl/backends/sglang.py (2)
299-302: LGTM! Using the same `profiling_cfg` for both `prefill_profile_env` and `decode_profile_env` is consistent with the unified `ProfilingConfig` schema change, where a single profiling specification drives both sides in PD disaggregation mode.
327-327: LGTM! The `use_sglang_router` template variable is correctly exposed with a sensible default of `False`, enabling the template to conditionally configure router-based deployments.

scripts/worker_setup/worker.py (2)
93-101: LGTM! The conditional logic correctly bypasses NATS/ETCD infrastructure setup when `use_sglang_router` is enabled. The preserved logging in the non-router path maintains good observability.
158-160: LGTM! The decode worker correctly skips etcd connectivity checks when the sglang router is enabled, consistent with the prefill worker behavior.
scripts/profiling/profile.sh (2)
45-56: LGTM! The conditional wait logic correctly differentiates between prefill and decode profiling modes. Prefill waits for the router (to generate traffic) while decode only needs the local server. Both modes wait for the local server at port 30000, ensuring the profiling endpoint is ready.
95-96: LGTM! Targeting the local server at `127.0.0.1:30000` for `/start_profile` is correct for both prefill and decode modes, as profiling needs to be triggered on the local sglang instance.

scripts/templates/job_script_template_disagg.j2 (3)
60-60: LGTM! The conditional correctly gates the multiple frontend architecture block to run only when `enable_multiple_frontends` is true AND `use_sglang_router` is false, preventing conflicts between the two routing approaches.
186-196: LGTM! The `WORKER_ARGS` construction correctly adds `--use-sglang-router` when the flag is enabled, and ensures `--multiple-frontends-enabled` is mutually exclusive with the router flag.
339-367: LGTM! The router launch block correctly collects leader IPs for prefill and decode workers, constructs appropriate router arguments with `--pd-disaggregation`, and launches the router on port 8001. This aligns with the profiling script's expectation.
```python
# Always pass disaggregation-mode so profiling runs in PD mode
```
🧩 Analysis chain
🏁 Script executed:

```bash
#!/bin/bash
# Check the test expectations for disaggregation-mode
echo "=== Test file context around disaggregation-mode ==="
rg -n -B 5 -A 5 "disaggregation-mode" tests/test_command_generation.py | head -50
```

Repository: ishandhanani/srt-slurm

🏁 Script executed:

```bash
#!/bin/bash
# Check the actual code in sglang.py around lines 145-146 and surrounding context
echo "=== Code context in sglang.py around lines 140-160 ==="
sed -n '140,160p' src/srtctl/backends/sglang.py
```

🏁 Script executed:

```bash
#!/bin/bash
# Search for all disaggregation-mode references in sglang.py to understand the full context
echo "=== All disaggregation-mode references in sglang.py ==="
rg -n -B 2 -A 2 "disaggregation-mode" src/srtctl/backends/sglang.py
```

🏁 Script executed:

```bash
#!/bin/bash
# Search for all profiling-related code and disaggregation-mode handling
echo "=== Profiling-specific code in sglang.py ==="
rg -n "profiling" src/srtctl/backends/sglang.py | head -30
```

🏁 Script executed:

```bash
#!/bin/bash
# Check if there's a test at line 263 or if it's in a different test
echo "=== Line count and test structure ==="
wc -l tests/test_command_generation.py
echo ""
echo "=== Test functions in the file ==="
rg -n "^def test_" tests/test_command_generation.py
```

🏁 Script executed:

```bash
#!/bin/bash
# Look for any assertions that disaggregation-mode should NOT be present
echo "=== Assertions checking disaggregation-mode NOT present ==="
rg -n "assert.*disaggregation-mode.*not" tests/test_command_generation.py
```

🏁 Script executed:

```bash
#!/bin/bash
# Check context around line 266 - which test and which cmd
echo "=== Context around line 266 ==="
sed -n '255,270p' tests/test_command_generation.py
```

🏁 Script executed:

```bash
#!/bin/bash
# Check the entire test_profiling_mode function
echo "=== test_profiling_mode function ==="
sed -n '232,270p' tests/test_command_generation.py
```

🏁 Script executed:

```bash
#!/bin/bash
# Get the full _get_sglang_config_flags method to see if profiling logic exists
echo "=== Full _get_sglang_config_flags method ==="
sed -n '130,165p' src/srtctl/backends/sglang.py
```

🏁 Script executed:

```bash
#!/bin/bash
# Check if there's a separate method handling profiling config differently
echo "=== Search for method that might handle profiling config ==="
rg -n "_get_profiling_config|_filter_config|_get_config_flags" src/srtctl/backends/sglang.py
```
Profiling test will fail: `disaggregation-mode` not filtered during profiling mode.

The `profiling_type` variable is retrieved at line 139 but never used. The code currently adds all config flags unconditionally, including `disaggregation-mode`. However, `test_profiling_mode()` at line 232 explicitly expects `--disaggregation-mode` to NOT appear in the command when profiling is enabled (`assert "--disaggregation-mode" not in cmd`). The comment at line 145 suggests intent to handle this, but the implementation is missing.

Add conditional logic to skip `disaggregation-mode` when `profiling_type != "none"`.
🤖 Prompt for AI Agents
In src/srtctl/backends/sglang.py around lines 145-146 (profiling_type is read at
line 139), the code currently unconditionally adds the --disaggregation-mode
flag which breaks test_profiling_mode; update the logic where config flags are
appended so that when profiling_type != "none" the code skips adding the
disaggregation-mode flag (i.e., only append --disaggregation-mode when
profiling_type == "none"), leaving all other flags unchanged.
The branch was force-pushed from 3d4449b to 3d1f0c4.
Actionable comments posted: 2
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
scripts/worker_setup/worker.py (1)
110-122: Gate frontend startup by the `use_sglang_router` flag.

Line 110's frontend startup check only tests `need_frontend`, but should also check `use_sglang_router`. When using the sglang router, individual workers should not start their own frontend processes. The fix is to change line 110 from:

```python
if need_frontend:
```

to:

```python
if need_frontend and not use_sglang_router:
```

This same pattern exists in the aggregated worker setup function (line 221) and should be fixed identically. The infrastructure setup correctly gates on `use_sglang_router` at lines 93 and 204, so the frontend startup should follow the same pattern.
♻️ Duplicate comments (1)
src/srtctl/backends/sglang.py (1)
145-146: Profiling test will fail: `disaggregation-mode` not filtered during profiling mode.

The comment says "Always pass disaggregation-mode", but `test_profiling_mode()` explicitly asserts that `--disaggregation-mode` should NOT appear in the command when profiling is enabled. The `profiling_type` variable is retrieved at line 139 but never used to filter the flag.

Add conditional logic to skip `disaggregation-mode` when `profiling_type != "none"`:

```diff
 for key, value in sorted(config.items()):
     # Convert underscores to hyphens
     flag_name = key.replace("_", "-")
-    # Always pass disaggregation-mode so profiling runs in PD mode
+    # Skip disaggregation-mode when profiling is enabled (profiling uses launch_server directly)
+    if flag_name == "disaggregation-mode" and profiling_type != "none":
+        continue
     if isinstance(value, bool):
```
🧹 Nitpick comments (2)
scripts/profiling/profile.sh (1)
156-161: Exit code may not reflect actual profiling status.

The `exit_code` captures the exit status of `lm_eval` (the last command before it), but if `PROFILING_MODE != "prefill"`, no traffic generation commands run, and `exit_code` will be 0 regardless of the profiling outcome. Consider explicitly setting `exit_code=0` for the decode path or capturing the curl exit status.

scripts/templates/job_script_template_disagg.j2 (1)
356-362: Consider adding error handling for empty IP arrays.

If the `PREFILL_LEADER_IPS` or `DECODE_LEADER_IPS` arrays are empty due to misconfiguration, the router will launch without proper worker URLs. Consider adding validation:

```diff
+# Validate that we have at least one prefill and decode worker
+if [ ${#PREFILL_LEADER_IPS[@]} -eq 0 ] || [ ${#DECODE_LEADER_IPS[@]} -eq 0 ]; then
+    echo "Error: Cannot launch router without both prefill and decode workers"
+    exit 1
+fi
+
 ROUTER_ARGS="--pd-disaggregation"
 for ip in "${PREFILL_LEADER_IPS[@]}"; do
```
📜 Review details
Configuration used: defaults
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (7)
- scripts/profiling/profile.sh (4 hunks)
- scripts/templates/job_script_template_disagg.j2 (8 hunks)
- scripts/worker_setup.py (4 hunks)
- scripts/worker_setup/command.py (0 hunks)
- scripts/worker_setup/worker.py (3 hunks)
- src/srtctl/backends/sglang.py (4 hunks)
- src/srtctl/core/schema.py (2 hunks)
💤 Files with no reviewable changes (1)
- scripts/worker_setup/command.py
🚧 Files skipped from review as they are similar to previous changes (1)
- src/srtctl/core/schema.py
🧰 Additional context used
🧬 Code graph analysis (1)
scripts/worker_setup/worker.py (2)
scripts/worker_setup/infrastructure.py (1)
- setup_head_prefill_node (14-32)
scripts/worker_setup/utils.py (1)
- wait_for_etcd (53-70)
🪛 Ruff (0.14.8)
scripts/worker_setup/worker.py
- 97-97: Avoid specifying long messages outside the exception class (TRY003)
- 101-101: Avoid specifying long messages outside the exception class (TRY003)
- 160-160: Avoid specifying long messages outside the exception class (TRY003)
- 208-208: Avoid specifying long messages outside the exception class (TRY003)
- 212-212: Avoid specifying long messages outside the exception class (TRY003)
🔇 Additional comments (16)
scripts/profiling/profile.sh (4)
9-10: LGTM: Environment-driven configuration for head node. The parameterization of `head_node` and `head_port` with sensible defaults (127.0.0.1 and 8000) enables flexible deployment across different environments.
45-58: LGTM: Multi-worker readiness orchestration. The `wait_all_workers_ready` function properly handles comma-separated IP lists and iterates through workers to ensure all are ready before proceeding. The empty string check prevents issues with trailing commas or empty entries.
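For reference, the comma-split-and-iterate pattern described above can be sketched as follows. This is an illustration of the shape, not the actual profile.sh code: the real script polls each worker on port 30000 with curl, and the health-endpoint name here is an assumption.

```shell
# Sketch: split a comma-separated IP list, skip empty entries
# (e.g. from trailing commas), and poll each worker in turn.
wait_all_workers_ready() {
  local ip_list="$1" port="$2"
  local ips ip
  IFS=',' read -ra ips <<< "$ip_list"
  for ip in "${ips[@]}"; do
    [ -z "$ip" ] && continue  # tolerate trailing commas / empty entries
    echo "waiting for ${ip}:${port}"
    # real script would block here, roughly:
    #   until curl -sf "http://${ip}:${port}/health"; do sleep 5; done
  done
}

wait_all_workers_ready "10.0.0.1,10.0.0.2," 30000
```

Because the empty trailing field is skipped, a list like `"a,b,"` produces exactly two wait loops.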
61-73: Verify backward compatibility when both IP arrays are empty but PROFILING_MODE is not "prefill". When `PROFILING_MODE` is not "prefill", the router readiness check is skipped (lines 61-64). If the IP arrays are also empty, the code falls back to waiting for `127.0.0.1:30000`. This seems correct for decode-only profiling, but verify this matches the expected behavior for standalone decode profiling scenarios.
135-154: Traffic generation correctly gated to prefill mode only. This is correct — only the prefill profiling job should generate traffic through the router. Decode workers only need to trigger `/start_profile` without generating additional traffic.

scripts/worker_setup.py (2)
98-103: LGTM: New CLI flag for sglang router mode. The `--use-sglang-router` flag is properly defined with a clear help message explaining its purpose (skipping NATS/ETCD/frontend bootstrap).
184-227: LGTM: Flag correctly propagated to all worker setup functions. The `use_sglang_router` flag is consistently passed to `setup_prefill_worker`, `setup_decode_worker`, and `setup_aggregated_worker` at the end of their argument lists, matching the updated function signatures in `worker.py`.

src/srtctl/backends/sglang.py (2)
282-302: LGTM: Unified profiling configuration. Using a single `profiling_cfg` for both prefill and decode environments is cleaner and ensures consistency. The `build_env_str` helper handles optional fields correctly.
327-327: LGTM: New template variable for sglang router. The `use_sglang_router` variable is correctly sourced from `backend_config` with a sensible default of `False`.

scripts/worker_setup/worker.py (3)
87-101: LGTM: Conditional etcd/frontend setup for prefill workers. When `use_sglang_router` is True, the etcd wait and head node setup are correctly bypassed, since the sglang router handles coordination instead of NATS/ETCD.
153-160: LGTM: Conditional etcd wait for decode workers. The etcd wait is correctly skipped when `use_sglang_router` is True for decode workers.
198-212: LGTM: Conditional etcd/frontend setup for aggregated workers. Consistent with prefill worker handling — etcd wait and head node setup are bypassed when using sglang router.
scripts/templates/job_script_template_disagg.j2 (5)
61-61: LGTM: Consistent gating for multiple frontends vs sglang router. All frontend-related blocks are correctly gated with `enable_multiple_frontends and not use_sglang_router`, ensuring mutual exclusivity between the two routing approaches. Also applies to: 146-146, 192-192, 213-213, 323-323
187-191: LGTM: Router flag propagation to worker arguments. The `--use-sglang-router` flag is correctly appended to `WORKER_ARGS` when `use_sglang_router` is enabled.
339-368: LGTM: Sglang router launch logic. The router launch block correctly:

- Collects leader IPs for both prefill and decode workers
- Constructs `ROUTER_ARGS` with `--pd-disaggregation` and worker URLs
- Launches the router on node 0 with proper logging
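To make the second bullet concrete, the guarded argument construction could look like the sketch below. The `--prefill`/`--decode` per-worker flag names and the sample IPs are illustrative assumptions; only `--pd-disaggregation` is taken from the template under review, so check the router's actual CLI before reuse.

```shell
# Sketch: build router arguments from leader IP arrays, with the
# empty-array guard the review suggests. Sample IPs are placeholders.
PREFILL_LEADER_IPS=(10.0.0.1)
DECODE_LEADER_IPS=(10.0.0.2)

if [ ${#PREFILL_LEADER_IPS[@]} -eq 0 ] || [ ${#DECODE_LEADER_IPS[@]} -eq 0 ]; then
  echo "Error: Cannot launch router without both prefill and decode workers" >&2
  exit 1
fi

ROUTER_ARGS="--pd-disaggregation"
for ip in "${PREFILL_LEADER_IPS[@]}"; do
  ROUTER_ARGS+=" --prefill http://${ip}:30000"   # flag name illustrative
done
for ip in "${DECODE_LEADER_IPS[@]}"; do
  ROUTER_ARGS+=" --decode http://${ip}:30000"    # flag name illustrative
done
echo "$ROUTER_ARGS"
```

Guarding before the loops keeps the failure loud and early instead of letting the router start with no worker URLs.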
388-419: LGTM: Unified profiling orchestration. The profiling flow is properly consolidated:

- Collects leader IPs for all prefill and decode workers
- Passes them as comma-separated strings (`PROFILE_PREFILL_IPS`, `PROFILE_DECODE_IPS`) to the orchestrator
- Runs a single profiling orchestrator on the first prefill leader

This aligns well with the changes in `profile.sh` that consume these environment variables.
5-5: Verify `--segment` is a valid SBATCH directive for your SLURM version. The `#SBATCH --segment={{ total_nodes }}` directive does not appear in standard SLURM documentation and is not recognized across common SLURM installations. This is either a non-standard extension, a typo, or will be ignored by SLURM. Verify this is supported on your system, or consider using standard node allocation directives like `--nodes`.
```bash
start_profile_on_worker() {
    local ip="$1"
    if [[ -z "${ip}" ]]; then
        return
    fi
    echo "Starting profiling on http://${ip}:30000"
    curl -X POST "http://${ip}:30000/start_profile" -H "Content-Type: application/json" -d "{\"start_step\": \"$PROFILE_START_STEP\", \"num_steps\": $((PROFILE_STOP_STEP-PROFILE_START_STEP)), \"activities\": $ACTIVITIES}"
}

if [[ "${#PREFILL_IPS[@]}" -gt 0 || "${#DECODE_IPS[@]}" -gt 0 ]]; then
    for ip in "${PREFILL_IPS[@]}"; do
        start_profile_on_worker "${ip}"
    done
    for ip in "${DECODE_IPS[@]}"; do
        start_profile_on_worker "${ip}"
    done
else
    # Fallback to local single-node profiling
    echo "Starting profiling on local server http://127.0.0.1:30000"
    curl -X POST http://127.0.0.1:30000/start_profile -H "Content-Type: application/json" -d "{\"start_step\": \"$PROFILE_START_STEP\", \"num_steps\": $((PROFILE_STOP_STEP-PROFILE_START_STEP)), \"activities\": $ACTIVITIES}"
fi
```
Potential issue: `start_step` is passed as a string while `num_steps` is an integer.

In the JSON payload at line 118, `start_step` is quoted as a string (`\"$PROFILE_START_STEP\"`) while `num_steps` is computed as an integer. This inconsistency may cause issues if the server expects consistent types.

```diff
-curl -X POST "http://${ip}:30000/start_profile" -H "Content-Type: application/json" -d "{\"start_step\": \"$PROFILE_START_STEP\", \"num_steps\": $((PROFILE_STOP_STEP-PROFILE_START_STEP)), \"activities\": $ACTIVITIES}"
+curl -X POST "http://${ip}:30000/start_profile" -H "Content-Type: application/json" -d "{\"start_step\": $PROFILE_START_STEP, \"num_steps\": $((PROFILE_STOP_STEP-PROFILE_START_STEP)), \"activities\": $ACTIVITIES}"
```

Also update line 131 for consistency:

```diff
-curl -X POST http://127.0.0.1:30000/start_profile -H "Content-Type: application/json" -d "{\"start_step\": \"$PROFILE_START_STEP\", \"num_steps\": $((PROFILE_STOP_STEP-PROFILE_START_STEP)), \"activities\": $ACTIVITIES}"
+curl -X POST http://127.0.0.1:30000/start_profile -H "Content-Type: application/json" -d "{\"start_step\": $PROFILE_START_STEP, \"num_steps\": $((PROFILE_STOP_STEP-PROFILE_START_STEP)), \"activities\": $ACTIVITIES}"
```

🤖 Prompt for AI Agents
In scripts/profiling/profile.sh around lines 112 to 132, the JSON payloads send
start_step as a quoted string while num_steps is an integer; change both curl -d
payloads (the one in start_profile_on_worker and the local fallback) to pass
start_step unquoted so it is a numeric value (i.e., remove the surrounding
quotes from $PROFILE_START_STEP) and keep num_steps computed as
$((PROFILE_STOP_STEP-PROFILE_START_STEP)); ensure ACTIVITIES remains valid JSON
(or pre-serialize it) so the overall payload types are consistent.
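One way to keep the payload types consistent, regardless of quoting in the curl line, is to build the JSON once and reuse it in both call sites. A minimal sketch, with illustrative variable values, emitting both fields as JSON numbers via `printf %d`:

```shell
# Sketch: build the /start_profile payload once with numeric fields.
# Values are illustrative; the real script gets these from its environment.
PROFILE_START_STEP=100
PROFILE_STOP_STEP=150
ACTIVITIES='["CPU", "GPU"]'   # must already be valid JSON

payload=$(printf '{"start_step": %d, "num_steps": %d, "activities": %s}' \
  "$PROFILE_START_STEP" "$((PROFILE_STOP_STEP - PROFILE_START_STEP))" "$ACTIVITIES")
echo "$payload"
# the script would then send it, e.g.:
#   curl -X POST "http://${ip}:30000/start_profile" \
#     -H "Content-Type: application/json" -d "$payload"
```

Since `%d` fails loudly on non-numeric input, this also catches a misconfigured `PROFILE_START_STEP` before any request is sent.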
```bash
srun --nodes=1 --ntasks=1 $ENROOT_ARGS --jobid $SLURM_JOB_ID -w $PROFILE_ORCHESTRATOR_NODE \
    --output=${LOG_DIR}/profile_all.out --overlap \
    bash -c "PROFILING_MODE=prefill PROFILE_PREFILL_IPS=${PREFILL_LEADER_IPS_STR} PROFILE_DECODE_IPS=${DECODE_LEADER_IPS_STR} {% endraw %}{% if profiler == 'torch' %}SGLANG_TORCH_PROFILER_DIR=/logs/profiles {% endif %}{{ prefill_profile_env }}{% raw %} /scripts/profiling/profile.sh $PREFILL_WORKERS $DECODE_WORKERS $PREFILL_GPUS $DECODE_GPUS $TOTAL_GPUS" &
```
Missing HEAD_NODE and HEAD_PORT environment variables for router-based profiling.

The profile.sh script reads `HEAD_NODE` and `HEAD_PORT` from the environment (lines 9-10), but this srun command doesn't set them. When using the sglang router, the profiling script needs to know the router endpoint to wait for readiness and send traffic.

```diff
-    bash -c "PROFILING_MODE=prefill PROFILE_PREFILL_IPS=${PREFILL_LEADER_IPS_STR} PROFILE_DECODE_IPS=${DECODE_LEADER_IPS_STR} {% endraw %}{% if profiler == 'torch' %}SGLANG_TORCH_PROFILER_DIR=/logs/profiles {% endif %}{{ prefill_profile_env }}{% raw %} /scripts/profiling/profile.sh $PREFILL_WORKERS $DECODE_WORKERS $PREFILL_GPUS $DECODE_GPUS $TOTAL_GPUS" &
+    bash -c "PROFILING_MODE=prefill HEAD_NODE=${nodes[0]} HEAD_PORT=8000 PROFILE_PREFILL_IPS=${PREFILL_LEADER_IPS_STR} PROFILE_DECODE_IPS=${DECODE_LEADER_IPS_STR} {% endraw %}{% if profiler == 'torch' %}SGLANG_TORCH_PROFILER_DIR=/logs/profiles {% endif %}{{ prefill_profile_env }}{% raw %} /scripts/profiling/profile.sh $PREFILL_WORKERS $DECODE_WORKERS $PREFILL_GPUS $DECODE_GPUS $TOTAL_GPUS" &
```

Note: `${nodes[0]}` is the router node. You may need to resolve its IP using `get_node_ip`, or use the hostname directly if DNS resolution works in the container.
📝 Committable suggestion

‼️ IMPORTANT: Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

```bash
srun --nodes=1 --ntasks=1 $ENROOT_ARGS --jobid $SLURM_JOB_ID -w $PROFILE_ORCHESTRATOR_NODE \
    --output=${LOG_DIR}/profile_all.out --overlap \
    bash -c "PROFILING_MODE=prefill HEAD_NODE=${nodes[0]} HEAD_PORT=8000 PROFILE_PREFILL_IPS=${PREFILL_LEADER_IPS_STR} PROFILE_DECODE_IPS=${DECODE_LEADER_IPS_STR} {% endraw %}{% if profiler == 'torch' %}SGLANG_TORCH_PROFILER_DIR=/logs/profiles {% endif %}{{ prefill_profile_env }}{% raw %} /scripts/profiling/profile.sh $PREFILL_WORKERS $DECODE_WORKERS $PREFILL_GPUS $DECODE_GPUS $TOTAL_GPUS" &
```
enable by
Summary by CodeRabbit
New Features
Improvements