add nsys profiling and swap to bench_serving instead of bench_one_batch #41
Conversation
@coderabbitai review

✅ Actions performed: Review triggered.
Walkthrough

Replace the boolean profiling flag with a string-based profiler ("none", "torch", "nsys"), add per-phase profiling config, propagate the profiler through the schema, backend rendering, worker CLI/command builders, SLURM templates, and profiling scripts, and enforce mutual exclusion with benchmarking.
Sequence Diagram(s)

```mermaid
sequenceDiagram
    participant CLI as User CLI (srtctl)
    participant Scheduler as SLURM / job script
    participant Worker as Worker process
    participant Profiler as Profiler dir / Remote Profiler API
    CLI->>Scheduler: submit job with profiling.type (none|torch|nsys)
    Scheduler->>Worker: start worker with `--profiler` and phase envs
    alt profiling.type == "torch" and SGLANG_TORCH_PROFILER_DIR set
        Worker->>Profiler: create profiler dir
        Worker->>Worker: start python -m sglang.launch_server under torch profiler
    else profiling.type == "torch" and no local profiler dir
        Worker->>Profiler: HTTP POST /start_profile (remote trigger)
        Worker->>Worker: start python -m sglang.launch_server
    else profiling.type == "nsys"
        Worker->>Worker: start python prefixed with nsys CLI wrapper
    else profiling.type == "none"
        Worker->>Worker: start python -m dynamo.sglang (normal serving)
    end
    Worker->>Scheduler: report exit code and results path
```
Estimated code review effort: 🎯 4 (Complex) | ⏱️ ~45 minutes
Pre-merge checks and finishing touches

❌ Failed checks (1 warning)
✅ Passed checks (2 passed)
✨ Finishing touches
🧪 Generate unit tests (beta)
Actionable comments posted: 2
🧹 Nitpick comments (5)
scripts/profiling/profile.sh (2)
48-48: Remove unused variable `PROFILE_STEPS_ARG`.

This variable is assigned but never used in the script.

```diff
 # Determine profiling parameters strictly from environment
-PROFILE_STEPS_ARG=""
 CLI_ARGS=""
```
49-50: Remove unused `CLI_ARGS` variable.

`CLI_ARGS` is built with `--batch-size` but never used in the `bench_serving` command. The concurrency is passed via `--max-concurrency` directly.

```diff
 # Determine profiling parameters strictly from environment
-CLI_ARGS=""
-[[ -n "${PROFILE_CONCURRENCY}" ]] && CLI_ARGS+=" --batch-size ${PROFILE_CONCURRENCY}"
 # Require ISL/OSL to be provided; do not pass them as CLI args here
```

src/srtctl/backends/sglang.py (1)
289-301: Consider simplifying the conditional checks.

The pattern `if "key" in cfg and cfg["key"] is not None` can be simplified to `if cfg.get("key") is not None`, since `dict.get()` returns `None` for missing keys.

```diff
-    if "isl" in cfg and cfg["isl"] is not None:
-        parts.append(f"PROFILE_ISL={cfg['isl']}")
-    if "osl" in cfg and cfg["osl"] is not None:
-        parts.append(f"PROFILE_OSL={cfg['osl']}")
-    if "concurrency" in cfg and cfg["concurrency"] is not None:
-        parts.append(f"PROFILE_CONCURRENCY={cfg['concurrency']}")
-    if "start_step" in cfg and cfg["start_step"] is not None:
-        parts.append(f"PROFILE_START_STEP={cfg['start_step']}")
-    if "stop_step" in cfg and cfg["stop_step"] is not None:
-        parts.append(f"PROFILE_STOP_STEP={cfg['stop_step']}")
+    for key, env_var in [
+        ("isl", "PROFILE_ISL"),
+        ("osl", "PROFILE_OSL"),
+        ("concurrency", "PROFILE_CONCURRENCY"),
+        ("start_step", "PROFILE_START_STEP"),
+        ("stop_step", "PROFILE_STOP_STEP"),
+    ]:
+        if cfg.get(key) is not None:
+            parts.append(f"{env_var}={cfg[key]}")
```

src/srtctl/core/schema.py (1)
303-306: Remove or clarify the no-op code block.

This block contains only a `pass` statement with a comment stating "This is fine - backend will handle disabling it". If no action is needed here, consider removing the entire `if` block to avoid confusion.

```diff
-        # Auto-disable config dump when profiling (already handled in backend, but validate here too)
-        if self.enable_config_dump:
-            # This is fine - backend will handle disabling it
-            pass
```

scripts/worker_setup/worker.py (1)
92-93: Pre-existing: File handles not explicitly closed.

The file handles for frontend logs are opened but never explicitly closed. While this is intentional for long-running background processes (the frontend runs until job completion), consider using a context manager or documenting this behavior for clarity.
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (9)
- docs/installation.md (1 hunks)
- scripts/profiling/profile.sh (1 hunks)
- scripts/templates/job_script_template_agg.j2 (7 hunks)
- scripts/templates/job_script_template_disagg.j2 (8 hunks)
- scripts/worker_setup.py (4 hunks)
- scripts/worker_setup/command.py (6 hunks)
- scripts/worker_setup/worker.py (6 hunks)
- src/srtctl/backends/sglang.py (5 hunks)
- src/srtctl/core/schema.py (3 hunks)
🧰 Additional context used
🪛 Shellcheck (0.11.0)
scripts/profiling/profile.sh
[warning] 48-48: PROFILE_STEPS_ARG appears unused. Verify use (or export if used externally).
(SC2034)
🔇 Additional comments (22)
docs/installation.md (1)
155-179: Clear and comprehensive profiling documentation.

The documentation effectively explains the new profiling configuration with good examples. The mutual exclusion note with benchmarking and the serving behavior differences are helpful for users.
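For context, a profiling section following this documentation might look like the sketch below. The `type` values and per-phase keys are drawn from this review's walkthrough and diffs; the exact nesting is an assumption, not the documented schema.

```yaml
# Hypothetical srtctl config excerpt -- nesting is an assumption.
profiling:
  type: nsys          # one of: none | torch | nsys
  decode:
    isl: 1024         # input sequence length for the profiling run
    osl: 128          # output sequence length
    concurrency: 8    # forwarded to bench_serving as --max-concurrency
    start_step: 3     # defaulted with a warning when unset
    stop_step: 8
```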
scripts/worker_setup.py (2)
105-111: Clean transition to string-based profiler argument.

The new `--profiler` argument with explicit choices provides clear validation and better extensibility compared to the previous boolean flag. The default of `"none"` maintains backward compatibility.
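The pattern being praised is plain argparse; a minimal standalone sketch (not the project's actual parser, with the argument name and choices taken from this review):

```python
import argparse

parser = argparse.ArgumentParser()
# String-based profiler selection replaces the old boolean flag;
# argparse rejects any value outside `choices` at parse time.
parser.add_argument(
    "--profiler",
    choices=["none", "torch", "nsys"],
    default="none",
    help="profiling backend to wrap the worker with",
)

args = parser.parse_args(["--profiler", "nsys"])
print(args.profiler)  # → nsys
```

Passing an unknown value such as `--profiler perf` exits with a usage error, which is exactly the validation the boolean flag could not provide.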
178-218: Consistent propagation of profiler to all worker types.

The `profiler` argument is correctly passed to `setup_prefill_worker`, `setup_decode_worker`, and `setup_aggregated_worker`, ensuring uniform profiling behavior across all worker types.

scripts/templates/job_script_template_agg.j2 (3)
193-195: Correct integration of profiler argument into worker commands.

The `--profiler {{ profiler }}` flag is properly added to `WORKER_ARGS`, ensuring all worker invocations receive the profiling configuration.
342-359: Appropriate wait behavior for profiling vs. non-profiling modes.

The wait logic correctly uses `wait` (all tasks) when profiling is active to ensure the profiling script completes, versus `wait -n` (first task) for normal operation.
323-340: Verify `decode_profile_env` is populated by the backend.

The profiling block correctly gates on `profiler != 'none'` and conditionally includes `SGLANG_TORCH_PROFILER_DIR` only for the torch profiler. However, `decode_profile_env` must be confirmed as populated in the backend template context with the environment variables (like `PROFILE_ISL`, `PROFILE_OSL`) expected by the profiling script.

scripts/profiling/profile.sh (1)
52-65: Good validation with helpful defaults.

The required parameter validation for `PROFILE_ISL` and `PROFILE_OSL` with early exit, combined with sensible defaults for step ranges with warnings, provides a robust configuration approach.

scripts/worker_setup/command.py (4)
13-21: Clean API update for string-based profiler.

The function signature change from a boolean to a string-based profiler, with clear type hints and a default value, provides better extensibility for future profiling methods.
63-69: Well-constructed nsys profiling integration.

The nsys prefix includes appropriate options for CUDA profiling (`cuda`, `nvtx`, `--cuda-graph-trace=node`, `--capture-range-end stop`) and uses an output path pattern consistent with the torch profiler.
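For illustration, a prefix with the flag set named in this review could be assembled like this. The helper name and output path are hypothetical; the flags mirror the ones listed above plus the `--force-overwrite true` mentioned in the later review round.

```python
import shlex

def nsys_prefix(output_path: str) -> list[str]:
    """Assemble an `nsys profile` wrapper: CUDA+NVTX tracing, node-level
    CUDA graph tracing, and a capture-range end so the trace stops
    cleanly; --force-overwrite avoids collisions with an older profile."""
    return [
        "nsys", "profile",
        "-t", "cuda,nvtx",
        "--cuda-graph-trace=node",
        "--capture-range-end", "stop",
        "--force-overwrite", "true",
        "-o", output_path,
    ]

# Prepend the prefix to the normal server launch command.
cmd = " ".join(shlex.quote(a) for a in nsys_prefix("/logs/profiles/worker")) \
      + " python -m sglang.launch_server"
print(cmd)
```

Building the prefix as a list and quoting with `shlex.quote` keeps the command safe even if the output path later contains spaces.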
71-84: Correct flag handling with appropriate filtering.

The iteration over config flags with type-aware handling (booleans, lists, scalars) and the explicit exclusion of `disaggregation-mode` for profiling mode is well implemented.
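The type-aware handling described here can be sketched roughly as follows. The function name and exact key set are hypothetical; only the skip of `disaggregation-mode` mirrors the review.

```python
def flags_from_config(cfg: dict, skip=frozenset({"disaggregation-mode"})) -> list[str]:
    """Type-aware flag rendering: booleans become bare flags when true,
    lists expand into repeated values, scalars become `--key value`.
    Keys in `skip` are dropped -- e.g. `disaggregation-mode`, which
    sglang.launch_server does not accept in profiling mode."""
    parts: list[str] = []
    for key, value in cfg.items():
        if key in skip or value is None:
            continue
        flag = f"--{key}"
        if isinstance(value, bool):
            if value:
                parts.append(flag)
        elif isinstance(value, list):
            parts.extend([flag, *map(str, value)])
        else:
            parts.extend([flag, str(value)])
    return parts

print(flags_from_config({"tp-size": 8, "enable-metrics": True, "disaggregation-mode": "prefill"}))
# → ['--tp-size', '8', '--enable-metrics']
```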
145-174: Consistent API update for `get_gpu_command`.

The function correctly mirrors the profiler parameter addition and passes it through to `build_sglang_command_from_yaml`.

src/srtctl/backends/sglang.py (3)
104-112: LGTM! Profiler type handling looks correct.

The conditional logic properly routes to the appropriate Python module and nsys prefix based on profiling type. The fallback to `dynamo.sglang` for non-profiling mode maintains backward compatibility.
139-147: LGTM! Correct handling of flag exclusion during profiling.

The logic appropriately skips the `disaggregation-mode` flag when profiling is active, as `sglang.launch_server` doesn't accept this flag.
338-340: LGTM! Template variables properly integrated.

The new profiling-related template variables are correctly added and will be available to the Jinja templates.
src/srtctl/core/schema.py (2)
165-189: LGTM! Well-structured profiling configuration schema.

The new `ProfilingType` enum, `ProfilingPhaseConfig`, and `ProfilingConfig` classes provide a clean, type-safe structure for profiling configuration. The enum values ("nsys", "torch", "none") align with usage throughout the codebase.
308-334: LGTM! Comprehensive validation logic.

The mutual exclusivity check with non-MANUAL benchmarks and the single-worker constraint validations are well-implemented with clear error messages.
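A simplified, self-contained sketch of the constraints being praised here (class and field names are hypothetical; the real schema uses richer pydantic-style models):

```python
from dataclasses import dataclass

@dataclass
class JobConfig:
    profiling_type: str = "none"     # "none" | "torch" | "nsys"
    benchmark_mode: str = "manual"   # only manual benchmarking coexists with profiling
    prefill_workers: int = 1
    decode_workers: int = 1

    def validate(self) -> None:
        if self.profiling_type == "none":
            return  # short-circuit: nothing to enforce
        if self.benchmark_mode != "manual":
            raise ValueError("profiling is mutually exclusive with non-manual benchmarking")
        if self.prefill_workers > 1 or self.decode_workers > 1:
            raise ValueError("profiling requires a single worker per phase")

JobConfig(profiling_type="torch").validate()  # passes
```

Raising `ValueError` with a specific message at validation time is what makes the misconfiguration visible at submit time rather than mid-job.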
scripts/templates/job_script_template_disagg.j2 (3)
193-195: LGTM! Profiler argument correctly added to worker args.

The profiler mode is properly injected into `WORKER_ARGS` for all worker types.
363-370: LGTM! Profiling environment correctly conditionally injected.

The template properly conditionalizes `SGLANG_TORCH_PROFILER_DIR` for the torch profiler only, while allowing the generic `prefill_profile_env`/`decode_profile_env` to be injected for all profiling modes.
374-390: LGTM! Correct wait behavior for profiling vs non-profiling modes.

The differentiated wait logic ensures profiling scripts complete before job exit, while benchmark mode uses the original `wait -n` behavior.

scripts/worker_setup/worker.py (3)
55-67: LGTM! Clean parameter refactor for setup_prefill_worker.

The change from the boolean `sglang_torch_profiler` to the string `profiler` with default `"none"` aligns with the new multi-profiler support and is properly propagated to `get_gpu_command`.
120-131: LGTM! Consistent parameter refactor for setup_decode_worker.
162-174: LGTM! Consistent parameter refactor for setup_aggregated_worker.
```bash
else
  curl -X POST http://${head_node}:${head_port}/start_profile -H "Content-Type: application/json" -d "{\"start_step\": \"$PROFILE_START_STEP\", \"num_steps\": $((PROFILE_STOP_STEP-PROFILE_START_STEP)), \"activities\": [\"CUDA_PROFILER\"]}"
  mkdir -p "/logs/profiles" 2>/dev/null || true
fi
```
Add error handling for the remote profiling start request.
If the curl request fails (network issue, server error), the script continues without profiling being enabled. Consider checking the response status.
```diff
 else
-  curl -X POST http://${head_node}:${head_port}/start_profile -H "Content-Type: application/json" -d "{\"start_step\": \"$PROFILE_START_STEP\", \"num_steps\": $((PROFILE_STOP_STEP-PROFILE_START_STEP)), \"activities\": [\"CUDA_PROFILER\"]}"
+  response=$(curl -s -w "\n%{http_code}" -X POST http://${head_node}:${head_port}/start_profile \
+    -H "Content-Type: application/json" \
+    -d "{\"start_step\": \"$PROFILE_START_STEP\", \"num_steps\": $((PROFILE_STOP_STEP-PROFILE_START_STEP)), \"activities\": [\"CUDA_PROFILER\"]}")
+  http_code=$(echo "$response" | tail -n1)
+  if [[ "$http_code" != "200" ]]; then
+    echo "Warning: Failed to start remote profiling (HTTP $http_code)"
+  fi
   mkdir -p "/logs/profiles" 2>/dev/null || true
 fi
```
```bash
  --model ${model_name} \
  --host ${head_node} --port ${head_port} \
  --dataset-name random \
  --max-concurrency $PROFILE_CONCURRENCY \
```
Add validation or default for PROFILE_CONCURRENCY.
PROFILE_CONCURRENCY is used directly without validation, unlike PROFILE_ISL and PROFILE_OSL. If unset, this will produce an invalid command.
Consider adding validation similar to ISL/OSL:
```bash
if [[ -z "${PROFILE_CONCURRENCY}" ]]; then
  echo "Warning: PROFILE_CONCURRENCY not set; defaulting to 1"
  PROFILE_CONCURRENCY=1
fi
```

🤖 Prompt for AI Agents
In scripts/profiling/profile.sh around line 93, PROFILE_CONCURRENCY is used
directly and can be empty, producing an invalid command; add a validation block
like the existing PROFILE_ISL/PROFILE_OSL checks that verifies
PROFILE_CONCURRENCY is set and a positive integer, echoing a warning and
defaulting it to 1 when unset or invalid (e.g., if empty or non-numeric), so the
subsequent --max-concurrency argument always receives a valid value.
Actionable comments posted: 0
🧹 Nitpick comments (2)
scripts/worker_setup/command.py (1)
18-18: Consider validating theprofilerparameter against allowed values.The
profilerparameter accepts any string but only "none", "torch", and "nsys" are valid. Invalid values would silently fall into the profiling code path (sinceprofiler != "none"would be True) but without proper setup.Add validation at the start of
build_sglang_command_from_yaml:VALID_PROFILERS = {"none", "torch", "nsys"} def build_sglang_command_from_yaml( # ... params ... ) -> str: if profiler not in VALID_PROFILERS: raise ValueError(f"Invalid profiler '{profiler}'. Must be one of: {VALID_PROFILERS}") # ... rest of functionAlso applies to: 36-36, 149-149, 161-161
scripts/worker_setup/worker.py (1)
92-100: Consider using context managers or explicit cleanup for file handles.

The file handles `frontend_stdout` and `frontend_stderr` are opened but never explicitly closed. While they remain open intentionally for the background process, if `run_command` fails before returning a process, the handles would leak (though Python will close them on exit).

This is a minor concern since the files are needed for the lifetime of the background process. If you want to be more explicit (note: the handles should be opened before the `try` so the cleanup path never references an unbound name):

```python
frontend_stdout = open("/logs/frontend.out", "w")
frontend_stderr = open("/logs/frontend.err", "w")
try:
    frontend_process = run_command(frontend_cmd, background=True, stdout=frontend_stdout, stderr=frontend_stderr)
    if not frontend_process:
        raise RuntimeError("Failed to start frontend")
except Exception:
    frontend_stdout.close()
    frontend_stderr.close()
    raise
```
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (3)
- docs/installation.md (1 hunks)
- scripts/worker_setup/command.py (6 hunks)
- scripts/worker_setup/worker.py (6 hunks)
🚧 Files skipped from review as they are similar to previous changes (1)
- docs/installation.md
🔇 Additional comments (8)
scripts/worker_setup/command.py (4)
62-62: LGTM - nsys profile command construction is appropriate.

The nsys profile options are well-chosen for CUDA profiling with graph tracing. The `--force-overwrite true` option prevents profile file conflicts.
64-92: Profiling mode command construction looks correct.

The logic properly differentiates between nsys (wrapping with the nsys prefix) and torch profiling (inline python invocation), and correctly builds all SGLang flags from the config.
93-103: LGTM - non-profiling mode correctly uses config-based invocation.
142-171: LGTM - clean parameter propagation to command builder.

The function correctly passes the new `profiler` parameter through to `build_sglang_command_from_yaml`.

scripts/worker_setup/worker.py (4)
55-117: LGTM - prefill worker correctly updated to use string-based profiler.

The parameter change from boolean to string is correctly applied and propagated to `get_gpu_command`.
120-159: LGTM - decode worker correctly updated.

Consistent parameter change and propagation.
162-224: LGTM - aggregated worker correctly updated.

Consistent parameter change and propagation. Note the same file handle pattern exists here (lines 199-200) as discussed for the prefill worker.
103-104: Verify the GPU type check logic: the condition appears logically redundant.

The condition `gpu_type.startswith("gb200") and not gpu_type.startswith("gb300")` is redundant, since a string cannot simultaneously start with both "gb200" and "gb300". However, this may be intentional if GPU naming includes variants like "gb200" and "gb2000". Clarify whether this should be a different exclusion (e.g., excluding a specific GB200 variant) or if the condition should be restructured.
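The redundancy is easy to check mechanically: the prefixes "gb200" and "gb300" differ at the third character, so no string can start with both, and the second clause is implied by the first. A quick standalone check (not project code):

```python
def as_written(gpu_type: str) -> bool:
    # The condition as quoted in the review.
    return gpu_type.startswith("gb200") and not gpu_type.startswith("gb300")

def simplified(gpu_type: str) -> bool:
    return gpu_type.startswith("gb200")

# The two predicates agree for every input, including variants like
# "gb2000" that the review speculates about.
for name in ["gb200", "gb2000", "gb200-nvl72", "gb300", "h100", ""]:
    assert as_written(name) == simplified(name)
print("equivalent for all samples")
```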
Actionable comments posted: 0
♻️ Duplicate comments (2)
scripts/profiling/profile.sh (2)
84-84: Existing concern: curl error handling.

This review comment echoes a previous reviewer's concern about the lack of error handling for the remote profiling start request.

91-91: Existing concern: validate PROFILE_CONCURRENCY.

This review comment echoes a previous reviewer's concern about the lack of validation for `PROFILE_CONCURRENCY` before use.
🧹 Nitpick comments (1)
scripts/profiling/profile.sh (1)
48-48: Remove unused variable.
`PROFILE_STEPS_ARG` is assigned but never used in the script. Apply this diff:

```diff
-PROFILE_STEPS_ARG=""
 CLI_ARGS=""
```
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (9)
- docs/installation.md (1 hunks)
- scripts/profiling/profile.sh (1 hunks)
- scripts/templates/job_script_template_agg.j2 (7 hunks)
- scripts/templates/job_script_template_disagg.j2 (8 hunks)
- scripts/worker_setup.py (4 hunks)
- scripts/worker_setup/command.py (6 hunks)
- scripts/worker_setup/worker.py (6 hunks)
- src/srtctl/backends/sglang.py (5 hunks)
- src/srtctl/core/schema.py (3 hunks)
🚧 Files skipped from review as they are similar to previous changes (3)
- docs/installation.md
- scripts/worker_setup/worker.py
- scripts/worker_setup/command.py
🧰 Additional context used
🪛 Shellcheck (0.11.0)
scripts/profiling/profile.sh
[warning] 48-48: PROFILE_STEPS_ARG appears unused. Verify use (or export if used externally).
(SC2034)
🔇 Additional comments (16)
scripts/worker_setup.py (2)
106-111: LGTM: Clean profiler argument design.

The string-based profiler choices provide clear options, and the default of "none" is safe.
179-218: LGTM: Consistent profiler propagation.

The profiler argument is correctly propagated to all worker setup functions (prefill, decode, aggregated).
scripts/profiling/profile.sh (3)
47-65: LGTM: Environment-driven profiling configuration.

The shift from mode-based to environment-driven parameter selection is cleaner. The validation for ISL/OSL with error exit, and the defaults for start/stop steps with warnings, are appropriate.
71-80: LGTM: Conditional profiling directory setup.

The logic to conditionally create profiler directories and set activities based on whether `SGLANG_TORCH_PROFILER_DIR` is provided is correct.
86-107: LGTM: Serving-based profiling and evaluation flow.

The replacement of `bench_one_batch_server` with `bench_serving` and the addition of `lm-eval` align with the PR objectives to support remote profiling.

src/srtctl/core/schema.py (2)
165-189: LGTM: Well-structured profiling configuration models.

The new `ProfilingType` enum, `ProfilingPhaseConfig`, and `ProfilingConfig` classes provide a clean, type-safe profiling configuration system with sensible defaults.
297-334: LGTM: Proper profiling validation constraints.

The validation logic correctly enforces:
- Mutual exclusion between profiling and benchmarking
- Single-worker constraint when profiling is enabled (for both disaggregated and aggregated modes)
The short-circuit when profiling is "none" avoids unnecessary checks.
src/srtctl/backends/sglang.py (3)
104-127: LGTM: Profiler-based command rendering.

The logic correctly selects:
- nsys prefix + `sglang.launch_server` for nsys profiling
- `sglang.launch_server` for torch profiling
- `dynamo.sglang` for normal operation

Safe dictionary access with `.get()` and a default of "none" prevents errors.
139-147: LGTM: Skip incompatible flag when profiling.

Correctly skips the `disaggregation-mode` flag for `sglang.launch_server`, which doesn't support it.
284-306: LGTM: Clean profiling environment construction.

The `build_env_str` helper cleanly converts profiling configuration into environment variable assignments for injection into templates.

scripts/templates/job_script_template_agg.j2 (3)
193-194: LGTM: Unified profiler argument propagation.

The profiler value from the config is correctly propagated to all worker setup commands through `WORKER_ARGS`.
323-340: LGTM: Conditional profiling execution.

The profiling block correctly:

- gates on `profiler != 'none'`
- conditionally sets `SGLANG_TORCH_PROFILER_DIR` only for torch profiling
- injects phase-specific environment variables via `decode_profile_env`
342-359: LGTM: Profiler-aware wait logic.

The wait logic correctly branches:

- when profiling is enabled: waits for profiling-script completion and propagates its exit code
- when profiling is disabled: uses `wait -n` to wait for first-task (benchmark) completion

scripts/templates/job_script_template_disagg.j2 (3)
193-194: LGTM: Unified profiler argument propagation.

Consistent with the aggregated template, the profiler value is correctly propagated through `WORKER_ARGS`.
350-372: LGTM: Disaggregated profiling execution.

The template correctly:

- launches separate profiling for prefill and decode workers
- conditionally sets `SGLANG_TORCH_PROFILER_DIR` only for torch profiling
- injects phase-specific environment variables (`prefill_profile_env`, `decode_profile_env`)
374-391: LGTM: Correct wait semantics for disaggregated profiling.

The wait logic properly:

- waits for all profiling scripts (both prefill and decode) when profiling is enabled
- falls back to `wait -n` for benchmark completion when profiling is disabled
now profiling can set
Besides, I merged the .err and .out files, because for these frameworks .err does not really contain errors, and keeping the streams separate loses the time-sequence information.
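In Python, merging the two streams while preserving their ordering is a one-liner with subprocess (a generic sketch, not the project's `run_command`):

```python
import os
import subprocess
import sys
import tempfile

# Redirect the child's stderr into the same stream as stdout, so a single
# combined log file preserves the relative ordering of both streams.
log_path = os.path.join(tempfile.mkdtemp(), "worker.log")
with open(log_path, "w") as log:
    subprocess.run(
        [sys.executable, "-u", "-c",
         "import sys; print('out'); print('err', file=sys.stderr)"],
        stdout=log,
        stderr=subprocess.STDOUT,  # the shell equivalent is `2>&1`
        check=True,
    )

with open(log_path) as fh:
    merged = fh.read()
print(merged)  # both lines, in emission order
```

`stderr=subprocess.STDOUT` makes the child's fd 2 a duplicate of fd 1, so both streams share one file offset and interleave in write order.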
Summary by CodeRabbit
New Features
Documentation
Validation