
sglang router profiling for PD disagg #60

Closed
wenscarl wants to merge 10 commits into ishandhanani:main from wenscarl:native_separate_pd

Conversation

@wenscarl

@wenscarl wenscarl commented Dec 17, 2025

Add the following:

  1. Separate start/stop iteration for P and D.
  2. Delay the sgl_router launch until the prefill server is ready.
  3. Remove some frontend checks.
  4. Support pre-installing from an external sglang source for debugging purposes.

Summary by CodeRabbit

  • New Features

    • Wait for prefill-server readiness before starting routers (health check with timeout warning).
    • Optional mount/install of local SGLang source for live debugging on workers.
    • Example config for FP4 disaggregated NSYS profiling with staged prefill/decode profiling.
  • Bug Fixes

    • Per-worker profiling output paths to separate profiling artifacts.
  • Documentation

    • Added guide for debugging with local SGLang source and router usage.
    • Updated accuracy benchmark details and made a minor profiling docs tweak.


@coderabbitai
Contributor

coderabbitai bot commented Dec 17, 2025

📝 Walkthrough

Walkthrough

Adds a prefill-server readiness wait loop to the disaggregated SLURM job template, optionally mounts and installs local SGLang source into containers, threads a new worker_idx through GPU command construction and profiling paths, exposes sglang_src_dir to templates, and adds an NSYS profiling example.
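The readiness wait described above can be sketched as a generic polling helper. This is a Python sketch of the logic the bash template implements; the function and parameter names are illustrative, not the template's actual variables:

```python
import time

def wait_for_ready(check, interval_s=30, max_wait_s=600,
                   clock=time.monotonic, sleep=time.sleep):
    """Poll `check()` until it returns True or `max_wait_s` elapses.

    Returns True if the service became ready, False on timeout (the
    template logs a warning and proceeds rather than failing hard).
    """
    deadline = clock() + max_wait_s
    while clock() < deadline:
        if check():
            return True
        sleep(interval_s)
    return False

# Example with a stubbed health check that succeeds on the third poll:
attempts = {"n": 0}

def fake_health():
    attempts["n"] += 1
    return attempts["n"] >= 3

ready = wait_for_ready(fake_health, interval_s=0, max_wait_s=5,
                       sleep=lambda s: None)
```

Injecting `clock` and `sleep` keeps the helper testable without real delays; the template's loop uses `curl` against the `/health` endpoint as the check.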

Changes

Cohort / File(s) — Summary

  • Job template & prefill readiness — scripts/templates/job_script_template_disagg.j2: Resolves PREFILL_LEADER_NODE/PREFILL_LEADER_IP and adds a health-check wait loop polling http://<leader_ip>:30000/health every 30s up to 600s; logs progress and a warning on timeout, then proceeds to profiler/router launch.
  • SGLang source mount & docs — scripts/templates/job_script_template_disagg.j2, docs/sglang-router.md: Conditionally includes sglang_src_dir in container ENROOT mounts as /ext-sglang-src/ and documents the mount plus the pip install -e . --no-deps editable-install workflow for debugging with a local SGLang source.
  • Worker command, install & profiling paths — scripts/worker_setup/command.py, scripts/worker_setup/worker.py: Threads a new worker_idx: int through build_sglang_command_from_yaml / get_gpu_command, includes worker_idx in the NSYS profile path, and adds install_sglang_from_source (default /ext-sglang-src) used in the sglang-router flow.
  • Backend/template & schema — src/srtctl/core/backend.py, src/srtctl/core/schema.py: Exposes sglang_src_dir to the template vars used for rendering job scripts; schema.py gained an additional (duplicate) logging import.
  • Example config — examples/fp4-disagg-nsys-profiling.yaml: New example configuration for disaggregated NSYS profiling (FP4), including resources, backend/env, a detailed sglang_config for prefill/decode, and profiling step ranges.
  • Docs / minor updates — docs/accuracy.md, docs/profiling.md: Updated MMLU max_tokens/results and adjusted profiling YAML comment indentation, respectively.
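The worker_idx threading described above boils down to a per-worker output path. A minimal sketch, assuming a hypothetical helper name but the `/logs/profiles/{config_key}_{worker_idx}_{rank}` shape the review comments describe:

```python
def nsys_profile_path(config_key: str, worker_idx: int, rank: int) -> str:
    """Build a per-worker NSYS output path so concurrent workers
    don't clobber each other's profiling artifacts."""
    return f"/logs/profiles/{config_key}_{worker_idx}_{rank}"

# Two decode workers at rank 0 now write to distinct files:
p0 = nsys_profile_path("decode", 0, 0)
p1 = nsys_profile_path("decode", 1, 0)
```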

Sequence Diagram(s)

sequenceDiagram
  autonumber
  participant Job as Slurm Job Script (disagg)
  participant Prefill as Prefill Leader Service
  participant Prof as Profiler (nsys)
  participant Router as SGLang Router

  Job->>Prefill: Resolve PREFILL_LEADER_NODE/IP
  loop Poll /health every 30s (max 600s)
    Job->>Prefill: GET http://<leader_ip>:30000/health
    Prefill-->>Job: 200 OK / not-ready
  end
  alt health OK
    Job->>Prof: start profiler (background) for leader
    Job->>Router: launch router pointing at leader IP
    Router-->>Job: router ready
  else timeout
    Job-->>Job: log warning and proceed
    Job->>Prof: optionally start profiler (best-effort)
    Job->>Router: launch router anyway
  end

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~25 minutes


Poem

🐰 I hopped to check the leader's door,
I polled its health and counted more,
Routers wake and profilers hum,
Worker indices join the run,
Carrots, code, and one small log—encore.

🚥 Pre-merge checks | ✅ 3
✅ Passed checks (3 passed)
  • Description Check — Passed: check skipped because CodeRabbit's high-level summary is enabled.
  • Title Check — Passed: the title 'sglang router profiling for PD disagg' accurately reflects the main changes: adding sglang router profiling with separate prefill/decode (PD) handling and disaggregated setup support.
  • Docstring Coverage — Passed: docstring coverage is 85.71%, which meets the required 80.00% threshold.



Contributor

@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 2

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (3)
scripts/worker_setup/worker.py (3)

114-122: File handles are not closed on error paths.

The file handles for frontend_stdout and frontend_stderr are opened but never closed. If run_command fails or an exception occurs later, these handles will leak.

Consider using a context manager or storing the handles for later cleanup:

-        frontend_stdout = open("/logs/frontend.out", "w")
-        frontend_stderr = open("/logs/frontend.err", "w")
-
-        frontend_cmd = "python3 -m dynamo.frontend --http-port=8000"
-        frontend_process = run_command(frontend_cmd, background=True, stdout=frontend_stdout, stderr=frontend_stderr)
+        with open("/logs/frontend.out", "w") as frontend_stdout, \
+             open("/logs/frontend.err", "w") as frontend_stderr:
+            frontend_cmd = "python3 -m dynamo.frontend --http-port=8000"
+            frontend_process = run_command(frontend_cmd, background=True, stdout=frontend_stdout, stderr=frontend_stderr)

Note: If the background process requires the file handles to remain open, store them and close them during cleanup/shutdown instead.


241-251: Missing worker_idx argument will cause runtime error.

The get_gpu_command function now requires worker_idx as a positional argument (see command.py line 146), but setup_aggregated_worker does not pass it. This will raise a TypeError at runtime.

     cmd_to_run = get_gpu_command(
         worker_type="aggregated",
+        worker_idx=worker_idx,
         sglang_config_path=sglang_config_path,
         host_ip=leader_ip,
         port=DIST_INIT_PORT,
         total_nodes=nodes_per_worker,
         rank=local_rank,
         profiler=profiler,
         dump_config_path=dump_config_path,
     )

222-235: Same file handle leak issue in aggregated worker.

Similar to setup_prefill_worker, the frontend file handles are opened but never properly closed.

Additionally, unlike setup_prefill_worker (line 110), this block is missing the and not use_sglang_router condition:

-    if need_frontend:
+    if need_frontend and not use_sglang_router:

This inconsistency means the frontend would still be started in router mode for aggregated workers.

🧹 Nitpick comments (6)
srtslurm.yaml.example (1)

6-7: Consider using placeholder values in the example config.

The example file now contains what appear to be real cluster-specific paths and credentials (e.g., /mnt/lustre01/users/slurm-shared/shuw/...). For an example configuration file that serves as documentation, consider using obvious placeholders like /path/to/your/srt-slurm to avoid:

  1. Leaking internal infrastructure details
  2. Confusion when users copy this file directly

Also applies to: 18-18, 21-21, 25-26

recipies/gb200-fp4/1k1k/low-latency.yaml (1)

122-127: Fix indentation of commented-out benchmark block.

The commented-out benchmark block has excessive indentation (8 spaces). If preserved for reference, it should align with other top-level blocks.

-        #benchmark:
-        #  type: "sa-bench"
-        #  isl: 1024
-        #  osl: 1024
-        #  concurrencies: "256"
-        #  req_rate: "inf"
+#benchmark:
+#  type: "sa-bench"
+#  isl: 1024
+#  osl: 1024
+#  concurrencies: "256"
+#  req_rate: "inf"
src/srtctl/backends/sglang.py (1)

145-146: Remove or complete the dangling comment.

The comment "Always pass disaggregation-mode so profiling runs in PD mode" appears to reference code that was removed or moved. Either remove this orphaned comment or add the corresponding implementation.

-            # Always pass disaggregation-mode so profiling runs in PD mode
-
             if isinstance(value, bool):
scripts/profiling/profile.sh (1)

112-112: Move pip install to container setup or job initialization.

Installing packages at profiling time introduces latency and potential failure points (network issues, pip conflicts). Consider pre-installing lm-eval and tenacity in the container image or during job setup phase.

scripts/worker_setup/command.py (1)

11-21: Docstring missing worker_idx parameter description.

The worker_idx parameter was added but not documented in the docstring's Args section.

     Args:
         worker_type: "prefill", "decode", or "aggregated"
+        worker_idx: Worker index (0-indexed) for this worker type
         sglang_config_path: Path to generated sglang_config.yaml
src/srtctl/core/schema.py (1)

327-338: Remove or restore the commented-out validation code.

The validation that prevented profiling with multiple prefill/decode workers has been disabled via pass statements with commented-out raise blocks. This leaves dead code that:

  1. Obscures the actual behavior
  2. May confuse future maintainers

Either remove the commented code entirely if multi-worker profiling is now intentionally supported:

         if is_disaggregated:
-            if self.resources.prefill_workers and self.resources.prefill_workers > 1:
-                pass
-#                raise ValueError(
-#                    f"Profiling mode requires single worker only. "
-#                    f"Got prefill_workers={self.resources.prefill_workers}"
-#                )
-            if self.resources.decode_workers and self.resources.decode_workers > 1:
-                pass
-#                raise ValueError(
-#                    f"Profiling mode requires single worker only. "
-#                    f"Got decode_workers={self.resources.decode_workers}"
-#                )
+            # Multi-worker profiling is supported in disaggregated mode
+            pass

Or restore the validation if this was unintentional.

📜 Review details

Configuration used: defaults

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 22704aa and bd9f0e8.

📒 Files selected for processing (9)
  • recipies/gb200-fp4/1k1k/low-latency.yaml (3 hunks)
  • scripts/profiling/profile.sh (3 hunks)
  • scripts/templates/job_script_template_disagg.j2 (8 hunks)
  • scripts/worker_setup.py (4 hunks)
  • scripts/worker_setup/command.py (4 hunks)
  • scripts/worker_setup/worker.py (6 hunks)
  • src/srtctl/backends/sglang.py (3 hunks)
  • src/srtctl/core/schema.py (3 hunks)
  • srtslurm.yaml.example (1 hunks)
🧰 Additional context used
🧬 Code graph analysis (2)
recipies/gb200-fp4/1k1k/low-latency.yaml (1)
tests/test_command_generation.py (1)
  • test_profiling_mode (232-268)
scripts/worker_setup/worker.py (2)
scripts/worker_setup/infrastructure.py (1)
  • setup_head_prefill_node (14-32)
scripts/worker_setup/utils.py (1)
  • wait_for_etcd (53-70)
🪛 Ruff (0.14.8)
scripts/worker_setup/worker.py

97-97: Avoid specifying long messages outside the exception class

(TRY003)


101-101: Avoid specifying long messages outside the exception class

(TRY003)


161-161: Avoid specifying long messages outside the exception class

(TRY003)


210-210: Avoid specifying long messages outside the exception class

(TRY003)


214-214: Avoid specifying long messages outside the exception class

(TRY003)
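Ruff's TRY003 findings above suggest moving long messages into the exception class itself. A minimal sketch of that refactor, with an illustrative exception name (not the module's actual code):

```python
class ProfilingWorkerCountError(ValueError):
    """Raised when a worker count is incompatible with profiling mode."""

    def __init__(self, kind: str, count: int):
        # The long message lives inside the exception class, so call
        # sites just raise with structured arguments (satisfies TRY003).
        super().__init__(
            f"Profiling mode requires a single {kind} worker, got {count}"
        )

# Instead of `raise ValueError("Profiling mode requires ...")` inline:
try:
    raise ProfilingWorkerCountError("prefill", 2)
except ProfilingWorkerCountError as exc:
    msg = str(exc)
```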

🔇 Additional comments (14)
recipies/gb200-fp4/1k1k/low-latency.yaml (1)

111-121: LGTM! Separate profiling steps for prefill and decode.

The new profiling structure with phase-specific start_step/stop_step values enables fine-grained control over when profiling captures each phase. The decode phase having a later step range (700-730 vs 60-70) appropriately accounts for the different timing characteristics of each phase.

scripts/worker_setup.py (2)

98-102: LGTM! Clean flag addition for router mode.

The new --use-sglang-router flag is well-documented and follows the existing argument patterns in the file.


184-227: LGTM! Flag correctly propagated to all worker types.

The use_sglang_router parameter is consistently passed to setup_prefill_worker, setup_decode_worker, and setup_aggregated_worker, ensuring uniform behavior across worker types.

src/srtctl/backends/sglang.py (3)

285-305: LGTM! Well-structured helper for environment string construction.

The build_env_str function cleanly separates shared parameters (isl/osl/concurrency) from phase-specific parameters (start_step/stop_step), with proper None checks for defensive coding.


307-324: LGTM! Backward-compatible phase config handling.

The fallback logic correctly supports both the new structure (separate prefill/decode blocks) and legacy configs with top-level start_step/stop_step. This ensures existing configurations continue to work.
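The backward-compatible fallback described above can be sketched with plain dicts. The names are illustrative (the real code works with the schema models), but the step values match the example ranges cited elsewhere in this review:

```python
def resolve_phase_config(cfg: dict, phase: str) -> dict:
    """Prefer an explicit per-phase block (e.g. cfg["prefill"]);
    fall back to legacy top-level start_step/stop_step."""
    phase_cfg = cfg.get(phase)
    if phase_cfg is not None:
        return phase_cfg
    # Legacy config: lift top-level step fields into a phase config.
    return {k: cfg[k] for k in ("start_step", "stop_step") if k in cfg}

new_style = {"prefill": {"start_step": 60, "stop_step": 70},
             "decode": {"start_step": 700, "stop_step": 730}}
legacy = {"start_step": 60, "stop_step": 70}

decode_cfg = resolve_phase_config(new_style, "decode")
legacy_cfg = resolve_phase_config(legacy, "prefill")
```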


349-349: LGTM! Router flag added to template variables.

The use_sglang_router flag is correctly sourced from backend_config with a sensible default of False.

scripts/profiling/profile.sh (2)

45-56: LGTM! Correct conditional wait logic for PD disaggregation.

The logic correctly distinguishes between prefill (needs router + local server) and decode (only needs local server). This ensures the decode profiler doesn't block waiting for a router it doesn't need.


95-96: Verify: start_profile targets local server, not router.

The profiling trigger is now sent to the local server (127.0.0.1:30000) rather than the router. This is correct for per-worker profiling, but ensure the local server supports the /start_profile endpoint.

scripts/worker_setup/worker.py (1)

87-102: Router-based flow gating looks correct.

The conditional logic to skip setup_head_prefill_node and wait_for_etcd when use_sglang_router=True is well-structured. The router mode bypasses traditional infrastructure setup since the sglang router handles routing directly.

scripts/templates/job_script_template_disagg.j2 (1)

338-388: Router launch logic looks well-structured.

The implementation correctly:

  • Waits for prefill server health with a timeout
  • Collects leader IPs for both prefill and decode workers
  • Assembles ROUTER_ARGS with --pd-disaggregation and endpoint URLs
  • Launches the router on node 0

One minor observation: the wait loop increments by 10 but sleeps for 10 seconds, so WAIT_COUNT represents elapsed seconds. The warning at line 358 could be clearer about the timeout.

scripts/worker_setup/command.py (2)

63-63: Profile output path now includes worker_idx — good for multi-worker profiling.

This change enables per-worker profiling outputs when running multiple workers, preventing file collisions.


144-174: Function signature and delegation updated correctly.

The worker_idx parameter is properly threaded through to build_sglang_command_from_yaml. The docstring here is also missing the worker_idx description (same as above).

src/srtctl/core/schema.py (2)

184-193: Schema restructuring for per-phase profiling looks appropriate.

The changes properly add:

  • Shared parameters (isl, osl, concurrency) at the top level
  • Per-phase configs (prefill, decode) for phase-specific profiling settings
  • Legacy fields for backward compatibility with deprecation notes

247-252: New use_sglang_router field is well-documented.

The field has a clear description and appropriate default (False). This integrates cleanly with the rest of the router-enabled workflow.

Comment on lines 120 to 121
exit_code=$?
set +x
Contributor


⚠️ Potential issue | 🟡 Minor

Exit code only captures lm_eval result, not bench_serving.

When PROFILING_MODE == "prefill", exit_code=$? captures only the exit status of lm_eval, not bench_serving. If bench_serving fails but lm_eval succeeds, the script exits successfully.

Consider capturing and checking both exit codes:

+    python3 -m sglang.bench_serving \
     ...
-    --warmup-request 5
+    --warmup-request 5 || exit $?

     pip install lm-eval tenacity > /dev/null
     python -m lm_eval \
     ...
     --limit 10
 fi

 exit_code=$?

Or use set -e at the script start if you want to fail on any error.
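The exit-code fix above amounts to combining two statuses so either failure propagates. The same logic as a small helper (illustrative, not the script's actual code):

```python
def combined_exit_code(bench_rc: int, eval_rc: int) -> int:
    """Return the first non-zero status, mirroring the suggestion that
    a bench_serving failure must not be masked by a later successful
    lm_eval run."""
    return bench_rc if bench_rc != 0 else eval_rc

ok = combined_exit_code(0, 0)
bench_failed = combined_exit_code(2, 0)
eval_failed = combined_exit_code(0, 1)
```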

Committable suggestion skipped: line range outside the PR's diff.

🤖 Prompt for AI Agents
In scripts/profiling/profile.sh around lines 120-121, the script only records
exit_code=$? which captures lm_eval's exit status and misses bench_serving when
PROFILING_MODE=="prefill"; update the script to capture and check both commands'
exit codes (e.g., store the exit status of bench_serving into a variable right
after it runs, then after lm_eval set exit_code to a non-zero value if either
command failed) or alternatively enable fail-fast by adding set -e at the top of
the script; ensure the final exit_code reflects any failure from bench_serving
as well as lm_eval before exiting.

@wenscarl wenscarl changed the title [2/N] sglang router profiling for PD disagg [draft][2/N] sglang router profiling for PD disagg Dec 18, 2025
Contributor

@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 0

♻️ Duplicate comments (1)
scripts/templates/job_script_template_disagg.j2 (1)

445-464: Duplicate profiling block runs unconditionally — missing conditional guard.

This block (lines 445-464) duplicates the profiling logic from lines 411-443 but lacks the {% if profiler != 'none' %} guard. It will execute for every job, including those without profiling enabled, causing:

  • Spurious "Starting profiler..." messages
  • Unnecessary profiling srun commands that will likely fail or produce empty results

This was flagged in a previous review and marked as addressed, but the issue persists in the current code.

🔎 Wrap the block in the conditional guard or remove if it's a duplicate:
+{% if profiler != 'none' %}
 {% raw %}
 echo "Starting profiler..."

 # Get leader nodes for first prefill and decode workers
 PREFILL_LEADER_NODE=${nodes[${prefill_leaders[0]}]}
 DECODE_LEADER_NODE=${nodes[${decode_leaders[0]}]}

 echo "Prefill profiling will run on: $PREFILL_LEADER_NODE"
 echo "Decode profiling will run on: $DECODE_LEADER_NODE"

 # Run profiling on prefill leader (drives traffic via router)
 srun --nodes=1 --ntasks=1 $ENROOT_ARGS --jobid $SLURM_JOB_ID -w $PREFILL_LEADER_NODE \
     --output=${LOG_DIR}/profile_prefill.out --overlap \
     bash -c "PROFILING_MODE=prefill {% endraw %}{% if profiler == 'torch' %}SGLANG_TORCH_PROFILER_DIR=/logs/profiles/prefill {% endif %}{{ prefill_profile_env }}{% raw %} /scripts/profiling/profile.sh $PREFILL_WORKERS $DECODE_WORKERS $PREFILL_GPUS $DECODE_GPUS $TOTAL_GPUS" &

 # Run profiling on decode leader (only triggers /start_profile, no extra traffic)
 srun --nodes=1 --ntasks=1 $ENROOT_ARGS --jobid $SLURM_JOB_ID -w $DECODE_LEADER_NODE \
     --output=${LOG_DIR}/profile_decode.out  --overlap \
     bash -c "PROFILING_MODE=decode {% endraw %}{% if profiler == 'torch' %}SGLANG_TORCH_PROFILER_DIR=/logs/profiles/decode {% endif %}{{ decode_profile_env }}{% raw %} /scripts/profiling/profile.sh $PREFILL_WORKERS $DECODE_WORKERS $PREFILL_GPUS $DECODE_GPUS $TOTAL_GPUS" &
 {% endraw %}
+{% endif %}

Alternatively, if the unified profiling block (lines 411-443) already handles all cases, this duplicate block should be removed entirely.

🧹 Nitpick comments (1)
src/srtctl/core/schema.py (1)

329-341: Remove or restore the commented-out validation code.

The commented-out validation (lines 332-335 and 338-341) is dead code. If multi-worker profiling is now intentionally supported in disaggregated mode, remove the commented code entirely. If this is temporary for debugging, add a TODO comment explaining when it should be restored.

Leaving pass statements with commented-out code creates maintenance burden and ambiguity about the intended behavior.

🔎 If multi-worker profiling is intentionally supported, apply this diff:
         if is_disaggregated:
             if self.resources.prefill_workers and self.resources.prefill_workers > 1:
-                pass
-#                raise ValueError(
-#                    f"Profiling mode requires single worker only. "
-#                    f"Got prefill_workers={self.resources.prefill_workers}"
-#                )
+                pass  # Multi-worker profiling supported in disaggregated mode
             if self.resources.decode_workers and self.resources.decode_workers > 1:
-                pass
-#                raise ValueError(
-#                    f"Profiling mode requires single worker only. "
-#                    f"Got decode_workers={self.resources.decode_workers}"
-#                )
+                pass  # Multi-worker profiling supported in disaggregated mode
📜 Review details

Configuration used: defaults

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between bd9f0e8 and 19f5dcc.

📒 Files selected for processing (7)
  • recipies/gb200-fp4/1k1k/low-latency.yaml (3 hunks)
  • scripts/templates/job_script_template_disagg.j2 (3 hunks)
  • scripts/worker_setup/command.py (4 hunks)
  • scripts/worker_setup/worker.py (3 hunks)
  • src/srtctl/backends/sglang.py (1 hunks)
  • src/srtctl/core/schema.py (2 hunks)
  • srtslurm.yaml.example (1 hunks)
🚧 Files skipped from review as they are similar to previous changes (3)
  • scripts/worker_setup/worker.py
  • srtslurm.yaml.example
  • recipies/gb200-fp4/1k1k/low-latency.yaml
🔇 Additional comments (5)
scripts/worker_setup/command.py (2)

11-21: LGTM! Per-worker profiling context is properly threaded.

The worker_idx parameter is correctly added to enable per-worker profiling output differentiation. The nsys output path at line 63 now includes {config_key}_{worker_idx}_{rank} which prevents file collisions when running multiple workers.

Minor: The docstring (lines 30-40) should document the new worker_idx parameter for completeness.


144-174: LGTM! Function signature and call site updated consistently.

The worker_idx parameter is correctly propagated from get_gpu_command to build_sglang_command_from_yaml. The docstring (lines 157-167) should also document worker_idx for completeness.

src/srtctl/core/schema.py (1)

176-196: LGTM! Clean per-phase profiling configuration.

The new ProfilingPhaseConfig model and the updated ProfilingConfig with separate prefill/decode fields provides a clean structure for phase-specific profiling. The legacy start_step/stop_step fields maintain backward compatibility.
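The restructured schema described above roughly corresponds to this shape. This is a dataclass sketch; the real models are Pydantic, and any field beyond those named in the review is an assumption:

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class ProfilingPhaseConfig:
    start_step: Optional[int] = None
    stop_step: Optional[int] = None

@dataclass
class ProfilingConfig:
    # Shared parameters at the top level
    isl: Optional[int] = None
    osl: Optional[int] = None
    concurrency: Optional[int] = None
    # Per-phase profiling settings
    prefill: Optional[ProfilingPhaseConfig] = None
    decode: Optional[ProfilingPhaseConfig] = None
    # Legacy top-level fields kept for backward compatibility
    start_step: Optional[int] = None
    stop_step: Optional[int] = None

cfg = ProfilingConfig(isl=1024, osl=1024,
                      prefill=ProfilingPhaseConfig(60, 70),
                      decode=ProfilingPhaseConfig(700, 730))
```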

src/srtctl/backends/sglang.py (1)

285-324: LGTM! Clean separation of phase-specific and shared profiling config.

The refactored build_env_str function correctly:

  1. Separates phase-specific parameters (start_step/stop_step) from shared parameters (isl/osl/concurrency)
  2. Maintains backward compatibility by deriving prefill_phase_cfg and decode_phase_cfg from legacy fields when explicit phase sections are absent
  3. Generates distinct environment strings for prefill and decode phases
scripts/templates/job_script_template_disagg.j2 (1)

343-362: LGTM! Proper readiness check before router launch.

The prefill server health check with timeout and fallback warning is a good safeguard. The 300-second timeout with 10-second polling intervals provides reasonable wait time while the warning at line 361 ensures the job proceeds with a clear warning if the server doesn't become ready.

@wenscarl wenscarl marked this pull request as draft December 18, 2025 14:31
@wenscarl wenscarl force-pushed the native_separate_pd branch from 901d60a to 362c476 Compare January 6, 2026 17:20
@wenscarl wenscarl marked this pull request as ready for review January 6, 2026 17:20
@wenscarl wenscarl changed the title [draft][2/N] sglang router profiling for PD disagg sglang router profiling for PD disagg Jan 6, 2026
Contributor

@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 1

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
scripts/worker_setup/worker.py (1)

290-301: Missing worker_idx argument in setup_aggregated_worker will cause runtime failure.

The get_gpu_command function signature now requires worker_idx as the second positional parameter (after worker_type), but setup_aggregated_worker does not pass it. This will raise a TypeError when an aggregated worker is started.

🔎 Proposed fix
     # Build and execute SGLang command from YAML config
     cmd_to_run = get_gpu_command(
         worker_type="aggregated",
+        worker_idx=worker_idx,
         sglang_config_path=sglang_config_path,
         host_ip=leader_ip,
         port=DIST_INIT_PORT,
         total_nodes=nodes_per_worker,
         rank=local_rank,
         profiler=profiler,
         dump_config_path=dump_config_path,
         use_sglang_router=use_sglang_router,
     )
🤖 Fix all issues with AI Agents
In @scripts/templates/job_script_template_disagg.j2:
- Around line 547-566: The duplicate profiling srun block that prints "Starting
profiler..." and launches /scripts/profiling/profile.sh using
PREFILL_LEADER_NODE / DECODE_LEADER_NODE (setting PROFILING_MODE=prefill/decode
and using prefill_profile_env / decode_profile_env) must be removed or wrapped
in the same guard as the earlier block; specifically ensure this block only runs
when profiler != 'none' (i.e., wrap it with {% if profiler != 'none' %}...{%
endif %}) or delete the duplicate so profiling is launched only once and the
unnecessary srun commands and messages do not run when profiler is 'none'.
🧹 Nitpick comments (2)
examples/fp4-disagg-nsys-profiling.yaml (1)

16-17: Consider using boolean instead of string for use_sglang_router.

The value "true" is a string, but boolean flags are typically represented as true (without quotes) in YAML. Verify that the schema parser handles this correctly, or change to a proper boolean.

🔎 Suggested fix
 backend:
-  use_sglang_router: "true"
+  use_sglang_router: true
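The pitfall behind this suggestion: a quoted "true" arrives from YAML as a string, and any non-empty string is truthy in Python, so even "false" would enable the flag if cast naively. A stdlib-only illustration (how the example config is actually parsed depends on the schema):

```python
# A quoted YAML scalar arrives in Python as a string, not a bool.
flag_from_yaml = "false"   # e.g. use_sglang_router: "false"

naive = bool(flag_from_yaml)  # True! every non-empty string is truthy

def parse_bool(value):
    """Accept real bools, and map common string spellings explicitly."""
    if isinstance(value, bool):
        return value
    return str(value).strip().lower() in ("true", "1", "yes")

safe = parse_bool(flag_from_yaml)
```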
scripts/worker_setup/command.py (1)

11-22: LGTM - worker_idx parameter correctly added to function signature.

The parameter is properly positioned and typed.

Consider updating the docstring to document the new worker_idx parameter:

     Args:
         worker_type: "prefill", "decode", or "aggregated"
+        worker_idx: Index of the worker group (0-based)
         sglang_config_path: Path to generated sglang_config.yaml
📜 Review details

Configuration used: defaults

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 19f5dcc and 362c476.

📒 Files selected for processing (5)
  • examples/fp4-disagg-nsys-profiling.yaml
  • scripts/templates/job_script_template_disagg.j2
  • scripts/worker_setup/command.py
  • scripts/worker_setup/worker.py
  • src/srtctl/core/schema.py
🚧 Files skipped from review as they are similar to previous changes (1)
  • src/srtctl/core/schema.py
🧰 Additional context used
🧬 Code graph analysis (3)
examples/fp4-disagg-nsys-profiling.yaml (1)
tests/test_command_generation.py (1)
  • test_profiling_mode (232-268)
scripts/worker_setup/command.py (2)
scripts/worker_setup.py (2)
  • _validate_args (167-197)
  • _parse_command_line_args (33-164)
scripts/worker_setup/infrastructure.py (1)
  • setup_frontend_worker (46-65)
scripts/worker_setup/worker.py (1)
scripts/worker_setup/infrastructure.py (1)
  • setup_frontend_worker (46-65)
🔇 Additional comments (7)
scripts/worker_setup/worker.py (2)

129-141: LGTM - worker_idx correctly passed for prefill workers.

The worker_idx parameter is now properly forwarded to get_gpu_command, enabling per-worker profiling output paths.


175-187: LGTM - worker_idx correctly passed for decode workers.

Consistent with the prefill worker change.

examples/fp4-disagg-nsys-profiling.yaml (2)

1-112: Configuration structure looks good for disaggregated profiling example.

The configuration properly defines:

  • Separate prefill/decode environments with appropriate timeout and disaggregation settings
  • Distinct sglang_config sections for prefill and decode modes
  • Reasonable profiling step ranges for each phase

113-123: Profiling configuration is well-structured.

The separate start_step/stop_step for prefill and decode enables targeted profiling of each phase independently, which aligns with the disaggregated architecture.

scripts/worker_setup/command.py (2)

67-67: LGTM - Per-worker profiling output path correctly constructed.

The nsys output path now includes worker_idx to differentiate profiles across workers: /logs/profiles/{config_key}_{worker_idx}_{rank}.


148-181: LGTM - get_gpu_command correctly propagates worker_idx.

The parameter is added to the signature and forwarded to build_sglang_command_from_yaml. Note: The docstring should also be updated to document worker_idx.

scripts/templates/job_script_template_disagg.j2 (1)

343-362: Good addition of prefill server readiness check.

This ensures the router doesn't start before the prefill server is ready to accept connections. The implementation correctly:

  • Polls the /health endpoint with curl
  • Has a reasonable 600-second timeout with 30-second intervals
  • Proceeds with a warning rather than failing hard on timeout

Comment thread scripts/templates/job_script_template_disagg.j2 Outdated
Contributor

@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 2

🤖 Fix all issues with AI Agents
In @scripts/templates/job_script_template_disagg.j2:
- Around line 360-362: The current health-check timeout block logs a warning and
continues, which can cause router failures; update the block that checks
WAIT_COUNT and MAX_WAIT so it fails fast: replace the echo-only path with a
non-zero exit (e.g., echo an error and exit 1) or conditionally exit based on a
new env flag (e.g., PREFILL_FAIL_ON_TIMEOUT) so the script stops when the
prefill server never becomes ready; adjust the message to include WAIT_COUNT and
MAX_WAIT for clarity and ensure the router is not launched after this failure.
- Around line 345-346: PREFILL_LEADER_IP may be empty if get_node_ip returns an
empty string; before running the health-check curl (the URL using
http://$PREFILL_LEADER_IP:30000/health), validate PREFILL_LEADER_IP is non-empty
and a valid IP (or hostname) and bail or retry with a clear error if it is
empty; update the section that sets PREFILL_LEADER_IP (referring to get_node_ip
and the health-check curl) to check for emptiness, log an error mentioning
PREFILL_LEADER_NODE and SLURM_JOB_ID, and exit non-zero (or implement a retry
loop) so the malformed URL is never used.
🧹 Nitpick comments (1)
scripts/templates/job_script_template_disagg.j2 (1)

513-556: Consider consolidating duplicate profiler conditional checks.

The profiler logic is split across two separate {% if profiler != 'none' %} blocks (lines 513-545 and 547-556). While functionally correct, consolidating these into a single conditional block would improve readability and reduce redundancy.

🔎 Proposed refactor to consolidate conditionals
 {% if profiler != 'none' %}
 {% raw %}
 # Torch/NSYS profiling mode: run a single orchestrator that profiles all prefill and decode workers.
 echo "Starting unified profiler..."
 
 # Collect leader IPs for prefill and decode workers
 PREFILL_LEADER_IPS=()
 for idx in "${prefill_leaders[@]}"; do
     node_name=${nodes[$idx]}
     ip=$(get_node_ip "$node_name" "$SLURM_JOB_ID" "$NETWORK_INTERFACE")
     PREFILL_LEADER_IPS+=("$ip")
 done
 
 DECODE_LEADER_IPS=()
 for idx in "${decode_leaders[@]}"; do
     node_name=${nodes[$idx]}
     ip=$(get_node_ip "$node_name" "$SLURM_JOB_ID" "$NETWORK_INTERFACE")
     DECODE_LEADER_IPS+=("$ip")
 done
 
 PREFILL_LEADER_IPS_STR=$(IFS=,; echo "${PREFILL_LEADER_IPS[*]}")
 DECODE_LEADER_IPS_STR=$(IFS=,; echo "${DECODE_LEADER_IPS[*]}")
 
 # Use the first prefill leader as the orchestrator node
 PROFILE_ORCHESTRATOR_NODE=${nodes[${prefill_leaders[0]}]}
 echo "Unified profiling will run on orchestrator node: $PROFILE_ORCHESTRATOR_NODE"
 
 # Run a single profiling orchestrator that coordinates profiling across all leaders
 srun --nodes=1 --ntasks=1 $ENROOT_ARGS --jobid $SLURM_JOB_ID -w $PROFILE_ORCHESTRATOR_NODE \
     --output=${LOG_DIR}/profile_all.out --overlap \
     bash -c "PROFILE_PREFILL_IPS=${PREFILL_LEADER_IPS_STR} PROFILE_DECODE_IPS=${DECODE_LEADER_IPS_STR} {% endraw %}{% if profiler == 'torch' %}SGLANG_TORCH_PROFILER_DIR=/logs/profiles {% endif %}{{ profiling_driver_env }} {{ prefill_profile_env }} {{ decode_profile_env }}{% raw %} /scripts/profiling/profile.sh $PREFILL_WORKERS $DECODE_WORKERS $PREFILL_GPUS $DECODE_GPUS $TOTAL_GPUS" &
-{% endraw %}
-{% endif %}
-
-{% if profiler != 'none' %}
-{% raw %}
+
 # Wait for all profiling scripts to complete (both prefill and decode)
 echo "Waiting for all profiling scripts to complete..."
 wait
 exit_code=$?
 echo "All profiling scripts finished at $(date) with exit code ${exit_code}"
 exit $exit_code
 {% endraw %}
 {% else %}
📜 Review details

Configuration used: defaults

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 362c476 and 6b8c0f4.

📒 Files selected for processing (1)
  • scripts/templates/job_script_template_disagg.j2

Comment on lines +345 to +346
PREFILL_LEADER_NODE=${nodes[${prefill_leaders[0]}]}
PREFILL_LEADER_IP=$(get_node_ip "$PREFILL_LEADER_NODE" "$SLURM_JOB_ID" "$NETWORK_INTERFACE")
Contributor

⚠️ Potential issue | 🔴 Critical

Validate PREFILL_LEADER_IP before use in health check.

If get_node_ip returns an empty string, the subsequent curl command will use a malformed URL (http://:30000/health), causing the health check to fail silently or produce misleading results.

🔎 Proposed fix to add IP validation
 PREFILL_LEADER_NODE=${nodes[${prefill_leaders[0]}]}
 PREFILL_LEADER_IP=$(get_node_ip "$PREFILL_LEADER_NODE" "$SLURM_JOB_ID" "$NETWORK_INTERFACE")
+if [ -z "$PREFILL_LEADER_IP" ]; then
+    echo "Error: Could not retrieve IP address for prefill leader node $PREFILL_LEADER_NODE"
+    exit 1
+fi
 MAX_WAIT=600
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
 PREFILL_LEADER_NODE=${nodes[${prefill_leaders[0]}]}
 PREFILL_LEADER_IP=$(get_node_ip "$PREFILL_LEADER_NODE" "$SLURM_JOB_ID" "$NETWORK_INTERFACE")
+if [ -z "$PREFILL_LEADER_IP" ]; then
+    echo "Error: Could not retrieve IP address for prefill leader node $PREFILL_LEADER_NODE"
+    exit 1
+fi
🤖 Prompt for AI Agents
In @scripts/templates/job_script_template_disagg.j2 around lines 345 - 346,
PREFILL_LEADER_IP may be empty if get_node_ip returns an empty string; before
running the health-check curl (the URL using
http://$PREFILL_LEADER_IP:30000/health), validate PREFILL_LEADER_IP is non-empty
and a valid IP (or hostname) and bail or retry with a clear error if it is
empty; update the section that sets PREFILL_LEADER_IP (referring to get_node_ip
and the health-check curl) to check for emptiness, log an error mentioning
PREFILL_LEADER_NODE and SLURM_JOB_ID, and exit non-zero (or implement a retry
loop) so the malformed URL is never used.

Comment on lines +360 to +362
if [ $WAIT_COUNT -ge $MAX_WAIT ]; then
echo "Warning: Prefill server did not become ready within ${MAX_WAIT}s, proceeding anyway..."
fi
Contributor

⚠️ Potential issue | 🟠 Major

Proceeding after health check timeout may cause router failures.

If the prefill server fails to become ready within 600 seconds, the script logs a warning but continues to launch the router. This could lead to router initialization failures or runtime errors when the router attempts to connect to an unavailable prefill server.

Consider exiting with an error instead of proceeding, or implement a more robust retry mechanism in the router itself.

🤖 Prompt for AI Agents
In @scripts/templates/job_script_template_disagg.j2 around lines 360 - 362, The
current health-check timeout block logs a warning and continues, which can cause
router failures; update the block that checks WAIT_COUNT and MAX_WAIT so it
fails fast: replace the echo-only path with a non-zero exit (e.g., echo an error
and exit 1) or conditionally exit based on a new env flag (e.g.,
PREFILL_FAIL_ON_TIMEOUT) so the script stops when the prefill server never
becomes ready; adjust the message to include WAIT_COUNT and MAX_WAIT for clarity
and ensure the router is not launched after this failure.

@wenscarl wenscarl force-pushed the native_separate_pd branch from 6b8c0f4 to e5ee24d Compare January 6, 2026 20:05
Contributor

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 1

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
scripts/worker_setup/worker.py (1)

305-316: Missing required worker_idx parameter will cause runtime error.

The get_gpu_command function signature was updated to require worker_idx as the second positional parameter (see scripts/worker_setup/command.py line 208), but the call in setup_aggregated_worker does not pass this parameter. This will cause a TypeError at runtime.

🐛 Proposed fix
 cmd_to_run = get_gpu_command(
     worker_type="aggregated",
+    worker_idx=worker_idx,
     sglang_config_path=sglang_config_path,
     host_ip=leader_ip,
     port=DIST_INIT_PORT,
     total_nodes=nodes_per_worker,
     rank=local_rank,
     profiler=profiler,
     dump_config_path=dump_config_path,
     use_sglang_router=use_sglang_router,
 )
🤖 Fix all issues with AI agents
In @docs/sglang-router.md:
- Around line 140-145: The docs currently say your local sglang dir is mounted
to /ext-sglang-src/, but the worker setup actually takes the provided
sglang_src_dir and automatically appends "/python" before running pip install
-e; update the "How It Works" section to state that sglang_src_dir must point to
the SGLang repository root (not the python subdirectory) because the setup code
will resolve to the python/ subdirectory automatically prior to installation.
🧹 Nitpick comments (1)
scripts/worker_setup/command.py (1)

148-203: Improve exception handling in install_sglang_from_source.

The function correctly implements the source installation workflow, but the exception handling can be improved:

  1. Line 171: Catching bare Exception is too broad; consider catching specific exceptions like OSError or PermissionError
  2. Line 172: Use logging.exception() instead of logging.error() to include the stack trace
  3. Line 173: Chain the exception with raise ... from e to preserve the exception context

Additionally, line 163 hardcodes the 'python' subdirectory assumption. This should be documented in user-facing docs (already flagged in docs/sglang-router.md review).

♻️ Proposed exception handling improvements
     # List directory contents for debugging
     try:
         contents = os.listdir(abs_path)
         logging.info(f"Directory contains {len(contents)} items: {contents[:20]}")
-    except Exception as e:
-        logging.error(f"Cannot list directory {abs_path}: {e}")
-        raise RuntimeError(f"Cannot access sglang source directory: {abs_path}")
+    except (OSError, PermissionError) as e:
+        logging.exception(f"Cannot list directory {abs_path}")
+        raise RuntimeError(f"Cannot access sglang source directory: {abs_path}") from e
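The chained-exception pattern the nitpick recommends, shown in isolation: catch the OS-level error specifically, log with traceback, and re-raise with `from e` so the original cause survives. The function name and message are illustrative, not the repo's actual code:

```python
import logging
import os

def list_source_dir(abs_path: str) -> list:
    """List a source directory, converting OS errors into a chained RuntimeError."""
    try:
        return os.listdir(abs_path)
    except OSError as e:  # PermissionError is already a subclass of OSError
        logging.exception("Cannot list directory %s", abs_path)
        raise RuntimeError(f"Cannot access sglang source directory: {abs_path}") from e
```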
📜 Review details

Configuration used: defaults

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between e5ee24d and 27fcefc.

📒 Files selected for processing (6)
  • docs/sglang-router.md
  • scripts/templates/job_script_template_disagg.j2
  • scripts/worker_setup/command.py
  • scripts/worker_setup/worker.py
  • src/srtctl/core/backend.py
  • src/srtctl/core/schema.py
🚧 Files skipped from review as they are similar to previous changes (1)
  • src/srtctl/core/schema.py
🧰 Additional context used
🧬 Code graph analysis (4)
src/srtctl/core/backend.py (1)
tests/test_command_generation.py (1)
  • test_profiling_mode (232-268)
scripts/worker_setup/command.py (2)
scripts/worker_setup.py (2)
  • _validate_args (167-197)
  • _parse_command_line_args (33-164)
tests/test_command_generation.py (1)
  • test_profiling_mode (232-268)
scripts/templates/job_script_template_disagg.j2 (1)
scripts/utils/check_server_health.py (1)
  • check_sglang_router_health (30-47)
scripts/worker_setup/worker.py (1)
scripts/worker_setup/command.py (3)
  • get_gpu_command (206-239)
  • install_dynamo_wheels (121-145)
  • install_sglang_from_source (148-203)
🪛 Ruff (0.14.10)
scripts/worker_setup/command.py

171-171: Do not catch blind exception: Exception

(BLE001)


172-172: Use logging.exception instead of logging.error

Replace with exception

(TRY400)


173-173: Within an except clause, raise exceptions with raise ... from err or raise ... from None to distinguish them from errors in exception handling

(B904)


173-173: Avoid specifying long messages outside the exception class

(TRY003)


181-181: f-string without any placeholders

Remove extraneous f prefix

(F541)


183-188: Avoid specifying long messages outside the exception class

(TRY003)


194-194: Starting a process with a partial executable path

(S607)


201-201: Avoid specifying long messages outside the exception class

(TRY003)

🔇 Additional comments (12)
src/srtctl/core/backend.py (1)

234-234: LGTM!

The addition of sglang_src_dir to template variables is correct and consistent with the existing pattern. The template will handle None values gracefully when the configuration is not provided.

scripts/worker_setup/worker.py (6)

10-14: LGTM!

The import of install_sglang_from_source is correctly added alongside the existing imports from the command module.


109-112: LGTM!

The conditional installation of SGLang from source when using sglang-router mode is correctly implemented with appropriate logging.


137-149: LGTM!

The addition of worker_idx parameter to get_gpu_command correctly threads the worker context through the prefill worker setup path.


174-176: LGTM!

The conditional installation of SGLang from source in decode worker setup mirrors the prefill worker implementation correctly.


186-198: LGTM!

The addition of worker_idx parameter to get_gpu_command correctly threads the worker context through the decode worker setup path.


277-280: LGTM!

The conditional installation of SGLang from source in aggregated worker setup is consistent with the prefill and decode implementations.

scripts/templates/job_script_template_disagg.j2 (2)

181-181: LGTM!

The conditional mount for sglang_src_dir is correctly implemented using Jinja2 syntax. When sglang_src_dir is defined, it will be mounted to /ext-sglang-src/ in the container, otherwise the mount directive is omitted cleanly.


343-362: Prefill server readiness check is functional but has validation gaps.

The new wait loop for prefill server readiness implements a reasonable polling strategy (30s intervals, 600s timeout). However, past review comments have identified valid concerns:

  1. IP validation missing (line 346): If get_node_ip returns empty, the curl will use a malformed URL
  2. Timeout handling (lines 360-362): Proceeding with a warning after timeout may cause router initialization failures

These concerns remain unaddressed in the current implementation.

Given that these issues were flagged in previous reviews, please verify whether:

  • IP validation will be added as a separate follow-up
  • The timeout warning-only behavior is intentional for this use case
scripts/worker_setup/command.py (3)

11-22: LGTM!

The addition of worker_idx parameter to build_sglang_command_from_yaml correctly extends the function signature to support per-worker command construction. The parameter is appropriately positioned and typed.


67-67: LGTM!

The updated nsys profiling output path now includes worker_idx in the filename (/logs/profiles/{config_key}_w{worker_idx}_{rank}), which properly isolates profiling data per worker. This prevents file collisions in multi-worker scenarios.


206-239: LGTM!

The addition of worker_idx parameter to get_gpu_command and its propagation to build_sglang_command_from_yaml correctly threads per-worker context through the command construction pipeline. The parameter is appropriately positioned as the second argument after worker_type.

Comment thread docs/sglang-router.md
Comment on lines +140 to +145
### How It Works

1. Your local sglang directory is mounted to `/ext-sglang-src/` in the container
2. Before launching workers, the script runs: `pip install -e . --no-deps`
3. Workers use your local sglang code instead of the container's pre-installed version

Contributor

⚠️ Potential issue | 🟡 Minor

Document the automatic 'python' subdirectory resolution.

The documentation states that the local directory is mounted to /ext-sglang-src/, but the implementation in scripts/worker_setup/command.py (line 163) automatically appends /python to this path before installation. Users should be informed that sglang_src_dir must point to the SGLang repository root (not the python subdirectory), as the code will automatically resolve to the python/ subdirectory within it.

📝 Suggested documentation update
 ### How It Works
 
 1. Your local sglang directory is mounted to `/ext-sglang-src/` in the container
-2. Before launching workers, the script runs: `pip install -e . --no-deps`
-3. Workers use your local sglang code instead of the container's pre-installed version
+2. The installation automatically uses the `python/` subdirectory within the mounted source
+3. Before launching workers, the script runs: `pip install -e . --no-deps` from `/ext-sglang-src/python/`
+4. Workers use your local sglang code instead of the container's pre-installed version
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
 ### How It Works
-1. Your local sglang directory is mounted to `/ext-sglang-src/` in the container
-2. Before launching workers, the script runs: `pip install -e . --no-deps`
-3. Workers use your local sglang code instead of the container's pre-installed version
+1. Your local sglang directory is mounted to `/ext-sglang-src/` in the container
+2. The installation automatically uses the `python/` subdirectory within the mounted source
+3. Before launching workers, the script runs: `pip install -e . --no-deps` from `/ext-sglang-src/python/`
+4. Workers use your local sglang code instead of the container's pre-installed version
🤖 Prompt for AI Agents
In @docs/sglang-router.md around lines 140 - 145, The docs currently say your
local sglang dir is mounted to /ext-sglang-src/, but the worker setup actually
takes the provided sglang_src_dir and automatically appends "/python" before
running pip install -e; update the "How It Works" section to state that
sglang_src_dir must point to the SGLang repository root (not the python
subdirectory) because the setup code will resolve to the python/ subdirectory
automatically prior to installation.
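A minimal sketch of the resolution described above: the user supplies the SGLang repository root, and setup appends the `python/` subdirectory before running the editable install. The helper name here is hypothetical:

```python
import os

def resolve_install_dir(sglang_src_dir: str) -> str:
    # sglang_src_dir must be the repo root; the install runs in root/python
    return os.path.join(sglang_src_dir, "python")
```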

Contributor

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 0

🧹 Nitpick comments (3)
docs/accuracy.md (1)

55-55: Clarify the approximation symbol in the accuracy guidance note.

Line 55 uses ~0.81 (tilde indicating "approximately"), but it's unclear whether this is an empirical observation or a rough estimate. For benchmark documentation, clarify whether this value was measured precisely in testing or is an approximate guideline.

If this is an empirical result, consider changing to a precise value or adding context like "approximately 0.81 based on testing" or "capped around 0.81". If it's a guideline, consider adding a brief explanation of the variance range (e.g., "typically in the 0.79–0.82 range").

docs/sglang-router.md (2)

234-240: Clarify the interaction between --no-deps and pre-installed dependencies.

Lines 234–240 note that --no-deps is used and the container must have compatible dependencies. However, this could be clearer:

  • Does this mean all transitive dependencies must already exist in the container?
  • What happens if a local sglang version requires a newer version of a dependency not in the container?

Consider adding an example scenario or a troubleshooting tip (e.g., "If you encounter import errors, ensure your container has dependencies >= X version").
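A sketch of the editable, no-deps install the notes above describe, under the assumption that it runs inside the mounted source's `python/` subdirectory; because of `--no-deps`, every transitive dependency must already be present in the container. Helper names are illustrative, not repo API:

```python
import subprocess
import sys

def build_install_cmd() -> list:
    """Command for an editable install that skips dependency resolution."""
    return [sys.executable, "-m", "pip", "install", "-e", ".", "--no-deps"]

def install_local_sglang(src_python_dir: str) -> None:
    # check=True surfaces install failures at setup time, not worker launch
    subprocess.run(build_install_cmd(), cwd=src_python_dir, check=True)
```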


212-232: Add guidance on verifying sglang_src_dir path before deployment.

The example configuration (lines 212–232) shows sglang_src_dir: "/home/username/projects/sglang", but the notes (line 237) say the directory "must exist on the host running srtctl." Consider adding a quick pre-flight check suggestion, such as:

# Verify the source exists before applying
srtctl apply -f recipes/debug-sglang-router.yaml --validate

Or add a note: "You can validate the path exists locally before applying: test -d /home/username/projects/sglang && echo 'OK' || echo 'Path not found'"

This helps users catch configuration errors early.

📜 Review details

Configuration used: defaults

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 27fcefc and 8844d36.

📒 Files selected for processing (4)
  • docs/accuracy.md
  • docs/profiling.md
  • docs/sglang-router.md
  • src/srtctl/core/schema.py
✅ Files skipped from review due to trivial changes (2)
  • docs/profiling.md
  • src/srtctl/core/schema.py
🔇 Additional comments (1)
docs/sglang-router.md (1)

180-240: Verify implementation alignment with sglang_src_dir documentation.

The new "Debugging with SGLang Source Code" section (lines 180–240) documents a feature for mounting and installing sglang from source. Key claims should be verified against the actual implementation:

  1. Mount location (line 196): Does the implementation mount to /ext-sglang-src/?
  2. Installation command (line 197): Does the code execute pip install -e . --no-deps exactly?
  3. Constraint enforcement (line 236): Is the feature truly gated to use_sglang_router: true only?
  4. Scope (line 204): Does the feature apply to all "prefill/decode/aggregated workers" as stated?

Additionally, the AI-generated summary mentions that this content "appears in two locations" within the file. Please confirm whether the section is duplicated elsewhere in the markdown; if so, consolidate to a single source of truth.

@ishandhanani
Owner

Too much has changed. If this feature is needed in the future we can rebase


4 participants