Skip to content

sgl-router and docs#61

Merged
ishandhanani merged 14 commits intomainfrom
ishan/sgl-router-full
Dec 19, 2025
Merged

sgl-router and docs#61
ishandhanani merged 14 commits intomainfrom
ishan/sgl-router-full

Conversation

@ishandhanani
Copy link
Copy Markdown
Owner

@ishandhanani ishandhanani commented Dec 17, 2025

Summary by CodeRabbit

Release Notes

  • New Features

    • Added SGLang Router mode as an alternative to Dynamo frontends for disaggregated deployments, including multi-router architecture with nginx load balancing.
    • Introduced new configuration options: use_sglang_router, num_additional_frontends, and backend-specific parameters.
  • Documentation

    • Expanded installation and profiling guides with clarified architecture options and cluster compatibility guidance.
    • Added comprehensive SGLang Router documentation with deployment modes and troubleshooting.
  • Improvements

    • Reorganized profiling parameters for cleaner configuration structure.

✏️ Tip: You can customize this high-level summary in your review settings.

@coderabbitai
Copy link
Copy Markdown
Contributor

coderabbitai bot commented Dec 17, 2025

Walkthrough

This PR introduces SGLang Router mode as an alternative to Dynamo frontends for disaggregated inference, adds multi-router architecture with load balancing, refactors backend classes from backends/ to core/backend.py, and simplifies CLI submission by removing the validate subcommand and consolidating dry-run handling.

Changes

Cohort / File(s) Summary
Worker Setup & Router Mode
scripts/worker_setup/command.py, scripts/worker_setup/worker.py, scripts/worker_setup/__init__.py, scripts/worker_setup.py
Added use_sglang_router parameter to enable router mode in worker setup. Conditional logic selects between sglang.launch_server and dynamo.sglang launchers. New setup_router_worker function configures sglang router with prefill/decode server lists. CLI extended with --prefill-ips, --decode-ips, --router-port, --server-port, --bootstrap-port arguments.
Health Check Refactoring
scripts/utils/benchmark_utils.sh, scripts/utils/check_server_health.py
Refactored health checks into two modes: Dynamo frontend (/health) and SGLang router (/workers). New functions check_dynamo_health and check_sglang_router_health delegate based on --sglang-router flag. CLI now accepts --sglang-router to route health checks appropriately.
Disaggregated Job Template & Load Balancing
scripts/templates/job_script_template_disagg.j2, scripts/templates/nginx.conf.j2
Converted single-router architecture to multi-router with nginx load balancer. Dynamic router distribution across nodes, nginx configuration generation, and orchestration of multiple routers on designated nodes. Conditional fallback for single-router mode. Made upstream port configurable via template variable.
Benchmark Script
scripts/benchmarks/sa-bench/bench.sh
Added use_sglang_router parameter, propagates to wait_for_model invocation, and appends to configuration output.
Backend Architecture Refactoring
src/srtctl/backends/base.py, src/srtctl/backends/sglang.py, src/srtctl/core/backend.py, src/srtctl/__init__.py
Removed abstract Backend base class. Moved SGLangBackend implementation from backends/sglang.py to core/backend.py. New implementation handles disaggregated mode detection, environment variable retrieval, config file generation, command rendering with profiling support, and SLURM script generation via Jinja2 templating.
CLI Submission Refactoring
src/srtctl/cli/submit.py
Removed validate subcommand and DryRunContext class. New run_dry_run() function consolidates dry-run artifact generation. Simplified command rendering flow, unified argument parsing, and removed duplication in sweep/single submission paths.
Core Schema
src/srtctl/core/schema.py
Added num_additional_frontends: int = 9 field to BackendConfig for controlling multi-router deployment scale.
Documentation
docs/README.md, docs/installation.md, docs/profiling.md, docs/SUMMARY.md, docs/sglang-router.md, docs/analyzing.md, docs/slurm-faq.md
Added profiling support notes and SGLang Router entry to features. Updated installation to use new repository path and added architecture-specific setup guidance. Replaced per-block profiling parameters with top-level parameters. New comprehensive SGLang Router documentation with architecture diagrams and distribution logic. Added SLURM FAQ for cluster compatibility. Removed portions of analyzing documentation.

Sequence Diagram

sequenceDiagram
    participant User
    participant SLURM as SLURM Job Script
    participant nginx as Nginx<br/>(Load Balancer)
    participant Router as SGLang Router(s)
    participant Prefill as Prefill Servers
    participant Decode as Decode Servers
    participant HealthCheck as Health Check Service

    User->>SLURM: Submit job
    SLURM->>nginx: Start nginx on node 0
    SLURM->>Router: Launch primary router on nginx node
    SLURM->>Router: Launch additional routers on worker nodes
    Router->>Router: Register server endpoints
    
    HealthCheck->>nginx: Poll /workers endpoint
    nginx->>Router: Query worker stats
    Router->>Router: Aggregate prefill/decode counts
    Router-->>HealthCheck: Return stats
    HealthCheck->>HealthCheck: Validate expected counts
    
    HealthCheck-->>SLURM: Confirm ready
    SLURM->>User: Frontend available at http://localhost:8000
    
    User->>nginx: Send inference request
    nginx->>Router: Route request
    Router->>Prefill: Token prefill phase
    Prefill-->>Router: Prefill results
    Router->>Decode: Decoding phase
    Decode-->>Router: Decoded tokens
    Router-->>nginx: Return results
    nginx-->>User: Response
Loading

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~75 minutes

  • Backend refactoring: Removal of base class and relocation of SGLangBackend to new module requires careful validation of moved logic and updated imports across multiple files.
  • Multi-router job template logic: Dynamic router distribution with nginx configuration generation involves conditional blocks and loop logic that may have edge cases.
  • Health check branching: Two-mode implementation with different endpoint polling and response parsing logic requires thorough testing.
  • CLI changes: Removal of validate subcommand and DryRunContext consolidation affects submission flow; verify all paths remain functional.

Possibly related PRs

Poem

🐰 A router is born to balance the load,
With prefill and decode along the right road,
Nginx stands guard at the nginx node gate,
While backends unite—oh how backends migrate!
From silo to core, the refactor's complete,
SGLang routers make disaggregation sweet! 🌟

Pre-merge checks and finishing touches

❌ Failed checks (1 warning)
Check name Status Explanation Resolution
Docstring Coverage ⚠️ Warning Docstring coverage is 65.63% which is insufficient. The required threshold is 80.00%. You can run @coderabbitai generate docstrings to improve docstring coverage.
✅ Passed checks (2 passed)
Check name Status Explanation
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.
Title check ✅ Passed The title 'support sglang router' is concise and directly describes the main feature being added throughout the changeset.
✨ Finishing touches
  • 📝 Generate docstrings
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Post copyable unit tests in a comment
  • Commit unit tests in branch ishan/sgl-router-full

📜 Recent review details

Configuration used: defaults

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 9d3f537 and 1e080e7.

📒 Files selected for processing (4)
  • src/srtctl/__init__.py (1 hunks)
  • src/srtctl/cli/submit.py (7 hunks)
  • src/srtctl/core/backend.py (1 hunks)
  • src/srtctl/core/schema.py (1 hunks)
🧰 Additional context used
🧬 Code graph analysis (3)
src/srtctl/core/backend.py (3)
src/srtctl/core/config.py (1)
  • get_srtslurm_setting (104-118)
src/srtctl/core/sweep.py (1)
  • expand_template (17-48)
analysis/srtlog/models.py (1)
  • is_aggregated (74-76)
src/srtctl/__init__.py (1)
src/srtctl/core/backend.py (1)
  • SGLangBackend (20-212)
src/srtctl/cli/submit.py (1)
src/srtctl/core/backend.py (3)
  • SGLangBackend (20-212)
  • render_command (81-113)
  • generate_config_file (53-79)
🪛 Ruff (0.14.8)
src/srtctl/core/backend.py

23-23: PEP 484 prohibits implicit Optional

Convert to T | None

(RUF013)


53-53: PEP 484 prohibits implicit Optional

Convert to T | None

(RUF013)


65-65: Unnecessary key check before dictionary access

Replace with dict.get

(RUF019)


68-68: Avoid specifying long messages outside the exception class

(TRY003)


81-81: PEP 484 prohibits implicit Optional

Convert to T | None

(RUF013)


115-115: Unused method argument: config_path

(ARG002)


115-115: PEP 484 prohibits implicit Optional

Convert to T | None

(RUF013)


115-115: PEP 484 prohibits implicit Optional

Convert to T | None

(RUF013)


203-203: Avoid specifying long messages outside the exception class

(TRY003)

src/srtctl/cli/submit.py

56-56: PEP 484 prohibits implicit Optional

Convert to T | None

(RUF013)

🔇 Additional comments (10)
src/srtctl/core/schema.py (1)

251-253: LGTM!

The new num_additional_frontends field is well-documented with clear comments explaining the total calculation and dual usage for both Dynamo frontends and SGLang router instances. The default value of 9 (yielding 10 total) is reasonable for distributed serving workloads.

src/srtctl/__init__.py (1)

8-8: LGTM!

The import path change from .backends.sglang to .core.backend correctly reflects the backend consolidation. The SGLangBackend class remains part of the public API.

src/srtctl/cli/submit.py (5)

29-29: LGTM!

The import path update correctly reflects the backend module consolidation from srtctl.backends.sglang to srtctl.core.backend.


40-53: LGTM!

The refactored render_commands_file function is cleaner and delegates command rendering to the backend, improving separation of concerns.


113-115: LGTM!

The simplified dry-run path correctly instantiates the backend and delegates artifact generation to run_dry_run.


178-223: LGTM!

The metadata restructuring improves code organization by consolidating resource, model, and SLURM configurations into a run_meta structure. The walrus operator usage for bench_type is appropriate and makes the conditional handling cleaner.


317-358: LGTM!

The CLI refactoring improves usability by:

  • Consolidating common arguments via the add_common_args helper
  • Auto-detecting sweep mode with is_sweep_config
  • Removing the validate subcommand to simplify the interface

The changes maintain backward compatibility for the primary apply/dry-run workflows.

src/srtctl/core/backend.py (3)

31-51: LGTM!

The helper methods provide clean abstractions for determining mode, retrieving environment variables, and converting config dictionaries to command-line flags. The flag conversion logic correctly handles boolean, list, and scalar values.


96-99: LGTM!

The use of yaml.safe_load correctly addresses the security concern from previous reviews, preventing arbitrary object instantiation from YAML files.


163-212: LGTM!

The SLURM script generation is comprehensive, correctly incorporating:

  • Resource allocation (aggregated vs disaggregated modes)
  • Frontend/router settings (num_additional_frontends, use_sglang_router)
  • Benchmark and profiling configurations
  • Template rendering with clear error handling

The implementation properly integrates the new SGLang router features introduced in this PR.


Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

Copy link
Copy Markdown
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

🧹 Nitpick comments (3)
scripts/worker_setup/worker.py (2)

113-122: Consider closing file handles on failure.

If run_command returns None (line 119), the file handles opened on lines 114-115 remain open. While minor (only occurs on failure path), consider using a context manager or explicit cleanup.

+    frontend_stdout = None
+    frontend_stderr = None
+    try:
         # Open log files for frontend
         frontend_stdout = open("/logs/frontend.out", "w")
         frontend_stderr = open("/logs/frontend.err", "w")
 
         frontend_cmd = "python3 -m dynamo.frontend --http-port=8000"
         frontend_process = run_command(frontend_cmd, background=True, stdout=frontend_stdout, stderr=frontend_stderr)
         if not frontend_process:
+            raise RuntimeError("Failed to start frontend")
-            raise RuntimeError("Failed to start frontend")
         logging.info(f"Frontend started in background (PID: {frontend_process.pid})")
         logging.info("Frontend logs: /logs/frontend.out and /logs/frontend.err")
+    except Exception:
+        if frontend_stdout:
+            frontend_stdout.close()
+        if frontend_stderr:
+            frontend_stderr.close()
+        raise

124-126: Simplify redundant condition.

The check not gpu_type.startswith("gb300") is redundant—a string cannot start with both "gb200" and "gb300". This simplifies to:

-    if (gpu_type.startswith("gb200") and not gpu_type.startswith("gb300")) or gpu_type.startswith("h100"):
+    if gpu_type.startswith("gb200") or gpu_type.startswith("h100"):

If the intent is to exclude "gb300" types entirely, the condition should be restructured (e.g., check gpu_type.startswith(("gb200", "h100")) and not gpu_type.startswith("gb300")).

scripts/worker_setup/command.py (1)

177-179: Consider using keyword arguments for clarity.

With 9 parameters, positional arguments are error-prone and hard to read. Keyword arguments improve maintainability.

     return build_sglang_command_from_yaml(
-        worker_type, sglang_config_path, host_ip, port, total_nodes, rank, profiler, dump_config_path, use_sglang_router
+        worker_type=worker_type,
+        sglang_config_path=sglang_config_path,
+        host_ip=host_ip,
+        port=port,
+        total_nodes=total_nodes,
+        rank=rank,
+        profiler=profiler,
+        dump_config_path=dump_config_path,
+        use_sglang_router=use_sglang_router,
     )
📜 Review details

Configuration used: defaults

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 6b6f886 and 5543871.

📒 Files selected for processing (2)
  • scripts/worker_setup/command.py (7 hunks)
  • scripts/worker_setup/worker.py (6 hunks)
🧰 Additional context used
🧬 Code graph analysis (1)
scripts/worker_setup/worker.py (1)
scripts/worker_setup/command.py (1)
  • install_dynamo_wheels (120-144)
🔇 Additional comments (3)
scripts/worker_setup/worker.py (2)

143-185: LGTM!

The use_sglang_router parameter is correctly integrated. Conditional skipping of etcd wait and dynamo installation aligns with the sglang router mode behavior. Same comment about the redundant condition on line 170 applies here.


188-253: LGTM - same patterns as setup_prefill_worker.

The implementation is consistent with the prefill worker. The same optional refactors for file handle cleanup (lines 227-228) and the redundant condition (line 238) apply here as well.

scripts/worker_setup/command.py (1)

62-65: LGTM!

The use_launch_server logic correctly unifies profiling mode and sglang router mode under the same sglang.launch_server module path, with dynamo.sglang reserved for the traditional/non-profiling mode.

Comment thread scripts/worker_setup/command.py
Copy link
Copy Markdown
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

🧹 Nitpick comments (1)
scripts/benchmarks/sa-bench/bench.sh (1)

29-29: Consider using ${chosen_concurrencies[*]} for cleaner array expansion.

When expanding an array in a string context, ${array[*]} is preferred over ${array[@]} to avoid shellcheck warnings (SC2145).

Apply this diff:

-echo "Config ${chosen_isl}; ${chosen_osl}; ${chosen_concurrencies[@]}; ${chosen_req_rate}; sglang_router=${use_sglang_router}"
+echo "Config ${chosen_isl}; ${chosen_osl}; ${chosen_concurrencies[*]}; ${chosen_req_rate}; sglang_router=${use_sglang_router}"
📜 Review details

Configuration used: defaults

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 5543871 and 82dfe10.

📒 Files selected for processing (4)
  • scripts/benchmarks/sa-bench/bench.sh (1 hunks)
  • scripts/templates/job_script_template_disagg.j2 (1 hunks)
  • scripts/utils/benchmark_utils.sh (1 hunks)
  • scripts/utils/check_server_health.py (2 hunks)
🧰 Additional context used
🧬 Code graph analysis (1)
scripts/benchmarks/sa-bench/bench.sh (1)
scripts/utils/benchmark_utils.sh (1)
  • wait_for_model (5-57)
🪛 Shellcheck (0.11.0)
scripts/utils/benchmark_utils.sh

[warning] 25-25: Declare and assign separately to avoid masking return values.

(SC2155)


[warning] 26-26: Declare and assign separately to avoid masking return values.

(SC2155)

scripts/benchmarks/sa-bench/bench.sh

[error] 29-29: Argument mixes string and array. Use * or separate argument.

(SC2145)

🔇 Additional comments (11)
scripts/templates/job_script_template_disagg.j2 (2)

381-381: LGTM!

The template variable correctly renders to a bash-compatible boolean string.


385-385: LGTM!

The USE_SGLANG_ROUTER flag is correctly passed to bench.sh as the 9th positional argument. The argument ordering relies on BENCHMARK_ARGS expanding to exactly 4 space-separated values, which should work correctly based on line 380.

scripts/benchmarks/sa-bench/bench.sh (2)

27-27: LGTM!

The parameter correctly captures the 9th positional argument with a sensible default value.


35-35: LGTM!

The flag is correctly passed as the 8th argument to wait_for_model, matching the function signature in benchmark_utils.sh.

scripts/utils/benchmark_utils.sh (3)

14-17: LGTM!

The new parameter is added with a sensible default, maintaining backward compatibility. The workers_addr variable is appropriately defined for the sglang router mode.


19-23: LGTM!

The conditional echo provides clear feedback about which health check mode is active, aiding in debugging and monitoring.


29-37: LGTM!

The conditional logic correctly routes to the appropriate endpoint and passes the corresponding flag to the health check script based on the mode.

scripts/utils/check_server_health.py (4)

5-27: LGTM!

The updated docstring clearly documents both health check modes with usage examples, and the argparse import supports the enhanced CLI interface.


50-80: LGTM!

The dynamo health check logic is correctly extracted into its own function, maintaining the original behavior for /health endpoint validation. The component-based counting logic properly handles both disaggregated and aggregated modes.


82-96: LGTM!

The dispatcher correctly validates inputs and routes to the appropriate health check function based on the mode flag. The default parameter maintains backward compatibility.


98-113: LGTM!

The argparse-based CLI provides a cleaner interface with proper flag support for selecting the health check mode. The argument passing is correct and maintains compatibility with both modes.

Comment on lines +30 to +48
def check_sglang_router_health(expected_n_prefill: int, expected_n_decode: int, response: str) -> str:
"""Check health using sglang router /workers endpoint."""
try:
decoded_response = json.loads(response)
except json.JSONDecodeError:
return f"Got invalid response from server that leads to JSON Decode error: {response}"

expected_n_prefill = int(expected_n_prefill)
expected_n_decode = int(expected_n_decode)
if "stats" not in decoded_response:
return f"Key 'stats' not found in response: {response}"

stats = decoded_response["stats"]
actual_prefill = stats.get("prefill_count", 0)
actual_decode = stats.get("decode_count", 0)

if actual_prefill >= expected_n_prefill and actual_decode >= expected_n_decode:
return f"Model is ready. Have {actual_prefill} prefills and {actual_decode} decodes."
else:
return f"Model is not ready, waiting for {expected_n_prefill - actual_prefill} prefills and {expected_n_decode - actual_decode} decodes. Have {actual_prefill} prefills and {actual_decode} decodes."

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟡 Minor

Fix potentially confusing negative wait counts.

When one worker type exceeds the expected count while another doesn't, the subtraction on line 47 can produce negative values. For example, if actual_prefill=3, expected_n_prefill=2 but actual_decode=1, expected_n_decode=2, the message would incorrectly report "waiting for -1 prefills and 1 decodes."

Apply this diff to only report positive wait counts:

 def check_sglang_router_health(expected_n_prefill: int, expected_n_decode: int, response: str) -> str:
     """Check health using sglang router /workers endpoint."""
     try:
         decoded_response = json.loads(response)
     except json.JSONDecodeError:
         return f"Got invalid response from server that leads to JSON Decode error: {response}"
 
     if "stats" not in decoded_response:
         return f"Key 'stats' not found in response: {response}"
 
     stats = decoded_response["stats"]
     actual_prefill = stats.get("prefill_count", 0)
     actual_decode = stats.get("decode_count", 0)
 
     if actual_prefill >= expected_n_prefill and actual_decode >= expected_n_decode:
         return f"Model is ready. Have {actual_prefill} prefills and {actual_decode} decodes."
     else:
-        return f"Model is not ready, waiting for {expected_n_prefill - actual_prefill} prefills and {expected_n_decode - actual_decode} decodes. Have {actual_prefill} prefills and {actual_decode} decodes."
+        prefill_needed = max(0, expected_n_prefill - actual_prefill)
+        decode_needed = max(0, expected_n_decode - actual_decode)
+        return f"Model is not ready, waiting for {prefill_needed} prefills and {decode_needed} decodes. Have {actual_prefill} prefills and {actual_decode} decodes."
🤖 Prompt for AI Agents
In scripts/utils/check_server_health.py around lines 30 to 48, the health
message can show negative "waiting for" counts when actual > expected; compute
positive wait counts by clamping each difference to zero (e.g., wait_prefill =
max(0, expected_n_prefill - actual_prefill) and wait_decode = max(0,
expected_n_decode - actual_decode)) and use those wait_prefill and wait_decode
values in the "Model is not ready" return string so it never reports negative
waits.

@ishandhanani
Copy link
Copy Markdown
Owner Author

@coderabbitai review

@coderabbitai
Copy link
Copy Markdown
Contributor

coderabbitai bot commented Dec 18, 2025

✅ Actions performed

Review triggered.

Note: CodeRabbit is an incremental review system and does not re-review already reviewed commits. This command is applicable only when automatic reviews are paused.

Copy link
Copy Markdown
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 2

🧹 Nitpick comments (5)
src/srtctl/core/schema.py (1)

242-244: Consider adding validation for num_additional_frontends.

The field accepts any integer, including negative values, which would result in invalid configurations. A simple validator ensuring num_additional_frontends >= 0 would prevent misconfiguration.

🔎 Suggested validation:
+from pydantic import field_validator
+
 class BackendConfig(BaseModel):
     # ... existing fields ...
     num_additional_frontends: int = 9
+
+    @field_validator("num_additional_frontends")
+    @classmethod
+    def validate_num_additional_frontends(cls, v):
+        if v < 0:
+            raise ValueError("num_additional_frontends must be non-negative")
+        return v
docs/README.md (1)

42-48: Command table improves clarity.

The table format makes the available commands easy to scan.

Minor note: Line 48 shows validate as an alias for dry-run, but the description reads "Alias for dry-run" which could be slightly clearer as "Same as dry-run" to avoid confusion about which is the primary command.

docs/installation.md (1)

116-121: Add language specifier to fenced code block.

The error message code block is missing a language specifier. Consider adding a language hint for consistent formatting.

🔎 Suggested fix:
-```
+```text
 sbatch: error: Invalid --segment specification
</details>

</blockquote></details>
<details>
<summary>scripts/worker_setup/worker.py (1)</summary><blockquote>

`113-122`: **File handles for frontend logs are never closed.**

The file handles `frontend_stdout` and `frontend_stderr` are opened but never explicitly closed. While these are for a background process and the handles will be closed when the parent process exits, it's better practice to manage them explicitly or use a context manager.



Note: The same pattern exists at lines 273-274 in `setup_aggregated_worker`. Consider addressing both locations consistently.

</blockquote></details>
<details>
<summary>scripts/templates/job_script_template_disagg.j2 (1)</summary><blockquote>

`398-417`: **Consider safer alternative to `eval()` for parsing router list.**

The inline Python script uses `eval(os.environ['ROUTER_LIST'])` to parse the bash-generated list. While the input is controlled (generated by the script itself), `eval()` is generally risky. Consider using `json.loads()` with proper JSON formatting instead.


<details>
<summary>🔎 Suggested approach:</summary>

```diff
-ROUTER_LIST=$(printf "'%s'," "${router_ips[@]}")
-ROUTER_LIST="[${ROUTER_LIST%,}]"
+ROUTER_LIST=$(printf '"%s",' "${router_ips[@]}")
+ROUTER_LIST="[${ROUTER_LIST%,}]"
 export ROUTER_LIST SCRIPT_DIR LOG_DIR
 python3 - <<'PY'
 import os
+import json
 from jinja2 import Template
 ...
-router_hosts = eval(os.environ['ROUTER_LIST'])
+router_hosts = json.loads(os.environ['ROUTER_LIST'])

The same pattern exists at line 125 for FRONTEND_LIST.

📜 Review details

Configuration used: defaults

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 82dfe10 and 3502b3c.

📒 Files selected for processing (10)
  • README.md (1 hunks)
  • docs/README.md (2 hunks)
  • docs/installation.md (8 hunks)
  • docs/profiling.md (5 hunks)
  • examples/example.yaml (1 hunks)
  • scripts/templates/job_script_template_disagg.j2 (3 hunks)
  • scripts/worker_setup.py (5 hunks)
  • scripts/worker_setup/__init__.py (2 hunks)
  • scripts/worker_setup/worker.py (6 hunks)
  • src/srtctl/core/schema.py (1 hunks)
✅ Files skipped from review due to trivial changes (1)
  • examples/example.yaml
🧰 Additional context used
🧬 Code graph analysis (3)
scripts/worker_setup/__init__.py (1)
scripts/worker_setup/worker.py (1)
  • setup_router_worker (188-231)
scripts/worker_setup/worker.py (2)
scripts/worker_setup/command.py (1)
  • install_dynamo_wheels (120-144)
scripts/worker_setup/utils.py (1)
  • run_command (73-104)
scripts/worker_setup.py (1)
scripts/worker_setup/worker.py (1)
  • setup_router_worker (188-231)
🪛 LanguageTool
docs/installation.md

[style] ~116-~116: Consider an alternative verb to strengthen your wording.
Context: ... #### Segment-Based Scheduling If you see this error when submitting jobs: ``` s...

(IF_YOU_HAVE_THIS_PROBLEM)

🪛 markdownlint-cli2 (0.18.1)
docs/installation.md

118-118: Fenced code blocks should have a language specified

(MD040, fenced-code-language)

🪛 Ruff (0.14.8)
scripts/worker_setup/worker.py

192-192: Possible binding to all interfaces

(S104)

scripts/worker_setup.py

195-195: Avoid specifying long messages outside the exception class

(TRY003)


197-197: Avoid specifying long messages outside the exception class

(TRY003)


273-273: Possible binding to all interfaces

(S104)

🔇 Additional comments (12)
scripts/worker_setup/__init__.py (1)

10-10: LGTM!

The new setup_router_worker function is correctly imported and exported, following the existing module patterns.

Also applies to: 30-30

docs/profiling.md (1)

39-40: Documentation update looks good.

The unified profiling parameters simplify the configuration and the documentation clearly explains that the same parameters apply to both prefill and decode workers in disaggregated mode.

scripts/worker_setup.py (3)

93-121: New CLI arguments for sglang-router are well-structured.

The arguments follow existing patterns and provide sensible defaults. The help text clearly describes each parameter's purpose.


192-197: Validation logic is correct.

The validation properly requires both --prefill-ips and --decode-ips for the sglang-router worker type. Note: The static analysis warning about long exception messages (TRY003) can be safely ignored here as the descriptive messages aid debugging in CLI contexts.


266-277: Router worker setup implementation looks good.

The IP parsing correctly handles whitespace and empty entries. Using args.worker_idx or 0 provides a safe fallback.

Regarding the static analysis hint (S104) about binding to 0.0.0.0: This is intentional and appropriate for a router service that needs to accept external connections within the cluster network.

docs/README.md (1)

15-15: Documentation updates correctly reflect the new features.

The profiling support addition and the updated frontend distribution description (nginx or sglang_router) align well with the PR's implementation.

Also applies to: 25-25

README.md (1)

27-44: Good addition of the Commands section.

The examples clearly demonstrate common workflows including job submission, custom setup scripts, tags, dry-run validation, and dashboard launch. This improves the onboarding experience.

docs/installation.md (2)

184-196: Backend options documentation looks good.

The new backend options for enable_multiple_frontends, num_additional_frontends, and use_sglang_router are clearly documented with appropriate defaults and explanations.


291-327: Comprehensive srtslurm.yaml reference is helpful.

This complete example provides users with a single reference for all available configuration options, improving discoverability.

scripts/worker_setup/worker.py (2)

188-231: New setup_router_worker function looks well-structured.

The function correctly:

  • Accepts prefill/decode IP lists for router configuration
  • Builds the sglang_router.launch_router command with --pd-disaggregation
  • Properly formats prefill servers with bootstrap port and decode servers without
  • Uses run_command for consistent execution

The host="0.0.0.0" default binding is appropriate for container environments where the router needs to accept connections from other nodes.


93-104: Correct conditional handling for sglang router mode.

The logic properly skips dynamo wheel installation and frontend startup when use_sglang_router=True, as the sglang router handles load balancing differently. The flag is correctly propagated to get_gpu_command.

Also applies to: 109-111, 138-138

scripts/templates/job_script_template_disagg.j2 (1)

463-469: USE_SGLANG_ROUTER correctly propagated to benchmark script.

The template properly converts the boolean to a bash-friendly string and passes it as an argument to bench.sh, enabling router-aware benchmarking.

Comment thread README.md
Comment on lines +8 to 11
# Clone and install
git clone https://github.com/your-org/srtctl.git
cd srtctl
pip install -e .
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟡 Minor

Placeholder URL should be updated to actual repository.

Line 9 contains your-org/srtctl.git which appears to be a placeholder. This should reference the actual repository URL for users to clone.

🤖 Prompt for AI Agents
In README.md around lines 8 to 11 the git clone URL uses the placeholder
your-org/srtctl.git; replace that placeholder with the actual repository URL
(e.g., git clone https://github.com/<actual-org>/<actual-repo>.git) so the clone
command points to the real project, and verify the path and casing match the
repository name before committing.

Comment thread scripts/templates/job_script_template_disagg.j2
Copy link
Copy Markdown
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 0

🧹 Nitpick comments (1)
scripts/templates/job_script_template_disagg.j2 (1)

441-442: Consider reusing the nodes_per_router calculation.

The same calculation from line 383 is repeated here. Since both calculations are within the same conditional block, you could reuse the earlier value to reduce duplication.

💡 Optional refactor

Move the calculation before the first loop (around line 383) and reference the same variable in both loops, eliminating the recalculation at line 441.

📜 Review details

Configuration used: defaults

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 3502b3c and d89a474.

📒 Files selected for processing (2)
  • scripts/templates/job_script_template_disagg.j2 (3 hunks)
  • scripts/templates/nginx.conf.j2 (1 hunks)
🔇 Additional comments (5)
scripts/templates/nginx.conf.j2 (1)

9-9: LGTM! Configurable backend port with sensible default.

The template now accepts a backend_port variable while defaulting to 8000, enabling flexible nginx configurations for different backend topologies.

scripts/templates/job_script_template_disagg.j2 (4)

341-359: LGTM! Clean IP collection and string formatting.

The leader IP collection and comma-separated string construction follows standard bash patterns and correctly prepares the arguments for router invocation.


403-436: Port conflict resolved! ✅

The previous port conflict issue (nginx and router 0 both binding to 8000) has been properly addressed:

  • Routers now use internal port 30080 (line 403: ROUTER_INTERNAL_PORT)
  • nginx listens on external port 8000 and proxies to routers on 30080
  • Both services coexist on node 0 without conflict

This architecture cleanly separates concerns with nginx as the public-facing load balancer.


460-471: LGTM! Clean single-router fallback.

The single-router mode provides a straightforward deployment without nginx overhead when multi-router scaling isn't needed. The router correctly binds directly to port 8000 without conflicts.


485-489: LGTM! Correct flag propagation to benchmark.

The USE_SGLANG_ROUTER flag is properly set and passed to the benchmark script, enabling conditional behavior in downstream benchmark utilities as intended by this PR.

Copy link
Copy Markdown
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 0

🧹 Nitpick comments (1)
docs/sglang-router.md (1)

40-48: Add language identifiers to fenced code blocks (markdown linting).

Lines 40–48, 65–80, and 99–101 contain fenced code blocks without language specifiers. For proper markdown linting compliance, add a language identifier (e.g., text for ASCII diagrams).

🔎 Apply these diffs to add language identifiers:

-┌─────────────────────────────────────────────────────────┐
+```text
+┌─────────────────────────────────────────────────────────┐


```diff

-┌──────────────────────────────────────────────────────────────────────┐
+```text
+┌──────────────────────────────────────────────────────────────────────┐


```diff

-nodes_per_router = ceil((total_nodes - 1) / num_additional_frontends)
+```
+nodes_per_router = ceil((total_nodes - 1) / num_additional_frontends)

Also applies to: 65-80, 99-101

📜 Review details

Configuration used: defaults

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between d89a474 and 788ed48.

📒 Files selected for processing (3)
  • docs/README.md (2 hunks)
  • docs/SUMMARY.md (1 hunks)
  • docs/sglang-router.md (1 hunks)
🧰 Additional context used
🪛 LanguageTool
docs/SUMMARY.md

[grammar] ~8-~8: Ensure spelling is correct
Context: ...ring](monitoring.md) - Parameter Sweeps - Analying

(QB_NEW_EN_ORTHOGRAPHY_ERROR_IDS_1)

docs/sglang-router.md

[grammar] ~124-~124: Ensure spelling is correct
Context: ...rt 30000 by default. This is standard sglang behavior and doesn't need configuration...

(QB_NEW_EN_ORTHOGRAPHY_ERROR_IDS_1)

🪛 markdownlint-cli2 (0.18.1)
docs/sglang-router.md

40-40: Fenced code blocks should have a language specified

(MD040, fenced-code-language)


65-65: Fenced code blocks should have a language specified

(MD040, fenced-code-language)


99-99: Fenced code blocks should have a language specified

(MD040, fenced-code-language)

🔇 Additional comments (6)
docs/sglang-router.md (1)

1-209: Excellent, comprehensive documentation for SGLang Router feature.

The documentation clearly explains the new SGLang Router mode as an alternative to Dynamo frontends, with well-structured sections covering configuration, architecture modes (single and multiple routers), distribution logic, port configuration, troubleshooting, and comparison with Dynamo. The ASCII diagrams, tables, and complete YAML example effectively communicate the feature's capabilities and deployment patterns. Port numbers (8000, 30080, 30001) are consistently used and documented.

docs/README.md (4)

32-39: Improved clarity in "How It Works" section.

The expanded, enumerated breakdown of the workflow (validation → alias resolution → generation → SLURM submission) is clear and easy to follow. Good addition to the documentation flow.


25-26: Clear presentation of routing options.

The updated architecture description now explicitly mentions both Dynamo frontends (default) and SGLang Router as routing options, with direct links to documentation. This improves discoverability and helps users understand the feature choice.


43-49: Command table improves CLI documentation.

The tabular format for commands is clearer than prose and makes the --tags option and aliases (dry-run, validate) immediately discoverable. Good UX improvement.


54-54: All referenced documentation files exist and are properly linked.

The links to sglang-router.md, profiling.md, and analyzing.md are valid—all files exist in the docs directory with correct relative link formatting (.md extension). No broken links detected.

docs/SUMMARY.md (1)

8-8: Fix typo in table of contents entry.

Line 8 contains a spelling error: "Analying" should be "Analyzing".

🔎 Apply this diff to fix the typo:
-- [Analying](analyzing.md)
+- [Analyzing](analyzing.md)

Likely an incorrect or invalid review comment.

Copy link
Copy Markdown
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 0

🧹 Nitpick comments (3)
docs/slurm-faq.md (2)

9-13: Add language identifier to fenced code block.

The code block at line 11 should specify a language for proper syntax highlighting. Since it contains an error message (plain text), use ```text or keep it without language if it's meant to be literal output. However, the subsequent YAML block should use ```yaml.

🔎 Proposed fix
  If you see this error when submitting jobs:
  
- ```
+ ```text
  sbatch: error: Invalid generic resource (gres) specification
- ```
+ ```

25-29: Add language identifier to second fenced code block.

Similar to the first block, specify ```text for the error message output.

🔎 Proposed fix
  If you see this error when submitting jobs:
  
- ```
+ ```text
  sbatch: error: Invalid --segment specification
- ```
+ ```
docs/installation.md (1)

21-21: Minor: Consider rewording for clarity.

The phrase "You can use AI to figure out the right set of commands for your cluster" is informal. Consider: "Consult your cluster administrator or cluster documentation if these commands don't work for your system."

📜 Review details

Configuration used: defaults

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 788ed48 and d64401c.

📒 Files selected for processing (6)
  • docs/README.md (1 hunks)
  • docs/SUMMARY.md (1 hunks)
  • docs/analyzing.md (1 hunks)
  • docs/installation.md (7 hunks)
  • docs/profiling.md (5 hunks)
  • docs/slurm-faq.md (1 hunks)
✅ Files skipped from review due to trivial changes (1)
  • docs/analyzing.md
🧰 Additional context used
🪛 LanguageTool
docs/slurm-faq.md

[style] ~9-~9: Consider an alternative verb to strengthen your wording.
Context: ... ## GPU Resource Specification If you see this error when submitting jobs: ``` s...

(IF_YOU_HAVE_THIS_PROBLEM)


[style] ~25-~25: Consider an alternative verb to strengthen your wording.
Context: ...t. ## Segment-Based Scheduling If you see this error when submitting jobs: ``` s...

(IF_YOU_HAVE_THIS_PROBLEM)

🪛 markdownlint-cli2 (0.18.1)
docs/slurm-faq.md

11-11: Fenced code blocks should have a language specified

(MD040, fenced-code-language)


27-27: Fenced code blocks should have a language specified

(MD040, fenced-code-language)

🔇 Additional comments (12)
docs/SUMMARY.md (1)

1-10: Verify the referenced sglang-router.md file exists.

The TOC references sglang-router.md at line 5, which is not included in this review. Ensure the file exists in the docs/ directory and that the link is correct.

docs/README.md (1)

15-15: LGTM!

The additions clearly document the new profiling support and SGLang Router features. The "How It Works" section provides helpful context for users, and the updated command table and Next Steps guide users through the expanded workflow.

Also applies to: 17-45

docs/profiling.md (1)

10-10: LGTM!

The unified profiling parameter approach (top-level isl, osl, concurrency applied to both prefill and decode) is clearly documented and consistent. The requirement for benchmark.type: manual is explicitly stated, and the note at line 43 clarifies parameter usage across disaggregated workers.

Also applies to: 16-18, 43-43

docs/installation.md (9)

14-15: LGTM!

Repository URL updated to reflect the correct source (ishandhanani/srt-slurm). The instructions remain clear and functional.


32-32: LGTM!

The architecture-aware setup instructions are clear. The addition of ARCH parameter requirement, updated step descriptions, and the new auto-detect step for srtctl_root improve clarity and robustness for different environments.

Also applies to: 35-35, 40-40, 47-47


81-92: LGTM!

The new "Cloud Sync (Optional)" section clearly documents the optional S3-compatible storage integration with realistic examples and actionable make commands.


94-129: LGTM!

The "Complete srtslurm.yaml Reference" section provides comprehensive documentation of all available configuration options, including the new SLURM directive compatibility flags and cloud settings. The examples are clear and realistic.


143-145: LGTM!

The extra_mount formatting updated to use YAML sequence syntax (dash-prefixed) is consistent with YAML best practices and clearer than the previous format.


182-182: LGTM!

Added req_rate: "inf" benchmark option with clear documentation. This parameter is properly positioned in the example configuration.


185-197: Clarify the relationship between enable_multiple_frontends and use_sglang_router.

The new Backend Options section introduces two load-balancing configurations:

  • nginx + multiple frontends (enabled by default): enable_multiple_frontends: true with optional num_additional_frontends: 9
  • SGLang router: use_sglang_router: false (alternative to nginx)

However, the documentation doesn't explicitly state whether these are mutually exclusive or how to transition between them. If setting use_sglang_router: true should disable the nginx frontend mode, this constraint should be documented clearly.

🔎 Suggested clarification

Consider adding:

backend:
  # Load balancing modes (mutually exclusive):
  # Option 1: nginx with multiple frontends (default)
  enable_multiple_frontends: true # Default: true
  num_additional_frontends: 9     # Default: 9
  
  # Option 2: SGLang router (alternative to nginx)
  # Note: Set use_sglang_router: true to disable nginx mode
  use_sglang_router: false        # Default: false

Or verify the actual behavior and update accordingly.


212-220: LGTM!

The new "Submit with Tags" section clearly documents the --tags feature with practical examples and explains how tags are stored and used in analysis.


260-260: LGTM!

Updated note about setup script execution mentions Dynamo installation and remains accurate.

Copy link
Copy Markdown
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 3

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (3)
src/srtctl/backends/sglang.py (2)

121-121: Security: Use safe YAML loader.

Using yaml.FullLoader can execute arbitrary Python code from YAML files. Switch to yaml.safe_load() to prevent potential code injection.

🔎 Proposed fix
-                sglang_config = yaml.load(f, Loader=yaml.FullLoader)
+                sglang_config = yaml.safe_load(f)

346-346: Potential runtime error with None value.

If self.config.get("extra_mount") returns None, calling ",".join(None) will raise a TypeError.

🔎 Proposed fix
-            "extra_container_mounts": ",".join(self.config.get("extra_mount") or []),
+            "extra_container_mounts": ",".join(self.config.get("extra_mount", [])),
src/srtctl/cli/submit.py (1)

301-301: Security: Use safe YAML loader.

Using yaml.FullLoader can execute arbitrary Python code from YAML files. Switch to yaml.safe_load() to prevent potential code injection.

🔎 Proposed fix
-        sweep_config = yaml.load(f, Loader=yaml.FullLoader)
+        sweep_config = yaml.safe_load(f)
🧹 Nitpick comments (1)
src/srtctl/backends/sglang.py (1)

204-385: Consider extracting helper methods to reduce complexity.

The generate_slurm_script method is quite long (181 lines). Consider extracting logical sections into smaller helper methods:

  • Node count computation (lines 218-236)
  • Template variable assembly (lines 310-347)
  • Template path resolution (lines 355-371)

This would improve readability and testability, though the current implementation is functional.

📜 Review details

Configuration used: defaults

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between d64401c and 1e3bc5f.

📒 Files selected for processing (3)
  • src/srtctl/backends/base.py (0 hunks)
  • src/srtctl/backends/sglang.py (1 hunks)
  • src/srtctl/cli/submit.py (2 hunks)
💤 Files with no reviewable changes (1)
  • src/srtctl/backends/base.py
🧰 Additional context used
🧬 Code graph analysis (1)
src/srtctl/cli/submit.py (1)
src/srtctl/backends/sglang.py (2)
  • SGLangBackend (22-385)
  • generate_config_file (41-89)
🪛 Ruff (0.14.8)
src/srtctl/cli/submit.py

72-72: PEP 484 prohibits implicit Optional

Convert to T | None

(RUF013)

src/srtctl/backends/sglang.py

25-25: PEP 484 prohibits implicit Optional

Convert to T | None

(RUF013)

🔇 Additional comments (3)
src/srtctl/backends/sglang.py (1)

329-331: LGTM: sglang router flag integration.

The use_sglang_router flag is correctly extracted from backend_config with a sensible default of False and properly threaded through to the template variables.

src/srtctl/cli/submit.py (2)

72-100: LGTM: Cleaner dry-run implementation.

The new run_dry_run function consolidates dry-run logic effectively, replacing the previous DryRunContext class with a simpler, more direct approach. The conditional handling of SGLang-specific artifacts is well-structured.


129-131: LGTM: Conditional backend instantiation.

The dry-run path correctly instantiates the backend only when needed (for SGLang) and gracefully handles the case where no backend is required.

Comment thread src/srtctl/backends/sglang.py Outdated
Comment thread src/srtctl/backends/sglang.py Outdated
class SGLangBackend:
"""SGLang backend for distributed serving."""

def __init__(self, config: dict, setup_script: str = None):
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟡 Minor

Fix implicit Optional type hint.

The setup_script parameter should explicitly use str | None instead of relying on implicit Optional.

🔎 Proposed fix
-    def __init__(self, config: dict, setup_script: str = None):
+    def __init__(self, config: dict, setup_script: str | None = None):
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
def __init__(self, config: dict, setup_script: str = None):
def __init__(self, config: dict, setup_script: str | None = None):
🧰 Tools
🪛 Ruff (0.14.8)

25-25: PEP 484 prohibits implicit Optional

Convert to T | None

(RUF013)

🤖 Prompt for AI Agents
In src/srtctl/backends/sglang.py around line 25, the __init__ signature uses an
implicit Optional for setup_script; change the parameter type to an explicit
union by updating the signature to use "setup_script: str | None = None" (or use
typing.Optional[str] if supporting older Python versions), and adjust any
related type hints or imports if necessary so the annotation is explicit.

Comment thread src/srtctl/cli/submit.py
print("\nTo see what commands would run:")
print(f" cat {self.output_dir}/commands.sh")
print("\n" + "=" * 60 + "\n")
def run_dry_run(config: dict, backend, sglang_config_path: Path = None) -> Path:
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟡 Minor

Fix implicit Optional type hint.

The sglang_config_path parameter should explicitly use Path | None instead of relying on implicit Optional.

🔎 Proposed fix
-def run_dry_run(config: dict, backend, sglang_config_path: Path = None) -> Path:
+def run_dry_run(config: dict, backend, sglang_config_path: Path | None = None) -> Path:
🧰 Tools
🪛 Ruff (0.14.8)

72-72: PEP 484 prohibits implicit Optional

Convert to T | None

(RUF013)

🤖 Prompt for AI Agents
In src/srtctl/cli/submit.py around line 72, the parameter annotation for
sglang_config_path uses an implicit Optional; update the function signature so
sglang_config_path is explicitly typed as Path | None (i.e., use the union type
operator) to make the optional intent explicit and type-checker friendly; ensure
Path is imported from pathlib if not already.

Copy link
Copy Markdown
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 5

♻️ Duplicate comments (2)
scripts/utils/check_server_health.py (1)

30-48: Fix potentially confusing negative wait counts.

When one worker type exceeds the expected count while another doesn't, Line 47's subtraction can produce negative values in the message. This is the same issue flagged in the previous review.

🔎 Apply the previously suggested fix
     if actual_prefill >= expected_n_prefill and actual_decode >= expected_n_decode:
         return f"Model is ready. Have {actual_prefill} prefills and {actual_decode} decodes."
     else:
-        return f"Model is not ready, waiting for {expected_n_prefill - actual_prefill} prefills and {expected_n_decode - actual_decode} decodes. Have {actual_prefill} prefills and {actual_decode} decodes."
+        prefill_needed = max(0, expected_n_prefill - actual_prefill)
+        decode_needed = max(0, expected_n_decode - actual_decode)
+        return f"Model is not ready, waiting for {prefill_needed} prefills and {decode_needed} decodes. Have {actual_prefill} prefills and {actual_decode} decodes."
src/srtctl/cli/submit.py (1)

56-56: Fix implicit Optional type hint.

This issue was already flagged in a previous review. The sglang_config_path parameter should explicitly use Path | None instead of relying on implicit Optional.

🔎 Proposed fix
-def run_dry_run(config: dict, backend, sglang_config_path: Path = None) -> Path:
+def run_dry_run(config: dict, backend, sglang_config_path: Path | None = None) -> Path:
📜 Review details

Configuration used: defaults

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 1e3bc5f and 00219df.

📒 Files selected for processing (4)
  • scripts/utils/check_server_health.py (2 hunks)
  • src/srtctl/backends/sglang.py (0 hunks)
  • src/srtctl/cli/submit.py (6 hunks)
  • src/srtctl/core/backend.py (1 hunks)
💤 Files with no reviewable changes (1)
  • src/srtctl/backends/sglang.py
🧰 Additional context used
🧬 Code graph analysis (2)
src/srtctl/core/backend.py (3)
src/srtctl/core/config.py (1)
  • get_srtslurm_setting (104-118)
src/srtctl/core/sweep.py (1)
  • expand_template (17-48)
analysis/srtlog/models.py (1)
  • is_aggregated (74-76)
src/srtctl/cli/submit.py (1)
src/srtctl/core/backend.py (3)
  • SGLangBackend (20-212)
  • render_command (81-113)
  • generate_config_file (53-79)
🪛 Ruff (0.14.8)
src/srtctl/core/backend.py

23-23: PEP 484 prohibits implicit Optional

Convert to T | None

(RUF013)


53-53: PEP 484 prohibits implicit Optional

Convert to T | None

(RUF013)


65-65: Unnecessary key check before dictionary access

Replace with dict.get

(RUF019)


68-68: Avoid specifying long messages outside the exception class

(TRY003)


81-81: PEP 484 prohibits implicit Optional

Convert to T | None

(RUF013)


98-98: Probable use of unsafe loader FullLoader with yaml.load. Allows instantiation of arbitrary objects. Consider yaml.safe_load.

(S506)


115-115: Unused method argument: config_path

(ARG002)


115-115: PEP 484 prohibits implicit Optional

Convert to T | None

(RUF013)


115-115: PEP 484 prohibits implicit Optional

Convert to T | None

(RUF013)


203-203: Avoid specifying long messages outside the exception class

(TRY003)

src/srtctl/cli/submit.py

56-56: PEP 484 prohibits implicit Optional

Convert to T | None

(RUF013)

🔇 Additional comments (21)
scripts/utils/check_server_health.py (4)

9-27: LGTM! Clear documentation for dual-mode usage.

The updated docstring provides helpful usage examples for both Dynamo and SGLang router modes, making it easy for users to understand how to invoke the script in each context.


50-80: LGTM! Clean refactoring of existing Dynamo health check logic.

The existing logic has been successfully extracted into a well-named function without altering the health check behavior.


82-98: LGTM! Clean routing logic with proper validation.

The function correctly validates numeric inputs and routes to the appropriate health check handler based on the sglang_router flag. The .isnumeric() check implicitly rejects negative values since it only accepts positive digit sequences.


100-115: LGTM! Well-structured CLI interface.

The argparse implementation provides a clean CLI interface that correctly handles positional arguments for worker counts and an optional flag for SGLang router mode. The stdin-based input approach works well with the documented curl usage pattern.

src/srtctl/cli/submit.py (7)

29-29: LGTM: Import source change aligns with new architecture.

The import of SGLangBackend from srtctl.core.backend correctly reflects the refactored backend architecture shown in the relevant code snippets.


40-54: LGTM: Command rendering logic is clean and correct.

The function properly delegates command generation to the backend and creates an executable script with appropriate content and permissions.


57-84: LGTM: Dry-run implementation is clean and informative.

The inline artifact-saving approach is more readable than the previous context manager pattern, and the compact summary output provides clear feedback to users.


110-116: LGTM: Simplified dry-run flow is more maintainable.

The conditional backend construction and streamlined call to run_dry_run improve code clarity.


178-220: LGTM: Metadata generation uses modern Python idioms effectively.

The inline unpacking, walrus operator usage, and clearer variable naming (bench_meta instead of profiler_metadata) improve readability without changing functionality.


303-307: LGTM: Streamlined output formatting.

The compact f-string formatting improves readability of the sweep summary.


321-356: LGTM: CLI refactoring improves maintainability.

The introduction of add_common_args reduces duplication, and the removal of the validate subcommand simplifies the interface. The use of getattr with fallbacks is idiomatic and clean.

src/srtctl/core/backend.py (10)

31-38: LGTM: Helper methods provide clean encapsulation.

These utility methods appropriately encapsulate common checks and lookups used throughout the class.


40-51: LGTM: Flag conversion logic handles all value types correctly.

The method properly handles booleans, lists, and scalars, with sorted output for consistency.


63-68: LGTM: Kebab-case validation prevents CLI flag errors.

The validation ensures config keys use the correct format for command-line flags, with a helpful error message showing the correct form.


70-79: LGTM: Config file generation correctly merges environment variables.

The logic properly combines mode-specific configurations with their environment variables and writes to a temporary file.


85-94: LGTM: SGLang router support correctly implemented.

The launcher selection logic properly handles three cases: nsys profiling, SGLang router (when profiling is disabled or use_sglang_router is enabled), and Dynamo fallback. This aligns with the PR objective to support the SGLang router.


101-113: LGTM: Node count calculation correctly handles both modes.

The ternary logic properly calculates node count for disaggregated (mode-specific) and aggregated deployments.


120-128: LGTM: Resource extraction correctly handles both deployment modes.

The conditional logic properly extracts aggregated or disaggregated resources and calculates total node count.


130-148: LGTM: SLURM and benchmark configuration with sensible defaults.

The code properly retrieves SLURM settings with fallbacks to cluster config, and correctly parses benchmark parameters for sa-bench.


163-198: LGTM: Template variables comprehensively prepared, including router flag.

The template_vars dictionary is thorough and correctly includes the use_sglang_router flag (line 184), enabling the templates to conditionally render router-specific logic.


200-212: LGTM: Template rendering with helpful error message.

The template loading, rendering, and temporary file creation are correct. The error message on line 203 is descriptive and helps users troubleshoot configuration issues.

Comment on lines +20 to +30
class SGLangBackend:
"""SGLang backend for distributed serving."""

def __init__(self, config: dict, setup_script: str = None):
self.config = config
self.backend_config = config.get("backend", {})
self.resources = config.get("resources", {})
self.model = config.get("model", {})
self.slurm = config.get("slurm", {})
self.setup_script = setup_script

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟡 Minor

Fix implicit Optional type hint for setup_script.

The setup_script parameter should explicitly use str | None instead of relying on implicit Optional.

🔎 Proposed fix
-    def __init__(self, config: dict, setup_script: str = None):
+    def __init__(self, config: dict, setup_script: str | None = None):
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
class SGLangBackend:
"""SGLang backend for distributed serving."""
def __init__(self, config: dict, setup_script: str = None):
self.config = config
self.backend_config = config.get("backend", {})
self.resources = config.get("resources", {})
self.model = config.get("model", {})
self.slurm = config.get("slurm", {})
self.setup_script = setup_script
class SGLangBackend:
"""SGLang backend for distributed serving."""
def __init__(self, config: dict, setup_script: str | None = None):
self.config = config
self.backend_config = config.get("backend", {})
self.resources = config.get("resources", {})
self.model = config.get("model", {})
self.slurm = config.get("slurm", {})
self.setup_script = setup_script
🧰 Tools
🪛 Ruff (0.14.8)

23-23: PEP 484 prohibits implicit Optional

Convert to T | None

(RUF013)

🤖 Prompt for AI Agents
In src/srtctl/core/backend.py around lines 20 to 30, the __init__ parameter
setup_script uses an implicit Optional by defaulting to None; update the type
annotation to explicitly indicate it can be None (use setup_script: str | None =
None or, if targeting older Python versions, setup_script: Optional[str] = None
and add from typing import Optional) so the signature clearly expresses an
optional string.

lines.append(f" --{flag} {value} \\")
return lines

def generate_config_file(self, params: dict = None) -> Path | None:
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟡 Minor

Fix implicit Optional type hint for params.

The params parameter should explicitly use dict | None instead of relying on implicit Optional.

🔎 Proposed fix
-    def generate_config_file(self, params: dict = None) -> Path | None:
+    def generate_config_file(self, params: dict | None = None) -> Path | None:
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
def generate_config_file(self, params: dict = None) -> Path | None:
def generate_config_file(self, params: dict | None = None) -> Path | None:
🧰 Tools
🪛 Ruff (0.14.8)

53-53: PEP 484 prohibits implicit Optional

Convert to T | None

(RUF013)

🤖 Prompt for AI Agents
In src/srtctl/core/backend.py around line 53, the method signature uses an
implicit Optional for params; change the type hint to explicitly declare params:
dict | None (i.e., replace params: dict = None with params: dict | None = None)
so the parameter type is unambiguous; ensure any callers and docstrings remain
compatible and adjust any import/typing usage if your project targets a Python
version that requires typing.Optional instead of PEP 604 syntax.

logging.info(f"Generated SGLang config: {temp_path}")
return Path(temp_path)

def render_command(self, mode: str, config_path: Path = None) -> str:
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟡 Minor

Fix implicit Optional type hint for config_path.

The config_path parameter should explicitly use Path | None instead of relying on implicit Optional.

🔎 Proposed fix
-    def render_command(self, mode: str, config_path: Path = None) -> str:
+    def render_command(self, mode: str, config_path: Path | None = None) -> str:
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
def render_command(self, mode: str, config_path: Path = None) -> str:
def render_command(self, mode: str, config_path: Path | None = None) -> str:
🧰 Tools
🪛 Ruff (0.14.8)

81-81: PEP 484 prohibits implicit Optional

Convert to T | None

(RUF013)

🤖 Prompt for AI Agents
In src/srtctl/core/backend.py around line 81 the parameter type for config_path
is implicitly Optional; update the function signature to explicitly use a union
type (config_path: Path | None) instead of relying on implicit Optional, and
ensure typing imports/forward-refs are correct (add "from __future__ import
annotations" at top if needed or import typing.Optional if you prefer
Optional[Path]) so static type checkers accept the explicit optional type; run
tests/type-checking after the change.

Comment on lines +96 to +99
if config_path:
with open(config_path) as f:
sglang_config = yaml.load(f, Loader=yaml.FullLoader)
lines.extend(self._config_to_flags(sglang_config.get(mode, {})))
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major

Security: Replace yaml.FullLoader with yaml.safe_load.

Using yaml.FullLoader allows instantiation of arbitrary Python objects, which is a security risk if config files can come from untrusted sources.

🔎 Proposed fix
         if config_path:
             with open(config_path) as f:
-                sglang_config = yaml.load(f, Loader=yaml.FullLoader)
+                sglang_config = yaml.safe_load(f)
             lines.extend(self._config_to_flags(sglang_config.get(mode, {})))
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
if config_path:
with open(config_path) as f:
sglang_config = yaml.load(f, Loader=yaml.FullLoader)
lines.extend(self._config_to_flags(sglang_config.get(mode, {})))
if config_path:
with open(config_path) as f:
sglang_config = yaml.safe_load(f)
lines.extend(self._config_to_flags(sglang_config.get(mode, {})))
🧰 Tools
🪛 Ruff (0.14.8)

98-98: Probable use of unsafe loader FullLoader with yaml.load. Allows instantiation of arbitrary objects. Consider yaml.safe_load.

(S506)

🤖 Prompt for AI Agents
In src/srtctl/core/backend.py around lines 96 to 99, the code uses
yaml.FullLoader to parse the config which can instantiate arbitrary Python
objects; replace yaml.load(f, Loader=yaml.FullLoader) with yaml.safe_load(f) to
avoid object construction from untrusted input, preserve existing behavior by
handling the possible None return (e.g., treat as {} or use .get safely), and
ensure any downstream call self._config_to_flags receives a dict.

)
return "\n".join(lines)

def generate_slurm_script(self, config_path: Path = None, timestamp: str = None) -> tuple[Path, str]:
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟡 Minor

Fix implicit Optional type hints for both parameters.

Both config_path and timestamp parameters should explicitly use union syntax instead of relying on implicit Optional.

🔎 Proposed fix
-    def generate_slurm_script(self, config_path: Path = None, timestamp: str = None) -> tuple[Path, str]:
+    def generate_slurm_script(self, config_path: Path | None = None, timestamp: str | None = None) -> tuple[Path, str]:
🧰 Tools
🪛 Ruff (0.14.8)

115-115: Unused method argument: config_path

(ARG002)


115-115: PEP 484 prohibits implicit Optional

Convert to T | None

(RUF013)


115-115: PEP 484 prohibits implicit Optional

Convert to T | None

(RUF013)

🤖 Prompt for AI Agents
In src/srtctl/core/backend.py around line 115, the method signature uses
implicit Optional by default (config_path: Path = None, timestamp: str = None);
change it to use explicit union syntax for both parameters: config_path: Path |
None = None and timestamp: str | None = None so static type checkers and linters
recognize the optional types; update any related type hints or docstrings nearby
if present to match the new annotations.

@ishandhanani ishandhanani changed the title support sglang router sgl-router and docs Dec 19, 2025
@ishandhanani ishandhanani merged commit 0147c76 into main Dec 19, 2025
2 of 3 checks passed
@weireweire
Copy link
Copy Markdown
Collaborator

what does multiple sglang router mean? each router connect to different worker?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants