Add Rollup Stage for Experiment Data Consolidation #92
Conversation
📝 Walkthrough

Adds a pluggable benchmark/node parsing framework, two benchmark parsers and two node parsers, rollup dataclasses and a RollupStageMixin (plus CLI integration), a standalone rollup harness and upload tool, script invocation refactors, tests, and a pandas dependency.
Sequence Diagram(s)

```mermaid
sequenceDiagram
    participant Orch as SweepOrchestrator
    participant Rollup as RollupStageMixin
    participant BenchFactory as BenchmarkParserFactory
    participant NodeFactory as NodeParserFactory
    participant Logs as LogDirectory
    participant Agg as RollupAggregator
    participant File as rollup.json
    Orch->>Rollup: run_rollup(tags?)
    Rollup->>BenchFactory: get_benchmark_parser(benchmark_type)
    BenchFactory->>Logs: parse_result_json()/parse_result_directory()
    Logs-->>BenchFactory: benchmark metrics
    Rollup->>NodeFactory: get_node_parser(backend_type)
    NodeFactory->>Logs: parse_logs(log_dir)
    Logs-->>NodeFactory: NodeMetrics[]
    Rollup->>Rollup: _collect_environment_config()
    Rollup->>Agg: _build_rollup_summary(benchmarks, nodes, env, commands)
    Agg->>Agg: compute_summary_stats()
    Rollup->>File: _write_rollup(RollupSummary)
    File-->>Orch: rollup.json path
```
Estimated code review effort: 🎯 5 (Critical) | ⏱️ ~120 minutes
🚥 Pre-merge checks: ✅ 2 passed | ❌ 1 failed (1 warning)

✏️ Tip: You can configure your own custom pre-merge checks in the settings.
Actionable comments posted: 7
🤖 Fix all issues with AI agents
In `@analysis/srtlog/parsers/nodes/sglang.py`:
- Around line 93-95: The log-reading try block that opens log_path and iterates
lines should guard against UnicodeDecodeError by opening the file with an
explicit encoding and a replacement error handler; update the with
open(log_path) call in the try block that reads lines to use encoding="utf-8"
and errors="replace" so non‑UTF8 bytes are replaced instead of raising and
dropping metrics (keep the rest of the loop logic unchanged and preserve the
existing try/except behavior).
In `@docs/architecture.md`:
- Around line 822-826: The fenced code block that renders the "ROLLUP STAGE
ARCHITECTURE" ASCII art is missing a language tag and triggers MD040; update the
opening fence from ``` to ```text so the block is tagged as plain text (i.e.,
replace the triple backticks before the ASCII box with ```text) to satisfy
Markdownlint. Locate the fenced block containing the "ROLLUP STAGE ARCHITECTURE"
header and modify only the opening fence.
In `@src/srtctl/cli/do_sweep.py`:
- Around line 211-214: The code uses hasattr(self.config, "tags"), which leaves
tags typed as object and fails the run_rollup(tags: list[str] | None) signature;
update the extraction to explicitly narrow to list[str] | None before calling
run_rollup (e.g., check isinstance(self.config.tags, list) and verify items are
str, or use typing.cast("list[str] | None", ...) after validating) so that the
local variable tags is typed as list[str] | None, then call
self.run_rollup(tags=tags).
In `@src/srtctl/cli/mixins/rollup_stage.py`:
- Around line 561-564: The endpoints property currently has an ellipsis body
which returns None and violates the declared return type list["Endpoint"];
replace the ellipsis with either (a) remove the `@property` method and declare a
class-level type hint endpoints: list["Endpoint"] if you want a pure type-only
mixin, or (b) implement the property to raise NotImplementedError (e.g. def
endpoints(self) -> list["Endpoint"]: raise NotImplementedError) so concrete
classes must provide it; update the symbol named endpoints accordingly.
- Around line 641-661: The type checker flags calls to optional parser methods
because BenchmarkParserProtocol doesn't declare find_aiperf_results and
parse_result_directory; fix by either using getattr with a callable check before
invoking (e.g., replace hasattr checks with m = getattr(parser,
"find_aiperf_results", None) and if callable(m): use m(...), same for
parse_result_directory) or add these methods as optional members to
BenchmarkParserProtocol (with default implementations returning empty lists) so
parse_result_json, find_aiperf_results, and parse_result_directory are
statically known; update references in the rollup logic where
parser.parse_result_json, parser.find_aiperf_results, and
parser.parse_result_directory are used.
In `@src/srtctl/README.md`:
- Around line 144-158: The fenced diagram block in the README triggers
markdownlint MD040 because it lacks a language tag; update the fenced code block
in the ROLLUP STAGE PIPELINE section (the triple-backtick block showing the
ASCII diagram) to include a language identifier (e.g., change ``` to ```text) so
the block is explicitly marked and the lint rule is satisfied.
In `@tests/test_rollup.py`:
- Around line 800-938: The test fails because analysis.srtlog modules import
pandas unconditionally (e.g., in analysis.srtlog.config_reader) causing
ImportError to be swallowed by RollupStageMixin.run_rollup; update the pandas
import to be guarded (wrap "import pandas as pd" in a try/except ImportError and
set pd = None) and ensure functions in analysis.srtlog.config_reader and any
parsers check pd is not None before using it and raise a clear error if pandas
is required, and also modify RollupStageMixin.run_rollup to not silently ignore
ImportError (detect ImportError from analysis.srtlog imports and re-raise or
surface a clear message) so tests fail fast if pandas is genuinely needed.
🧹 Nitpick comments (8)
tests/test_configs.py (1)

388-390: Extract shared setup into pytest fixtures to reduce repetition.

The new test class repeats config creation, root monkeypatching, and sbatch mocking. A small fixture/helper would keep this suite shorter and easier to maintain.
tests/test_rollup.py (1)

701-744: Consider freezing mock config dataclasses.

These mocks model configuration objects; setting `frozen=True` would align with the immutable config pattern used in production code. As per coding guidelines, ...

analysis/srtlog/parsers/__init__.py (2)
29-53: Consider making data-carrying dataclasses frozen for immutability.

Per coding guidelines, configuration objects should use `@dataclass(frozen=True)`. Since `BenchmarkLaunchCommand` represents parsed, immutable launch command data, consider making it frozen to prevent accidental mutation after parsing.

♻️ Suggested change

```diff
-@dataclass
+@dataclass(frozen=True)
 class BenchmarkLaunchCommand:
     """Parsed benchmark launch command information."""
```

Note: any mutable default would still require the `field(default_factory=dict)` replacement. If mutability is intentional for incremental field population during parsing, this can be ignored.
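As a quick sketch of the frozen-dataclass pattern being suggested (the field names below are illustrative, not `BenchmarkLaunchCommand`'s real fields):

```python
from dataclasses import dataclass, field, FrozenInstanceError

@dataclass(frozen=True)
class LaunchCommand:  # hypothetical stand-in for BenchmarkLaunchCommand
    command: str = ""
    # mutable defaults still need default_factory, frozen or not
    extra_args: dict = field(default_factory=dict)

cmd = LaunchCommand(command="sa-bench")
try:
    cmd.command = "other"  # frozen instances reject attribute assignment
    mutated = True
except FrozenInstanceError:
    mutated = False
```

After construction, any attempt to reassign a field raises `FrozenInstanceError`, which is exactly the accidental-mutation protection the review is after.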
272-275: Wildcard imports for auto-registration are acceptable but could be more explicit.

The wildcard imports trigger parser registration as a side effect. While this works, explicit imports (e.g., `from analysis.srtlog.parsers.benchmark import SABenchParser, MooncakeRouterParser`) would make the registered parsers more discoverable in this file. This is a minor readability preference.

src/srtctl/cli/mixins/rollup_stage.py (4)
28-57: Consider frozen dataclass for immutable rollup data.

Similar to the parsers module, these data objects represent parsed results that shouldn't be mutated after construction. Per coding guidelines, consider `@dataclass(frozen=True)` for `LaunchCommandRollup` and the other rollup data models.
245-318: Consider extracting common aggregation logic to reduce duplication.

The `_aggregate_agg_batches` method duplicates logic from `_aggregate_prefill_batches` (lines 254-287) and `_aggregate_decode_batches` (lines 289-317). Consider refactoring to reuse the existing methods:

♻️ Suggested refactor

```diff
 def _aggregate_agg_batches(self, batches: list) -> None:
     """Aggregate metrics for agg workers (handles both prefill and decode batches)."""
     # Separate prefill and decode batches
     prefill_batches = [b for b in batches if b.batch_type == "prefill"]
     decode_batches = [b for b in batches if b.batch_type == "decode"]
-    self.total_prefill_batches = len(prefill_batches)
-    self.total_decode_batches = len(decode_batches)
-
-    # Aggregate prefill metrics
+    # Reuse existing aggregation methods
     if prefill_batches:
-        new_tokens = []
-        # ... (duplicated prefill logic)
+        self._aggregate_prefill_batches(prefill_batches)
-
-    # Aggregate decode metrics
     if decode_batches:
-        running_reqs = []
-        # ... (duplicated decode logic)
+        self._aggregate_decode_batches(decode_batches)
```
773: Move `import os` to module level.

The `os` module import should be at the top of the file with other standard library imports, not inside the method.

♻️ Suggested change — add to the imports at the top of the file (around line 11):

```python
import json
import logging
import os  # Add this
from dataclasses import asdict, dataclass, field
```

Then remove line 773.
952-959: Minor: avoid building an intermediate list for `any()`.

`any()` can short-circuit, so there is no need to materialize a list first.

♻️ Suggested change

```diff
-        if not any([
+        if not any((
             config.prefill_environment,
             config.decode_environment,
             config.aggregated_environment,
             config.prefill_engine_config,
             config.decode_engine_config,
             config.aggregated_engine_config,
-        ]):
+        )):
```
In analysis/srtlog/parsers/nodes/sglang.py:

```python
try:
    with open(log_path) as f:
        for line in f:
```

Guard against UnicodeDecodeError when reading logs.

If logs contain non-UTF8 bytes, `open()` can throw and drop all metrics. Use explicit encoding with `errors="replace"` to keep parsing resilient.

🔧 Proposed fix

```diff
-    with open(log_path) as f:
+    with open(log_path, "r", encoding="utf-8", errors="replace") as f:
```

📝 Committable suggestion

‼️ IMPORTANT: Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

```python
try:
    with open(log_path, "r", encoding="utf-8", errors="replace") as f:
        for line in f:
```
🤖 Prompt for AI Agents
In `@analysis/srtlog/parsers/nodes/sglang.py` around lines 93 - 95, The
log-reading try block that opens log_path and iterates lines should guard
against UnicodeDecodeError by opening the file with an explicit encoding and a
replacement error handler; update the with open(log_path) call in the try block
that reads lines to use encoding="utf-8" and errors="replace" so non‑UTF8 bytes
are replaced instead of raising and dropping metrics (keep the rest of the loop
logic unchanged and preserve the existing try/except behavior).
In docs/architecture.md:

```text
┌─────────────────────────────────────────────────────────────────────────────────────┐
│ ROLLUP STAGE ARCHITECTURE │
└─────────────────────────────────────────────────────────────────────────────────────┘
```

Add a language tag to this fenced block for MD040 compliance.

Markdownlint flags this block; tagging it as `text` is sufficient.

🛠️ Suggested change

````diff
-```
+```text
````

🧰 Tools

🪛 markdownlint-cli2 (0.18.1)

822-822: Fenced code blocks should have a language specified (MD040, fenced-code-language)
🤖 Prompt for AI Agents
In `@docs/architecture.md` around lines 822 - 826, The fenced code block that
renders the "ROLLUP STAGE ARCHITECTURE" ASCII art is missing a language tag and
triggers MD040; update the opening fence from ``` to ```text so the block is
tagged as plain text (i.e., replace the triple backticks before the ASCII box
with ```text) to satisfy Markdownlint. Locate the fenced block containing the
"ROLLUP STAGE ARCHITECTURE" header and modify only the opening fence.
In src/srtctl/cli/do_sweep.py:

```python
# Run rollup to consolidate experiment data
if exit_code == 0:
    tags = self.config.tags if hasattr(self.config, "tags") else []
    self.run_rollup(tags=tags)
```

Fix tags type narrowing to satisfy the run_rollup signature (CI failure).

The current `hasattr` path leaves `tags` typed as `object`, triggering the CI type error. Narrow explicitly to `list[str] | None`.

✅ Suggested fix

```diff
 if exit_code == 0:
-    tags = self.config.tags if hasattr(self.config, "tags") else []
-    self.run_rollup(tags=tags)
+    tags: list[str] | None = None
+    raw_tags = getattr(self.config, "tags", None)
+    if isinstance(raw_tags, list):
+        tags = raw_tags
+    self.run_rollup(tags=tags)
```

🧰 Tools

🪛 GitHub Actions: CI

[error] 214-214: invalid-argument-type: Argument to bound method run_rollup is incorrect. Expected list[str] | None, found object.

🤖 Prompt for AI Agents

In `@src/srtctl/cli/do_sweep.py` around lines 211 - 214: the code uses hasattr(self.config, "tags"), which leaves tags typed as object and fails the run_rollup(tags: list[str] | None) signature. Update the extraction to explicitly narrow to list[str] | None before calling run_rollup (e.g., check isinstance(self.config.tags, list) and verify items are str, or use typing.cast("list[str] | None", ...) after validating) so that the local variable tags is typed as list[str] | None, then call self.run_rollup(tags=tags).
In src/srtctl/cli/mixins/rollup_stage.py:

```python
@property
def endpoints(self) -> list["Endpoint"]:
    """Endpoint allocation topology."""
    ...
```

Fix the endpoints property that always returns None.

The property body `...` (ellipsis) causes it to implicitly return None, violating the `list[Endpoint]` return type annotation. This triggers the pipeline error. For a mixin expecting the concrete class to provide this property, either:
- Remove the implementation entirely (just declare the type hint)
- Raise `NotImplementedError`

🔧 Proposed fix

```diff
 @property
 def endpoints(self) -> list["Endpoint"]:
     """Endpoint allocation topology."""
-    ...
+    raise NotImplementedError("Subclass must provide endpoints property")
```

Or, if using a pure type-hint approach for mixins:

```python
# Just declare as class variable type hint without @property
endpoints: list["Endpoint"]
```

🧰 Tools

🪛 GitHub Actions: CI

[error] 562-562: invalid-return-type: Function always implicitly returns None, which is not assignable to return type list[Endpoint].
🤖 Prompt for AI Agents
In `@src/srtctl/cli/mixins/rollup_stage.py` around lines 561 - 564, The endpoints
property currently has an ellipsis body which returns None and violates the
declared return type list["Endpoint"]; replace the ellipsis with either (a)
remove the `@property` method and declare a class-level type hint endpoints:
list["Endpoint"] if you want a pure type-only mixin, or (b) implement the
property to raise NotImplementedError (e.g. def endpoints(self) ->
list["Endpoint"]: raise NotImplementedError) so concrete classes must provide
it; update the symbol named endpoints accordingly.
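The raise-in-mixin option can be sketched as follows; `RollupMixinSketch` and `ConcreteOrchestrator` are illustrative names, not the real classes:

```python
class RollupMixinSketch:
    @property
    def endpoints(self) -> list[str]:
        """Endpoint allocation topology; concrete classes must override."""
        # Raising keeps the declared return type honest: the property
        # never silently yields None to callers of the mixin's methods.
        raise NotImplementedError("Subclass must provide endpoints")

class ConcreteOrchestrator(RollupMixinSketch):
    @property
    def endpoints(self) -> list[str]:
        return ["prefill-0", "decode-0"]

eps = ConcreteOrchestrator().endpoints
try:
    RollupMixinSketch().endpoints
    mixin_raised = False
except NotImplementedError:
    mixin_raised = True
```

A forgotten override now fails loudly at first access instead of propagating a `None` that only surfaces later in the rollup logic.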
```python
# Try parser-specific result collection first
if parser is not None:
    # For mooncake-router, look for AIPerf results
    if hasattr(parser, "find_aiperf_results"):
        aiperf_files = parser.find_aiperf_results(self.runtime.log_dir)
        for aiperf_path in aiperf_files:
            result = parser.parse_result_json(aiperf_path)
            if result.get("output_tps") is not None:
                results.append(result)
                logger.debug("Loaded AIPerf result: %s", aiperf_path)

    # For sa-bench style, look for result directories
    if hasattr(parser, "parse_result_directory"):
        for entry in self.runtime.log_dir.iterdir():
            if not entry.is_dir():
                continue
            # Match patterns like sa-bench_isl_X_osl_Y
            if "_isl_" in entry.name and "_osl_" in entry.name:
                logger.debug("Found benchmark results directory: %s", entry.name)
                dir_results = parser.parse_result_directory(entry)
                results.extend(dir_results)
```

Fix type errors for optional parser methods not in the protocol.

The type checker reports call-non-callable because `find_aiperf_results` and `parse_result_directory` are not defined in `BenchmarkParserProtocol`. While the `hasattr` checks make this runtime-safe, the type checker cannot verify this.

🔧 Proposed fix using getattr

```diff
 # Try parser-specific result collection first
 if parser is not None:
     # For mooncake-router, look for AIPerf results
-    if hasattr(parser, "find_aiperf_results"):
-        aiperf_files = parser.find_aiperf_results(self.runtime.log_dir)
+    find_aiperf = getattr(parser, "find_aiperf_results", None)
+    if find_aiperf is not None:
+        aiperf_files = find_aiperf(self.runtime.log_dir)
         for aiperf_path in aiperf_files:
             result = parser.parse_result_json(aiperf_path)
             if result.get("output_tps") is not None:
                 results.append(result)
                 logger.debug("Loaded AIPerf result: %s", aiperf_path)

     # For sa-bench style, look for result directories
-    if hasattr(parser, "parse_result_directory"):
+    parse_result_dir = getattr(parser, "parse_result_directory", None)
+    if parse_result_dir is not None:
         for entry in self.runtime.log_dir.iterdir():
             if not entry.is_dir():
                 continue
             # Match patterns like sa-bench_isl_X_osl_Y
             if "_isl_" in entry.name and "_osl_" in entry.name:
                 logger.debug("Found benchmark results directory: %s", entry.name)
-                dir_results = parser.parse_result_directory(entry)
+                dir_results = parse_result_dir(entry)
                 results.extend(dir_results)
```

Alternatively, add these methods as optional members (with default implementations returning empty results) to `BenchmarkParserProtocol`.

🧰 Tools

🪛 GitHub Actions: CI

[error] 645-645: call-non-callable: Object of type object is not callable.

[error] 660-660: call-non-callable: Object of type object is not callable.
🤖 Prompt for AI Agents
In `@src/srtctl/cli/mixins/rollup_stage.py` around lines 641 - 661, The type
checker flags calls to optional parser methods because BenchmarkParserProtocol
doesn't declare find_aiperf_results and parse_result_directory; fix by either
using getattr with a callable check before invoking (e.g., replace hasattr
checks with m = getattr(parser, "find_aiperf_results", None) and if callable(m):
use m(...), same for parse_result_directory) or add these methods as optional
members to BenchmarkParserProtocol (with default implementations returning empty
lists) so parse_result_json, find_aiperf_results, and parse_result_directory are
statically known; update references in the rollup logic where
parser.parse_result_json, parser.find_aiperf_results, and
parser.parse_result_directory are used.
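The getattr-with-callable-check pattern can be demonstrated in isolation; the parser classes and `collect_results` below are illustrative stand-ins, not the real srtctl parsers:

```python
class BaseParser:  # stand-in for a minimal BenchmarkParserProtocol impl
    def parse_result_json(self, path: str) -> dict:
        return {"output_tps": 123.0, "source": path}

class AiperfStyleParser(BaseParser):  # also exposes the optional method
    def find_aiperf_results(self, log_dir: str) -> list[str]:
        return [f"{log_dir}/profile_export.json"]

def collect_results(parser: BaseParser, log_dir: str) -> list[dict]:
    results: list[dict] = []
    # getattr yields a bound method or None, so the call site is a
    # plain callable check the type checker can follow
    find_aiperf = getattr(parser, "find_aiperf_results", None)
    if callable(find_aiperf):
        for path in find_aiperf(log_dir):
            results.append(parser.parse_result_json(path))
    return results

with_optional = collect_results(AiperfStyleParser(), "logs")
without_optional = collect_results(BaseParser(), "logs")
```

Parsers that implement the optional hook contribute results; parsers that don't are skipped without any `hasattr`-on-`object` typing ambiguity.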
In src/srtctl/README.md:

```text
┌─────────────────────────────────────────────────────────────────────────────┐
│ ROLLUP STAGE PIPELINE │
├─────────────────────────────────────────────────────────────────────────────┤
│ │
│ INPUT FILES OUTPUT │
│ ─────────── ────── │
│ • benchmark results (*.json) ──┐ │
│ • worker logs (*.out/*.err) ──┼──► rollup.json │
│ • config.yaml ──┤ • benchmark results │
│ • engine configs (*.yaml) ──┘ • node metrics │
│ • environment config │
│ • summary statistics │
└─────────────────────────────────────────────────────────────────────────────┘
```

Add a language tag to the fenced block to satisfy markdownlint (MD040).

🔧 Proposed fix

````diff
-```
+```text
````

🧰 Tools

🪛 markdownlint-cli2 (0.18.1)

144-144: Fenced code blocks should have a language specified (MD040, fenced-code-language)
🤖 Prompt for AI Agents
In `@src/srtctl/README.md` around lines 144 - 158, The fenced diagram block in the
README triggers markdownlint MD040 because it lacks a language tag; update the
fenced code block in the ROLLUP STAGE PIPELINE section (the triple-backtick
block showing the ASCII diagram) to include a language identifier (e.g., change
``` to ```text) so the block is explicitly marked and the lint rule is
satisfied.
In tests/test_rollup.py:

```python
def test_rollup_with_node_logs(self, tmp_path):
    """Test rollup with actual node log files parsed by NodeAnalyzer."""
    from dataclasses import dataclass, field

    # Create mock benchmark results
    bench_dir = tmp_path / "sa-bench_isl_1024_osl_1024"
    bench_dir.mkdir()

    result_file = bench_dir / "result_c100.json"
    result_file.write_text(
        json.dumps(
            {
                "max_concurrency": 100,
                "output_throughput": 5000.0,
                "mean_ttft_ms": 150.0,
                "mean_itl_ms": 20.0,
            }
        )
    )

    # Create mock prefill log file (matches NodeAnalyzer expected format)
    prefill_log = tmp_path / "node-01_prefill_w0.err"
    prefill_log.write_text(
        """[2025-01-22 10:00:00 DP0 TP0 EP0] Prefill batch, #new-seq: 10, #new-token: 1024, #cached-token: 256, token usage: 0.50, #running-req: 5, #queue-req: 2, #prealloc-req: 0, #inflight-req: 3, input throughput (token/s): 5000.00,
[2025-01-22 10:00:01 DP0 TP0 EP0] Load weight end. type=DeepseekV3ForCausalLM, dtype=torch.bfloat16, avail mem=75.11 GB, mem usage=107.07 GB.
[2025-01-22 10:00:02 DP0 TP0 EP0] KV Cache is allocated. #tokens: 524288, KV size: 17.16 GB
"""
    )

    # Create mock decode log file
    decode_log = tmp_path / "node-02_decode_w0.err"
    decode_log.write_text(
        """[2025-01-22 10:00:00 DP31 TP31 EP31] Decode batch, #running-req: 50, #token: 5000, token usage: 0.50, pre-allocated usage: 0.10, #prealloc-req: 2, #transfer-req: 1, #queue-req: 3, gen throughput (token/s): 150.00,
"""
    )

    # Mock config classes
    @dataclass
    class MockResourceConfig:
        is_disaggregated: bool = True
        prefill_gpus: int = 8
        decode_gpus: int = 8
        agg_nodes: int | None = None
        gpus_per_node: int = 8
        gpu_type: str = "B200"
        total_nodes: int = 2
        prefill_nodes: int = 1
        decode_nodes: int = 1
        num_prefill: int = 1
        num_decode: int = 1
        num_agg: int | None = None

    @dataclass
    class MockBenchmarkConfig:
        type: str = "sa-bench"
        isl: int = 1024
        osl: int = 1024
        concurrencies: str = "100"

        def get_concurrency_list(self):
            return [100]

    @dataclass
    class MockModelConfig:
        precision: str = "fp8"

    @dataclass
    class MockFrontendConfig:
        type: str = "sglang"

    @dataclass
    class MockConfig:
        name: str = "test-job"
        served_model_name: str = "deepseek-v3"
        backend_type: str = "sglang"
        resources: MockResourceConfig = field(default_factory=MockResourceConfig)
        benchmark: MockBenchmarkConfig = field(default_factory=MockBenchmarkConfig)
        model: MockModelConfig = field(default_factory=MockModelConfig)
        frontend: MockFrontendConfig = field(default_factory=MockFrontendConfig)

    @dataclass
    class MockRuntime:
        job_id: str = "12345"
        log_dir: Path = field(default_factory=Path)
        model_path: Path = field(default_factory=lambda: Path("/models/deepseek-v3"))

    class MockOrchestrator(RollupStageMixin):
        def __init__(self, config, runtime):
            self._config = config
            self._runtime = runtime

        @property
        def config(self):
            return self._config

        @property
        def runtime(self):
            return self._runtime

        @property
        def endpoints(self):
            return []

    config = MockConfig()
    runtime = MockRuntime(log_dir=tmp_path)
    orchestrator = MockOrchestrator(config, runtime)

    # Run rollup
    rollup_path = orchestrator.run_rollup(tags=["node-test"])

    assert rollup_path is not None
    assert rollup_path.exists()

    with open(rollup_path) as f:
        data = json.load(f)

    # Verify node summary is present
    assert data["nodes_summary"] is not None
    nodes_summary = data["nodes_summary"]

    assert nodes_summary["total_prefill_nodes"] == 1
    assert nodes_summary["total_decode_nodes"] == 1
    assert len(nodes_summary["nodes"]) == 2

    # Find prefill and decode nodes
    prefill_node = next((n for n in nodes_summary["nodes"] if n["worker_type"] == "prefill"), None)
    decode_node = next((n for n in nodes_summary["nodes"] if n["worker_type"] == "decode"), None)

    assert prefill_node is not None
    assert prefill_node["node_name"] == "node-01"
    assert prefill_node["total_new_tokens"] == 1024
    assert prefill_node["total_cached_tokens"] == 256
    assert prefill_node["kv_cache_gb"] == 17.16

    assert decode_node is not None
    assert decode_node["node_name"] == "node-02"
    assert decode_node["max_running_requests"] == 50
    assert decode_node["avg_gen_throughput"] == 150.0
```
There was a problem hiding this comment.
🧩 Analysis chain
🏁 Script executed:
# Check for pandas in declared dependencies
rg -n "pandas" pyproject.toml setup.py setup.cfg 2>/dev/null | head -20Repository: ishandhanani/srt-slurm
Length of output: 48
🏁 Script executed:
# Look for parser imports and NodeAnalyzer usage
rg -n "srtlog\.parsers|NodeAnalyzer|from analysis" tests/test_rollup.py -A 2Repository: ishandhanani/srt-slurm
Length of output: 940
🏁 Script executed:
# Check where RollupStageMixin is defined and how it uses parsers
rg -n "class RollupStageMixin|def run_rollup" -g "*.py" -A 5Repository: ishandhanani/srt-slurm
Length of output: 955
🏁 Script executed:
# Check pyproject.toml more thoroughly for dependencies
cat pyproject.toml | grep -A 50 "dependencies\|optional-dependencies"Repository: ishandhanani/srt-slurm
Length of output: 2208
🏁 Script executed:
# Get the full run_rollup implementation
sed -n '566,650p' src/srtctl/cli/mixins/rollup_stage.pyRepository: ishandhanani/srt-slurm
Length of output: 3361
🏁 Script executed:
# Find NodeAnalyzer class and its imports
rg -n "class NodeAnalyzer|from.*NodeAnalyzer" -g "*.py"Repository: ishandhanani/srt-slurm
Length of output: 487
🏁 Script executed:
# Find _collect_node_metrics implementation
rg -n "_collect_node_metrics" src/srtctl/cli/mixins/rollup_stage.py -A 30Repository: ishandhanani/srt-slurm
Length of output: 2757
🏁 Script executed:
# Check NodeAnalyzer class and its imports
sed -n '1,100p' analysis/srtlog/log_parser.pyRepository: ishandhanani/srt-slurm
Length of output: 3549
🏁 Script executed:
# Search for pandas imports in analysis module
rg -n "import pandas|from pandas" analysis/ -g "*.py"Repository: ishandhanani/srt-slurm
Length of output: 513
🏁 Script executed:
# Check if analysis is a separate package or part of main package
ls -la analysis/ 2>/dev/null | head -20Repository: ishandhanani/srt-slurm
Length of output: 352
🏁 Script executed:
# Check if there's a separate setup for analysis or if it's optional
rg -n "analysis" pyproject.tomlRepository: ishandhanani/srt-slurm
Length of output: 85
🏁 Script executed:
# Look for any optional-dependencies section more completely
cat pyproject.toml | grep -B5 -A20 "optional-dependencies"Repository: ishandhanani/srt-slurm
Length of output: 48
🏁 Script executed:
# Verify if parsers module exists and what it imports
head -50 analysis/srtlog/parsers/__init__.py 2>/dev/null || echo "Check if parsers is a directory"Repository: ishandhanani/srt-slurm
Length of output: 1465
🏁 Script executed:
# Check if analysis is properly included as a package in pyproject.toml
grep -A 10 "packages\|include" pyproject.toml | head -20Repository: ishandhanani/srt-slurm
Length of output: 426
🏁 Script executed:
# Verify the full parsers __init__ to see how get_node_parser works
cat analysis/srtlog/parsers/__init__.pyRepository: ishandhanani/srt-slurm
Length of output: 7682
🏁 Script executed:
# Check if there are any try-except guards around pandas imports
rg -n "try:|except ImportError|importorskip" analysis/srtlog/log_parser.py -B 2 -A 2Repository: ishandhanani/srt-slurm
Length of output: 188
🏁 Script executed:
# Check if analysis has its own package configuration
ls -la analysis/setup.py analysis/pyproject.toml 2>&1Repository: ishandhanani/srt-slurm
Length of output: 201
🏁 Script executed:
# Check how analysis module is meant to be installed/used
cat analysis/__init__.pyRepository: ishandhanani/srt-slurm
Length of output: 235
🏁 Script executed:
# Verify pandas is imported unconditionally at module level in log_parser
sed -n '1,30p' analysis/srtlog/log_parser.py | grep -n "import\|pandas"Repository: ishandhanani/srt-slurm
Length of output: 172
🏁 Script executed:
# Check if test currently passes or fails by looking for CI configuration
find . -name "*.yml" -o -name "*.yaml" | head -10 | xargs grep -l "test_rollup_with_node_logs" 2>/dev/null || echo "No CI config found mentioning the test"Repository: ishandhanani/srt-slurm
Length of output: 104
🏁 Script executed:
# Check if pandas is conditionally imported anywhere in analysis/srtlog
rg -n "try:|except.*ImportError" analysis/srtlog/ -B 1 -A 3Repository: ishandhanani/srt-slurm
Length of output: 9916
🏁 Script executed:
# Verify what will happen when the test runs without pandas
# Check the full import chain: parsers -> nodes -> imports
head -50 analysis/srtlog/parsers/nodes.py 2>/dev/null | grep -n "import\|pandas"Repository: ishandhanani/srt-slurm
Length of output: 48
🏁 Script executed:
# Check if there's a way test could run successfully without analysis module
rg -n "import pytest" tests/test_rollup.py | head -5Repository: ishandhanani/srt-slurm
Length of output: 81
🏁 Script executed:
# Check which parsers are imported when parsers/__init__.py is loaded
sed -n '1,20p' analysis/srtlog/parsers/benchmark/__init__.py 2>/dev/null || echo "Check parsers submodules"
🏁 Script executed:
# Verify if analysis module is included in the package
grep -r "analysis" pyproject.toml
🏁 Script executed:
# Try to understand the actual import error that would occur
python3 -c "import sys; sys.path.insert(0, '.'); from analysis.srtlog.parsers import get_node_parser" 2>&1 | head -20
🏁 Script executed:
# Verify if pytest.importorskip would catch the pandas import error
python3 << 'EOF'
import sys
sys.path.insert(0, '.')
# Simulate what pytest.importorskip would do
try:
__import__("analysis.srtlog.parsers")
print("Import successful")
except ImportError as e:
print(f"ImportError caught: {e}")
EOF
🏁 Script executed:
# Check the exact location of the test to confirm line numbers
grep -n "def test_rollup_with_node_logs" tests/test_rollup.py
🏁 Script executed:
# Verify if analysis should be optional or required
git log --oneline analysis/srtlog/__init__.py 2>/dev/null | head -5 || echo "Git history not available"
Test requires pandas dependency which is not declared.
The test calls orchestrator.run_rollup(), which attempts to import analysis.srtlog.parsers. This import chain loads analysis.srtlog modules that unconditionally import pandas (e.g., config_reader.py line 10), causing an ImportError when pandas is unavailable. The exception is caught silently in run_rollup() (line 611-613), causing nodes_summary to be None and subsequent assertions to fail.
Add the import guard or declare pandas as a dependency:
Import guard approach
 def test_rollup_with_node_logs(self, tmp_path):
     """Test rollup with actual node log files parsed by NodeAnalyzer."""
+    pytest.importorskip("pandas")
     from dataclasses import dataclass, field

🧰 Tools
🪛 GitHub Actions: CI
[error] 917-917: Rollup test failure: node metrics unavailable due to missing pandas backend / analysis.srtlog.parsers not available, causing assertions like 'data["nodes_summary"] is not None' to fail.
🤖 Prompt for AI Agents
In `@tests/test_rollup.py` around lines 800 - 938, The test fails because
analysis.srtlog modules import pandas unconditionally (e.g., in
analysis.srtlog.config_reader) causing ImportError to be swallowed by
RollupStageMixin.run_rollup; update the pandas import to be guarded (wrap
"import pandas as pd" in a try/except ImportError and set pd = None) and ensure
functions in analysis.srtlog.config_reader and any parsers check pd is not None
before using it and raise a clear error if pandas is required, and also modify
RollupStageMixin.run_rollup to not silently ignore ImportError (detect
ImportError from analysis.srtlog imports and re-raise or surface a clear
message) so tests fail fast if pandas is genuinely needed.
Actionable comments posted: 8
🤖 Fix all issues with AI agents
In `@pyproject.toml`:
- Line 19: Move "pandas>=2.3.3" out of the core dependencies in pyproject.toml
and add it under an optional extras group named "analysis" by creating a
[project.optional-dependencies] section with analysis = ["pandas>=2.3.3"];
update any packaging metadata that reads dependencies to ensure pandas is no
longer listed as a required dependency for the CLI entry points (srtctl,
srtctl-i, srtctl-sweep, srtctl-setup-head), and add a short note to the README
instructing users who need the analysis/benchmark scripts to install with pip
install srtctl[analysis].
In `@src/srtctl/benchmarks/scripts/longbenchv2/bench.sh`:
- Around line 31-43: The current string-based variable "command" is unsafe
because NUM_EXAMPLES and CATEGORIES can contain spaces/shell tokens and you use
eval; instead build the invocation as an array (e.g., an array variable such as
cmd) starting with the python module and required args (ENDPOINT, MODEL_NAME,
MAX_TOKENS, MAX_CONTEXT_LENGTH, NUM_THREADS) and conditionally append
NUM_EXAMPLES and CATEGORIES as separate array elements only when non-empty;
finally echo a joined representation for logging and run the command using array
execution (array expansion) rather than eval to avoid word-splitting and shell
injection.
In `@src/srtctl/benchmarks/scripts/mooncake-router/bench.sh`:
- Around line 60-62: The script builds user-controlled shell strings into the
variable named command and then runs them with eval (e.g., the lines setting
command and eval $command and the similar block around lines 79-81); replace
that pattern with a safe array-style invocation: construct the command as an
array where each token is a separate element and interpolate variables as quoted
expansions (e.g., cmd=(aiperf profile -m "$MODEL_NAME" --url "$ENDPOINT"
--streaming --ui simple --concurrency 10 --request-count 20)) and then execute
with "${cmd[@]}", applying the same change to the second command block so spaces
and injections are prevented.
In `@src/srtctl/benchmarks/scripts/profiling/profile.sh`:
- Around line 133-136: The script builds a shell command string in the variable
named command and then runs it with eval, which is unsafe for env-provided
PROFILE_* and PROFILING_MODE values; instead construct the argument list as an
array (e.g., cmd=(python3 -m sglang.bench_serving --backend sglang --model
"$model_name" ...)) and run it with "${cmd[@]}" to avoid word-splitting and
shell injection; use printf or echo to log the command safely (e.g., printf '%q
' "${cmd[@]}"; printf '\n') before executing, and apply the same array+printf
replacement for the other eval usages noted around the block that covers lines
142-145.
In `@src/srtctl/benchmarks/scripts/router/bench.sh`:
- Around line 42-45: The script currently builds a single string in the variable
command and uses eval, which permits shell metacharacters in user-controlled
variables (PREFIX_RATIOS, ISL, OSL, REQUESTS, CONCURRENCY) to be interpreted;
replace this with a bash array for safe argument handling: construct an array
named cmd starting with the interpreter and script (e.g., python
prefix_ratio_benchmark.py) and push each flag and its value as separate elements
using quoted variable expansions (e.g., "--prefix-ratios" "$PREFIX_RATIOS"),
then execute the command with "${cmd[@]}" and remove the eval and the SC2086
disable. Ensure result_dir is passed as the value of "--output-dir" similarly
and keep the echo of the command by joining the array for logging if needed.
In `@src/srtctl/benchmarks/scripts/sa-bench/bench.sh`:
- Around line 53-58: The loop builds a shell command string and uses eval which
is unsafe for untrusted values (MODEL_NAME, HOST, concurrency); instead
construct the command as an array and execute it without eval: replace the
string-assigned command and eval with an array like cmd=(python3 -u
"${WORK_DIR}/benchmark_serving.py" --model "${MODEL_NAME}" --tokenizer
"${MODEL_PATH}" --host "${HOST}" --port "${PORT}" --backend dynamo --endpoint
/v1/completions --disable-tqdm --dataset-name random --num-prompts
"${num_prompts}" --random-input-len "${ISL}" --random-output-len "${OSL}"
--random-range-ratio 0.8 --ignore-eos --request-rate 250 --percentile-metrics
ttft,tpot,itl,e2el --max-concurrency "${concurrency}") and run it via
"${cmd[@]}", ensuring every variable is quoted; apply the same replacement for
the similar block referenced at lines 77-80.
In `@src/srtctl/cli/mixins/rollup_stage.py`:
- Around line 20-26: Tests import dataclasses like NodeRollup and NodesSummary
from rollup_stage but this module only imports them without re-exporting; update
rollup_stage to re-export the needed dataclasses (e.g., NodeRollup,
NodesSummary, EnvironmentConfig, LaunchCommandRollup, RollupResult,
RollupSummary) by importing them from srtctl.cli.mixins.rollup and exposing them
in the module namespace (add explicit imports and/or populate __all__ with these
symbol names) so tests can import them from rollup_stage.
In `@tests/test_rollup.py`:
- Around line 11-17: The tests import NodeRollup (and related classes) from
rollup_stage but NodeRollup isn't re-exported there; update the import to pull
NodeRollup, NodesSummary, RollupResult, RollupSummary directly from the rollup
module (or alternatively add re-export statements in rollup_stage to export
those symbols) so the import in tests/test_rollup.py resolves; target the
identifiers NodeRollup, NodesSummary, RollupResult, RollupSummary and keep
RollupStageMixin import from rollup_stage if it is defined there.
♻️ Duplicate comments (6)
analysis/srtlog/parsers/nodes/sglang.py (1)
98-100: Guard against UnicodeDecodeError when reading logs.

If logs contain non-UTF8 bytes, open() can throw and drop all metrics. Use explicit encoding with errors="replace" to keep parsing resilient. Note: the TRTLLM parser at Line 97 uses log_path.read_text(errors="replace"), which handles this correctly.

🔧 Proposed fix

-        with open(log_path) as f:
+        with open(log_path, encoding="utf-8", errors="replace") as f:

src/srtctl/README.md (1)
144-158: Add language to fenced block for markdownlint compliance (MD040).

The ASCII diagram block is missing a language tag.

🔧 Proposed fix

-```
+```text
 ┌─────────────────────────────────────────────────────────────────────────────┐

docs/architecture.md (1)
822-826: Add language tag to fenced block for MD040 compliance.

The ASCII architecture diagram block is missing a language tag.

🔧 Proposed fix

-```
+```text
 ┌─────────────────────────────────────────────────────────────────────────────────────┐

tests/test_rollup.py (1)
800-938: Add pytest.importorskip("pandas") guard for pandas-dependent test.

This test exercises run_rollup(), which may import analysis.srtlog modules that depend on pandas. Without the guard, the test will fail silently (assertions fail because nodes_summary is None) rather than being skipped when pandas is unavailable.

🔧 Proposed fix

 def test_rollup_with_node_logs(self, tmp_path):
     """Test rollup with actual node log files parsed by NodeAnalyzer."""
+    pytest.importorskip("pandas")
     from dataclasses import dataclass, field

src/srtctl/cli/mixins/rollup_stage.py (1)
49-52: Fix endpoints property that always returns None.

The property body ... (ellipsis) causes it to implicitly return None, violating the list[Endpoint] return type and causing the pipeline type error.

🔧 Proposed fix

 @property
 def endpoints(self) -> list["Endpoint"]:
     """Endpoint allocation topology."""
-    ...
+    raise NotImplementedError("Subclass must provide endpoints property")
129-149: Fix type errors for optional parser methods.

The type checker reports call-non-callable because find_aiperf_results and parse_result_directory are not defined in the protocol. Use getattr with a callable check to satisfy the type checker.

🔧 Proposed fix

 # Try parser-specific result collection first
 if parser is not None:
     # For mooncake-router, look for AIPerf results
-    if hasattr(parser, "find_aiperf_results"):
-        aiperf_files = parser.find_aiperf_results(self.runtime.log_dir)
+    find_aiperf = getattr(parser, "find_aiperf_results", None)
+    if callable(find_aiperf):
+        aiperf_files = find_aiperf(self.runtime.log_dir)
         for aiperf_path in aiperf_files:
             result = parser.parse_result_json(aiperf_path)
             if result.get("output_tps") is not None:
                 results.append(result)
                 logger.debug("Loaded AIPerf result: %s", aiperf_path)
     # For sa-bench style, look for result directories
-    if hasattr(parser, "parse_result_directory"):
+    parse_result_dir = getattr(parser, "parse_result_directory", None)
+    if callable(parse_result_dir):
         for entry in self.runtime.log_dir.iterdir():
             if not entry.is_dir():
                 continue
             # Match patterns like sa-bench_isl_X_osl_Y
             if "_isl_" in entry.name and "_osl_" in entry.name:
                 logger.debug("Found benchmark results directory: %s", entry.name)
-                dir_results = parser.parse_result_directory(entry)
+                dir_results = parse_result_dir(entry)
                 results.extend(dir_results)
🧹 Nitpick comments (11)
src/srtctl/benchmarks/scripts/mmlu/bench.sh (1)
29-31: Prefer direct execution over eval, or quote the variable.

Using eval $command without quotes can cause word-splitting issues if any variable (especially ENDPOINT) contains spaces or shell metacharacters. Two options:

- Preferred: Execute directly without eval:

  echo "[CMD] python3 -m sglang.test.run_eval --base-url ${ENDPOINT} ..."
  python3 -m sglang.test.run_eval --base-url "${ENDPOINT}" --model "${MODEL_NAME}" \
      --eval-name mmlu --num-examples "${NUM_EXAMPLES}" --max-tokens "${MAX_TOKENS}" \
      --repeat "${REPEAT}" --num-threads "${NUM_THREADS}"

- Minimal fix: Quote the eval argument:

  -eval $command
  +eval "$command"

src/srtctl/benchmarks/scripts/gpqa/bench.sh (1)
29-31: Quote the command variable in eval for robustness.

While the current variables appear safe, unquoted $command in eval can cause word-splitting issues if any variable (e.g., MODEL_NAME) contains spaces or special characters.

🔧 Suggested fix

 command="python3 -m sglang.test.run_eval --base-url ${ENDPOINT} --model ${MODEL_NAME} --eval-name gpqa --num-examples ${NUM_EXAMPLES} --max-tokens ${MAX_TOKENS} --repeat ${REPEAT} --num-threads ${NUM_THREADS}"
 echo "[CMD] $command"
-eval $command
+eval "$command"

analysis/srtlog/parsers/benchmark/sa_bench.py (1)
101-103: Add explicit encoding when reading JSON files.

For consistency with other parsers and to handle potential encoding issues, specify the encoding explicitly.

🔧 Suggested fix

-    with open(json_path) as f:
+    with open(json_path, encoding="utf-8") as f:

analysis/srtlog/parsers/benchmark/mooncake_router.py (1)
92-94: Add explicit encoding when reading JSON files.

For consistency and to handle potential encoding issues, specify the encoding explicitly.

🔧 Suggested fix

-    with open(json_path) as f:
+    with open(json_path, encoding="utf-8") as f:

analysis/srtlog/parsers/nodes/trtllm.py (1)
343-359: Consider extracting shared filename parsing logic.

The _extract_node_info_from_filename method is identical to the one in analysis/srtlog/parsers/nodes/sglang.py (lines 305-321). Consider extracting this to a shared utility to reduce duplication.

analysis/srtlog/rollup_harness.py (2)
27-28: Module-level basicConfig may affect global logging state.

Calling logging.basicConfig at module import time can interfere with the application's logging configuration if this module is imported as a library rather than run as a script. Consider moving this inside main() or using a conditional guard.

🔧 Suggested fix

-logging.basicConfig(level=logging.INFO, format="%(asctime)s - %(levelname)s - %(message)s")
 logger = logging.getLogger(__name__)
+
+
+def _setup_logging(verbose: bool = False) -> None:
+    """Configure logging for the harness."""
+    level = logging.DEBUG if verbose else logging.INFO
+    logging.basicConfig(level=level, format="%(asctime)s - %(levelname)s - %(message)s")

Then call _setup_logging(args.verbose) in main() before processing.
63-66: Add explicit encoding when reading YAML files.

For consistency and to handle potential encoding issues across different environments, specify the encoding explicitly.

🔧 Suggested fix

-    with open(config_path) as f:
+    with open(config_path, encoding="utf-8") as f:
         return yaml.safe_load(f)

Apply the same fix to lines 292 and 295.
src/srtctl/cli/mixins/rollup/models.py (3)
133-138: Misleading comment about kv_cache aggregation.

The comment mentions "(or sum for multiple allocations)" but the code only takes the max. Either update the comment to match the implementation or implement the sum logic if that's the intended behavior.

🔧 Proposed fix

 if mem.kv_cache_gb is not None:
-    # Take the max kv_cache seen (or sum for multiple allocations)
+    # Take the max kv_cache seen
     if rollup.kv_cache_gb is None:
         rollup.kv_cache_gb = mem.kv_cache_gb
     else:
         rollup.kv_cache_gb = max(rollup.kv_cache_gb, mem.kv_cache_gb)
236-308: Code duplication in aggregation methods.

_aggregate_agg_batches duplicates logic from _aggregate_prefill_batches and _aggregate_decode_batches. Consider refactoring to reuse the existing methods for better maintainability.

♻️ Optional refactor

 def _aggregate_agg_batches(self, batches: list) -> None:
     """Aggregate metrics for agg workers (handles both prefill and decode batches)."""
     # Separate prefill and decode batches
     prefill_batches = [b for b in batches if b.batch_type == "prefill"]
     decode_batches = [b for b in batches if b.batch_type == "decode"]
     self.total_prefill_batches = len(prefill_batches)
     self.total_decode_batches = len(decode_batches)
-    # Aggregate prefill metrics
+    # Reuse existing aggregation methods
     if prefill_batches:
-        new_tokens = []
-        cached_tokens = []
-        # ... 30+ lines of duplicated logic
-        ...
-
-    # Aggregate decode metrics
-    if decode_batches:
-        # ... 20+ lines of duplicated logic
-        ...
+        # Temporarily store current batch count
+        saved_total = self.total_prefill_batches
+        self._aggregate_prefill_batches(prefill_batches)
+        self.total_prefill_batches = saved_total
+
+    if decode_batches:
+        saved_total = self.total_decode_batches
+        self._aggregate_decode_batches(decode_batches)
+        self.total_decode_batches = saved_total
416-416: Consider documenting the mixed type for request_rate.

The type float | str | None is unusual. Consider adding a docstring to clarify when this is a float vs a string (e.g., numeric rate vs identifier like "c100").

src/srtctl/cli/mixins/rollup_stage.py (1)
99-101: Consider logging the full stack trace for debugging.

The broad exception catch logs only the error message. For debugging complex rollup failures, consider logging the full traceback at DEBUG level.

🔧 Proposed enhancement

 except Exception as e:
-    logger.error("Rollup failed: %s", e)
+    logger.error("Rollup failed: %s", e)
+    logger.debug("Rollup traceback:", exc_info=True)
     return None
 "requests>=2.31.0",
 "rich>=13.0.0",
 "questionary>=2.0.0",
+"pandas>=2.3.3",
🧩 Analysis chain
🏁 Script executed:
#!/bin/bash
# Find all pandas imports and their locations
rg -n "import pandas|from pandas" -S --type-list
🏁 Script executed:
#!/bin/bash
# Get a broader view of directory structure to understand src vs tests layout
git ls-files | head -30
🏁 Script executed:
#!/bin/bash
# Check the pyproject.toml structure to understand dependency grouping
cat -n pyproject.toml
🏁 Script executed:
#!/bin/bash
# Find all pandas imports
rg -n "import pandas|from pandas" --type py
🏁 Script executed:
#!/bin/bash
# Also check the analysis directory structure more carefully
find analysis -type f -name "*.py" | head -20
🏁 Script executed:
#!/bin/bash
# Check if pandas is imported in the CLI modules
rg -n "import pandas|from pandas" src/srtctl/cli/
🏁 Script executed:
#!/bin/bash
# Check the structure of src/srtctl to understand what is core
ls -la src/srtctl/
🏁 Script executed:
#!/bin/bash
# Check if there are any optional dependencies defined in pyproject.toml
grep -n "optional\|extras" pyproject.toml
Move pandas to optional dependencies; it is not required by the core CLI.
Pandas is only used in the analysis scripts and benchmark serving tools, not by any of the core CLI entry points (srtctl, srtctl-i, srtctl-sweep, srtctl-setup-head). Adding it to core dependencies unnecessarily increases install size for users who only need the CLI. Move it to an optional dependency group:
[project.optional-dependencies]
analysis = ["pandas>=2.3.3"]

Then document in the README that users running analysis tools should install with pip install srtctl[analysis].
🤖 Prompt for AI Agents
In `@pyproject.toml` at line 19, Move "pandas>=2.3.3" out of the core dependencies
in pyproject.toml and add it under an optional extras group named "analysis" by
creating a [project.optional-dependencies] section with analysis =
["pandas>=2.3.3"]; update any packaging metadata that reads dependencies to
ensure pandas is no longer listed as a required dependency for the CLI entry
points (srtctl, srtctl-i, srtctl-sweep, srtctl-setup-head), and add a short note
to the README instructing users who need the analysis/benchmark scripts to
install with pip install srtctl[analysis].
 command="python3 -m sglang.test.run_eval --base-url ${ENDPOINT} --model ${MODEL_NAME} --eval-name longbench_v2 --max-tokens ${MAX_TOKENS} --max-context-length ${MAX_CONTEXT_LENGTH} --num-threads ${NUM_THREADS}"

 # Add optional arguments
 if [ -n "$NUM_EXAMPLES" ]; then
-    cmd="$cmd --num-examples ${NUM_EXAMPLES}"
+    command="$command --num-examples ${NUM_EXAMPLES}"
 fi

 if [ -n "$CATEGORIES" ]; then
-    cmd="$cmd --categories ${CATEGORIES}"
+    command="$command --categories ${CATEGORIES}"
 fi

-echo "Executing: $cmd"
-eval "$cmd"
+echo "[CMD] $command"
+eval $command
Replace eval with an array-based command.
NUM_EXAMPLES/CATEGORIES may include spaces or shell tokens, and eval executes them.
✅ Safer, structured command building
-command="python3 -m sglang.test.run_eval --base-url ${ENDPOINT} --model ${MODEL_NAME} --eval-name longbench_v2 --max-tokens ${MAX_TOKENS} --max-context-length ${MAX_CONTEXT_LENGTH} --num-threads ${NUM_THREADS}"
+command=(
+ python3 -m sglang.test.run_eval --base-url "${ENDPOINT}" --model "${MODEL_NAME}"
+ --eval-name longbench_v2 --max-tokens "${MAX_TOKENS}"
+ --max-context-length "${MAX_CONTEXT_LENGTH}" --num-threads "${NUM_THREADS}"
+)
if [ -n "$NUM_EXAMPLES" ]; then
- command="$command --num-examples ${NUM_EXAMPLES}"
+ command+=(--num-examples "${NUM_EXAMPLES}")
fi
if [ -n "$CATEGORIES" ]; then
- command="$command --categories ${CATEGORIES}"
+ command+=(--categories "${CATEGORIES}")
fi
-echo "[CMD] $command"
-eval $command
+printf '[CMD] %q ' "${command[@]}"; echo
+"${command[@]}"🤖 Prompt for AI Agents
In `@src/srtctl/benchmarks/scripts/longbenchv2/bench.sh` around lines 31 - 43, The
current string-based variable "command" is unsafe because NUM_EXAMPLES and
CATEGORIES can contain spaces/shell tokens and you use eval; instead build the
invocation as an array (e.g., an array variable such as cmd) starting with the
python module and required args (ENDPOINT, MODEL_NAME, MAX_TOKENS,
MAX_CONTEXT_LENGTH, NUM_THREADS) and conditionally append NUM_EXAMPLES and
CATEGORIES as separate array elements only when non-empty; finally echo a joined
representation for logging and run the command using array execution (array
expansion) rather than eval to avoid word-splitting and shell injection.
command="aiperf profile -m ${MODEL_NAME} --url ${ENDPOINT} --streaming --ui simple --concurrency 10 --request-count 20"
echo "[CMD] $command"
eval $command
Avoid eval to prevent injection and quoting bugs.
MODEL_NAME, ENDPOINT, and thresholds are user-controlled; eval can mis-handle spaces or execute unintended tokens.
✅ Safer array-based invocation (apply to both commands)
-command="aiperf profile -m ${MODEL_NAME} --url ${ENDPOINT} --streaming --ui simple --concurrency 10 --request-count 20"
-echo "[CMD] $command"
-eval $command
+command=(
+ aiperf profile -m "${MODEL_NAME}" --url "${ENDPOINT}"
+ --streaming --ui simple --concurrency 10 --request-count 20
+)
+printf '[CMD] %q ' "${command[@]}"; echo
+"${command[@]}"
-command="aiperf profile -m ${MODEL_NAME} --input-file ${INPUT_FILE} --custom-dataset-type mooncake_trace --fixed-schedule --url ${ENDPOINT} --streaming --random-seed 42 --ui simple --artifact-dir ${RUN_ARTIFACT_DIR} --goodput \"time_to_first_token:${TTFT_THRESHOLD} inter_token_latency:${ITL_THRESHOLD}\""
-echo "[CMD] $command"
-eval $command
+command=(
+ aiperf profile -m "${MODEL_NAME}" --input-file "${INPUT_FILE}"
+ --custom-dataset-type mooncake_trace --fixed-schedule --url "${ENDPOINT}"
+ --streaming --random-seed 42 --ui simple --artifact-dir "${RUN_ARTIFACT_DIR}"
+ --goodput "time_to_first_token:${TTFT_THRESHOLD} inter_token_latency:${ITL_THRESHOLD}"
+)
+printf '[CMD] %q ' "${command[@]}"; echo
+"${command[@]}"

Also applies to: 79-81
🤖 Prompt for AI Agents
In `@src/srtctl/benchmarks/scripts/mooncake-router/bench.sh` around lines 60 - 62,
The script builds user-controlled shell strings into the variable named command
and then runs them with eval (e.g., the lines setting command and eval $command
and the similar block around lines 79-81); replace that pattern with a safe
array-style invocation: construct the command as an array where each token is a
separate element and interpolate variables as quoted expansions (e.g.,
cmd=(aiperf profile -m "$MODEL_NAME" --url "$ENDPOINT" --streaming --ui simple
--concurrency 10 --request-count 20)) and then execute with "${cmd[@]}",
applying the same change to the second command block so spaces and injections
are prevented.
command="python3 -m sglang.bench_serving --backend sglang --model ${model_name} --host ${head_node} --port ${head_port} --dataset-name random --max-concurrency ${PROFILE_CONCURRENCY} --num-prompts 128 --random-input-len ${PROFILE_ISL} --random-output-len ${PROFILE_OSL} --random-range-ratio 1 --warmup-request 0"
echo "[CMD] $command"
eval $command
Avoid eval; it’s unsafe with env-provided values.
PROFILING_MODE/PROFILE_* inputs can include spaces or shell tokens. Use array execution and keep the logging via printf.
✅ Safer execution without eval
-command="python3 -m sglang.bench_serving --backend sglang --model ${model_name} --host ${head_node} --port ${head_port} --dataset-name random --max-concurrency ${PROFILE_CONCURRENCY} --num-prompts 128 --random-input-len ${PROFILE_ISL} --random-output-len ${PROFILE_OSL} --random-range-ratio 1 --warmup-request 0"
-echo "[CMD] $command"
-eval $command
+command=(
+ python3 -m sglang.bench_serving --backend sglang --model "${model_name}"
+ --host "${head_node}" --port "${head_port}" --dataset-name random
+ --max-concurrency "${PROFILE_CONCURRENCY}" --num-prompts 128
+ --random-input-len "${PROFILE_ISL}" --random-output-len "${PROFILE_OSL}"
+ --random-range-ratio 1 --warmup-request 0
+)
+printf '[CMD] %q ' "${command[@]}"; echo
+"${command[@]}"
-command="python -m lm_eval --model local-completions --tasks gsm8k --model_args \"base_url=http://${head_node}:${head_port}/v1/completions,model=${model_name},tokenized_requests=False,tokenizer_backend=None,num_concurrent=${PROFILE_CONCURRENCY},timeout=6000,max_retries=1\" --limit 10"
-echo "[CMD] $command"
-eval $command
+command=(
+ python -m lm_eval --model local-completions --tasks gsm8k
+ --model_args "base_url=http://${head_node}:${head_port}/v1/completions,model=${model_name},tokenized_requests=False,tokenizer_backend=None,num_concurrent=${PROFILE_CONCURRENCY},timeout=6000,max_retries=1"
+ --limit 10
+)
+printf '[CMD] %q ' "${command[@]}"; echo
+"${command[@]}"

Also applies to: 142-145
🤖 Prompt for AI Agents
In `@src/srtctl/benchmarks/scripts/profiling/profile.sh` around lines 133 - 136,
The script builds a shell command string in the variable named command and then
runs it with eval, which is unsafe for env-provided PROFILE_* and PROFILING_MODE
values; instead construct the argument list as an array (e.g., cmd=(python3 -m
sglang.bench_serving --backend sglang --model "$model_name" ...)) and run it
with "${cmd[@]}" to avoid word-splitting and shell injection; use printf or echo
to log the command safely (e.g., printf '%q ' "${cmd[@]}"; printf '\n') before
executing, and apply the same array+printf replacement for the other eval usages
noted around the block that covers lines 142-145.
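The printf '%q ' logging pattern used throughout these suggestions can be sketched on its own (the command and values here are stand-ins, not the real benchmark tool): each array element is printed shell-quoted so the logged line is copy-paste safe, and the array is then executed directly without eval.

```shell
#!/usr/bin/env bash
# Illustrative argv array; printf stands in for the real benchmark command.
command=(printf '%s\n' --model "model name with spaces")

# Log each element shell-quoted, then execute the array directly.
log="$(printf '%q ' "${command[@]}")"
echo "[CMD] $log"
out="$("${command[@]}")"
echo "$out"
```

The logged line preserves quoting for the multi-word value, while the direct execution passes it as a single argument.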
-# shellcheck disable=SC2086
-python prefix_ratio_benchmark.py \
-    --prefix-ratios $PREFIX_RATIOS \
-    --isl "$ISL" \
-    --osl "$OSL" \
-    --requests "$REQUESTS" \
-    --concurrency "$CONCURRENCY" \
-    --output-dir "$result_dir"
+command="python prefix_ratio_benchmark.py --prefix-ratios $PREFIX_RATIOS --isl $ISL --osl $OSL --requests $REQUESTS --concurrency $CONCURRENCY --output-dir $result_dir"
+echo "[CMD] $command"
+eval $command
Avoid eval with user-controlled arguments.
Using eval here allows shell metacharacters in PREFIX_RATIOS/ENDPOINT to execute. A bash array keeps proper argument boundaries and is safer.
✅ Safer execution without eval
-# shellcheck disable=SC2086
-command="python prefix_ratio_benchmark.py --prefix-ratios $PREFIX_RATIOS --isl $ISL --osl $OSL --requests $REQUESTS --concurrency $CONCURRENCY --output-dir $result_dir"
-echo "[CMD] $command"
-eval $command
+read -r -a prefix_ratio_args <<< "$PREFIX_RATIOS"
+command=(
+ python prefix_ratio_benchmark.py
+ --prefix-ratios "${prefix_ratio_args[@]}"
+ --isl "$ISL"
+ --osl "$OSL"
+ --requests "$REQUESTS"
+ --concurrency "$CONCURRENCY"
+ --output-dir "$result_dir"
+)
+printf '[CMD] %q ' "${command[@]}"; echo
+"${command[@]}"

📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
-# shellcheck disable=SC2086
-python prefix_ratio_benchmark.py \
-    --prefix-ratios $PREFIX_RATIOS \
-    --isl "$ISL" \
-    --osl "$OSL" \
-    --requests "$REQUESTS" \
-    --concurrency "$CONCURRENCY" \
-    --output-dir "$result_dir"
-command="python prefix_ratio_benchmark.py --prefix-ratios $PREFIX_RATIOS --isl $ISL --osl $OSL --requests $REQUESTS --concurrency $CONCURRENCY --output-dir $result_dir"
-echo "[CMD] $command"
-eval $command
+read -r -a prefix_ratio_args <<< "$PREFIX_RATIOS"
+command=(
+    python prefix_ratio_benchmark.py
+    --prefix-ratios "${prefix_ratio_args[@]}"
+    --isl "$ISL"
+    --osl "$OSL"
+    --requests "$REQUESTS"
+    --concurrency "$CONCURRENCY"
+    --output-dir "$result_dir"
+)
+printf '[CMD] %q ' "${command[@]}"; echo
+"${command[@]}"
🤖 Prompt for AI Agents
In `@src/srtctl/benchmarks/scripts/router/bench.sh` around lines 42 - 45, The
script currently builds a single string in the variable command and uses eval,
which permits shell metacharacters in user-controlled variables (PREFIX_RATIOS,
ISL, OSL, REQUESTS, CONCURRENCY) to be interpreted; replace this with a bash
array for safe argument handling: construct an array named cmd starting with the
interpreter and script (e.g., python prefix_ratio_benchmark.py) and push each
flag and its value as separate elements using quoted variable expansions (e.g.,
"--prefix-ratios" "$PREFIX_RATIOS"), then execute the command with "${cmd[@]}"
and remove the eval and the SC2086 disable. Ensure result_dir is passed as the
value of "--output-dir" similarly and keep the echo of the command by joining
the array for logging if needed.
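Since PREFIX_RATIOS is deliberately a space-separated list, the read -r -a splitting step from the suggestion can be tried in isolation (the ratio values below are made up):

```shell
#!/usr/bin/env bash
# Made-up ratio list; read -a splits it on whitespace into array elements.
PREFIX_RATIOS="0.1 0.5 0.9"
read -r -a ratios <<< "$PREFIX_RATIOS"

# Each ratio is now a separate, individually quotable argument.
echo "count=${#ratios[@]}"
printf 'ratio: %s\n' "${ratios[@]}"
```

This keeps the intentional splitting in one explicit place while every other expansion in the command stays quoted.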
command="python3 -u ${WORK_DIR}/benchmark_serving.py --model ${MODEL_NAME} --tokenizer ${MODEL_PATH} --host $HOST --port $PORT --backend dynamo --endpoint /v1/completions --disable-tqdm --dataset-name random --num-prompts $num_prompts --random-input-len $ISL --random-output-len $OSL --random-range-ratio 0.8 --ignore-eos --request-rate 250 --percentile-metrics ttft,tpot,itl,e2el --max-concurrency $concurrency"

echo "[CMD] $command"
eval $command
done
Avoid eval in benchmark loops.
MODEL_NAME, HOST, and concurrency-derived values are user-controlled; eval is unsafe and brittle.
✅ Safer command execution in loops
-command="python3 -u ${WORK_DIR}/benchmark_serving.py --model ${MODEL_NAME} --tokenizer ${MODEL_PATH} --host $HOST --port $PORT --backend dynamo --endpoint /v1/completions --disable-tqdm --dataset-name random --num-prompts $num_prompts --random-input-len $ISL --random-output-len $OSL --random-range-ratio 0.8 --ignore-eos --request-rate 250 --percentile-metrics ttft,tpot,itl,e2el --max-concurrency $concurrency"
-
-echo "[CMD] $command"
-eval $command
+command=(
+ python3 -u "${WORK_DIR}/benchmark_serving.py"
+ --model "${MODEL_NAME}" --tokenizer "${MODEL_PATH}"
+ --host "$HOST" --port "$PORT" --backend dynamo --endpoint /v1/completions
+ --disable-tqdm --dataset-name random --num-prompts "$num_prompts"
+ --random-input-len "$ISL" --random-output-len "$OSL"
+ --random-range-ratio 0.8 --ignore-eos --request-rate 250
+ --percentile-metrics ttft,tpot,itl,e2el --max-concurrency "$concurrency"
+)
+printf '[CMD] %q ' "${command[@]}"; echo
+"${command[@]}"
-command="python3 -u ${WORK_DIR}/benchmark_serving.py --model ${MODEL_NAME} --tokenizer ${MODEL_PATH} --host $HOST --port $PORT --backend dynamo --endpoint /v1/completions --disable-tqdm --dataset-name random --num-prompts $num_prompts --random-input-len $ISL --random-output-len $OSL --random-range-ratio 0.8 --ignore-eos --request-rate ${REQ_RATE} --percentile-metrics ttft,tpot,itl,e2el --max-concurrency $concurrency --use-chat-template --save-result --result-dir $result_dir --result-filename $result_filename"
-
-echo "[CMD] $command"
-eval $command
+command=(
+ python3 -u "${WORK_DIR}/benchmark_serving.py"
+ --model "${MODEL_NAME}" --tokenizer "${MODEL_PATH}"
+ --host "$HOST" --port "$PORT" --backend dynamo --endpoint /v1/completions
+ --disable-tqdm --dataset-name random --num-prompts "$num_prompts"
+ --random-input-len "$ISL" --random-output-len "$OSL"
+ --random-range-ratio 0.8 --ignore-eos --request-rate "${REQ_RATE}"
+ --percentile-metrics ttft,tpot,itl,e2el --max-concurrency "$concurrency"
+ --use-chat-template --save-result --result-dir "$result_dir"
+ --result-filename "$result_filename"
+)
+printf '[CMD] %q ' "${command[@]}"; echo
+"${command[@]}"Also applies to: 77-80
🤖 Prompt for AI Agents
In `@src/srtctl/benchmarks/scripts/sa-bench/bench.sh` around lines 53 - 58, The
loop builds a shell command string and uses eval which is unsafe for untrusted
values (MODEL_NAME, HOST, concurrency); instead construct the command as an
array and execute it without eval: replace the string-assigned command and eval
with an array like cmd=(python3 -u "${WORK_DIR}/benchmark_serving.py" --model
"${MODEL_NAME}" --tokenizer "${MODEL_PATH}" --host "${HOST}" --port "${PORT}"
--backend dynamo --endpoint /v1/completions --disable-tqdm --dataset-name random
--num-prompts "${num_prompts}" --random-input-len "${ISL}" --random-output-len
"${OSL}" --random-range-ratio 0.8 --ignore-eos --request-rate 250
--percentile-metrics ttft,tpot,itl,e2el --max-concurrency "${concurrency}") and
run it via "${cmd[@]}", ensuring every variable is quoted; apply the same
replacement for the similar block referenced at lines 77-80.
from srtctl.cli.mixins.rollup import (
    EnvironmentConfig,
    LaunchCommandRollup,
    NodesSummary,
    RollupResult,
    RollupSummary,
)
🛠️ Refactor suggestion | 🟠 Major
Consider re-exporting dataclasses for easier imports.
The tests attempt to import NodeRollup, NodesSummary, etc. from rollup_stage, but these are only imported here and not re-exported. Either update tests to import from srtctl.cli.mixins.rollup directly, or add re-exports here.
🔧 Option: Add re-exports
from srtctl.cli.mixins.rollup import (
EnvironmentConfig,
LaunchCommandRollup,
+ NodeRollup,
NodesSummary,
RollupResult,
RollupSummary,
)
+
+__all__ = [
+ "RollupStageMixin",
+ "EnvironmentConfig",
+ "LaunchCommandRollup",
+ "NodeRollup",
+ "NodesSummary",
+ "RollupResult",
+ "RollupSummary",
+]🤖 Prompt for AI Agents
In `@src/srtctl/cli/mixins/rollup_stage.py` around lines 20 - 26, Tests import
dataclasses like NodeRollup and NodesSummary from rollup_stage but this module
only imports them without re-exporting; update rollup_stage to re-export the
needed dataclasses (e.g., NodeRollup, NodesSummary, EnvironmentConfig,
LaunchCommandRollup, RollupResult, RollupSummary) by importing them from
srtctl.cli.mixins.rollup and exposing them in the module namespace (add explicit
imports and/or populate __all__ with these symbol names) so tests can import
them from rollup_stage.
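The re-export pattern the reviewer asks for can be shown with stand-in modules; the class and module names below are hypothetical placeholders mirroring srtctl's, not its real definitions:

```python
import types

# Minimal sketch of the re-export pattern, assuming hypothetical modules:
# "rollup" defines the dataclasses; "rollup_stage" imports and re-exports them.
class NodeRollup: ...
class NodesSummary: ...

rollup = types.SimpleNamespace(NodeRollup=NodeRollup, NodesSummary=NodesSummary)
rollup_stage = types.SimpleNamespace(
    NodeRollup=rollup.NodeRollup,
    NodesSummary=rollup.NodesSummary,
    __all__=["NodeRollup", "NodesSummary"],  # documents the re-exported surface
)

# Both import paths now resolve to the same class object.
assert rollup_stage.NodeRollup is rollup.NodeRollup
```

Because the re-export is the same object, tests importing from either path stay in sync; `__all__` makes the intended public surface explicit.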
from srtctl.cli.mixins.rollup_stage import (
    NodeRollup,
    NodesSummary,
    RollupResult,
    RollupStageMixin,
    RollupSummary,
)
Fix import path - NodeRollup is not exported from rollup_stage.
The pipeline fails with ImportError: cannot import name 'NodeRollup' from 'srtctl.cli.mixins.rollup_stage'. The rollup_stage.py module imports these classes from srtctl.cli.mixins.rollup but doesn't re-export them. Either import directly from the rollup submodule or add re-exports to rollup_stage.py.
🔧 Proposed fix - import from correct module
-from srtctl.cli.mixins.rollup_stage import (
- NodeRollup,
- NodesSummary,
- RollupResult,
- RollupStageMixin,
- RollupSummary,
-)
+from srtctl.cli.mixins.rollup import (
+ NodeRollup,
+ NodesSummary,
+ RollupResult,
+ RollupSummary,
+)
+from srtctl.cli.mixins.rollup_stage import RollupStageMixin🧰 Tools
🪛 GitHub Actions: CI
[error] 11-11: ImportError: cannot import name 'NodeRollup' from 'srtctl.cli.mixins.rollup_stage' during test collection. This caused pytest to fail with exit code 2.
🤖 Prompt for AI Agents
In `@tests/test_rollup.py` around lines 11 - 17, The tests import NodeRollup (and
related classes) from rollup_stage but NodeRollup isn't re-exported there;
update the import to pull NodeRollup, NodesSummary, RollupResult, RollupSummary
directly from the rollup module (or alternatively add re-export statements in
rollup_stage to export those symbols) so the import in tests/test_rollup.py
resolves; target the identifiers NodeRollup, NodesSummary, RollupResult,
RollupSummary and keep RollupStageMixin import from rollup_stage if it is
defined there.
try:
    content = benchmark_out_path.read_text()
AIperf provides a profile export json. We should be using that instead of regex
from pathlib import Path
from typing import Any

logging.basicConfig(level=logging.INFO, format="%(asctime)s - %(levelname)s - %(message)s")
The srtslurm.yaml contains the root of the repo. Not sure if you need it but might be helpful
Actionable comments posted: 4
🤖 Fix all issues with AI agents
In `@analysis/srtlog/rollup_harness.py`:
- Around line 349-353: The code currently uses
"output_tps=data.get('output_throughput', 0) or data.get('output_tps', 0)" which
treats a valid 0 as missing; update the RollupResult construction to preserve
explicit zeroes by checking for None instead of falsy values — e.g. compute a
value = data.get('output_throughput') if data.get('output_throughput') is not
None else data.get('output_tps') if data.get('output_tps') is not None else 0,
and pass that value as output_tps (similarly use explicit None checks for any
other throughput fields) so RollupResult sees bona fide 0s rather than
overwriting them.
- Around line 31-47: The current sort key in find_job_directories mixes int and
str values causing a TypeError when directory names include both numeric and
non-numeric names; update the job_dirs.sort key to return a homogeneous tuple
that orders numeric directories before (or after) non-numeric ones — e.g.,
return (0, int(p.name)) for purely numeric names and (1, p.name) for others — so
all keys share the same types and provide stable ordering; modify the lambda
used in job_dirs.sort accordingly.
In `@src/srtctl/cli/mixins/rollup_stage.py`:
- Around line 464-478: The _get_parser_order function's mapping for "sglang"
lacks the recommended fallback to "sglang-v2"; update the parser_orders dict in
_get_parser_order(backend_type: str) to return ["sglang", "sglang-v2"] for the
"sglang" key (so it will try the primary parser then the v2 fallback), and
update the function docstring to mention the sglang-v2 fallback order
accordingly.
In `@src/srtctl/cli/mixins/rollup/models.py`:
- Around line 370-407: The aggregation logic drops valid zero values because it
filters metrics with truthy checks (e.g., "if n.avg_input_throughput") — update
_compute_aggregate_stats to treat 0 as a valid value by using explicit None
checks. For each throughput/metric list (avg_input_throughput,
max_input_throughput, avg_gen_throughput, max_gen_throughput, kv_cache_gb)
change comprehensions to include values where the field is not None (e.g.,
[n.avg_input_throughput for n in prefill_capable_nodes if n.avg_input_throughput
is not None]) and compute averages using that list length so zeros contribute;
likewise ensure total_prefill_tokens/total_cached_tokens and
overall_cache_hit_rate are set even when totals are zero (use explicit
comparisons to None rather than truthiness).
♻️ Duplicate comments (3)
analysis/srtlog/rollup_harness.py (1)
205-215: Guard optional parser helpers before calling. Parsers that don't implement `parse_result_directory` will raise `AttributeError` and skip other parsing.
🛠️ Proposed fix
-        for entry in logs_dir.iterdir():
+        parse_result_dir = getattr(parser, "parse_result_directory", None)
+        for entry in logs_dir.iterdir():
             if entry.is_dir() and "_isl_" in entry.name and "_osl_" in entry.name:
-                dir_results = parser.parse_result_directory(entry)
-                results.extend(dir_results)
+                if callable(parse_result_dir):
+                    dir_results = parse_result_dir(entry)
+                    results.extend(dir_results)
src/srtctl/cli/mixins/rollup_stage.py (2)
49-52: Implement the abstract `endpoints` property. The ellipsis body returns `None` and violates the declared return type.
🛠️ Proposed fix
 @property
 def endpoints(self) -> list[Endpoint]:
     """Endpoint allocation topology."""
-    ...
+    raise NotImplementedError("Subclass must provide endpoints property")
129-149: Fix optional parser helper calls that aren't in the protocol. Type checking will still flag these (and they can be non-callable).
🛠️ Proposed fix
-        if hasattr(parser, "find_aiperf_results"):
-            aiperf_files = parser.find_aiperf_results(self.runtime.log_dir)
+        find_aiperf = getattr(parser, "find_aiperf_results", None)
+        if callable(find_aiperf):
+            aiperf_files = find_aiperf(self.runtime.log_dir)
             for aiperf_path in aiperf_files:
                 result = parser.parse_result_json(aiperf_path)
                 if result.get("output_tps") is not None:
                     results.append(result)
-        if hasattr(parser, "parse_result_directory"):
+        parse_result_dir = getattr(parser, "parse_result_directory", None)
+        if callable(parse_result_dir):
             for entry in self.runtime.log_dir.iterdir():
                 if entry.is_dir() and "_isl_" in entry.name and "_osl_" in entry.name:
-                    dir_results = parser.parse_result_directory(entry)
+                    dir_results = parse_result_dir(entry)
                     results.extend(dir_results)
🧹 Nitpick comments (2)
analysis/srtlog/parsers/nodes/trtllm.py (1)
105-133: Handle multi-line `Config(...)` dumps to avoid missing TP/PP/EP. If the config dump wraps lines, the current regex won't match and config stays empty.
🛠️ Proposed fix
-    config_match = re.search(r"Config\((.*?)\)", clean_content)
+    config_match = re.search(r"Config\((.*?)\)", clean_content, re.DOTALL)
analysis/srtlog/parsers/__init__.py (1)
188-193: Tighten registry typing for better static checks.
🛠️ Proposed fix
-_benchmark_parsers: dict[str, type] = {}
+_benchmark_parsers: dict[str, type[BenchmarkParserProtocol]] = {}
 # Registry for node parsers
-_node_parsers: dict[str, type] = {}
+_node_parsers: dict[str, type[NodeParserProtocol]] = {}
def find_job_directories(log_dir: Path) -> list[Path]:
    """Find all job directories by searching for sbatch_script.sh files.

    Args:
        log_dir: Root directory to search

    Returns:
        List of job directory paths (parent dirs of sbatch_script.sh)
    """
    job_dirs = []
    for sbatch_script in log_dir.rglob("sbatch_script.sh"):
        job_dir = sbatch_script.parent
        job_dirs.append(job_dir)

    # Sort by job ID (directory name) if numeric
    job_dirs.sort(key=lambda p: (int(p.name) if p.name.isdigit() else p.name))
    return job_dirs
Fix mixed-type sort keys to avoid TypeError.
job_dirs.sort(...) mixes int and str keys when both numeric and non-numeric dirs exist.
🛠️ Proposed fix
- job_dirs.sort(key=lambda p: (int(p.name) if p.name.isdigit() else p.name))
+ job_dirs.sort(key=lambda p: (0, int(p.name)) if p.name.isdigit() else (1, p.name))🤖 Prompt for AI Agents
In `@analysis/srtlog/rollup_harness.py` around lines 31 - 47, The current sort key
in find_job_directories mixes int and str values causing a TypeError when
directory names include both numeric and non-numeric names; update the
job_dirs.sort key to return a homogeneous tuple that orders numeric directories
before (or after) non-numeric ones — e.g., return (0, int(p.name)) for purely
numeric names and (1, p.name) for others — so all keys share the same types and
provide stable ordering; modify the lambda used in job_dirs.sort accordingly.
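The tuple-key trick can be demonstrated in isolation; the directory names below are invented for illustration:

```python
# Job directories may be numeric Slurm IDs or ad-hoc names. A key returning
# int for some and str for others raises TypeError during comparison; a
# homogeneous tuple key (group marker first) sorts them together safely.
names = ["12345", "baseline", "67", "adhoc"]

ordered = sorted(names, key=lambda n: (0, int(n)) if n.isdigit() else (1, n))

print(ordered)  # numeric IDs ascending first, then names lexicographically
```

Cross-group comparisons only ever look at the first tuple element (0 vs 1), so the heterogeneous second elements are never compared against each other.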
for data in results:
    result = RollupResult(
        concurrency=data.get("max_concurrency", 0),
        output_tps=data.get("output_throughput", 0) or data.get("output_tps", 0),
        total_tps=data.get("total_token_throughput"),
Avoid treating 0 throughput as “missing.”
`or` will override valid 0 values and can skew rollups.
🛠️ Proposed fix
- output_tps=data.get("output_throughput", 0) or data.get("output_tps", 0),
+ output_tps=(
+ data.get("output_throughput")
+ if data.get("output_throughput") is not None
+ else data.get("output_tps", 0)
+ ),📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
for data in results:
    result = RollupResult(
        concurrency=data.get("max_concurrency", 0),
        output_tps=(
            data.get("output_throughput")
            if data.get("output_throughput") is not None
            else data.get("output_tps", 0)
        ),
        total_tps=data.get("total_token_throughput"),
🤖 Prompt for AI Agents
In `@analysis/srtlog/rollup_harness.py` around lines 349 - 353, The code currently
uses "output_tps=data.get('output_throughput', 0) or data.get('output_tps', 0)"
which treats a valid 0 as missing; update the RollupResult construction to
preserve explicit zeroes by checking for None instead of falsy values — e.g.
compute a value = data.get('output_throughput') if data.get('output_throughput')
is not None else data.get('output_tps') if data.get('output_tps') is not None
else 0, and pass that value as output_tps (similarly use explicit None checks
for any other throughput fields) so RollupResult sees bona fide 0s rather than
overwriting them.
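The difference between the falsy-`or` fallback and explicit `None` checks is easy to see in a small helper (a sketch, not the PR's actual code):

```python
def pick_output_tps(data: dict) -> float:
    """Prefer output_throughput, fall back to output_tps, preserving zeroes."""
    # `or` would treat a genuine 0.0 as missing and fall through to the
    # fallback key; explicit None checks keep bona fide zeroes.
    if data.get("output_throughput") is not None:
        return data["output_throughput"]
    if data.get("output_tps") is not None:
        return data["output_tps"]
    return 0.0

assert pick_output_tps({"output_throughput": 0.0, "output_tps": 97.5}) == 0.0
assert pick_output_tps({"output_tps": 97.5}) == 97.5
# The falsy-`or` version would have returned 97.5 for the first case.
```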
def _get_parser_order(self, backend_type: str) -> list[str]:
    """Get the order of parsers to try for a given backend type.

    Args:
        backend_type: Backend type from config (e.g., "sglang", "trtllm")

    Returns:
        List of parser types to try in order
    """
    parser_orders = {
        "sglang": ["sglang"],
        "trtllm": ["trtllm"],
    }

    return parser_orders.get(backend_type, [backend_type])
Add fallback for sglang-v2 when backend is sglang.
The docstring mentions fallback order, but the mapping doesn’t include it.
🛠️ Proposed fix
- parser_orders = {
- "sglang": ["sglang"],
- "trtllm": ["trtllm"],
- }
+ parser_orders = {
+ "sglang": ["sglang", "sglang-v2"],
+ "sglang-v2": ["sglang-v2", "sglang"],
+ "trtllm": ["trtllm"],
+ }🤖 Prompt for AI Agents
In `@src/srtctl/cli/mixins/rollup_stage.py` around lines 464 - 478, The
_get_parser_order function's mapping for "sglang" lacks the recommended fallback
to "sglang-v2"; update the parser_orders dict in _get_parser_order(backend_type:
str) to return ["sglang", "sglang-v2"] for the "sglang" key (so it will try the
primary parser then the v2 fallback), and update the function docstring to
mention the sglang-v2 fallback order accordingly.
def _compute_aggregate_stats(self) -> None:
    """Compute aggregate statistics across all nodes."""
    # Prefill aggregation (includes both prefill and agg nodes)
    prefill_capable_nodes = [n for n in self.nodes if n.worker_type in ("prefill", "agg")]
    if prefill_capable_nodes:
        total_new = sum(n.total_new_tokens or 0 for n in prefill_capable_nodes)
        total_cached = sum(n.total_cached_tokens or 0 for n in prefill_capable_nodes)

        if total_new > 0 or total_cached > 0:
            self.total_prefill_tokens = total_new
            self.total_cached_tokens = total_cached
            total = total_new + total_cached
            if total > 0:
                self.overall_cache_hit_rate = (total_cached / total) * 100

        throughputs = [n.avg_input_throughput for n in prefill_capable_nodes if n.avg_input_throughput]
        if throughputs:
            self.avg_prefill_input_throughput = sum(throughputs) / len(throughputs)

        max_throughputs = [n.max_input_throughput for n in prefill_capable_nodes if n.max_input_throughput]
        if max_throughputs:
            self.max_prefill_input_throughput = max(max_throughputs)

    # Decode aggregation (includes both decode and agg nodes)
    decode_capable_nodes = [n for n in self.nodes if n.worker_type in ("decode", "agg")]
    if decode_capable_nodes:
        throughputs = [n.avg_gen_throughput for n in decode_capable_nodes if n.avg_gen_throughput]
        if throughputs:
            self.avg_decode_gen_throughput = sum(throughputs) / len(throughputs)

        max_throughputs = [n.max_gen_throughput for n in decode_capable_nodes if n.max_gen_throughput]
        if max_throughputs:
            self.max_decode_gen_throughput = max(max_throughputs)

    # Memory aggregation
    kv_caches = [n.kv_cache_gb for n in self.nodes if n.kv_cache_gb]
    if kv_caches:
        self.total_kv_cache_gb = sum(kv_caches)
Preserve legitimate zero values in aggregate stats.
Truthy checks drop valid 0 values (e.g., idle nodes), which inflates averages/maxes.
🛠️ Proposed fix
- throughputs = [n.avg_input_throughput for n in prefill_capable_nodes if n.avg_input_throughput]
+ throughputs = [n.avg_input_throughput for n in prefill_capable_nodes if n.avg_input_throughput is not None]
if throughputs:
self.avg_prefill_input_throughput = sum(throughputs) / len(throughputs)
- max_throughputs = [n.max_input_throughput for n in prefill_capable_nodes if n.max_input_throughput]
+ max_throughputs = [n.max_input_throughput for n in prefill_capable_nodes if n.max_input_throughput is not None]
if max_throughputs:
self.max_prefill_input_throughput = max(max_throughputs)
@@
- throughputs = [n.avg_gen_throughput for n in decode_capable_nodes if n.avg_gen_throughput]
+ throughputs = [n.avg_gen_throughput for n in decode_capable_nodes if n.avg_gen_throughput is not None]
if throughputs:
self.avg_decode_gen_throughput = sum(throughputs) / len(throughputs)
- max_throughputs = [n.max_gen_throughput for n in decode_capable_nodes if n.max_gen_throughput]
+ max_throughputs = [n.max_gen_throughput for n in decode_capable_nodes if n.max_gen_throughput is not None]
if max_throughputs:
self.max_decode_gen_throughput = max(max_throughputs)
@@
- kv_caches = [n.kv_cache_gb for n in self.nodes if n.kv_cache_gb]
+ kv_caches = [n.kv_cache_gb for n in self.nodes if n.kv_cache_gb is not None]
if kv_caches:
self.total_kv_cache_gb = sum(kv_caches)🤖 Prompt for AI Agents
In `@src/srtctl/cli/mixins/rollup/models.py` around lines 370 - 407, The
aggregation logic drops valid zero values because it filters metrics with truthy
checks (e.g., "if n.avg_input_throughput") — update _compute_aggregate_stats to
treat 0 as a valid value by using explicit None checks. For each
throughput/metric list (avg_input_throughput, max_input_throughput,
avg_gen_throughput, max_gen_throughput, kv_cache_gb) change comprehensions to
include values where the field is not None (e.g., [n.avg_input_throughput for n
in prefill_capable_nodes if n.avg_input_throughput is not None]) and compute
averages using that list length so zeros contribute; likewise ensure
total_prefill_tokens/total_cached_tokens and overall_cache_hit_rate are set even
when totals are zero (use explicit comparisons to None rather than truthiness).
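How a truthy filter skews an average versus an explicit `None` check can be shown with three made-up per-node readings:

```python
# Per-node generation throughput: one busy node, one idle node reporting 0.0,
# and one node that produced no metric at all (None).
values = [120.0, 0.0, None]

truthy = [v for v in values if v]                 # drops 0.0 AND None
explicit = [v for v in values if v is not None]   # drops only None

assert sum(truthy) / len(truthy) == 120.0         # idle node vanished: inflated
assert sum(explicit) / len(explicit) == 60.0      # idle node's 0.0 is counted
```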
Actionable comments posted: 1
🤖 Fix all issues with AI agents
In `@src/srtctl/cli/mixins/rollup_stage.py`:
- Around line 558-563: The mapping for output_tps in the RollupResult
construction currently only reads data.get("output_throughput", 0) which drops
valid values emitted as "output_tps"; update the construction in the loop that
creates RollupResult (the RollupResult(...) call) to use a fallback that
preserves either key, e.g. read output_throughput first then fallback to
data.get("output_tps") (or vice versa) before defaulting to 0 so parsers
emitting "output_tps" like mooncake-router are preserved.
♻️ Duplicate comments (5)
src/srtctl/cli/mixins/rollup_stage.py (3)
49-52: Makeendpointsabstract instead of returningNone.The ellipsis body returns
None, violating the declared return type and causing type-checker failures.🔧 Proposed fix
`@property` def endpoints(self) -> list[Endpoint]: """Endpoint allocation topology.""" - ... + raise NotImplementedError("Subclass must provide endpoints property")
129-149: Fix type-checker errors for optional benchmark parser hooks.
BenchmarkParserProtocoldoesn’t definefind_aiperf_results/parse_result_directory;hasattrstill leaves the calls untyped. Usegetattr+callable or extend the protocol.🔧 Proposed fix (getattr + callable)
# Try parser-specific result collection first if parser is not None: # For mooncake-router, look for AIPerf results - if hasattr(parser, "find_aiperf_results"): - aiperf_files = parser.find_aiperf_results(self.runtime.log_dir) + find_aiperf = getattr(parser, "find_aiperf_results", None) + if callable(find_aiperf): + aiperf_files = find_aiperf(self.runtime.log_dir) for aiperf_path in aiperf_files: result = parser.parse_result_json(aiperf_path) if result.get("output_tps") is not None: results.append(result) logger.debug("Loaded AIPerf result: %s", aiperf_path) # For sa-bench style, look for result directories - if hasattr(parser, "parse_result_directory"): + parse_result_dir = getattr(parser, "parse_result_directory", None) + if callable(parse_result_dir): for entry in self.runtime.log_dir.iterdir(): if not entry.is_dir(): continue # Match patterns like sa-bench_isl_X_osl_Y if "_isl_" in entry.name and "_osl_" in entry.name: logger.debug("Found benchmark results directory: %s", entry.name) - dir_results = parser.parse_result_directory(entry) + dir_results = parse_result_dir(entry) results.extend(dir_results)
465-479: Addsglang-v2fallback in parser order.Without this,
sglangbackends won’t try the v2 parser even when it’s available.🔧 Proposed fix
parser_orders = { - "sglang": ["sglang"], - "trtllm": ["trtllm"], + "sglang": ["sglang", "sglang-v2"], + "sglang-v2": ["sglang-v2", "sglang"], + "trtllm": ["trtllm"], }analysis/srtlog/rollup_harness.py (2)
45-46: Use homogeneous sort keys for job directory ordering.Mixed
int/strsort keys raiseTypeErrorwhen both numeric and non-numeric job IDs exist.🔧 Proposed fix
- job_dirs.sort(key=lambda p: (int(p.name) if p.name.isdigit() else p.name)) + job_dirs.sort(key=lambda p: (0, int(p.name)) if p.name.isdigit() else (1, p.name))
351-354: Don’t treat valid0throughput as missing.Using
oroverwrites explicit zero values; check forNoneinstead.🔧 Proposed fix
- output_tps=data.get("output_throughput", 0) or data.get("output_tps", 0), + output_tps=( + data.get("output_throughput") + if data.get("output_throughput") is not None + else data.get("output_tps", 0) + ),
for data in results:
    result = RollupResult(
        concurrency=data.get("max_concurrency", 0),
        output_tps=data.get("output_throughput", 0),
        total_tps=data.get("total_token_throughput"),
        request_throughput=data.get("request_throughput"),
🧩 Analysis chain
🏁 Script executed:
# First, let's examine the file at the specified lines
head -580 src/srtctl/cli/mixins/rollup_stage.py | tail -40

Repository: ishandhanani/srt-slurm
Length of output: 1844

🏁 Script executed:
# Find and examine the RollupResult dataclass definition
rg -n "class RollupResult" src/

Repository: ishandhanani/srt-slurm
Length of output: 128

🏁 Script executed:
# Search for output_tps and output_throughput usage in the codebase
rg -n "output_tps|output_throughput" src/

Repository: ishandhanani/srt-slurm
Length of output: 1567

🏁 Script executed:
# Examine the RollupResult dataclass definition
sed -n '411,430p' src/srtctl/cli/mixins/rollup/models.py

Repository: ishandhanani/srt-slurm
Length of output: 681

🏁 Script executed:
# Check the context where data comes from - look at line 136 context
sed -n '130,145p' src/srtctl/cli/mixins/rollup_stage.py

Repository: ishandhanani/srt-slurm
Length of output: 973

🏁 Script executed:
# Look for where 'results' data is populated or parsed
rg -n "def.*results|results\s*=" src/srtctl/cli/mixins/rollup_stage.py | head -20

Repository: ishandhanani/srt-slurm
Length of output: 302

🏁 Script executed:
# Check if there are any parsers that produce "output_tps" in the data
rg -B5 -A5 '"output_tps"' src/srtctl/cli/mixins/rollup_stage.py

Repository: ishandhanani/srt-slurm
Length of output: 722

🏁 Script executed:
# Find all parsers to understand data sources
rg -n "class.*Parser|def parse_result" src/srtctl/ | grep -i parser

Repository: ishandhanani/srt-slurm
Length of output: 48

🏁 Script executed:
# Check the sa-bench benchmark_serving.py to see what it outputs
sed -n '70,85p' src/srtctl/benchmarks/scripts/sa-bench/benchmark_serving.py

Repository: ishandhanani/srt-slurm
Length of output: 432

🏁 Script executed:
# Check the AIPerf parser or mooncake parser
fd -e "\.py$" "aiperf\|mooncake" src/srtctl/

Repository: ishandhanani/srt-slurm
Length of output: 48

🏁 Script executed:
# Search for parse_result_json implementations
rg -B3 -A10 "def parse_result_json" src/srtctl/

Repository: ishandhanani/srt-slurm
Length of output: 48

🏁 Script executed:
# Look for all result parsing logic
rg -n "def _collect_benchmark_results" -A50 src/srtctl/cli/mixins/rollup_stage.py | head -80

Repository: ishandhanani/srt-slurm
Length of output: 2635

🏁 Script executed:
# Check what parse_result_directory returns
rg -n "def parse_result_directory" src/ -B2 -A15

Repository: ishandhanani/srt-slurm
Length of output: 48

🏁 Script executed:
# Look for mooncake or AIPerf parser
fd -e "\.py$" src/ | xargs rg -l "aiperf\|AIPerf" | head -5

Repository: ishandhanani/srt-slurm
Length of output: 387

🏁 Script executed:
# Search for where output_tps key comes from in parsing
rg -n '"output_tps"' src/srtctl/cli/mixins/rollup_stage.py -B5 -A5

Repository: ishandhanani/srt-slurm
Length of output: 766

🏁 Script executed:
# Check if there's any transformation of keys between AIPerf result and the mapping
rg -n "parse_result_json" src/srtctl/ -B3 -A20 | head -60

Repository: ishandhanani/srt-slurm
Length of output: 2366

🏁 Script executed:
# Verify the complete flow - check analysis.srtlog.parsers location
find . -name "parsers.py" | head -5

Repository: ishandhanani/srt-slurm
Length of output: 48

🏁 Script executed:
# Check if AIPerf results transformation happens anywhere
rg -n "AIPerf\|aiperf" src/srtctl/cli/mixins/rollup_stage.py

Repository: ishandhanani/srt-slurm
Length of output: 48

🏁 Script executed:
# Look at the complete mapping code to understand the flow
sed -n '555,600p' src/srtctl/cli/mixins/rollup_stage.py

Repository: ishandhanani/srt-slurm
Length of output: 2033

🏁 Script executed:
# Check the fallback JSON parsing section to understand all data sources
sed -n '155,200p' src/srtctl/cli/mixins/rollup_stage.py

Repository: ishandhanani/srt-slurm
Length of output: 1973

🏁 Script executed:
# Look for any key transformation or normalization before line 561
sed -n '520,565p' src/srtctl/cli/mixins/rollup_stage.py

Repository: ishandhanani/srt-slurm
Length of output: 1935

🏁 Script executed:
# Verify the sa-bench output structure that produces "output_throughput"
sed -n '690,710p' src/srtctl/benchmarks/scripts/sa-bench/benchmark_serving.py

Repository: ishandhanani/srt-slurm
Length of output: 991

🏁 Script executed:
# Check if there are any other parsers or places where results are transformed
rg -n "output_tps|output_throughput" src/srtctl/cli/mixins/rollup_stage.py -B2 -A2

Repository: ishandhanani/srt-slurm
Length of output: 967

🏁 Script executed:
# Let me verify if AIPerf results are actually added and would go through line 561
# Check what parse_result_json returns for AIPerf
rg -n "find_aiperf_results\|parse_result_json" src/srtctl/ -B5 -A15 | head -80

Repository: ishandhanani/srt-slurm
Length of output: 48

🏁 Script executed:
# Confirm the exact data structure by looking at where both parsers are used
rg -n "output_tps" src/ --type py | grep -v README

Repository: ishandhanani/srt-slurm
Length of output: 818

🏁 Script executed:
# Let me trace through the exact flow - check if AIPerf results with output_tps key reach the mapping
sed -n '131,140p' src/srtctl/cli/mixins/rollup_stage.py

Repository: ishandhanani/srt-slurm
Length of output: 660

🏁 Script executed:
# Verify that AIPerf results are indeed added to results and then processed at line 561
sed -n '135,165p' src/srtctl/cli/mixins/rollup_stage.py

Repository: ishandhanani/srt-slurm
Length of output: 1604

🏁 Script executed:
# Confirm the issue: check if there's any key transformation between AIPerf parse and the mapping code
rg -n "results.append\|results.extend" src/srtctl/cli/mixins/rollup_stage.py

Repository: ishandhanani/srt-slurm
Length of output: 48

🏁 Script executed:
# Final verification: Check if AIPerf results indeed bypass the filtering at line 136 and reach the mapping
# Let me trace the complete flow - results list is populated from AIPerf, then passed to line 555+
sed -n '555,565p' src/srtctl/cli/mixins/rollup_stage.py

Repository: ishandhanani/srt-slurm
Length of output: 590

🏁 Script executed:
# Verify that the RollupResult dataclass expects float for output_tps (not None)
sed -n '411,425p' src/srtctl/cli/mixins/rollup/models.py

Repository: ishandhanani/srt-slurm
Length of output: 538
Preserve output_tps when output_throughput is missing.
AIPerf results use the output_tps key but the current mapping at line 561 only checks for output_throughput, causing valid throughput data to be lost when mapped to 0. This affects results from parsers like mooncake-router that emit output_tps directly.
🔧 Proposed fix
result = RollupResult(
concurrency=data.get("max_concurrency", 0),
- output_tps=data.get("output_throughput", 0),
+ output_tps=(
+ data.get("output_throughput")
+ if data.get("output_throughput") is not None
+ else data.get("output_tps", 0)
+ ),🤖 Prompt for AI Agents
In `@src/srtctl/cli/mixins/rollup_stage.py` around lines 558 - 563, The mapping
for output_tps in the RollupResult construction currently only reads
data.get("output_throughput", 0) which drops valid values emitted as
"output_tps"; update the construction in the loop that creates RollupResult (the
RollupResult(...) call) to use a fallback that preserves either key, e.g. read
output_throughput first then fallback to data.get("output_tps") (or vice versa)
before defaulting to 0 so parsers emitting "output_tps" like mooncake-router are
preserved.
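The underlying pitfall generalizes: a truthy `or` fallback silently discards a measured zero. A minimal standalone sketch of the None-aware lookup the fix relies on (`pick_throughput` and `naive` are illustrative names, not project code):

```python
def pick_throughput(data: dict) -> float:
    """Prefer "output_throughput", falling back to "output_tps".

    Uses an explicit None check so a legitimate 0.0 measurement
    is preserved instead of being treated as missing.
    """
    value = data.get("output_throughput")
    if value is not None:
        return value
    return data.get("output_tps", 0)


# A truthy `or` chain mishandles the zero case:
def naive(data: dict) -> float:
    return data.get("output_throughput", 0) or data.get("output_tps", 0)


print(pick_throughput({"output_throughput": 0.0, "output_tps": 42.0}))  # 0.0 preserved
print(naive({"output_throughput": 0.0, "output_tps": 42.0}))            # 42.0 (wrong)
```

The same distinction applies to every `data.get(...) or ...` site flagged in this review.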
Actionable comments posted: 2
🤖 Fix all issues with AI agents
In `@docs/architecture.md`:
- Around line 929-1014: The fenced ASCII diagram block currently uses an
untagged code fence (```); to satisfy markdownlint MD040, update the opening
fence in the "Detailed Entity Relationships" diagram (the large block that
begins with the ASCII box showing PARSER LAYER / ROLLUP LAYER) to use a language
tag by replacing the opening ``` with ```text so the block starts with ```text
while leaving the content and closing fence unchanged.
- Around line 882-925: The ASCII ER diagram fenced code block starting with the
box diagram (the block that begins with the large
"┌─────────────────────────────────────────────────────────────┐" line) is
missing a language tag; change the opening triple-backtick for that block to
```text so the fenced block reads with the text language tag and satisfies
Markdownlint MD040. Ensure only the opening fence is modified (leave the closing
``` unchanged).
♻️ Duplicate comments (1)
docs/architecture.md (1)
822-826: Add a language tag to the Rollup architecture fenced block. Markdownlint MD040 is triggered here; set the opening fence to ```text.
```
┌─────────────────────────────────────────────────────────────┐
│ LOG FILES │
└─────────────────────────────────────────────────────────────┘
│
┌─────────────────────────────────────────┼─────────────────────────────────────────┐
│ │ │
▼ ▼ ▼
┌───────────────────┐ ┌───────────────────┐ ┌───────────────────┐
│ worker_*.out/err │ │ benchmark.out │ │ config.yaml │
└───────────────────┘ └───────────────────┘ └───────────────────┘
│ │ │
▼ ▼ ▼
┌───────────────────────────┐ ┌───────────────────────────┐ ┌───────────────────────────┐
│ NodeParserProtocol │ │ BenchmarkParserProtocol │ │ YAML Loader │
│ (sglang.py / trtllm.py) │ │ (sa_bench.py / etc.) │ │ │
└───────────────────────────┘ └───────────────────────────┘ └───────────────────────────┘
│ │ │
┌────────┴────────┐ ┌────────┴────────┐ │
▼ ▼ ▼ ▼ ▼
┌─────────────────┐ ┌─────────────────┐ ┌─────────────────┐ ┌─────────────────┐ ┌─────────────────────┐
│ NodeMetrics │ │NodeLaunchCommand│ │ dict (result) │ │BenchmarkLaunch │ │ EnvironmentConfig │
│ (models.py) │ │ (__init__.py) │ │ │ │ Command │ │ (rollup/models.py) │
└─────────────────┘ └─────────────────┘ └─────────────────┘ └─────────────────┘ └─────────────────────┘
│ │ │ │ │
└────────────────────┼─────────────────────┼────────────────────┼──────────────────────────┘
│ │ │
▼ ▼ ▼
┌────────────────────────────────────────────────────────────┐
│ RollupStageMixin │
│ (rollup_stage.py / rollup_harness.py) │
└────────────────────────────────────────────────────────────┘
│
│ transforms
▼
┌───────────────┐
│ RollupSummary │
└───────┬───────┘
│
▼
┌───────────────┐
│ rollup.json │
└───────────────┘
```
Add a language tag to the ER diagram fenced block.
Markdownlint MD040 is triggered; set the opening fence to ```text.
🛠️ Proposed fix
-```
+```text
🧰 Tools
🪛 markdownlint-cli2 (0.18.1)
882-882: Fenced code blocks should have a language specified
(MD040, fenced-code-language)
🤖 Prompt for AI Agents
In `@docs/architecture.md` around lines 882 - 925, The ASCII ER diagram fenced
code block starting with the box diagram (the block that begins with the large
"┌─────────────────────────────────────────────────────────────┐" line) is
missing a language tag; change the opening triple-backtick for that block to
```text so the fenced block reads with the text language tag and satisfies
Markdownlint MD040. Ensure only the opening fence is modified (leave the closing
``` unchanged).
```
┌─────────────────────────────────────────────────────────────────────────────────────────┐
│ PARSER LAYER (analysis/srtlog/parsers/) │
├─────────────────────────────────────────────────────────────────────────────────────────┤
│ │
│ ┌────────────────────┐ ┌────────────────────┐ ┌────────────────────┐ │
│ │ NodeLaunchCommand │ │ BenchmarkLaunch │ │ NodeMetrics │ │
│ ├────────────────────┤ │ Command │ ├────────────────────┤ │
│ │ backend_type: str │ ├────────────────────┤ │ node_info: dict │ │
│ │ worker_type: str │ │ benchmark_type: str│ │ config: dict │ │
│ │ raw_command: str │ │ raw_command: str │ │ batches: list │ │
│ │ extra_args: dict │ │ extra_args: dict │ │ memory_snapshots │ │
│ └────────────────────┘ └────────────────────┘ └────────────────────┘ │
│ │
└─────────────────────────────────────────────────────────────────────────────────────────┘
│
▼
┌─────────────────────────────────────────────────────────────────────────────────────────┐
│ ROLLUP LAYER (srtctl/cli/mixins/rollup/) │
├─────────────────────────────────────────────────────────────────────────────────────────┤
│ │
│ ┌─────────────────────────────────────────────────────────────────────────────────┐ │
│ │ LaunchCommandRollup │ │
│ ├─────────────────────────────────────────────────────────────────────────────────┤ │
│ │ raw_command: str │ command_type: str ("worker" | "benchmark") │ │
│ │ model_path: str | None │ served_model_name: str | None │ │
│ │ worker_type: str | None │ backend_type: str | None │ │
│ │ tp_size: int | None │ pp_size: int | None │ │
│ │ dp_size: int | None │ ep_size: int | None │ │
│ │ max_num_seqs: int | None │ max_model_len: int | None │ │
│ └─────────────────────────────────────────────────────────────────────────────────┘ │
│ │ 1:1 │
│ ▼ │
│ ┌─────────────────────────────────────────────────────────────────────────────────┐ │
│ │ NodeRollup │ │
│ ├─────────────────────────────────────────────────────────────────────────────────┤ │
│ │ node_name: str │ worker_type: str │ │
│ │ worker_id: str │ tp_size, pp_size, dp_size, ep_size: int | None │ │
│ │ launch_command: LaunchCommandRollup | None │ │
│ │ avail_mem_gb: float | None │ kv_cache_gb: float | None │ │
│ │ total_batches: int │ avg_gen_throughput: float | None │ │
│ └─────────────────────────────────────────────────────────────────────────────────┘ │
│ │ 1:N │
│ ▼ │
│ ┌─────────────────────────────────────────────────────────────────────────────────┐ │
│ │ NodesSummary │ │
│ ├─────────────────────────────────────────────────────────────────────────────────┤ │
│ │ nodes: list[NodeRollup] │ │
│ │ total_prefill_nodes: int │ total_decode_nodes: int │ │
│ │ total_agg_nodes: int │ total_kv_cache_gb: float | None │ │
│ └─────────────────────────────────────────────────────────────────────────────────┘ │
│ │
│ ┌─────────────────────────────────────────────────────────────────────────────────┐ │
│ │ RollupResult │ │
│ ├─────────────────────────────────────────────────────────────────────────────────┤ │
│ │ max_concurrency: int │ input_len: int │ output_len: int │ │
│ │ output_tps: float │ request_tps: float │ total_tokens: int │ │
│ │ avg_ttft_ms: float │ avg_tpot_ms: float │ p50/p90/p99 metrics │ │
│ └─────────────────────────────────────────────────────────────────────────────────┘ │
│ │
│ ┌─────────────────────────────────────────────────────────────────────────────────┐ │
│ │ EnvironmentConfig │ │
│ ├─────────────────────────────────────────────────────────────────────────────────┤ │
│ │ prefill_environment: dict │ decode_environment: dict │ │
│ │ aggregated_environment: dict│ prefill_engine_config: dict │ │
│ │ decode_engine_config: dict │ aggregated_engine_config: dict │ │
│ └─────────────────────────────────────────────────────────────────────────────────┘ │
│ │
│ ┌──────────────────────┬──────────────────────┐ │
│ │ │ │ │
│ ▼ ▼ ▼ │
│ ┌─────────────────────────────────────────────────────────────────────────────────┐ │
│ │ RollupSummary │ │
│ ├─────────────────────────────────────────────────────────────────────────────────┤ │
│ │ job_id: str │ model_name: str │ backend_type: str │ │
│ │ benchmark_type: str │ is_disaggregated: bool │ total_nodes: int │ │
│ │ max_output_tps: float │ max_request_tps: float │ │
│ │ results: list[RollupResult] │ │
│ │ nodes_summary: NodesSummary | None │ │
│ │ benchmark_command: LaunchCommandRollup | None │ │
│ │ environment_config: EnvironmentConfig | None │ │
│ │ tags: dict │ timestamp: str │ │
│ └─────────────────────────────────────────────────────────────────────────────────┘ │
│ │
└─────────────────────────────────────────────────────────────────────────────────────────┘
```
Add a language tag to the “Detailed Entity Relationships” fenced block.
Markdownlint MD040 is triggered; set the opening fence to ```text.
🛠️ Proposed fix
-```
+```text
🧰 Tools
🪛 markdownlint-cli2 (0.18.1)
929-929: Fenced code blocks should have a language specified
(MD040, fenced-code-language)
🤖 Prompt for AI Agents
In `@docs/architecture.md` around lines 929 - 1014, The fenced ASCII diagram block
currently uses an untagged code fence (```); to satisfy markdownlint MD040,
update the opening fence in the "Detailed Entity Relationships" diagram (the
large block that begins with the ASCII box showing PARSER LAYER / ROLLUP LAYER)
to use a language tag by replacing the opening ``` with ```text so the block
starts with ```text while leaving the content and closing fence unchanged.
…ata models with source files
Actionable comments posted: 1
🤖 Fix all issues with AI agents
In `@analysis/srtlog/parsers/benchmark/mooncake_router.py`:
- Around line 126-129: The truthy check on tps_per_user will drop legitimate
zero values; update the conditional in the block that calls self._get_metric
(variable tps_per_user) to check for None explicitly (e.g., "is not None")
before assigning result["output_tps_per_user"] so that 0 is preserved while
still filtering missing values returned as None.
♻️ Duplicate comments (7)
src/srtctl/cli/mixins/rollup/models.py (1)
394-416: Truthy checks drop valid zero values in aggregate stats. The list comprehensions use truthy filtering (e.g., `if n.avg_input_throughput`), which excludes legitimate `0` values. This can inflate averages or miss valid zero-throughput nodes.
Proposed fix
- throughputs = [n.avg_input_throughput for n in prefill_capable_nodes if n.avg_input_throughput]
+ throughputs = [n.avg_input_throughput for n in prefill_capable_nodes if n.avg_input_throughput is not None]
  if throughputs:
      self.avg_prefill_input_throughput = sum(throughputs) / len(throughputs)
- max_throughputs = [n.max_input_throughput for n in prefill_capable_nodes if n.max_input_throughput]
+ max_throughputs = [n.max_input_throughput for n in prefill_capable_nodes if n.max_input_throughput is not None]
  if max_throughputs:
      self.max_prefill_input_throughput = max(max_throughputs)
  # Decode aggregation (includes both decode and agg nodes)
  decode_capable_nodes = [n for n in self.nodes if n.worker_type in ("decode", "agg")]
  if decode_capable_nodes:
-     throughputs = [n.avg_gen_throughput for n in decode_capable_nodes if n.avg_gen_throughput]
+     throughputs = [n.avg_gen_throughput for n in decode_capable_nodes if n.avg_gen_throughput is not None]
      if throughputs:
          self.avg_decode_gen_throughput = sum(throughputs) / len(throughputs)
-     max_throughputs = [n.max_gen_throughput for n in decode_capable_nodes if n.max_gen_throughput]
+     max_throughputs = [n.max_gen_throughput for n in decode_capable_nodes if n.max_gen_throughput is not None]
      if max_throughputs:
          self.max_decode_gen_throughput = max(max_throughputs)
  # Memory aggregation
- kv_caches = [n.kv_cache_gb for n in self.nodes if n.kv_cache_gb]
+ kv_caches = [n.kv_cache_gb for n in self.nodes if n.kv_cache_gb is not None]
  if kv_caches:
      self.total_kv_cache_gb = sum(kv_caches)
analysis/srtlog/rollup_harness.py (2)
31-47: Mixed-type sort keys can cause TypeError. The sort key returns `int` for numeric directory names and `str` for non-numeric ones. Comparing `int` to `str` raises `TypeError` in Python 3.
Proposed fix
  # Sort by job ID (directory name) if numeric
- job_dirs.sort(key=lambda p: (int(p.name) if p.name.isdigit() else p.name))
+ job_dirs.sort(key=lambda p: (0, int(p.name)) if p.name.isdigit() else (1, p.name))
  return job_dirs
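The tuple-key trick relies on element-wise tuple comparison: numeric names land in group `0` and compare as ints, everything else in group `1` compares as strings, so `int` and `str` never meet. A small self-contained illustration (names hypothetical):

```python
def job_sort_key(name: str):
    # Group 0: numeric names compared as ints; group 1: the rest as strings.
    # Tuples compare element-wise, so the first element decides the group
    # and the mismatched second-element types are never compared.
    return (0, int(name)) if name.isdigit() else (1, name)


names = ["100", "9", "baseline", "20", "adhoc"]
print(sorted(names, key=job_sort_key))  # ['9', '20', '100', 'adhoc', 'baseline']
```

By contrast, `key=lambda n: int(n) if n.isdigit() else n` raises `TypeError` on the same input, since Python 3 refuses to order `int` against `str`.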
350-371: Truthy `or` operator drops valid zero throughput values. `data.get("output_throughput", 0) or data.get("output_tps", 0)` treats `0` as falsy, so if `output_throughput` is legitimately `0`, it incorrectly falls back to `output_tps`.
Proposed fix
  result = RollupResult(
      concurrency=data.get("max_concurrency", 0),
-     output_tps=data.get("output_throughput", 0) or data.get("output_tps", 0),
+     output_tps=(
+         data.get("output_throughput")
+         if data.get("output_throughput") is not None
+         else data.get("output_tps", 0)
+     ),
src/srtctl/cli/mixins/rollup_stage.py (4)
49-52: The `endpoints` property always returns `None`. The ellipsis (`...`) body causes an implicit `None` return, violating the `list[Endpoint]` type annotation. For a mixin expecting the concrete class to provide this, raise `NotImplementedError`.
Proposed fix
  @property
  def endpoints(self) -> list["Endpoint"]:
      """Endpoint allocation topology."""
-     ...
+     raise NotImplementedError("Subclass must provide endpoints property")
129-149: Use `getattr` with a callable check to satisfy the type checker. The `hasattr` checks are runtime-safe, but the type checker can't verify that `find_aiperf_results` and `parse_result_directory` exist on `BenchmarkParserProtocol`.
Proposed fix using getattr
  if parser is not None:
      # For mooncake-router, look for AIPerf results
-     if hasattr(parser, "find_aiperf_results"):
-         aiperf_files = parser.find_aiperf_results(self.runtime.log_dir)
+     find_aiperf = getattr(parser, "find_aiperf_results", None)
+     if find_aiperf is not None:
+         aiperf_files = find_aiperf(self.runtime.log_dir)
          for aiperf_path in aiperf_files:
              result = parser.parse_result_json(aiperf_path)
              if result.get("output_tps") is not None:
                  results.append(result)
              logger.debug("Loaded AIPerf result: %s", aiperf_path)
      # For sa-bench style, look for result directories
-     if hasattr(parser, "parse_result_directory"):
+     parse_result_dir = getattr(parser, "parse_result_directory", None)
+     if parse_result_dir is not None:
          for entry in self.runtime.log_dir.iterdir():
              if not entry.is_dir():
                  continue
              if "_isl_" in entry.name and "_osl_" in entry.name:
                  logger.debug("Found benchmark results directory: %s", entry.name)
-                 dir_results = parser.parse_result_directory(entry)
+                 dir_results = parse_result_dir(entry)
                  results.extend(dir_results)
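The pattern is easy to show in isolation: `getattr` with a `None` default gives a concrete local binding that a type checker can narrow, while still degrading gracefully when a parser lacks the optional method. A simplified sketch with hypothetical parser classes (not the real protocol):

```python
class DirOnlyParser:
    """Hypothetical parser that only handles result directories."""
    def parse_result_directory(self, path: str) -> list[dict]:
        return [{"source": path}]


class JsonOnlyParser:
    """Hypothetical parser that only handles single JSON results."""
    def parse_result_json(self, path: str) -> dict:
        return {"source": path}


def collect(parser, path: str) -> list[dict]:
    # getattr(..., None) + callable() replaces hasattr(): the optional
    # method is bound to a local name, so absent capabilities are skipped
    # and a type checker can narrow the binding before it is called.
    results: list[dict] = []
    parse_dir = getattr(parser, "parse_result_directory", None)
    if callable(parse_dir):
        results.extend(parse_dir(path))
    parse_json = getattr(parser, "parse_result_json", None)
    if callable(parse_json):
        results.append(parse_json(path))
    return results


print(collect(DirOnlyParser(), "logs/"))    # [{'source': 'logs/'}]
print(collect(JsonOnlyParser(), "r.json"))  # [{'source': 'r.json'}]
```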
475-480: Add fallback for `sglang-v2` when backend is `sglang`. The docstring at line 185 mentions fallback through parser versions, but the mapping doesn't include `sglang-v2`.
Proposed fix
  parser_orders = {
-     "sglang": ["sglang"],
+     "sglang": ["sglang", "sglang-v2"],
+     "sglang-v2": ["sglang-v2", "sglang"],
      "trtllm": ["trtllm"],
  }
559-563: Preserve `output_tps` when `output_throughput` is missing. Unlike `rollup_harness.py`, this code doesn't fall back to the `output_tps` key. AIPerf results use `output_tps` directly, causing valid throughput data to be lost.
Proposed fix
  result = RollupResult(
      concurrency=data.get("max_concurrency", 0),
-     output_tps=data.get("output_throughput", 0),
+     output_tps=(
+         data.get("output_throughput")
+         if data.get("output_throughput") is not None
+         else data.get("output_tps", 0)
+     ),
🧹 Nitpick comments (9)
analysis/srtlog/parsers/benchmark/mooncake_router.py (2)
33-79: Regex-based parsing is a fallback; prefer structured JSON. The `parse` method uses regex patterns which can be fragile. The `parse_result_json` method provides more robust parsing from AIPerf's structured output. Consider documenting that this method is intended as a fallback when JSON results aren't available.
92-94: Specify encoding when opening files. Opening files without explicit encoding can cause issues on systems with different default encodings.
Proposed fix
- with open(json_path) as f:
+ with open(json_path, encoding="utf-8") as f:
      data = json.load(f)
analysis/srtlog/rollup_harness.py (3)
63-69: Specify encoding when opening YAML files.
Proposed fix
  try:
      import yaml
-     with open(config_path) as f:
+     with open(config_path, encoding="utf-8") as f:
          return yaml.safe_load(f)
291-299: Specify encoding when opening YAML config files.
Proposed fix
  if prefill_yaml.exists():
-     with open(prefill_yaml) as f:
+     with open(prefill_yaml, encoding="utf-8") as f:
          env_config.prefill_engine_config = yaml.safe_load(f)
  if decode_yaml.exists():
-     with open(decode_yaml) as f:
+     with open(decode_yaml, encoding="utf-8") as f:
          env_config.decode_engine_config = yaml.safe_load(f)
381-389: Specify encoding when writing JSON output.
Proposed fix
  rollup_path.parent.mkdir(parents=True, exist_ok=True)
- with open(rollup_path, "w") as f:
+ with open(rollup_path, "w", encoding="utf-8") as f:
      json.dump(asdict(summary), f, indent=2, default=str)
analysis/srtlog/parsers/benchmark/sa_bench.py (1)
101-103: Specify encoding when opening JSON files.
Proposed fix
- with open(json_path) as f:
+ with open(json_path, encoding="utf-8") as f:
      data = json.load(f)
src/srtctl/cli/mixins/rollup_stage.py (2)
508-512: Use ternary operator for `total_gpus` calculation. Per pipeline warning, the if-else block can be simplified.
Proposed fix
- if is_disaggregated:
-     total_gpus = r.prefill_gpus + r.decode_gpus
- else:
-     total_gpus = (r.agg_nodes or 1) * r.gpus_per_node
+ total_gpus = r.prefill_gpus + r.decode_gpus if is_disaggregated else (r.agg_nodes or 1) * r.gpus_per_node
597-612: Specify encoding when writing JSON output.
Proposed fix
  # Write with nice formatting
- with open(path, "w") as f:
+ with open(path, "w", encoding="utf-8") as f:
      json.dump(data, f, indent=2, default=str)
244-247: Star imports are necessary for auto-registration but consider documenting.The
# noqacomments suppress linting, but a brief comment explaining why star imports are used here would aid maintainability.Suggested documentation
-# Import parsers to trigger registration +# Import parsers to trigger registration via decorators. +# Star imports are intentional: each submodule uses `@register_`*_parser decorators +# that populate _benchmark_parsers and _node_parsers when the module loads. from analysis.srtlog.parsers.benchmark import * # noqa: E402, F401, F403 from analysis.srtlog.parsers.nodes import * # noqa: E402, F401, F403
  # Also extract per-user throughput if available
  tps_per_user = self._get_metric(data, "output_token_throughput_per_user", "avg")
  if tps_per_user:
      result["output_tps_per_user"] = tps_per_user
Truthy check may exclude valid zero throughput.
`if tps_per_user:` will skip adding the field when the value is `0`, which may be a valid measurement.
Proposed fix
tps_per_user = self._get_metric(data, "output_token_throughput_per_user", "avg")
- if tps_per_user:
+ if tps_per_user is not None:
      result["output_tps_per_user"] = tps_per_user
🤖 Prompt for AI Agents
In `@analysis/srtlog/parsers/benchmark/mooncake_router.py` around lines 126 - 129,
The truthy check on tps_per_user will drop legitimate zero values; update the
conditional in the block that calls self._get_metric (variable tps_per_user) to
check for None explicitly (e.g., "is not None") before assigning
result["output_tps_per_user"] so that 0 is preserved while still filtering
missing values returned as None.
Actionable comments posted: 1
🤖 Fix all issues with AI agents
In `@analysis/srtlog/rollup_harness.py`:
- Around line 276-303: The code mutates an EnvironmentConfig instance
(env_config) which will break once EnvironmentConfig is converted to a frozen
dataclass; instead, build a plain dict of constructor kwargs, populate keys from
config.get("backend", {}), load YAMLs from prefill_yaml and decode_yaml (using
logs_dir) into local variables like prefill_engine_config and
decode_engine_config, then construct a single immutable EnvironmentConfig(...)
with those kwargs and assign it to environment_config; reference
EnvironmentConfig, env_config (replace), environment_config, prefill_yaml,
decode_yaml and logs_dir when locating where to change the code.
♻️ Duplicate comments (5)
docs/architecture.md (3)
822-826: Add a language tag to this fenced block for MD040 compliance. Markdownlint flags this block; tagging it as `text` is sufficient.
893-936: Add a language tag to the ER diagram fenced block. Markdownlint MD040 is triggered; set the opening fence to ```text.
940-1025: Add a language tag to the "Detailed Entity Relationships" fenced block. Markdownlint MD040 is triggered; set the opening fence to ```text.
analysis/srtlog/rollup_harness.py (2)
45-46: Avoid mixed-type sort keys in job directory ordering. `int` and `str` keys can't be compared in Python; mixed numeric/non-numeric directory names will raise `TypeError`.
🛠️ Proposed fix
- job_dirs.sort(key=lambda p: (int(p.name) if p.name.isdigit() else p.name))
+ job_dirs.sort(key=lambda p: (0, int(p.name)) if p.name.isdigit() else (1, p.name))
356-358: Preserve explicit zero throughput values. Using `or` treats a valid `0` as missing and overrides it.
🛠️ Proposed fix
- output_tps=data.get("output_throughput", 0) or data.get("output_tps", 0),
+ output_tps=(
+     data.get("output_throughput")
+     if data.get("output_throughput") is not None
+     else data.get("output_tps", 0)
+ ),
  # Parse environment config
  environment_config = None
  try:
      import yaml

      env_config = EnvironmentConfig()

      backend_section = config.get("backend", {})
      if "prefill_environment" in backend_section:
          env_config.prefill_environment = backend_section["prefill_environment"]
      if "decode_environment" in backend_section:
          env_config.decode_environment = backend_section["decode_environment"]
      if "aggregated_environment" in backend_section:
          env_config.aggregated_environment = backend_section["aggregated_environment"]

      # Load TRTLLM config files
      prefill_yaml = logs_dir / "trtllm_config_prefill.yaml"
      decode_yaml = logs_dir / "trtllm_config_decode.yaml"

      if prefill_yaml.exists():
          with open(prefill_yaml) as f:
              env_config.prefill_engine_config = yaml.safe_load(f)
      if decode_yaml.exists():
          with open(decode_yaml) as f:
              env_config.decode_engine_config = yaml.safe_load(f)

      if any([env_config.prefill_environment, env_config.decode_environment,
              env_config.prefill_engine_config, env_config.decode_engine_config]):
          environment_config = env_config
🧩 Analysis chain
🏁 Script executed:
#!/bin/bash
# Find EnvironmentConfig definition
rg -n "class EnvironmentConfig" --type py -C 5
Repository: ishandhanani/srt-slurm
Length of output: 886
🏁 Script executed:
#!/bin/bash
# Look at the actual code in rollup_harness.py to see the context
fd "rollup_harness.py" --type f | head -1 | xargs wc -l
Repository: ishandhanani/srt-slurm
Length of output: 103
🏁 Script executed:
#!/bin/bash
# Get full EnvironmentConfig definition
rg -n "class EnvironmentConfig" --type py -A 30 src/srtctl/cli/mixins/rollup/models.py
Repository: ishandhanani/srt-slurm
Length of output: 1441
🏁 Script executed:
#!/bin/bash
# Check the actual code in rollup_harness.py at lines 276-303
sed -n '276,303p' analysis/srtlog/rollup_harness.py
Repository: ishandhanani/srt-slurm
Length of output: 1368
EnvironmentConfig must follow the frozen dataclass guideline and use a factory pattern for construction.
Per coding guidelines, EnvironmentConfig should use @dataclass(frozen=True) for immutability. The current mutation pattern will break once frozen is applied. Build the configuration via kwargs and construct once:
Fix
- env_config = EnvironmentConfig()
+ env_kwargs: dict[str, Any] = {}
backend_section = config.get("backend", {})
if "prefill_environment" in backend_section:
- env_config.prefill_environment = backend_section["prefill_environment"]
+ env_kwargs["prefill_environment"] = backend_section["prefill_environment"]
if "decode_environment" in backend_section:
- env_config.decode_environment = backend_section["decode_environment"]
+ env_kwargs["decode_environment"] = backend_section["decode_environment"]
if "aggregated_environment" in backend_section:
- env_config.aggregated_environment = backend_section["aggregated_environment"]
+ env_kwargs["aggregated_environment"] = backend_section["aggregated_environment"]
# Load TRTLLM config files
prefill_yaml = logs_dir / "trtllm_config_prefill.yaml"
decode_yaml = logs_dir / "trtllm_config_decode.yaml"
if prefill_yaml.exists():
with open(prefill_yaml) as f:
- env_config.prefill_engine_config = yaml.safe_load(f)
+ env_kwargs["prefill_engine_config"] = yaml.safe_load(f)
if decode_yaml.exists():
with open(decode_yaml) as f:
- env_config.decode_engine_config = yaml.safe_load(f)
+ env_kwargs["decode_engine_config"] = yaml.safe_load(f)
- if any([env_config.prefill_environment, env_config.decode_environment,
- env_config.prefill_engine_config, env_config.decode_engine_config]):
- environment_config = env_config
+ if env_kwargs:
+ environment_config = EnvironmentConfig(**env_kwargs)
🤖 Prompt for AI Agents
In `@analysis/srtlog/rollup_harness.py` around lines 276 - 303, The code mutates
an EnvironmentConfig instance (env_config) which will break once
EnvironmentConfig is converted to a frozen dataclass; instead, build a plain
dict of constructor kwargs, populate keys from config.get("backend", {}), load
YAMLs from prefill_yaml and decode_yaml (using logs_dir) into local variables
like prefill_engine_config and decode_engine_config, then construct a single
immutable EnvironmentConfig(...) with those kwargs and assign it to
environment_config; reference EnvironmentConfig, env_config (replace),
environment_config, prefill_yaml, decode_yaml and logs_dir when locating where
to change the code.
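The build-kwargs-then-construct-once pattern recommended above generalizes to any frozen dataclass; a minimal sketch with a hypothetical `EnvConfig` (not the project's real model):

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class EnvConfig:
    prefill_environment: dict = field(default_factory=dict)
    decode_environment: dict = field(default_factory=dict)


def build_env_config(backend_section: dict) -> "EnvConfig | None":
    # Collect constructor kwargs first; the frozen instance is created
    # exactly once, so no post-construction mutation is needed.
    kwargs = {}
    for key in ("prefill_environment", "decode_environment"):
        if key in backend_section:
            kwargs[key] = backend_section[key]
    return EnvConfig(**kwargs) if kwargs else None


cfg = build_env_config({"prefill_environment": {"TP": 4}})
print(cfg.prefill_environment)  # {'TP': 4}
```

Any later attempt to assign to a field of the frozen instance raises `FrozenInstanceError`, which is exactly why the mutate-then-check pattern in the harness would break.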
Actionable comments posted: 2
🤖 Fix all issues with AI agents
In `@analysis/scripts/upload_rollup.json`:
- Around line 1-2: The file is misnamed with a .json extension while containing
a Python script; rename upload_rollup.json to upload_rollups.py, preserve the
shebang (#!/usr/bin/env python3) and module docstring, and update any references
(scripts, CI configs, imports, README, or tooling) that point to
upload_rollup.json to the new filename upload_rollups.py so execution, linting,
and IDE tooling work correctly.
- Around line 170-182: The code is duplicating launch command data:
extract_launch_commands(data) and the loop populates top_level with keys like
"launch_commands.{worker_type}" (see extract_launch_commands and top_level
usage), but later output.update(launch_commands) re-inserts the raw lists under
plain worker_type keys; remove the output.update(launch_commands) call so only
the intended "launch_commands.{worker_type}" string entries are used (or if you
intended a nested dict, replace that line with output["launch_commands"] =
launch_commands); update tests/consumers that expect only the string keys
accordingly.
🧹 Nitpick comments (3)
analysis/scripts/upload_rollup.json (3)
78-81: Consider logging invalid concurrency values instead of silently ignoring them. Silently swallowing `ValueError` can hide data quality issues. Consider logging a warning when a concurrency part cannot be parsed.
♻️ Suggested improvement
+import logging
+
+logger = logging.getLogger(__name__)
+
 def parse_concurrencies(concurrencies_str: str) -> list[int]:
     ...
         try:
             result.append(int(part))
         except ValueError:
-            pass
+            logger.warning(f"Invalid concurrency value ignored: '{part}'")
     return result
218-234: File read twice unnecessarily. The JSON file is read at lines 219-220 to extract metadata, then read again at line 234 for upload. Consider reading once and reusing.
♻️ Proposed optimization
 try:
-    with open(json_path) as f:
-        data = json.load(f)
+    json_content = json_path.read_text()
+    data = json.loads(json_content)
 except Exception as e:
     return False, f"Failed to read JSON: {e}"
 # Extract metadata
 ...
-# Read the file content for upload
-json_content = json_path.read_text()
-
 try:
     response = requests.post(
367-372: Consider logging cleanup failures instead of silently ignoring. The bare `except Exception: pass` hides all errors during file cleanup. While cleanup failures may not be critical, logging them could help diagnose permission issues or other problems.
♻️ Suggested improvement
 if not args.keep_files and not args.dry_run:
     for fp in generated_files:
         try:
             fp.unlink()
-        except Exception:
-            pass
+        except Exception as e:
+            print(f"Warning: Failed to clean up {fp}: {e}", file=sys.stderr)
     print(f"Cleaned up {len(generated_files)} generated files")
Actionable comments posted: 1
🤖 Fix all issues with AI agents
In `@analysis/scripts/upload_rollup.py`:
- Around line 85-110: The docstring for extract_launch_commands is inaccurate:
it claims keys like "worker-0.launch_command.raw_command" but the function
actually returns a defaultdict (node_type_to_launch_command) mapping worker_type
strings (e.g., "prefill", "decode") to lists of prefixed raw command strings;
update the function docstring to describe that it reads
data["nodes_summary"]["nodes"], prefixes each raw_command with
"#{worker_type}:{node_name}\\n", and returns a dict-like mapping of worker_type
-> list[str] (or defaultdict(list)) containing those command strings, and
mention default values used for missing node_name, worker_type, or raw_command.
♻️ Duplicate comments (1)
analysis/scripts/upload_rollup.py (1)
177-187: Bug: launch commands added twice in conflicting formats. At line 173, launch commands are added as joined strings with keys like `launch_commands.prefill`. Then at line 182, `output.update(launch_commands)` adds the raw lists with plain `worker_type` keys (e.g., `prefill: ["cmd1", "cmd2"]`). This results in duplicate data in conflicting formats. Line 182 should be removed.
🐛 Proposed fix
 for concurrency in concurrencies:
     output = dict(top_level)
     output["concurrency"] = concurrency
-    # Add launch commands
-    output.update(launch_commands)
-
     # Add result for this concurrency with "result_" prefix
     result = result_by_concurrency.get(concurrency, {})
🧹 Nitpick comments (2)
analysis/scripts/upload_rollup.py (2)
75-82: Silent suppression of parse errors may mask malformed input. When parsing fails (e.g., `"32x64xabc"`), the invalid part is silently dropped. Consider logging a warning to aid debugging.
♻️ Suggested improvement
+import logging
+
+logger = logging.getLogger(__name__)
+
 def parse_concurrencies(concurrencies_str: str) -> list[int]:
     ...
     for part in parts:
         part = part.strip()
         if part:
             try:
                 result.append(int(part))
             except ValueError:
-                pass
+                logger.warning(f"Skipping non-integer concurrency value: {part!r}")
     return result
218-234: File is read twice; consider consolidating.

The file is opened at line 219 for JSON parsing, then read again at line 234 for upload content. This can be combined.

♻️ Suggested improvement

```diff
-    try:
-        with open(json_path) as f:
-            data = json.load(f)
-    except Exception as e:
-        return False, f"Failed to read JSON: {e}"
-
-    # Extract metadata
-    ...
-
-    # Read the file content for upload
-    json_content = json_path.read_text()
+    try:
+        json_content = json_path.read_text()
+        data = json.loads(json_content)
+    except Exception as e:
+        return False, f"Failed to read JSON: {e}"
+
+    # Extract metadata
+    ...
```
Actionable comments posted: 2
🤖 Fix all issues with AI agents
In `@analysis/scripts/upload_rollup.py`:
- Around line 131-138: The code builds a contextualized string in
raw_command_str (including worker_type, node_name, worker_env_config,
engine_config and raw_command) but mistakenly appends raw_command to
node_type_to_launch_command[worker_type]; change the append to add
raw_command_str instead so the list stores the formatted command with its
metadata (update the append call on node_type_to_launch_command in the block
that constructs raw_command_str).
- Around line 191-205: The sbatch_script block uses raw_cmd which may never be
set if benchmark_cmd is falsy; initialize or derive raw_cmd before it’s
referenced (e.g., set raw_cmd = benchmark_cmd.get("raw_command", "") right after
obtaining benchmark_cmd) and then use that safe value in the top_level.update
call (the code that writes "benchmark_command": f"#sbatch_script\n\n{raw_cmd}").
Ensure raw_cmd is defined (or fallback to an empty string) so top_level.update
and the sbatch_script handling cannot raise a NameError.
♻️ Duplicate comments (3)
analysis/srtlog/rollup_harness.py (3)
31-47: Fix mixed-type sort keys to avoid `TypeError`.

When both numeric and non-numeric directory names exist, the sort key mixes `int` and `str` types, causing a `TypeError` in Python 3.

🛠️ Proposed fix

```diff
-    job_dirs.sort(key=lambda p: (int(p.name) if p.name.isdigit() else p.name))
+    job_dirs.sort(key=lambda p: (0, int(p.name)) if p.name.isdigit() else (1, p.name))
```
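The `(rank, value)` tuple pattern works because tuples compare element by element: the integer rank is always compared first, so an `int` second element is never compared against a `str` one. A quick standalone sketch:

```python
from pathlib import Path

def job_dir_sort_key(p: Path) -> tuple[int, object]:
    """Numeric names sort first (compared as ints), the rest after (as strings)."""
    return (0, int(p.name)) if p.name.isdigit() else (1, p.name)

dirs = [Path("10"), Path("baseline"), Path("2"), Path("abc")]

# The original single-value key mixes int and str and raises TypeError.
try:
    sorted(dirs, key=lambda p: int(p.name) if p.name.isdigit() else p.name)
    mixed_key_failed = False
except TypeError:
    mixed_key_failed = True

ordered = [p.name for p in sorted(dirs, key=job_dir_sort_key)]
assert mixed_key_failed
assert ordered == ["2", "10", "abc", "baseline"]
```

Numeric directories also sort numerically (`2` before `10`) rather than lexicographically, which is usually what job IDs want.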
276-303: `EnvironmentConfig` mutation will break once the frozen dataclass guideline is applied.

Per coding guidelines, configuration objects should use `@dataclass(frozen=True)`. The current mutation pattern will fail. Build a kwargs dict and construct once.

🛠️ Proposed fix

```diff
-        env_config = EnvironmentConfig()
+        env_kwargs: dict[str, Any] = {}
         backend_section = config.get("backend", {})
         if "prefill_environment" in backend_section:
-            env_config.prefill_environment = backend_section["prefill_environment"]
+            env_kwargs["prefill_environment"] = backend_section["prefill_environment"]
         if "decode_environment" in backend_section:
-            env_config.decode_environment = backend_section["decode_environment"]
+            env_kwargs["decode_environment"] = backend_section["decode_environment"]
         if "aggregated_environment" in backend_section:
-            env_config.aggregated_environment = backend_section["aggregated_environment"]
+            env_kwargs["aggregated_environment"] = backend_section["aggregated_environment"]

         # Load TRTLLM config files
         prefill_yaml = logs_dir / "trtllm_config_prefill.yaml"
         decode_yaml = logs_dir / "trtllm_config_decode.yaml"
         if prefill_yaml.exists():
             with open(prefill_yaml) as f:
-                env_config.prefill_engine_config = yaml.safe_load(f)
+                env_kwargs["prefill_engine_config"] = yaml.safe_load(f)
         if decode_yaml.exists():
             with open(decode_yaml) as f:
-                env_config.decode_engine_config = yaml.safe_load(f)
+                env_kwargs["decode_engine_config"] = yaml.safe_load(f)

-        if any([env_config.prefill_environment, env_config.decode_environment,
-                env_config.prefill_engine_config, env_config.decode_engine_config]):
-            environment_config = env_config
+        if env_kwargs:
+            environment_config = EnvironmentConfig(**env_kwargs)
```

Based on learnings, frozen dataclasses should be used for configuration objects.
354-358: Avoid treating `0` throughput as "missing."

The `or` operator will override valid `0` values, potentially skewing rollup data.

🛠️ Proposed fix

```diff
-            output_tps=data.get("output_throughput", 0) or data.get("output_tps", 0),
+            output_tps=(
+                data.get("output_throughput")
+                if data.get("output_throughput") is not None
+                else data.get("output_tps", 0)
+            ),
```
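The pitfall is that `0 or fallback` evaluates to the fallback, so a genuine zero measurement gets silently replaced. A small sketch of the difference (dict keys follow the fields named in the review):

```python
def pick_output_tps(data: dict) -> float:
    """Fall back to output_tps only when output_throughput is truly absent."""
    value = data.get("output_throughput")
    return value if value is not None else data.get("output_tps", 0)

zero_row = {"output_throughput": 0, "output_tps": 123.4}

# The `or` form silently replaces a legitimate zero with the fallback...
assert (zero_row.get("output_throughput", 0) or zero_row.get("output_tps", 0)) == 123.4
# ...while the explicit None check preserves it.
assert pick_output_tps(zero_row) == 0
assert pick_output_tps({"output_tps": 55.0}) == 55.0
```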
🧹 Nitpick comments (1)
analysis/scripts/upload_rollup.py (1)
484-490: Consider logging cleanup failures instead of silently ignoring them.

The bare `except Exception: pass` hides potential filesystem issues. While not critical, logging at debug level would aid troubleshooting.

♻️ Proposed improvement

```diff
     if not args.keep_files and not args.dry_run:
         for fp in generated_files:
             try:
                 fp.unlink()
-            except Exception:
-                pass
+            except Exception as e:
+                print(f"  Warning: Could not delete {fp}: {e}", file=sys.stderr)
         print(f"\nCleaned up {len(generated_files)} generated files")
```
```python
raw_command_str = f"#{worker_type}:{node_name}"
if worker_env_config:
    raw_command_str += f"\n#env_vars:{worker_env_config}"
if engine_config:
    raw_command_str += f"\n#engine_config:{engine_config}"
raw_command_str += f"\n{raw_command}"

node_type_to_launch_command[worker_type].append(raw_command)
```
Bug: `raw_command_str` is built but `raw_command` is appended instead.

The function constructs `raw_command_str` with environment and engine config context (lines 131-136), but then appends the original `raw_command` to the list, discarding all the formatted context.
🐛 Proposed fix
```diff
 raw_command_str = f"#{worker_type}:{node_name}"
 if worker_env_config:
     raw_command_str += f"\n#env_vars:{worker_env_config}"
 if engine_config:
     raw_command_str += f"\n#engine_config:{engine_config}"
 raw_command_str += f"\n{raw_command}"
-node_type_to_launch_command[worker_type].append(raw_command)
+node_type_to_launch_command[worker_type].append(raw_command_str)
```

🤖 Prompt for AI Agents
+ node_type_to_launch_command[worker_type].append(raw_command_str)🤖 Prompt for AI Agents
In `@analysis/scripts/upload_rollup.py` around lines 131 - 138, The code builds a
contextualized string in raw_command_str (including worker_type, node_name,
worker_env_config, engine_config and raw_command) but mistakenly appends
raw_command to node_type_to_launch_command[worker_type]; change the append to
add raw_command_str instead so the list stores the formatted command with its
metadata (update the append call on node_type_to_launch_command in the block
that constructs raw_command_str).
Actionable comments posted: 1
🤖 Fix all issues with AI agents
In `@analysis/scripts/upload_rollup.py`:
- Around line 411-414: The metadata assignment currently hardcodes
df["experiment_id"] = 'STUDY' which is a placeholder; replace it so
experiment_id is set from the proper source (e.g., use args.study_id or the
correct experiment identifier variable) so that df["experiment_id"] and
df["study_id"] are populated with meaningful values; update the block that sets
df["experiment_id"], df["study_id"], and df["user_login"] in upload_rollup.py
accordingly (use args.study_id if experiment_id should match study_id, or the
correct experiment id variable if different).
♻️ Duplicate comments (2)
analysis/scripts/upload_rollup.py (2)
131-138: Bug: `raw_command_str` is built but `raw_command` is appended instead.

The function constructs `raw_command_str` with worker type, node name, environment config, and engine config context (lines 131-136), but then appends the original `raw_command` to the list (line 138), discarding all the formatted context.

🐛 Proposed fix

```diff
 raw_command_str = f"#{worker_type}:{node_name}"
 if worker_env_config:
     raw_command_str += f"\n#env_vars:{worker_env_config}"
 if engine_config:
     raw_command_str += f"\n#engine_config:{engine_config}"
 raw_command_str += f"\n{raw_command}"
-node_type_to_launch_command[worker_type].append(raw_command)
+node_type_to_launch_command[worker_type].append(raw_command_str)
```
191-205: Potential `NameError`: `raw_cmd` may be undefined when `sbatch_script` exists.

If `benchmark_cmd` is falsy (empty dict, `None`, or missing), `raw_cmd` is never defined. However, line 203 uses `raw_cmd` unconditionally when `sbatch_script` exists, causing a `NameError`.

🐛 Proposed fix

```diff
     # Extract benchmark_command raw_command only
     benchmark_cmd = data.get("benchmark_command", {})
+    raw_cmd = None
     if benchmark_cmd:
         raw_cmd = benchmark_cmd.get("raw_command")

     if raw_cmd:
         top_level["benchmark_command"] = raw_cmd

     sbatch_script = data.get("sbatch_script")
-    if sbatch_script:
+    if sbatch_script and raw_cmd:
         top_level.update(
             {
                 "benchmark_command": f"#sbatch_script\n\n{raw_cmd}",
             }
         )
```
🧹 Nitpick comments (2)
analysis/scripts/upload_rollup.py (2)
302-305: Consider narrowing the exception handler.

Bare `except Exception` could mask important errors like permission issues or encoding problems. Consider catching specific exceptions.

♻️ Suggested improvement

```diff
     if sbatch_path.exists():
         try:
             return sbatch_path.read_text()
-        except Exception:
+        except (OSError, UnicodeDecodeError) as e:
+            # Log or handle specific failure cases if needed
             return None
```
448-451: Filename sanitization only handles the `/` character.

The sanitization replaces `/` with `_`, but other problematic characters (spaces, `\`, `:`, `*`, `?`, `"`, `<`, `>`, `|`) could cause issues on some filesystems.

♻️ Suggested improvement

```diff
+import re
+
+def _sanitize_filename_part(s: str) -> str:
+    """Sanitize a string for use in filenames."""
+    return re.sub(r'[\\/:*?"<>|\s]+', '_', str(s))
+
     # Create group Parquet filename
-    safe_backend = str(backend).replace("/", "_")
-    safe_frontend = str(frontend).replace("/", "_")
-    safe_benchmarker = str(benchmarker).replace("/", "_")
+    safe_backend = _sanitize_filename_part(backend)
+    safe_frontend = _sanitize_filename_part(frontend)
+    safe_benchmarker = _sanitize_filename_part(benchmarker)
```
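The regex collapses any run of hostile characters or whitespace into a single underscore. A runnable sketch of that helper in isolation:

```python
import re

def sanitize_filename_part(s: str) -> str:
    """Replace runs of filesystem-hostile characters and whitespace with '_'."""
    return re.sub(r'[\\/:*?"<>|\s]+', "_", str(s))

assert sanitize_filename_part("a/b\\c") == "a_b_c"
assert sanitize_filename_part("trt llm:v1") == "trt_llm_v1"
assert sanitize_filename_part("plain-name") == "plain-name"
```

The character class covers the set that Windows rejects outright plus whitespace; POSIX filesystems technically allow most of these, so this is a portability choice rather than a hard requirement.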
Actionable comments posted: 4
🤖 Fix all issues with AI agents
In `@analysis/scripts/upload_rollup.py`:
- Around line 2-18: The module docstring and CLI usage claim the script writes
Parquet but the implementation actually writes .json files and posts gzipped
JSON; update the module-level docstring and the CLI/help text to accurately
describe the current behavior (or, if you want Parquet, change the
implementation to serialize to Parquet instead). Specifically, edit the module
docstring and the usage/help string to state “gzipped JSON” (or add a --mode
flag to choose between parquet/json), and ensure the upload routine (the
function that writes .json and posts gzipped JSON) and any filenames/extensions
match the documented mode so docs and implementation are consistent.
- Around line 174-194: The type annotation for the defaultdict named groups is
incorrect: it declares a 3-tuple key (defaultdict[tuple[str, str, str], ...])
but the code builds a 4-tuple key as group = (data['benchmark_type'],
data['frontend_type'], data['backend_type'], mode); update the annotation at the
groups declaration to use a 4-tuple (tuple[str, str, str, str]) so the type
matches the actual key shape used by the loop that processes rollup_path and
constructs group.
- Around line 33-57: The CLI/handler uses the name study_id but upload_json
currently accepts and posts session_id, causing a payload name mismatch; fix by
aligning names: change upload_json's parameter from session_id to study_id (and
update its docstring), change the JSON payload key from "session_id" to
"study_id", and update every call site that passes the CLI value (the CLI
variable study_id and any callers of upload_json) to use the renamed parameter
so the sent payload matches the server contract.
- Around line 192-200: The code computes mode using the wrong field and directly
indexes required keys which can raise KeyError; change the mode computation to
use data.get("is_disaggregated") (invert logic accordingly) and add defensive
validation before accessing data['benchmark_type'], data['frontend_type'],
data['backend_type'], and data['job_name'] (used later around
read_sbatch_script/ groups[group].append). Follow the function's existing
error-handling pattern: check for missing keys, log an error identifying the
missing field(s) and the rollup_path or job_name, and skip/continue that rollup
instead of indexing into data; ensure read_sbatch_script(rollup_path) usage
remains unchanged.
🧹 Nitpick comments (1)
analysis/scripts/upload_rollup.py (1)
118-118: Add an explicit return type for `main`.

Minor type-hinting polish: declare `main() -> None` for consistency and to satisfy the "type hints everywhere" guideline. As per coding guidelines.

🧩 Suggested tweak

```diff
-def main():
+def main() -> None:
```
```python
"""Upload rollup.json files to SRT endpoint as Parquet files grouped by backend/frontend/benchmarker.

Process:
1. Find all rollup.json files in the directory
2. Create flattened JSON for each concurrency level
3. Combine all into a single Parquet file
4. Group by (backend_type, frontend_type, benchmark_type)
5. Upload separate Parquet file for each group

Usage:
    python upload_rollup.py <directory> <user_login> [--study-id STUDY] [--endpoint URL]

Example:
    python upload_rollup.py ./outputs user@example.com
    python upload_rollup.py ./outputs user@example.com --study-id my-study
    python upload_rollup.py ./outputs user@example.com --endpoint that accepts the upload
"""
```
Docs mention Parquet but the script uploads gzipped JSON.
The module docstring and CLI description say “Parquet,” while the implementation writes .json files and posts gzipped JSON. Please align the wording (and include mode if intentional) to avoid confusion.
✏️ Suggested text update
```diff
-"""Upload rollup.json files to SRT endpoint as Parquet files grouped by backend/frontend/benchmarker.
+"""Upload rollup.json files to SRT endpoint as gzipped JSON files grouped by backend/frontend/benchmarker/mode.

 Process:
 1. Find all rollup.json files in the directory
-2. Create flattened JSON for each concurrency level
-3. Combine all into a single Parquet file
-4. Group by (backend_type, frontend_type, benchmark_type)
-5. Upload separate Parquet file for each group
+2. Group by (benchmark_type, frontend_type, backend_type, mode)
+3. Write a JSON file per group
+4. Upload a gzipped JSON file for each group
 ...
-    description="Upload rollup.json files to SRT endpoint as Parquet (grouped by backend/frontend/benchmarker)"
+    description="Upload rollup.json files to SRT endpoint as gzipped JSON (grouped by backend/frontend/benchmarker/mode)"
```

Also applies to: 118-121
🤖 Prompt for AI Agents
In `@analysis/scripts/upload_rollup.py` around lines 2 - 18, The module docstring
and CLI usage claim the script writes Parquet but the implementation actually
writes .json files and posts gzipped JSON; update the module-level docstring and
the CLI/help text to accurately describe the current behavior (or, if you want
Parquet, change the implementation to serialize to Parquet instead).
Specifically, edit the module docstring and the usage/help string to state
“gzipped JSON” (or add a --mode flag to choose between parquet/json), and ensure
the upload routine (the function that writes .json and posts gzipped JSON) and
any filenames/extensions match the documented mode so docs and implementation
are consistent.
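For reference, the gzipped-JSON path the script actually takes reduces to a small stdlib-only round trip. A sketch (row contents are illustrative, not the real schema):

```python
import gzip
import json

rows = [{"concurrency": 32, "result_ttft": 0.12}]

# Serialize, then gzip, mirroring what the upload path sends.
body = gzip.compress(json.dumps(rows).encode("utf-8"))

# The round trip is lossless, so the server only needs gzip + JSON decoding.
assert json.loads(gzip.decompress(body).decode("utf-8")) == rows
```

Whichever format the docstring ends up advertising, keeping the serialization this explicit makes it obvious what the endpoint must accept.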
```python
def upload_json(
    json_path: Path,
    user_login: str,
    session_id: str,
    endpoint: str,
    backend: str,
    benchmarker: str,
    frontend: str,
    mode: str,
) -> tuple[bool, str]:
    """Upload a gzipped JSON file to the endpoint.

    Args:
        json_path: Path to the JSON file
        user_login: User login/email
        session_id: Session ID for the upload
        endpoint: API endpoint URL
        backend: Backend type
        benchmarker: Benchmark type
        frontend: Frontend type
        mode: Mode (disaggregated or aggregated)

    Returns:
        Tuple of (success, message)
    """
```
🧩 Analysis chain
🏁 Script executed:

```shell
rg -n "study_id|session_id" -S --type=py | head -100
```

Repository: ishandhanani/srt-slurm
Length of output: 846

🏁 Script executed:

```shell
# Get more context around the upload_json function to see what's actually posted
cat -n analysis/scripts/upload_rollup.py | sed -n '33,100p'
```

Repository: ishandhanani/srt-slurm
Length of output: 2535

🏁 Script executed:

```shell
# Check for CLI parameter definition
fd -e py -x rg -l "study.id|study-id" {}
```

Repository: ishandhanani/srt-slurm
Length of output: 101

🏁 Script executed:

```shell
cat -n analysis/scripts/upload_rollup.py | sed -n '230,250p'
```

Repository: ishandhanani/srt-slurm
Length of output: 758

🏁 Script executed:

```shell
# Search for any API documentation or comments about expected field names
rg -n -B2 -A2 "study_id|session_id" analysis/scripts/upload_rollup.py | head -80
```

Repository: ishandhanani/srt-slurm
Length of output: 1261

🏁 Script executed:

```shell
# Check if there's any server-side code or documentation
fd -e py -e md | xargs rg -l "study_id.*session_id|session_id.*study_id" 2>/dev/null || echo "No files mention both"
```

Repository: ishandhanani/srt-slurm
Length of output: 87
Confirm naming mismatch between CLI and server payload: `study_id` vs. `session_id`.

The CLI captures `study_id` (line 234), but `upload_json()` receives it as `session_id` (line 36) and posts it as `"session_id"` in the payload (line 70). If the server API expects `study_id`, uploads will be rejected or misattributed. Verify the server contract and align parameter names consistently.
🐛 Possible alignment (if the API expects `study_id`)
```diff
-def upload_json(
-    json_path: Path,
-    user_login: str,
-    session_id: str,
-    endpoint: str,
-    backend: str,
-    benchmarker: str,
-    frontend: str,
-    mode: str,
-) -> tuple[bool, str]:
+def upload_json(
+    json_path: Path,
+    user_login: str,
+    study_id: str,
+    endpoint: str,
+    backend: str,
+    benchmarker: str,
+    frontend: str,
+    mode: str,
+) -> tuple[bool, str]:
@@
-        session_id: Session ID for the upload
+        study_id: Study ID for the upload
@@
-        "session_id": session_id,
+        "study_id": study_id,
```

Also applies to: 68-74, 240-244
🤖 Prompt for AI Agents
In `@analysis/scripts/upload_rollup.py` around lines 33 - 57, The CLI/handler uses
the name study_id but upload_json currently accepts and posts session_id,
causing a payload name mismatch; fix by aligning names: change upload_json's
parameter from session_id to study_id (and update its docstring), change the
JSON payload key from "session_id" to "study_id", and update every call site
that passes the CLI value (the CLI variable study_id and any callers of
upload_json) to use the renamed parameter so the sent payload matches the server
contract.
```python
groups: defaultdict[tuple[str, str, str], list[dict]] = defaultdict(list)

for rollup_path in sorted(rollup_files):
    print(f"Processing: {rollup_path}")

    try:
        with open(rollup_path) as f:
            data = json.load(f)
    except Exception as e:
        print(f"  ✗ Failed to read: {e}")
        failed_count += 1
        continue

    # Skip if no nodes_summary (job likely failed/cancelled)
    if not data.get("nodes_summary"):
        print("  ⚠ Skipping: no nodes_summary (job may have failed)")
        continue

    mode = "aggregated" if data.get("is_aggregated") else "disaggregated"
    group = (data['benchmark_type'], data['frontend_type'], data['backend_type'], mode)
    # Read sbatch script for this job
```
`groups` key annotation is a 3-tuple but the key is a 4-tuple.

Line 174 declares `defaultdict[tuple[str, str, str], ...]`, yet line 193 constructs `(benchmark_type, frontend_type, backend_type, mode)`. This will fail `ty` and mislead readers. Update the annotation to a 4-tuple. As per coding guidelines, keep type annotations accurate for `ty`.
✅ Fix type annotation
```diff
-groups: defaultdict[tuple[str, str, str], list[dict]] = defaultdict(list)
+groups: defaultdict[tuple[str, str, str, str], list[dict]] = defaultdict(list)
```

🤖 Prompt for AI Agents
In `@analysis/scripts/upload_rollup.py` around lines 174 - 194, The type
annotation for the defaultdict named groups is incorrect: it declares a 3-tuple
key (defaultdict[tuple[str, str, str], ...]) but the code builds a 4-tuple key
as group = (data['benchmark_type'], data['frontend_type'], data['backend_type'],
mode); update the annotation at the groups declaration to use a 4-tuple
(tuple[str, str, str, str]) so the type matches the actual key shape used by the
loop that processes rollup_path and constructs group.
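The corrected annotation and the grouping loop can be exercised together in a few lines. A sketch with made-up rollup rows (field names follow the review; the real data has many more fields):

```python
from collections import defaultdict

# The key has four components, so the annotation must say so.
groups: defaultdict[tuple[str, str, str, str], list[dict]] = defaultdict(list)

rollups = [
    {"benchmark_type": "sa-bench", "frontend_type": "sglang",
     "backend_type": "trtllm", "is_disaggregated": True},
    {"benchmark_type": "sa-bench", "frontend_type": "sglang",
     "backend_type": "trtllm", "is_disaggregated": False},
]
for data in rollups:
    mode = "disaggregated" if data.get("is_disaggregated") else "aggregated"
    key = (data["benchmark_type"], data["frontend_type"], data["backend_type"], mode)
    groups[key].append(data)

assert len(groups) == 2
assert ("sa-bench", "sglang", "trtllm", "disaggregated") in groups
```

Because `mode` is part of the key, disaggregated and aggregated runs of the same stack land in separate upload groups, which is the behavior the four-element key encodes.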
```python
mode = "aggregated" if data.get("is_aggregated") else "disaggregated"
group = (data['benchmark_type'], data['frontend_type'], data['backend_type'], mode)
# Read sbatch script for this job
sbatch_script = read_sbatch_script(rollup_path)

# Add sbatch script to each row
if sbatch_script:
    data["sbatch_script"] = sbatch_script
groups[group].append(data)
```
🧩 Analysis chain
🏁 Script executed:

```shell
# First, find the file and get its context
find . -name "upload_rollup.py" -type f
```

Repository: ishandhanani/srt-slurm
Length of output: 101

🏁 Script executed:

```shell
# Get the full context around the lines in question (lines 180-250)
sed -n '180,250p' analysis/scripts/upload_rollup.py | cat -n
```

Repository: ishandhanani/srt-slurm
Length of output: 2759

🏁 Script executed:

```shell
# Search for rollup schema definitions or sample data
fd -e json -e py | xargs rg -l "rollup|is_aggregated|is_disaggregated|benchmark_type|frontend_type|backend_type" | head -20
```

Repository: ishandhanani/srt-slurm
Length of output: 690

🏁 Script executed:

```shell
# Check the rollup models definition
head -100 src/srtctl/cli/mixins/rollup/models.py
```

Repository: ishandhanani/srt-slurm
Length of output: 3072

🏁 Script executed:

```shell
# Check test data for rollup
grep -A 20 "is_aggregated\|is_disaggregated\|benchmark_type" tests/test_rollup.py | head -50
```

Repository: ishandhanani/srt-slurm
Length of output: 1830

🏁 Script executed:

```shell
# Check the rollup harness
grep -B 5 -A 10 "is_aggregated\|is_disaggregated" src/srtctl/cli/mixins/rollup_stage.py analysis/srtlog/rollup_harness.py
```

Repository: ishandhanani/srt-slurm
Length of output: 8362

🏁 Script executed:

```shell
# Check the schema.py for field definitions
grep -B 2 -A 5 "benchmark_type\|frontend_type\|backend_type\|is_aggregated\|job_name" src/srtctl/core/schema.py
```

Repository: ishandhanani/srt-slurm
Length of output: 849

🏁 Script executed:

```shell
# Check the RollupSummary dataclass to see all expected fields
grep -A 100 "^class RollupSummary" src/srtctl/cli/mixins/rollup/models.py | head -150
```

Repository: ishandhanani/srt-slurm
Length of output: 2757

🏁 Script executed:

```shell
# Verify the field usage in upload_rollup.py more broadly
cat -n analysis/scripts/upload_rollup.py | grep -B 5 -A 5 "is_aggregated\|benchmark_type\|frontend_type\|backend_type\|job_name" | head -60
```

Repository: ishandhanani/srt-slurm
Length of output: 2180
Fix field name and add defensive validation for missing rollup fields.

Line 192 checks `is_aggregated` but the schema defines `is_disaggregated`, causing `mode` to always be calculated as "disaggregated". Lines 193 and 236 directly index required fields without checks; a missing `benchmark_type`, `frontend_type`, `backend_type`, or `job_name` will raise `KeyError` and abort processing. Add validation to match the existing error-handling pattern in this function:
🛡️ Defensive checks
```diff
-    mode = "aggregated" if data.get("is_aggregated") else "disaggregated"
-    group = (data['benchmark_type'], data['frontend_type'], data['backend_type'], mode)
+    mode = "disaggregated" if data.get("is_disaggregated") else "aggregated"
+    try:
+        benchmark_type = data["benchmark_type"]
+        frontend_type = data["frontend_type"]
+        backend_type = data["backend_type"]
+    except KeyError as exc:
+        print(f"  ✗ Skipping: missing {exc} in rollup.json")
+        failed_count += 1
+        continue
+    group = (benchmark_type, frontend_type, backend_type, mode)
@@
-    study_id = rows[0]["job_name"]
+    study_id = rows[0].get("job_name")
+    if not study_id:
+        print("  ✗ Missing job_name; skipping upload")
+        upload_failed_count += 1
+        continue
```

Also applies to: 232-236
🤖 Prompt for AI Agents
In `@analysis/scripts/upload_rollup.py` around lines 192 - 200, The code computes
mode using the wrong field and directly indexes required keys which can raise
KeyError; change the mode computation to use data.get("is_disaggregated")
(invert logic accordingly) and add defensive validation before accessing
data['benchmark_type'], data['frontend_type'], data['backend_type'], and
data['job_name'] (used later around read_sbatch_script/ groups[group].append).
Follow the function's existing error-handling pattern: check for missing keys,
log an error identifying the missing field(s) and the rollup_path or job_name,
and skip/continue that rollup instead of indexing into data; ensure
read_sbatch_script(rollup_path) usage remains unchanged.
```python
@property
def endpoints(self) -> list[Endpoint]:
    """Endpoint allocation topology."""
    ...
```
raise NotImplementedError/NotApplicable
Summary
This MR introduces a new Rollup Stage to the `SweepOrchestrator` that consolidates all experiment data (benchmark results, node metrics, environment configuration) into a single `rollup.json` file after benchmark completion. It also adds a modular parser system for extensible parsing of different benchmark types and backend log formats.
New Files (15 files, ~4700+ lines added):

- `src/srtctl/cli/mixins/rollup_stage.py`: rollup stage mixin and dataclasses (`RollupSummary`, `NodeRollup`, `NodesSummary`, `EnvironmentConfig`)
- `analysis/srtlog/parsers/__init__.py`: parser registry, protocols (`BenchmarkParserProtocol`, `NodeParserProtocol`), and dataclasses
- `analysis/srtlog/parsers/benchmark/sa_bench.py`: includes `parse_launch_command`
- `analysis/srtlog/parsers/benchmark/mooncake_router.py`
- `analysis/srtlog/parsers/nodes/sglang.py`
- `analysis/srtlog/parsers/nodes/sglang_v2.py`
- `analysis/srtlog/parsers/nodes/trtllm.py`
- `tests/test_rollup.py`
- `tests/test_configs.py`

Modified Files:

- `src/srtctl/cli/do_sweep.py`: uses `RollupStageMixin`, calls `run_rollup()` after benchmark
- `src/srtctl/cli/mixins/__init__.py`: exports `RollupStageMixin`
- `docs/architecture.md`
- `src/srtctl/README.md`

Features
1. Rollup Stage Pipeline
Collects benchmark results, worker logs, config.yaml, and engine configs → outputs `rollup.json`

2. Modular Parser System

- `@register_benchmark_parser` and `@register_node_parser` decorators
- `get_benchmark_parser("sa-bench")` and `get_node_parser("trtllm")`

3. Node Metrics Extraction
4. Environment Config Collection
- `prefill_environment`, `decode_environment`, `aggregated_environment`

5. Output: `rollup.json`

Contains job metadata, benchmark results array, nodes summary with per-node metrics, and environment/engine configuration.
Testing
- `test_rollup.py` covering all dataclasses and mixin methods
- `test_configs.py` for output directory structure

Supported Backends/Benchmarks
- Benchmarks: `sa-bench`, `mooncake-router`
- Backends: `sglang`, `sglang-v2`, `trtllm`

Summary by CodeRabbit
New Features
Documentation
Tests
Chores
✏️ Tip: You can customize this high-level summary in your review settings.