
Add Rollup Stage for Experiment Data Consolidation #92

Closed
KaunilD wants to merge 14 commits into main from kdhruv/rollup

Conversation

Contributor

KaunilD commented Jan 22, 2026

Summary

This PR introduces a new Rollup Stage to the SweepOrchestrator that consolidates all experiment data (benchmark results, node metrics, environment configuration) into a single rollup.json file after benchmark completion. It also adds a modular parser system for extensible parsing of different benchmark types and backend log formats.

Changes

New Files (15 files, ~4,700 lines added):

| File | Description |
| --- | --- |
| src/srtctl/cli/mixins/rollup_stage.py | Main rollup logic with dataclasses (RollupSummary, NodeRollup, NodesSummary, EnvironmentConfig) |
| analysis/srtlog/parsers/__init__.py | Parser registry, protocols (BenchmarkParserProtocol, NodeParserProtocol), and dataclasses |
| analysis/srtlog/parsers/benchmark/sa_bench.py | SA-Bench result parser with parse_launch_command |
| analysis/srtlog/parsers/benchmark/mooncake_router.py | Mooncake router result parser |
| analysis/srtlog/parsers/nodes/sglang.py | SGLang log parser (DP/TP/EP tag format) |
| analysis/srtlog/parsers/nodes/sglang_v2.py | SGLang v2 log parser (timestamp-only format) |
| analysis/srtlog/parsers/nodes/trtllm.py | TRTLLM log parser (iteration logs, memory metrics) |
| tests/test_rollup.py | Comprehensive tests for rollup functionality |
| tests/test_configs.py | Output directory structure tests |

Modified Files:

| File | Change |
| --- | --- |
| src/srtctl/cli/do_sweep.py | Integrates RollupStageMixin, calls run_rollup() after benchmark |
| src/srtctl/cli/mixins/__init__.py | Exports RollupStageMixin |
| docs/architecture.md | Documents rollup architecture, parser system, output format |
| src/srtctl/README.md | Adds rollup stage documentation |

Features

1. Rollup Stage Pipeline

Collects benchmark results, worker logs, config.yaml, and engine configs → outputs rollup.json

2. Modular Parser System

  • Register parsers via @register_benchmark_parser and @register_node_parser decorators
  • Access parsers via get_benchmark_parser("sa-bench") and get_node_parser("trtllm")
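The registry behind these decorators can be sketched in a few lines. This is an illustrative stand-in, not the actual implementation: the real module also defines parser protocols and a parallel node-parser registry, and the SABenchParser body here is a toy.

```python
# Sketch of the registry pattern behind these decorators; bodies are
# illustrative stand-ins (the real module also defines parser protocols and
# a parallel node-parser registry).
_BENCHMARK_PARSERS: dict[str, type] = {}

def register_benchmark_parser(name: str):
    """Class decorator that records a parser class under a string key."""
    def decorator(cls: type) -> type:
        _BENCHMARK_PARSERS[name] = cls
        return cls  # returned unchanged: registration is a pure side effect
    return decorator

def get_benchmark_parser(name: str):
    """Instantiate the parser registered for `name` (KeyError if unknown)."""
    return _BENCHMARK_PARSERS[name]()

@register_benchmark_parser("sa-bench")
class SABenchParser:
    def parse_result_json(self, payload: dict) -> dict:
        # Toy normalization, not the real parser's logic.
        return {"output_tps": payload.get("output_throughput")}

parser = get_benchmark_parser("sa-bench")
print(parser.parse_result_json({"output_throughput": 5000.0}))  # {'output_tps': 5000.0}
```

Because the decorator returns the class unchanged, registration is a pure import side effect, which is what lets module imports alone populate the registry.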

3. Node Metrics Extraction

  • Batch metrics: iteration counts, throughput, running requests
  • Memory metrics: usage, KV cache, available memory
  • Config extraction: TP/PP/EP sizes, batch sizes, sequence lengths
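To make the extraction concrete, here is a minimal regex sketch against the SGLang DP/TP/EP log format that appears in this PR's test fixtures; the real SGLangNodeParser captures far more fields and also handles the v2 timestamp-only format.

```python
import re

# Illustrative sketch only: the real SGLangNodeParser extracts many more
# fields and also handles the v2 timestamp-only log format.
PREFILL_RE = re.compile(
    r"\[(?P<ts>[\d\- :]+) DP(?P<dp>\d+) TP(?P<tp>\d+) EP(?P<ep>\d+)\] "
    r"Prefill batch, #new-seq: (?P<new_seq>\d+), #new-token: (?P<new_token>\d+)"
)

# Log line format taken from this PR's test fixtures (truncated here).
line = (
    "[2025-01-22 10:00:00 DP0 TP0 EP0] Prefill batch, #new-seq: 10, "
    "#new-token: 1024, #cached-token: 256, token usage: 0.50,"
)
m = PREFILL_RE.match(line)
assert m is not None
print(m.group("ts"), int(m.group("new_token")))  # 2025-01-22 10:00:00 1024
```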

4. Environment Config Collection

  • prefill_environment, decode_environment, aggregated_environment
  • Engine configs from YAML files or inline config

5. Output: rollup.json

Contains job metadata, benchmark results array, nodes summary with per-node metrics, and environment/engine configuration.
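A hypothetical sketch of how such a summary could be assembled and serialized; the field names below are inferred from this description, not the actual RollupSummary schema.

```python
import json
from dataclasses import asdict, dataclass, field

# Hypothetical sketch of serializing a RollupSummary-style object; the field
# names are inferred from this description, not the actual schema.
@dataclass
class RollupSketch:
    job_id: str
    benchmark_results: list = field(default_factory=list)
    nodes_summary: dict = field(default_factory=dict)
    environment: dict = field(default_factory=dict)

rollup = RollupSketch(
    job_id="12345",
    benchmark_results=[{"max_concurrency": 100, "output_tps": 5000.0}],
    nodes_summary={"node-01": {"kv_cache_gb": 17.16}},
)
rollup_json = json.dumps(asdict(rollup), indent=2)  # what gets written to rollup.json
```

Dataclasses plus asdict keep the on-disk format a plain JSON object, so downstream tools (like the Parquet upload script) can consume it without importing srtctl.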

Testing

  • 18 new tests in test_rollup.py covering all dataclasses and mixin methods
  • 7 new tests in test_configs.py for output directory structure
  • All tests passing ✅

Supported Backends/Benchmarks

| Type | Parsers |
| --- | --- |
| Benchmark | sa-bench, mooncake-router |
| Node logs | sglang, sglang-v2, trtllm |

Summary by CodeRabbit

  • New Features

    • Rollup stage & CLI mixin to generate structured rollup.json summaries and a rollup harness for batch processing
    • Pluggable benchmark/node parsing system with new benchmark parsers (SA‑Bench, Mooncake Router) and node parsers (SGLang, TRTLLM)
    • Upload tool to aggregate rollup.json into Parquet for downstream analysis
  • Documentation

    • Architecture docs updated to include Rollup stage, mixin and extension points
  • Tests

    • Comprehensive tests for rollup logic and output directory behavior
  • Chores

    • Standardized benchmark script command construction; added pandas dependency



coderabbitai bot commented Jan 22, 2026

📝 Walkthrough

Walkthrough

Adds a pluggable benchmark/node parsing framework, two benchmark parsers and two node parsers, rollup dataclasses and a RollupStageMixin (plus CLI integration), a standalone rollup harness and upload tool, script invocation refactors, tests, and a pandas dependency.

Changes

**Parsing framework core** (analysis/srtlog/parsers/__init__.py)
New pluggable parser API: dataclasses (BenchmarkLaunchCommand, NodeLaunchCommand), protocols (BenchmarkParserProtocol, NodeParserProtocol), registries, register/get decorators, listing helpers, and auto-imports to register concrete parsers.

**Benchmark parsers** (analysis/srtlog/parsers/benchmark/__init__.py, analysis/srtlog/parsers/benchmark/mooncake_router.py, analysis/srtlog/parsers/benchmark/sa_bench.py)
New SABenchParser and MooncakeRouterParser: registered types, text & JSON parsing, result-directory helpers, launch-command extraction, and metric normalization.

**Node parsers** (analysis/srtlog/parsers/nodes/__init__.py, analysis/srtlog/parsers/nodes/sglang.py, analysis/srtlog/parsers/nodes/trtllm.py)
New SGLangNodeParser and TRTLLMNodeParser: per-file parsing for batch/memory metrics, filename node extraction, iteration parsing, engine/config extraction, and node launch-command parsing; registered by backend.

**Rollup datamodels** (src/srtctl/cli/mixins/rollup/__init__.py, src/srtctl/cli/mixins/rollup/models.py)
New rollup dataclasses: RollupSummary, RollupResult, NodeRollup, NodesSummary, EnvironmentConfig, LaunchCommandRollup with conversion/aggregation helpers and compute_summary_stats.

**Rollup orchestration mixin** (src/srtctl/cli/mixins/rollup_stage.py, src/srtctl/cli/do_sweep.py, src/srtctl/cli/mixins/__init__.py)
Added RollupStageMixin, integrated into SweepOrchestrator; run_rollup collects benchmark/node metrics, environment/config, launch commands, builds and writes rollup.json.

**Standalone rollup & upload** (analysis/srtlog/rollup_harness.py, analysis/scripts/upload_rollup.py)
New rollup harness to discover job folders and produce rollup.json; upload tool flattens rollups into Parquet, groups by backend/frontend/benchmarker, and POSTs grouped files.

**Benchmark script invocations** (src/srtctl/benchmarks/scripts/.../bench.sh: gpqa, longbenchv2, mmlu, mooncake-router, profiling, router, sa-bench)
Replaced multi-line direct Python invocations with single constructed command strings echoed and executed via eval; added defensive result-file handling and logging of [CMD].

**Docs & config** (docs/architecture.md, src/srtctl/README.md, pyproject.toml)
Docs updated to include RollupStage and trtllm backend; pandas>=2.3.3 added to pyproject dependencies.

**Tests** (tests/test_configs.py, tests/test_rollup.py)
New and expanded tests for output directory behavior, rollup datamodels, aggregation logic, RollupStageMixin collection, and end-to-end rollup integration.

Sequence Diagram(s)

```mermaid
sequenceDiagram
    participant Orch as SweepOrchestrator
    participant Rollup as RollupStageMixin
    participant BenchFactory as BenchmarkParserFactory
    participant NodeFactory as NodeParserFactory
    participant Logs as LogDirectory
    participant Agg as RollupAggregator
    participant File as rollup.json

    Orch->>Rollup: run_rollup(tags?)
    Rollup->>BenchFactory: get_benchmark_parser(benchmark_type)
    BenchFactory->>Logs: parse_result_json()/parse_result_directory()
    Logs-->>BenchFactory: benchmark metrics
    Rollup->>NodeFactory: get_node_parser(backend_type)
    NodeFactory->>Logs: parse_logs(log_dir)
    Logs-->>NodeFactory: NodeMetrics[]
    Rollup->>Rollup: _collect_environment_config()
    Rollup->>Agg: _build_rollup_summary(benchmarks, nodes, env, commands)
    Agg->>Agg: compute_summary_stats()
    Rollup->>File: _write_rollup(RollupSummary)
    File-->>Orch: rollup.json path
```

Estimated code review effort

🎯 5 (Critical) | ⏱️ ~120 minutes

Possibly related PRs

  • complete refactor #65 — strongly related: touches srt-slurm parsing/rollup subsystems and appears to implement overlapping parser and rollup changes.

Poem

🐰
I nibbled logs and chased each line,
Parsed JSON crumbs and regex fine,
I hopped through nodes and stitched the tune,
Rollup gleams beneath the moon,
A carrot-coded summary — hop, review, refine!

🚥 Pre-merge checks: ✅ 2 passed | ❌ 1 failed

❌ Failed checks (1 warning)

  • Docstring Coverage ⚠️ Warning: Docstring coverage is 77.52%, below the required threshold of 80.00%. Resolution: write docstrings for the functions missing them to satisfy the coverage threshold.

✅ Passed checks (2 passed)

  • Description Check ✅ Passed: Check skipped; CodeRabbit's high-level summary is enabled.
  • Title Check ✅ Passed: The pull request title 'Add Rollup Stage for Experiment Data Consolidation' clearly and concisely summarizes the main objective of the changeset: introducing a rollup stage to consolidate experiment data.



coderabbitai bot left a comment

Actionable comments posted: 7

🤖 Fix all issues with AI agents
In `@analysis/srtlog/parsers/nodes/sglang.py`:
- Around line 93-95: The log-reading try block that opens log_path and iterates
lines should guard against UnicodeDecodeError by opening the file with an
explicit encoding and a replacement error handler; update the with
open(log_path) call in the try block that reads lines to use encoding="utf-8"
and errors="replace" so non‑UTF8 bytes are replaced instead of raising and
dropping metrics (keep the rest of the loop logic unchanged and preserve the
existing try/except behavior).

In `@docs/architecture.md`:
- Around line 822-826: The fenced code block that renders the "ROLLUP STAGE
ARCHITECTURE" ASCII art is missing a language tag and triggers MD040; update the
opening fence from ``` to ```text so the block is tagged as plain text (i.e.,
replace the triple backticks before the ASCII box with ```text) to satisfy
Markdownlint. Locate the fenced block containing the "ROLLUP STAGE ARCHITECTURE"
header and modify only the opening fence.

In `@src/srtctl/cli/do_sweep.py`:
- Around line 211-214: The code uses hasattr(self.config, "tags") which leaves
tags typed as object and fails the run_rollup(tags: list[str] | None) signature;
update the extraction to explicitly narrow to list[str] | None before calling
run_rollup (e.g., check isinstance(self.config.tags, list) and verify items are
str, or use typing.cast[list[str] | None](...) after validating) so that the
local variable tags is typed as list[str] | None and then call
self.run_rollup(tags=tags).

In `@src/srtctl/cli/mixins/rollup_stage.py`:
- Around line 561-564: The endpoints property currently has an ellipsis body
which returns None and violates the declared return type list["Endpoint"];
replace the ellipsis with either (a) remove the `@property` method and declare a
class-level type hint endpoints: list["Endpoint"] if you want a pure type-only
mixin, or (b) implement the property to raise NotImplementedError (e.g. def
endpoints(self) -> list["Endpoint"]: raise NotImplementedError) so concrete
classes must provide it; update the symbol named endpoints accordingly.
- Around line 641-661: The type checker flags calls to optional parser methods
because BenchmarkParserProtocol doesn't declare find_aiperf_results and
parse_result_directory; fix by either using getattr with a callable check before
invoking (e.g., replace hasattr checks with m = getattr(parser,
"find_aiperf_results", None) and if callable(m): use m(...), same for
parse_result_directory) or add these methods as optional members to
BenchmarkParserProtocol (with default implementations returning empty lists) so
parse_result_json, find_aiperf_results, and parse_result_directory are
statically known; update references in the rollup logic where
parser.parse_result_json, parser.find_aiperf_results, and
parser.parse_result_directory are used.

In `@src/srtctl/README.md`:
- Around line 144-158: The fenced diagram block in the README triggers
markdownlint MD040 because it lacks a language tag; update the fenced code block
in the ROLLUP STAGE PIPELINE section (the triple-backtick block showing the
ASCII diagram) to include a language identifier (e.g., change ``` to ```text) so
the block is explicitly marked and the lint rule is satisfied.

In `@tests/test_rollup.py`:
- Around line 800-938: The test fails because analysis.srtlog modules import
pandas unconditionally (e.g., in analysis.srtlog.config_reader) causing
ImportError to be swallowed by RollupStageMixin.run_rollup; update the pandas
import to be guarded (wrap "import pandas as pd" in a try/except ImportError and
set pd = None) and ensure functions in analysis.srtlog.config_reader and any
parsers check pd is not None before using it and raise a clear error if pandas
is required, and also modify RollupStageMixin.run_rollup to not silently ignore
ImportError (detect ImportError from analysis.srtlog imports and re-raise or
surface a clear message) so tests fail fast if pandas is genuinely needed.
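The guarded-import pattern asked for above looks roughly like this; rollups_to_frame is a hypothetical helper used for illustration, not a function from this PR.

```python
# Sketch of the guarded-import pattern suggested above; rollups_to_frame is
# a hypothetical helper, not a function from this PR.
try:
    import pandas as pd
except ImportError:  # pandas stays optional for pure-parsing code paths
    pd = None

def rollups_to_frame(rows):
    """Build a DataFrame from flattened rollup rows, failing loudly without pandas."""
    if pd is None:
        raise RuntimeError(
            "pandas is required for rollup aggregation; install pandas>=2.3.3"
        )
    return pd.DataFrame(rows)
```

This keeps pure-parsing imports cheap while turning a missing dependency into an explicit, actionable error at the point of use rather than a silently swallowed ImportError.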
🧹 Nitpick comments (8)
tests/test_configs.py (1)

388-390: Extract shared setup into pytest fixtures to reduce repetition.

The new test class repeats config creation, root monkeypatching, and sbatch mocking. A small fixture/helper would keep this suite shorter and easier to maintain.

tests/test_rollup.py (1)

701-744: Consider freezing mock config dataclasses.

These mocks model configuration objects; setting frozen=True would align with the immutable config pattern used in production code. As per coding guidelines, ...

analysis/srtlog/parsers/__init__.py (2)

29-53: Consider making data-carrying dataclasses frozen for immutability.

Per coding guidelines, configuration objects should use @dataclass(frozen=True). Since BenchmarkLaunchCommand represents parsed, immutable launch command data, consider making it frozen to prevent accidental mutation after parsing.

♻️ Suggested change
-@dataclass
+@dataclass(frozen=True)
 class BenchmarkLaunchCommand:
     """Parsed benchmark launch command information."""

Note: mutable defaults would still use field(default_factory=dict), which works unchanged with frozen dataclasses. If mutability is intentional for incremental field population during parsing, this can be ignored.
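For reference, a frozen dataclass with a mutable default is straightforward; the field list here is illustrative, not the real BenchmarkLaunchCommand.

```python
from dataclasses import dataclass, field

# Minimal sketch of the frozen variant suggested above; the field list is
# illustrative, not the real BenchmarkLaunchCommand.
@dataclass(frozen=True)
class BenchmarkLaunchCommandSketch:
    command: str
    extra_args: dict = field(default_factory=dict)

cmd = BenchmarkLaunchCommandSketch(command="sa-bench")
try:
    cmd.command = "other"  # frozen instances reject attribute assignment
except Exception as exc:
    print(type(exc).__name__)  # FrozenInstanceError
```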


272-275: Wildcard imports for auto-registration are acceptable but could be more explicit.

The wildcard imports trigger parser registration as a side effect. While this works, explicit imports (e.g., from analysis.srtlog.parsers.benchmark import SABenchParser, MooncakeRouterParser) would make the registered parsers more discoverable in this file. This is a minor readability preference.

src/srtctl/cli/mixins/rollup_stage.py (4)

28-57: Consider frozen dataclass for immutable rollup data.

Similar to the parsers module, these data objects represent parsed results that shouldn't be mutated after construction. Per coding guidelines, consider @dataclass(frozen=True) for LaunchCommandRollup and other rollup data models.


245-318: Consider extracting common aggregation logic to reduce duplication.

The _aggregate_agg_batches method duplicates logic from _aggregate_prefill_batches (lines 254-287) and _aggregate_decode_batches (lines 289-317). Consider refactoring to reuse the existing methods:

♻️ Suggested refactor
     def _aggregate_agg_batches(self, batches: list) -> None:
         """Aggregate metrics for agg workers (handles both prefill and decode batches)."""
         # Separate prefill and decode batches
         prefill_batches = [b for b in batches if b.batch_type == "prefill"]
         decode_batches = [b for b in batches if b.batch_type == "decode"]

-        self.total_prefill_batches = len(prefill_batches)
-        self.total_decode_batches = len(decode_batches)
-
-        # Aggregate prefill metrics
+        # Reuse existing aggregation methods
         if prefill_batches:
-            new_tokens = []
-            # ... (duplicated prefill logic)
+            self._aggregate_prefill_batches(prefill_batches)
-
-        # Aggregate decode metrics
         if decode_batches:
-            running_reqs = []
-            # ... (duplicated decode logic)
+            self._aggregate_decode_batches(decode_batches)

773-773: Move import os to module level.

The os module import should be at the top of the file with other standard library imports, not inside the method.

♻️ Suggested change

Add to imports at top of file (around line 11):

import json
import logging
import os  # Add this
from dataclasses import asdict, dataclass, field

Then remove line 773.


952-959: Minor: Pass a tuple instead of a list to any().

This avoids allocating an intermediate list; note that with a fixed set of attribute expressions all elements are evaluated up front either way, so this is purely a micro-optimization.

♻️ Suggested change
-        if not any([
+        if not any((
             config.prefill_environment,
             config.decode_environment,
             config.aggregated_environment,
             config.prefill_engine_config,
             config.decode_engine_config,
             config.aggregated_engine_config,
-        ]):
+        )):

Comment on lines +93 to +95
```python
try:
    with open(log_path) as f:
        for line in f:
```

⚠️ Potential issue | 🟡 Minor

Guard against UnicodeDecodeError when reading logs.

If logs contain non‑UTF8 bytes, open() can throw and drop all metrics. Use explicit encoding with errors="replace" to keep parsing resilient.

🔧 Proposed fix
-            with open(log_path) as f:
+            with open(log_path, "r", encoding="utf-8", errors="replace") as f:

Comment thread docs/architecture.md
Comment on lines +822 to +826

```
┌─────────────────────────────────────────────────────────────────────────────────────┐
│ ROLLUP STAGE ARCHITECTURE │
└─────────────────────────────────────────────────────────────────────────────────────┘
```


⚠️ Potential issue | 🟡 Minor

Add a language tag to this fenced block for MD040 compliance.

Markdownlint flags this block; tagging it as text is sufficient.

🛠️ Suggested change
-```
+```text
🧰 Tools
🪛 markdownlint-cli2 (0.18.1)

822-822: Fenced code blocks should have a language specified

(MD040, fenced-code-language)


Comment on lines +211 to +214
```python
# Run rollup to consolidate experiment data
if exit_code == 0:
    tags = self.config.tags if hasattr(self.config, "tags") else []
    self.run_rollup(tags=tags)
```

⚠️ Potential issue | 🟠 Major

Fix tags type narrowing to satisfy run_rollup signature (CI failure).

Current hasattr path leaves tags typed as object, triggering the CI type error. Narrow explicitly to list[str] | None.

✅ Suggested fix
-            if exit_code == 0:
-                tags = self.config.tags if hasattr(self.config, "tags") else []
-                self.run_rollup(tags=tags)
+            if exit_code == 0:
+                tags: list[str] | None = None
+                raw_tags = getattr(self.config, "tags", None)
+                if isinstance(raw_tags, list):
+                    tags = raw_tags
+                self.run_rollup(tags=tags)
🧰 Tools
🪛 GitHub Actions: CI

[error] 214-214: invalid-argument-type: Argument to bound method run_rollup is incorrect. Expected list[str] | None, found object.

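The narrowing fix can be exercised standalone; FakeConfig below is a stand-in for the real sweep config object.

```python
from __future__ import annotations

# Standalone sketch of the narrowing fix suggested above; FakeConfig stands
# in for the real sweep config object.
class FakeConfig:
    tags = ["baseline", "b200"]

def narrow_tags(config: object) -> list[str] | None:
    # getattr + isinstance narrows the attribute to list[str] | None,
    # which satisfies the run_rollup(tags=...) signature.
    raw = getattr(config, "tags", None)
    if isinstance(raw, list) and all(isinstance(t, str) for t in raw):
        return raw
    return None

print(narrow_tags(FakeConfig()))  # ['baseline', 'b200']
print(narrow_tags(object()))      # None
```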

Comment on lines +561 to +564
```python
@property
def endpoints(self) -> list["Endpoint"]:
    """Endpoint allocation topology."""
    ...
```

⚠️ Potential issue | 🔴 Critical

Fix the endpoints property that always returns None.

The property body ... (ellipsis) causes it to implicitly return None, violating the list[Endpoint] return type annotation. This triggers the pipeline error. For a mixin expecting the concrete class to provide this property, either:

  1. Remove the implementation entirely (just declare the type hint)
  2. Raise NotImplementedError
🔧 Proposed fix
     @property
     def endpoints(self) -> list["Endpoint"]:
         """Endpoint allocation topology."""
-        ...
+        raise NotImplementedError("Subclass must provide endpoints property")

Or, if using a pure type-hint approach for mixins:

# Just declare as class variable type hint without `@property`
endpoints: list["Endpoint"]
🧰 Tools
🪛 GitHub Actions: CI

[error] 562-562: invalid-return-type: Function always implicitly returns None, which is not assignable to return type list[Endpoint].

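Option (b) in miniature, with toy classes in place of the real mixin and orchestrator: the concrete class satisfies the property, while calling the mixin's version fails loudly instead of silently returning None.

```python
# Toy sketch of option (b): the mixin declares the property and raises,
# a concrete class overrides it. Class names and values are illustrative.
class RollupStageMixinSketch:
    @property
    def endpoints(self) -> list:
        raise NotImplementedError("Subclass must provide endpoints property")

class ConcreteOrchestrator(RollupStageMixinSketch):
    @property
    def endpoints(self) -> list:
        return ["prefill-0", "decode-0"]

print(ConcreteOrchestrator().endpoints)  # ['prefill-0', 'decode-0']
```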

Comment on lines +641 to +661
```python
# Try parser-specific result collection first
if parser is not None:
    # For mooncake-router, look for AIPerf results
    if hasattr(parser, "find_aiperf_results"):
        aiperf_files = parser.find_aiperf_results(self.runtime.log_dir)
        for aiperf_path in aiperf_files:
            result = parser.parse_result_json(aiperf_path)
            if result.get("output_tps") is not None:
                results.append(result)
                logger.debug("Loaded AIPerf result: %s", aiperf_path)

    # For sa-bench style, look for result directories
    if hasattr(parser, "parse_result_directory"):
        for entry in self.runtime.log_dir.iterdir():
            if not entry.is_dir():
                continue
            # Match patterns like sa-bench_isl_X_osl_Y
            if "_isl_" in entry.name and "_osl_" in entry.name:
                logger.debug("Found benchmark results directory: %s", entry.name)
                dir_results = parser.parse_result_directory(entry)
                results.extend(dir_results)
```

⚠️ Potential issue | 🟠 Major

Fix type errors for optional parser methods not in protocol.

The type checker reports call-non-callable because find_aiperf_results and parse_result_directory are not defined in BenchmarkParserProtocol. While the hasattr checks make this runtime-safe, the type checker cannot verify this.

🔧 Proposed fix using getattr
             # Try parser-specific result collection first
             if parser is not None:
                 # For mooncake-router, look for AIPerf results
-                if hasattr(parser, "find_aiperf_results"):
-                    aiperf_files = parser.find_aiperf_results(self.runtime.log_dir)
+                find_aiperf = getattr(parser, "find_aiperf_results", None)
+                if find_aiperf is not None:
+                    aiperf_files = find_aiperf(self.runtime.log_dir)
                     for aiperf_path in aiperf_files:
                         result = parser.parse_result_json(aiperf_path)
                         if result.get("output_tps") is not None:
                             results.append(result)
                             logger.debug("Loaded AIPerf result: %s", aiperf_path)

                 # For sa-bench style, look for result directories
-                if hasattr(parser, "parse_result_directory"):
+                parse_result_dir = getattr(parser, "parse_result_directory", None)
+                if parse_result_dir is not None:
                     for entry in self.runtime.log_dir.iterdir():
                         if not entry.is_dir():
                             continue
                         # Match patterns like sa-bench_isl_X_osl_Y
                         if "_isl_" in entry.name and "_osl_" in entry.name:
                             logger.debug("Found benchmark results directory: %s", entry.name)
-                            dir_results = parser.parse_result_directory(entry)
+                            dir_results = parse_result_dir(entry)
                             results.extend(dir_results)

Alternatively, add these methods as optional (with default implementations returning empty results) to BenchmarkParserProtocol.

🧰 Tools
🪛 GitHub Actions: CI

[error] 645-645: call-non-callable: Object of type object is not callable.


[error] 660-660: call-non-callable: Object of type object is not callable.

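The getattr variant can be illustrated with toy parsers standing in for the real implementations; the callable() check also guards against a non-method attribute of the same name.

```python
# Toy sketch of the getattr pattern proposed above; MooncakeLike and
# SABenchLike stand in for the real parser classes.
class MooncakeLike:
    def find_aiperf_results(self, log_dir: str) -> list:
        return [f"{log_dir}/profile_export_aiperf.json"]

class SABenchLike:
    pass  # no find_aiperf_results: the getattr path skips it cleanly

def collect_aiperf(parser: object, log_dir: str) -> list:
    # Binding the method to a local name lets the type checker see a
    # callable, unlike hasattr() followed by an attribute call.
    find = getattr(parser, "find_aiperf_results", None)
    if callable(find):
        return find(log_dir)
    return []

print(collect_aiperf(MooncakeLike(), "/logs"))  # ['/logs/profile_export_aiperf.json']
print(collect_aiperf(SABenchLike(), "/logs"))   # []
```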

Comment thread src/srtctl/README.md
Comment on lines +144 to +158
```
┌─────────────────────────────────────────────────────────────────────────────┐
│                           ROLLUP STAGE PIPELINE                             │
├─────────────────────────────────────────────────────────────────────────────┤
│                                                                             │
│  INPUT FILES                          OUTPUT                                │
│  ───────────                          ──────                                │
│  • benchmark results (*.json)    ──┐                                        │
│  • worker logs (*.out/*.err)     ──┼──►  rollup.json                        │
│  • config.yaml                   ──┤     • benchmark results                │
│  • engine configs (*.yaml)       ──┘     • node metrics                     │
│                                          • environment config               │
│                                          • summary statistics               │
└─────────────────────────────────────────────────────────────────────────────┘
```

⚠️ Potential issue | 🟡 Minor

Add language to fenced block to satisfy markdownlint (MD040).

🔧 Proposed fix
-```
+```text
┌─────────────────────────────────────────────────────────────────────────────┐
│                           ROLLUP STAGE PIPELINE                             │
├─────────────────────────────────────────────────────────────────────────────┤
│                                                                             │
│  INPUT FILES                          OUTPUT                                │
│  ───────────                          ──────                                │
│  • benchmark results (*.json)    ──┐                                        │
│  • worker logs (*.out/*.err)     ──┼──►  rollup.json                        │
│  • config.yaml                   ──┤     • benchmark results                │
│  • engine configs (*.yaml)       ──┘     • node metrics                     │
│                                          • environment config               │
│                                          • summary statistics               │
└─────────────────────────────────────────────────────────────────────────────┘
🧰 Tools
🪛 markdownlint-cli2 (0.18.1)

144-144: Fenced code blocks should have a language specified

(MD040, fenced-code-language)


Comment thread tests/test_rollup.py
Comment on lines +800 to +938
```python
    def test_rollup_with_node_logs(self, tmp_path):
        """Test rollup with actual node log files parsed by NodeAnalyzer."""
        from dataclasses import dataclass, field

        # Create mock benchmark results
        bench_dir = tmp_path / "sa-bench_isl_1024_osl_1024"
        bench_dir.mkdir()

        result_file = bench_dir / "result_c100.json"
        result_file.write_text(
            json.dumps(
                {
                    "max_concurrency": 100,
                    "output_throughput": 5000.0,
                    "mean_ttft_ms": 150.0,
                    "mean_itl_ms": 20.0,
                }
            )
        )

        # Create mock prefill log file (matches NodeAnalyzer expected format)
        prefill_log = tmp_path / "node-01_prefill_w0.err"
        prefill_log.write_text(
            """[2025-01-22 10:00:00 DP0 TP0 EP0] Prefill batch, #new-seq: 10, #new-token: 1024, #cached-token: 256, token usage: 0.50, #running-req: 5, #queue-req: 2, #prealloc-req: 0, #inflight-req: 3, input throughput (token/s): 5000.00,
[2025-01-22 10:00:01 DP0 TP0 EP0] Load weight end. type=DeepseekV3ForCausalLM, dtype=torch.bfloat16, avail mem=75.11 GB, mem usage=107.07 GB.
[2025-01-22 10:00:02 DP0 TP0 EP0] KV Cache is allocated. #tokens: 524288, KV size: 17.16 GB
"""
        )

        # Create mock decode log file
        decode_log = tmp_path / "node-02_decode_w0.err"
        decode_log.write_text(
            """[2025-01-22 10:00:00 DP31 TP31 EP31] Decode batch, #running-req: 50, #token: 5000, token usage: 0.50, pre-allocated usage: 0.10, #prealloc-req: 2, #transfer-req: 1, #queue-req: 3, gen throughput (token/s): 150.00,
"""
        )

        # Mock config classes
        @dataclass
        class MockResourceConfig:
            is_disaggregated: bool = True
            prefill_gpus: int = 8
            decode_gpus: int = 8
            agg_nodes: int | None = None
            gpus_per_node: int = 8
            gpu_type: str = "B200"
            total_nodes: int = 2
            prefill_nodes: int = 1
            decode_nodes: int = 1
            num_prefill: int = 1
            num_decode: int = 1
            num_agg: int | None = None

        @dataclass
        class MockBenchmarkConfig:
            type: str = "sa-bench"
            isl: int = 1024
            osl: int = 1024
            concurrencies: str = "100"

            def get_concurrency_list(self):
                return [100]

        @dataclass
        class MockModelConfig:
            precision: str = "fp8"

        @dataclass
        class MockFrontendConfig:
            type: str = "sglang"

        @dataclass
        class MockConfig:
            name: str = "test-job"
            served_model_name: str = "deepseek-v3"
            backend_type: str = "sglang"
            resources: MockResourceConfig = field(default_factory=MockResourceConfig)
            benchmark: MockBenchmarkConfig = field(default_factory=MockBenchmarkConfig)
            model: MockModelConfig = field(default_factory=MockModelConfig)
            frontend: MockFrontendConfig = field(default_factory=MockFrontendConfig)

        @dataclass
        class MockRuntime:
            job_id: str = "12345"
            log_dir: Path = field(default_factory=Path)
            model_path: Path = field(default_factory=lambda: Path("/models/deepseek-v3"))

        class MockOrchestrator(RollupStageMixin):
            def __init__(self, config, runtime):
                self._config = config
                self._runtime = runtime

            @property
            def config(self):
                return self._config

            @property
            def runtime(self):
                return self._runtime

            @property
            def endpoints(self):
                return []

        config = MockConfig()
        runtime = MockRuntime(log_dir=tmp_path)
        orchestrator = MockOrchestrator(config, runtime)

        # Run rollup
        rollup_path = orchestrator.run_rollup(tags=["node-test"])

        assert rollup_path is not None
        assert rollup_path.exists()

        with open(rollup_path) as f:
            data = json.load(f)

        # Verify node summary is present
        assert data["nodes_summary"] is not None
        nodes_summary = data["nodes_summary"]

        assert nodes_summary["total_prefill_nodes"] == 1
        assert nodes_summary["total_decode_nodes"] == 1
        assert len(nodes_summary["nodes"]) == 2

        # Find prefill and decode nodes
        prefill_node = next((n for n in nodes_summary["nodes"] if n["worker_type"] == "prefill"), None)
        decode_node = next((n for n in nodes_summary["nodes"] if n["worker_type"] == "decode"), None)

        assert prefill_node is not None
        assert prefill_node["node_name"] == "node-01"
        assert prefill_node["total_new_tokens"] == 1024
        assert prefill_node["total_cached_tokens"] == 256
        assert prefill_node["kv_cache_gb"] == 17.16

        assert decode_node is not None
        assert decode_node["node_name"] == "node-02"
        assert decode_node["max_running_requests"] == 50
        assert decode_node["avg_gen_throughput"] == 150.0
```


⚠️ Potential issue | 🟠 Major

🧩 Analysis chain

🏁 Scripts executed:

```shell
# Check for pandas in declared dependencies
rg -n "pandas" pyproject.toml setup.py setup.cfg 2>/dev/null | head -20

# Look for parser imports and NodeAnalyzer usage
rg -n "srtlog\.parsers|NodeAnalyzer|from analysis" tests/test_rollup.py -A 2

# Check where RollupStageMixin is defined and how it uses parsers
rg -n "class RollupStageMixin|def run_rollup" -g "*.py" -A 5

# Check pyproject.toml more thoroughly for dependencies
cat pyproject.toml | grep -A 50 "dependencies\|optional-dependencies"

# Get the full run_rollup implementation
sed -n '566,650p' src/srtctl/cli/mixins/rollup_stage.py

# Find NodeAnalyzer class and its imports
rg -n "class NodeAnalyzer|from.*NodeAnalyzer" -g "*.py"

# Find _collect_node_metrics implementation
rg -n "_collect_node_metrics" src/srtctl/cli/mixins/rollup_stage.py -A 30

# Check NodeAnalyzer class and its imports
sed -n '1,100p' analysis/srtlog/log_parser.py

# Search for pandas imports in analysis module
rg -n "import pandas|from pandas" analysis/ -g "*.py"

# Check if analysis is a separate package or part of main package
ls -la analysis/ 2>/dev/null | head -20

# Check if there's a separate setup for analysis or if it's optional
rg -n "analysis" pyproject.toml

# Look for any optional-dependencies section more completely
cat pyproject.toml | grep -B5 -A20 "optional-dependencies"

# Verify if parsers module exists and what it imports
head -50 analysis/srtlog/parsers/__init__.py 2>/dev/null || echo "Check if parsers is a directory"

# Check if analysis is properly included as a package in pyproject.toml
grep -A 10 "packages\|include" pyproject.toml | head -20

# Verify the full parsers __init__ to see how get_node_parser works
cat analysis/srtlog/parsers/__init__.py

# Check if there are any try-except guards around pandas imports
rg -n "try:|except ImportError|importorskip" analysis/srtlog/log_parser.py -B 2 -A 2

# Check if analysis has its own package configuration
ls -la analysis/setup.py analysis/pyproject.toml 2>&1

# Check how analysis module is meant to be installed/used
cat analysis/__init__.py

# Verify pandas is imported unconditionally at module level in log_parser
sed -n '1,30p' analysis/srtlog/log_parser.py | grep -n "import\|pandas"

# Check if test currently passes or fails by looking for CI configuration
find . -name "*.yml" -o -name "*.yaml" | head -10 | xargs grep -l "test_rollup_with_node_logs" 2>/dev/null || echo "No CI config found mentioning the test"

# Check if pandas is conditionally imported anywhere in analysis/srtlog
rg -n "try:|except.*ImportError" analysis/srtlog/ -B 1 -A 3

# Verify what will happen when the test runs without pandas
# Check the full import chain: parsers -> nodes -> imports
head -50 analysis/srtlog/parsers/nodes.py 2>/dev/null | grep -n "import\|pandas"

# Check if there's a way test could run successfully without analysis module
rg -n "import pytest" tests/test_rollup.py | head -5

# Check which parsers are imported when parsers/__init__.py is loaded
sed -n '1,20p' analysis/srtlog/parsers/benchmark/__init__.py 2>/dev/null || echo "Check parsers submodules"

# Verify if analysis module is included in the package
grep -r "analysis" pyproject.toml

# Try to understand the actual import error that would occur
python3 -c "import sys; sys.path.insert(0, '.'); from analysis.srtlog.parsers import get_node_parser" 2>&1 | head -20

# Verify if pytest.importorskip would catch the pandas import error
python3 << 'EOF'
import sys
sys.path.insert(0, '.')

# Simulate what pytest.importorskip would do
try:
    __import__("analysis.srtlog.parsers")
    print("Import successful")
except ImportError as e:
    print(f"ImportError caught: {e}")
EOF

# Check the exact location of the test to confirm line numbers
grep -n "def test_rollup_with_node_logs" tests/test_rollup.py

# Verify if analysis should be optional or required
git log --oneline analysis/srtlog/__init__.py 2>/dev/null | head -5 || echo "Git history not available"
```


This test requires the pandas dependency, which is not declared.

The test calls orchestrator.run_rollup(), which attempts to import analysis.srtlog.parsers. That import chain loads analysis.srtlog modules that unconditionally import pandas (e.g., config_reader.py line 10), raising an ImportError when pandas is unavailable. The exception is caught silently in run_rollup() (lines 611-613), so nodes_summary ends up None and the subsequent assertions fail.

Either add an import guard or declare pandas as a dependency:

Import guard approach
 def test_rollup_with_node_logs(self, tmp_path):
     """Test rollup with actual node log files parsed by NodeAnalyzer."""
+    pytest.importorskip("pandas")
     from dataclasses import dataclass, field
🧰 Tools
🪛 GitHub Actions: CI

[error] 917-917: Rollup test failure: node metrics unavailable due to missing pandas backend / analysis.srtlog.parsers not available, causing assertions like 'data["nodes_summary"] is not None' to fail.

🤖 Prompt for AI Agents
In `@tests/test_rollup.py` around lines 800 - 938, The test fails because
analysis.srtlog modules import pandas unconditionally (e.g., in
analysis.srtlog.config_reader) causing ImportError to be swallowed by
RollupStageMixin.run_rollup; update the pandas import to be guarded (wrap
"import pandas as pd" in a try/except ImportError and set pd = None) and ensure
functions in analysis.srtlog.config_reader and any parsers check pd is not None
before using it and raise a clear error if pandas is required, and also modify
RollupStageMixin.run_rollup to not silently ignore ImportError (detect
ImportError from analysis.srtlog imports and re-raise or surface a clear
message) so tests fail fast if pandas is genuinely needed.
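The guarded-import pattern that prompt describes can be sketched as follows. This is a minimal illustration, not the project's actual code: `frame_from_records` and the error message are hypothetical stand-ins for whatever pandas-backed helpers live in analysis.srtlog.

```python
# Sketch of the optional-pandas pattern (names are hypothetical).
try:
    import pandas as pd
except ImportError:  # pandas is an optional analysis extra
    pd = None


def frame_from_records(records: list) -> "pd.DataFrame":
    """Build a DataFrame from parsed records, failing loudly if pandas is absent."""
    if pd is None:
        raise RuntimeError(
            "pandas is required for analysis features; "
            "install it with `pip install srtctl[analysis]`"
        )
    return pd.DataFrame(records)
```

On the test side, `pytest.importorskip("pandas")` at the top of a pandas-dependent test skips it cleanly instead of letting downstream assertions fail.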

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 8

🤖 Fix all issues with AI agents
In `@pyproject.toml`:
- Line 19: Move "pandas>=2.3.3" out of the core dependencies in pyproject.toml
and add it under an optional extras group named "analysis" by creating a
[project.optional-dependencies] section with analysis = ["pandas>=2.3.3"];
update any packaging metadata that reads dependencies to ensure pandas is no
longer listed as a required dependency for the CLI entry points (srtctl,
srtctl-i, srtctl-sweep, srtctl-setup-head), and add a short note to the README
instructing users who need the analysis/benchmark scripts to install with pip
install srtctl[analysis].

In `@src/srtctl/benchmarks/scripts/longbenchv2/bench.sh`:
- Around line 31-43: The current string-based variable "command" is unsafe
because NUM_EXAMPLES and CATEGORIES can contain spaces/shell tokens and you use
eval; instead build the invocation as an array (e.g., an array variable such as
cmd) starting with the python module and required args (ENDPOINT, MODEL_NAME,
MAX_TOKENS, MAX_CONTEXT_LENGTH, NUM_THREADS) and conditionally append
NUM_EXAMPLES and CATEGORIES as separate array elements only when non-empty;
finally echo a joined representation for logging and run the command using array
execution (array expansion) rather than eval to avoid word-splitting and shell
injection.

In `@src/srtctl/benchmarks/scripts/mooncake-router/bench.sh`:
- Around line 60-62: The script builds user-controlled shell strings into the
variable named command and then runs them with eval (e.g., the lines setting
command and eval $command and the similar block around lines 79-81); replace
that pattern with a safe array-style invocation: construct the command as an
array where each token is a separate element and interpolate variables as quoted
expansions (e.g., cmd=(aiperf profile -m "$MODEL_NAME" --url "$ENDPOINT"
--streaming --ui simple --concurrency 10 --request-count 20)) and then execute
with "${cmd[@]}", applying the same change to the second command block so spaces
and injections are prevented.

In `@src/srtctl/benchmarks/scripts/profiling/profile.sh`:
- Around line 133-136: The script builds a shell command string in the variable
named command and then runs it with eval, which is unsafe for env-provided
PROFILE_* and PROFILING_MODE values; instead construct the argument list as an
array (e.g., cmd=(python3 -m sglang.bench_serving --backend sglang --model
"$model_name" ...)) and run it with "${cmd[@]}" to avoid word-splitting and
shell injection; use printf or echo to log the command safely (e.g., printf '%q
' "${cmd[@]}"; printf '\n') before executing, and apply the same array+printf
replacement for the other eval usages noted around the block that covers lines
142-145.

In `@src/srtctl/benchmarks/scripts/router/bench.sh`:
- Around line 42-45: The script currently builds a single string in the variable
command and uses eval, which permits shell metacharacters in user-controlled
variables (PREFIX_RATIOS, ISL, OSL, REQUESTS, CONCURRENCY) to be interpreted;
replace this with a bash array for safe argument handling: construct an array
named cmd starting with the interpreter and script (e.g., python
prefix_ratio_benchmark.py) and push each flag and its value as separate elements
using quoted variable expansions (e.g., "--prefix-ratios" "$PREFIX_RATIOS"),
then execute the command with "${cmd[@]}" and remove the eval and the SC2086
disable. Ensure result_dir is passed as the value of "--output-dir" similarly
and keep the echo of the command by joining the array for logging if needed.

In `@src/srtctl/benchmarks/scripts/sa-bench/bench.sh`:
- Around line 53-58: The loop builds a shell command string and uses eval which
is unsafe for untrusted values (MODEL_NAME, HOST, concurrency); instead
construct the command as an array and execute it without eval: replace the
string-assigned command and eval with an array like cmd=(python3 -u
"${WORK_DIR}/benchmark_serving.py" --model "${MODEL_NAME}" --tokenizer
"${MODEL_PATH}" --host "${HOST}" --port "${PORT}" --backend dynamo --endpoint
/v1/completions --disable-tqdm --dataset-name random --num-prompts
"${num_prompts}" --random-input-len "${ISL}" --random-output-len "${OSL}"
--random-range-ratio 0.8 --ignore-eos --request-rate 250 --percentile-metrics
ttft,tpot,itl,e2el --max-concurrency "${concurrency}") and run it via
"${cmd[@]}", ensuring every variable is quoted; apply the same replacement for
the similar block referenced at lines 77-80.
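A minimal, self-contained illustration of the array pattern these prompts describe, using `echo` as a stand-in for the benchmark binary; the MODEL_NAME value is deliberately hostile to show what the pattern protects against:

```shell
#!/usr/bin/env bash
set -euo pipefail

# Deliberately hostile value: contains a space and a command separator.
MODEL_NAME='my model; echo INJECTED'
CONCURRENCY=100

# Each token is a separate array element; quoting survives expansion.
cmd=(echo --model "${MODEL_NAME}" --max-concurrency "${CONCURRENCY}")

# Log a shell-quoted rendering, then execute directly without eval.
printf '[CMD]'; printf ' %q' "${cmd[@]}"; printf '\n'
"${cmd[@]}"
```

The final line prints the arguments verbatim as data; under `eval "$command"` the same value would have been word-split and the semicolon would have run `echo INJECTED` as a second command.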

In `@src/srtctl/cli/mixins/rollup_stage.py`:
- Around line 20-26: Tests import dataclasses like NodeRollup and NodesSummary
from rollup_stage but this module only imports them without re-exporting; update
rollup_stage to re-export the needed dataclasses (e.g., NodeRollup,
NodesSummary, EnvironmentConfig, LaunchCommandRollup, RollupResult,
RollupSummary) by importing them from srtctl.cli.mixins.rollup and exposing them
in the module namespace (add explicit imports and/or populate __all__ with these
symbol names) so tests can import them from rollup_stage.

In `@tests/test_rollup.py`:
- Around line 11-17: The tests import NodeRollup (and related classes) from
rollup_stage but NodeRollup isn't re-exported there; update the import to pull
NodeRollup, NodesSummary, RollupResult, RollupSummary directly from the rollup
module (or alternatively add re-export statements in rollup_stage to export
those symbols) so the import in tests/test_rollup.py resolves; target the
identifiers NodeRollup, NodesSummary, RollupResult, RollupSummary and keep
RollupStageMixin import from rollup_stage if it is defined there.
♻️ Duplicate comments (6)
analysis/srtlog/parsers/nodes/sglang.py (1)

98-100: Guard against UnicodeDecodeError when reading logs.

If logs contain non-UTF8 bytes, open() can throw and drop all metrics. Use explicit encoding with errors="replace" to keep parsing resilient. Note: The TRTLLM parser at Line 97 uses log_path.read_text(errors="replace") which handles this correctly.

🔧 Proposed fix
-            with open(log_path) as f:
+            with open(log_path, encoding="utf-8", errors="replace") as f:
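A quick demonstration of why the explicit error handler matters; the log line below is modeled on the fixtures in this PR's tests, and the temp-file setup is just scaffolding for the sketch:

```python
# Show that errors="replace" keeps a metrics line readable even when the
# log contains a stray non-UTF-8 byte (0xff below).
import os
import tempfile
from pathlib import Path

fd, path = tempfile.mkstemp(suffix=".err")
os.close(fd)
log_path = Path(path)
log_path.write_bytes(b"Decode batch, #running-req: 50\xff gen throughput (token/s): 150.00\n")

with open(log_path, encoding="utf-8", errors="replace") as f:
    text = f.read()

# The invalid byte becomes U+FFFD instead of raising UnicodeDecodeError,
# so the surrounding metrics are still parseable.
assert "#running-req: 50" in text
assert "\ufffd" in text
```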
src/srtctl/README.md (1)

144-158: Add language to fenced block for markdownlint compliance (MD040).

The ASCII diagram block is missing a language tag.

🔧 Proposed fix
-```
+```text
 ┌─────────────────────────────────────────────────────────────────────────────┐
docs/architecture.md (1)

822-826: Add language tag to fenced block for MD040 compliance.

The ASCII architecture diagram block is missing a language tag.

🔧 Proposed fix
-```
+```text
 ┌─────────────────────────────────────────────────────────────────────────────────────┐
tests/test_rollup.py (1)

800-938: Add pytest.importorskip("pandas") guard for pandas-dependent test.

This test exercises run_rollup() which may import analysis.srtlog modules that depend on pandas. Without the guard, the test will fail silently (assertions fail because nodes_summary is None) rather than being skipped when pandas is unavailable.

🔧 Proposed fix
     def test_rollup_with_node_logs(self, tmp_path):
         """Test rollup with actual node log files parsed by NodeAnalyzer."""
+        pytest.importorskip("pandas")
         from dataclasses import dataclass, field
src/srtctl/cli/mixins/rollup_stage.py (2)

49-52: Fix endpoints property that always returns None.

The property body ... (ellipsis) causes it to implicitly return None, violating the list[Endpoint] return type and causing the pipeline type error.

🔧 Proposed fix
     `@property`
     def endpoints(self) -> list["Endpoint"]:
         """Endpoint allocation topology."""
-        ...
+        raise NotImplementedError("Subclass must provide endpoints property")

129-149: Fix type errors for optional parser methods.

The type checker reports call-non-callable because find_aiperf_results and parse_result_directory are not defined in the protocol. Use getattr with callable check to satisfy the type checker.

🔧 Proposed fix
             # Try parser-specific result collection first
             if parser is not None:
                 # For mooncake-router, look for AIPerf results
-                if hasattr(parser, "find_aiperf_results"):
-                    aiperf_files = parser.find_aiperf_results(self.runtime.log_dir)
+                find_aiperf = getattr(parser, "find_aiperf_results", None)
+                if callable(find_aiperf):
+                    aiperf_files = find_aiperf(self.runtime.log_dir)
                     for aiperf_path in aiperf_files:
                         result = parser.parse_result_json(aiperf_path)
                         if result.get("output_tps") is not None:
                             results.append(result)
                             logger.debug("Loaded AIPerf result: %s", aiperf_path)

                 # For sa-bench style, look for result directories
-                if hasattr(parser, "parse_result_directory"):
+                parse_result_dir = getattr(parser, "parse_result_directory", None)
+                if callable(parse_result_dir):
                     for entry in self.runtime.log_dir.iterdir():
                         if not entry.is_dir():
                             continue
                         # Match patterns like sa-bench_isl_X_osl_Y
                         if "_isl_" in entry.name and "_osl_" in entry.name:
                             logger.debug("Found benchmark results directory: %s", entry.name)
-                            dir_results = parser.parse_result_directory(entry)
+                            dir_results = parse_result_dir(entry)
                             results.extend(dir_results)
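The getattr-plus-callable pattern in that fix can be shown in isolation. The parser classes below are stand-ins, not the real classes from analysis/srtlog/parsers; the point is that probing with `getattr(..., None)` and `callable()` duck-types optional methods without the `hasattr` probe that type checkers reject on a protocol:

```python
# Stand-in parsers illustrating the getattr + callable() pattern.
class SaBenchStyleParser:
    def parse_result_directory(self, entry):
        return [{"source": str(entry)}]


class AiperfStyleParser:
    def find_aiperf_results(self, log_dir):
        return []  # nothing found in this sketch


def collect_results(parser, log_dir):
    results = []
    find_aiperf = getattr(parser, "find_aiperf_results", None)
    if callable(find_aiperf):  # only runs when the method actually exists
        results.extend(find_aiperf(log_dir))
    parse_dir = getattr(parser, "parse_result_directory", None)
    if callable(parse_dir):
        results.extend(parse_dir(log_dir))
    return results
```

`collect_results(SaBenchStyleParser(), "logs")` yields one entry and `collect_results(AiperfStyleParser(), "logs")` yields none, with no attribute errors in either case.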
🧹 Nitpick comments (11)
src/srtctl/benchmarks/scripts/mmlu/bench.sh (1)

29-31: Prefer direct execution over eval, or quote the variable.

Using eval $command without quotes can cause word-splitting issues if any variable (especially ENDPOINT) contains spaces or shell metacharacters. Two options:

  1. Preferred: Execute directly without eval:
echo "[CMD] python3 -m sglang.test.run_eval --base-url ${ENDPOINT} ..."
python3 -m sglang.test.run_eval --base-url "${ENDPOINT}" --model "${MODEL_NAME}" \
    --eval-name mmlu --num-examples "${NUM_EXAMPLES}" --max-tokens "${MAX_TOKENS}" \
    --repeat "${REPEAT}" --num-threads "${NUM_THREADS}"
  1. Minimal fix: Quote the eval argument:
-eval $command
+eval "$command"
src/srtctl/benchmarks/scripts/gpqa/bench.sh (1)

29-31: Quote the command variable in eval for robustness.

While the current variables appear safe, unquoted $command in eval can cause word-splitting issues if any variable (e.g., MODEL_NAME) contains spaces or special characters.

🔧 Suggested fix
 command="python3 -m sglang.test.run_eval --base-url ${ENDPOINT} --model ${MODEL_NAME} --eval-name gpqa --num-examples ${NUM_EXAMPLES} --max-tokens ${MAX_TOKENS} --repeat ${REPEAT} --num-threads ${NUM_THREADS}"
 echo "[CMD] $command"
-eval $command
+eval "$command"
analysis/srtlog/parsers/benchmark/sa_bench.py (1)

101-103: Add explicit encoding when reading JSON files.

For consistency with other parsers and to handle potential encoding issues, specify the encoding explicitly.

🔧 Suggested fix
-            with open(json_path) as f:
+            with open(json_path, encoding="utf-8") as f:
analysis/srtlog/parsers/benchmark/mooncake_router.py (1)

92-94: Add explicit encoding when reading JSON files.

For consistency and to handle potential encoding issues, specify the encoding explicitly.

🔧 Suggested fix
-            with open(json_path) as f:
+            with open(json_path, encoding="utf-8") as f:
analysis/srtlog/parsers/nodes/trtllm.py (1)

343-359: Consider extracting shared filename parsing logic.

The _extract_node_info_from_filename method is identical to the one in analysis/srtlog/parsers/nodes/sglang.py (lines 305-321). Consider extracting this to a shared utility to reduce duplication.
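A shared helper could look roughly like this. The regex is inferred from the log names used in this PR's tests (e.g., node-01_prefill_w0.err), not copied from the actual sglang.py/trtllm.py implementations, so treat it as a hypothetical sketch:

```python
# Hypothetical shared filename parser for worker log names such as
# "node-01_prefill_w0.err"; returns (node_name, worker_type, worker_index).
import re

_LOG_NAME = re.compile(
    r"^(?P<node>.+?)_(?P<worker_type>prefill|decode|agg)_w(?P<index>\d+)\.(out|err)$"
)


def extract_node_info_from_filename(name: str):
    """Return (node_name, worker_type, worker_index), or None if the name doesn't match."""
    m = _LOG_NAME.match(name)
    if m is None:
        return None
    return m.group("node"), m.group("worker_type"), int(m.group("index"))
```

Both node parsers could then import this one function instead of carrying identical private copies.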

analysis/srtlog/rollup_harness.py (2)

27-28: Module-level basicConfig may affect global logging state.

Calling logging.basicConfig at module import time can interfere with the application's logging configuration if this module is imported as a library rather than run as a script. Consider moving this inside main() or using a conditional guard.

🔧 Suggested fix
-logging.basicConfig(level=logging.INFO, format="%(asctime)s - %(levelname)s - %(message)s")
 logger = logging.getLogger(__name__)
+
+
+def _setup_logging(verbose: bool = False) -> None:
+    """Configure logging for the harness."""
+    level = logging.DEBUG if verbose else logging.INFO
+    logging.basicConfig(level=level, format="%(asctime)s - %(levelname)s - %(message)s")

Then call _setup_logging(args.verbose) in main() before processing.


63-66: Add explicit encoding when reading YAML files.

For consistency and to handle potential encoding issues across different environments, specify the encoding explicitly.

🔧 Suggested fix
-        with open(config_path) as f:
+        with open(config_path, encoding="utf-8") as f:
             return yaml.safe_load(f)

Apply the same fix to lines 292 and 295.

src/srtctl/cli/mixins/rollup/models.py (3)

133-138: Misleading comment about kv_cache aggregation.

The comment mentions "(or sum for multiple allocations)" but the code only takes the max. Either update the comment to match the implementation or implement the sum logic if that's the intended behavior.

🔧 Proposed fix
                 if mem.kv_cache_gb is not None:
-                    # Take the max kv_cache seen (or sum for multiple allocations)
+                    # Take the max kv_cache seen
                     if rollup.kv_cache_gb is None:
                         rollup.kv_cache_gb = mem.kv_cache_gb
                     else:
                         rollup.kv_cache_gb = max(rollup.kv_cache_gb, mem.kv_cache_gb)

236-308: Code duplication in aggregation methods.

_aggregate_agg_batches duplicates logic from _aggregate_prefill_batches and _aggregate_decode_batches. Consider refactoring to reuse the existing methods for better maintainability.

♻️ Optional refactor
     def _aggregate_agg_batches(self, batches: list) -> None:
         """Aggregate metrics for agg workers (handles both prefill and decode batches)."""
         # Separate prefill and decode batches
         prefill_batches = [b for b in batches if b.batch_type == "prefill"]
         decode_batches = [b for b in batches if b.batch_type == "decode"]

         self.total_prefill_batches = len(prefill_batches)
         self.total_decode_batches = len(decode_batches)

-        # Aggregate prefill metrics
+        # Reuse existing aggregation methods
         if prefill_batches:
-            new_tokens = []
-            cached_tokens = []
-            # ... 30+ lines of duplicated logic
-            ...
-
-        # Aggregate decode metrics
-        if decode_batches:
-            # ... 20+ lines of duplicated logic
-            ...
+            # Temporarily store current batch count
+            saved_total = self.total_prefill_batches
+            self._aggregate_prefill_batches(prefill_batches)
+            self.total_prefill_batches = saved_total
+
+        if decode_batches:
+            saved_total = self.total_decode_batches
+            self._aggregate_decode_batches(decode_batches)
+            self.total_decode_batches = saved_total

416-416: Consider documenting the mixed type for request_rate.

The type float | str | None is unusual. Consider adding a docstring to clarify when this is a float vs a string (e.g., numeric rate vs identifier like "c100").

src/srtctl/cli/mixins/rollup_stage.py (1)

99-101: Consider logging the full stack trace for debugging.

The broad exception catch logs only the error message. For debugging complex rollup failures, consider logging the full traceback at DEBUG level.

🔧 Proposed enhancement
         except Exception as e:
-            logger.error("Rollup failed: %s", e)
+            logger.error("Rollup failed: %s", e)
+            logger.debug("Rollup traceback:", exc_info=True)
             return None

Comment thread pyproject.toml
"requests>=2.31.0",
"rich>=13.0.0",
"questionary>=2.0.0",
"pandas>=2.3.3",

⚠️ Potential issue | 🟠 Major

🧩 Analysis chain

🏁 Scripts executed:

```shell
#!/bin/bash
# Find all pandas imports and their locations
rg -n "import pandas|from pandas" -S --type-list

# Get a broader view of directory structure to understand src vs tests layout
git ls-files | head -30

# Check the pyproject.toml structure to understand dependency grouping
cat -n pyproject.toml

# Find all pandas imports
rg -n "import pandas|from pandas" --type py

# Also check the analysis directory structure more carefully
find analysis -type f -name "*.py" | head -20

# Check if pandas is imported in the CLI modules
rg -n "import pandas|from pandas" src/srtctl/cli/

# Check the structure of src/srtctl to understand what is core
ls -la src/srtctl/

# Check if there are any optional dependencies defined in pyproject.toml
grep -n "optional\|extras" pyproject.toml
```


Move pandas to optional dependencies; it is not required by the core CLI.

Pandas is only used in the analysis scripts and benchmark serving tools, not by any of the core CLI entry points (srtctl, srtctl-i, srtctl-sweep, srtctl-setup-head). Adding it to core dependencies unnecessarily increases install size for users who only need the CLI. Move it to an optional dependency group:

[project.optional-dependencies]
analysis = ["pandas>=2.3.3"]

Then document in the README that users running analysis tools should install with pip install srtctl[analysis].

🤖 Prompt for AI Agents
In `@pyproject.toml` at line 19, Move "pandas>=2.3.3" out of the core dependencies
in pyproject.toml and add it under an optional extras group named "analysis" by
creating a [project.optional-dependencies] section with analysis =
["pandas>=2.3.3"]; update any packaging metadata that reads dependencies to
ensure pandas is no longer listed as a required dependency for the CLI entry
points (srtctl, srtctl-i, srtctl-sweep, srtctl-setup-head), and add a short note
to the README instructing users who need the analysis/benchmark scripts to
install with pip install srtctl[analysis].

Comment on lines +31 to +43
 command="python3 -m sglang.test.run_eval --base-url ${ENDPOINT} --model ${MODEL_NAME} --eval-name longbench_v2 --max-tokens ${MAX_TOKENS} --max-context-length ${MAX_CONTEXT_LENGTH} --num-threads ${NUM_THREADS}"

 # Add optional arguments
 if [ -n "$NUM_EXAMPLES" ]; then
-    cmd="$cmd --num-examples ${NUM_EXAMPLES}"
+    command="$command --num-examples ${NUM_EXAMPLES}"
 fi

 if [ -n "$CATEGORIES" ]; then
-    cmd="$cmd --categories ${CATEGORIES}"
+    command="$command --categories ${CATEGORIES}"
 fi

-echo "Executing: $cmd"
-eval "$cmd"
+echo "[CMD] $command"
+eval $command
Contributor

⚠️ Potential issue | 🟠 Major

Replace eval with an array-based command.
NUM_EXAMPLES/CATEGORIES may include spaces or shell tokens, and eval executes them.

✅ Safer, structured command building
-command="python3 -m sglang.test.run_eval --base-url ${ENDPOINT} --model ${MODEL_NAME} --eval-name longbench_v2 --max-tokens ${MAX_TOKENS} --max-context-length ${MAX_CONTEXT_LENGTH} --num-threads ${NUM_THREADS}"
+command=(
+  python3 -m sglang.test.run_eval --base-url "${ENDPOINT}" --model "${MODEL_NAME}"
+  --eval-name longbench_v2 --max-tokens "${MAX_TOKENS}"
+  --max-context-length "${MAX_CONTEXT_LENGTH}" --num-threads "${NUM_THREADS}"
+)

 if [ -n "$NUM_EXAMPLES" ]; then
-    command="$command --num-examples ${NUM_EXAMPLES}"
+    command+=(--num-examples "${NUM_EXAMPLES}")
 fi

 if [ -n "$CATEGORIES" ]; then
-    command="$command --categories ${CATEGORIES}"
+    command+=(--categories "${CATEGORIES}")
 fi

-echo "[CMD] $command"
-eval $command
+printf '[CMD] %q ' "${command[@]}"; echo
+"${command[@]}"
🤖 Prompt for AI Agents
In `@src/srtctl/benchmarks/scripts/longbenchv2/bench.sh` around lines 31 - 43, The
current string-based variable "command" is unsafe because NUM_EXAMPLES and
CATEGORIES can contain spaces/shell tokens and you use eval; instead build the
invocation as an array (e.g., an array variable such as cmd) starting with the
python module and required args (ENDPOINT, MODEL_NAME, MAX_TOKENS,
MAX_CONTEXT_LENGTH, NUM_THREADS) and conditionally append NUM_EXAMPLES and
CATEGORIES as separate array elements only when non-empty; finally echo a joined
representation for logging and run the command using array execution (array
expansion) rather than eval to avoid word-splitting and shell injection.

Comment on lines +60 to +62
command="aiperf profile -m ${MODEL_NAME} --url ${ENDPOINT} --streaming --ui simple --concurrency 10 --request-count 20"
echo "[CMD] $command"
eval $command
Contributor

⚠️ Potential issue | 🟠 Major

Avoid eval to prevent injection and quoting bugs.
MODEL_NAME, ENDPOINT, and thresholds are user-controlled; eval can mis-handle spaces or execute unintended tokens.

✅ Safer array-based invocation (apply to both commands)
-command="aiperf profile -m ${MODEL_NAME} --url ${ENDPOINT} --streaming --ui simple --concurrency 10 --request-count 20"
-echo "[CMD] $command"
-eval $command
+command=(
+  aiperf profile -m "${MODEL_NAME}" --url "${ENDPOINT}"
+  --streaming --ui simple --concurrency 10 --request-count 20
+)
+printf '[CMD] %q ' "${command[@]}"; echo
+"${command[@]}"

-command="aiperf profile -m ${MODEL_NAME} --input-file ${INPUT_FILE} --custom-dataset-type mooncake_trace --fixed-schedule --url ${ENDPOINT} --streaming --random-seed 42 --ui simple --artifact-dir ${RUN_ARTIFACT_DIR} --goodput \"time_to_first_token:${TTFT_THRESHOLD} inter_token_latency:${ITL_THRESHOLD}\""
-echo "[CMD] $command"
-eval $command
+command=(
+  aiperf profile -m "${MODEL_NAME}" --input-file "${INPUT_FILE}"
+  --custom-dataset-type mooncake_trace --fixed-schedule --url "${ENDPOINT}"
+  --streaming --random-seed 42 --ui simple --artifact-dir "${RUN_ARTIFACT_DIR}"
+  --goodput "time_to_first_token:${TTFT_THRESHOLD} inter_token_latency:${ITL_THRESHOLD}"
+)
+printf '[CMD] %q ' "${command[@]}"; echo
+"${command[@]}"

Also applies to: 79-81

🤖 Prompt for AI Agents
In `@src/srtctl/benchmarks/scripts/mooncake-router/bench.sh` around lines 60 - 62,
The script builds user-controlled shell strings into the variable named command
and then runs them with eval (e.g., the lines setting command and eval $command
and the similar block around lines 79-81); replace that pattern with a safe
array-style invocation: construct the command as an array where each token is a
separate element and interpolate variables as quoted expansions (e.g.,
cmd=(aiperf profile -m "$MODEL_NAME" --url "$ENDPOINT" --streaming --ui simple
--concurrency 10 --request-count 20)) and then execute with "${cmd[@]}",
applying the same change to the second command block so spaces and injections
are prevented.

Comment on lines +133 to +136

command="python3 -m sglang.bench_serving --backend sglang --model ${model_name} --host ${head_node} --port ${head_port} --dataset-name random --max-concurrency ${PROFILE_CONCURRENCY} --num-prompts 128 --random-input-len ${PROFILE_ISL} --random-output-len ${PROFILE_OSL} --random-range-ratio 1 --warmup-request 0"
echo "[CMD] $command"
eval $command
Contributor

⚠️ Potential issue | 🟠 Major

Avoid eval; it’s unsafe with env-provided values.
PROFILING_MODE/PROFILE_* inputs can include spaces or shell tokens. Use array execution and keep the logging via printf.

✅ Safer execution without eval
-command="python3 -m sglang.bench_serving --backend sglang --model ${model_name} --host ${head_node} --port ${head_port} --dataset-name random --max-concurrency ${PROFILE_CONCURRENCY} --num-prompts 128 --random-input-len ${PROFILE_ISL} --random-output-len ${PROFILE_OSL} --random-range-ratio 1 --warmup-request 0"
-echo "[CMD] $command"
-eval $command
+command=(
+  python3 -m sglang.bench_serving --backend sglang --model "${model_name}"
+  --host "${head_node}" --port "${head_port}" --dataset-name random
+  --max-concurrency "${PROFILE_CONCURRENCY}" --num-prompts 128
+  --random-input-len "${PROFILE_ISL}" --random-output-len "${PROFILE_OSL}"
+  --random-range-ratio 1 --warmup-request 0
+)
+printf '[CMD] %q ' "${command[@]}"; echo
+"${command[@]}"

-command="python -m lm_eval --model local-completions --tasks gsm8k --model_args \"base_url=http://${head_node}:${head_port}/v1/completions,model=${model_name},tokenized_requests=False,tokenizer_backend=None,num_concurrent=${PROFILE_CONCURRENCY},timeout=6000,max_retries=1\" --limit 10"
-echo "[CMD] $command"
-eval $command
+command=(
+  python -m lm_eval --model local-completions --tasks gsm8k
+  --model_args "base_url=http://${head_node}:${head_port}/v1/completions,model=${model_name},tokenized_requests=False,tokenizer_backend=None,num_concurrent=${PROFILE_CONCURRENCY},timeout=6000,max_retries=1"
+  --limit 10
+)
+printf '[CMD] %q ' "${command[@]}"; echo
+"${command[@]}"

Also applies to: 142-145

🤖 Prompt for AI Agents
In `@src/srtctl/benchmarks/scripts/profiling/profile.sh` around lines 133 - 136,
The script builds a shell command string in the variable named command and then
runs it with eval, which is unsafe for env-provided PROFILE_* and PROFILING_MODE
values; instead construct the argument list as an array (e.g., cmd=(python3 -m
sglang.bench_serving --backend sglang --model "$model_name" ...)) and run it
with "${cmd[@]}" to avoid word-splitting and shell injection; use printf or echo
to log the command safely (e.g., printf '%q ' "${cmd[@]}"; printf '\n') before
executing, and apply the same array+printf replacement for the other eval usages
noted around the block that covers lines 142-145.

Comment on lines 42 to +45
# shellcheck disable=SC2086
python prefix_ratio_benchmark.py \
--prefix-ratios $PREFIX_RATIOS \
--isl "$ISL" \
--osl "$OSL" \
--requests "$REQUESTS" \
--concurrency "$CONCURRENCY" \
--output-dir "$result_dir"
command="python prefix_ratio_benchmark.py --prefix-ratios $PREFIX_RATIOS --isl $ISL --osl $OSL --requests $REQUESTS --concurrency $CONCURRENCY --output-dir $result_dir"
echo "[CMD] $command"
eval $command
Contributor

⚠️ Potential issue | 🟠 Major

Avoid eval with user-controlled arguments.
Using eval here allows shell metacharacters in PREFIX_RATIOS/ENDPOINT to execute. A bash array keeps proper argument boundaries and is safer.

✅ Safer execution without eval
-# shellcheck disable=SC2086
-command="python prefix_ratio_benchmark.py --prefix-ratios $PREFIX_RATIOS --isl $ISL --osl $OSL --requests $REQUESTS --concurrency $CONCURRENCY --output-dir $result_dir"
-echo "[CMD] $command"
-eval $command
+read -r -a prefix_ratio_args <<< "$PREFIX_RATIOS"
+command=(
+  python prefix_ratio_benchmark.py
+  --prefix-ratios "${prefix_ratio_args[@]}"
+  --isl "$ISL"
+  --osl "$OSL"
+  --requests "$REQUESTS"
+  --concurrency "$CONCURRENCY"
+  --output-dir "$result_dir"
+)
+printf '[CMD] %q ' "${command[@]}"; echo
+"${command[@]}"
🤖 Prompt for AI Agents
In `@src/srtctl/benchmarks/scripts/router/bench.sh` around lines 42 - 45, The
script currently builds a single string in the variable command and uses eval,
which permits shell metacharacters in user-controlled variables (PREFIX_RATIOS,
ISL, OSL, REQUESTS, CONCURRENCY) to be interpreted; replace this with a bash
array for safe argument handling: construct an array named cmd starting with the
interpreter and script (e.g., python prefix_ratio_benchmark.py) and push each
flag and its value as separate elements using quoted variable expansions (e.g.,
"--prefix-ratios" "$PREFIX_RATIOS"), then execute the command with "${cmd[@]}"
and remove the eval and the SC2086 disable. Ensure result_dir is passed as the
value of "--output-dir" similarly and keep the echo of the command by joining
the array for logging if needed.

Comment on lines +53 to 58

command="python3 -u ${WORK_DIR}/benchmark_serving.py --model ${MODEL_NAME} --tokenizer ${MODEL_PATH} --host $HOST --port $PORT --backend dynamo --endpoint /v1/completions --disable-tqdm --dataset-name random --num-prompts $num_prompts --random-input-len $ISL --random-output-len $OSL --random-range-ratio 0.8 --ignore-eos --request-rate 250 --percentile-metrics ttft,tpot,itl,e2el --max-concurrency $concurrency"

echo "[CMD] $command"
eval $command
done
Contributor

⚠️ Potential issue | 🟠 Major

Avoid eval in benchmark loops.
MODEL_NAME, HOST, and concurrency-derived values are user-controlled; eval is unsafe and brittle.

✅ Safer command execution in loops
-command="python3 -u ${WORK_DIR}/benchmark_serving.py --model ${MODEL_NAME} --tokenizer ${MODEL_PATH} --host $HOST --port $PORT --backend dynamo --endpoint /v1/completions --disable-tqdm --dataset-name random --num-prompts $num_prompts --random-input-len $ISL --random-output-len $OSL --random-range-ratio 0.8 --ignore-eos --request-rate 250 --percentile-metrics ttft,tpot,itl,e2el --max-concurrency $concurrency"
-
-echo "[CMD] $command"
-eval $command
+command=(
+  python3 -u "${WORK_DIR}/benchmark_serving.py"
+  --model "${MODEL_NAME}" --tokenizer "${MODEL_PATH}"
+  --host "$HOST" --port "$PORT" --backend dynamo --endpoint /v1/completions
+  --disable-tqdm --dataset-name random --num-prompts "$num_prompts"
+  --random-input-len "$ISL" --random-output-len "$OSL"
+  --random-range-ratio 0.8 --ignore-eos --request-rate 250
+  --percentile-metrics ttft,tpot,itl,e2el --max-concurrency "$concurrency"
+)
+printf '[CMD] %q ' "${command[@]}"; echo
+"${command[@]}"

-command="python3 -u ${WORK_DIR}/benchmark_serving.py --model ${MODEL_NAME} --tokenizer ${MODEL_PATH} --host $HOST --port $PORT --backend dynamo --endpoint /v1/completions --disable-tqdm --dataset-name random --num-prompts $num_prompts --random-input-len $ISL --random-output-len $OSL --random-range-ratio 0.8 --ignore-eos --request-rate ${REQ_RATE} --percentile-metrics ttft,tpot,itl,e2el --max-concurrency $concurrency --use-chat-template --save-result --result-dir $result_dir --result-filename $result_filename"
-
-echo "[CMD] $command"
-eval $command
+command=(
+  python3 -u "${WORK_DIR}/benchmark_serving.py"
+  --model "${MODEL_NAME}" --tokenizer "${MODEL_PATH}"
+  --host "$HOST" --port "$PORT" --backend dynamo --endpoint /v1/completions
+  --disable-tqdm --dataset-name random --num-prompts "$num_prompts"
+  --random-input-len "$ISL" --random-output-len "$OSL"
+  --random-range-ratio 0.8 --ignore-eos --request-rate "${REQ_RATE}"
+  --percentile-metrics ttft,tpot,itl,e2el --max-concurrency "$concurrency"
+  --use-chat-template --save-result --result-dir "$result_dir"
+  --result-filename "$result_filename"
+)
+printf '[CMD] %q ' "${command[@]}"; echo
+"${command[@]}"

Also applies to: 77-80

🤖 Prompt for AI Agents
In `@src/srtctl/benchmarks/scripts/sa-bench/bench.sh` around lines 53 - 58, The
loop builds a shell command string and uses eval which is unsafe for untrusted
values (MODEL_NAME, HOST, concurrency); instead construct the command as an
array and execute it without eval: replace the string-assigned command and eval
with an array like cmd=(python3 -u "${WORK_DIR}/benchmark_serving.py" --model
"${MODEL_NAME}" --tokenizer "${MODEL_PATH}" --host "${HOST}" --port "${PORT}"
--backend dynamo --endpoint /v1/completions --disable-tqdm --dataset-name random
--num-prompts "${num_prompts}" --random-input-len "${ISL}" --random-output-len
"${OSL}" --random-range-ratio 0.8 --ignore-eos --request-rate 250
--percentile-metrics ttft,tpot,itl,e2el --max-concurrency "${concurrency}") and
run it via "${cmd[@]}", ensuring every variable is quoted; apply the same
replacement for the similar block referenced at lines 77-80.

Comment on lines +20 to +26
from srtctl.cli.mixins.rollup import (
EnvironmentConfig,
LaunchCommandRollup,
NodesSummary,
RollupResult,
RollupSummary,
)
Contributor

🛠️ Refactor suggestion | 🟠 Major

Consider re-exporting dataclasses for easier imports.

The tests attempt to import NodeRollup, NodesSummary, etc. from rollup_stage, but these are only imported here and not re-exported. Either update tests to import from srtctl.cli.mixins.rollup directly, or add re-exports here.

🔧 Option: Add re-exports
 from srtctl.cli.mixins.rollup import (
     EnvironmentConfig,
     LaunchCommandRollup,
+    NodeRollup,
     NodesSummary,
     RollupResult,
     RollupSummary,
 )
+
+__all__ = [
+    "RollupStageMixin",
+    "EnvironmentConfig",
+    "LaunchCommandRollup",
+    "NodeRollup",
+    "NodesSummary",
+    "RollupResult",
+    "RollupSummary",
+]
🤖 Prompt for AI Agents
In `@src/srtctl/cli/mixins/rollup_stage.py` around lines 20 - 26, Tests import
dataclasses like NodeRollup and NodesSummary from rollup_stage but this module
only imports them without re-exporting; update rollup_stage to re-export the
needed dataclasses (e.g., NodeRollup, NodesSummary, EnvironmentConfig,
LaunchCommandRollup, RollupResult, RollupSummary) by importing them from
srtctl.cli.mixins.rollup and exposing them in the module namespace (add explicit
imports and/or populate __all__ with these symbol names) so tests can import
them from rollup_stage.

Comment thread tests/test_rollup.py
Comment on lines +11 to +17
from srtctl.cli.mixins.rollup_stage import (
NodeRollup,
NodesSummary,
RollupResult,
RollupStageMixin,
RollupSummary,
)
Contributor

⚠️ Potential issue | 🔴 Critical

Fix import path - NodeRollup is not exported from rollup_stage.

The pipeline fails with ImportError: cannot import name 'NodeRollup' from 'srtctl.cli.mixins.rollup_stage'. The rollup_stage.py module imports these classes from srtctl.cli.mixins.rollup but doesn't re-export them. Either import directly from the rollup submodule or add re-exports to rollup_stage.py.

🔧 Proposed fix - import from correct module
-from srtctl.cli.mixins.rollup_stage import (
-    NodeRollup,
-    NodesSummary,
-    RollupResult,
-    RollupStageMixin,
-    RollupSummary,
-)
+from srtctl.cli.mixins.rollup import (
+    NodeRollup,
+    NodesSummary,
+    RollupResult,
+    RollupSummary,
+)
+from srtctl.cli.mixins.rollup_stage import RollupStageMixin
🧰 Tools
🪛 GitHub Actions: CI

[error] 11-11: ImportError: cannot import name 'NodeRollup' from 'srtctl.cli.mixins.rollup_stage' during test collection. This caused pytest to fail with exit code 2.

🤖 Prompt for AI Agents
In `@tests/test_rollup.py` around lines 11 - 17, The tests import NodeRollup (and
related classes) from rollup_stage but NodeRollup isn't re-exported there;
update the import to pull NodeRollup, NodesSummary, RollupResult, RollupSummary
directly from the rollup module (or alternatively add re-export statements in
rollup_stage to export those symbols) so the import in tests/test_rollup.py
resolves; target the identifiers NodeRollup, NodesSummary, RollupResult,
RollupSummary and keep RollupStageMixin import from rollup_stage if it is
defined there.


try:
content = benchmark_out_path.read_text()

Owner

AIPerf provides a profile export JSON; we should parse that instead of using regex.

from pathlib import Path
from typing import Any

logging.basicConfig(level=logging.INFO, format="%(asctime)s - %(levelname)s - %(message)s")
Owner

The srtslurm.yaml contains the root of the repo. Not sure if you need it, but it might be helpful.

Contributor

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 4

🤖 Fix all issues with AI agents
In `@analysis/srtlog/rollup_harness.py`:
- Around line 349-353: The code currently uses
"output_tps=data.get('output_throughput', 0) or data.get('output_tps', 0)" which
treats a valid 0 as missing; update the RollupResult construction to preserve
explicit zeroes by checking for None instead of falsy values — e.g. compute a
value = data.get('output_throughput') if data.get('output_throughput') is not
None else data.get('output_tps') if data.get('output_tps') is not None else 0,
and pass that value as output_tps (similarly use explicit None checks for any
other throughput fields) so RollupResult sees bona fide 0s rather than
overwriting them.
- Around line 31-47: The current sort key in find_job_directories mixes int and
str values causing a TypeError when directory names include both numeric and
non-numeric names; update the job_dirs.sort key to return a homogeneous tuple
that orders numeric directories before (or after) non-numeric ones — e.g.,
return (0, int(p.name)) for purely numeric names and (1, p.name) for others — so
all keys share the same types and provide stable ordering; modify the lambda
used in job_dirs.sort accordingly.

In `@src/srtctl/cli/mixins/rollup_stage.py`:
- Around line 464-478: The _get_parser_order function's mapping for "sglang"
lacks the recommended fallback to "sglang-v2"; update the parser_orders dict in
_get_parser_order(backend_type: str) to return ["sglang", "sglang-v2"] for the
"sglang" key (so it will try the primary parser then the v2 fallback), and
update the function docstring to mention the sglang-v2 fallback order
accordingly.

In `@src/srtctl/cli/mixins/rollup/models.py`:
- Around line 370-407: The aggregation logic drops valid zero values because it
filters metrics with truthy checks (e.g., "if n.avg_input_throughput") — update
_compute_aggregate_stats to treat 0 as a valid value by using explicit None
checks. For each throughput/metric list (avg_input_throughput,
max_input_throughput, avg_gen_throughput, max_gen_throughput, kv_cache_gb)
change comprehensions to include values where the field is not None (e.g.,
[n.avg_input_throughput for n in prefill_capable_nodes if n.avg_input_throughput
is not None]) and compute averages using that list length so zeros contribute;
likewise ensure total_prefill_tokens/total_cached_tokens and
overall_cache_hit_rate are set even when totals are zero (use explicit
comparisons to None rather than truthiness).
♻️ Duplicate comments (3)
analysis/srtlog/rollup_harness.py (1)

205-215: Guard optional parser helpers before calling.

Parsers that don’t implement parse_result_directory will raise AttributeError and skip other parsing.

🛠️ Proposed fix
-        for entry in logs_dir.iterdir():
+        parse_result_dir = getattr(parser, "parse_result_directory", None)
+        for entry in logs_dir.iterdir():
             if entry.is_dir() and "_isl_" in entry.name and "_osl_" in entry.name:
-                dir_results = parser.parse_result_directory(entry)
-                results.extend(dir_results)
+                if callable(parse_result_dir):
+                    dir_results = parse_result_dir(entry)
+                    results.extend(dir_results)
src/srtctl/cli/mixins/rollup_stage.py (2)

49-52: Implement the abstract endpoints property.

The ellipsis body returns None and violates the declared return type.

🛠️ Proposed fix
     `@property`
     def endpoints(self) -> list[Endpoint]:
         """Endpoint allocation topology."""
-        ...
+        raise NotImplementedError("Subclass must provide endpoints property")

129-149: Fix optional parser helper calls that aren’t in the protocol.

Type checking will still flag these (and they can be non-callable).

🛠️ Proposed fix
-                if hasattr(parser, "find_aiperf_results"):
-                    aiperf_files = parser.find_aiperf_results(self.runtime.log_dir)
+                find_aiperf = getattr(parser, "find_aiperf_results", None)
+                if callable(find_aiperf):
+                    aiperf_files = find_aiperf(self.runtime.log_dir)
                     for aiperf_path in aiperf_files:
                         result = parser.parse_result_json(aiperf_path)
                         if result.get("output_tps") is not None:
                             results.append(result)
                             logger.debug("Loaded AIPerf result: %s", aiperf_path)
@@
-                if hasattr(parser, "parse_result_directory"):
+                parse_result_dir = getattr(parser, "parse_result_directory", None)
+                if callable(parse_result_dir):
                     for entry in self.runtime.log_dir.iterdir():
                         if not entry.is_dir():
                             continue
                         # Match patterns like sa-bench_isl_X_osl_Y
                         if "_isl_" in entry.name and "_osl_" in entry.name:
                             logger.debug("Found benchmark results directory: %s", entry.name)
-                            dir_results = parser.parse_result_directory(entry)
+                            dir_results = parse_result_dir(entry)
                             results.extend(dir_results)
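The getattr-plus-callable guard proposed above can be shown in isolation. In this sketch, MinimalParser and DirectoryParser are hypothetical stand-ins for real registry parsers, not classes from the PR:

```python
from pathlib import Path
from typing import Any

class MinimalParser:
    """Implements only the required protocol surface."""
    def parse_result_json(self, path: Path) -> dict[str, Any]:
        return {"output_tps": 1.0}

class DirectoryParser(MinimalParser):
    """Also supports the optional per-directory helper."""
    def parse_result_directory(self, entry: Path) -> list[dict[str, Any]]:
        return [{"output_tps": 2.0}]

def collect(parser: object, entry: Path) -> list[dict[str, Any]]:
    # getattr + callable check: no AttributeError when the helper is absent,
    # and no call if the attribute exists but is not callable
    parse_result_dir = getattr(parser, "parse_result_directory", None)
    if callable(parse_result_dir):
        return parse_result_dir(entry)
    return []

assert collect(MinimalParser(), Path(".")) == []
assert collect(DirectoryParser(), Path(".")) == [{"output_tps": 2.0}]
```

Unlike a bare hasattr check followed by a direct method call, this pattern also satisfies type checkers that only know the minimal protocol.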
🧹 Nitpick comments (2)
analysis/srtlog/parsers/nodes/trtllm.py (1)

105-133: Handle multi-line Config(...) dumps to avoid missing TP/PP/EP.

If the config dump wraps lines, the current regex won’t match and config stays empty.

🛠️ Proposed fix
-            config_match = re.search(r"Config\((.*?)\)", clean_content)
+            config_match = re.search(r"Config\((.*?)\)", clean_content, re.DOTALL)
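A tiny repro of the multi-line Config(...) concern, with made-up log lines rather than real TRTLLM output:

```python
import re

# Made-up log lines: a one-line dump and a wrapped multi-line dump
single = "INFO Config(tp_size=8, pp_size=1)"
multi = "INFO Config(tp_size=8,\n    pp_size=1,\n    ep_size=2)"

# Without re.DOTALL, '.' stops at newlines, so the wrapped dump never matches
assert re.search(r"Config\((.*?)\)", single).group(1) == "tp_size=8, pp_size=1"
assert re.search(r"Config\((.*?)\)", multi) is None

# With re.DOTALL, '.' also matches newlines and the full body is captured
m = re.search(r"Config\((.*?)\)", multi, re.DOTALL)
assert m is not None and "ep_size=2" in m.group(1)
```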
analysis/srtlog/parsers/__init__.py (1)

188-193: Tighten registry typing for better static checks.

🛠️ Proposed fix
-_benchmark_parsers: dict[str, type] = {}
+_benchmark_parsers: dict[str, type[BenchmarkParserProtocol]] = {}
 
 # Registry for node parsers
-_node_parsers: dict[str, type] = {}
+_node_parsers: dict[str, type[NodeParserProtocol]] = {}

Comment on lines +31 to +47
def find_job_directories(log_dir: Path) -> list[Path]:
"""Find all job directories by searching for sbatch_script.sh files.

Args:
log_dir: Root directory to search

Returns:
List of job directory paths (parent dirs of sbatch_script.sh)
"""
job_dirs = []
for sbatch_script in log_dir.rglob("sbatch_script.sh"):
job_dir = sbatch_script.parent
job_dirs.append(job_dir)

# Sort by job ID (directory name) if numeric
job_dirs.sort(key=lambda p: (int(p.name) if p.name.isdigit() else p.name))
return job_dirs
Contributor

⚠️ Potential issue | 🟠 Major

Fix mixed-type sort keys to avoid TypeError.

job_dirs.sort(...) mixes int and str keys when both numeric and non-numeric dirs exist.

🛠️ Proposed fix
-    job_dirs.sort(key=lambda p: (int(p.name) if p.name.isdigit() else p.name))
+    job_dirs.sort(key=lambda p: (0, int(p.name)) if p.name.isdigit() else (1, p.name))
🤖 Prompt for AI Agents
In `@analysis/srtlog/rollup_harness.py` around lines 31 - 47, The current sort key
in find_job_directories mixes int and str values causing a TypeError when
directory names include both numeric and non-numeric names; update the
job_dirs.sort key to return a homogeneous tuple that orders numeric directories
before (or after) non-numeric ones — e.g., return (0, int(p.name)) for purely
numeric names and (1, p.name) for others — so all keys share the same types and
provide stable ordering; modify the lambda used in job_dirs.sort accordingly.
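The homogeneous-tuple key can be sketched with toy directory names (illustrative values, not from the repo):

```python
def job_sort_key(name: str) -> tuple:
    """Homogeneous key: numeric names first (by value), then others (lexically)."""
    return (0, int(name)) if name.isdigit() else (1, name)

names = ["12345", "baseline", "9", "adhoc-run"]

# Naive mixed int/str keys raise TypeError once both kinds are present
try:
    sorted(names, key=lambda n: int(n) if n.isdigit() else n)
    raise AssertionError("expected TypeError")
except TypeError:
    pass

assert sorted(names, key=job_sort_key) == ["9", "12345", "adhoc-run", "baseline"]
```

Because the first tuple element partitions the keys, ints are only ever compared with ints and strings with strings.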

Comment on lines +349 to +353
for data in results:
result = RollupResult(
concurrency=data.get("max_concurrency", 0),
output_tps=data.get("output_throughput", 0) or data.get("output_tps", 0),
total_tps=data.get("total_token_throughput"),
Contributor

⚠️ Potential issue | 🟡 Minor

Avoid treating 0 throughput as “missing.”

The or fallback treats a valid 0 as missing, overriding it and skewing rollups.

🛠️ Proposed fix
-            output_tps=data.get("output_throughput", 0) or data.get("output_tps", 0),
+            output_tps=(
+                data.get("output_throughput")
+                if data.get("output_throughput") is not None
+                else data.get("output_tps", 0)
+            ),
🤖 Prompt for AI Agents
In `@analysis/srtlog/rollup_harness.py` around lines 349 - 353, The code currently
uses "output_tps=data.get('output_throughput', 0) or data.get('output_tps', 0)"
which treats a valid 0 as missing; update the RollupResult construction to
preserve explicit zeroes by checking for None instead of falsy values — e.g.
compute a value = data.get('output_throughput') if data.get('output_throughput')
is not None else data.get('output_tps') if data.get('output_tps') is not None
else 0, and pass that value as output_tps (similarly use explicit None checks
for any other throughput fields) so RollupResult sees bona fide 0s rather than
overwriting them.
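The None-vs-falsy distinction can be captured in a small helper; pick_output_tps is a hypothetical name for illustration, not a function in the PR:

```python
def pick_output_tps(data: dict) -> float:
    """Prefer output_throughput, then output_tps, then 0 — treating only None as missing."""
    v = data.get("output_throughput")
    if v is None:
        v = data.get("output_tps")
    return 0.0 if v is None else v

# A measured 0.0 survives the explicit None check...
assert pick_output_tps({"output_throughput": 0.0, "output_tps": 57.5}) == 0.0
assert pick_output_tps({"output_tps": 3.0}) == 3.0
assert pick_output_tps({}) == 0.0
# ...but an `or` chain silently replaces it with the fallback
assert ({"output_throughput": 0.0}.get("output_throughput", 0) or 57.5) == 57.5
```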

Comment on lines +464 to +478
def _get_parser_order(self, backend_type: str) -> list[str]:
"""Get the order of parsers to try for a given backend type.

Args:
backend_type: Backend type from config (e.g., "sglang", "trtllm")

Returns:
List of parser types to try in order
"""
parser_orders = {
"sglang": ["sglang"],
"trtllm": ["trtllm"],
}

return parser_orders.get(backend_type, [backend_type])
Contributor

⚠️ Potential issue | 🟠 Major

Add fallback for sglang-v2 when backend is sglang.

The docstring mentions fallback order, but the mapping doesn’t include it.

🛠️ Proposed fix
-        parser_orders = {
-            "sglang": ["sglang"],
-            "trtllm": ["trtllm"],
-        }
+        parser_orders = {
+            "sglang": ["sglang", "sglang-v2"],
+            "sglang-v2": ["sglang-v2", "sglang"],
+            "trtllm": ["trtllm"],
+        }
🤖 Prompt for AI Agents
In `@src/srtctl/cli/mixins/rollup_stage.py` around lines 464 - 478, The
_get_parser_order function's mapping for "sglang" lacks the recommended fallback
to "sglang-v2"; update the parser_orders dict in _get_parser_order(backend_type:
str) to return ["sglang", "sglang-v2"] for the "sglang" key (so it will try the
primary parser then the v2 fallback), and update the function docstring to
mention the sglang-v2 fallback order accordingly.

Comment on lines +370 to +407
def _compute_aggregate_stats(self) -> None:
"""Compute aggregate statistics across all nodes."""
# Prefill aggregation (includes both prefill and agg nodes)
prefill_capable_nodes = [n for n in self.nodes if n.worker_type in ("prefill", "agg")]
if prefill_capable_nodes:
total_new = sum(n.total_new_tokens or 0 for n in prefill_capable_nodes)
total_cached = sum(n.total_cached_tokens or 0 for n in prefill_capable_nodes)

if total_new > 0 or total_cached > 0:
self.total_prefill_tokens = total_new
self.total_cached_tokens = total_cached
total = total_new + total_cached
if total > 0:
self.overall_cache_hit_rate = (total_cached / total) * 100

throughputs = [n.avg_input_throughput for n in prefill_capable_nodes if n.avg_input_throughput]
if throughputs:
self.avg_prefill_input_throughput = sum(throughputs) / len(throughputs)

max_throughputs = [n.max_input_throughput for n in prefill_capable_nodes if n.max_input_throughput]
if max_throughputs:
self.max_prefill_input_throughput = max(max_throughputs)

# Decode aggregation (includes both decode and agg nodes)
decode_capable_nodes = [n for n in self.nodes if n.worker_type in ("decode", "agg")]
if decode_capable_nodes:
throughputs = [n.avg_gen_throughput for n in decode_capable_nodes if n.avg_gen_throughput]
if throughputs:
self.avg_decode_gen_throughput = sum(throughputs) / len(throughputs)

max_throughputs = [n.max_gen_throughput for n in decode_capable_nodes if n.max_gen_throughput]
if max_throughputs:
self.max_decode_gen_throughput = max(max_throughputs)

# Memory aggregation
kv_caches = [n.kv_cache_gb for n in self.nodes if n.kv_cache_gb]
if kv_caches:
self.total_kv_cache_gb = sum(kv_caches)

⚠️ Potential issue | 🟡 Minor

Preserve legitimate zero values in aggregate stats.

Truthy checks drop valid 0 values (e.g., idle nodes), which inflates averages/maxes.

🛠️ Proposed fix
-            throughputs = [n.avg_input_throughput for n in prefill_capable_nodes if n.avg_input_throughput]
+            throughputs = [n.avg_input_throughput for n in prefill_capable_nodes if n.avg_input_throughput is not None]
             if throughputs:
                 self.avg_prefill_input_throughput = sum(throughputs) / len(throughputs)
 
-            max_throughputs = [n.max_input_throughput for n in prefill_capable_nodes if n.max_input_throughput]
+            max_throughputs = [n.max_input_throughput for n in prefill_capable_nodes if n.max_input_throughput is not None]
             if max_throughputs:
                 self.max_prefill_input_throughput = max(max_throughputs)
@@
-            throughputs = [n.avg_gen_throughput for n in decode_capable_nodes if n.avg_gen_throughput]
+            throughputs = [n.avg_gen_throughput for n in decode_capable_nodes if n.avg_gen_throughput is not None]
             if throughputs:
                 self.avg_decode_gen_throughput = sum(throughputs) / len(throughputs)
 
-            max_throughputs = [n.max_gen_throughput for n in decode_capable_nodes if n.max_gen_throughput]
+            max_throughputs = [n.max_gen_throughput for n in decode_capable_nodes if n.max_gen_throughput is not None]
             if max_throughputs:
                 self.max_decode_gen_throughput = max(max_throughputs)
@@
-        kv_caches = [n.kv_cache_gb for n in self.nodes if n.kv_cache_gb]
+        kv_caches = [n.kv_cache_gb for n in self.nodes if n.kv_cache_gb is not None]
         if kv_caches:
             self.total_kv_cache_gb = sum(kv_caches)
🤖 Prompt for AI Agents
In `@src/srtctl/cli/mixins/rollup/models.py` around lines 370-407: the
aggregation logic drops valid zero values because it filters metrics with truthy
checks (e.g., "if n.avg_input_throughput") — update _compute_aggregate_stats to
treat 0 as a valid value by using explicit None checks. For each
throughput/metric list (avg_input_throughput, max_input_throughput,
avg_gen_throughput, max_gen_throughput, kv_cache_gb) change comprehensions to
include values where the field is not None (e.g., [n.avg_input_throughput for n
in prefill_capable_nodes if n.avg_input_throughput is not None]) and compute
averages using that list length so zeros contribute; likewise ensure
total_prefill_tokens/total_cached_tokens and overall_cache_hit_rate are set even
when totals are zero (use explicit comparisons to None rather than truthiness).


@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 1

🤖 Fix all issues with AI agents
In `@src/srtctl/cli/mixins/rollup_stage.py`:
- Around lines 558-563: the mapping for output_tps in the RollupResult
construction currently only reads data.get("output_throughput", 0) which drops
valid values emitted as "output_tps"; update the construction in the loop that
creates RollupResult (the RollupResult(...) call) to use a fallback that
preserves either key, e.g. read output_throughput first then fallback to
data.get("output_tps") (or vice versa) before defaulting to 0 so parsers
emitting "output_tps" like mooncake-router are preserved.
♻️ Duplicate comments (5)
src/srtctl/cli/mixins/rollup_stage.py (3)

49-52: Make endpoints abstract instead of returning None.

The ellipsis body returns None, violating the declared return type and causing type-checker failures.

🔧 Proposed fix
    @property
     def endpoints(self) -> list[Endpoint]:
         """Endpoint allocation topology."""
-        ...
+        raise NotImplementedError("Subclass must provide endpoints property")
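The runtime difference can be shown with a small sketch (class and endpoint names are hypothetical, not the real mixin): an `...` body silently yields None, while raising surfaces the missing override immediately.

```python
class RollupHost:
    @property
    def endpoints(self) -> list[str]:
        # A body of `...` would implicitly return None here; raising turns a
        # latent type violation into an explicit, immediate error.
        raise NotImplementedError("Subclass must provide endpoints property")

class ConcreteHost(RollupHost):
    @property
    def endpoints(self) -> list[str]:
        return ["node-0:8000", "node-1:8000"]

print(ConcreteHost().endpoints)  # ['node-0:8000', 'node-1:8000']
```

Accessing `RollupHost().endpoints` directly raises NotImplementedError instead of handing callers a None that fails later.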

129-149: Fix type-checker errors for optional benchmark parser hooks.

BenchmarkParserProtocol doesn’t define find_aiperf_results / parse_result_directory; hasattr still leaves the calls untyped. Use getattr+callable or extend the protocol.

🔧 Proposed fix (getattr + callable)
             # Try parser-specific result collection first
             if parser is not None:
                 # For mooncake-router, look for AIPerf results
-                if hasattr(parser, "find_aiperf_results"):
-                    aiperf_files = parser.find_aiperf_results(self.runtime.log_dir)
+                find_aiperf = getattr(parser, "find_aiperf_results", None)
+                if callable(find_aiperf):
+                    aiperf_files = find_aiperf(self.runtime.log_dir)
                     for aiperf_path in aiperf_files:
                         result = parser.parse_result_json(aiperf_path)
                         if result.get("output_tps") is not None:
                             results.append(result)
                             logger.debug("Loaded AIPerf result: %s", aiperf_path)

                 # For sa-bench style, look for result directories
-                if hasattr(parser, "parse_result_directory"):
+                parse_result_dir = getattr(parser, "parse_result_directory", None)
+                if callable(parse_result_dir):
                     for entry in self.runtime.log_dir.iterdir():
                         if not entry.is_dir():
                             continue
                         # Match patterns like sa-bench_isl_X_osl_Y
                         if "_isl_" in entry.name and "_osl_" in entry.name:
                             logger.debug("Found benchmark results directory: %s", entry.name)
-                            dir_results = parser.parse_result_directory(entry)
+                            dir_results = parse_result_dir(entry)
                             results.extend(dir_results)
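The pattern the fix relies on can be sketched in isolation (the parser classes below are hypothetical stand-ins, not the real protocol): `getattr` with a `callable` guard probes for an optional hook in a way type checkers can follow.

```python
class BasicParser:
    """Implements only the core parsing surface."""
    def parse_result_json(self, path: str) -> dict:
        return {"path": path}

class AiperfParser(BasicParser):
    """Also exposes the optional result-discovery hook."""
    def find_aiperf_results(self, log_dir: str) -> list[str]:
        return [f"{log_dir}/profile_export.json"]

def collect(parser: BasicParser, log_dir: str) -> list[str]:
    # getattr returns None when the hook is absent; callable() also guards
    # against a same-named attribute that isn't actually a method.
    find_aiperf = getattr(parser, "find_aiperf_results", None)
    if callable(find_aiperf):
        return find_aiperf(log_dir)
    return []

print(collect(BasicParser(), "/logs"))   # []
print(collect(AiperfParser(), "/logs"))  # ['/logs/profile_export.json']
```

Binding the attribute once also means the narrowed local is what gets called, which is what satisfies the type checker.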

465-479: Add sglang-v2 fallback in parser order.

Without this, sglang backends won’t try the v2 parser even when it’s available.

🔧 Proposed fix
         parser_orders = {
-            "sglang": ["sglang"],
-            "trtllm": ["trtllm"],
+            "sglang": ["sglang", "sglang-v2"],
+            "sglang-v2": ["sglang-v2", "sglang"],
+            "trtllm": ["trtllm"],
         }
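What the ordered fallback buys can be sketched with a toy registry (the registry contents here are illustrative, not the real parser objects):

```python
# Hypothetical registry: names -> parser identifiers.
PARSER_REGISTRY = {"sglang": "SGLangParser", "sglang-v2": "SGLangV2Parser"}

def parser_order(backend_type: str) -> list[str]:
    """Return registered parser names to try, in priority order."""
    orders = {
        "sglang": ["sglang", "sglang-v2"],
        "sglang-v2": ["sglang-v2", "sglang"],
        "trtllm": ["trtllm"],
    }
    # Unknown backends fall back to a single same-named lookup.
    candidates = orders.get(backend_type, [backend_type])
    return [name for name in candidates if name in PARSER_REGISTRY]

print(parser_order("sglang"))  # ['sglang', 'sglang-v2']
print(parser_order("trtllm"))  # [] (not registered in this toy registry)
```

Without the second entry, an sglang backend whose logs only match the v2 timestamp format would simply yield no node metrics.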
analysis/srtlog/rollup_harness.py (2)

45-46: Use homogeneous sort keys for job directory ordering.

Mixed int/str sort keys raise TypeError when both numeric and non-numeric job IDs exist.

🔧 Proposed fix
-    job_dirs.sort(key=lambda p: (int(p.name) if p.name.isdigit() else p.name))
+    job_dirs.sort(key=lambda p: (0, int(p.name)) if p.name.isdigit() else (1, p.name))
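A quick demonstration of why the tuple key works (directory names below are made up): tuple comparison short-circuits on the group element, so an int is never compared against a str.

```python
names = ["12345", "debug", "678", "archive"]

# Mixed int/str keys: the first int-vs-str comparison raises TypeError.
try:
    sorted(names, key=lambda n: int(n) if n.isdigit() else n)
    mixed_ok = True
except TypeError:
    mixed_ok = False

# Tuple keys: group 0 (numeric) sorts before group 1 (other); within a group
# the second element is always one type, so comparisons stay homogeneous.
ordered = sorted(names, key=lambda n: (0, int(n)) if n.isdigit() else (1, n))

print(mixed_ok)  # False
print(ordered)   # ['678', '12345', 'archive', 'debug']
```

Numeric job IDs also sort numerically (678 before 12345) rather than lexicographically, which string-only keys would get wrong.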

351-354: Don’t treat valid 0 throughput as missing.

Using `or` overwrites explicit zero values; check for None instead.

🔧 Proposed fix
-            output_tps=data.get("output_throughput", 0) or data.get("output_tps", 0),
+            output_tps=(
+                data.get("output_throughput")
+                if data.get("output_throughput") is not None
+                else data.get("output_tps", 0)
+            ),
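The difference between `or` and an explicit None check, in isolation (the helper name is hypothetical):

```python
def pick_output_tps(data: dict) -> float:
    # `a or b` treats a measured 0 as missing; the None check preserves it.
    v = data.get("output_throughput")
    return v if v is not None else data.get("output_tps", 0)

print(pick_output_tps({"output_throughput": 0.0, "output_tps": 42.0}))  # 0.0
print(pick_output_tps({"output_tps": 42.0}))                            # 42.0
print(pick_output_tps({}))                                              # 0
```

With `or`, the first call would return 42.0, replacing a real measurement with an unrelated key's value.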

Comment on lines +558 to +563
for data in results:
result = RollupResult(
concurrency=data.get("max_concurrency", 0),
output_tps=data.get("output_throughput", 0),
total_tps=data.get("total_token_throughput"),
request_throughput=data.get("request_throughput"),

⚠️ Potential issue | 🟠 Major

🧩 Analysis chain

🏁 Script executed:

# First, let's examine the file at the specified lines
head -580 src/srtctl/cli/mixins/rollup_stage.py | tail -40

Repository: ishandhanani/srt-slurm

Length of output: 1844


🏁 Script executed:

# Find and examine the RollupResult dataclass definition
rg -n "class RollupResult" src/

Repository: ishandhanani/srt-slurm

Length of output: 128


🏁 Script executed:

# Search for output_tps and output_throughput usage in the codebase
rg -n "output_tps|output_throughput" src/

Repository: ishandhanani/srt-slurm

Length of output: 1567


🏁 Script executed:

# Examine the RollupResult dataclass definition
sed -n '411,430p' src/srtctl/cli/mixins/rollup/models.py

Repository: ishandhanani/srt-slurm

Length of output: 681


🏁 Script executed:

# Check the context where data comes from - look at line 136 context
sed -n '130,145p' src/srtctl/cli/mixins/rollup_stage.py

Repository: ishandhanani/srt-slurm

Length of output: 973


🏁 Script executed:

# Look for where 'results' data is populated or parsed
rg -n "def.*results|results\s*=" src/srtctl/cli/mixins/rollup_stage.py | head -20

Repository: ishandhanani/srt-slurm

Length of output: 302


🏁 Script executed:

# Check if there are any parsers that produce "output_tps" in the data
rg -B5 -A5 '"output_tps"' src/srtctl/cli/mixins/rollup_stage.py

Repository: ishandhanani/srt-slurm

Length of output: 722


🏁 Script executed:

# Find all parsers to understand data sources
rg -n "class.*Parser|def parse_result" src/srtctl/ | grep -i parser

Repository: ishandhanani/srt-slurm

Length of output: 48


🏁 Script executed:

# Check the sa-bench benchmark_serving.py to see what it outputs
sed -n '70,85p' src/srtctl/benchmarks/scripts/sa-bench/benchmark_serving.py

Repository: ishandhanani/srt-slurm

Length of output: 432


🏁 Script executed:

# Check the AIPerf parser or mooncake parser
fd -e "\.py$" "aiperf\|mooncake" src/srtctl/

Repository: ishandhanani/srt-slurm

Length of output: 48


🏁 Script executed:

# Search for parse_result_json implementations
rg -B3 -A10 "def parse_result_json" src/srtctl/

Repository: ishandhanani/srt-slurm

Length of output: 48


🏁 Script executed:

# Look for all result parsing logic
rg -n "def _collect_benchmark_results" -A50 src/srtctl/cli/mixins/rollup_stage.py | head -80

Repository: ishandhanani/srt-slurm

Length of output: 2635


🏁 Script executed:

# Check what parse_result_directory returns
rg -n "def parse_result_directory" src/ -B2 -A15

Repository: ishandhanani/srt-slurm

Length of output: 48


🏁 Script executed:

# Look for mooncake or AIPerf parser
fd -e "\.py$" src/ | xargs rg -l "aiperf\|AIPerf" | head -5

Repository: ishandhanani/srt-slurm

Length of output: 387


🏁 Script executed:

# Search for where output_tps key comes from in parsing
rg -n '"output_tps"' src/srtctl/cli/mixins/rollup_stage.py -B5 -A5

Repository: ishandhanani/srt-slurm

Length of output: 766


🏁 Script executed:

# Check if there's any transformation of keys between AIPerf result and the mapping
rg -n "parse_result_json" src/srtctl/ -B3 -A20 | head -60

Repository: ishandhanani/srt-slurm

Length of output: 2366


🏁 Script executed:

# Verify the complete flow - check analysis.srtlog.parsers location
find . -name "parsers.py" | head -5

Repository: ishandhanani/srt-slurm

Length of output: 48


🏁 Script executed:

# Check if AIPerf results transformation happens anywhere
rg -n "AIPerf\|aiperf" src/srtctl/cli/mixins/rollup_stage.py

Repository: ishandhanani/srt-slurm

Length of output: 48


🏁 Script executed:

# Look at the complete mapping code to understand the flow
sed -n '555,600p' src/srtctl/cli/mixins/rollup_stage.py

Repository: ishandhanani/srt-slurm

Length of output: 2033


🏁 Script executed:

# Check the fallback JSON parsing section to understand all data sources
sed -n '155,200p' src/srtctl/cli/mixins/rollup_stage.py

Repository: ishandhanani/srt-slurm

Length of output: 1973


🏁 Script executed:

# Look for any key transformation or normalization before line 561
sed -n '520,565p' src/srtctl/cli/mixins/rollup_stage.py

Repository: ishandhanani/srt-slurm

Length of output: 1935


🏁 Script executed:

# Verify the sa-bench output structure that produces "output_throughput"
sed -n '690,710p' src/srtctl/benchmarks/scripts/sa-bench/benchmark_serving.py

Repository: ishandhanani/srt-slurm

Length of output: 991


🏁 Script executed:

# Check if there are any other parsers or places where results are transformed
rg -n "output_tps|output_throughput" src/srtctl/cli/mixins/rollup_stage.py -B2 -A2

Repository: ishandhanani/srt-slurm

Length of output: 967


🏁 Script executed:

# Let me verify if AIPerf results are actually added and would go through line 561
# Check what parse_result_json returns for AIPerf
rg -n "find_aiperf_results\|parse_result_json" src/srtctl/ -B5 -A15 | head -80

Repository: ishandhanani/srt-slurm

Length of output: 48


🏁 Script executed:

# Confirm the exact data structure by looking at where both parsers are used
rg -n "output_tps" src/ --type py | grep -v README

Repository: ishandhanani/srt-slurm

Length of output: 818


🏁 Script executed:

# Let me trace through the exact flow - check if AIPerf results with output_tps key reach the mapping
sed -n '131,140p' src/srtctl/cli/mixins/rollup_stage.py

Repository: ishandhanani/srt-slurm

Length of output: 660


🏁 Script executed:

# Verify that AIPerf results are indeed added to results and then processed at line 561
sed -n '135,165p' src/srtctl/cli/mixins/rollup_stage.py

Repository: ishandhanani/srt-slurm

Length of output: 1604


🏁 Script executed:

# Confirm the issue: check if there's any key transformation between AIPerf parse and the mapping code
rg -n "results.append\|results.extend" src/srtctl/cli/mixins/rollup_stage.py

Repository: ishandhanani/srt-slurm

Length of output: 48


🏁 Script executed:

# Final verification: Check if AIPerf results indeed bypass the filtering at line 136 and reach the mapping
# Let me trace the complete flow - results list is populated from AIPerf, then passed to line 555+
sed -n '555,565p' src/srtctl/cli/mixins/rollup_stage.py

Repository: ishandhanani/srt-slurm

Length of output: 590


🏁 Script executed:

# Verify that the RollupResult dataclass expects float for output_tps (not None)
sed -n '411,425p' src/srtctl/cli/mixins/rollup/models.py

Repository: ishandhanani/srt-slurm

Length of output: 538


Preserve output_tps when output_throughput is missing.

AIPerf results use the output_tps key but the current mapping at line 561 only checks for output_throughput, causing valid throughput data to be lost when mapped to 0. This affects results from parsers like mooncake-router that emit output_tps directly.

🔧 Proposed fix
             result = RollupResult(
                 concurrency=data.get("max_concurrency", 0),
-                output_tps=data.get("output_throughput", 0),
+                output_tps=(
+                    data.get("output_throughput")
+                    if data.get("output_throughput") is not None
+                    else data.get("output_tps", 0)
+                ),
🤖 Prompt for AI Agents
In `@src/srtctl/cli/mixins/rollup_stage.py` around lines 558-563: the mapping
for output_tps in the RollupResult construction currently only reads
data.get("output_throughput", 0) which drops valid values emitted as
"output_tps"; update the construction in the loop that creates RollupResult (the
RollupResult(...) call) to use a fallback that preserves either key, e.g. read
output_throughput first then fallback to data.get("output_tps") (or vice versa)
before defaulting to 0 so parsers emitting "output_tps" like mooncake-router are
preserved.


@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 2

🤖 Fix all issues with AI agents
In `@docs/architecture.md`:
- Around lines 929-1014: the fenced ASCII diagram block currently uses an
untagged code fence (```); to satisfy markdownlint MD040, update the opening
fence in the "Detailed Entity Relationships" diagram (the large block that
begins with the ASCII box showing PARSER LAYER / ROLLUP LAYER) to use a language
tag by replacing the opening ``` with ```text so the block starts with ```text
while leaving the content and closing fence unchanged.
- Around lines 882-925: the ASCII ER diagram fenced code block starting with the
box diagram (the block that begins with the large
"┌─────────────────────────────────────────────────────────────┐" line) is
missing a language tag; change the opening triple-backtick for that block to
```text so the fenced block reads with the text language tag and satisfies
Markdownlint MD040. Ensure only the opening fence is modified (leave the closing
``` unchanged).
♻️ Duplicate comments (1)
docs/architecture.md (1)

822-826: Add a language tag to the Rollup architecture fenced block.

Markdownlint MD040 is triggered here; set the opening fence to ```text.

Comment thread docs/architecture.md
Comment on lines +882 to +925
```
┌─────────────────────────────────────────────────────────────┐
│ LOG FILES │
└─────────────────────────────────────────────────────────────┘
┌─────────────────────────────────────────┼─────────────────────────────────────────┐
│ │ │
▼ ▼ ▼
┌───────────────────┐ ┌───────────────────┐ ┌───────────────────┐
│ worker_*.out/err │ │ benchmark.out │ │ config.yaml │
└───────────────────┘ └───────────────────┘ └───────────────────┘
│ │ │
▼ ▼ ▼
┌───────────────────────────┐ ┌───────────────────────────┐ ┌───────────────────────────┐
│ NodeParserProtocol │ │ BenchmarkParserProtocol │ │ YAML Loader │
│ (sglang.py / trtllm.py) │ │ (sa_bench.py / etc.) │ │ │
└───────────────────────────┘ └───────────────────────────┘ └───────────────────────────┘
│ │ │
┌────────┴────────┐ ┌────────┴────────┐ │
▼ ▼ ▼ ▼ ▼
┌─────────────────┐ ┌─────────────────┐ ┌─────────────────┐ ┌─────────────────┐ ┌─────────────────────┐
│ NodeMetrics │ │NodeLaunchCommand│ │ dict (result) │ │BenchmarkLaunch │ │ EnvironmentConfig │
│ (models.py) │ │ (__init__.py) │ │ │ │ Command │ │ (rollup/models.py) │
└─────────────────┘ └─────────────────┘ └─────────────────┘ └─────────────────┘ └─────────────────────┘
│ │ │ │ │
└────────────────────┼─────────────────────┼────────────────────┼──────────────────────────┘
│ │ │
▼ ▼ ▼
┌────────────────────────────────────────────────────────────┐
│ RollupStageMixin │
│ (rollup_stage.py / rollup_harness.py) │
└────────────────────────────────────────────────────────────┘
│ transforms
┌───────────────┐
│ RollupSummary │
└───────┬───────┘
┌───────────────┐
│ rollup.json │
└───────────────┘
```

⚠️ Potential issue | 🟡 Minor

Add a language tag to the ER diagram fenced block.

Markdownlint MD040 is triggered; set the opening fence to ```text.

🛠️ Proposed fix
-```
+```text
🧰 Tools
🪛 markdownlint-cli2 (0.18.1)

882-882: Fenced code blocks should have a language specified

(MD040, fenced-code-language)

🤖 Prompt for AI Agents
In `@docs/architecture.md` around lines 882-925: the ASCII ER diagram fenced
code block starting with the box diagram (the block that begins with the large
"┌─────────────────────────────────────────────────────────────┐" line) is
missing a language tag; change the opening triple-backtick for that block to
```text so the fenced block reads with the text language tag and satisfies
Markdownlint MD040. Ensure only the opening fence is modified (leave the closing
``` unchanged).

Comment thread docs/architecture.md
Comment on lines +929 to +1014
```
┌─────────────────────────────────────────────────────────────────────────────────────────┐
│ PARSER LAYER (analysis/srtlog/parsers/) │
├─────────────────────────────────────────────────────────────────────────────────────────┤
│ │
│ ┌────────────────────┐ ┌────────────────────┐ ┌────────────────────┐ │
│ │ NodeLaunchCommand │ │ BenchmarkLaunch │ │ NodeMetrics │ │
│ ├────────────────────┤ │ Command │ ├────────────────────┤ │
│ │ backend_type: str │ ├────────────────────┤ │ node_info: dict │ │
│ │ worker_type: str │ │ benchmark_type: str│ │ config: dict │ │
│ │ raw_command: str │ │ raw_command: str │ │ batches: list │ │
│ │ extra_args: dict │ │ extra_args: dict │ │ memory_snapshots │ │
│ └────────────────────┘ └────────────────────┘ └────────────────────┘ │
│ │
└─────────────────────────────────────────────────────────────────────────────────────────┘
┌─────────────────────────────────────────────────────────────────────────────────────────┐
│ ROLLUP LAYER (srtctl/cli/mixins/rollup/) │
├─────────────────────────────────────────────────────────────────────────────────────────┤
│ │
│ ┌─────────────────────────────────────────────────────────────────────────────────┐ │
│ │ LaunchCommandRollup │ │
│ ├─────────────────────────────────────────────────────────────────────────────────┤ │
│ │ raw_command: str │ command_type: str ("worker" | "benchmark") │ │
│ │ model_path: str | None │ served_model_name: str | None │ │
│ │ worker_type: str | None │ backend_type: str | None │ │
│ │ tp_size: int | None │ pp_size: int | None │ │
│ │ dp_size: int | None │ ep_size: int | None │ │
│ │ max_num_seqs: int | None │ max_model_len: int | None │ │
│ └─────────────────────────────────────────────────────────────────────────────────┘ │
│ │ 1:1 │
│ ▼ │
│ ┌─────────────────────────────────────────────────────────────────────────────────┐ │
│ │ NodeRollup │ │
│ ├─────────────────────────────────────────────────────────────────────────────────┤ │
│ │ node_name: str │ worker_type: str │ │
│ │ worker_id: str │ tp_size, pp_size, dp_size, ep_size: int | None │ │
│ │ launch_command: LaunchCommandRollup | None │ │
│ │ avail_mem_gb: float | None │ kv_cache_gb: float | None │ │
│ │ total_batches: int │ avg_gen_throughput: float | None │ │
│ └─────────────────────────────────────────────────────────────────────────────────┘ │
│ │ 1:N │
│ ▼ │
│ ┌─────────────────────────────────────────────────────────────────────────────────┐ │
│ │ NodesSummary │ │
│ ├─────────────────────────────────────────────────────────────────────────────────┤ │
│ │ nodes: list[NodeRollup] │ │
│ │ total_prefill_nodes: int │ total_decode_nodes: int │ │
│ │ total_agg_nodes: int │ total_kv_cache_gb: float | None │ │
│ └─────────────────────────────────────────────────────────────────────────────────┘ │
│ │
│ ┌─────────────────────────────────────────────────────────────────────────────────┐ │
│ │ RollupResult │ │
│ ├─────────────────────────────────────────────────────────────────────────────────┤ │
│ │ max_concurrency: int │ input_len: int │ output_len: int │ │
│ │ output_tps: float │ request_tps: float │ total_tokens: int │ │
│ │ avg_ttft_ms: float │ avg_tpot_ms: float │ p50/p90/p99 metrics │ │
│ └─────────────────────────────────────────────────────────────────────────────────┘ │
│ │
│ ┌─────────────────────────────────────────────────────────────────────────────────┐ │
│ │ EnvironmentConfig │ │
│ ├─────────────────────────────────────────────────────────────────────────────────┤ │
│ │ prefill_environment: dict │ decode_environment: dict │ │
│ │ aggregated_environment: dict│ prefill_engine_config: dict │ │
│ │ decode_engine_config: dict │ aggregated_engine_config: dict │ │
│ └─────────────────────────────────────────────────────────────────────────────────┘ │
│ │
│ ┌──────────────────────┬──────────────────────┐ │
│ │ │ │ │
│ ▼ ▼ ▼ │
│ ┌─────────────────────────────────────────────────────────────────────────────────┐ │
│ │ RollupSummary │ │
│ ├─────────────────────────────────────────────────────────────────────────────────┤ │
│ │ job_id: str │ model_name: str │ backend_type: str │ │
│ │ benchmark_type: str │ is_disaggregated: bool │ total_nodes: int │ │
│ │ max_output_tps: float │ max_request_tps: float │ │
│ │ results: list[RollupResult] │ │
│ │ nodes_summary: NodesSummary | None │ │
│ │ benchmark_command: LaunchCommandRollup | None │ │
│ │ environment_config: EnvironmentConfig | None │ │
│ │ tags: dict │ timestamp: str │ │
│ └─────────────────────────────────────────────────────────────────────────────────┘ │
│ │
└─────────────────────────────────────────────────────────────────────────────────────────┘
```

⚠️ Potential issue | 🟡 Minor

Add a language tag to the “Detailed Entity Relationships” fenced block.

Markdownlint MD040 is triggered; set the opening fence to ```text.

🛠️ Proposed fix
-```
+```text
🧰 Tools
🪛 markdownlint-cli2 (0.18.1)

929-929: Fenced code blocks should have a language specified

(MD040, fenced-code-language)

🤖 Prompt for AI Agents
In `@docs/architecture.md` around lines 929-1014: the fenced ASCII diagram block
currently uses an untagged code fence (```); to satisfy markdownlint MD040,
update the opening fence in the "Detailed Entity Relationships" diagram (the
large block that begins with the ASCII box showing PARSER LAYER / ROLLUP LAYER)
to use a language tag by replacing the opening ``` with ```text so the block
starts with ```text while leaving the content and closing fence unchanged.


@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 1

🤖 Fix all issues with AI agents
In `@analysis/srtlog/parsers/benchmark/mooncake_router.py`:
- Around lines 126-129: the truthy check on tps_per_user will drop legitimate
zero values; update the conditional in the block that calls self._get_metric
(variable tps_per_user) to check for None explicitly (e.g., "is not None")
before assigning result["output_tps_per_user"] so that 0 is preserved while
still filtering missing values returned as None.
♻️ Duplicate comments (7)
src/srtctl/cli/mixins/rollup/models.py (1)

394-416: Truthy checks drop valid zero values in aggregate stats.

The list comprehensions use truthy filtering (e.g., if n.avg_input_throughput), which excludes legitimate 0 values. This can inflate averages or miss valid zero-throughput nodes.

Proposed fix
-            throughputs = [n.avg_input_throughput for n in prefill_capable_nodes if n.avg_input_throughput]
+            throughputs = [n.avg_input_throughput for n in prefill_capable_nodes if n.avg_input_throughput is not None]
             if throughputs:
                 self.avg_prefill_input_throughput = sum(throughputs) / len(throughputs)

-            max_throughputs = [n.max_input_throughput for n in prefill_capable_nodes if n.max_input_throughput]
+            max_throughputs = [n.max_input_throughput for n in prefill_capable_nodes if n.max_input_throughput is not None]
             if max_throughputs:
                 self.max_prefill_input_throughput = max(max_throughputs)

         # Decode aggregation (includes both decode and agg nodes)
         decode_capable_nodes = [n for n in self.nodes if n.worker_type in ("decode", "agg")]
         if decode_capable_nodes:
-            throughputs = [n.avg_gen_throughput for n in decode_capable_nodes if n.avg_gen_throughput]
+            throughputs = [n.avg_gen_throughput for n in decode_capable_nodes if n.avg_gen_throughput is not None]
             if throughputs:
                 self.avg_decode_gen_throughput = sum(throughputs) / len(throughputs)

-            max_throughputs = [n.max_gen_throughput for n in decode_capable_nodes if n.max_gen_throughput]
+            max_throughputs = [n.max_gen_throughput for n in decode_capable_nodes if n.max_gen_throughput is not None]
             if max_throughputs:
                 self.max_decode_gen_throughput = max(max_throughputs)

         # Memory aggregation
-        kv_caches = [n.kv_cache_gb for n in self.nodes if n.kv_cache_gb]
+        kv_caches = [n.kv_cache_gb for n in self.nodes if n.kv_cache_gb is not None]
         if kv_caches:
             self.total_kv_cache_gb = sum(kv_caches)
analysis/srtlog/rollup_harness.py (2)

31-47: Mixed-type sort keys can cause TypeError.

The sort key returns int for numeric directory names and str for non-numeric ones. Comparing int to str raises TypeError in Python 3.

Proposed fix
     # Sort by job ID (directory name) if numeric
-    job_dirs.sort(key=lambda p: (int(p.name) if p.name.isdigit() else p.name))
+    job_dirs.sort(key=lambda p: (0, int(p.name)) if p.name.isdigit() else (1, p.name))
     return job_dirs

The truthy `or` operator drops valid zero throughput values.

data.get("output_throughput", 0) or data.get("output_tps", 0) treats 0 as falsy, so if output_throughput is legitimately 0, it incorrectly falls back to output_tps.

Proposed fix
         result = RollupResult(
             concurrency=data.get("max_concurrency", 0),
-            output_tps=data.get("output_throughput", 0) or data.get("output_tps", 0),
+            output_tps=(
+                data.get("output_throughput")
+                if data.get("output_throughput") is not None
+                else data.get("output_tps", 0)
+            ),
src/srtctl/cli/mixins/rollup_stage.py (4)

49-52: The endpoints property always returns None.

The ellipsis (...) body causes an implicit None return, violating the list[Endpoint] type annotation. Since the mixin expects the concrete class to provide this, raise NotImplementedError.

Proposed fix
    @property
     def endpoints(self) -> list["Endpoint"]:
         """Endpoint allocation topology."""
-        ...
+        raise NotImplementedError("Subclass must provide endpoints property")

Use getattr with a callable check to satisfy the type checker.

The hasattr checks are runtime-safe but the type checker can't verify that find_aiperf_results and parse_result_directory exist on BenchmarkParserProtocol.

Proposed fix using getattr
             if parser is not None:
                 # For mooncake-router, look for AIPerf results
-                if hasattr(parser, "find_aiperf_results"):
-                    aiperf_files = parser.find_aiperf_results(self.runtime.log_dir)
+                find_aiperf = getattr(parser, "find_aiperf_results", None)
+                if find_aiperf is not None:
+                    aiperf_files = find_aiperf(self.runtime.log_dir)
                     for aiperf_path in aiperf_files:
                         result = parser.parse_result_json(aiperf_path)
                         if result.get("output_tps") is not None:
                             results.append(result)
                             logger.debug("Loaded AIPerf result: %s", aiperf_path)

                 # For sa-bench style, look for result directories
-                if hasattr(parser, "parse_result_directory"):
+                parse_result_dir = getattr(parser, "parse_result_directory", None)
+                if parse_result_dir is not None:
                     for entry in self.runtime.log_dir.iterdir():
                         if not entry.is_dir():
                             continue
                         if "_isl_" in entry.name and "_osl_" in entry.name:
                             logger.debug("Found benchmark results directory: %s", entry.name)
-                            dir_results = parser.parse_result_directory(entry)
+                            dir_results = parse_result_dir(entry)
                             results.extend(dir_results)

475-480: Add fallback for sglang-v2 when backend is sglang.

The docstring at line 185 mentions fallback through parser versions, but the mapping doesn't include sglang-v2.

Proposed fix
         parser_orders = {
-            "sglang": ["sglang"],
+            "sglang": ["sglang", "sglang-v2"],
+            "sglang-v2": ["sglang-v2", "sglang"],
             "trtllm": ["trtllm"],
         }

559-563: Preserve output_tps when output_throughput is missing.

Unlike rollup_harness.py, this code doesn't fall back to the output_tps key. AIPerf results use output_tps directly, causing valid throughput data to be lost.

Proposed fix
             result = RollupResult(
                 concurrency=data.get("max_concurrency", 0),
-                output_tps=data.get("output_throughput", 0),
+                output_tps=(
+                    data.get("output_throughput")
+                    if data.get("output_throughput") is not None
+                    else data.get("output_tps", 0)
+                ),
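A quick demonstration of why the explicit None check matters; the result dicts are made up to mirror the output_throughput/output_tps key split:

```python
# `or` cannot distinguish a measured 0 from a missing key, so a stalled run
# (genuine zero throughput) is silently replaced by the fallback value.

def tps_with_or(data):
    return data.get("output_throughput", 0) or data.get("output_tps", 0)

def tps_with_none_check(data):
    v = data.get("output_throughput")
    return v if v is not None else data.get("output_tps", 0)

stalled = {"output_throughput": 0, "output_tps": 512.0}  # measured zero
print(tps_with_or(stalled))          # 512.0 -- the real 0 is overwritten
print(tps_with_none_check(stalled))  # 0 -- zero throughput preserved

aiperf_only = {"output_tps": 512.0}  # key genuinely missing
print(tps_with_none_check(aiperf_only))  # 512.0 -- fallback still works
```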
🧹 Nitpick comments (9)
analysis/srtlog/parsers/benchmark/mooncake_router.py (2)

33-79: Regex-based parsing is a fallback; prefer structured JSON.

The parse method uses regex patterns which can be fragile. The parse_result_json method provides more robust parsing from AIPerf's structured output. Consider documenting that this method is intended as a fallback when JSON results aren't available.


92-94: Specify encoding when opening files.

Opening files without explicit encoding can cause issues on systems with different default encodings.

Proposed fix
-            with open(json_path) as f:
+            with open(json_path, encoding="utf-8") as f:
                 data = json.load(f)
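The same explicit-encoding fix applies to every open() call flagged below; a minimal round-trip showing the pattern:

```python
# Without an encoding argument, open() falls back to
# locale.getpreferredencoding(), which varies across systems. Passing
# encoding="utf-8" on both write and read makes non-ASCII content
# round-trip deterministically.
import json
import os
import tempfile

payload = {"note": "温度"}  # non-ASCII value

fd, path = tempfile.mkstemp(suffix=".json")
os.close(fd)
with open(path, "w", encoding="utf-8") as f:
    json.dump(payload, f, ensure_ascii=False)

with open(path, encoding="utf-8") as f:  # explicit on read, too
    data = json.load(f)

print(data == payload)  # True
os.remove(path)
```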
analysis/srtlog/rollup_harness.py (3)

63-69: Specify encoding when opening YAML files.

Proposed fix
     try:
         import yaml
-        with open(config_path) as f:
+        with open(config_path, encoding="utf-8") as f:
             return yaml.safe_load(f)

291-299: Specify encoding when opening YAML config files.

Proposed fix
         if prefill_yaml.exists():
-            with open(prefill_yaml) as f:
+            with open(prefill_yaml, encoding="utf-8") as f:
                 env_config.prefill_engine_config = yaml.safe_load(f)
         if decode_yaml.exists():
-            with open(decode_yaml) as f:
+            with open(decode_yaml, encoding="utf-8") as f:
                 env_config.decode_engine_config = yaml.safe_load(f)

381-389: Specify encoding when writing JSON output.

Proposed fix
     rollup_path.parent.mkdir(parents=True, exist_ok=True)
-    with open(rollup_path, "w") as f:
+    with open(rollup_path, "w", encoding="utf-8") as f:
         json.dump(asdict(summary), f, indent=2, default=str)
analysis/srtlog/parsers/benchmark/sa_bench.py (1)

101-103: Specify encoding when opening JSON files.

Proposed fix
-            with open(json_path) as f:
+            with open(json_path, encoding="utf-8") as f:
                 data = json.load(f)
src/srtctl/cli/mixins/rollup_stage.py (2)

508-512: Use ternary operator for total_gpus calculation.

Per the pipeline warning, the if-else block can be simplified to a ternary expression.

Proposed fix
-        if is_disaggregated:
-            total_gpus = r.prefill_gpus + r.decode_gpus
-        else:
-            total_gpus = (r.agg_nodes or 1) * r.gpus_per_node
+        total_gpus = r.prefill_gpus + r.decode_gpus if is_disaggregated else (r.agg_nodes or 1) * r.gpus_per_node

597-612: Specify encoding when writing JSON output.

Proposed fix
         # Write with nice formatting
-        with open(path, "w") as f:
+        with open(path, "w", encoding="utf-8") as f:
             json.dump(data, f, indent=2, default=str)
analysis/srtlog/parsers/__init__.py (1)

244-247: Star imports are necessary for auto-registration but consider documenting.

The # noqa comments suppress linting, but a brief comment explaining why star imports are used here would aid maintainability.

Suggested documentation
-# Import parsers to trigger registration
+# Import parsers to trigger registration via decorators.
+# Star imports are intentional: each submodule uses `@register_*_parser` decorators
+# that populate _benchmark_parsers and _node_parsers when the module loads.
 from analysis.srtlog.parsers.benchmark import *  # noqa: E402, F401, F403
 from analysis.srtlog.parsers.nodes import *  # noqa: E402, F401, F403

Comment on lines +126 to +129
# Also extract per-user throughput if available
tps_per_user = self._get_metric(data, "output_token_throughput_per_user", "avg")
if tps_per_user:
result["output_tps_per_user"] = tps_per_user

⚠️ Potential issue | 🟡 Minor

Truthy check may exclude valid zero throughput.

if tps_per_user: will skip adding the field when the value is 0, which may be a valid measurement.

Proposed fix
             tps_per_user = self._get_metric(data, "output_token_throughput_per_user", "avg")
-            if tps_per_user:
+            if tps_per_user is not None:
                 result["output_tps_per_user"] = tps_per_user
🤖 Prompt for AI Agents
In `@analysis/srtlog/parsers/benchmark/mooncake_router.py` around lines 126 - 129,
The truthy check on tps_per_user will drop legitimate zero values; update the
conditional in the block that calls self._get_metric (variable tps_per_user) to
check for None explicitly (e.g., "is not None") before assigning
result["output_tps_per_user"] so that 0 is preserved while still filtering
missing values returned as None.

@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 1

🤖 Fix all issues with AI agents
In `@analysis/srtlog/rollup_harness.py`:
- Around line 276-303: The code mutates an EnvironmentConfig instance
(env_config) which will break once EnvironmentConfig is converted to a frozen
dataclass; instead, build a plain dict of constructor kwargs, populate keys from
config.get("backend", {}), load YAMLs from prefill_yaml and decode_yaml (using
logs_dir) into local variables like prefill_engine_config and
decode_engine_config, then construct a single immutable EnvironmentConfig(...)
with those kwargs and assign it to environment_config; reference
EnvironmentConfig, env_config (replace), environment_config, prefill_yaml,
decode_yaml and logs_dir when locating where to change the code.
♻️ Duplicate comments (5)
docs/architecture.md (3)

822-826: Add a language tag to this fenced block for MD040 compliance.

Markdownlint flags this block; tagging it as text is sufficient.


893-936: Add a language tag to the ER diagram fenced block.

Markdownlint MD040 is triggered; set the opening fence to ```text.


940-1025: Add a language tag to the "Detailed Entity Relationships" fenced block.

Markdownlint MD040 is triggered; set the opening fence to ```text.

analysis/srtlog/rollup_harness.py (2)

45-46: Avoid mixed-type sort keys in job directory ordering.

int and str keys can’t be compared in Python; mixed numeric/non‑numeric directory names will raise TypeError.

🛠️ Proposed fix
-    job_dirs.sort(key=lambda p: (int(p.name) if p.name.isdigit() else p.name))
+    job_dirs.sort(key=lambda p: (0, int(p.name)) if p.name.isdigit() else (1, p.name))
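A demonstration of the mixed-key failure and the tuple-key fix, with plain strings standing in for Path.name values:

```python
# Python 3 refuses to order int against str, so a key function that
# returns int for numeric names and str for the rest raises as soon as
# both kinds appear. Tagging each key with a type bucket (0 or 1) keeps
# every comparison homogeneous.

names = ["10", "baseline", "2"]

try:
    sorted(names, key=lambda n: int(n) if n.isdigit() else n)
except TypeError as e:
    print("mixed keys:", e)

# Numeric names sort first (numerically), then non-numeric (lexically):
ordered = sorted(names, key=lambda n: (0, int(n)) if n.isdigit() else (1, n))
print(ordered)  # ['2', '10', 'baseline']
```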

356-358: Preserve explicit zero throughput values.

The `or` operator treats a valid 0 as missing and overrides it with the fallback value.

🛠️ Proposed fix
-            output_tps=data.get("output_throughput", 0) or data.get("output_tps", 0),
+            output_tps=(
+                data.get("output_throughput")
+                if data.get("output_throughput") is not None
+                else data.get("output_tps", 0)
+            ),

Comment on lines +276 to +303
# Parse environment config
environment_config = None
try:
import yaml
env_config = EnvironmentConfig()

backend_section = config.get("backend", {})
if "prefill_environment" in backend_section:
env_config.prefill_environment = backend_section["prefill_environment"]
if "decode_environment" in backend_section:
env_config.decode_environment = backend_section["decode_environment"]
if "aggregated_environment" in backend_section:
env_config.aggregated_environment = backend_section["aggregated_environment"]

# Load TRTLLM config files
prefill_yaml = logs_dir / "trtllm_config_prefill.yaml"
decode_yaml = logs_dir / "trtllm_config_decode.yaml"

if prefill_yaml.exists():
with open(prefill_yaml) as f:
env_config.prefill_engine_config = yaml.safe_load(f)
if decode_yaml.exists():
with open(decode_yaml) as f:
env_config.decode_engine_config = yaml.safe_load(f)

if any([env_config.prefill_environment, env_config.decode_environment,
env_config.prefill_engine_config, env_config.decode_engine_config]):
environment_config = env_config

⚠️ Potential issue | 🟠 Major

🧩 Analysis chain

🏁 Script executed:

#!/bin/bash
# Find EnvironmentConfig definition
rg -n "class EnvironmentConfig" --type py -C 5

Repository: ishandhanani/srt-slurm

Length of output: 886


🏁 Script executed:

#!/bin/bash
# Look at the actual code in rollup_harness.py to see the context
fd "rollup_harness.py" --type f | head -1 | xargs wc -l

Repository: ishandhanani/srt-slurm

Length of output: 103


🏁 Script executed:

#!/bin/bash
# Get full EnvironmentConfig definition
rg -n "class EnvironmentConfig" --type py -A 30 src/srtctl/cli/mixins/rollup/models.py

Repository: ishandhanani/srt-slurm

Length of output: 1441


🏁 Script executed:

#!/bin/bash
# Check the actual code in rollup_harness.py at lines 276-303
sed -n '276,303p' analysis/srtlog/rollup_harness.py

Repository: ishandhanani/srt-slurm

Length of output: 1368


EnvironmentConfig must follow frozen dataclass guideline and use factory pattern for construction.

Per coding guidelines, EnvironmentConfig should use @dataclass(frozen=True) for immutability. The current mutation pattern will break once frozen is applied. Build the configuration via kwargs and construct once:

Fix
-        env_config = EnvironmentConfig()
+        env_kwargs: dict[str, Any] = {}
         
         backend_section = config.get("backend", {})
         if "prefill_environment" in backend_section:
-            env_config.prefill_environment = backend_section["prefill_environment"]
+            env_kwargs["prefill_environment"] = backend_section["prefill_environment"]
         if "decode_environment" in backend_section:
-            env_config.decode_environment = backend_section["decode_environment"]
+            env_kwargs["decode_environment"] = backend_section["decode_environment"]
         if "aggregated_environment" in backend_section:
-            env_config.aggregated_environment = backend_section["aggregated_environment"]
+            env_kwargs["aggregated_environment"] = backend_section["aggregated_environment"]
             
         # Load TRTLLM config files
         prefill_yaml = logs_dir / "trtllm_config_prefill.yaml"
         decode_yaml = logs_dir / "trtllm_config_decode.yaml"
         
         if prefill_yaml.exists():
             with open(prefill_yaml) as f:
-                env_config.prefill_engine_config = yaml.safe_load(f)
+                env_kwargs["prefill_engine_config"] = yaml.safe_load(f)
         if decode_yaml.exists():
             with open(decode_yaml) as f:
-                env_config.decode_engine_config = yaml.safe_load(f)
+                env_kwargs["decode_engine_config"] = yaml.safe_load(f)
                 
-        if any([env_config.prefill_environment, env_config.decode_environment, 
-                env_config.prefill_engine_config, env_config.decode_engine_config]):
-            environment_config = env_config
+        if env_kwargs:
+            environment_config = EnvironmentConfig(**env_kwargs)
🤖 Prompt for AI Agents
In `@analysis/srtlog/rollup_harness.py` around lines 276 - 303, The code mutates
an EnvironmentConfig instance (env_config) which will break once
EnvironmentConfig is converted to a frozen dataclass; instead, build a plain
dict of constructor kwargs, populate keys from config.get("backend", {}), load
YAMLs from prefill_yaml and decode_yaml (using logs_dir) into local variables
like prefill_engine_config and decode_engine_config, then construct a single
immutable EnvironmentConfig(...) with those kwargs and assign it to
environment_config; reference EnvironmentConfig, env_config (replace),
environment_config, prefill_yaml, decode_yaml and logs_dir when locating where
to change the code.
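A runnable sketch of the build-kwargs-then-construct-once pattern the review proposes; the EnvironmentConfig here is a stand-in with assumed field names:

```python
# Accumulate constructor arguments in a plain dict, then build a single
# frozen instance. Mutation after construction raises FrozenInstanceError
# (a subclass of AttributeError), so stale in-place edits fail loudly.
from __future__ import annotations

from dataclasses import dataclass
from typing import Any

@dataclass(frozen=True)
class EnvironmentConfig:
    prefill_environment: dict[str, Any] | None = None
    decode_environment: dict[str, Any] | None = None
    prefill_engine_config: dict[str, Any] | None = None
    decode_engine_config: dict[str, Any] | None = None

# Hypothetical parsed config section:
backend_section = {"prefill_environment": {"CUDA_VISIBLE_DEVICES": "0"}}

env_kwargs: dict[str, Any] = {}
for key in ("prefill_environment", "decode_environment"):
    if key in backend_section:
        env_kwargs[key] = backend_section[key]

environment_config = EnvironmentConfig(**env_kwargs) if env_kwargs else None
print(environment_config)

try:
    environment_config.decode_environment = {}  # must fail on a frozen class
except AttributeError as e:
    print("frozen:", e)
```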

@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 2

🤖 Fix all issues with AI agents
In `@analysis/scripts/upload_rollup.json`:
- Around line 1-2: The file is misnamed with a .json extension while containing
a Python script; rename upload_rollup.json to upload_rollup.py, preserve the
shebang (#!/usr/bin/env python3) and module docstring, and update any references
(scripts, CI configs, imports, README, or tooling) that point to
upload_rollup.json to the new filename upload_rollup.py so execution, linting,
and IDE tooling work correctly.
- Around line 170-182: The code is duplicating launch command data:
extract_launch_commands(data) and the loop populates top_level with keys like
"launch_commands.{worker_type}" (see extract_launch_commands and top_level
usage), but later output.update(launch_commands) re-inserts the raw lists under
plain worker_type keys; remove the output.update(launch_commands) call so only
the intended "launch_commands.{worker_type}" string entries are used (or if you
intended a nested dict, replace that line with output["launch_commands"] =
launch_commands); update tests/consumers that expect only the string keys
accordingly.
🧹 Nitpick comments (3)
analysis/scripts/upload_rollup.json (3)

78-81: Consider logging invalid concurrency values instead of silently ignoring them.

Silently swallowing ValueError can hide data quality issues. Consider logging a warning when a concurrency part cannot be parsed.

♻️ Suggested improvement
+import logging
+
+logger = logging.getLogger(__name__)
+
 def parse_concurrencies(concurrencies_str: str) -> list[int]:
     ...
             try:
                 result.append(int(part))
             except ValueError:
-                pass
+                logger.warning(f"Invalid concurrency value ignored: '{part}'")
     return result

218-234: File read twice unnecessarily.

The JSON file is read at lines 219-220 to extract metadata, then read again at line 234 for upload. Consider reading once and reusing the content.

♻️ Proposed optimization
     try:
-        with open(json_path) as f:
-            data = json.load(f)
+        json_content = json_path.read_text()
+        data = json.loads(json_content)
     except Exception as e:
         return False, f"Failed to read JSON: {e}"
 
     # Extract metadata
     benchmark_type = data.get("benchmark_type", "unknown")
     frontend_type = data.get("frontend_type", "unknown")
     backend_type = data.get("backend_type", "unknown")
     job_name = data.get("job_name", "unknown")
 
     # Use provided session_id or fall back to job_name
     effective_session_id = session_id if session_id else job_name
 
-    # Read the file content for upload
-    json_content = json_path.read_text()
-
     try:
         response = requests.post(

367-372: Consider logging cleanup failures instead of silently ignoring.

The blanket except Exception: pass hides all errors during file cleanup. While cleanup failures may not be critical, logging them could help diagnose permission issues or other problems.

♻️ Suggested improvement
     if not args.keep_files and not args.dry_run:
         for fp in generated_files:
             try:
                 fp.unlink()
-            except Exception:
-                pass
+            except Exception as e:
+                print(f"Warning: Failed to clean up {fp}: {e}", file=sys.stderr)
         print(f"Cleaned up {len(generated_files)} generated files")

@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 1

🤖 Fix all issues with AI agents
In `@analysis/scripts/upload_rollup.py`:
- Around line 85-110: The docstring for extract_launch_commands is inaccurate:
it claims keys like "worker-0.launch_command.raw_command" but the function
actually returns a defaultdict (node_type_to_launch_command) mapping worker_type
strings (e.g., "prefill", "decode") to lists of prefixed raw command strings;
update the function docstring to describe that it reads
data["nodes_summary"]["nodes"], prefixes each raw_command with
"#{worker_type}:{node_name}\\n", and returns a dict-like mapping of worker_type
-> list[str] (or defaultdict(list)) containing those command strings, and
mention default values used for missing node_name, worker_type, or raw_command.
♻️ Duplicate comments (1)
analysis/scripts/upload_rollup.py (1)

177-187: Bug: launch commands added twice in conflicting formats.

At line 173, launch commands are added as joined strings with keys like launch_commands.prefill. Then at line 182, output.update(launch_commands) adds the raw lists with plain worker_type keys (e.g., prefill: ["cmd1", "cmd2"]).

This results in duplicate data in conflicting formats. Line 182 should be removed.

🐛 Proposed fix
     for concurrency in concurrencies:
         output = dict(top_level)
         output["concurrency"] = concurrency
 
-        # Add launch commands
-        output.update(launch_commands)
-
         # Add result for this concurrency with "result_" prefix
         result = result_by_concurrency.get(concurrency, {})
🧹 Nitpick comments (2)
analysis/scripts/upload_rollup.py (2)

75-82: Silent suppression of parse errors may mask malformed input.

When parsing fails (e.g., "32x64xabc"), the invalid part is silently dropped. Consider logging a warning to aid debugging.

♻️ Suggested improvement
+import logging
+
+logger = logging.getLogger(__name__)
+
 def parse_concurrencies(concurrencies_str: str) -> list[int]:
     ...
     for part in parts:
         part = part.strip()
         if part:
             try:
                 result.append(int(part))
             except ValueError:
-                pass
+                logger.warning(f"Skipping non-integer concurrency value: {part!r}")
     return result

218-234: File is read twice; consider consolidating.

The file is opened at line 219 for JSON parsing, then read again at line 234 for upload content. This can be combined.

♻️ Suggested improvement
-    try:
-        with open(json_path) as f:
-            data = json.load(f)
-    except Exception as e:
-        return False, f"Failed to read JSON: {e}"
-
-    # Extract metadata
-    ...
-
-    # Read the file content for upload
-    json_content = json_path.read_text()
+    try:
+        json_content = json_path.read_text()
+        data = json.loads(json_content)
+    except Exception as e:
+        return False, f"Failed to read JSON: {e}"
+
+    # Extract metadata
+    ...

@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 2

🤖 Fix all issues with AI agents
In `@analysis/scripts/upload_rollup.py`:
- Around line 131-138: The code builds a contextualized string in
raw_command_str (including worker_type, node_name, worker_env_config,
engine_config and raw_command) but mistakenly appends raw_command to
node_type_to_launch_command[worker_type]; change the append to add
raw_command_str instead so the list stores the formatted command with its
metadata (update the append call on node_type_to_launch_command in the block
that constructs raw_command_str).
- Around line 191-205: The sbatch_script block uses raw_cmd which may never be
set if benchmark_cmd is falsy; initialize or derive raw_cmd before it’s
referenced (e.g., set raw_cmd = benchmark_cmd.get("raw_command", "") right after
obtaining benchmark_cmd) and then use that safe value in the top_level.update
call (the code that writes "benchmark_command": f"#sbatch_script\n\n{raw_cmd}").
Ensure raw_cmd is defined (or fallback to an empty string) so top_level.update
and the sbatch_script handling cannot raise a NameError.
♻️ Duplicate comments (3)
analysis/srtlog/rollup_harness.py (3)

31-47: Fix mixed-type sort keys to avoid TypeError.

When both numeric and non-numeric directory names exist, the sort key mixes int and str types, causing a TypeError in Python 3.

🛠️ Proposed fix
-    job_dirs.sort(key=lambda p: (int(p.name) if p.name.isdigit() else p.name))
+    job_dirs.sort(key=lambda p: (0, int(p.name)) if p.name.isdigit() else (1, p.name))

276-303: EnvironmentConfig mutation will break once frozen dataclass guideline is applied.

Per coding guidelines, configuration objects should use @dataclass(frozen=True). The current mutation pattern will fail. Build kwargs dict and construct once.

🛠️ Proposed fix
-        env_config = EnvironmentConfig()
+        env_kwargs: dict[str, Any] = {}
         
         backend_section = config.get("backend", {})
         if "prefill_environment" in backend_section:
-            env_config.prefill_environment = backend_section["prefill_environment"]
+            env_kwargs["prefill_environment"] = backend_section["prefill_environment"]
         if "decode_environment" in backend_section:
-            env_config.decode_environment = backend_section["decode_environment"]
+            env_kwargs["decode_environment"] = backend_section["decode_environment"]
         if "aggregated_environment" in backend_section:
-            env_config.aggregated_environment = backend_section["aggregated_environment"]
+            env_kwargs["aggregated_environment"] = backend_section["aggregated_environment"]
             
         # Load TRTLLM config files
         prefill_yaml = logs_dir / "trtllm_config_prefill.yaml"
         decode_yaml = logs_dir / "trtllm_config_decode.yaml"
         
         if prefill_yaml.exists():
             with open(prefill_yaml) as f:
-                env_config.prefill_engine_config = yaml.safe_load(f)
+                env_kwargs["prefill_engine_config"] = yaml.safe_load(f)
         if decode_yaml.exists():
             with open(decode_yaml) as f:
-                env_config.decode_engine_config = yaml.safe_load(f)
+                env_kwargs["decode_engine_config"] = yaml.safe_load(f)
                 
-        if any([env_config.prefill_environment, env_config.decode_environment, 
-                env_config.prefill_engine_config, env_config.decode_engine_config]):
-            environment_config = env_config
+        if env_kwargs:
+            environment_config = EnvironmentConfig(**env_kwargs)

Based on learnings, frozen dataclasses should be used for configuration objects.


354-358: Avoid treating 0 throughput as "missing."

The `or` operator will override valid 0 values, potentially skewing rollup data.

🛠️ Proposed fix
-            output_tps=data.get("output_throughput", 0) or data.get("output_tps", 0),
+            output_tps=(
+                data.get("output_throughput")
+                if data.get("output_throughput") is not None
+                else data.get("output_tps", 0)
+            ),
🧹 Nitpick comments (1)
analysis/scripts/upload_rollup.py (1)

484-490: Consider logging cleanup failures instead of silently ignoring them.

The blanket except Exception: pass hides potential filesystem issues. While not critical, logging at debug level would aid troubleshooting.

♻️ Proposed improvement
     if not args.keep_files and not args.dry_run:
         for fp in generated_files:
             try:
                 fp.unlink()
-            except Exception:
-                pass
+            except Exception as e:
+                print(f"  Warning: Could not delete {fp}: {e}", file=sys.stderr)
         print(f"\nCleaned up {len(generated_files)} generated files")

Comment on lines +131 to +138
raw_command_str = f"#{worker_type}:{node_name}"
if worker_env_config:
raw_command_str += f"\n#env_vars:{worker_env_config}"
if engine_config:
raw_command_str += f"\n#engine_config:{engine_config}"
raw_command_str += f"\n{raw_command}"

node_type_to_launch_command[worker_type].append(raw_command)

⚠️ Potential issue | 🔴 Critical

Bug: raw_command_str is built but raw_command is appended instead.

The function constructs raw_command_str with environment and engine config context (lines 131-136), but then appends the original raw_command to the list, discarding all the formatted context.

🐛 Proposed fix
         raw_command_str = f"#{worker_type}:{node_name}"
         if worker_env_config:
             raw_command_str += f"\n#env_vars:{worker_env_config}"
         if engine_config:
             raw_command_str += f"\n#engine_config:{engine_config}"
         raw_command_str += f"\n{raw_command}"
 
-        node_type_to_launch_command[worker_type].append(raw_command)
+        node_type_to_launch_command[worker_type].append(raw_command_str)
🤖 Prompt for AI Agents
In `@analysis/scripts/upload_rollup.py` around lines 131 - 138, The code builds a
contextualized string in raw_command_str (including worker_type, node_name,
worker_env_config, engine_config and raw_command) but mistakenly appends
raw_command to node_type_to_launch_command[worker_type]; change the append to
add raw_command_str instead so the list stores the formatted command with its
metadata (update the append call on node_type_to_launch_command in the block
that constructs raw_command_str).

@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 1

🤖 Fix all issues with AI agents
In `@analysis/scripts/upload_rollup.py`:
- Around line 411-414: The metadata assignment currently hardcodes
df["experiment_id"] = 'STUDY' which is a placeholder; replace it so
experiment_id is set from the proper source (e.g., use args.study_id or the
correct experiment identifier variable) so that df["experiment_id"] and
df["study_id"] are populated with meaningful values; update the block that sets
df["experiment_id"], df["study_id"], and df["user_login"] in upload_rollup.py
accordingly (use args.study_id if experiment_id should match study_id, or the
correct experiment id variable if different).
♻️ Duplicate comments (2)
analysis/scripts/upload_rollup.py (2)

131-138: Bug: raw_command_str is built but raw_command is appended instead.

The function constructs raw_command_str with worker type, node name, environment config, and engine config context (lines 131-136), but then appends the original raw_command to the list (line 138), discarding all the formatted context.

🐛 Proposed fix
         raw_command_str = f"#{worker_type}:{node_name}"
         if worker_env_config:
             raw_command_str += f"\n#env_vars:{worker_env_config}"
         if engine_config:
             raw_command_str += f"\n#engine_config:{engine_config}"
         raw_command_str += f"\n{raw_command}"

-        node_type_to_launch_command[worker_type].append(raw_command)
+        node_type_to_launch_command[worker_type].append(raw_command_str)

191-205: Potential NameError: raw_cmd may be undefined when sbatch_script exists.

If benchmark_cmd is falsy (empty dict, None, or missing), raw_cmd is never defined. However, line 203 uses raw_cmd unconditionally when sbatch_script exists, causing a NameError.

🐛 Proposed fix
     # Extract benchmark_command raw_command only
     benchmark_cmd = data.get("benchmark_command", {})
+    raw_cmd = None
     if benchmark_cmd:
         raw_cmd = benchmark_cmd.get("raw_command")
-        
+
         if raw_cmd:
             top_level["benchmark_command"] = raw_cmd
-    
+
     sbatch_script = data.get("sbatch_script")
-    if sbatch_script:
+    if sbatch_script and raw_cmd:
         top_level.update(
             {
                 "benchmark_command": f"#sbatch_script\n\n{raw_cmd}",
             }
         )
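A minimal reproduction of the NameError risk and the pre-initialization fix, using made-up input dicts:

```python
# raw_cmd is only assigned inside the `if benchmark_cmd:` branch; if the
# sbatch branch referenced it without the up-front `raw_cmd = None`, a
# rollup with an sbatch_script but no benchmark_command would crash.

def build_top_level(data):
    top_level = {}
    raw_cmd = None  # defined before any conditional branch uses it
    benchmark_cmd = data.get("benchmark_command", {})
    if benchmark_cmd:
        raw_cmd = benchmark_cmd.get("raw_command")
        if raw_cmd:
            top_level["benchmark_command"] = raw_cmd
    if data.get("sbatch_script") and raw_cmd:
        top_level["benchmark_command"] = f"#sbatch_script\n\n{raw_cmd}"
    return top_level

# Would raise NameError without the initialization; now returns {}:
print(build_top_level({"sbatch_script": "#!/bin/bash"}))  # {}
print(build_top_level({
    "benchmark_command": {"raw_command": "benchmark-cli --run"},  # hypothetical command
    "sbatch_script": "#!/bin/bash",
}))
```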
🧹 Nitpick comments (2)
analysis/scripts/upload_rollup.py (2)

302-305: Consider narrowing the exception handler.

A blanket except Exception could mask important errors like permission issues or encoding problems. Consider catching specific exceptions.

♻️ Suggested improvement
     if sbatch_path.exists():
         try:
             return sbatch_path.read_text()
-        except Exception:
+        except (OSError, UnicodeDecodeError) as e:
+            # Log or handle specific failure cases if needed
             return None

448-451: Filename sanitization only handles / character.

The sanitization replaces / with _, but other problematic characters (spaces, \, :, *, ?, ", <, >, |) could cause issues on some filesystems.

♻️ Suggested improvement
+import re
+
+def _sanitize_filename_part(s: str) -> str:
+    """Sanitize a string for use in filenames."""
+    return re.sub(r'[\\/:*?"<>|\s]+', '_', str(s))
+
         # Create group Parquet filename
-        safe_backend = str(backend).replace("/", "_")
-        safe_frontend = str(frontend).replace("/", "_")
-        safe_benchmarker = str(benchmarker).replace("/", "_")
+        safe_backend = _sanitize_filename_part(backend)
+        safe_frontend = _sanitize_filename_part(frontend)
+        safe_benchmarker = _sanitize_filename_part(benchmarker)

@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 4

🤖 Fix all issues with AI agents
In `@analysis/scripts/upload_rollup.py`:
- Around line 2-18: The module docstring and CLI usage claim the script writes
Parquet but the implementation actually writes .json files and posts gzipped
JSON; update the module-level docstring and the CLI/help text to accurately
describe the current behavior (or, if you want Parquet, change the
implementation to serialize to Parquet instead). Specifically, edit the module
docstring and the usage/help string to state “gzipped JSON” (or add a --mode
flag to choose between parquet/json), and ensure the upload routine (the
function that writes .json and posts gzipped JSON) and any filenames/extensions
match the documented mode so docs and implementation are consistent.
- Around line 174-194: The type annotation for the defaultdict named groups is
incorrect: it declares a 3-tuple key (defaultdict[tuple[str, str, str], ...])
but the code builds a 4-tuple key as group = (data['benchmark_type'],
data['frontend_type'], data['backend_type'], mode); update the annotation at the
groups declaration to use a 4-tuple (tuple[str, str, str, str]) so the type
matches the actual key shape used by the loop that processes rollup_path and
constructs group.
- Around line 33-57: The CLI/handler uses the name study_id but upload_json
currently accepts and posts session_id, causing a payload name mismatch; fix by
aligning names: change upload_json's parameter from session_id to study_id (and
update its docstring), change the JSON payload key from "session_id" to
"study_id", and update every call site that passes the CLI value (the CLI
variable study_id and any callers of upload_json) to use the renamed parameter
so the sent payload matches the server contract.
- Around line 192-200: The code computes mode using the wrong field and directly
indexes required keys which can raise KeyError; change the mode computation to
use data.get("is_disaggregated") (invert logic accordingly) and add defensive
validation before accessing data['benchmark_type'], data['frontend_type'],
data['backend_type'], and data['job_name'] (used later around
read_sbatch_script/ groups[group].append). Follow the function's existing
error-handling pattern: check for missing keys, log an error identifying the
missing field(s) and the rollup_path or job_name, and skip/continue that rollup
instead of indexing into data; ensure read_sbatch_script(rollup_path) usage
remains unchanged.
🧹 Nitpick comments (1)
analysis/scripts/upload_rollup.py (1)

118-118: Add an explicit return type for main.

Minor type-hinting polish: declare main() -> None for consistency and to satisfy the “type hints everywhere” guideline. As per coding guidelines.

🧩 Suggested tweak
-def main():
+def main() -> None:

Comment on lines +2 to +18
"""Upload rollup.json files to SRT endpoint as Parquet files grouped by backend/frontend/benchmarker.

Process:
1. Find all rollup.json files in the directory
2. Create flattened JSON for each concurrency level
3. Combine all into a single Parquet file
4. Group by (backend_type, frontend_type, benchmark_type)
5. Upload separate Parquet file for each group

Usage:
python upload_rollup.py <directory> <user_login> [--study-id STUDY] [--endpoint URL]

Example:
python upload_rollup.py ./outputs user@example.com
python upload_rollup.py ./outputs user@example.com --study-id my-study
python upload_rollup.py ./outputs user@example.com --endpoint URL
"""
⚠️ Potential issue | 🟡 Minor

Docs mention Parquet but the script uploads gzipped JSON.

The module docstring and CLI description say “Parquet,” while the implementation writes .json files and posts gzipped JSON. Please align the wording (and mention the mode grouping if it is intentional) to avoid confusion.

✏️ Suggested text update
-"""Upload rollup.json files to SRT endpoint as Parquet files grouped by backend/frontend/benchmarker.
+"""Upload rollup.json files to SRT endpoint as gzipped JSON files grouped by backend/frontend/benchmarker/mode.
 
 Process:
 1. Find all rollup.json files in the directory
-2. Create flattened JSON for each concurrency level
-3. Combine all into a single Parquet file
-4. Group by (backend_type, frontend_type, benchmark_type)
-5. Upload separate Parquet file for each group
+2. Group by (benchmark_type, frontend_type, backend_type, mode)
+3. Write a JSON file per group
+4. Upload a gzipped JSON file for each group
 ...
-        description="Upload rollup.json files to SRT endpoint as Parquet (grouped by backend/frontend/benchmarker)"
+        description="Upload rollup.json files to SRT endpoint as gzipped JSON (grouped by backend/frontend/benchmarker/mode)"

Also applies to: 118-121

🤖 Prompt for AI Agents
In `@analysis/scripts/upload_rollup.py` around lines 2 - 18, The module docstring
and CLI usage claim the script writes Parquet but the implementation actually
writes .json files and posts gzipped JSON; update the module-level docstring and
the CLI/help text to accurately describe the current behavior (or, if you want
Parquet, change the implementation to serialize to Parquet instead).
Specifically, edit the module docstring and the usage/help string to state
“gzipped JSON” (or add a --mode flag to choose between parquet/json), and ensure
the upload routine (the function that writes .json and posts gzipped JSON) and
any filenames/extensions match the documented mode so docs and implementation
are consistent.
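To make the “gzipped JSON, not Parquet” behavior concrete, here is a minimal sketch of the round trip the corrected docstring would describe. The helper name `gzip_json_payload` is hypothetical; the real script's upload routine may structure this differently.

```python
import gzip
import json

def gzip_json_payload(data: dict) -> bytes:
    """Serialize a dict to JSON and gzip-compress it for upload.

    Hypothetical helper illustrating the gzipped-JSON behavior the
    docstring should document; not the script's actual function.
    """
    raw = json.dumps(data).encode("utf-8")
    return gzip.compress(raw)

payload = gzip_json_payload({"benchmark_type": "sa_bench", "rows": [1, 2, 3]})
# Decompressing recovers the original JSON, confirming a lossless round trip
restored = json.loads(gzip.decompress(payload))
```

Whatever wording the docstring lands on, the file extension and `Content-Encoding` used by the upload routine should match it.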

Comment on lines +33 to +57
def upload_json(
    json_path: Path,
    user_login: str,
    session_id: str,
    endpoint: str,
    backend: str,
    benchmarker: str,
    frontend: str,
    mode: str,
) -> tuple[bool, str]:
    """Upload a gzipped JSON file to the endpoint.

    Args:
        json_path: Path to the JSON file
        user_login: User login/email
        session_id: Session ID for the upload
        endpoint: API endpoint URL
        backend: Backend type
        benchmarker: Benchmark type
        frontend: Frontend type
        mode: Mode (disaggregated or aggregated)

    Returns:
        Tuple of (success, message)
    """
⚠️ Potential issue | 🟠 Major

🧩 Analysis chain

🏁 Script executed:

rg -n "study_id|session_id" -S --type=py | head -100

Repository: ishandhanani/srt-slurm

Length of output: 846


🏁 Script executed:

# Get more context around the upload_json function to see what's actually posted
cat -n analysis/scripts/upload_rollup.py | sed -n '33,100p'

Repository: ishandhanani/srt-slurm

Length of output: 2535


🏁 Script executed:

# Check for CLI parameter definition
fd -e py -x rg -l "study.id|study-id" {}

Repository: ishandhanani/srt-slurm

Length of output: 101


🏁 Script executed:

cat -n analysis/scripts/upload_rollup.py | sed -n '230,250p'

Repository: ishandhanani/srt-slurm

Length of output: 758


🏁 Script executed:

# Search for any API documentation or comments about expected field names
rg -n -B2 -A2 "study_id|session_id" analysis/scripts/upload_rollup.py | head -80

Repository: ishandhanani/srt-slurm

Length of output: 1261


🏁 Script executed:

# Check if there's any server-side code or documentation
fd -e py -e md | xargs rg -l "study_id.*session_id|session_id.*study_id" 2>/dev/null || echo "No files mention both"

Repository: ishandhanani/srt-slurm

Length of output: 87


Confirm naming mismatch between CLI and server payload: study_id vs. session_id.

The CLI captures study_id (line 234), but upload_json() receives it as session_id (line 36) and posts it as "session_id" in the payload (line 70). If the server API expects study_id, uploads will be rejected or misattributed. Verify the server contract and align parameter names consistently.

🐛 Possible alignment (if the API expects `study_id`)
-def upload_json(
-    json_path: Path,
-    user_login: str,
-    session_id: str,
-    endpoint: str,
-    backend: str,
-    benchmarker: str,
-    frontend: str,
-    mode: str,
-) -> tuple[bool, str]:
+def upload_json(
+    json_path: Path,
+    user_login: str,
+    study_id: str,
+    endpoint: str,
+    backend: str,
+    benchmarker: str,
+    frontend: str,
+    mode: str,
+) -> tuple[bool, str]:
@@
-        session_id: Session ID for the upload
+        study_id: Study ID for the upload
@@
-                "session_id": session_id,
+                "study_id": study_id,

Also applies to: 68-74, 240-244

🤖 Prompt for AI Agents
In `@analysis/scripts/upload_rollup.py` around lines 33 - 57, The CLI/handler uses
the name study_id but upload_json currently accepts and posts session_id,
causing a payload name mismatch; fix by aligning names: change upload_json's
parameter from session_id to study_id (and update its docstring), change the
JSON payload key from "session_id" to "study_id", and update every call site
that passes the CLI value (the CLI variable study_id and any callers of
upload_json) to use the renamed parameter so the sent payload matches the server
contract.
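If the server contract is confirmed to expect `study_id`, the aligned naming would flow from the CLI to the payload like this sketch. The helper `build_upload_payload` and the surrounding field names (other than `study_id`) are assumptions drawn from the review diff, not the script's actual code.

```python
import json
from pathlib import Path

def build_upload_payload(
    json_path: Path,
    user_login: str,
    study_id: str,
    backend: str,
    benchmarker: str,
    frontend: str,
    mode: str,
) -> dict:
    """Assemble upload metadata using the CLI's study_id name end to end.

    Hypothetical sketch: verify the server contract before adopting it.
    """
    return {
        "user_login": user_login,
        "study_id": study_id,  # matches the CLI flag, no session_id rename
        "backend": backend,
        "benchmarker": benchmarker,
        "frontend": frontend,
        "mode": mode,
        "filename": json_path.name,
    }

payload = build_upload_payload(
    Path("group.json"), "user@example.com", "my-study",
    "sglang", "sa_bench", "mooncake_router", "disaggregated",
)
```

Keeping one name for the value from argparse through to the POST body removes the ambiguity entirely.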

Comment on lines +174 to +194
groups: defaultdict[tuple[str, str, str], list[dict]] = defaultdict(list)

for rollup_path in sorted(rollup_files):
    print(f"Processing: {rollup_path}")

    try:
        with open(rollup_path) as f:
            data = json.load(f)
    except Exception as e:
        print(f"  ✗ Failed to read: {e}")
        failed_count += 1
        continue

    # Skip if no nodes_summary (job likely failed/cancelled)
    if not data.get("nodes_summary"):
        print("  ⚠ Skipping: no nodes_summary (job may have failed)")
        continue

    mode = "aggregated" if data.get("is_aggregated") else "disaggregated"
    group = (data['benchmark_type'], data['frontend_type'], data['backend_type'], mode)
    # Read sbatch script for this job
⚠️ Potential issue | 🟠 Major

groups key annotation is a 3‑tuple but the key is a 4‑tuple.

Line 174 declares defaultdict[tuple[str, str, str], ...], yet Line 193 constructs (benchmark_type, frontend_type, backend_type, mode). This will fail ty and mislead readers. Update the annotation to a 4‑tuple. As per coding guidelines, keep type annotations accurate for ty.

✅ Fix type annotation
-groups: defaultdict[tuple[str, str, str], list[dict]] = defaultdict(list)
+groups: defaultdict[tuple[str, str, str, str], list[dict]] = defaultdict(list)
🤖 Prompt for AI Agents
In `@analysis/scripts/upload_rollup.py` around lines 174 - 194, The type
annotation for the defaultdict named groups is incorrect: it declares a 3-tuple
key (defaultdict[tuple[str, str, str], ...]) but the code builds a 4-tuple key
as group = (data['benchmark_type'], data['frontend_type'], data['backend_type'],
mode); update the annotation at the groups declaration to use a 4-tuple
(tuple[str, str, str, str]) so the type matches the actual key shape used by the
loop that processes rollup_path and constructs group.
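A minimal, self-contained sketch of the corrected annotation, using made-up sample rollups (field values are illustrative, not from the repository). With the 4-tuple key annotation, a type checker such as ty accepts the key actually built in the loop:

```python
from collections import defaultdict

# Annotation matches the key built below: a 4-tuple that includes mode.
groups: defaultdict[tuple[str, str, str, str], list[dict]] = defaultdict(list)

rollups = [
    {"benchmark_type": "sa_bench", "frontend_type": "mooncake",
     "backend_type": "sglang", "is_disaggregated": True},
    {"benchmark_type": "sa_bench", "frontend_type": "mooncake",
     "backend_type": "sglang", "is_disaggregated": False},
]

for data in rollups:
    mode = "disaggregated" if data.get("is_disaggregated") else "aggregated"
    group = (data["benchmark_type"], data["frontend_type"],
             data["backend_type"], mode)
    groups[group].append(data)
```

The two sample rollups differ only in mode, so they land in two distinct groups, which is exactly why mode belongs in the key and hence in the annotation.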

Comment on lines +192 to +200
mode = "aggregated" if data.get("is_aggregated") else "disaggregated"
group = (data['benchmark_type'], data['frontend_type'], data['backend_type'], mode)
# Read sbatch script for this job
sbatch_script = read_sbatch_script(rollup_path)

# Add sbatch script to each row
if sbatch_script:
    data["sbatch_script"] = sbatch_script
groups[group].append(data)
⚠️ Potential issue | 🟠 Major

🧩 Analysis chain

🏁 Script executed:

# First, find the file and get its context
find . -name "upload_rollup.py" -type f

Repository: ishandhanani/srt-slurm

Length of output: 101


🏁 Script executed:

# Get the full context around the lines in question (lines 180-250)
sed -n '180,250p' analysis/scripts/upload_rollup.py | cat -n

Repository: ishandhanani/srt-slurm

Length of output: 2759


🏁 Script executed:

# Search for rollup schema definitions or sample data
fd -e json -e py | xargs rg -l "rollup|is_aggregated|is_disaggregated|benchmark_type|frontend_type|backend_type" | head -20

Repository: ishandhanani/srt-slurm

Length of output: 690


🏁 Script executed:

# Check the rollup models definition
head -100 src/srtctl/cli/mixins/rollup/models.py

Repository: ishandhanani/srt-slurm

Length of output: 3072


🏁 Script executed:

# Check test data for rollup
grep -A 20 "is_aggregated\|is_disaggregated\|benchmark_type" tests/test_rollup.py | head -50

Repository: ishandhanani/srt-slurm

Length of output: 1830


🏁 Script executed:

# Check the rollup harness
grep -B 5 -A 10 "is_aggregated\|is_disaggregated" src/srtctl/cli/mixins/rollup_stage.py analysis/srtlog/rollup_harness.py

Repository: ishandhanani/srt-slurm

Length of output: 8362


🏁 Script executed:

# Check the schema.py for field definitions
grep -B 2 -A 5 "benchmark_type\|frontend_type\|backend_type\|is_aggregated\|job_name" src/srtctl/core/schema.py

Repository: ishandhanani/srt-slurm

Length of output: 849


🏁 Script executed:

# Check the RollupSummary dataclass to see all expected fields
grep -A 100 "^class RollupSummary" src/srtctl/cli/mixins/rollup/models.py | head -150

Repository: ishandhanani/srt-slurm

Length of output: 2757


🏁 Script executed:

# Verify the field usage in upload_rollup.py more broadly
cat -n analysis/scripts/upload_rollup.py | grep -B 5 -A 5 "is_aggregated\|benchmark_type\|frontend_type\|backend_type\|job_name" | head -60

Repository: ishandhanani/srt-slurm

Length of output: 2180


Fix field name and add defensive validation for missing rollup fields.

Line 192 checks is_aggregated but the schema defines is_disaggregated, causing mode to always be calculated as "disaggregated". Lines 193 and 236 directly index required fields without checks—missing benchmark_type, frontend_type, backend_type, or job_name will raise KeyError and abort processing. Add validation to match the existing error-handling pattern in this function:

🛡️ Defensive checks
-        mode = "aggregated" if data.get("is_aggregated") else "disaggregated"
-        group = (data['benchmark_type'], data['frontend_type'], data['backend_type'], mode)
+        mode = "disaggregated" if data.get("is_disaggregated") else "aggregated"
+        try:
+            benchmark_type = data["benchmark_type"]
+            frontend_type = data["frontend_type"]
+            backend_type = data["backend_type"]
+        except KeyError as exc:
+            print(f"  ✗ Skipping: missing {exc} in rollup.json")
+            failed_count += 1
+            continue
+        group = (benchmark_type, frontend_type, backend_type, mode)
@@
-            study_id = rows[0]["job_name"]
+            study_id = rows[0].get("job_name")
+            if not study_id:
+                print("  ✗ Missing job_name; skipping upload")
+                upload_failed_count += 1
+                continue

Also applies to: 232-236

🤖 Prompt for AI Agents
In `@analysis/scripts/upload_rollup.py` around lines 192 - 200, The code computes
mode using the wrong field and directly indexes required keys which can raise
KeyError; change the mode computation to use data.get("is_disaggregated")
(invert logic accordingly) and add defensive validation before accessing
data['benchmark_type'], data['frontend_type'], data['backend_type'], and
data['job_name'] (used later around read_sbatch_script/ groups[group].append).
Follow the function's existing error-handling pattern: check for missing keys,
log an error identifying the missing field(s) and the rollup_path or job_name,
and skip/continue that rollup instead of indexing into data; ensure
read_sbatch_script(rollup_path) usage remains unchanged.
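The skip-and-log pattern suggested above can be factored into a small validation helper. This is a hypothetical sketch (the helper name and a standalone function are my framing; the actual fix may stay inline in the loop):

```python
REQUIRED_FIELDS = ("benchmark_type", "frontend_type", "backend_type", "job_name")

def missing_rollup_fields(data: dict) -> list[str]:
    """Return required fields absent from a rollup dict.

    Hypothetical helper mirroring the review's suggested defensive check;
    a caller logs the missing fields and continues instead of letting
    data[...] raise KeyError.
    """
    return [field for field in REQUIRED_FIELDS if field not in data]

missing = missing_rollup_fields({"benchmark_type": "sa_bench"})
# Caller: if missing, print the fields with the rollup_path and `continue`
```

Checking all required fields up front also yields a single, complete error message per bad rollup rather than failing on the first missing key.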

@KaunilD KaunilD requested a review from ishandhanani January 24, 2026 17:50
@property
def endpoints(self) -> list[Endpoint]:
    """Endpoint allocation topology."""
    ...
Contributor Author
raise NotImplementedError/NotApplicable
