
Add Rollup Stage for Experiment Data Consolidation #92

Closed
KaunilD wants to merge 14 commits into main from kdhruv/rollup

Conversation

Contributor

KaunilD commented Jan 22, 2026

Summary

This PR introduces a new Rollup Stage to the SweepOrchestrator that consolidates all experiment data (benchmark results, node metrics, environment configuration) into a single rollup.json file after benchmark completion. It also adds a modular parser system for extensible parsing of different benchmark types and backend log formats.

Changes

New Files (15 files, ~4,700 lines added):

| File | Description |
| --- | --- |
| src/srtctl/cli/mixins/rollup_stage.py | Main rollup logic with dataclasses (RollupSummary, NodeRollup, NodesSummary, EnvironmentConfig) |
| analysis/srtlog/parsers/__init__.py | Parser registry, protocols (BenchmarkParserProtocol, NodeParserProtocol), and dataclasses |
| analysis/srtlog/parsers/benchmark/sa_bench.py | SA-Bench result parser with parse_launch_command |
| analysis/srtlog/parsers/benchmark/mooncake_router.py | Mooncake router result parser |
| analysis/srtlog/parsers/nodes/sglang.py | SGLang log parser (DP/TP/EP tag format) |
| analysis/srtlog/parsers/nodes/sglang_v2.py | SGLang v2 log parser (timestamp-only format) |
| analysis/srtlog/parsers/nodes/trtllm.py | TRTLLM log parser (iteration logs, memory metrics) |
| tests/test_rollup.py | Comprehensive tests for rollup functionality |
| tests/test_configs.py | Output directory structure tests |

Modified Files:

| File | Change |
| --- | --- |
| src/srtctl/cli/do_sweep.py | Integrates RollupStageMixin, calls run_rollup() after benchmark |
| src/srtctl/cli/mixins/__init__.py | Exports RollupStageMixin |
| docs/architecture.md | Documents rollup architecture, parser system, output format |
| src/srtctl/README.md | Adds rollup stage documentation |

Features

1. Rollup Stage Pipeline

Collects benchmark results, worker logs, config.yaml, and engine configs → outputs rollup.json

2. Modular Parser System

  • Register parsers via @register_benchmark_parser and @register_node_parser decorators
  • Access parsers via get_benchmark_parser("sa-bench") and get_node_parser("trtllm")
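The registry behind these decorators can be sketched in a few lines. This is an illustrative stand-in, not the actual implementation: the real module also defines parser protocols and a parallel node-parser registry, and the SABenchParser body here is a toy.

```python
# Sketch of the registry pattern behind these decorators; bodies are
# illustrative stand-ins (the real module also defines parser protocols and
# a parallel node-parser registry).
_BENCHMARK_PARSERS: dict[str, type] = {}

def register_benchmark_parser(name: str):
    """Class decorator that records a parser class under a string key."""
    def decorator(cls: type) -> type:
        _BENCHMARK_PARSERS[name] = cls
        return cls  # returned unchanged: registration is a pure side effect
    return decorator

def get_benchmark_parser(name: str):
    """Instantiate the parser registered for `name` (KeyError if unknown)."""
    return _BENCHMARK_PARSERS[name]()

@register_benchmark_parser("sa-bench")
class SABenchParser:
    def parse_result_json(self, payload: dict) -> dict:
        # Toy normalization, not the real parser's logic.
        return {"output_tps": payload.get("output_throughput")}

parser = get_benchmark_parser("sa-bench")
print(parser.parse_result_json({"output_throughput": 5000.0}))  # {'output_tps': 5000.0}
```

Because the decorator returns the class unchanged, registration is a pure import side effect, which is what lets module imports alone populate the registry.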

3. Node Metrics Extraction

  • Batch metrics: iteration counts, throughput, running requests
  • Memory metrics: usage, KV cache, available memory
  • Config extraction: TP/PP/EP sizes, batch sizes, sequence lengths
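To make the extraction concrete, here is a minimal regex sketch against the SGLang DP/TP/EP log format that appears in this PR's test fixtures; the real SGLangNodeParser captures far more fields and also handles the v2 timestamp-only format.

```python
import re

# Illustrative sketch only: the real SGLangNodeParser extracts many more
# fields and also handles the v2 timestamp-only log format.
PREFILL_RE = re.compile(
    r"\[(?P<ts>[\d\- :]+) DP(?P<dp>\d+) TP(?P<tp>\d+) EP(?P<ep>\d+)\] "
    r"Prefill batch, #new-seq: (?P<new_seq>\d+), #new-token: (?P<new_token>\d+)"
)

# Log line format taken from this PR's test fixtures (truncated here).
line = (
    "[2025-01-22 10:00:00 DP0 TP0 EP0] Prefill batch, #new-seq: 10, "
    "#new-token: 1024, #cached-token: 256, token usage: 0.50,"
)
m = PREFILL_RE.match(line)
assert m is not None
print(m.group("ts"), int(m.group("new_token")))  # 2025-01-22 10:00:00 1024
```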

4. Environment Config Collection

  • prefill_environment, decode_environment, aggregated_environment
  • Engine configs from YAML files or inline config

5. Output: rollup.json

Contains job metadata, benchmark results array, nodes summary with per-node metrics, and environment/engine configuration.
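A hypothetical sketch of how such a summary could be assembled and serialized; the field names below are inferred from this description, not the actual RollupSummary schema.

```python
import json
from dataclasses import asdict, dataclass, field

# Hypothetical sketch of serializing a RollupSummary-style object; the field
# names are inferred from this description, not the actual schema.
@dataclass
class RollupSketch:
    job_id: str
    benchmark_results: list = field(default_factory=list)
    nodes_summary: dict = field(default_factory=dict)
    environment: dict = field(default_factory=dict)

rollup = RollupSketch(
    job_id="12345",
    benchmark_results=[{"max_concurrency": 100, "output_tps": 5000.0}],
    nodes_summary={"node-01": {"kv_cache_gb": 17.16}},
)
rollup_json = json.dumps(asdict(rollup), indent=2)  # what gets written to rollup.json
```

Dataclasses plus asdict keep the on-disk format a plain JSON object, so downstream tools (like the Parquet upload script) can consume it without importing srtctl.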

Testing

  • 18 new tests in test_rollup.py covering all dataclasses and mixin methods
  • 7 new tests in test_configs.py for output directory structure
  • All tests passing ✅

Supported Backends/Benchmarks

| Type | Parsers |
| --- | --- |
| Benchmark | sa-bench, mooncake-router |
| Node logs | sglang, sglang-v2, trtllm |

Summary by CodeRabbit

  • New Features

    • Rollup stage & CLI mixin to generate structured rollup.json summaries and a rollup harness for batch processing
    • Pluggable benchmark/node parsing system with new benchmark parsers (SA‑Bench, Mooncake Router) and node parsers (SGLang, TRTLLM)
    • Upload tool to aggregate rollup.json into Parquet for downstream analysis
  • Documentation

    • Architecture docs updated to include Rollup stage, mixin and extension points
  • Tests

    • Comprehensive tests for rollup logic and output directory behavior
  • Chores

    • Standardized benchmark script command construction; added pandas dependency



coderabbitai bot commented Jan 22, 2026

📝 Walkthrough

Walkthrough

Adds a pluggable benchmark/node parsing framework, two benchmark parsers and two node parsers, rollup dataclasses and a RollupStageMixin (plus CLI integration), a standalone rollup harness and upload tool, script invocation refactors, tests, and a pandas dependency.

Changes

**Parsing framework core** (analysis/srtlog/parsers/__init__.py)
New pluggable parser API: dataclasses (BenchmarkLaunchCommand, NodeLaunchCommand), protocols (BenchmarkParserProtocol, NodeParserProtocol), registries, register/get decorators, listing helpers, and auto-imports to register concrete parsers.

**Benchmark parsers** (analysis/srtlog/parsers/benchmark/__init__.py, analysis/srtlog/parsers/benchmark/mooncake_router.py, analysis/srtlog/parsers/benchmark/sa_bench.py)
New SABenchParser and MooncakeRouterParser: registered types, text & JSON parsing, result-directory helpers, launch-command extraction, and metric normalization.

**Node parsers** (analysis/srtlog/parsers/nodes/__init__.py, analysis/srtlog/parsers/nodes/sglang.py, analysis/srtlog/parsers/nodes/trtllm.py)
New SGLangNodeParser and TRTLLMNodeParser: per-file parsing for batch/memory metrics, filename node extraction, iteration parsing, engine/config extraction, and node launch-command parsing; registered by backend.

**Rollup datamodels** (src/srtctl/cli/mixins/rollup/__init__.py, src/srtctl/cli/mixins/rollup/models.py)
New rollup dataclasses: RollupSummary, RollupResult, NodeRollup, NodesSummary, EnvironmentConfig, LaunchCommandRollup with conversion/aggregation helpers and compute_summary_stats.

**Rollup orchestration mixin** (src/srtctl/cli/mixins/rollup_stage.py, src/srtctl/cli/do_sweep.py, src/srtctl/cli/mixins/__init__.py)
Added RollupStageMixin, integrated into SweepOrchestrator; run_rollup collects benchmark/node metrics, environment/config, launch commands, builds and writes rollup.json.

**Standalone rollup & upload** (analysis/srtlog/rollup_harness.py, analysis/scripts/upload_rollup.py)
New rollup harness to discover job folders and produce rollup.json; upload tool flattens rollups into Parquet, groups by backend/frontend/benchmarker, and POSTs grouped files.

**Benchmark script invocations** (src/srtctl/benchmarks/scripts/.../bench.sh: gpqa, longbenchv2, mmlu, mooncake-router, profiling, router, sa-bench)
Replaced multi-line direct Python invocations with single constructed command strings echoed and executed via eval; added defensive result-file handling and logging of [CMD].

**Docs & config** (docs/architecture.md, src/srtctl/README.md, pyproject.toml)
Docs updated to include RollupStage and trtllm backend; pandas>=2.3.3 added to pyproject dependencies.

**Tests** (tests/test_configs.py, tests/test_rollup.py)
New and expanded tests for output directory behavior, rollup datamodels, aggregation logic, RollupStageMixin collection, and end-to-end rollup integration.

Sequence Diagram(s)

```mermaid
sequenceDiagram
    participant Orch as SweepOrchestrator
    participant Rollup as RollupStageMixin
    participant BenchFactory as BenchmarkParserFactory
    participant NodeFactory as NodeParserFactory
    participant Logs as LogDirectory
    participant Agg as RollupAggregator
    participant File as rollup.json

    Orch->>Rollup: run_rollup(tags?)
    Rollup->>BenchFactory: get_benchmark_parser(benchmark_type)
    BenchFactory->>Logs: parse_result_json()/parse_result_directory()
    Logs-->>BenchFactory: benchmark metrics
    Rollup->>NodeFactory: get_node_parser(backend_type)
    NodeFactory->>Logs: parse_logs(log_dir)
    Logs-->>NodeFactory: NodeMetrics[]
    Rollup->>Rollup: _collect_environment_config()
    Rollup->>Agg: _build_rollup_summary(benchmarks, nodes, env, commands)
    Agg->>Agg: compute_summary_stats()
    Rollup->>File: _write_rollup(RollupSummary)
    File-->>Orch: rollup.json path
```

Estimated code review effort

🎯 5 (Critical) | ⏱️ ~120 minutes

Possibly related PRs

  • complete refactor #65 — strongly related: touches srt-slurm parsing/rollup subsystems and appears to implement overlapping parser and rollup changes.

Poem

🐰
I nibbled logs and chased each line,
Parsed JSON crumbs and regex fine,
I hopped through nodes and stitched the tune,
Rollup gleams beneath the moon,
A carrot-coded summary — hop, review, refine!

🚥 Pre-merge checks: ✅ 2 passed | ❌ 1 failed

❌ Failed checks (1 warning)

  • Docstring Coverage ⚠️ Warning: Docstring coverage is 77.52%, below the required threshold of 80.00%. Resolution: write docstrings for the functions missing them to satisfy the coverage threshold.

✅ Passed checks (2 passed)

  • Description Check ✅ Passed: Check skipped; CodeRabbit's high-level summary is enabled.
  • Title Check ✅ Passed: The pull request title 'Add Rollup Stage for Experiment Data Consolidation' clearly and concisely summarizes the main objective of the changeset: introducing a rollup stage to consolidate experiment data.



coderabbitai bot left a comment

Actionable comments posted: 7

🤖 Fix all issues with AI agents
In `@analysis/srtlog/parsers/nodes/sglang.py`:
- Around line 93-95: The log-reading try block that opens log_path and iterates
lines should guard against UnicodeDecodeError by opening the file with an
explicit encoding and a replacement error handler; update the with
open(log_path) call in the try block that reads lines to use encoding="utf-8"
and errors="replace" so non‑UTF8 bytes are replaced instead of raising and
dropping metrics (keep the rest of the loop logic unchanged and preserve the
existing try/except behavior).

In `@docs/architecture.md`:
- Around line 822-826: The fenced code block that renders the "ROLLUP STAGE
ARCHITECTURE" ASCII art is missing a language tag and triggers MD040; update the
opening fence from ``` to ```text so the block is tagged as plain text (i.e.,
replace the triple backticks before the ASCII box with ```text) to satisfy
Markdownlint. Locate the fenced block containing the "ROLLUP STAGE ARCHITECTURE"
header and modify only the opening fence.

In `@src/srtctl/cli/do_sweep.py`:
- Around line 211-214: The code uses hasattr(self.config, "tags") which leaves
tags typed as object and fails the run_rollup(tags: list[str] | None) signature;
update the extraction to explicitly narrow to list[str] | None before calling
run_rollup (e.g., check isinstance(self.config.tags, list) and verify items are
str, or use typing.cast[list[str] | None](...) after validating) so that the
local variable tags is typed as list[str] | None and then call
self.run_rollup(tags=tags).

In `@src/srtctl/cli/mixins/rollup_stage.py`:
- Around line 561-564: The endpoints property currently has an ellipsis body
which returns None and violates the declared return type list["Endpoint"];
replace the ellipsis with either (a) remove the `@property` method and declare a
class-level type hint endpoints: list["Endpoint"] if you want a pure type-only
mixin, or (b) implement the property to raise NotImplementedError (e.g. def
endpoints(self) -> list["Endpoint"]: raise NotImplementedError) so concrete
classes must provide it; update the symbol named endpoints accordingly.
- Around line 641-661: The type checker flags calls to optional parser methods
because BenchmarkParserProtocol doesn't declare find_aiperf_results and
parse_result_directory; fix by either using getattr with a callable check before
invoking (e.g., replace hasattr checks with m = getattr(parser,
"find_aiperf_results", None) and if callable(m): use m(...), same for
parse_result_directory) or add these methods as optional members to
BenchmarkParserProtocol (with default implementations returning empty lists) so
parse_result_json, find_aiperf_results, and parse_result_directory are
statically known; update references in the rollup logic where
parser.parse_result_json, parser.find_aiperf_results, and
parser.parse_result_directory are used.

In `@src/srtctl/README.md`:
- Around line 144-158: The fenced diagram block in the README triggers
markdownlint MD040 because it lacks a language tag; update the fenced code block
in the ROLLUP STAGE PIPELINE section (the triple-backtick block showing the
ASCII diagram) to include a language identifier (e.g., change ``` to ```text) so
the block is explicitly marked and the lint rule is satisfied.

In `@tests/test_rollup.py`:
- Around line 800-938: The test fails because analysis.srtlog modules import
pandas unconditionally (e.g., in analysis.srtlog.config_reader) causing
ImportError to be swallowed by RollupStageMixin.run_rollup; update the pandas
import to be guarded (wrap "import pandas as pd" in a try/except ImportError and
set pd = None) and ensure functions in analysis.srtlog.config_reader and any
parsers check pd is not None before using it and raise a clear error if pandas
is required, and also modify RollupStageMixin.run_rollup to not silently ignore
ImportError (detect ImportError from analysis.srtlog imports and re-raise or
surface a clear message) so tests fail fast if pandas is genuinely needed.
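The guarded-import pattern asked for above looks roughly like this; rollups_to_frame is a hypothetical helper used for illustration, not a function from this PR.

```python
# Sketch of the guarded-import pattern suggested above; rollups_to_frame is
# a hypothetical helper, not a function from this PR.
try:
    import pandas as pd
except ImportError:  # pandas stays optional for pure-parsing code paths
    pd = None

def rollups_to_frame(rows):
    """Build a DataFrame from flattened rollup rows, failing loudly without pandas."""
    if pd is None:
        raise RuntimeError(
            "pandas is required for rollup aggregation; install pandas>=2.3.3"
        )
    return pd.DataFrame(rows)
```

This keeps pure-parsing imports cheap while turning a missing dependency into an explicit, actionable error at the point of use rather than a silently swallowed ImportError.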
🧹 Nitpick comments (8)
tests/test_configs.py (1)

388-390: Extract shared setup into pytest fixtures to reduce repetition.

The new test class repeats config creation, root monkeypatching, and sbatch mocking. A small fixture/helper would keep this suite shorter and easier to maintain.

tests/test_rollup.py (1)

701-744: Consider freezing mock config dataclasses.

These mocks model configuration objects; setting frozen=True would align with the immutable config pattern used in production code. As per coding guidelines, ...

analysis/srtlog/parsers/__init__.py (2)

29-53: Consider making data-carrying dataclasses frozen for immutability.

Per coding guidelines, configuration objects should use @dataclass(frozen=True). Since BenchmarkLaunchCommand represents parsed, immutable launch command data, consider making it frozen to prevent accidental mutation after parsing.

♻️ Suggested change
-@dataclass
+@dataclass(frozen=True)
 class BenchmarkLaunchCommand:
     """Parsed benchmark launch command information."""

Note: mutable defaults would still use field(default_factory=dict), which works unchanged with frozen dataclasses. If mutability is intentional for incremental field population during parsing, this can be ignored.
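For reference, a frozen dataclass with a mutable default is straightforward; the field list here is illustrative, not the real BenchmarkLaunchCommand.

```python
from dataclasses import dataclass, field

# Minimal sketch of the frozen variant suggested above; the field list is
# illustrative, not the real BenchmarkLaunchCommand.
@dataclass(frozen=True)
class BenchmarkLaunchCommandSketch:
    command: str
    extra_args: dict = field(default_factory=dict)

cmd = BenchmarkLaunchCommandSketch(command="sa-bench")
try:
    cmd.command = "other"  # frozen instances reject attribute assignment
except Exception as exc:
    print(type(exc).__name__)  # FrozenInstanceError
```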


272-275: Wildcard imports for auto-registration are acceptable but could be more explicit.

The wildcard imports trigger parser registration as a side effect. While this works, explicit imports (e.g., from analysis.srtlog.parsers.benchmark import SABenchParser, MooncakeRouterParser) would make the registered parsers more discoverable in this file. This is a minor readability preference.

src/srtctl/cli/mixins/rollup_stage.py (4)

28-57: Consider frozen dataclass for immutable rollup data.

Similar to the parsers module, these data objects represent parsed results that shouldn't be mutated after construction. Per coding guidelines, consider @dataclass(frozen=True) for LaunchCommandRollup and other rollup data models.


245-318: Consider extracting common aggregation logic to reduce duplication.

The _aggregate_agg_batches method duplicates logic from _aggregate_prefill_batches (lines 254-287) and _aggregate_decode_batches (lines 289-317). Consider refactoring to reuse the existing methods:

♻️ Suggested refactor
     def _aggregate_agg_batches(self, batches: list) -> None:
         """Aggregate metrics for agg workers (handles both prefill and decode batches)."""
         # Separate prefill and decode batches
         prefill_batches = [b for b in batches if b.batch_type == "prefill"]
         decode_batches = [b for b in batches if b.batch_type == "decode"]

-        self.total_prefill_batches = len(prefill_batches)
-        self.total_decode_batches = len(decode_batches)
-
-        # Aggregate prefill metrics
+        # Reuse existing aggregation methods
         if prefill_batches:
-            new_tokens = []
-            # ... (duplicated prefill logic)
+            self._aggregate_prefill_batches(prefill_batches)
-
-        # Aggregate decode metrics
         if decode_batches:
-            running_reqs = []
-            # ... (duplicated decode logic)
+            self._aggregate_decode_batches(decode_batches)

773-773: Move import os to module level.

The os module import should be at the top of the file with other standard library imports, not inside the method.

♻️ Suggested change

Add to imports at top of file (around line 11):

import json
import logging
import os  # Add this
from dataclasses import asdict, dataclass, field

Then remove line 773.


952-959: Minor: Pass a tuple instead of a list to any().

This avoids allocating an intermediate list; note that with a fixed set of attribute expressions all elements are evaluated up front either way, so this is purely a micro-optimization.

♻️ Suggested change
-        if not any([
+        if not any((
             config.prefill_environment,
             config.decode_environment,
             config.aggregated_environment,
             config.prefill_engine_config,
             config.decode_engine_config,
             config.aggregated_engine_config,
-        ]):
+        )):

Comment on lines +93 to +95
```python
try:
    with open(log_path) as f:
        for line in f:
```

⚠️ Potential issue | 🟡 Minor

Guard against UnicodeDecodeError when reading logs.

If logs contain non‑UTF8 bytes, open() can throw and drop all metrics. Use explicit encoding with errors="replace" to keep parsing resilient.

🔧 Proposed fix
-            with open(log_path) as f:
+            with open(log_path, "r", encoding="utf-8", errors="replace") as f:

Comment thread docs/architecture.md
Comment on lines +822 to +826

```
┌─────────────────────────────────────────────────────────────────────────────────────┐
│ ROLLUP STAGE ARCHITECTURE │
└─────────────────────────────────────────────────────────────────────────────────────┘
```


⚠️ Potential issue | 🟡 Minor

Add a language tag to this fenced block for MD040 compliance.

Markdownlint flags this block; tagging it as text is sufficient.

🛠️ Suggested change
-```
+```text
🧰 Tools
🪛 markdownlint-cli2 (0.18.1)

822-822: Fenced code blocks should have a language specified

(MD040, fenced-code-language)


Comment on lines +211 to +214
```python
# Run rollup to consolidate experiment data
if exit_code == 0:
    tags = self.config.tags if hasattr(self.config, "tags") else []
    self.run_rollup(tags=tags)
```

⚠️ Potential issue | 🟠 Major

Fix tags type narrowing to satisfy run_rollup signature (CI failure).

Current hasattr path leaves tags typed as object, triggering the CI type error. Narrow explicitly to list[str] | None.

✅ Suggested fix
-            if exit_code == 0:
-                tags = self.config.tags if hasattr(self.config, "tags") else []
-                self.run_rollup(tags=tags)
+            if exit_code == 0:
+                tags: list[str] | None = None
+                raw_tags = getattr(self.config, "tags", None)
+                if isinstance(raw_tags, list):
+                    tags = raw_tags
+                self.run_rollup(tags=tags)
🧰 Tools
🪛 GitHub Actions: CI

[error] 214-214: invalid-argument-type: Argument to bound method run_rollup is incorrect. Expected list[str] | None, found object.

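The narrowing fix can be exercised standalone; FakeConfig below is a stand-in for the real sweep config object.

```python
from __future__ import annotations

# Standalone sketch of the narrowing fix suggested above; FakeConfig stands
# in for the real sweep config object.
class FakeConfig:
    tags = ["baseline", "b200"]

def narrow_tags(config: object) -> list[str] | None:
    # getattr + isinstance narrows the attribute to list[str] | None,
    # which satisfies the run_rollup(tags=...) signature.
    raw = getattr(config, "tags", None)
    if isinstance(raw, list) and all(isinstance(t, str) for t in raw):
        return raw
    return None

print(narrow_tags(FakeConfig()))  # ['baseline', 'b200']
print(narrow_tags(object()))      # None
```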

Comment on lines +561 to +564
```python
@property
def endpoints(self) -> list["Endpoint"]:
    """Endpoint allocation topology."""
    ...
```

⚠️ Potential issue | 🔴 Critical

Fix the endpoints property that always returns None.

The property body ... (ellipsis) causes it to implicitly return None, violating the list[Endpoint] return type annotation. This triggers the pipeline error. For a mixin expecting the concrete class to provide this property, either:

  1. Remove the implementation entirely (just declare the type hint)
  2. Raise NotImplementedError
🔧 Proposed fix
     @property
     def endpoints(self) -> list["Endpoint"]:
         """Endpoint allocation topology."""
-        ...
+        raise NotImplementedError("Subclass must provide endpoints property")

Or, if using a pure type-hint approach for mixins:

# Just declare as class variable type hint without `@property`
endpoints: list["Endpoint"]
🧰 Tools
🪛 GitHub Actions: CI

[error] 562-562: invalid-return-type: Function always implicitly returns None, which is not assignable to return type list[Endpoint].

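Option (b) in miniature, with toy classes in place of the real mixin and orchestrator: the concrete class satisfies the property, while calling the mixin's version fails loudly instead of silently returning None.

```python
# Toy sketch of option (b): the mixin declares the property and raises,
# a concrete class overrides it. Class names and values are illustrative.
class RollupStageMixinSketch:
    @property
    def endpoints(self) -> list:
        raise NotImplementedError("Subclass must provide endpoints property")

class ConcreteOrchestrator(RollupStageMixinSketch):
    @property
    def endpoints(self) -> list:
        return ["prefill-0", "decode-0"]

print(ConcreteOrchestrator().endpoints)  # ['prefill-0', 'decode-0']
```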

Comment on lines +641 to +661
```python
# Try parser-specific result collection first
if parser is not None:
    # For mooncake-router, look for AIPerf results
    if hasattr(parser, "find_aiperf_results"):
        aiperf_files = parser.find_aiperf_results(self.runtime.log_dir)
        for aiperf_path in aiperf_files:
            result = parser.parse_result_json(aiperf_path)
            if result.get("output_tps") is not None:
                results.append(result)
                logger.debug("Loaded AIPerf result: %s", aiperf_path)

    # For sa-bench style, look for result directories
    if hasattr(parser, "parse_result_directory"):
        for entry in self.runtime.log_dir.iterdir():
            if not entry.is_dir():
                continue
            # Match patterns like sa-bench_isl_X_osl_Y
            if "_isl_" in entry.name and "_osl_" in entry.name:
                logger.debug("Found benchmark results directory: %s", entry.name)
                dir_results = parser.parse_result_directory(entry)
                results.extend(dir_results)
```

⚠️ Potential issue | 🟠 Major

Fix type errors for optional parser methods not in protocol.

The type checker reports call-non-callable because find_aiperf_results and parse_result_directory are not defined in BenchmarkParserProtocol. While the hasattr checks make this runtime-safe, the type checker cannot verify this.

🔧 Proposed fix using getattr
             # Try parser-specific result collection first
             if parser is not None:
                 # For mooncake-router, look for AIPerf results
-                if hasattr(parser, "find_aiperf_results"):
-                    aiperf_files = parser.find_aiperf_results(self.runtime.log_dir)
+                find_aiperf = getattr(parser, "find_aiperf_results", None)
+                if find_aiperf is not None:
+                    aiperf_files = find_aiperf(self.runtime.log_dir)
                     for aiperf_path in aiperf_files:
                         result = parser.parse_result_json(aiperf_path)
                         if result.get("output_tps") is not None:
                             results.append(result)
                             logger.debug("Loaded AIPerf result: %s", aiperf_path)

                 # For sa-bench style, look for result directories
-                if hasattr(parser, "parse_result_directory"):
+                parse_result_dir = getattr(parser, "parse_result_directory", None)
+                if parse_result_dir is not None:
                     for entry in self.runtime.log_dir.iterdir():
                         if not entry.is_dir():
                             continue
                         # Match patterns like sa-bench_isl_X_osl_Y
                         if "_isl_" in entry.name and "_osl_" in entry.name:
                             logger.debug("Found benchmark results directory: %s", entry.name)
-                            dir_results = parser.parse_result_directory(entry)
+                            dir_results = parse_result_dir(entry)
                             results.extend(dir_results)

Alternatively, add these methods as optional (with default implementations returning empty results) to BenchmarkParserProtocol.

🧰 Tools
🪛 GitHub Actions: CI

[error] 645-645: call-non-callable: Object of type object is not callable.


[error] 660-660: call-non-callable: Object of type object is not callable.

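The getattr variant can be illustrated with toy parsers standing in for the real implementations; the callable() check also guards against a non-method attribute of the same name.

```python
# Toy sketch of the getattr pattern proposed above; MooncakeLike and
# SABenchLike stand in for the real parser classes.
class MooncakeLike:
    def find_aiperf_results(self, log_dir: str) -> list:
        return [f"{log_dir}/profile_export_aiperf.json"]

class SABenchLike:
    pass  # no find_aiperf_results: the getattr path skips it cleanly

def collect_aiperf(parser: object, log_dir: str) -> list:
    # Binding the method to a local name lets the type checker see a
    # callable, unlike hasattr() followed by an attribute call.
    find = getattr(parser, "find_aiperf_results", None)
    if callable(find):
        return find(log_dir)
    return []

print(collect_aiperf(MooncakeLike(), "/logs"))  # ['/logs/profile_export_aiperf.json']
print(collect_aiperf(SABenchLike(), "/logs"))   # []
```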

Comment thread src/srtctl/README.md
Comment on lines +144 to +158
```
┌─────────────────────────────────────────────────────────────────────────────┐
│                           ROLLUP STAGE PIPELINE                             │
├─────────────────────────────────────────────────────────────────────────────┤
│                                                                             │
│  INPUT FILES                          OUTPUT                                │
│  ───────────                          ──────                                │
│  • benchmark results (*.json)    ──┐                                        │
│  • worker logs (*.out/*.err)     ──┼──►  rollup.json                        │
│  • config.yaml                   ──┤     • benchmark results                │
│  • engine configs (*.yaml)       ──┘     • node metrics                     │
│                                          • environment config               │
│                                          • summary statistics               │
└─────────────────────────────────────────────────────────────────────────────┘
```

⚠️ Potential issue | 🟡 Minor

Add language to fenced block to satisfy markdownlint (MD040).

🔧 Proposed fix
-```
+```text
┌─────────────────────────────────────────────────────────────────────────────┐
│                           ROLLUP STAGE PIPELINE                             │
├─────────────────────────────────────────────────────────────────────────────┤
│                                                                             │
│  INPUT FILES                          OUTPUT                                │
│  ───────────                          ──────                                │
│  • benchmark results (*.json)    ──┐                                        │
│  • worker logs (*.out/*.err)     ──┼──►  rollup.json                        │
│  • config.yaml                   ──┤     • benchmark results                │
│  • engine configs (*.yaml)       ──┘     • node metrics                     │
│                                          • environment config               │
│                                          • summary statistics               │
└─────────────────────────────────────────────────────────────────────────────┘
🧰 Tools
🪛 markdownlint-cli2 (0.18.1)

144-144: Fenced code blocks should have a language specified

(MD040, fenced-code-language)


Comment thread tests/test_rollup.py
Comment on lines +800 to +938
```python
    def test_rollup_with_node_logs(self, tmp_path):
        """Test rollup with actual node log files parsed by NodeAnalyzer."""
        from dataclasses import dataclass, field

        # Create mock benchmark results
        bench_dir = tmp_path / "sa-bench_isl_1024_osl_1024"
        bench_dir.mkdir()

        result_file = bench_dir / "result_c100.json"
        result_file.write_text(
            json.dumps(
                {
                    "max_concurrency": 100,
                    "output_throughput": 5000.0,
                    "mean_ttft_ms": 150.0,
                    "mean_itl_ms": 20.0,
                }
            )
        )

        # Create mock prefill log file (matches NodeAnalyzer expected format)
        prefill_log = tmp_path / "node-01_prefill_w0.err"
        prefill_log.write_text(
            """[2025-01-22 10:00:00 DP0 TP0 EP0] Prefill batch, #new-seq: 10, #new-token: 1024, #cached-token: 256, token usage: 0.50, #running-req: 5, #queue-req: 2, #prealloc-req: 0, #inflight-req: 3, input throughput (token/s): 5000.00,
[2025-01-22 10:00:01 DP0 TP0 EP0] Load weight end. type=DeepseekV3ForCausalLM, dtype=torch.bfloat16, avail mem=75.11 GB, mem usage=107.07 GB.
[2025-01-22 10:00:02 DP0 TP0 EP0] KV Cache is allocated. #tokens: 524288, KV size: 17.16 GB
"""
        )

        # Create mock decode log file
        decode_log = tmp_path / "node-02_decode_w0.err"
        decode_log.write_text(
            """[2025-01-22 10:00:00 DP31 TP31 EP31] Decode batch, #running-req: 50, #token: 5000, token usage: 0.50, pre-allocated usage: 0.10, #prealloc-req: 2, #transfer-req: 1, #queue-req: 3, gen throughput (token/s): 150.00,
"""
        )

        # Mock config classes
        @dataclass
        class MockResourceConfig:
            is_disaggregated: bool = True
            prefill_gpus: int = 8
            decode_gpus: int = 8
            agg_nodes: int | None = None
            gpus_per_node: int = 8
            gpu_type: str = "B200"
            total_nodes: int = 2
            prefill_nodes: int = 1
            decode_nodes: int = 1
            num_prefill: int = 1
            num_decode: int = 1
            num_agg: int | None = None

        @dataclass
        class MockBenchmarkConfig:
            type: str = "sa-bench"
            isl: int = 1024
            osl: int = 1024
            concurrencies: str = "100"

            def get_concurrency_list(self):
                return [100]

        @dataclass
        class MockModelConfig:
            precision: str = "fp8"

        @dataclass
        class MockFrontendConfig:
            type: str = "sglang"

        @dataclass
        class MockConfig:
            name: str = "test-job"
            served_model_name: str = "deepseek-v3"
            backend_type: str = "sglang"
            resources: MockResourceConfig = field(default_factory=MockResourceConfig)
            benchmark: MockBenchmarkConfig = field(default_factory=MockBenchmarkConfig)
            model: MockModelConfig = field(default_factory=MockModelConfig)
            frontend: MockFrontendConfig = field(default_factory=MockFrontendConfig)

        @dataclass
        class MockRuntime:
            job_id: str = "12345"
            log_dir: Path = field(default_factory=Path)
            model_path: Path = field(default_factory=lambda: Path("/models/deepseek-v3"))

        class MockOrchestrator(RollupStageMixin):
            def __init__(self, config, runtime):
                self._config = config
                self._runtime = runtime

            @property
            def config(self):
                return self._config

            @property
            def runtime(self):
                return self._runtime

            @property
            def endpoints(self):
                return []

        config = MockConfig()
        runtime = MockRuntime(log_dir=tmp_path)
        orchestrator = MockOrchestrator(config, runtime)

        # Run rollup
        rollup_path = orchestrator.run_rollup(tags=["node-test"])

        assert rollup_path is not None
        assert rollup_path.exists()

        with open(rollup_path) as f:
            data = json.load(f)

        # Verify node summary is present
        assert data["nodes_summary"] is not None
        nodes_summary = data["nodes_summary"]

        assert nodes_summary["total_prefill_nodes"] == 1
        assert nodes_summary["total_decode_nodes"] == 1
        assert len(nodes_summary["nodes"]) == 2

        # Find prefill and decode nodes
        prefill_node = next((n for n in nodes_summary["nodes"] if n["worker_type"] == "prefill"), None)
        decode_node = next((n for n in nodes_summary["nodes"] if n["worker_type"] == "decode"), None)

        assert prefill_node is not None
        assert prefill_node["node_name"] == "node-01"
        assert prefill_node["total_new_tokens"] == 1024
        assert prefill_node["total_cached_tokens"] == 256
        assert prefill_node["kv_cache_gb"] == 17.16

        assert decode_node is not None
        assert decode_node["node_name"] == "node-02"
        assert decode_node["max_running_requests"] == 50
        assert decode_node["avg_gen_throughput"] == 150.0
```


⚠️ Potential issue | 🟠 Major

🧩 Analysis chain

🏁 Scripts executed:

```shell
# Check for pandas in declared dependencies
rg -n "pandas" pyproject.toml setup.py setup.cfg 2>/dev/null | head -20

# Look for parser imports and NodeAnalyzer usage
rg -n "srtlog\.parsers|NodeAnalyzer|from analysis" tests/test_rollup.py -A 2

# Check where RollupStageMixin is defined and how it uses parsers
rg -n "class RollupStageMixin|def run_rollup" -g "*.py" -A 5

# Check pyproject.toml more thoroughly for dependencies
cat pyproject.toml | grep -A 50 "dependencies\|optional-dependencies"

# Get the full run_rollup implementation
sed -n '566,650p' src/srtctl/cli/mixins/rollup_stage.py

# Find NodeAnalyzer class and its imports
rg -n "class NodeAnalyzer|from.*NodeAnalyzer" -g "*.py"

# Find _collect_node_metrics implementation
rg -n "_collect_node_metrics" src/srtctl/cli/mixins/rollup_stage.py -A 30

# Check NodeAnalyzer class and its imports
sed -n '1,100p' analysis/srtlog/log_parser.py

# Search for pandas imports in analysis module
rg -n "import pandas|from pandas" analysis/ -g "*.py"

# Check if analysis is a separate package or part of main package
ls -la analysis/ 2>/dev/null | head -20

# Check if there's a separate setup for analysis or if it's optional
rg -n "analysis" pyproject.toml

# Look for any optional-dependencies section more completely
cat pyproject.toml | grep -B5 -A20 "optional-dependencies"

# Verify if parsers module exists and what it imports
head -50 analysis/srtlog/parsers/__init__.py 2>/dev/null || echo "Check if parsers is a directory"

# Check if analysis is properly included as a package in pyproject.toml
grep -A 10 "packages\|include" pyproject.toml | head -20

# Verify the full parsers __init__ to see how get_node_parser works
cat analysis/srtlog/parsers/__init__.py

# Check if there are any try-except guards around pandas imports
rg -n "try:|except ImportError|importorskip" analysis/srtlog/log_parser.py -B 2 -A 2

# Check if analysis has its own package configuration
ls -la analysis/setup.py analysis/pyproject.toml 2>&1

# Check how analysis module is meant to be installed/used
cat analysis/__init__.py

# Verify pandas is imported unconditionally at module level in log_parser
sed -n '1,30p' analysis/srtlog/log_parser.py | grep -n "import\|pandas"

# Check if test currently passes or fails by looking for CI configuration
find . -name "*.yml" -o -name "*.yaml" | head -10 | xargs grep -l "test_rollup_with_node_logs" 2>/dev/null || echo "No CI config found mentioning the test"

# Check if pandas is conditionally imported anywhere in analysis/srtlog
rg -n "try:|except.*ImportError" analysis/srtlog/ -B 1 -A 3

# Verify what will happen when the test runs without pandas
# Check the full import chain: parsers -> nodes -> imports
head -50 analysis/srtlog/parsers/nodes.py 2>/dev/null | grep -n "import\|pandas"

# Check if there's a way test could run successfully without analysis module
rg -n "import pytest" tests/test_rollup.py | head -5

# Check which parsers are imported when parsers/__init__.py is loaded
sed -n '1,20p' analysis/srtlog/parsers/benchmark/__init__.py 2>/dev/null || echo "Check parsers submodules"

# Verify if analysis module is included in the package
grep -r "analysis" pyproject.toml

# Try to understand the actual import error that would occur
python3 -c "import sys; sys.path.insert(0, '.'); from analysis.srtlog.parsers import get_node_parser" 2>&1 | head -20

# Verify if pytest.importorskip would catch the pandas import error
python3 << 'EOF'
import sys
sys.path.insert(0, '.')

# Simulate what pytest.importorskip would do
try:
    __import__("analysis.srtlog.parsers")
    print("Import successful")
except ImportError as e:
    print(f"ImportError caught: {e}")
EOF

# Check the exact location of the test to confirm line numbers
grep -n "def test_rollup_with_node_logs" tests/test_rollup.py

# Verify if analysis should be optional or required
git log --oneline analysis/srtlog/__init__.py 2>/dev/null | head -5 || echo "Git history not available"
```


This test requires the pandas dependency, which is not declared.

The test calls orchestrator.run_rollup(), which attempts to import analysis.srtlog.parsers. That import chain loads analysis.srtlog modules that unconditionally import pandas (e.g., config_reader.py line 10), raising an ImportError when pandas is unavailable. The exception is caught silently in run_rollup() (lines 611-613), so nodes_summary ends up None and the subsequent assertions fail.

Either add an import guard or declare pandas as a dependency:

Import guard approach
 def test_rollup_with_node_logs(self, tmp_path):
     """Test rollup with actual node log files parsed by NodeAnalyzer."""
+    pytest.importorskip("pandas")
     from dataclasses import dataclass, field
🧰 Tools
🪛 GitHub Actions: CI

[error] 917-917: Rollup test failure: node metrics unavailable due to missing pandas backend / analysis.srtlog.parsers not available, causing assertions like 'data["nodes_summary"] is not None' to fail.

🤖 Prompt for AI Agents
In `@tests/test_rollup.py` around lines 800 - 938, The test fails because
analysis.srtlog modules import pandas unconditionally (e.g., in
analysis.srtlog.config_reader) causing ImportError to be swallowed by
RollupStageMixin.run_rollup; update the pandas import to be guarded (wrap
"import pandas as pd" in a try/except ImportError and set pd = None) and ensure
functions in analysis.srtlog.config_reader and any parsers check pd is not None
before using it and raise a clear error if pandas is required, and also modify
RollupStageMixin.run_rollup to not silently ignore ImportError (detect
ImportError from analysis.srtlog imports and re-raise or surface a clear
message) so tests fail fast if pandas is genuinely needed.
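The guarded-import pattern that prompt describes can be sketched as follows. This is a minimal illustration, not the project's actual code: `frame_from_records` and the error message are hypothetical stand-ins for whatever pandas-backed helpers live in analysis.srtlog.

```python
# Sketch of the optional-pandas pattern (names are hypothetical).
try:
    import pandas as pd
except ImportError:  # pandas is an optional analysis extra
    pd = None


def frame_from_records(records: list) -> "pd.DataFrame":
    """Build a DataFrame from parsed records, failing loudly if pandas is absent."""
    if pd is None:
        raise RuntimeError(
            "pandas is required for analysis features; "
            "install it with `pip install srtctl[analysis]`"
        )
    return pd.DataFrame(records)
```

On the test side, `pytest.importorskip("pandas")` at the top of a pandas-dependent test skips it cleanly instead of letting downstream assertions fail.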

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 8

🤖 Fix all issues with AI agents
In `@pyproject.toml`:
- Line 19: Move "pandas>=2.3.3" out of the core dependencies in pyproject.toml
and add it under an optional extras group named "analysis" by creating a
[project.optional-dependencies] section with analysis = ["pandas>=2.3.3"];
update any packaging metadata that reads dependencies to ensure pandas is no
longer listed as a required dependency for the CLI entry points (srtctl,
srtctl-i, srtctl-sweep, srtctl-setup-head), and add a short note to the README
instructing users who need the analysis/benchmark scripts to install with pip
install srtctl[analysis].

In `@src/srtctl/benchmarks/scripts/longbenchv2/bench.sh`:
- Around line 31-43: The current string-based variable "command" is unsafe
because NUM_EXAMPLES and CATEGORIES can contain spaces/shell tokens and you use
eval; instead build the invocation as an array (e.g., an array variable such as
cmd) starting with the python module and required args (ENDPOINT, MODEL_NAME,
MAX_TOKENS, MAX_CONTEXT_LENGTH, NUM_THREADS) and conditionally append
NUM_EXAMPLES and CATEGORIES as separate array elements only when non-empty;
finally echo a joined representation for logging and run the command using array
execution (array expansion) rather than eval to avoid word-splitting and shell
injection.

In `@src/srtctl/benchmarks/scripts/mooncake-router/bench.sh`:
- Around line 60-62: The script builds user-controlled shell strings into the
variable named command and then runs them with eval (e.g., the lines setting
command and eval $command and the similar block around lines 79-81); replace
that pattern with a safe array-style invocation: construct the command as an
array where each token is a separate element and interpolate variables as quoted
expansions (e.g., cmd=(aiperf profile -m "$MODEL_NAME" --url "$ENDPOINT"
--streaming --ui simple --concurrency 10 --request-count 20)) and then execute
with "${cmd[@]}", applying the same change to the second command block so spaces
and injections are prevented.

In `@src/srtctl/benchmarks/scripts/profiling/profile.sh`:
- Around line 133-136: The script builds a shell command string in the variable
named command and then runs it with eval, which is unsafe for env-provided
PROFILE_* and PROFILING_MODE values; instead construct the argument list as an
array (e.g., cmd=(python3 -m sglang.bench_serving --backend sglang --model
"$model_name" ...)) and run it with "${cmd[@]}" to avoid word-splitting and
shell injection; use printf or echo to log the command safely (e.g., printf '%q
' "${cmd[@]}"; printf '\n') before executing, and apply the same array+printf
replacement for the other eval usages noted around the block that covers lines
142-145.

In `@src/srtctl/benchmarks/scripts/router/bench.sh`:
- Around line 42-45: The script currently builds a single string in the variable
command and uses eval, which permits shell metacharacters in user-controlled
variables (PREFIX_RATIOS, ISL, OSL, REQUESTS, CONCURRENCY) to be interpreted;
replace this with a bash array for safe argument handling: construct an array
named cmd starting with the interpreter and script (e.g., python
prefix_ratio_benchmark.py) and push each flag and its value as separate elements
using quoted variable expansions (e.g., "--prefix-ratios" "$PREFIX_RATIOS"),
then execute the command with "${cmd[@]}" and remove the eval and the SC2086
disable. Ensure result_dir is passed as the value of "--output-dir" similarly
and keep the echo of the command by joining the array for logging if needed.

In `@src/srtctl/benchmarks/scripts/sa-bench/bench.sh`:
- Around line 53-58: The loop builds a shell command string and uses eval which
is unsafe for untrusted values (MODEL_NAME, HOST, concurrency); instead
construct the command as an array and execute it without eval: replace the
string-assigned command and eval with an array like cmd=(python3 -u
"${WORK_DIR}/benchmark_serving.py" --model "${MODEL_NAME}" --tokenizer
"${MODEL_PATH}" --host "${HOST}" --port "${PORT}" --backend dynamo --endpoint
/v1/completions --disable-tqdm --dataset-name random --num-prompts
"${num_prompts}" --random-input-len "${ISL}" --random-output-len "${OSL}"
--random-range-ratio 0.8 --ignore-eos --request-rate 250 --percentile-metrics
ttft,tpot,itl,e2el --max-concurrency "${concurrency}") and run it via
"${cmd[@]}", ensuring every variable is quoted; apply the same replacement for
the similar block referenced at lines 77-80.
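A minimal, self-contained illustration of the array pattern these prompts describe, using `echo` as a stand-in for the benchmark binary; the MODEL_NAME value is deliberately hostile to show what the pattern protects against:

```shell
#!/usr/bin/env bash
set -euo pipefail

# Deliberately hostile value: contains a space and a command separator.
MODEL_NAME='my model; echo INJECTED'
CONCURRENCY=100

# Each token is a separate array element; quoting survives expansion.
cmd=(echo --model "${MODEL_NAME}" --max-concurrency "${CONCURRENCY}")

# Log a shell-quoted rendering, then execute directly without eval.
printf '[CMD]'; printf ' %q' "${cmd[@]}"; printf '\n'
"${cmd[@]}"
```

The final line prints the arguments verbatim as data; under `eval "$command"` the same value would have been word-split and the semicolon would have run `echo INJECTED` as a second command.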

In `@src/srtctl/cli/mixins/rollup_stage.py`:
- Around line 20-26: Tests import dataclasses like NodeRollup and NodesSummary
from rollup_stage but this module only imports them without re-exporting; update
rollup_stage to re-export the needed dataclasses (e.g., NodeRollup,
NodesSummary, EnvironmentConfig, LaunchCommandRollup, RollupResult,
RollupSummary) by importing them from srtctl.cli.mixins.rollup and exposing them
in the module namespace (add explicit imports and/or populate __all__ with these
symbol names) so tests can import them from rollup_stage.

In `@tests/test_rollup.py`:
- Around line 11-17: The tests import NodeRollup (and related classes) from
rollup_stage but NodeRollup isn't re-exported there; update the import to pull
NodeRollup, NodesSummary, RollupResult, RollupSummary directly from the rollup
module (or alternatively add re-export statements in rollup_stage to export
those symbols) so the import in tests/test_rollup.py resolves; target the
identifiers NodeRollup, NodesSummary, RollupResult, RollupSummary and keep
RollupStageMixin import from rollup_stage if it is defined there.
♻️ Duplicate comments (6)
analysis/srtlog/parsers/nodes/sglang.py (1)

98-100: Guard against UnicodeDecodeError when reading logs.

If logs contain non-UTF8 bytes, open() can throw and drop all metrics. Use explicit encoding with errors="replace" to keep parsing resilient. Note: The TRTLLM parser at Line 97 uses log_path.read_text(errors="replace") which handles this correctly.

🔧 Proposed fix
-            with open(log_path) as f:
+            with open(log_path, encoding="utf-8", errors="replace") as f:
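A quick demonstration of why the explicit error handler matters; the log line below is modeled on the fixtures in this PR's tests, and the temp-file setup is just scaffolding for the sketch:

```python
# Show that errors="replace" keeps a metrics line readable even when the
# log contains a stray non-UTF-8 byte (0xff below).
import os
import tempfile
from pathlib import Path

fd, path = tempfile.mkstemp(suffix=".err")
os.close(fd)
log_path = Path(path)
log_path.write_bytes(b"Decode batch, #running-req: 50\xff gen throughput (token/s): 150.00\n")

with open(log_path, encoding="utf-8", errors="replace") as f:
    text = f.read()

# The invalid byte becomes U+FFFD instead of raising UnicodeDecodeError,
# so the surrounding metrics are still parseable.
assert "#running-req: 50" in text
assert "\ufffd" in text
```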
src/srtctl/README.md (1)

144-158: Add language to fenced block for markdownlint compliance (MD040).

The ASCII diagram block is missing a language tag.

🔧 Proposed fix
-```
+```text
 ┌─────────────────────────────────────────────────────────────────────────────┐
docs/architecture.md (1)

822-826: Add language tag to fenced block for MD040 compliance.

The ASCII architecture diagram block is missing a language tag.

🔧 Proposed fix
-```
+```text
 ┌─────────────────────────────────────────────────────────────────────────────────────┐
tests/test_rollup.py (1)

800-938: Add pytest.importorskip("pandas") guard for pandas-dependent test.

This test exercises run_rollup() which may import analysis.srtlog modules that depend on pandas. Without the guard, the test will fail silently (assertions fail because nodes_summary is None) rather than being skipped when pandas is unavailable.

🔧 Proposed fix
     def test_rollup_with_node_logs(self, tmp_path):
         """Test rollup with actual node log files parsed by NodeAnalyzer."""
+        pytest.importorskip("pandas")
         from dataclasses import dataclass, field
src/srtctl/cli/mixins/rollup_stage.py (2)

49-52: Fix endpoints property that always returns None.

The property body ... (ellipsis) causes it to implicitly return None, violating the list[Endpoint] return type and causing the pipeline type error.

🔧 Proposed fix
     `@property`
     def endpoints(self) -> list["Endpoint"]:
         """Endpoint allocation topology."""
-        ...
+        raise NotImplementedError("Subclass must provide endpoints property")

129-149: Fix type errors for optional parser methods.

The type checker reports call-non-callable because find_aiperf_results and parse_result_directory are not defined in the protocol. Use getattr with callable check to satisfy the type checker.

🔧 Proposed fix
             # Try parser-specific result collection first
             if parser is not None:
                 # For mooncake-router, look for AIPerf results
-                if hasattr(parser, "find_aiperf_results"):
-                    aiperf_files = parser.find_aiperf_results(self.runtime.log_dir)
+                find_aiperf = getattr(parser, "find_aiperf_results", None)
+                if callable(find_aiperf):
+                    aiperf_files = find_aiperf(self.runtime.log_dir)
                     for aiperf_path in aiperf_files:
                         result = parser.parse_result_json(aiperf_path)
                         if result.get("output_tps") is not None:
                             results.append(result)
                             logger.debug("Loaded AIPerf result: %s", aiperf_path)

                 # For sa-bench style, look for result directories
-                if hasattr(parser, "parse_result_directory"):
+                parse_result_dir = getattr(parser, "parse_result_directory", None)
+                if callable(parse_result_dir):
                     for entry in self.runtime.log_dir.iterdir():
                         if not entry.is_dir():
                             continue
                         # Match patterns like sa-bench_isl_X_osl_Y
                         if "_isl_" in entry.name and "_osl_" in entry.name:
                             logger.debug("Found benchmark results directory: %s", entry.name)
-                            dir_results = parser.parse_result_directory(entry)
+                            dir_results = parse_result_dir(entry)
                             results.extend(dir_results)
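The getattr-plus-callable pattern in that fix can be shown in isolation. The parser classes below are stand-ins, not the real classes from analysis/srtlog/parsers; the point is that probing with `getattr(..., None)` and `callable()` duck-types optional methods without the `hasattr` probe that type checkers reject on a protocol:

```python
# Stand-in parsers illustrating the getattr + callable() pattern.
class SaBenchStyleParser:
    def parse_result_directory(self, entry):
        return [{"source": str(entry)}]


class AiperfStyleParser:
    def find_aiperf_results(self, log_dir):
        return []  # nothing found in this sketch


def collect_results(parser, log_dir):
    results = []
    find_aiperf = getattr(parser, "find_aiperf_results", None)
    if callable(find_aiperf):  # only runs when the method actually exists
        results.extend(find_aiperf(log_dir))
    parse_dir = getattr(parser, "parse_result_directory", None)
    if callable(parse_dir):
        results.extend(parse_dir(log_dir))
    return results
```

`collect_results(SaBenchStyleParser(), "logs")` yields one entry and `collect_results(AiperfStyleParser(), "logs")` yields none, with no attribute errors in either case.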
🧹 Nitpick comments (11)
src/srtctl/benchmarks/scripts/mmlu/bench.sh (1)

29-31: Prefer direct execution over eval, or quote the variable.

Using eval $command without quotes can cause word-splitting issues if any variable (especially ENDPOINT) contains spaces or shell metacharacters. Two options:

  1. Preferred: Execute directly without eval:
echo "[CMD] python3 -m sglang.test.run_eval --base-url ${ENDPOINT} ..."
python3 -m sglang.test.run_eval --base-url "${ENDPOINT}" --model "${MODEL_NAME}" \
    --eval-name mmlu --num-examples "${NUM_EXAMPLES}" --max-tokens "${MAX_TOKENS}" \
    --repeat "${REPEAT}" --num-threads "${NUM_THREADS}"
  1. Minimal fix: Quote the eval argument:
-eval $command
+eval "$command"
src/srtctl/benchmarks/scripts/gpqa/bench.sh (1)

29-31: Quote the command variable in eval for robustness.

While the current variables appear safe, unquoted $command in eval can cause word-splitting issues if any variable (e.g., MODEL_NAME) contains spaces or special characters.

🔧 Suggested fix
 command="python3 -m sglang.test.run_eval --base-url ${ENDPOINT} --model ${MODEL_NAME} --eval-name gpqa --num-examples ${NUM_EXAMPLES} --max-tokens ${MAX_TOKENS} --repeat ${REPEAT} --num-threads ${NUM_THREADS}"
 echo "[CMD] $command"
-eval $command
+eval "$command"
analysis/srtlog/parsers/benchmark/sa_bench.py (1)

101-103: Add explicit encoding when reading JSON files.

For consistency with other parsers and to handle potential encoding issues, specify the encoding explicitly.

🔧 Suggested fix
-            with open(json_path) as f:
+            with open(json_path, encoding="utf-8") as f:
analysis/srtlog/parsers/benchmark/mooncake_router.py (1)

92-94: Add explicit encoding when reading JSON files.

For consistency and to handle potential encoding issues, specify the encoding explicitly.

🔧 Suggested fix
-            with open(json_path) as f:
+            with open(json_path, encoding="utf-8") as f:
analysis/srtlog/parsers/nodes/trtllm.py (1)

343-359: Consider extracting shared filename parsing logic.

The _extract_node_info_from_filename method is identical to the one in analysis/srtlog/parsers/nodes/sglang.py (lines 305-321). Consider extracting this to a shared utility to reduce duplication.
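A shared helper could look roughly like this. The regex is inferred from the log names used in this PR's tests (e.g., node-01_prefill_w0.err), not copied from the actual sglang.py/trtllm.py implementations, so treat it as a hypothetical sketch:

```python
# Hypothetical shared filename parser for worker log names such as
# "node-01_prefill_w0.err"; returns (node_name, worker_type, worker_index).
import re

_LOG_NAME = re.compile(
    r"^(?P<node>.+?)_(?P<worker_type>prefill|decode|agg)_w(?P<index>\d+)\.(out|err)$"
)


def extract_node_info_from_filename(name: str):
    """Return (node_name, worker_type, worker_index), or None if the name doesn't match."""
    m = _LOG_NAME.match(name)
    if m is None:
        return None
    return m.group("node"), m.group("worker_type"), int(m.group("index"))
```

Both node parsers could then import this one function instead of carrying identical private copies.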

analysis/srtlog/rollup_harness.py (2)

27-28: Module-level basicConfig may affect global logging state.

Calling logging.basicConfig at module import time can interfere with the application's logging configuration if this module is imported as a library rather than run as a script. Consider moving this inside main() or using a conditional guard.

🔧 Suggested fix
-logging.basicConfig(level=logging.INFO, format="%(asctime)s - %(levelname)s - %(message)s")
 logger = logging.getLogger(__name__)
+
+
+def _setup_logging(verbose: bool = False) -> None:
+    """Configure logging for the harness."""
+    level = logging.DEBUG if verbose else logging.INFO
+    logging.basicConfig(level=level, format="%(asctime)s - %(levelname)s - %(message)s")

Then call _setup_logging(args.verbose) in main() before processing.


63-66: Add explicit encoding when reading YAML files.

For consistency and to handle potential encoding issues across different environments, specify the encoding explicitly.

🔧 Suggested fix
-        with open(config_path) as f:
+        with open(config_path, encoding="utf-8") as f:
             return yaml.safe_load(f)

Apply the same fix to lines 292 and 295.

src/srtctl/cli/mixins/rollup/models.py (3)

133-138: Misleading comment about kv_cache aggregation.

The comment mentions "(or sum for multiple allocations)" but the code only takes the max. Either update the comment to match the implementation or implement the sum logic if that's the intended behavior.

🔧 Proposed fix
                 if mem.kv_cache_gb is not None:
-                    # Take the max kv_cache seen (or sum for multiple allocations)
+                    # Take the max kv_cache seen
                     if rollup.kv_cache_gb is None:
                         rollup.kv_cache_gb = mem.kv_cache_gb
                     else:
                         rollup.kv_cache_gb = max(rollup.kv_cache_gb, mem.kv_cache_gb)

236-308: Code duplication in aggregation methods.

_aggregate_agg_batches duplicates logic from _aggregate_prefill_batches and _aggregate_decode_batches. Consider refactoring to reuse the existing methods for better maintainability.

♻️ Optional refactor
     def _aggregate_agg_batches(self, batches: list) -> None:
         """Aggregate metrics for agg workers (handles both prefill and decode batches)."""
         # Separate prefill and decode batches
         prefill_batches = [b for b in batches if b.batch_type == "prefill"]
         decode_batches = [b for b in batches if b.batch_type == "decode"]

         self.total_prefill_batches = len(prefill_batches)
         self.total_decode_batches = len(decode_batches)

-        # Aggregate prefill metrics
+        # Reuse existing aggregation methods
         if prefill_batches:
-            new_tokens = []
-            cached_tokens = []
-            # ... 30+ lines of duplicated logic
-            ...
-
-        # Aggregate decode metrics
-        if decode_batches:
-            # ... 20+ lines of duplicated logic
-            ...
+            # Temporarily store current batch count
+            saved_total = self.total_prefill_batches
+            self._aggregate_prefill_batches(prefill_batches)
+            self.total_prefill_batches = saved_total
+
+        if decode_batches:
+            saved_total = self.total_decode_batches
+            self._aggregate_decode_batches(decode_batches)
+            self.total_decode_batches = saved_total

416-416: Consider documenting the mixed type for request_rate.

The type float | str | None is unusual. Consider adding a docstring to clarify when this is a float vs a string (e.g., numeric rate vs identifier like "c100").

src/srtctl/cli/mixins/rollup_stage.py (1)

99-101: Consider logging the full stack trace for debugging.

The broad exception catch logs only the error message. For debugging complex rollup failures, consider logging the full traceback at DEBUG level.

🔧 Proposed enhancement
         except Exception as e:
-            logger.error("Rollup failed: %s", e)
+            logger.error("Rollup failed: %s", e)
+            logger.debug("Rollup traceback:", exc_info=True)
             return None

Comment thread pyproject.toml
"requests>=2.31.0",
"rich>=13.0.0",
"questionary>=2.0.0",
"pandas>=2.3.3",

⚠️ Potential issue | 🟠 Major

🧩 Analysis chain

🏁 Scripts executed:

```shell
#!/bin/bash
# Find all pandas imports and their locations
rg -n "import pandas|from pandas" -S --type-list

# Get a broader view of directory structure to understand src vs tests layout
git ls-files | head -30

# Check the pyproject.toml structure to understand dependency grouping
cat -n pyproject.toml

# Find all pandas imports
rg -n "import pandas|from pandas" --type py

# Also check the analysis directory structure more carefully
find analysis -type f -name "*.py" | head -20

# Check if pandas is imported in the CLI modules
rg -n "import pandas|from pandas" src/srtctl/cli/

# Check the structure of src/srtctl to understand what is core
ls -la src/srtctl/

# Check if there are any optional dependencies defined in pyproject.toml
grep -n "optional\|extras" pyproject.toml
```


Move pandas to optional dependencies; it is not required by the core CLI.

Pandas is only used in the analysis scripts and benchmark serving tools, not by any of the core CLI entry points (srtctl, srtctl-i, srtctl-sweep, srtctl-setup-head). Adding it to core dependencies unnecessarily increases install size for users who only need the CLI. Move it to an optional dependency group:

[project.optional-dependencies]
analysis = ["pandas>=2.3.3"]

Then document in the README that users running analysis tools should install with pip install srtctl[analysis].

🤖 Prompt for AI Agents
In `@pyproject.toml` at line 19, Move "pandas>=2.3.3" out of the core dependencies
in pyproject.toml and add it under an optional extras group named "analysis" by
creating a [project.optional-dependencies] section with analysis =
["pandas>=2.3.3"]; update any packaging metadata that reads dependencies to
ensure pandas is no longer listed as a required dependency for the CLI entry
points (srtctl, srtctl-i, srtctl-sweep, srtctl-setup-head), and add a short note
to the README instructing users who need the analysis/benchmark scripts to
install with pip install srtctl[analysis].

Comment on lines +31 to +43
 command="python3 -m sglang.test.run_eval --base-url ${ENDPOINT} --model ${MODEL_NAME} --eval-name longbench_v2 --max-tokens ${MAX_TOKENS} --max-context-length ${MAX_CONTEXT_LENGTH} --num-threads ${NUM_THREADS}"

 # Add optional arguments
 if [ -n "$NUM_EXAMPLES" ]; then
-    cmd="$cmd --num-examples ${NUM_EXAMPLES}"
+    command="$command --num-examples ${NUM_EXAMPLES}"
 fi

 if [ -n "$CATEGORIES" ]; then
-    cmd="$cmd --categories ${CATEGORIES}"
+    command="$command --categories ${CATEGORIES}"
 fi

-echo "Executing: $cmd"
-eval "$cmd"
+echo "[CMD] $command"
+eval $command
Contributor

⚠️ Potential issue | 🟠 Major

Replace eval with an array-based command.
NUM_EXAMPLES/CATEGORIES may include spaces or shell tokens, and eval executes them.

✅ Safer, structured command building
-command="python3 -m sglang.test.run_eval --base-url ${ENDPOINT} --model ${MODEL_NAME} --eval-name longbench_v2 --max-tokens ${MAX_TOKENS} --max-context-length ${MAX_CONTEXT_LENGTH} --num-threads ${NUM_THREADS}"
+command=(
+  python3 -m sglang.test.run_eval --base-url "${ENDPOINT}" --model "${MODEL_NAME}"
+  --eval-name longbench_v2 --max-tokens "${MAX_TOKENS}"
+  --max-context-length "${MAX_CONTEXT_LENGTH}" --num-threads "${NUM_THREADS}"
+)

 if [ -n "$NUM_EXAMPLES" ]; then
-    command="$command --num-examples ${NUM_EXAMPLES}"
+    command+=(--num-examples "${NUM_EXAMPLES}")
 fi

 if [ -n "$CATEGORIES" ]; then
-    command="$command --categories ${CATEGORIES}"
+    command+=(--categories "${CATEGORIES}")
 fi

-echo "[CMD] $command"
-eval $command
+printf '[CMD] %q ' "${command[@]}"; echo
+"${command[@]}"
🤖 Prompt for AI Agents
In `@src/srtctl/benchmarks/scripts/longbenchv2/bench.sh` around lines 31 - 43, The
current string-based variable "command" is unsafe because NUM_EXAMPLES and
CATEGORIES can contain spaces/shell tokens and you use eval; instead build the
invocation as an array (e.g., an array variable such as cmd) starting with the
python module and required args (ENDPOINT, MODEL_NAME, MAX_TOKENS,
MAX_CONTEXT_LENGTH, NUM_THREADS) and conditionally append NUM_EXAMPLES and
CATEGORIES as separate array elements only when non-empty; finally echo a joined
representation for logging and run the command using array execution (array
expansion) rather than eval to avoid word-splitting and shell injection.

Comment on lines +60 to +62
command="aiperf profile -m ${MODEL_NAME} --url ${ENDPOINT} --streaming --ui simple --concurrency 10 --request-count 20"
echo "[CMD] $command"
eval $command
Contributor

⚠️ Potential issue | 🟠 Major

Avoid eval to prevent injection and quoting bugs.
MODEL_NAME, ENDPOINT, and thresholds are user-controlled; eval can mis-handle spaces or execute unintended tokens.

✅ Safer array-based invocation (apply to both commands)
-command="aiperf profile -m ${MODEL_NAME} --url ${ENDPOINT} --streaming --ui simple --concurrency 10 --request-count 20"
-echo "[CMD] $command"
-eval $command
+command=(
+  aiperf profile -m "${MODEL_NAME}" --url "${ENDPOINT}"
+  --streaming --ui simple --concurrency 10 --request-count 20
+)
+printf '[CMD] %q ' "${command[@]}"; echo
+"${command[@]}"

-command="aiperf profile -m ${MODEL_NAME} --input-file ${INPUT_FILE} --custom-dataset-type mooncake_trace --fixed-schedule --url ${ENDPOINT} --streaming --random-seed 42 --ui simple --artifact-dir ${RUN_ARTIFACT_DIR} --goodput \"time_to_first_token:${TTFT_THRESHOLD} inter_token_latency:${ITL_THRESHOLD}\""
-echo "[CMD] $command"
-eval $command
+command=(
+  aiperf profile -m "${MODEL_NAME}" --input-file "${INPUT_FILE}"
+  --custom-dataset-type mooncake_trace --fixed-schedule --url "${ENDPOINT}"
+  --streaming --random-seed 42 --ui simple --artifact-dir "${RUN_ARTIFACT_DIR}"
+  --goodput "time_to_first_token:${TTFT_THRESHOLD} inter_token_latency:${ITL_THRESHOLD}"
+)
+printf '[CMD] %q ' "${command[@]}"; echo
+"${command[@]}"

Also applies to: 79-81

🤖 Prompt for AI Agents
In `@src/srtctl/benchmarks/scripts/mooncake-router/bench.sh` around lines 60 - 62,
The script builds user-controlled shell strings into the variable named command
and then runs them with eval (e.g., the lines setting command and eval $command
and the similar block around lines 79-81); replace that pattern with a safe
array-style invocation: construct the command as an array where each token is a
separate element and interpolate variables as quoted expansions (e.g.,
cmd=(aiperf profile -m "$MODEL_NAME" --url "$ENDPOINT" --streaming --ui simple
--concurrency 10 --request-count 20)) and then execute with "${cmd[@]}",
applying the same change to the second command block so spaces and injections
are prevented.

Comment on lines +133 to +136

command="python3 -m sglang.bench_serving --backend sglang --model ${model_name} --host ${head_node} --port ${head_port} --dataset-name random --max-concurrency ${PROFILE_CONCURRENCY} --num-prompts 128 --random-input-len ${PROFILE_ISL} --random-output-len ${PROFILE_OSL} --random-range-ratio 1 --warmup-request 0"
echo "[CMD] $command"
eval $command
Contributor

⚠️ Potential issue | 🟠 Major

Avoid eval; it’s unsafe with env-provided values.
PROFILING_MODE/PROFILE_* inputs can include spaces or shell tokens. Use array execution and keep the logging via printf.

✅ Safer execution without eval
-command="python3 -m sglang.bench_serving --backend sglang --model ${model_name} --host ${head_node} --port ${head_port} --dataset-name random --max-concurrency ${PROFILE_CONCURRENCY} --num-prompts 128 --random-input-len ${PROFILE_ISL} --random-output-len ${PROFILE_OSL} --random-range-ratio 1 --warmup-request 0"
-echo "[CMD] $command"
-eval $command
+command=(
+  python3 -m sglang.bench_serving --backend sglang --model "${model_name}"
+  --host "${head_node}" --port "${head_port}" --dataset-name random
+  --max-concurrency "${PROFILE_CONCURRENCY}" --num-prompts 128
+  --random-input-len "${PROFILE_ISL}" --random-output-len "${PROFILE_OSL}"
+  --random-range-ratio 1 --warmup-request 0
+)
+printf '[CMD] %q ' "${command[@]}"; echo
+"${command[@]}"

-command="python -m lm_eval --model local-completions --tasks gsm8k --model_args \"base_url=http://${head_node}:${head_port}/v1/completions,model=${model_name},tokenized_requests=False,tokenizer_backend=None,num_concurrent=${PROFILE_CONCURRENCY},timeout=6000,max_retries=1\" --limit 10"
-echo "[CMD] $command"
-eval $command
+command=(
+  python -m lm_eval --model local-completions --tasks gsm8k
+  --model_args "base_url=http://${head_node}:${head_port}/v1/completions,model=${model_name},tokenized_requests=False,tokenizer_backend=None,num_concurrent=${PROFILE_CONCURRENCY},timeout=6000,max_retries=1"
+  --limit 10
+)
+printf '[CMD] %q ' "${command[@]}"; echo
+"${command[@]}"

Also applies to: 142-145

🤖 Prompt for AI Agents
In `@src/srtctl/benchmarks/scripts/profiling/profile.sh` around lines 133 - 136,
The script builds a shell command string in the variable named command and then
runs it with eval, which is unsafe for env-provided PROFILE_* and PROFILING_MODE
values; instead construct the argument list as an array (e.g., cmd=(python3 -m
sglang.bench_serving --backend sglang --model "$model_name" ...)) and run it
with "${cmd[@]}" to avoid word-splitting and shell injection; use printf or echo
to log the command safely (e.g., printf '%q ' "${cmd[@]}"; printf '\n') before
executing, and apply the same array+printf replacement for the other eval usages
noted around the block that covers lines 142-145.

Comment on lines 42 to +45
# shellcheck disable=SC2086
python prefix_ratio_benchmark.py \
--prefix-ratios $PREFIX_RATIOS \
--isl "$ISL" \
--osl "$OSL" \
--requests "$REQUESTS" \
--concurrency "$CONCURRENCY" \
--output-dir "$result_dir"
command="python prefix_ratio_benchmark.py --prefix-ratios $PREFIX_RATIOS --isl $ISL --osl $OSL --requests $REQUESTS --concurrency $CONCURRENCY --output-dir $result_dir"
echo "[CMD] $command"
eval $command
Contributor

⚠️ Potential issue | 🟠 Major

Avoid eval with user-controlled arguments.
Using eval here allows shell metacharacters in PREFIX_RATIOS/ENDPOINT to execute. A bash array keeps proper argument boundaries and is safer.

✅ Safer execution without eval
-# shellcheck disable=SC2086
-command="python prefix_ratio_benchmark.py --prefix-ratios $PREFIX_RATIOS --isl $ISL --osl $OSL --requests $REQUESTS --concurrency $CONCURRENCY --output-dir $result_dir"
-echo "[CMD] $command"
-eval $command
+read -r -a prefix_ratio_args <<< "$PREFIX_RATIOS"
+command=(
+  python prefix_ratio_benchmark.py
+  --prefix-ratios "${prefix_ratio_args[@]}"
+  --isl "$ISL"
+  --osl "$OSL"
+  --requests "$REQUESTS"
+  --concurrency "$CONCURRENCY"
+  --output-dir "$result_dir"
+)
+printf '[CMD] %q ' "${command[@]}"; echo
+"${command[@]}"
🤖 Prompt for AI Agents
In `@src/srtctl/benchmarks/scripts/router/bench.sh` around lines 42 - 45, The
script currently builds a single string in the variable command and uses eval,
which permits shell metacharacters in user-controlled variables (PREFIX_RATIOS,
ISL, OSL, REQUESTS, CONCURRENCY) to be interpreted; replace this with a bash
array for safe argument handling: construct an array named cmd starting with the
interpreter and script (e.g., python prefix_ratio_benchmark.py) and push each
flag and its value as separate elements using quoted variable expansions (e.g.,
"--prefix-ratios" "$PREFIX_RATIOS"), then execute the command with "${cmd[@]}"
and remove the eval and the SC2086 disable. Ensure result_dir is passed as the
value of "--output-dir" similarly and keep the echo of the command by joining
the array for logging if needed.

Comment on lines +53 to 58

command="python3 -u ${WORK_DIR}/benchmark_serving.py --model ${MODEL_NAME} --tokenizer ${MODEL_PATH} --host $HOST --port $PORT --backend dynamo --endpoint /v1/completions --disable-tqdm --dataset-name random --num-prompts $num_prompts --random-input-len $ISL --random-output-len $OSL --random-range-ratio 0.8 --ignore-eos --request-rate 250 --percentile-metrics ttft,tpot,itl,e2el --max-concurrency $concurrency"

echo "[CMD] $command"
eval $command
done
Contributor

⚠️ Potential issue | 🟠 Major

Avoid eval in benchmark loops.
MODEL_NAME, HOST, and concurrency-derived values are user-controlled; eval is unsafe and brittle.

✅ Safer command execution in loops
-command="python3 -u ${WORK_DIR}/benchmark_serving.py --model ${MODEL_NAME} --tokenizer ${MODEL_PATH} --host $HOST --port $PORT --backend dynamo --endpoint /v1/completions --disable-tqdm --dataset-name random --num-prompts $num_prompts --random-input-len $ISL --random-output-len $OSL --random-range-ratio 0.8 --ignore-eos --request-rate 250 --percentile-metrics ttft,tpot,itl,e2el --max-concurrency $concurrency"
-
-echo "[CMD] $command"
-eval $command
+command=(
+  python3 -u "${WORK_DIR}/benchmark_serving.py"
+  --model "${MODEL_NAME}" --tokenizer "${MODEL_PATH}"
+  --host "$HOST" --port "$PORT" --backend dynamo --endpoint /v1/completions
+  --disable-tqdm --dataset-name random --num-prompts "$num_prompts"
+  --random-input-len "$ISL" --random-output-len "$OSL"
+  --random-range-ratio 0.8 --ignore-eos --request-rate 250
+  --percentile-metrics ttft,tpot,itl,e2el --max-concurrency "$concurrency"
+)
+printf '[CMD] %q ' "${command[@]}"; echo
+"${command[@]}"

-command="python3 -u ${WORK_DIR}/benchmark_serving.py --model ${MODEL_NAME} --tokenizer ${MODEL_PATH} --host $HOST --port $PORT --backend dynamo --endpoint /v1/completions --disable-tqdm --dataset-name random --num-prompts $num_prompts --random-input-len $ISL --random-output-len $OSL --random-range-ratio 0.8 --ignore-eos --request-rate ${REQ_RATE} --percentile-metrics ttft,tpot,itl,e2el --max-concurrency $concurrency --use-chat-template --save-result --result-dir $result_dir --result-filename $result_filename"
-
-echo "[CMD] $command"
-eval $command
+command=(
+  python3 -u "${WORK_DIR}/benchmark_serving.py"
+  --model "${MODEL_NAME}" --tokenizer "${MODEL_PATH}"
+  --host "$HOST" --port "$PORT" --backend dynamo --endpoint /v1/completions
+  --disable-tqdm --dataset-name random --num-prompts "$num_prompts"
+  --random-input-len "$ISL" --random-output-len "$OSL"
+  --random-range-ratio 0.8 --ignore-eos --request-rate "${REQ_RATE}"
+  --percentile-metrics ttft,tpot,itl,e2el --max-concurrency "$concurrency"
+  --use-chat-template --save-result --result-dir "$result_dir"
+  --result-filename "$result_filename"
+)
+printf '[CMD] %q ' "${command[@]}"; echo
+"${command[@]}"

Also applies to: 77-80

🤖 Prompt for AI Agents
In `@src/srtctl/benchmarks/scripts/sa-bench/bench.sh` around lines 53 - 58, The
loop builds a shell command string and uses eval which is unsafe for untrusted
values (MODEL_NAME, HOST, concurrency); instead construct the command as an
array and execute it without eval: replace the string-assigned command and eval
with an array like cmd=(python3 -u "${WORK_DIR}/benchmark_serving.py" --model
"${MODEL_NAME}" --tokenizer "${MODEL_PATH}" --host "${HOST}" --port "${PORT}"
--backend dynamo --endpoint /v1/completions --disable-tqdm --dataset-name random
--num-prompts "${num_prompts}" --random-input-len "${ISL}" --random-output-len
"${OSL}" --random-range-ratio 0.8 --ignore-eos --request-rate 250
--percentile-metrics ttft,tpot,itl,e2el --max-concurrency "${concurrency}") and
run it via "${cmd[@]}", ensuring every variable is quoted; apply the same
replacement for the similar block referenced at lines 77-80.

Comment on lines +20 to +26
from srtctl.cli.mixins.rollup import (
EnvironmentConfig,
LaunchCommandRollup,
NodesSummary,
RollupResult,
RollupSummary,
)
Contributor

🛠️ Refactor suggestion | 🟠 Major

Consider re-exporting dataclasses for easier imports.

The tests attempt to import NodeRollup, NodesSummary, etc. from rollup_stage, but these are only imported here and not re-exported. Either update tests to import from srtctl.cli.mixins.rollup directly, or add re-exports here.

🔧 Option: Add re-exports
 from srtctl.cli.mixins.rollup import (
     EnvironmentConfig,
     LaunchCommandRollup,
+    NodeRollup,
     NodesSummary,
     RollupResult,
     RollupSummary,
 )
+
+__all__ = [
+    "RollupStageMixin",
+    "EnvironmentConfig",
+    "LaunchCommandRollup",
+    "NodeRollup",
+    "NodesSummary",
+    "RollupResult",
+    "RollupSummary",
+]
🤖 Prompt for AI Agents
In `@src/srtctl/cli/mixins/rollup_stage.py` around lines 20 - 26, Tests import
dataclasses like NodeRollup and NodesSummary from rollup_stage but this module
only imports them without re-exporting; update rollup_stage to re-export the
needed dataclasses (e.g., NodeRollup, NodesSummary, EnvironmentConfig,
LaunchCommandRollup, RollupResult, RollupSummary) by importing them from
srtctl.cli.mixins.rollup and exposing them in the module namespace (add explicit
imports and/or populate __all__ with these symbol names) so tests can import
them from rollup_stage.

Comment thread tests/test_rollup.py
Comment on lines +11 to +17
from srtctl.cli.mixins.rollup_stage import (
NodeRollup,
NodesSummary,
RollupResult,
RollupStageMixin,
RollupSummary,
)
Contributor

⚠️ Potential issue | 🔴 Critical

Fix import path - NodeRollup is not exported from rollup_stage.

The pipeline fails with ImportError: cannot import name 'NodeRollup' from 'srtctl.cli.mixins.rollup_stage'. The rollup_stage.py module imports these classes from srtctl.cli.mixins.rollup but doesn't re-export them. Either import directly from the rollup submodule or add re-exports to rollup_stage.py.

🔧 Proposed fix - import from correct module
-from srtctl.cli.mixins.rollup_stage import (
-    NodeRollup,
-    NodesSummary,
-    RollupResult,
-    RollupStageMixin,
-    RollupSummary,
-)
+from srtctl.cli.mixins.rollup import (
+    NodeRollup,
+    NodesSummary,
+    RollupResult,
+    RollupSummary,
+)
+from srtctl.cli.mixins.rollup_stage import RollupStageMixin
🧰 Tools
🪛 GitHub Actions: CI

[error] 11-11: ImportError: cannot import name 'NodeRollup' from 'srtctl.cli.mixins.rollup_stage' during test collection. This caused pytest to fail with exit code 2.

🤖 Prompt for AI Agents
In `@tests/test_rollup.py` around lines 11 - 17, The tests import NodeRollup (and
related classes) from rollup_stage but NodeRollup isn't re-exported there;
update the import to pull NodeRollup, NodesSummary, RollupResult, RollupSummary
directly from the rollup module (or alternatively add re-export statements in
rollup_stage to export those symbols) so the import in tests/test_rollup.py
resolves; target the identifiers NodeRollup, NodesSummary, RollupResult,
RollupSummary and keep RollupStageMixin import from rollup_stage if it is
defined there.


try:
content = benchmark_out_path.read_text()

Owner

AIPerf provides a profile export JSON; we should parse that instead of using regex.

from pathlib import Path
from typing import Any

logging.basicConfig(level=logging.INFO, format="%(asctime)s - %(levelname)s - %(message)s")
Owner

The srtslurm.yaml contains the root of the repo. Not sure if you need it, but it might be helpful.

Contributor

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 4

🤖 Fix all issues with AI agents
In `@analysis/srtlog/rollup_harness.py`:
- Around line 349-353: The code currently uses
"output_tps=data.get('output_throughput', 0) or data.get('output_tps', 0)" which
treats a valid 0 as missing; update the RollupResult construction to preserve
explicit zeroes by checking for None instead of falsy values — e.g. compute a
value = data.get('output_throughput') if data.get('output_throughput') is not
None else data.get('output_tps') if data.get('output_tps') is not None else 0,
and pass that value as output_tps (similarly use explicit None checks for any
other throughput fields) so RollupResult sees bona fide 0s rather than
overwriting them.
- Around line 31-47: The current sort key in find_job_directories mixes int and
str values causing a TypeError when directory names include both numeric and
non-numeric names; update the job_dirs.sort key to return a homogeneous tuple
that orders numeric directories before (or after) non-numeric ones — e.g.,
return (0, int(p.name)) for purely numeric names and (1, p.name) for others — so
all keys share the same types and provide stable ordering; modify the lambda
used in job_dirs.sort accordingly.

In `@src/srtctl/cli/mixins/rollup_stage.py`:
- Around line 464-478: The _get_parser_order function's mapping for "sglang"
lacks the recommended fallback to "sglang-v2"; update the parser_orders dict in
_get_parser_order(backend_type: str) to return ["sglang", "sglang-v2"] for the
"sglang" key (so it will try the primary parser then the v2 fallback), and
update the function docstring to mention the sglang-v2 fallback order
accordingly.

In `@src/srtctl/cli/mixins/rollup/models.py`:
- Around line 370-407: The aggregation logic drops valid zero values because it
filters metrics with truthy checks (e.g., "if n.avg_input_throughput") — update
_compute_aggregate_stats to treat 0 as a valid value by using explicit None
checks. For each throughput/metric list (avg_input_throughput,
max_input_throughput, avg_gen_throughput, max_gen_throughput, kv_cache_gb)
change comprehensions to include values where the field is not None (e.g.,
[n.avg_input_throughput for n in prefill_capable_nodes if n.avg_input_throughput
is not None]) and compute averages using that list length so zeros contribute;
likewise ensure total_prefill_tokens/total_cached_tokens and
overall_cache_hit_rate are set even when totals are zero (use explicit
comparisons to None rather than truthiness).
♻️ Duplicate comments (3)
analysis/srtlog/rollup_harness.py (1)

205-215: Guard optional parser helpers before calling.

Parsers that don’t implement parse_result_directory will raise AttributeError and skip other parsing.

🛠️ Proposed fix
-        for entry in logs_dir.iterdir():
+        parse_result_dir = getattr(parser, "parse_result_directory", None)
+        for entry in logs_dir.iterdir():
             if entry.is_dir() and "_isl_" in entry.name and "_osl_" in entry.name:
-                dir_results = parser.parse_result_directory(entry)
-                results.extend(dir_results)
+                if callable(parse_result_dir):
+                    dir_results = parse_result_dir(entry)
+                    results.extend(dir_results)
src/srtctl/cli/mixins/rollup_stage.py (2)

49-52: Implement the abstract endpoints property.

The ellipsis body returns None and violates the declared return type.

🛠️ Proposed fix
     `@property`
     def endpoints(self) -> list[Endpoint]:
         """Endpoint allocation topology."""
-        ...
+        raise NotImplementedError("Subclass must provide endpoints property")

129-149: Fix optional parser helper calls that aren’t in the protocol.

Type checking will still flag these (and they can be non-callable).

🛠️ Proposed fix
-                if hasattr(parser, "find_aiperf_results"):
-                    aiperf_files = parser.find_aiperf_results(self.runtime.log_dir)
+                find_aiperf = getattr(parser, "find_aiperf_results", None)
+                if callable(find_aiperf):
+                    aiperf_files = find_aiperf(self.runtime.log_dir)
                     for aiperf_path in aiperf_files:
                         result = parser.parse_result_json(aiperf_path)
                         if result.get("output_tps") is not None:
                             results.append(result)
                             logger.debug("Loaded AIPerf result: %s", aiperf_path)
@@
-                if hasattr(parser, "parse_result_directory"):
+                parse_result_dir = getattr(parser, "parse_result_directory", None)
+                if callable(parse_result_dir):
                     for entry in self.runtime.log_dir.iterdir():
                         if not entry.is_dir():
                             continue
                         # Match patterns like sa-bench_isl_X_osl_Y
                         if "_isl_" in entry.name and "_osl_" in entry.name:
                             logger.debug("Found benchmark results directory: %s", entry.name)
-                            dir_results = parser.parse_result_directory(entry)
+                            dir_results = parse_result_dir(entry)
                             results.extend(dir_results)
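The getattr-plus-callable guard proposed above can be shown in isolation. In this sketch, MinimalParser and DirectoryParser are hypothetical stand-ins for real registry parsers, not classes from the PR:

```python
from pathlib import Path
from typing import Any

class MinimalParser:
    """Implements only the required protocol surface."""
    def parse_result_json(self, path: Path) -> dict[str, Any]:
        return {"output_tps": 1.0}

class DirectoryParser(MinimalParser):
    """Also supports the optional per-directory helper."""
    def parse_result_directory(self, entry: Path) -> list[dict[str, Any]]:
        return [{"output_tps": 2.0}]

def collect(parser: object, entry: Path) -> list[dict[str, Any]]:
    # getattr + callable check: no AttributeError when the helper is absent,
    # and no call if the attribute exists but is not callable
    parse_result_dir = getattr(parser, "parse_result_directory", None)
    if callable(parse_result_dir):
        return parse_result_dir(entry)
    return []

assert collect(MinimalParser(), Path(".")) == []
assert collect(DirectoryParser(), Path(".")) == [{"output_tps": 2.0}]
```

Unlike a bare hasattr check followed by a direct method call, this pattern also satisfies type checkers that only know the minimal protocol.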
🧹 Nitpick comments (2)
analysis/srtlog/parsers/nodes/trtllm.py (1)

105-133: Handle multi-line Config(...) dumps to avoid missing TP/PP/EP.

If the config dump wraps lines, the current regex won’t match and config stays empty.

🛠️ Proposed fix
-            config_match = re.search(r"Config\((.*?)\)", clean_content)
+            config_match = re.search(r"Config\((.*?)\)", clean_content, re.DOTALL)
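A tiny repro of the multi-line Config(...) concern, with made-up log lines rather than real TRTLLM output:

```python
import re

# Made-up log lines: a one-line dump and a wrapped multi-line dump
single = "INFO Config(tp_size=8, pp_size=1)"
multi = "INFO Config(tp_size=8,\n    pp_size=1,\n    ep_size=2)"

# Without re.DOTALL, '.' stops at newlines, so the wrapped dump never matches
assert re.search(r"Config\((.*?)\)", single).group(1) == "tp_size=8, pp_size=1"
assert re.search(r"Config\((.*?)\)", multi) is None

# With re.DOTALL, '.' also matches newlines and the full body is captured
m = re.search(r"Config\((.*?)\)", multi, re.DOTALL)
assert m is not None and "ep_size=2" in m.group(1)
```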
analysis/srtlog/parsers/__init__.py (1)

188-193: Tighten registry typing for better static checks.

🛠️ Proposed fix
-_benchmark_parsers: dict[str, type] = {}
+_benchmark_parsers: dict[str, type[BenchmarkParserProtocol]] = {}
 
 # Registry for node parsers
-_node_parsers: dict[str, type] = {}
+_node_parsers: dict[str, type[NodeParserProtocol]] = {}

Comment on lines +31 to +47
def find_job_directories(log_dir: Path) -> list[Path]:
"""Find all job directories by searching for sbatch_script.sh files.

Args:
log_dir: Root directory to search

Returns:
List of job directory paths (parent dirs of sbatch_script.sh)
"""
job_dirs = []
for sbatch_script in log_dir.rglob("sbatch_script.sh"):
job_dir = sbatch_script.parent
job_dirs.append(job_dir)

# Sort by job ID (directory name) if numeric
job_dirs.sort(key=lambda p: (int(p.name) if p.name.isdigit() else p.name))
return job_dirs
Contributor

⚠️ Potential issue | 🟠 Major

Fix mixed-type sort keys to avoid TypeError.

job_dirs.sort(...) mixes int and str keys when both numeric and non-numeric dirs exist.

🛠️ Proposed fix
-    job_dirs.sort(key=lambda p: (int(p.name) if p.name.isdigit() else p.name))
+    job_dirs.sort(key=lambda p: (0, int(p.name)) if p.name.isdigit() else (1, p.name))
🤖 Prompt for AI Agents
In `@analysis/srtlog/rollup_harness.py` around lines 31 - 47, The current sort key
in find_job_directories mixes int and str values causing a TypeError when
directory names include both numeric and non-numeric names; update the
job_dirs.sort key to return a homogeneous tuple that orders numeric directories
before (or after) non-numeric ones — e.g., return (0, int(p.name)) for purely
numeric names and (1, p.name) for others — so all keys share the same types and
provide stable ordering; modify the lambda used in job_dirs.sort accordingly.
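The homogeneous-tuple key can be sketched with toy directory names (illustrative values, not from the repo):

```python
def job_sort_key(name: str) -> tuple:
    """Homogeneous key: numeric names first (by value), then others (lexically)."""
    return (0, int(name)) if name.isdigit() else (1, name)

names = ["12345", "baseline", "9", "adhoc-run"]

# Naive mixed int/str keys raise TypeError once both kinds are present
try:
    sorted(names, key=lambda n: int(n) if n.isdigit() else n)
    raise AssertionError("expected TypeError")
except TypeError:
    pass

assert sorted(names, key=job_sort_key) == ["9", "12345", "adhoc-run", "baseline"]
```

Because the first tuple element partitions the keys, ints are only ever compared with ints and strings with strings.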

Comment on lines +349 to +353
for data in results:
result = RollupResult(
concurrency=data.get("max_concurrency", 0),
output_tps=data.get("output_throughput", 0) or data.get("output_tps", 0),
total_tps=data.get("total_token_throughput"),
Contributor

⚠️ Potential issue | 🟡 Minor

Avoid treating 0 throughput as “missing.”

The or fallback treats a valid 0 as missing, overriding it and skewing rollups.

🛠️ Proposed fix
-            output_tps=data.get("output_throughput", 0) or data.get("output_tps", 0),
+            output_tps=(
+                data.get("output_throughput")
+                if data.get("output_throughput") is not None
+                else data.get("output_tps", 0)
+            ),
🤖 Prompt for AI Agents
In `@analysis/srtlog/rollup_harness.py` around lines 349 - 353, The code currently
uses "output_tps=data.get('output_throughput', 0) or data.get('output_tps', 0)"
which treats a valid 0 as missing; update the RollupResult construction to
preserve explicit zeroes by checking for None instead of falsy values — e.g.
compute a value = data.get('output_throughput') if data.get('output_throughput')
is not None else data.get('output_tps') if data.get('output_tps') is not None
else 0, and pass that value as output_tps (similarly use explicit None checks
for any other throughput fields) so RollupResult sees bona fide 0s rather than
overwriting them.
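The None-vs-falsy distinction can be captured in a small helper; pick_output_tps is a hypothetical name for illustration, not a function in the PR:

```python
def pick_output_tps(data: dict) -> float:
    """Prefer output_throughput, then output_tps, then 0 — treating only None as missing."""
    v = data.get("output_throughput")
    if v is None:
        v = data.get("output_tps")
    return 0.0 if v is None else v

# A measured 0.0 survives the explicit None check...
assert pick_output_tps({"output_throughput": 0.0, "output_tps": 57.5}) == 0.0
assert pick_output_tps({"output_tps": 3.0}) == 3.0
assert pick_output_tps({}) == 0.0
# ...but an `or` chain silently replaces it with the fallback
assert ({"output_throughput": 0.0}.get("output_throughput", 0) or 57.5) == 57.5
```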

Comment on lines +464 to +478
def _get_parser_order(self, backend_type: str) -> list[str]:
"""Get the order of parsers to try for a given backend type.

Args:
backend_type: Backend type from config (e.g., "sglang", "trtllm")

Returns:
List of parser types to try in order
"""
parser_orders = {
"sglang": ["sglang"],
"trtllm": ["trtllm"],
}

return parser_orders.get(backend_type, [backend_type])
Contributor

⚠️ Potential issue | 🟠 Major

Add fallback for sglang-v2 when backend is sglang.

The docstring mentions fallback order, but the mapping doesn’t include it.

🛠️ Proposed fix
-        parser_orders = {
-            "sglang": ["sglang"],
-            "trtllm": ["trtllm"],
-        }
+        parser_orders = {
+            "sglang": ["sglang", "sglang-v2"],
+            "sglang-v2": ["sglang-v2", "sglang"],
+            "trtllm": ["trtllm"],
+        }
🤖 Prompt for AI Agents
In `@src/srtctl/cli/mixins/rollup_stage.py` around lines 464 - 478, The
_get_parser_order function's mapping for "sglang" lacks the recommended fallback
to "sglang-v2"; update the parser_orders dict in _get_parser_order(backend_type:
str) to return ["sglang", "sglang-v2"] for the "sglang" key (so it will try the
primary parser then the v2 fallback), and update the function docstring to
mention the sglang-v2 fallback order accordingly.

Comment on lines +370 to +407
def _compute_aggregate_stats(self) -> None:
"""Compute aggregate statistics across all nodes."""
# Prefill aggregation (includes both prefill and agg nodes)
prefill_capable_nodes = [n for n in self.nodes if n.worker_type in ("prefill", "agg")]
if prefill_capable_nodes:
total_new = sum(n.total_new_tokens or 0 for n in prefill_capable_nodes)
total_cached = sum(n.total_cached_tokens or 0 for n in prefill_capable_nodes)

if total_new > 0 or total_cached > 0:
self.total_prefill_tokens = total_new
self.total_cached_tokens = total_cached
total = total_new + total_cached
if total > 0:
self.overall_cache_hit_rate = (total_cached / total) * 100

throughputs = [n.avg_input_throughput for n in prefill_capable_nodes if n.avg_input_throughput]
if throughputs:
self.avg_prefill_input_throughput = sum(throughputs) / len(throughputs)

max_throughputs = [n.max_input_throughput for n in prefill_capable_nodes if n.max_input_throughput]
if max_throughputs:
self.max_prefill_input_throughput = max(max_throughputs)

# Decode aggregation (includes both decode and agg nodes)
decode_capable_nodes = [n for n in self.nodes if n.worker_type in ("decode", "agg")]
if decode_capable_nodes:
throughputs = [n.avg_gen_throughput for n in decode_capable_nodes if n.avg_gen_throughput]
if throughputs:
self.avg_decode_gen_throughput = sum(throughputs) / len(throughputs)

max_throughputs = [n.max_gen_throughput for n in decode_capable_nodes if n.max_gen_throughput]
if max_throughputs:
self.max_decode_gen_throughput = max(max_throughputs)

# Memory aggregation
kv_caches = [n.kv_cache_gb for n in self.nodes if n.kv_cache_gb]
if kv_caches:
self.total_kv_cache_gb = sum(kv_caches)

⚠️ Potential issue | 🟡 Minor

Preserve legitimate zero values in aggregate stats.

Truthy checks drop valid 0 values (e.g., idle nodes), which inflates averages/maxes.

🛠️ Proposed fix
-            throughputs = [n.avg_input_throughput for n in prefill_capable_nodes if n.avg_input_throughput]
+            throughputs = [n.avg_input_throughput for n in prefill_capable_nodes if n.avg_input_throughput is not None]
             if throughputs:
                 self.avg_prefill_input_throughput = sum(throughputs) / len(throughputs)
 
-            max_throughputs = [n.max_input_throughput for n in prefill_capable_nodes if n.max_input_throughput]
+            max_throughputs = [n.max_input_throughput for n in prefill_capable_nodes if n.max_input_throughput is not None]
             if max_throughputs:
                 self.max_prefill_input_throughput = max(max_throughputs)
@@
-            throughputs = [n.avg_gen_throughput for n in decode_capable_nodes if n.avg_gen_throughput]
+            throughputs = [n.avg_gen_throughput for n in decode_capable_nodes if n.avg_gen_throughput is not None]
             if throughputs:
                 self.avg_decode_gen_throughput = sum(throughputs) / len(throughputs)
 
-            max_throughputs = [n.max_gen_throughput for n in decode_capable_nodes if n.max_gen_throughput]
+            max_throughputs = [n.max_gen_throughput for n in decode_capable_nodes if n.max_gen_throughput is not None]
             if max_throughputs:
                 self.max_decode_gen_throughput = max(max_throughputs)
@@
-        kv_caches = [n.kv_cache_gb for n in self.nodes if n.kv_cache_gb]
+        kv_caches = [n.kv_cache_gb for n in self.nodes if n.kv_cache_gb is not None]
         if kv_caches:
             self.total_kv_cache_gb = sum(kv_caches)
🤖 Prompt for AI Agents
In `@src/srtctl/cli/mixins/rollup/models.py` around lines 370-407: the
aggregation logic drops valid zero values because it filters metrics with truthy
checks (e.g., "if n.avg_input_throughput") — update _compute_aggregate_stats to
treat 0 as a valid value by using explicit None checks. For each
throughput/metric list (avg_input_throughput, max_input_throughput,
avg_gen_throughput, max_gen_throughput, kv_cache_gb) change comprehensions to
include values where the field is not None (e.g., [n.avg_input_throughput for n
in prefill_capable_nodes if n.avg_input_throughput is not None]) and compute
averages using that list length so zeros contribute; likewise ensure
total_prefill_tokens/total_cached_tokens and overall_cache_hit_rate are set even
when totals are zero (use explicit comparisons to None rather than truthiness).


@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 1

🤖 Fix all issues with AI agents
In `@src/srtctl/cli/mixins/rollup_stage.py`:
- Around lines 558-563: the mapping for output_tps in the RollupResult
construction currently only reads data.get("output_throughput", 0) which drops
valid values emitted as "output_tps"; update the construction in the loop that
creates RollupResult (the RollupResult(...) call) to use a fallback that
preserves either key, e.g. read output_throughput first then fallback to
data.get("output_tps") (or vice versa) before defaulting to 0 so parsers
emitting "output_tps" like mooncake-router are preserved.
♻️ Duplicate comments (5)
src/srtctl/cli/mixins/rollup_stage.py (3)

49-52: Make endpoints abstract instead of returning None.

The ellipsis body returns None, violating the declared return type and causing type-checker failures.

🔧 Proposed fix
    @property
     def endpoints(self) -> list[Endpoint]:
         """Endpoint allocation topology."""
-        ...
+        raise NotImplementedError("Subclass must provide endpoints property")
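The runtime difference can be shown with a small sketch (class and endpoint names are hypothetical, not the real mixin): an `...` body silently yields None, while raising surfaces the missing override immediately.

```python
class RollupHost:
    @property
    def endpoints(self) -> list[str]:
        # A body of `...` would implicitly return None here; raising turns a
        # latent type violation into an explicit, immediate error.
        raise NotImplementedError("Subclass must provide endpoints property")

class ConcreteHost(RollupHost):
    @property
    def endpoints(self) -> list[str]:
        return ["node-0:8000", "node-1:8000"]

print(ConcreteHost().endpoints)  # ['node-0:8000', 'node-1:8000']
```

Accessing `RollupHost().endpoints` directly raises NotImplementedError instead of handing callers a None that fails later.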

129-149: Fix type-checker errors for optional benchmark parser hooks.

BenchmarkParserProtocol doesn’t define find_aiperf_results / parse_result_directory; hasattr still leaves the calls untyped. Use getattr+callable or extend the protocol.

🔧 Proposed fix (getattr + callable)
             # Try parser-specific result collection first
             if parser is not None:
                 # For mooncake-router, look for AIPerf results
-                if hasattr(parser, "find_aiperf_results"):
-                    aiperf_files = parser.find_aiperf_results(self.runtime.log_dir)
+                find_aiperf = getattr(parser, "find_aiperf_results", None)
+                if callable(find_aiperf):
+                    aiperf_files = find_aiperf(self.runtime.log_dir)
                     for aiperf_path in aiperf_files:
                         result = parser.parse_result_json(aiperf_path)
                         if result.get("output_tps") is not None:
                             results.append(result)
                             logger.debug("Loaded AIPerf result: %s", aiperf_path)

                 # For sa-bench style, look for result directories
-                if hasattr(parser, "parse_result_directory"):
+                parse_result_dir = getattr(parser, "parse_result_directory", None)
+                if callable(parse_result_dir):
                     for entry in self.runtime.log_dir.iterdir():
                         if not entry.is_dir():
                             continue
                         # Match patterns like sa-bench_isl_X_osl_Y
                         if "_isl_" in entry.name and "_osl_" in entry.name:
                             logger.debug("Found benchmark results directory: %s", entry.name)
-                            dir_results = parser.parse_result_directory(entry)
+                            dir_results = parse_result_dir(entry)
                             results.extend(dir_results)
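The pattern the fix relies on can be sketched in isolation (the parser classes below are hypothetical stand-ins, not the real protocol): `getattr` with a `callable` guard probes for an optional hook in a way type checkers can follow.

```python
class BasicParser:
    """Implements only the core parsing surface."""
    def parse_result_json(self, path: str) -> dict:
        return {"path": path}

class AiperfParser(BasicParser):
    """Also exposes the optional result-discovery hook."""
    def find_aiperf_results(self, log_dir: str) -> list[str]:
        return [f"{log_dir}/profile_export.json"]

def collect(parser: BasicParser, log_dir: str) -> list[str]:
    # getattr returns None when the hook is absent; callable() also guards
    # against a same-named attribute that isn't actually a method.
    find_aiperf = getattr(parser, "find_aiperf_results", None)
    if callable(find_aiperf):
        return find_aiperf(log_dir)
    return []

print(collect(BasicParser(), "/logs"))   # []
print(collect(AiperfParser(), "/logs"))  # ['/logs/profile_export.json']
```

Binding the attribute once also means the narrowed local is what gets called, which is what satisfies the type checker.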

465-479: Add sglang-v2 fallback in parser order.

Without this, sglang backends won’t try the v2 parser even when it’s available.

🔧 Proposed fix
         parser_orders = {
-            "sglang": ["sglang"],
-            "trtllm": ["trtllm"],
+            "sglang": ["sglang", "sglang-v2"],
+            "sglang-v2": ["sglang-v2", "sglang"],
+            "trtllm": ["trtllm"],
         }
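What the ordered fallback buys can be sketched with a toy registry (the registry contents here are illustrative, not the real parser objects):

```python
# Hypothetical registry: names -> parser identifiers.
PARSER_REGISTRY = {"sglang": "SGLangParser", "sglang-v2": "SGLangV2Parser"}

def parser_order(backend_type: str) -> list[str]:
    """Return registered parser names to try, in priority order."""
    orders = {
        "sglang": ["sglang", "sglang-v2"],
        "sglang-v2": ["sglang-v2", "sglang"],
        "trtllm": ["trtllm"],
    }
    # Unknown backends fall back to a single same-named lookup.
    candidates = orders.get(backend_type, [backend_type])
    return [name for name in candidates if name in PARSER_REGISTRY]

print(parser_order("sglang"))  # ['sglang', 'sglang-v2']
print(parser_order("trtllm"))  # [] (not registered in this toy registry)
```

Without the second entry, an sglang backend whose logs only match the v2 timestamp format would simply yield no node metrics.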
analysis/srtlog/rollup_harness.py (2)

45-46: Use homogeneous sort keys for job directory ordering.

Mixed int/str sort keys raise TypeError when both numeric and non-numeric job IDs exist.

🔧 Proposed fix
-    job_dirs.sort(key=lambda p: (int(p.name) if p.name.isdigit() else p.name))
+    job_dirs.sort(key=lambda p: (0, int(p.name)) if p.name.isdigit() else (1, p.name))
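A quick demonstration of why the tuple key works (directory names below are made up): tuple comparison short-circuits on the group element, so an int is never compared against a str.

```python
names = ["12345", "debug", "678", "archive"]

# Mixed int/str keys: the first int-vs-str comparison raises TypeError.
try:
    sorted(names, key=lambda n: int(n) if n.isdigit() else n)
    mixed_ok = True
except TypeError:
    mixed_ok = False

# Tuple keys: group 0 (numeric) sorts before group 1 (other); within a group
# the second element is always one type, so comparisons stay homogeneous.
ordered = sorted(names, key=lambda n: (0, int(n)) if n.isdigit() else (1, n))

print(mixed_ok)  # False
print(ordered)   # ['678', '12345', 'archive', 'debug']
```

Numeric job IDs also sort numerically (678 before 12345) rather than lexicographically, which string-only keys would get wrong.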

351-354: Don’t treat valid 0 throughput as missing.

Using `or` overwrites explicit zero values; check for None instead.

🔧 Proposed fix
-            output_tps=data.get("output_throughput", 0) or data.get("output_tps", 0),
+            output_tps=(
+                data.get("output_throughput")
+                if data.get("output_throughput") is not None
+                else data.get("output_tps", 0)
+            ),
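The difference between `or` and an explicit None check, in isolation (the helper name is hypothetical):

```python
def pick_output_tps(data: dict) -> float:
    # `a or b` treats a measured 0 as missing; the None check preserves it.
    v = data.get("output_throughput")
    return v if v is not None else data.get("output_tps", 0)

print(pick_output_tps({"output_throughput": 0.0, "output_tps": 42.0}))  # 0.0
print(pick_output_tps({"output_tps": 42.0}))                            # 42.0
print(pick_output_tps({}))                                              # 0
```

With `or`, the first call would return 42.0, replacing a real measurement with an unrelated key's value.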

Comment on lines +558 to +563
for data in results:
result = RollupResult(
concurrency=data.get("max_concurrency", 0),
output_tps=data.get("output_throughput", 0),
total_tps=data.get("total_token_throughput"),
request_throughput=data.get("request_throughput"),

⚠️ Potential issue | 🟠 Major

🧩 Analysis chain

🏁 Script executed:

# First, let's examine the file at the specified lines
head -580 src/srtctl/cli/mixins/rollup_stage.py | tail -40

Repository: ishandhanani/srt-slurm

Length of output: 1844


🏁 Script executed:

# Find and examine the RollupResult dataclass definition
rg -n "class RollupResult" src/

Repository: ishandhanani/srt-slurm

Length of output: 128


🏁 Script executed:

# Search for output_tps and output_throughput usage in the codebase
rg -n "output_tps|output_throughput" src/

Repository: ishandhanani/srt-slurm

Length of output: 1567


🏁 Script executed:

# Examine the RollupResult dataclass definition
sed -n '411,430p' src/srtctl/cli/mixins/rollup/models.py

Repository: ishandhanani/srt-slurm

Length of output: 681


🏁 Script executed:

# Check the context where data comes from - look at line 136 context
sed -n '130,145p' src/srtctl/cli/mixins/rollup_stage.py

Repository: ishandhanani/srt-slurm

Length of output: 973


🏁 Script executed:

# Look for where 'results' data is populated or parsed
rg -n "def.*results|results\s*=" src/srtctl/cli/mixins/rollup_stage.py | head -20

Repository: ishandhanani/srt-slurm

Length of output: 302


🏁 Script executed:

# Check if there are any parsers that produce "output_tps" in the data
rg -B5 -A5 '"output_tps"' src/srtctl/cli/mixins/rollup_stage.py

Repository: ishandhanani/srt-slurm

Length of output: 722


🏁 Script executed:

# Find all parsers to understand data sources
rg -n "class.*Parser|def parse_result" src/srtctl/ | grep -i parser

Repository: ishandhanani/srt-slurm

Length of output: 48


🏁 Script executed:

# Check the sa-bench benchmark_serving.py to see what it outputs
sed -n '70,85p' src/srtctl/benchmarks/scripts/sa-bench/benchmark_serving.py

Repository: ishandhanani/srt-slurm

Length of output: 432


🏁 Script executed:

# Check the AIPerf parser or mooncake parser
fd -e "\.py$" "aiperf\|mooncake" src/srtctl/

Repository: ishandhanani/srt-slurm

Length of output: 48


🏁 Script executed:

# Search for parse_result_json implementations
rg -B3 -A10 "def parse_result_json" src/srtctl/

Repository: ishandhanani/srt-slurm

Length of output: 48


🏁 Script executed:

# Look for all result parsing logic
rg -n "def _collect_benchmark_results" -A50 src/srtctl/cli/mixins/rollup_stage.py | head -80

Repository: ishandhanani/srt-slurm

Length of output: 2635


🏁 Script executed:

# Check what parse_result_directory returns
rg -n "def parse_result_directory" src/ -B2 -A15

Repository: ishandhanani/srt-slurm

Length of output: 48


🏁 Script executed:

# Look for mooncake or AIPerf parser
fd -e "\.py$" src/ | xargs rg -l "aiperf\|AIPerf" | head -5

Repository: ishandhanani/srt-slurm

Length of output: 387


🏁 Script executed:

# Search for where output_tps key comes from in parsing
rg -n '"output_tps"' src/srtctl/cli/mixins/rollup_stage.py -B5 -A5

Repository: ishandhanani/srt-slurm

Length of output: 766


🏁 Script executed:

# Check if there's any transformation of keys between AIPerf result and the mapping
rg -n "parse_result_json" src/srtctl/ -B3 -A20 | head -60

Repository: ishandhanani/srt-slurm

Length of output: 2366


🏁 Script executed:

# Verify the complete flow - check analysis.srtlog.parsers location
find . -name "parsers.py" | head -5

Repository: ishandhanani/srt-slurm

Length of output: 48


🏁 Script executed:

# Check if AIPerf results transformation happens anywhere
rg -n "AIPerf\|aiperf" src/srtctl/cli/mixins/rollup_stage.py

Repository: ishandhanani/srt-slurm

Length of output: 48


🏁 Script executed:

# Look at the complete mapping code to understand the flow
sed -n '555,600p' src/srtctl/cli/mixins/rollup_stage.py

Repository: ishandhanani/srt-slurm

Length of output: 2033


🏁 Script executed:

# Check the fallback JSON parsing section to understand all data sources
sed -n '155,200p' src/srtctl/cli/mixins/rollup_stage.py

Repository: ishandhanani/srt-slurm

Length of output: 1973


🏁 Script executed:

# Look for any key transformation or normalization before line 561
sed -n '520,565p' src/srtctl/cli/mixins/rollup_stage.py

Repository: ishandhanani/srt-slurm

Length of output: 1935


🏁 Script executed:

# Verify the sa-bench output structure that produces "output_throughput"
sed -n '690,710p' src/srtctl/benchmarks/scripts/sa-bench/benchmark_serving.py

Repository: ishandhanani/srt-slurm

Length of output: 991


🏁 Script executed:

# Check if there are any other parsers or places where results are transformed
rg -n "output_tps|output_throughput" src/srtctl/cli/mixins/rollup_stage.py -B2 -A2

Repository: ishandhanani/srt-slurm

Length of output: 967


🏁 Script executed:

# Let me verify if AIPerf results are actually added and would go through line 561
# Check what parse_result_json returns for AIPerf
rg -n "find_aiperf_results\|parse_result_json" src/srtctl/ -B5 -A15 | head -80

Repository: ishandhanani/srt-slurm

Length of output: 48


🏁 Script executed:

# Confirm the exact data structure by looking at where both parsers are used
rg -n "output_tps" src/ --type py | grep -v README

Repository: ishandhanani/srt-slurm

Length of output: 818


🏁 Script executed:

# Let me trace through the exact flow - check if AIPerf results with output_tps key reach the mapping
sed -n '131,140p' src/srtctl/cli/mixins/rollup_stage.py

Repository: ishandhanani/srt-slurm

Length of output: 660


🏁 Script executed:

# Verify that AIPerf results are indeed added to results and then processed at line 561
sed -n '135,165p' src/srtctl/cli/mixins/rollup_stage.py

Repository: ishandhanani/srt-slurm

Length of output: 1604


🏁 Script executed:

# Confirm the issue: check if there's any key transformation between AIPerf parse and the mapping code
rg -n "results.append\|results.extend" src/srtctl/cli/mixins/rollup_stage.py

Repository: ishandhanani/srt-slurm

Length of output: 48


🏁 Script executed:

# Final verification: Check if AIPerf results indeed bypass the filtering at line 136 and reach the mapping
# Let me trace the complete flow - results list is populated from AIPerf, then passed to line 555+
sed -n '555,565p' src/srtctl/cli/mixins/rollup_stage.py

Repository: ishandhanani/srt-slurm

Length of output: 590


🏁 Script executed:

# Verify that the RollupResult dataclass expects float for output_tps (not None)
sed -n '411,425p' src/srtctl/cli/mixins/rollup/models.py

Repository: ishandhanani/srt-slurm

Length of output: 538


Preserve output_tps when output_throughput is missing.

AIPerf results use the output_tps key but the current mapping at line 561 only checks for output_throughput, causing valid throughput data to be lost when mapped to 0. This affects results from parsers like mooncake-router that emit output_tps directly.

🔧 Proposed fix
             result = RollupResult(
                 concurrency=data.get("max_concurrency", 0),
-                output_tps=data.get("output_throughput", 0),
+                output_tps=(
+                    data.get("output_throughput")
+                    if data.get("output_throughput") is not None
+                    else data.get("output_tps", 0)
+                ),
🤖 Prompt for AI Agents
In `@src/srtctl/cli/mixins/rollup_stage.py` around lines 558-563: the mapping
for output_tps in the RollupResult construction currently only reads
data.get("output_throughput", 0) which drops valid values emitted as
"output_tps"; update the construction in the loop that creates RollupResult (the
RollupResult(...) call) to use a fallback that preserves either key, e.g. read
output_throughput first then fallback to data.get("output_tps") (or vice versa)
before defaulting to 0 so parsers emitting "output_tps" like mooncake-router are
preserved.


@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 2

🤖 Fix all issues with AI agents
In `@docs/architecture.md`:
- Around lines 929-1014: the fenced ASCII diagram block currently uses an
untagged code fence (```); to satisfy markdownlint MD040, update the opening
fence in the "Detailed Entity Relationships" diagram (the large block that
begins with the ASCII box showing PARSER LAYER / ROLLUP LAYER) to use a language
tag by replacing the opening ``` with ```text so the block starts with ```text
while leaving the content and closing fence unchanged.
- Around lines 882-925: the ASCII ER diagram fenced code block starting with the
box diagram (the block that begins with the large
"┌─────────────────────────────────────────────────────────────┐" line) is
missing a language tag; change the opening triple-backtick for that block to
```text so the fenced block reads with the text language tag and satisfies
Markdownlint MD040. Ensure only the opening fence is modified (leave the closing
``` unchanged).
♻️ Duplicate comments (1)
docs/architecture.md (1)

822-826: Add a language tag to the Rollup architecture fenced block.

Markdownlint MD040 is triggered here; set the opening fence to ```text.

Comment thread docs/architecture.md
Comment on lines +882 to +925
```
┌─────────────────────────────────────────────────────────────┐
│ LOG FILES │
└─────────────────────────────────────────────────────────────┘
┌─────────────────────────────────────────┼─────────────────────────────────────────┐
│ │ │
▼ ▼ ▼
┌───────────────────┐ ┌───────────────────┐ ┌───────────────────┐
│ worker_*.out/err │ │ benchmark.out │ │ config.yaml │
└───────────────────┘ └───────────────────┘ └───────────────────┘
│ │ │
▼ ▼ ▼
┌───────────────────────────┐ ┌───────────────────────────┐ ┌───────────────────────────┐
│ NodeParserProtocol │ │ BenchmarkParserProtocol │ │ YAML Loader │
│ (sglang.py / trtllm.py) │ │ (sa_bench.py / etc.) │ │ │
└───────────────────────────┘ └───────────────────────────┘ └───────────────────────────┘
│ │ │
┌────────┴────────┐ ┌────────┴────────┐ │
▼ ▼ ▼ ▼ ▼
┌─────────────────┐ ┌─────────────────┐ ┌─────────────────┐ ┌─────────────────┐ ┌─────────────────────┐
│ NodeMetrics │ │NodeLaunchCommand│ │ dict (result) │ │BenchmarkLaunch │ │ EnvironmentConfig │
│ (models.py) │ │ (__init__.py) │ │ │ │ Command │ │ (rollup/models.py) │
└─────────────────┘ └─────────────────┘ └─────────────────┘ └─────────────────┘ └─────────────────────┘
│ │ │ │ │
└────────────────────┼─────────────────────┼────────────────────┼──────────────────────────┘
│ │ │
▼ ▼ ▼
┌────────────────────────────────────────────────────────────┐
│ RollupStageMixin │
│ (rollup_stage.py / rollup_harness.py) │
└────────────────────────────────────────────────────────────┘
│ transforms
┌───────────────┐
│ RollupSummary │
└───────┬───────┘
┌───────────────┐
│ rollup.json │
└───────────────┘
```

⚠️ Potential issue | 🟡 Minor

Add a language tag to the ER diagram fenced block.

Markdownlint MD040 is triggered; set the opening fence to ```text.

🛠️ Proposed fix
-```
+```text
🧰 Tools
🪛 markdownlint-cli2 (0.18.1)

882-882: Fenced code blocks should have a language specified

(MD040, fenced-code-language)

🤖 Prompt for AI Agents
In `@docs/architecture.md` around lines 882-925: the ASCII ER diagram fenced
code block starting with the box diagram (the block that begins with the large
"┌─────────────────────────────────────────────────────────────┐" line) is
missing a language tag; change the opening triple-backtick for that block to
```text so the fenced block reads with the text language tag and satisfies
Markdownlint MD040. Ensure only the opening fence is modified (leave the closing
``` unchanged).

Comment thread docs/architecture.md
Comment on lines +929 to +1014
```
┌─────────────────────────────────────────────────────────────────────────────────────────┐
│ PARSER LAYER (analysis/srtlog/parsers/) │
├─────────────────────────────────────────────────────────────────────────────────────────┤
│ │
│ ┌────────────────────┐ ┌────────────────────┐ ┌────────────────────┐ │
│ │ NodeLaunchCommand │ │ BenchmarkLaunch │ │ NodeMetrics │ │
│ ├────────────────────┤ │ Command │ ├────────────────────┤ │
│ │ backend_type: str │ ├────────────────────┤ │ node_info: dict │ │
│ │ worker_type: str │ │ benchmark_type: str│ │ config: dict │ │
│ │ raw_command: str │ │ raw_command: str │ │ batches: list │ │
│ │ extra_args: dict │ │ extra_args: dict │ │ memory_snapshots │ │
│ └────────────────────┘ └────────────────────┘ └────────────────────┘ │
│ │
└─────────────────────────────────────────────────────────────────────────────────────────┘
┌─────────────────────────────────────────────────────────────────────────────────────────┐
│ ROLLUP LAYER (srtctl/cli/mixins/rollup/) │
├─────────────────────────────────────────────────────────────────────────────────────────┤
│ │
│ ┌─────────────────────────────────────────────────────────────────────────────────┐ │
│ │ LaunchCommandRollup │ │
│ ├─────────────────────────────────────────────────────────────────────────────────┤ │
│ │ raw_command: str │ command_type: str ("worker" | "benchmark") │ │
│ │ model_path: str | None │ served_model_name: str | None │ │
│ │ worker_type: str | None │ backend_type: str | None │ │
│ │ tp_size: int | None │ pp_size: int | None │ │
│ │ dp_size: int | None │ ep_size: int | None │ │
│ │ max_num_seqs: int | None │ max_model_len: int | None │ │
│ └─────────────────────────────────────────────────────────────────────────────────┘ │
│ │ 1:1 │
│ ▼ │
│ ┌─────────────────────────────────────────────────────────────────────────────────┐ │
│ │ NodeRollup │ │
│ ├─────────────────────────────────────────────────────────────────────────────────┤ │
│ │ node_name: str │ worker_type: str │ │
│ │ worker_id: str │ tp_size, pp_size, dp_size, ep_size: int | None │ │
│ │ launch_command: LaunchCommandRollup | None │ │
│ │ avail_mem_gb: float | None │ kv_cache_gb: float | None │ │
│ │ total_batches: int │ avg_gen_throughput: float | None │ │
│ └─────────────────────────────────────────────────────────────────────────────────┘ │
│ │ 1:N │
│ ▼ │
│ ┌─────────────────────────────────────────────────────────────────────────────────┐ │
│ │ NodesSummary │ │
│ ├─────────────────────────────────────────────────────────────────────────────────┤ │
│ │ nodes: list[NodeRollup] │ │
│ │ total_prefill_nodes: int │ total_decode_nodes: int │ │
│ │ total_agg_nodes: int │ total_kv_cache_gb: float | None │ │
│ └─────────────────────────────────────────────────────────────────────────────────┘ │
│ │
│ ┌─────────────────────────────────────────────────────────────────────────────────┐ │
│ │ RollupResult │ │
│ ├─────────────────────────────────────────────────────────────────────────────────┤ │
│ │ max_concurrency: int │ input_len: int │ output_len: int │ │
│ │ output_tps: float │ request_tps: float │ total_tokens: int │ │
│ │ avg_ttft_ms: float │ avg_tpot_ms: float │ p50/p90/p99 metrics │ │
│ └─────────────────────────────────────────────────────────────────────────────────┘ │
│ │
│ ┌─────────────────────────────────────────────────────────────────────────────────┐ │
│ │ EnvironmentConfig │ │
│ ├─────────────────────────────────────────────────────────────────────────────────┤ │
│ │ prefill_environment: dict │ decode_environment: dict │ │
│ │ aggregated_environment: dict│ prefill_engine_config: dict │ │
│ │ decode_engine_config: dict │ aggregated_engine_config: dict │ │
│ └─────────────────────────────────────────────────────────────────────────────────┘ │
│ │
│ ┌──────────────────────┬──────────────────────┐ │
│ │ │ │ │
│ ▼ ▼ ▼ │
│ ┌─────────────────────────────────────────────────────────────────────────────────┐ │
│ │ RollupSummary │ │
│ ├─────────────────────────────────────────────────────────────────────────────────┤ │
│ │ job_id: str │ model_name: str │ backend_type: str │ │
│ │ benchmark_type: str │ is_disaggregated: bool │ total_nodes: int │ │
│ │ max_output_tps: float │ max_request_tps: float │ │
│ │ results: list[RollupResult] │ │
│ │ nodes_summary: NodesSummary | None │ │
│ │ benchmark_command: LaunchCommandRollup | None │ │
│ │ environment_config: EnvironmentConfig | None │ │
│ │ tags: dict │ timestamp: str │ │
│ └─────────────────────────────────────────────────────────────────────────────────┘ │
│ │
└─────────────────────────────────────────────────────────────────────────────────────────┘
```

⚠️ Potential issue | 🟡 Minor

Add a language tag to the “Detailed Entity Relationships” fenced block.

Markdownlint MD040 is triggered; set the opening fence to ```text.

🛠️ Proposed fix
-```
+```text
🧰 Tools
🪛 markdownlint-cli2 (0.18.1)

929-929: Fenced code blocks should have a language specified

(MD040, fenced-code-language)

🤖 Prompt for AI Agents
In `@docs/architecture.md` around lines 929-1014: the fenced ASCII diagram block
currently uses an untagged code fence (```); to satisfy markdownlint MD040,
update the opening fence in the "Detailed Entity Relationships" diagram (the
large block that begins with the ASCII box showing PARSER LAYER / ROLLUP LAYER)
to use a language tag by replacing the opening ``` with ```text so the block
starts with ```text while leaving the content and closing fence unchanged.


@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 1

🤖 Fix all issues with AI agents
In `@analysis/srtlog/parsers/benchmark/mooncake_router.py`:
- Around lines 126-129: the truthy check on tps_per_user will drop legitimate
zero values; update the conditional in the block that calls self._get_metric
(variable tps_per_user) to check for None explicitly (e.g., "is not None")
before assigning result["output_tps_per_user"] so that 0 is preserved while
still filtering missing values returned as None.
♻️ Duplicate comments (7)
src/srtctl/cli/mixins/rollup/models.py (1)

394-416: Truthy checks drop valid zero values in aggregate stats.

The list comprehensions use truthy filtering (e.g., if n.avg_input_throughput), which excludes legitimate 0 values. This can inflate averages or miss valid zero-throughput nodes.

Proposed fix
-            throughputs = [n.avg_input_throughput for n in prefill_capable_nodes if n.avg_input_throughput]
+            throughputs = [n.avg_input_throughput for n in prefill_capable_nodes if n.avg_input_throughput is not None]
             if throughputs:
                 self.avg_prefill_input_throughput = sum(throughputs) / len(throughputs)

-            max_throughputs = [n.max_input_throughput for n in prefill_capable_nodes if n.max_input_throughput]
+            max_throughputs = [n.max_input_throughput for n in prefill_capable_nodes if n.max_input_throughput is not None]
             if max_throughputs:
                 self.max_prefill_input_throughput = max(max_throughputs)

         # Decode aggregation (includes both decode and agg nodes)
         decode_capable_nodes = [n for n in self.nodes if n.worker_type in ("decode", "agg")]
         if decode_capable_nodes:
-            throughputs = [n.avg_gen_throughput for n in decode_capable_nodes if n.avg_gen_throughput]
+            throughputs = [n.avg_gen_throughput for n in decode_capable_nodes if n.avg_gen_throughput is not None]
             if throughputs:
                 self.avg_decode_gen_throughput = sum(throughputs) / len(throughputs)

-            max_throughputs = [n.max_gen_throughput for n in decode_capable_nodes if n.max_gen_throughput]
+            max_throughputs = [n.max_gen_throughput for n in decode_capable_nodes if n.max_gen_throughput is not None]
             if max_throughputs:
                 self.max_decode_gen_throughput = max(max_throughputs)

         # Memory aggregation
-        kv_caches = [n.kv_cache_gb for n in self.nodes if n.kv_cache_gb]
+        kv_caches = [n.kv_cache_gb for n in self.nodes if n.kv_cache_gb is not None]
         if kv_caches:
             self.total_kv_cache_gb = sum(kv_caches)
analysis/srtlog/rollup_harness.py (2)

31-47: Mixed-type sort keys can cause TypeError.

The sort key returns int for numeric directory names and str for non-numeric ones. Comparing int to str raises TypeError in Python 3.

Proposed fix
     # Sort by job ID (directory name) if numeric
-    job_dirs.sort(key=lambda p: (int(p.name) if p.name.isdigit() else p.name))
+    job_dirs.sort(key=lambda p: (0, int(p.name)) if p.name.isdigit() else (1, p.name))
     return job_dirs

The truthy `or` operator drops valid zero throughput values.

data.get("output_throughput", 0) or data.get("output_tps", 0) treats 0 as falsy, so if output_throughput is legitimately 0, it incorrectly falls back to output_tps.

Proposed fix
         result = RollupResult(
             concurrency=data.get("max_concurrency", 0),
-            output_tps=data.get("output_throughput", 0) or data.get("output_tps", 0),
+            output_tps=(
+                data.get("output_throughput")
+                if data.get("output_throughput") is not None
+                else data.get("output_tps", 0)
+            ),
src/srtctl/cli/mixins/rollup_stage.py (4)

49-52: The endpoints property always returns None.

The ellipsis (...) body causes an implicit None return, violating the list[Endpoint] type annotation. Since the mixin expects the concrete class to provide this, raise NotImplementedError.

Proposed fix
    @property
     def endpoints(self) -> list["Endpoint"]:
         """Endpoint allocation topology."""
-        ...
+        raise NotImplementedError("Subclass must provide endpoints property")

Use getattr with a callable check to satisfy the type checker.

The hasattr checks are runtime-safe but the type checker can't verify that find_aiperf_results and parse_result_directory exist on BenchmarkParserProtocol.

Proposed fix using getattr
             if parser is not None:
                 # For mooncake-router, look for AIPerf results
-                if hasattr(parser, "find_aiperf_results"):
-                    aiperf_files = parser.find_aiperf_results(self.runtime.log_dir)
+                find_aiperf = getattr(parser, "find_aiperf_results", None)
+                if find_aiperf is not None:
+                    aiperf_files = find_aiperf(self.runtime.log_dir)
                     for aiperf_path in aiperf_files:
                         result = parser.parse_result_json(aiperf_path)
                         if result.get("output_tps") is not None:
                             results.append(result)
                             logger.debug("Loaded AIPerf result: %s", aiperf_path)

                 # For sa-bench style, look for result directories
-                if hasattr(parser, "parse_result_directory"):
+                parse_result_dir = getattr(parser, "parse_result_directory", None)
+                if parse_result_dir is not None:
                     for entry in self.runtime.log_dir.iterdir():
                         if not entry.is_dir():
                             continue
                         if "_isl_" in entry.name and "_osl_" in entry.name:
                             logger.debug("Found benchmark results directory: %s", entry.name)
-                            dir_results = parser.parse_result_directory(entry)
+                            dir_results = parse_result_dir(entry)
                             results.extend(dir_results)

475-480: Add fallback for sglang-v2 when backend is sglang.

The docstring at line 185 mentions fallback through parser versions, but the mapping doesn't include sglang-v2.

Proposed fix
         parser_orders = {
-            "sglang": ["sglang"],
+            "sglang": ["sglang", "sglang-v2"],
+            "sglang-v2": ["sglang-v2", "sglang"],
             "trtllm": ["trtllm"],
         }

559-563: Preserve output_tps when output_throughput is missing.

Unlike rollup_harness.py, this code doesn't fall back to the output_tps key. AIPerf results use output_tps directly, causing valid throughput data to be lost.

Proposed fix
             result = RollupResult(
                 concurrency=data.get("max_concurrency", 0),
-                output_tps=data.get("output_throughput", 0),
+                output_tps=(
+                    data.get("output_throughput")
+                    if data.get("output_throughput") is not None
+                    else data.get("output_tps", 0)
+                ),
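A quick demonstration of why the explicit None check matters; the result dicts are made up to mirror the output_throughput/output_tps key split:

```python
# `or` cannot distinguish a measured 0 from a missing key, so a stalled run
# (genuine zero throughput) is silently replaced by the fallback value.

def tps_with_or(data):
    return data.get("output_throughput", 0) or data.get("output_tps", 0)

def tps_with_none_check(data):
    v = data.get("output_throughput")
    return v if v is not None else data.get("output_tps", 0)

stalled = {"output_throughput": 0, "output_tps": 512.0}  # measured zero
print(tps_with_or(stalled))          # 512.0 -- the real 0 is overwritten
print(tps_with_none_check(stalled))  # 0 -- zero throughput preserved

aiperf_only = {"output_tps": 512.0}  # key genuinely missing
print(tps_with_none_check(aiperf_only))  # 512.0 -- fallback still works
```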
🧹 Nitpick comments (9)
analysis/srtlog/parsers/benchmark/mooncake_router.py (2)

33-79: Regex-based parsing is a fallback; prefer structured JSON.

The parse method uses regex patterns which can be fragile. The parse_result_json method provides more robust parsing from AIPerf's structured output. Consider documenting that this method is intended as a fallback when JSON results aren't available.


92-94: Specify encoding when opening files.

Opening files without explicit encoding can cause issues on systems with different default encodings.

Proposed fix
-            with open(json_path) as f:
+            with open(json_path, encoding="utf-8") as f:
                 data = json.load(f)
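The same explicit-encoding fix applies to every open() call flagged below; a minimal round-trip showing the pattern:

```python
# Without an encoding argument, open() falls back to
# locale.getpreferredencoding(), which varies across systems. Passing
# encoding="utf-8" on both write and read makes non-ASCII content
# round-trip deterministically.
import json
import os
import tempfile

payload = {"note": "温度"}  # non-ASCII value

fd, path = tempfile.mkstemp(suffix=".json")
os.close(fd)
with open(path, "w", encoding="utf-8") as f:
    json.dump(payload, f, ensure_ascii=False)

with open(path, encoding="utf-8") as f:  # explicit on read, too
    data = json.load(f)

print(data == payload)  # True
os.remove(path)
```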
analysis/srtlog/rollup_harness.py (3)

63-69: Specify encoding when opening YAML files.

Proposed fix
     try:
         import yaml
-        with open(config_path) as f:
+        with open(config_path, encoding="utf-8") as f:
             return yaml.safe_load(f)

291-299: Specify encoding when opening YAML config files.

Proposed fix
         if prefill_yaml.exists():
-            with open(prefill_yaml) as f:
+            with open(prefill_yaml, encoding="utf-8") as f:
                 env_config.prefill_engine_config = yaml.safe_load(f)
         if decode_yaml.exists():
-            with open(decode_yaml) as f:
+            with open(decode_yaml, encoding="utf-8") as f:
                 env_config.decode_engine_config = yaml.safe_load(f)

381-389: Specify encoding when writing JSON output.

Proposed fix
     rollup_path.parent.mkdir(parents=True, exist_ok=True)
-    with open(rollup_path, "w") as f:
+    with open(rollup_path, "w", encoding="utf-8") as f:
         json.dump(asdict(summary), f, indent=2, default=str)
analysis/srtlog/parsers/benchmark/sa_bench.py (1)

101-103: Specify encoding when opening JSON files.

Proposed fix
-            with open(json_path) as f:
+            with open(json_path, encoding="utf-8") as f:
                 data = json.load(f)
src/srtctl/cli/mixins/rollup_stage.py (2)

508-512: Use ternary operator for total_gpus calculation.

Per the pipeline warning, the if-else block can be simplified to a ternary expression.

Proposed fix
-        if is_disaggregated:
-            total_gpus = r.prefill_gpus + r.decode_gpus
-        else:
-            total_gpus = (r.agg_nodes or 1) * r.gpus_per_node
+        total_gpus = r.prefill_gpus + r.decode_gpus if is_disaggregated else (r.agg_nodes or 1) * r.gpus_per_node

597-612: Specify encoding when writing JSON output.

Proposed fix
         # Write with nice formatting
-        with open(path, "w") as f:
+        with open(path, "w", encoding="utf-8") as f:
             json.dump(data, f, indent=2, default=str)
analysis/srtlog/parsers/__init__.py (1)

244-247: Star imports are necessary for auto-registration but consider documenting.

The # noqa comments suppress linting, but a brief comment explaining why star imports are used here would aid maintainability.

Suggested documentation
-# Import parsers to trigger registration
+# Import parsers to trigger registration via decorators.
+# Star imports are intentional: each submodule uses `@register_*_parser` decorators
+# that populate _benchmark_parsers and _node_parsers when the module loads.
 from analysis.srtlog.parsers.benchmark import *  # noqa: E402, F401, F403
 from analysis.srtlog.parsers.nodes import *  # noqa: E402, F401, F403

Comment on lines +126 to +129
# Also extract per-user throughput if available
tps_per_user = self._get_metric(data, "output_token_throughput_per_user", "avg")
if tps_per_user:
result["output_tps_per_user"] = tps_per_user

⚠️ Potential issue | 🟡 Minor

Truthy check may exclude valid zero throughput.

if tps_per_user: will skip adding the field when the value is 0, which may be a valid measurement.

Proposed fix
             tps_per_user = self._get_metric(data, "output_token_throughput_per_user", "avg")
-            if tps_per_user:
+            if tps_per_user is not None:
                 result["output_tps_per_user"] = tps_per_user
🤖 Prompt for AI Agents
In `@analysis/srtlog/parsers/benchmark/mooncake_router.py` around lines 126 - 129,
The truthy check on tps_per_user will drop legitimate zero values; update the
conditional in the block that calls self._get_metric (variable tps_per_user) to
check for None explicitly (e.g., "is not None") before assigning
result["output_tps_per_user"] so that 0 is preserved while still filtering
missing values returned as None.

@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 1

🤖 Fix all issues with AI agents
In `@analysis/srtlog/rollup_harness.py`:
- Around line 276-303: The code mutates an EnvironmentConfig instance
(env_config) which will break once EnvironmentConfig is converted to a frozen
dataclass; instead, build a plain dict of constructor kwargs, populate keys from
config.get("backend", {}), load YAMLs from prefill_yaml and decode_yaml (using
logs_dir) into local variables like prefill_engine_config and
decode_engine_config, then construct a single immutable EnvironmentConfig(...)
with those kwargs and assign it to environment_config; reference
EnvironmentConfig, env_config (replace), environment_config, prefill_yaml,
decode_yaml and logs_dir when locating where to change the code.
♻️ Duplicate comments (5)
docs/architecture.md (3)

822-826: Add a language tag to this fenced block for MD040 compliance.

Markdownlint flags this block; tagging it as text is sufficient.


893-936: Add a language tag to the ER diagram fenced block.

Markdownlint MD040 is triggered; set the opening fence to ```text.


940-1025: Add a language tag to the "Detailed Entity Relationships" fenced block.

Markdownlint MD040 is triggered; set the opening fence to ```text.

analysis/srtlog/rollup_harness.py (2)

45-46: Avoid mixed-type sort keys in job directory ordering.

int and str keys can’t be compared in Python; mixed numeric/non‑numeric directory names will raise TypeError.

🛠️ Proposed fix
-    job_dirs.sort(key=lambda p: (int(p.name) if p.name.isdigit() else p.name))
+    job_dirs.sort(key=lambda p: (0, int(p.name)) if p.name.isdigit() else (1, p.name))
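A demonstration of the mixed-key failure and the tuple-key fix, with plain strings standing in for Path.name values:

```python
# Python 3 refuses to order int against str, so a key function that
# returns int for numeric names and str for the rest raises as soon as
# both kinds appear. Tagging each key with a type bucket (0 or 1) keeps
# every comparison homogeneous.

names = ["10", "baseline", "2"]

try:
    sorted(names, key=lambda n: int(n) if n.isdigit() else n)
except TypeError as e:
    print("mixed keys:", e)

# Numeric names sort first (numerically), then non-numeric (lexically):
ordered = sorted(names, key=lambda n: (0, int(n)) if n.isdigit() else (1, n))
print(ordered)  # ['2', '10', 'baseline']
```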

356-358: Preserve explicit zero throughput values.

The `or` operator treats a valid 0 as missing and overrides it with the fallback value.

🛠️ Proposed fix
-            output_tps=data.get("output_throughput", 0) or data.get("output_tps", 0),
+            output_tps=(
+                data.get("output_throughput")
+                if data.get("output_throughput") is not None
+                else data.get("output_tps", 0)
+            ),

Comment on lines +276 to +303
# Parse environment config
environment_config = None
try:
import yaml
env_config = EnvironmentConfig()

backend_section = config.get("backend", {})
if "prefill_environment" in backend_section:
env_config.prefill_environment = backend_section["prefill_environment"]
if "decode_environment" in backend_section:
env_config.decode_environment = backend_section["decode_environment"]
if "aggregated_environment" in backend_section:
env_config.aggregated_environment = backend_section["aggregated_environment"]

# Load TRTLLM config files
prefill_yaml = logs_dir / "trtllm_config_prefill.yaml"
decode_yaml = logs_dir / "trtllm_config_decode.yaml"

if prefill_yaml.exists():
with open(prefill_yaml) as f:
env_config.prefill_engine_config = yaml.safe_load(f)
if decode_yaml.exists():
with open(decode_yaml) as f:
env_config.decode_engine_config = yaml.safe_load(f)

if any([env_config.prefill_environment, env_config.decode_environment,
env_config.prefill_engine_config, env_config.decode_engine_config]):
environment_config = env_config

⚠️ Potential issue | 🟠 Major

🧩 Analysis chain

🏁 Script executed:

#!/bin/bash
# Find EnvironmentConfig definition
rg -n "class EnvironmentConfig" --type py -C 5

Repository: ishandhanani/srt-slurm

Length of output: 886


🏁 Script executed:

#!/bin/bash
# Look at the actual code in rollup_harness.py to see the context
fd "rollup_harness.py" --type f | head -1 | xargs wc -l

Repository: ishandhanani/srt-slurm

Length of output: 103


🏁 Script executed:

#!/bin/bash
# Get full EnvironmentConfig definition
rg -n "class EnvironmentConfig" --type py -A 30 src/srtctl/cli/mixins/rollup/models.py

Repository: ishandhanani/srt-slurm

Length of output: 1441


🏁 Script executed:

#!/bin/bash
# Check the actual code in rollup_harness.py at lines 276-303
sed -n '276,303p' analysis/srtlog/rollup_harness.py

Repository: ishandhanani/srt-slurm

Length of output: 1368


EnvironmentConfig must follow frozen dataclass guideline and use factory pattern for construction.

Per coding guidelines, EnvironmentConfig should use @dataclass(frozen=True) for immutability. The current mutation pattern will break once frozen is applied. Build the configuration via kwargs and construct once:

Fix
-        env_config = EnvironmentConfig()
+        env_kwargs: dict[str, Any] = {}
         
         backend_section = config.get("backend", {})
         if "prefill_environment" in backend_section:
-            env_config.prefill_environment = backend_section["prefill_environment"]
+            env_kwargs["prefill_environment"] = backend_section["prefill_environment"]
         if "decode_environment" in backend_section:
-            env_config.decode_environment = backend_section["decode_environment"]
+            env_kwargs["decode_environment"] = backend_section["decode_environment"]
         if "aggregated_environment" in backend_section:
-            env_config.aggregated_environment = backend_section["aggregated_environment"]
+            env_kwargs["aggregated_environment"] = backend_section["aggregated_environment"]
             
         # Load TRTLLM config files
         prefill_yaml = logs_dir / "trtllm_config_prefill.yaml"
         decode_yaml = logs_dir / "trtllm_config_decode.yaml"
         
         if prefill_yaml.exists():
             with open(prefill_yaml) as f:
-                env_config.prefill_engine_config = yaml.safe_load(f)
+                env_kwargs["prefill_engine_config"] = yaml.safe_load(f)
         if decode_yaml.exists():
             with open(decode_yaml) as f:
-                env_config.decode_engine_config = yaml.safe_load(f)
+                env_kwargs["decode_engine_config"] = yaml.safe_load(f)
                 
-        if any([env_config.prefill_environment, env_config.decode_environment, 
-                env_config.prefill_engine_config, env_config.decode_engine_config]):
-            environment_config = env_config
+        if env_kwargs:
+            environment_config = EnvironmentConfig(**env_kwargs)
🤖 Prompt for AI Agents
In `@analysis/srtlog/rollup_harness.py` around lines 276 - 303, The code mutates
an EnvironmentConfig instance (env_config) which will break once
EnvironmentConfig is converted to a frozen dataclass; instead, build a plain
dict of constructor kwargs, populate keys from config.get("backend", {}), load
YAMLs from prefill_yaml and decode_yaml (using logs_dir) into local variables
like prefill_engine_config and decode_engine_config, then construct a single
immutable EnvironmentConfig(...) with those kwargs and assign it to
environment_config; reference EnvironmentConfig, env_config (replace),
environment_config, prefill_yaml, decode_yaml and logs_dir when locating where
to change the code.
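A runnable sketch of the build-kwargs-then-construct-once pattern the review proposes; the EnvironmentConfig here is a stand-in with assumed field names:

```python
# Accumulate constructor arguments in a plain dict, then build a single
# frozen instance. Mutation after construction raises FrozenInstanceError
# (a subclass of AttributeError), so stale in-place edits fail loudly.
from __future__ import annotations

from dataclasses import dataclass
from typing import Any

@dataclass(frozen=True)
class EnvironmentConfig:
    prefill_environment: dict[str, Any] | None = None
    decode_environment: dict[str, Any] | None = None
    prefill_engine_config: dict[str, Any] | None = None
    decode_engine_config: dict[str, Any] | None = None

# Hypothetical parsed config section:
backend_section = {"prefill_environment": {"CUDA_VISIBLE_DEVICES": "0"}}

env_kwargs: dict[str, Any] = {}
for key in ("prefill_environment", "decode_environment"):
    if key in backend_section:
        env_kwargs[key] = backend_section[key]

environment_config = EnvironmentConfig(**env_kwargs) if env_kwargs else None
print(environment_config)

try:
    environment_config.decode_environment = {}  # must fail on a frozen class
except AttributeError as e:
    print("frozen:", e)
```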

@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 2

🤖 Fix all issues with AI agents
In `@analysis/scripts/upload_rollup.json`:
- Around line 1-2: The file is misnamed with a .json extension while containing
a Python script; rename upload_rollup.json to upload_rollup.py, preserve the
shebang (#!/usr/bin/env python3) and module docstring, and update any references
(scripts, CI configs, imports, README, or tooling) that point to
upload_rollup.json to the new filename upload_rollup.py so execution, linting,
and IDE tooling work correctly.
- Around line 170-182: The code is duplicating launch command data:
extract_launch_commands(data) and the loop populates top_level with keys like
"launch_commands.{worker_type}" (see extract_launch_commands and top_level
usage), but later output.update(launch_commands) re-inserts the raw lists under
plain worker_type keys; remove the output.update(launch_commands) call so only
the intended "launch_commands.{worker_type}" string entries are used (or if you
intended a nested dict, replace that line with output["launch_commands"] =
launch_commands); update tests/consumers that expect only the string keys
accordingly.
🧹 Nitpick comments (3)
analysis/scripts/upload_rollup.json (3)

78-81: Consider logging invalid concurrency values instead of silently ignoring them.

Silently swallowing ValueError can hide data quality issues. Consider logging a warning when a concurrency part cannot be parsed.

♻️ Suggested improvement
+import logging
+
+logger = logging.getLogger(__name__)
+
 def parse_concurrencies(concurrencies_str: str) -> list[int]:
     ...
             try:
                 result.append(int(part))
             except ValueError:
-                pass
+                logger.warning(f"Invalid concurrency value ignored: '{part}'")
     return result

218-234: File read twice unnecessarily.

The JSON file is read at lines 219-220 to extract metadata, then read again at line 234 for upload. Consider reading once and reusing the content.

♻️ Proposed optimization
     try:
-        with open(json_path) as f:
-            data = json.load(f)
+        json_content = json_path.read_text()
+        data = json.loads(json_content)
     except Exception as e:
         return False, f"Failed to read JSON: {e}"
 
     # Extract metadata
     benchmark_type = data.get("benchmark_type", "unknown")
     frontend_type = data.get("frontend_type", "unknown")
     backend_type = data.get("backend_type", "unknown")
     job_name = data.get("job_name", "unknown")
 
     # Use provided session_id or fall back to job_name
     effective_session_id = session_id if session_id else job_name
 
-    # Read the file content for upload
-    json_content = json_path.read_text()
-
     try:
         response = requests.post(

367-372: Consider logging cleanup failures instead of silently ignoring.

The blanket except Exception: pass hides all errors during file cleanup. While cleanup failures may not be critical, logging them could help diagnose permission issues or other problems.

♻️ Suggested improvement
     if not args.keep_files and not args.dry_run:
         for fp in generated_files:
             try:
                 fp.unlink()
-            except Exception:
-                pass
+            except Exception as e:
+                print(f"Warning: Failed to clean up {fp}: {e}", file=sys.stderr)
         print(f"Cleaned up {len(generated_files)} generated files")

@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 1

🤖 Fix all issues with AI agents
In `@analysis/scripts/upload_rollup.py`:
- Around line 85-110: The docstring for extract_launch_commands is inaccurate:
it claims keys like "worker-0.launch_command.raw_command" but the function
actually returns a defaultdict (node_type_to_launch_command) mapping worker_type
strings (e.g., "prefill", "decode") to lists of prefixed raw command strings;
update the function docstring to describe that it reads
data["nodes_summary"]["nodes"], prefixes each raw_command with
"#{worker_type}:{node_name}\\n", and returns a dict-like mapping of worker_type
-> list[str] (or defaultdict(list)) containing those command strings, and
mention default values used for missing node_name, worker_type, or raw_command.
♻️ Duplicate comments (1)
analysis/scripts/upload_rollup.py (1)

177-187: Bug: launch commands added twice in conflicting formats.

At line 173, launch commands are added as joined strings with keys like launch_commands.prefill. Then at line 182, output.update(launch_commands) adds the raw lists with plain worker_type keys (e.g., prefill: ["cmd1", "cmd2"]).

This results in duplicate data in conflicting formats. Line 182 should be removed.

🐛 Proposed fix
     for concurrency in concurrencies:
         output = dict(top_level)
         output["concurrency"] = concurrency
 
-        # Add launch commands
-        output.update(launch_commands)
-
         # Add result for this concurrency with "result_" prefix
         result = result_by_concurrency.get(concurrency, {})
🧹 Nitpick comments (2)
analysis/scripts/upload_rollup.py (2)

75-82: Silent suppression of parse errors may mask malformed input.

When parsing fails (e.g., "32x64xabc"), the invalid part is silently dropped. Consider logging a warning to aid debugging.

♻️ Suggested improvement
+import logging
+
+logger = logging.getLogger(__name__)
+
 def parse_concurrencies(concurrencies_str: str) -> list[int]:
     ...
     for part in parts:
         part = part.strip()
         if part:
             try:
                 result.append(int(part))
             except ValueError:
-                pass
+                logger.warning(f"Skipping non-integer concurrency value: {part!r}")
     return result

218-234: File is read twice; consider consolidating.

The file is opened at line 219 for JSON parsing, then read again at line 234 for upload content. This can be combined.

♻️ Suggested improvement
-    try:
-        with open(json_path) as f:
-            data = json.load(f)
-    except Exception as e:
-        return False, f"Failed to read JSON: {e}"
-
-    # Extract metadata
-    ...
-
-    # Read the file content for upload
-    json_content = json_path.read_text()
+    try:
+        json_content = json_path.read_text()
+        data = json.loads(json_content)
+    except Exception as e:
+        return False, f"Failed to read JSON: {e}"
+
+    # Extract metadata
+    ...

@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 2

🤖 Fix all issues with AI agents
In `@analysis/scripts/upload_rollup.py`:
- Around line 131-138: The code builds a contextualized string in
raw_command_str (including worker_type, node_name, worker_env_config,
engine_config and raw_command) but mistakenly appends raw_command to
node_type_to_launch_command[worker_type]; change the append to add
raw_command_str instead so the list stores the formatted command with its
metadata (update the append call on node_type_to_launch_command in the block
that constructs raw_command_str).
- Around line 191-205: The sbatch_script block uses raw_cmd which may never be
set if benchmark_cmd is falsy; initialize or derive raw_cmd before it’s
referenced (e.g., set raw_cmd = benchmark_cmd.get("raw_command", "") right after
obtaining benchmark_cmd) and then use that safe value in the top_level.update
call (the code that writes "benchmark_command": f"#sbatch_script\n\n{raw_cmd}").
Ensure raw_cmd is defined (or fallback to an empty string) so top_level.update
and the sbatch_script handling cannot raise a NameError.
♻️ Duplicate comments (3)
analysis/srtlog/rollup_harness.py (3)

31-47: Fix mixed-type sort keys to avoid TypeError.

When both numeric and non-numeric directory names exist, the sort key mixes int and str types, causing a TypeError in Python 3.

🛠️ Proposed fix
-    job_dirs.sort(key=lambda p: (int(p.name) if p.name.isdigit() else p.name))
+    job_dirs.sort(key=lambda p: (0, int(p.name)) if p.name.isdigit() else (1, p.name))

276-303: EnvironmentConfig mutation will break once frozen dataclass guideline is applied.

Per coding guidelines, configuration objects should use @dataclass(frozen=True). The current mutation pattern will fail. Build kwargs dict and construct once.

🛠️ Proposed fix
-        env_config = EnvironmentConfig()
+        env_kwargs: dict[str, Any] = {}
         
         backend_section = config.get("backend", {})
         if "prefill_environment" in backend_section:
-            env_config.prefill_environment = backend_section["prefill_environment"]
+            env_kwargs["prefill_environment"] = backend_section["prefill_environment"]
         if "decode_environment" in backend_section:
-            env_config.decode_environment = backend_section["decode_environment"]
+            env_kwargs["decode_environment"] = backend_section["decode_environment"]
         if "aggregated_environment" in backend_section:
-            env_config.aggregated_environment = backend_section["aggregated_environment"]
+            env_kwargs["aggregated_environment"] = backend_section["aggregated_environment"]
             
         # Load TRTLLM config files
         prefill_yaml = logs_dir / "trtllm_config_prefill.yaml"
         decode_yaml = logs_dir / "trtllm_config_decode.yaml"
         
         if prefill_yaml.exists():
             with open(prefill_yaml) as f:
-                env_config.prefill_engine_config = yaml.safe_load(f)
+                env_kwargs["prefill_engine_config"] = yaml.safe_load(f)
         if decode_yaml.exists():
             with open(decode_yaml) as f:
-                env_config.decode_engine_config = yaml.safe_load(f)
+                env_kwargs["decode_engine_config"] = yaml.safe_load(f)
                 
-        if any([env_config.prefill_environment, env_config.decode_environment, 
-                env_config.prefill_engine_config, env_config.decode_engine_config]):
-            environment_config = env_config
+        if env_kwargs:
+            environment_config = EnvironmentConfig(**env_kwargs)

Based on learnings, frozen dataclasses should be used for configuration objects.


354-358: Avoid treating 0 throughput as "missing."

The `or` operator will override valid 0 values, potentially skewing rollup data.

🛠️ Proposed fix
-            output_tps=data.get("output_throughput", 0) or data.get("output_tps", 0),
+            output_tps=(
+                data.get("output_throughput")
+                if data.get("output_throughput") is not None
+                else data.get("output_tps", 0)
+            ),
🧹 Nitpick comments (1)
analysis/scripts/upload_rollup.py (1)

484-490: Consider logging cleanup failures instead of silently ignoring them.

The blanket except Exception: pass hides potential filesystem issues. While not critical, logging at debug level would aid troubleshooting.

♻️ Proposed improvement
     if not args.keep_files and not args.dry_run:
         for fp in generated_files:
             try:
                 fp.unlink()
-            except Exception:
-                pass
+            except Exception as e:
+                print(f"  Warning: Could not delete {fp}: {e}", file=sys.stderr)
         print(f"\nCleaned up {len(generated_files)} generated files")

Comment on lines +131 to +138
raw_command_str = f"#{worker_type}:{node_name}"
if worker_env_config:
raw_command_str += f"\n#env_vars:{worker_env_config}"
if engine_config:
raw_command_str += f"\n#engine_config:{engine_config}"
raw_command_str += f"\n{raw_command}"

node_type_to_launch_command[worker_type].append(raw_command)

⚠️ Potential issue | 🔴 Critical

Bug: raw_command_str is built but raw_command is appended instead.

The function constructs raw_command_str with environment and engine config context (lines 131-136), but then appends the original raw_command to the list, discarding all the formatted context.

🐛 Proposed fix
         raw_command_str = f"#{worker_type}:{node_name}"
         if worker_env_config:
             raw_command_str += f"\n#env_vars:{worker_env_config}"
         if engine_config:
             raw_command_str += f"\n#engine_config:{engine_config}"
         raw_command_str += f"\n{raw_command}"
 
-        node_type_to_launch_command[worker_type].append(raw_command)
+        node_type_to_launch_command[worker_type].append(raw_command_str)
🤖 Prompt for AI Agents
In `@analysis/scripts/upload_rollup.py` around lines 131 - 138, The code builds a
contextualized string in raw_command_str (including worker_type, node_name,
worker_env_config, engine_config and raw_command) but mistakenly appends
raw_command to node_type_to_launch_command[worker_type]; change the append to
add raw_command_str instead so the list stores the formatted command with its
metadata (update the append call on node_type_to_launch_command in the block
that constructs raw_command_str).

@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 1

🤖 Fix all issues with AI agents
In `@analysis/scripts/upload_rollup.py`:
- Around line 411-414: The metadata assignment currently hardcodes
df["experiment_id"] = 'STUDY' which is a placeholder; replace it so
experiment_id is set from the proper source (e.g., use args.study_id or the
correct experiment identifier variable) so that df["experiment_id"] and
df["study_id"] are populated with meaningful values; update the block that sets
df["experiment_id"], df["study_id"], and df["user_login"] in upload_rollup.py
accordingly (use args.study_id if experiment_id should match study_id, or the
correct experiment id variable if different).
♻️ Duplicate comments (2)
analysis/scripts/upload_rollup.py (2)

131-138: Bug: raw_command_str is built but raw_command is appended instead.

The function constructs raw_command_str with worker type, node name, environment config, and engine config context (lines 131-136), but then appends the original raw_command to the list (line 138), discarding all the formatted context.

🐛 Proposed fix
         raw_command_str = f"#{worker_type}:{node_name}"
         if worker_env_config:
             raw_command_str += f"\n#env_vars:{worker_env_config}"
         if engine_config:
             raw_command_str += f"\n#engine_config:{engine_config}"
         raw_command_str += f"\n{raw_command}"

-        node_type_to_launch_command[worker_type].append(raw_command)
+        node_type_to_launch_command[worker_type].append(raw_command_str)

191-205: Potential NameError: raw_cmd may be undefined when sbatch_script exists.

If benchmark_cmd is falsy (empty dict, None, or missing), raw_cmd is never defined. However, line 203 uses raw_cmd unconditionally when sbatch_script exists, causing a NameError.

🐛 Proposed fix
     # Extract benchmark_command raw_command only
     benchmark_cmd = data.get("benchmark_command", {})
+    raw_cmd = None
     if benchmark_cmd:
         raw_cmd = benchmark_cmd.get("raw_command")
-        
+
         if raw_cmd:
             top_level["benchmark_command"] = raw_cmd
-    
+
     sbatch_script = data.get("sbatch_script")
-    if sbatch_script:
+    if sbatch_script and raw_cmd:
         top_level.update(
             {
                 "benchmark_command": f"#sbatch_script\n\n{raw_cmd}",
             }
         )
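A minimal reproduction of the NameError risk and the pre-initialization fix, using made-up input dicts:

```python
# raw_cmd is only assigned inside the `if benchmark_cmd:` branch; if the
# sbatch branch referenced it without the up-front `raw_cmd = None`, a
# rollup with an sbatch_script but no benchmark_command would crash.

def build_top_level(data):
    top_level = {}
    raw_cmd = None  # defined before any conditional branch uses it
    benchmark_cmd = data.get("benchmark_command", {})
    if benchmark_cmd:
        raw_cmd = benchmark_cmd.get("raw_command")
        if raw_cmd:
            top_level["benchmark_command"] = raw_cmd
    if data.get("sbatch_script") and raw_cmd:
        top_level["benchmark_command"] = f"#sbatch_script\n\n{raw_cmd}"
    return top_level

# Would raise NameError without the initialization; now returns {}:
print(build_top_level({"sbatch_script": "#!/bin/bash"}))  # {}
print(build_top_level({
    "benchmark_command": {"raw_command": "benchmark-cli --run"},  # hypothetical command
    "sbatch_script": "#!/bin/bash",
}))
```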
🧹 Nitpick comments (2)
analysis/scripts/upload_rollup.py (2)

302-305: Consider narrowing the exception handler.

A blanket except Exception could mask important errors like permission issues or encoding problems. Consider catching specific exceptions.

♻️ Suggested improvement
     if sbatch_path.exists():
         try:
             return sbatch_path.read_text()
-        except Exception:
+        except (OSError, UnicodeDecodeError) as e:
+            # Log or handle specific failure cases if needed
             return None

448-451: Filename sanitization only handles / character.

The sanitization replaces / with _, but other problematic characters (spaces, \, :, *, ?, ", <, >, |) could cause issues on some filesystems.

♻️ Suggested improvement
+import re
+
+def _sanitize_filename_part(s: str) -> str:
+    """Sanitize a string for use in filenames."""
+    return re.sub(r'[\\/:*?"<>|\s]+', '_', str(s))
+
         # Create group Parquet filename
-        safe_backend = str(backend).replace("/", "_")
-        safe_frontend = str(frontend).replace("/", "_")
-        safe_benchmarker = str(benchmarker).replace("/", "_")
+        safe_backend = _sanitize_filename_part(backend)
+        safe_frontend = _sanitize_filename_part(frontend)
+        safe_benchmarker = _sanitize_filename_part(benchmarker)

@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 4

🤖 Fix all issues with AI agents
In `@analysis/scripts/upload_rollup.py`:
- Around line 2-18: The module docstring and CLI usage claim the script writes
Parquet but the implementation actually writes .json files and posts gzipped
JSON; update the module-level docstring and the CLI/help text to accurately
describe the current behavior (or, if you want Parquet, change the
implementation to serialize to Parquet instead). Specifically, edit the module
docstring and the usage/help string to state “gzipped JSON” (or add a --mode
flag to choose between parquet/json), and ensure the upload routine (the
function that writes .json and posts gzipped JSON) and any filenames/extensions
match the documented mode so docs and implementation are consistent.
- Around line 174-194: The type annotation for the defaultdict named groups is
incorrect: it declares a 3-tuple key (defaultdict[tuple[str, str, str], ...])
but the code builds a 4-tuple key as group = (data['benchmark_type'],
data['frontend_type'], data['backend_type'], mode); update the annotation at the
groups declaration to use a 4-tuple (tuple[str, str, str, str]) so the type
matches the actual key shape used by the loop that processes rollup_path and
constructs group.
- Around line 33-57: The CLI/handler uses the name study_id but upload_json
currently accepts and posts session_id, causing a payload name mismatch; fix by
aligning names: change upload_json's parameter from session_id to study_id (and
update its docstring), change the JSON payload key from "session_id" to
"study_id", and update every call site that passes the CLI value (the CLI
variable study_id and any callers of upload_json) to use the renamed parameter
so the sent payload matches the server contract.
- Around line 192-200: The code computes mode using the wrong field and directly
indexes required keys which can raise KeyError; change the mode computation to
use data.get("is_disaggregated") (invert logic accordingly) and add defensive
validation before accessing data['benchmark_type'], data['frontend_type'],
data['backend_type'], and data['job_name'] (used later around
read_sbatch_script/ groups[group].append). Follow the function's existing
error-handling pattern: check for missing keys, log an error identifying the
missing field(s) and the rollup_path or job_name, and skip/continue that rollup
instead of indexing into data; ensure read_sbatch_script(rollup_path) usage
remains unchanged.
🧹 Nitpick comments (1)
analysis/scripts/upload_rollup.py (1)

118-118: Add an explicit return type for main.

Minor type-hinting polish: declare main() -> None for consistency and to satisfy the “type hints everywhere” guideline. As per coding guidelines.

🧩 Suggested tweak
-def main():
+def main() -> None:

Comment on lines +2 to +18
"""Upload rollup.json files to SRT endpoint as Parquet files grouped by backend/frontend/benchmarker.

Process:
1. Find all rollup.json files in the directory
2. Create flattened JSON for each concurrency level
3. Combine all into a single Parquet file
4. Group by (backend_type, frontend_type, benchmark_type)
5. Upload separate Parquet file for each group

Usage:
python upload_rollup.py <directory> <user_login> [--study-id STUDY] [--endpoint URL]

Example:
python upload_rollup.py ./outputs user@example.com
python upload_rollup.py ./outputs user@example.com --study-id my-study
python upload_rollup.py ./outputs user@example.com --endpoint URL
"""
⚠️ Potential issue | 🟡 Minor

Docs mention Parquet but the script uploads gzipped JSON.

The module docstring and CLI description say “Parquet,” while the implementation writes .json files and posts gzipped JSON. Please align the wording (and mention the mode grouping if it is intentional) to avoid confusion.

✏️ Suggested text update
-"""Upload rollup.json files to SRT endpoint as Parquet files grouped by backend/frontend/benchmarker.
+"""Upload rollup.json files to SRT endpoint as gzipped JSON files grouped by backend/frontend/benchmarker/mode.
 
 Process:
 1. Find all rollup.json files in the directory
-2. Create flattened JSON for each concurrency level
-3. Combine all into a single Parquet file
-4. Group by (backend_type, frontend_type, benchmark_type)
-5. Upload separate Parquet file for each group
+2. Group by (benchmark_type, frontend_type, backend_type, mode)
+3. Write a JSON file per group
+4. Upload a gzipped JSON file for each group
 ...
-        description="Upload rollup.json files to SRT endpoint as Parquet (grouped by backend/frontend/benchmarker)"
+        description="Upload rollup.json files to SRT endpoint as gzipped JSON (grouped by backend/frontend/benchmarker/mode)"

Also applies to: 118-121

🤖 Prompt for AI Agents
In `@analysis/scripts/upload_rollup.py` around lines 2 - 18, The module docstring
and CLI usage claim the script writes Parquet but the implementation actually
writes .json files and posts gzipped JSON; update the module-level docstring and
the CLI/help text to accurately describe the current behavior (or, if you want
Parquet, change the implementation to serialize to Parquet instead).
Specifically, edit the module docstring and the usage/help string to state
“gzipped JSON” (or add a --mode flag to choose between parquet/json), and ensure
the upload routine (the function that writes .json and posts gzipped JSON) and
any filenames/extensions match the documented mode so docs and implementation
are consistent.
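To make the “gzipped JSON, not Parquet” behavior concrete, here is a minimal sketch of the round trip the corrected docstring would describe. The helper name `gzip_json_payload` is hypothetical; the real script's upload routine may structure this differently.

```python
import gzip
import json

def gzip_json_payload(data: dict) -> bytes:
    """Serialize a dict to JSON and gzip-compress it for upload.

    Hypothetical helper illustrating the gzipped-JSON behavior the
    docstring should document; not the script's actual function.
    """
    raw = json.dumps(data).encode("utf-8")
    return gzip.compress(raw)

payload = gzip_json_payload({"benchmark_type": "sa_bench", "rows": [1, 2, 3]})
# Decompressing recovers the original JSON, confirming a lossless round trip
restored = json.loads(gzip.decompress(payload))
```

Whatever wording the docstring lands on, the file extension and `Content-Encoding` used by the upload routine should match it.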

Comment on lines +33 to +57
def upload_json(
    json_path: Path,
    user_login: str,
    session_id: str,
    endpoint: str,
    backend: str,
    benchmarker: str,
    frontend: str,
    mode: str,
) -> tuple[bool, str]:
    """Upload a gzipped JSON file to the endpoint.

    Args:
        json_path: Path to the JSON file
        user_login: User login/email
        session_id: Session ID for the upload
        endpoint: API endpoint URL
        backend: Backend type
        benchmarker: Benchmark type
        frontend: Frontend type
        mode: Mode (disaggregated or aggregated)

    Returns:
        Tuple of (success, message)
    """
⚠️ Potential issue | 🟠 Major

🧩 Analysis chain

🏁 Script executed:

rg -n "study_id|session_id" -S --type=py | head -100

Repository: ishandhanani/srt-slurm

Length of output: 846


🏁 Script executed:

# Get more context around the upload_json function to see what's actually posted
cat -n analysis/scripts/upload_rollup.py | sed -n '33,100p'

Repository: ishandhanani/srt-slurm

Length of output: 2535


🏁 Script executed:

# Check for CLI parameter definition
fd -e py -x rg -l "study.id|study-id" {}

Repository: ishandhanani/srt-slurm

Length of output: 101


🏁 Script executed:

cat -n analysis/scripts/upload_rollup.py | sed -n '230,250p'

Repository: ishandhanani/srt-slurm

Length of output: 758


🏁 Script executed:

# Search for any API documentation or comments about expected field names
rg -n -B2 -A2 "study_id|session_id" analysis/scripts/upload_rollup.py | head -80

Repository: ishandhanani/srt-slurm

Length of output: 1261


🏁 Script executed:

# Check if there's any server-side code or documentation
fd -e py -e md | xargs rg -l "study_id.*session_id|session_id.*study_id" 2>/dev/null || echo "No files mention both"

Repository: ishandhanani/srt-slurm

Length of output: 87


Confirm naming mismatch between CLI and server payload: study_id vs. session_id.

The CLI captures study_id (line 234), but upload_json() receives it as session_id (line 36) and posts it as "session_id" in the payload (line 70). If the server API expects study_id, uploads will be rejected or misattributed. Verify the server contract and align parameter names consistently.

🐛 Possible alignment (if the API expects `study_id`)
-def upload_json(
-    json_path: Path,
-    user_login: str,
-    session_id: str,
-    endpoint: str,
-    backend: str,
-    benchmarker: str,
-    frontend: str,
-    mode: str,
-) -> tuple[bool, str]:
+def upload_json(
+    json_path: Path,
+    user_login: str,
+    study_id: str,
+    endpoint: str,
+    backend: str,
+    benchmarker: str,
+    frontend: str,
+    mode: str,
+) -> tuple[bool, str]:
@@
-        session_id: Session ID for the upload
+        study_id: Study ID for the upload
@@
-                "session_id": session_id,
+                "study_id": study_id,

Also applies to: 68-74, 240-244

🤖 Prompt for AI Agents
In `@analysis/scripts/upload_rollup.py` around lines 33 - 57, The CLI/handler uses
the name study_id but upload_json currently accepts and posts session_id,
causing a payload name mismatch; fix by aligning names: change upload_json's
parameter from session_id to study_id (and update its docstring), change the
JSON payload key from "session_id" to "study_id", and update every call site
that passes the CLI value (the CLI variable study_id and any callers of
upload_json) to use the renamed parameter so the sent payload matches the server
contract.
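If the server contract is confirmed to expect `study_id`, the aligned naming would flow from the CLI to the payload like this sketch. The helper `build_upload_payload` and the surrounding field names (other than `study_id`) are assumptions drawn from the review diff, not the script's actual code.

```python
import json
from pathlib import Path

def build_upload_payload(
    json_path: Path,
    user_login: str,
    study_id: str,
    backend: str,
    benchmarker: str,
    frontend: str,
    mode: str,
) -> dict:
    """Assemble upload metadata using the CLI's study_id name end to end.

    Hypothetical sketch: verify the server contract before adopting it.
    """
    return {
        "user_login": user_login,
        "study_id": study_id,  # matches the CLI flag, no session_id rename
        "backend": backend,
        "benchmarker": benchmarker,
        "frontend": frontend,
        "mode": mode,
        "filename": json_path.name,
    }

payload = build_upload_payload(
    Path("group.json"), "user@example.com", "my-study",
    "sglang", "sa_bench", "mooncake_router", "disaggregated",
)
```

Keeping one name for the value from argparse through to the POST body removes the ambiguity entirely.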

Comment on lines +174 to +194
groups: defaultdict[tuple[str, str, str], list[dict]] = defaultdict(list)

for rollup_path in sorted(rollup_files):
    print(f"Processing: {rollup_path}")

    try:
        with open(rollup_path) as f:
            data = json.load(f)
    except Exception as e:
        print(f"  ✗ Failed to read: {e}")
        failed_count += 1
        continue

    # Skip if no nodes_summary (job likely failed/cancelled)
    if not data.get("nodes_summary"):
        print("  ⚠ Skipping: no nodes_summary (job may have failed)")
        continue

    mode = "aggregated" if data.get("is_aggregated") else "disaggregated"
    group = (data['benchmark_type'], data['frontend_type'], data['backend_type'], mode)
    # Read sbatch script for this job
⚠️ Potential issue | 🟠 Major

groups key annotation is a 3‑tuple but the key is a 4‑tuple.

Line 174 declares defaultdict[tuple[str, str, str], ...], yet Line 193 constructs (benchmark_type, frontend_type, backend_type, mode). This will fail ty and mislead readers. Update the annotation to a 4‑tuple. As per coding guidelines, keep type annotations accurate for ty.

✅ Fix type annotation
-groups: defaultdict[tuple[str, str, str], list[dict]] = defaultdict(list)
+groups: defaultdict[tuple[str, str, str, str], list[dict]] = defaultdict(list)
🤖 Prompt for AI Agents
In `@analysis/scripts/upload_rollup.py` around lines 174 - 194, The type
annotation for the defaultdict named groups is incorrect: it declares a 3-tuple
key (defaultdict[tuple[str, str, str], ...]) but the code builds a 4-tuple key
as group = (data['benchmark_type'], data['frontend_type'], data['backend_type'],
mode); update the annotation at the groups declaration to use a 4-tuple
(tuple[str, str, str, str]) so the type matches the actual key shape used by the
loop that processes rollup_path and constructs group.
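A minimal, self-contained sketch of the corrected annotation, using made-up sample rollups (field values are illustrative, not from the repository). With the 4-tuple key annotation, a type checker such as ty accepts the key actually built in the loop:

```python
from collections import defaultdict

# Annotation matches the key built below: a 4-tuple that includes mode.
groups: defaultdict[tuple[str, str, str, str], list[dict]] = defaultdict(list)

rollups = [
    {"benchmark_type": "sa_bench", "frontend_type": "mooncake",
     "backend_type": "sglang", "is_disaggregated": True},
    {"benchmark_type": "sa_bench", "frontend_type": "mooncake",
     "backend_type": "sglang", "is_disaggregated": False},
]

for data in rollups:
    mode = "disaggregated" if data.get("is_disaggregated") else "aggregated"
    group = (data["benchmark_type"], data["frontend_type"],
             data["backend_type"], mode)
    groups[group].append(data)
```

The two sample rollups differ only in mode, so they land in two distinct groups, which is exactly why mode belongs in the key and hence in the annotation.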

Comment on lines +192 to +200
mode = "aggregated" if data.get("is_aggregated") else "disaggregated"
group = (data['benchmark_type'], data['frontend_type'], data['backend_type'], mode)
# Read sbatch script for this job
sbatch_script = read_sbatch_script(rollup_path)

# Add sbatch script to each row
if sbatch_script:
    data["sbatch_script"] = sbatch_script
groups[group].append(data)
⚠️ Potential issue | 🟠 Major

🧩 Analysis chain

🏁 Script executed:

# First, find the file and get its context
find . -name "upload_rollup.py" -type f

Repository: ishandhanani/srt-slurm

Length of output: 101


🏁 Script executed:

# Get the full context around the lines in question (lines 180-250)
sed -n '180,250p' analysis/scripts/upload_rollup.py | cat -n

Repository: ishandhanani/srt-slurm

Length of output: 2759


🏁 Script executed:

# Search for rollup schema definitions or sample data
fd -e json -e py | xargs rg -l "rollup|is_aggregated|is_disaggregated|benchmark_type|frontend_type|backend_type" | head -20

Repository: ishandhanani/srt-slurm

Length of output: 690


🏁 Script executed:

# Check the rollup models definition
head -100 src/srtctl/cli/mixins/rollup/models.py

Repository: ishandhanani/srt-slurm

Length of output: 3072


🏁 Script executed:

# Check test data for rollup
grep -A 20 "is_aggregated\|is_disaggregated\|benchmark_type" tests/test_rollup.py | head -50

Repository: ishandhanani/srt-slurm

Length of output: 1830


🏁 Script executed:

# Check the rollup harness
grep -B 5 -A 10 "is_aggregated\|is_disaggregated" src/srtctl/cli/mixins/rollup_stage.py analysis/srtlog/rollup_harness.py

Repository: ishandhanani/srt-slurm

Length of output: 8362


🏁 Script executed:

# Check the schema.py for field definitions
grep -B 2 -A 5 "benchmark_type\|frontend_type\|backend_type\|is_aggregated\|job_name" src/srtctl/core/schema.py

Repository: ishandhanani/srt-slurm

Length of output: 849


🏁 Script executed:

# Check the RollupSummary dataclass to see all expected fields
grep -A 100 "^class RollupSummary" src/srtctl/cli/mixins/rollup/models.py | head -150

Repository: ishandhanani/srt-slurm

Length of output: 2757


🏁 Script executed:

# Verify the field usage in upload_rollup.py more broadly
cat -n analysis/scripts/upload_rollup.py | grep -B 5 -A 5 "is_aggregated\|benchmark_type\|frontend_type\|backend_type\|job_name" | head -60

Repository: ishandhanani/srt-slurm

Length of output: 2180


Fix field name and add defensive validation for missing rollup fields.

Line 192 checks is_aggregated but the schema defines is_disaggregated, causing mode to always be calculated as "disaggregated". Lines 193 and 236 directly index required fields without checks—missing benchmark_type, frontend_type, backend_type, or job_name will raise KeyError and abort processing. Add validation to match the existing error-handling pattern in this function:

🛡️ Defensive checks
-        mode = "aggregated" if data.get("is_aggregated") else "disaggregated"
-        group = (data['benchmark_type'], data['frontend_type'], data['backend_type'], mode)
+        mode = "disaggregated" if data.get("is_disaggregated") else "aggregated"
+        try:
+            benchmark_type = data["benchmark_type"]
+            frontend_type = data["frontend_type"]
+            backend_type = data["backend_type"]
+        except KeyError as exc:
+            print(f"  ✗ Skipping: missing {exc} in rollup.json")
+            failed_count += 1
+            continue
+        group = (benchmark_type, frontend_type, backend_type, mode)
@@
-            study_id = rows[0]["job_name"]
+            study_id = rows[0].get("job_name")
+            if not study_id:
+                print("  ✗ Missing job_name; skipping upload")
+                upload_failed_count += 1
+                continue

Also applies to: 232-236

🤖 Prompt for AI Agents
In `@analysis/scripts/upload_rollup.py` around lines 192 - 200, The code computes
mode using the wrong field and directly indexes required keys which can raise
KeyError; change the mode computation to use data.get("is_disaggregated")
(invert logic accordingly) and add defensive validation before accessing
data['benchmark_type'], data['frontend_type'], data['backend_type'], and
data['job_name'] (used later around read_sbatch_script/ groups[group].append).
Follow the function's existing error-handling pattern: check for missing keys,
log an error identifying the missing field(s) and the rollup_path or job_name,
and skip/continue that rollup instead of indexing into data; ensure
read_sbatch_script(rollup_path) usage remains unchanged.
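The skip-and-log pattern suggested above can be factored into a small validation helper. This is a hypothetical sketch (the helper name and a standalone function are my framing; the actual fix may stay inline in the loop):

```python
REQUIRED_FIELDS = ("benchmark_type", "frontend_type", "backend_type", "job_name")

def missing_rollup_fields(data: dict) -> list[str]:
    """Return required fields absent from a rollup dict.

    Hypothetical helper mirroring the review's suggested defensive check;
    a caller logs the missing fields and continues instead of letting
    data[...] raise KeyError.
    """
    return [field for field in REQUIRED_FIELDS if field not in data]

missing = missing_rollup_fields({"benchmark_type": "sa_bench"})
# Caller: if missing, print the fields with the rollup_path and `continue`
```

Checking all required fields up front also yields a single, complete error message per bad rollup rather than failing on the first missing key.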

@KaunilD KaunilD requested a review from ishandhanani January 24, 2026 17:50
@property
def endpoints(self) -> list[Endpoint]:
    """Endpoint allocation topology."""
    ...
Contributor Author
raise NotImplementedError/NotApplicable
