
add srtlog functionality to vectorize logs #101

Closed
KaunilD wants to merge 15 commits into main from kdhruv/rollup_part_1

Conversation


@KaunilD KaunilD commented Jan 26, 2026

Summary by CodeRabbit

  • New Features

    • Parser registry plus multiple new benchmark and node parsers; node logs now yield richer node metadata and launch-command extraction; dashboard accepts more timestamp formats and reads ISL/OSL from profiler metadata.
  • Bug Fixes / Improvements

    • Benchmark/run loading now uses parser-driven result discovery with better fallbacks and artifact handling; benchmark scripts log commands and save artifacts.
  • Tests

    • Large fixture and integration test suites for parsers, run loader, node metrics, and dashboard timestamp handling.
  • Documentation

    • Added architecture/dataflow doc describing parser registry and end-to-end log analysis.
  • Refactor

    • Separated profiler configuration metadata from runtime results and changed node-metrics payload shape.

✏️ Tip: You can customize this high-level summary in your review settings.


coderabbitai bot commented Jan 26, 2026

Warning

Rate limit exceeded

@KaunilD has exceeded the limit for the number of commits that can be reviewed per hour. Please wait 12 minutes and 9 seconds before requesting another review.

⌛ How to resolve this issue?

After the wait time has elapsed, a review can be triggered using the @coderabbitai review command as a PR comment. Alternatively, push new commits to this PR.

We recommend that you space out your commits to avoid hitting the rate limit.

🚦 How do rate limits work?

CodeRabbit enforces hourly rate limits for each developer per organization.

Our paid plans have higher rate limits than the trial, open-source and free plans. In all cases, we re-allow further reviews after a brief timeout.

Please see our FAQ for further information.

📝 Walkthrough

Introduces a parser registry and multiple benchmark/node parsers; adds metadata-focused data models (ProfilerMetadata, NodeMetadata, NodeInfo); migrates dashboard uses to profiler_metadata; integrates parser-driven benchmark and node parsing into RunLoader and NodeAnalyzer; adds caching, extensive tests, fixtures, and architecture documentation.

Changes

  • Dashboard profiler_metadata migration (analysis/dashboard/app.py, analysis/dashboard/config_tab.py, analysis/dashboard/node_metrics_tab.py, analysis/dashboard/rate_match_tab.py): Switched UI/filtering and metric sources from run.profiler.* to run.profiler_metadata.*; added a _parse_timestamp helper in rate_match_tab.
  • Node metrics payload shape (analysis/dashboard/components.py): NodeMetrics dict output now emits node_info as a nested dict with keys node, worker_type, and worker_id (shape change).
  • Data model refactor (analysis/srtlog/models.py): Added ProfilerMetadata, NodeMetadata, NodeInfo, NodeConfig, TopologyInfo, BenchmarkLaunchCommand, and NodeLaunchCommand; updated BenchmarkRun, NodeMetrics, and ProfilerResults to use metadata-oriented fields.
  • Parser infrastructure & registry (analysis/srtlog/parsers/__init__.py, analysis/srtlog/parsers/benchmark/__init__.py, analysis/srtlog/parsers/nodes/__init__.py): New parser protocols, registries, decorators, factory functions (get_benchmark_parser, get_node_parser), and public exports for benchmark/node parsers.
  • Benchmark parsers (analysis/srtlog/parsers/benchmark/sa_bench.py, analysis/srtlog/parsers/benchmark/mooncake_router.py): Added SABenchParser and MooncakeRouterParser with JSON/AIPerf-first parsing, benchmark.out fallbacks, and launch-command extraction.
  • Node parsers (analysis/srtlog/parsers/nodes/sglang.py, analysis/srtlog/parsers/nodes/trtllm.py): Added SGLangNodeParser and TRTLLMNodeParser producing NodeInfo (batches, memory snapshots, config, launch command).
  • Log parsing & NodeAnalyzer (analysis/srtlog/log_parser.py): Reworked parse_run_logs to detect the backend, use get_node_parser(), and return NodeInfo objects (with a dict-conversion helper); added backend detection and config population from files.
  • RunLoader integration (analysis/srtlog/run_loader.py): Migrated to profiler_metadata; added a parser-driven benchmark result flow and normalizer, a manual JSON fallback, and load_node_metrics / load_node_metrics_for_run APIs.
  • Config reader change (analysis/srtlog/config_reader.py): parse_command_line_from_err now returns TopologyInfo (import changed from ParsedCommandInfo to TopologyInfo); docstring and type hints updated.
  • Tests & fixtures (tests/fixtures_parsers.py, tests/test_parsers.py, tests/test_runloader_parsers.py, tests/test_dashboard.py): New fixtures and comprehensive tests for the parser registry, benchmark/node parsers, RunLoader integration, and timestamp parsing.
  • Docs (analysis/docs/ARCHITECTURE_DATAFLOW.md): New architecture/dataflow doc describing the parser registry, data models, cache strategy, and end-to-end parsing flows.
  • Benchmark scripts refactor (src/srtctl/benchmarks/scripts/... — gpqa, mmlu, sa-bench, mooncake-router, longbenchv2, profiling, router, ...): Replace inline shell invocations with cmd=(...) arrays, add [CMD] logging, execute via "${cmd[@]}", and add some artifact handling.
  • Backend & CLI small changes (src/srtctl/backends/trtllm.py, src/srtctl/backends/sglang.py, src/srtctl/backends/base.py): Added TRTLLMServerConfig dataclass; minor import/annotation/format tweaks.
  • Dependencies (pyproject.toml): Added runtime deps (pre-commit, pandas, pyarrow, plotly) and a new analysis dependency group (streamlit, plotly, pandas, pyarrow).
  • Minor formatting/quoting (src/srtctl/core/schema.py, src/srtctl/frontends/dynamo.py): Quoting/whitespace adjustments only.
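The decorator-plus-factory registry described above can be sketched as follows. This is a minimal, hypothetical illustration, not the PR's actual implementation: only the factory name get_benchmark_parser and the SABenchParser class name come from the change summary; everything else (register_benchmark_parser, the registry dict, the protocol method) is assumed for illustration.

```python
from typing import Callable, Protocol


class BenchmarkParser(Protocol):
    """Minimal protocol a benchmark parser is assumed to satisfy."""

    def parse_result_directory(self, result_dir: str) -> dict: ...


# Registry mapping profiler type -> parser class (illustrative only).
_BENCHMARK_PARSERS: dict[str, type] = {}


def register_benchmark_parser(name: str) -> Callable[[type], type]:
    """Class decorator that records a parser under a profiler-type key."""

    def decorator(cls: type) -> type:
        _BENCHMARK_PARSERS[name] = cls
        return cls

    return decorator


def get_benchmark_parser(profiler_type: str) -> BenchmarkParser:
    """Factory: look up and instantiate the parser for a profiler type."""
    try:
        return _BENCHMARK_PARSERS[profiler_type]()
    except KeyError:
        raise ValueError(f"No benchmark parser registered for {profiler_type!r}")


@register_benchmark_parser("sa-bench")
class SABenchParser:
    def parse_result_directory(self, result_dir: str) -> dict:
        return {"source": result_dir}  # real parsing elided
```

With this shape, RunLoader only needs the profiler type string to obtain a parser, and unknown types surface as a ValueError rather than a silent None.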

Sequence Diagram(s)

sequenceDiagram
    actor User
    participant RL as RunLoader
    participant PR as ParserRegistry
    participant BP as BenchmarkParser
    participant Cache as ParquetCache
    participant NA as NodeAnalyzer
    participant NP as NodeParser

    User->>RL: load_run(run_path)
    RL->>RL: read run metadata (profiler_metadata)
    RL->>PR: get_benchmark_parser(profiler_type)
    PR-->>RL: parser or None
    alt parser found
        RL->>BP: find_result_directory(run_path)
        BP-->>RL: result_dir
        alt cached results exist
            RL->>Cache: load cached benchmark results
            Cache-->>RL: metrics
        else
            RL->>BP: parse_result_directory(result_dir)
            BP-->>RL: metrics dict
            RL->>Cache: store parsed metrics
        end
    else
        RL->>RL: _parse_profiler_results_manual
    end

    RL->>NA: parse_run_logs(run_path)
    NA->>NA: detect backend type
    NA->>PR: get_node_parser(backend_type)
    PR-->>NA: node parser
    NA->>NP: parse_logs(log_dir)
    NP-->>NA: NodeInfo list
    NA->>Cache: cache node metrics
    NA-->>RL: node metrics

    RL-->>User: BenchmarkRun with profiler_metadata, ProfilerResults, node metrics

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~60 minutes

Possibly related PRs

Suggested reviewers

  • ishandhanani

Poem

🐰 I hopped through logs both terse and deep,
I parsed each line while others sleep.
Parsers in rows, metadata neat,
Nodes and benchmarks now nicely meet.
Hop—cache, registry—data tucked to keep.

🚥 Pre-merge checks | ✅ 3

✅ Passed checks (3 passed)

  • Description Check — ✅ Passed: Check skipped - CodeRabbit’s high-level summary is enabled.
  • Title Check — ✅ Passed: The pull request title 'add srtlog functionality to vectorize logs' accurately describes the primary change: introducing comprehensive log parsing and vectorization infrastructure to the srtlog module.
  • Docstring Coverage — ✅ Passed: Docstring coverage is 96.64%, which is sufficient. The required threshold is 80.00%.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.


Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.


Comment @coderabbitai help to get the list of available commands and usage tips.

@KaunilD KaunilD requested a review from ishandhanani January 26, 2026 20:56
@KaunilD KaunilD changed the title from "parser for node and benchmakrs. trim aislop" to "Rollup Experiments - Part 1: parser for node and benchmakrs. trim aislop" Jan 26, 2026

@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 12

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (3)
src/srtctl/backends/trtllm.py (2)

20-33: Incorrect docstring - references SGLang instead of TRTLLM.

The docstring mentions "SGLang server CLI configuration" but this is the TRTLLM backend. This appears to be a copy-paste error.

📝 Proposed fix
 @dataclass(frozen=True)
 class TRTLLMServerConfig:
-    """SGLang server CLI configuration per mode (prefill/decode/aggregated).
+    """TRTLLM server CLI configuration per mode (prefill/decode/aggregated).
 
     Each mode can have its own configuration dict that gets converted
     to CLI flags when starting the worker.
     """

135-177: Missing profiling and config dump implementation in TRTLLM backend.

The parameters frontend_type, profiling_enabled, nsys_prefix, and dump_config_path are part of the BackendProtocol interface and are actively used in the SGLang backend for profiling support and config dumping. TRTLLM declares these parameters but does not implement them. Either add support for these features (as SGLang does) or document why they are not applicable to TRTLLM.

analysis/dashboard/config_tab.py (1)

63-70: Guard profiler metadata access to avoid crashes on runs without metadata.

If a run lacks profiler_metadata (e.g., older logs or profiling disabled), these attribute reads can raise and break the UI. Consider a safe fallback or “N/A” defaults.

🛠️ Suggested hardening
-            with col3:
-                st.metric("ISL/OSL", f"{run.profiler_metadata.isl}/{run.profiler_metadata.osl}")
-            with col4:
-                gpu_type_suffix = f" ({run.metadata.gpu_type})" if run.metadata.gpu_type else ""
-                st.metric("Profiler", f"{run.profiler_metadata.profiler_type}{gpu_type_suffix}")
+            profiler_meta = getattr(run, "profiler_metadata", None)
+            isl = profiler_meta.isl if profiler_meta else "N/A"
+            osl = profiler_meta.osl if profiler_meta else "N/A"
+            profiler_type = profiler_meta.profiler_type if profiler_meta else "N/A"
+
+            with col3:
+                st.metric("ISL/OSL", f"{isl}/{osl}")
+            with col4:
+                gpu_type_suffix = f" ({run.metadata.gpu_type})" if run.metadata.gpu_type else ""
+                st.metric("Profiler", f"{profiler_type}{gpu_type_suffix}")
🤖 Fix all issues with AI agents
In `@analysis/srtlog/log_parser.py`:
- Around line 68-72: The if-not check after calling
get_node_parser(backend_type) is dead because get_node_parser raises ValueError
when no parser exists; update the code to catch that exception instead: wrap the
call to get_node_parser in a try/except ValueError block, log the warning via
logger.warning including the backend_type and exception message, and return []
from the except branch (remove the unreachable if not parser branch and adjust
any surrounding flow accordingly).

In `@analysis/srtlog/parsers/nodes/trtllm.py`:
- Around line 338-348: The MemoryMetrics instance appended to memory_snapshots
for the kv_cache allocation currently sets timestamp to an empty string; change
this to a non-empty fallback (e.g., "unknown") or prefer a nearby/local
timestamp variable if available (for example use an in-scope variable like
timestamp, ts, or last_log_timestamp) so the kv_cache MemoryMetrics never has an
empty timestamp; update the timestamp argument in the MemoryMetrics constructor
call in the KV allocation block accordingly.

In `@analysis/srtlog/run_loader.py`:
- Around line 244-246: The cache validation currently builds source_patterns as
[f"{result_dir_rel}/*.json"] which only matches top-level JSON files (variables:
result_dir_rel, source_patterns); change it to include recursive matches so
nested results are covered — e.g. use a recursive glob pattern like
f"{result_dir_rel}/**/*.json" or, if available, use the parser-provided patterns
for mooncake-router (e.g. include
"logs/artifacts/**/profile_export_aiperf.json") so cache invalidation detects
nested files under result_dir; update the source_patterns construction
accordingly.
- Around line 417-422: The code currently uses "or" to choose throughput and
concurrency values which treats legitimate zeros as falsy and drops them; update
the assignments for concurrency, out["concurrencies"], out["output_tps"],
out["total_tps"], and out["request_throughput"] to explicitly check for None
(e.g., if data.get("max_concurrency") is not None use it else fall back to
data.get("concurrency") else 0) so that 0.0 is preserved; apply the same
None-aware selection for output_throughput vs output_tps and
total_token_throughput vs total_tps (use explicit is not None checks or
membership checks like "if 'output_throughput' in data and
data['output_throughput'] is not None then ...") and then append the chosen
values to out as before.
- Around line 675-713: The method load_node_metrics is annotated to return
list[NodeMetrics] but currently returns the raw result of
NodeAnalyzer.parse_run_logs (which yields list[NodeInfo]); update
load_node_metrics to extract and return the metrics attribute from each NodeInfo
(e.g., call NodeAnalyzer.parse_run_logs(..., return_dicts=False), assign to a
local variable like node_infos, then return [node.metrics for node in
node_infos]) and ensure load_node_metrics_for_run continues to call
load_node_metrics with the detected backend_type unchanged.

In `@src/srtctl/benchmarks/scripts/gpqa/bench.sh`:
- Around line 29-31: The current code builds a shell string in the variable
command and uses eval which is vulnerable to shell injection; instead build an
argv array and invoke it directly: replace the string assembly using
command="..." with an array (e.g., cmd=(python3 -m sglang.test.run_eval
--base-url "$ENDPOINT" --model "$MODEL_NAME" --eval-name gpqa --num-examples
"$NUM_EXAMPLES" --max-tokens "$MAX_TOKENS" --repeat "$REPEAT" --num-threads
"$NUM_THREADS")), update the echo to print the joined command safely, and run it
with the exec form "${cmd[@]}" (or "${cmd[@]}" in place) to avoid eval and
ensure ENDPOINT, MODEL_NAME and other variables are passed as single safe
arguments.

In `@src/srtctl/benchmarks/scripts/longbenchv2/bench.sh`:
- Around line 31-43: The script currently builds a shell string in the variable
"command" and executes it with eval, which is unsafe and fragile; change to
build the command as an array (e.g., cmd=(python3 -m sglang.test.run_eval
--base-url "$ENDPOINT" --model "$MODEL_NAME" ...)), append optional args to that
array when NUM_EXAMPLES or CATEGORIES are non-empty, and then execute the array
directly (no eval) so arguments with spaces are preserved; replace the echo
"[CMD] $command" with a safe printable representation (use printf with proper
quoting or join the array) to log the invoked command.

In `@src/srtctl/benchmarks/scripts/mmlu/bench.sh`:
- Around line 29-31: The current use of a single string variable named command
and eval to run the benchmark is unsafe; instead build the invocation as an
argument array (e.g., cmd) with each token quoted (include the module
sglang.test.run_eval and flags like --base-url, --model, --num-examples,
--max-tokens, --repeat, --num-threads as separate array elements) and execute it
with the array expansion form to avoid shell interpretation of metacharacters
(also print a safe, escaped representation of the array for logging before
running). Ensure you replace usages of command and eval with the new array
variable (cmd) and use "${cmd[@]}" to run it.

In `@src/srtctl/benchmarks/scripts/profiling/profile.sh`:
- Around line 133-141: The script currently builds command strings (variables
named command) and runs them with eval which is unsafe; refactor both
invocations (the python3 sglang.bench_serving call and the python -m lm_eval
call) to construct command arrays (e.g., cmd=(python3 -m sglang.bench_serving
... ) and lm_cmd=(python -m lm_eval ... )) using properly quoted variables like
${model_name}, ${head_node}, ${head_port}, ${PROFILE_CONCURRENCY},
${PROFILE_ISL}, ${PROFILE_OSL}, then log them with printf or echo using a safely
joined representation and execute with "${cmd[@]}" and "${lm_cmd[@]}" instead of
eval; ensure any embedded complex arg (the --model_args JSON-like string) is
provided as a single array element so it is not split.

In `@src/srtctl/benchmarks/scripts/sa-bench/bench.sh`:
- Around line 54-57: Replace the empty --num-prompts and unsafe eval usage in
the warmup invocation by using the computed num_warmup_prompts variable and
executing the command without eval; specifically, update the warmup block so the
generated command uses --num-prompts "$num_warmup_prompts" (not $num_prompts)
and build the invocation as a shell array (e.g., cmd=(python3 -u
"${WORK_DIR}/benchmark_serving.py" --model "${MODEL_NAME}" ... --num-prompts
"$num_warmup_prompts" ...)) then run it with "${cmd[@]}", and apply the same fix
(use proper num_prompts variable and array execution) to the other occurrence at
the 71-73 block.

In `@tests/fixtures_parsers.py`:
- Line 42: The fixture contains a JSON-unsafe value: the "request_rate" key is
set to float("inf"), which will break json.dump(); change the fixture so
"request_rate" is either a string (e.g., "Infinity") or a finite numeric
sentinel (e.g., a very large float like 1e308) depending on downstream
expectations — locate the occurrence of request_rate = float("inf") in the
fixtures_parsers data structure and replace it accordingly and update any tests
that assert numeric type if you switch to a string.
- Around line 343-345: The assertion logic allows empty lists despite the
comment; update the check around `value` (from `results[field]`) so it asserts
`value` is not None and, if `value` is a list, that it is non-empty (i.e.,
assert not None AND (not a list OR list length > 0)); keep the failure message
referencing `field` and adjust wording if desired.
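The or-fallback pitfall called out for run_loader.py above is easy to reproduce. The sketch below uses illustrative data, not the PR's code, to show how `or` silently replaces a legitimate 0 or 0.0 while an explicit `is not None` check preserves it:

```python
data = {"max_concurrency": 0, "output_throughput": 0.0}

# Falsy-based fallback: 0 and 0.0 are falsy, so they are wrongly replaced.
concurrency_or = data.get("max_concurrency") or data.get("concurrency") or 1
# concurrency_or is now 1, even though max_concurrency was legitimately 0.

# None-aware fallback: only a missing key triggers the fallback chain.
raw = data.get("max_concurrency")
concurrency = raw if raw is not None else (data.get("concurrency") or 0)

raw_tps = data.get("output_throughput")
output_tps = raw_tps if raw_tps is not None else data.get("output_tps", 0.0)
```

The same None-aware pattern applies to total_token_throughput vs total_tps and request_throughput in the normalizer.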
🧹 Nitpick comments (18)
src/srtctl/benchmarks/scripts/router/bench.sh (1)

43-45: Avoid eval when direct execution suffices.

Using eval here is unnecessary and introduces potential for subtle bugs or injection issues if any variable contains shell metacharacters. You can achieve the same logging effect by echoing the command separately and executing directly.

♻️ Suggested refactor
 # shellcheck disable=SC2086
-command="python prefix_ratio_benchmark.py --prefix-ratios $PREFIX_RATIOS --isl $ISL --osl $OSL --requests $REQUESTS --concurrency $CONCURRENCY --output-dir $result_dir"
-echo "[CMD] $command"
-eval "$command"
+echo "[CMD] python prefix_ratio_benchmark.py --prefix-ratios $PREFIX_RATIOS --isl $ISL --osl $OSL --requests $REQUESTS --concurrency $CONCURRENCY --output-dir $result_dir"
+python prefix_ratio_benchmark.py --prefix-ratios $PREFIX_RATIOS --isl "$ISL" --osl "$OSL" --requests "$REQUESTS" --concurrency "$CONCURRENCY" --output-dir "$result_dir"

This approach:

  1. Logs the command for visibility (same as before)
  2. Executes directly without eval
  3. Quotes scalar variables ($ISL, $OSL, etc.) while leaving $PREFIX_RATIOS unquoted for intended word splitting
analysis/srtlog/models.py (6)

145-177: Make ProfilerMetadata immutable.

This is configuration metadata and should be frozen to prevent accidental mutation.

As per coding guidelines, use @dataclass(frozen=True) for configuration objects.

♻️ Proposed change
-@dataclass
+@dataclass(frozen=True)
 class ProfilerMetadata:

288-300: Make BenchmarkLaunchCommand immutable.

This is configuration/command metadata and should be frozen.

As per coding guidelines, use @dataclass(frozen=True) for configuration objects.

♻️ Proposed change
-@dataclass
+@dataclass(frozen=True)
 class BenchmarkLaunchCommand:

440-449: Make NodeMetadata immutable.

This is identification/config metadata and should be frozen.

As per coding guidelines, use @dataclass(frozen=True) for configuration objects.

♻️ Proposed change
-@dataclass
+@dataclass(frozen=True)
 class NodeMetadata:

454-489: Tighten type for runtime config.

Use dict[str, Any] for config to keep type hints consistent.

As per coding guidelines, use explicit type hints for configuration structures.

♻️ Proposed change
-    config: dict = field(default_factory=dict)  # Runtime config: TP/PP/EP, batch sizes, etc.
+    config: dict[str, Any] = field(default_factory=dict)  # Runtime config: TP/PP/EP, batch sizes, etc.

492-505: Make NodeLaunchCommand immutable.

Launch command metadata should be frozen to avoid accidental mutation.

As per coding guidelines, use @dataclass(frozen=True) for configuration objects.

♻️ Proposed change
-@dataclass
+@dataclass(frozen=True)
 class NodeLaunchCommand:

623-633: Parameterize explicit_flags type.

Use set[str] to keep typing precise.

As per coding guidelines, use explicit type parameters for collections.

♻️ Proposed change
-    explicit_flags: set
+    explicit_flags: set[str]
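As a quick illustration of why the frozen=True suggestions above matter: a frozen dataclass rejects any attribute assignment at runtime and gains a usable __hash__. The ProfilerMetadata below is a cut-down stand-in with assumed fields, not the real model from models.py.

```python
from dataclasses import FrozenInstanceError, dataclass


@dataclass(frozen=True)
class ProfilerMetadata:
    profiler_type: str
    isl: int
    osl: int


meta = ProfilerMetadata(profiler_type="sa-bench", isl=1024, osl=128)

try:
    meta.isl = 2048  # any assignment on a frozen dataclass raises
    mutation_blocked = False
except FrozenInstanceError:
    mutation_blocked = True
```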
analysis/srtlog/parsers/benchmark/__init__.py (1)

1-7: Add SPDX license header for consistency.

This file is missing the SPDX license header that's present in the sibling nodes/__init__.py file. Adding it ensures consistency across the parser package.

Suggested fix
+# SPDX-FileCopyrightText: Copyright (c) 2025 NVIDIA CORPORATION & AFFILIATES. All rights reserved.
+# SPDX-License-Identifier: Apache-2.0
 
 """Benchmark output parsers."""
 
 from analysis.srtlog.parsers.benchmark.mooncake_router import MooncakeRouterParser
 from analysis.srtlog.parsers.benchmark.sa_bench import SABenchParser
 
 __all__ = ["SABenchParser", "MooncakeRouterParser"]
src/srtctl/benchmarks/scripts/mooncake-router/bench.sh (2)

59-61: Consider avoiding eval for command execution.

Using eval with interpolated variables can execute arbitrary code if inputs contain shell metacharacters. While this is likely run in controlled environments, a safer pattern would use arrays:

Safer alternative using array execution
-command="aiperf profile -m ${MODEL_NAME} --url ${ENDPOINT} --streaming --ui simple --concurrency 10 --request-count 20"
-echo "[CMD-WARMUP] $command"
-eval "$command"
+warmup_args=(
+    aiperf profile
+    -m "${MODEL_NAME}"
+    --url "${ENDPOINT}"
+    --streaming
+    --ui simple
+    --concurrency 10
+    --request-count 20
+)
+echo "[CMD-WARMUP] ${warmup_args[*]}"
+"${warmup_args[@]}"

77-79: Same eval concern applies to the main benchmark command.

The goodput argument with embedded spaces makes this trickier, but arrays handle it correctly:

Safer alternative
-command="aiperf profile -m ${MODEL_NAME} --input-file ${INPUT_FILE} --custom-dataset-type mooncake_trace --fixed-schedule --url ${ENDPOINT} --streaming --random-seed 42 --ui simple --artifact-dir ${RUN_ARTIFACT_DIR} --goodput \"time_to_first_token:${TTFT_THRESHOLD} inter_token_latency:${ITL_THRESHOLD}\""
-echo "[CMD] $command"
-eval "$command"
+bench_args=(
+    aiperf profile
+    -m "${MODEL_NAME}"
+    --input-file "${INPUT_FILE}"
+    --custom-dataset-type mooncake_trace
+    --fixed-schedule
+    --url "${ENDPOINT}"
+    --streaming
+    --random-seed 42
+    --ui simple
+    --artifact-dir "${RUN_ARTIFACT_DIR}"
+    --goodput "time_to_first_token:${TTFT_THRESHOLD} inter_token_latency:${ITL_THRESHOLD}"
+)
+echo "[CMD] ${bench_args[*]}"
+"${bench_args[@]}"

Note: If the [CMD] logging format must be preserved exactly for parser compatibility, you may need to adjust the echo accordingly.

tests/test_runloader_parsers.py (1)

24-58: Add fixture return type hints.

The fixtures are untyped; please add return annotations to align with the project’s typing standard.

♻️ Suggested update
+from typing import Any
@@
-    def temp_dir(self):
+    def temp_dir(self) -> Path:
@@
-    def sample_run_metadata(self):
+    def sample_run_metadata(self) -> dict[str, Any]:
As per coding guidelines, add type hints consistently across new Python code.
analysis/srtlog/parsers/nodes/trtllm.py (3)

27-28: Consider extracting ANSI_ESCAPE pattern to a shared utility.

The ANSI_ESCAPE regex pattern is duplicated across both SGLang and TRTLLM parsers. Consider moving it to a shared utility module (e.g., analysis/srtlog/parsers/utils.py).


138-165: Document the override behavior for config values.

The TensorRT-LLM engine args section (lines 138-164) can override values previously extracted from Config() dump (lines 108-130). If this is intentional because engine args are more authoritative, consider adding a brief comment clarifying this precedence.


352-367: Duplicated method - consider extracting to a shared utility.

The _extract_node_info_from_filename method is identical to the one in sglang.py. Consider moving this to a shared utility module to reduce code duplication.
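A shared ANSI-stripping helper like the one suggested above could look like the sketch below; the module location and names (strip_ansi, a parsers/utils.py home) are assumptions, and the regex covers the common CSI color/reset sequences rather than every escape form:

```python
import re

# Matches CSI escape sequences such as "\x1b[31m" (color) and "\x1b[0m" (reset).
ANSI_ESCAPE = re.compile(r"\x1b\[[0-9;]*[A-Za-z]")


def strip_ansi(line: str) -> str:
    """Remove ANSI terminal escape codes from a log line."""
    return ANSI_ESCAPE.sub("", line)
```

Both node parsers could then import the one pattern instead of duplicating it.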

analysis/srtlog/log_parser.py (4)

188-188: Move import os to the top of the file.

The os module is imported inside the method, but it's a standard library module that's commonly needed. It should be imported at the top of the file with other imports for consistency.

Suggested fix

Add at top of file with other imports:

import os

Then remove line 188.


268-313: Consider using DEBUG level for environment variable loading logs.

Lines 297, 302, and 307 use logger.info() for logging the number of env vars loaded. This might be too verbose for normal operation. Consider using logger.debug() instead.

Suggested fix
             if "prefill_environment" in backend:
                 env_vars["prefill"] = backend["prefill_environment"]
-                logger.info(f"Loaded {len(env_vars['prefill'])} prefill env vars from config.yaml")
+                logger.debug(f"Loaded {len(env_vars['prefill'])} prefill env vars from config.yaml")
 
             # Extract decode_environment
             if "decode_environment" in backend:
                 env_vars["decode"] = backend["decode_environment"]
-                logger.info(f"Loaded {len(env_vars['decode'])} decode env vars from config.yaml")
+                logger.debug(f"Loaded {len(env_vars['decode'])} decode env vars from config.yaml")

             # Extract agg_environment if present
             if "agg_environment" in backend:
                 env_vars["agg"] = backend["agg_environment"]
-                logger.info(f"Loaded {len(env_vars['agg'])} agg env vars from config.yaml")
+                logger.debug(f"Loaded {len(env_vars['agg'])} agg env vars from config.yaml")

331-331: Misleading key name: prefill_batches contains all batches, not just prefill.

The key "prefill_batches" is used to store all batches regardless of worker type. If this is for backward compatibility, consider adding a comment. Otherwise, consider renaming to "batches".

Suggested fix if backward compatibility isn't required
-            "prefill_batches": metrics.batches,  # Keep as list of BatchMetrics objects
+            "batches": metrics.batches,  # Keep as list of BatchMetrics objects

478-479: Remove redundant imports inside method.

time is already imported at the top of the file (line 9). The models import could also be at the top level. Consider removing these local imports.

Suggested fix
-        import time
-        from .models import BatchMetrics, MemoryMetrics, NodeInfo, NodeMetadata, NodeMetrics
+        from .models import BatchMetrics, MemoryMetrics, NodeMetadata, NodeMetrics

Note: NodeInfo is already imported at the top of the file (line 16). Only BatchMetrics, MemoryMetrics, NodeMetadata, and NodeMetrics need to be imported here (or better, at the top of the file).

Comment thread analysis/srtlog/log_parser.py
Comment thread analysis/srtlog/parsers/nodes/trtllm.py
Comment on lines +338 to +348
```python
memory_snapshots.append(
    MemoryMetrics(
        timestamp="",
        dp=0,
        tp=0,
        ep=0,
        metric_type="kv_cache",
        kv_cache_gb=kv_gb,
        kv_tokens=max_tokens,
    )
)
```
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟡 Minor

Empty timestamp in KV allocation metrics may cause issues downstream.

The MemoryMetrics for KV cache allocation (line 340) uses an empty string for timestamp. This could cause issues if downstream code expects valid timestamps. Consider using "unknown" or the timestamp from nearby log lines if available.

Suggested fix
             memory_snapshots.append(
                 MemoryMetrics(
-                    timestamp="",
+                    timestamp="unknown",  # KV allocation logs don't include timestamp
                     dp=0,
                     tp=0,
                     ep=0,
🤖 Prompt for AI Agents
In `@analysis/srtlog/parsers/nodes/trtllm.py` around lines 338 - 348, The
MemoryMetrics instance appended to memory_snapshots for the kv_cache allocation
currently sets timestamp to an empty string; change this to a non-empty fallback
(e.g., "unknown") or prefer a nearby/local timestamp variable if available (for
example use an in-scope variable like timestamp, ts, or last_log_timestamp) so
the kv_cache MemoryMetrics never has an empty timestamp; update the timestamp
argument in the MemoryMetrics constructor call in the KV allocation block
accordingly.

Comment thread tests/fixtures_parsers.py
```python
            "total_token_throughput": concurrency * 60.0,
            "request_throughput": concurrency * 0.5,
            "request_goodput": concurrency * 0.48,
            "request_rate": float("inf"),
```

⚠️ Potential issue | 🟡 Minor

float("inf") is not valid JSON and will cause serialization issues.

The request_rate: float("inf") value will fail when serialized to JSON via json.dump() since JSON doesn't support Infinity. Consider using a string representation or a very large number.

Suggested fix
-            "request_rate": float("inf"),
+            "request_rate": "inf",  # String representation for JSON compatibility

Or if numeric is required:

-            "request_rate": float("inf"),
+            "request_rate": 1e308,  # Very large number as proxy for infinity
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
-            "request_rate": float("inf"),
+            "request_rate": "inf",  # String representation for JSON compatibility
🤖 Prompt for AI Agents
In `@tests/fixtures_parsers.py` at line 42, The fixture contains a JSON-unsafe
value: the "request_rate" key is set to float("inf"), which will break
json.dump(); change the fixture so "request_rate" is either a string (e.g.,
"Infinity") or a finite numeric sentinel (e.g., a very large float like 1e308)
depending on downstream expectations — locate the occurrence of request_rate =
float("inf") in the fixtures_parsers data structure and replace it accordingly
and update any tests that assert numeric type if you switch to a string.


@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 7

🤖 Fix all issues with AI agents
In `@analysis/docs/ARCHITECTURE_DATAFLOW.md`:
- Line 10: Update the fenced code blocks listed (lines 10, 32, 67, 123, 156,
254, 325, 367, 417, 485, 513, 565) to include language identifiers to satisfy
markdownlint MD040; use "text" for diagrams/ASCII art and "python" (or the
appropriate language) for code snippets so each triple-backtick fence becomes
```text or ```python as applicable.

In `@analysis/srtlog/parsers/benchmark/sa_bench.py`:
- Around line 167-186: The fallback path currently takes the list-valued output
from parse() (fallback_result) and stuffs entire lists into a single dict,
losing per-concurrency entries; change the fallback to normalize into one scalar
result per concurrency by iterating over the length of
fallback_result["concurrencies"] (or the longest list among
output_tps/concurrencies) and appending a dict for each index with scalar fields
(max_concurrency from the specific concurrency value, output_tps[index],
request_throughput[index], mean_ttft_ms[index], mean_itl_ms[index],
mean_tpot_ms[index], p99_ttft_ms[index], p99_itl_ms[index], completed from
completed_requests[index] or a sensible default) so downstream consumers receive
the same per-concurrency format as when JSON results are present; use the
existing symbols parse, fallback_result, and results to locate and implement
this loop-based normalization.

In `@analysis/srtlog/run_loader.py`:
- Around line 713-733: In load_node_metrics_for_run, guard against
run.metadata.container being None by defaulting it to an empty string before
calling .lower(); specifically, change how the local variable container is
derived (from run.metadata.container) so it becomes (empty string if None) then
.lower(), keep the same backend_type detection logic (checking for "trtllm",
"dynamo", "tensorrt", "sglang") and then call
self.load_node_metrics(run.metadata.path, backend_type) as before.
- Around line 692-712: The load_node_metrics function currently ignores its
backend_type parameter; update it to pass that override through to
NodeAnalyzer.parse_run_logs so callers can force a backend when auto-detection
fails (e.g., call analyzer.parse_run_logs(run_path, return_dicts=False,
backend_type=backend_type) or the appropriate override parameter name used by
parse_run_logs), while allowing parse_run_logs to still fall back to
auto-detection when the override is None/empty; this change affects
load_node_metrics and ensures callers like load_node_metrics_for_run can force
parsing for a specific backend.

In `@rollup_sa_bench_553.json`:
- Around line 7-12: The fixture metadata contains host-specific absolute paths
(e.g., the "container" field
"/home/nlevin/containers/jwillthomson_dynamo-0.8.0_... .sqsh" and potentially
other fields like "model_dir"), so replace these with redacted or portable
values (for example a neutral token like "container": "<redacted-container>" or
a relative/portable path) and ensure any other absolute paths in the JSON are
similarly replaced; update the "container" value and any host-specific entries
so the fixture contains no usernames or host-specific directories and remains
portable across environments.
- Line 5: The JSON timestamps are inconsistent (run_date, batches[].timestamp,
memory_snapshots[].timestamp) and cause parsing failures; update the generator
to emit ISO 8601 timestamps (e.g., 2026-01-15T07:14:56Z or without Z if local)
for "run_date" and every batches[].timestamp and replace empty
memory_snapshots[].timestamp values with null, and then update the parsers that
call datetime.strptime(..., "%Y-%m-%d %H:%M:%S") (the parsing calls in
analysis/srtlog/visualizations.py and analysis/dashboard/rate_match_tab.py) to
accept ISO 8601 (use datetime.fromisoformat or a robust parser like
dateutil.parser.parse) and explicitly handle null timestamps instead of silently
skipping empty strings.
- Around line 17-32: The profiler_metadata.concurrencies field is set to an
empty string which causes check_completeness() to treat the run as manual and
skip validation; update profiler_metadata.concurrencies to reflect the actual
concurrency used (e.g., "32") or set it to null if it was not explicitly
configured so that check_completeness() correctly recognizes and validates the
benchmark results; ensure the value type matches expected schema (string or
null) and corresponds to the concurrency_values entry.
🧹 Nitpick comments (1)
analysis/srtlog/log_parser.py (1)

176-268: Refactor config loading to use marshmallow_dataclass schemas for runtime validation.

The current code loads JSON/YAML configs with json.load() and yaml.safe_load() without validation, relying only on TypedDict for type hints (which provide no runtime protection). NodeConfig and GPUInfo should be converted to marshmallow_dataclass dataclasses to match the project's established patterns (see SrtConfig.from_yaml() in schema.py and validation patterns in config.py). This ensures config structure is validated before merging and prevents malformed shapes from silently affecting node_config.


## 1. Entry Point: RunLoader

```

⚠️ Potential issue | 🟡 Minor

Add language identifiers to fenced code blocks.
markdownlint (MD040) requires language tags; consider text for diagrams and python for code examples (e.g., Line 10, Line 32, Line 417).

Also applies to: 32-32, 67-67, 123-123, 156-156, 254-254, 325-325, 367-367, 417-417, 485-485, 513-513, 565-565

🧰 Tools
🪛 markdownlint-cli2 (0.18.1)

10-10: Fenced code blocks should have a language specified

(MD040, fenced-code-language)

🤖 Prompt for AI Agents
In `@analysis/docs/ARCHITECTURE_DATAFLOW.md` at line 10, Update the fenced code
blocks listed (lines 10, 32, 67, 123, 156, 254, 325, 367, 417, 485, 513, 565) to
include language identifiers to satisfy markdownlint MD040; use "text" for
diagrams/ASCII art and "python" (or the appropriate language) for code snippets
so each triple-backtick fence becomes ```text or ```python as applicable.
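A minimal MD040-style audit can be scripted if one wants to find the unlabeled fences programmatically; unlabeled_fences is a hypothetical helper and assumes fences are plain triple backticks at the start of a line.

```python
import re

# Flag opening fences that lack a language tag (markdownlint MD040).
FENCE = re.compile(r"^```(\S*)\s*$")

def unlabeled_fences(markdown: str) -> list[int]:
    missing, inside = [], False
    for lineno, line in enumerate(markdown.splitlines(), start=1):
        m = FENCE.match(line)
        if not m:
            continue
        if not inside and m.group(1) == "":
            missing.append(lineno)
        inside = not inside  # toggle between opening and closing fences
    return missing

doc = "# Title\n```\nascii art\n```\n```python\nprint('ok')\n```\n"
print(unlabeled_fences(doc))  # [2]
```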

Comment on lines +167 to +186
# Fallback: If no JSON results found, try parsing benchmark.out
if not results:
benchmark_out = result_dir / "benchmark.out"
if benchmark_out.exists():
logger.info(f"No JSON results found in {result_dir}, falling back to benchmark.out parsing")
# Parse benchmark.out and create a single result entry
fallback_result = self.parse(benchmark_out)
if fallback_result.get("output_tps"):
# Wrap in list format expected by caller
results.append({
"max_concurrency": fallback_result.get("concurrencies", [0])[0] if fallback_result.get("concurrencies") else 0,
"output_tps": fallback_result.get("output_tps"),
"request_throughput": fallback_result.get("request_throughput"),
"mean_ttft_ms": fallback_result.get("mean_ttft_ms"),
"mean_itl_ms": fallback_result.get("mean_itl_ms"),
"mean_tpot_ms": fallback_result.get("mean_tpot_ms"),
"p99_ttft_ms": fallback_result.get("p99_ttft_ms"),
"p99_itl_ms": fallback_result.get("p99_itl_ms"),
"completed": fallback_result.get("completed_requests"),
})

⚠️ Potential issue | 🟠 Major

Fallback results are list-valued and collapse multi-concurrency data.
parse() returns lists per concurrency, but the fallback path wraps those lists into a single result dict (Line 173–186). This yields list-valued fields and drops additional concurrency entries, which can break downstream consumers expecting one scalar result per concurrency.

🔧 Suggested fix: normalize fallback into per-concurrency entries
-                if fallback_result.get("output_tps"):
-                    # Wrap in list format expected by caller
-                    results.append({
-                        "max_concurrency": fallback_result.get("concurrencies", [0])[0] if fallback_result.get("concurrencies") else 0,
-                        "output_tps": fallback_result.get("output_tps"),
-                        "request_throughput": fallback_result.get("request_throughput"),
-                        "mean_ttft_ms": fallback_result.get("mean_ttft_ms"),
-                        "mean_itl_ms": fallback_result.get("mean_itl_ms"),
-                        "mean_tpot_ms": fallback_result.get("mean_tpot_ms"),
-                        "p99_ttft_ms": fallback_result.get("p99_ttft_ms"),
-                        "p99_itl_ms": fallback_result.get("p99_itl_ms"),
-                        "completed": fallback_result.get("completed_requests"),
-                    })
+                if fallback_result.get("output_tps"):
+                    concurrencies = fallback_result.get("concurrencies") or [0]
+                    output_tps = fallback_result.get("output_tps") or []
+                    request_throughput = fallback_result.get("request_throughput") or []
+                    mean_ttft = fallback_result.get("mean_ttft_ms") or []
+                    mean_itl = fallback_result.get("mean_itl_ms") or []
+                    mean_tpot = fallback_result.get("mean_tpot_ms") or []
+                    p99_ttft = fallback_result.get("p99_ttft_ms") or []
+                    p99_itl = fallback_result.get("p99_itl_ms") or []
+                    completed = fallback_result.get("completed_requests") or []
+
+                    for idx, concurrency in enumerate(concurrencies):
+                        results.append({
+                            "max_concurrency": concurrency,
+                            "output_throughput": output_tps[idx] if idx < len(output_tps) else None,
+                            "request_throughput": request_throughput[idx] if idx < len(request_throughput) else None,
+                            "mean_ttft_ms": mean_ttft[idx] if idx < len(mean_ttft) else None,
+                            "mean_itl_ms": mean_itl[idx] if idx < len(mean_itl) else None,
+                            "mean_tpot_ms": mean_tpot[idx] if idx < len(mean_tpot) else None,
+                            "p99_ttft_ms": p99_ttft[idx] if idx < len(p99_ttft) else None,
+                            "p99_itl_ms": p99_itl[idx] if idx < len(p99_itl) else None,
+                            "completed": completed[idx] if idx < len(completed) else None,
+                        })
🤖 Prompt for AI Agents
In `@analysis/srtlog/parsers/benchmark/sa_bench.py` around lines 167 - 186, The
fallback path currently takes the list-valued output from parse()
(fallback_result) and stuffs entire lists into a single dict, losing
per-concurrency entries; change the fallback to normalize into one scalar result
per concurrency by iterating over the length of fallback_result["concurrencies"]
(or the longest list among output_tps/concurrencies) and appending a dict for
each index with scalar fields (max_concurrency from the specific concurrency
value, output_tps[index], request_throughput[index], mean_ttft_ms[index],
mean_itl_ms[index], mean_tpot_ms[index], p99_ttft_ms[index], p99_itl_ms[index],
completed from completed_requests[index] or a sensible default) so downstream
consumers receive the same per-concurrency format as when JSON results are
present; use the existing symbols parse, fallback_result, and results to locate
and implement this loop-based normalization.
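The loop-based normalization can also be sketched with itertools.zip_longest, which pads shorter lists with None; the field names mirror the sa-bench parser but the snippet is illustrative, not the actual implementation.

```python
from itertools import zip_longest

# Parallel per-concurrency lists, as parse() returns them (abridged).
fallback_result = {
    "concurrencies": [8, 16],
    "output_tps": [1200.0, 2100.0],
    "mean_ttft_ms": [35.2],  # shorter list: missing entries become None
}

fields = ["output_tps", "mean_ttft_ms"]
columns = [fallback_result.get(f) or [] for f in fields]

# One scalar dict per concurrency, matching the JSON-results format.
results = [
    {"max_concurrency": conc, **dict(zip(fields, values))}
    for conc, *values in zip_longest(fallback_result["concurrencies"], *columns)
]
print(results)
```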

Comment on lines +692 to +712
def load_node_metrics(self, run_path: str, backend_type: str = "sglang") -> list[NodeMetrics]:
"""Load node metrics from worker log files using NodeAnalyzer.

Args:
run_path: Path to the run directory
backend_type: Backend type (sglang or trtllm) - deprecated, auto-detected

Returns:
List of NodeMetrics objects, one per worker
"""
# Handle both relative and absolute paths
if not os.path.isabs(run_path):
run_path = os.path.join(self.logs_dir, run_path)

# Use NodeAnalyzer which handles caching, backend detection, and config loading
analyzer = NodeAnalyzer()
node_infos = analyzer.parse_run_logs(run_path, return_dicts=False)

# Extract only the metrics from each NodeInfo
return [node.metrics for node in node_infos]


⚠️ Potential issue | 🟠 Major

Honor backend_type override to avoid detection failures.

load_node_metrics ignores its backend_type argument. If auto-detection fails, callers (including load_node_metrics_for_run) have no way to force the parser, leading to empty results. Consider passing the override through to NodeAnalyzer.parse_run_logs (and letting that method fall back to detection).

🐛 Proposed fix
-    def parse_run_logs(self, run_path: str, return_dicts: bool = False) -> list:
+    def parse_run_logs(
+        self,
+        run_path: str,
+        return_dicts: bool = False,
+        backend_type: str | None = None,
+    ) -> list:
@@
-        backend_type = self._detect_backend_type(run_path)
+        backend_type = backend_type or self._detect_backend_type(run_path)
-        node_infos = analyzer.parse_run_logs(run_path, return_dicts=False)
+        node_infos = analyzer.parse_run_logs(run_path, return_dicts=False, backend_type=backend_type)
🤖 Prompt for AI Agents
In `@analysis/srtlog/run_loader.py` around lines 692 - 712, The load_node_metrics
function currently ignores its backend_type parameter; update it to pass that
override through to NodeAnalyzer.parse_run_logs so callers can force a backend
when auto-detection fails (e.g., call analyzer.parse_run_logs(run_path,
return_dicts=False, backend_type=backend_type) or the appropriate override
parameter name used by parse_run_logs), while allowing parse_run_logs to still
fall back to auto-detection when the override is None/empty; this change affects
load_node_metrics and ensures callers like load_node_metrics_for_run can force
parsing for a specific backend.

Comment on lines +713 to +733
def load_node_metrics_for_run(self, run: BenchmarkRun) -> list[NodeMetrics]:
"""Load node metrics for a BenchmarkRun.

Automatically detects backend type from the run's container image.

Args:
run: BenchmarkRun object

Returns:
List of NodeMetrics objects
"""
# Detect backend type from container
backend_type = "sglang" # Default
container = run.metadata.container.lower()

if "trtllm" in container or "dynamo" in container or "tensorrt" in container:
backend_type = "trtllm"
elif "sglang" in container:
backend_type = "sglang"

return self.load_node_metrics(run.metadata.path, backend_type)

⚠️ Potential issue | 🟡 Minor

Guard against missing container values.

run.metadata.container can be None for older runs, so .lower() would raise. Default to an empty string.

🐛 Proposed fix
-        container = run.metadata.container.lower()
+        container = (run.metadata.container or "").lower()
🤖 Prompt for AI Agents
In `@analysis/srtlog/run_loader.py` around lines 713 - 733, In
load_node_metrics_for_run, guard against run.metadata.container being None by
defaulting it to an empty string before calling .lower(); specifically, change
how the local variable container is derived (from run.metadata.container) so it
becomes (empty string if None) then .lower(), keep the same backend_type
detection logic (checking for "trtllm", "dynamo", "tensorrt", "sglang") and then
call self.load_node_metrics(run.metadata.path, backend_type) as before.

Comment thread rollup_sa_bench_553.json Outdated
Comment thread rollup_sa_bench_553.json Outdated
Comment thread rollup_sa_bench_553.json Outdated
Comment on lines +17 to +32
"profiler_metadata": {
"profiler_type": "sa-bench",
"isl": "8192",
"osl": "1024",
"concurrencies": ""
},
"profiler_results": {
"output_tps": [
1748.0954656979873
],
"request_throughput": [
1.9035306394778506
],
"concurrency_values": [
32
],

⚠️ Potential issue | 🟠 Major

🧩 Analysis chain

🏁 Script executed:

#!/bin/bash
# Locate SA-Bench parser/schema expectations for "concurrencies".
rg -n "SABench|sa-bench|SABenchParser|profiler_metadata|concurrenc" -g '*.py' | head -50

Repository: ishandhanani/srt-slurm

Length of output: 4409


🏁 Script executed:

#!/bin/bash
# Also search for JSON schema files or related metadata definitions
fd -e schema -o -e json -o -e py | xargs rg -l "profiler_metadata|sa-bench" 2>/dev/null | head -20

Repository: ishandhanani/srt-slurm

Length of output: 1383


🏁 Script executed:

#!/bin/bash
# Look for the specific file being reviewed
fd "rollup_sa_bench_553.json" 2>/dev/null

Repository: ishandhanani/srt-slurm

Length of output: 90


🏁 Script executed:

#!/bin/bash
# Find the SA-Bench parser implementation
fd -e py | xargs rg -l "class.*SABench.*Parser|def.*profiler_metadata" | head -10

Repository: ishandhanani/srt-slurm

Length of output: 169


🏁 Script executed:

#!/bin/bash
# Look for the fixtures_parsers module
fd "fixtures_parsers.py"

Repository: ishandhanani/srt-slurm

Length of output: 91


🏁 Script executed:

#!/bin/bash
# Check the actual rollup file contents to see the full structure
cat rollup_sa_bench_553.json | head -40

Repository: ishandhanani/srt-slurm

Length of output: 1007


🏁 Script executed:

#!/bin/bash
# Check the SA-Bench parser implementation
cat analysis/srtlog/parsers/benchmark/sa_bench.py

Repository: ishandhanani/srt-slurm

Length of output: 14728


🏁 Script executed:

#!/bin/bash
# Check the fixtures to understand expected format and SampleSABenchData
rg -A 20 "class SampleSABenchData|def result_json" tests/fixtures_parsers.py | head -80

Repository: ishandhanani/srt-slurm

Length of output: 1895


🏁 Script executed:

#!/bin/bash
# Search for code that creates or validates the rollup structure
fd -e py | xargs rg -l "profiler_metadata|profiler_results" | grep -v test | grep -v fixtures

Repository: ishandhanani/srt-slurm

Length of output: 256


🏁 Script executed:

#!/bin/bash
# Search for where concurrencies field is set/populated
rg -n "concurrencies.*=" -g '*.py' | grep -v test | grep -v "concurrency_values"

Repository: ishandhanani/srt-slurm

Length of output: 1023


🏁 Script executed:

#!/bin/bash
# Look for rollup generation code
fd -e py | xargs rg -l "rollup" | grep -v test

Repository: ishandhanani/srt-slurm

Length of output: 111


🏁 Script executed:

#!/bin/bash
# Check the ProfilerMetadata class definition in models.py
rg -B 5 -A 15 "class ProfilerMetadata" analysis/srtlog/models.py

Repository: ishandhanani/srt-slurm

Length of output: 573


🏁 Script executed:

#!/bin/bash
# Check how concurrencies is populated in rollup building code
rg -B 5 -A 10 "concurrencies.*profiler" analysis/srtlog/run_loader.py

Repository: ishandhanani/srt-slurm

Length of output: 48


🏁 Script executed:

#!/bin/bash
# Check the schema definition for concurrencies more carefully
rg -B 5 -A 5 "concurrencies:" src/srtctl/core/schema.py

Repository: ishandhanani/srt-slurm

Length of output: 438


🏁 Script executed:

#!/bin/bash
# Check how profiler_metadata is created/populated when building the rollup
rg -B 10 -A 10 "ProfilerMetadata\|profiler_metadata" analysis/srtlog/run_loader.py | head -80

Repository: ishandhanani/srt-slurm

Length of output: 48


🏁 Script executed:

#!/bin/bash
# Check if there's any validation or schema checking for profiler_metadata
rg -n "validate|schema|check.*concurrenc" analysis/srtlog/models.py

Repository: ishandhanani/srt-slurm

Length of output: 48


🏁 Script executed:

#!/bin/bash
# Get a broader view of ProfilerMetadata usage in run_loader
cat analysis/srtlog/run_loader.py | grep -n "profiler_metadata\|ProfilerMetadata" -B 3 -A 3

Repository: ishandhanani/srt-slurm

Length of output: 1629


🏁 Script executed:

#!/bin/bash
# Check the entire ProfilerMetadata from_json method
rg -A 20 "class ProfilerMetadata" analysis/srtlog/models.py

Repository: ishandhanani/srt-slurm

Length of output: 578


🏁 Script executed:

#!/bin/bash
# Get the complete from_json method for ProfilerMetadata
rg -A 30 "def from_json.*ProfilerMetadata" analysis/srtlog/models.py

Repository: ishandhanani/srt-slurm

Length of output: 1188


🏁 Script executed:

#!/bin/bash
# Search for where the JSON is loaded and ProfilerMetadata is created
rg -B 5 -A 10 "profiler_metadata.*from_json\|ProfilerMetadata.from_json" analysis/srtlog/models.py

Repository: ishandhanani/srt-slurm

Length of output: 48


🏁 Script executed:

#!/bin/bash
# Check how concurrencies field is used in consumers/dashboards
rg -n "\.concurrencies\|concurrenc" analysis/dashboard/*.py | head -30

Repository: ishandhanani/srt-slurm

Length of output: 48


🏁 Script executed:

#!/bin/bash
# Check if there are any data validation or normalization functions
rg -B 3 -A 10 "def.*normalize|def.*validate.*profiler\|def.*process.*metadata" analysis/srtlog/models.py

Repository: ishandhanani/srt-slurm

Length of output: 48


🏁 Script executed:

#!/bin/bash
# Check how ProfilerResults.concurrency_values is populated
rg -B 5 -A 15 "concurrency_values" analysis/srtlog/models.py | head -50

Repository: ishandhanani/srt-slurm

Length of output: 2283


🏁 Script executed:

#!/bin/bash
# Look for any place where profiler_metadata.concurrencies is actually used or validated
rg "profiler_metadata\.concurrenc|\.concurrencies" --type py | grep -v test | grep -v "^Binary"

Repository: ishandhanani/srt-slurm

Length of output: 900


🏁 Script executed:

#!/bin/bash
# Check the code that uses profiler_metadata.concurrencies
rg -B 5 -A 10 "if not self.profiler_metadata.concurrencies" analysis/srtlog/models.py

Repository: ishandhanani/srt-slurm

Length of output: 687


🏁 Script executed:

#!/bin/bash
# Check BenchmarkRun to see the full context
rg -B 10 -A 20 "class BenchmarkRun" analysis/srtlog/models.py | head -60

Repository: ishandhanani/srt-slurm

Length of output: 992


🏁 Script executed:

#!/bin/bash
# Check the complete validate_completeness method
rg -B 2 -A 25 "def validate_completeness" analysis/srtlog/models.py

Repository: ishandhanani/srt-slurm

Length of output: 48


🏁 Script executed:

#!/bin/bash
# Check if there are any downstream consumers that depend on concurrencies being populated
rg "profiler_metadata\.concurrencies" analysis/srtlog/*.py analysis/dashboard/*.py 2>/dev/null

Repository: ishandhanani/srt-slurm

Length of output: 238


🏁 Script executed:

#!/bin/bash
# Verify the complete validate_completeness logic to confirm the issue
rg -B 5 -A 35 "def validate_completeness" analysis/srtlog/models.py

Repository: ishandhanani/srt-slurm

Length of output: 48


🏁 Script executed:

#!/bin/bash
# Search for validate_completeness more broadly
rg "validate_completeness" analysis/srtlog/models.py -A 30

Repository: ishandhanani/srt-slurm

Length of output: 48


🏁 Script executed:

#!/bin/bash
# Also check for where is_complete and missing_concurrencies are set
rg -n "is_complete|missing_concurrencies" analysis/srtlog/models.py | head -20

Repository: ishandhanani/srt-slurm

Length of output: 429


🏁 Script executed:

#!/bin/bash
# Get the full method context around line 366-389
sed -n '360,395p' analysis/srtlog/models.py

Repository: ishandhanani/srt-slurm

Length of output: 1243


🏁 Script executed:

#!/bin/bash
# Check how the JSON file is created/populated - look at any rollup generation code
fd -e py | xargs rg -l "rollup.*json|json.*rollup" | grep -v test | head -5

Repository: ishandhanani/srt-slurm

Length of output: 48


Fix profiler_metadata.concurrencies to align with actual results.

The empty string "" in concurrencies while concurrency_values contains [32] breaks the data validation logic. When concurrencies is empty, the check_completeness() method treats the run as manual and skips validation entirely, even though actual benchmark results exist. Either populate concurrencies with the actual value (e.g., "32") to match the results, or set it to null if the configuration was not explicitly specified.

🤖 Prompt for AI Agents
In `@rollup_sa_bench_553.json` around lines 17 - 32, The
profiler_metadata.concurrencies field is set to an empty string which causes
check_completeness() to treat the run as manual and skip validation; update
profiler_metadata.concurrencies to reflect the actual concurrency used (e.g.,
"32") or set it to null if it was not explicitly configured so that
check_completeness() correctly recognizes and validates the benchmark results;
ensure the value type matches expected schema (string or null) and corresponds
to the concurrency_values entry.


@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 2

🤖 Fix all issues with AI agents
In `@analysis/srtlog/parsers/nodes/sglang.py`:
- Around line 233-240: The _parse_timestamp function's regex requires fractional
seconds so timestamps like "2025-12-30T15:52:38Z" are missed; update the regex
in _parse_timestamp to make the fractional-seconds part optional (use a
non-capturing optional group like (?:\.\d+)? ) while preserving the optional
trailing Z, so it matches both "YYYY-MM-DDTHH:MM:SSZ" and
"YYYY-MM-DDTHH:MM:SS.sssZ"; keep returning match.group(1) as before.
- Around line 116-197: The file read using open(log_path) can raise
UnicodeDecodeError; replace the open loop with a tolerant read (e.g.,
Path(log_path).read_text(errors="replace") or similar) and iterate over the
returned string's lines so parsing still runs even with invalid bytes; update
the code that currently appends to full_content and feeds clean_line into
_parse_prefill_batch_line, _parse_decode_batch_line, _parse_memory_line and the
subsequent tp/dp/ep regex extraction to use the decoded text lines, then call
parse_launch_command("".join(full_content), node_info["worker_type"]) as before.
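The optional-fractional-seconds fix can be verified in isolation; TS below is an illustrative regex with the suggested (?:\.\d+)? group, not the parser's actual _parse_timestamp pattern.

```python
import re

# Fractional seconds are optional; the trailing Z stays optional too.
TS = re.compile(r"(\d{4}-\d{2}-\d{2}T\d{2}:\d{2}:\d{2}(?:\.\d+)?)Z?")

for line in ("[2025-12-30T15:52:38Z] decode", "[2025-12-30T15:52:38.123Z] decode"):
    m = TS.search(line)
    print(m.group(1) if m else None)
```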
🧹 Nitpick comments (3)
analysis/srtlog/parsers/nodes/trtllm.py (2)

77-81: Prefer Path.iterdir() over os.listdir() for consistency.

Since log_dir is already a Path object (line 69), using pathlib methods throughout would be more idiomatic.

♻️ Suggested refactor
         # Find all worker log files
-        for file in os.listdir(log_dir):
-            if not (file.endswith(".err") or file.endswith(".out")):
+        for filepath in log_dir.iterdir():
+            if not filepath.is_file() or filepath.suffix not in (".err", ".out"):
                 continue
-            if not any(wt in file for wt in ("prefill", "decode", "agg")):
+            if not any(wt in filepath.name for wt in ("prefill", "decode", "agg")):
                 continue

-            filepath = log_dir / file
             node = self.parse_single_log(filepath)
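The iterdir-based scan can be exercised standalone; worker_logs is a hypothetical reduction of the loop above, with illustrative worker-type tokens.

```python
import tempfile
from pathlib import Path

def worker_logs(log_dir: Path) -> list[Path]:
    # Suffix and name checks mirror the suggested pathlib refactor.
    return sorted(
        p for p in log_dir.iterdir()
        if p.is_file()
        and p.suffix in (".err", ".out")
        and any(wt in p.name for wt in ("prefill", "decode", "agg"))
    )

with tempfile.TemporaryDirectory() as d:
    log_dir = Path(d)
    for name in ("prefill_0.err", "decode_1.out", "readme.txt", "agg.log"):
        (log_dir / name).touch()
    print([p.name for p in worker_logs(log_dir)])  # ['decode_1.out', 'prefill_0.err']
```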

398-401: Minor inefficiency: ANSI codes stripped twice when called internally.

When parse_launch_command is called from parse_single_log (line 114), the content has already been stripped. However, since this method is public and could be called directly with raw log content, the defensive stripping is reasonable.

If performance becomes a concern for large logs, consider passing a boolean flag or checking for ANSI presence first:

♻️ Optional optimization
-    def parse_launch_command(self, log_content: str, worker_type: str = "unknown") -> NodeLaunchCommand | None:
+    def parse_launch_command(
+        self, log_content: str, worker_type: str = "unknown", *, already_cleaned: bool = False
+    ) -> NodeLaunchCommand | None:
         ...
         from analysis.srtlog.parsers import NodeLaunchCommand

         # Strip ANSI codes for cleaner parsing
-        clean_content = ANSI_ESCAPE.sub("", log_content)
+        clean_content = log_content if already_cleaned else ANSI_ESCAPE.sub("", log_content)
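For reference, regex-based ANSI stripping looks like this; the pattern below is a common CSI-sequence matcher and may differ from the module's actual ANSI_ESCAPE constant.

```python
import re

# Matches CSI sequences like \x1b[32m (color on) and \x1b[0m (reset).
ANSI_ESCAPE = re.compile(r"\x1b\[[0-9;]*[A-Za-z]")

line = "\x1b[32m[INFO]\x1b[0m Decode batch. #running-req: 12"
print(ANSI_ESCAPE.sub("", line))  # [INFO] Decode batch. #running-req: 12
```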
analysis/srtlog/parsers/nodes/sglang.py (1)

243-343: Type the parsed metric dicts with TypedDict.

The parsers return untyped dict | None for external log data. Defining TypedDicts for prefill/decode/memory metrics improves type checking and makes required vs optional fields explicit. As per coding guidelines, TypedDicts should be used for external data shapes.

Example shape:

from typing import TypedDict, Literal

class PrefillMetricsDict(TypedDict, total=False):
    timestamp: str
    type: Literal["prefill"]
    new_seq: int
    new_token: int
    cached_token: int
    token_usage: float
    running_req: int
    queue_req: int
    prealloc_req: int
    inflight_req: int
    input_throughput: float

As per coding guidelines, use TypedDict for parsed external data structures.

Comment on lines +116 to +197
try:
with open(log_path) as f:
for line in f:
full_content.append(line)
# Strip ANSI escape codes
clean_line = ANSI_ESCAPE.sub("", line)

# Parse prefill batch metrics
batch_metrics = self._parse_prefill_batch_line(clean_line)
if batch_metrics:
batches.append(
BatchMetrics(
timestamp=batch_metrics["timestamp"],
dp=0, # Default since not in log
tp=0,
ep=0,
batch_type=batch_metrics["type"],
new_seq=batch_metrics.get("new_seq"),
new_token=batch_metrics.get("new_token"),
cached_token=batch_metrics.get("cached_token"),
token_usage=batch_metrics.get("token_usage"),
running_req=batch_metrics.get("running_req"),
queue_req=batch_metrics.get("queue_req"),
prealloc_req=batch_metrics.get("prealloc_req"),
inflight_req=batch_metrics.get("inflight_req"),
input_throughput=batch_metrics.get("input_throughput"),
)
)

# Parse decode batch metrics
decode_metrics = self._parse_decode_batch_line(clean_line)
if decode_metrics:
batches.append(
BatchMetrics(
timestamp=decode_metrics["timestamp"],
dp=0,
tp=0,
ep=0,
batch_type=decode_metrics["type"],
running_req=decode_metrics.get("running_req"),
queue_req=decode_metrics.get("queue_req"),
prealloc_req=decode_metrics.get("prealloc_req"),
transfer_req=decode_metrics.get("transfer_req"),
token_usage=decode_metrics.get("token_usage"),
preallocated_usage=decode_metrics.get("preallocated_usage"),
num_tokens=decode_metrics.get("num_tokens"),
gen_throughput=decode_metrics.get("gen_throughput"),
)
)

# Parse memory metrics
mem_metrics = self._parse_memory_line(clean_line)
if mem_metrics:
memory_snapshots.append(
MemoryMetrics(
timestamp=mem_metrics["timestamp"],
dp=0,
tp=0,
ep=0,
metric_type=mem_metrics["type"],
avail_mem_gb=mem_metrics.get("avail_mem_gb"),
mem_usage_gb=mem_metrics.get("mem_usage_gb"),
kv_cache_gb=mem_metrics.get("kv_cache_gb"),
kv_tokens=mem_metrics.get("kv_tokens"),
)
)

# Extract TP/DP/EP configuration from server_args
if "tp_size=" in clean_line:
tp_match = re.search(r"tp_size=(\d+)", clean_line)
dp_match = re.search(r"dp_size=(\d+)", clean_line)
ep_match = re.search(r"ep_size=(\d+)", clean_line)

if tp_match:
config["tp_size"] = int(tp_match.group(1))
if dp_match:
config["dp_size"] = int(dp_match.group(1))
if ep_match:
config["ep_size"] = int(ep_match.group(1))

# Parse launch command from full content
launch_command = self.parse_launch_command("".join(full_content), node_info["worker_type"])

⚠️ Potential issue | 🟠 Major

Harden log reading to avoid decode failures.

open(log_path) uses the platform default encoding and can raise UnicodeDecodeError on logs with invalid bytes, causing the entire file to be skipped. Use read_text(errors="replace") (as in the TRTLLM parser) and iterate over the decoded content to keep parsing resilient.

🛠️ Proposed fix
-        full_content = []
+        full_content = ""
@@
-            with open(log_path) as f:
-                for line in f:
-                    full_content.append(line)
+            full_content = log_path.read_text(errors="replace")
+            for line in full_content.splitlines():
                     # Strip ANSI escape codes
                     clean_line = ANSI_ESCAPE.sub("", line)
@@
-            launch_command = self.parse_launch_command("".join(full_content), node_info["worker_type"])
+            launch_command = self.parse_launch_command(full_content, node_info["worker_type"])
🤖 Prompt for AI Agents
In `@analysis/srtlog/parsers/nodes/sglang.py` around lines 116 - 197, The file
read using open(log_path) can raise UnicodeDecodeError; replace the open loop
with a tolerant read (e.g., Path(log_path).read_text(errors="replace") or
similar) and iterate over the returned string's lines so parsing still runs even
with invalid bytes; update the code that currently appends to full_content and
feeds clean_line into _parse_prefill_batch_line, _parse_decode_batch_line,
_parse_memory_line and the subsequent tp/dp/ep regex extraction to use the
decoded text lines, then call parse_launch_command("".join(full_content),
node_info["worker_type"]) as before.
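
The tolerant-read pattern described above, as a standalone sketch (the helper name and the exact regex are illustrative, not the parser's actual API):

```python
import re
from pathlib import Path

# Matches ANSI color/formatting escapes like "\x1b[32m" and "\x1b[0m".
ANSI_ESCAPE = re.compile(r"\x1b\[[0-9;]*m")


def read_log_lines(log_path: Path) -> list[str]:
    """Read a log file tolerantly and strip ANSI codes from each line.

    errors="replace" maps undecodable bytes to U+FFFD instead of raising
    UnicodeDecodeError, so one bad byte cannot cause the whole file to be skipped.
    """
    content = log_path.read_text(errors="replace")
    return [ANSI_ESCAPE.sub("", line) for line in content.splitlines()]
```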

Comment thread analysis/srtlog/parsers/nodes/sglang.py
@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 1

🤖 Fix all issues with AI agents
In `@tests/test_dashboard.py`:
- Around line 147-166: The test test_parse_timestamp_mixed_formats asserts exact
float equality for delta.total_seconds(), which is brittle; update the assertion
to use pytest.approx for floating comparison (e.g., assert delta.total_seconds()
== pytest.approx(1.1)) and ensure pytest is imported in the test file if not
already; the change should be made next to the use of _parse_timestamp and
delta.total_seconds() in that test.
🧹 Nitpick comments (1)
tests/test_dashboard.py (1)

18-31: Add return type annotations to test methods.

To align with the project’s “type hints everywhere” guideline, annotate test methods with -> None (and apply consistently across this file).

♻️ Proposed update (apply pattern to all tests in this class)
-    def test_parse_timestamp_yyyy_mm_dd(self):
+    def test_parse_timestamp_yyyy_mm_dd(self) -> None:
As per coding guidelines, please add return type annotations.

Comment thread tests/test_dashboard.py
Comment on lines +147 to +166
    def test_parse_timestamp_mixed_formats(self):
        """Test parsing a sequence of different timestamp formats."""
        from analysis.dashboard.rate_match_tab import _parse_timestamp

        # Simulate what dashboard might see from different backends
        timestamps = [
            "2025-12-30 15:52:38",  # Standard (could be from old cache)
            "2025-12-30T15:52:39.100000Z",  # SGLang
            "01/23/2026-08:04:40",  # TRTLLM
        ]

        dts = [_parse_timestamp(ts) for ts in timestamps]

        # All should parse successfully
        assert len(dts) == 3
        assert all(isinstance(dt, datetime) for dt in dts)

        # Should be able to compute deltas (even if not chronological)
        delta = dts[1] - dts[0]
        assert delta.total_seconds() == 1.1

⚠️ Potential issue | 🟡 Minor

Use pytest.approx for non-integer float comparisons.

delta.total_seconds() is a float derived from microseconds; direct equality can be brittle.

🐛 Proposed fix
-        assert delta.total_seconds() == 1.1
+        assert delta.total_seconds() == pytest.approx(1.1)
🤖 Prompt for AI Agents
In `@tests/test_dashboard.py` around lines 147 - 166, The test
test_parse_timestamp_mixed_formats asserts exact float equality for
delta.total_seconds(), which is brittle; update the assertion to use
pytest.approx for floating comparison (e.g., assert delta.total_seconds() ==
pytest.approx(1.1)) and ensure pytest is imported in the test file if not
already; the change should be made next to the use of _parse_timestamp and
delta.total_seconds() in that test.
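
The approx pattern in isolation, as a standalone sketch (not the project's actual test file):

```python
import pytest
from datetime import datetime


def test_delta_seconds_approx() -> None:
    t0 = datetime(2025, 12, 30, 15, 52, 38)
    t1 = datetime(2025, 12, 30, 15, 52, 39, 100000)
    delta = t1 - t0
    # pytest.approx compares with a relative tolerance (default 1e-6),
    # so float noise from the microsecond division cannot fail the test.
    assert delta.total_seconds() == pytest.approx(1.1)
```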


@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 4

🤖 Fix all issues with AI agents
In `@analysis/dashboard/rate_match_tab.py`:
- Around line 129-133: The int conversions for run.profiler_metadata.isl and
.osl can raise ValueError for non-numeric strings; update the block that assigns
isl and osl (currently creating variables isl and osl used for the rate match
graph) to wrap each int(...) conversion in a try/except (ValueError, TypeError)
and fall back to None on failure, leaving st.metric display unchanged; ensure
you reference run.profiler_metadata.isl and run.profiler_metadata.osl and assign
valid integers or None so downstream graph code handles missing values safely.

In `@analysis/srtlog/log_parser.py`:
- Around line 88-95: The cache currently saves only metrics (via
_serialize_node_metrics) and later reconstructs NodeInfo from metrics, dropping
node_config and launch_command; change the caching to persist node_config (and
any parsed launch_command/env) alongside metrics: update the code that builds
the cache (where node_infos is used and _serialize_node_metrics is called) to
serialize full NodeInfo entries (or return a tuple/dict with metrics and
node_config/launch_command), extend _serialize_node_metrics (or add a new
_serialize_nodeinfo_list) to include node_config and launch_command, and update
cache load/deserialization logic that rebuilds NodeInfo on cache hits to restore
node_config and launch_command so cached NodeInfo objects match fresh parses
(apply same change to other caching sites referenced in the comment).

In `@analysis/srtlog/models.py`:
- Around line 145-176: The ProfilerMetadata dataclass (and the other
configuration/identity dataclasses referenced around lines 288-300 and 441-506)
should be made immutable by marking them as frozen dataclasses; update the
`@dataclass` decorator for ProfilerMetadata (and the other config classes) to
include frozen=True so their fields cannot be mutated after construction,
ensuring from_json classmethods and any consumers continue to return instances
with the same API but immutable state.

In `@tests/test_runloader_parsers.py`:
- Around line 14-17: The test module imports RunLoader at collection time which
pulls in pandas and causes collection to fail when pandas is missing; fix by
making the dependency check explicit before importing RunLoader (e.g., call
pytest.importorskip("pandas") at the top of the test module) or alternatively
move the import of RunLoader inside the test functions/setup so import happens
only when pandas is available; update tests referencing RunLoader,
ParserTestHarness, and SampleSABenchData accordingly.
🧹 Nitpick comments (4)
tests/test_parsers.py (1)

428-436: Remove redundant pytest import inside test method.

pytest is already imported at the module level (line 19). The same applies to the import at line 790.

♻️ Suggested fix
     def test_parse_timestamp_invalid(self, parser):
         """Test parsing invalid timestamp raises ValueError."""
-        import pytest
-
         with pytest.raises(ValueError):
             parser.parse_timestamp("invalid-timestamp")
analysis/srtlog/parsers/benchmark/mooncake_router.py (1)

216-224: Simplify redundant loop structure.

The loop for root_dir in [logs_dir]: iterates over a single-element list, so it adds a level of nesting without doing anything; call rglob on logs_dir directly.

♻️ Suggested fix
         if logs_dir.exists():
             # Check if there are any AIPerf result files using iterdir recursively
             try:
-                for root_dir in [logs_dir]:
-                    for item in root_dir.rglob("profile_export_aiperf.json"):
-                        logger.info(f"Found mooncake-router results in: {logs_dir}")
-                        return logs_dir
+                for item in logs_dir.rglob("profile_export_aiperf.json"):
+                    logger.info(f"Found mooncake-router results in: {logs_dir}")
+                    return logs_dir
             except (OSError, PermissionError) as e:
                 logger.warning(f"Error accessing {logs_dir}: {e}")
analysis/srtlog/parsers/__init__.py (1)

161-188: Guard against duplicate registrations to avoid silent overrides.

Right now, registering the same type twice will quietly replace the previous parser, which can hide bugs.

♻️ Proposed fix
 def register_benchmark_parser(benchmark_type: str):
@@
-    def decorator(cls):
-        _benchmark_parsers[benchmark_type] = cls
+    def decorator(cls):
+        if benchmark_type in _benchmark_parsers:
+            raise ValueError(f"Benchmark parser already registered for '{benchmark_type}'")
+        _benchmark_parsers[benchmark_type] = cls
         return cls
@@
 def register_node_parser(backend_type: str):
@@
-    def decorator(cls):
-        _node_parsers[backend_type] = cls
+    def decorator(cls):
+        if backend_type in _node_parsers:
+            raise ValueError(f"Node parser already registered for '{backend_type}'")
+        _node_parsers[backend_type] = cls
         return cls
analysis/srtlog/log_parser.py (1)

120-170: Consider narrowing JSON scan to likely metadata files.

On large runs, parsing every *.json in the directory can add noticeable overhead; filtering to known metadata filenames (or small-size files) can keep detection fast.

Comment on lines +129 to +133
             st.metric("ISL/OSL", f"{run.profiler_metadata.isl}/{run.profiler_metadata.osl}")
 
             # Create rate match graph
-            isl = int(run.profiler.isl) if run.profiler.isl else None
-            osl = int(run.profiler.osl) if run.profiler.osl else None
+            isl = int(run.profiler_metadata.isl) if run.profiler_metadata.isl else None
+            osl = int(run.profiler_metadata.osl) if run.profiler_metadata.osl else None

⚠️ Potential issue | 🟠 Major

🧩 Analysis chain

🏁 Script executed:

# First, find the ProfilerMetadata class definition
rg -t py "class ProfilerMetadata" --max-count 5

Repository: ishandhanani/srt-slurm

Length of output: 115


🏁 Script executed:

# Look at the full context of rate_match_tab.py around lines 129-133
fd -t f "rate_match_tab.py" -x wc -l {}

Repository: ishandhanani/srt-slurm

Length of output: 108


🏁 Script executed:

# Check the file size first
wc -l analysis/dashboard/rate_match_tab.py 2>/dev/null || fd -t f "rate_match_tab.py"

Repository: ishandhanani/srt-slurm

Length of output: 106


🏁 Script executed:

# Read ProfilerMetadata class definition
cat -n analysis/srtlog/models.py | grep -A 50 "class ProfilerMetadata"

Repository: ishandhanani/srt-slurm

Length of output: 2092


🏁 Script executed:

# Read the rate_match_tab.py context
sed -n '120,145p' analysis/dashboard/rate_match_tab.py

Repository: ishandhanani/srt-slurm

Length of output: 1421


🏁 Script executed:

# Search for test data or JSON examples showing isl/osl values
fd -t f "\\.json" | head -10

Repository: ishandhanani/srt-slurm

Length of output: 92


🏁 Script executed:

# Look at _create_rate_match_graph to see how isl/osl are used
sed -n '140,180p' analysis/dashboard/rate_match_tab.py

Repository: ishandhanani/srt-slurm

Length of output: 1682


🏁 Script executed:

# Check if there are any tests or data samples
find . -type f \( -name "*.json" -o -name "*test*" \) | grep -i "profiler\|metadata\|isl\|osl" | head -20

Repository: ishandhanani/srt-slurm

Length of output: 48


🏁 Script executed:

# Check the config file to see if there are examples
cat configs/deepep_config.json

Repository: ishandhanani/srt-slurm

Length of output: 559


🏁 Script executed:

# Search for where isl/osl values are set or used
rg "isl|osl" -t py -B 2 -A 2 | head -100

Repository: ishandhanani/srt-slurm

Length of output: 5297


🏁 Script executed:

# Look for any test files that might show expected values
find . -type f -name "*test*.py" | xargs rg -l "isl\|osl" 2>/dev/null

Repository: ishandhanani/srt-slurm

Length of output: 50


🏁 Script executed:

# Check the actual JSON source to see if isl/osl can be non-numeric
rg "\"isl\"|\"osl\"" -t py -B 5 -A 5 | grep -A 10 "get"

Repository: ishandhanani/srt-slurm

Length of output: 1565


🏁 Script executed:

# Look for any real-world JSON samples or where the JSON data comes from
rg "benchmark.*json\|\.json.*benchmark" -t py -B 3 -A 3

Repository: ishandhanani/srt-slurm

Length of output: 48


🏁 Script executed:

# Check if there's any existing error handling for int() conversions in the file
rg "int\(" analysis/dashboard/rate_match_tab.py -B 2 -A 2

Repository: ishandhanani/srt-slurm

Length of output: 440


Guard ISL/OSL int conversion against non-numeric strings.

ProfilerMetadata.isl and ProfilerMetadata.osl are stored as strings without validation. Non-numeric values (such as "N/A") will raise ValueError and crash the dashboard. Wrap conversions in try-except blocks to safely handle invalid inputs.

🛠️ Proposed fix
-            isl = int(run.profiler_metadata.isl) if run.profiler_metadata.isl else None
-            osl = int(run.profiler_metadata.osl) if run.profiler_metadata.osl else None
+            try:
+                isl = int(run.profiler_metadata.isl) if run.profiler_metadata.isl else None
+            except (TypeError, ValueError):
+                isl = None
+            try:
+                osl = int(run.profiler_metadata.osl) if run.profiler_metadata.osl else None
+            except (TypeError, ValueError):
+                osl = None
🤖 Prompt for AI Agents
In `@analysis/dashboard/rate_match_tab.py` around lines 129 - 133, The int
conversions for run.profiler_metadata.isl and .osl can raise ValueError for
non-numeric strings; update the block that assigns isl and osl (currently
creating variables isl and osl used for the rate match graph) to wrap each
int(...) conversion in a try/except (ValueError, TypeError) and fall back to
None on failure, leaving st.metric display unchanged; ensure you reference
run.profiler_metadata.isl and run.profiler_metadata.osl and assign valid
integers or None so downstream graph code handles missing values safely.

Comment on lines 88 to 95
         # Save to cache if we have data
-        if nodes:
-            cache_df = self._serialize_node_metrics(nodes)
+        if node_infos:
+            # Extract metrics for caching
+            metrics_list = [ni.metrics for ni in node_infos]
+            cache_df = self._serialize_node_metrics(metrics_list)
             cache_mgr.save_to_cache("node_metrics", cache_df, source_patterns)
             logger.info(f"Parsed and cached {len(node_infos)} nodes from {logs_dir}")


⚠️ Potential issue | 🟠 Major

Cache path drops node_config / launch_command data.

On a cache hit, NodeInfo is reconstructed from metrics only, so launch_command/environment derived from log parsing can be lost unless config files exist. This makes cached results inconsistent with fresh parses.

🐛 Proposed fix: persist node_config alongside metrics
@@
-            metrics_list = [ni.metrics for ni in node_infos]
-            cache_df = self._serialize_node_metrics(metrics_list)
+            cache_df = self._serialize_node_metrics(node_infos)
@@
-    def _serialize_node_metrics(self, nodes: list) -> pd.DataFrame:
+    def _serialize_node_metrics(self, nodes: list[NodeInfo]) -> pd.DataFrame:
@@
-        for node in nodes:
-            metadata = node.metadata
-            config = node.config
+        for node_info in nodes:
+            metrics = node_info.metrics
+            metadata = metrics.metadata
+            config = metrics.config
+            node_config_json = json.dumps(node_info.node_config) if node_info.node_config else None
@@
-                row = {
+                row = {
                     # Node identification
                     "node": metadata.node_name,
                     "worker_type": metadata.worker_type,
                     "worker_id": metadata.worker_id,
+                    "node_config_json": node_config_json,
@@
-                row = {
+                row = {
                     # Node identification
                     "node": metadata.node_name,
                     "worker_type": metadata.worker_type,
                     "worker_id": metadata.worker_id,
+                    "node_config_json": node_config_json,
@@
-            # Extract config (same for all rows in this node)
+            # Extract config (same for all rows in this node)
             config = {}
             if not group_df.empty:
                 first_row = group_df.iloc[0]
+                node_config: dict = {}
+                node_config_json = first_row.get("node_config_json")
+                if pd.notna(node_config_json):
+                    try:
+                        node_config = json.loads(node_config_json)
+                    except Exception as e:
+                        logger.debug(f"Could not parse node_config_json: {e}")
@@
-            node_info = NodeInfo(metrics=metrics, node_config={})
+            node_info = NodeInfo(metrics=metrics, node_config=node_config or {})

Also applies to: 394-469, 471-582

🤖 Prompt for AI Agents
In `@analysis/srtlog/log_parser.py` around lines 88 - 95, The cache currently
saves only metrics (via _serialize_node_metrics) and later reconstructs NodeInfo
from metrics, dropping node_config and launch_command; change the caching to
persist node_config (and any parsed launch_command/env) alongside metrics:
update the code that builds the cache (where node_infos is used and
_serialize_node_metrics is called) to serialize full NodeInfo entries (or return
a tuple/dict with metrics and node_config/launch_command), extend
_serialize_node_metrics (or add a new _serialize_nodeinfo_list) to include
node_config and launch_command, and update cache load/deserialization logic that
rebuilds NodeInfo on cache hits to restore node_config and launch_command so
cached NodeInfo objects match fresh parses (apply same change to other caching
sites referenced in the comment).

Comment thread analysis/srtlog/models.py
Comment on lines +145 to +176
@dataclass
class ProfilerMetadata:
    """Metadata about the benchmark/profiler configuration.

    This describes what the benchmark was configured to do,
    not the actual results.
    """

    profiler_type: str
    isl: str
    osl: str
    concurrencies: str = ""
    req_rate: str = ""

    @classmethod
    def from_json(cls, json_data: dict) -> "ProfilerMetadata":
        """Create from {jobid}.json benchmark section.

        Args:
            json_data: Parsed JSON from {jobid}.json file

        Returns:
            ProfilerMetadata instance
        """
        profiler_meta = json_data.get("benchmark", {})

        return cls(
            profiler_type=profiler_meta.get("type", "unknown"),
            isl=str(profiler_meta.get("isl", "")),
            osl=str(profiler_meta.get("osl", "")),
            concurrencies=profiler_meta.get("concurrencies", ""),
            req_rate=profiler_meta.get("req-rate", ""),
        )

🛠️ Refactor suggestion | 🟠 Major

🧩 Analysis chain

🏁 Script executed:

cat -n analysis/srtlog/models.py | head -200

Repository: ishandhanani/srt-slurm

Length of output: 8762


🏁 Script executed:

cat -n analysis/srtlog/models.py | sed -n '280,310p'

Repository: ishandhanani/srt-slurm

Length of output: 1175


🏁 Script executed:

cat -n analysis/srtlog/models.py | sed -n '430,510p'

Repository: ishandhanani/srt-slurm

Length of output: 2952


🏁 Script executed:

# Check if there are any __post_init__ methods or mutation patterns in these classes
rg -A 10 "class ProfilerMetadata|class BenchmarkLaunchCommand|class NodeMetadata|class NodeLaunchCommand" analysis/srtlog/models.py | head -100

Repository: ishandhanani/srt-slurm

Length of output: 1249


🏁 Script executed:

# Verify no mutation methods exist by searching for methods that modify fields
rg -B 2 -A 5 "self\.(profiler_type|isl|osl|concurrencies|req_rate|benchmark_type|raw_command|extra_args|node_name|worker_type|worker_id|backend_type) =" analysis/srtlog/models.py

Repository: ishandhanani/srt-slurm

Length of output: 48


🏁 Script executed:

# Check if RunMetadata (also a config class) is frozen and verify consistency
rg -B 2 -A 15 "^@dataclass|^class RunMetadata" analysis/srtlog/models.py | head -50

Repository: ishandhanani/srt-slurm

Length of output: 1136


Make configuration dataclasses immutable.

These classes are configuration/identity carriers; freezing prevents accidental mutation and aligns with project conventions.

♻️ Proposed change
-@dataclass
+@dataclass(frozen=True)
 class ProfilerMetadata:
@@
-@dataclass
+@dataclass(frozen=True)
 class BenchmarkLaunchCommand:
@@
-@dataclass
+@dataclass(frozen=True)
 class NodeMetadata:
@@
-@dataclass
+@dataclass(frozen=True)
 class NodeLaunchCommand:

As per coding guidelines, configuration objects should be frozen dataclasses.

Also applies to: 288-300, 441-506

🤖 Prompt for AI Agents
In `@analysis/srtlog/models.py` around lines 145 - 176, The ProfilerMetadata
dataclass (and the other configuration/identity dataclasses referenced around
lines 288-300 and 441-506) should be made immutable by marking them as frozen
dataclasses; update the `@dataclass` decorator for ProfilerMetadata (and the other
config classes) to include frozen=True so their fields cannot be mutated after
construction, ensuring from_json classmethods and any consumers continue to
return instances with the same API but immutable state.
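
What frozen=True buys, shown on a cut-down stand-in for ProfilerMetadata (not the project's actual class):

```python
from dataclasses import FrozenInstanceError, dataclass


@dataclass(frozen=True)
class ProfilerMetadataSketch:
    profiler_type: str
    isl: str = ""
    osl: str = ""


# Construction and reads work exactly as before...
meta = ProfilerMetadataSketch(profiler_type="aiperf", isl="2048", osl="256")
```

...but any assignment to a field raises dataclasses.FrozenInstanceError, and frozen + eq also makes instances hashable, so they can be used as dict keys or cached safely.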

Comment on lines +14 to +17
import pytest

from analysis.srtlog.run_loader import RunLoader
from tests.fixtures_parsers import ParserTestHarness, SampleSABenchData

⚠️ Potential issue | 🟡 Minor

Same pipeline failure applies here.

This file also fails test collection due to the missing pandas dependency triggered by importing RunLoader. See the comment on tests/test_parsers.py for details.

🧰 Tools
🪛 GitHub Actions: CI

[error] 16-16: ModuleNotFoundError: No module named 'pandas'. Importing test module failed due to missing pandas dependency.

🤖 Prompt for AI Agents
In `@tests/test_runloader_parsers.py` around lines 14 - 17, The test module
imports RunLoader at collection time which pulls in pandas and causes collection
to fail when pandas is missing; fix by making the dependency check explicit
before importing RunLoader (e.g., call pytest.importorskip("pandas") at the top
of the test module) or alternatively move the import of RunLoader inside the
test functions/setup so import happens only when pandas is available; update
tests referencing RunLoader, ParserTestHarness, and SampleSABenchData
accordingly.
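
The importorskip mechanics, sketched with a stdlib module standing in for pandas so the snippet runs anywhere:

```python
import pytest

# importorskip returns the imported module on success; on ImportError it raises
# pytest's Skipped, so the whole test module is skipped at collection time
# instead of failing with ModuleNotFoundError.
mod = pytest.importorskip("json")  # "json" stands in for "pandas" in this sketch

# In the real test file, the call would guard the heavy import, e.g.:
# pytest.importorskip("pandas")
# from analysis.srtlog.run_loader import RunLoader
```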

@ishandhanani ishandhanani changed the title Rollup Experiments - Part 1: parser for node and benchmakrs. trim aislop add srtlog functionality to vectorize logs Jan 27, 2026

@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 2

🤖 Fix all issues with AI agents
In `@pyproject.toml`:
- Around line 19-21: Remove "pre-commit" from the runtime dependencies list in
pyproject.toml and add it to the dev-only dependency group instead (e.g., under
[project.optional-dependencies.dev] or [tool.poetry.dev-dependencies] depending
on your project tool), keeping the same version constraint "pre-commit>=4.5.1";
update the dependency arrays so only runtime packages like "pandas" and
"pyarrow" remain in the main dependencies and "pre-commit" is declared under the
dev/dev-dependencies symbol.
- Around line 20-21: pyarrow's version is inconsistent between runtime and
dev/analysis groups and streamlit/plotly are likely misclassified as runtime
deps; update pyproject.toml so pyarrow has a single aligned constraint across
all groups (or document why runtime needs "pyarrow>=23.0.0") and move streamlit
and plotly out of the main runtime dependencies into the analysis extras if they
are only used in analysis workflows (verify imports in
src/srtctl/benchmarks/scripts/sa-bench/benchmark_serving.py and other src/ files
before moving); ensure pandas stays as a runtime dependency because
benchmark_serving.py imports it.

Comment thread pyproject.toml
Comment on lines +19 to +21
"pre-commit>=4.5.1",
"pandas>=2.1.0",
"pyarrow>=23.0.0",

🛠️ Refactor suggestion | 🟠 Major

🧩 Analysis chain

🌐 Web query:

pre-commit official documentation installation dev dependency runtime

💡 Result:

  • Install the pre-commit runner (tooling, not app runtime):

    • With pip: pip install pre-commit [1]
    • In a Python project, add pre-commit to your dev requirements (e.g., requirements-dev.txt): the docs explicitly suggest requirements.txt (or requirements-dev.txt) [1]
    • With Homebrew: brew install pre-commit [2]
    • As a zero-dependency zipapp: download the .pyz from GitHub releases and run it with python pre-commit-#.pyz ... [1]
  • Install into the repo (developer workflow): run pre-commit install so hooks run on git commit [1]

  • Dev-dependency vs runtime-dependency:

    • pre-commit is a development/CI tool (it installs and runs git hooks), so it typically belongs in dev dependencies, not your application/library’s runtime dependencies [1].
    • Hook tool runtimes (eslint, black, etc.) are generally not something you add to your project’s runtime deps—pre-commit creates isolated environments for hooks and can even fetch/build required language runtimes when needed (e.g., Node) [1].

Move pre-commit to a dev-only dependency group.

pre-commit is a development and CI tool (not a runtime dependency). The official documentation recommends installing it in dev requirements. Keeping it in runtime dependencies bloats end-user installations unnecessarily.

♻️ Proposed change
 dependencies = [
     "pyyaml>=6.0",
     "jinja2>=3.1.0",
     "marshmallow>=3.20.0",
     "marshmallow-dataclass>=8.6.0",
     "requests>=2.31.0",
     "rich>=13.0.0",
     "questionary>=2.0.0",
-    "pre-commit>=4.5.1",
     "pandas>=2.1.0",
     "pyarrow>=23.0.0",
 ]

 [dependency-groups]
 dev = [
     "pytest>=8.0.0",
     "pytest-cov>=4.0",
     "ruff>=0.8.0",
     "ty",  # Astral's fast type checker (replaces mypy)
+    "pre-commit>=4.5.1",
 ]
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
-    "pre-commit>=4.5.1",
-    "pandas>=2.1.0",
-    "pyarrow>=23.0.0",
+    "pandas>=2.1.0",
+    "pyarrow>=23.0.0",
🤖 Prompt for AI Agents
In `@pyproject.toml` around lines 19 - 21, Remove "pre-commit" from the runtime
dependencies list in pyproject.toml and add it to the dev-only dependency group
instead (e.g., under [project.optional-dependencies.dev] or
[tool.poetry.dev-dependencies] depending on your project tool), keeping the same
version constraint "pre-commit>=4.5.1"; update the dependency arrays so only
runtime packages like "pandas" and "pyarrow" remain in the main dependencies and
"pre-commit" is declared under the dev/dev-dependencies symbol.

Comment thread pyproject.toml
Comment on lines +20 to +21
"pandas>=2.1.0",
"pyarrow>=23.0.0",

⚠️ Potential issue | 🟡 Minor

🧩 Analysis chain

🏁 Script executed:

find . -name "pyproject.toml" -type f

Repository: ishandhanani/srt-slurm

Length of output: 82


🏁 Script executed:

cat -n ./pyproject.toml

Repository: ishandhanani/srt-slurm

Length of output: 3825


🏁 Script executed:

rg "import pandas|from pandas|import pyarrow|from pyarrow" --type python -l | head -20

Repository: ishandhanani/srt-slurm

Length of output: 152


🏁 Script executed:

find ./src -name "*.py" -type f | xargs grep -l "pandas\|pyarrow" 2>/dev/null | head -20

Repository: ishandhanani/srt-slurm

Length of output: 127


🏁 Script executed:

find ./src/srtctl -name "*.py" -type f | head -20

Repository: ishandhanani/srt-slurm

Length of output: 816


🏁 Script executed:

grep -r "import pandas\|from pandas\|import pyarrow\|from pyarrow" ./src/srtctl --include="*.py" | grep -v "\.pyc"

Repository: ishandhanani/srt-slurm

Length of output: 147


🏁 Script executed:

grep -r "pyarrow" ./src/srtctl --include="*.py"

Repository: ishandhanani/srt-slurm

Length of output: 48


🏁 Script executed:

cat ./src/srtctl/benchmarks/scripts/sa-bench/benchmark_serving.py | head -50

Repository: ishandhanani/srt-slurm

Length of output: 1400


🏁 Script executed:

grep -A 5 -B 5 "pandas\|pd\." ./src/srtctl/benchmarks/scripts/sa-bench/benchmark_serving.py | head -80

Repository: ishandhanani/srt-slurm

Length of output: 730


🏁 Script executed:

find ./src/srtctl/benchmarks/scripts -type f -name "*.py" | wc -l

Repository: ishandhanani/srt-slurm

Length of output: 67


🏁 Script executed:

grep -l "pandas\|pyarrow" ./src/srtctl/benchmarks/scripts/sa-bench/*.py

Repository: ishandhanani/srt-slurm

Length of output: 127


🏁 Script executed:

find ./src/srtctl -name "*.py" -type f -exec grep -l "streamlit\|plotly" {} \;

Repository: ishandhanani/srt-slurm

Length of output: 48


🏁 Script executed:

grep -r "streamlit\|plotly" ./src --include="*.py"

Repository: ishandhanani/srt-slurm

Length of output: 48


🏁 Script executed:

find . -type d -name "analysis" -o -type d -name "dashboard" | head -10

Repository: ishandhanani/srt-slurm

Length of output: 97


🏁 Script executed:

find ./analysis -name "*.py" -type f | head -20

Repository: ishandhanani/srt-slurm

Length of output: 791


🏁 Script executed:

grep -r "import pandas\|from pandas\|import pyarrow\|from pyarrow" ./analysis --include="*.py" | head -20

Repository: ishandhanani/srt-slurm

Length of output: 506


Fix pyarrow version inconsistency and reconsider streamlit/plotly in runtime deps.

pandas>=2.1.0 is legitimately a runtime dependency (used in src/srtctl/benchmarks/scripts/sa-bench/benchmark_serving.py), but pyarrow has conflicting version constraints across dependency groups: >=23.0.0 in runtime vs >=14.0.0 in both dev and analysis groups. Align the pyarrow version across all groups or clarify why runtime requires a newer version.

Additionally, streamlit>=1.30.0 and plotly>=6.5.2 appear only in the analysis workflows (no imports found in src/), yet are listed in runtime dependencies. Move these to the analysis group if they're not used by runtime code.

🤖 Prompt for AI Agents
In `@pyproject.toml` around lines 20 - 21, pyarrow's version is inconsistent
between runtime and dev/analysis groups and streamlit/plotly are likely
misclassified as runtime deps; update pyproject.toml so pyarrow has a single
aligned constraint across all groups (or document why runtime needs
"pyarrow>=23.0.0") and move streamlit and plotly out of the main runtime
dependencies into the analysis extras if they are only used in analysis
workflows (verify imports in
src/srtctl/benchmarks/scripts/sa-bench/benchmark_serving.py and other src/ files
before moving); ensure pandas stays as a runtime dependency because
benchmark_serving.py imports it.
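
One way the resulting split could look (a sketch assuming the PEP 735 dependency-groups table the file already uses; the pins mirror the review discussion, not a tested resolution):

```toml
[project]
dependencies = [
    "pyyaml>=6.0",
    "pandas>=2.1.0",    # runtime: imported by benchmark_serving.py
    "pyarrow>=14.0.0",  # single constraint, aligned with the analysis group
]

[dependency-groups]
dev = [
    "pytest>=8.0.0",
    "ruff>=0.8.0",
    "pre-commit>=4.5.1",  # tooling only, never needed at runtime
]
analysis = [
    "streamlit>=1.30.0",
    "plotly>=6.5.2",
]
```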
