
add srtlog functionality to vectorize logs #101

Closed
KaunilD wants to merge 15 commits into main from kdhruv/rollup_part_1

Conversation


@KaunilD KaunilD commented Jan 26, 2026

Summary by CodeRabbit

  • New Features

    • Parser registry plus multiple new benchmark and node parsers; node logs now yield richer node metadata and launch-command extraction; dashboard accepts more timestamp formats and reads ISL/OSL from profiler metadata.
  • Bug Fixes / Improvements

    • Benchmark/run loading now uses parser-driven result discovery with better fallbacks and artifact handling; benchmark scripts log commands and save artifacts.
  • Tests

    • Large fixture and integration test suites for parsers, run loader, node metrics, and dashboard timestamp handling.
  • Documentation

    • Added architecture/dataflow doc describing parser registry and end-to-end log analysis.
  • Refactor

    • Separated profiler configuration metadata from runtime results and changed node-metrics payload shape.

✏️ Tip: You can customize this high-level summary in your review settings.


coderabbitai bot commented Jan 26, 2026

Warning

Rate limit exceeded

@KaunilD has exceeded the limit for the number of commits that can be reviewed per hour. Please wait 12 minutes and 9 seconds before requesting another review.

⌛ How to resolve this issue?

After the wait time has elapsed, a review can be triggered using the @coderabbitai review command as a PR comment. Alternatively, push new commits to this PR.

We recommend that you space out your commits to avoid hitting the rate limit.

🚦 How do rate limits work?

CodeRabbit enforces hourly rate limits for each developer per organization.

Our paid plans have higher rate limits than the trial, open-source and free plans. In all cases, we re-allow further reviews after a brief timeout.

Please see our FAQ for further information.

📝 Walkthrough

Introduces a parser registry and multiple benchmark/node parsers; adds metadata-focused data models (ProfilerMetadata, NodeMetadata, NodeInfo); migrates dashboard uses to profiler_metadata; integrates parser-driven benchmark and node parsing into RunLoader and NodeAnalyzer; adds caching, extensive tests, fixtures, and architecture documentation.

Changes

  • Dashboard profiler_metadata migration (analysis/dashboard/app.py, analysis/dashboard/config_tab.py, analysis/dashboard/node_metrics_tab.py, analysis/dashboard/rate_match_tab.py): Switched UI/filtering and metric sources from run.profiler.* to run.profiler_metadata.*; added a _parse_timestamp helper in rate_match_tab.
  • Node metrics payload shape (analysis/dashboard/components.py): NodeMetrics dict output now emits node_info as a nested dict with keys node, worker_type, and worker_id (shape change).
  • Data model refactor (analysis/srtlog/models.py): Added ProfilerMetadata, NodeMetadata, NodeInfo, NodeConfig, TopologyInfo, BenchmarkLaunchCommand, and NodeLaunchCommand; updated BenchmarkRun, NodeMetrics, and ProfilerResults to use metadata-oriented fields.
  • Parser infrastructure & registry (analysis/srtlog/parsers/__init__.py, analysis/srtlog/parsers/benchmark/__init__.py, analysis/srtlog/parsers/nodes/__init__.py): New parser protocols, registries, decorators, factory functions (get_benchmark_parser, get_node_parser), and public exports for benchmark/node parsers.
  • Benchmark parsers (analysis/srtlog/parsers/benchmark/sa_bench.py, analysis/srtlog/parsers/benchmark/mooncake_router.py): Added SABenchParser and MooncakeRouterParser with JSON/AIPerf-first parsing, benchmark.out fallbacks, and launch-command extraction.
  • Node parsers (analysis/srtlog/parsers/nodes/sglang.py, analysis/srtlog/parsers/nodes/trtllm.py): Added SGLangNodeParser and TRTLLMNodeParser producing NodeInfo (batches, memory snapshots, config, launch command).
  • Log parsing & NodeAnalyzer (analysis/srtlog/log_parser.py): Reworked parse_run_logs to detect the backend, use get_node_parser(), and return NodeInfo objects (with a dict-conversion helper); added backend detection and config population from files.
  • RunLoader integration (analysis/srtlog/run_loader.py): Migrated to profiler_metadata; added a parser-driven benchmark result flow and normalizer, a manual JSON fallback, and load_node_metrics / load_node_metrics_for_run APIs.
  • Config reader change (analysis/srtlog/config_reader.py): parse_command_line_from_err now returns TopologyInfo (import changed from ParsedCommandInfo to TopologyInfo); docstring and type hints updated.
  • Tests & fixtures (tests/fixtures_parsers.py, tests/test_parsers.py, tests/test_runloader_parsers.py, tests/test_dashboard.py): New fixtures and comprehensive tests for the parser registry, benchmark/node parsers, RunLoader integration, and timestamp parsing.
  • Docs (analysis/docs/ARCHITECTURE_DATAFLOW.md): New architecture/dataflow doc describing the parser registry, data models, cache strategy, and end-to-end parsing flows.
  • Benchmark scripts refactor (src/srtctl/benchmarks/scripts/... — gpqa, mmlu, sa-bench, mooncake-router, longbenchv2, profiling, router, ...): Replace inline shell invocations with cmd=(...) arrays, add [CMD] logging, execute via "${cmd[@]}", and add some artifact handling.
  • Backend & CLI small changes (src/srtctl/backends/trtllm.py, src/srtctl/backends/sglang.py, src/srtctl/backends/base.py): Added TRTLLMServerConfig dataclass; minor import/annotation/format tweaks.
  • Dependencies (pyproject.toml): Added runtime deps (pre-commit, pandas, pyarrow, plotly) and a new analysis dependency group (streamlit, plotly, pandas, pyarrow).
  • Minor formatting/quoting (src/srtctl/core/schema.py, src/srtctl/frontends/dynamo.py): Quoting/whitespace adjustments only.
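The decorator-plus-factory registry described above can be sketched as follows. This is a minimal, hypothetical illustration, not the PR's actual implementation: only the factory name get_benchmark_parser and the SABenchParser class name come from the change summary; everything else (register_benchmark_parser, the registry dict, the protocol method) is assumed for illustration.

```python
from typing import Callable, Protocol


class BenchmarkParser(Protocol):
    """Minimal protocol a benchmark parser is assumed to satisfy."""

    def parse_result_directory(self, result_dir: str) -> dict: ...


# Registry mapping profiler type -> parser class (illustrative only).
_BENCHMARK_PARSERS: dict[str, type] = {}


def register_benchmark_parser(name: str) -> Callable[[type], type]:
    """Class decorator that records a parser under a profiler-type key."""

    def decorator(cls: type) -> type:
        _BENCHMARK_PARSERS[name] = cls
        return cls

    return decorator


def get_benchmark_parser(profiler_type: str) -> BenchmarkParser:
    """Factory: look up and instantiate the parser for a profiler type."""
    try:
        return _BENCHMARK_PARSERS[profiler_type]()
    except KeyError:
        raise ValueError(f"No benchmark parser registered for {profiler_type!r}")


@register_benchmark_parser("sa-bench")
class SABenchParser:
    def parse_result_directory(self, result_dir: str) -> dict:
        return {"source": result_dir}  # real parsing elided
```

With this shape, RunLoader only needs the profiler type string to obtain a parser, and unknown types surface as a ValueError rather than a silent None.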

Sequence Diagram(s)

sequenceDiagram
    actor User
    participant RL as RunLoader
    participant PR as ParserRegistry
    participant BP as BenchmarkParser
    participant Cache as ParquetCache
    participant NA as NodeAnalyzer
    participant NP as NodeParser

    User->>RL: load_run(run_path)
    RL->>RL: read run metadata (profiler_metadata)
    RL->>PR: get_benchmark_parser(profiler_type)
    PR-->>RL: parser or None
    alt parser found
        RL->>BP: find_result_directory(run_path)
        BP-->>RL: result_dir
        alt cached results exist
            RL->>Cache: load cached benchmark results
            Cache-->>RL: metrics
        else
            RL->>BP: parse_result_directory(result_dir)
            BP-->>RL: metrics dict
            RL->>Cache: store parsed metrics
        end
    else
        RL->>RL: _parse_profiler_results_manual
    end

    RL->>NA: parse_run_logs(run_path)
    NA->>NA: detect backend type
    NA->>PR: get_node_parser(backend_type)
    PR-->>NA: node parser
    NA->>NP: parse_logs(log_dir)
    NP-->>NA: NodeInfo list
    NA->>Cache: cache node metrics
    NA-->>RL: node metrics

    RL-->>User: BenchmarkRun with profiler_metadata, ProfilerResults, node metrics

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~60 minutes

Possibly related PRs

Suggested reviewers

  • ishandhanani

Poem

🐰 I hopped through logs both terse and deep,
I parsed each line while others sleep.
Parsers in rows, metadata neat,
Nodes and benchmarks now nicely meet.
Hop—cache, registry—data tucked to keep.

🚥 Pre-merge checks | ✅ 3

✅ Passed checks (3 passed)

  • Description Check — ✅ Passed: Check skipped - CodeRabbit’s high-level summary is enabled.
  • Title Check — ✅ Passed: The pull request title 'add srtlog functionality to vectorize logs' accurately describes the primary change: introducing comprehensive log parsing and vectorization infrastructure to the srtlog module.
  • Docstring Coverage — ✅ Passed: Docstring coverage is 96.64%, which is sufficient. The required threshold is 80.00%.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.


Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.


Comment @coderabbitai help to get the list of available commands and usage tips.

@KaunilD KaunilD requested a review from ishandhanani January 26, 2026 20:56
@KaunilD KaunilD changed the title from "parser for node and benchmakrs. trim aislop" to "Rollup Experiments - Part 1: parser for node and benchmakrs. trim aislop" Jan 26, 2026

@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 12

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (3)
src/srtctl/backends/trtllm.py (2)

20-33: Incorrect docstring - references SGLang instead of TRTLLM.

The docstring mentions "SGLang server CLI configuration" but this is the TRTLLM backend. This appears to be a copy-paste error.

📝 Proposed fix
 @dataclass(frozen=True)
 class TRTLLMServerConfig:
-    """SGLang server CLI configuration per mode (prefill/decode/aggregated).
+    """TRTLLM server CLI configuration per mode (prefill/decode/aggregated).
 
     Each mode can have its own configuration dict that gets converted
     to CLI flags when starting the worker.
     """

135-177: Missing profiling and config dump implementation in TRTLLM backend.

The parameters frontend_type, profiling_enabled, nsys_prefix, and dump_config_path are part of the BackendProtocol interface and are actively used in the SGLang backend for profiling support and config dumping. TRTLLM declares these parameters but does not implement them. Either add support for these features (as SGLang does) or document why they are not applicable to TRTLLM.

analysis/dashboard/config_tab.py (1)

63-70: Guard profiler metadata access to avoid crashes on runs without metadata.

If a run lacks profiler_metadata (e.g., older logs or profiling disabled), these attribute reads can raise and break the UI. Consider a safe fallback or “N/A” defaults.

🛠️ Suggested hardening
-            with col3:
-                st.metric("ISL/OSL", f"{run.profiler_metadata.isl}/{run.profiler_metadata.osl}")
-            with col4:
-                gpu_type_suffix = f" ({run.metadata.gpu_type})" if run.metadata.gpu_type else ""
-                st.metric("Profiler", f"{run.profiler_metadata.profiler_type}{gpu_type_suffix}")
+            profiler_meta = getattr(run, "profiler_metadata", None)
+            isl = profiler_meta.isl if profiler_meta else "N/A"
+            osl = profiler_meta.osl if profiler_meta else "N/A"
+            profiler_type = profiler_meta.profiler_type if profiler_meta else "N/A"
+
+            with col3:
+                st.metric("ISL/OSL", f"{isl}/{osl}")
+            with col4:
+                gpu_type_suffix = f" ({run.metadata.gpu_type})" if run.metadata.gpu_type else ""
+                st.metric("Profiler", f"{profiler_type}{gpu_type_suffix}")
🤖 Fix all issues with AI agents
In `@analysis/srtlog/log_parser.py`:
- Around line 68-72: The if-not check after calling
get_node_parser(backend_type) is dead because get_node_parser raises ValueError
when no parser exists; update the code to catch that exception instead: wrap the
call to get_node_parser in a try/except ValueError block, log the warning via
logger.warning including the backend_type and exception message, and return []
from the except branch (remove the unreachable if not parser branch and adjust
any surrounding flow accordingly).

In `@analysis/srtlog/parsers/nodes/trtllm.py`:
- Around line 338-348: The MemoryMetrics instance appended to memory_snapshots
for the kv_cache allocation currently sets timestamp to an empty string; change
this to a non-empty fallback (e.g., "unknown") or prefer a nearby/local
timestamp variable if available (for example use an in-scope variable like
timestamp, ts, or last_log_timestamp) so the kv_cache MemoryMetrics never has an
empty timestamp; update the timestamp argument in the MemoryMetrics constructor
call in the KV allocation block accordingly.

In `@analysis/srtlog/run_loader.py`:
- Around line 244-246: The cache validation currently builds source_patterns as
[f"{result_dir_rel}/*.json"] which only matches top-level JSON files (variables:
result_dir_rel, source_patterns); change it to include recursive matches so
nested results are covered — e.g. use a recursive glob pattern like
f"{result_dir_rel}/**/*.json" or, if available, use the parser-provided patterns
for mooncake-router (e.g. include
"logs/artifacts/**/profile_export_aiperf.json") so cache invalidation detects
nested files under result_dir; update the source_patterns construction
accordingly.
- Around line 417-422: The code currently uses "or" to choose throughput and
concurrency values which treats legitimate zeros as falsy and drops them; update
the assignments for concurrency, out["concurrencies"], out["output_tps"],
out["total_tps"], and out["request_throughput"] to explicitly check for None
(e.g., if data.get("max_concurrency") is not None use it else fall back to
data.get("concurrency") else 0) so that 0.0 is preserved; apply the same
None-aware selection for output_throughput vs output_tps and
total_token_throughput vs total_tps (use explicit is not None checks or
membership checks like "if 'output_throughput' in data and
data['output_throughput'] is not None then ...") and then append the chosen
values to out as before.
- Around line 675-713: The method load_node_metrics is annotated to return
list[NodeMetrics] but currently returns the raw result of
NodeAnalyzer.parse_run_logs (which yields list[NodeInfo]); update
load_node_metrics to extract and return the metrics attribute from each NodeInfo
(e.g., call NodeAnalyzer.parse_run_logs(..., return_dicts=False), assign to a
local variable like node_infos, then return [node.metrics for node in
node_infos]) and ensure load_node_metrics_for_run continues to call
load_node_metrics with the detected backend_type unchanged.

In `@src/srtctl/benchmarks/scripts/gpqa/bench.sh`:
- Around line 29-31: The current code builds a shell string in the variable
command and uses eval which is vulnerable to shell injection; instead build an
argv array and invoke it directly: replace the string assembly using
command="..." with an array (e.g., cmd=(python3 -m sglang.test.run_eval
--base-url "$ENDPOINT" --model "$MODEL_NAME" --eval-name gpqa --num-examples
"$NUM_EXAMPLES" --max-tokens "$MAX_TOKENS" --repeat "$REPEAT" --num-threads
"$NUM_THREADS")), update the echo to print the joined command safely, and run it
with the exec form "${cmd[@]}" (or "${cmd[@]}" in place) to avoid eval and
ensure ENDPOINT, MODEL_NAME and other variables are passed as single safe
arguments.

In `@src/srtctl/benchmarks/scripts/longbenchv2/bench.sh`:
- Around line 31-43: The script currently builds a shell string in the variable
"command" and executes it with eval, which is unsafe and fragile; change to
build the command as an array (e.g., cmd=(python3 -m sglang.test.run_eval
--base-url "$ENDPOINT" --model "$MODEL_NAME" ...)), append optional args to that
array when NUM_EXAMPLES or CATEGORIES are non-empty, and then execute the array
directly (no eval) so arguments with spaces are preserved; replace the echo
"[CMD] $command" with a safe printable representation (use printf with proper
quoting or join the array) to log the invoked command.

In `@src/srtctl/benchmarks/scripts/mmlu/bench.sh`:
- Around line 29-31: The current use of a single string variable named command
and eval to run the benchmark is unsafe; instead build the invocation as an
argument array (e.g., cmd) with each token quoted (include the module
sglang.test.run_eval and flags like --base-url, --model, --num-examples,
--max-tokens, --repeat, --num-threads as separate array elements) and execute it
with the array expansion form to avoid shell interpretation of metacharacters
(also print a safe, escaped representation of the array for logging before
running). Ensure you replace usages of command and eval with the new array
variable (cmd) and use "${cmd[@]}" to run it.

In `@src/srtctl/benchmarks/scripts/profiling/profile.sh`:
- Around line 133-141: The script currently builds command strings (variables
named command) and runs them with eval which is unsafe; refactor both
invocations (the python3 sglang.bench_serving call and the python -m lm_eval
call) to construct command arrays (e.g., cmd=(python3 -m sglang.bench_serving
... ) and lm_cmd=(python -m lm_eval ... )) using properly quoted variables like
${model_name}, ${head_node}, ${head_port}, ${PROFILE_CONCURRENCY},
${PROFILE_ISL}, ${PROFILE_OSL}, then log them with printf or echo using a safely
joined representation and execute with "${cmd[@]}" and "${lm_cmd[@]}" instead of
eval; ensure any embedded complex arg (the --model_args JSON-like string) is
provided as a single array element so it is not split.

In `@src/srtctl/benchmarks/scripts/sa-bench/bench.sh`:
- Around line 54-57: Replace the empty --num-prompts and unsafe eval usage in
the warmup invocation by using the computed num_warmup_prompts variable and
executing the command without eval; specifically, update the warmup block so the
generated command uses --num-prompts "$num_warmup_prompts" (not $num_prompts)
and build the invocation as a shell array (e.g., cmd=(python3 -u
"${WORK_DIR}/benchmark_serving.py" --model "${MODEL_NAME}" ... --num-prompts
"$num_warmup_prompts" ...)) then run it with "${cmd[@]}", and apply the same fix
(use proper num_prompts variable and array execution) to the other occurrence at
the 71-73 block.

In `@tests/fixtures_parsers.py`:
- Line 42: The fixture contains a JSON-unsafe value: the "request_rate" key is
set to float("inf"), which will break json.dump(); change the fixture so
"request_rate" is either a string (e.g., "Infinity") or a finite numeric
sentinel (e.g., a very large float like 1e308) depending on downstream
expectations — locate the occurrence of request_rate = float("inf") in the
fixtures_parsers data structure and replace it accordingly and update any tests
that assert numeric type if you switch to a string.
- Around line 343-345: The assertion logic allows empty lists despite the
comment; update the check around `value` (from `results[field]`) so it asserts
`value` is not None and, if `value` is a list, that it is non-empty (i.e.,
assert not None AND (not a list OR list length > 0)); keep the failure message
referencing `field` and adjust wording if desired.
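The or-fallback pitfall called out for run_loader.py above is easy to reproduce. The sketch below uses illustrative data, not the PR's code, to show how `or` silently replaces a legitimate 0 or 0.0 while an explicit `is not None` check preserves it:

```python
data = {"max_concurrency": 0, "output_throughput": 0.0}

# Falsy-based fallback: 0 and 0.0 are falsy, so they are wrongly replaced.
concurrency_or = data.get("max_concurrency") or data.get("concurrency") or 1
# concurrency_or is now 1, even though max_concurrency was legitimately 0.

# None-aware fallback: only a missing key triggers the fallback chain.
raw = data.get("max_concurrency")
concurrency = raw if raw is not None else (data.get("concurrency") or 0)

raw_tps = data.get("output_throughput")
output_tps = raw_tps if raw_tps is not None else data.get("output_tps", 0.0)
```

The same None-aware pattern applies to total_token_throughput vs total_tps and request_throughput in the normalizer.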
🧹 Nitpick comments (18)
src/srtctl/benchmarks/scripts/router/bench.sh (1)

43-45: Avoid eval when direct execution suffices.

Using eval here is unnecessary and introduces potential for subtle bugs or injection issues if any variable contains shell metacharacters. You can achieve the same logging effect by echoing the command separately and executing directly.

♻️ Suggested refactor
 # shellcheck disable=SC2086
-command="python prefix_ratio_benchmark.py --prefix-ratios $PREFIX_RATIOS --isl $ISL --osl $OSL --requests $REQUESTS --concurrency $CONCURRENCY --output-dir $result_dir"
-echo "[CMD] $command"
-eval "$command"
+echo "[CMD] python prefix_ratio_benchmark.py --prefix-ratios $PREFIX_RATIOS --isl $ISL --osl $OSL --requests $REQUESTS --concurrency $CONCURRENCY --output-dir $result_dir"
+python prefix_ratio_benchmark.py --prefix-ratios $PREFIX_RATIOS --isl "$ISL" --osl "$OSL" --requests "$REQUESTS" --concurrency "$CONCURRENCY" --output-dir "$result_dir"

This approach:

  1. Logs the command for visibility (same as before)
  2. Executes directly without eval
  3. Quotes scalar variables ($ISL, $OSL, etc.) while leaving $PREFIX_RATIOS unquoted for intended word splitting
analysis/srtlog/models.py (6)

145-177: Make ProfilerMetadata immutable.

This is configuration metadata and should be frozen to prevent accidental mutation.

As per coding guidelines, use @dataclass(frozen=True) for configuration objects.

♻️ Proposed change
-@dataclass
+@dataclass(frozen=True)
 class ProfilerMetadata:

288-300: Make BenchmarkLaunchCommand immutable.

This is configuration/command metadata and should be frozen.

As per coding guidelines, use @dataclass(frozen=True) for configuration objects.

♻️ Proposed change
-@dataclass
+@dataclass(frozen=True)
 class BenchmarkLaunchCommand:

440-449: Make NodeMetadata immutable.

This is identification/config metadata and should be frozen.

As per coding guidelines, use @dataclass(frozen=True) for configuration objects.

♻️ Proposed change
-@dataclass
+@dataclass(frozen=True)
 class NodeMetadata:

454-489: Tighten type for runtime config.

Use dict[str, Any] for config to keep type hints consistent.

As per coding guidelines, use explicit type hints for configuration structures.

♻️ Proposed change
-    config: dict = field(default_factory=dict)  # Runtime config: TP/PP/EP, batch sizes, etc.
+    config: dict[str, Any] = field(default_factory=dict)  # Runtime config: TP/PP/EP, batch sizes, etc.

492-505: Make NodeLaunchCommand immutable.

Launch command metadata should be frozen to avoid accidental mutation.

As per coding guidelines, use @dataclass(frozen=True) for configuration objects.

♻️ Proposed change
-@dataclass
+@dataclass(frozen=True)
 class NodeLaunchCommand:

623-633: Parameterize explicit_flags type.

Use set[str] to keep typing precise.

As per coding guidelines, use explicit type parameters for collections.

♻️ Proposed change
-    explicit_flags: set
+    explicit_flags: set[str]
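As a quick illustration of why the frozen=True suggestions above matter: a frozen dataclass rejects any attribute assignment at runtime and gains a usable __hash__. The ProfilerMetadata below is a cut-down stand-in with assumed fields, not the real model from models.py.

```python
from dataclasses import FrozenInstanceError, dataclass


@dataclass(frozen=True)
class ProfilerMetadata:
    profiler_type: str
    isl: int
    osl: int


meta = ProfilerMetadata(profiler_type="sa-bench", isl=1024, osl=128)

try:
    meta.isl = 2048  # any assignment on a frozen dataclass raises
    mutation_blocked = False
except FrozenInstanceError:
    mutation_blocked = True
```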
analysis/srtlog/parsers/benchmark/__init__.py (1)

1-7: Add SPDX license header for consistency.

This file is missing the SPDX license header that's present in the sibling nodes/__init__.py file. Adding it ensures consistency across the parser package.

Suggested fix
+# SPDX-FileCopyrightText: Copyright (c) 2025 NVIDIA CORPORATION & AFFILIATES. All rights reserved.
+# SPDX-License-Identifier: Apache-2.0
 
 """Benchmark output parsers."""
 
 from analysis.srtlog.parsers.benchmark.mooncake_router import MooncakeRouterParser
 from analysis.srtlog.parsers.benchmark.sa_bench import SABenchParser
 
 __all__ = ["SABenchParser", "MooncakeRouterParser"]
src/srtctl/benchmarks/scripts/mooncake-router/bench.sh (2)

59-61: Consider avoiding eval for command execution.

Using eval with interpolated variables can execute arbitrary code if inputs contain shell metacharacters. While this is likely run in controlled environments, a safer pattern would use arrays:

Safer alternative using array execution
-command="aiperf profile -m ${MODEL_NAME} --url ${ENDPOINT} --streaming --ui simple --concurrency 10 --request-count 20"
-echo "[CMD-WARMUP] $command"
-eval "$command"
+warmup_args=(
+    aiperf profile
+    -m "${MODEL_NAME}"
+    --url "${ENDPOINT}"
+    --streaming
+    --ui simple
+    --concurrency 10
+    --request-count 20
+)
+echo "[CMD-WARMUP] ${warmup_args[*]}"
+"${warmup_args[@]}"

77-79: Same eval concern applies to the main benchmark command.

The goodput argument with embedded spaces makes this trickier, but arrays handle it correctly:

Safer alternative
-command="aiperf profile -m ${MODEL_NAME} --input-file ${INPUT_FILE} --custom-dataset-type mooncake_trace --fixed-schedule --url ${ENDPOINT} --streaming --random-seed 42 --ui simple --artifact-dir ${RUN_ARTIFACT_DIR} --goodput \"time_to_first_token:${TTFT_THRESHOLD} inter_token_latency:${ITL_THRESHOLD}\""
-echo "[CMD] $command"
-eval "$command"
+bench_args=(
+    aiperf profile
+    -m "${MODEL_NAME}"
+    --input-file "${INPUT_FILE}"
+    --custom-dataset-type mooncake_trace
+    --fixed-schedule
+    --url "${ENDPOINT}"
+    --streaming
+    --random-seed 42
+    --ui simple
+    --artifact-dir "${RUN_ARTIFACT_DIR}"
+    --goodput "time_to_first_token:${TTFT_THRESHOLD} inter_token_latency:${ITL_THRESHOLD}"
+)
+echo "[CMD] ${bench_args[*]}"
+"${bench_args[@]}"

Note: If the [CMD] logging format must be preserved exactly for parser compatibility, you may need to adjust the echo accordingly.

tests/test_runloader_parsers.py (1)

24-58: Add fixture return type hints.

The fixtures are untyped; please add return annotations to align with the project’s typing standard.

♻️ Suggested update
+from typing import Any
@@
-    def temp_dir(self):
+    def temp_dir(self) -> Path:
@@
-    def sample_run_metadata(self):
+    def sample_run_metadata(self) -> dict[str, Any]:
As per coding guidelines, add type hints consistently across new Python code.
analysis/srtlog/parsers/nodes/trtllm.py (3)

27-28: Consider extracting ANSI_ESCAPE pattern to a shared utility.

The ANSI_ESCAPE regex pattern is duplicated across both SGLang and TRTLLM parsers. Consider moving it to a shared utility module (e.g., analysis/srtlog/parsers/utils.py).


138-165: Document the override behavior for config values.

The TensorRT-LLM engine args section (lines 138-164) can override values previously extracted from Config() dump (lines 108-130). If this is intentional because engine args are more authoritative, consider adding a brief comment clarifying this precedence.


352-367: Duplicated method - consider extracting to a shared utility.

The _extract_node_info_from_filename method is identical to the one in sglang.py. Consider moving this to a shared utility module to reduce code duplication.
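A shared ANSI-stripping helper like the one suggested above could look like the sketch below; the module location and names (strip_ansi, a parsers/utils.py home) are assumptions, and the regex covers the common CSI color/reset sequences rather than every escape form:

```python
import re

# Matches CSI escape sequences such as "\x1b[31m" (color) and "\x1b[0m" (reset).
ANSI_ESCAPE = re.compile(r"\x1b\[[0-9;]*[A-Za-z]")


def strip_ansi(line: str) -> str:
    """Remove ANSI terminal escape codes from a log line."""
    return ANSI_ESCAPE.sub("", line)
```

Both node parsers could then import the one pattern instead of duplicating it.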

analysis/srtlog/log_parser.py (4)

188-188: Move import os to the top of the file.

The os module is imported inside the method, but it's a standard library module that's commonly needed. It should be imported at the top of the file with other imports for consistency.

Suggested fix

Add at top of file with other imports:

import os

Then remove line 188.


268-313: Consider using DEBUG level for environment variable loading logs.

Lines 297, 302, and 307 use logger.info() for logging the number of env vars loaded. This might be too verbose for normal operation. Consider using logger.debug() instead.

Suggested fix
             if "prefill_environment" in backend:
                 env_vars["prefill"] = backend["prefill_environment"]
-                logger.info(f"Loaded {len(env_vars['prefill'])} prefill env vars from config.yaml")
+                logger.debug(f"Loaded {len(env_vars['prefill'])} prefill env vars from config.yaml")
 
             # Extract decode_environment
             if "decode_environment" in backend:
                 env_vars["decode"] = backend["decode_environment"]
-                logger.info(f"Loaded {len(env_vars['decode'])} decode env vars from config.yaml")
+                logger.debug(f"Loaded {len(env_vars['decode'])} decode env vars from config.yaml")

             # Extract agg_environment if present
             if "agg_environment" in backend:
                 env_vars["agg"] = backend["agg_environment"]
-                logger.info(f"Loaded {len(env_vars['agg'])} agg env vars from config.yaml")
+                logger.debug(f"Loaded {len(env_vars['agg'])} agg env vars from config.yaml")

331-331: Misleading key name: prefill_batches contains all batches, not just prefill.

The key "prefill_batches" is used to store all batches regardless of worker type. If this is for backward compatibility, consider adding a comment. Otherwise, consider renaming to "batches".

Suggested fix if backward compatibility isn't required
-            "prefill_batches": metrics.batches,  # Keep as list of BatchMetrics objects
+            "batches": metrics.batches,  # Keep as list of BatchMetrics objects

478-479: Remove redundant imports inside method.

time is already imported at the top of the file (line 9). The models import could also be at the top level. Consider removing these local imports.

Suggested fix
-        import time
-        from .models import BatchMetrics, MemoryMetrics, NodeInfo, NodeMetadata, NodeMetrics
+        from .models import BatchMetrics, MemoryMetrics, NodeMetadata, NodeMetrics

Note: NodeInfo is already imported at the top of the file (line 16). Only BatchMetrics, MemoryMetrics, NodeMetadata, and NodeMetrics need to be imported here (or better, at the top of the file).

Comment thread analysis/srtlog/log_parser.py
Comment thread analysis/srtlog/parsers/nodes/trtllm.py
Comment on lines +338 to +348
```python
memory_snapshots.append(
    MemoryMetrics(
        timestamp="",
        dp=0,
        tp=0,
        ep=0,
        metric_type="kv_cache",
        kv_cache_gb=kv_gb,
        kv_tokens=max_tokens,
    )
)
```
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟡 Minor

Empty timestamp in KV allocation metrics may cause issues downstream.

The MemoryMetrics for KV cache allocation (line 340) uses an empty string for timestamp. This could cause issues if downstream code expects valid timestamps. Consider using "unknown" or the timestamp from nearby log lines if available.

Suggested fix
             memory_snapshots.append(
                 MemoryMetrics(
-                    timestamp="",
+                    timestamp="unknown",  # KV allocation logs don't include timestamp
                     dp=0,
                     tp=0,
                     ep=0,
🤖 Prompt for AI Agents
In `@analysis/srtlog/parsers/nodes/trtllm.py` around lines 338 - 348, The
MemoryMetrics instance appended to memory_snapshots for the kv_cache allocation
currently sets timestamp to an empty string; change this to a non-empty fallback
(e.g., "unknown") or prefer a nearby/local timestamp variable if available (for
example use an in-scope variable like timestamp, ts, or last_log_timestamp) so
the kv_cache MemoryMetrics never has an empty timestamp; update the timestamp
argument in the MemoryMetrics constructor call in the KV allocation block
accordingly.

Comment thread tests/fixtures_parsers.py
```python
            "total_token_throughput": concurrency * 60.0,
            "request_throughput": concurrency * 0.5,
            "request_goodput": concurrency * 0.48,
            "request_rate": float("inf"),
```

⚠️ Potential issue | 🟡 Minor

float("inf") is not valid JSON and will cause serialization issues.

The request_rate: float("inf") value will fail when serialized to JSON via json.dump() since JSON doesn't support Infinity. Consider using a string representation or a very large number.

Suggested fix
-            "request_rate": float("inf"),
+            "request_rate": "inf",  # String representation for JSON compatibility

Or if numeric is required:

-            "request_rate": float("inf"),
+            "request_rate": 1e308,  # Very large number as proxy for infinity
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
-            "request_rate": float("inf"),
+            "request_rate": "inf",  # String representation for JSON compatibility
🤖 Prompt for AI Agents
In `@tests/fixtures_parsers.py` at line 42, The fixture contains a JSON-unsafe
value: the "request_rate" key is set to float("inf"), which will break
json.dump(); change the fixture so "request_rate" is either a string (e.g.,
"Infinity") or a finite numeric sentinel (e.g., a very large float like 1e308)
depending on downstream expectations — locate the occurrence of request_rate =
float("inf") in the fixtures_parsers data structure and replace it accordingly
and update any tests that assert numeric type if you switch to a string.


@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 7

🤖 Fix all issues with AI agents
In `@analysis/docs/ARCHITECTURE_DATAFLOW.md`:
- Line 10: Update the fenced code blocks listed (lines 10, 32, 67, 123, 156,
254, 325, 367, 417, 485, 513, 565) to include language identifiers to satisfy
markdownlint MD040; use "text" for diagrams/ASCII art and "python" (or the
appropriate language) for code snippets so each triple-backtick fence becomes
```text or ```python as applicable.

In `@analysis/srtlog/parsers/benchmark/sa_bench.py`:
- Around line 167-186: The fallback path currently takes the list-valued output
from parse() (fallback_result) and stuffs entire lists into a single dict,
losing per-concurrency entries; change the fallback to normalize into one scalar
result per concurrency by iterating over the length of
fallback_result["concurrencies"] (or the longest list among
output_tps/concurrencies) and appending a dict for each index with scalar fields
(max_concurrency from the specific concurrency value, output_tps[index],
request_throughput[index], mean_ttft_ms[index], mean_itl_ms[index],
mean_tpot_ms[index], p99_ttft_ms[index], p99_itl_ms[index], completed from
completed_requests[index] or a sensible default) so downstream consumers receive
the same per-concurrency format as when JSON results are present; use the
existing symbols parse, fallback_result, and results to locate and implement
this loop-based normalization.

In `@analysis/srtlog/run_loader.py`:
- Around line 713-733: In load_node_metrics_for_run, guard against
run.metadata.container being None by defaulting it to an empty string before
calling .lower(); specifically, change how the local variable container is
derived (from run.metadata.container) so it becomes (empty string if None) then
.lower(), keep the same backend_type detection logic (checking for "trtllm",
"dynamo", "tensorrt", "sglang") and then call
self.load_node_metrics(run.metadata.path, backend_type) as before.
- Around line 692-712: The load_node_metrics function currently ignores its
backend_type parameter; update it to pass that override through to
NodeAnalyzer.parse_run_logs so callers can force a backend when auto-detection
fails (e.g., call analyzer.parse_run_logs(run_path, return_dicts=False,
backend_type=backend_type) or the appropriate override parameter name used by
parse_run_logs), while allowing parse_run_logs to still fall back to
auto-detection when the override is None/empty; this change affects
load_node_metrics and ensures callers like load_node_metrics_for_run can force
parsing for a specific backend.

In `@rollup_sa_bench_553.json`:
- Around line 7-12: The fixture metadata contains host-specific absolute paths
(e.g., the "container" field
"/home/nlevin/containers/jwillthomson_dynamo-0.8.0_... .sqsh" and potentially
other fields like "model_dir"), so replace these with redacted or portable
values (for example a neutral token like "container": "<redacted-container>" or
a relative/portable path) and ensure any other absolute paths in the JSON are
similarly replaced; update the "container" value and any host-specific entries
so the fixture contains no usernames or host-specific directories and remains
portable across environments.
- Line 5: The JSON timestamps are inconsistent (run_date, batches[].timestamp,
memory_snapshots[].timestamp) and cause parsing failures; update the generator
to emit ISO 8601 timestamps (e.g., 2026-01-15T07:14:56Z or without Z if local)
for "run_date" and every batches[].timestamp and replace empty
memory_snapshots[].timestamp values with null, and then update the parsers that
call datetime.strptime(..., "%Y-%m-%d %H:%M:%S") (the parsing calls in
analysis/srtlog/visualizations.py and analysis/dashboard/rate_match_tab.py) to
accept ISO 8601 (use datetime.fromisoformat or a robust parser like
dateutil.parser.parse) and explicitly handle null timestamps instead of silently
skipping empty strings.
- Around line 17-32: The profiler_metadata.concurrencies field is set to an
empty string which causes check_completeness() to treat the run as manual and
skip validation; update profiler_metadata.concurrencies to reflect the actual
concurrency used (e.g., "32") or set it to null if it was not explicitly
configured so that check_completeness() correctly recognizes and validates the
benchmark results; ensure the value type matches expected schema (string or
null) and corresponds to the concurrency_values entry.
🧹 Nitpick comments (1)
analysis/srtlog/log_parser.py (1)

176-268: Refactor config loading to use marshmallow_dataclass schemas for runtime validation.

The current code loads JSON/YAML configs with json.load() and yaml.safe_load() without validation, relying only on TypedDict for type hints (which provide no runtime protection). NodeConfig and GPUInfo should be converted to marshmallow_dataclass dataclasses to match the project's established patterns (see SrtConfig.from_yaml() in schema.py and validation patterns in config.py). This ensures config structure is validated before merging and prevents malformed shapes from silently affecting node_config.


## 1. Entry Point: RunLoader

```

⚠️ Potential issue | 🟡 Minor

Add language identifiers to fenced code blocks.
markdownlint (MD040) requires language tags; consider text for diagrams and python for code examples (e.g., Line 10, Line 32, Line 417).

Also applies to: 32-32, 67-67, 123-123, 156-156, 254-254, 325-325, 367-367, 417-417, 485-485, 513-513, 565-565

🧰 Tools
🪛 markdownlint-cli2 (0.18.1)

10-10: Fenced code blocks should have a language specified

(MD040, fenced-code-language)

🤖 Prompt for AI Agents
In `@analysis/docs/ARCHITECTURE_DATAFLOW.md` at line 10, Update the fenced code
blocks listed (lines 10, 32, 67, 123, 156, 254, 325, 367, 417, 485, 513, 565) to
include language identifiers to satisfy markdownlint MD040; use "text" for
diagrams/ASCII art and "python" (or the appropriate language) for code snippets
so each triple-backtick fence becomes ```text or ```python as applicable.
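A minimal MD040-style audit can be scripted if one wants to find the unlabeled fences programmatically; unlabeled_fences is a hypothetical helper and assumes fences are plain triple backticks at the start of a line.

```python
import re

# Flag opening fences that lack a language tag (markdownlint MD040).
FENCE = re.compile(r"^```(\S*)\s*$")

def unlabeled_fences(markdown: str) -> list[int]:
    missing, inside = [], False
    for lineno, line in enumerate(markdown.splitlines(), start=1):
        m = FENCE.match(line)
        if not m:
            continue
        if not inside and m.group(1) == "":
            missing.append(lineno)
        inside = not inside  # toggle between opening and closing fences
    return missing

doc = "# Title\n```\nascii art\n```\n```python\nprint('ok')\n```\n"
print(unlabeled_fences(doc))  # [2]
```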

Comment on lines +167 to +186
# Fallback: If no JSON results found, try parsing benchmark.out
if not results:
benchmark_out = result_dir / "benchmark.out"
if benchmark_out.exists():
logger.info(f"No JSON results found in {result_dir}, falling back to benchmark.out parsing")
# Parse benchmark.out and create a single result entry
fallback_result = self.parse(benchmark_out)
if fallback_result.get("output_tps"):
# Wrap in list format expected by caller
results.append({
"max_concurrency": fallback_result.get("concurrencies", [0])[0] if fallback_result.get("concurrencies") else 0,
"output_tps": fallback_result.get("output_tps"),
"request_throughput": fallback_result.get("request_throughput"),
"mean_ttft_ms": fallback_result.get("mean_ttft_ms"),
"mean_itl_ms": fallback_result.get("mean_itl_ms"),
"mean_tpot_ms": fallback_result.get("mean_tpot_ms"),
"p99_ttft_ms": fallback_result.get("p99_ttft_ms"),
"p99_itl_ms": fallback_result.get("p99_itl_ms"),
"completed": fallback_result.get("completed_requests"),
})

⚠️ Potential issue | 🟠 Major

Fallback results are list-valued and collapse multi-concurrency data.
parse() returns lists per concurrency, but the fallback path wraps those lists into a single result dict (Line 173–186). This yields list-valued fields and drops additional concurrency entries, which can break downstream consumers expecting one scalar result per concurrency.

🔧 Suggested fix: normalize fallback into per-concurrency entries
-                if fallback_result.get("output_tps"):
-                    # Wrap in list format expected by caller
-                    results.append({
-                        "max_concurrency": fallback_result.get("concurrencies", [0])[0] if fallback_result.get("concurrencies") else 0,
-                        "output_tps": fallback_result.get("output_tps"),
-                        "request_throughput": fallback_result.get("request_throughput"),
-                        "mean_ttft_ms": fallback_result.get("mean_ttft_ms"),
-                        "mean_itl_ms": fallback_result.get("mean_itl_ms"),
-                        "mean_tpot_ms": fallback_result.get("mean_tpot_ms"),
-                        "p99_ttft_ms": fallback_result.get("p99_ttft_ms"),
-                        "p99_itl_ms": fallback_result.get("p99_itl_ms"),
-                        "completed": fallback_result.get("completed_requests"),
-                    })
+                if fallback_result.get("output_tps"):
+                    concurrencies = fallback_result.get("concurrencies") or [0]
+                    output_tps = fallback_result.get("output_tps") or []
+                    request_throughput = fallback_result.get("request_throughput") or []
+                    mean_ttft = fallback_result.get("mean_ttft_ms") or []
+                    mean_itl = fallback_result.get("mean_itl_ms") or []
+                    mean_tpot = fallback_result.get("mean_tpot_ms") or []
+                    p99_ttft = fallback_result.get("p99_ttft_ms") or []
+                    p99_itl = fallback_result.get("p99_itl_ms") or []
+                    completed = fallback_result.get("completed_requests") or []
+
+                    for idx, concurrency in enumerate(concurrencies):
+                        results.append({
+                            "max_concurrency": concurrency,
+                            "output_throughput": output_tps[idx] if idx < len(output_tps) else None,
+                            "request_throughput": request_throughput[idx] if idx < len(request_throughput) else None,
+                            "mean_ttft_ms": mean_ttft[idx] if idx < len(mean_ttft) else None,
+                            "mean_itl_ms": mean_itl[idx] if idx < len(mean_itl) else None,
+                            "mean_tpot_ms": mean_tpot[idx] if idx < len(mean_tpot) else None,
+                            "p99_ttft_ms": p99_ttft[idx] if idx < len(p99_ttft) else None,
+                            "p99_itl_ms": p99_itl[idx] if idx < len(p99_itl) else None,
+                            "completed": completed[idx] if idx < len(completed) else None,
+                        })
🤖 Prompt for AI Agents
In `@analysis/srtlog/parsers/benchmark/sa_bench.py` around lines 167 - 186, The
fallback path currently takes the list-valued output from parse()
(fallback_result) and stuffs entire lists into a single dict, losing
per-concurrency entries; change the fallback to normalize into one scalar result
per concurrency by iterating over the length of fallback_result["concurrencies"]
(or the longest list among output_tps/concurrencies) and appending a dict for
each index with scalar fields (max_concurrency from the specific concurrency
value, output_tps[index], request_throughput[index], mean_ttft_ms[index],
mean_itl_ms[index], mean_tpot_ms[index], p99_ttft_ms[index], p99_itl_ms[index],
completed from completed_requests[index] or a sensible default) so downstream
consumers receive the same per-concurrency format as when JSON results are
present; use the existing symbols parse, fallback_result, and results to locate
and implement this loop-based normalization.
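The loop-based normalization can also be sketched with itertools.zip_longest, which pads shorter lists with None; the field names mirror the sa-bench parser but the snippet is illustrative, not the actual implementation.

```python
from itertools import zip_longest

# Parallel per-concurrency lists, as parse() returns them (abridged).
fallback_result = {
    "concurrencies": [8, 16],
    "output_tps": [1200.0, 2100.0],
    "mean_ttft_ms": [35.2],  # shorter list: missing entries become None
}

fields = ["output_tps", "mean_ttft_ms"]
columns = [fallback_result.get(f) or [] for f in fields]

# One scalar dict per concurrency, matching the JSON-results format.
results = [
    {"max_concurrency": conc, **dict(zip(fields, values))}
    for conc, *values in zip_longest(fallback_result["concurrencies"], *columns)
]
print(results)
```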

Comment on lines +692 to +712
def load_node_metrics(self, run_path: str, backend_type: str = "sglang") -> list[NodeMetrics]:
"""Load node metrics from worker log files using NodeAnalyzer.

Args:
run_path: Path to the run directory
backend_type: Backend type (sglang or trtllm) - deprecated, auto-detected

Returns:
List of NodeMetrics objects, one per worker
"""
# Handle both relative and absolute paths
if not os.path.isabs(run_path):
run_path = os.path.join(self.logs_dir, run_path)

# Use NodeAnalyzer which handles caching, backend detection, and config loading
analyzer = NodeAnalyzer()
node_infos = analyzer.parse_run_logs(run_path, return_dicts=False)

# Extract only the metrics from each NodeInfo
return [node.metrics for node in node_infos]


⚠️ Potential issue | 🟠 Major

Honor backend_type override to avoid detection failures.

load_node_metrics ignores its backend_type argument. If auto-detection fails, callers (including load_node_metrics_for_run) have no way to force the parser, leading to empty results. Consider passing the override through to NodeAnalyzer.parse_run_logs (and letting that method fall back to detection).

🐛 Proposed fix
-    def parse_run_logs(self, run_path: str, return_dicts: bool = False) -> list:
+    def parse_run_logs(
+        self,
+        run_path: str,
+        return_dicts: bool = False,
+        backend_type: str | None = None,
+    ) -> list:
@@
-        backend_type = self._detect_backend_type(run_path)
+        backend_type = backend_type or self._detect_backend_type(run_path)
-        node_infos = analyzer.parse_run_logs(run_path, return_dicts=False)
+        node_infos = analyzer.parse_run_logs(run_path, return_dicts=False, backend_type=backend_type)
🤖 Prompt for AI Agents
In `@analysis/srtlog/run_loader.py` around lines 692 - 712, The load_node_metrics
function currently ignores its backend_type parameter; update it to pass that
override through to NodeAnalyzer.parse_run_logs so callers can force a backend
when auto-detection fails (e.g., call analyzer.parse_run_logs(run_path,
return_dicts=False, backend_type=backend_type) or the appropriate override
parameter name used by parse_run_logs), while allowing parse_run_logs to still
fall back to auto-detection when the override is None/empty; this change affects
load_node_metrics and ensures callers like load_node_metrics_for_run can force
parsing for a specific backend.

Comment on lines +713 to +733
def load_node_metrics_for_run(self, run: BenchmarkRun) -> list[NodeMetrics]:
"""Load node metrics for a BenchmarkRun.

Automatically detects backend type from the run's container image.

Args:
run: BenchmarkRun object

Returns:
List of NodeMetrics objects
"""
# Detect backend type from container
backend_type = "sglang" # Default
container = run.metadata.container.lower()

if "trtllm" in container or "dynamo" in container or "tensorrt" in container:
backend_type = "trtllm"
elif "sglang" in container:
backend_type = "sglang"

return self.load_node_metrics(run.metadata.path, backend_type)

⚠️ Potential issue | 🟡 Minor

Guard against missing container values.

run.metadata.container can be None for older runs, so .lower() would raise. Default to an empty string.

🐛 Proposed fix
-        container = run.metadata.container.lower()
+        container = (run.metadata.container or "").lower()
🤖 Prompt for AI Agents
In `@analysis/srtlog/run_loader.py` around lines 713 - 733, In
load_node_metrics_for_run, guard against run.metadata.container being None by
defaulting it to an empty string before calling .lower(); specifically, change
how the local variable container is derived (from run.metadata.container) so it
becomes (empty string if None) then .lower(), keep the same backend_type
detection logic (checking for "trtllm", "dynamo", "tensorrt", "sglang") and then
call self.load_node_metrics(run.metadata.path, backend_type) as before.

Comment thread rollup_sa_bench_553.json Outdated
Comment thread rollup_sa_bench_553.json Outdated
Comment thread rollup_sa_bench_553.json Outdated
Comment on lines +17 to +32
"profiler_metadata": {
"profiler_type": "sa-bench",
"isl": "8192",
"osl": "1024",
"concurrencies": ""
},
"profiler_results": {
"output_tps": [
1748.0954656979873
],
"request_throughput": [
1.9035306394778506
],
"concurrency_values": [
32
],

⚠️ Potential issue | 🟠 Major

🧩 Analysis chain

🏁 Script executed:

#!/bin/bash
# Locate SA-Bench parser/schema expectations for "concurrencies".
rg -n "SABench|sa-bench|SABenchParser|profiler_metadata|concurrenc" -g '*.py' | head -50

Repository: ishandhanani/srt-slurm

Length of output: 4409


🏁 Script executed:

#!/bin/bash
# Also search for JSON schema files or related metadata definitions
fd -e schema -o -e json -o -e py | xargs rg -l "profiler_metadata|sa-bench" 2>/dev/null | head -20

Repository: ishandhanani/srt-slurm

Length of output: 1383


🏁 Script executed:

#!/bin/bash
# Look for the specific file being reviewed
fd "rollup_sa_bench_553.json" 2>/dev/null

Repository: ishandhanani/srt-slurm

Length of output: 90


🏁 Script executed:

#!/bin/bash
# Find the SA-Bench parser implementation
fd -e py | xargs rg -l "class.*SABench.*Parser|def.*profiler_metadata" | head -10

Repository: ishandhanani/srt-slurm

Length of output: 169


🏁 Script executed:

#!/bin/bash
# Look for the fixtures_parsers module
fd "fixtures_parsers.py"

Repository: ishandhanani/srt-slurm

Length of output: 91


🏁 Script executed:

#!/bin/bash
# Check the actual rollup file contents to see the full structure
cat rollup_sa_bench_553.json | head -40

Repository: ishandhanani/srt-slurm

Length of output: 1007


🏁 Script executed:

#!/bin/bash
# Check the SA-Bench parser implementation
cat analysis/srtlog/parsers/benchmark/sa_bench.py

Repository: ishandhanani/srt-slurm

Length of output: 14728


🏁 Script executed:

#!/bin/bash
# Check the fixtures to understand expected format and SampleSABenchData
rg -A 20 "class SampleSABenchData|def result_json" tests/fixtures_parsers.py | head -80

Repository: ishandhanani/srt-slurm

Length of output: 1895


🏁 Script executed:

#!/bin/bash
# Search for code that creates or validates the rollup structure
fd -e py | xargs rg -l "profiler_metadata|profiler_results" | grep -v test | grep -v fixtures

Repository: ishandhanani/srt-slurm

Length of output: 256


🏁 Script executed:

#!/bin/bash
# Search for where concurrencies field is set/populated
rg -n "concurrencies.*=" -g '*.py' | grep -v test | grep -v "concurrency_values"

Repository: ishandhanani/srt-slurm

Length of output: 1023


🏁 Script executed:

#!/bin/bash
# Look for rollup generation code
fd -e py | xargs rg -l "rollup" | grep -v test

Repository: ishandhanani/srt-slurm

Length of output: 111


🏁 Script executed:

#!/bin/bash
# Check the ProfilerMetadata class definition in models.py
rg -B 5 -A 15 "class ProfilerMetadata" analysis/srtlog/models.py

Repository: ishandhanani/srt-slurm

Length of output: 573


🏁 Script executed:

#!/bin/bash
# Check how concurrencies is populated in rollup building code
rg -B 5 -A 10 "concurrencies.*profiler" analysis/srtlog/run_loader.py

Repository: ishandhanani/srt-slurm

Length of output: 48


🏁 Script executed:

#!/bin/bash
# Check the schema definition for concurrencies more carefully
rg -B 5 -A 5 "concurrencies:" src/srtctl/core/schema.py

Repository: ishandhanani/srt-slurm

Length of output: 438


🏁 Script executed:

#!/bin/bash
# Check how profiler_metadata is created/populated when building the rollup
rg -B 10 -A 10 "ProfilerMetadata\|profiler_metadata" analysis/srtlog/run_loader.py | head -80

Repository: ishandhanani/srt-slurm

Length of output: 48


🏁 Script executed:

#!/bin/bash
# Check if there's any validation or schema checking for profiler_metadata
rg -n "validate|schema|check.*concurrenc" analysis/srtlog/models.py

Repository: ishandhanani/srt-slurm

Length of output: 48


🏁 Script executed:

#!/bin/bash
# Get a broader view of ProfilerMetadata usage in run_loader
cat analysis/srtlog/run_loader.py | grep -n "profiler_metadata\|ProfilerMetadata" -B 3 -A 3

Repository: ishandhanani/srt-slurm

Length of output: 1629


🏁 Script executed:

#!/bin/bash
# Check the entire ProfilerMetadata from_json method
rg -A 20 "class ProfilerMetadata" analysis/srtlog/models.py

Repository: ishandhanani/srt-slurm

Length of output: 578


🏁 Script executed:

#!/bin/bash
# Get the complete from_json method for ProfilerMetadata
rg -A 30 "def from_json.*ProfilerMetadata" analysis/srtlog/models.py

Repository: ishandhanani/srt-slurm

Length of output: 1188


🏁 Script executed:

#!/bin/bash
# Search for where the JSON is loaded and ProfilerMetadata is created
rg -B 5 -A 10 "profiler_metadata.*from_json\|ProfilerMetadata.from_json" analysis/srtlog/models.py

Repository: ishandhanani/srt-slurm

Length of output: 48


🏁 Script executed:

#!/bin/bash
# Check how concurrencies field is used in consumers/dashboards
rg -n "\.concurrencies\|concurrenc" analysis/dashboard/*.py | head -30

Repository: ishandhanani/srt-slurm

Length of output: 48


🏁 Script executed:

#!/bin/bash
# Check if there are any data validation or normalization functions
rg -B 3 -A 10 "def.*normalize|def.*validate.*profiler\|def.*process.*metadata" analysis/srtlog/models.py

Repository: ishandhanani/srt-slurm

Length of output: 48


🏁 Script executed:

#!/bin/bash
# Check how ProfilerResults.concurrency_values is populated
rg -B 5 -A 15 "concurrency_values" analysis/srtlog/models.py | head -50

Repository: ishandhanani/srt-slurm

Length of output: 2283


🏁 Script executed:

#!/bin/bash
# Look for any place where profiler_metadata.concurrencies is actually used or validated
rg "profiler_metadata\.concurrenc|\.concurrencies" --type py | grep -v test | grep -v "^Binary"

Repository: ishandhanani/srt-slurm

Length of output: 900


🏁 Script executed:

#!/bin/bash
# Check the code that uses profiler_metadata.concurrencies
rg -B 5 -A 10 "if not self.profiler_metadata.concurrencies" analysis/srtlog/models.py

Repository: ishandhanani/srt-slurm

Length of output: 687


🏁 Script executed:

#!/bin/bash
# Check BenchmarkRun to see the full context
rg -B 10 -A 20 "class BenchmarkRun" analysis/srtlog/models.py | head -60

Repository: ishandhanani/srt-slurm

Length of output: 992


🏁 Script executed:

#!/bin/bash
# Check the complete validate_completeness method
rg -B 2 -A 25 "def validate_completeness" analysis/srtlog/models.py

Repository: ishandhanani/srt-slurm

Length of output: 48


🏁 Script executed:

#!/bin/bash
# Check if there are any downstream consumers that depend on concurrencies being populated
rg "profiler_metadata\.concurrencies" analysis/srtlog/*.py analysis/dashboard/*.py 2>/dev/null

Repository: ishandhanani/srt-slurm

Length of output: 238


🏁 Script executed:

#!/bin/bash
# Verify the complete validate_completeness logic to confirm the issue
rg -B 5 -A 35 "def validate_completeness" analysis/srtlog/models.py

Repository: ishandhanani/srt-slurm

Length of output: 48


🏁 Script executed:

#!/bin/bash
# Search for validate_completeness more broadly
rg "validate_completeness" analysis/srtlog/models.py -A 30

Repository: ishandhanani/srt-slurm

Length of output: 48


🏁 Script executed:

#!/bin/bash
# Also check for where is_complete and missing_concurrencies are set
rg -n "is_complete|missing_concurrencies" analysis/srtlog/models.py | head -20

Repository: ishandhanani/srt-slurm

Length of output: 429


🏁 Script executed:

#!/bin/bash
# Get the full method context around line 366-389
sed -n '360,395p' analysis/srtlog/models.py

Repository: ishandhanani/srt-slurm

Length of output: 1243


🏁 Script executed:

#!/bin/bash
# Check how the JSON file is created/populated - look at any rollup generation code
fd -e py | xargs rg -l "rollup.*json|json.*rollup" | grep -v test | head -5

Repository: ishandhanani/srt-slurm

Length of output: 48


Fix profiler_metadata.concurrencies to align with actual results.

The empty string "" in concurrencies while concurrency_values contains [32] breaks the data validation logic. When concurrencies is empty, the check_completeness() method treats the run as manual and skips validation entirely, even though actual benchmark results exist. Either populate concurrencies with the actual value (e.g., "32") to match the results, or set it to null if the configuration was not explicitly specified.

🤖 Prompt for AI Agents
In `@rollup_sa_bench_553.json` around lines 17 - 32, The
profiler_metadata.concurrencies field is set to an empty string which causes
check_completeness() to treat the run as manual and skip validation; update
profiler_metadata.concurrencies to reflect the actual concurrency used (e.g.,
"32") or set it to null if it was not explicitly configured so that
check_completeness() correctly recognizes and validates the benchmark results;
ensure the value type matches expected schema (string or null) and corresponds
to the concurrency_values entry.


@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 2

🤖 Fix all issues with AI agents
In `@analysis/srtlog/parsers/nodes/sglang.py`:
- Around line 233-240: The _parse_timestamp function's regex requires fractional
seconds so timestamps like "2025-12-30T15:52:38Z" are missed; update the regex
in _parse_timestamp to make the fractional-seconds part optional (use a
non-capturing optional group like (?:\.\d+)? ) while preserving the optional
trailing Z, so it matches both "YYYY-MM-DDTHH:MM:SSZ" and
"YYYY-MM-DDTHH:MM:SS.sssZ"; keep returning match.group(1) as before.
- Around line 116-197: The file read using open(log_path) can raise
UnicodeDecodeError; replace the open loop with a tolerant read (e.g.,
Path(log_path).read_text(errors="replace") or similar) and iterate over the
returned string's lines so parsing still runs even with invalid bytes; update
the code that currently appends to full_content and feeds clean_line into
_parse_prefill_batch_line, _parse_decode_batch_line, _parse_memory_line and the
subsequent tp/dp/ep regex extraction to use the decoded text lines, then call
parse_launch_command("".join(full_content), node_info["worker_type"]) as before.
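The optional-fractional-seconds fix can be verified in isolation; TS below is an illustrative regex with the suggested (?:\.\d+)? group, not the parser's actual _parse_timestamp pattern.

```python
import re

# Fractional seconds are optional; the trailing Z stays optional too.
TS = re.compile(r"(\d{4}-\d{2}-\d{2}T\d{2}:\d{2}:\d{2}(?:\.\d+)?)Z?")

for line in ("[2025-12-30T15:52:38Z] decode", "[2025-12-30T15:52:38.123Z] decode"):
    m = TS.search(line)
    print(m.group(1) if m else None)
```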
🧹 Nitpick comments (3)
analysis/srtlog/parsers/nodes/trtllm.py (2)

77-81: Prefer Path.iterdir() over os.listdir() for consistency.

Since log_dir is already a Path object (line 69), using pathlib methods throughout would be more idiomatic.

♻️ Suggested refactor
         # Find all worker log files
-        for file in os.listdir(log_dir):
-            if not (file.endswith(".err") or file.endswith(".out")):
+        for filepath in log_dir.iterdir():
+            if not filepath.is_file() or filepath.suffix not in (".err", ".out"):
                 continue
-            if not any(wt in file for wt in ("prefill", "decode", "agg")):
+            if not any(wt in filepath.name for wt in ("prefill", "decode", "agg")):
                 continue

-            filepath = log_dir / file
             node = self.parse_single_log(filepath)
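The iterdir-based scan can be exercised standalone; worker_logs is a hypothetical reduction of the loop above, with illustrative worker-type tokens.

```python
import tempfile
from pathlib import Path

def worker_logs(log_dir: Path) -> list[Path]:
    # Suffix and name checks mirror the suggested pathlib refactor.
    return sorted(
        p for p in log_dir.iterdir()
        if p.is_file()
        and p.suffix in (".err", ".out")
        and any(wt in p.name for wt in ("prefill", "decode", "agg"))
    )

with tempfile.TemporaryDirectory() as d:
    log_dir = Path(d)
    for name in ("prefill_0.err", "decode_1.out", "readme.txt", "agg.log"):
        (log_dir / name).touch()
    print([p.name for p in worker_logs(log_dir)])  # ['decode_1.out', 'prefill_0.err']
```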

398-401: Minor inefficiency: ANSI codes stripped twice when called internally.

When parse_launch_command is called from parse_single_log (line 114), the content has already been stripped. However, since this method is public and could be called directly with raw log content, the defensive stripping is reasonable.

If performance becomes a concern for large logs, consider passing a boolean flag or checking for ANSI presence first:

♻️ Optional optimization
-    def parse_launch_command(self, log_content: str, worker_type: str = "unknown") -> NodeLaunchCommand | None:
+    def parse_launch_command(
+        self, log_content: str, worker_type: str = "unknown", *, already_cleaned: bool = False
+    ) -> NodeLaunchCommand | None:
         ...
         from analysis.srtlog.parsers import NodeLaunchCommand

         # Strip ANSI codes for cleaner parsing
-        clean_content = ANSI_ESCAPE.sub("", log_content)
+        clean_content = log_content if already_cleaned else ANSI_ESCAPE.sub("", log_content)
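For reference, regex-based ANSI stripping looks like this; the pattern below is a common CSI-sequence matcher and may differ from the module's actual ANSI_ESCAPE constant.

```python
import re

# Matches CSI sequences like \x1b[32m (color on) and \x1b[0m (reset).
ANSI_ESCAPE = re.compile(r"\x1b\[[0-9;]*[A-Za-z]")

line = "\x1b[32m[INFO]\x1b[0m Decode batch. #running-req: 12"
print(ANSI_ESCAPE.sub("", line))  # [INFO] Decode batch. #running-req: 12
```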
analysis/srtlog/parsers/nodes/sglang.py (1)

243-343: Type the parsed metric dicts with TypedDict.

The parsers return untyped dict | None for external log data. Defining TypedDicts for prefill/decode/memory metrics improves type checking and makes required vs optional fields explicit. As per coding guidelines, TypedDicts should be used for external data shapes.

Example shape:

from typing import TypedDict, Literal

class PrefillMetricsDict(TypedDict, total=False):
    timestamp: str
    type: Literal["prefill"]
    new_seq: int
    new_token: int
    cached_token: int
    token_usage: float
    running_req: int
    queue_req: int
    prealloc_req: int
    inflight_req: int
    input_throughput: float

As per coding guidelines, use TypedDict for parsed external data structures.

Comment on lines +116 to +197
try:
with open(log_path) as f:
for line in f:
full_content.append(line)
# Strip ANSI escape codes
clean_line = ANSI_ESCAPE.sub("", line)

# Parse prefill batch metrics
batch_metrics = self._parse_prefill_batch_line(clean_line)
if batch_metrics:
batches.append(
BatchMetrics(
timestamp=batch_metrics["timestamp"],
dp=0, # Default since not in log
tp=0,
ep=0,
batch_type=batch_metrics["type"],
new_seq=batch_metrics.get("new_seq"),
new_token=batch_metrics.get("new_token"),
cached_token=batch_metrics.get("cached_token"),
token_usage=batch_metrics.get("token_usage"),
running_req=batch_metrics.get("running_req"),
queue_req=batch_metrics.get("queue_req"),
prealloc_req=batch_metrics.get("prealloc_req"),
inflight_req=batch_metrics.get("inflight_req"),
input_throughput=batch_metrics.get("input_throughput"),
)
)

# Parse decode batch metrics
decode_metrics = self._parse_decode_batch_line(clean_line)
if decode_metrics:
batches.append(
BatchMetrics(
timestamp=decode_metrics["timestamp"],
dp=0,
tp=0,
ep=0,
batch_type=decode_metrics["type"],
running_req=decode_metrics.get("running_req"),
queue_req=decode_metrics.get("queue_req"),
prealloc_req=decode_metrics.get("prealloc_req"),
transfer_req=decode_metrics.get("transfer_req"),
token_usage=decode_metrics.get("token_usage"),
preallocated_usage=decode_metrics.get("preallocated_usage"),
num_tokens=decode_metrics.get("num_tokens"),
gen_throughput=decode_metrics.get("gen_throughput"),
)
)

# Parse memory metrics
mem_metrics = self._parse_memory_line(clean_line)
if mem_metrics:
memory_snapshots.append(
MemoryMetrics(
timestamp=mem_metrics["timestamp"],
dp=0,
tp=0,
ep=0,
metric_type=mem_metrics["type"],
avail_mem_gb=mem_metrics.get("avail_mem_gb"),
mem_usage_gb=mem_metrics.get("mem_usage_gb"),
kv_cache_gb=mem_metrics.get("kv_cache_gb"),
kv_tokens=mem_metrics.get("kv_tokens"),
)
)

# Extract TP/DP/EP configuration from server_args
if "tp_size=" in clean_line:
tp_match = re.search(r"tp_size=(\d+)", clean_line)
dp_match = re.search(r"dp_size=(\d+)", clean_line)
ep_match = re.search(r"ep_size=(\d+)", clean_line)

if tp_match:
config["tp_size"] = int(tp_match.group(1))
if dp_match:
config["dp_size"] = int(dp_match.group(1))
if ep_match:
config["ep_size"] = int(ep_match.group(1))

# Parse launch command from full content
launch_command = self.parse_launch_command("".join(full_content), node_info["worker_type"])

⚠️ Potential issue | 🟠 Major

Harden log reading to avoid decode failures.

open(log_path) uses the platform default encoding and can raise UnicodeDecodeError on logs with invalid bytes, causing the entire file to be skipped. Use read_text(errors="replace") (as in the TRTLLM parser) and iterate over the decoded content to keep parsing resilient.

🛠️ Proposed fix
-        full_content = []
+        full_content = ""
@@
-            with open(log_path) as f:
-                for line in f:
-                    full_content.append(line)
+            full_content = log_path.read_text(errors="replace")
+            for line in full_content.splitlines():
                     # Strip ANSI escape codes
                     clean_line = ANSI_ESCAPE.sub("", line)
@@
-            launch_command = self.parse_launch_command("".join(full_content), node_info["worker_type"])
+            launch_command = self.parse_launch_command(full_content, node_info["worker_type"])
🤖 Prompt for AI Agents
In `@analysis/srtlog/parsers/nodes/sglang.py` around lines 116 - 197, The file
read using open(log_path) can raise UnicodeDecodeError; replace the open loop
with a tolerant read (e.g., Path(log_path).read_text(errors="replace") or
similar) and iterate over the returned string's lines so parsing still runs even
with invalid bytes; update the code that currently appends to full_content and
feeds clean_line into _parse_prefill_batch_line, _parse_decode_batch_line,
_parse_memory_line and the subsequent tp/dp/ep regex extraction to use the
decoded text lines, then call parse_launch_command("".join(full_content),
node_info["worker_type"]) as before.
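
The tolerant-read pattern described above, as a standalone sketch (the helper name and the exact regex are illustrative, not the parser's actual API):

```python
import re
from pathlib import Path

# Matches ANSI color/formatting escapes like "\x1b[32m" and "\x1b[0m".
ANSI_ESCAPE = re.compile(r"\x1b\[[0-9;]*m")


def read_log_lines(log_path: Path) -> list[str]:
    """Read a log file tolerantly and strip ANSI codes from each line.

    errors="replace" maps undecodable bytes to U+FFFD instead of raising
    UnicodeDecodeError, so one bad byte cannot cause the whole file to be skipped.
    """
    content = log_path.read_text(errors="replace")
    return [ANSI_ESCAPE.sub("", line) for line in content.splitlines()]
```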

Comment thread analysis/srtlog/parsers/nodes/sglang.py
@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 1

🤖 Fix all issues with AI agents
In `@tests/test_dashboard.py`:
- Around line 147-166: The test test_parse_timestamp_mixed_formats asserts exact
float equality for delta.total_seconds(), which is brittle; update the assertion
to use pytest.approx for floating comparison (e.g., assert delta.total_seconds()
== pytest.approx(1.1)) and ensure pytest is imported in the test file if not
already; the change should be made next to the use of _parse_timestamp and
delta.total_seconds() in that test.
🧹 Nitpick comments (1)
tests/test_dashboard.py (1)

18-31: Add return type annotations to test methods.

To align with the project’s “type hints everywhere” guideline, annotate test methods with -> None (and apply consistently across this file).

♻️ Proposed update (apply pattern to all tests in this class)
-    def test_parse_timestamp_yyyy_mm_dd(self):
+    def test_parse_timestamp_yyyy_mm_dd(self) -> None:
As per coding guidelines, please add return type annotations.

Comment thread tests/test_dashboard.py
Comment on lines +147 to +166
    def test_parse_timestamp_mixed_formats(self):
        """Test parsing a sequence of different timestamp formats."""
        from analysis.dashboard.rate_match_tab import _parse_timestamp

        # Simulate what dashboard might see from different backends
        timestamps = [
            "2025-12-30 15:52:38",  # Standard (could be from old cache)
            "2025-12-30T15:52:39.100000Z",  # SGLang
            "01/23/2026-08:04:40",  # TRTLLM
        ]

        dts = [_parse_timestamp(ts) for ts in timestamps]

        # All should parse successfully
        assert len(dts) == 3
        assert all(isinstance(dt, datetime) for dt in dts)

        # Should be able to compute deltas (even if not chronological)
        delta = dts[1] - dts[0]
        assert delta.total_seconds() == 1.1

⚠️ Potential issue | 🟡 Minor

Use pytest.approx for non-integer float comparisons.

delta.total_seconds() is a float derived from microseconds; direct equality can be brittle.

🐛 Proposed fix
-        assert delta.total_seconds() == 1.1
+        assert delta.total_seconds() == pytest.approx(1.1)
🤖 Prompt for AI Agents
In `@tests/test_dashboard.py` around lines 147 - 166, The test
test_parse_timestamp_mixed_formats asserts exact float equality for
delta.total_seconds(), which is brittle; update the assertion to use
pytest.approx for floating comparison (e.g., assert delta.total_seconds() ==
pytest.approx(1.1)) and ensure pytest is imported in the test file if not
already; the change should be made next to the use of _parse_timestamp and
delta.total_seconds() in that test.
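
The approx pattern in isolation, as a standalone sketch (not the project's actual test file):

```python
import pytest
from datetime import datetime


def test_delta_seconds_approx() -> None:
    t0 = datetime(2025, 12, 30, 15, 52, 38)
    t1 = datetime(2025, 12, 30, 15, 52, 39, 100000)
    delta = t1 - t0
    # pytest.approx compares with a relative tolerance (default 1e-6),
    # so float noise from the microsecond division cannot fail the test.
    assert delta.total_seconds() == pytest.approx(1.1)
```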


@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 4

🤖 Fix all issues with AI agents
In `@analysis/dashboard/rate_match_tab.py`:
- Around line 129-133: The int conversions for run.profiler_metadata.isl and
.osl can raise ValueError for non-numeric strings; update the block that assigns
isl and osl (currently creating variables isl and osl used for the rate match
graph) to wrap each int(...) conversion in a try/except (ValueError, TypeError)
and fall back to None on failure, leaving st.metric display unchanged; ensure
you reference run.profiler_metadata.isl and run.profiler_metadata.osl and assign
valid integers or None so downstream graph code handles missing values safely.

In `@analysis/srtlog/log_parser.py`:
- Around line 88-95: The cache currently saves only metrics (via
_serialize_node_metrics) and later reconstructs NodeInfo from metrics, dropping
node_config and launch_command; change the caching to persist node_config (and
any parsed launch_command/env) alongside metrics: update the code that builds
the cache (where node_infos is used and _serialize_node_metrics is called) to
serialize full NodeInfo entries (or return a tuple/dict with metrics and
node_config/launch_command), extend _serialize_node_metrics (or add a new
_serialize_nodeinfo_list) to include node_config and launch_command, and update
cache load/deserialization logic that rebuilds NodeInfo on cache hits to restore
node_config and launch_command so cached NodeInfo objects match fresh parses
(apply same change to other caching sites referenced in the comment).

In `@analysis/srtlog/models.py`:
- Around line 145-176: The ProfilerMetadata dataclass (and the other
configuration/identity dataclasses referenced around lines 288-300 and 441-506)
should be made immutable by marking them as frozen dataclasses; update the
`@dataclass` decorator for ProfilerMetadata (and the other config classes) to
include frozen=True so their fields cannot be mutated after construction,
ensuring from_json classmethods and any consumers continue to return instances
with the same API but immutable state.

In `@tests/test_runloader_parsers.py`:
- Around line 14-17: The test module imports RunLoader at collection time which
pulls in pandas and causes collection to fail when pandas is missing; fix by
making the dependency check explicit before importing RunLoader (e.g., call
pytest.importorskip("pandas") at the top of the test module) or alternatively
move the import of RunLoader inside the test functions/setup so import happens
only when pandas is available; update tests referencing RunLoader,
ParserTestHarness, and SampleSABenchData accordingly.
🧹 Nitpick comments (4)
tests/test_parsers.py (1)

428-436: Remove redundant pytest import inside test method.

pytest is already imported at the module level (line 19). The same applies to the import at line 790.

♻️ Suggested fix
     def test_parse_timestamp_invalid(self, parser):
         """Test parsing invalid timestamp raises ValueError."""
-        import pytest
-
         with pytest.raises(ValueError):
             parser.parse_timestamp("invalid-timestamp")
analysis/srtlog/parsers/benchmark/mooncake_router.py (1)

216-224: Simplify redundant loop structure.

The loop for root_dir in [logs_dir]: iterates over a single-element list, so it adds a level of nesting without doing anything; call rglob on logs_dir directly.

♻️ Suggested fix
         if logs_dir.exists():
             # Check if there are any AIPerf result files using iterdir recursively
             try:
-                for root_dir in [logs_dir]:
-                    for item in root_dir.rglob("profile_export_aiperf.json"):
-                        logger.info(f"Found mooncake-router results in: {logs_dir}")
-                        return logs_dir
+                for item in logs_dir.rglob("profile_export_aiperf.json"):
+                    logger.info(f"Found mooncake-router results in: {logs_dir}")
+                    return logs_dir
             except (OSError, PermissionError) as e:
                 logger.warning(f"Error accessing {logs_dir}: {e}")
analysis/srtlog/parsers/__init__.py (1)

161-188: Guard against duplicate registrations to avoid silent overrides.

Right now, registering the same type twice will quietly replace the previous parser, which can hide bugs.

♻️ Proposed fix
 def register_benchmark_parser(benchmark_type: str):
@@
-    def decorator(cls):
-        _benchmark_parsers[benchmark_type] = cls
+    def decorator(cls):
+        if benchmark_type in _benchmark_parsers:
+            raise ValueError(f"Benchmark parser already registered for '{benchmark_type}'")
+        _benchmark_parsers[benchmark_type] = cls
         return cls
@@
 def register_node_parser(backend_type: str):
@@
-    def decorator(cls):
-        _node_parsers[backend_type] = cls
+    def decorator(cls):
+        if backend_type in _node_parsers:
+            raise ValueError(f"Node parser already registered for '{backend_type}'")
+        _node_parsers[backend_type] = cls
         return cls
analysis/srtlog/log_parser.py (1)

120-170: Consider narrowing JSON scan to likely metadata files.

On large runs, parsing every *.json in the directory can add noticeable overhead; filtering to known metadata filenames (or small-size files) can keep detection fast.

Comment on lines +129 to +133
             st.metric("ISL/OSL", f"{run.profiler_metadata.isl}/{run.profiler_metadata.osl}")
 
             # Create rate match graph
-            isl = int(run.profiler.isl) if run.profiler.isl else None
-            osl = int(run.profiler.osl) if run.profiler.osl else None
+            isl = int(run.profiler_metadata.isl) if run.profiler_metadata.isl else None
+            osl = int(run.profiler_metadata.osl) if run.profiler_metadata.osl else None

⚠️ Potential issue | 🟠 Major

🧩 Analysis chain

🏁 Script executed:

# First, find the ProfilerMetadata class definition
rg -t py "class ProfilerMetadata" --max-count 5

Repository: ishandhanani/srt-slurm

Length of output: 115


🏁 Script executed:

# Look at the full context of rate_match_tab.py around lines 129-133
fd -t f "rate_match_tab.py" -x wc -l {}

Repository: ishandhanani/srt-slurm

Length of output: 108


🏁 Script executed:

# Check the file size first
wc -l analysis/dashboard/rate_match_tab.py 2>/dev/null || fd -t f "rate_match_tab.py"

Repository: ishandhanani/srt-slurm

Length of output: 106


🏁 Script executed:

# Read ProfilerMetadata class definition
cat -n analysis/srtlog/models.py | grep -A 50 "class ProfilerMetadata"

Repository: ishandhanani/srt-slurm

Length of output: 2092


🏁 Script executed:

# Read the rate_match_tab.py context
sed -n '120,145p' analysis/dashboard/rate_match_tab.py

Repository: ishandhanani/srt-slurm

Length of output: 1421


🏁 Script executed:

# Search for test data or JSON examples showing isl/osl values
fd -t f "\\.json" | head -10

Repository: ishandhanani/srt-slurm

Length of output: 92


🏁 Script executed:

# Look at _create_rate_match_graph to see how isl/osl are used
sed -n '140,180p' analysis/dashboard/rate_match_tab.py

Repository: ishandhanani/srt-slurm

Length of output: 1682


🏁 Script executed:

# Check if there are any tests or data samples
find . -type f \( -name "*.json" -o -name "*test*" \) | grep -i "profiler\|metadata\|isl\|osl" | head -20

Repository: ishandhanani/srt-slurm

Length of output: 48


🏁 Script executed:

# Check the config file to see if there are examples
cat configs/deepep_config.json

Repository: ishandhanani/srt-slurm

Length of output: 559


🏁 Script executed:

# Search for where isl/osl values are set or used
rg "isl|osl" -t py -B 2 -A 2 | head -100

Repository: ishandhanani/srt-slurm

Length of output: 5297


🏁 Script executed:

# Look for any test files that might show expected values
find . -type f -name "*test*.py" | xargs rg -l "isl\|osl" 2>/dev/null

Repository: ishandhanani/srt-slurm

Length of output: 50


🏁 Script executed:

# Check the actual JSON source to see if isl/osl can be non-numeric
rg "\"isl\"|\"osl\"" -t py -B 5 -A 5 | grep -A 10 "get"

Repository: ishandhanani/srt-slurm

Length of output: 1565


🏁 Script executed:

# Look for any real-world JSON samples or where the JSON data comes from
rg "benchmark.*json\|\.json.*benchmark" -t py -B 3 -A 3

Repository: ishandhanani/srt-slurm

Length of output: 48


🏁 Script executed:

# Check if there's any existing error handling for int() conversions in the file
rg "int\(" analysis/dashboard/rate_match_tab.py -B 2 -A 2

Repository: ishandhanani/srt-slurm

Length of output: 440


Guard ISL/OSL int conversion against non-numeric strings.

ProfilerMetadata.isl and ProfilerMetadata.osl are stored as strings without validation. Non-numeric values (such as "N/A") will raise ValueError and crash the dashboard. Wrap conversions in try-except blocks to safely handle invalid inputs.

🛠️ Proposed fix
-            isl = int(run.profiler_metadata.isl) if run.profiler_metadata.isl else None
-            osl = int(run.profiler_metadata.osl) if run.profiler_metadata.osl else None
+            try:
+                isl = int(run.profiler_metadata.isl) if run.profiler_metadata.isl else None
+            except (TypeError, ValueError):
+                isl = None
+            try:
+                osl = int(run.profiler_metadata.osl) if run.profiler_metadata.osl else None
+            except (TypeError, ValueError):
+                osl = None
🤖 Prompt for AI Agents
In `@analysis/dashboard/rate_match_tab.py` around lines 129 - 133, The int
conversions for run.profiler_metadata.isl and .osl can raise ValueError for
non-numeric strings; update the block that assigns isl and osl (currently
creating variables isl and osl used for the rate match graph) to wrap each
int(...) conversion in a try/except (ValueError, TypeError) and fall back to
None on failure, leaving st.metric display unchanged; ensure you reference
run.profiler_metadata.isl and run.profiler_metadata.osl and assign valid
integers or None so downstream graph code handles missing values safely.

Comment on lines 88 to 95
         # Save to cache if we have data
-        if nodes:
-            cache_df = self._serialize_node_metrics(nodes)
+        if node_infos:
+            # Extract metrics for caching
+            metrics_list = [ni.metrics for ni in node_infos]
+            cache_df = self._serialize_node_metrics(metrics_list)
             cache_mgr.save_to_cache("node_metrics", cache_df, source_patterns)
             logger.info(f"Parsed and cached {len(node_infos)} nodes from {logs_dir}")


⚠️ Potential issue | 🟠 Major

Cache path drops node_config / launch_command data.

On a cache hit, NodeInfo is reconstructed from metrics only, so launch_command/environment derived from log parsing can be lost unless config files exist. This makes cached results inconsistent with fresh parses.

🐛 Proposed fix: persist node_config alongside metrics
@@
-            metrics_list = [ni.metrics for ni in node_infos]
-            cache_df = self._serialize_node_metrics(metrics_list)
+            cache_df = self._serialize_node_metrics(node_infos)
@@
-    def _serialize_node_metrics(self, nodes: list) -> pd.DataFrame:
+    def _serialize_node_metrics(self, nodes: list[NodeInfo]) -> pd.DataFrame:
@@
-        for node in nodes:
-            metadata = node.metadata
-            config = node.config
+        for node_info in nodes:
+            metrics = node_info.metrics
+            metadata = metrics.metadata
+            config = metrics.config
+            node_config_json = json.dumps(node_info.node_config) if node_info.node_config else None
@@
-                row = {
+                row = {
                     # Node identification
                     "node": metadata.node_name,
                     "worker_type": metadata.worker_type,
                     "worker_id": metadata.worker_id,
+                    "node_config_json": node_config_json,
@@
-                row = {
+                row = {
                     # Node identification
                     "node": metadata.node_name,
                     "worker_type": metadata.worker_type,
                     "worker_id": metadata.worker_id,
+                    "node_config_json": node_config_json,
@@
-            # Extract config (same for all rows in this node)
+            # Extract config (same for all rows in this node)
             config = {}
             if not group_df.empty:
                 first_row = group_df.iloc[0]
+                node_config: dict = {}
+                node_config_json = first_row.get("node_config_json")
+                if pd.notna(node_config_json):
+                    try:
+                        node_config = json.loads(node_config_json)
+                    except Exception as e:
+                        logger.debug(f"Could not parse node_config_json: {e}")
@@
-            node_info = NodeInfo(metrics=metrics, node_config={})
+            node_info = NodeInfo(metrics=metrics, node_config=node_config or {})

Also applies to: 394-469, 471-582

🤖 Prompt for AI Agents
In `@analysis/srtlog/log_parser.py` around lines 88 - 95, The cache currently
saves only metrics (via _serialize_node_metrics) and later reconstructs NodeInfo
from metrics, dropping node_config and launch_command; change the caching to
persist node_config (and any parsed launch_command/env) alongside metrics:
update the code that builds the cache (where node_infos is used and
_serialize_node_metrics is called) to serialize full NodeInfo entries (or return
a tuple/dict with metrics and node_config/launch_command), extend
_serialize_node_metrics (or add a new _serialize_nodeinfo_list) to include
node_config and launch_command, and update cache load/deserialization logic that
rebuilds NodeInfo on cache hits to restore node_config and launch_command so
cached NodeInfo objects match fresh parses (apply same change to other caching
sites referenced in the comment).

Comment thread analysis/srtlog/models.py
Comment on lines +145 to +176
@dataclass
class ProfilerMetadata:
    """Metadata about the benchmark/profiler configuration.

    This describes what the benchmark was configured to do,
    not the actual results.
    """

    profiler_type: str
    isl: str
    osl: str
    concurrencies: str = ""
    req_rate: str = ""

    @classmethod
    def from_json(cls, json_data: dict) -> "ProfilerMetadata":
        """Create from {jobid}.json benchmark section.

        Args:
            json_data: Parsed JSON from {jobid}.json file

        Returns:
            ProfilerMetadata instance
        """
        profiler_meta = json_data.get("benchmark", {})

        return cls(
            profiler_type=profiler_meta.get("type", "unknown"),
            isl=str(profiler_meta.get("isl", "")),
            osl=str(profiler_meta.get("osl", "")),
            concurrencies=profiler_meta.get("concurrencies", ""),
            req_rate=profiler_meta.get("req-rate", ""),
        )

🛠️ Refactor suggestion | 🟠 Major

🧩 Analysis chain

🏁 Script executed:

cat -n analysis/srtlog/models.py | head -200

Repository: ishandhanani/srt-slurm

Length of output: 8762


🏁 Script executed:

cat -n analysis/srtlog/models.py | sed -n '280,310p'

Repository: ishandhanani/srt-slurm

Length of output: 1175


🏁 Script executed:

cat -n analysis/srtlog/models.py | sed -n '430,510p'

Repository: ishandhanani/srt-slurm

Length of output: 2952


🏁 Script executed:

# Check if there are any __post_init__ methods or mutation patterns in these classes
rg -A 10 "class ProfilerMetadata|class BenchmarkLaunchCommand|class NodeMetadata|class NodeLaunchCommand" analysis/srtlog/models.py | head -100

Repository: ishandhanani/srt-slurm

Length of output: 1249


🏁 Script executed:

# Verify no mutation methods exist by searching for methods that modify fields
rg -B 2 -A 5 "self\.(profiler_type|isl|osl|concurrencies|req_rate|benchmark_type|raw_command|extra_args|node_name|worker_type|worker_id|backend_type) =" analysis/srtlog/models.py

Repository: ishandhanani/srt-slurm

Length of output: 48


🏁 Script executed:

# Check if RunMetadata (also a config class) is frozen and verify consistency
rg -B 2 -A 15 "^@dataclass|^class RunMetadata" analysis/srtlog/models.py | head -50

Repository: ishandhanani/srt-slurm

Length of output: 1136


Make configuration dataclasses immutable.

These classes are configuration/identity carriers; freezing prevents accidental mutation and aligns with project conventions.

♻️ Proposed change
-@dataclass
+@dataclass(frozen=True)
 class ProfilerMetadata:
@@
-@dataclass
+@dataclass(frozen=True)
 class BenchmarkLaunchCommand:
@@
-@dataclass
+@dataclass(frozen=True)
 class NodeMetadata:
@@
-@dataclass
+@dataclass(frozen=True)
 class NodeLaunchCommand:

As per coding guidelines, configuration objects should be frozen dataclasses.

Also applies to: 288-300, 441-506

🤖 Prompt for AI Agents
In `@analysis/srtlog/models.py` around lines 145 - 176, The ProfilerMetadata
dataclass (and the other configuration/identity dataclasses referenced around
lines 288-300 and 441-506) should be made immutable by marking them as frozen
dataclasses; update the `@dataclass` decorator for ProfilerMetadata (and the other
config classes) to include frozen=True so their fields cannot be mutated after
construction, ensuring from_json classmethods and any consumers continue to
return instances with the same API but immutable state.
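
What frozen=True buys, shown on a cut-down stand-in for ProfilerMetadata (not the project's actual class):

```python
from dataclasses import FrozenInstanceError, dataclass


@dataclass(frozen=True)
class ProfilerMetadataSketch:
    profiler_type: str
    isl: str = ""
    osl: str = ""


# Construction and reads work exactly as before...
meta = ProfilerMetadataSketch(profiler_type="aiperf", isl="2048", osl="256")
```

...but any assignment to a field raises dataclasses.FrozenInstanceError, and frozen + eq also makes instances hashable, so they can be used as dict keys or cached safely.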

Comment on lines +14 to +17
import pytest

from analysis.srtlog.run_loader import RunLoader
from tests.fixtures_parsers import ParserTestHarness, SampleSABenchData

⚠️ Potential issue | 🟡 Minor

Same pipeline failure applies here.

This file also fails test collection due to the missing pandas dependency triggered by importing RunLoader. See the comment on tests/test_parsers.py for details.

🧰 Tools
🪛 GitHub Actions: CI

[error] 16-16: ModuleNotFoundError: No module named 'pandas'. Importing test module failed due to missing pandas dependency.

🤖 Prompt for AI Agents
In `@tests/test_runloader_parsers.py` around lines 14 - 17, The test module
imports RunLoader at collection time which pulls in pandas and causes collection
to fail when pandas is missing; fix by making the dependency check explicit
before importing RunLoader (e.g., call pytest.importorskip("pandas") at the top
of the test module) or alternatively move the import of RunLoader inside the
test functions/setup so import happens only when pandas is available; update
tests referencing RunLoader, ParserTestHarness, and SampleSABenchData
accordingly.
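
The importorskip mechanics, sketched with a stdlib module standing in for pandas so the snippet runs anywhere:

```python
import pytest

# importorskip returns the imported module on success; on ImportError it raises
# pytest's Skipped, so the whole test module is skipped at collection time
# instead of failing with ModuleNotFoundError.
mod = pytest.importorskip("json")  # "json" stands in for "pandas" in this sketch

# In the real test file, the call would guard the heavy import, e.g.:
# pytest.importorskip("pandas")
# from analysis.srtlog.run_loader import RunLoader
```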

@ishandhanani ishandhanani changed the title Rollup Experiments - Part 1: parser for node and benchmakrs. trim aislop add srtlog functionality to vectorize logs Jan 27, 2026

@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 2

🤖 Fix all issues with AI agents
In `@pyproject.toml`:
- Around line 19-21: Remove "pre-commit" from the runtime dependencies list in
pyproject.toml and add it to the dev-only dependency group instead (e.g., under
[project.optional-dependencies.dev] or [tool.poetry.dev-dependencies] depending
on your project tool), keeping the same version constraint "pre-commit>=4.5.1";
update the dependency arrays so only runtime packages like "pandas" and
"pyarrow" remain in the main dependencies and "pre-commit" is declared under the
dev/dev-dependencies symbol.
- Around line 20-21: pyarrow's version is inconsistent between runtime and
dev/analysis groups and streamlit/plotly are likely misclassified as runtime
deps; update pyproject.toml so pyarrow has a single aligned constraint across
all groups (or document why runtime needs "pyarrow>=23.0.0") and move streamlit
and plotly out of the main runtime dependencies into the analysis extras if they
are only used in analysis workflows (verify imports in
src/srtctl/benchmarks/scripts/sa-bench/benchmark_serving.py and other src/ files
before moving); ensure pandas stays as a runtime dependency because
benchmark_serving.py imports it.

Comment thread pyproject.toml
Comment on lines +19 to +21
"pre-commit>=4.5.1",
"pandas>=2.1.0",
"pyarrow>=23.0.0",

🛠️ Refactor suggestion | 🟠 Major

🧩 Analysis chain

🌐 Web query:

pre-commit official documentation installation dev dependency runtime

💡 Result:

  • Install the pre-commit runner (tooling, not app runtime):

    • With pip: pip install pre-commit [1]
    • In a Python project, add pre-commit to your dev requirements (e.g., requirements-dev.txt): the docs explicitly suggest requirements.txt (or requirements-dev.txt) [1]
    • With Homebrew: brew install pre-commit [2]
    • As a zero-dependency zipapp: download the .pyz from GitHub releases and run it with python pre-commit-#.pyz ... [1]
  • Install into the repo (developer workflow): run pre-commit install so hooks run on git commit [1]

  • Dev-dependency vs runtime-dependency:

    • pre-commit is a development/CI tool (it installs and runs git hooks), so it typically belongs in dev dependencies, not your application/library’s runtime dependencies [1].
    • Hook tool runtimes (eslint, black, etc.) are generally not something you add to your project’s runtime deps—pre-commit creates isolated environments for hooks and can even fetch/build required language runtimes when needed (e.g., Node) [1].

Move pre-commit to a dev-only dependency group.

pre-commit is a development and CI tool (not a runtime dependency). The official documentation recommends installing it in dev requirements. Keeping it in runtime dependencies bloats end-user installations unnecessarily.

♻️ Proposed change
 dependencies = [
     "pyyaml>=6.0",
     "jinja2>=3.1.0",
     "marshmallow>=3.20.0",
     "marshmallow-dataclass>=8.6.0",
     "requests>=2.31.0",
     "rich>=13.0.0",
     "questionary>=2.0.0",
-    "pre-commit>=4.5.1",
     "pandas>=2.1.0",
     "pyarrow>=23.0.0",
 ]

 [dependency-groups]
 dev = [
     "pytest>=8.0.0",
     "pytest-cov>=4.0",
     "ruff>=0.8.0",
     "ty",  # Astral's fast type checker (replaces mypy)
+    "pre-commit>=4.5.1",
 ]
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
-    "pre-commit>=4.5.1",
-    "pandas>=2.1.0",
-    "pyarrow>=23.0.0",
+    "pandas>=2.1.0",
+    "pyarrow>=23.0.0",
🤖 Prompt for AI Agents
In `@pyproject.toml` around lines 19 - 21, Remove "pre-commit" from the runtime
dependencies list in pyproject.toml and add it to the dev-only dependency group
instead (e.g., under [project.optional-dependencies.dev] or
[tool.poetry.dev-dependencies] depending on your project tool), keeping the same
version constraint "pre-commit>=4.5.1"; update the dependency arrays so only
runtime packages like "pandas" and "pyarrow" remain in the main dependencies and
"pre-commit" is declared under the dev/dev-dependencies symbol.

Comment thread pyproject.toml
Comment on lines +20 to +21
"pandas>=2.1.0",
"pyarrow>=23.0.0",

⚠️ Potential issue | 🟡 Minor

🧩 Analysis chain

🏁 Script executed:

find . -name "pyproject.toml" -type f

Repository: ishandhanani/srt-slurm

Length of output: 82


🏁 Script executed:

cat -n ./pyproject.toml

Repository: ishandhanani/srt-slurm

Length of output: 3825


🏁 Script executed:

rg "import pandas|from pandas|import pyarrow|from pyarrow" --type python -l | head -20

Repository: ishandhanani/srt-slurm

Length of output: 152


🏁 Script executed:

find ./src -name "*.py" -type f | xargs grep -l "pandas\|pyarrow" 2>/dev/null | head -20

Repository: ishandhanani/srt-slurm

Length of output: 127


🏁 Script executed:

find ./src/srtctl -name "*.py" -type f | head -20

Repository: ishandhanani/srt-slurm

Length of output: 816


🏁 Script executed:

grep -r "import pandas\|from pandas\|import pyarrow\|from pyarrow" ./src/srtctl --include="*.py" | grep -v "\.pyc"

Repository: ishandhanani/srt-slurm

Length of output: 147


🏁 Script executed:

grep -r "pyarrow" ./src/srtctl --include="*.py"

Repository: ishandhanani/srt-slurm

Length of output: 48


🏁 Script executed:

cat ./src/srtctl/benchmarks/scripts/sa-bench/benchmark_serving.py | head -50

Repository: ishandhanani/srt-slurm

Length of output: 1400


🏁 Script executed:

grep -A 5 -B 5 "pandas\|pd\." ./src/srtctl/benchmarks/scripts/sa-bench/benchmark_serving.py | head -80

Repository: ishandhanani/srt-slurm

Length of output: 730


🏁 Script executed:

find ./src/srtctl/benchmarks/scripts -type f -name "*.py" | wc -l

Repository: ishandhanani/srt-slurm

Length of output: 67


🏁 Script executed:

grep -l "pandas\|pyarrow" ./src/srtctl/benchmarks/scripts/sa-bench/*.py

Repository: ishandhanani/srt-slurm

Length of output: 127


🏁 Script executed:

find ./src/srtctl -name "*.py" -type f -exec grep -l "streamlit\|plotly" {} \;

Repository: ishandhanani/srt-slurm

Length of output: 48


🏁 Script executed:

grep -r "streamlit\|plotly" ./src --include="*.py"

Repository: ishandhanani/srt-slurm

Length of output: 48


🏁 Script executed:

find . -type d -name "analysis" -o -type d -name "dashboard" | head -10

Repository: ishandhanani/srt-slurm

Length of output: 97


🏁 Script executed:

find ./analysis -name "*.py" -type f | head -20

Repository: ishandhanani/srt-slurm

Length of output: 791


🏁 Script executed:

grep -r "import pandas\|from pandas\|import pyarrow\|from pyarrow" ./analysis --include="*.py" | head -20

Repository: ishandhanani/srt-slurm

Length of output: 506


Fix pyarrow version inconsistency and reconsider streamlit/plotly in runtime deps.

pandas>=2.1.0 is legitimately a runtime dependency (used in src/srtctl/benchmarks/scripts/sa-bench/benchmark_serving.py), but pyarrow has conflicting version constraints across dependency groups: >=23.0.0 in runtime vs >=14.0.0 in both dev and analysis groups. Align the pyarrow version across all groups or clarify why runtime requires a newer version.

Additionally, streamlit>=1.30.0 and plotly>=6.5.2 appear only in the analysis workflows (no imports found in src/), yet are listed in runtime dependencies. Move these to the analysis group if they're not used by runtime code.

🤖 Prompt for AI Agents
In `@pyproject.toml` around lines 20 - 21, pyarrow's version is inconsistent
between runtime and dev/analysis groups and streamlit/plotly are likely
misclassified as runtime deps; update pyproject.toml so pyarrow has a single
aligned constraint across all groups (or document why runtime needs
"pyarrow>=23.0.0") and move streamlit and plotly out of the main runtime
dependencies into the analysis extras if they are only used in analysis
workflows (verify imports in
src/srtctl/benchmarks/scripts/sa-bench/benchmark_serving.py and other src/ files
before moving); ensure pandas stays as a runtime dependency because
benchmark_serving.py imports it.
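
One way the resulting split could look (a sketch assuming the PEP 735 dependency-groups table the file already uses; the pins mirror the review discussion, not a tested resolution):

```toml
[project]
dependencies = [
    "pyyaml>=6.0",
    "pandas>=2.1.0",    # runtime: imported by benchmark_serving.py
    "pyarrow>=14.0.0",  # single constraint, aligned with the analysis group
]

[dependency-groups]
dev = [
    "pytest>=8.0.0",
    "ruff>=0.8.0",
    "pre-commit>=4.5.1",  # tooling only, never needed at runtime
]
analysis = [
    "streamlit>=1.30.0",
    "plotly>=6.5.2",
]
```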
