**Warning: Rate limit exceeded**

⌛ **How to resolve this issue?** After the wait time has elapsed, a review can be triggered. We recommend that you space out your commits to avoid hitting the rate limit.

🚦 **How do rate limits work?** CodeRabbit enforces hourly rate limits for each developer per organization. Our paid plans have higher rate limits than the trial, open-source, and free plans. In all cases, we re-allow further reviews after a brief timeout. Please see our FAQ for further information.

📝 **Walkthrough**

Introduces a parser registry and multiple benchmark/node parsers; adds metadata-focused data models (`ProfilerMetadata`, `NodeMetadata`, `NodeInfo`); migrates dashboard usages to the new models.

**Changes**
**Sequence Diagram(s)**

```mermaid
sequenceDiagram
    actor User
    participant RL as RunLoader
    participant PR as ParserRegistry
    participant BP as BenchmarkParser
    participant Cache as ParquetCache
    participant NA as NodeAnalyzer
    participant NP as NodeParser
    User->>RL: load_run(run_path)
    RL->>RL: read run metadata (profiler_metadata)
    RL->>PR: get_benchmark_parser(profiler_type)
    PR-->>RL: parser or None
    alt parser found
        RL->>BP: find_result_directory(run_path)
        BP-->>RL: result_dir
        alt cached results exist
            RL->>Cache: load cached benchmark results
            Cache-->>RL: metrics
        else
            RL->>BP: parse_result_directory(result_dir)
            BP-->>RL: metrics dict
            RL->>Cache: store parsed metrics
        end
    else
        RL->>RL: _parse_profiler_results_manual
    end
    RL->>NA: parse_run_logs(run_path)
    NA->>NA: detect backend type
    NA->>PR: get_node_parser(backend_type)
    PR-->>NA: node parser
    NA->>NP: parse_logs(log_dir)
    NP-->>NA: NodeInfo list
    NA->>Cache: cache node metrics
    NA-->>RL: node metrics
    RL-->>User: BenchmarkRun with profiler_metadata, ProfilerResults, node metrics
```
**Estimated code review effort:** 🎯 4 (Complex) | ⏱️ ~60 minutes
🚥 **Pre-merge checks:** ✅ Passed checks (3 passed)
Actionable comments posted: 12
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (3)
src/srtctl/backends/trtllm.py (2)
**20-33: Incorrect docstring - references SGLang instead of TRTLLM.**

The docstring mentions "SGLang server CLI configuration" but this is the TRTLLM backend. This appears to be a copy-paste error.

📝 Proposed fix

```diff
 @dataclass(frozen=True)
 class TRTLLMServerConfig:
-    """SGLang server CLI configuration per mode (prefill/decode/aggregated).
+    """TRTLLM server CLI configuration per mode (prefill/decode/aggregated).

     Each mode can have its own configuration dict that gets converted to
     CLI flags when starting the worker.
     """
```
**135-177: Missing profiling and config dump implementation in TRTLLM backend.**

The parameters `frontend_type`, `profiling_enabled`, `nsys_prefix`, and `dump_config_path` are part of the `BackendProtocol` interface and are actively used in the SGLang backend for profiling support and config dumping. TRTLLM declares these parameters but does not implement them. Either add support for these features (as SGLang does) or document why they are not applicable to TRTLLM.

**analysis/dashboard/config_tab.py (1)**
**63-70: Guard profiler metadata access to avoid crashes on runs without metadata.**

If a run lacks `profiler_metadata` (e.g., older logs or profiling disabled), these attribute reads can raise and break the UI. Consider a safe fallback or "N/A" defaults.

🛠️ Suggested hardening

```diff
-    with col3:
-        st.metric("ISL/OSL", f"{run.profiler_metadata.isl}/{run.profiler_metadata.osl}")
-    with col4:
-        gpu_type_suffix = f" ({run.metadata.gpu_type})" if run.metadata.gpu_type else ""
-        st.metric("Profiler", f"{run.profiler_metadata.profiler_type}{gpu_type_suffix}")
+    profiler_meta = getattr(run, "profiler_metadata", None)
+    isl = profiler_meta.isl if profiler_meta else "N/A"
+    osl = profiler_meta.osl if profiler_meta else "N/A"
+    profiler_type = profiler_meta.profiler_type if profiler_meta else "N/A"
+
+    with col3:
+        st.metric("ISL/OSL", f"{isl}/{osl}")
+    with col4:
+        gpu_type_suffix = f" ({run.metadata.gpu_type})" if run.metadata.gpu_type else ""
+        st.metric("Profiler", f"{profiler_type}{gpu_type_suffix}")
```
🤖 Fix all issues with AI agents
In `@analysis/srtlog/log_parser.py`:
- Around line 68-72: The if-not check after calling
get_node_parser(backend_type) is dead because get_node_parser raises ValueError
when no parser exists; update the code to catch that exception instead: wrap the
call to get_node_parser in a try/except ValueError block, log the warning via
logger.warning including the backend_type and exception message, and return []
from the except branch (remove the unreachable if not parser branch and adjust
any surrounding flow accordingly).
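The try/except flow described in this prompt can be sketched as follows. This is a standalone mock: the `get_node_parser` here is a hypothetical stand-in for the real registry function, included only so the example runs on its own.

```python
import logging

logger = logging.getLogger(__name__)

# Hypothetical stand-in for the registry lookup: raises ValueError when no
# parser is registered for the backend, mirroring get_node_parser's contract.
_REGISTRY: dict[str, object] = {}


def get_node_parser(backend_type: str):
    if backend_type not in _REGISTRY:
        raise ValueError(f"no node parser registered for backend: {backend_type}")
    return _REGISTRY[backend_type]


def parse_run_nodes(backend_type: str) -> list:
    # Catch the exception instead of testing a return value that is never None.
    try:
        parser = get_node_parser(backend_type)
    except ValueError as exc:
        logger.warning("Skipping node parsing for %s: %s", backend_type, exc)
        return []
    return parser.parse_logs()  # type: ignore[attr-defined]
```

With this shape, the unreachable `if not parser:` branch disappears and the failure path is explicit.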
In `@analysis/srtlog/parsers/nodes/trtllm.py`:
- Around line 338-348: The MemoryMetrics instance appended to memory_snapshots
for the kv_cache allocation currently sets timestamp to an empty string; change
this to a non-empty fallback (e.g., "unknown") or prefer a nearby/local
timestamp variable if available (for example use an in-scope variable like
timestamp, ts, or last_log_timestamp) so the kv_cache MemoryMetrics never has an
empty timestamp; update the timestamp argument in the MemoryMetrics constructor
call in the KV allocation block accordingly.
In `@analysis/srtlog/run_loader.py`:
- Around line 244-246: The cache validation currently builds source_patterns as
[f"{result_dir_rel}/*.json"] which only matches top-level JSON files (variables:
result_dir_rel, source_patterns); change it to include recursive matches so
nested results are covered — e.g. use a recursive glob pattern like
f"{result_dir_rel}/**/*.json" or, if available, use the parser-provided patterns
for mooncake-router (e.g. include
"logs/artifacts/**/profile_export_aiperf.json") so cache invalidation detects
nested files under result_dir; update the source_patterns construction
accordingly.
- Around line 417-422: The code currently uses "or" to choose throughput and
concurrency values which treats legitimate zeros as falsy and drops them; update
the assignments for concurrency, out["concurrencies"], out["output_tps"],
out["total_tps"], and out["request_throughput"] to explicitly check for None
(e.g., if data.get("max_concurrency") is not None use it else fall back to
data.get("concurrency") else 0) so that 0.0 is preserved; apply the same
None-aware selection for output_throughput vs output_tps and
total_token_throughput vs total_tps (use explicit is not None checks or
membership checks like "if 'output_throughput' in data and
data['output_throughput'] is not None then ...") and then append the chosen
values to out as before.
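As a self-contained illustration of why the `or`-based fallback drops legitimate zeros (the dict keys mirror the prompt; the values are made up):

```python
data = {"max_concurrency": 0, "concurrency": 8}

# Falsy coalescing: 0 is indistinguishable from "missing" and gets replaced.
concurrency_falsy = data.get("max_concurrency") or data.get("concurrency") or 0

# None-aware selection: only a genuinely absent key falls through.
mc = data.get("max_concurrency")
if mc is not None:
    concurrency = mc
elif data.get("concurrency") is not None:
    concurrency = data["concurrency"]
else:
    concurrency = 0

print(concurrency_falsy)  # 8 -- the real value 0 was silently dropped
print(concurrency)        # 0 -- preserved
```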
- Around line 675-713: The method load_node_metrics is annotated to return
list[NodeMetrics] but currently returns the raw result of
NodeAnalyzer.parse_run_logs (which yields list[NodeInfo]); update
load_node_metrics to extract and return the metrics attribute from each NodeInfo
(e.g., call NodeAnalyzer.parse_run_logs(..., return_dicts=False), assign to a
local variable like node_infos, then return [node.metrics for node in
node_infos]) and ensure load_node_metrics_for_run continues to call
load_node_metrics with the detected backend_type unchanged.
In `@src/srtctl/benchmarks/scripts/gpqa/bench.sh`:
- Around line 29-31: The current code builds a shell string in the variable
command and uses eval which is vulnerable to shell injection; instead build an
argv array and invoke it directly: replace the string assembly using
command="..." with an array (e.g., cmd=(python3 -m sglang.test.run_eval
--base-url "$ENDPOINT" --model "$MODEL_NAME" --eval-name gpqa --num-examples
"$NUM_EXAMPLES" --max-tokens "$MAX_TOKENS" --repeat "$REPEAT" --num-threads
"$NUM_THREADS")), update the echo to print the joined command safely, and run it
with the exec form "${cmd[@]}" (or "${cmd[@]}" in place) to avoid eval and
ensure ENDPOINT, MODEL_NAME and other variables are passed as single safe
arguments.
In `@src/srtctl/benchmarks/scripts/longbenchv2/bench.sh`:
- Around line 31-43: The script currently builds a shell string in the variable
"command" and executes it with eval, which is unsafe and fragile; change to
build the command as an array (e.g., cmd=(python3 -m sglang.test.run_eval
--base-url "$ENDPOINT" --model "$MODEL_NAME" ...)), append optional args to that
array when NUM_EXAMPLES or CATEGORIES are non-empty, and then execute the array
directly (no eval) so arguments with spaces are preserved; replace the echo
"[CMD] $command" with a safe printable representation (use printf with proper
quoting or join the array) to log the invoked command.
In `@src/srtctl/benchmarks/scripts/mmlu/bench.sh`:
- Around line 29-31: The current use of a single string variable named command
and eval to run the benchmark is unsafe; instead build the invocation as an
argument array (e.g., cmd) with each token quoted (include the module
sglang.test.run_eval and flags like --base-url, --model, --num-examples,
--max-tokens, --repeat, --num-threads as separate array elements) and execute it
with the array expansion form to avoid shell interpretation of metacharacters
(also print a safe, escaped representation of the array for logging before
running). Ensure you replace usages of command and eval with the new array
variable (cmd) and use "${cmd[@]}" to run it.
In `@src/srtctl/benchmarks/scripts/profiling/profile.sh`:
- Around line 133-141: The script currently builds command strings (variables
named command) and runs them with eval which is unsafe; refactor both
invocations (the python3 sglang.bench_serving call and the python -m lm_eval
call) to construct command arrays (e.g., cmd=(python3 -m sglang.bench_serving
... ) and lm_cmd=(python -m lm_eval ... )) using properly quoted variables like
${model_name}, ${head_node}, ${head_port}, ${PROFILE_CONCURRENCY},
${PROFILE_ISL}, ${PROFILE_OSL}, then log them with printf or echo using a safely
joined representation and execute with "${cmd[@]}" and "${lm_cmd[@]}" instead of
eval; ensure any embedded complex arg (the --model_args JSON-like string) is
provided as a single array element so it is not split.
In `@src/srtctl/benchmarks/scripts/sa-bench/bench.sh`:
- Around line 54-57: Replace the empty --num-prompts and unsafe eval usage in
the warmup invocation by using the computed num_warmup_prompts variable and
executing the command without eval; specifically, update the warmup block so the
generated command uses --num-prompts "$num_warmup_prompts" (not $num_prompts)
and build the invocation as a shell array (e.g., cmd=(python3 -u
"${WORK_DIR}/benchmark_serving.py" --model "${MODEL_NAME}" ... --num-prompts
"$num_warmup_prompts" ...)) then run it with "${cmd[@]}", and apply the same fix
(use proper num_prompts variable and array execution) to the other occurrence at
the 71-73 block.
In `@tests/fixtures_parsers.py`:
- Line 42: The fixture contains a JSON-unsafe value: the "request_rate" key is
set to float("inf"), which will break json.dump(); change the fixture so
"request_rate" is either a string (e.g., "Infinity") or a finite numeric
sentinel (e.g., a very large float like 1e308) depending on downstream
expectations — locate the occurrence of request_rate = float("inf") in the
fixtures_parsers data structure and replace it accordingly and update any tests
that assert numeric type if you switch to a string.
- Around line 343-345: The assertion logic allows empty lists despite the
comment; update the check around `value` (from `results[field]`) so it asserts
`value` is not None and, if `value` is a list, that it is non-empty (i.e.,
assert not None AND (not a list OR list length > 0)); keep the failure message
referencing `field` and adjust wording if desired.
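A compact form of the None-and-non-empty check described above; the names `value` and `field` mirror the prompt, and the helper itself is hypothetical:

```python
def assert_present(field: str, value) -> None:
    # None always fails; a list must additionally be non-empty.
    assert value is not None, f"{field} is missing"
    assert not isinstance(value, list) or len(value) > 0, f"{field} is an empty list"


assert_present("output_tps", [1.0, 2.0])  # passes
try:
    assert_present("output_tps", [])
except AssertionError as exc:
    print(exc)
```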
🧹 Nitpick comments (18)
src/srtctl/benchmarks/scripts/router/bench.sh (1)
**43-45: Avoid `eval` when direct execution suffices.**

Using `eval` here is unnecessary and introduces potential for subtle bugs or injection issues if any variable contains shell metacharacters. You can achieve the same logging effect by echoing the command separately and executing directly.

♻️ Suggested refactor

```diff
 # shellcheck disable=SC2086
-command="python prefix_ratio_benchmark.py --prefix-ratios $PREFIX_RATIOS --isl $ISL --osl $OSL --requests $REQUESTS --concurrency $CONCURRENCY --output-dir $result_dir"
-echo "[CMD] $command"
-eval "$command"
+echo "[CMD] python prefix_ratio_benchmark.py --prefix-ratios $PREFIX_RATIOS --isl $ISL --osl $OSL --requests $REQUESTS --concurrency $CONCURRENCY --output-dir $result_dir"
+python prefix_ratio_benchmark.py --prefix-ratios $PREFIX_RATIOS --isl "$ISL" --osl "$OSL" --requests "$REQUESTS" --concurrency "$CONCURRENCY" --output-dir "$result_dir"
```

This approach:

- Logs the command for visibility (same as before)
- Executes directly without `eval`
- Quotes scalar variables (`$ISL`, `$OSL`, etc.) while leaving `$PREFIX_RATIOS` unquoted for intended word splitting

**analysis/srtlog/models.py (6)**
**145-177: Make ProfilerMetadata immutable.**

This is configuration metadata and should be frozen to prevent accidental mutation. As per coding guidelines, use `@dataclass(frozen=True)` for configuration objects.

♻️ Proposed change

```diff
-@dataclass
+@dataclass(frozen=True)
 class ProfilerMetadata:
```
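A minimal illustration of what freezing buys; the class and fields below are hypothetical, not `ProfilerMetadata`'s actual fields:

```python
from dataclasses import dataclass, FrozenInstanceError


@dataclass(frozen=True)
class ExampleMetadata:
    profiler_type: str
    isl: int


meta = ExampleMetadata(profiler_type="sa-bench", isl=1024)
try:
    meta.isl = 2048  # any attribute assignment on a frozen instance raises
except FrozenInstanceError:
    print("frozen: mutation rejected")
```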
**288-300: Make BenchmarkLaunchCommand immutable.**

This is configuration/command metadata and should be frozen. As per coding guidelines, use `@dataclass(frozen=True)` for configuration objects.

♻️ Proposed change

```diff
-@dataclass
+@dataclass(frozen=True)
 class BenchmarkLaunchCommand:
```
**440-449: Make NodeMetadata immutable.**

This is identification/config metadata and should be frozen. As per coding guidelines, use `@dataclass(frozen=True)` for configuration objects.

♻️ Proposed change

```diff
-@dataclass
+@dataclass(frozen=True)
 class NodeMetadata:
```
**454-489: Tighten type for runtime config.**

Use `dict[str, Any]` for `config` to keep type hints consistent. As per coding guidelines, use explicit type hints for configuration structures.

♻️ Proposed change

```diff
-    config: dict = field(default_factory=dict)  # Runtime config: TP/PP/EP, batch sizes, etc.
+    config: dict[str, Any] = field(default_factory=dict)  # Runtime config: TP/PP/EP, batch sizes, etc.
```
**492-505: Make NodeLaunchCommand immutable.**

Launch command metadata should be frozen to avoid accidental mutation. As per coding guidelines, use `@dataclass(frozen=True)` for configuration objects.

♻️ Proposed change

```diff
-@dataclass
+@dataclass(frozen=True)
 class NodeLaunchCommand:
```
**623-633: Parameterize `explicit_flags` type.**

Use `set[str]` to keep typing precise. As per coding guidelines, use explicit type parameters for collections.

♻️ Proposed change

```diff
-    explicit_flags: set
+    explicit_flags: set[str]
```

**analysis/srtlog/parsers/benchmark/__init__.py (1)**
nodes/__init__.pyfile. Adding it ensures consistency across the parser package.Suggested fix
+# SPDX-FileCopyrightText: Copyright (c) 2025 NVIDIA CORPORATION & AFFILIATES. All rights reserved. +# SPDX-License-Identifier: Apache-2.0 """Benchmark output parsers.""" from analysis.srtlog.parsers.benchmark.mooncake_router import MooncakeRouterParser from analysis.srtlog.parsers.benchmark.sa_bench import SABenchParser __all__ = ["SABenchParser", "MooncakeRouterParser"]src/srtctl/benchmarks/scripts/mooncake-router/bench.sh (2)
59-61: Consider avoidingevalfor command execution.Using
evalwith interpolated variables can execute arbitrary code if inputs contain shell metacharacters. While this is likely run in controlled environments, a safer pattern would use arrays:Safer alternative using array execution
-command="aiperf profile -m ${MODEL_NAME} --url ${ENDPOINT} --streaming --ui simple --concurrency 10 --request-count 20" -echo "[CMD-WARMUP] $command" -eval "$command" +warmup_args=( + aiperf profile + -m "${MODEL_NAME}" + --url "${ENDPOINT}" + --streaming + --ui simple + --concurrency 10 + --request-count 20 +) +echo "[CMD-WARMUP] ${warmup_args[*]}" +"${warmup_args[@]}"
77-79: Sameevalconcern applies to the main benchmark command.The goodput argument with embedded spaces makes this trickier, but arrays handle it correctly:
Safer alternative
-command="aiperf profile -m ${MODEL_NAME} --input-file ${INPUT_FILE} --custom-dataset-type mooncake_trace --fixed-schedule --url ${ENDPOINT} --streaming --random-seed 42 --ui simple --artifact-dir ${RUN_ARTIFACT_DIR} --goodput \"time_to_first_token:${TTFT_THRESHOLD} inter_token_latency:${ITL_THRESHOLD}\"" -echo "[CMD] $command" -eval "$command" +bench_args=( + aiperf profile + -m "${MODEL_NAME}" + --input-file "${INPUT_FILE}" + --custom-dataset-type mooncake_trace + --fixed-schedule + --url "${ENDPOINT}" + --streaming + --random-seed 42 + --ui simple + --artifact-dir "${RUN_ARTIFACT_DIR}" + --goodput "time_to_first_token:${TTFT_THRESHOLD} inter_token_latency:${ITL_THRESHOLD}" +) +echo "[CMD] ${bench_args[*]}" +"${bench_args[@]}"Note: If the
[CMD]logging format must be preserved exactly for parser compatibility, you may need to adjust the echo accordingly.tests/test_runloader_parsers.py (1)
**24-58: Add fixture return type hints.**

The fixtures are untyped; please add return annotations to align with the project's typing standard. As per coding guidelines, add type hints consistently across new Python code.

♻️ Suggested update

```diff
+from typing import Any
@@
-    def temp_dir(self):
+    def temp_dir(self) -> Path:
@@
-    def sample_run_metadata(self):
+    def sample_run_metadata(self) -> dict[str, Any]:
```

**analysis/srtlog/parsers/nodes/trtllm.py (3)**
**27-28: Consider extracting the `ANSI_ESCAPE` pattern to a shared utility.**

The `ANSI_ESCAPE` regex pattern is duplicated across both the SGLang and TRTLLM parsers. Consider moving it to a shared utility module (e.g., `analysis/srtlog/parsers/utils.py`).
**138-165: Document the override behavior for config values.**

The TensorRT-LLM engine args section (lines 138-164) can override values previously extracted from the `Config()` dump (lines 108-130). If this is intentional because engine args are more authoritative, consider adding a brief comment clarifying this precedence.
**352-367: Duplicated method - consider extracting to a shared utility.**

The `_extract_node_info_from_filename` method is identical to the one in `sglang.py`. Consider moving this to a shared utility module to reduce code duplication.

**analysis/srtlog/log_parser.py (4)**
**188-188: Move `import os` to the top of the file.**

The `os` module is imported inside the method, but it's a standard library module that's commonly needed. It should be imported at the top of the file with other imports for consistency.

Suggested fix: add `import os` at the top of the file with the other imports, then remove line 188.
**268-313: Consider using DEBUG level for environment variable loading logs.**

Lines 297, 302, and 307 use `logger.info()` for logging the number of env vars loaded. This might be too verbose for normal operation. Consider using `logger.debug()` instead.

Suggested fix

```diff
 if "prefill_environment" in backend:
     env_vars["prefill"] = backend["prefill_environment"]
-    logger.info(f"Loaded {len(env_vars['prefill'])} prefill env vars from config.yaml")
+    logger.debug(f"Loaded {len(env_vars['prefill'])} prefill env vars from config.yaml")

 # Extract decode_environment
 if "decode_environment" in backend:
     env_vars["decode"] = backend["decode_environment"]
-    logger.info(f"Loaded {len(env_vars['decode'])} decode env vars from config.yaml")
+    logger.debug(f"Loaded {len(env_vars['decode'])} decode env vars from config.yaml")

 # Extract agg_environment if present
 if "agg_environment" in backend:
     env_vars["agg"] = backend["agg_environment"]
-    logger.info(f"Loaded {len(env_vars['agg'])} agg env vars from config.yaml")
+    logger.debug(f"Loaded {len(env_vars['agg'])} agg env vars from config.yaml")
```
**331-331: Misleading key name: `prefill_batches` contains all batches, not just prefill.**

The key `"prefill_batches"` is used to store all batches regardless of worker type. If this is for backward compatibility, consider adding a comment. Otherwise, consider renaming to `"batches"`.

Suggested fix if backward compatibility isn't required

```diff
-    "prefill_batches": metrics.batches,  # Keep as list of BatchMetrics objects
+    "batches": metrics.batches,  # Keep as list of BatchMetrics objects
```
**478-479: Remove redundant imports inside method.**

`time` is already imported at the top of the file (line 9). The models import could also be at the top level. Consider removing these local imports.

Suggested fix

```diff
-    import time
-    from .models import BatchMetrics, MemoryMetrics, NodeInfo, NodeMetadata, NodeMetrics
+    from .models import BatchMetrics, MemoryMetrics, NodeMetadata, NodeMetrics
```

Note: `NodeInfo` is already imported at the top of the file (line 16). Only `BatchMetrics`, `MemoryMetrics`, `NodeMetadata`, and `NodeMetrics` need to be imported here (or better, at the top of the file).
```python
memory_snapshots.append(
    MemoryMetrics(
        timestamp="",
        dp=0,
        tp=0,
        ep=0,
        metric_type="kv_cache",
        kv_cache_gb=kv_gb,
        kv_tokens=max_tokens,
    )
)
```
Empty timestamp in KV allocation metrics may cause issues downstream.
The MemoryMetrics for KV cache allocation (line 340) uses an empty string for timestamp. This could cause issues if downstream code expects valid timestamps. Consider using "unknown" or the timestamp from nearby log lines if available.
Suggested fix

```diff
 memory_snapshots.append(
     MemoryMetrics(
-        timestamp="",
+        timestamp="unknown",  # KV allocation logs don't include timestamp
         dp=0,
         tp=0,
         ep=0,
```
In `@analysis/srtlog/parsers/nodes/trtllm.py` around lines 338 - 348, The
MemoryMetrics instance appended to memory_snapshots for the kv_cache allocation
currently sets timestamp to an empty string; change this to a non-empty fallback
(e.g., "unknown") or prefer a nearby/local timestamp variable if available (for
example use an in-scope variable like timestamp, ts, or last_log_timestamp) so
the kv_cache MemoryMetrics never has an empty timestamp; update the timestamp
argument in the MemoryMetrics constructor call in the KV allocation block
accordingly.
```python
"total_token_throughput": concurrency * 60.0,
"request_throughput": concurrency * 0.5,
"request_goodput": concurrency * 0.48,
"request_rate": float("inf"),
```
float("inf") is not valid JSON and will cause serialization issues.
The `request_rate: float("inf")` value cannot be represented in standard JSON: with the default `allow_nan=True`, `json.dump()` emits the non-standard token `Infinity` (which strict parsers reject), and with `allow_nan=False` it raises `ValueError`. Consider using a string representation or a very large number.
Suggested fix

```diff
-    "request_rate": float("inf"),
+    "request_rate": "inf",  # String representation for JSON compatibility
```

Or if numeric is required:

```diff
-    "request_rate": float("inf"),
+    "request_rate": 1e308,  # Very large number as proxy for infinity
```
🤖 Prompt for AI Agents
In `@tests/fixtures_parsers.py` at line 42, The fixture contains a JSON-unsafe
value: the "request_rate" key is set to float("inf"), which will break
json.dump(); change the fixture so "request_rate" is either a string (e.g.,
"Infinity") or a finite numeric sentinel (e.g., a very large float like 1e308)
depending on downstream expectations — locate the occurrence of request_rate =
float("inf") in the fixtures_parsers data structure and replace it accordingly
and update any tests that assert numeric type if you switch to a string.
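For reference, here is the exact failure mode in isolation. Python's `json` module only refuses infinity when `allow_nan=False`; by default it serializes it as the non-standard token `Infinity`, which is not valid JSON and trips strict parsers downstream:

```python
import json

value = {"request_rate": float("inf")}

# Default: succeeds, but the output is not standard JSON.
print(json.dumps(value))  # {"request_rate": Infinity}

# Strict mode: refuses to serialize at all.
try:
    json.dumps(value, allow_nan=False)
except ValueError as exc:
    print(f"rejected: {exc}")
```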
Actionable comments posted: 7
🤖 Fix all issues with AI agents
In `@analysis/docs/ARCHITECTURE_DATAFLOW.md`:
- Line 10: Update the fenced code blocks listed (lines 10, 32, 67, 123, 156,
254, 325, 367, 417, 485, 513, 565) to include language identifiers to satisfy
markdownlint MD040; use "text" for diagrams/ASCII art and "python" (or the
appropriate language) for code snippets so each triple-backtick fence becomes
```text or ```python as applicable.
In `@analysis/srtlog/parsers/benchmark/sa_bench.py`:
- Around line 167-186: The fallback path currently takes the list-valued output
from parse() (fallback_result) and stuffs entire lists into a single dict,
losing per-concurrency entries; change the fallback to normalize into one scalar
result per concurrency by iterating over the length of
fallback_result["concurrencies"] (or the longest list among
output_tps/concurrencies) and appending a dict for each index with scalar fields
(max_concurrency from the specific concurrency value, output_tps[index],
request_throughput[index], mean_ttft_ms[index], mean_itl_ms[index],
mean_tpot_ms[index], p99_ttft_ms[index], p99_itl_ms[index], completed from
completed_requests[index] or a sensible default) so downstream consumers receive
the same per-concurrency format as when JSON results are present; use the
existing symbols parse, fallback_result, and results to locate and implement
this loop-based normalization.
In `@analysis/srtlog/run_loader.py`:
- Around line 713-733: In load_node_metrics_for_run, guard against
run.metadata.container being None by defaulting it to an empty string before
calling .lower(); specifically, change how the local variable container is
derived (from run.metadata.container) so it becomes (empty string if None) then
.lower(), keep the same backend_type detection logic (checking for "trtllm",
"dynamo", "tensorrt", "sglang") and then call
self.load_node_metrics(run.metadata.path, backend_type) as before.
- Around line 692-712: The load_node_metrics function currently ignores its
backend_type parameter; update it to pass that override through to
NodeAnalyzer.parse_run_logs so callers can force a backend when auto-detection
fails (e.g., call analyzer.parse_run_logs(run_path, return_dicts=False,
backend_type=backend_type) or the appropriate override parameter name used by
parse_run_logs), while allowing parse_run_logs to still fall back to
auto-detection when the override is None/empty; this change affects
load_node_metrics and ensures callers like load_node_metrics_for_run can force
parsing for a specific backend.
In `@rollup_sa_bench_553.json`:
- Around line 7-12: The fixture metadata contains host-specific absolute paths
(e.g., the "container" field
"/home/nlevin/containers/jwillthomson_dynamo-0.8.0_... .sqsh" and potentially
other fields like "model_dir"), so replace these with redacted or portable
values (for example a neutral token like "container": "<redacted-container>" or
a relative/portable path) and ensure any other absolute paths in the JSON are
similarly replaced; update the "container" value and any host-specific entries
so the fixture contains no usernames or host-specific directories and remains
portable across environments.
- Line 5: The JSON timestamps are inconsistent (run_date, batches[].timestamp,
memory_snapshots[].timestamp) and cause parsing failures; update the generator
to emit ISO 8601 timestamps (e.g., 2026-01-15T07:14:56Z or without Z if local)
for "run_date" and every batches[].timestamp and replace empty
memory_snapshots[].timestamp values with null, and then update the parsers that
call datetime.strptime(..., "%Y-%m-%d %H:%M:%S") (the parsing calls in
analysis/srtlog/visualizations.py and analysis/dashboard/rate_match_tab.py) to
accept ISO 8601 (use datetime.fromisoformat or a robust parser like
dateutil.parser.parse) and explicitly handle null timestamps instead of silently
skipping empty strings.
- Around line 17-32: The profiler_metadata.concurrencies field is set to an
empty string which causes check_completeness() to treat the run as manual and
skip validation; update profiler_metadata.concurrencies to reflect the actual
concurrency used (e.g., "32") or set it to null if it was not explicitly
configured so that check_completeness() correctly recognizes and validates the
benchmark results; ensure the value type matches expected schema (string or
null) and corresponds to the concurrency_values entry.
🧹 Nitpick comments (1)
analysis/srtlog/log_parser.py (1)
**176-268: Refactor config loading to use marshmallow_dataclass schemas for runtime validation.**

The current code loads JSON/YAML configs with `json.load()` and `yaml.safe_load()` without validation, relying only on TypedDict for type hints (which provide no runtime protection). NodeConfig and GPUInfo should be converted to `marshmallow_dataclass` dataclasses to match the project's established patterns (see `SrtConfig.from_yaml()` in `schema.py` and validation patterns in `config.py`). This ensures config structure is validated before merging and prevents malformed shapes from silently affecting node_config.
Flagged hunk in `analysis/docs/ARCHITECTURE_DATAFLOW.md`: the heading "## 1. Entry Point: RunLoader" is followed by a code fence with no language identifier.
Add language identifiers to fenced code blocks.
markdownlint (MD040) requires language tags; consider `text` for diagrams and `python` for code examples (e.g., Line 10, Line 32, Line 417).
Also applies to: 32-32, 67-67, 123-123, 156-156, 254-254, 325-325, 367-367, 417-417, 485-485, 513-513, 565-565
🧰 Tools
🪛 markdownlint-cli2 (0.18.1)
10-10: Fenced code blocks should have a language specified
(MD040, fenced-code-language)
🤖 Prompt for AI Agents
In `@analysis/docs/ARCHITECTURE_DATAFLOW.md` at line 10, Update the fenced code
blocks listed (lines 10, 32, 67, 123, 156, 254, 325, 367, 417, 485, 513, 565) to
include language identifiers to satisfy markdownlint MD040; use "text" for
diagrams/ASCII art and "python" (or the appropriate language) for code snippets
so each triple-backtick fence becomes ```text or ```python as applicable.
```python
# Fallback: If no JSON results found, try parsing benchmark.out
if not results:
    benchmark_out = result_dir / "benchmark.out"
    if benchmark_out.exists():
        logger.info(f"No JSON results found in {result_dir}, falling back to benchmark.out parsing")
        # Parse benchmark.out and create a single result entry
        fallback_result = self.parse(benchmark_out)
        if fallback_result.get("output_tps"):
            # Wrap in list format expected by caller
            results.append({
                "max_concurrency": fallback_result.get("concurrencies", [0])[0] if fallback_result.get("concurrencies") else 0,
                "output_tps": fallback_result.get("output_tps"),
                "request_throughput": fallback_result.get("request_throughput"),
                "mean_ttft_ms": fallback_result.get("mean_ttft_ms"),
                "mean_itl_ms": fallback_result.get("mean_itl_ms"),
                "mean_tpot_ms": fallback_result.get("mean_tpot_ms"),
                "p99_ttft_ms": fallback_result.get("p99_ttft_ms"),
                "p99_itl_ms": fallback_result.get("p99_itl_ms"),
                "completed": fallback_result.get("completed_requests"),
            })
```
Fallback results are list-valued and collapse multi-concurrency data.
parse() returns lists per concurrency, but the fallback path wraps those lists into a single result dict (Line 173–186). This yields list-valued fields and drops additional concurrency entries, which can break downstream consumers expecting one scalar result per concurrency.
🔧 Suggested fix: normalize fallback into per-concurrency entries
- if fallback_result.get("output_tps"):
- # Wrap in list format expected by caller
- results.append({
- "max_concurrency": fallback_result.get("concurrencies", [0])[0] if fallback_result.get("concurrencies") else 0,
- "output_tps": fallback_result.get("output_tps"),
- "request_throughput": fallback_result.get("request_throughput"),
- "mean_ttft_ms": fallback_result.get("mean_ttft_ms"),
- "mean_itl_ms": fallback_result.get("mean_itl_ms"),
- "mean_tpot_ms": fallback_result.get("mean_tpot_ms"),
- "p99_ttft_ms": fallback_result.get("p99_ttft_ms"),
- "p99_itl_ms": fallback_result.get("p99_itl_ms"),
- "completed": fallback_result.get("completed_requests"),
- })
+ if fallback_result.get("output_tps"):
+ concurrencies = fallback_result.get("concurrencies") or [0]
+ output_tps = fallback_result.get("output_tps") or []
+ request_throughput = fallback_result.get("request_throughput") or []
+ mean_ttft = fallback_result.get("mean_ttft_ms") or []
+ mean_itl = fallback_result.get("mean_itl_ms") or []
+ mean_tpot = fallback_result.get("mean_tpot_ms") or []
+ p99_ttft = fallback_result.get("p99_ttft_ms") or []
+ p99_itl = fallback_result.get("p99_itl_ms") or []
+ completed = fallback_result.get("completed_requests") or []
+
+ for idx, concurrency in enumerate(concurrencies):
+ results.append({
+ "max_concurrency": concurrency,
+ "output_throughput": output_tps[idx] if idx < len(output_tps) else None,
+ "request_throughput": request_throughput[idx] if idx < len(request_throughput) else None,
+ "mean_ttft_ms": mean_ttft[idx] if idx < len(mean_ttft) else None,
+ "mean_itl_ms": mean_itl[idx] if idx < len(mean_itl) else None,
+ "mean_tpot_ms": mean_tpot[idx] if idx < len(mean_tpot) else None,
+ "p99_ttft_ms": p99_ttft[idx] if idx < len(p99_ttft) else None,
+ "p99_itl_ms": p99_itl[idx] if idx < len(p99_itl) else None,
+ "completed": completed[idx] if idx < len(completed) else None,
+ })
🤖 Prompt for AI Agents
In `@analysis/srtlog/parsers/benchmark/sa_bench.py` around lines 167 - 186, The
fallback path currently takes the list-valued output from parse()
(fallback_result) and stuffs entire lists into a single dict, losing
per-concurrency entries; change the fallback to normalize into one scalar result
per concurrency by iterating over the length of fallback_result["concurrencies"]
(or the longest list among output_tps/concurrencies) and appending a dict for
each index with scalar fields (max_concurrency from the specific concurrency
value, output_tps[index], request_throughput[index], mean_ttft_ms[index],
mean_itl_ms[index], mean_tpot_ms[index], p99_ttft_ms[index], p99_itl_ms[index],
completed from completed_requests[index] or a sensible default) so downstream
consumers receive the same per-concurrency format as when JSON results are
present; use the existing symbols parse, fallback_result, and results to locate
and implement this loop-based normalization.
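The loop-based normalization described in the prompt can be sketched as a standalone helper (hypothetical function and key names; the repo's actual parser uses `self.parse` and `results` in place):

```python
def normalize_fallback(fallback_result: dict) -> list[dict]:
    """Expand list-valued parse() output into one scalar dict per concurrency."""
    concurrencies = fallback_result.get("concurrencies") or [0]

    def at(key: str, idx: int):
        # Safe positional lookup: missing or short lists yield None.
        values = fallback_result.get(key) or []
        return values[idx] if idx < len(values) else None

    results = []
    for idx, concurrency in enumerate(concurrencies):
        results.append({
            "max_concurrency": concurrency,
            "output_tps": at("output_tps", idx),
            "request_throughput": at("request_throughput", idx),
            "mean_ttft_ms": at("mean_ttft_ms", idx),
            "completed": at("completed_requests", idx),
        })
    return results
```

This keeps every concurrency entry as its own row and degrades gracefully when some metric lists are shorter than the concurrency list.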
```python
def load_node_metrics(self, run_path: str, backend_type: str = "sglang") -> list[NodeMetrics]:
    """Load node metrics from worker log files using NodeAnalyzer.

    Args:
        run_path: Path to the run directory
        backend_type: Backend type (sglang or trtllm) - deprecated, auto-detected

    Returns:
        List of NodeMetrics objects, one per worker
    """
    # Handle both relative and absolute paths
    if not os.path.isabs(run_path):
        run_path = os.path.join(self.logs_dir, run_path)

    # Use NodeAnalyzer which handles caching, backend detection, and config loading
    analyzer = NodeAnalyzer()
    node_infos = analyzer.parse_run_logs(run_path, return_dicts=False)

    # Extract only the metrics from each NodeInfo
    return [node.metrics for node in node_infos]
```
Honor backend_type override to avoid detection failures.
load_node_metrics ignores its backend_type argument. If auto-detection fails, callers (including load_node_metrics_for_run) have no way to force the parser, leading to empty results. Consider passing the override through to NodeAnalyzer.parse_run_logs (and letting that method fall back to detection).
🐛 Proposed fix
- def parse_run_logs(self, run_path: str, return_dicts: bool = False) -> list:
+ def parse_run_logs(
+ self,
+ run_path: str,
+ return_dicts: bool = False,
+ backend_type: str | None = None,
+ ) -> list:
@@
-    backend_type = self._detect_backend_type(run_path)
+    backend_type = backend_type or self._detect_backend_type(run_path)

-    node_infos = analyzer.parse_run_logs(run_path, return_dicts=False)
+    node_infos = analyzer.parse_run_logs(run_path, return_dicts=False, backend_type=backend_type)
🤖 Prompt for AI Agents
In `@analysis/srtlog/run_loader.py` around lines 692 - 712, The load_node_metrics
function currently ignores its backend_type parameter; update it to pass that
override through to NodeAnalyzer.parse_run_logs so callers can force a backend
when auto-detection fails (e.g., call analyzer.parse_run_logs(run_path,
return_dicts=False, backend_type=backend_type) or the appropriate override
parameter name used by parse_run_logs), while allowing parse_run_logs to still
fall back to auto-detection when the override is None/empty; this change affects
load_node_metrics and ensures callers like load_node_metrics_for_run can force
parsing for a specific backend.
```python
def load_node_metrics_for_run(self, run: BenchmarkRun) -> list[NodeMetrics]:
    """Load node metrics for a BenchmarkRun.

    Automatically detects backend type from the run's container image.

    Args:
        run: BenchmarkRun object

    Returns:
        List of NodeMetrics objects
    """
    # Detect backend type from container
    backend_type = "sglang"  # Default
    container = run.metadata.container.lower()

    if "trtllm" in container or "dynamo" in container or "tensorrt" in container:
        backend_type = "trtllm"
    elif "sglang" in container:
        backend_type = "sglang"

    return self.load_node_metrics(run.metadata.path, backend_type)
```
Guard against missing container values.
run.metadata.container can be None for older runs, so .lower() would raise. Default to an empty string.
🐛 Proposed fix
- container = run.metadata.container.lower()
+ container = (run.metadata.container or "").lower()
🤖 Prompt for AI Agents
In `@analysis/srtlog/run_loader.py` around lines 713 - 733, In
load_node_metrics_for_run, guard against run.metadata.container being None by
defaulting it to an empty string before calling .lower(); specifically, change
how the local variable container is derived (from run.metadata.container) so it
becomes (empty string if None) then .lower(), keep the same backend_type
detection logic (checking for "trtllm", "dynamo", "tensorrt", "sglang") and then
call self.load_node_metrics(run.metadata.path, backend_type) as before.
```json
"profiler_metadata": {
  "profiler_type": "sa-bench",
  "isl": "8192",
  "osl": "1024",
  "concurrencies": ""
},
"profiler_results": {
  "output_tps": [
    1748.0954656979873
  ],
  "request_throughput": [
    1.9035306394778506
  ],
  "concurrency_values": [
    32
  ],
```
🧩 Analysis chain
🏁 Script executed:
#!/bin/bash
# Locate SA-Bench parser/schema expectations for "concurrencies".
rg -n "SABench|sa-bench|SABenchParser|profiler_metadata|concurrenc" -g '*.py' | head -50
Repository: ishandhanani/srt-slurm
Length of output: 4409
🏁 Script executed:
#!/bin/bash
# Also search for JSON schema files or related metadata definitions
fd -e schema -o -e json -o -e py | xargs rg -l "profiler_metadata|sa-bench" 2>/dev/null | head -20
Length of output: 1383
🏁 Script executed:
#!/bin/bash
# Look for the specific file being reviewed
fd "rollup_sa_bench_553.json" 2>/dev/null
Length of output: 90
🏁 Script executed:
#!/bin/bash
# Find the SA-Bench parser implementation
fd -e py | xargs rg -l "class.*SABench.*Parser|def.*profiler_metadata" | head -10
Length of output: 169
🏁 Script executed:
#!/bin/bash
# Look for the fixtures_parsers module
fd "fixtures_parsers.py"
Length of output: 91
🏁 Script executed:
#!/bin/bash
# Check the actual rollup file contents to see the full structure
cat rollup_sa_bench_553.json | head -40
Length of output: 1007
🏁 Script executed:
#!/bin/bash
# Check the SA-Bench parser implementation
cat analysis/srtlog/parsers/benchmark/sa_bench.py
Length of output: 14728
🏁 Script executed:
#!/bin/bash
# Check the fixtures to understand expected format and SampleSABenchData
rg -A 20 "class SampleSABenchData|def result_json" tests/fixtures_parsers.py | head -80
Length of output: 1895
🏁 Script executed:
#!/bin/bash
# Search for code that creates or validates the rollup structure
fd -e py | xargs rg -l "profiler_metadata|profiler_results" | grep -v test | grep -v fixtures
Length of output: 256
🏁 Script executed:
#!/bin/bash
# Search for where concurrencies field is set/populated
rg -n "concurrencies.*=" -g '*.py' | grep -v test | grep -v "concurrency_values"
Length of output: 1023
🏁 Script executed:
#!/bin/bash
# Look for rollup generation code
fd -e py | xargs rg -l "rollup" | grep -v test
Length of output: 111
🏁 Script executed:
#!/bin/bash
# Check the ProfilerMetadata class definition in models.py
rg -B 5 -A 15 "class ProfilerMetadata" analysis/srtlog/models.py
Length of output: 573
🏁 Script executed:
#!/bin/bash
# Check how concurrencies is populated in rollup building code
rg -B 5 -A 10 "concurrencies.*profiler" analysis/srtlog/run_loader.py
Length of output: 48
🏁 Script executed:
#!/bin/bash
# Check the schema definition for concurrencies more carefully
rg -B 5 -A 5 "concurrencies:" src/srtctl/core/schema.py
Length of output: 438
🏁 Script executed:
#!/bin/bash
# Check how profiler_metadata is created/populated when building the rollup
rg -B 10 -A 10 "ProfilerMetadata\|profiler_metadata" analysis/srtlog/run_loader.py | head -80
Length of output: 48
🏁 Script executed:
#!/bin/bash
# Check if there's any validation or schema checking for profiler_metadata
rg -n "validate|schema|check.*concurrenc" analysis/srtlog/models.py
Length of output: 48
🏁 Script executed:
#!/bin/bash
# Get a broader view of ProfilerMetadata usage in run_loader
cat analysis/srtlog/run_loader.py | grep -n "profiler_metadata\|ProfilerMetadata" -B 3 -A 3
Length of output: 1629
🏁 Script executed:
#!/bin/bash
# Check the entire ProfilerMetadata from_json method
rg -A 20 "class ProfilerMetadata" analysis/srtlog/models.py
Length of output: 578
🏁 Script executed:
#!/bin/bash
# Get the complete from_json method for ProfilerMetadata
rg -A 30 "def from_json.*ProfilerMetadata" analysis/srtlog/models.py
Length of output: 1188
🏁 Script executed:
#!/bin/bash
# Search for where the JSON is loaded and ProfilerMetadata is created
rg -B 5 -A 10 "profiler_metadata.*from_json\|ProfilerMetadata.from_json" analysis/srtlog/models.py
Length of output: 48
🏁 Script executed:
#!/bin/bash
# Check how concurrencies field is used in consumers/dashboards
rg -n "\.concurrencies\|concurrenc" analysis/dashboard/*.py | head -30
Length of output: 48
🏁 Script executed:
#!/bin/bash
# Check if there are any data validation or normalization functions
rg -B 3 -A 10 "def.*normalize|def.*validate.*profiler\|def.*process.*metadata" analysis/srtlog/models.py
Length of output: 48
🏁 Script executed:
#!/bin/bash
# Check how ProfilerResults.concurrency_values is populated
rg -B 5 -A 15 "concurrency_values" analysis/srtlog/models.py | head -50
Length of output: 2283
🏁 Script executed:
#!/bin/bash
# Look for any place where profiler_metadata.concurrencies is actually used or validated
rg "profiler_metadata\.concurrenc|\.concurrencies" --type py | grep -v test | grep -v "^Binary"
Length of output: 900
🏁 Script executed:
#!/bin/bash
# Check the code that uses profiler_metadata.concurrencies
rg -B 5 -A 10 "if not self.profiler_metadata.concurrencies" analysis/srtlog/models.py
Length of output: 687
🏁 Script executed:
#!/bin/bash
# Check BenchmarkRun to see the full context
rg -B 10 -A 20 "class BenchmarkRun" analysis/srtlog/models.py | head -60
Length of output: 992
🏁 Script executed:
#!/bin/bash
# Check the complete validate_completeness method
rg -B 2 -A 25 "def validate_completeness" analysis/srtlog/models.py
Length of output: 48
🏁 Script executed:
#!/bin/bash
# Check if there are any downstream consumers that depend on concurrencies being populated
rg "profiler_metadata\.concurrencies" analysis/srtlog/*.py analysis/dashboard/*.py 2>/dev/null
Length of output: 238
🏁 Script executed:
#!/bin/bash
# Verify the complete validate_completeness logic to confirm the issue
rg -B 5 -A 35 "def validate_completeness" analysis/srtlog/models.py
Length of output: 48
🏁 Script executed:
#!/bin/bash
# Search for validate_completeness more broadly
rg "validate_completeness" analysis/srtlog/models.py -A 30
Length of output: 48
🏁 Script executed:
#!/bin/bash
# Also check for where is_complete and missing_concurrencies are set
rg -n "is_complete|missing_concurrencies" analysis/srtlog/models.py | head -20
Length of output: 429
🏁 Script executed:
#!/bin/bash
# Get the full method context around line 366-389
sed -n '360,395p' analysis/srtlog/models.py
Length of output: 1243
🏁 Script executed:
#!/bin/bash
# Check how the JSON file is created/populated - look at any rollup generation code
fd -e py | xargs rg -l "rollup.*json|json.*rollup" | grep -v test | head -5
Length of output: 48
Fix profiler_metadata.concurrencies to align with actual results.
The empty string "" in concurrencies while concurrency_values contains [32] breaks the data validation logic. When concurrencies is empty, the check_completeness() method treats the run as manual and skips validation entirely, even though actual benchmark results exist. Either populate concurrencies with the actual value (e.g., "32") to match the results, or set it to null if the configuration was not explicitly specified.
🤖 Prompt for AI Agents
In `@rollup_sa_bench_553.json` around lines 17 - 32, The
profiler_metadata.concurrencies field is set to an empty string which causes
check_completeness() to treat the run as manual and skip validation; update
profiler_metadata.concurrencies to reflect the actual concurrency used (e.g.,
"32") or set it to null if it was not explicitly configured so that
check_completeness() correctly recognizes and validates the benchmark results;
ensure the value type matches expected schema (string or null) and corresponds
to the concurrency_values entry.
Actionable comments posted: 2
🤖 Fix all issues with AI agents
In `@analysis/srtlog/parsers/nodes/sglang.py`:
- Around line 233-240: The _parse_timestamp function's regex requires fractional
seconds so timestamps like "2025-12-30T15:52:38Z" are missed; update the regex
in _parse_timestamp to make the fractional-seconds part optional (use a
non-capturing optional group like (?:\.\d+)? ) while preserving the optional
trailing Z, so it matches both "YYYY-MM-DDTHH:MM:SSZ" and
"YYYY-MM-DDTHH:MM:SS.sssZ"; keep returning match.group(1) as before.
- Around line 116-197: The file read using open(log_path) can raise
UnicodeDecodeError; replace the open loop with a tolerant read (e.g.,
Path(log_path).read_text(errors="replace") or similar) and iterate over the
returned string's lines so parsing still runs even with invalid bytes; update
the code that currently appends to full_content and feeds clean_line into
_parse_prefill_batch_line, _parse_decode_batch_line, _parse_memory_line and the
subsequent tp/dp/ep regex extraction to use the decoded text lines, then call
parse_launch_command("".join(full_content), node_info["worker_type"]) as before.
🧹 Nitpick comments (3)
analysis/srtlog/parsers/nodes/trtllm.py (2)
77-81: Prefer `Path.iterdir()` over `os.listdir()` for consistency.

Since `log_dir` is already a `Path` object (line 69), using pathlib methods throughout would be more idiomatic.

♻️ Suggested refactor

```diff
 # Find all worker log files
-for file in os.listdir(log_dir):
-    if not (file.endswith(".err") or file.endswith(".out")):
+for filepath in log_dir.iterdir():
+    if not filepath.is_file() or filepath.suffix not in (".err", ".out"):
         continue
-    if not any(wt in file for wt in ("prefill", "decode", "agg")):
+    if not any(wt in filepath.name for wt in ("prefill", "decode", "agg")):
         continue
-    filepath = log_dir / file
     node = self.parse_single_log(filepath)
```
398-401: Minor inefficiency: ANSI codes stripped twice when called internally.

When `parse_launch_command` is called from `parse_single_log` (line 114), the content has already been stripped. However, since this method is public and could be called directly with raw log content, the defensive stripping is reasonable. If performance becomes a concern for large logs, consider passing a boolean flag or checking for ANSI presence first:

♻️ Optional optimization

```diff
-def parse_launch_command(self, log_content: str, worker_type: str = "unknown") -> NodeLaunchCommand | None:
+def parse_launch_command(
+    self, log_content: str, worker_type: str = "unknown", *, already_cleaned: bool = False
+) -> NodeLaunchCommand | None:
     ...
     from analysis.srtlog.parsers import NodeLaunchCommand

     # Strip ANSI codes for cleaner parsing
-    clean_content = ANSI_ESCAPE.sub("", log_content)
+    clean_content = log_content if already_cleaned else ANSI_ESCAPE.sub("", log_content)
```

analysis/srtlog/parsers/nodes/sglang.py (1)
243-343: Type the parsed metric dicts with `TypedDict`.

The parsers return untyped `dict | None` for external log data. Defining `TypedDict`s for prefill/decode/memory metrics improves `ty` checking and makes required vs optional fields explicit. As per coding guidelines, TypedDicts should be used for external data shapes.

Example shape:

```python
from typing import TypedDict, Literal


class PrefillMetricsDict(TypedDict, total=False):
    timestamp: str
    type: Literal["prefill"]
    new_seq: int
    new_token: int
    cached_token: int
    token_usage: float
    running_req: int
    queue_req: int
    prealloc_req: int
    inflight_req: int
    input_throughput: float
```

As per coding guidelines, use `TypedDict` for parsed external data structures.
```python
try:
    with open(log_path) as f:
        for line in f:
            full_content.append(line)
            # Strip ANSI escape codes
            clean_line = ANSI_ESCAPE.sub("", line)

            # Parse prefill batch metrics
            batch_metrics = self._parse_prefill_batch_line(clean_line)
            if batch_metrics:
                batches.append(
                    BatchMetrics(
                        timestamp=batch_metrics["timestamp"],
                        dp=0,  # Default since not in log
                        tp=0,
                        ep=0,
                        batch_type=batch_metrics["type"],
                        new_seq=batch_metrics.get("new_seq"),
                        new_token=batch_metrics.get("new_token"),
                        cached_token=batch_metrics.get("cached_token"),
                        token_usage=batch_metrics.get("token_usage"),
                        running_req=batch_metrics.get("running_req"),
                        queue_req=batch_metrics.get("queue_req"),
                        prealloc_req=batch_metrics.get("prealloc_req"),
                        inflight_req=batch_metrics.get("inflight_req"),
                        input_throughput=batch_metrics.get("input_throughput"),
                    )
                )

            # Parse decode batch metrics
            decode_metrics = self._parse_decode_batch_line(clean_line)
            if decode_metrics:
                batches.append(
                    BatchMetrics(
                        timestamp=decode_metrics["timestamp"],
                        dp=0,
                        tp=0,
                        ep=0,
                        batch_type=decode_metrics["type"],
                        running_req=decode_metrics.get("running_req"),
                        queue_req=decode_metrics.get("queue_req"),
                        prealloc_req=decode_metrics.get("prealloc_req"),
                        transfer_req=decode_metrics.get("transfer_req"),
                        token_usage=decode_metrics.get("token_usage"),
                        preallocated_usage=decode_metrics.get("preallocated_usage"),
                        num_tokens=decode_metrics.get("num_tokens"),
                        gen_throughput=decode_metrics.get("gen_throughput"),
                    )
                )

            # Parse memory metrics
            mem_metrics = self._parse_memory_line(clean_line)
            if mem_metrics:
                memory_snapshots.append(
                    MemoryMetrics(
                        timestamp=mem_metrics["timestamp"],
                        dp=0,
                        tp=0,
                        ep=0,
                        metric_type=mem_metrics["type"],
                        avail_mem_gb=mem_metrics.get("avail_mem_gb"),
                        mem_usage_gb=mem_metrics.get("mem_usage_gb"),
                        kv_cache_gb=mem_metrics.get("kv_cache_gb"),
                        kv_tokens=mem_metrics.get("kv_tokens"),
                    )
                )

            # Extract TP/DP/EP configuration from server_args
            if "tp_size=" in clean_line:
                tp_match = re.search(r"tp_size=(\d+)", clean_line)
                dp_match = re.search(r"dp_size=(\d+)", clean_line)
                ep_match = re.search(r"ep_size=(\d+)", clean_line)

                if tp_match:
                    config["tp_size"] = int(tp_match.group(1))
                if dp_match:
                    config["dp_size"] = int(dp_match.group(1))
                if ep_match:
                    config["ep_size"] = int(ep_match.group(1))

    # Parse launch command from full content
    launch_command = self.parse_launch_command("".join(full_content), node_info["worker_type"])
```
Harden log reading to avoid decode failures.
open(log_path) uses the platform default encoding and can raise UnicodeDecodeError on logs with invalid bytes, causing the entire file to be skipped. Use read_text(errors="replace") (as in the TRTLLM parser) and iterate over the decoded content to keep parsing resilient.
🛠️ Proposed fix
- full_content = []
+ full_content = ""
@@
- with open(log_path) as f:
- for line in f:
- full_content.append(line)
+ full_content = log_path.read_text(errors="replace")
+ for line in full_content.splitlines():
# Strip ANSI escape codes
clean_line = ANSI_ESCAPE.sub("", line)
@@
- launch_command = self.parse_launch_command("".join(full_content), node_info["worker_type"])
+ launch_command = self.parse_launch_command(full_content, node_info["worker_type"])
🤖 Prompt for AI Agents
In `@analysis/srtlog/parsers/nodes/sglang.py` around lines 116 - 197, The file
read using open(log_path) can raise UnicodeDecodeError; replace the open loop
with a tolerant read (e.g., Path(log_path).read_text(errors="replace") or
similar) and iterate over the returned string's lines so parsing still runs even
with invalid bytes; update the code that currently appends to full_content and
feeds clean_line into _parse_prefill_batch_line, _parse_decode_batch_line,
_parse_memory_line and the subsequent tp/dp/ep regex extraction to use the
decoded text lines, then call parse_launch_command("".join(full_content),
node_info["worker_type"]) as before.
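The tolerant-read pattern the fix above describes can be sketched as a small helper (hypothetical function name; the real parser inlines this inside `parse_single_log`):

```python
from pathlib import Path


def read_log_lines(log_path: Path) -> list[str]:
    """Read a log file, substituting invalid bytes instead of raising."""
    # errors="replace" turns undecodable bytes into U+FFFD, so one corrupt
    # line no longer aborts parsing of the entire file.
    text = log_path.read_text(encoding="utf-8", errors="replace")
    return text.splitlines()
```

Iterating over the returned list keeps the per-line parsing logic unchanged while making the read resilient to stray binary output in worker logs.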
Actionable comments posted: 1
🤖 Fix all issues with AI agents
In `@tests/test_dashboard.py`:
- Around line 147-166: The test test_parse_timestamp_mixed_formats asserts exact
float equality for delta.total_seconds(), which is brittle; update the assertion
to use pytest.approx for floating comparison (e.g., assert delta.total_seconds()
== pytest.approx(1.1)) and ensure pytest is imported in the test file if not
already; the change should be made next to the use of _parse_timestamp and
delta.total_seconds() in that test.
🧹 Nitpick comments (1)
tests/test_dashboard.py (1)
18-31: Add return type annotations to test methods.

To align with the project's "type hints everywhere" guideline, annotate test methods with `-> None` (and apply consistently across this file). As per coding guidelines, please add return type annotations.

♻️ Proposed update (apply pattern to all tests in this class)

```diff
-def test_parse_timestamp_yyyy_mm_dd(self):
+def test_parse_timestamp_yyyy_mm_dd(self) -> None:
```
```python
def test_parse_timestamp_mixed_formats(self):
    """Test parsing a sequence of different timestamp formats."""
    from analysis.dashboard.rate_match_tab import _parse_timestamp

    # Simulate what dashboard might see from different backends
    timestamps = [
        "2025-12-30 15:52:38",  # Standard (could be from old cache)
        "2025-12-30T15:52:39.100000Z",  # SGLang
        "01/23/2026-08:04:40",  # TRTLLM
    ]

    dts = [_parse_timestamp(ts) for ts in timestamps]

    # All should parse successfully
    assert len(dts) == 3
    assert all(isinstance(dt, datetime) for dt in dts)

    # Should be able to compute deltas (even if not chronological)
    delta = dts[1] - dts[0]
    assert delta.total_seconds() == 1.1
```
Use pytest.approx for non-integer float comparisons.
delta.total_seconds() is a float derived from microseconds; direct equality can be brittle.
🐛 Proposed fix
- assert delta.total_seconds() == 1.1
+ assert delta.total_seconds() == pytest.approx(1.1)
🤖 Prompt for AI Agents
In `@tests/test_dashboard.py` around lines 147 - 166, The test
test_parse_timestamp_mixed_formats asserts exact float equality for
delta.total_seconds(), which is brittle; update the assertion to use
pytest.approx for floating comparison (e.g., assert delta.total_seconds() ==
pytest.approx(1.1)) and ensure pytest is imported in the test file if not
already; the change should be made next to the use of _parse_timestamp and
delta.total_seconds() in that test.
Actionable comments posted: 4
🤖 Fix all issues with AI agents
In `@analysis/dashboard/rate_match_tab.py`:
- Around line 129-133: The int conversions for run.profiler_metadata.isl and
.osl can raise ValueError for non-numeric strings; update the block that assigns
isl and osl (currently creating variables isl and osl used for the rate match
graph) to wrap each int(...) conversion in a try/except (ValueError, TypeError)
and fall back to None on failure, leaving st.metric display unchanged; ensure
you reference run.profiler_metadata.isl and run.profiler_metadata.osl and assign
valid integers or None so downstream graph code handles missing values safely.
In `@analysis/srtlog/log_parser.py`:
- Around line 88-95: The cache currently saves only metrics (via
_serialize_node_metrics) and later reconstructs NodeInfo from metrics, dropping
node_config and launch_command; change the caching to persist node_config (and
any parsed launch_command/env) alongside metrics: update the code that builds
the cache (where node_infos is used and _serialize_node_metrics is called) to
serialize full NodeInfo entries (or return a tuple/dict with metrics and
node_config/launch_command), extend _serialize_node_metrics (or add a new
_serialize_nodeinfo_list) to include node_config and launch_command, and update
cache load/deserialization logic that rebuilds NodeInfo on cache hits to restore
node_config and launch_command so cached NodeInfo objects match fresh parses
(apply same change to other caching sites referenced in the comment).
In `@analysis/srtlog/models.py`:
- Around line 145-176: The ProfilerMetadata dataclass (and the other
configuration/identity dataclasses referenced around lines 288-300 and 441-506)
should be made immutable by marking them as frozen dataclasses; update the
`@dataclass` decorator for ProfilerMetadata (and the other config classes) to
include frozen=True so their fields cannot be mutated after construction,
ensuring from_json classmethods and any consumers continue to return instances
with the same API but immutable state.
In `@tests/test_runloader_parsers.py`:
- Around line 14-17: The test module imports RunLoader at collection time which
pulls in pandas and causes collection to fail when pandas is missing; fix by
making the dependency check explicit before importing RunLoader (e.g., call
pytest.importorskip("pandas") at the top of the test module) or alternatively
move the import of RunLoader inside the test functions/setup so import happens
only when pandas is available; update tests referencing RunLoader,
ParserTestHarness, and SampleSABenchData accordingly.
🧹 Nitpick comments (4)
tests/test_parsers.py (1)
428-436: Remove redundant `pytest` import inside test method.

`pytest` is already imported at the module level (line 19). The same applies to the import at line 790.

♻️ Suggested fix

```diff
 def test_parse_timestamp_invalid(self, parser):
     """Test parsing invalid timestamp raises ValueError."""
-    import pytest
-
     with pytest.raises(ValueError):
         parser.parse_timestamp("invalid-timestamp")
```

analysis/srtlog/parsers/benchmark/mooncake_router.py (1)
216-224: Simplify redundant loop structure.

The loop `for root_dir in [logs_dir]:` only iterates once over a single-element list. The loop can be removed.

♻️ Suggested fix

```diff
 if logs_dir.exists():
     # Check if there are any AIPerf result files using iterdir recursively
     try:
-        for root_dir in [logs_dir]:
-            for item in root_dir.rglob("profile_export_aiperf.json"):
-                logger.info(f"Found mooncake-router results in: {logs_dir}")
-                return logs_dir
+        for item in logs_dir.rglob("profile_export_aiperf.json"):
+            logger.info(f"Found mooncake-router results in: {logs_dir}")
+            return logs_dir
     except (OSError, PermissionError) as e:
         logger.warning(f"Error accessing {logs_dir}: {e}")
```

analysis/srtlog/parsers/__init__.py (1)
161-188: Guard against duplicate registrations to avoid silent overrides.

Right now, registering the same type twice will quietly replace the previous parser, which can hide bugs.

♻️ Proposed fix

```diff
 def register_benchmark_parser(benchmark_type: str):
@@
-    def decorator(cls):
-        _benchmark_parsers[benchmark_type] = cls
+    def decorator(cls):
+        if benchmark_type in _benchmark_parsers:
+            raise ValueError(f"Benchmark parser already registered for '{benchmark_type}'")
+        _benchmark_parsers[benchmark_type] = cls
         return cls
@@
 def register_node_parser(backend_type: str):
@@
-    def decorator(cls):
-        _node_parsers[backend_type] = cls
+    def decorator(cls):
+        if backend_type in _node_parsers:
+            raise ValueError(f"Node parser already registered for '{backend_type}'")
+        _node_parsers[backend_type] = cls
         return cls
```

analysis/srtlog/log_parser.py (1)
120-170: Consider narrowing JSON scan to likely metadata files.

On large runs, parsing every `*.json` in the directory can add noticeable overhead; filtering to known metadata filenames (or small-size files) can keep detection fast.
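One possible shape for that filter is sketched below; the function name, keyword list, and size cutoff are all hypothetical, not the module's actual code:

```python
from collections.abc import Iterator
from pathlib import Path


def candidate_metadata_files(run_dir: Path, max_bytes: int = 64 * 1024) -> Iterator[Path]:
    """Yield only small, metadata-looking top-level JSON files from a run dir."""
    for path in sorted(run_dir.glob("*.json")):
        # Skip large files (likely traces/results, not run metadata).
        if path.stat().st_size > max_bytes:
            continue
        # Keep only names that look like metadata/config artifacts.
        if any(key in path.name for key in ("metadata", "config", "rollup")):
            yield path
```

Callers that currently parse every JSON file could iterate this generator instead, falling back to the full scan only when it yields nothing.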
```diff
     st.metric("ISL/OSL", f"{run.profiler_metadata.isl}/{run.profiler_metadata.osl}")

     # Create rate match graph
-    isl = int(run.profiler.isl) if run.profiler.isl else None
-    osl = int(run.profiler.osl) if run.profiler.osl else None
+    isl = int(run.profiler_metadata.isl) if run.profiler_metadata.isl else None
+    osl = int(run.profiler_metadata.osl) if run.profiler_metadata.osl else None
```
🧩 Analysis chain
🏁 Script executed:
# First, find the ProfilerMetadata class definition
rg -t py "class ProfilerMetadata" --max-count 5
Length of output: 115
🏁 Script executed:
# Look at the full context of rate_match_tab.py around lines 129-133
fd -t f "rate_match_tab.py" -x wc -l {}
Length of output: 108
🏁 Script executed:
# Check the file size first
wc -l analysis/dashboard/rate_match_tab.py 2>/dev/null || fd -t f "rate_match_tab.py"
Length of output: 106
🏁 Script executed:
# Read ProfilerMetadata class definition
cat -n analysis/srtlog/models.py | grep -A 50 "class ProfilerMetadata"
Length of output: 2092
🏁 Script executed:
# Read the rate_match_tab.py context
sed -n '120,145p' analysis/dashboard/rate_match_tab.py
Length of output: 1421
🏁 Script executed:
# Search for test data or JSON examples showing isl/osl values
fd -t f "\\.json" | head -10Repository: ishandhanani/srt-slurm
Length of output: 92
🏁 Script executed:
# Look at _create_rate_match_graph to see how isl/osl are used
sed -n '140,180p' analysis/dashboard/rate_match_tab.pyRepository: ishandhanani/srt-slurm
Length of output: 1682
🏁 Script executed:
# Check if there are any tests or data samples
find . -type f \( -name "*.json" -o -name "*test*" \) | grep -i "profiler\|metadata\|isl\|osl" | head -20Repository: ishandhanani/srt-slurm
Length of output: 48
🏁 Script executed:
# Check the config file to see if there are examples
cat configs/deepep_config.jsonRepository: ishandhanani/srt-slurm
Length of output: 559
🏁 Script executed:
# Search for where isl/osl values are set or used
rg "isl|osl" -t py -B 2 -A 2 | head -100Repository: ishandhanani/srt-slurm
Length of output: 5297
🏁 Script executed:
# Look for any test files that might show expected values
find . -type f -name "*test*.py" | xargs rg -l "isl\|osl" 2>/dev/nullRepository: ishandhanani/srt-slurm
Length of output: 50
🏁 Script executed:
# Check the actual JSON source to see if isl/osl can be non-numeric
rg "\"isl\"|\"osl\"" -t py -B 5 -A 5 | grep -A 10 "get"Repository: ishandhanani/srt-slurm
Length of output: 1565
🏁 Script executed:
# Look for any real-world JSON samples or where the JSON data comes from
rg "benchmark.*json\|\.json.*benchmark" -t py -B 3 -A 3Repository: ishandhanani/srt-slurm
Length of output: 48
🏁 Script executed:
# Check if there's any existing error handling for int() conversions in the file
rg "int\(" analysis/dashboard/rate_match_tab.py -B 2 -A 2Repository: ishandhanani/srt-slurm
Length of output: 440
Guard ISL/OSL int conversion against non-numeric strings.
ProfilerMetadata.isl and ProfilerMetadata.osl are stored as strings without validation. Non-numeric values (such as "N/A") will raise ValueError and crash the dashboard. Wrap conversions in try-except blocks to safely handle invalid inputs.
🛠️ Proposed fix

```diff
-isl = int(run.profiler_metadata.isl) if run.profiler_metadata.isl else None
-osl = int(run.profiler_metadata.osl) if run.profiler_metadata.osl else None
+try:
+    isl = int(run.profiler_metadata.isl) if run.profiler_metadata.isl else None
+except (TypeError, ValueError):
+    isl = None
+try:
+    osl = int(run.profiler_metadata.osl) if run.profiler_metadata.osl else None
+except (TypeError, ValueError):
+    osl = None
```

🤖 Prompt for AI Agents
In `@analysis/dashboard/rate_match_tab.py` around lines 129-133: the int
conversions for run.profiler_metadata.isl and .osl can raise ValueError for
non-numeric strings. Update the block that assigns isl and osl (used for the
rate match graph) to wrap each int(...) conversion in a try/except
(ValueError, TypeError) and fall back to None on failure, leaving the
st.metric display unchanged. Ensure you reference run.profiler_metadata.isl
and run.profiler_metadata.osl and assign valid integers or None so downstream
graph code handles missing values safely.
```diff
         # Save to cache if we have data
-        if nodes:
-            cache_df = self._serialize_node_metrics(nodes)
+        if node_infos:
+            # Extract metrics for caching
+            metrics_list = [ni.metrics for ni in node_infos]
+            cache_df = self._serialize_node_metrics(metrics_list)
             cache_mgr.save_to_cache("node_metrics", cache_df, source_patterns)
             logger.info(f"Parsed and cached {len(node_infos)} nodes from {logs_dir}")
```
Cache path drops node_config / launch_command data.
On a cache hit, NodeInfo is reconstructed from metrics only, so launch_command/environment derived from log parsing can be lost unless config files exist. This makes cached results inconsistent with fresh parses.
🐛 Proposed fix: persist node_config alongside metrics

```diff
@@
-            metrics_list = [ni.metrics for ni in node_infos]
-            cache_df = self._serialize_node_metrics(metrics_list)
+            cache_df = self._serialize_node_metrics(node_infos)
@@
-    def _serialize_node_metrics(self, nodes: list) -> pd.DataFrame:
+    def _serialize_node_metrics(self, nodes: list[NodeInfo]) -> pd.DataFrame:
@@
-        for node in nodes:
-            metadata = node.metadata
-            config = node.config
+        for node_info in nodes:
+            metrics = node_info.metrics
+            metadata = metrics.metadata
+            config = metrics.config
+            node_config_json = json.dumps(node_info.node_config) if node_info.node_config else None
@@
-            row = {
+            row = {
                 # Node identification
                 "node": metadata.node_name,
                 "worker_type": metadata.worker_type,
                 "worker_id": metadata.worker_id,
+                "node_config_json": node_config_json,
@@
-            row = {
+            row = {
                 # Node identification
                 "node": metadata.node_name,
                 "worker_type": metadata.worker_type,
                 "worker_id": metadata.worker_id,
+                "node_config_json": node_config_json,
@@
-        # Extract config (same for all rows in this node)
+        # Extract config (same for all rows in this node)
         config = {}
         if not group_df.empty:
             first_row = group_df.iloc[0]
+            node_config: dict = {}
+            node_config_json = first_row.get("node_config_json")
+            if pd.notna(node_config_json):
+                try:
+                    node_config = json.loads(node_config_json)
+                except Exception as e:
+                    logger.debug(f"Could not parse node_config_json: {e}")
@@
-            node_info = NodeInfo(metrics=metrics, node_config={})
+            node_info = NodeInfo(metrics=metrics, node_config=node_config or {})
```

Also applies to: 394-469, 471-582

🤖 Prompt for AI Agents
🤖 Prompt for AI Agents
In `@analysis/srtlog/log_parser.py` around lines 88-95: the cache currently
saves only metrics (via _serialize_node_metrics) and later reconstructs
NodeInfo from metrics, dropping node_config and launch_command. Change the
caching to persist node_config (and any parsed launch_command/env) alongside
metrics: update the code that builds the cache (where node_infos is used and
_serialize_node_metrics is called) to serialize full NodeInfo entries (or
return a tuple/dict with metrics and node_config/launch_command), extend
_serialize_node_metrics (or add a new _serialize_nodeinfo_list) to include
node_config and launch_command, and update the cache load/deserialization
logic that rebuilds NodeInfo on cache hits to restore node_config and
launch_command so cached NodeInfo objects match fresh parses. Apply the same
change to the other caching sites referenced in the comment.
```python
@dataclass
class ProfilerMetadata:
    """Metadata about the benchmark/profiler configuration.

    This describes what the benchmark was configured to do,
    not the actual results.
    """

    profiler_type: str
    isl: str
    osl: str
    concurrencies: str = ""
    req_rate: str = ""

    @classmethod
    def from_json(cls, json_data: dict) -> "ProfilerMetadata":
        """Create from {jobid}.json benchmark section.

        Args:
            json_data: Parsed JSON from {jobid}.json file

        Returns:
            ProfilerMetadata instance
        """
        profiler_meta = json_data.get("benchmark", {})

        return cls(
            profiler_type=profiler_meta.get("type", "unknown"),
            isl=str(profiler_meta.get("isl", "")),
            osl=str(profiler_meta.get("osl", "")),
            concurrencies=profiler_meta.get("concurrencies", ""),
            req_rate=profiler_meta.get("req-rate", ""),
        )
```
🛠️ Refactor suggestion | 🟠 Major
🧩 Analysis chain
🏁 Scripts executed:

```shell
cat -n analysis/srtlog/models.py | head -200

cat -n analysis/srtlog/models.py | sed -n '280,310p'

cat -n analysis/srtlog/models.py | sed -n '430,510p'

# Check if there are any __post_init__ methods or mutation patterns in these classes
rg -A 10 "class ProfilerMetadata|class BenchmarkLaunchCommand|class NodeMetadata|class NodeLaunchCommand" analysis/srtlog/models.py | head -100

# Verify no mutation methods exist by searching for methods that modify fields
rg -B 2 -A 5 "self\.(profiler_type|isl|osl|concurrencies|req_rate|benchmark_type|raw_command|extra_args|node_name|worker_type|worker_id|backend_type) =" analysis/srtlog/models.py

# Check if RunMetadata (also a config class) is frozen and verify consistency
rg -B 2 -A 15 "^@dataclass|^class RunMetadata" analysis/srtlog/models.py | head -50
```
Make configuration dataclasses immutable.
These classes are configuration/identity carriers; freezing prevents accidental mutation and aligns with project conventions.
♻️ Proposed change

```diff
-@dataclass
+@dataclass(frozen=True)
 class ProfilerMetadata:
@@
-@dataclass
+@dataclass(frozen=True)
 class BenchmarkLaunchCommand:
@@
-@dataclass
+@dataclass(frozen=True)
 class NodeMetadata:
@@
-@dataclass
+@dataclass(frozen=True)
 class NodeLaunchCommand:
```

As per coding guidelines, configuration objects should be frozen dataclasses.
Also applies to: 288-300, 441-506
🤖 Prompt for AI Agents
In `@analysis/srtlog/models.py` around lines 145-176: the ProfilerMetadata
dataclass (and the other configuration/identity dataclasses around lines
288-300 and 441-506) should be made immutable. Update the `@dataclass`
decorator on ProfilerMetadata (and the other config classes) to include
frozen=True so their fields cannot be mutated after construction, ensuring the
from_json classmethods and any consumers continue to return instances with the
same API but immutable state.
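For illustration, freezing a dataclass makes accidental mutation fail loudly at the point of assignment. The fields below are abbreviated from the snippet above; this is a sketch, not the project's class:

```python
from dataclasses import FrozenInstanceError, dataclass


@dataclass(frozen=True)
class ProfilerMetadata:
    profiler_type: str
    isl: str = ""
    osl: str = ""


meta = ProfilerMetadata(profiler_type="sa-bench", isl="1024", osl="128")
try:
    meta.isl = "2048"  # frozen=True turns this into FrozenInstanceError
except FrozenInstanceError:
    mutation_blocked = True
```

A bonus of `frozen=True` (combined with the default `eq=True`) is that instances become hashable, so they can serve as dict keys or set members.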
```python
import pytest

from analysis.srtlog.run_loader import RunLoader
from tests.fixtures_parsers import ParserTestHarness, SampleSABenchData
```
Same pipeline failure applies here.
This file also fails test collection due to the missing pandas dependency triggered by importing RunLoader. See the comment on tests/test_parsers.py for details.
🧰 Tools
🪛 GitHub Actions: CI
[error] 16-16: ModuleNotFoundError: No module named 'pandas'. Importing test module failed due to missing pandas dependency.
🤖 Prompt for AI Agents
In `@tests/test_runloader_parsers.py` around lines 14-17: the test module
imports RunLoader at collection time, which pulls in pandas and causes
collection to fail when pandas is missing. Fix by making the dependency check
explicit before importing RunLoader (e.g., call pytest.importorskip("pandas")
at the top of the test module), or move the import of RunLoader inside the
test functions/setup so the import happens only when pandas is available;
update tests referencing RunLoader, ParserTestHarness, and SampleSABenchData
accordingly.
Actionable comments posted: 2
🤖 Fix all issues with AI agents
In `@pyproject.toml`:
- Around line 19-21: Remove "pre-commit" from the runtime dependencies list in
pyproject.toml and add it to the dev-only dependency group instead (e.g., under
[project.optional-dependencies.dev] or [tool.poetry.dev-dependencies] depending
on your project tool), keeping the same version constraint "pre-commit>=4.5.1";
update the dependency arrays so only runtime packages like "pandas" and
"pyarrow" remain in the main dependencies and "pre-commit" is declared under the
dev/dev-dependencies symbol.
- Around line 20-21: pyarrow's version is inconsistent between runtime and
dev/analysis groups and streamlit/plotly are likely misclassified as runtime
deps; update pyproject.toml so pyarrow has a single aligned constraint across
all groups (or document why runtime needs "pyarrow>=23.0.0") and move streamlit
and plotly out of the main runtime dependencies into the analysis extras if they
are only used in analysis workflows (verify imports in
src/srtctl/benchmarks/scripts/sa-bench/benchmark_serving.py and other src/ files
before moving); ensure pandas stays as a runtime dependency because
benchmark_serving.py imports it.
```toml
"pre-commit>=4.5.1",
"pandas>=2.1.0",
"pyarrow>=23.0.0",
```
🛠️ Refactor suggestion | 🟠 Major
🧩 Analysis chain
🌐 Web query:
pre-commit official documentation installation dev dependency runtime
💡 Result:

- Install the `pre-commit` runner (tooling, not app runtime):
  - With pip: `pip install pre-commit` [1]
  - In a Python project, add `pre-commit` to your dev requirements; the docs explicitly suggest `requirements.txt` (or `requirements-dev.txt`) [1]
  - With Homebrew: `brew install pre-commit` [2]
  - As a zero-dependency zipapp: download the `.pyz` from GitHub releases and run it with `python pre-commit-#.pyz ...` [1]
- Install into the repo (developer workflow): run `pre-commit install` so hooks run on `git commit` [1]
- Dev-dependency vs runtime-dependency:
  - `pre-commit` is a development/CI tool (it installs and runs git hooks), so it typically belongs in dev dependencies, not your application/library's runtime dependencies [1].
  - Hook tool runtimes (eslint, black, etc.) are generally not something you add to your project's runtime deps; `pre-commit` creates isolated environments for hooks and can even fetch/build required language runtimes when needed (e.g., Node) [1].
Move pre-commit to a dev-only dependency group.
pre-commit is a development and CI tool, not a runtime dependency; the official documentation recommends installing it in dev requirements. Keeping it in runtime dependencies bloats end-user installations unnecessarily.
♻️ Proposed change

```diff
 dependencies = [
     "pyyaml>=6.0",
     "jinja2>=3.1.0",
     "marshmallow>=3.20.0",
     "marshmallow-dataclass>=8.6.0",
     "requests>=2.31.0",
     "rich>=13.0.0",
     "questionary>=2.0.0",
-    "pre-commit>=4.5.1",
     "pandas>=2.1.0",
     "pyarrow>=23.0.0",
 ]

 [dependency-groups]
 dev = [
     "pytest>=8.0.0",
     "pytest-cov>=4.0",
     "ruff>=0.8.0",
     "ty",  # Astral's fast type checker (replaces mypy)
+    "pre-commit>=4.5.1",
 ]
```
📝 Committable suggestion

‼️ IMPORTANT: Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

```suggestion
    "pandas>=2.1.0",
    "pyarrow>=23.0.0",
```
🤖 Prompt for AI Agents
In `@pyproject.toml` around lines 19 - 21, Remove "pre-commit" from the runtime
dependencies list in pyproject.toml and add it to the dev-only dependency group
instead (e.g., under [project.optional-dependencies.dev] or
[tool.poetry.dev-dependencies] depending on your project tool), keeping the same
version constraint "pre-commit>=4.5.1"; update the dependency arrays so only
runtime packages like "pandas" and "pyarrow" remain in the main dependencies and
"pre-commit" is declared under the dev/dev-dependencies symbol.
```toml
"pandas>=2.1.0",
"pyarrow>=23.0.0",
```
🧩 Analysis chain
🏁 Scripts executed:

```shell
find . -name "pyproject.toml" -type f

cat -n ./pyproject.toml

rg "import pandas|from pandas|import pyarrow|from pyarrow" --type python -l | head -20

find ./src -name "*.py" -type f | xargs grep -l "pandas\|pyarrow" 2>/dev/null | head -20

find ./src/srtctl -name "*.py" -type f | head -20

grep -r "import pandas\|from pandas\|import pyarrow\|from pyarrow" ./src/srtctl --include="*.py" | grep -v "\.pyc"

grep -r "pyarrow" ./src/srtctl --include="*.py"

cat ./src/srtctl/benchmarks/scripts/sa-bench/benchmark_serving.py | head -50

grep -A 5 -B 5 "pandas\|pd\." ./src/srtctl/benchmarks/scripts/sa-bench/benchmark_serving.py | head -80

find ./src/srtctl/benchmarks/scripts -type f -name "*.py" | wc -l

grep -l "pandas\|pyarrow" ./src/srtctl/benchmarks/scripts/sa-bench/*.py

find ./src/srtctl -name "*.py" -type f -exec grep -l "streamlit\|plotly" {} \;

grep -r "streamlit\|plotly" ./src --include="*.py"

find . -type d -name "analysis" -o -type d -name "dashboard" | head -10

find ./analysis -name "*.py" -type f | head -20

grep -r "import pandas\|from pandas\|import pyarrow\|from pyarrow" ./analysis --include="*.py" | head -20
```
Fix pyarrow version inconsistency and reconsider streamlit/plotly in runtime deps.
pandas>=2.1.0 is legitimately a runtime dependency (used in src/srtctl/benchmarks/scripts/sa-bench/benchmark_serving.py), but pyarrow has conflicting version constraints across dependency groups: >=23.0.0 in runtime vs >=14.0.0 in both dev and analysis groups. Align the pyarrow version across all groups or clarify why runtime requires a newer version.
Additionally, streamlit>=1.30.0 and plotly>=6.5.2 appear only in the analysis workflows (no imports found in src/), yet are listed in runtime dependencies. Move these to the analysis group if they're not used by runtime code.
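One hypothetical way to resolve both points (a sketch, not the project's actual pyproject.toml; version pins and the extra's name are assumptions):

```toml
[project]
dependencies = [
    "pandas>=2.1.0",      # used at runtime by benchmark_serving.py
    "pyarrow>=14.0.0",    # single constraint, shared with the dev/analysis groups
]

[project.optional-dependencies]
analysis = [
    "streamlit>=1.30.0",  # dashboard-only
    "plotly>=6.5.2",      # dashboard-only
]
```

Users who only need the CLI then install the base package, while dashboard users run `pip install 'srtctl[analysis]'`.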
🤖 Prompt for AI Agents
In `@pyproject.toml` around lines 20 - 21, pyarrow's version is inconsistent
between runtime and dev/analysis groups and streamlit/plotly are likely
misclassified as runtime deps; update pyproject.toml so pyarrow has a single
aligned constraint across all groups (or document why runtime needs
"pyarrow>=23.0.0") and move streamlit and plotly out of the main runtime
dependencies into the analysis extras if they are only used in analysis
workflows (verify imports in
src/srtctl/benchmarks/scripts/sa-bench/benchmark_serving.py and other src/ files
before moving); ensure pandas stays as a runtime dependency because
benchmark_serving.py imports it.
Summary by CodeRabbit
New Features
Bug Fixes / Improvements
Tests
Documentation
Refactor
✏️ Tip: You can customize this high-level summary in your review settings.