This repository was archived by the owner on Apr 20, 2026. It is now read-only.

support agg_logs and dashboard #193

Draft
YAMY1234 wants to merge 1 commit into ishandhanani:main from YAMY1234:agg_dashboard

Conversation

Collaborator

@YAMY1234 YAMY1234 commented Feb 26, 2026

Summary by CodeRabbit

  • New Features

    • Added support for aggregated log format parsing alongside existing formats
    • Added support for v2.0 run metadata format with enhanced backward compatibility
    • Improved cache loading to search multiple directory locations
  • Bug Fixes

    • Enhanced run identification for aggregated runs with distinct labeling
    • Improved dashboard legend labeling using folder-derived suffixes and job names
  • Documentation

    • Updated file pattern guidance in node metrics warnings to reflect supported file types

@YAMY1234 YAMY1234 marked this pull request as draft February 26, 2026 22:15
Contributor

coderabbitai bot commented Feb 26, 2026

📝 Walkthrough

Walkthrough

The changes extend the analysis pipeline to support aggregated logs: the parser recognizes agg log formats and files; metadata extraction handles the v2.0 structure with agg_nodes; the run loader adapts cache patterns and run-ID generation for aggregated runs; and the UI displays appropriate labels and guidance text.

Changes

| Cohort / File(s) | Summary |
| --- | --- |
| **Aggregated Log Format Support**<br>`analysis/srtlog/log_parser.py` | Expanded parsing to recognize agg files alongside prefill/decode; added regex and timestamp handling for Agg/ANSI log format; updated docstrings to document four-format support. |
| **Metadata and Run Format Handling**<br>`analysis/srtlog/models.py` | Added support for v2.0 run format with agg_nodes extraction and fallback logic; unified handling for flat new-structure format to ensure consistent agg_nodes fallback across both paths. |
| **Run Data Loading and Caching**<br>`analysis/srtlog/run_loader.py` | Broadened cache source patterns to include logs subdirectories; modified run ID construction to use aggregated metadata with A suffix when is_aggregated is true. |
| **Dashboard Display Updates**<br>`analysis/dashboard/app.py`, `analysis/dashboard/node_metrics_tab.py` | Sidebar legend now derives folder_suffix from YAML filename with dash-based fallback to job_name; node metrics warning text updated to reference agg files alongside out files. |
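The Agg/ANSI handling described for `log_parser.py` comes down to stripping colour codes before matching a timestamped line. A minimal sketch of that technique, using an invented line format and regex (the actual patterns in `analysis/srtlog/log_parser.py` will differ):

```python
import re

# Hypothetical sketch: remove ANSI escape sequences, then match an
# assumed agg log line shape like
#   "[2026-02-26 22:15:03] agg throughput: 123.4 tok/s"
ANSI_RE = re.compile(r"\x1b\[[0-9;]*m")
AGG_LINE_RE = re.compile(
    r"\[(?P<ts>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2})\] agg throughput: (?P<tps>[\d.]+)"
)

def parse_agg_line(line: str):
    """Return (timestamp, tokens_per_sec) or None for non-matching lines."""
    clean = ANSI_RE.sub("", line)  # colour codes would otherwise break the match
    m = AGG_LINE_RE.search(clean)
    if m is None:
        return None
    return m.group("ts"), float(m.group("tps"))
```

Stripping ANSI first keeps the content regex simple; the alternative of baking escape sequences into every pattern tends to be fragile.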

Sequence Diagram

sequenceDiagram
    participant User
    participant Dashboard as Dashboard UI
    participant RunLoader as Run Loader
    participant LogParser as Log Parser
    participant Metadata as Metadata Handler
    
    User->>Dashboard: Request analysis view
    Dashboard->>RunLoader: Load run data
    RunLoader->>LogParser: Parse log files (prefill/decode/agg)
    LogParser->>LogParser: Extract metrics & timestamps<br/>(incl. Agg/ANSI format)
    LogParser-->>RunLoader: Parsed log data
    RunLoader->>Metadata: Extract run metadata<br/>(v2.0 or flat format)
    Metadata->>Metadata: Derive agg_nodes & mode<br/>(with fallback logic)
    Metadata-->>RunLoader: RunMetadata instance
    RunLoader->>RunLoader: Generate run ID<br/>(agg path if aggregated)
    RunLoader-->>Dashboard: Populated dataframe
    Dashboard->>Dashboard: Construct sidebar legend<br/>(folder_suffix from filename)
    Dashboard-->>User: Rendered dashboard
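The "Generate run ID (agg path if aggregated)" step in the diagram can be sketched as below. Only the "A" suffix for aggregated runs comes from the summary above; the field names and the non-aggregated ID format are assumptions for illustration, not the actual `RunMetadata` schema:

```python
from dataclasses import dataclass

@dataclass
class RunMeta:  # stand-in for the real RunMetadata; fields are assumed
    job_id: str
    is_aggregated: bool
    agg_nodes: int = 0
    prefill_nodes: int = 0
    decode_nodes: int = 0

def make_run_id(meta: RunMeta) -> str:
    if meta.is_aggregated:
        # aggregated runs get an "A" suffix, e.g. "12345-2A"
        return f"{meta.job_id}-{meta.agg_nodes}A"
    # hypothetical disaggregated format, e.g. "67890-4P8D"
    return f"{meta.job_id}-{meta.prefill_nodes}P{meta.decode_nodes}D"
```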

Estimated Code Review Effort

🎯 3 (Moderate) | ⏱️ ~20 minutes

Suggested Reviewers

  • ishandhanani

Poem

🐰 Hop along the agg-gregate path,
Parsing formats, metadata bath,
From dash-named YAML to run IDs bright,
The dashboard now dances with aggregated might!

🚥 Pre-merge checks | ✅ 3
✅ Passed checks (3 passed)
| Check name | Status | Explanation |
| --- | --- | --- |
| Description Check | ✅ Passed | Check skipped - CodeRabbit’s high-level summary is enabled. |
| Title check | ✅ Passed | The title 'support agg_logs and dashboard' is partially related to the changeset. It accurately reflects that the PR adds support for aggregated logs (agg) across multiple files, but it omits important implementation details about label construction changes and run metadata format handling that are significant parts of the changeset. |
| Docstring Coverage | ✅ Passed | Docstring coverage is 100.00%, which is sufficient. The required threshold is 80.00%. |




@gracehonv gracehonv marked this pull request as ready for review February 26, 2026 23:12
Contributor

@coderabbitai coderabbitai bot left a comment


🧹 Nitpick comments (2)
analysis/srtlog/models.py (1)

112-141: Consider extracting shared logic to reduce duplication.

The flat structure handling (lines 112-141) duplicates nearly all the extraction logic from the v2.0 format handling (lines 51-84). Both paths perform identical field extraction and agg_nodes fallback computation.

Consider extracting the common logic into a private helper method to improve maintainability.

♻️ Suggested refactor to reduce duplication
+    @classmethod
+    def _from_resources_model(cls, json_data: dict, resources: dict, model: dict, run_path: str) -> "RunMetadata":
+        """Helper to create RunMetadata from resources/model structure."""
+        agg_workers = resources.get("agg_workers") or 0
+        mode = "aggregated" if agg_workers > 0 else "disaggregated"
+        
+        agg_nodes = (resources.get("agg_nodes") or 0) if agg_workers > 0 else 0
+        if agg_workers > 0 and agg_nodes == 0:
+            agg_nodes = agg_workers
+        
+        return cls(
+            job_id=json_data.get("job_id", ""),
+            path=run_path,
+            run_date=json_data.get("generated_at", ""),
+            container=model.get("container", ""),
+            prefill_nodes=resources.get("prefill_nodes") or 0,
+            decode_nodes=resources.get("decode_nodes") or 0,
+            prefill_workers=resources.get("prefill_workers") or 0,
+            decode_workers=resources.get("decode_workers") or 0,
+            mode=mode,
+            job_name=json_data.get("job_name", ""),
+            partition="",
+            model_dir=model.get("path", ""),
+            gpus_per_node=resources.get("gpus_per_node") or 0,
+            gpu_type=resources.get("gpu_type", ""),
+            enable_multiple_frontends=False,
+            num_additional_frontends=0,
+            agg_nodes=agg_nodes,
+            agg_workers=agg_workers,
+        )
+
     @classmethod
     def from_json(cls, json_data: dict, run_path: str) -> "RunMetadata":
         # ... docstring ...
         
         # Check if this is the new v2.0 format
         if json_data.get("version") == "2.0":
             resources = json_data.get("resources", {})
             model = json_data.get("model", {})
-            agg_workers = resources.get("agg_workers") or 0
-            # ... all the extraction logic ...
+            return cls._from_resources_model(json_data, resources, model, run_path)
         
         # Old format with run_metadata key
         if "run_metadata" in json_data:
             # ... existing old format handling ...
         
         # New format (flat structure, no version field)
         model_data = json_data.get("model", {})
         resources_data = json_data.get("resources", {})
-        agg_workers = resources_data.get("agg_workers") or 0
-        # ... all the extraction logic ...
+        return cls._from_resources_model(json_data, resources_data, model_data, run_path)
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@analysis/srtlog/models.py` around lines 112 - 141, The flat-structure branch
duplicates field extraction and agg_nodes logic already used for the v2.0 path;
extract that shared logic into a private helper (e.g., _extract_run_fields or
_build_run_kwargs) which takes model_data, resources_data, json_data and
run_path, computes agg_workers/agg_nodes/mode and returns the common kwargs or a
dict, then call that helper from both the v2.0 and flat branches and pass its
result into cls(...) so both branches reuse the same extraction and agg_nodes
fallback logic.
analysis/dashboard/app.py (1)

307-314: Clarify comment about folder name vs YAML file.

The comment mentions "YAML file name" but the code extracts the suffix from the run directory name (run.metadata.path), not a YAML file. Consider updating the comment to accurately describe what's being extracted.

📝 Suggested comment clarification
-        # Use folder name suffix (YAML file name) instead of YAML's internal name field
+        # Use folder name suffix (portion after first dash) for additional context
         dirname = os.path.basename(run.metadata.path)
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@analysis/dashboard/app.py` around lines 307 - 314, The comment above the
suffix extraction is inaccurate: it says "YAML file name" but the code derives
dirname from run.metadata.path and extracts a folder suffix from the run
directory name (dirname -> folder_suffix) to append to label; update the comment
to clearly state that the code is using the run directory name
(run.metadata.path) and its suffix after the first "-" rather than any internal
YAML filename or YAML "name" field, and keep the existing logic in the block
that sets folder_suffix and falls back to run.metadata.job_name.

ℹ️ Review info

Configuration used: defaults

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between acc6dd3 and 60ba478.

📒 Files selected for processing (5)
  • analysis/dashboard/app.py
  • analysis/dashboard/node_metrics_tab.py
  • analysis/srtlog/log_parser.py
  • analysis/srtlog/models.py
  • analysis/srtlog/run_loader.py

@gracehonv gracehonv marked this pull request as draft February 26, 2026 23:15