test(l1): add per-phase timing breakdown to multisync Slack notifications#6136

Merged: pablodeymo merged 5 commits into main from feature/slack-phase-breakdown on Feb 6, 2026

Conversation

@pablodeymo
Contributor

Motivation

The multisync monitoring script (docker_monitor.py) sends Slack notifications at the end of each sync run, but they only report the total sync time per network. When investigating performance regressions or comparing runs, we had to manually SSH into the server and parse raw container logs to figure out which phase was slow. This is time-consuming and error-prone.

The sync logs already contain per-phase completion markers like:

✓ BLOCK HEADERS complete: 25,693,009 headers in 0:29:00
✓ STORAGE HEALING complete: 87,414 storage accounts healed in 1:42:00

This PR surfaces that data directly in the Slack notification, so performance bottlenecks are visible at a glance.

Description

Adds three things to tooling/sync/docker_monitor.py:

  1. PHASE_COMPLETION_PATTERNS dict with regex patterns for all 8 snap sync phases:

    • Block Headers, Account Ranges, Account Insertion, Storage Ranges, Storage Insertion, State Healing, Storage Healing, Bytecodes
  2. parse_phase_timings(run_id, container) function: reads the saved container log file at multisync_logs/run_{run_id}/{container}.log and extracts (phase_name, item_count, duration) for each completed phase. Phases that didn't complete (e.g., on a failed run) are omitted, and an empty list is returned if the log file is missing or unreadable, so callers degrade gracefully.

  3. Phase breakdown in Slack and run logs: after the per-instance status line, a code block is appended showing the full phase timing table. The same breakdown is also written to run_history.log and the per-run summary.txt.
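
The parsing step described above can be sketched roughly as follows. This is an illustration, not the code merged in this PR: the regexes are inferred only from the two sample log lines quoted in the Motivation section, only those two phases are covered, and the `extract_phases` helper and the `logs_dir` parameter are hypothetical names added to keep the example self-contained.

```python
import re
from pathlib import Path

# Assumed shape of a completion marker, inferred from the two sample lines:
#   "<PHASE> complete: <count> <free text> in <H:MM:SS>"
_TAIL = r" complete: ([\d,]+) .*? in (\d+:\d{2}:\d{2})"

# Hypothetical subset of PHASE_COMPLETION_PATTERNS; the real dict covers
# all 8 phases and its patterns may differ.
PHASE_COMPLETION_PATTERNS = {
    "Block Headers": re.compile("BLOCK HEADERS" + _TAIL),
    "Storage Healing": re.compile("STORAGE HEALING" + _TAIL),
}


def extract_phases(text):
    """Return [(phase_name, item_count, duration), ...] found in log text."""
    phases = []
    for name, pattern in PHASE_COMPLETION_PATTERNS.items():
        match = pattern.search(text)
        if match:  # phases without a completion marker are simply omitted
            phases.append((name, match.group(1), match.group(2)))
    return phases


def parse_phase_timings(run_id, container, logs_dir="multisync_logs"):
    """Read the saved log for one container; return [] if it cannot be read."""
    log_file = Path(logs_dir) / f"run_{run_id}" / f"{container}.log"
    try:
        text = log_file.read_text(encoding="utf-8", errors="replace")
    except OSError:
        return []  # missing or unreadable log: no breakdown
    return extract_phases(text)
```

Keeping the regex matching in a pure function that takes text makes the edge cases (no markers, partial markers) easy to check without touching the filesystem.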

Expected Slack output (successful run)

The Slack message will now include a section like this for each network instance:

📊 Phase Breakdown — mainnet
Block Headers       0:29:00  (25,693,009)
Account Ranges      0:45:12  (12,345,678)
Account Insertion   0:12:34  (12,345,678)
Storage Ranges      0:38:45  (1,234,567)
Storage Insertion   0:08:23  (1,234,567)
State Healing       0:15:00  (87,414)
Storage Healing     1:42:00  (87,414)
Bytecodes           0:05:30  (45,678)

Phase names are left-aligned with padding for readability. The count in parentheses corresponds to the number of items processed (headers, accounts, storage slots, etc.).

Expected Slack output (failed run with partial phases)

If a run fails mid-sync (e.g., timeout during storage healing), only the phases that completed are shown:

📊 Phase Breakdown — mainnet
Block Headers       0:29:00  (25,693,009)
Account Ranges      0:45:12  (12,345,678)
Account Insertion   0:12:34  (12,345,678)
Storage Ranges      0:38:45  (1,234,567)
Storage Insertion   0:08:23  (1,234,567)

Phases that never completed (State Healing, Storage Healing, Bytecodes in this case) are simply omitted — no placeholder or "N/A" rows.

Expected text log output (summary.txt / run_history.log)

  ✅ mainnet: success (sync: 4h 32m 15s)
    Phase Breakdown:
      Block Headers       0:29:00  (25,693,009)
      Account Ranges      0:45:12  (12,345,678)
      Account Insertion   0:12:34  (12,345,678)
      Storage Ranges      0:38:45  (1,234,567)
      Storage Insertion   0:08:23  (1,234,567)
      State Healing       0:15:00  (87,414)
      Storage Healing     1:42:00  (87,414)
      Bytecodes           0:05:30  (45,678)
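
A minimal sketch of how the indented text-log layout above could be produced, assuming the phases arrive as (phase_name, item_count, duration) tuples. The function name `format_phase_breakdown` and the exact spacing are assumptions chosen to reproduce the sample output; the script's own formatting code may differ.

```python
def format_phase_breakdown(phases, indent="    "):
    """Format phase tuples as indented, column-aligned text-log lines.

    Returns an empty list when there are no completed phases, so the
    caller can skip the section entirely.
    """
    if not phases:
        return []
    # Pad every name to the longest one so durations line up in a column.
    width = max(len(name) for name, _, _ in phases)
    lines = [f"{indent}Phase Breakdown:"]
    for name, count, duration in phases:
        lines.append(f"{indent}  {name:<{width}}   {duration}  ({count})")
    return lines
```

Padding to the longest phase name that is actually present, instead of a fixed width, keeps the columns aligned on failed runs where only a few phases completed.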

How it works

The flow is:

  1. save_all_logs() saves container logs to disk (pre-existing, unchanged)
  2. log_run_result() now calls parse_phase_timings() and appends the breakdown to the text log
  3. slack_notify() now calls parse_phase_timings() and appends a code block per instance to the Slack payload

Since save_all_logs() is called before both log_run_result() and slack_notify() (lines 721→725 in main loop), the saved log files are always available for parsing.

Edge cases

| Scenario | Behavior |
| --- | --- |
| Run fails before any phase completes | No breakdown section shown |
| Log file missing or unreadable | Empty list returned, no breakdown |
| Only some phases completed | Only completed phases listed |
| Multiple networks (hoodi, sepolia, mainnet) | Separate breakdown per instance |

Checklist

  • Updated STORE_SCHEMA_VERSION (crates/storage/lib.rs) if the PR includes breaking changes to the Store requiring a re-sync.

N/A — This PR only modifies the Python monitoring script, no Rust code or storage changes.

in the multisync monitoring script (docker_monitor.py).

The sync completion logs already contain per-phase completion markers
(e.g. "✓ BLOCK HEADERS complete: 25,693,009 headers in 0:29:00")
but this data was not surfaced in the Slack messages or run summaries.

This adds a parse_phase_timings() function that reads saved container
logs and extracts the name, item count, and duration for all 8 snap sync
phases: Block Headers, Account Ranges, Account Insertion, Storage
Ranges, Storage Insertion, State Healing, Storage Healing, and
Bytecodes. The breakdown is appended to both the Slack notification
(as a code block per network instance) and the text-based run log
(run_history.log and per-run summary.txt). When a phase did not
complete (e.g. on a failed run), it is simply omitted from the
breakdown.
@pablodeymo pablodeymo requested a review from a team as a code owner February 5, 2026 19:26
Copilot AI review requested due to automatic review settings February 5, 2026 19:26
@greptile-apps

greptile-apps Bot commented Feb 5, 2026

Greptile Overview

Greptile Summary

Adds per-phase timing breakdown to multisync monitoring Slack notifications and run logs by parsing saved container logs for phase completion markers.

Key changes:

  • Added PHASE_COMPLETION_PATTERNS dict with regex patterns matching all 8 snap sync phases from Rust logs (network.rs:389)
  • Implemented parse_phase_timings() function that reads saved log files and extracts phase completion data (name, count, duration)
  • Enhanced slack_notify() and log_run_result() to display phase breakdowns after per-instance status
  • Gracefully handles missing logs or incomplete phases by returning empty list

Implementation quality:

  • Clean separation of concerns with dedicated parsing function
  • Proper error handling with try/except and file existence checks
  • Regex patterns correctly match the actual log format from network.rs (verified against source)
  • Consistent formatting across Slack and text logs
  • No breaking changes to existing functionality

The change provides immediate visibility into performance bottlenecks without requiring manual log inspection.

Confidence Score: 5/5

  • This PR is safe to merge with minimal risk
  • The implementation is well-designed with proper error handling, no breaking changes, and correctly matches the actual log format from the Rust source code. The changes are isolated to the monitoring script with graceful degradation on failures.
  • No files require special attention

Important Files Changed

| Filename | Overview |
| --- | --- |
| tooling/sync/docker_monitor.py | Adds phase timing breakdown parsing and display for multisync monitoring; clean implementation with proper error handling |

Sequence Diagram

sequenceDiagram
    participant Main as main()
    participant SaveLogs as save_all_logs()
    participant LogResult as log_run_result()
    participant SlackNotify as slack_notify()
    participant ParsePhase as parse_phase_timings()
    participant LogFile as Log Files

    Note over Main: Run completes (success or failed)
    Main->>SaveLogs: save_all_logs(instances, run_id, compose_file)
    SaveLogs->>LogFile: Write container logs to multisync_logs/run_{run_id}/{container}.log
    LogFile-->>SaveLogs: Logs saved
    SaveLogs-->>Main: Complete

    Main->>LogResult: log_run_result(run_id, run_count, instances, ...)
    loop For each instance
        LogResult->>ParsePhase: parse_phase_timings(run_id, container)
        ParsePhase->>LogFile: Read multisync_logs/run_{run_id}/{container}.log
        LogFile-->>ParsePhase: Log content
        ParsePhase->>ParsePhase: Apply regex patterns for 8 phases
        ParsePhase-->>LogResult: [(phase_name, count, duration), ...]
        LogResult->>LogResult: Format and append to text log
    end
    LogResult->>LogFile: Append to run_history.log and summary.txt
    LogResult-->>Main: Complete

    Main->>SlackNotify: slack_notify(run_id, run_count, instances, ...)
    loop For each instance
        SlackNotify->>ParsePhase: parse_phase_timings(run_id, container)
        ParsePhase->>LogFile: Read multisync_logs/run_{run_id}/{container}.log
        LogFile-->>ParsePhase: Log content
        ParsePhase->>ParsePhase: Apply regex patterns for 8 phases
        ParsePhase-->>SlackNotify: [(phase_name, count, duration), ...]
        SlackNotify->>SlackNotify: Format phase breakdown code block
    end
    SlackNotify->>SlackNotify: POST to Slack webhook with blocks
    SlackNotify-->>Main: Complete

@pablodeymo pablodeymo changed the title from "Add per-phase timing breakdown to multisync Slack notifications" to "feat(l1): add per-phase timing breakdown to multisync Slack notifications" Feb 5, 2026
@pablodeymo pablodeymo changed the title from "feat(l1): add per-phase timing breakdown to multisync Slack notifications" to "test(l1): add per-phase timing breakdown to multisync Slack notifications" Feb 5, 2026
Contributor

Copilot AI left a comment

Pull request overview

This PR adds per-phase timing breakdown to multisync Slack notifications and log files, making it easier to identify performance bottlenecks in sync operations without manually parsing container logs.

Changes:

  • Added PHASE_COMPLETION_PATTERNS dictionary with regex patterns for all 8 snap sync phases
  • Added parse_phase_timings() function to extract phase timing data from saved container logs
  • Enhanced Slack notifications and text logs to display phase breakdowns for each network instance

Comment thread tooling/sync/docker_monitor.py Outdated
Comment on lines +359 to +371
# Add phase breakdown for each instance
for i in instances:
    phases = parse_phase_timings(run_id, i.container)
    if phases:
        phase_lines = [f"📊 *Phase Breakdown — {i.name}*", "```"]
        max_name_len = max(len(name) for name, _, _ in phases)
        for name, count, duration in phases:
            phase_lines.append(f"{name:<{max_name_len}} {duration} ({count})")
        phase_lines.append("```")
        blocks.append({
            "type": "section",
            "text": {"type": "mrkdwn", "text": "\n".join(phase_lines)}
        })

Copilot AI Feb 5, 2026

The phase breakdown sections are added in a separate loop (lines 360-371) from the instance status sections (lines 342-357). This means in the Slack notification, all instance statuses will be shown first, followed by all phase breakdowns. This differs from the text log format (lines 456-479) where each phase breakdown immediately follows its instance status. For better readability and consistency, consider moving the phase breakdown logic inside the first loop (after line 357) so each instance's breakdown appears immediately after its status, matching the text log format and the example in the PR description.
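
One way the suggested restructuring might look is sketched below: a single loop that emits each instance's status block followed immediately by its breakdown. The `Instance` dataclass, the `section` helper, the injected `parse` callable, and the simplified status and header text are all hypothetical stand-ins so the example runs on its own; they are not the script's actual definitions.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple

Phase = Tuple[str, str, str]  # (phase_name, item_count, duration)


@dataclass
class Instance:
    # Hypothetical stand-in for the monitor's per-network instance object.
    name: str
    container: str
    status: str


def section(text: str) -> Dict:
    """Wrap mrkdwn text in a Slack section block."""
    return {"type": "section", "text": {"type": "mrkdwn", "text": text}}


def build_instance_blocks(
    run_id: str,
    instances: List[Instance],
    parse: Callable[[str, str], List[Phase]],
) -> List[Dict]:
    """Emit status and breakdown blocks together, one instance at a time."""
    blocks = []
    for inst in instances:
        blocks.append(section(f"*{inst.name}*: {inst.status}"))
        phases = parse(run_id, inst.container)
        if not phases:
            continue  # no completed phases or no log: status only
        width = max(len(name) for name, _, _ in phases)
        lines = [f"*Phase Breakdown ({inst.name})*", "```"]
        lines += [f"{name:<{width}} {duration} ({count})"
                  for name, count, duration in phases]
        lines.append("```")
        blocks.append(section("\n".join(lines)))
    return blocks
```

With this ordering, the Slack message reads the same way as the text log: status for a network, then that network's phases, then the next network.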

@github-actions github-actions Bot added the L1 Ethereum client label Feb 5, 2026
@github-actions github-actions Bot removed the L1 Ethereum client label Feb 5, 2026
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
@pablodeymo pablodeymo mentioned this pull request Feb 5, 2026
1 task
@github-project-automation github-project-automation Bot moved this to In Review in ethrex_l1 Feb 6, 2026
@pablodeymo pablodeymo enabled auto-merge February 6, 2026 21:56
@pablodeymo pablodeymo added this pull request to the merge queue Feb 6, 2026
Merged via the queue into main with commit 9a5b5e6 Feb 6, 2026
51 checks passed
@pablodeymo pablodeymo deleted the feature/slack-phase-breakdown branch February 6, 2026 22:52
@github-project-automation github-project-automation Bot moved this from In Review to Done in ethrex_l1 Feb 6, 2026

Projects

Status: Done

5 participants