test(l1): add per-phase timing breakdown to multisync Slack notifications by pablodeymo · Pull Request #6136 · lambdaclass/ethrex

pablodeymo · 2026-02-05T19:26:28Z

Motivation

The multisync monitoring script (docker_monitor.py) sends Slack notifications at the end of each sync run, but they only report the total sync time per network. When investigating performance regressions or comparing runs, we had to manually SSH into the server and parse raw container logs to figure out which phase was slow. This is time-consuming and error-prone.

The sync logs already contain per-phase completion markers like:

✓ BLOCK HEADERS complete: 25,693,009 headers in 0:29:00
✓ STORAGE HEALING complete: 87,414 storage accounts healed in 1:42:00

This PR surfaces that data directly in the Slack notification, so performance bottlenecks are visible at a glance.

Description

Adds three things to tooling/sync/docker_monitor.py:

- PHASE_COMPLETION_PATTERNS dict: regex patterns for all 8 snap sync phases (Block Headers, Account Ranges, Account Insertion, Storage Ranges, Storage Insertion, State Healing, Storage Healing, Bytecodes).
- parse_phase_timings(run_id, container) function: reads the saved container log file at multisync_logs/run_{run_id}/{container}.log and extracts (phase_name, item_count, duration) for each completed phase. It returns an empty list if the log is missing or unreadable, and skips any phase that didn't complete (e.g., on a failed run), so failures degrade gracefully.
- Phase breakdown in Slack and run logs: after the per-instance status line, a code block is appended showing the full phase timing table. The same breakdown is also written to run_history.log and the per-run summary.txt.
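The parsing step described above could look roughly like the sketch below. It is not the merged code: the patterns are inferred only from the two sample markers quoted in the motivation, so the text between the count and the duration is matched loosely, and `LOGS_DIR` plus the `logs_dir` parameter are assumptions added to keep the example self-contained.

```python
import re
from pathlib import Path

# Assumed log root; the PR only states the relative layout
# multisync_logs/run_{run_id}/{container}.log.
LOGS_DIR = Path("multisync_logs")

PHASE_NAMES = [
    "Block Headers", "Account Ranges", "Account Insertion", "Storage Ranges",
    "Storage Insertion", "State Healing", "Storage Healing", "Bytecodes",
]

# One pattern per phase, shaped like "<PHASE> complete: <count> <unit words> in <h:mm:ss>".
# The unit words are matched loosely because the PR shows them for two phases only.
PHASE_COMPLETION_PATTERNS = {
    name: re.compile(
        rf"{re.escape(name.upper())} complete: ([\d,]+) .*? in (\d+:\d{{2}}:\d{{2}})"
    )
    for name in PHASE_NAMES
}


def parse_phase_timings(run_id, container, logs_dir=LOGS_DIR):
    """Return [(phase_name, item_count, duration), ...] for completed phases."""
    log_file = Path(logs_dir) / f"run_{run_id}" / f"{container}.log"
    try:
        content = log_file.read_text(encoding="utf-8", errors="replace")
    except OSError:
        # Missing or unreadable log: report no breakdown instead of failing.
        return []

    phases = []
    for name, pattern in PHASE_COMPLETION_PATTERNS.items():
        match = pattern.search(content)
        if match:  # phases without a completion marker are skipped
            phases.append((name, match.group(1), match.group(2)))
    return phases
```

Counts and durations are kept as the strings found in the log, which is enough for display; converting them to numbers would only be needed for comparisons across runs.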

Expected Slack output (successful run)

The Slack message will now include a section like this for each network instance:

📊 Phase Breakdown — mainnet
Block Headers       0:29:00  (25,693,009)
Account Ranges      0:45:12  (12,345,678)
Account Insertion   0:12:34  (12,345,678)
Storage Ranges      0:38:45  (1,234,567)
Storage Insertion   0:08:23  (1,234,567)
State Healing       0:15:00  (87,414)
Storage Healing     1:42:00  (87,414)
Bytecodes           0:05:30  (45,678)

Phase names are left-aligned with padding for readability. The count in parentheses corresponds to the number of items processed (headers, accounts, storage slots, etc.).

Expected Slack output (failed run with partial phases)

If a run fails mid-sync (e.g., timeout during storage healing), only the phases that completed are shown:

📊 Phase Breakdown — mainnet
Block Headers       0:29:00  (25,693,009)
Account Ranges      0:45:12  (12,345,678)
Account Insertion   0:12:34  (12,345,678)
Storage Ranges      0:38:45  (1,234,567)
Storage Insertion   0:08:23  (1,234,567)

Phases that never completed (State Healing, Storage Healing, Bytecodes in this case) are simply omitted — no placeholder or "N/A" rows.

Expected text log output (`summary.txt` / `run_history.log`)

  ✅ mainnet: success (sync: 4h 32m 15s)
    Phase Breakdown:
      Block Headers       0:29:00  (25,693,009)
      Account Ranges      0:45:12  (12,345,678)
      Account Insertion   0:12:34  (12,345,678)
      Storage Ranges      0:38:45  (1,234,567)
      Storage Insertion   0:08:23  (1,234,567)
      State Healing       0:15:00  (87,414)
      Storage Healing     1:42:00  (87,414)
      Bytecodes           0:05:30  (45,678)
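A minimal sketch of how the indented text-log block could be produced from the parsed tuples. The helper name `format_phase_breakdown_text` and the exact column widths are assumptions; the padding here follows the longest phase name, so spacing may differ slightly from the sample above.

```python
def format_phase_breakdown_text(phases, indent="    "):
    """Format (name, count, duration) tuples as indented text-log lines.

    Returns an empty list when there is nothing to show, so callers can
    extend their output unconditionally.
    """
    if not phases:
        return []
    width = max(len(name) for name, _, _ in phases)
    lines = [f"{indent}Phase Breakdown:"]
    for name, count, duration in phases:
        # Left-align the phase name so durations line up in one column.
        lines.append(f"{indent}  {name:<{width}}  {duration}  ({count})")
    return lines
```

Returning a list of lines rather than a joined string lets the same helper feed both run_history.log and summary.txt.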

How it works

The flow is:

1. save_all_logs() saves container logs to disk (already existed, no changes).
2. log_run_result() now calls parse_phase_timings() and appends the breakdown to the text log.
3. slack_notify() now calls parse_phase_timings() and appends code blocks to the Slack payload.

Since save_all_logs() is called before both log_run_result() and slack_notify() (lines 721→725 in main loop), the saved log files are always available for parsing.

Edge cases

| Scenario | Behavior |
| --- | --- |
| Run fails before any phase completes | No breakdown section shown |
| Log file missing or unreadable | Empty list returned, no breakdown |
| Only some phases completed | Only completed phases listed |
| Multiple networks (hoodi, sepolia, mainnet) | Separate breakdown per instance |
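The first three scenarios can be illustrated with a small, self-contained sketch. It uses a single generic marker pattern instead of the per-phase dict from the PR, and the names `MARKER`, `extract_phases`, `read_log_or_empty`, and the "in progress" log line are hypothetical, chosen only to show the behavior.

```python
import re

# Generic completion-marker pattern (an assumption; the PR keeps one pattern per phase).
MARKER = re.compile(r"✓ ([A-Z ]+) complete: ([\d,]+) .*? in (\d+:\d{2}:\d{2})")


def extract_phases(log_text):
    """Return (phase_name, item_count, duration) for every completion marker found."""
    return [
        (m.group(1).title(), m.group(2), m.group(3))
        for m in MARKER.finditer(log_text)
    ]


def read_log_or_empty(path):
    """Read a log file, treating a missing or unreadable file as empty."""
    try:
        with open(path, encoding="utf-8", errors="replace") as fh:
            return fh.read()
    except OSError:
        return ""


# Hypothetical log from a run that failed after the first phase.
PARTIAL_LOG = (
    "✓ BLOCK HEADERS complete: 25,693,009 headers in 0:29:00\n"
    "ACCOUNT RANGES in progress\n"
)

no_phases = extract_phases("")                # run failed before any phase completed
missing = extract_phases(read_log_or_empty("/nonexistent/run_0/mainnet.log"))
partial = extract_phases(PARTIAL_LOG)         # only the completed phase is listed
```

In all three cases the caller only needs an `if phases:` check to decide whether to emit a breakdown section.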

Checklist

Updated STORE_SCHEMA_VERSION (crates/storage/lib.rs) if the PR includes breaking changes to the Store requiring a re-sync.

N/A — This PR only modifies the Python monitoring script, no Rust code or storage changes.

This PR adds a per-phase timing breakdown to the Slack notifications sent by the multisync monitoring script (docker_monitor.py). The sync logs already contain per-phase completion markers (e.g. "✓ BLOCK HEADERS complete: 25,693,009 headers in 0:29:00"), but this data was not surfaced in the Slack messages or run summaries. A new parse_phase_timings() function reads the saved container logs and extracts the item count and duration for each of the 8 snap sync phases: Block Headers, Account Ranges, Account Insertion, Storage Ranges, Storage Insertion, State Healing, Storage Healing, and Bytecodes. The breakdown is appended to both the Slack notification (as a code block per network instance) and the text-based run log (run_history.log and per-run summary.txt). When a phase did not complete (e.g. on a failed run), it is simply omitted from the breakdown.

greptile-apps · 2026-02-05T19:29:14Z

Greptile Overview

Greptile Summary

Adds per-phase timing breakdown to multisync monitoring Slack notifications and run logs by parsing saved container logs for phase completion markers.

Key changes:

Added PHASE_COMPLETION_PATTERNS dict with regex patterns matching all 8 snap sync phases from Rust logs (network.rs:389)
Implemented parse_phase_timings() function that reads saved log files and extracts phase completion data (name, count, duration)
Enhanced slack_notify() and log_run_result() to display phase breakdowns after per-instance status
Gracefully handles missing logs or incomplete phases by returning empty list

Implementation quality:

Clean separation of concerns with dedicated parsing function
Proper error handling with try/except and file existence checks
Regex patterns correctly match the actual log format from network.rs (verified against source)
Consistent formatting across Slack and text logs
No breaking changes to existing functionality

The change provides immediate visibility into performance bottlenecks without requiring manual log inspection.

Confidence Score: 5/5

This PR is safe to merge with minimal risk
The implementation is well-designed with proper error handling, no breaking changes, and correctly matches the actual log format from the Rust source code. The changes are isolated to the monitoring script with graceful degradation on failures.
No files require special attention

Important Files Changed

| Filename | Overview |
| --- | --- |
| tooling/sync/docker_monitor.py | Adds phase timing breakdown parsing and display for multisync monitoring; clean implementation with proper error handling |

Sequence Diagram

sequenceDiagram
    participant Main as main()
    participant SaveLogs as save_all_logs()
    participant LogResult as log_run_result()
    participant SlackNotify as slack_notify()
    participant ParsePhase as parse_phase_timings()
    participant LogFile as Log Files

    Note over Main: Run completes (success or failed)
    Main->>SaveLogs: save_all_logs(instances, run_id, compose_file)
    SaveLogs->>LogFile: Write container logs to multisync_logs/run_{run_id}/{container}.log
    LogFile-->>SaveLogs: Logs saved
    SaveLogs-->>Main: Complete

    Main->>LogResult: log_run_result(run_id, run_count, instances, ...)
    loop For each instance
        LogResult->>ParsePhase: parse_phase_timings(run_id, container)
        ParsePhase->>LogFile: Read multisync_logs/run_{run_id}/{container}.log
        LogFile-->>ParsePhase: Log content
        ParsePhase->>ParsePhase: Apply regex patterns for 8 phases
        ParsePhase-->>LogResult: [(phase_name, count, duration), ...]
        LogResult->>LogResult: Format and append to text log
    end
    LogResult->>LogFile: Append to run_history.log and summary.txt
    LogResult-->>Main: Complete

    Main->>SlackNotify: slack_notify(run_id, run_count, instances, ...)
    loop For each instance
        SlackNotify->>ParsePhase: parse_phase_timings(run_id, container)
        ParsePhase->>LogFile: Read multisync_logs/run_{run_id}/{container}.log
        LogFile-->>ParsePhase: Log content
        ParsePhase->>ParsePhase: Apply regex patterns for 8 phases
        ParsePhase-->>SlackNotify: [(phase_name, count, duration), ...]
        SlackNotify->>SlackNotify: Format phase breakdown code block
    end
    SlackNotify->>SlackNotify: POST to Slack webhook with blocks
    SlackNotify-->>Main: Complete

Copilot

Pull request overview

This PR adds per-phase timing breakdown to multisync Slack notifications and log files, making it easier to identify performance bottlenecks in sync operations without manually parsing container logs.

Changes:

Added PHASE_COMPLETION_PATTERNS dictionary with regex patterns for all 8 snap sync phases
Added parse_phase_timings() function to extract phase timing data from saved container logs
Enhanced Slack notifications and text logs to display phase breakdowns for each network instance

Copilot · 2026-02-05T19:31:35Z

+    # Add phase breakdown for each instance
+    for i in instances:
+        phases = parse_phase_timings(run_id, i.container)
+        if phases:
+            phase_lines = [f"📊 *Phase Breakdown — {i.name}*", "```"]
+            max_name_len = max(len(name) for name, _, _ in phases)
+            for name, count, duration in phases:
+                phase_lines.append(f"{name:<{max_name_len}}  {duration}  ({count})")
+            phase_lines.append("```")
+            blocks.append({
+                "type": "section",
+                "text": {"type": "mrkdwn", "text": "\n".join(phase_lines)}
+            })


The phase breakdown sections are added in a separate loop (lines 360-371) from the instance status sections (lines 342-357). This means in the Slack notification, all instance statuses will be shown first, followed by all phase breakdowns. This differs from the text log format (lines 456-479) where each phase breakdown immediately follows its instance status. For better readability and consistency, consider moving the phase breakdown logic inside the first loop (after line 357) so each instance's breakdown appears immediately after its status, matching the text log format and the example in the PR description.
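A sketch of the single-loop layout suggested in this comment, where each instance's breakdown block is appended right after its status block. The `Instance` dataclass, the status text, and the `build_instance_blocks` helper are hypothetical stand-ins; the parser is passed in as an argument so the example runs on its own.

```python
from dataclasses import dataclass

FENCE = "`" * 3  # Slack mrkdwn code fence, built this way to keep the example readable


@dataclass
class Instance:
    # Hypothetical stand-in for the script's instance object.
    name: str
    container: str
    status: str


def build_instance_blocks(instances, run_id, parse_phase_timings):
    """Build Slack blocks with each breakdown directly after its status."""
    blocks = []
    for inst in instances:
        # Status section (format assumed for illustration).
        blocks.append({
            "type": "section",
            "text": {"type": "mrkdwn", "text": f"*{inst.name}*: {inst.status}"},
        })
        phases = parse_phase_timings(run_id, inst.container)
        if not phases:
            continue  # no completed phases: status only
        width = max(len(name) for name, _, _ in phases)
        lines = [f"📊 *Phase Breakdown — {inst.name}*", FENCE]
        for name, count, duration in phases:
            lines.append(f"{name:<{width}}  {duration}  ({count})")
        lines.append(FENCE)
        blocks.append({
            "type": "section",
            "text": {"type": "mrkdwn", "text": "\n".join(lines)},
        })
    return blocks
```

With this ordering the Slack message mirrors the text log, at the cost of interleaving two block types in one loop.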

Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>

pablodeymo requested a review from a team as a code owner February 5, 2026 19:26

Copilot AI review requested due to automatic review settings February 5, 2026 19:26

Copilot started reviewing on behalf of pablodeymo February 5, 2026 19:26

pablodeymo changed the title ~~Add per-phase timing breakdown to multisync Slack notifications~~ feat(l1): add per-phase timing breakdown to multisync Slack notifications Feb 5, 2026

pablodeymo changed the title ~~feat(l1): add per-phase timing breakdown to multisync Slack notifications~~ test(l1): add per-phase timing breakdown to multisync Slack notifications Feb 5, 2026

Copilot AI reviewed Feb 5, 2026

github-actions Bot added the L1 Ethereum client label Feb 5, 2026

github-project-automation Bot added this to ethrex_l1 Feb 5, 2026

pablodeymo added the snapsync label Feb 5, 2026

github-actions Bot assigned pablodeymo Feb 5, 2026

github-actions Bot removed the L1 Ethereum client label Feb 5, 2026

Update tooling/sync/docker_monitor.py

7042b83

Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>

pablodeymo mentioned this pull request Feb 5, 2026

docs(l1): snapsync roadmap #6112

Open

1 task

Merge branch 'main' into feature/slack-phase-breakdown

6f12101

ilitteri approved these changes Feb 6, 2026

iovoid approved these changes Feb 6, 2026

Merge branch 'main' into feature/slack-phase-breakdown

216ee7f

ElFantasma approved these changes Feb 6, 2026

github-project-automation Bot moved this to In Review in ethrex_l1 Feb 6, 2026

Merge branch 'main' into feature/slack-phase-breakdown

b44c4ea

pablodeymo enabled auto-merge February 6, 2026 21:56

pablodeymo added this pull request to the merge queue Feb 6, 2026

Merged via the queue into main with commit 9a5b5e6 Feb 6, 2026
51 checks passed

pablodeymo deleted the feature/slack-phase-breakdown branch February 6, 2026 22:52

github-project-automation Bot moved this from In Review to Done in ethrex_l1 Feb 6, 2026

pablodeymo added a commit that referenced this pull request Feb 6, 2026

Add sections 1.9, 1.10, 1.11 to roadmap and mark 1.11 as done (#6136)

1970015

ElFantasma pushed a commit that referenced this pull request Mar 13, 2026

Add sections 1.9, 1.10, 1.11 to roadmap and mark 1.11 as done (#6136)

f488de7

pablodeymo merged 5 commits into main from feature/slack-phase-breakdown
