Skip to content

Conversation

srilman
Copy link
Contributor

@srilman srilman commented Aug 7, 2025

Changes Made

Rather than sending updates per active node to runtime subscribers, we should just send them all at once and let the subscriber decide what to do (whether to iterate or not). This is useful for the Flotilla subscriber because it allows us to send one packet rather than internal batching.

Also includes changes from the logging PR since they should be fixed ASAP and can help reduce that PR to narrow the segfault.

Checklist

  • Documented in API Docs (if applicable)
  • Documented in User Guide (if applicable)
  • If adding a new documentation page, doc is added to docs/mkdocs.yml navigation
  • Documentation builds and is formatted properly (tag @/ccmao1130 for docs review)

@github-actions github-actions bot added the fix label Aug 7, 2025
@@ -32,7 +32,7 @@ impl ProgressBarColor {
Self::Blue => "blue",
Self::Magenta => "magenta",
Self::Cyan => "cyan",
Self::Red => "red",
Self::Yellow => "yellow",
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yellow looks better in the terminal than red :)

@@ -134,6 +135,8 @@ impl RuntimeStatsManager {
let event_loop = async move {
let mut interval = interval(throttle_interval);
let mut active_nodes = HashSet::with_capacity(node_stats.len());
// Reuse container for ticks
let mut snapshot_container = Vec::with_capacity(node_stats.len());
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe premature, but easy change.

Copy link

codecov bot commented Aug 7, 2025

Codecov Report

❌ Patch coverage is 54.41176% with 31 lines in your changes missing coverage. Please review.
⚠️ Please upload report for BASE (main@a25c2f8). Learn more about missing BASE report.
⚠️ Report is 2 commits behind head on main.

Files with missing lines Patch % Lines
...ion/src/runtime_stats/subscribers/opentelemetry.rs 0.00% 15 Missing ⚠️
...ecution/src/runtime_stats/subscribers/dashboard.rs 0.00% 5 Missing ⚠️
...tion/src/runtime_stats/subscribers/progress_bar.rs 58.33% 5 Missing ⚠️
...l-execution/src/runtime_stats/subscribers/debug.rs 0.00% 4 Missing ⚠️
src/daft-local-execution/src/runtime_stats/mod.rs 93.33% 2 Missing ⚠️
Additional details and impacted files

Impacted file tree graph

@@           Coverage Diff           @@
##             main    #4932   +/-   ##
=======================================
  Coverage        ?   77.55%           
=======================================
  Files           ?      906           
  Lines           ?   130215           
  Branches        ?        0           
=======================================
  Hits            ?   100993           
  Misses          ?    29222           
  Partials        ?        0           
Files with missing lines Coverage Δ
src/daft-local-execution/src/run.rs 53.03% <100.00%> (ø)
src/daft-local-execution/src/sources/source.rs 80.83% <100.00%> (ø)
src/daft-local-execution/src/runtime_stats/mod.rs 93.06% <93.33%> (ø)
...l-execution/src/runtime_stats/subscribers/debug.rs 0.00% <0.00%> (ø)
...ecution/src/runtime_stats/subscribers/dashboard.rs 5.26% <0.00%> (ø)
...tion/src/runtime_stats/subscribers/progress_bar.rs 60.00% <58.33%> (ø)
...ion/src/runtime_stats/subscribers/opentelemetry.rs 0.00% <0.00%> (ø)
🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.
  • 📦 JS Bundle Analysis: Save yourself from yourself by tracking and limiting bundle sizes in JS merges.

@srilman srilman marked this pull request as ready for review August 7, 2025 20:34
@srilman srilman requested a review from colin-ho August 7, 2025 20:34
Copy link
Contributor

@greptile-apps greptile-apps bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Greptile Summary

This PR refactors the runtime statistics system in Daft's local execution engine to batch subscriber updates instead of sending individual updates per active node. The core change modifies the RuntimeStatsSubscriber trait interface from handle_event(&self, event: &StatSnapshotSend, node_info: &NodeInfo) to handle_event(&self, events: &[(&NodeInfo, StatSnapshotSend)]), requiring all subscriber implementations to be updated.

The key architectural improvement occurs in RuntimeStatsManager where the periodic interval tick now collects all active node statistics into a reusable snapshot_container vector and sends them as a single batch to each subscriber. This eliminates the previous approach of making N separate subscriber calls (where N is the number of active nodes) in favor of a single batched call.

All subscriber implementations have been updated to handle the new batched interface:

  • DashboardSubscriber: Iterates through batched events and sends each through its internal channel
  • ProgressBarSubscriber: Updates multiple progress bars in a single method call and changes streaming sink node color from Red to Yellow
  • OpenTelemetrySubscriber: Processes multiple events to record metrics with proper attributes
  • DebugSubscriber: Prints all events in the batch for debugging purposes

The PR also includes important cleanup improvements: the finish() method now properly accepts a Runtime parameter to block on task completion during shutdown, ensuring all pending statistics are flushed. Additionally, a bug fix was added to ensure stats_manager.finalize_node() is called when source node send operations fail, preventing memory leaks from incomplete node finalization.

A minor dependency update bumps indicatif from version 0.17 to 0.18, likely to support enhanced progress bar functionality required by the batching changes.

Confidence score: 4/5

  • This PR is safe to merge with low risk as it primarily refactors internal interfaces without changing core functionality
  • Score reflects well-structured changes with consistent updates across all subscriber implementations, though the breaking API change requires careful coordination
  • Pay close attention to the RuntimeStatsManager finish method changes and subscriber interface updates to ensure proper shutdown behavior

9 files reviewed, no comments

Edit Code Review Bot Settings | Greptile

@srilman srilman merged commit 8f20ed5 into main Aug 7, 2025
50 checks passed
@srilman srilman deleted the slade/batched-subscriber-updates branch August 7, 2025 21:28
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants