Skip to content

misc: Propagate full IO stats (count/min/max) to runtime metrics#15408

Closed
rui-mo wants to merge 1 commit intofacebookincubator:mainfrom
rui-mo:wip_runtime
Closed

misc: Propagate full IO stats (count/min/max) to runtime metrics#15408
rui-mo wants to merge 1 commit intofacebookincubator:mainfrom
rui-mo:wip_runtime

Conversation

@rui-mo
Copy link
Copy Markdown
Collaborator

@rui-mo rui-mo commented Nov 5, 2025

This PR enhances the propagation of IO statistics into RuntimeMetric by
including not just the sum, but also count, min, and max values from the IO
layer. This affects the IO-related metrics such as:prefetchBytes,
ioWaitWallNanos, storageReadBytes, localReadBytes, ramReadBytes.
Previously, these Runtime metrics had a default count of 1, and min/max were
equal to sum because no detailed IO stats were passed down. After this change,
count/min/max now reflect real observed values from the IO layer.

The metrics maxSingleIoWaitWallNanos and numStorageRead are removed
because their functionality is represented by the ioWaitWallNanos.max and
storageReadBytes.count respectively.

Users relying on these metrics for historical analysis, monitoring, or
dashboards should be aware of this impact.

@rui-mo rui-mo requested a review from majetideepak as a code owner November 5, 2025 07:50
@netlify
Copy link
Copy Markdown

netlify bot commented Nov 5, 2025

Deploy Preview for meta-velox canceled.

Name Link
🔨 Latest commit 1ec718b
🔍 Latest deploy log https://app.netlify.com/projects/meta-velox/deploys/690c8d1d36ed310008692843

@meta-cla meta-cla bot added the CLA Signed This label is managed by the Facebook bot. Authors need to sign the CLA before a PR can be reviewed. label Nov 5, 2025
Copy link
Copy Markdown
Collaborator

@jinchengchenghh jinchengchenghh left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could update the PR description?

Copy link
Copy Markdown
Contributor

@Yuhta Yuhta left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The change itself makes sense, the only caveat is it changes the semantics of these stats drastically (so that all the stats other than sum are not comparable before & after this change). This is something worthing noting in the release note.

@Yuhta Yuhta added the ready-to-merge PR that have been reviewed and are ready for merging. PRs with this tag notify the Velox Meta oncall label Nov 5, 2025
@meta-codesync
Copy link
Copy Markdown

meta-codesync bot commented Nov 5, 2025

@peterenescu has imported this pull request. If you are a Meta employee, you can view this in D86311108.

@Yuhta
Copy link
Copy Markdown
Contributor

Yuhta commented Nov 5, 2025

@rui-mo The test failures are relevant, can you double check and fix?

@peterenescu
Copy link
Copy Markdown
Contributor

To be specific, there are four velox exec tests failing:
PrintPlanWithStatsTest (innerJoinWithTableScan, partialAggregateWithTableScan, tableWriterWithTableScan)
TableScanTest (statsBasedFilterReorderDisabled)

@peterenescu peterenescu removed the ready-to-merge PR that have been reviewed and are ready for merging. PRs with this tag notify the Velox Meta oncall label Nov 5, 2025
@FelixYBW
Copy link
Copy Markdown

FelixYBW commented Nov 6, 2025

With this change, we may remove maxSingleIoWaitWallNanos, numStorageRead. They are shown in the ioWaitWallNanos.max and storageReadBytes.count

@rui-mo rui-mo changed the title misc: Pass count/min/max from IO stats to runtime metric misc: Propagate full IO stats (count/min/max) to runtime metrics Nov 6, 2025
@rui-mo
Copy link
Copy Markdown
Collaborator Author

rui-mo commented Nov 6, 2025

Hi everyone, thanks for the suggestions! I've updated the PR description and resolved the unit test failure. Could you please review it again?

@meta-codesync
Copy link
Copy Markdown

meta-codesync bot commented Nov 6, 2025

@peterenescu merged this pull request in a0d77b3.

@majetideepak
Copy link
Copy Markdown
Collaborator

This PR is causing the values to overflow since IoCounter uses uint64_t and RuntimeMetric uses int64_t.
@rui-mo will you be able to help with a fix?

Specifically due to min_ default in the IoCounter.

std::atomic<uint64_t> min_{std::numeric_limits<uint64_t>::max()};
"TableScan.0.ioWaitWallNanos": {
                     "name": "TableScan.0.ioWaitWallNanos",
                     "unit": "NANO",
                     "sum": 0,
                     "count": 0,
                     "max": 0,
                     "min": -1000
                 }

See prestodb/presto#26588 (comment)

@rui-mo
Copy link
Copy Markdown
Collaborator Author

rui-mo commented Nov 18, 2025

@majetideepak Thanks for pointing this out. I opened a PR that changes RuntimeMetrics counters from signed to unsigned (int64_t -> uint64_t) #15536. Let me know your thoughts, thanks.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

CLA Signed This label is managed by the Facebook bot. Authors need to sign the CLA before a PR can be reviewed. Merged

Projects

None yet

Development

Successfully merging this pull request may close these issues.

7 participants