fix: Change count metric from signed to unsigned (int64_t -> uint64_t) #15536

Closed
rui-mo wants to merge 1 commit into facebookincubator:main from rui-mo:wip_uint64

@rui-mo
Collaborator

@rui-mo rui-mo commented Nov 18, 2025

Use an unsigned type for the count metric while keeping int64_t for value
metrics. When the unit is kNone, negative values can be valid, for example for
delta-type metrics, so int64_t remains appropriate. In contrast, count should
always be non-negative. This PR also addresses potential overflow when
converting unsigned metrics to RuntimeMetrics.
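
A minimal sketch of the signed/unsigned split described above, assuming a simplified metric struct. `MetricSketch` and `recordDeltas` are hypothetical names used only for illustration; this is not Velox's actual RuntimeMetric definition.

```cpp
#include <cstdint>
#include <limits>

// Hypothetical, simplified stand-in for a runtime metric (assumption, not
// the Velox class): values stay signed, the count is unsigned.
struct MetricSketch {
  int64_t sum{0};
  uint64_t count{0}; // number of recorded values, can never be negative
  int64_t min{std::numeric_limits<int64_t>::max()};
  int64_t max{std::numeric_limits<int64_t>::min()};

  // Records one value; negative values are allowed (e.g. delta metrics).
  constexpr void addValue(int64_t value) {
    sum += value;
    ++count;
    if (value < min) {
      min = value;
    }
    if (value > max) {
      max = value;
    }
  }
};

// Records a negative and a positive delta to show both kinds are accepted.
constexpr MetricSketch recordDeltas() {
  MetricSketch metric{};
  metric.addValue(-5);
  metric.addValue(3);
  return metric;
}
```

Keeping `sum`, `min`, and `max` signed lets a delta such as -5 be recorded as-is, while `count` only ever moves upward.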

@netlify

netlify bot commented Nov 18, 2025

Deploy Preview for meta-velox canceled.

Name Link
🔨 Latest commit 1bfaa5e
🔍 Latest deploy log https://app.netlify.com/projects/meta-velox/deploys/69b08f5881004e0008c51819

@meta-cla meta-cla bot added the CLA Signed label Nov 18, 2025
@rui-mo rui-mo changed the title from "fix: Use uint64_t in the RuntimeMetric" to "fix: Change RuntimeMetrics counters from signed to unsigned (int64_t -> uint64_t)" Nov 18, 2025
@majetideepak
Collaborator

@rui-mo I see a failing test: [ FAILED ] RuntimeMetricsTest.basic (0 ms).

@tanjialiang, @xiaoxmeng Do you have any thoughts on this PR?

@rui-mo
Collaborator Author

rui-mo commented Nov 18, 2025

Thanks @majetideepak. The test should be fixed now.

Contributor

@tanjialiang tanjialiang left a comment

Thanks @rui-mo for the improvements. Could you make sure the types also align in the calling methods and in the counter assignments? Unexpected overflows might happen if they are not well aligned for this type of conversion.

Comment on lines +28 to +29
int64_t expectedMin = std::numeric_limits<uint64_t>::max(),
int64_t expectedMax = std::numeric_limits<uint64_t>::min()) {
Contributor

This assigns the uint64_t max to an int64_t.

Collaborator Author

Thanks for catching this. I double-checked those to ensure unexpected overflow would not occur.
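
To illustrate why the flagged defaults are a problem, here is a small sketch under the assumption that the sentinel seeds a running minimum. `kWrappedMinSentinel`, `kMatchingMinSentinel`, and `foldMin` are hypothetical names, not code from the PR.

```cpp
#include <cstdint>
#include <limits>

// What the flagged default becomes: uint64_t max converted to int64_t.
// On two's-complement targets (and by definition since C++20) this is -1.
constexpr int64_t kWrappedMinSentinel =
    static_cast<int64_t>(std::numeric_limits<uint64_t>::max());

// A sentinel whose type matches the parameter it initializes.
constexpr int64_t kMatchingMinSentinel = std::numeric_limits<int64_t>::max();

// Hypothetical helper: folds one observed value into a running minimum.
constexpr int64_t foldMin(int64_t runningMin, int64_t observed) {
  return observed < runningMin ? observed : runningMin;
}
```

With the wrapped sentinel, an observed value of 10 never replaces the seed, so the reported minimum stays at -1; with the matching sentinel it becomes 10.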

@tanjialiang
Contributor

It just occurred to me: should we keep value signed? When the unit is kNone, we might expect legitimately negative value metrics, such as delta-type metrics.

That counts should be non-negative should be strictly true.

@rui-mo
Collaborator Author

rui-mo commented Dec 3, 2025

@tanjialiang If it’s necessary to keep the value as int64_t, I’m OK with doing so. In that case, we’ll need special handling for range checks when passing uint64_t metrics to RuntimeMetrics. For example, IoCounter uses uint64_t, which could overflow an int64_t.

@tanjialiang
Contributor

tanjialiang commented Dec 4, 2025

> @tanjialiang If it’s necessary to keep the value as int64_t, I’m OK with doing so. In that case, we’ll need special handling for range checks when passing uint64_t metrics to RuntimeMetrics. For example, IoCounter uses uint64_t, which could overflow an int64_t.

For most cases, we should be good. But yes, for the ones we already know are doing a uint64_t to int64_t assignment, we can impose range checks. (It should be okay if we miss some; in most cases they won't overflow.)
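
One possible shape for such a range check, as a sketch only. `checkedToInt64` is a hypothetical helper name, and the thread does not show how the PR actually implements its checks.

```cpp
#include <cstdint>
#include <limits>
#include <optional>

// Hypothetical helper (assumption): returns the value as int64_t when it
// fits, or std::nullopt when it exceeds int64_t's maximum.
constexpr std::optional<int64_t> checkedToInt64(uint64_t value) {
  constexpr uint64_t kInt64Max =
      static_cast<uint64_t>(std::numeric_limits<int64_t>::max());
  if (value > kInt64Max) {
    return std::nullopt;
  }
  return static_cast<int64_t>(value);
}
```

A caller holding a uint64_t counter could then decide per call site whether an out-of-range value should be clamped, logged, or rejected.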

@rui-mo rui-mo changed the title from "fix: Change RuntimeMetrics counters from signed to unsigned (int64_t -> uint64_t)" to "fix: Change RuntimeMetrics count from signed to unsigned (int64_t -> uint64_t)" Dec 10, 2025
@rui-mo rui-mo changed the title from "fix: Change RuntimeMetrics count from signed to unsigned (int64_t -> uint64_t)" to "fix: Change count metric from signed to unsigned (int64_t -> uint64_t)" Dec 10, 2025
@rui-mo
Collaborator Author

rui-mo commented Dec 10, 2025

@tanjialiang I’ve updated the PR based on your suggestion. uint64_t is now used only for count, and additional range checks have been added to prevent potential overflow. Please take another look when you have a moment. Thanks!

Collaborator

@karthikeyann karthikeyann left a comment

approving for velox-cudf changes.

@rui-mo
Collaborator Author

rui-mo commented Feb 3, 2026

Hi @tanjialiang, could you please take another look at your convenience? Thanks.

@rui-mo
Collaborator Author

rui-mo commented Mar 2, 2026

Hi @Yuhta @tanjialiang, this PR fixes the overflow issue that appeared after propagating IO stats to runtime metrics; see #15408 (comment). Could you please take another look? Thanks.

@@ -1159,7 +1159,7 @@ CudfVectorPtr CudfHashAggregation::releaseAndResetPartialOutput() {
 std::string(exec::HashAggregation::kFlushTimes), RuntimeCounter(1));
 lockedStats->addRuntimeStat(
 std::string(exec::HashAggregation::kPartialAggregationPct),
-RuntimeCounter(aggregationPct));
+RuntimeCounter(static_cast<int64_t>(aggregationPct)));
Contributor

Should this one be saturate cast as well?

Collaborator Author

I assumed aggregationPct would typically be <= 100, so I used static_cast. I've updated it to use a saturate cast to prevent any potential risk. Thanks.
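
For reference, a saturating cast for this case could look like the sketch below. It assumes aggregationPct is a floating-point percentage; `saturateDoubleToInt64` is a hypothetical name, and the thread does not show which saturating-cast utility the PR actually uses.

```cpp
#include <cstdint>
#include <limits>

// Hypothetical helper (assumption): converts a double to int64_t, clamping
// out-of-range inputs instead of invoking undefined behavior.
constexpr int64_t saturateDoubleToInt64(double value) {
  // 2^63 is exactly representable as a double; int64_t max (2^63 - 1) is not.
  constexpr double kTwoPow63 = 9223372036854775808.0;
  if (value != value) {
    return 0; // NaN has no meaningful integer value; 0 is an arbitrary choice
  }
  if (value >= kTwoPow63) {
    return std::numeric_limits<int64_t>::max();
  }
  if (value <= -kTwoPow63) {
    return std::numeric_limits<int64_t>::min();
  }
  return static_cast<int64_t>(value); // in range: truncates toward zero
}
```

Unlike a plain static_cast, this stays well-defined even if the percentage is unexpectedly huge or not a number.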

@rui-mo rui-mo force-pushed the wip_uint64 branch 4 times, most recently from 2e4cbb2 to 0879e2e March 5, 2026 15:10
@Yuhta Yuhta added the ready-to-merge label Mar 5, 2026
@meta-codesync

meta-codesync bot commented Mar 5, 2026

@peterenescu has imported this pull request. If you are a Meta employee, you can view this in D95451420.

peterenescu pushed a commit to peterenescu/presto that referenced this pull request Mar 6, 2026
Summary:
Use an unsigned type for the count metric while keeping int64_t for value
metrics. When the unit is kNone, negative values can be valid, for example for
delta-type metrics, so int64_t remains appropriate. In contrast, `count` should
always be non-negative. This PR also addresses potential overflow when
converting unsigned metrics to `RuntimeMetrics`.

X-link: facebookincubator/velox#15536

Reviewed By: Yuhta

Differential Revision: D95451420

Pulled By: peterenescu
@peterenescu
Contributor

Hi @rui-mo, you will also need to make a small change to Presto: prestodb/presto#27276. We will probably need to:

  1. Add a compatibility layer
  2. Merge the Velox code
  3. Update the Presto side
  4. Remove the compatibility layer in Velox

cc: @Yuhta

@peterenescu peterenescu removed the ready-to-merge label Mar 6, 2026
@rui-mo
Collaborator Author

rui-mo commented Mar 6, 2026

@peterenescu Thanks for your help. I understand - that’s because RuntimeMetric in Presto also uses int64_t for count.

namespace facebook::presto::protocol {
struct RuntimeMetric {
  String name = {};
  RuntimeUnit unit = {};
  int64_t sum = {};
  int64_t count = {};
  int64_t max = {};
  int64_t min = {};
};
} // namespace facebook::presto::protocol
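
Because the protocol struct keeps count as int64_t, any conversion from an unsigned count has to decide what happens above int64_t's maximum. A minimal sketch that caps the value follows; `VeloxCountSketch`, `ProtocolCountSketch`, and `toProtocol` are hypothetical stand-ins that model only the count field, not the real Velox or Presto types.

```cpp
#include <cstdint>
#include <limits>

// Hypothetical stand-ins for the two sides (assumption): only the count
// field is modeled here.
struct VeloxCountSketch {
  uint64_t count{0};
};

struct ProtocolCountSketch {
  int64_t count{0};
};

// Converts an unsigned count to the signed protocol count, capping it at
// int64_t's maximum instead of letting it wrap to a negative number.
constexpr ProtocolCountSketch toProtocol(const VeloxCountSketch& in) {
  constexpr uint64_t kCap =
      static_cast<uint64_t>(std::numeric_limits<int64_t>::max());
  ProtocolCountSketch out{};
  out.count = static_cast<int64_t>(in.count < kCap ? in.count : kCap);
  return out;
}
```

Capping loses precision only for counts above 2^63 - 1, which should be far outside any realistic metric.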

@rui-mo
Collaborator Author

rui-mo commented Mar 6, 2026

@peterenescu Could you please let me know what I need to do to "add compatibility layer" in Velox?

@Yuhta
Contributor

Yuhta commented Mar 6, 2026

@rui-mo Maybe we change the Presto side to uint64_t first? Would this error go away?

@rui-mo
Collaborator Author

rui-mo commented Mar 10, 2026

Hi @peterenescu @Yuhta, based on the review comment from prestodb/presto#27295 (comment), it seems that we could merge this PR into Velox first and apply the corresponding changes in Presto afterward. I would appreciate your thoughts on this approach. Thank you!

@Yuhta
Contributor

Yuhta commented Mar 10, 2026

@rui-mo The merge into the Meta internal codebase is blocked unless the Presto side is fixed first. If the changed code on the Presto side works with Velox both before and after this change, we should change Presto first; otherwise we may need to use VELOX_ENABLE_BACKWARD_COMPATIBILITY.
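
Purely as an illustration of what a macro-guarded compatibility layer could look like: the sketch below switches the count type on VELOX_ENABLE_BACKWARD_COMPATIBILITY. `MetricCountSketch` and `RuntimeMetricSketch` are hypothetical names, and the thread does not show where or how Velox would actually place such a guard.

```cpp
#include <cstdint>
#include <type_traits>

// Hypothetical alias (assumption): picks the legacy or the new count type
// depending on whether the compatibility macro is defined at build time.
#ifdef VELOX_ENABLE_BACKWARD_COMPATIBILITY
using MetricCountSketch = int64_t; // legacy signed count for unmigrated callers
#else
using MetricCountSketch = uint64_t; // new unsigned count
#endif

// Hypothetical, reduced metric struct that uses the alias.
struct RuntimeMetricSketch {
  int64_t sum{0};
  MetricCountSketch count{0};
};
```

Downstream code that still expects a signed count would build with the macro defined until it is migrated, after which the guard can be removed.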

@rui-mo rui-mo closed this by deleting the head repository Mar 30, 2026
aditi-pandit pushed a commit to prestodb/presto that referenced this pull request Mar 31, 2026
## Description

Presto's RuntimeMetric uses int64_t for count, but Velox's RuntimeMetric
uses uint64_t. To avoid overflow, we cap the count at int64_t's max
value.

## Motivation and Context

A refactor to change the count metric from int64_t to uint64_t is in
progress in Velox. To avoid overflow when converting to the Presto metric,
this PR updates the Presto count metric to a consistent type.
facebookincubator/velox#15536

```
== NO RELEASE NOTE ==
```
bibith4 pushed a commit to bibith4/presto that referenced this pull request Apr 1, 2026
…7295)

Labels

CLA Signed This label is managed by the Facebook bot. Authors need to sign the CLA before a PR can be reviewed.

7 participants