fix: Change count metric from signed to unsigned (int64_t -> uint64_t)#15536
fix: Change count metric from signed to unsigned (int64_t -> uint64_t)#15536rui-mo wants to merge 1 commit intofacebookincubator:mainfrom
Conversation
✅ Deploy Preview for meta-velox canceled.
|
|
@rui-mo I see a test failing @tanjialiang, @xiaoxmeng Do you have any thoughts on this PR? |
|
Thanks @majetideepak. The test should be fixed now. |
tanjialiang
left a comment
There was a problem hiding this comment.
Thanks @rui-mo for the improvements. Could you make sure the types aligns also for the calling methods and the counters' assignings? Unexpected overflows might happen if not well aligned for this type of downcast.
| int64_t expectedMin = std::numeric_limits<uint64_t>::max(), | ||
| int64_t expectedMax = std::numeric_limits<uint64_t>::min()) { |
There was a problem hiding this comment.
assigning a uint64 max to int64
There was a problem hiding this comment.
Thanks for catching this. I double-checked those to ensure unexpected overflow would not occur.
573291a to
f5ddb61
Compare
|
It just came to my mind: Should we keep value signed? When Unit is kNone, we might expect legit negative value metrics such as delta type of metrics. counts should be non-negative this should be strictly true. |
|
@tanjialiang If it’s necessary to keep the value as |
For most cases, we should be good. But yeah for the ones we already know that is doing a uint64 to int64 assignment we can add impose range checks. (and it should be okay if we miss some. most cases they won't overflow) |
|
@tanjialiang I’ve updated the PR based on your suggestion. |
karthikeyann
left a comment
There was a problem hiding this comment.
approving for velox-cudf changes.
|
Hi @tanjialiang, could you please take another look at your convenience? Thanks. |
|
Hi @Yuhta @tanjialiang, this PR fixes the overflow issue after propagating IO stats to runtime metrics, see #15408 (comment). Could you please take further review? Thanks. |
| @@ -1159,7 +1159,7 @@ CudfVectorPtr CudfHashAggregation::releaseAndResetPartialOutput() { | |||
| std::string(exec::HashAggregation::kFlushTimes), RuntimeCounter(1)); | |||
| lockedStats->addRuntimeStat( | |||
| std::string(exec::HashAggregation::kPartialAggregationPct), | |||
| RuntimeCounter(aggregationPct)); | |||
| RuntimeCounter(static_cast<int64_t>(aggregationPct))); | |||
There was a problem hiding this comment.
Should this one be saturate cast as well?
There was a problem hiding this comment.
I assumed aggregationPct would typically be <= 100, so I used static_cast. I've updated it to use a saturate cast to prevent any potential risk. Thanks.
2e4cbb2 to
0879e2e
Compare
|
@peterenescu has imported this pull request. If you are a Meta employee, you can view this in D95451420. |
Summary: Use an unsigned type for the count metric while keeping int64_t for value metrics. When the unit is kNone, negative values can be valid such as for delta- type metrics, so int64_t remains appropriate. In contrast, `count` should always be non-negative. This PR also addresses potential overflow when converting unsigned metrics to `RuntimeMetrics`. X-link: facebookincubator/velox#15536 Reviewed By: Yuhta Differential Revision: D95451420 Pulled By: peterenescu
|
Hi @rui-mo you will also need to make a small change to presto prestodb/presto#27276. We will probably need to:
|
|
@peterenescu Thanks for your help. I understand - that’s because RuntimeMetric in Presto also uses int64_t for count. |
|
@peterenescu Could you please let me know what I need to do to "add compatibility layer" in Velox? |
|
@rui-mo Maybe we change the presto side first to be |
|
Hi @peterenescu @Yuhta, based on the review comment from prestodb/presto#27295 (comment), it seems that we could merge this PR into Velox first and apply the corresponding changes in Presto afterward. I would appreciate your thoughts on this approach. Thank you! |
|
@rui-mo The merge into Meta internal codebase is blocked unless the Presto side is fixed first. If the changed code on Presto side works with both before & after with Velox, we should change Presto first; otherwise we may need to use |
## Description <!---Describe your changes in detail--> Presto's RuntimeMetric uses int64_t for count, but Velox's RuntimeMetric uses uint64_t. To avoid overflow, we cap the count at int64_t's max value. ## Motivation and Context <!---Why is this change required? What problem does it solve?--> <!---If it fixes an open issue, please link to the issue here.--> A refactor to change the count metric from int64_t to uint64_t is going on in Velox. To avoid overflow when converting to Presto metric, this PR updates Presto count metric to consistent type. facebookincubator/velox#15536 ``` == NO RELEASE NOTE == ```
…7295) ## Description <!---Describe your changes in detail--> Presto's RuntimeMetric uses int64_t for count, but Velox's RuntimeMetric uses uint64_t. To avoid overflow, we cap the count at int64_t's max value. ## Motivation and Context <!---Why is this change required? What problem does it solve?--> <!---If it fixes an open issue, please link to the issue here.--> A refactor to change the count metric from int64_t to uint64_t is going on in Velox. To avoid overflow when converting to Presto metric, this PR updates Presto count metric to consistent type. facebookincubator/velox#15536 ``` == NO RELEASE NOTE == ```
Use an unsigned type for the count metric while keeping int64_t for value
metrics. When the unit is kNone, negative values can be valid such as for delta-
type metrics, so int64_t remains appropriate. In contrast,
countshouldalways be non-negative. This PR also addresses potential overflow when
converting unsigned metrics to
RuntimeMetrics.