[SPARK-51756][CORE][FOLLOWUP] Avoid the risk of overflow of long #52776
Conversation
@beliefer The aggregated checksum might overflow and become negative, but is that really a problem? If a negative checksum value causes a problem, should we have a test for the problematic case?
I don't think a negative checksum is a problem; we would just lose one bit of the checksum range with this change.
Does the lost bit cause any unexpected issues?
It makes the quality of the checksum worse.
I'm worried that some bits may be lost, which could actually affect the reliability of the checksum comparison.
Checksum computation is always about losing bits 😄; the fewer we lose, the better the checksum quality.
  def getAggregatedChecksumValue(rowBasedChecksums: Array[RowBasedChecksum]): Long = {
    Option(rowBasedChecksums)
-     .map(_.foldLeft(0L)((acc, c) => acc * 31L + c.getValue))
+     .map(_.foldLeft(0L)((acc, c) => (acc * 31L + c.getValue) & Long.MaxValue))
How does & Long.MaxValue help here? The current code follows a very common pattern, and many hash codes are calculated the same way.
I'm not sure. Let me close this PR and reopen it if an actual problem shows up in the future.
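The trade-off discussed in this thread can be sketched as follows. This is a minimal illustration, not Spark's code: the object name ChecksumSketch and both helper functions are hypothetical, and plain Long values stand in for RowBasedChecksum.getValue. It relies only on the fact that Long arithmetic on the JVM wraps around silently on overflow, and that & Long.MaxValue clears the sign bit.

```scala
// Hypothetical stand-in for the aggregation in the diff above.
// Plain Long values replace RowBasedChecksum.getValue (an assumption for illustration).
object ChecksumSketch {
  // Existing approach: polynomial fold, overflow wraps around modulo 2^64.
  def aggregate(values: Seq[Long]): Long =
    values.foldLeft(0L)((acc, c) => acc * 31L + c)

  // Proposed approach: same fold, but the sign bit is cleared at every step,
  // so the result is always non-negative and only 63 bits remain usable.
  def aggregateMasked(values: Seq[Long]): Long =
    values.foldLeft(0L)((acc, c) => (acc * 31L + c) & Long.MaxValue)
}
```

With these definitions, ChecksumSketch.aggregate(Seq(1L << 62, 0L)) wraps to -(1L << 62), while the masked variant yields 1L << 62. The mask does not prevent the wraparound; it only discards the top bit of the result. As a consequence, Seq(-1L) and Seq(Long.MaxValue) produce different values under aggregate but the same value under aggregateMasked, which is the "lost bit" mentioned above.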
What changes were proposed in this pull request?
This PR proposes to avoid the risk of long overflow in getAggregatedChecksumValue.
This PR follows up #50230.
Why are the changes needed?
I guess rowBasedChecksums can be very large and each row-based checksum can be a large value, so there is a risk of long overflow.
Does this PR introduce any user-facing change?
'No'.
New feature.
How was this patch tested?
GA tests.
Was this patch authored or co-authored using generative AI tooling?
'No'.