[SPARK-44018][SQL] Improve the hashCode and toString for some DS V2 Expression #41543
ping @cloud-fan cc @asiunov
Thanks for fixing this! I'm not very familiar with how these classes work and are used, but I have a few suggestions based on how `equals`/`hashCode` are usually implemented.
Suggested change, from:
`return Objects.hash(name) * 31 + Arrays.hashCode(children);`
to:
`return Objects.hash(name, isDistinct) * 31 + Arrays.hashCode(children);`
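The reason these suggestions hash `children` with `Arrays.hashCode` instead of passing the array to `Objects.hash` can be seen in a small standalone sketch. It assumes a `String` name and a `String[]` standing in for the `children` array (the real Spark classes hold `Expression[]`); the class and method names are hypothetical.

```java
import java.util.Arrays;
import java.util.Objects;

// Hypothetical demo class: compares two ways of hashing a name plus a children array.
public class ChildrenHashDemo {

  // The array is passed as a single vararg element, so Objects.hash calls
  // children.hashCode(), which is the identity hash of the array object.
  static int referenceBasedHash(String name, String[] children) {
    return Objects.hash(name, children);
  }

  // Arrays.hashCode combines the hash codes of the elements, so arrays
  // with equal contents produce equal hashes.
  static int contentBasedHash(String name, String[] children) {
    return Objects.hash(name) * 31 + Arrays.hashCode(children);
  }

  public static void main(String[] args) {
    String[] a = {"col1", "col2"};
    String[] b = {"col1", "col2"}; // same contents, different array object

    System.out.println("content-based equal: "
        + (contentBasedHash("ABS", a) == contentBasedHash("ABS", b)));
    System.out.println("reference-based equal: "
        + (referenceBasedHash("ABS", a) == referenceBasedHash("ABS", b)));
  }
}
```

The first line prints `content-based equal: true`. The second line will almost always print `false` because `a` and `b` are distinct objects, although this is not guaranteed, since identity hashes can collide.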
Suggested change, from:
`return Objects.hash(name, canonicalName) * 31 + Arrays.hashCode(children);`
to:
`return Objects.hash(name, canonicalName, isDistinct) * 31 + Arrays.hashCode(children);`
Suggested change (adds `isDistinct == that.isDistinct`), from:
`return Objects.equals(name, that.name) && Objects.equals(canonicalName, that.canonicalName) &&`
`    Arrays.equals(children, that.children);`
to:
`return isDistinct == that.isDistinct && Objects.equals(name, that.name) &&`
`    Objects.equals(canonicalName, that.canonicalName) && Arrays.equals(children, that.children);`
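A minimal sketch of an `equals`/`hashCode` pair that covers the same fields as the suggestions above, so that equal objects always have equal hash codes. `AggregateFuncSketch` is a hypothetical stand-in, not the Spark class; it assumes `String` children instead of `Expression` children.

```java
import java.util.Arrays;
import java.util.Objects;

// Hypothetical stand-in for an aggregate function expression.
public final class AggregateFuncSketch {
  private final String name;
  private final String canonicalName;
  private final boolean isDistinct;
  private final String[] children;

  public AggregateFuncSketch(String name, String canonicalName, boolean isDistinct, String[] children) {
    this.name = name;
    this.canonicalName = canonicalName;
    this.isDistinct = isDistinct;
    this.children = children;
  }

  @Override
  public boolean equals(Object o) {
    if (this == o) return true;
    if (o == null || getClass() != o.getClass()) return false;
    AggregateFuncSketch that = (AggregateFuncSketch) o;
    // Cheap primitive check first, then the strings, then an element-wise array comparison.
    return isDistinct == that.isDistinct && Objects.equals(name, that.name) &&
        Objects.equals(canonicalName, that.canonicalName) && Arrays.equals(children, that.children);
  }

  @Override
  public int hashCode() {
    // Hashes exactly the fields compared in equals(), with children hashed by content.
    return Objects.hash(name, canonicalName, isDistinct) * 31 + Arrays.hashCode(children);
  }
}
```

With this pair, two instances that differ only in `isDistinct` compare unequal, while instances with equal fields and equal-content `children` arrays are equal and share a hash code, as the `Object.equals`/`Object.hashCode` contract requires.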
A primitive `boolean` comparison is faster than `Objects.equals` (which boxes both values to `Boolean`). It is also usually placed at the beginning, because it is cheaper to check than strings or other objects.
Suggested change, from:
`return Objects.equals(name, that.name) && Objects.equals(isDistinct, that.isDistinct) &&`
to:
`return isDistinct == that.isDistinct && Objects.equals(name, that.name) &&`
Yeah, a primitive `boolean` comparison is faster than `Objects.equals`. But putting the `boolean` comparison first is not good for readability.
A question for my own education: what's the benefit of a good hash in cases like these?
Personally, I think we shouldn't base a hash code on the reference of any Java object.
ping @cloud-fan cc @huaxingao
@pan3793, in this case I built Spark using Bazel with a linter enabled, and it reported an error on this `hashCode` implementation, saying that … There is a conceptual issue here: I don't remember all the details, but I remember there was a good explanation of designing …
@asiunov thanks for the explanation. I know the principle, but I can't find anywhere these classes are used as hash keys. That's why I wrote "in such cases" in my question.
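As a general illustration of the principle being discussed (whether Spark actually uses these expressions as hash keys is left open in this thread), a hash-based collection only finds an equal key when `hashCode()` agrees with `equals()`. The sketch below assumes a hypothetical `Expr` class with `String` children.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Objects;
import java.util.Set;

public class HashKeyDemo {

  // Hypothetical expression node: equals() and hashCode() both use the
  // contents of children, so the equals/hashCode contract holds.
  static final class Expr {
    final String name;
    final String[] children;

    Expr(String name, String... children) {
      this.name = name;
      this.children = children;
    }

    @Override
    public boolean equals(Object o) {
      if (this == o) return true;
      if (!(o instanceof Expr)) return false;
      Expr that = (Expr) o;
      return Objects.equals(name, that.name) && Arrays.equals(children, that.children);
    }

    @Override
    public int hashCode() {
      return Objects.hash(name) * 31 + Arrays.hashCode(children);
    }
  }

  // Collects expressions into a HashSet, which keeps one copy of each equal expression.
  static Set<Expr> dedupe(Expr... exprs) {
    return new HashSet<>(Arrays.asList(exprs));
  }

  public static void main(String[] args) {
    Set<Expr> unique = dedupe(new Expr("ABS", "c1"), new Expr("ABS", "c1"), new Expr("ABS", "c2"));
    System.out.println(unique.size()); // 2
    System.out.println(unique.contains(new Expr("ABS", "c1"))); // true
  }
}
```

If `hashCode()` were `Objects.hash(name, children)` instead, the two `ABS(c1)` instances would almost always get different hash codes, so the set would typically keep all three and `contains` would typically return `false`, even though `equals` reports them as equal.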
Oh, I see. Sorry, I do not know the details of how these classes work. I reported this issue mostly because I got linter errors, and as a good-to-fix for any future use.
Hi @beliefer, one question: if a public `hashCode` method is added to `UserDefinedAggregateFunc`, does that make this a user-facing change?
Yes. But if it's incorrect, we should fix it.
The "Does this PR introduce any user-facing change?" section above says No. Since we are adding two new user-facing methods, I wanted to confirm.
It overrides two methods, so it's not an API change but rather a bug fix.
Thanks, merging to master/3.4!
…xpression
Closes #41543 from beliefer/SPARK-44018.
Authored-by: Jiaan Geng <[email protected]>
Signed-off-by: Wenchen Fan <[email protected]>
(cherry picked from commit 8c84d2c)
Signed-off-by: Wenchen Fan <[email protected]>
@cloud-fan @asiunov @pan3793 @VindhyaG Thank you!
What changes were proposed in this pull request?
The `hashCode()` of `UserDefinedScalarFunc` and `GeneralScalarExpression` is not good enough. For example, `GeneralScalarExpression` uses `Objects.hash(name, children)`, which combines the hash code of `name` with the hash code of the `children` array reference, so two expressions with equal children can get different hash codes. Instead, the hash code should be computed from each element in `children`.
Because `UserDefinedAggregateFunc` and `GeneralAggregateFunc` are missing `hashCode()`, this PR also adds it to both.
This PR also improves `equals()` for `UserDefinedAggregateFunc` and `GeneralAggregateFunc` by using a primitive `boolean` comparison instead of `Objects.equals`, because the primitive comparison performs better.
Why are the changes needed?
Improve the hash code of some DS V2 expressions.
Does this PR introduce any user-facing change?
Yes.
How was this patch tested?
N/A