[SPARK-25416][SQL] ArrayPosition function may return incorrect result when right expression is implicitly down casted #22407
… right expression is implicitly down casted.
Test build #96008 has finished for PR 22407 at commit

retest this please

Test build #96014 has finished for PR 22407 at commit

cc @ueshin
Good catch!
I'm now wondering about other expressions.
It seems ArrayContains, ArrayRemove, and ElementAt for map types could have the same problem.
Can you confirm and fix them? Or we can address them in separate PRs. Thanks!
    case TypeCheckResult.TypeCheckSuccess =>
      (left.dataType, right.dataType) match {
        case (ArrayType(e1, _), e2) if e1.sameType(e2) =>
          TypeUtils.checkForOrderingExpr(right.dataType, s"function $prettyName")
nit: we can use e1 or e2 instead of right.dataType?
@ueshin Sure
    checkAnswer(
      df.selectExpr("array_position(array(1.23D), 1)"),
      Seq(Row(0L), Row(0L))
    )
What about array_position(array(1.0D), 1)?
@ueshin Sure, I will add this case.

@dilipbiswal Thanks! I'll take a look at it.

Test build #96024 has finished for PR 22407 at commit

retest this please.

Test build #96028 has finished for PR 22407 at commit

Test build #96027 has finished for PR 22407 at commit

@ueshin Wenchen thought it may be risky to backport the fix to tightestCommonType. Given this, can this be looked at now?

Jenkins, retest this please.

LGTM, pending Jenkins.

Test build #96421 has finished for PR 22407 at commit

    )

    checkAnswer(
      df.selectExpr("array_position(array(1), 1.23D)"),
same problem here. The test doesn't read any column from df, so we should use OneRowRelation.

LGTM except one comment

Test build #96475 has finished for PR 22407 at commit

… when right expression is implicitly down casted

## What changes were proposed in this pull request?
In ArrayPosition, we currently cast the right hand side expression to match the element type of the left hand side Array. This may result in down casting and may return wrong result or questionable result.

Example :
```SQL
spark-sql> select array_position(array(1), 1.34);
1
```
```SQL
spark-sql> select array_position(array(1), 'foo');
null
```

We should safely coerce both left and right hand side expressions.

## How was this patch tested?
Added tests in DataFrameFunctionsSuite

Closes #22407 from dilipbiswal/SPARK-25416.

Authored-by: Dilip Biswal <[email protected]>
Signed-off-by: Wenchen Fan <[email protected]>
(cherry picked from commit bb49661)
Signed-off-by: Wenchen Fan <[email protected]>

thanks, merging to master/2.4!

@cloud-fan @ueshin Thank you very much !!

What changes were proposed in this pull request?
In ArrayPosition, we currently cast the right-hand-side expression to match the element type of the left-hand-side array. This can down cast the search value and return a wrong or questionable result.
Example: `select array_position(array(1), 1.34);` returns 1, and `select array_position(array(1), 'foo');` returns null.
We should safely coerce both the left- and right-hand-side expressions.
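
The down-casting problem described here can be modelled outside Spark. The following is a minimal sketch in plain Scala, not the ArrayPosition implementation: the object and method names are hypothetical, and the only conventions borrowed from this PR are that positions are 1-based and that 0 means "not found". It assumes an integer array searched with a double value.

```scala
// Toy model of the coercion issue; this is NOT Spark code.
// All names here are hypothetical and exist only for illustration.
object ArrayPositionCoercionSketch {

  // Problematic behaviour: force the search value into the array's
  // element type (Int). Converting 1.34 to Int truncates it to 1,
  // so the lookup reports a match that is not really there.
  def positionWithDownCast(arr: Seq[Int], value: Double): Long =
    arr.indexOf(value.toInt) + 1L

  // Safer behaviour: widen the array elements to the common type
  // (Double) and compare there, so no information is lost.
  def positionWithWidening(arr: Seq[Int], value: Double): Long =
    arr.map(_.toDouble).indexOf(value) + 1L

  def main(args: Array[String]): Unit = {
    println(positionWithDownCast(Seq(1), 1.34)) // false match after truncation
    println(positionWithWidening(Seq(1), 1.34)) // no match once both sides are Double
    println(positionWithWidening(Seq(1), 1.0))  // a genuine match is still found
  }
}
```

Widening the narrower side rather than narrowing the wider side is the design point: the comparison happens in a type that can represent both operands exactly, so a search for 1.34 cannot collapse onto the element 1.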
How was this patch tested?
Added tests in DataFrameFunctionsSuite