Skip to content

Conversation

@dilipbiswal
Copy link
Contributor

What changes were proposed in this pull request?

In ArrayPosition, we currently cast the right hand side expression to match the element type of the left hand side Array. This may result in down casting and may return wrong result or questionable result.

Example :

spark-sql> select array_position(array(1), 1.34);
1
spark-sql> select array_position(array(1), 'foo');
null

We should safely coerce both left and right hand side expressions.

How was this patch tested?

Added tests in DataFrameFunctionsSuite

… right expression is implicitly down casted.
@dilipbiswal dilipbiswal changed the title [SPARK-25415] ArrayPosition function may return incorrect result when right expression is implicitly down casted [SPARK-25415][SQL] ArrayPosition function may return incorrect result when right expression is implicitly down casted Sep 12, 2018
@dilipbiswal dilipbiswal changed the title [SPARK-25415][SQL] ArrayPosition function may return incorrect result when right expression is implicitly down casted [SPARK-25416][SQL] ArrayPosition function may return incorrect result when right expression is implicitly down casted Sep 12, 2018
@SparkQA
Copy link

SparkQA commented Sep 12, 2018

Test build #96008 has finished for PR 22407 at commit 629ec84.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@dilipbiswal
Copy link
Contributor Author

retest this please

@SparkQA
Copy link

SparkQA commented Sep 13, 2018

Test build #96014 has finished for PR 22407 at commit 629ec84.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@gatorsmile
Copy link
Member

cc @ueshin

Copy link
Member

@ueshin ueshin left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good catch!
I'm now wondering how about other expressions.
Seems like ArrayContains, ArrayRemove, and ElementAt for map types could have the same problem?
Can you confirm and fix them? Or we can address them in separate prs. Thanks!

case TypeCheckResult.TypeCheckSuccess =>
(left.dataType, right.dataType) match {
case (ArrayType(e1, _), e2) if e1.sameType(e2) =>
TypeUtils.checkForOrderingExpr(right.dataType, s"function $prettyName")
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: we can use e1 or e2 instead of right.dataType?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@ueshin Sure

checkAnswer(
df.selectExpr("array_position(array(1.23D), 1)"),
Seq(Row(0L), Row(0L))
)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What about array_position(array(1.0D), 1)?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@ueshin Sure.. i will add this case.

@dilipbiswal
Copy link
Contributor Author

Seems like ArrayContains, ArrayRemove, and ElementAt for map types could have the same problem?
@ueshin I have a PR opened for ArrayContains link. I will go through ArrayRemove and ElementAt fix if necessary.

@ueshin
Copy link
Member

ueshin commented Sep 13, 2018

@dilipbiswal Thanks! I'll take a look at it.

@SparkQA
Copy link

SparkQA commented Sep 13, 2018

Test build #96024 has finished for PR 22407 at commit bb18108.

  • This patch fails due to an unknown error code, -9.
  • This patch merges cleanly.
  • This patch adds no public classes.

@dilipbiswal
Copy link
Contributor Author

retest this please.

@SparkQA
Copy link

SparkQA commented Sep 13, 2018

Test build #96028 has finished for PR 22407 at commit bb18108.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA
Copy link

SparkQA commented Sep 13, 2018

Test build #96027 has finished for PR 22407 at commit bb18108.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@dilipbiswal
Copy link
Contributor Author

@ueshin Wenchen thought it may be risky to backport the fix to tighestCommonType. Given this, can this be looked at now ?

@ueshin
Copy link
Member

ueshin commented Sep 21, 2018

So, before #22448 (branch-2.4 if we don't backport #22448), we don't coerce between decimals, and after that (master, not merged yet though), we will do, right?
I'm okay with the behavior.

I'd retrigger the build because it's been a long time since the last build.

@ueshin
Copy link
Member

ueshin commented Sep 21, 2018

Jenkins, retest this please.

@ueshin
Copy link
Member

ueshin commented Sep 21, 2018

LGTM, pending Jenkins.

@SparkQA
Copy link

SparkQA commented Sep 21, 2018

Test build #96421 has finished for PR 22407 at commit bb18108.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

)

checkAnswer(
df.selectExpr("array_position(array(1), 1.23D)"),
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

same problem here. The test doesn't read any column from df, so we should use OneRowRelation.

@cloud-fan
Copy link
Contributor

LGTM except one comment

@SparkQA
Copy link

SparkQA commented Sep 22, 2018

Test build #96475 has finished for PR 22407 at commit 55d4b95.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@asfgit asfgit closed this in bb49661 Sep 24, 2018
asfgit pushed a commit that referenced this pull request Sep 24, 2018
… when right expression is implicitly down casted

## What changes were proposed in this pull request?
In ArrayPosition, we currently cast the right hand side expression to match the element type of the left hand side Array. This may result in down casting and may return wrong result or questionable result.

Example :
```SQL
spark-sql> select array_position(array(1), 1.34);
1
```
```SQL
spark-sql> select array_position(array(1), 'foo');
null
```

We should safely coerce both left and right hand side expressions.
## How was this patch tested?
Added tests in DataFrameFunctionsSuite

Closes #22407 from dilipbiswal/SPARK-25416.

Authored-by: Dilip Biswal <[email protected]>
Signed-off-by: Wenchen Fan <[email protected]>
(cherry picked from commit bb49661)
Signed-off-by: Wenchen Fan <[email protected]>
@cloud-fan
Copy link
Contributor

thanks, merging to master/2.4!

@dilipbiswal
Copy link
Contributor Author

@cloud-fan @ueshin Thank you very much !!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants