@maropu
Member

@maropu maropu commented Feb 22, 2019

What changes were proposed in this pull request?

In master, ElementAt.nullable is always true.

However, if the input is an array and foldable, we can make its nullability more precise.
This fix is based on SPARK-26637 (#23566).
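
As a rough illustration of the idea in this description, here is a minimal sketch of how nullability could be narrowed when both the array and the ordinal are known at planning time. It is a toy model, not Spark's code: `Expr`, `Lit`, `Col`, `MakeArray`, and `itemNullable` are hypothetical names, and the lookup is assumed to be zero-based for simplicity.

```scala
// Toy model, not Spark's classes: every name below is hypothetical.
object ArrayNull {
  sealed trait Expr {
    def nullable: Boolean
    def foldable: Boolean
  }

  // A constant; a null value makes the literal nullable.
  final case class Lit(value: Any) extends Expr {
    val nullable: Boolean = value == null
    val foldable: Boolean = true
  }

  // A column reference: its value is unknown until runtime, so it is not foldable.
  final case class Col(name: String, nullable: Boolean) extends Expr {
    val foldable: Boolean = false
  }

  // An array built from element expressions.
  final case class MakeArray(elements: Seq[Expr]) extends Expr {
    val nullable: Boolean = false
    val foldable: Boolean = elements.forall(_.foldable)
  }

  // Assumed zero-based lookup; an out-of-range ordinal yields null.
  def itemNullable(child: Expr, ordinal: Expr): Boolean = (child, ordinal) match {
    case (MakeArray(elems), Lit(i: Int)) if i >= 0 && i < elems.length =>
      elems(i).nullable // the selected element decides
    case _ =>
      true // unknown shape or ordinal: stay conservative
  }
}
```

With this model, looking up a constant in-range ordinal in an array of non-null literals is reported as non-nullable, while any case the rule cannot analyze falls back to `true`, which matches the always-nullable behavior described for master.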

How was this patch tested?

Added tests in CollectionExpressionsSuite.

@SparkQA

SparkQA commented Feb 22, 2019

Test build #102617 has finished for PR 23867 at commit a3709b9.

  • This patch fails due to an unknown error code, -9.
  • This patch merges cleanly.
  • This patch adds the following public classes (experimental):
  • case class ElementAt(left: Expression, right: Expression)
  • trait GetArrayItemUtil extends BinaryExpression
  • case class GetArrayItem(child: Expression, ordinal: Expression)
  • trait GetMapValueUtil extends BinaryExpression with ImplicitCastInputTypes

@dilipbiswal
Contributor

retest this please

@SparkQA

SparkQA commented Feb 22, 2019

Test build #102630 has finished for PR 23867 at commit a3709b9.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds the following public classes (experimental):
  • case class ElementAt(left: Expression, right: Expression)
  • trait GetArrayItemUtil extends BinaryExpression
  • case class GetArrayItem(child: Expression, ordinal: Expression)
  • trait GetMapValueUtil extends BinaryExpression with ImplicitCastInputTypes

@maropu
Member Author

maropu commented Feb 22, 2019

@dongjoon-hyun could you check?


override def toString: String = s"$child[$ordinal]"
override def sql: String = s"${child.sql}[${ordinal.sql}]"
trait GetArrayItemUtil extends BinaryExpression {
Member

Can we move this after case class GetArrayItem to reduce the diff?

Member Author

Yea, sure!

-  * Common base class for [[GetMapValue]] and [[ElementAt]].
+  * Common trait for [[GetMapValue]] and [[ElementAt]].
*/
trait GetMapValueUtil extends BinaryExpression with ImplicitCastInputTypes {
Member

@dongjoon-hyun dongjoon-hyun Feb 25, 2019

Does GetMapValueUtil need to extend BinaryExpression with ImplicitCastInputTypes? If child and key are required, computeNullabilityFromMap seems to be able to accept the child and key as parameters. What do you think of that approach?

Member Author

It seems BinaryExpression.nullSafeCodeGen is used in another place?

Member

Thanks. Then, what about GetArrayItemUtil? Actually, the reason I asked is that this PR creates some non-trivial lineages like the following:

  • def BinaryExpression.left -> private val GetArrayItemUtil.child -> override def GetArrayItem.left
  • def BinaryExpression.left -> private val GetMapValueUtil.child -> override def GetMapValue.left
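
The lineage concern above, together with the earlier suggestion to pass child and key as parameters, can be sketched roughly as follows. This is only an illustration under assumptions: `Expr`, `Leaf`, `Binary`, `ItemNullability`, and `Item` are hypothetical stand-ins rather than Spark classes, and the nullability rule is a placeholder.

```scala
// Hypothetical names throughout; a toy stand-in for the Spark traits.
object ParamStyle {
  trait Expr { def nullable: Boolean }
  final case class Leaf(nullable: Boolean) extends Expr

  trait Binary extends Expr {
    def left: Expr
    def right: Expr
  }

  // The helper takes its inputs as arguments, so it does not have to extend
  // Binary and needs no `private val child = left` alias.
  trait ItemNullability {
    // Placeholder rule, used only to make the sketch runnable.
    protected def computeNullability(child: Expr, ordinal: Expr): Boolean =
      child.nullable || ordinal.nullable
  }

  final case class Item(child: Expr, ordinal: Expr) extends Binary with ItemNullability {
    override def left: Expr = child
    override def right: Expr = ordinal
    override def nullable: Boolean = computeNullability(left, right)
  }
}
```

In this shape the only link between the helper and the expression is the call site, so the chain from `left` through an alias back to an overridden `left` disappears.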

Member Author

ah, ok. I'll simplify it.

@dongjoon-hyun
Member

Thank you for pinging me, @maropu . I left a few comments. I understand this follows the original code structure, but it seems to have become more complicated than required.

@SparkQA

SparkQA commented Feb 25, 2019

Test build #102729 has finished for PR 23867 at commit c2d522a.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds the following public classes (experimental):
  • trait GetArrayItemUtil extends BinaryExpression

@SparkQA

SparkQA commented Feb 26, 2019

Test build #102779 has finished for PR 23867 at commit 7590955.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds the following public classes (experimental):
  • trait GetArrayItemUtil

/** `Null` is returned for invalid ordinals. */
protected def computeNullabilityFromMap: Boolean = if (key.foldable && !key.nullable) {
val keyObj = key.eval()
child match {
Member

@dongjoon-hyun dongjoon-hyun Feb 28, 2019

Thank you for updating, @maropu . For this one, can we simplify like the following? We can remove these alias variables.

-  private val child = left
-  private val key = right
-
   /** `Null` is returned for invalid ordinals. */
-  protected def computeNullabilityFromMap: Boolean = if (key.foldable && !key.nullable) {
-    val keyObj = key.eval()
-    child match {
+  protected def computeNullabilityFromMap: Boolean = if (right.foldable && !right.nullable) {
+    val keyObj = right.eval()
+    left match {

Member

@dongjoon-hyun dongjoon-hyun Feb 28, 2019

If you want to keep the meaning clear, you may declare them inside this function. But since we are not using parameters, I think we already assume that this is inside BinaryExpression. So I would just like to avoid making aliases, as in my previous comment.

protected def computeNullabilityFromMap: Boolean = {
    val child = left
    val key = right
    ....
}

case class GetArrayItem(child: Expression, ordinal: Expression)
-  extends BinaryExpression with ExpectsInputTypes with ExtractValue with NullIntolerant {
+  extends BinaryExpression with GetArrayItemUtil with ExpectsInputTypes with ExtractValue
+  with NullIntolerant {
Member

indentation?

checkEvaluation(ElementAt(mb0, Literal(Array[Byte](3, 4))), null)
}

test("SPARK-26965 correctly handles ElementAt nullability for arrays") {
Member

Since this is a new feature, it seems that we don't need JIRA ID, SPARK-26965, in the test case name.

Member Author

I dropped that in the latest commit. Is there any rule for that, though?

Member

Yes. I've heard the rule from @cloud-fan in my old PRs.

I'm not sure if that is a written rule or not.

Hi, @cloud-fan . Do we have a URL for the above rule about using SPARK JIRA IDs in test case names?

Member Author

oh, I see. I'll be more careful next time, thanks!

since = "2.4.0")
-case class ElementAt(left: Expression, right: Expression) extends GetMapValueUtil {
+case class ElementAt(left: Expression, right: Expression)
+  extends GetMapValueUtil with GetArrayItemUtil {
Member

indentation?

@SparkQA

SparkQA commented Mar 1, 2019

Test build #102885 has finished for PR 23867 at commit 2cb0bf9.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@maropu
Member Author

maropu commented Mar 1, 2019

retest this please

Member

Ur, ditto. SPARK-26965.

Member Author

oh...

@dongjoon-hyun
Member

+1, LGTM (except one nit comment).

Ping, @cloud-fan .

left match {
case m: CreateMap if m.resolved =>
m.keys.zip(m.values).filter { case (k, _) => k.foldable && !k.nullable }.find {
case (k, _) if k.eval() == keyObj => true
Contributor

Can we rely on ==? What if one expression returns an unsafe row and the other returns a safe row?

I know this is existing code, but in hindsight it may not be worth linearly scanning the map entries just to get the precise nullability.
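
To make the `==` concern concrete, here is a small sketch of the same class of problem using plain byte arrays, the kind of value used as a binary key in the test snippet earlier in this thread. It does not involve Spark's row classes; `KeyEq` and its fields are hypothetical names used only for illustration.

```scala
object KeyEq {
  // Two separately allocated binary keys with identical contents.
  val a: Array[Byte] = Array[Byte](3, 4)
  val b: Array[Byte] = Array[Byte](3, 4)

  // `==` on values typed as Any delegates to `equals`, which for JVM arrays
  // compares references rather than contents.
  val sameByEquals: Boolean = (a: Any) == (b: Any)

  // A content-aware comparison sees the two keys as equal.
  val sameByContent: Boolean = java.util.Arrays.equals(a, b)
}
```

Here `sameByEquals` is false while `sameByContent` is true, so a lookup that compares evaluated keys with `==` can miss an entry whose key is logically equal but represented by a different object.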

Member Author

Ah, I see. I'll update soon.

Member Author

I dropped the nullability computation for the map side and added a TODO comment there. How about this?

@SparkQA

SparkQA commented Mar 1, 2019

Test build #102897 has finished for PR 23867 at commit 2cb0bf9.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA

SparkQA commented Mar 1, 2019

Test build #102901 has finished for PR 23867 at commit 499a57b.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

-  override def nullable: Boolean = true
+  override def nullable: Boolean = left.dataType match {
+    case _: ArrayType => computeNullabilityFromArray(left, right)
+    case _: MapType => computeNullabilityFromMap
Member

@dongjoon-hyun dongjoon-hyun Mar 4, 2019

If Map is out of scope, shall we update the title and use the following code path instead?

  • Remove extends GetMapValueUtil.
  • Remove computeNullabilityFromMap and use the following.
case _: MapType => true

Member Author

fixed.

@maropu maropu changed the title from "[SPARK-26965][SQL] Makes ElementAt nullability more precise" to "[SPARK-26965][SQL] Makes ElementAt nullability more precise for array cases" Mar 4, 2019

override def left: Expression = child
override def right: Expression = key

/** `Null` is returned for invalid ordinals. */
Contributor

We need to update this comment to say why the nullability is always true.

Member Author

ok, added.

@SparkQA

SparkQA commented Mar 4, 2019

Test build #102958 has finished for PR 23867 at commit 7faf51a.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA

SparkQA commented Mar 4, 2019

Test build #102959 has finished for PR 23867 at commit b95a389.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA

SparkQA commented Mar 4, 2019

Test build #102969 has finished for PR 23867 at commit 95f2862.

  • This patch fails due to an unknown error code, -9.
  • This patch merges cleanly.
  • This patch adds no public classes.

@dilipbiswal
Contributor

retest this please

@SparkQA

SparkQA commented Mar 4, 2019

Test build #102973 has finished for PR 23867 at commit 95f2862.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@cloud-fan
Contributor

thanks, merging to master!

@cloud-fan cloud-fan closed this in 68fbbbe Mar 4, 2019
6 participants