-
Notifications
You must be signed in to change notification settings - Fork 29k
[SPARK-26965][SQL] Makes ElementAt nullability more precise for array cases #23867
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
|
Test build #102617 has finished for PR 23867 at commit
|
|
retest this please |
|
Test build #102630 has finished for PR 23867 at commit
|
|
@dongjoon-hyun could you check? |
|
|
||
| override def toString: String = s"$child[$ordinal]" | ||
| override def sql: String = s"${child.sql}[${ordinal.sql}]" | ||
| trait GetArrayItemUtil extends BinaryExpression { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can we move this after case class GetArrayItem to reduce the diff?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yea, sure!
| * Common base class for [[GetMapValue]] and [[ElementAt]]. | ||
| * Common trait for [[GetMapValue]] and [[ElementAt]]. | ||
| */ | ||
| trait GetMapValueUtil extends BinaryExpression with ImplicitCastInputTypes { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Does GetMapValueUtil need to extend extends BinaryExpression with ImplicitCastInputTypes? If child and key are required, computeNullabilityFromMap seems to able to accept the child and key as parameters. How do you think that way?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It seems BinaryExpression.nullSafeCodeGen is used in the other place?
Line 326 in 2d2fb34
| nullSafeCodeGen(ctx, ev, (eval1, eval2) => { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks. Then, what about GetArrayItemUtil? Actually, the reason why I asked is that this PR makes some non-trivial lineages like the followings.
def BinaryExpression.left->private val GetArrayItemUtil.child->override def GetArrayItem.left.def BinaryExpression.left->private val GetMapValueUtil.child->override def GetMapValue.left
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ah, ok. I'll simplify it.
|
Thank you for pinging me, @maropu . I left a few comments. I understand this follows the original code structure, but it seems to become more complicated than the required. |
|
Test build #102729 has finished for PR 23867 at commit
|
|
Test build #102779 has finished for PR 23867 at commit
|
| /** `Null` is returned for invalid ordinals. */ | ||
| protected def computeNullabilityFromMap: Boolean = if (key.foldable && !key.nullable) { | ||
| val keyObj = key.eval() | ||
| child match { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you for updating, @maropu . For this one, can we simplify like the following? We can remove these alias variables.
- private val child = left
- private val key = right
-
/** `Null` is returned for invalid ordinals. */
- protected def computeNullabilityFromMap: Boolean = if (key.foldable && !key.nullable) {
- val keyObj = key.eval()
- child match {
+ protected def computeNullabilityFromMap: Boolean = if (right.foldable && !right.nullable) {
+ val keyObj = right.eval()
+ left match {There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If you want to keep the clear meaning, you may declare them inside of this function. But, since we are not using parameters, I think we already have some assumptions that this is inside BinaryExpression. So, for me, I just want to avoid making aliases like my previous comment.
protected def computeNullabilityFromMap: Boolean = {
val child = left
val key = right
....
}| case class GetArrayItem(child: Expression, ordinal: Expression) | ||
| extends BinaryExpression with ExpectsInputTypes with ExtractValue with NullIntolerant { | ||
| extends BinaryExpression with GetArrayItemUtil with ExpectsInputTypes with ExtractValue | ||
| with NullIntolerant { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
indentation?
| checkEvaluation(ElementAt(mb0, Literal(Array[Byte](3, 4))), null) | ||
| } | ||
|
|
||
| test("SPARK-26965 correctly handles ElementAt nullability for arrays") { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Since this is a new feature, it seems that we don't need JIRA ID, SPARK-26965, in the test case name.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I dropped that in the latest commit though, any rule for that?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes. I've heard the rule from @cloud-fan in my old PRs.
I'm not sure if that is a written rule or not.
Hi, @cloud-fan . Do we have some URLs for the above SPARK JIRA ID usage rule in the test case name?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
oh, I see. I'll be more careful next time, thanks!
| since = "2.4.0") | ||
| case class ElementAt(left: Expression, right: Expression) extends GetMapValueUtil { | ||
| case class ElementAt(left: Expression, right: Expression) | ||
| extends GetMapValueUtil with GetArrayItemUtil { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
indentation?
|
Test build #102885 has finished for PR 23867 at commit
|
|
retest this please |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ur, ditto. SPARK-26965.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
oh...
|
+1, LGTM (except one nit comment). Ping, @cloud-fan . |
| left match { | ||
| case m: CreateMap if m.resolved => | ||
| m.keys.zip(m.values).filter { case (k, _) => k.foldable && !k.nullable }.find { | ||
| case (k, _) if k.eval() == keyObj => true |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
can we rely on ==? What if one expression returns unsafe row and the other returns safe row?
I know this is existing code, but after a hindsight maybe it's not worth to linear scan the map entries and just to get the precise nullability.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah, I see. I'll update soon.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I dropped the nullablity computation for the map side and added the TODO comment there. How about this?
|
Test build #102897 has finished for PR 23867 at commit
|
|
Test build #102901 has finished for PR 23867 at commit
|
| override def nullable: Boolean = true | ||
| override def nullable: Boolean = left.dataType match { | ||
| case _: ArrayType => computeNullabilityFromArray(left, right) | ||
| case _: MapType => computeNullabilityFromMap |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If Map is out of the scope, shall we update the title and use the following code path instead?
- Remove
extends GetMapValueUtil. - Remove
computeNullabilityFromMapand use the following.
case _: MapType => trueThere was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
fixed.
| override def left: Expression = child | ||
| override def right: Expression = key | ||
|
|
||
| /** `Null` is returned for invalid ordinals. */ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
we need to update this comment, to say why the nullability is always true.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ok, added.
|
Test build #102958 has finished for PR 23867 at commit
|
|
Test build #102959 has finished for PR 23867 at commit
|
|
Test build #102969 has finished for PR 23867 at commit
|
|
retest this please |
|
Test build #102973 has finished for PR 23867 at commit
|
|
thanks, merging to master! |
What changes were proposed in this pull request?
In master,
ElementAtnullable is always true;spark/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/collectionOperations.scala
Line 1977 in be1cadf
But, If input is an array and foldable, we could make its nullability more precise.
This fix is based on SPARK-26637(#23566).
How was this patch tested?
Added tests in
CollectionExpressionsSuite.