-
Notifications
You must be signed in to change notification settings - Fork 29k
[SPARK-23934][SQL] Adding map_from_entries function #21282
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[SPARK-23934][SQL] Adding map_from_entries function #21282
Conversation
|
ok to test |
|
add to whitelist |
| if (key == null) { | ||
| throw new RuntimeException("The first field from a struct (key) can't be null.") | ||
| } | ||
| if (keySet.contains(key)) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is this check necessary for now? This is because other operations (e.g. CreateMap) allows us to create a map with duplicated key. Is it better to be consistent in Spark?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, we've already touched this topic in your PR for SPARK-23933. I think if some hashing is added into maps in future, these duplicity checks will have to be introduced anyway. So if we add it now, we can avoid breaking changes in future. But I understand your point of view.
Presto also doesn't support duplicates:
presto:default> SELECT map_from_entries(ARRAY[(1, 'x'), (1, 'y')]);
Query 20180510_090536_00005_468a9 failed: Duplicate keys (1) are not allowed
WDYT @ueshin @gatorsmile
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm sorry for the super delay.
Let's just ignore the duplicated key like CreateMap for now. We will need to discuss map-related topics, such as duplicate keys, equality or ordering, etc.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ok, no problem. I've removed duplicity checks.
|
Test build #90434 has finished for PR 21282 at commit
|
| since = "2.4.0") | ||
| case class MapFromEntries(child: Expression) extends UnaryExpression | ||
| { | ||
| private lazy val resolvedDataType: Option[MapType] = child.dataType match { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@transient?
| private lazy val resolvedDataType: Option[MapType] = child.dataType match { | ||
| case ArrayType( | ||
| StructType(Array( | ||
| StructField(_, keyType, false, _), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We don't need key field to be nullable = false because we check the nullability when creating an array?
| StructType(Array( | ||
| StructField(_, keyType, false, _), | ||
| StructField(_, valueType, valueNullable, _))), | ||
| false) => Some(MapType(keyType, valueType, valueNullable)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can we reject an array with containsNull = true here? The array might not contain nulls.
| """, | ||
| since = "2.4.0") | ||
| case class MapFromEntries(child: Expression) extends UnaryExpression | ||
| { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nit: style
|
Test build #90459 has finished for PR 21282 at commit
|
|
Test build #90721 has finished for PR 21282 at commit
|
|
Test build #90741 has finished for PR 21282 at commit
|
|
retest this please |
|
Test build #90748 has finished for PR 21282 at commit
|
|
retest this please |
|
Test build #90773 has finished for PR 21282 at commit
|
|
retest this please |
|
Test build #90782 has finished for PR 21282 at commit
|
…p_from_entries-to-master
|
Test build #91229 has finished for PR 21282 at commit
|
|
Test build #91421 has finished for PR 21282 at commit
|
| var i = 0 | ||
| var j = 0 | ||
| while (i < length) { | ||
| if (!arrayData.isNullAt(i)) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We should throw an exception if arrayData.isNullAt(i)?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hi @ueshin,
wouldn't it be better return null in this case? And follow null handling of other functions like flatten?
flatten(array(array(1,2), null, array(3,4))) => null
WDYT?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, that sounds reasonable. Thanks.
|
ok to test |
|
Test build #91774 has finished for PR 21282 at commit
|
…p_from_entries-to-master
…p_from_entries-to-master
|
Test build #92173 has finished for PR 21282 at commit
|
|
Test build #92175 has finished for PR 21282 at commit
|
|
LGTM. |
|
Thanks! merging to master. |
What changes were proposed in this pull request?
The PR adds the
map_from_entriesfunction that returns a map created from the given array of entries.How was this patch tested?
New tests added into:
CollectionExpressionSuiteDataFrameFunctionSuiteCodeGen Examples
Primitive-type Keys and Values
Result:
Non-primitive-type Keys and Values
Result: