-
Notifications
You must be signed in to change notification settings - Fork 29k
[SPARK-19104][SQL] Lambda variables in ExternalMapToCatalyst should be global #18418
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
|
Test build #78599 has finished for PR 18418 at commit
|
|
retest this please. |
|
Test build #78601 has finished for PR 18418 at commit
|
|
Test build #78609 has finished for PR 18418 at commit
|
|
retest this please. |
|
Test build #78613 has finished for PR 18418 at commit
|
|
retest this please. |
|
Test build #78625 has finished for PR 18418 at commit
|
|
retest this please. |
|
Test build #78635 has finished for PR 18418 at commit
|
| return expressions.mkString("\n") | ||
| } | ||
| splitExpressions(expressions, "apply", ("InternalRow", row) :: Nil) | ||
| splitExpressions(row, expressions, Seq.empty) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do we need the check in lines 735-738? Now, we do the same check at lines 754-757, too.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Removed, thanks.
|
cc @cloud-fan |
| // of the functions. | ||
| val (valExprCodes, valExprParams) = valExprs.map { expr => | ||
| val exprCode = expr.genCode(ctx) | ||
| val lambdaVars = expr.collect { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this looks hacky, and why you only do it for CreateNamedStruct?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
For example, CreateArray is another expression using splitExpressions. However, if the external type is an array, MapObjects will be used to construct the internal array. CreateMap is the same, I think.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
IIRC there are a lot of places we use splitExpressions, shall we check all of them?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not very sure about other places. I move the collecting of lambda variable to splitExpressions and let the possible places to do this check.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This approach doesn't work well after rethinking about this and more experiments.
A simpler approach is letting those lambda variables global as MapObjectsandCollectObjectsToMap` do.
|
Test build #78666 has finished for PR 18418 at commit
|
|
Test build #78678 has finished for PR 18418 at commit
|
|
can you put more detailed information about what is the bug? I don't understand why it only happens for map inside case class. |
eb061da to
65d8873
Compare
cloud-fan
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yea I think this is the right fix according to how we deal with this kind of problem before. LGTM
|
BTW please update PR tile to mention that it's a bug fix for |
|
@cloud-fan Thanks. The PR title and description are both updated to reflect the change. |
|
Test build #78690 has finished for PR 18418 at commit
|
|
Test build #78688 has finished for PR 18418 at commit
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
you should also update the test name...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
miss it. updated.
|
Test build #78698 has finished for PR 18418 at commit
|
|
retest this please. |
|
Test build #78701 has finished for PR 18418 at commit
|
|
the test failure is unrelated, merging to master/2.2! |
…e global
The issue happens in `ExternalMapToCatalyst`. For example, the following codes create `ExternalMapToCatalyst` to convert Scala Map to catalyst map format.
val data = Seq.tabulate(10)(i => NestedData(1, Map("key" -> InnerData("name", i + 100))))
val ds = spark.createDataset(data)
The `valueConverter` in `ExternalMapToCatalyst` looks like:
if (isnull(lambdavariable(ExternalMapToCatalyst_value52, ExternalMapToCatalyst_value_isNull52, ObjectType(class org.apache.spark.sql.InnerData), true))) null else named_struct(name, staticinvoke(class org.apache.spark.unsafe.types.UTF8String, StringType, fromString, assertnotnull(lambdavariable(ExternalMapToCatalyst_value52, ExternalMapToCatalyst_value_isNull52, ObjectType(class org.apache.spark.sql.InnerData), true)).name, true), value, assertnotnull(lambdavariable(ExternalMapToCatalyst_value52, ExternalMapToCatalyst_value_isNull52, ObjectType(class org.apache.spark.sql.InnerData), true)).value)
There is a `CreateNamedStruct` expression (`named_struct`) to create a row of `InnerData.name` and `InnerData.value` that are referred by `ExternalMapToCatalyst_value52`.
Because `ExternalMapToCatalyst_value52` are local variable, when `CreateNamedStruct` splits expressions to individual functions, the local variable can't be accessed anymore.
Jenkins tests.
Author: Liang-Chi Hsieh <[email protected]>
Closes #18418 from viirya/SPARK-19104.
(cherry picked from commit fd8c931)
Signed-off-by: Wenchen Fan <[email protected]>
…e global
## What changes were proposed in this pull request?
The issue happens in `ExternalMapToCatalyst`. For example, the following codes create `ExternalMapToCatalyst` to convert Scala Map to catalyst map format.
val data = Seq.tabulate(10)(i => NestedData(1, Map("key" -> InnerData("name", i + 100))))
val ds = spark.createDataset(data)
The `valueConverter` in `ExternalMapToCatalyst` looks like:
if (isnull(lambdavariable(ExternalMapToCatalyst_value52, ExternalMapToCatalyst_value_isNull52, ObjectType(class org.apache.spark.sql.InnerData), true))) null else named_struct(name, staticinvoke(class org.apache.spark.unsafe.types.UTF8String, StringType, fromString, assertnotnull(lambdavariable(ExternalMapToCatalyst_value52, ExternalMapToCatalyst_value_isNull52, ObjectType(class org.apache.spark.sql.InnerData), true)).name, true), value, assertnotnull(lambdavariable(ExternalMapToCatalyst_value52, ExternalMapToCatalyst_value_isNull52, ObjectType(class org.apache.spark.sql.InnerData), true)).value)
There is a `CreateNamedStruct` expression (`named_struct`) to create a row of `InnerData.name` and `InnerData.value` that are referred by `ExternalMapToCatalyst_value52`.
Because `ExternalMapToCatalyst_value52` are local variable, when `CreateNamedStruct` splits expressions to individual functions, the local variable can't be accessed anymore.
## How was this patch tested?
Jenkins tests.
Author: Liang-Chi Hsieh <[email protected]>
Closes apache#18418 from viirya/SPARK-19104.
…alyst should be global ## What changes were proposed in this pull request? This PR is backport of #18418 to Spark 2.1. [SPARK-21391](https://issues.apache.org/jira/browse/SPARK-21391) reported this problem in Spark 2.1. The issue happens in `ExternalMapToCatalyst`. For example, the following codes create ExternalMap`ExternalMapToCatalyst`ToCatalyst to convert Scala Map to catalyst map format. ``` val data = Seq.tabulate(10)(i => NestedData(1, Map("key" -> InnerData("name", i + 100)))) val ds = spark.createDataset(data) ``` The `valueConverter` in `ExternalMapToCatalyst` looks like: ``` if (isnull(lambdavariable(ExternalMapToCatalyst_value52, ExternalMapToCatalyst_value_isNull52, ObjectType(class org.apache.spark.sql.InnerData), true))) null else named_struct(name, staticinvoke(class org.apache.spark.unsafe.types.UTF8String, StringType, fromString, assertnotnull(lambdavariable(ExternalMapToCatalyst_value52, ExternalMapToCatalyst_value_isNull52, ObjectType(class org.apache.spark.sql.InnerData), true)).name, true), value, assertnotnull(lambdavariable(ExternalMapToCatalyst_value52, ExternalMapToCatalyst_value_isNull52, ObjectType(class org.apache.spark.sql.InnerData), true)).value) ``` There is a `CreateNamedStruct` expression (`named_struct`) to create a row of `InnerData.name` and `InnerData.value` that are referred by `ExternalMapToCatalyst_value52`. Because `ExternalMapToCatalyst_value52` are local variable, when `CreateNamedStruct` splits expressions to individual functions, the local variable can't be accessed anymore. ## How was this patch tested? Added a new test suite into `DatasetPrimitiveSuite` Author: Kazuaki Ishizaki <[email protected]> Closes #18627 from kiszk/SPARK-21391.
What changes were proposed in this pull request?
The issue happens in
ExternalMapToCatalyst. For example, the following codes createExternalMapToCatalystto convert Scala Map to catalyst map format.The
valueConverterinExternalMapToCatalystlooks like:There is a
CreateNamedStructexpression (named_struct) to create a row ofInnerData.nameandInnerData.valuethat are referred byExternalMapToCatalyst_value52.Because
ExternalMapToCatalyst_value52are local variable, whenCreateNamedStructsplits expressions to individual functions, the local variable can't be accessed anymore.How was this patch tested?
Jenkins tests.