[SPARK-34141][SQL] Remove side effect from ExtractGenerator #31213
tanelk wants to merge 4 commits into apache:master from
@HyukjinKwon, any chance that this bugfix could get to 3.1.1?

Kubernetes integration test starting

Kubernetes integration test status success

Test build #134154 has finished for PR 31213 at commit

Kubernetes integration test starting

@LuciferYang, you have been working with 2.13.

Kubernetes integration test status failure

Test build #134158 has finished for PR 31213 at commit

@tanelk In Scala 2.13, why do we need a non-strict collection here? Can it be a strict collection?

Well, that is the test case I'm trying to cover.

cc @srowen
The new UT in this PR will break compilation on Scala 2.13; @tanelk wants to get a lazy

Hm, why does Seq vs SeqView matter here? What's the compile error?

@srowen Would it be okay not to add the UT, or is there some lazy

Hm, tough call. Can you invoke the second constructor by adding

Kubernetes integration test starting

Kubernetes integration test status success

I went ahead and added version-dependent tests. Thanks for the advice.

import org.apache.spark.sql.catalyst.plans.logical._
import org.apache.spark.sql.types._

class ExtractGeneratorSuite extends AnalysisTest {

I'd add comments in each test file noting that there is a parallel one in the other source tree, so people realize that both need to change.

Test build #134511 has finished for PR 31213 at commit

Kubernetes integration test starting

Kubernetes integration test status success

Test build #134689 has finished for PR 31213 at commit

val explode = Alias(Explode(b), "c")()

// view is a lazy seq
val rel = LocalRelation(output = columns.view)

Is it possible for end users to get stuck on this issue? If so, could you add end-to-end tests, too?

I hit this issue while using Spark from Java. The step before was a join, where I used JavaConverters.collectionAsScalaIterable(Arrays.asList(columns)).toSeq() as the join condition. The JavaConverters helper returns a lazy collection.
I can take a look at how to e2e test this.

Merged to master. I'm satisfied with the motivation and test.

Oh, just saw it. LGTM

What changes were proposed in this pull request?
Rewrote one ExtractGenerator case such that it would not rely on a side effect of the flatMap function.

Why are the changes needed?
With the DataFrame API it is possible to have a lazy sequence as the output of a LogicalPlan. When exploding a column on this DataFrame using the withColumn("newName", explode(col("name"))) method, ExtractGenerator does not extract the generator and CheckAnalysis would throw an exception.

Does this PR introduce any user-facing change?
Bugfix. Before this, the workaround was to put .select("*") before the explode.

How was this patch tested?
UT
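
To make the "side effect of the flatMap function" concrete, here is a minimal sketch in plain Scala with no Spark dependency. It is not the code from this PR: strings stand in for projection expressions, `isGenerator`, `extractWithSideEffect` and `extractWithFold` are hypothetical names, and the parameter is typed as `Iterable` (rather than the `Seq` a real plan uses) so that `.view` compiles on both Scala 2.12 and 2.13. The fold shown is just one possible way to avoid the problem.

```scala
object LazyFlatMapSketch {
  // Hypothetical stand-in: anything starting with "explode(" plays the generator.
  def isGenerator(e: String): Boolean = e.startsWith("explode(")

  // Fragile: the caller learns about the generator only through a var
  // that is assigned inside the flatMap closure.
  def extractWithSideEffect(
      projectList: Iterable[String]): (Option[String], Iterable[String]) = {
    var found: Option[String] = None
    val rewritten = projectList.flatMap { e =>
      if (isGenerator(e)) { found = Some(e); Seq(e + "_output") } else Seq(e)
    }
    // On a lazy collection the closure has not run yet at this point,
    // so `found` is read while it is still None.
    (found, rewritten)
  }

  // More robust: foldLeft traverses eagerly, and the generator is carried
  // in the accumulator instead of being set as a side effect.
  def extractWithFold(
      projectList: Iterable[String]): (Option[String], List[String]) =
    projectList.foldLeft((Option.empty[String], List.empty[String])) {
      case ((_, acc), e) if isGenerator(e) => (Some(e), acc :+ (e + "_output"))
      case ((gen, acc), e) => (gen, acc :+ e)
    }

  def main(args: Array[String]): Unit = {
    val cols = Seq("a", "explode(b)")
    println(extractWithSideEffect(cols)._1)      // strict Seq: generator is seen
    println(extractWithSideEffect(cols.view)._1) // lazy view: generator is missed
    println(extractWithFold(cols.view))          // fold: generator is seen either way
  }
}
```

With a strict `Seq` both variants agree. With `cols.view` only the fold-based variant reports the generator, which mirrors the symptom described above, where the generator is left in place and analysis fails later.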