[SPARK-27701][SQL] Extend NestedColumnAliasing to general nested field cases including GetArrayStructField#24599
viirya wants to merge 9 commits into apache:master
|
Test build #105381 has finished for PR 24599 at commit
|
| import org.apache.spark.sql.types.{StringType, StructType} | ||
| import org.apache.spark.sql.types.{StringType, StructField, StructType} | ||
|
|
||
| class NestedColumnAliasingSuite extends SchemaPruningTest { |
There are still many usages of GetStructField in this test suite. Maybe make a minor PR to rewrite them.
|
Retest this please. |
|
Test build #106314 has finished for PR 24599 at commit
|
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/NestedColumnAliasing.scala
@viirya, since this is an important improvement, could you add a benchmark case to NestedSchemaPruningBenchmark? Also, please enumerate some of the newly supported examples explicitly, instead of just saying "more nested field cases", at least in the PR description.
|
@dongjoon-hyun Thanks for looking into this. I will add the benchmark case. The PR title and description were updated. |
|
Test build #106347 has finished for PR 24599 at commit
|
...e/src/test/scala/org/apache/spark/sql/execution/benchmark/NestedSchemaPruningBenchmark.scala
|
Test build #106353 has finished for PR 24599 at commit
|
|
Test build #106366 has finished for PR 24599 at commit
|
...e/src/test/scala/org/apache/spark/sql/execution/benchmark/NestedSchemaPruningBenchmark.scala
sql/catalyst/src/main/scala/org/apache/spark/sql/internal/SQLConf.scala
|
Test build #106383 has finished for PR 24599 at commit
|
|
Test build #106384 has finished for PR 24599 at commit
|
On the master branch, it seems that there is a regression only in Orc (v1). This PR (@viirya) is irrelevant to that. I verified that Parquet/OrcV2 are consistent in master branch.
cc @gatorsmile
|
Hi, @viirya . I made a benchmark result PR to you. Could you review and merge that? |
| val path = dir.getCanonicalPath | ||
|
|
||
| Seq(1, 2).foreach { i => | ||
| Seq(1, 2, 3).foreach { i => |
EC2 result
|
Thanks @dongjoon-hyun! Merged the benchmark results now. |
dongjoon-hyun left a comment
+1, LGTM. Thank you so much, @viirya .
The last commit only updates the .txt files containing the benchmark results.
Merged to master.
|
Test build #106398 has finished for PR 24599 at commit
|
…d cases including GetArrayStructField

## What changes were proposed in this pull request?

`NestedColumnAliasing` rule covers `GetStructField` only, currently. It means that some nested field extraction expressions aren't pruned. For example, if only accessing a nested field in an array of struct (`GetArrayStructFields`), this column isn't pruned.

This patch extends the rule to cover general nested field cases, including `GetArrayStructFields`.

## How was this patch tested?

Added tests.

Closes apache#24599 from viirya/nested-pruning-extract-value.

Lead-authored-by: Liang-Chi Hsieh <viirya@gmail.com>
Co-authored-by: Dongjoon Hyun <dhyun@apple.com>
Signed-off-by: Dongjoon Hyun <dhyun@apple.com>
What changes were proposed in this pull request?
`NestedColumnAliasing` rule covers `GetStructField` only, currently. It means that some nested field extraction expressions aren't pruned. For example, if only accessing a nested field in an array of struct (`GetArrayStructFields`), this column isn't pruned.
This patch extends the rule to cover general nested field cases, including `GetArrayStructFields`.
How was this patch tested?
Added tests.
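For illustration only, here is a minimal sketch of the idea behind the description above: treat both struct-field and array-of-struct-field extraction as nested accesses and collect the field paths a projection actually needs. The types `Expr`, `Attr`, and the object `NestedFieldPaths` are hypothetical stand-ins written for this example. The case classes borrow the names `GetStructField` and `GetArrayStructFields` but are simplified and do not match Catalyst's real signatures, and this is not the code in this patch.

```scala
// Simplified stand-ins for Catalyst expressions. These are hypothetical
// illustration types, not Spark's real classes or signatures.
sealed trait Expr
final case class Attr(name: String) extends Expr
// Extracts one field from a struct-typed child, e.g. `name.first`.
final case class GetStructField(child: Expr, field: String) extends Expr
// Extracts one field from every struct in an array-typed child,
// e.g. `friends.first` where `friends` is an array of structs.
final case class GetArrayStructFields(child: Expr, field: String) extends Expr

object NestedFieldPaths {
  // Walks a chain of extractions down to its root attribute and returns
  // (root column name, field path from the root).
  def accessPath(e: Expr): (String, List[String]) = e match {
    case Attr(name) => (name, Nil)
    case GetStructField(child, field) =>
      val (root, path) = accessPath(child)
      (root, path :+ field)
    case GetArrayStructFields(child, field) =>
      val (root, path) = accessPath(child)
      (root, path :+ field)
  }

  // Groups the nested paths a projection needs by root column. An empty
  // path means the whole column is referenced, so nothing can be pruned.
  def requiredFields(exprs: Seq[Expr]): Map[String, Set[List[String]]] =
    exprs.map(accessPath).groupBy(_._1).map { case (root, paths) =>
      root -> paths.map(_._2).toSet
    }
}
```

In this toy model, a projection of `name.first` (a struct field) and `friends.first` (a field of an array of structs) produces one single-element path per root column, so only the `first` leaf under each root would need to be read. Handling the array case the same way as the struct case is the point the description makes about covering `GetArrayStructFields`.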