[SPARK-26416] [SQL] Refactor `ColumnPruning` from `Optimizer.scala` to `ColumnPruning.scala` #23359

dbtsai · 2018-12-20T23:44:10Z

What changes were proposed in this pull request?

As Optimizer.scala grows bigger and bigger, it becomes hard to add new rules and maintain them. We are refactoring ColumnPruning out of Optimizer.scala into ColumnPruning.scala so that it's easier to add new logic to ColumnPruning.

How was this patch tested?

Existing tests.

dbtsai · 2018-12-20T23:49:08Z

cc @gatorsmile @cloud-fan @dongjoon-hyun

dongjoon-hyun · 2018-12-21T00:14:18Z

sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/ColumnPruning.scala

+   */
+  private def removeProjectBeforeFilter(plan: LogicalPlan): LogicalPlan = plan transformUp {
+    case p1 @ Project(_, f @ Filter(_, p2 @ Project(_, child)))
+        if p2.outputSet.subsetOf(child.outputSet) =>


I can see that this is a clean refactoring and this line is the additional style fix.
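For readers unfamiliar with the rule in the hunk above, here is a minimal, self-contained sketch of the same pattern on a toy plan tree. It is not Spark's implementation: `Plan`, `Relation`, `Project`, `Filter`, and `RemoveProjectBeforeFilter` below are hypothetical stand-ins that only track output column names, and the rewrite body (dropping the inner projection) is an assumption, since the quoted hunk stops before the right-hand side of the `case`.

```scala
// Toy model (hypothetical, not Spark's Catalyst classes):
// each node only knows the set of column names it outputs.
sealed trait Plan { def output: Set[String] }

case class Relation(output: Set[String]) extends Plan

// A projection may emit computed columns that its child does not produce.
case class Project(columns: Set[String], child: Plan) extends Plan {
  def output: Set[String] = columns
}

// A filter passes its child's columns through unchanged.
case class Filter(condition: String, child: Plan) extends Plan {
  def output: Set[String] = child.output
}

object RemoveProjectBeforeFilter {
  // Bottom-up rewrite, loosely mirroring `plan transformUp { ... }`.
  def apply(plan: Plan): Plan = {
    // Rewrite the children first.
    val rewritten = plan match {
      case Project(cols, child) => Project(cols, apply(child))
      case Filter(cond, child)  => Filter(cond, apply(child))
      case leaf                 => leaf
    }
    // Then try the rule on the current node.
    rewritten match {
      // Project -> Filter -> Project: the inner Project is redundant when
      // everything it outputs is already produced by its own child.
      case Project(cols, Filter(cond, p2 @ Project(_, child)))
          if p2.output.subsetOf(child.output) =>
        Project(cols, Filter(cond, child))
      case other => other
    }
  }
}
```

In this sketch, an inner `Project(Set("a", "b"), ...)` over a relation that outputs `a`, `b`, and `c` is removed, while an inner projection that introduces a computed column such as `a_plus_1` fails the `subsetOf` guard and is left in place.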

dongjoon-hyun

+1, LGTM (pending Jenkins)

gatorsmile

I would not suggest moving code with such a complex change history.

dongjoon-hyun · 2018-12-21T00:18:28Z

@gatorsmile . Spark 3.0 is a good chance to make this more flexible for the future.

gatorsmile · 2018-12-21T00:19:15Z

This will make future reviews harder for us. We normally check the change history when reviewing, so the change history is very important to us.

dongjoon-hyun · 2018-12-21T00:23:10Z

? It's only about 100 lines. Keeping each rule in a single file has more benefits. Historically, we already did a lot of refactoring in 2.0.0, and this is for 3.0. Technically, all the history is in branch-2.4, and we can see it easily on GitHub.

https://github.com/apache/spark/blame/branch-2.4/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/Optimizer.scala#L532-L643

gatorsmile · 2018-12-21T02:12:17Z

Discussed it with @dongjoon-hyun offline. There are various bug fixes in the code history. The change history is very valuable for us (especially for new contributors) to understand the code. We did such a split previously, but it made it much harder to find the original PRs that introduced the changes.

For example, the following screenshot can clearly show all the PRs that changed this column pruning rule. The change history is very useful for reviewers and new learners.

gatorsmile · 2018-12-21T02:15:44Z

Maybe not all of you know this trick: you can get the history of a selection in IntelliJ. See the screenshot.

SparkQA · 2018-12-21T03:28:02Z

Test build #100351 has finished for PR 23359 at commit bf8f1b9.

This patch passes all tests.
This patch merges cleanly.
This patch adds no public classes.

dongjoon-hyun · 2018-12-21T04:47:15Z

Yep. I agree with @gatorsmile now.

dbtsai · 2018-12-21T19:00:27Z

All of the above is a great conversation. I'm closing this PR. Thanks.

dbtsai added 2 commits December 19, 2018 15:25

Refactoring

3635b3a

styling

bf8f1b9

dbtsai mentioned this pull request Dec 20, 2018

[SPARK-26390][SQL] ColumnPruning rule should only do column pruning #23343

Closed

dongjoon-hyun reviewed Dec 21, 2018

dongjoon-hyun approved these changes Dec 21, 2018

gatorsmile requested changes Dec 21, 2018

dbtsai closed this Dec 21, 2018

4 participants