Skip to content

Conversation

@dbtsai
Copy link
Member

@dbtsai dbtsai commented Dec 20, 2018

What changes were proposed in this pull request?

As Optimizer.scala becomes bigger and bigger, it's hard to add new rules and maintain them. We are refactoring out ColumnPruning from Optimizer.scala to ColumnPruning.scala so it's easier to add new logics in ColumnPruning.

How was this patch tested?

Existing tests.

@dbtsai
Copy link
Member Author

dbtsai commented Dec 20, 2018

*/
private def removeProjectBeforeFilter(plan: LogicalPlan): LogicalPlan = plan transformUp {
case p1 @ Project(_, f @ Filter(_, p2 @ Project(_, child)))
if p2.outputSet.subsetOf(child.outputSet) =>
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I can see that this is a clean refactoring and this line is the additional style fix.

Copy link
Member

@dongjoon-hyun dongjoon-hyun left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

+1, LGTM (pening Jenkins)

Copy link
Member

@gatorsmile gatorsmile left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would not suggest to move the code with such a complex change history.

@dongjoon-hyun
Copy link
Member

dongjoon-hyun commented Dec 21, 2018

@gatorsmile . Spark 3.0 is a good chance to make this more flexible for the future.

@gatorsmile
Copy link
Member

It is hard for us to do the review in the future. Normally, we check the change history when doing the review. The change history is very important for us to do the review.

@dongjoon-hyun
Copy link
Member

dongjoon-hyun commented Dec 21, 2018

? It's only about 100 line. Keeping each rule in a single file have more benefits. Historically, at 2.0.0, we did a lot of refactoring already. And, this is for 3.0. Technically, all history is in branch-2.4 and we can see it easily in GitHub.

@gatorsmile
Copy link
Member

Discussed it with @dongjoon-hyun offline. There are various bug fixes in the code history. The change history is very valuable for us [especially for new contributors] to understand the code. Previously, we did the split, but it makes us much harder to find out the original PRs who introduced the changes.

For example, the following screenshot can clearly show all the PRs that changed this column pruning rule. The change history is very useful for reviewers and new learners.

screen shot 2018-12-20 at 6 10 41 pm

@gatorsmile
Copy link
Member

Maybe not all of you know this trick. You can get the history by selection in IntelliJ. See the screen shot.

screen shot 2018-12-20 at 6 15 09 pm

@SparkQA
Copy link

SparkQA commented Dec 21, 2018

Test build #100351 has finished for PR 23359 at commit bf8f1b9.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@dongjoon-hyun
Copy link
Member

Yep. I agree with @gatorsmile now.

@dbtsai
Copy link
Member Author

dbtsai commented Dec 21, 2018

All the above is great conversation. I'm closing this PR. Thanks.

@dbtsai dbtsai closed this Dec 21, 2018
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants