[SPARK-17075][SQL][followup] fix some minor issues and clean up the code #17065

cloud-fan · 2017-02-25T06:47:05Z

What changes were proposed in this pull request?

This is a follow-up of #16395. It fixes some code style issues, naming issues, some missing cases in pattern match, etc.

How was this patch tested?

existing tests.

SparkQA · 2017-02-25T06:52:30Z

Test build #73465 has started for PR 17065 at commit 04cc681.

cloud-fan · 2017-02-25T06:52:46Z

...ain/scala/org/apache/spark/sql/catalyst/plans/logical/statsEstimation/FilterEstimation.scala

here we are actually over-estimating: if a condition inside an And is unsupported, we assume it has 100% selectivity, which may lead to under-estimation if this And is wrapped by a Not.

We should:

1. treat the And as unsupported if any of its conditions is unsupported
2. not handle nested Not

cc @wzhfy @ron8hu

This shows that it is difficult to always over-estimate. How about we do not handle nested Not?

@cloud-fan @ron8hu I'm a little confused about this. For a Not expression, over-estimating its child always turns into under-estimation, whether the Not is nested or not. So should we remove support only for nested Not, or for Not altogether?
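To make the concern in this thread concrete, here is a minimal sketch on a toy expression tree. It is not Catalyst's code: `Cond`, `Leaf`, `lenient`, and `strict` are hypothetical names, and conjunct selectivities are simply multiplied, which assumes independence.

```scala
// Toy model only; none of these names come from FilterEstimation.scala.
object SelectivitySketch {
  trait Cond
  // selectivity = None models a condition the estimator does not support.
  case class Leaf(selectivity: Option[Double]) extends Cond
  case class And(left: Cond, right: Cond) extends Cond
  case class Not(child: Cond) extends Cond

  // Lenient: an unsupported conjunct is treated as 100% selectivity,
  // so the And is over-estimated.
  def lenient(c: Cond): Option[Double] = c match {
    case Leaf(s) => s
    case And(l, r) =>
      (lenient(l), lenient(r)) match {
        case (None, None) => None
        case (a, b)       => Some(a.getOrElse(1.0) * b.getOrElse(1.0))
      }
    case Not(child) => lenient(child).map(1.0 - _)
    case _          => None
  }

  // Strict: one unsupported conjunct makes the whole And unsupported,
  // so a wrapping Not never inverts a guessed value.
  def strict(c: Cond): Option[Double] = c match {
    case Leaf(s)    => s
    case And(l, r)  => for (a <- strict(l); b <- strict(r)) yield a * b
    case Not(child) => strict(child).map(1.0 - _)
    case _          => None
  }
}
```

With one supported conjunct of selectivity 0.5 and one unsupported conjunct, `lenient` estimates `Not(And(...))` at 0.5, which is the lowest value the true selectivity could take, while `strict` declines to estimate at all.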

cloud-fan · 2017-02-25T06:53:07Z

...ain/scala/org/apache/spark/sql/catalyst/plans/logical/statsEstimation/FilterEstimation.scala

also support EqualNullSafe
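For readers unfamiliar with the operator, the sketch below illustrates how null-safe equality (`<=>` in Spark SQL) differs from plain `=`. It models SQL NULL with `Option` and is not the estimator's code; `sqlEq` and `eqNullSafe` are hypothetical helper names.

```scala
// Illustration only: SQL NULL is modelled as None.
object NullSafeEqSketch {
  // Plain SQL equality: the result is NULL (None) if either side is NULL.
  def sqlEq[A](a: Option[A], b: Option[A]): Option[Boolean] =
    for (x <- a; y <- b) yield x == y

  // Null-safe equality: always true or false; two NULLs compare equal.
  def eqNullSafe[A](a: Option[A], b: Option[A]): Boolean = a == b
}
```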

cloud-fan · 2017-02-25T06:55:42Z

...ain/scala/org/apache/spark/sql/catalyst/plans/logical/statsEstimation/FilterEstimation.scala

previously we totally missed BooleanType and would throw a MatchError if the attribute was boolean. But the logic in evaluateBinaryForNumeric doesn't work for boolean, so I treat it as unsupported for now. @wzhfy do you have time to work on it?
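The failure mode described here can be reproduced in isolation. The sketch below uses hypothetical stand-in types (`IntType`, `BoolType`), not Spark's `DataType` hierarchy, to show how a pattern match with a missing case throws `scala.MatchError`, and how a catch-all case that reports "unsupported" avoids it.

```scala
// Stand-in types for illustration; these are not Spark's classes.
object MatchSketch {
  trait DataType
  case object IntType extends DataType
  case object BoolType extends DataType

  // BoolType is not covered, so passing it throws scala.MatchError at runtime.
  def partial(dt: DataType): String = dt match {
    case IntType => "numeric"
  }

  // A catch-all case returns None ("unsupported") instead of throwing.
  def safe(dt: DataType): Option[String] = dt match {
    case IntType => Some("numeric")
    case _       => None
  }
}
```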

cloud-fan · 2017-02-25T06:56:36Z

...ain/scala/org/apache/spark/sql/catalyst/plans/logical/statsEstimation/FilterEstimation.scala

@ron8hu the external value type of DecimalType is java.math.BigDecimal, while the internal value type is Decimal, so we need to convert it.

Agreed. Thanks for fixing it.

cloud-fan · 2017-02-25T06:57:04Z

...ain/scala/org/apache/spark/sql/catalyst/plans/logical/statsEstimation/FilterEstimation.scala

added boolean type

cloud-fan · 2017-02-25T07:01:27Z

CC @ron8hu @wzhfy

SparkQA · 2017-02-25T07:02:30Z

Test build #73466 has started for PR 17065 at commit 5fa69b3.

gatorsmile · 2017-02-25T19:30:41Z

retest this please

SparkQA · 2017-02-25T21:33:57Z

Test build #73471 has finished for PR 17065 at commit 5fa69b3.

This patch passes all tests.
This patch merges cleanly.
This patch adds the following public classes (experimental):
class AttributeMap[A](val baseMap: Map[ExprId, (Attribute, A)])
class ColumnStatsMap
trait Range
case class NumericRange(min: JDecimal, max: JDecimal) extends Range
class DefaultRange extends Range
class NullRange extends Range

ron8hu

LGTM. Yesterday I started making changes based on your feedback. My effort was delayed by a build error after I fetched the latest code from the master repository yesterday. Today I reviewed your follow-up PR. Your changes are a superset of mine. Pretty good and clean code! I will stop my code changes on this JIRA. Thanks for your follow-up effort.

ron8hu · 2017-02-25T22:48:57Z

...ain/scala/org/apache/spark/sql/catalyst/plans/logical/statsEstimation/FilterEstimation.scala

+      case _: NumericType | BooleanType | DateType | TimestampType =>
+        val statsRange = Range(colStat.min, colStat.max, dataType).asInstanceOf[NumericRange]
+        val validQuerySet = hSet.filter { v =>
+          v != null && statsRange.contains(Literal(v, dataType))


nit: for better readability: (v != null) && statsRange.contains(Literal(v, dataType))
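As a rough illustration of what the quoted filter does, the sketch below keeps only the non-null IN-list values that fall inside the column's [min, max] range. It is a simplified stand-in under stated assumptions: this `NumericRange` only borrows its field names from the PR, its `contains` takes a plain `java.math.BigDecimal` rather than a `Literal`, and `validValues` is a hypothetical helper.

```scala
import java.math.{BigDecimal => JDecimal}

// Simplified stand-in; not the NumericRange class added by the PR.
object InSetSketch {
  case class NumericRange(min: JDecimal, max: JDecimal) {
    // Inclusive on both ends: min <= v <= max.
    def contains(v: JDecimal): Boolean =
      min.compareTo(v) <= 0 && max.compareTo(v) >= 0
  }

  // Drop nulls first, then drop values outside the column's range.
  def validValues(hSet: Seq[java.lang.Integer], range: NumericRange): Seq[java.lang.Integer] =
    hSet.filter { v =>
      (v != null) && range.contains(JDecimal.valueOf(v.longValue))
    }
}
```

The null check has to come first so that `&&` short-circuits before the value is dereferenced.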

cloud-fan · 2017-02-26T07:02:03Z

thanks for the review, merging to master!

cloud-fan · 2017-02-26T07:03:11Z

@ron8hu can you fix #17065 (comment) and #17065 (comment) if you have time? thanks!

## What changes were proposed in this pull request?

This is a follow-up of apache#16395. It fixes some code style issues, naming issues, some missing cases in pattern match, etc.

## How was this patch tested?

existing tests.

Author: Wenchen Fan <[email protected]>

Closes apache#17065 from cloud-fan/follow-up.

wzhfy · 2017-03-03T08:08:33Z

@cloud-fan @ron8hu Let me submit a follow-up pr to fix the remaining issues.

fix some minor issues and clean up the code

5fa69b3

cloud-fan force-pushed the follow-up branch from 04cc681 to 5fa69b3 on February 25, 2017 07:00

ron8hu reviewed Feb 25, 2017


asfgit closed this in 89608cf Feb 26, 2017
