Contributor

@cloud-fan cloud-fan commented Feb 25, 2017

What changes were proposed in this pull request?

This is a follow-up of #16395. It fixes some code style issues, naming issues, some missing cases in pattern match, etc.

How was this patch tested?

existing tests.

SparkQA commented Feb 25, 2017

Test build #73465 has started for PR 17065 at commit 04cc681.

Contributor Author

Here we are actually over-estimating: if a condition inside And is unsupported, we assume 100% selectivity for it, which may lead to under-estimation if this And is wrapped by Not.

We should:

  1. if one condition is unsupported, this And is unsupported
  2. do not handle nested Not

cc @wzhfy @ron8hu
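
To make the concern concrete, here is a minimal, self-contained sketch. It does not use Spark's classes: `Cond`, `Leaf`, `lenient`, and `strict` are hypothetical names for illustration, `None` stands for an unsupported predicate, and multiplying the two sides of And assumes the predicates are independent. Treating "nested Not" as "Not over anything other than a single predicate" is one possible reading of the proposal, not necessarily what the patch does.

```scala
// Toy condition tree; None marks a predicate the estimator cannot handle.
sealed trait Cond
case class Leaf(selectivity: Option[Double]) extends Cond
case class And(left: Cond, right: Cond) extends Cond
case class Not(child: Cond) extends Cond

object SelectivitySketch {
  // Behavior described in the comment: an unsupported child counts as 100%.
  def lenient(c: Cond): Double = c match {
    case Leaf(s)   => s.getOrElse(1.0)
    case And(l, r) => lenient(l) * lenient(r)
    case Not(ch)   => 1.0 - lenient(ch)
  }

  // Proposed rules (as read here): And is unsupported if either side is,
  // and Not is only estimated when it wraps a single predicate.
  def strict(c: Cond): Option[Double] = c match {
    case Leaf(s)      => s
    case And(l, r)    => for (a <- strict(l); b <- strict(r)) yield a * b
    case Not(Leaf(s)) => s.map(1.0 - _)
    case Not(_)       => None
  }
}
```

For `Not(And(Leaf(Some(0.25)), Leaf(None)))`, `lenient` returns 0.75. If the unsupported predicate actually kept half of the rows, the And would keep 0.125 and the Not 0.875, so 0.75 is an under-estimate. `strict` returns `None` for the same input, which a caller can map to the conservative 100%.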

Contributor

This shows that it is difficult to always over-estimate. How about we do not handle nested Not?

Contributor Author

SGTM

Contributor

@cloud-fan @ron8hu I'm a little confused about this. For a Not expression, over-estimating the child always turns into under-estimation, whether or not the Not is nested. So should we remove support for nested Not only, or for Not entirely?

Contributor Author

also support EqualNullSafe

Contributor Author

@cloud-fan cloud-fan Feb 25, 2017

Previously we missed BooleanType entirely and would throw a MatchError if the attribute was boolean. But the logic in evaluateBinaryForNumeric doesn't work for boolean, so I treat it as unsupported for now. @wzhfy do you have time to work on it?
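
As an illustration of that failure mode, here is a small sketch using a made-up type hierarchy rather than Spark's own (`DataType` here is a local trait, and `pathBefore` / `pathAfter` are hypothetical names). It shows how a pattern match with a missing case throws `scala.MatchError` at runtime, and how an explicit "unsupported" case avoids that.

```scala
// Hypothetical miniature of a data-type hierarchy (not Spark's classes).
trait DataType
case object IntegerType extends DataType
case object StringType extends DataType
case object BooleanType extends DataType

object MatchSketch {
  // No case for BooleanType: the value falls through every pattern and
  // the match throws scala.MatchError at runtime.
  def pathBefore(dt: DataType): Option[String] = dt match {
    case IntegerType => Some("numeric")
    case StringType  => Some("string")
  }

  // Explicit case: BooleanType is reported as unsupported (None)
  // instead of crashing the caller.
  def pathAfter(dt: DataType): Option[String] = dt match {
    case IntegerType => Some("numeric")
    case StringType  => Some("string")
    case BooleanType => None
  }
}
```

Returning `None` lets the caller fall back to a conservative estimate until proper boolean handling is written.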

Contributor Author

@ron8hu the external value type of DecimalType is java.math.BigDecimal and the internal value type is Decimal, so we need to convert it.

Contributor

Agreed. Thanks for fixing it.

Contributor Author

@cloud-fan cloud-fan Feb 25, 2017

added boolean type

@cloud-fan
Contributor Author

CC @ron8hu @wzhfy

SparkQA commented Feb 25, 2017

Test build #73466 has started for PR 17065 at commit 5fa69b3.

@gatorsmile
Member

retest this please

SparkQA commented Feb 25, 2017

Test build #73471 has finished for PR 17065 at commit 5fa69b3.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds the following public classes (experimental):
      • class AttributeMap[A](val baseMap: Map[ExprId, (Attribute, A)])
      • class ColumnStatsMap
      • trait Range
      • case class NumericRange(min: JDecimal, max: JDecimal) extends Range
      • class DefaultRange extends Range
      • class NullRange extends Range

Contributor

@ron8hu ron8hu left a comment

LGTM. I started making changes based on your feedback yesterday, but was delayed by a build error after fetching the latest code from the master repository. Today I reviewed your follow-up PR; your changes are a superset of mine. Pretty good and clean code! I will stop my code changes on this JIRA. Thanks for your follow-up effort.

case _: NumericType | BooleanType | DateType | TimestampType =>
  val statsRange = Range(colStat.min, colStat.max, dataType).asInstanceOf[NumericRange]
  val validQuerySet = hSet.filter { v =>
    v != null && statsRange.contains(Literal(v, dataType))
Contributor

nit: for better readability: (v != null) && statsRange.contains(Literal(v, dataType))
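
For readers without the diff at hand, the filter under discussion can be sketched outside Spark roughly as follows. `SketchNumericRange` is a stand-in for the `NumericRange` class listed above; its `contains` takes a plain `java.math.BigDecimal` rather than a `Literal`, and converting values via `toString` is a shortcut for this sketch only. Both are assumptions, not the patch's code.

```scala
import java.math.{BigDecimal => JDecimal}

// Stand-in for the column-statistics range; bounds are inclusive.
case class SketchNumericRange(min: JDecimal, max: JDecimal) {
  def contains(v: JDecimal): Boolean =
    min.compareTo(v) <= 0 && max.compareTo(v) >= 0
}

object InSetSketch {
  // Keep only the IN-list values that are non-null and fall inside the
  // [min, max] range recorded in the column statistics.
  def validQuerySet(hSet: Set[Any], range: SketchNumericRange): Set[Any] =
    hSet.filter { v =>
      (v != null) && range.contains(new JDecimal(v.toString))
    }
}
```

The null check has to come first: `&&` short-circuits, so `v.toString` is never called on a null element.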

@cloud-fan
Contributor Author

thanks for the review, merging to master!

@cloud-fan
Contributor Author

@ron8hu can you fix #17065 (comment) and #17065 (comment) if you have time? thanks!

@asfgit asfgit closed this in 89608cf Feb 26, 2017
Yunni pushed a commit to Yunni/spark that referenced this pull request Feb 27, 2017
Author: Wenchen Fan

Closes apache#17065 from cloud-fan/follow-up.
@wzhfy
Contributor

wzhfy commented Mar 3, 2017

@cloud-fan @ron8hu Let me submit a follow-up pr to fix the remaining issues.
