@wzhfy
Contributor

@wzhfy wzhfy commented Nov 15, 2017

What changes were proposed in this pull request?

Currently, relation stats are the same whether CBO is enabled or not. Although a relation (LogicalRelation or HiveTableRelation) is a LogicalPlan, its behavior is inconsistent with other plans, which can cause confusion when a user runs EXPLAIN COST commands. Besides, when CBO is disabled, we apply the size-only estimation strategy, so there's no need to propagate other catalog statistics to the relation.
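
The behavior described above can be illustrated with a minimal, self-contained sketch. It is not Spark's actual code: `ColumnStat`, `PlanStatistics`, and `CatalogStats` here are simplified, hypothetical stand-ins, and attributes are modeled as plain column names. It only shows the idea that a relation exposes the full catalog statistics when CBO is enabled and only `sizeInBytes` otherwise.

```scala
// Hypothetical, simplified stand-ins for illustration only; these are not
// Spark's real statistics classes.
object RelationStatsSketch {
  final case class ColumnStat(distinctCount: Long, nullCount: Long)

  // Statistics as seen by a logical plan node.
  final case class PlanStatistics(
      sizeInBytes: BigInt,
      rowCount: Option[BigInt] = None,
      attributeStats: Map[String, ColumnStat] = Map.empty)

  // Statistics as stored in the catalog for a table.
  final case class CatalogStats(
      sizeInBytes: BigInt,
      rowCount: Option[BigInt] = None,
      colStats: Map[String, ColumnStat] = Map.empty) {

    def toPlanStats(planOutput: Seq[String], cboEnabled: Boolean): PlanStatistics = {
      if (cboEnabled) {
        // Match column stats to the plan's output only when they are used.
        val attrStats =
          planOutput.flatMap(name => colStats.get(name).map(name -> _)).toMap
        PlanStatistics(sizeInBytes, rowCount, attrStats)
      } else {
        // Size-only estimation: nothing else is propagated to the relation.
        PlanStatistics(sizeInBytes)
      }
    }
  }

  def main(args: Array[String]): Unit = {
    val catalog = CatalogStats(
      sizeInBytes = BigInt(1024),
      rowCount = Some(BigInt(10)),
      colStats = Map("id" -> ColumnStat(distinctCount = 10L, nullCount = 0L)))

    println(catalog.toPlanStats(Seq("id", "name"), cboEnabled = true))
    println(catalog.toPlanStats(Seq("id", "name"), cboEnabled = false))
  }
}
```

With CBO off, the resulting plan statistics carry no `rowCount` or column stats, which is the property the EXPLAIN COST test in this PR checks for.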

How was this patch tested?

Enhanced existing test cases and added a new test case.

@wzhfy
Contributor Author

wzhfy commented Nov 15, 2017

cc @cloud-fan @gatorsmile

@wzhfy wzhfy changed the title from "[SPARK-22529] [SQL] Only sizeInBytes in catalog stats needs to be propagated when cbo is disabled" to "[SPARK-22529] [SQL] Only sizeInBytes in catalog stats needs to be propagated to leaf nodes when cbo is disabled" on Nov 15, 2017
@SparkQA

SparkQA commented Nov 15, 2017

Test build #83897 has finished for PR 19757 at commit b65a153.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@wzhfy wzhfy changed the title from "[SPARK-22529] [SQL] Only sizeInBytes in catalog stats needs to be propagated to leaf nodes when cbo is disabled" to "[SPARK-22529] [SQL] Relation stats should be consistent with other plans based on cbo config" on Nov 16, 2017
@SparkQA

SparkQA commented Nov 16, 2017

Test build #83928 has finished for PR 19757 at commit c75ae70.

  • This patch fails due to an unknown error code, -9.
  • This patch merges cleanly.
  • This patch adds no public classes.

@wzhfy
Contributor Author

wzhfy commented Nov 16, 2017

retest this please

@SparkQA

SparkQA commented Nov 16, 2017

Test build #83935 has finished for PR 19757 at commit c75ae70.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA

SparkQA commented Nov 23, 2017

Test build #84123 has finished for PR 19757 at commit 4c7d12e.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@wzhfy
Contributor Author

wzhfy commented Nov 23, 2017

retest this please

@SparkQA

SparkQA commented Nov 23, 2017

Test build #84130 has finished for PR 19757 at commit 4c7d12e.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

Statistics(sizeInBytes = sizeInBytes, rowCount = rowCount,
  attributeStats = AttributeMap(matched))
def toPlanStats(planOutput: Seq[Attribute], cboEnabled: Boolean): Statistics = {
  val attrStats = planOutput.flatMap(a => colStats.get(a.name).map(a -> _))
Contributor

Move this into the if branch.

withSQLConf(SQLConf.CBO_ENABLED.key -> "false") {
  // Don't show rowCount if cbo is disabled
  checkKeywordsExist(sql(explainCostCommand), "sizeInBytes")
  checkKeywordsNotExist(sql(explainCostCommand), "rowCount")
Contributor

Did you assume there is no table relation cache in this test?

Contributor Author

yes, should I refresh it for robustness?

Contributor

yes please.

@SparkQA

SparkQA commented Nov 27, 2017

Test build #84213 has finished for PR 19757 at commit 45ab60b.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA

SparkQA commented Nov 27, 2017

Test build #84218 has finished for PR 19757 at commit dcdebe1.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@cloud-fan
Contributor

thanks, merging to master!

@asfgit asfgit closed this in 1ff4a77 Nov 27, 2017
} else {
  // When CBO is disabled, we apply the size-only estimation strategy, so there's no need to
  // propagate other statistics from catalog to the plan.
  Statistics(sizeInBytes = sizeInBytes)
Member

If rowCount is available, why do we ignore it?
