Skip to content

Conversation

@beliefer
Copy link
Contributor

@beliefer beliefer commented Sep 2, 2020

What changes were proposed in this pull request?

Spark SQL exists a bug show below:

spark.sql(
  " SELECT COUNT(DISTINCT 2), COUNT(DISTINCT 2, 3)")
  .show()
+-----------------+--------------------+
|count(DISTINCT 2)|count(DISTINCT 2, 3)|
+-----------------+--------------------+
|                1|                   1|
+-----------------+--------------------+

spark.sql(
  " SELECT COUNT(DISTINCT 2), COUNT(DISTINCT 3, 2)")
  .show()
+-----------------+--------------------+
|count(DISTINCT 2)|count(DISTINCT 3, 2)|
+-----------------+--------------------+
|                1|                   0|
+-----------------+--------------------+

The first query is correct, but the second query is not.
The root reason is the second query rewrited by RewriteDistinctAggregates who expand the output but lost the 2.

Why are the changes needed?

Fix a bug.
SELECT COUNT(DISTINCT 2), COUNT(DISTINCT 3, 2) should return 1, 1

Does this PR introduce any user-facing change?

Yes

How was this patch tested?

New UT

@SparkQA
Copy link

SparkQA commented Sep 2, 2020

Test build #128196 has finished for PR 29626 at commit 0f355ad.

  • This patch fails MiMa tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@beliefer
Copy link
Contributor Author

beliefer commented Sep 2, 2020

retest this please

@HyukjinKwon
Copy link
Member

cc @cloud-fan and @linhongliu-db



-- !query
SELECT COUNT(DISTINCT id), COUNT(DISTINCT 2,3) FILTER (WHERE dept_id = 30) FROM emp
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can we test WHERE dept_id > 0? to make sure that we get correct result even if there are more than one inputs after filter.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OK

@SparkQA
Copy link

SparkQA commented Sep 2, 2020

Test build #128198 has finished for PR 29626 at commit e15519e.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds the following public classes (experimental):
  • case class JDBCScan(
  • case class JDBCScanBuilder(
  • case class JDBCWriteBuilder(schema: StructType, options: JdbcOptionsInWrite) extends V1WriteBuilder

@SparkQA
Copy link

SparkQA commented Sep 2, 2020

Test build #128197 has finished for PR 29626 at commit e15519e.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds the following public classes (experimental):
  • case class JDBCScan(
  • case class JDBCScanBuilder(
  • case class JDBCWriteBuilder(schema: StructType, options: JdbcOptionsInWrite) extends V1WriteBuilder

@SparkQA
Copy link

SparkQA commented Sep 2, 2020

Test build #128201 has finished for PR 29626 at commit 9ad3469.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA
Copy link

SparkQA commented Sep 2, 2020

Test build #128208 has finished for PR 29626 at commit 81eb9e4.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA
Copy link

SparkQA commented Sep 3, 2020

Test build #128219 has finished for PR 29626 at commit 52896fb.

  • This patch fails due to an unknown error code, -9.
  • This patch merges cleanly.
  • This patch adds no public classes.

@beliefer
Copy link
Contributor Author

beliefer commented Sep 3, 2020

retest this please

@beliefer beliefer changed the title [WIP][SPARK-32777][SQL] Aggregation support aggregate function with multiple foldable expressions. [SPARK-32777][SQL] Aggregation support aggregate function with multiple foldable expressions. Sep 3, 2020
@beliefer
Copy link
Contributor Author

beliefer commented Sep 3, 2020

cc @cloud-fan

@SparkQA
Copy link

SparkQA commented Sep 4, 2020

Test build #128285 has finished for PR 29626 at commit 8622411.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA
Copy link

SparkQA commented Sep 4, 2020

Test build #128294 has finished for PR 29626 at commit 43cb6a0.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

}
}
val distinctAggGroupLookup = distinctAggGroupMap.toMap
val distinctAggGroups = distinctAggGroupMap.groupBy(_._2).map{ kv =>
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: mapValues

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OK

val naf = patchAggregateFunctionChildren(af) { x =>
val condition = if (e.filter.isDefined) {
e.filter.map(distinctAggFilterAttrLookup.get(_)).get
val condition = e.filter.map(distinctAggFilterAttrLookup.get(_)).getOrElse(None)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: is .getOrElse(None) needed?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done

@SparkQA
Copy link

SparkQA commented Sep 7, 2020

Test build #128341 has finished for PR 29626 at commit 64834f8.

  • This patch fails due to an unknown error code, -9.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA
Copy link

SparkQA commented Sep 7, 2020

Test build #128345 has finished for PR 29626 at commit fa7d578.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

val condition = if (e.filter.isDefined) {
e.filter.map(distinctAggFilterAttrLookup.get(_)).get
val condition = e.filter.map(distinctAggFilterAttrLookup.get(_)).flatten
if (distinctAggGroupLookup(e).contains(x)) {
Copy link
Contributor

@cloud-fan cloud-fan Sep 8, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm wondering if we can simplify the logic a bit. The goal is to only do the replacement for the first child if all the children are foldable. How about

val af = e.aggregateFunction
val condition = e.filter.map(distinctAggFilterAttrLookup.get(_)).flatten
val naf = if (af.children.forall(_.foldable)) {
  val firstChild = evalWithinGroup(id, af.children.head, condition)
  af.withNewChildren(firstChild +: af.children.drop(1)).asInstanceOf[AggregateFunction]
} else {
  patchAggregateFunctionChildren(af) { x =>
    distinctAggChildAttrLookup.get(x).map(evalWithinGroup(id, _, condition))
  }
}

Then we don't need to change a lot of code to create distinctAggGroupLookup

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OK

@SparkQA
Copy link

SparkQA commented Sep 9, 2020

Test build #128429 has finished for PR 29626 at commit 9a44bec.

  • This patch fails due to an unknown error code, -9.
  • This patch merges cleanly.
  • This patch adds no public classes.

@beliefer
Copy link
Contributor Author

beliefer commented Sep 9, 2020

retest this please

@SparkQA
Copy link

SparkQA commented Sep 9, 2020

Test build #128438 has finished for PR 29626 at commit 9a44bec.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@beliefer
Copy link
Contributor Author

retest this please

None
val condition = e.filter.map(distinctAggFilterAttrLookup.get(_)).flatten
val naf = if (af.children.forall(_.foldable)) {
val firstChild = evalWithinGroup(id, af.children.head, condition)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can we add some comments to explain why we are doing it?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OK

@SparkQA
Copy link

SparkQA commented Sep 10, 2020

Test build #128479 has finished for PR 29626 at commit 16588f5.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@beliefer
Copy link
Contributor Author

retest this please

@SparkQA
Copy link

SparkQA commented Sep 10, 2020

Test build #128478 has finished for PR 29626 at commit 9a44bec.

  • This patch fails due to an unknown error code, -9.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA
Copy link

SparkQA commented Sep 10, 2020

Test build #128489 has finished for PR 29626 at commit 16588f5.

  • This patch fails due to an unknown error code, -9.
  • This patch merges cleanly.
  • This patch adds no public classes.

@beliefer
Copy link
Contributor Author

retest this please

@SparkQA
Copy link

SparkQA commented Sep 10, 2020

Test build #128498 has finished for PR 29626 at commit 16588f5.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@cloud-fan
Copy link
Contributor

github action passed, merging to master, thanks!

@cloud-fan cloud-fan closed this in a22871f Sep 10, 2020
@beliefer
Copy link
Contributor Author

@cloud-fan Thanks

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants