
Conversation

@huaxingao
Contributor

What changes were proposed in this pull request?

Currently, a composite field name such as `dept id` (a column name containing a space) doesn't work with aggregate push down:

sql("SELECT COUNT(`dept id`) FROM h2.test.dept")

```
org.apache.spark.sql.catalyst.parser.ParseException:
extraneous input 'id' expecting <EOF>(line 1, pos 5)

== SQL ==
dept id
-----^^^

	at org.apache.spark.sql.catalyst.parser.ParseException.withCommand(ParseDriver.scala:271)
	at org.apache.spark.sql.catalyst.parser.AbstractSqlParser.parse(ParseDriver.scala:132)
	at org.apache.spark.sql.catalyst.parser.AbstractSqlParser.parseMultipartIdentifier(ParseDriver.scala:63)
	at org.apache.spark.sql.connector.expressions.LogicalExpressions$.parseReference(expressions.scala:39)
	at org.apache.spark.sql.connector.expressions.FieldReference$.apply(expressions.scala:365)
	at org.apache.spark.sql.execution.datasources.DataSourceStrategy$.translateAggregate(DataSourceStrategy.scala:717)
	at org.apache.spark.sql.execution.datasources.v2.PushDownUtils$.$anonfun$pushAggregates$1(PushDownUtils.scala:125)
	at scala.collection.immutable.List.flatMap(List.scala:366)
	at org.apache.spark.sql.execution.datasources.v2.PushDownUtils$.pushAggregates(PushDownUtils.scala:125)
```

This PR fixes the problem.
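For illustration, a minimal sketch of the call that fails, using the classes named in the stack trace above. `FieldReference` is Spark-internal (`private[sql]`), so this is only a conceptual sketch of the problem, not the patch itself:

```
import org.apache.spark.sql.connector.expressions.FieldReference

// FieldReference.apply(String) runs the name through the SQL multipart-identifier
// parser (LogicalExpressions.parseReference), so a top-level column whose name
// contains a space cannot be parsed:
FieldReference("dept id")        // throws ParseException: extraneous input 'id'

// Building the reference from the already-known name parts skips the parser:
FieldReference(Seq("dept id"))   // a single-part reference for the column `dept id`
```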

Why are the changes needed?

Bug fix.

Does this PR introduce any user-facing change?

No

How was this patch tested?

New test
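A sketch of the kind of end-to-end check the new test needs, reusing the query from the description (illustrative only; the actual test may also verify the pushed-down plan):

```
// Before the fix this failed with the ParseException shown above while
// translating COUNT(`dept id`) for aggregate push-down; after the fix it runs.
sql("SELECT COUNT(`dept id`) FROM h2.test.dept").collect()
```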

@huaxingao
Contributor Author

@cloud-fan Could you please take a look? Thanks!

@Dintion

Dintion commented Jan 6, 2022

Chinese field names have the same problem:

```
== SQL ==
缺陷编号
^^^
at org.apache.spark.sql.catalyst.parser.ParseException.withCommand(ParseDriver.scala:265)
at org.apache.spark.sql.catalyst.parser.AbstractSqlParser.parse(ParseDriver.scala:126)
```

@huaxingao
Contributor Author

Should be good now @Dintion

Contributor (review comment)

I think we need to fix the caller side. We shouldn't call FieldReference.apply(String), which parses the given string. We should call FieldReference(Seq(col_name)) instead.
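A sketch of the caller-side shape being suggested here. The helper name `toRef` is hypothetical and only stands in for the real call sites (DataSourceStrategy.translateAggregate and PushDownUtils), which turn a pushed-down top-level column name into a reference:

```
import org.apache.spark.sql.connector.expressions.{FieldReference, NamedReference}

// Hypothetical stand-in for the real call sites: `colName` is already known to be
// a single top-level column, so there is nothing left to parse.
def toRef(colName: String): NamedReference = {
  // FieldReference(colName) would re-parse the name and break on `dept id`;
  // wrapping it in a Seq builds the reference directly from the raw name.
  FieldReference(Seq(colName))
}
```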

Contributor (review comment)

We know it's a top-level column, so it's a waste to parse it again. The column name may contain backticks as well, and we would need to escape them.

A simpler solution is to skip the parsing: FieldReference(Seq(name)). We can even create a util method for it: FieldReference.column(name).

@beliefer
Contributor

beliefer commented Jan 6, 2022

@huaxingao Could you wait until #35101 is merged and then update this to use FieldReference.column(name)?

@Dintion

Dintion commented Jan 6, 2022

@huaxingao I think the code at org.apache.spark.sql.execution.datasources.v2.PushDownUtils#pushAggregates#columnAsString

```
def columnAsString(e: Expression): Option[FieldReference] = e match {
  case PushableColumnWithoutNestedColumn(name) =>
    Some(FieldReference(name).asInstanceOf[FieldReference])
  case _ => None
}
```

also has the same problem.
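For illustration, a sketch of how that spot could avoid the re-parse, using the FieldReference.column helper discussed above (this mirrors the idea of the fix, not necessarily the exact patch):

```
def columnAsString(e: Expression): Option[FieldReference] = e match {
  case PushableColumnWithoutNestedColumn(name) =>
    // Build the reference from the raw top-level name instead of parsing it,
    // so names with spaces, backticks, or non-ASCII characters are preserved.
    Some(FieldReference.column(name).asInstanceOf[FieldReference])
  case _ => None
}
```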

(Diff context for the review comment below: the new FieldReference.column helper.)

```
def column(name: String): NamedReference = {
  FieldReference(Seq(name))
}
```
Contributor (review comment)

Thank you for your work

@cloud-fan
Contributor

thanks, merging to master!

@cloud-fan cloud-fan closed this in cf193b9 Jan 7, 2022
@huaxingao
Contributor Author

Thank you all!

@huaxingao huaxingao deleted the composite_name branch January 7, 2022 05:47
dchvn pushed a commit to dchvn/spark that referenced this pull request Jan 19, 2022

Closes apache#35108 from huaxingao/composite_name.

Authored-by: Huaxin Gao <[email protected]>
Signed-off-by: Wenchen Fan <[email protected]>