
Conversation

@huaxingao
Contributor

What changes were proposed in this pull request?

Currently, a composite field name such as `dept id` (a column name containing a space) doesn't work with aggregate push down:

sql("SELECT COUNT(`dept id`) FROM h2.test.dept")

```
org.apache.spark.sql.catalyst.parser.ParseException:
extraneous input 'id' expecting <EOF>(line 1, pos 5)

== SQL ==
dept id
-----^^^

	at org.apache.spark.sql.catalyst.parser.ParseException.withCommand(ParseDriver.scala:271)
	at org.apache.spark.sql.catalyst.parser.AbstractSqlParser.parse(ParseDriver.scala:132)
	at org.apache.spark.sql.catalyst.parser.AbstractSqlParser.parseMultipartIdentifier(ParseDriver.scala:63)
	at org.apache.spark.sql.connector.expressions.LogicalExpressions$.parseReference(expressions.scala:39)
	at org.apache.spark.sql.connector.expressions.FieldReference$.apply(expressions.scala:365)
	at org.apache.spark.sql.execution.datasources.DataSourceStrategy$.translateAggregate(DataSourceStrategy.scala:717)
	at org.apache.spark.sql.execution.datasources.v2.PushDownUtils$.$anonfun$pushAggregates$1(PushDownUtils.scala:125)
	at scala.collection.immutable.List.flatMap(List.scala:366)
	at org.apache.spark.sql.execution.datasources.v2.PushDownUtils$.pushAggregates(PushDownUtils.scala:125)
```

This PR fixes the problem.
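For illustration, a minimal sketch of the call that fails, using the classes named in the stack trace above. `FieldReference` is Spark-internal (`private[sql]`), so this is only a conceptual sketch of the problem, not the patch itself:

```
import org.apache.spark.sql.connector.expressions.FieldReference

// FieldReference.apply(String) runs the name through the SQL multipart-identifier
// parser (LogicalExpressions.parseReference), so a top-level column whose name
// contains a space cannot be parsed:
FieldReference("dept id")        // throws ParseException: extraneous input 'id'

// Building the reference from the already-known name parts skips the parser:
FieldReference(Seq("dept id"))   // a single-part reference for the column `dept id`
```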

Why are the changes needed?

Bug fix.

Does this PR introduce any user-facing change?

No

How was this patch tested?

New test
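A sketch of the kind of end-to-end check the new test needs, reusing the query from the description (illustrative only; the actual test may also verify the pushed-down plan):

```
// Before the fix this failed with the ParseException shown above while
// translating COUNT(`dept id`) for aggregate push-down; after the fix it runs.
sql("SELECT COUNT(`dept id`) FROM h2.test.dept").collect()
```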

@huaxingao
Contributor Author

@cloud-fan Could you please take a look? Thanks!

@Dintion

Dintion commented Jan 6, 2022

Chinese field names have the same problem:

```
== SQL ==
缺陷编号
^^^
at org.apache.spark.sql.catalyst.parser.ParseException.withCommand(ParseDriver.scala:265)
at org.apache.spark.sql.catalyst.parser.AbstractSqlParser.parse(ParseDriver.scala:126)
```

@huaxingao
Contributor Author

Should be good now @Dintion

Contributor (review comment)

I think we need to fix the caller side. We shouldn't call FieldReference.apply(String), which parses the given string. We should call FieldReference(Seq(col_name)) instead.
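A sketch of the caller-side shape being suggested here. The helper name `toRef` is hypothetical and only stands in for the real call sites (DataSourceStrategy.translateAggregate and PushDownUtils), which turn a pushed-down top-level column name into a reference:

```
import org.apache.spark.sql.connector.expressions.{FieldReference, NamedReference}

// Hypothetical stand-in for the real call sites: `colName` is already known to be
// a single top-level column, so there is nothing left to parse.
def toRef(colName: String): NamedReference = {
  // FieldReference(colName) would re-parse the name and break on `dept id`;
  // wrapping it in a Seq builds the reference directly from the raw name.
  FieldReference(Seq(colName))
}
```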

Contributor (review comment)

We know it's a top-level column, so it's a waste to parse it again. The column name may contain backticks as well, and we would need to escape them.

A simpler solution is to skip the parsing: FieldReference(Seq(name)). We can even create a util method for it: FieldReference.column(name).

@beliefer
Contributor

beliefer commented Jan 6, 2022

@huaxingao Could you wait until #35101 is merged and then update this to use FieldReference.column(name)?

@Dintion

Dintion commented Jan 6, 2022

@huaxingao I think the code at org.apache.spark.sql.execution.datasources.v2.PushDownUtils#pushAggregates#columnAsString

```
def columnAsString(e: Expression): Option[FieldReference] = e match {
  case PushableColumnWithoutNestedColumn(name) =>
    Some(FieldReference(name).asInstanceOf[FieldReference])
  case _ => None
}
```

also has the same problem.
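For illustration, a sketch of how that spot could avoid the re-parse, using the FieldReference.column helper discussed above (this mirrors the idea of the fix, not necessarily the exact patch):

```
def columnAsString(e: Expression): Option[FieldReference] = e match {
  case PushableColumnWithoutNestedColumn(name) =>
    // Build the reference from the raw top-level name instead of parsing it,
    // so names with spaces, backticks, or non-ASCII characters are preserved.
    Some(FieldReference.column(name).asInstanceOf[FieldReference])
  case _ => None
}
```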

(Diff context for the review comment below: the new FieldReference.column helper.)

```
def column(name: String): NamedReference = {
  FieldReference(Seq(name))
}
```
Contributor (review comment)

Thank you for your work

@cloud-fan
Contributor

thanks, merging to master!

@cloud-fan cloud-fan closed this in cf193b9 Jan 7, 2022
@huaxingao
Contributor Author

Thank you all!

@huaxingao huaxingao deleted the composite_name branch January 7, 2022 05:47
dchvn pushed a commit to dchvn/spark that referenced this pull request Jan 19, 2022

Closes apache#35108 from huaxingao/composite_name.

Authored-by: Huaxin Gao <[email protected]>
Signed-off-by: Wenchen Fan <[email protected]>