Skip to content

Conversation

@yaooqinn
Copy link
Member

@yaooqinn yaooqinn commented Jan 5, 2021

What changes were proposed in this pull request?

ResolveAggregateFunctions is a hacky rule and it calls executeSameContext to generate a resolved agg to determine which unresolved sort attribute should be pushed into the agg. However, after we add the PaddingAndLengthCheckForCharVarchar rule which will rewrite the query output, thus, the resolved agg cannot match original attributes anymore.

It causes some dissociative sort attribute to be pushed in and fails the query

[info]   Failed to analyze query: org.apache.spark.sql.AnalysisException: expression 'testcat.t1.`v`' is neither present in the group by, nor is it an aggregate function. Add to group by or wrap in first() (or first_value) if you don't care which value you get.;
[info]   Project [v#14, sum(i)#11L]
[info]   +- Sort [aggOrder#12 ASC NULLS FIRST], true
[info]      +- !Aggregate [v#14], [v#14, sum(cast(i#7 as bigint)) AS sum(i)#11L, v#13 AS aggOrder#12]
[info]         +- SubqueryAlias testcat.t1
[info]            +- Project [if ((length(v#6) <= 3)) v#6 else if ((length(rtrim(v#6, None)) > 3)) cast(raise_error(concat(input string of length , cast(length(v#6) as string),  exceeds varchar type length limitation: 3)) as string) else rpad(rtrim(v#6, None), 3,  ) AS v#14, i#7]
[info]               +- RelationV2[v#6, i#7, index#15, _partition#16] testcat.t1
[info]
[info]   Project [v#14, sum(i)#11L]
[info]   +- Sort [aggOrder#12 ASC NULLS FIRST], true
[info]      +- !Aggregate [v#14], [v#14, sum(cast(i#7 as bigint)) AS sum(i)#11L, v#13 AS aggOrder#12]
[info]         +- SubqueryAlias testcat.t1
[info]            +- Project [if ((length(v#6) <= 3)) v#6 else if ((length(rtrim(v#6, None)) > 3)) cast(raise_error(concat(input string of length , cast(length(v#6) as string),  exceeds varchar type length limitation: 3)) as string) else rpad(rtrim(v#6, None), 3,  ) AS v#14, i#7]
[info]               +- RelationV2[v#6, i#7, index#15, _partition#16] testcat.t1

Why are the changes needed?

bugfix

Does this PR introduce any user-facing change?

no

How was this patch tested?

new tests

@yaooqinn
Copy link
Member Author

yaooqinn commented Jan 5, 2021

cc @cloud-fan @maropu @dongjoon-hyun @HyukjinKwon thanks for reviewing

@github-actions github-actions bot added the SQL label Jan 5, 2021
@SparkQA
Copy link

SparkQA commented Jan 5, 2021

Test build #133664 has finished for PR 31027 at commit 8cbd537.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA
Copy link

SparkQA commented Jan 5, 2021

Kubernetes integration test starting
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/38253/

@SparkQA
Copy link

SparkQA commented Jan 5, 2021

Kubernetes integration test status failure
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/38253/

@SparkQA
Copy link

SparkQA commented Jan 6, 2021

Kubernetes integration test starting
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/38309/

@SparkQA
Copy link

SparkQA commented Jan 6, 2021

Kubernetes integration test status success
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/38309/

@SparkQA
Copy link

SparkQA commented Jan 6, 2021

Test build #133721 has finished for PR 31027 at commit 1b5c191.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA
Copy link

SparkQA commented Jan 6, 2021

Kubernetes integration test starting
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/38327/

@SparkQA
Copy link

SparkQA commented Jan 6, 2021

Test build #133742 has finished for PR 31027 at commit c715337.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA
Copy link

SparkQA commented Jan 6, 2021

Kubernetes integration test status success
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/38327/

@SparkQA
Copy link

SparkQA commented Jan 6, 2021

Test build #133739 has finished for PR 31027 at commit 9a44c7d.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA
Copy link

SparkQA commented Jan 6, 2021

Kubernetes integration test starting
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/38330/

@SparkQA
Copy link

SparkQA commented Jan 6, 2021

Kubernetes integration test status failure
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/38330/

@SparkQA
Copy link

SparkQA commented Jan 7, 2021

Kubernetes integration test starting
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/38387/

@SparkQA
Copy link

SparkQA commented Jan 7, 2021

Test build #133798 has finished for PR 31027 at commit 8042cd6.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA
Copy link

SparkQA commented Jan 7, 2021

Kubernetes integration test status success
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/38387/

@SparkQA
Copy link

SparkQA commented Jan 7, 2021

Kubernetes integration test starting
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/38389/

@SparkQA
Copy link

SparkQA commented Jan 7, 2021

Kubernetes integration test status success
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/38389/

@SparkQA
Copy link

SparkQA commented Jan 7, 2021

Test build #133800 has finished for PR 31027 at commit daa8549.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@yaooqinn
Copy link
Member Author

yaooqinn commented Jan 8, 2021

retest this please

@SparkQA
Copy link

SparkQA commented Jan 8, 2021

Kubernetes integration test starting
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/38405/

@SparkQA
Copy link

SparkQA commented Jan 8, 2021

Kubernetes integration test status success
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/38405/

@SparkQA
Copy link

SparkQA commented Jan 8, 2021

Test build #133816 has finished for PR 31027 at commit daa8549.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@cloud-fan
Copy link
Contributor

thanks, merging to master/3.1!

cloud-fan pushed a commit that referenced this pull request Jan 8, 2021
…rCharVarchar and ResolveAggregateFunctions

### What changes were proposed in this pull request?

ResolveAggregateFunctions is a hacky rule and it calls `executeSameContext` to generate a `resolved agg` to determine which unresolved sort attribute should be pushed into the agg. However, after we add the PaddingAndLengthCheckForCharVarchar rule which will rewrite the query output, thus, the `resolved agg` cannot match original attributes anymore.

It causes some dissociative sort attribute to be pushed in and fails the query

``` logtalk
[info]   Failed to analyze query: org.apache.spark.sql.AnalysisException: expression 'testcat.t1.`v`' is neither present in the group by, nor is it an aggregate function. Add to group by or wrap in first() (or first_value) if you don't care which value you get.;
[info]   Project [v#14, sum(i)#11L]
[info]   +- Sort [aggOrder#12 ASC NULLS FIRST], true
[info]      +- !Aggregate [v#14], [v#14, sum(cast(i#7 as bigint)) AS sum(i)#11L, v#13 AS aggOrder#12]
[info]         +- SubqueryAlias testcat.t1
[info]            +- Project [if ((length(v#6) <= 3)) v#6 else if ((length(rtrim(v#6, None)) > 3)) cast(raise_error(concat(input string of length , cast(length(v#6) as string),  exceeds varchar type length limitation: 3)) as string) else rpad(rtrim(v#6, None), 3,  ) AS v#14, i#7]
[info]               +- RelationV2[v#6, i#7, index#15, _partition#16] testcat.t1
[info]
[info]   Project [v#14, sum(i)#11L]
[info]   +- Sort [aggOrder#12 ASC NULLS FIRST], true
[info]      +- !Aggregate [v#14], [v#14, sum(cast(i#7 as bigint)) AS sum(i)#11L, v#13 AS aggOrder#12]
[info]         +- SubqueryAlias testcat.t1
[info]            +- Project [if ((length(v#6) <= 3)) v#6 else if ((length(rtrim(v#6, None)) > 3)) cast(raise_error(concat(input string of length , cast(length(v#6) as string),  exceeds varchar type length limitation: 3)) as string) else rpad(rtrim(v#6, None), 3,  ) AS v#14, i#7]
[info]               +- RelationV2[v#6, i#7, index#15, _partition#16] testcat.t1
```

### Why are the changes needed?

bugfix
### Does this PR introduce _any_ user-facing change?

no
### How was this patch tested?

new tests

Closes #31027 from yaooqinn/SPARK-34003.

Authored-by: Kent Yao <[email protected]>
Signed-off-by: Wenchen Fan <[email protected]>
(cherry picked from commit 0f8e5dd)
Signed-off-by: Wenchen Fan <[email protected]>
@cloud-fan cloud-fan closed this in 0f8e5dd Jan 8, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants