[SPARK-39783][SQL] Quote qualifiedName to fix backticks for column candidates in error messages #38256

EnricoMi · 2022-10-14T07:49:30Z

What changes were proposed in this pull request?

The NamedExpression.qualifiedName is a concatenation of the qualifiers and the name, joined by dots. If any of those parts contain dots, the resulting qualifiedName is ambiguous. Quoting the parts that contain dots fixes this, and it also fixes the quoting of column candidates in the error messages UNRESOLVED_COLUMN and UNRESOLVED_MAP_KEY:
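To make the ambiguity concrete, here is a minimal Scala sketch. NaiveNamed is a hypothetical stand-in for NamedExpression that keeps only a qualifier sequence and a name; it is not Spark's class. Joining the parts with unquoted dots, as described above, produces the same string for a column literally named "the.id" and for a column "id" qualified by "the".

```scala
// Hypothetical stand-in for NamedExpression: only what is needed to build a name.
final case class NaiveNamed(qualifier: Seq[String], name: String) {
  // Unquoted join of qualifiers and name, i.e. the behaviour this PR changes.
  def naiveQualifiedName: String = (qualifier :+ name).mkString(".")
}

object AmbiguityDemo {
  // A column whose name itself contains a dot, with no qualifier.
  val dottedColumn: NaiveNamed = NaiveNamed(Seq.empty, "the.id")
  // A column "id" qualified by a relation named "the".
  val qualifiedColumn: NaiveNamed = NaiveNamed(Seq("the"), "id")

  def main(args: Array[String]): Unit = {
    println(dottedColumn.naiveQualifiedName)    // the.id
    println(qualifiedColumn.naiveQualifiedName) // the.id
  }
}
```

Both calls yield "the.id", so once the name has been flattened to a string, nothing downstream can tell the two columns apart.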

UNRESOLVED_COLUMN:

Seq((0)).toDF("the.id").select("the.id").show()

The error message should read:

org.apache.spark.sql.AnalysisException: [UNRESOLVED_COLUMN] A column or function parameter
with name `the`.`id` cannot be resolved. Did you mean one of the following? [`the.id`];

while it was:

org.apache.spark.sql.AnalysisException: [UNRESOLVED_COLUMN] A column or function parameter
with name `the`.`id` cannot be resolved. Did you mean one of the following? [`the`.`id`];

UNRESOLVED_MAP_KEY:

Seq((0)).toDF("id")
  .select(map(lit("key"), lit(1)).as("map"), lit(2).as("other.column"))
  .select($"`map`"($"nonexisting")).show()

The error message should read:

Cannot resolve column `nonexisting` as a map key. If the key is a string literal, please add single quotes around it.
Otherwise did you mean one of the following column(s)? [`map`, `other.column`];

while it was:

Cannot resolve column `nonexisting` as a map key. If the key is a string literal, please add single quotes around it.
Otherwise did you mean one of the following column(s)? [`map`, `other`.`column`];
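The before/after messages above are consistent with the error path parsing each candidate string back into name parts and backtick-quoting every part. The sketch below models that round trip under this assumption. parseParts, quote, render and renderAll are hypothetical helpers written for illustration, not Spark's implementation; the parser is simplified and does not reject malformed input.

```scala
import scala.collection.mutable.ArrayBuffer

object CandidateRendering {
  // Simplified parser: dots outside backticks separate parts, and a doubled
  // backtick inside a quoted part stands for one literal backtick.
  def parseParts(name: String): Seq[String] = {
    val parts = ArrayBuffer.empty[String]
    val current = new StringBuilder
    var inQuotes = false
    var i = 0
    while (i < name.length) {
      val c = name.charAt(i)
      if (inQuotes) {
        if (c == '`') {
          if (i + 1 < name.length && name.charAt(i + 1) == '`') {
            current += '`' // escaped backtick inside a quoted part
            i += 1
          } else {
            inQuotes = false // closing backtick
          }
        } else {
          current += c
        }
      } else if (c == '`') {
        inQuotes = true
      } else if (c == '.') {
        parts += current.toString // a dot outside backticks ends a part
        current.clear()
      } else {
        current += c
      }
      i += 1
    }
    parts += current.toString
    parts.toList
  }

  // Backtick-quote one part, doubling embedded backticks.
  def quote(part: String): String = "`" + part.replace("`", "``") + "`"

  // Render one candidate the way the messages above display it.
  def render(candidate: String): String = parseParts(candidate).map(quote).mkString(".")

  // Render a candidate list in the "[a, b]" style used by the messages.
  def renderAll(candidates: Seq[String]): String =
    candidates.map(render).mkString("[", ", ", "]")

  def main(args: Array[String]): Unit = {
    // Unquoted qualifiedName strings get split at the dot.
    println(renderAll(Seq("the.id")))                // [`the`.`id`]
    println(renderAll(Seq("map", "other.column")))   // [`map`, `other`.`column`]
    // Quoted qualifiedName strings survive the round trip as a single part.
    println(renderAll(Seq("`the.id`")))              // [`the.id`]
    println(renderAll(Seq("map", "`other.column`"))) // [`map`, `other.column`]
  }
}
```

Under this model, the fix does not need to touch the rendering step at all: quoting the parts in qualifiedName is enough for a dotted column name to be parsed back as one part.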

Why are the changes needed?

The current quoting is wrong and qualifiedName is ambiguous if name or qualifiers contain dots.

Does this PR introduce any user-facing change?

Yes. It corrects the column candidates listed in the UNRESOLVED_COLUMN and UNRESOLVED_MAP_KEY error messages.

How was this patch tested?

This is tested in AnalysisErrorSuite, DatasetSuite and QueryCompilationErrorsSuite.

EnricoMi · 2022-10-14T07:51:34Z

I am not sure if this is the best place to fix this, but since the qualifiers and the name can contain dots (.), qualifiedName should really quote its parts where needed:

def qualifiedName: String = (qualifier :+ name).map(quoteIfNeeded).mkString(".")
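The one-liner above can be exercised in isolation with the sketch below. The quoteIfNeeded here is a simplified stand-in written for this example (plain letters, digits and underscores stay unquoted unless the part is all digits; everything else is backtick-quoted); Spark has its own helper of that name, whose exact rules may differ. Named is likewise a hypothetical stand-in for NamedExpression.

```scala
object QualifiedNameSketch {
  // Backtick-quote one identifier part, doubling any embedded backticks.
  def quoteIdentifier(part: String): String = "`" + part.replace("`", "``") + "`"

  // Leave plain identifiers (letters, digits, underscores, not all digits)
  // unquoted; quote anything else, including parts that contain dots.
  def quoteIfNeeded(part: String): String =
    if (part.matches("[a-zA-Z0-9_]+") && !part.matches("\\d+")) part
    else quoteIdentifier(part)

  // Hypothetical stand-in for NamedExpression.
  final case class Named(qualifier: Seq[String], name: String) {
    // Shape of the proposed fix: quote each part, then join with dots.
    def qualifiedName: String = (qualifier :+ name).map(quoteIfNeeded).mkString(".")
  }

  def main(args: Array[String]): Unit = {
    println(Named(Seq.empty, "the.id").qualifiedName) // `the.id`
    println(Named(Seq("the"), "id").qualifiedName)    // the.id
  }
}
```

Quoting only when needed keeps ordinary names such as the.id (qualifier "the", name "id") unchanged, so only names that would otherwise be ambiguous get backticks.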

EnricoMi · 2022-10-14T07:54:47Z

Note that Dataset("the.id"), Dataset.select("the.id") and LocalRelation.select("the.id") each interpret "the.id" differently. Only Dataset.select("the.id") and

LocalRelation.select($"`the`.`id`")

raise the exception.

EnricoMi · 2022-10-14T08:05:15Z

Note that the ResolveUserSpecifiedColumns rule in the Analyzer uses _.name instead of _.qualifiedName, so it is likely exposed to the same issue, but I could not come up with a test that catches this.

EnricoMi · 2022-10-14T08:05:30Z

@srielau @gengliangwang @HyukjinKwon @cloud-fan @MaxGekk

EnricoMi · 2022-10-14T08:11:28Z

@sadikovi

sadikovi · 2022-10-14T08:11:33Z

I was already working on this ticket and opened PR #38254. I would prefer to continue with that change, if you don't mind.

EnricoMi · 2022-10-14T08:15:48Z

@sadikovi I am happy to contribute the tests to your PR.

sadikovi · 2022-10-14T08:16:50Z

It is fine, let's work on your PR; it is more complete. I was in the process of adding the tests when I noticed you had opened another PR.

sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/namedExpressions.scala

MaxGekk · 2022-10-14T17:16:44Z

sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/analysis/AnalysisErrorSuite.scala

      "[`a`, `b`, `c`, `d`, `e`]"
      :: Nil)

+  errorTest(


Could you use errorClassTest() here? That makes the test independent of the error message text, so tech editors can edit error-classes.json without depending on Spark's tests.

MaxGekk · 2022-10-14T17:17:13Z

sql/core/src/test/scala/org/apache/spark/sql/DatasetSuite.scala

+        val errorMsg = intercept[AnalysisException] {
+          // Note: ds(colName) has different semantics than ds.select(colName)
+          ds.select(colName)
+        }
+        assert(errorMsg.getMessage.contains(


Please use checkError().

EnricoMi · 2022-10-14T19:20:52Z

I have also managed to add tests for the UNRESOLVED_MAP_KEY error, which this PR fixes as well.

EnricoMi · 2022-10-14T19:21:31Z

sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala

I could not find a way to test this though. Suggestions welcome!

AmplabJenkins · 2022-10-15T13:56:20Z

Can one of the admins verify this patch?

sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/namedExpressions.scala

srowen

I'm OK with it if there are no further comments from Max.

MaxGekk · 2022-10-18T10:04:27Z

+1, LGTM. Merging to master.
Thank you @EnricoMi, and @sadikovi and @srowen for the review.

cloud-fan · 2022-10-18T13:04:53Z

late LGTM

Closes apache#38256 from EnricoMi/branch-correct-backticks-error-message.

Authored-by: Enrico Minack
Signed-off-by: Max Gekk

Fix backticks for column candidates in error message

f59e9df

github-actions bot added the SQL label Oct 14, 2022

EnricoMi mentioned this pull request Oct 14, 2022

[SPARK-39783] Do not parse already qualified identifiers for UNRESOLVED_COLUMN AnalysisException #38254

Closed

sadikovi reviewed Oct 14, 2022


sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/namedExpressions.scala

sadikovi approved these changes Oct 14, 2022


Adjust expected proposal to correct one

1d8a919

srowen approved these changes Oct 14, 2022


MaxGekk requested changes Oct 14, 2022


EnricoMi added 2 commits October 14, 2022 21:01

Use checkError and errorClassTest

dc80706

Make ResolveUserSpecifiedColumns use qualifiedName for candidates

cef2711

EnricoMi force-pushed the branch-correct-backticks-error-message branch 2 times, most recently from 0b56eb4 to 99f0a06 October 14, 2022 19:03

EnricoMi changed the title ~~[SPARK-39783][SQL] Fix backticks for column candidates in error message~~ [SPARK-39783][SQL] Quote qualifiedName to fix backticks for column candidates in error messages Oct 14, 2022

EnricoMi commented Oct 14, 2022


EnricoMi added 2 commits October 14, 2022 23:06

Test for UNRESOLVED_MAP_KEY.WITH_SUGGESTION

f919303

Update documentation of qualifiedName

b9c9907

EnricoMi force-pushed the branch-correct-backticks-error-message branch from 99f0a06 to b9c9907 October 14, 2022 21:07

srowen reviewed Oct 16, 2022


sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/namedExpressions.scala

srowen reviewed Oct 18, 2022


MaxGekk approved these changes Oct 18, 2022


MaxGekk closed this in fc4643b Oct 18, 2022


6 participants
