Skip to content

Conversation

@EnricoMi
Copy link
Contributor

@EnricoMi EnricoMi commented Oct 14, 2022

What changes were proposed in this pull request?

The NamedExpression.qualifiedName is a concatenation of qualifiers and the name, joined by dots. If those contain dots, the result qualifiedName is ambiguous. Quoting those if they contain dots fixes this, while this also fixes quoting column candidates in the error messages UNRESOLVED_COLUMN and UNRESOLVED_MAP_KEY:

UNRESOLVED_COLUMN:

Seq((0)).toDF("the.id").select("the.id").show()

The error message should read

org.apache.spark.sql.AnalysisException: [UNRESOLVED_COLUMN] A column or function parameter
with name `the`.`id` cannot be resolved. Did you mean one of the following? [`the.id`];

while it was:

org.apache.spark.sql.AnalysisException: [UNRESOLVED_COLUMN] A column or function parameter
with name `the`.`id` cannot be resolved. Did you mean one of the following? [`the`.`id`];

UNRESOLVED_MAP_KEY:

Seq((0)).toDF("id")
  .select(map(lit("key"), lit(1)).as("map"), lit(2).as("other.column"))
  .select($"`map`"($"nonexisting")).show()

The error message should read

Cannot resolve column `nonexisting` as a map key. If the key is a string literal, please add single quotes around it.
Otherwise did you mean one of the following column(s)? [`map`, `other.column`];

while it was:

Cannot resolve column `nonexisting` as a map key. If the key is a string literal, please add single quotes around it.
Otherwise did you mean one of the following column(s)? [`map`, `other`.`column`];

Why are the changes needed?

The current quoting is wrong and qualifiedName is ambiguous if name or qualifiers contain dots.

Does this PR introduce any user-facing change?

It corrects the error message.

How was this patch tested?

This is tested in AnalysisErrorSuite, DatasetSuite and QueryCompilationErrorsSuite.scala.

@EnricoMi
Copy link
Contributor Author

EnricoMi commented Oct 14, 2022

I am not sure if this is the best place to fix this, but since qualifier and name could contain ., the qualifiedName should really quote its parts if needed:

def qualifiedName: String = (qualifier :+ name).map(quoteIfNeeded).mkString(".")

@EnricoMi
Copy link
Contributor Author

Note that Dataset("the.id") is different to Dataset.select("the.id") is different to LocalRelation.select("the.id"). They all three have different semantics in what "the.id" means, only Dataset.select("the.id") and

LocalRelation.select($"`the`.`id`")

raise the exception.

@github-actions github-actions bot added the SQL label Oct 14, 2022
@EnricoMi
Copy link
Contributor Author

Note that ResolveUserSpecifiedColumns rule in Analyzer uses ._name instead of ._qualifiedName, so it should be exposed to the same issue, but I could not come up with a test that catches this.

@EnricoMi
Copy link
Contributor Author

@EnricoMi
Copy link
Contributor Author

@sadikovi

@sadikovi
Copy link
Contributor

I was already working on this ticket and opened a PR for #38254. I would prefer to continue with that change if you don't mind.

@EnricoMi
Copy link
Contributor Author

@sadikovi I am happy to contribute the tests to your PR.

@sadikovi
Copy link
Contributor

It is fine, let's work on your PR, it is more complete. I was in process of adding the tests but I noticed you opened another PR.

"[`a`, `b`, `c`, `d`, `e`]"
:: Nil)

errorTest(
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could you invoke errorClassTest(). This will allow to make the test independent from error message text, so, tech editors could edit error-classes.json and don't depend on Spark's tests.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done

Comment on lines 1999 to 2003
val errorMsg = intercept[AnalysisException] {
// Note: ds(colName) has different semantics than ds.select(colName)
ds.select(colName)
}
assert(errorMsg.getMessage.contains(
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please, use checkError().

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done

@EnricoMi EnricoMi force-pushed the branch-correct-backticks-error-message branch 2 times, most recently from 0b56eb4 to 99f0a06 Compare October 14, 2022 19:03
@EnricoMi EnricoMi changed the title [SPARK-39783][SQL] Fix backticks for column candidates in error message [SPARK-39783][SQL] Quote qualifiedName to fix backticks for column candidates in error messages Oct 14, 2022
@EnricoMi
Copy link
Contributor Author

I have also managed to add tests for error UNRESOLVED_MAP_KEY, which also gets fixed by this PR.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I could not find a way to test this though. Suggestions welcome!

@EnricoMi EnricoMi force-pushed the branch-correct-backticks-error-message branch from 99f0a06 to b9c9907 Compare October 14, 2022 21:07
@AmplabJenkins
Copy link

Can one of the admins verify this patch?

Copy link
Member

@srowen srowen left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm OK with it, if no further comment from Max

@MaxGekk
Copy link
Member

MaxGekk commented Oct 18, 2022

+1, LGTM. Merging to master.
Thank you, @EnricoMi and @sadikovi @srowen for review.

@MaxGekk MaxGekk closed this in fc4643b Oct 18, 2022
@cloud-fan
Copy link
Contributor

late LGTM

SandishKumarHN pushed a commit to SandishKumarHN/spark that referenced this pull request Dec 12, 2022
…ndidates in error messages

### What changes were proposed in this pull request?

The `NamedExpression.qualifiedName` is a concatenation of qualifiers and the name, joined by `dots`. If those contain `dots`, the result `qualifiedName` is ambiguous. Quoting those if they contain `dots` fixes this, while this also fixes quoting column candidates in the error messages `UNRESOLVED_COLUMN` and `UNRESOLVED_MAP_KEY`:

`UNRESOLVED_COLUMN`:
```
Seq((0)).toDF("the.id").select("the.id").show()
```

The error message should read

    org.apache.spark.sql.AnalysisException: [UNRESOLVED_COLUMN] A column or function parameter
    with name `the`.`id` cannot be resolved. Did you mean one of the following? [`the.id`];

while it was:

    org.apache.spark.sql.AnalysisException: [UNRESOLVED_COLUMN] A column or function parameter
    with name `the`.`id` cannot be resolved. Did you mean one of the following? [`the`.`id`];

`UNRESOLVED_MAP_KEY`:
```
Seq((0)).toDF("id")
  .select(map(lit("key"), lit(1)).as("map"), lit(2).as("other.column"))
  .select($"`map`"($"nonexisting")).show()
```

The error message should read

    Cannot resolve column `nonexisting` as a map key. If the key is a string literal, please add single quotes around it.
    Otherwise did you mean one of the following column(s)? [`map`, `other.column`];

while it was:

    Cannot resolve column `nonexisting` as a map key. If the key is a string literal, please add single quotes around it.
    Otherwise did you mean one of the following column(s)? [`map`, `other`.`column`];

### Why are the changes needed?
The current quoting is wrong and `qualifiedName` is ambiguous if `name` or `qualifiers` contain `dots`.

### Does this PR introduce _any_ user-facing change?
It corrects the error message.

### How was this patch tested?
This is tested in `AnalysisErrorSuite`, `DatasetSuite` and `QueryCompilationErrorsSuite.scala`.

Closes apache#38256 from EnricoMi/branch-correct-backticks-error-message.

Authored-by: Enrico Minack <[email protected]>
Signed-off-by: Max Gekk <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Projects

None yet

Development

Successfully merging this pull request may close these issues.

6 participants