[SPARK-31502][SQL][DOCS] Document identifier in SQL Reference #28277
Test build #121561 has finished for PR 28277 at commit
cc @cloud-fan
also cc @maropu
Test build #121657 has finished for PR 28277 at commit
@cloud-fan @maropu
docs/sql-ref-identifier.md
Outdated
### Description

An identifier is a string used to identify a database object such as a table, view, schema, column, etc. Spark SQL has regular identifiers and delimited identifiers, which are enclosed within backticks. Both regular identifiers and delimited identifiers are case insensitive.
Both regular identifiers and delimited identifiers are case insensitive.
Doesn't this behaviour depend on spark.sql.caseSensitive?
Are "regular identifiers" and "delimited identifiers" common terms in database-like systems? Actually, is the latter a table (or relation) identifier? Anyway, I think it's better to use consistent wording across the SQL docs. For example, the ANSI document just uses "identifiers", as in "...as identifiers for table, view, column, function, alias, etc."
https://github.com/apache/spark/blob/master/docs/sql-ref-ansi-compliance.md#sql-keywords
Also, I think we need some examples there, such as regular identifiers (e.g., alias names, xxx, ...)
"Regular identifiers" and "delimited identifiers" are common terms. Per the standard, delimited identifiers are case sensitive.
https://en.wikibooks.org/wiki/SQL_Dialects_Reference/Data_structure_definition/Delimited_identifiers
docs/sql-ref-identifier.md
Outdated
{% highlight sql %}
{ letter | digit | '_' } [ , ... ]
{% endhighlight %}
Note: If `spark.sql.ansi.enabled` is set to true, ANSI SQL reserved keywords cannot be used as identifiers. If `spark.sql.ansi.enabled` is set to false (this is the default), strict-non-reserved keywords cannot be used as table aliases. Please refer to [ANSI Compliance](sql-ref-ansi-compliance.html) for a complete list of the keywords.
How about just saying it like "Note: If spark.sql.ansi.enabled is set to true, ANSI SQL reserved keywords cannot be used as identifiers. For more details, please refer to ANSI Compliance for a complete list of the keywords."? That's because we don't define what strict-non-reserved is in this page.
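For readers who want to try the regular-identifier rule outside Spark, here is a minimal Python sketch of the grammar quoted in the snippet above (letters, digits, and underscores only). It is not Spark's parser: the function name `is_regular_identifier` is made up for illustration, "letter" is assumed to mean ASCII letters, and the reserved-keyword restriction described in the note is not modelled.

```python
import re

# Assumption: "letter" in the quoted grammar means ASCII A-Z / a-z.
_REGULAR_IDENTIFIER = re.compile(r"[A-Za-z0-9_]+")


def is_regular_identifier(name: str) -> bool:
    """Return True if `name` uses only letters, digits, and underscores.

    Hypothetical helper; reserved keywords are NOT checked here.
    """
    return _REGULAR_IDENTIFIER.fullmatch(name) is not None
```

Under these assumptions, a name such as `a.b` is rejected because of the dot, which matches the thread's example where the unquoted `a.b` column needs backticks.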
docs/sql-ref-identifier.md
Outdated
<dl>
<dt><code><em>c</em></code></dt>
<dd>
Any character from the character set. Use <code>`</code> to escape <code>`</code>.
How about using the same words in https://github.com/apache/spark/pull/28237/files#diff-3b65d142b02e13e9889a0f558c0e7bc8R46 ?
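The escaping rule in the snippet above (a backtick inside a delimited identifier is escaped with another backtick) can be sketched in a few lines of Python. The helper name `quote_identifier` is hypothetical and is not part of any Spark API; it only illustrates the doubling rule as stated in the doc text under review.

```python
def quote_identifier(name: str) -> str:
    """Wrap `name` in backticks, doubling any backtick it contains.

    Hypothetical helper illustrating the escape rule from the doc snippet.
    """
    return "`" + name.replace("`", "``") + "`"
```

A client that builds SQL strings could pass column names through a helper like this before interpolating them, so that a name containing a dot or a backtick stays a single identifier.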
CREATE TABLE test (a.b int);

-- This CREATE TABLE works
CREATE TABLE test (`a.b` int);
We don't need to describe a multi-part name like a.b.c for DSv2? @cloud-fan
It's still evolving, so we don't mention anything about it in the SQL reference for now.
Ur, I forgot to submit my reviews last night... some comments might be stale...
Test build #121703 has finished for PR 28277 at commit
Looks fine now cc: @srowen @cloud-fan
docs/sql-ref-identifier.md
Outdated
### Description

An identifier is a string used to identify a database object such as a table, view, schema, column, etc. Spark SQL has regular identifiers and delimited identifiers, which are enclosed within backticks. When `spark.sql.caseSensitive` is set to false (default behavior since Spark 2.4), both regular identifiers and delimited identifiers are case-insensitive.
spark.sql.caseSensitive is an internal config. Shall we ignore it and just say Spark is case insensitive in the doc?
Sure. Will update.
CREATE TABLE test (`a.b` int);

-- This CREATE TABLE fails
CREATE TABLE test1 (`a`b` int);
this should fail the parser?
Yes. This throws ParseException. Do you want me to change L71 to
-- This CREATE TABLE fails with ParseException?
Can we include the error message in the example?
Updated. Thanks!
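To see why the failing example in this thread is rejected, the following Python sketch checks a delimited identifier for a lone (undoubled) backtick between the outer delimiters. It only mirrors the rule discussed here; `unquote_identifier` is a hypothetical name, and a plain `ValueError` stands in for the `ParseException` that Spark reports.

```python
def unquote_identifier(token: str) -> str:
    """Strip the outer backticks and undo doubled backticks.

    Hypothetical helper; raises ValueError where the thread's example
    fails to parse (a lone backtick inside the delimiters).
    """
    if len(token) < 2 or token[0] != "`" or token[-1] != "`":
        raise ValueError(f"not a delimited identifier: {token!r}")
    body = token[1:-1]
    # After removing every doubled backtick, no backtick may remain.
    if "`" in body.replace("``", ""):
        raise ValueError(f"unescaped backtick in identifier: {token!r}")
    return body.replace("``", "`")
```

With this check, the column name from the failing `CREATE TABLE test1` statement raises an error, while the same name with the inner backtick doubled is accepted.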
Test build #121722 has finished for PR 28277 at commit
Test build #121724 has finished for PR 28277 at commit
thanks, merging to master/3.0!
### What changes were proposed in this pull request?
Document identifier in SQL Reference

### Why are the changes needed?
make SQL Reference complete

### Does this PR introduce any user-facing change?
Yes
<img width="1049" alt="Screen Shot 2020-04-23 at 11 14 10 PM" src="https://user-images.githubusercontent.com/13592258/80180695-2f2a4f00-85b8-11ea-819b-f96872956d05.png">
<img width="1050" alt="Screen Shot 2020-04-23 at 11 32 32 PM" src="https://user-images.githubusercontent.com/13592258/80182062-e6c06080-85ba-11ea-9502-1c38358c97c9.png">

### How was this patch tested?
Manually build and check

Closes #28277 from huaxingao/identifier.

Authored-by: Huaxin Gao <huaxing@us.ibm.com>
Signed-off-by: Wenchen Fan <wenchen@databricks.com>
(cherry picked from commit b14b980)
Signed-off-by: Wenchen Fan <wenchen@databricks.com>
Thank you all!
What changes were proposed in this pull request?
Document identifier in SQL Reference
Why are the changes needed?
make SQL Reference complete
Does this PR introduce any user-facing change?
Yes

How was this patch tested?
Manually build and check