Skip to content

Conversation

@huaxingao
Copy link
Contributor

@huaxingao huaxingao commented Apr 21, 2020

What changes were proposed in this pull request?

Document identifier in SQL Reference

Why are the changes needed?

make SQL Reference complete

Does this PR introduce any user-facing change?

Yes
Screen Shot 2020-04-23 at 11 14 10 PM

Screen Shot 2020-04-23 at 11 32 32 PM

How was this patch tested?

Manually build and check

@SparkQA
Copy link

SparkQA commented Apr 21, 2020

Test build #121561 has finished for PR 28277 at commit 2dc2910.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@gatorsmile
Copy link
Member

cc @cloud-fan

@huaxingao
Copy link
Contributor Author

also cc @maropu

@SparkQA
Copy link

SparkQA commented Apr 23, 2020

Test build #121657 has finished for PR 28277 at commit b0b1262.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@huaxingao
Copy link
Contributor Author

@cloud-fan @maropu
One more to review. Thank you very much!


### Description

An identifier is a string used to identify a database object such as a table, view, schema, column, etc. Spark SQL has regular identifiers and delimited identifiers, which are enclosed within backticks. Both regular identifiers and delimited identifiers are case insensitive.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Both regular identifiers and delimited identifiers are case insensitive.

This behaivour depneds on spark.sql.caseSensitive?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Are regular identifiers and delimited identifiers common words in database-like systems? Actually, the latter one is a table (or relation) identifier? Anyway, I think its better to use consistent words across SQL docs. For example, it seems the ANSI document just uses identifiers like ...as identifiers for table, view, column, function, alias, etc.
https://github.com/apache/spark/blob/master/docs/sql-ref-ansi-compliance.md#sql-keywords

Also, I think we need some examples there like regular identifiers (e.g., alias names, xxx, ...)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Regular identifiers and delimited identifiers are common words. By standard, delimited identifiers are case sensitive.
https://en.wikibooks.org/wiki/SQL_Dialects_Reference/Data_structure_definition/Delimited_identifiers

{% highlight sql %}
{ letter | digit | '_' } [ , ... ]
{% endhighlight %}
Note: If `spark.sql.ansi.enabled` is set to true, ANSI SQL reserved keywords cannot be used as identifiers. If `spark.sql.ansi.enabled` is set to false (this is the default), strict-non-reserved keywords cannot be used as table aliases. Please refer to [ANSI Compliance](sql-ref-ansi-compliance.html) for a complete list of the keywords.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How about just saying it like "Note: If spark.sql.ansi.enabled is set to true, ANSI SQL reserved keywords cannot be used as identifiers. For more details, please refer to ANSI Compliance for a complete list of the keywords."? That's because we don't define what strict-non-reserved is in this page.

<dl>
<dt><code><em>c</em></code></dt>
<dd>
Any character from the character set. Use <code>`</code> to escape <code>`</code>.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

CREATE TABLE test (a.b int);

-- This CREATE TABLE works
CREATE TABLE test (`a.b` int);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We don't need to describe a multi-part name like a.b.c for DSv2? @cloud-fan

Copy link
Contributor

@cloud-fan cloud-fan Apr 24, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's still evolving, we don't mention anything about it in the sql reference for now.

@maropu
Copy link
Member

maropu commented Apr 23, 2020

Ur, I forgot to submit my reviews last night... some comments might be stale...

@SparkQA
Copy link

SparkQA commented Apr 23, 2020

Test build #121703 has finished for PR 28277 at commit 837a691.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@maropu
Copy link
Member

maropu commented Apr 23, 2020

Looks fine now cc: @srowen @cloud-fan


### Description

An identifier is a string used to identify a database object such as a table, view, schema, column, etc. Spark SQL has regular identifiers and delimited identifiers, which are enclosed within backticks. When `spark.sql.caseSensitive` is set to false (default behavior since Spark 2.4), both regular identifiers and delimited identifiers are case-insensitive.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

spark.sql.caseSensitive is an internal config. Shall we ignore it and just say Spark is case insensitive in the doc?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sure. Will update.

CREATE TABLE test (`a.b` int);

-- This CREATE TABLE fails
CREATE TABLE test1 (`a`b` int);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this should fail the parser?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes. This throws ParseException. Do you want me to change L71 to
-- This CREATE TABLE fails with ParseException?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we include the error message in the example?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Updated. Thanks!

@SparkQA
Copy link

SparkQA commented Apr 24, 2020

Test build #121722 has finished for PR 28277 at commit 15f0402.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA
Copy link

SparkQA commented Apr 24, 2020

Test build #121724 has finished for PR 28277 at commit 21b6eb1.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@cloud-fan
Copy link
Contributor

thanks, merging to master/3.0!

@cloud-fan cloud-fan closed this in b14b980 Apr 24, 2020
cloud-fan pushed a commit that referenced this pull request Apr 24, 2020
### What changes were proposed in this pull request?
Document identifier in SQL Reference

### Why are the changes needed?
make SQL Reference complete

### Does this PR introduce any user-facing change?
Yes
<img width="1049" alt="Screen Shot 2020-04-23 at 11 14 10 PM" src="https://user-images.githubusercontent.com/13592258/80180695-2f2a4f00-85b8-11ea-819b-f96872956d05.png">

<img width="1050" alt="Screen Shot 2020-04-23 at 11 32 32 PM" src="https://user-images.githubusercontent.com/13592258/80182062-e6c06080-85ba-11ea-9502-1c38358c97c9.png">

### How was this patch tested?
Manually build and check

Closes #28277 from huaxingao/identifier.

Authored-by: Huaxin Gao <huaxing@us.ibm.com>
Signed-off-by: Wenchen Fan <wenchen@databricks.com>
(cherry picked from commit b14b980)
Signed-off-by: Wenchen Fan <wenchen@databricks.com>
@huaxingao
Copy link
Contributor Author

Thank you all!

@huaxingao huaxingao deleted the identifier branch April 24, 2020 16:18
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants