-
Notifications
You must be signed in to change notification settings - Fork 29k
[SPARK-28786][DOC][SQL]Document INSERT statement in SQL Reference #25525
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
|
Test build #109468 has finished for PR 25525 at commit
|
|
Test build #109511 has finished for PR 25525 at commit
|
|
@srowen Could you please review this one too? Thanks a lot in advance! |
|
|
||
| ### Description | ||
|
|
||
| When the partition value is not provided, such inserts are called as the dynamic partition inserts, also called as multi-partition inserts. In partition spec, the partition column values are optional. When the values are not given, these columns are referred to as dynamic partition columns; otherwise, they are static partition columns. For example, the partition spec (p1 = 3, p2, p3) has a static partition column (p1) and two dynamic partition columns (p2 and p3). |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This looks like it's copied from the Databricks documentation: https://docs.databricks.com/spark/latest/spark-sql/language-manual/insert.html In general we can't just copy content from third party sources.
I think we'd have to check whether this is OK to contribute as OSS docs, and it might be. However there are references below to Databricks runtime that would not be appropriate or relevant.
Before going further, were any other of the sections here or in other PRs copied from elsewhere?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I refer to Databricks documentation but only directly copied this dynamic partition insert because I am not familiar with this part. I will check and rewrite.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@srowen I removed dynamic partition insert. I also looked closely to make sure other insert files are OK. Could you please review one more time?
|
Test build #109527 has finished for PR 25525 at commit
|
srowen
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
OK, if the text here is written from scratch, that's OK. The form looks fine and looks reasonable to me. I don't know the semantics of these commands enough to evaluate if it's 100% correct, but looks good. As long as the info is drawn from what Spark supports rather than Hive, it looks directionally correct.
|
Test build #109557 has finished for PR 25525 at commit
|
|
|
||
| ### Description | ||
|
|
||
| The `INSERT INTO` statement inserts new rows into a table. The inserted rows can be specified by value expressions, or resulted from a query. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@huaxingao Wondering if this would sound better ?
The input rows for the INSERT statement can be produced by one of two ways :
- By value expressions
- By a query
|
|
||
| ### Syntax | ||
| {% highlight sql %} | ||
| INSERT INTO [TABLE] [db_name.]table_name [partition_spec] value_clause | query |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@huaxingao
Should it be { value_clause | query }
or
( value_clause | query ) ?
|
|
||
| ### Syntax | ||
| {% highlight sql %} | ||
| INSERT INTO [TABLE] [db_name.]table_name [partition_spec] value_clause | query |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@huaxingao Can you please check the grammar. I think we allow more cases ..
|
|
||
| INSERT INTO employees PARTITION (age = 35) | ||
| SELECT * FROM candidates WHERE name = "Bob Doe" | ||
| {% endhighlight %} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@huaxingao If we determine that we allow more syntax flavors .. could we please add one test for each ?
| Specify the values to be inserted. | ||
|
|
||
| #### ***query***: | ||
| A `SELECT` statement that provides the rows to be inserted. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@huaxingao would produces sound better ?
|
Test build #109792 has finished for PR 25525 at commit
|
| ### Examples | ||
| #### Single Row Insert Using a VALUES Clause | ||
| {% highlight sql %} | ||
| CREATE TABLE students (Name VARCHAR(64), Address VARCHAR(64), StudentID INT) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@huaxingao The SQLs in the example, should we terminate with a semi colon ? In case users would like to cut-paste into the shell ? What do you think ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, we should terminate with a semi colon. I checked several other SQL reference, all of the example SQLs end with a semi colon.
| Specifies the destination directory. | ||
|
|
||
| #### ***row_format***: | ||
| Specifies the row format for this insert. `SERDE` clause can be used to specify a custom `SERDE` for this insert. Alternatively, `DELIMITED` clause can be used to specify the native `SERDE` and state the delimiter, escape character, null character, and so on. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@huaxingao In here we specify the SERDE clause. How do we correlate this to the syntax diagram ? Is "ROW FORMAT" same as the "SERDE" clause ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Will add Valid options are `SERDE` clause and `DELIMITED` clause.
|
|
||
| The `INSERT INTO` statement inserts new rows into a table. The inserted rows can be specified by value expressions or result from a query. | ||
|
|
||
| See also: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@huaxingao Do you think a "see also" or "Related statements (i have used this)" will be better at the end of the page ? After reading the description, personally i would prefer to see the syntax. What do you think ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I will move to the end and also use "Related Statements". It sounds better than "see also".
| #### ***query***: | ||
| A query that produces the rows to be inserted. It can be in one of following formats: | ||
| - a `SELECT` statement | ||
| - a table |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
TABLE statement ?
|
@huaxingao Thanks.. this looks much better. I will follow your format :-) |
|
@huaxingao Is there a way we can delete the old screen shots ? We could also move the new screen-shots to the description part of the PR to make it easier for reviewers.. just a thought.. |
|
Test build #109837 has finished for PR 25525 at commit
|
|
LGTM cc @gatorsmile @srowen for final sign off. |
…BASE command(jira-28789)
|
@dilipbiswal I rebased and changed the code to make the parameters have the same format as your alter database cmd doc. |
|
Test build #109892 has finished for PR 25525 at commit
|
|
@huaxingao Thanks.. looks good to me .. |
| ### Examples | ||
| #### Single Row Insert Using a VALUES Clause | ||
| {% highlight sql %} | ||
| CREATE TABLE students (Name VARCHAR(64), Address VARCHAR(64), StudentID INT) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could you change all the column names to lower case?
Name -> name.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@gatorsmile
Changed. Thanks!
|
Test build #109898 has finished for PR 25525 at commit
|
|
LGTM Thanks! Merged to master. |
|
Thank you @gatorsmile @srowen @dilipbiswal for the help! |
What changes were proposed in this pull request?
Document INSERT statement in SQL Reference
Why are the changes needed?
To complete SQL reference.
Does this PR introduce any user-facing change?
Yes.
How was this patch tested?
Tested using jykyll build --serve
Here are the screen shots: