Conversation

@huaxingao
Contributor

@huaxingao huaxingao commented Aug 21, 2019

What changes were proposed in this pull request?

Document INSERT statement in SQL Reference

Why are the changes needed?

To complete SQL reference.

Does this PR introduce any user-facing change?

Yes.

How was this patch tested?

Tested using jekyll build --serve

Here are the screen shots:

[Screenshots of the rendered documentation pages]

@SparkQA

SparkQA commented Aug 21, 2019

Test build #109468 has finished for PR 25525 at commit b1f189e.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA

SparkQA commented Aug 21, 2019

Test build #109511 has finished for PR 25525 at commit d80bf4d.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@huaxingao
Contributor Author

@srowen Could you please review this one too? Thanks a lot in advance!


### Description

When the partition value is not provided, such inserts are called as the dynamic partition inserts, also called as multi-partition inserts. In partition spec, the partition column values are optional. When the values are not given, these columns are referred to as dynamic partition columns; otherwise, they are static partition columns. For example, the partition spec (p1 = 3, p2, p3) has a static partition column (p1) and two dynamic partition columns (p2 and p3).
Member

This looks like it's copied from the Databricks documentation: https://docs.databricks.com/spark/latest/spark-sql/language-manual/insert.html In general we can't just copy content from third party sources.

I think we'd have to check whether this is OK to contribute as OSS docs, and it might be. However there are references below to Databricks runtime that would not be appropriate or relevant.

Before going further, were any of the other sections here or in other PRs copied from elsewhere?

Contributor Author

I referred to the Databricks documentation, but I only directly copied this dynamic partition insert section because I am not familiar with this part. I will check and rewrite it.

Contributor Author

@srowen I removed dynamic partition insert. I also looked closely to make sure other insert files are OK. Could you please review one more time?
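
For context, a minimal sketch of the static versus dynamic partition columns described in the quoted text above. The table and column names (`sales`, `staged_sales`, `amount`) are hypothetical, and the exact behavior may depend on the Spark version and configuration:

```sql
-- Hypothetical partitioned table; p1, p2, p3 mirror the partition spec in the quoted text.
CREATE TABLE sales (amount INT, p1 INT, p2 INT, p3 INT)
    USING PARQUET PARTITIONED BY (p1, p2, p3);

-- p1 = 3 is a static partition column: its value is fixed in the partition spec.
-- p2 and p3 are dynamic partition columns: their values are taken from the
-- trailing columns of the query, in order.
INSERT INTO sales PARTITION (p1 = 3, p2, p3)
    SELECT amount, p2, p3 FROM staged_sales;
```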

@SparkQA

SparkQA commented Aug 21, 2019

Test build #109527 has finished for PR 25525 at commit 6966262.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

Member

@srowen srowen left a comment

OK, if the text here is written from scratch, that's OK. The form looks fine and looks reasonable to me. I don't know the semantics of these commands enough to evaluate if it's 100% correct, but looks good. As long as the info is drawn from what Spark supports rather than Hive, it looks directionally correct.

@SparkQA

SparkQA commented Aug 22, 2019

Test build #109557 has finished for PR 25525 at commit 4dc6caa.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.


### Description

The `INSERT INTO` statement inserts new rows into a table. The inserted rows can be specified by value expressions, or resulted from a query.
Contributor

@dilipbiswal dilipbiswal Aug 23, 2019

@huaxingao Wondering if this would sound better:

The input rows for the INSERT statement can be produced in one of two ways:

  • By value expressions
  • By a query


### Syntax
{% highlight sql %}
INSERT INTO [TABLE] [db_name.]table_name [partition_spec] value_clause | query
Contributor

@dilipbiswal dilipbiswal Aug 23, 2019

@huaxingao
Should it be { value_clause | query }
or
( value_clause | query ) ?


### Syntax
{% highlight sql %}
INSERT INTO [TABLE] [db_name.]table_name [partition_spec] value_clause | query
Contributor

@huaxingao Can you please check the grammar? I think we allow more cases.


INSERT INTO employees PARTITION (age = 35)
SELECT * FROM candidates WHERE name = "Bob Doe"
{% endhighlight %}
Contributor

@huaxingao If we determine that we allow more syntax flavors, could we please add one test for each?

Specify the values to be inserted.

#### ***query***:
A `SELECT` statement that provides the rows to be inserted.
Contributor

@huaxingao Would "produces" sound better?

@SparkQA

SparkQA commented Aug 27, 2019

Test build #109792 has finished for PR 25525 at commit bd7677b.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

### Examples
#### Single Row Insert Using a VALUES Clause
{% highlight sql %}
CREATE TABLE students (Name VARCHAR(64), Address VARCHAR(64), StudentID INT)
Contributor

@huaxingao Should we terminate the SQL statements in the examples with a semicolon, in case users would like to copy and paste them into the shell? What do you think?

Contributor Author

Yes, we should terminate with a semicolon. I checked several other SQL references, and all of their example SQL statements end with a semicolon.

Specifies the destination directory.

#### ***row_format***:
Specifies the row format for this insert. `SERDE` clause can be used to specify a custom `SERDE` for this insert. Alternatively, `DELIMITED` clause can be used to specify the native `SERDE` and state the delimiter, escape character, null character, and so on.
Contributor

@huaxingao Here we describe the SERDE clause. How does this correlate with the syntax diagram? Is "ROW FORMAT" the same as the "SERDE" clause?

Contributor Author

Will add: "Valid options are the `SERDE` clause and the `DELIMITED` clause."
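
To illustrate how `ROW FORMAT` relates to the two clauses, a hedged sketch: `ROW FORMAT` introduces the row format, and `SERDE` or `DELIMITED` is the option that follows it. The directory path and the `students` table are placeholders, the SerDe class shown is Hive's default text SerDe and is used only to show the syntax, and this form assumes a Spark session with Hive support enabled:

```sql
-- ROW FORMAT followed by the DELIMITED option: native SerDe with an explicit delimiter.
INSERT OVERWRITE LOCAL DIRECTORY '/tmp/destination'
    ROW FORMAT DELIMITED FIELDS TERMINATED BY ','
    SELECT * FROM students;

-- ROW FORMAT followed by the SERDE option: a SerDe class named explicitly.
INSERT OVERWRITE LOCAL DIRECTORY '/tmp/destination'
    ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe'
    SELECT * FROM students;
```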


The `INSERT INTO` statement inserts new rows into a table. The inserted rows can be specified by value expressions or result from a query.

See also:
Contributor

@huaxingao Do you think a "See also" or "Related statements" section (I have used the latter) would be better at the end of the page? After reading the description, I would personally prefer to see the syntax next. What do you think?

Contributor Author

I will move it to the end and rename it "Related Statements". It sounds better than "See also".

#### ***query***:
A query that produces the rows to be inserted. It can be in one of following formats:
- a `SELECT` statement
- a table
Contributor

TABLE statement?
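
A minimal sketch of the two query forms listed above, assuming hypothetical `students` and `visiting_students` tables with compatible schemas:

```sql
-- Query given as a SELECT statement.
INSERT INTO students
    SELECT name, address, student_id FROM visiting_students;

-- Query given as a TABLE statement: inserts every row of visiting_students.
INSERT INTO students TABLE visiting_students;
```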

@dilipbiswal
Contributor

@huaxingao Thanks, this looks much better. I will follow your format :-)

@dilipbiswal
Contributor

dilipbiswal commented Aug 27, 2019

@huaxingao Is there a way we can delete the old screenshots? We could also move the new screenshots to the description part of the PR to make it easier for reviewers. Just a thought.

@SparkQA

SparkQA commented Aug 28, 2019

Test build #109837 has finished for PR 25525 at commit 5724edd.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@dilipbiswal
Contributor

LGTM

cc @gatorsmile @srowen for final sign off.

@huaxingao
Contributor Author

@dilipbiswal I rebased and changed the code to make the parameters have the same format as your alter database cmd doc.

@SparkQA

SparkQA commented Aug 29, 2019

Test build #109892 has finished for PR 25525 at commit caacd58.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@dilipbiswal
Contributor

@huaxingao Thanks.. looks good to me ..

### Examples
#### Single Row Insert Using a VALUES Clause
{% highlight sql %}
CREATE TABLE students (Name VARCHAR(64), Address VARCHAR(64), StudentID INT)
Member

Could you change all the column names to lower case?

Name -> name.

Contributor Author

@gatorsmile
Changed. Thanks!

@SparkQA

SparkQA commented Aug 29, 2019

Test build #109898 has finished for PR 25525 at commit 43de251.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@gatorsmile
Member

gatorsmile commented Aug 29, 2019

LGTM

Thanks! Merged to master.

@huaxingao
Contributor Author

Thank you @gatorsmile @srowen @dilipbiswal for the help!


6 participants