[SPARK-16217][SQL] Support SELECT INTO statement #14191
Can one of the admins verify this patch?
       fromClause?
       (WHERE where=booleanExpression)?)
-    | ((kind=SELECT setQuantifier? namedExpressionSeq fromClause?
+    | ((kind=SELECT setQuantifier? namedExpressionSeq (intoClause? fromClause)?
Hi, @wuxianxingkong .
Currently, the following does not seem to be handled yet. Could you modify the syntax to support it too?
SELECT 1
INTO newtable
Hello, @dongjoon-hyun , thank you for your advice.
SELECT 1
INTO newtable
This won't work because we need oldtable info to create newtable. So the SQL should be
SELECT 1
INTO newtable
FROM oldtable
The result from my test: a new table called newtable was created; its single column, named 1, has as many rows as oldtable, and every value is 1.
Did you mean the case where there is no FROM?
In the Spark Shell, please run the following.
sql("select 1")
@dongjoon-hyun
At first, I modified the grammar:

But this affects the multiInsertQueryBody rule, e.g.:
FROM OLD_TABLE
INSERT INTO T1
SELECT C1
INSERT INTO T2
SELECT C2
The syntax tree before adding intoClause is:

After adding intoClause, the tree becomes:
This is because INSERT is a non-reserved keyword, combined with ANTLR's matching strategy.
One way I can think of is to change the grammar like this:

This solves the problem because the ANTLR parser chooses the alternative specified first.
The grammar now supports "SELECT 1 INTO newtable".
But the duplication makes the querySpecification rule confusing. Is there any way to make the syntax less verbose? Thanks.
Hi, @wuxianxingkong .
      // Add organization statements.
      optionalMap(ctx.queryOrganization)(withQueryResultClauses).
      // Add insert.
      optionalMap(ctx.insertInto())(withInsertInto)
This allows for the following syntax:
INSERT INTO tbl_a
SELECT *
INTO tbl_a
FROM tbl_b
Make sure that we cannot have both.
We also need to check what this does with multi-insert syntax, i.e.:
FROM tbl_a
INSERT INTO tbl_b
SELECT *
INSERT INTO tbl_c
SELECT *
INTO tbl_c
2. Add a check in the multiInsertQuery syntax: do not allow multi-insert and SELECT INTO to appear at the same time.
3. Add a check in singleInsertQuery: do not allow INSERT INTO and SELECT INTO to appear at the same time.
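The two checks listed above amount to a mutual-exclusion rule. A minimal Python sketch of that rule follows; it is not the PR's Scala code. `ParsedQuery`, its fields, and `check_select_into` are hypothetical names, and `ValueError` merely stands in for whatever error the parser would actually report.

```python
from dataclasses import dataclass


@dataclass
class ParsedQuery:
    """Hypothetical summary of which clauses a parsed statement contains."""
    has_insert_into: bool = False   # leading INSERT INTO ... (single insert)
    is_multi_insert: bool = False   # FROM ... INSERT INTO ... INSERT INTO ...
    has_select_into: bool = False   # SELECT ... INTO newtable


def check_select_into(q: ParsedQuery) -> None:
    """Reject statements that combine SELECT INTO with any INSERT form."""
    if q.has_select_into and q.is_multi_insert:
        raise ValueError("SELECT INTO cannot be combined with multi-insert")
    if q.has_select_into and q.has_insert_into:
        raise ValueError("SELECT INTO cannot be combined with INSERT INTO")
```

Under these assumptions, a statement with only SELECT INTO or only INSERT INTO passes, while any combination of the two is rejected.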
   */
  protected def withSelectInto(
      ctx: IntoClauseContext,
      query: LogicalPlan): LogicalPlan = withOrigin(ctx) {
Why throw a ParseException?
@wuxianxingkong Are you still working on this? Thanks!
We are closing it due to inactivity. Please do reopen if you want to push it forward. Thanks!
## What changes were proposed in this pull request?

This PR proposes to close stale PRs, mostly the same instances with apache#18017

I believe the author in apache#14807 removed his account.

Closes apache#7075
Closes apache#8927
Closes apache#9202
Closes apache#9366
Closes apache#10861
Closes apache#11420
Closes apache#12356
Closes apache#13028
Closes apache#13506
Closes apache#14191
Closes apache#14198
Closes apache#14330
Closes apache#14807
Closes apache#15839
Closes apache#16225
Closes apache#16685
Closes apache#16692
Closes apache#16995
Closes apache#17181
Closes apache#17211
Closes apache#17235
Closes apache#17237
Closes apache#17248
Closes apache#17341
Closes apache#17708
Closes apache#17716
Closes apache#17721
Closes apache#17937

Added:
Closes apache#14739
Closes apache#17139
Closes apache#17445
Closes apache#18042
Closes apache#18359

Added:
Closes apache#16450
Closes apache#16525
Closes apache#17738

Added:
Closes apache#16458
Closes apache#16508
Closes apache#17714

Added:
Closes apache#17830
Closes apache#14742

## How was this patch tested?

N/A

Author: hyukjinkwon <[email protected]>

Closes apache#18417 from HyukjinKwon/close-stale-pr.
What changes were proposed in this pull request?
This PR implements the SELECT INTO statement.
The SELECT INTO statement selects data from one table and inserts it into a new table as follows.
This statement is commonly used in SQL but is not currently supported in Spark SQL.
We investigated Catalyst and found that this statement can be implemented by improving the grammar and reusing the logical plan of CTAS.
The related JIRA is https://issues.apache.org/jira/browse/SPARK-16217
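To illustrate the relationship with CTAS mentioned above, the sketch below rewrites the text of a simple SELECT ... INTO statement into CREATE TABLE ... AS SELECT. It is only a string-level illustration in Python under strong assumptions (a single top-level INTO followed by an unquoted table name). The PR itself works on the parse tree and logical plan, and `select_into_to_ctas` is a hypothetical helper, not part of Spark.

```python
import re

# Matches "SELECT <columns> INTO <table> <rest>" for simple, unquoted statements.
_SELECT_INTO = re.compile(
    r"^\s*SELECT\s+(?P<cols>.+?)\s+INTO\s+(?P<table>\w+)\s*(?P<rest>.*)$",
    re.IGNORECASE | re.DOTALL,
)


def select_into_to_ctas(sql: str) -> str:
    """Rewrite a simple SELECT ... INTO statement as CREATE TABLE ... AS SELECT."""
    m = _SELECT_INTO.match(sql)
    if m is None:
        raise ValueError("not a simple SELECT ... INTO statement")
    # Reassemble the query without the INTO clause; rest may be empty (no FROM).
    select = f"SELECT {m.group('cols')} {m.group('rest')}".strip()
    return f"CREATE TABLE {m.group('table')} AS {select}"
```

For example, passing `"SELECT 1 INTO newtable FROM oldtable"` yields the CTAS form of the same query, and the FROM-less variant discussed in the review is handled the same way.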
How was this patch tested?
SQLQuerySuite.