Aligning COPY TO and CREATE TABLE Syntax #9369
Comments
Related: #4808
I was intending to move some of the syntax into sqlparser-rs (before I got distracted by other work), but would be happy to get more thoughts on this.
The current COPY statement syntax is based on DuckDB: https://duckdb.org/docs/sql/statements/copy.html. Postgres also has a similar COPY TO syntax, though its options list is less flexible and is based on fixed keywords. I think there is a middle ground between these two approaches, where dfparser could parse some special keywords in the option list and pass through anything else as a HashMap or Vec. #9274 is related. I'm also not opposed to pulling keywords out of the option list entirely so that COPY supports syntax similar to CREATE EXTERNAL TABLE. The flexible options list would have to stay, of course, to support format-specific options like Parquet row group size, but partition_by could be pulled out. Something like:
COPY table
TO 'file.parquet'
PARTITION BY (col1, col2)
(row_group_size 123)
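As a rough illustration of the middle ground described above (a few keywords recognized by the parser, everything else passed through untouched), here is a minimal Python sketch. It is not DataFusion's parser: `SPECIAL_KEYS` and `split_copy_options` are hypothetical names, and treating `format` and `partition_by` as the recognized keys is an assumption made only for this example.

```python
from typing import Dict, List, Tuple

# Hypothetical set of option names the parser would recognize itself.
# Everything else is treated as a format-specific, pass-through option.
SPECIAL_KEYS = {"format", "partition_by"}


def split_copy_options(
    options: List[Tuple[str, str]],
) -> Tuple[Dict[str, str], Dict[str, str]]:
    """Split an already-tokenized COPY option list into two maps:
    recognized keywords and pass-through options."""
    special: Dict[str, str] = {}
    passthrough: Dict[str, str] = {}
    for key, value in options:
        name = key.lower()  # option names are matched case-insensitively here
        if name in SPECIAL_KEYS:
            if name in special:
                raise ValueError(f"duplicate option: {name}")
            special[name] = value
        else:
            passthrough[name] = value
    return special, passthrough


special, rest = split_copy_options(
    [("FORMAT", "parquet"), ("partition_by", "col1, col2"), ("row_group_size", "123")]
)
```

With this split, the recognized keys could later be promoted to dedicated clauses such as PARTITION BY without changing how format-specific options like row_group_size are handled.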
@devinjdangelo After reviewing DuckDB, I agree with you. Our COPY TO syntax closely resembles theirs. However, DuckDB seems to have a self-consistent design, which underscores the importance of keeping our own SQL syntax consistent too. This is how they handle CSV imports. While borrowing designs from other engines often works well because it minimizes surprises, our user behaviour isn't uniformly consistent. There are likely multiple approaches to address it. I've suggested one method; I'm keen to hear what the community thinks as well. cc @alamb @Dandandan
Thank you for bringing this up @metesynnada
I agree that adding specific syntax to the COPY statement to better align with CREATE EXTERNAL TABLE is a good idea, and I like the example in #9369 (comment). cc @andygrove, who may have additional context on the original CREATE EXTERNAL TABLE syntax.
Can we align with the CREATE EXTERNAL TABLE syntax? What do you think?
I think one challenge is that If we are to align |
DataFusion owns the COPY statement, so this change will be quite easy. I am preparing a PR for this.
Is your feature request related to a problem or challenge?
There is a noticeable inconsistency between the syntax of the COPY TO and CREATE EXTERNAL TABLE commands, particularly in how they handle the specification of data format and partitioning. The COPY TO command allows these details to be included as part of a flexible options list, while CREATE EXTERNAL TABLE parses these details directly from the SQL statement.
Detailed Description:
COPY TO Command Syntax:
In this command, the format and partitioning are specified as options within the command itself.
CREATE EXTERNAL TABLE Command Syntax:
Here, the format and partitioning details are part of the SQL command's structure, without the flexibility offered by an options list.
This syntactical inconsistency creates a learning curve and potential for confusion, as users must adapt to two different methods for specifying critical information like data format and partitioning, depending on the command they are using.
Describe the solution you'd like
I propose that a unified syntax approach be adopted for both commands. This could mean revising COPY TO to align with the direct parsing method of CREATE EXTERNAL TABLE. Additionally, updating the parser to support this unified syntax and providing detailed documentation for the same would be beneficial.
Describe alternatives you've considered
No response
Additional context
No response