Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Remove DFParser #4808

Open
tustvold opened this issue Jan 3, 2023 · 6 comments
Open

Remove DFParser #4808

tustvold opened this issue Jan 3, 2023 · 6 comments
Assignees
Labels
enhancement New feature or request

Comments

@tustvold
Copy link
Contributor

tustvold commented Jan 3, 2023

Is your feature request related to a problem or challenge? Please describe what you are trying to do.

DataFusion contains a custom DFParser that extends sqlparser and overrides its parsing of DESCRIBE TABLES and CREATE EXTERNAL TABLE.

Describe the solution you'd like

Looking at sqlparser the only functionality missing upstream is understanding the COMPRESSION keyword in CREATE EXTERNAL TABLE. This feels like a very simple upstream contribution that would then allow removing DFParser and its corresponding Statement abstraction.

Describe alternatives you've considered

We could not do this

Additional context

FYI @andygrove @alamb @liukun4515

@tustvold tustvold added the enhancement New feature or request label Jan 3, 2023
@tustvold tustvold closed this as completed Jan 3, 2023
@tustvold
Copy link
Contributor Author

tustvold commented Jan 3, 2023

Actually I think I would like to use Statement, to remove the DDL from LogicalPlan

@andygrove
Copy link
Member

Makes sense to me.

Also, just as an FYI, when DFParser was introduced, we did not have a good mechanism for selectively overriding parsing in sqlparser-rs but we do now. It should now be possible for us to implement a DataFusion Dialect and override parsing at the statement, prefix, and infix levels. See tests at https://github.com/sqlparser-rs/sqlparser-rs/blob/main/tests/sqlparser_custom_dialect.rs for examples. I'm not sure the impact of this approach would be on supporting multiple dialects though.

@tustvold tustvold reopened this Jan 3, 2023
@unconsolable
Copy link
Contributor

unconsolable commented Jan 5, 2023

There are some differences in datafusion's CreateExternalTable and sqlparser's CreateTable, for example has_header , delimiter, table_partition_cols and file_compression_type are not exist in sqlparser's CreateTable. Are there any way to create custom Statement in sqlparser?

@tustvold
Copy link
Contributor Author

tustvold commented Jan 5, 2023

The partitoning is under one of the Hive options, not sure about the others. I believe there is support for custom dialects, but in this case I'd vote for upstreaming anything missing - we don't have any especially esoteric needs AFAIK

@Jefffrey
Copy link
Contributor

Jefffrey commented Jan 1, 2024

I'd like to take this on. I'll try do the migration in steps to make PRs more digestible and ensure no breakages/regressions.

First step is here: #8703

@alamb
Copy link
Contributor

alamb commented Jan 2, 2024

Related conversation in sqlparser-rs/sqlparser-rs#1080

From what I know, the CREATE EXTERNAL TABLE syntax is not based on any other dialect and thus is DataFusion specific. I think it may not belong upstream in sqlparser-rs

The COPY syntax I tried to base on DuckDB's copy syntax (which is different than Postgres), so if we added DuckDB style COPY support to sqlparser-rs we could probably migrate DataFusion to use it

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement New feature or request
Projects
None yet
Development

No branches or pull requests

5 participants