Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Consolidate OutputFileFormat and FileType #7323

Closed
alamb opened this issue Aug 17, 2023 · 1 comment · Fixed by #7336
Closed

Consolidate OutputFileFormat and FileType #7323

alamb opened this issue Aug 17, 2023 · 1 comment · Fixed by #7336
Labels
enhancement New feature or request good first issue Good for newcomers

Comments

@alamb
Copy link
Contributor

alamb commented Aug 17, 2023

Is your feature request related to a problem or challenge?

#7283 (comment) added a OutputFileFormat enum that is very similar to the existing FileType enum:

https://github.com/apache/arrow-datafusion/blob/f2c0100a5a10bf3ea166a1a590d94b8c9b6cf673/datafusion/expr/src/logical_plan/dml.rs#L45-L53

Describe the solution you'd like

Move FileType into datafusion_common and remove OutputFileFormat so it could be used by both the logical plan and datasource?

The tricky bit of this PR will be disentangling FileType from the code that instantiates compression: https://github.com/apache/arrow-datafusion/blob/f2c0100a5a10bf3ea166a1a590d94b8c9b6cf673/datafusion/core/src/datasource/file_format/file_type.rs#L287

I think we could move FileType over, however, and define some trait with get_ext_with_compression for the appropriate code

Describe alternatives you've considered

No response

Additional context

@devinjdangelo and I discussed this here: #7283 (comment)

I think this would be a good first issue as it is primarily a code movement thing, doesn't require intimate knowledge of DataFusion's code, and would be a good introduction to the development workflow.

@devinjdangelo
Copy link
Contributor

The tricky bit of this PR will be disentangling FileType from the code that instantiates compression:

The idea I went with here is moving the entire file_type.rs file to common i.e.:

git mv datafusion/core/src/datasource/file_format/file_type.rs datafusion/common/src/

Loads of file imports to clean up after that, but it seems like it worked (PR: #7336 )

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement New feature or request good first issue Good for newcomers
Projects
None yet
2 participants