Make/Remove FileType enum and replace with a trait #8657

Closed
alamb opened this issue Dec 26, 2023 · 2 comments · Fixed by #10499
Labels
enhancement New feature or request

Comments

@alamb
Contributor

alamb commented Dec 26, 2023

Is your feature request related to a problem or challenge?

DataFusion has a neat ListingTable abstraction that offers the ability to read (and now write) multiple files in a directory (among other features).

DataFusion comes with built in support for Avro, Parquet, Arrow, CSV, and JSON files.

However, with the introduction of the ability to write to such files, we have inadvertently made it impossible for users to add support for their own formats, which has been identified in several reports.

I think we lost this ability, as pointed out on #8637, because the FileFormat::file_type trait method now uses a FileType, which is an enum and hence cannot be extended.
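
A minimal sketch of the limitation described above. The definitions are simplified stand-ins, not DataFusion's actual code: the variant names follow the built-in formats listed earlier, and `MyCustomFormat` is a hypothetical user-defined format.

```rust
// Simplified stand-in for the closed enum (assumption: not the real definition).
#[derive(Debug, Clone, PartialEq)]
pub enum FileType {
    Avro,
    Parquet,
    Arrow,
    Csv,
    Json,
}

// Simplified stand-in for the trait that exposes the file type.
pub trait FileFormat {
    fn file_type(&self) -> FileType;
}

// Hypothetical format defined in a downstream crate.
pub struct MyCustomFormat;

impl FileFormat for MyCustomFormat {
    fn file_type(&self) -> FileType {
        // A downstream crate cannot add a variant to an enum it does not own,
        // so the only way to satisfy the trait is to claim a built-in type
        // that does not actually describe this format.
        FileType::Csv
    }
}
```

The implementation compiles, but it misreports the format, which is why an enum in this position effectively closes the set of supported formats.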

I also have a longer-term goal of extracting ListingTable out of the core of DataFusion (as it is just a very specialized TableProvider).

Describe the solution you'd like

I suggest we use traits to extend FileType, as we have done in other areas of the code.

When this is done, we should also add an end-to-end test case / example showing how a user can add support for their own custom file formats in ListingTable, so that we don't cause a regression in functionality like this again in the future.

Describe alternatives you've considered

One potential design is to make FileType a trait rather than an enum.

I looked briefly into this, and it will likely require:

  1. Converting other structures like FileTypeWriterOptions into traits (or incorporating them into the FileType trait).
  2. Sorting out how to handle serialization, as pointed out by @tustvold on "enable users to add support for LIstingTable / object_store table formats of different types" #8345 (comment)
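
The trait-based design could look roughly like the sketch below. It is only an illustration under stated assumptions: the `extension` method, the `Arc<dyn FileType>` return type, and the `CsvFileType` / `MyCustomFileType` / `MyCustomFormat` names are hypothetical, and the sketch does not address the writer-options or serialization questions listed above.

```rust
use std::fmt::Debug;
use std::sync::Arc;

/// Hypothetical open replacement for the closed enum.
pub trait FileType: Debug + Send + Sync {
    /// File extension without the leading dot (illustrative method).
    fn extension(&self) -> &str;
}

/// A built-in type expressed through the trait.
#[derive(Debug)]
pub struct CsvFileType;

impl FileType for CsvFileType {
    fn extension(&self) -> &str {
        "csv"
    }
}

/// A type defined by a downstream user; no change to the core crate is needed.
#[derive(Debug)]
pub struct MyCustomFileType;

impl FileType for MyCustomFileType {
    fn extension(&self) -> &str {
        "custom"
    }
}

/// Simplified stand-in for the format trait, now returning a trait object.
pub trait FileFormat {
    fn file_type(&self) -> Arc<dyn FileType>;
}

/// Hypothetical user-defined format.
pub struct MyCustomFormat;

impl FileFormat for MyCustomFormat {
    fn file_type(&self) -> Arc<dyn FileType> {
        Arc::new(MyCustomFileType)
    }
}
```

Because `FileType` is a trait object here, code that previously matched on enum variants would need to go through trait methods instead, which is part of the conversion work mentioned in item 1.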

Another, slightly different alternative design would be to incorporate all the functionality of FileType into the existing FileFormat, as suggested by @devinjdangelo on #8345 (comment)
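
As a rough sketch of this second option, the information previously carried by FileType could become methods on the format trait itself. The method names (`extension`, `supports_write`) and the `describe` helper are assumptions made for illustration, not the API proposed in the linked comment.

```rust
/// Simplified stand-in: the format trait absorbs what FileType used to provide.
pub trait FileFormat {
    /// File extension without the leading dot (illustrative method).
    fn extension(&self) -> String;

    /// Whether the write path is supported; read-only by default (assumption).
    fn supports_write(&self) -> bool {
        false
    }
}

/// A built-in format that supports both reading and writing.
pub struct CsvFormat;

impl FileFormat for CsvFormat {
    fn extension(&self) -> String {
        "csv".to_string()
    }

    fn supports_write(&self) -> bool {
        true
    }
}

/// Hypothetical user-defined, read-only format.
pub struct MyCustomFormat;

impl FileFormat for MyCustomFormat {
    fn extension(&self) -> String {
        "custom".to_string()
    }
}

/// Callers only need the trait object; no separate FileType value is involved.
pub fn describe(format: &dyn FileFormat) -> String {
    format!(".{} (writable: {})", format.extension(), format.supports_write())
}
```

With this shape there is a single extension point, so a custom format implements one trait rather than keeping a format and a file type in sync.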

Additional context

No response

@devinjdangelo
Contributor

@alamb @tychoish I took a stab at resolving this in #8667. I'm also attempting to support the write path (e.g. a listing table backed by a custom FileFormat which supports insert into).

@alamb
Contributor Author

alamb commented Dec 28, 2023

> @alamb @tychoish I took a stab at resolving this in #8667. I'm also attempting to support the write path (e.g. a listing table backed by a custom FileFormat which supports insert into).

Thank you @devinjdangelo -- I plan a whirlwind PR review blitz this afternoon but may not get to this until tomorrow.
