Implement serde for CSV and Parquet FileSinkExec #8646

Merged
merged 8 commits into from
Dec 29, 2023

Member

@andygrove andygrove commented Dec 24, 2023

Which issue does this PR close?

Closes #8645

Rationale for this change

Needed by Ballista, so that it can support the `DataFrame::write_xxx` methods (for example `write_csv` and `write_parquet`) again.

What changes are included in this PR?

Implement serde for CSV and Parquet FileSinkExec, based on existing support for JSON FileSinkExec.
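As a rough illustration of what "serde" means here, a minimal round-trip sketch follows. It is not the DataFusion or `datafusion-proto` API: the `SinkFormat` enum, its fields, the `to_bytes`/`from_bytes` helpers, and the ad hoc byte layout are all hypothetical stand-ins. The actual change uses protobuf messages. The point is only that every sink format, not just JSON, needs an encoding that decodes back to an equal value.

```rust
/// Hypothetical stand-in for a per-format sink configuration.
/// These variants and fields are assumptions for illustration only.
#[derive(Debug, Clone, PartialEq)]
enum SinkFormat {
    Json,
    Csv { delimiter: u8, has_header: bool },
    Parquet { compression: String },
}

/// Serialize a format into an ad hoc byte layout: one tag byte, then a payload.
fn to_bytes(format: &SinkFormat) -> Vec<u8> {
    match format {
        SinkFormat::Json => vec![0],
        SinkFormat::Csv { delimiter, has_header } => {
            vec![1, *delimiter, u8::from(*has_header)]
        }
        SinkFormat::Parquet { compression } => {
            let mut out = vec![2];
            out.extend_from_slice(compression.as_bytes());
            out
        }
    }
}

/// Deserialize the layout written by `to_bytes`, rejecting anything else.
fn from_bytes(bytes: &[u8]) -> Result<SinkFormat, String> {
    match bytes {
        [0] => Ok(SinkFormat::Json),
        [1, delimiter, header] => Ok(SinkFormat::Csv {
            delimiter: *delimiter,
            has_header: *header != 0,
        }),
        [2, rest @ ..] => String::from_utf8(rest.to_vec())
            .map(|compression| SinkFormat::Parquet { compression })
            .map_err(|e| e.to_string()),
        _ => Err(format!("unrecognized sink encoding: {bytes:?}")),
    }
}
```

A round-trip test in this style would assert that `from_bytes(&to_bytes(&value))` returns `Ok(value)` for each variant.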

Are these changes tested?

Yes, new tests added.

Are there any user-facing changes?

@github-actions github-actions bot added the core Core DataFusion crate label Dec 24, 2023
@andygrove andygrove changed the title WIP: Implement serde for CSV and Parquet FileSinkExec Implement serde for CSV and Parquet FileSinkExec Dec 29, 2023
@andygrove andygrove marked this pull request as ready for review December 29, 2023 18:30
@andygrove
Member Author

@devinjdangelo fyi, this is now ready for review

@@ -1220,20 +1222,22 @@ message ParquetWriterOptions {
}

message CsvWriterOptions {
Member Author

We have not released a version of DataFusion that contains CsvWriterOptions yet, so it is safe to change the field numbers here.
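To show why field numbers matter once a version has shipped, here is a small sketch of the protobuf wire format for a single varint field. The helper names are hypothetical and the field numbers are illustrative, not the actual `CsvWriterOptions` layout. On the wire each field is identified only by its number (the key is `field_number << 3 | wire_type`), so a reader that expects a renumbered field will not find a value written under the old number.

```rust
/// Encode an unsigned integer as a protobuf base-128 varint.
fn encode_varint(mut value: u64, out: &mut Vec<u8>) {
    while value >= 0x80 {
        out.push(((value as u8) & 0x7f) | 0x80);
        value >>= 7;
    }
    out.push(value as u8);
}

/// Decode a varint starting at `pos`, advancing `pos` past it.
fn decode_varint(bytes: &[u8], pos: &mut usize) -> Option<u64> {
    let mut result = 0u64;
    let mut shift = 0u32;
    while *pos < bytes.len() {
        let byte = bytes[*pos];
        *pos += 1;
        result |= u64::from(byte & 0x7f) << shift;
        if (byte & 0x80) == 0 {
            return Some(result);
        }
        shift += 7;
        if shift >= 64 {
            return None; // malformed: varint too long
        }
    }
    None // ran out of input mid-varint
}

/// Encode one varint-typed field (wire type 0) with the given field number.
fn encode_varint_field(field_number: u32, value: u64) -> Vec<u8> {
    let mut out = Vec::new();
    encode_varint(u64::from(field_number) << 3, &mut out); // low 3 bits = wire type 0
    encode_varint(value, &mut out);
    out
}

/// Look up a varint field by number. This sketch handles varint fields only.
fn read_varint_field(bytes: &[u8], wanted: u32) -> Option<u64> {
    let mut pos = 0;
    while pos < bytes.len() {
        let key = decode_varint(bytes, &mut pos)?;
        if (key & 0x7) != 0 {
            return None; // other wire types are out of scope here
        }
        let value = decode_varint(bytes, &mut pos)?;
        if (key >> 3) == u64::from(wanted) {
            return Some(value);
        }
    }
    None
}
```

Bytes written with field number 1 can be read back as field 1, but a reader looking for the same option under field number 2 gets `None`. Because no released version has written the old numbers, no such bytes exist yet.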

@devinjdangelo
Contributor

This LGTM @andygrove! I am planning to attempt a more significant refactor in #8667, as discussed with @alamb, which might simplify adding support for some of the more advanced file writing options. Supporting externally defined file types is also a goal.

@andygrove
Member Author

Thanks for the reviews @avantgardnerio and @devinjdangelo

@andygrove andygrove merged commit 7fc663c into apache:main Dec 29, 2023
3 checks passed
@andygrove andygrove deleted the serde-csv-parquet-sink branch December 29, 2023 23:53
Development

Successfully merging this pull request may close these issues.

Add serde support for CSV and Parquet FileSinkExec nodes
3 participants