Skip to content

Data: Add GenericFileWriterFactory#9267

Merged
aokolnychyi merged 3 commits intoapache:mainfrom
aokolnychyi:generic-file-writer-factory
Dec 12, 2023
Merged

Data: Add GenericFileWriterFactory#9267
aokolnychyi merged 3 commits intoapache:mainfrom
aokolnychyi:generic-file-writer-factory

Conversation

@aokolnychyi
Copy link
Contributor

@aokolnychyi aokolnychyi commented Dec 10, 2023

This PR adds GenericFileWriterFactory, similar to FlinkFileWriterFactory and SparkFileWriterFactory. This is a new API that should be used in favor of methods in FileAppenderFactory for creating data, equality and position delete writers. Spark, for instance, has migrated to FileWriterFactory a long time ago.

This PR migrates all tests to use this API for writing test files and adds a new test suite.

@github-actions github-actions bot added the data label Dec 10, 2023
@aokolnychyi aokolnychyi force-pushed the generic-file-writer-factory branch from 258d3d5 to a049d96 Compare December 10, 2023 07:16
@github-actions github-actions bot added the flink label Dec 11, 2023

Row binaryCol =
Row.of(
59L,
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The column sizes have changed cause the new writer picks up all table properties, which was not true before. I validated the actual values are correct using parquet-tools.

Copy link
Member

@szehon-ho szehon-ho Dec 12, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I probably should have originally picked these up from the file itself, instead of hard-coding it, but its a suggestion for another PR

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yep, we should do that eventually.

@aokolnychyi
Copy link
Contributor Author

@nastra @Fokko @flyrain @amogh-jahagirdar, could you check this one?

super(TABLE_FORMAT_VERSION);
this.fileFormat = fileFormat;
this.partitioned = partitioned;
this.dataRows =
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Had to move the initialization after the table creation.

Copy link
Member

@szehon-ho szehon-ho left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It looks good to me. Whose the audience of this, is it generics? Still in early stages of re-ramping up and so may miss some context

@aokolnychyi
Copy link
Contributor Author

@szehon-ho, I faced some test failures in other PRs because of issues in GenericFileAppenderFactory. It was not that easy to fix that class so I went ahead and added GenericFileWriterFactory.

@aokolnychyi aokolnychyi merged commit 0c5b87a into apache:main Dec 12, 2023
@aokolnychyi
Copy link
Contributor Author

Thanks for reviewing, @szehon-ho!

lisirrx pushed a commit to lisirrx/iceberg that referenced this pull request Jan 4, 2024
geruh pushed a commit to geruh/iceberg that referenced this pull request Jan 26, 2024
devangjhabakh pushed a commit to cdouglas/iceberg that referenced this pull request Apr 22, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants