Implement Iceberg OPTIMIZE #10497

Merged
findepi merged 4 commits into trinodb:master from findepi:findepi/iceberg-optimize
Jan 13, 2022

@findepi
Member

@findepi findepi commented Jan 7, 2022

No description provided.

@cla-bot cla-bot bot added the cla-signed label Jan 7, 2022
@findepi findepi force-pushed the findepi/iceberg-optimize branch from fdd6530 to c817478 Compare January 7, 2022 13:13
@alexjo2144
Member

Just clarifying before I start reading this. This is specifically compaction of V1 tables which cannot contain positional or equality based delete markers?

@alexjo2144
Member

The SparkSQL procedure is called `rewrite_data_files`; should we name this procedure to match? https://github.com/apache/iceberg/blob/master/site/docs/spark-procedures.md?plain=1#L247

@findepi
Member Author

findepi commented Jan 10, 2022

> This is specifically compaction of V1 tables which cannot contain positional or equality based delete markers?

Yes, but only because the reader doesn't support positional or equality based delete markers today.

Once the reader has support for them, this should work with v2 tables.

> The SparkSQL procedure is called `rewrite_data_files`; should we name this procedure to match?

Thanks for the pointer. "rewrite files" feels like a low-level description of what the operation does (today), whereas "optimize" describes (or hints at) the intent.

Integration tests rarely interact with Hadoop FS directly, so
`org.apache.hadoop.fs.Path` is uncommon. This allows importing
`java.nio.file.Path`.
@findepi findepi force-pushed the findepi/iceberg-optimize branch from c817478 to 55599a0 Compare January 10, 2022 15:54
Member

Should we assert that we never get one empty and the other not? That feels like a bug situation.

Member Author

The scanned file list may be non-empty while the resulting data is empty, if the input files were empty.
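
For readers following this thread, the case described here can be sketched in Java. This is a minimal illustration, not the code from this PR: the class `OptimizeCommitSketch` and the methods `rewrite` and `needsCommit` are hypothetical names, and each file is modeled only by its row count. It assumes that rewriting files with zero rows produces no output files, which is the situation the reply describes.

```java
import java.util.List;

// Hypothetical sketch: files are modeled as row counts only.
class OptimizeCommitSketch {
    // Assumed behavior: compacting the scanned files yields one output file
    // holding all rows, or no output file at all when there are no rows.
    static List<Long> rewrite(List<Long> scannedRowCounts) {
        long totalRows = 0;
        for (long rows : scannedRowCounts) {
            totalRows += rows;
        }
        if (totalRows == 0) {
            return List.of();
        }
        return List.of(totalRows);
    }

    // Skip the commit only when there is nothing to remove and nothing to add.
    // "Scanned non-empty, written empty" is a valid state, so it is not asserted against.
    static boolean needsCommit(List<Long> scanned, List<Long> written) {
        return !(scanned.isEmpty() && written.isEmpty());
    }

    public static void main(String[] args) {
        List<Long> scanned = List.of(0L, 0L); // two empty input files
        List<Long> written = rewrite(scanned);
        System.out.println("written files: " + written.size());
        System.out.println("needs commit: " + needsCommit(scanned, written));
    }
}
```

Under these assumptions, an assertion that both lists are empty or both are non-empty would fail on legitimate input, since empty input files still need to be removed from the table even though nothing is written in their place.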

@findepi findepi force-pushed the findepi/iceberg-optimize branch from 55599a0 to 410e4fb Compare January 12, 2022 16:43
@findepi
Member Author

findepi commented Jan 13, 2022

CI #10583

@findepi findepi merged commit f0c67f0 into trinodb:master Jan 13, 2022
@findepi findepi deleted the findepi/iceberg-optimize branch January 13, 2022 08:09
@findepi findepi mentioned this pull request Jan 13, 2022
@github-actions github-actions bot added this to the 369 milestone Jan 13, 2022

3 participants