Skip to content

Support setting Iceberg table format version#11880

Merged
findepi merged 1 commit intotrinodb:masterfrom
alexjo2144:iceberg/format-version
Apr 15, 2022
Merged

Support setting Iceberg table format version#11880
findepi merged 1 commit intotrinodb:masterfrom
alexjo2144:iceberg/format-version

Conversation

@alexjo2144
Copy link
Member

Description

Allows setting the table format for newly created Iceberg tables. Defaults to v1.

Is this change a fix, improvement, new feature, refactoring, or other?

Feature

Is this a change to the core query engine, a connector, client library, or the SPI interfaces? (be specific)

Iceberg connector

How would you describe this change to a non-technical end user or system administrator?

Allows creating either v1 or v2 Iceberg tables

Related issues, pull requests, and links

Documentation

( ) No documentation is needed.
(x) Sufficient documentation is included in this PR.
( ) Documentation PR is available with #prnumber.
( ) Documentation issue #issuenumber is filed, and can be handled later.

Release notes

( ) No release notes entries required.
(x) Release notes entries required with the following suggested text:

# Iceberg
* Support creating tables with either Iceberg format version 1, or 2.

@cla-bot cla-bot bot added the cla-signed label Apr 8, 2022
@alexjo2144 alexjo2144 requested review from findepi and findinpath April 8, 2022 17:33
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

From a new user perspective I think it would be useful to have a very high-level description about what 1 and 2 actually mean.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

to use for tables created by Trino

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

for tables created by Trino

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this applies to CREATE TABLE, but also to SHOW CREATE, so should just describe what it is:

"Iceberg table format version"

@github-actions github-actions bot added the docs label Apr 8, 2022
@findepi
Copy link
Member

findepi commented Apr 11, 2022

What's the benefit?

Until v2 is fully supported, i don't think we should merge this.

@alexjo2144
Copy link
Member Author

What's the benefit?

There are some spec changes in v2 besides merge-on-read row level deletes. https://iceberg.apache.org/spec/#version-2 A user might want v2 tables for those reasons, but we can combine this with #11886 if you'd like

Copy link
Member

@findepi findepi left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should there be a test?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is compatible, but please follow up to create v2 by default.
We are implementing v2 deletes, and are currently not working on copy-on-write delete mode (as required for v1), so v2 are better for end users.

Copy link
Contributor

@kbendick kbendick Apr 13, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If copy-on-write isn't implemented, do consider defaulting write.format.default to merge-on-read then. Particularly for users who might interact with the table via other engines.

Copy link
Contributor

@kbendick kbendick Apr 13, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actually there are 3 configurations coming up with respect to copy-on-write vs merge-on-read. These technically only apply to v2, but if you're implementing merge-on-read only, then I'd suggest considering updating these properties from the upstream default of copy-on-write:

  • write.delete.mode
  • write.update.mode
  • write.merge.mode

https://github.com/apache/iceberg/blob/a2392355e2a9afe757b700409882f26737a5062e/docs/tables/configuration.md?plain=1#L75-L79

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yep, at the moment neither merge-on-read or copy-on-write are supported so I didn't add those three options in here, but they are in the PR adding support for merge-on-read deletes: 8ebba99#diff-2af3e19a6b656640a7d0bb73114ef224953a2efa04e569b1fe4da953b2cc6d15R411

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this applies to CREATE TABLE, but also to SHOW CREATE, so should just describe what it is:

"Iceberg table format version"

@alexjo2144 alexjo2144 force-pushed the iceberg/format-version branch from 65a6a5c to e529efa Compare April 13, 2022 17:36
@alexjo2144
Copy link
Member Author

@findepi addressed comments, @findinpath added to the docs a little, explaining that v2 is required for row level deletes and linking the Iceberg documentation

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

https://iceberg.apache.org/spec/#version-2 would be a correct link.
However, i am fine with omitting the whole sentence, we can consider this out of scope of this document.

From Trino user perspective, we should say that "2" is required for DML.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

From Trino user perspective, we should say that "2" is required for DML

I was going to put that, but not until after DML is in

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Removed the last sentence

@alexjo2144 alexjo2144 force-pushed the iceberg/format-version branch from e529efa to dba240d Compare April 13, 2022 19:01
@findepi findepi merged commit f080e8c into trinodb:master Apr 15, 2022
@findepi findepi added the no-release-notes This pull request does not require release notes entry label Apr 15, 2022
@github-actions github-actions bot added this to the 378 milestone Apr 15, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

cla-signed docs no-release-notes This pull request does not require release notes entry

Development

Successfully merging this pull request may close these issues.

4 participants