Update iceberg to 0.13.1 #11032

Merged
losipiuk merged 2 commits into trinodb:master from homar:homar/update_iceberg
Feb 24, 2022

Conversation

@homar
Member

@homar homar commented Feb 14, 2022

Description

Updates the Apache Iceberg version to 0.13.1.

General information

Related issues, pull requests, and links

Documentation

(x) No documentation is needed.
( ) Sufficient documentation is included in this PR.
( ) Documentation PR is available with #prnumber.
( ) Documentation issue #issuenumber is filed, and can be handled later.

Release notes

(x) No release notes entries required.
( ) Release notes entries required with the following suggested text:

# Section
* Fix some things. ({issue}`5678`)

@cla-bot cla-bot bot added the cla-signed label Feb 14, 2022
Member

Make it a Note, not TODO. CBO has no use for NaN counts today

Member

empty map instead?

@findepi
Member

findepi commented Feb 14, 2022

@homar are you aware of any changes impacting users that flow from this upgrade?

@homar
Member Author

homar commented Feb 14, 2022

> @homar are you aware of any changes impacting users that flow from this upgrade?

I am not. The main reason for this update is that it should allow us to implement row-level deletes and updates for ORC files.

@homar homar force-pushed the homar/update_iceberg branch 2 times, most recently from 60df744 to c8dcd1e Compare February 15, 2022 09:39
Member

@electrum electrum Feb 16, 2022

These are Iceberg metrics that are part of the Iceberg specification, and thus we should be writing them. The purpose of MetricsWrapper is to allow JSON serialization of Iceberg Metrics, so we should pass the value through here. The Parquet writer already collects NaN counts, and the ORC writer should be updated to collect them in the future.

Saying "Trino doesn't use them so we don't care" is hostile.
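A minimal sketch of the "pass the value through" idea, for illustration only. It does not use the real Iceberg `Metrics` class or Trino's actual `MetricsWrapper`; `FileMetrics` and `Wrapper` are hypothetical stand-ins that model just three fields, assuming the wrapper's job is to carry every metric unchanged so that it survives serialization.

```java
import java.util.Map;

public class MetricsWrapperSketch {
    /** Hypothetical stand-in for Iceberg's Metrics class; only three fields are modeled. */
    static final class FileMetrics {
        final long recordCount;
        final Map<Integer, Long> nullValueCounts;
        final Map<Integer, Long> nanValueCounts;

        FileMetrics(long recordCount, Map<Integer, Long> nullValueCounts, Map<Integer, Long> nanValueCounts) {
            this.recordCount = recordCount;
            this.nullValueCounts = nullValueCounts;
            this.nanValueCounts = nanValueCounts;
        }
    }

    /** Hypothetical serialization-friendly wrapper that forwards every field, including NaN counts. */
    static final class Wrapper {
        private final FileMetrics metrics;

        Wrapper(long recordCount, Map<Integer, Long> nullValueCounts, Map<Integer, Long> nanValueCounts) {
            // Pass nanValueCounts through as given instead of dropping it or substituting a placeholder.
            this.metrics = new FileMetrics(recordCount, nullValueCounts, nanValueCounts);
        }

        FileMetrics metrics() {
            return metrics;
        }

        Map<Integer, Long> nanValueCounts() {
            return metrics.nanValueCounts;
        }
    }

    public static void main(String[] args) {
        // Field id 2 has three NaN values in this made-up example.
        Wrapper wrapper = new Wrapper(10L, Map.of(1, 0L), Map.of(2, 3L));
        System.out.println(wrapper.nanValueCounts());
    }
}
```

With this shape, a writer that collects NaN counts (Parquet today, per the comment above) gets them written out, and a writer that does not can supply whatever the reviewers settle on until it is updated.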

Member

Remove comment

Member

Replace comment with

// TODO: Add nanValueCounts to ORC writer

@homar homar changed the title from "Update iceberg to 0.13.0" to "Update iceberg to 0.13.1" Feb 16, 2022
@homar homar force-pushed the homar/update_iceberg branch 2 times, most recently from 44e5b7c to 922ea65 Compare February 16, 2022 14:21
@homar
Member Author

homar commented Feb 18, 2022

@electrum could you please take another look?

@alexjo2144
Member

One of the changes in 0.13 was to deprecate the write.folder-storage.path and write.object-storage.path properties, replacing them with write.data.path:
apache/iceberg#3094

Can you add an appropriate test to TestIcebergSparkCompatibility similar to the existing one for the deprecated properties?
https://github.com/trinodb/trino/blob/master/testing/trino-product-tests/src/main/java/io/trino/tests/product/iceberg/TestIcebergSparkCompatibility.java#L822
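To make the property change concrete, here is a rough sketch of how a data location could be resolved when both the new and the deprecated properties may be present. The three property names come from the comment above; the class name, the method name, the precedence order (new property first, then the deprecated ones), and the `/data` default are assumptions made for illustration, not a copy of Iceberg's location provider code.

```java
import java.util.Map;

public class DataPathSketch {
    static final String WRITE_DATA_PATH = "write.data.path";
    // Deprecated in Iceberg 0.13 according to the comment above.
    static final String WRITE_OBJECT_STORAGE_PATH = "write.object-storage.path";
    static final String WRITE_FOLDER_STORAGE_PATH = "write.folder-storage.path";

    /** Hypothetical helper: returns the first configured path, assuming the new property wins. */
    static String resolveDataLocation(Map<String, String> properties, String tableLocation) {
        String[] keysInAssumedPrecedence = {WRITE_DATA_PATH, WRITE_OBJECT_STORAGE_PATH, WRITE_FOLDER_STORAGE_PATH};
        for (String key : keysInAssumedPrecedence) {
            String value = properties.get(key);
            if (value != null) {
                return stripTrailingSlashes(value);
            }
        }
        // Assumed default when no property is set: a "data" directory under the table location.
        return stripTrailingSlashes(tableLocation) + "/data";
    }

    private static String stripTrailingSlashes(String path) {
        String result = path;
        while (result.endsWith("/")) {
            result = result.substring(0, result.length() - 1);
        }
        return result;
    }

    public static void main(String[] args) {
        Map<String, String> properties = Map.of(
                WRITE_DATA_PATH, "s3://bucket/new-data",
                WRITE_FOLDER_STORAGE_PATH, "s3://bucket/old-data");
        System.out.println(resolveDataLocation(properties, "s3://bucket/table"));
    }
}
```

A compatibility test along the lines requested here would set write.data.path on a table from one engine and check that files written by the other engine land under that path.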

Member

@electrum electrum left a comment

Minor comments, otherwise looks good

Member

Revert

Member

Revert

Member

Add values so we test the serialization

Member

We should also test NaN counts for Parquet

Member Author

I don't see any tests for metrics for the Parquet format. Nothing like TestIcebergOrcMetricsCollection exists for Parquet. I will add a ticket to create such a test.

@findinpath
Contributor

If this PR gets merged only after #11062, please remove the Iceberg-related code for deleting table metadata files in io.trino.plugin.iceberg.TrinoHiveCatalog#dropTable.

@homar homar force-pushed the homar/update_iceberg branch from 922ea65 to 33685cd Compare February 24, 2022 06:54

6 participants