@rdblue
Contributor

@rdblue rdblue commented Feb 9, 2021

This fixes Parquet row group filters when types have been promoted from int to long or from float to double.

The filters are passed the file schema after ids are added, and that schema is used to convert dictionary values and lower/upper bounds. The conversion currently deserializes using the file's types, but the filter expression is bound to the table's types. If the two differ, the comparison in the evaluator fails.

This updates the conversion to first deserialize the Parquet value and then promote it if the table's type has changed. Only int to long and float to double are needed because those are the only type promotions that use a different representation.
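
As a rough illustration of the "deserialize, then promote" idea described above, here is a minimal sketch. It is not the code from this PR: `PromotionSketch`, `TypeId`, and `converter` are hypothetical names invented for the example, and the enum is only a stand-in for the file and table primitive types.

```java
import java.util.function.Function;

public class PromotionSketch {
  // Hypothetical stand-in for primitive type ids; not Iceberg's Type API.
  enum TypeId { INTEGER, LONG, FLOAT, DOUBLE }

  // Returns a function applied to a value that was already deserialized
  // using the file's type. It promotes the value when the table's type
  // has been widened, and otherwise returns it unchanged.
  static Function<Object, Object> converter(TypeId fileType, TypeId tableType) {
    if (fileType == TypeId.INTEGER && tableType == TypeId.LONG) {
      return value -> ((Integer) value).longValue();
    }
    if (fileType == TypeId.FLOAT && tableType == TypeId.DOUBLE) {
      return value -> ((Float) value).doubleValue();
    }
    // Same type on both sides: no change of representation is needed.
    return value -> value;
  }

  public static void main(String[] args) {
    // A lower bound read from a file written while the column was still an int.
    Object lowerBound = 34;
    Object promoted = converter(TypeId.INTEGER, TypeId.LONG).apply(lowerBound);
    // The promoted value is now a Long and can be compared with a literal
    // that was bound against the table's long type.
    System.out.println(promoted.getClass().getSimpleName() + " " + promoted);
  }
}
```

Without the promotion step, the evaluator would be handed an `Integer` where the bound expression expects a `Long`, which is the mismatch this change addresses.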

@rdblue rdblue requested a review from danielcweeks February 10, 2021 18:51
Contributor

@aokolnychyi aokolnychyi left a comment

LGTM

@aokolnychyi aokolnychyi merged commit 5218f43 into apache:master Feb 13, 2021
@aokolnychyi
Contributor

This looked correct to me so I merged it. @danielcweeks, let us know if you have any comments.

Thanks, @rdblue!

@danielcweeks
Contributor

Thanks for reviewing @aokolnychyi

@rdblue rdblue added this to the Java 0.11.1 Release milestone Mar 4, 2021
aokolnychyi pushed a commit that referenced this pull request Mar 24, 2021
coolderli pushed a commit to coolderli/iceberg that referenced this pull request Apr 26, 2021
cwsteinbach added a commit to cwsteinbach/apache-iceberg that referenced this pull request Aug 17, 2021
cwsteinbach added a commit that referenced this pull request Aug 19, 2021
* Add 0.12.0 release notes pt 2

* Add more blurbs and fix formatting.

- Add blurbs for #2565, #2583, and #2547.
- Make formatting consistent.

* Add blurb for #2613 Hive Vectorized Reader

* Reword blurbs for #2565 and #2365

* More changes based on review comments

* More updates to the 0.12.0 release notes

* Add blurb for #2232 fix parquet row group filters

* Add blurb for #2308
lrvingzhou-tx pushed a commit to BKBASE-Plugin/iceberg that referenced this pull request Sep 7, 2021

3 participants