Parquet: Fix map projection after map to key_value rename #3309

rdblue · 2021-10-18T19:56:36Z

This fixes the Parquet map projection bug introduced by apache/parquet-java#798.

The projection code in Iceberg created map projections with the Parquet Types.map builder. That change renamed the repeated key-value group produced by the builder from map to key_value, so the projected type no longer matched existing Parquet files that use the old name. As a result, Parquet would not project the map column, and loading it would fail with an error like this:

Caused by: java.lang.IllegalArgumentException: [mapCol, map, key] required binary key (STRING) = 2 is not in the store: [] 1000
        at org.apache.iceberg.shaded.org.apache.parquet.hadoop.ColumnChunkPageReadStore.getPageReader(ColumnChunkPageReadStore.java:272)
        at org.apache.iceberg.parquet.ParquetValueReaders$PrimitiveReader.setPageSource(ParquetValueReaders.java:185)
        at org.apache.iceberg.parquet.ParquetValueReaders$RepeatedKeyValueReader.setPageSource(ParquetValueReaders.java:529)
        at org.apache.iceberg.parquet.ParquetValueReaders$StructReader.setPageSource(ParquetValueReaders.java:685)

The solution is to copy the map structure and ensure that the names are preserved rather than generated.
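The difference between the two approaches can be sketched with a toy schema model. This is an illustration only: it does not use the Parquet Types API, and the names MapProjectionSketch, Node, rebuildWithGeneratedName, and copyPreservingNames are hypothetical, not taken from this PR. It assumes a map is modeled as a group containing one repeated group that holds the key and value.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Toy model of a nested schema. This is NOT the Parquet Type API;
// all names here are hypothetical and exist only for illustration.
final class MapProjectionSketch {
  static final class Node {
    final String name;
    final List<Node> children;

    Node(String name, Node... children) {
      this.name = name;
      this.children = Arrays.asList(children);
    }
  }

  // Collect the full path to every leaf column, e.g. [mapCol, map, key].
  static List<List<String>> leafPaths(Node node) {
    List<List<String>> paths = new ArrayList<>();
    collect(node, new ArrayList<>(), paths);
    return paths;
  }

  private static void collect(Node node, List<String> prefix, List<List<String>> out) {
    List<String> path = new ArrayList<>(prefix);
    path.add(node.name);
    if (node.children.isEmpty()) {
      out.add(path);
    } else {
      for (Node child : node.children) {
        collect(child, path, out);
      }
    }
  }

  // Problematic approach: rebuild the map and let the builder choose the
  // name of the repeated group (passed in here as generatedName).
  static Node rebuildWithGeneratedName(Node fileMap, String generatedName) {
    Node repeated = fileMap.children.get(0);
    Node[] keyValue = repeated.children.toArray(new Node[0]);
    return new Node(fileMap.name, new Node(generatedName, keyValue));
  }

  // Approach described above: copy the structure and keep the names
  // that are already present in the file's schema.
  static Node copyPreservingNames(Node fileMap) {
    Node repeated = fileMap.children.get(0);
    Node[] keyValue = repeated.children.toArray(new Node[0]);
    return new Node(fileMap.name, new Node(repeated.name, keyValue));
  }

  public static void main(String[] args) {
    // Schema as stored in a file that names the repeated group "map".
    Node fileMap = new Node("mapCol", new Node("map", new Node("key"), new Node("value")));

    System.out.println("file:      " + leafPaths(fileMap));
    System.out.println("generated: " + leafPaths(rebuildWithGeneratedName(fileMap, "key_value")));
    System.out.println("copied:    " + leafPaths(copyPreservingNames(fileMap)));
  }
}
```

In this model, the rebuilt projection asks for the path [mapCol, key_value, key], which the file does not contain, while the copied projection asks for [mapCol, map, key] and matches the file's columns.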

Closes #2962.

RussellSpitzer

LGTM. Is there a non-painful way we can add a test for this? It seems like testing it manually requires loading a Parquet version that defines map types with slightly different names.

rdblue · 2021-10-18T22:50:12Z

We can add a test for PruneColumns directly that uses an alternative map structure. The only issue is that I had some trouble building the map type without going through the Types API. But there's probably a different way to do it that I was missing.

rdblue · 2021-10-19T00:06:16Z

Added the missing tests. I'll merge this when tests are passing.

RussellSpitzer · 2021-10-19T01:44:46Z

parquet/src/test/java/org/apache/iceberg/parquet/TestPruneColumns.java

+                    .addField(Types.primitive(PrimitiveTypeName.DOUBLE, Type.Repetition.REQUIRED).id(5).named("y"))
+                    .addField(Types.primitive(PrimitiveTypeName.DOUBLE, Type.Repetition.REQUIRED).id(6).named("z"))
+                    .id(3)
+                    .named("value"))


This is quite the type declaration. Love it.

kbendick

Great work @rdblue.

kbendick · 2021-10-19T03:45:08Z

parquet/src/test/java/org/apache/iceberg/parquet/TestPruneColumns.java

+                    .addField(Types.primitive(PrimitiveTypeName.DOUBLE, Type.Repetition.REQUIRED).id(5).named("y"))
+                    .addField(Types.primitive(PrimitiveTypeName.DOUBLE, Type.Repetition.REQUIRED).id(6).named("z"))
+                    .id(3)
+                    .named("value"))


This is quite the type declaration. Love it.

kbendick · 2021-10-19T03:46:13Z

parquet/src/test/java/org/apache/iceberg/parquet/TestPruneColumns.java

+  }
+
+  @Test
+  public void testListElementName() {


Maybe testListElementDoesNotAssumeName?

rdblue

If I were making another change I'd probably update this, but I think it's pretty minor and I'd like to get this in to make it possible to release 0.12.1.

kbendick · 2021-10-19T03:47:29Z

> Added the missing tests. I'll merge this when tests are passing.

These are great tests and this is great work. Thank you!

Parquet: Fix map projection after map to key_value rename.

5a0588d

github-actions bot added the parquet label Oct 18, 2021

rdblue linked an issue Oct 18, 2021 that may be closed by this pull request

Parquet 1.11.1 update causes regressions while reading iceberg data written with v1.11.0 #2962

Closed

RussellSpitzer approved these changes Oct 18, 2021

rdblue added this to the Java 0.12.1 Release milestone Oct 18, 2021

Parquet: Add tests for PruneColumns map and list cases.

5896608

RussellSpitzer reviewed Oct 19, 2021

kbendick approved these changes Oct 19, 2021

rdblue merged commit edc6985 into apache:master Oct 19, 2021

rdblue mentioned this pull request Oct 19, 2021

Parquet 1.11.1 update causes regressions while reading iceberg data written with v1.11.0 #2962

Closed

kbendick pushed a commit to kbendick/iceberg that referenced this pull request Oct 27, 2021

Parquet: Fix map projection after map to key_value rename (apache#3309)

1525b94

kbendick pushed a commit to kbendick/iceberg that referenced this pull request Oct 27, 2021

Parquet: Fix map projection after map to key_value rename (apache#3309)

0f64158

kbendick pushed a commit to kbendick/iceberg that referenced this pull request Oct 28, 2021

Parquet: Fix map projection after map to key_value rename (apache#3309)

96030e0

rdblue added a commit that referenced this pull request Oct 29, 2021

Parquet: Fix map projection after map to key_value rename (#3309)

a7859dc

izchen pushed a commit to izchen/iceberg that referenced this pull request Dec 7, 2021

Parquet: Fix map projection after map to key_value rename (apache#3309)

050b32a

kbendick mentioned this pull request Dec 13, 2021

Fix Iceberg's parquet reader returning nulls incorrectly for parquet files written by writers that don't use list and element as names. #3723

Merged

Initial-neko pushed a commit to Initial-neko/iceberg that referenced this pull request Dec 15, 2021

Parquet: Fix map projection after map to key_value rename apache#3309

9c306d4

Initial-neko pushed a commit to Initial-neko/iceberg that referenced this pull request Dec 17, 2021

Parquet: Fix map projection after map to key_value rename apache#3309

a27c6d4

islamismailov mentioned this pull request May 26, 2022

PARQUET-2069: Allow list and array record types to be compatible. apache/parquet-java#957

Open
