
Conversation

@rdblue
Contributor

@rdblue rdblue commented Mar 6, 2025

This updates Flink's Avro and Parquet readers to support the new timestamp(9) and unknown types.

While enabling DataTest cases, I found that supportsDefaultValues was not enabled, so default value tests were not running for Avro. After enabling those tests, I also needed to update the RowData assertions and convert values to match Flink's object model in the readers by calling RowDataUtil.convertConstant.
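
For context, here is a minimal sketch of the kind of conversion such a helper performs; the method below is illustrative only and is not the actual RowDataUtil.convertConstant signature:

```java
import org.apache.flink.table.data.StringData;

// Illustrative only: default values come back as plain Java objects, but Flink's
// RowData object model expects its internal representations such as StringData.
static Object toFlinkInternal(Object javaValue) {
  if (javaValue instanceof CharSequence) {
    // String defaults must become StringData before they can be stored in a RowData
    return StringData.fromString(javaValue.toString());
  }
  // numeric and boolean defaults are already in Flink's internal form
  return javaValue;
}
```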

@github-actions github-actions bot added the flink label Mar 6, 2025
@rdblue
Contributor Author

rdblue commented Mar 6, 2025

I'll also follow up with a PR for Parquet readers, but that depends on changes in #12463.

@Fokko Fokko self-requested a review March 6, 2025 19:56
@rdblue rdblue force-pushed the flink-readers-writers branch from 5f505e7 to 0af3f01 Compare March 6, 2025 20:53
@github-actions github-actions bot added the API label Mar 6, 2025
@rdblue
Contributor Author

rdblue commented Mar 6, 2025

#12463 was merged and the changes for Parquet were small, so I included them here.

@rdblue rdblue changed the title Flink: Support Avro timestamp(9), unknown, and defaults Flink: Support Avro and Parquet timestamp(9), unknown, and defaults Mar 6, 2025
} else {
  return Optional.of(new MicrosToTimestampReader(desc));
}
Contributor Author

Previously, the readers converted values to LocalDateTime or OffsetDateTime, and Flink then converted those values back to a (millis, nanosOfMilli) pair. That required a lot of unnecessary date/time logic in both Iceberg and Flink, as well as separate readers to produce the two types.

Now the conversion to Flink is direct and doesn't go through Java date/time classes. That avoids all time zone calculations and should be faster.
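
As a rough sketch of the direct path, assuming the reader has already decoded the raw long from the file (the method names below are illustrative, not the PR's reader classes):

```java
import org.apache.flink.table.data.TimestampData;

// Build Flink's (millis, nanosOfMilli) representation directly from the raw long
// values read from the file, with no java.time round trip.
static TimestampData microsToTimestamp(long micros) {
  long millis = Math.floorDiv(micros, 1_000L);
  int nanosOfMilli = (int) Math.floorMod(micros, 1_000L) * 1_000;
  return TimestampData.fromEpochMillis(millis, nanosOfMilli);
}

static TimestampData nanosToTimestamp(long nanos) {
  long millis = Math.floorDiv(nanos, 1_000_000L);
  int nanosOfMilli = (int) Math.floorMod(nanos, 1_000_000L);
  return TimestampData.fromEpochMillis(millis, nanosOfMilli);
}
```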

LogicalTypeAnnotation annotation = primitive.getLogicalTypeAnnotation();
if (annotation != null) {
  Optional<ParquetValueWriter<?>> writer =
      annotation.accept(new LogicalTypeWriterBuilder(fType, desc));
Contributor Author

Updated this to use the logical annotation visitor.
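
For readers unfamiliar with the API, a minimal sketch of how a Parquet LogicalTypeAnnotation visitor dispatches; the class below is illustrative and is not the PR's LogicalTypeWriterBuilder:

```java
import java.util.Optional;
import org.apache.parquet.schema.LogicalTypeAnnotation;

// Override only the annotations you handle; everything else falls through to
// Optional.empty() via the visitor interface's default methods.
class TimestampUnitVisitor
    implements LogicalTypeAnnotation.LogicalTypeAnnotationVisitor<String> {
  @Override
  public Optional<String> visit(
      LogicalTypeAnnotation.TimestampLogicalTypeAnnotation timestamp) {
    // e.g. MICROS vs NANOS would determine which writer to build
    return Optional.of(
        "timestamp(" + timestamp.getUnit() + ", adjustedToUTC=" + timestamp.isAdjustedToUTC() + ")");
  }
}

// Usage mirrors the snippet above:
//   Optional<String> result = annotation.accept(new TimestampUnitVisitor());
// An empty result means the annotation wasn't handled, and the caller can fall
// back to choosing a writer from the physical type.
```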

@rdblue rdblue changed the title Flink: Support Avro and Parquet timestamp(9), unknown, and defaults Flink 1.20: Support Avro and Parquet timestamp(9), unknown, and defaults Mar 6, 2025
Comment on lines +211 to +214
}
pos += 1;
}

Contributor

nit: this is a bit strange. Could we fix the empty lines?

Contributor Author

For dense control flow blocks, we often leave extra newlines to make them more readable.

Comment on lines -156 to -180
public void testNumericTypes() throws IOException {
  List<Record> expected =
      ImmutableList.of(
          recordNumType(
              2,
              Integer.MAX_VALUE,
              Float.MAX_VALUE,
              Double.MAX_VALUE,
              Long.MAX_VALUE,
              1643811742000L,
              1643811742000L,
              1643811742000L,
              10.24d),
          recordNumType(
              2,
              Integer.MIN_VALUE,
              Float.MIN_VALUE,
              Double.MIN_VALUE,
              Long.MIN_VALUE,
              1643811742000L,
              1643811742000L,
              1643811742000L,
              10.24d));

  writeAndValidate(SCHEMA_NUM_TYPE, expected);
Contributor

This might be an interesting edge case to test in the TestData?
Min/max values for different columns? Could be especially interesting for timestamps/dates, etc.

Contributor Author

These cases are tested in the random data generator, so this is duplication.

@rdblue rdblue merged commit 7a572a9 into apache:main Mar 14, 2025
43 checks passed
@rdblue
Contributor Author

rdblue commented Mar 14, 2025

Thanks for reviewing, @pvary and @danielcweeks!
