[iceberg] Add UUID type support#23627
Conversation
49f7ab3 to
5343fb6
Compare
|
Nit suggestion for the release note entry to follow the Order of changes in the Release Notes Guidelines: |
|
Fixed, thanks Steve! |
5343fb6 to
70b6f07
Compare
presto-common/src/main/java/com/facebook/presto/common/type/Uuids.java
Outdated
Show resolved
Hide resolved
presto-common/src/main/java/com/facebook/presto/common/type/Uuids.java
Outdated
Show resolved
Hide resolved
presto-common/src/main/java/com/facebook/presto/common/type/Uuids.java
Outdated
Show resolved
Hide resolved
presto-common/src/main/java/com/facebook/presto/common/type/Uuids.java
Outdated
Show resolved
Hide resolved
presto-common/src/main/java/com/facebook/presto/common/type/Uuids.java
Outdated
Show resolved
Hide resolved
| @Override | ||
| public int hashCode() | ||
| { | ||
| return UUID.hashCode(); |
There was a problem hiding this comment.
so this is a singleton? If so the equals method can be simpler. Otherwise this hashCode method is incorrect
There was a problem hiding this comment.
TypeInfo is a hive-specific class that I am implementing because it does not natively support UUIDs. This implementation follows the same primitive paradigm as the other PrimitiveTypeInfo which exist in the Apache Hive's implementation
There was a problem hiding this comment.
If I'm reading this right the hashCode is a constant, which only makes sense if the object is a singleton. Am I missing something? (always possible)
There was a problem hiding this comment.
I am just matching the same implementation used by the rest of the TypeInfo class implementations to maintain compatibility with Hive. I don't think we should be deviating from their implementation style at risk of introducing unknown issues.
presto-iceberg/src/main/java/com/facebook/presto/iceberg/ExpressionConverter.java
Outdated
Show resolved
Hide resolved
presto-iceberg/src/test/java/com/facebook/presto/iceberg/IcebergDistributedTestBase.java
Outdated
Show resolved
Hide resolved
presto-iceberg/src/test/java/com/facebook/presto/iceberg/IcebergDistributedTestBase.java
Outdated
Show resolved
Hide resolved
| long value = random.nextLong(); | ||
| writer.writeLong(value); | ||
| addedValues.add(value); | ||
| value = random.nextLong(); |
There was a problem hiding this comment.
no random values please; pick an arbitrary one
There was a problem hiding this comment.
The entire test class revolves around generating random values for different primitive types. I don't think we should go against the grain here. Changing this entire class to not use random values is out of scope for this PR.
There was a problem hiding this comment.
I'm not asking for the entire class to change. I'm asking that this PR not make the problem worse. Random values in unit tests are a major source of flakiness. Please don't do this.
There was a problem hiding this comment.
I am not going to change this because it requires re-writing a lots of boilerplate of code to perform the same assertions. I filed an issue for someone to fix this class in the future: #23840
There was a problem hiding this comment.
Instead of random values why not just i, -79Li, 237Li?
70b6f07 to
29fa113
Compare
| @Override | ||
| public int hashCode() | ||
| { | ||
| return UUID.hashCode(); |
There was a problem hiding this comment.
If I'm reading this right the hashCode is a constant, which only makes sense if the object is a singleton. Am I missing something? (always possible)
| long value = random.nextLong(); | ||
| writer.writeLong(value); | ||
| addedValues.add(value); | ||
| value = random.nextLong(); |
There was a problem hiding this comment.
I'm not asking for the entire class to change. I'm asking that this PR not make the problem worse. Random values in unit tests are a major source of flakiness. Please don't do this.
29fa113 to
fb53070
Compare
hantangwangd
left a comment
There was a problem hiding this comment.
Thanks for support UUID type in Iceberg connector. Should we add the mapping of UUID between PrestoDB type and Iceberg type to the chapter Type mapping in iceberg document?
fb53070 to
7be1ecb
Compare
steveburnett
left a comment
There was a problem hiding this comment.
LGTM! (docs)
Pull branch, local doc build, looks good. Thanks!
hantangwangd
left a comment
There was a problem hiding this comment.
The change overall looks good to me. Some nits and little problems.
presto-parquet/src/main/java/com/facebook/presto/parquet/cache/MetadataReader.java
Show resolved
Hide resolved
...o-parquet/src/main/java/com/facebook/presto/parquet/writer/valuewriter/UuidValuesWriter.java
Show resolved
Hide resolved
...o-parquet/src/main/java/com/facebook/presto/parquet/writer/valuewriter/UuidValuesWriter.java
Outdated
Show resolved
Hide resolved
presto-parquet/src/main/java/com/facebook/presto/parquet/ColumnReaderFactory.java
Outdated
Show resolved
Hide resolved
presto-parquet/src/main/java/com/facebook/presto/parquet/ColumnReaderFactory.java
Outdated
Show resolved
Hide resolved
...ebook/presto/parquet/batchreader/decoders/plain/FixedLenByteArrayUuidPlainValuesDecoder.java
Outdated
Show resolved
Hide resolved
...ava/com/facebook/presto/parquet/batchreader/decoders/rle/UuidRLEDictionaryValuesDecoder.java
Outdated
Show resolved
Hide resolved
...ava/com/facebook/presto/parquet/batchreader/decoders/rle/UuidRLEDictionaryValuesDecoder.java
Outdated
Show resolved
Hide resolved
7be1ecb to
e1f0034
Compare
...o-parquet/src/main/java/com/facebook/presto/parquet/writer/valuewriter/UuidValuesWriter.java
Show resolved
Hide resolved
...o-parquet/src/main/java/com/facebook/presto/parquet/writer/valuewriter/UuidValuesWriter.java
Outdated
Show resolved
Hide resolved
e002737 to
74ac591
Compare
| } | ||
| } | ||
| } | ||
|
|
There was a problem hiding this comment.
adding these blank lines should be a separate PR, if it's needed at all. (If checkstyle doesn't complain it probably isn't)
There was a problem hiding this comment.
They are caught by checkstyle. They are added automatically by the templating tool which generates the files, but weren't caught before because they existed in generated-sources. I tried a large combination of whitespace-control directives from freemarker but was unsuccessful in removing them. I can split these changes into a separate commit on this PR. The commit will be preserved when this PR merges.
Let me know if you think that is acceptable
There was a problem hiding this comment.
I'd go even further and make this a separate PR that goes in first since there are no other changes in this file.
There was a problem hiding this comment.
What's the benefit of making a separate PR? Commits in a PR are preserved when merging to master
74ac591 to
5c9def7
Compare
yingsu00
left a comment
There was a problem hiding this comment.
Haven't finished reviewing, will continue later tonight.
presto-parquet/src/main/java/com/facebook/presto/parquet/reader/BinaryColumnReader.java
Outdated
Show resolved
Hide resolved
891af15
5c9def7 to
891af15
Compare
...ebook/presto/parquet/batchreader/decoders/plain/FixedLenByteArrayUuidPlainValuesDecoder.java
Outdated
Show resolved
Hide resolved
...ebook/presto/parquet/batchreader/decoders/plain/FixedLenByteArrayUuidPlainValuesDecoder.java
Outdated
Show resolved
Hide resolved
presto-common/src/main/java/com/facebook/presto/common/type/Uuids.java
Outdated
Show resolved
Hide resolved
presto-common/src/main/java/com/facebook/presto/common/type/Uuids.java
Outdated
Show resolved
Hide resolved
presto-common/src/main/java/com/facebook/presto/common/type/Uuids.java
Outdated
Show resolved
Hide resolved
...ava/com/facebook/presto/parquet/batchreader/decoders/rle/UuidRLEDictionaryValuesDecoder.java
Outdated
Show resolved
Hide resolved
presto-parquet/src/test/java/com/facebook/presto/parquet/reader/BenchmarkUuidColumnReader.java
Outdated
Show resolved
Hide resolved
Previously, the code generation step was removed in 7c814ae However, this makes it more difficult to add new readers in the future, such as for UUIDs. This change adds the code generation step as an optional profile which can be enabled when building the parquet module through -PgenerateParquetReaders
891af15 to
069e2b5
Compare
|
Thanks for your detailed review @yingsu00 . Here are the new performance numbers from the benchmark original previous update (+50-100% in non-batch reader from original) newest update (+62% improvement in plain batch reader) |
069e2b5 to
c8ec2e4
Compare
The iceberg spec lists uuid as a valid schema type. Presto supports UUID types but there was no support for reading or writing them in the connector. This commit makes the necessary changes in the connector to create tables with UUID columns and support for UUIDs in the parquet reader. This includes an implementation for UUIDs in the batchreader.
c8ec2e4 to
365b88e
Compare
Description
This PR adds support for reading and writing UUIDs on Iceberg tables with all available catalogs. In order to support this we also needed improvements to the parquet reader and writer for Parquet's UUID logical type.
Motivation and Context
Presto has type support for UUIDs. We should support reading and writing them in some of the connectors.
Impact
Test Plan
I also added a benchmark for reading and writing UUID types and compared it to our current LongDecimal benchmark to see the performance difference for another type which uses
FIXED_LENGTH_BYTE_ARRAYas the underlying physical type.These were the microbenchmarks from my local machine on ARM using a build of Corretto JDK11 and with reader verification disabled
Contributor checklist
Release Notes
Please follow release notes guidelines and fill in the release notes below.