Improve native parquet writer performance for flat types #13030
raunaqmorarka merged 3 commits into trinodb:master from
alexjo2144 left a comment
I haven't checked the spec, but this implies that the repetition level is optional but the definition level is not?
lib/trino-parquet/src/main/java/io/trino/parquet/writer/PrimitiveColumnWriter.java
Definition levels can also be skipped for a required primitive column (max definition level is 0). However, I don't see a practical way to encounter that in the Trino Parquet writer right now, since Trino has no way to define an always-non-null column (please correct me if I'm wrong). If we do hit such a case, the ValuesWriter will be a DevNullValuesWriter, and we'll still benefit from avoiding the complexity of DefLevelIterables.
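To make the level discussion above concrete, here is a minimal, hypothetical Java sketch of how Parquet levels reduce for a flat column. The class and method names (`FlatLevels`, `needsRepetitionLevels`, `definitionLevels`) are illustrative only and are not taken from Trino or parquet-mr. It assumes the Parquet format rules: repetition levels are not stored when the max repetition level is 0 (any non-nested column), and definition levels are not stored when the max definition level is 0 (a required column). For an optional flat column the max definition level is 1, so each level is simply 0 for null and 1 for a present value, derivable directly from a null mask.

```java
import java.util.Arrays;

/**
 * Hypothetical sketch, not Trino's PrimitiveColumnWriter: how Parquet
 * repetition and definition levels simplify for a flat (non-nested) column.
 */
public final class FlatLevels
{
    private FlatLevels() {}

    /** A flat column has max repetition level 0, so no repetition levels are stored. */
    public static boolean needsRepetitionLevels(int maxRepetitionLevel)
    {
        return maxRepetitionLevel > 0;
    }

    /**
     * Definition levels for a flat column, derived directly from a null mask.
     * maxDefinitionLevel is 0 for a REQUIRED column (nothing is stored) and
     * 1 for an OPTIONAL column (0 = null, 1 = value present).
     */
    public static int[] definitionLevels(boolean[] isNull, int maxDefinitionLevel)
    {
        if (maxDefinitionLevel < 0 || maxDefinitionLevel > 1) {
            throw new IllegalArgumentException("not a flat column: max definition level " + maxDefinitionLevel);
        }
        if (maxDefinitionLevel == 0) {
            // Required column: every level would be 0, so none are written
            // (the comment above notes parquet-mr uses a DevNullValuesWriter here).
            return new int[0];
        }
        int[] levels = new int[isNull.length];
        for (int i = 0; i < isNull.length; i++) {
            levels[i] = isNull[i] ? 0 : 1;
        }
        return levels;
    }

    public static void main(String[] args)
    {
        boolean[] isNull = {false, true, false, false};
        System.out.println(needsRepetitionLevels(0)); // prints false
        System.out.println(Arrays.toString(definitionLevels(isNull, 1))); // prints [1, 0, 1, 1]
        System.out.println(definitionLevels(isNull, 0).length); // prints 0
    }
}
```

Deriving levels straight from the null mask is the kind of shortcut that lets the common flat case bypass a generic, nesting-aware level iterator; the actual writer's implementation may differ in its details.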
skrzypo987 left a comment
LGTM, modulo the comment about definition level = 0
lib/trino-parquet/src/main/java/io/trino/parquet/writer/PrimitiveColumnWriter.java
ParquetTester#testRoundTrip already tests with and without nulls.
AbstractTestParquetReader also has tests for flat as well as nested types.
lib/trino-parquet/src/main/java/io/trino/parquet/writer/PrimitiveColumnWriter.java
Before
Benchmark  (dataSet)          (compression)  (benchmarkFileFormat)  Score      Error
write      LINEITEM           NONE           TRINO_PARQUET           62.4MB/s  ± 2731.4kB/s (4.27%) (N = 45, α = 99.9%)
write      BIGINT_SEQUENTIAL  NONE           TRINO_PARQUET          114.3MB/s  ± 4368.2kB/s (3.73%) (N = 45, α = 99.9%)

After
Benchmark  (dataSet)          (compression)  (benchmarkFileFormat)  Score      Error
write      LINEITEM           NONE           TRINO_PARQUET           85.0MB/s  ± 2937.1kB/s (3.37%) (N = 45, α = 99.9%)
write      BIGINT_SEQUENTIAL  NONE           TRINO_PARQUET          159.7MB/s  ± 2377.8kB/s (1.45%) (N = 45, α = 99.9%)

Benchmark  (dataSet)          (compression)  (benchmarkFileFormat)  Score      Error
write      LINEITEM           NONE           TRINO_PARQUET          117.1MB/s  ± 5061.0kB/s (4.22%) (N = 45, α = 99.9%)
write      BIGINT_SEQUENTIAL  NONE           TRINO_PARQUET          305.5MB/s  ± 6889.1kB/s (2.20%) (N = 45, α = 99.9%)
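As a quick reading aid for the Before/After results, the relative write-throughput change can be computed from the reported scores. This small Java snippet (the class name `ThroughputDelta` is illustrative) only does that arithmetic on the Before and After numbers; it ignores the error bars, which in both cases (at most about 4.4MB/s) are far smaller than the observed differences of 22.6MB/s and 45.4MB/s.

```java
/** Relative write-throughput change, using the Before/After scores in MB/s. */
public final class ThroughputDelta
{
    private ThroughputDelta() {}

    /** Percentage change going from {@code before} to {@code after}. */
    public static double percentChange(double before, double after)
    {
        return (after - before) / before * 100.0;
    }

    public static void main(String[] args)
    {
        System.out.printf("LINEITEM: %.1f%%%n", percentChange(62.4, 85.0)); // prints LINEITEM: 36.2%
        System.out.printf("BIGINT_SEQUENTIAL: %.1f%%%n", percentChange(114.3, 159.7)); // prints BIGINT_SEQUENTIAL: 39.7%
    }
}
```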
Description
Improve native parquet writer performance for flat types
Type of change: improvement
Component: native parquet writer
User-facing summary: Improve native parquet writer performance for flat types
Documentation
(x) No documentation is needed.
( ) Sufficient documentation is included in this PR.
( ) Documentation PR is available with #prnumber.
( ) Documentation issue #issuenumber is filed, and can be handled later.
Release notes
( ) No release notes entries required.
(x) Release notes entries required with the following suggested text: