Fix parquet reader batch size calculation #14094
raunaqmorarka merged 1 commit into trinodb:master from kabunchi:fix-parquet-reader-max-bytes
|
Could you elaborate on the regression that this fixes? |
lib/trino-parquet/src/main/java/io/trino/parquet/reader/ParquetReader.java
|
The issue here is that, since we didn't keep the maxBytesPerCell, we always assumed we had hit the max and called this line: |
|
Is it possible to write a unit test that shows the regression? |
@kabunchi could you try testing this by checking the size of the pages produced by ParquetReader#nextPage? |
Fixed regression in calculation of maxBytesPerCell that caused maxBatchSize to be small and degrade performance
|
I took a look at testing. It is not that simple: the batch size is neither fixed nor exposed, so predicting the page sizes or the number of pages is not trivial... |
Description
Fixed a regression in the calculation of maxBytesPerCell that
caused maxBatchSize to be too small, degrading performance
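To illustrate the kind of regression described here, below is a minimal, self-contained sketch. It is not the actual ParquetReader code: the class `BatchSizeSketch`, the method `simulate`, the flag `rememberMax`, and the two constants are hypothetical, and the update rule is an assumption based on the discussion in this PR (a per-column maximum that, when not stored, makes every batch look like a new maximum).

```java
// Hypothetical sketch, not Trino's ParquetReader. Names and constants are assumptions.
final class BatchSizeSketch {
    // Assumed cap on the number of rows per batch.
    static final int MAX_VECTOR_LENGTH = 1024;
    // Assumed byte budget for one batch (16 MiB).
    static final long MAX_READ_BLOCK_BYTES = 16L * 1024 * 1024;

    /**
     * Simulates reading {@code batches} batches of a single column whose cells
     * always measure {@code observedBytesPerCell} bytes (must be positive).
     *
     * @param rememberMax true models the fixed behaviour (the per-column maximum
     *                    is stored), false models the regression (it is never stored)
     * @return the batch size limit after the last batch
     */
    static int simulate(boolean rememberMax, int batches, long observedBytesPerCell) {
        long[] maxBytesPerCell = new long[1]; // one column in this sketch
        long maxCombinedBytesPerRow = 0;
        int maxBatchSize = MAX_VECTOR_LENGTH;

        for (int i = 0; i < batches; i++) {
            if (maxBytesPerCell[0] < observedBytesPerCell) {
                // Swap the old per-column maximum for the new one in the row total.
                maxCombinedBytesPerRow =
                        maxCombinedBytesPerRow - maxBytesPerCell[0] + observedBytesPerCell;
                if (rememberMax) {
                    // Without this store, the condition above is true on every batch
                    // and the row total keeps growing.
                    maxBytesPerCell[0] = observedBytesPerCell;
                }
                maxBatchSize = (int) Math.min(
                        MAX_VECTOR_LENGTH,
                        Math.max(1, MAX_READ_BLOCK_BYTES / maxCombinedBytesPerRow));
            }
        }
        return maxBatchSize;
    }
}
```

Under these assumed constants, with 64 KiB cells, `BatchSizeSketch.simulate(true, 10, 65536)` stays at 256 rows per batch, while `BatchSizeSketch.simulate(false, 10, 65536)` drops to 25 after ten batches and keeps shrinking toward 1 as more batches are read.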
Non-technical explanation
Fixes a possible performance regression in the Parquet reader introduced by a change in #13757
Release notes
( ) This is not user-visible and no release notes are required.
( ) Release notes are required, please propose a release note for me.
(x) Release notes are required, with the following suggested text: