Fix ArrayIndexOutOfBoundsException when parquet statistics are ignored#19761
plugin/trino-hive/src/main/java/io/trino/plugin/hive/parquet/ParquetPageSourceFactory.java
CI Failure |
1f5084a to 3772e82
Thank you @jinyangli34
If ignore-stats is set, this fills parquetTupleDomains with a series of TupleDomain.all objects, which is not very useful.
Instead of this, let's change the for (int i = 0; i < disjunctTupleDomains.size(); i++) { loop to iterate over parquetTupleDomains.size() instead of disjunctTupleDomains.size().
That will skip the following blocks.add and blockStarts.add. Is that OK?
@findepi I agree with your suggestion.
Adapting the loop to use parquetTupleDomains.size() instead of disjunctTupleDomains.size() should give us the same result - ALL the blocks in the file.
@jinyangli34 the loop https://github.com/trinodb/trino/pull/19761/files#diff-bb16d0894036f6a50fd882558df497a4ddbf2907097b8a33118c4f6ce147753dR240 is to be used only if options.isIgnoreStatistics() is false.
> That will skip the following blocks.add and blockStarts.add

This happens now as well, right?
Hi @findinpath @findepi, to make sure I understand correctly, parquetTupleDomains will still hold TupleDomain.all objects, not an empty list. Is that correct?
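A minimal sketch of the approach discussed in this thread, not the actual ParquetPageSourceFactory code. The class name BlockSelectionSketch, the selectBlocks method, and the use of plain Long block starts with java.util.function.Predicate are all hypothetical stand-ins. It assumes the predicate loop runs only when statistics are not ignored, and that its bound comes from the list that is actually indexed.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Predicate;

public class BlockSelectionSketch
{
    // Hypothetical helper: each Long stands in for the start of one Parquet block.
    static List<Long> selectBlocks(List<Long> allBlockStarts, List<Predicate<Long>> predicates, boolean ignoreStatistics)
    {
        if (ignoreStatistics) {
            // No statistics-based pruning: keep every block and never index the predicate list.
            return new ArrayList<>(allBlockStarts);
        }
        List<Long> selected = new ArrayList<>();
        for (Long blockStart : allBlockStarts) {
            // The loop bound is taken from the list being indexed, so get(i) stays in range.
            for (int i = 0; i < predicates.size(); i++) {
                if (predicates.get(i).test(blockStart)) {
                    selected.add(blockStart);
                    break; // one matching predicate is enough to keep the block
                }
            }
        }
        return selected;
    }

    public static void main(String[] args)
    {
        List<Long> allBlockStarts = List.of(0L, 100L, 200L);
        List<Predicate<Long>> noPredicates = List.of();
        // With statistics ignored, the empty predicate list is harmless and all blocks are kept.
        System.out.println(selectBlocks(allBlockStarts, noPredicates, true));
    }
}
```

Note that with ignoreStatistics set to false and an empty predicate list, the inner loop body never runs and no block is added, which is the skipped blocks.add case raised above. The early return is what keeps all blocks in the ignore-statistics case.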
3772e82 to a542ca3
findinpath left a comment
A slight reworking is needed in the PR.
2cb2996 to 04931ae
...trino-hive/src/test/java/io/trino/plugin/hive/parquet/TestParquetReaderIgnoreStatistics.java
04931ae to 5dc360f
Description
Fix ArrayIndexOutOfBoundsException when using parquet ignore statistics. (#19760)
Add unit tests.
Additional context and related issues
parquetTupleDomains and parquetPredicates are empty when statistics are ignored, so indexing into them can cause an ArrayIndexOutOfBoundsException.
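To illustrate the failure mode, here is a simplified sketch rather than the real reader code. The class name IgnoreStatisticsFailureSketch, the pairUp method, and the String elements are hypothetical. It only shows what happens when a loop bound is taken from one list while the index is applied to another, shorter list.

```java
import java.util.ArrayList;
import java.util.List;

public class IgnoreStatisticsFailureSketch
{
    // Hypothetical helper: counts the domains that can be paired with a predicate at the same index.
    static int pairUp(List<String> disjunctDomains, List<String> parquetPredicates)
    {
        int paired = 0;
        // The bound comes from disjunctDomains, but the index is applied to parquetPredicates.
        for (int i = 0; i < disjunctDomains.size(); i++) {
            String predicate = parquetPredicates.get(i); // throws if parquetPredicates is shorter
            if (predicate != null) {
                paired++;
            }
        }
        return paired;
    }

    public static void main(String[] args)
    {
        List<String> disjunctDomains = List.of("domain-0", "domain-1");
        // Assumption for this sketch: the predicate list is left empty, as described for the ignore-statistics case.
        List<String> parquetPredicates = new ArrayList<>();
        try {
            pairUp(disjunctDomains, parquetPredicates);
        }
        catch (IndexOutOfBoundsException e) {
            // ArrayIndexOutOfBoundsException is a subclass, so it would be caught here as well.
            System.out.println("failed: " + e.getClass().getSimpleName());
        }
    }
}
```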
Release notes
( ) This is not user-visible or is docs only, and no release notes are required.
( ) Release notes are required. Please propose a release note for me.
( ) Release notes are required, with the following suggested text: