Skip to content

Fix ArrayIndexOutOfBoundsException when parquet statistics are ignored#19761

Merged
findepi merged 1 commit intotrinodb:masterfrom
jinyangli34:jinyang-parquet_ignore_statistics
Nov 23, 2023
Merged

Fix ArrayIndexOutOfBoundsException when parquet statistics are ignored#19761
findepi merged 1 commit intotrinodb:masterfrom
jinyangli34:jinyang-parquet_ignore_statistics

Conversation

@jinyangli34
Copy link
Copy Markdown
Contributor

@jinyangli34 jinyangli34 commented Nov 15, 2023

Description

Fix ArrayIndexOutOfBoundsException when using parquet ignore statistics. (#19760)
Add unit tests.

Additional context and related issues

parquetTupleDomains and parquetPredicates are empty when ignore statistics is used, can cause exception.

Release notes

( ) This is not user-visible or is docs only, and no release notes are required.
( ) Release notes are required. Please propose a release note for me.
( ) Release notes are required, with the following suggested text:

# Section
* Fix some things. ({issue}`19760`)

@cla-bot cla-bot bot added the cla-signed label Nov 15, 2023
@github-actions github-actions bot added tests:hive hive Hive connector labels Nov 15, 2023
@findinpath findinpath requested a review from findepi November 16, 2023 05:19
@findinpath
Copy link
Copy Markdown
Contributor

CI Failure TestHiveTransactionalTable > testReadFullAcidPartitioned [groups: hive_transactional] is related to #16315

@jinyangli34 jinyangli34 force-pushed the jinyang-parquet_ignore_statistics branch from 1f5084a to 3772e82 Compare November 16, 2023 05:25
@findinpath
Copy link
Copy Markdown
Contributor

Thank you @jinyangli34

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

if ignore-stats, this fills parquetTupleDomains with a series of TupleDomain.all objects, which is not very useful.

ionstead of this, let's change for (int i = 0; i < disjunctTupleDomains.size(); i++) { loop iteration to iterate over parquetTupleDomains.size() instead iof disjunctTupleDomains.size()

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That will skip the following blocks.add and blockStarts.add. Is that OK?

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@findinpath please chime in

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@findepi i agree with your suggestion.
Adapting the loop to use parquetTupleDomains.size() instead of disjunctTupleDomains.size() should give us the same result - ALL the blocks in the file.
@jinyangli34 the loop https://github.com/trinodb/trino/pull/19761/files#diff-bb16d0894036f6a50fd882558df497a4ddbf2907097b8a33118c4f6ce147753dR240 is to be used only if options.isIgnoreStatistics() is false.

That will skip the following blocks.add and blockStarts.add
This happens now as well right ?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hi @findinpath @findepi, to make sure I understand correctly, parquetTupleDomains will still hold TupleDomain.all objects, not an empty list. Is that correct?

@jinyangli34 jinyangli34 force-pushed the jinyang-parquet_ignore_statistics branch from 3772e82 to a542ca3 Compare November 17, 2023 23:09
Copy link
Copy Markdown
Contributor

@findinpath findinpath left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A slight reworking is needed in the PR.

@findinpath findinpath self-requested a review November 18, 2023 03:51
@jinyangli34 jinyangli34 force-pushed the jinyang-parquet_ignore_statistics branch 2 times, most recently from 2cb2996 to 04931ae Compare November 18, 2023 06:01
@findinpath findinpath requested a review from findepi November 20, 2023 11:18
@jinyangli34 jinyangli34 force-pushed the jinyang-parquet_ignore_statistics branch from 04931ae to 5dc360f Compare November 22, 2023 00:00
@raunaqmorarka raunaqmorarka changed the title fix ArrayIndexOutOfBoundsException when using parquet ignore statistics Fix ArrayIndexOutOfBoundsException when parquet statistics are ignored Nov 22, 2023
@findepi findepi merged commit 4cb3072 into trinodb:master Nov 23, 2023
@github-actions github-actions bot added this to the 434 milestone Nov 23, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

cla-signed hive Hive connector

Development

Successfully merging this pull request may close these issues.

4 participants