Skip to content

Avoid reading data in parquet ChunkedInputStream constructor#15552

Merged
raunaqmorarka merged 3 commits intotrinodb:masterfrom
raunaqmorarka:fix-lazy-read
Jan 3, 2023
Merged

Avoid reading data in parquet ChunkedInputStream constructor#15552
raunaqmorarka merged 3 commits intotrinodb:masterfrom
raunaqmorarka:fix-lazy-read

Conversation

@raunaqmorarka
Copy link
Copy Markdown
Member

@raunaqmorarka raunaqmorarka commented Dec 29, 2022

Description

Avoid reading data in parquet ChunkedInputStream constructor

Additional context and related issues

Fixes a perf regression in lazy reading of data from #15374

Release notes

( ) This is not user-visible or docs only and no release notes are required.
( ) Release notes are required, please propose a release note for me.
(x) Release notes are required, with the following suggested text:

# Hive, Hudi, Iceberg, Delta
* Fixes a bug which could lead to more data than necessary getting read from parquet files for queries with filters. ({issue}`15552`)

Copy link
Copy Markdown
Member

@lukasz-stec lukasz-stec left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

lgtm % minor comment

Chunks should be read lazily to avoid reading data in case
lazy loading of blocks can avoid reading a column due to a
selective filter on a preceding column

if (data == null) {
byte[] buffer = new byte[toIntExact(range.getLength())];
readerMemoryUsage.setBytes(buffer.length);
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: in practice it doesn't matter since memory usage is propagated in ScanFilterOperator in a lazy way anyway

@raunaqmorarka raunaqmorarka merged commit 949118b into trinodb:master Jan 3, 2023
@raunaqmorarka raunaqmorarka deleted the fix-lazy-read branch January 3, 2023 11:24
@github-actions github-actions bot added this to the 406 milestone Jan 3, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Development

Successfully merging this pull request may close these issues.

3 participants