
Contributor

yabola commented Mar 2, 2023

What changes were proposed in this pull request?

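StatisticsFilter (min/max), DictionaryFilter, and BloomFilterImpl are applied one after another to filter row groups. However, when a column has a dictionary filter and all of its pages are dictionary-encoded, there is no need to use BloomFilterImpl: the dictionary then holds every distinct value in the column chunk, so the dictionary filter's answer is already exact.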
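As a rough illustration of the idea (not the code in this PR), the short-circuit could be sketched as below. The class `RowGroupFilterSketch`, the `ColumnChunkInfo` type, and the `canDropRowGroup` method are hypothetical names made up for this example and are not part of parquet-mr; the three filters are modelled as plain `BooleanSupplier`s that return true when the row group can be dropped.

```java
import java.util.function.BooleanSupplier;

public class RowGroupFilterSketch {

    // Hypothetical summary of one column chunk; not a parquet-mr class.
    static final class ColumnChunkInfo {
        final boolean hasDictionary;
        final boolean allPagesDictionaryEncoded;

        ColumnChunkInfo(boolean hasDictionary, boolean allPagesDictionaryEncoded) {
            this.hasDictionary = hasDictionary;
            this.allPagesDictionaryEncoded = allPagesDictionaryEncoded;
        }

        // The dictionary lists every distinct value only if no page fell back
        // to a non-dictionary encoding.
        boolean dictionaryIsExhaustive() {
            return hasDictionary && allPagesDictionaryEncoded;
        }
    }

    // Returns true if the row group can be skipped for the given predicate.
    static boolean canDropRowGroup(ColumnChunkInfo column,
                                   BooleanSupplier statsCanDrop,
                                   BooleanSupplier dictionaryCanDrop,
                                   BooleanSupplier bloomCanDrop) {
        // 1. Cheapest check first: min/max statistics.
        if (statsCanDrop.getAsBoolean()) {
            return true;
        }
        // 2. An exhaustive dictionary gives an exact answer, so the bloom
        //    filter is never consulted in this branch.
        if (column.dictionaryIsExhaustive()) {
            return dictionaryCanDrop.getAsBoolean();
        }
        // 3. Otherwise fall back to the probabilistic bloom filter.
        return bloomCanDrop.getAsBoolean();
    }

    public static void main(String[] args) {
        int[] bloomCalls = {0};
        BooleanSupplier bloom = () -> { bloomCalls[0]++; return false; };

        ColumnChunkInfo fullyEncoded = new ColumnChunkInfo(true, true);
        boolean drop = canDropRowGroup(fullyEncoded, () -> false, () -> false, bloom);
        System.out.println("drop=" + drop + ", bloom calls=" + bloomCalls[0]);
    }
}
```

In this sketch the bloom filter supplier is only evaluated when the dictionary cannot give a definitive answer, which is where the performance gain would come from.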

Why are the changes needed?

Improves performance when filtering row groups.
In the old Parquet v1, a BloomFilter is still generated even if all of the column's pages are dictionary-encoded (this bug is fixed in #1033).

Further discussion can be found in #1023 (comment).

yabola changed the title from "PARQUET-2237 Improve performance by skipping BloomFilter when there is already a dictionary filter" to "PARQUET-2237 Improve performance by skipping BloomFilter when column has a dictionary filter" on Mar 2, 2023
Contributor Author

yabola commented Mar 3, 2023

@wgtmac @gszadovszky If you have time, please take a look. Thank you.

yabola marked this pull request as draft on Mar 6, 2023, 00:44
Contributor Author

yabola commented Mar 6, 2023

The implementation of "and" and "or" in this PR is incorrect; I want to switch back to the previous implementation in #1023.

yabola closed this on Mar 6, 2023