Skip to content

Iceberg connector support ORC bloom filter feature#11732

Merged
findepi merged 3 commits intotrinodb:masterfrom
zhengxingmao:OrcBloomFilterForIceberg
May 24, 2022
Merged

Iceberg connector support ORC bloom filter feature#11732
findepi merged 3 commits intotrinodb:masterfrom
zhengxingmao:OrcBloomFilterForIceberg

Conversation

@zhengxingmao
Copy link

Description

Is this change a fix, improvement, new feature, refactoring, or other?
new feature
Is this a change to the core query engine, a connector, client library, or the SPI interfaces? (be specific)
for iceberg connector
How would you describe this change to a non-technical end user or system administrator?
Iceberg connector support ORC bloom filter feature

Related issues, pull requests, and links

Documentation

( ) No documentation is needed.
( ) Sufficient documentation is included in this PR.
( ) Documentation PR is available with #prnumber.
(x) Documentation issue #issuenumber is filed, and can be handled later.

Release notes

( ) No release notes entries required.
(x) Release notes entries required with the following suggested text:
Iceberg connector support ORC bloom filter feature

# Section
Iceberg connector support ORC bloom filter feature

@cla-bot
Copy link

cla-bot bot commented Mar 31, 2022

Thank you for your pull request and welcome to our community. We require contributors to sign our Contributor License Agreement, and we don't seem to have you on file. In order for us to review and merge your code, please submit the signed CLA to cla@trino.io. For more information, see https://github.com/trinodb/cla.

@raunaqmorarka
Copy link
Member

We're missing tests which exercise the functionality.
Please refactor TestOrcWithBloomFilters into a base class (move it to io.trino.testing package) and extend it to test both hive and iceberg connectors.

@zhengxingmao
Copy link
Author

At present, the community has not passed the cla protocol submitted by me. After passing, test cases will be added.@raunaqmorarka

@raunaqmorarka
Copy link
Member

@martint please help with CLA and enable CI here

@findepi
Copy link
Member

findepi commented Apr 15, 2022

I approved the CI run.

@zhengxingmao please rebase and add the tests so that the PR is ready for merge when CLA gets processed.

@zhengxingmao zhengxingmao force-pushed the OrcBloomFilterForIceberg branch from 2037fdf to c0e2e9e Compare April 16, 2022 06:38
@cla-bot
Copy link

cla-bot bot commented Apr 16, 2022

Thank you for your pull request and welcome to our community. We require contributors to sign our Contributor License Agreement, and we don't seem to have you on file. In order for us to review and merge your code, please submit the signed CLA to cla@trino.io. For more information, see https://github.com/trinodb/cla.

@zhengxingmao
Copy link
Author

zhengxingmao commented Apr 16, 2022

I approved the CI run.

@zhengxingmao please rebase and add the tests so that the PR is ready for merge when CLA gets processed.

Done @findepi @raunaqmorarka

@zhengxingmao zhengxingmao force-pushed the OrcBloomFilterForIceberg branch from c0e2e9e to f8dd9c2 Compare April 18, 2022 03:12
@cla-bot
Copy link

cla-bot bot commented Apr 18, 2022

Thank you for your pull request and welcome to our community. We require contributors to sign our Contributor License Agreement, and we don't seem to have you on file. In order for us to review and merge your code, please submit the signed CLA to cla@trino.io. For more information, see https://github.com/trinodb/cla.

@zhengxingmao zhengxingmao force-pushed the OrcBloomFilterForIceberg branch from f8dd9c2 to 6dcb5fd Compare April 18, 2022 05:04
@cla-bot
Copy link

cla-bot bot commented Apr 18, 2022

Thank you for your pull request and welcome to our community. We require contributors to sign our Contributor License Agreement, and we don't seem to have you on file. In order for us to review and merge your code, please submit the signed CLA to cla@trino.io. For more information, see https://github.com/trinodb/cla.

@ebyhr
Copy link
Member

ebyhr commented Apr 18, 2022

@cla-bot check

@cla-bot cla-bot bot added the cla-signed label Apr 18, 2022
@cla-bot
Copy link

cla-bot bot commented Apr 18, 2022

The cla-bot has been summoned, and re-checked this pull request!

Copy link
Member

@alexjo2144 alexjo2144 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you add some documentation to iceberg.rst?

@zhengxingmao zhengxingmao requested a review from alexjo2144 April 21, 2022 01:10
@zhengxingmao zhengxingmao force-pushed the OrcBloomFilterForIceberg branch 2 times, most recently from a2d9fa0 to e33a597 Compare April 21, 2022 04:57
@github-actions github-actions bot added the docs label Apr 21, 2022
@zhengxingmao zhengxingmao force-pushed the OrcBloomFilterForIceberg branch from e33a597 to 7ab0706 Compare April 21, 2022 06:22
@zhengxingmao zhengxingmao force-pushed the OrcBloomFilterForIceberg branch 2 times, most recently from 0cfb997 to 243da1b Compare May 12, 2022 10:58
@findepi
Copy link
Member

findepi commented May 16, 2022

waiting for @Praveen2112 clarification in #11732 (comment)

@Praveen2112 did you have a chance to look into this?

cc @raunaqmorarka

Copy link
Member

@Praveen2112 Praveen2112 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM. One QQ to @findepi In case of error code can we stick to code from Iceberg instead of Hive ?

If yes can we update the error code ?

@zhengxingmao zhengxingmao force-pushed the OrcBloomFilterForIceberg branch from 243da1b to 09dce05 Compare May 18, 2022 02:22
Copy link
Member

@findepi findepi left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

pending clarity on #11732 (comment)

@zhengxingmao zhengxingmao force-pushed the OrcBloomFilterForIceberg branch from 09dce05 to b7bd78e Compare May 19, 2022 08:31
@zhengxingmao zhengxingmao requested a review from findepi May 19, 2022 08:39
@zhengxingmao zhengxingmao force-pushed the OrcBloomFilterForIceberg branch from b7bd78e to c056630 Compare May 19, 2022 09:01
@findepi
Copy link
Member

findepi commented May 19, 2022

@zhengxingmao thanks for removing the sort order.
it's awesome how quickly you adapt your code to review comments

however, we seem to struggle to reach clarity whether/why we wanted the test files to be sorted.
Please wait a bit while @Praveen2112 has time to explain this to me.

@findepi
Copy link
Member

findepi commented May 23, 2022

@zhengxingmao thanks for your patience. Seems we reached consensus: #11732 (comment)

Can you please add a preparatory commit that removes sorting from hive bloom filter tests?
This should come with a commit message explaining that it was not needed.

I know this adds more work for you, but this way these changes will make more sense, when viewed later on by someone who didn't participate in the discussion here.

@zhengxingmao zhengxingmao force-pushed the OrcBloomFilterForIceberg branch from c056630 to 911f9ce Compare May 23, 2022 13:02
@zhengxingmao
Copy link
Author

@zhengxingmao thanks for your patience. Seems we reached consensus: #11732 (comment)

Can you please add a preparatory commit that removes sorting from hive bloom filter tests? This should come with a commit message explaining that it was not needed.

I know this adds more work for you, but this way these changes will make more sense, when viewed later on by someone who didn't participate in the discussion here.

Done.Thank you for reviewing.

@zhengxingmao zhengxingmao force-pushed the OrcBloomFilterForIceberg branch from 911f9ce to 2c0de8c Compare May 24, 2022 10:42
@zhengxingmao zhengxingmao force-pushed the OrcBloomFilterForIceberg branch from 2c0de8c to e91a37d Compare May 24, 2022 11:01
@findepi findepi merged commit b89aac6 into trinodb:master May 24, 2022
findepi pushed a commit that referenced this pull request May 24, 2022
Please refer to the link for detailed discussion information
#11732 (comment)
- 100
* - ``hive.orc.bloom-filters.enabled``
- Enable bloom filters for predicate pushdown.
- ``false``
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is pre-existing, so wasn't a prerequisite for a merge here. @dain @Praveen2112 @raunaqmorarka do you know why default is false?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think there used to be bugs in the past, but not sure why it isn't enabled now. In most cases bloom filter will not exist and it will be a no-op

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@raunaqmorarka do you want to switch it on by default now?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, AFAIK there shouldn't be a problem in enabling it now

@findepi
Copy link
Member

findepi commented May 24, 2022

@zhengxingmao thanks for your contribution

@github-actions github-actions bot added this to the 382 milestone May 24, 2022
This was referenced May 24, 2022
@zhengxingmao
Copy link
Author

@zhengxingmao thanks for your contribution

@raunaqmorarka @alexjo2144 @Praveen2112 @findepi Thank you for your review.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Development

Successfully merging this pull request may close these issues.

6 participants