Skip to content

Conversation

@mapleFU
Copy link
Member

@mapleFU mapleFU commented Feb 2, 2023

Rationale for this change

  • adding a interface OptimalNumOfBytes, because OptimalNumOfBits is confusing...
  • BloomFilter accept a MemoryPool as input argument

What changes are included in this PR?

Are these changes tested?

They're already tested...

Are there any user-facing changes?

No. (But user may misuse BloomFilter::Init previously)

@mapleFU mapleFU marked this pull request as ready for review February 2, 2023 05:10
@mapleFU mapleFU requested a review from wjones127 as a code owner February 2, 2023 05:10
@mapleFU
Copy link
Member Author

mapleFU commented Feb 2, 2023

@pitrou @wjones127 PTAL

// Maximum Bloom filter size, it sets to HDFS default block size 128MB
// This value will be reconsidered when implementing Bloom filter producer.
static constexpr uint32_t kMaximumBloomFilterBytes = 128 * 1024 * 1024;
static constexpr uint64_t kMaximumBloomFilterBytes = 128 * 1024 * 1024;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should the return type of OptimalNumOfBits be changed to uint64_t as well?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Currently just keep it uint32_t is ok, since it could support 4G. For Bits, its 1G Bytes, and it's 8 times greater than kMaximumBloomFilterBytes.

@kou kou changed the title MINOR: [C++][Parquet] Parquet: Some tiny updates on bf MINOR: [C++][Parquet] Some tiny updates on bf Feb 2, 2023
@kou
Copy link
Member

kou commented Feb 2, 2023

Could you open a new issue for this?
See also our "MINOR" definition: https://github.com/apache/arrow/blob/master/CONTRIBUTING.md#minor-fixes

@mapleFU mapleFU changed the title MINOR: [C++][Parquet] Some tiny updates on bf GH-15164: [C++][Parquet] Some tiny updates on bf Feb 2, 2023
@github-actions
Copy link

github-actions bot commented Feb 2, 2023

@pitrou pitrou changed the title GH-15164: [C++][Parquet] Some tiny updates on bf GH-15164: [C++][Parquet] Minor API improvements for BloomFilter Feb 2, 2023
@pitrou
Copy link
Member

pitrou commented Feb 2, 2023

@mapleFU Can you take a look at the CI failures? Thanks!

@mapleFU
Copy link
Member Author

mapleFU commented Feb 2, 2023

That's my fault. I didn't add explicit after add an ctor argument.

@mapleFU mapleFU force-pushed the parquet/bf-some-tiny-updates branch from c6144bc to 52f1281 Compare February 5, 2023 14:20
@mapleFU
Copy link
Member Author

mapleFU commented Feb 5, 2023

No idea why these two tests still failed... The error messages are to confusing for me...

@mapleFU
Copy link
Member Author

mapleFU commented Feb 5, 2023

Oh, seems #34038 cause the error... Let's waiting it to be merged...

@mapleFU
Copy link
Member Author

mapleFU commented Feb 7, 2023

@pitrou Seems all tests passed, mind take a look?

@wjones127 wjones127 changed the title GH-15164: [C++][Parquet] Minor API improvements for BloomFilter GH-34078: [C++][Parquet] Minor API improvements for BloomFilter Feb 8, 2023
@github-actions
Copy link

github-actions bot commented Feb 8, 2023

@github-actions
Copy link

github-actions bot commented Feb 8, 2023

⚠️ GitHub issue #34078 has been automatically assigned in GitHub to PR creator.

@github-actions
Copy link

github-actions bot commented Feb 8, 2023

⚠️ GitHub issue #34078 has no components, please add labels for components.

@wjones127
Copy link
Member

Hi @mapleFU we need a unique issue for each PR. I created #34078 for this one, but I need to you assign yourself to it before I can merge. Could you do that? (For some reason GitHub won't let me.)

@rok
Copy link
Member

rok commented Feb 8, 2023

Hi @mapleFU we need a unique issue for each PR. I created #34078 for this one, but I need to you assign yourself to it before I can merge. Could you do that? (For some reason GitHub won't let me.)

Perhaps @mapleFU needs to comment on #34078 before you can assign him.

@mapleFU
Copy link
Member Author

mapleFU commented Feb 9, 2023

I've taken the issue now

@wjones127 wjones127 merged commit d512dd2 into apache:master Feb 9, 2023
@ursabot
Copy link

ursabot commented Feb 10, 2023

Benchmark runs are scheduled for baseline = 9cb6fd6 and contender = d512dd2. d512dd2 is a master commit associated with this PR. Results will be available as each benchmark for each run completes.
Conbench compare runs links:
[Finished ⬇️0.0% ⬆️0.0%] ec2-t3-xlarge-us-east-2
[Failed ⬇️0.52% ⬆️0.06%] test-mac-arm
[Finished ⬇️0.0% ⬆️0.0%] ursa-i9-9960x
[Finished ⬇️0.22% ⬆️0.03%] ursa-thinkcentre-m75q
Buildkite builds:
[Finished] d512dd2e ec2-t3-xlarge-us-east-2
[Finished] d512dd2e test-mac-arm
[Finished] d512dd2e ursa-i9-9960x
[Finished] d512dd2e ursa-thinkcentre-m75q
[Finished] 9cb6fd66 ec2-t3-xlarge-us-east-2
[Failed] 9cb6fd66 test-mac-arm
[Finished] 9cb6fd66 ursa-i9-9960x
[Finished] 9cb6fd66 ursa-thinkcentre-m75q
Supported benchmarks:
ec2-t3-xlarge-us-east-2: Supported benchmark langs: Python, R. Runs only benchmarks with cloud = True
test-mac-arm: Supported benchmark langs: C++, Python, R
ursa-i9-9960x: Supported benchmark langs: Python, R, JavaScript
ursa-thinkcentre-m75q: Supported benchmark langs: C++, Java

@mapleFU mapleFU deleted the parquet/bf-some-tiny-updates branch February 10, 2023 03:36
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

[C++][Parquet] Clean up BloomFilter API

7 participants