Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Added i) directory support, ii) FP rate args, iii) No-Save option #11

Open
wants to merge 5 commits into
base: main
Choose a base branch
from

Conversation

revbucket
Copy link

Several changes to main.rs:

  1. Added progress bar printouts vs printouts at each filename (tried to use similar formatting as in wimbd)
  2. Added directory support for inputs (can pass dir as input now, only selects .json*.gz files). Same as wimbd
  3. Added option to not save bloom filter at the end (but update during)
  4. Added option to compute filter size dynamically given a desired false-positive rate

@revbucket
Copy link
Author

Oh and some more notes about point 4:
We search for a size of bloom filter using an accurate ngram count and then find the smallest filter that results in that fp_rate. We return min(filter_size, 0.90 * systemRAM) (so we don't allocate too much memory -- maybe I should warn if we get to this point though!)

Copy link
Member

@dirkgr dirkgr left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What happens with the progress bar if the output of bff is piped into a file? Or the terminal is in a weird state?

This is why I generally prefer sticking to writing to stdout and stderr, one line at a time, and that's all. It causes the least amount of trouble in unexpected settings.

Copy link
Member

@dirkgr dirkgr left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have one suggestion. If that's too hard, let's ship it like this.

src/main.rs Outdated
@@ -609,6 +619,10 @@ fn main() {
let pbar = pbar.clone();

threadpool.execute(move || {
if args.no_progress {
println!("Processing {input:?}...");

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Extra newline?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

😅
Yes will fix this before shipping

src/main.rs Outdated
Comment on lines 638 to 639
&pbar,
args.no_progress,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can Arc pointers be null? Then you could save a parameter and pass in null here instead.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes...? Pretty sure we can wrap with an option and then we don't have to pass the second parameter

@revbucket revbucket requested a review from dirkgr March 13, 2024 19:18
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants