Big OpenLM/DCLM <-> AI2 PR # 1 #12

Open
wants to merge 67 commits into base: main

revbucket

Lots of changes here (this may be more of a refactor than a PR, but it will still require some heavy code review and discussion about which changes to keep or fold in).

Summary of changes:

  • Added commands for bff and sysreq, to get a sense of how much memory a given BFF run will require
  • Changed some argument defaults:
    • min-ngram/max-ngram now default to [20,20]
    • the bloom filter file is not saved by default (saving can still be specified)
    • annotations have been merged into a single argument
  • Added a progress bar (a no-progress-bar arg is also present)
  • More abstraction/functions to break things up and, eventually, avoid repeating code when I push the S3 PR
  • Added a BOTH-level removal type (some discussion of what this does is in the RemoveType enum)
  • Added some printouts of BFF sparsity, removal rates, and time
  • Misc performance-y things, like parallel iteration in some places
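
For reviewers who want a rough feel for what a memory estimate and a sparsity printout involve, here is a minimal sketch using the textbook Bloom filter sizing formulas. It is not the code in this PR: the function names, the parameters (`expected_ngrams`, `fp_rate`), and the reading of "sparsity" as the fraction of bits still unset are all assumptions made for illustration.

```python
import math


def bloom_filter_requirements(expected_ngrams: int, fp_rate: float) -> tuple[int, int]:
    """Return (bits, num_hashes) for a standard Bloom filter.

    Hypothetical helper: uses the textbook formulas
        m = -n * ln(p) / (ln 2)^2   and   k = (m / n) * ln 2,
    which may differ from what the sysreq command actually computes.
    """
    if expected_ngrams <= 0:
        raise ValueError("expected_ngrams must be positive")
    if not 0.0 < fp_rate < 1.0:
        raise ValueError("fp_rate must be strictly between 0 and 1")
    bits = math.ceil(-expected_ngrams * math.log(fp_rate) / (math.log(2) ** 2))
    num_hashes = max(1, round(bits / expected_ngrams * math.log(2)))
    return bits, num_hashes


def expected_sparsity(bits: int, num_hashes: int, inserted: int) -> float:
    """Expected fraction of bits still zero after `inserted` insertions.

    Assumption: "sparsity" here means the share of unset bits,
    approximated by exp(-k * n / m).
    """
    return math.exp(-num_hashes * inserted / bits)


if __name__ == "__main__":
    n, p = 1_000_000, 0.01  # illustrative values only
    bits, k = bloom_filter_requirements(n, p)
    print(f"bits={bits} (~{bits / 8 / 2**20:.2f} MiB), hashes={k}")
    print(f"expected sparsity after {n} inserts: {expected_sparsity(bits, k, n):.3f}")
```

With a near-optimal number of hash functions, a filter filled to its design capacity ends up with roughly half of its bits set, so a printed sparsity well below that suggests the filter is undersized for the run.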

Matt Jordan and others added 30 commits February 29, 2024 12:45
… to describe some new features

Sure, LGTM
Matt Jordan and others added 30 commits March 26, 2024 17:06
…ormatting, iii) better whitespace/empty document filtration

some updates (making this pr to view diff and keep an eye on changes)


2 participants