Write out label_mask instead of labels in olmocore tokenization script #749
Merged
jacob-morrison merged 48 commits into olmo3-chat-templates from tyler/olmocore-tokenization-bug-fix-label-mask on Jul 21, 2025
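The title names the change but the diff itself is not shown on this page. As a rough illustration only, the sketch below assumes the common convention that masked positions in `labels` carry an ignore index of -100; that value, the function name `labels_to_label_mask`, and the sample token IDs are all assumptions for illustration and are not taken from this PR.

```python
# Hypothetical sketch: derive a boolean label mask from a labels sequence.
# Assumption: masked positions use the conventional ignore index -100.
IGNORE_INDEX = -100


def labels_to_label_mask(labels, ignore_index=IGNORE_INDEX):
    """Return True where a token should contribute to the loss, False where it is masked."""
    return [label != ignore_index for label in labels]


# Made-up example: the first two tokens are prompt tokens and are masked out.
input_ids = [101, 7592, 2088, 102]
labels = [-100, -100, 2088, 102]

label_mask = labels_to_label_mask(labels)
print(label_mask)
```

Under this assumption the mask holds the same information as `labels`, since the unmasked label values duplicate the corresponding entries of `input_ids`.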
Commits
122b65d Write out label_mask instead of labels in olmocore tokenization script (tyler-romero)
671f186 . (tyler-romero)
af5dadd . (tyler-romero)
a45a88c Better tqdm (tyler-romero)
5fdde57 less frequent updates tqdm (tyler-romero)
6db1ad4 debug statement
454d690 add tylertest chat template (tyler-romero)
e9c0902 Update dataset_transformation.py (jacob-morrison)
f42cbc1 Update dataset_transformation.py (jacob-morrison)
e0c735d reorder (tyler-romero)
98fae2e fix (tyler-romero)
ab67344 save too (tyler-romero)
0520b62 auto add chat template (tyler-romero)
56f647d shuffle before write (tyler-romero)
5287676 Update dataset_transformation.py (jacob-morrison)
39b7265 Fix generation_config (tyler-romero)
fc009f1 test (jacob-morrison)
0521be8 add olmo toolu chat template (jacob-morrison)
e61fc56 add chat template (jacob-morrison)
5e66013 turn off generation config (jacob-morrison)
1b6e1d6 new chat templates (jacob-morrison)
1e89733 fix (jacob-morrison)
2a528b5 Merge branch 'main' into tyler/olmocore-tokenization-bug-fix-label-mask (jacob-morrison)
b012b93 use my workspace (jacob-morrison)
4deb643 Merge branch 'olmo3-chat-templates' into tyler/olmocore-tokenization-… (jacob-morrison)
a959ec9 Merge branch 'main' into tyler/olmocore-tokenization-bug-fix-label-mask (jacob-morrison)
bd02fc0 test (jacob-morrison)
27cb495 test (jacob-morrison)
de1424d update (jacob-morrison)
1696897 update (jacob-morrison)
83ea35d update doc (jacob-morrison)
797c54d delete (jacob-morrison)
728710f reset (jacob-morrison)
f7c2371 update (jacob-morrison)
f913633 update (jacob-morrison)
1266096 update (jacob-morrison)
065da6a remove comment (jacob-morrison)
ac04d2d update (jacob-morrison)
f77a53a Also write metadata files to support GCS (tyler-romero)
3231305 data mixing qol fixes (jacob-morrison)
c37e26a fix style + update saving logic for ppo/grpo (jacob-morrison)
ca54316 Merge branch 'tyler/olmocore-tokenization-bug-fix-label-mask' of http… (jacob-morrison)
ae33877 add logging (jacob-morrison)
ba0f05f update sampling (jacob-morrison)
a980e4e Merge branch 'main' into tyler/olmocore-tokenization-bug-fix-label-mask (jacob-morrison)
695acb4 Merge branch 'olmo3-chat-templates' into tyler/olmocore-tokenization-… (jacob-morrison)
550764c update oe-eval script (jacob-morrison)
2f5baf5 Merge branch 'tyler/olmocore-tokenization-bug-fix-label-mask' of http… (jacob-morrison)