upstream/main merge and dataset stats #36

garrett361 · 2025-07-23T16:01:54Z

Merge upstream/main into this main. Also uses the upstream version of the dataset stats printing.

* Added scripts to run benchmarks. * Removed install script. * Added install script back.

* first pass remap verifier * make judge json parsing a little more robust * typoooooooo * typoooooooo * fix logic... * clean logging naming up

Co-authored-by: Michael Noukhovitch <[email protected]>

* add punk tokenizer * fix up command

* Made changes.
* Switched to use ray.util.queue.Queue instead of a custom RayQueue class.
* Now, only handles new version.
* Updated benchmark_generators.py and test_grpo_fast.py.
* Cleaned up code from Claude.
* training_step defaults to None.
* Added an info dataclass to replace the tuple.
* Removes assumption that queries_prompt_Q and inference_results_Q are in sync by moving queries_prompt_Q to be a map.
* Cleaned up benchmark.
* Added code to split batch sizes.
* Removed benchmark scripts, which are now in a separate PR.
* Now, we create all Ray queues in main, and pass them in as appropriate.
* Removed changes.
* Test changes.
* Linter passes.
* Added tests.
* Now, we index with the dataset indices.
* Checks and tests pass.
* Ran linter.
* Added benchmark scripts back. Whoops.
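The bullets above mention replacing a tuple with an info dataclass and keeping pending queries in a map indexed by dataset index, so that results no longer need to arrive in the same order as the prompts. The following is only a minimal sketch of that idea, not the code from the PR: the names `PromptRequest`, `InferenceResult`, and `collect` are hypothetical, and the standard-library `queue.Queue` stands in for `ray.util.queue.Queue` so the example runs without Ray.

```python
import queue
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple


@dataclass
class PromptRequest:
    """Hypothetical info dataclass standing in for the tuple mentioned in the PR."""
    dataset_index: int
    prompt: str
    training_step: Optional[int] = None  # mirrors "training_step defaults to None"


@dataclass
class InferenceResult:
    """Hypothetical result record that carries the dataset index it belongs to."""
    dataset_index: int
    response: str


def collect(pending: Dict[int, PromptRequest],
            results_q: "queue.Queue[InferenceResult]",
            n: int) -> List[Tuple[str, str]]:
    """Pair each result with its prompt by dataset index, not by arrival order."""
    pairs = []
    for _ in range(n):
        result = results_q.get()
        request = pending.pop(result.dataset_index)  # map lookup replaces a synced queue
        pairs.append((request.prompt, result.response))
    return pairs


# Prompts are registered in one order...
pending = {
    7: PromptRequest(7, "2+2?"),
    3: PromptRequest(3, "capital of France?"),
}
# ...and results come back in a different order.
results_q: "queue.Queue[InferenceResult]" = queue.Queue()
results_q.put(InferenceResult(3, "Paris"))
results_q.put(InferenceResult(7, "4"))

pairs = collect(pending, results_q, 2)
```

Because every result names its own dataset index, the consumer does not depend on the two queues staying in lockstep.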

* Set new default value for num_samples.
* Now run N batches at once.
* Different batch size.
* Fix pack length.
* Fix pack length.
* Fix wasted compute % (was accidentally multiplying by 100), and fix num rollouts (was referencing the wrong variable).
* Now, we save benchmark results to CSV.
* Now show a percentage for time spent generating.
* Updated benchmark saving code.
* Fixed syntax error.
* Fixed benchmark.
* Fixed timing code.
* Removed changes to vllm_utils3.py.
* Now, we actually write the data to disk.
* Bigger batch.
* Modified benchmark.
* Undid changes to benchmark script.
* Temp change.
* Undid changes to benchmark script.

it was only being installed in regular Dockerfile Co-authored-by: Michael Noukhovitch <[email protected]> Co-authored-by: Saurabh Shah <[email protected]>

Co-authored-by: Michael Noukhovitch <[email protected]>

* binary reward for code * style * binary code reward flag -> pass rate reward threshold
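One plausible reading of "binary code reward flag -> pass rate reward threshold" is that the reward is zeroed whenever the fraction of passing tests falls below a configurable threshold. The sketch below assumes that reading; the function name `thresholded_reward` and its exact behavior are hypothetical and are not taken from the PR.

```python
def thresholded_reward(pass_rate: float, threshold: float) -> float:
    """Return the pass rate as the reward, or 0.0 if it is below the threshold.

    Assumed semantics (not confirmed by the PR):
    - threshold = 1.0 gives a binary reward (1.0 only if every test passes);
    - threshold = 0.0 gives the raw pass rate.
    """
    if not 0.0 <= pass_rate <= 1.0:
        raise ValueError("pass_rate must be in [0, 1]")
    return pass_rate if pass_rate >= threshold else 0.0


binary_partial = thresholded_reward(0.5, threshold=1.0)   # partial pass, strict threshold
binary_full = thresholded_reward(1.0, threshold=1.0)      # all tests pass
raw_partial = thresholded_reward(0.5, threshold=0.0)      # no thresholding
```

Under this assumption a single float covers both the old binary behavior and the graded pass-rate reward, which would explain replacing the flag with a threshold.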

* Now, we run individual prompts through the queue.
* Fixed issues.
* Ran linter.
* Fixed linter errors.
* Code lints.
* Test passes.
* Ran linter.
* Ensures that we send single prompts as requests.
* Now, code lints.
* Cleaned up code.
* Fixes test.
* Linter passes.
* Cleaned test up.
* Removed redundant comments.

* Adds flashinfer dep.
* Now, open_instruct builds even on mac.
* Updated install instructions to add flash-infer.
* Now, we set flashinfer as the default attention backend.
* Added flashinfer to the base dockerfile.
* Ran linter.
* Removed extra changes to mason.py.
* Undid changes to uv.lock.
* Updated requirements.txt
* Updated flash-attn version.

---------

Co-authored-by: Hamish Ivison <[email protected]>

* delete function Signed-off-by: Yu Chin Fabian Lim <[email protected]> * Update open_instruct/dataset_transformation.py --------- Signed-off-by: Yu Chin Fabian Lim <[email protected]> Co-authored-by: Hamish Ivison <[email protected]>

prev-branch: padding-free-squashing-7 Co-authored-by: Hamish Ivison <[email protected]>

…" (allenai#804) This reverts commit 541058c.

* Fix misnamed variables. * Ran linter.

Co-authored-by: Hamish Ivison <[email protected]>

…lenai#765) Adds new olmo-core-compatible chat templates. Includes:

* New olmo template with support for function-calling. Includes a basic hard-coded system prompt, and appends "You do not have access to any functions" to any SFT examples that do not include functions.
* Thinker version of the above template, has <think> included in the generation prompt
* R1-style thinker template

These 3 templates mirror our current Tulu templates.

Also includes some necessary changes to the --add_bos logic, to handle the new chat template which does not have a bos token.

Includes a few other QoL fixes:

* Fixes a bug in the olmocore tokenization script re: label mask
* Logs dataset-level statistics during data mixing and tokenization
* Supports easy upsampling during data mixing
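The function-calling behavior described above (a hard-coded system prompt, with a note appended for SFT examples that have no functions) can be illustrated roughly as follows. This is a sketch under stated assumptions, not the actual chat template: the real logic presumably lives in a chat template string, while the `functions` key, the `base_prompt` argument, and the choice to append the note to the system prompt are hypothetical.

```python
NO_FUNCTIONS_NOTE = "You do not have access to any functions"


def system_prompt_for(example: dict, base_prompt: str) -> str:
    """Build a system prompt for one SFT example.

    Assumption: an example lists its tools under a "functions" key (hypothetical);
    when that key is missing or empty, the note quoted in the PR is appended.
    """
    if example.get("functions"):
        return base_prompt
    return f"{base_prompt} {NO_FUNCTIONS_NOTE}"


base = "You are a helpful assistant."  # placeholder, not the template's real prompt
with_tools = system_prompt_for({"functions": [{"name": "get_weather"}]}, base)
without_tools = system_prompt_for({"messages": []}, base)
```

Stating the absence of functions explicitly gives the model a consistent signal in both kinds of example, rather than leaving the no-function case implicit.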

* fix up my (jacob's) slightly broken pr --------- Co-authored-by: jacob-morrison <[email protected]>

* remove moar things * create on pr * dont create on pr

dangxuanhong

Thanks @garrett361 The changes made to dataset_transformation.py look good to me.

finbarrtimbers and others added 27 commits July 16, 2025 10:21

Update oe-eval.sh to set a default timeout of 48h. (allenai#789)

e2be8d0

Updated configs to support changes. (allenai#790)

ca55010

Add benchmark scripts (allenai#786)

c56efad

* Added scripts to run benchmarks. * Removed install script. * Added install script back.

Add remap verifier (allenai#773)

004a48d

* first pass remap verifier * make judge json parsing a little more robust * typoooooooo * typoooooooo * fix logic... * clean logging naming up

Ran the linter. (allenai#792)

3b5b354

fix the URL for code api setup (allenai#791)

7f7308c

Co-authored-by: Michael Noukhovitch <[email protected]>

Add nltk setup to uv dockerfile (allenai#785)

829796a

* add punk tokenizer * fix up command

Set new default value for num_samples

7eb6c4d

install nginx in uv (allenai#793)

d5e7160

it was only being installed in regular Dockerfile Co-authored-by: Michael Noukhovitch <[email protected]> Co-authored-by: Saurabh Shah <[email protected]>

allow passing local models, bubble up dataset cache errors (allenai#797)

bb7477d

Co-authored-by: Michael Noukhovitch <[email protected]>

binary reward for code (allenai#798)

839a806

* binary reward for code * style * binary code reward flag -> pass rate reward threshold

new beaker names (allenai#803)

b3e8e70

Remove Unused DPO Function (allenai#794)

266f214

* delete function Signed-off-by: Yu Chin Fabian Lim <[email protected]> * Update open_instruct/dataset_transformation.py --------- Signed-off-by: Yu Chin Fabian Lim <[email protected]> Co-authored-by: Hamish Ivison <[email protected]>

extra reporting (allenai#799)

8048c9a

prev-branch: padding-free-squashing-7 Co-authored-by: Hamish Ivison <[email protected]>

Revert "Now, we run individual prompts through the queue. (allenai#796)…

4659dca

…" (allenai#804) This reverts commit 541058c.

Fix misnamed variables. (allenai#808)

45ae474

* Fix misnamed variables. * Ran linter.

Fix broken syntax. (allenai#809)

9d1620d

Co-authored-by: Hamish Ivison <[email protected]>

Fixes from last PR (allenai#810)

d944d42

* fix up my (jacob's) slightly broken pr --------- Co-authored-by: jacob-morrison <[email protected]>

Delete run_repro.sh (allenai#813)

207268a

Fix disk space error on image creation (allenai#814)

cc33540

* remove moar things * create on pr * dont create on pr

Merge remote-tracking branch 'upstream/main' into main-merge

79fcc0b

use upstream stats

6a66db7

garrett361 marked this pull request as ready for review July 23, 2025 16:06

garrett361 requested review from dangxuanhong and removed request for dangxuanhong July 23, 2025 16:06

garrett361 requested a review from fabianlim July 23, 2025 16:06

garrett361 changed the title ~~Main merge~~ upstream/main merge and dataset stats Jul 23, 2025

dangxuanhong approved these changes Jul 23, 2025


garrett361 merged commit 9be3034 into main Jul 23, 2025
2 checks passed

9 participants
