feat: extra SFT reporting #799

garrett361 · 2025-07-18T16:08:38Z

This PR adds extra reporting during SFT, plus a --verbose flag that enables additional prints.

The main changes apply to SFT with sum loss, for which this PR now logs three different losses:

1. The avg sum loss per fwd pass
2. The avg loss per token ingested
3. The avg loss per token predicted

Each is useful in different cases. For example, loss 3 is more comparable between runs on different datasets and/or with different global batch sizes. Reporting for mean loss is unchanged.
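To make the distinction between the three quantities concrete, here is a minimal sketch in plain Python. It is not the code from this PR: the `ForwardPass` container, the `report_losses` helper, and the metric names are hypothetical. It also assumes that "tokens ingested" means every position in the batch, and that "tokens predicted" means only positions whose label is not the ignore index (-100 by convention).

```python
from dataclasses import dataclass

IGNORE_INDEX = -100  # assumed label value for positions excluded from the loss


@dataclass
class ForwardPass:
    """Hypothetical container for one forward pass (not from the PR)."""
    token_losses: list[float]  # per-position loss values
    labels: list[int]          # IGNORE_INDEX marks positions that are not predicted


def sum_loss(fp: ForwardPass) -> float:
    # Sum-reduced loss: add up the loss only at positions that are predicted.
    return sum(l for l, y in zip(fp.token_losses, fp.labels) if y != IGNORE_INDEX)


def report_losses(passes: list[ForwardPass]) -> dict[str, float]:
    total = sum(sum_loss(fp) for fp in passes)
    ingested = sum(len(fp.labels) for fp in passes)
    predicted = sum(1 for fp in passes for y in fp.labels if y != IGNORE_INDEX)
    return {
        # 1. Sum loss averaged over forward passes.
        "sum_loss_per_fwd_pass": total / len(passes),
        # 2. Sum loss divided by every token fed to the model.
        "loss_per_token_ingested": total / ingested,
        # 3. Sum loss divided by only the tokens that contribute to the loss.
        "loss_per_token_predicted": total / predicted,
    }


passes = [
    ForwardPass([1.0, 2.0, 3.0, 4.0], [IGNORE_INDEX, IGNORE_INDEX, 5, 6]),
    ForwardPass([2.0, 2.0], [7, 8]),
]
metrics = report_losses(passes)
```

In this toy input the first two quantities depend on how many tokens land in each forward pass and how many of them are masked, while the third depends only on the predicted tokens, which is why it is the easier one to compare across datasets and batch sizes.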

Additional quantities related to token counts and memory statistics are also reported.

prev-branch: padding-free-squashing-7

* Update oe-eval.sh to set a default timeout of 48h. (allenai#789) * Updated configs to support changes. (allenai#790) * Add benchmark scripts (allenai#786) * Added scripts to run benchmarks. * Removed install script. * Added install script back. * Add remap verifier (allenai#773) * first pass remap verifier * make judge json parsing a little more robust * typoooooooo * typoooooooo * fix logic... * clean logging naming up * Ran the linter. (allenai#792) * fix the URL for code api setup (allenai#791) Co-authored-by: Michael Noukhovitch <[email protected]> * Add nltk setup to uv dockerfile (allenai#785) * add punk tokenizer * fix up command * Switches the actors to use the Ray queue. (allenai#784) * Made changes. * Switched to use ray.util.queue.Queue instead of a custom RayQueue class. * Now, only handles new version. * Updated benchmark_generators.py and test_grpo_fast.py. * CLeaned up code from Claude. * training_step defaults to None. * Added an info dataclass to replace the tuple. * Removes assumption that queries_prompt_Q and inference_results_Q are in sync by moving queries_prompt_Q to be a map. * CLeaned up benchmark * Added code to split batch sizes. * Removed benchmark scripts, which are now in a separate PR. * Now, we create all Ray queues in main, and pass them in as appropriate. * Removed changes * Test changes. * Linter passes * Added tests. * Now, we index with the dataset indices. * Checks and tests pass. * Ran linter * Added benchmark scripts back. Whoops. * Set new default value for num_samples * Updates the benchmark script (allenai#795) * Set new default value for num_samples * Now run N batches at once * different batch size * Fix pack length * Fix pack length * Fix wasted compute % (was accidentally multiplying by 100), and fix num rollouts (was referencing the wrong variable). * Now, we save benchmark results to CSV. * Now show a percentage for time spent generating. * Updated benchmark saving code. * Fixed syntax error. 
* Fixed benchmark * Fixed timing code. * Removed changes to vllm_utils3.py. * Now, we actually write the data to disk> * Bigger batch * Modified benchmark * Undid changes to benchmark script. * Temp change * Undid changes to benchmark script. * install nginx in uv (allenai#793) it was only being installed in regular Dockerfile Co-authored-by: Michael Noukhovitch <[email protected]> Co-authored-by: Saurabh Shah <[email protected]> * allow passing local models, bubble up dataset cache errors (allenai#797) Co-authored-by: Michael Noukhovitch <[email protected]> * binary reward for code (allenai#798) * binary reward for code * style * binary code reward flag -> pass rate reward threshold * Now, we run individual prompts through the queue. (allenai#796) * Now, we run individual prompts through the queue. * Fixed issues. * Ran linter * Fixed linter errors. * COde lints. * Test passes. * Ran linter. * Ensures that we send single prompts as requests. * Now, code lints. * Cleaned up code. * Fixes test. * Linter passes. * Cleaned test up. * Removed redundant comments. * Adds flashinfer dep. (allenai#800) * Adds flashinfer dep. * Now, open_instruct builds even on mac. * Updated install instructions to add flash-infer. * Now, we set flashinfer as the default attention backend. * Added flashinfer to the base dockerfile. * Ran linter. * Removed extra changes to mason.py. * Undid changes to uv.lock. * Updated requirements.txt * Updated flash-attn version. 
--------- Co-authored-by: Hamish Ivison <[email protected]> * new beaker names (allenai#803) * Remove Unused DPO Function (allenai#794) * delete function Signed-off-by: Yu Chin Fabian Lim <[email protected]> * Update open_instruct/dataset_transformation.py --------- Signed-off-by: Yu Chin Fabian Lim <[email protected]> Co-authored-by: Hamish Ivison <[email protected]> * extra reporting (allenai#799) prev-branch: padding-free-squashing-7 Co-authored-by: Hamish Ivison <[email protected]> * Revert "Now, we run individual prompts through the queue. (allenai#796)" (allenai#804) This reverts commit 541058c. * Fix misnamed variables. (allenai#808) * Fix misnamed variables. * Ran linter. * Fix broken syntax. (allenai#809) Co-authored-by: Hamish Ivison <[email protected]> * Add new olmo chat templates, and improve data mixing/tokenization (allenai#765) Adds new olmo-core-compatible chat templates. Includes: * New olmo template with support for function-calling. Includes a basic hard-coded system prompt, and appends "You do not have access to any functions" to any SFT examples that do not include functions. * Thinker version of the above template, has <think> included in the generation prompt * R1-style thinker template These 3 templates mirror our current Tulu templates Also includes some necessary changes to the --add_bos logic, to handle the new chat template which does not have a bos token. 
Includes a few other QoL fixes: * Fixes a bug in the olmocore tokenization script re: label mask * Logs dataset-level statistics during data mixing and tokenization * Supports easy upsampling during data mixing * Fixes from last PR (allenai#810) * fix up my (jacob's) slightly broken pr --------- Co-authored-by: jacob-morrison <[email protected]> * Delete run_repro.sh (allenai#813) * Fix disk space error on image creation (allenai#814) * remove moar things * create on pr * dont create on pr * use upstream stats --------- Signed-off-by: Yu Chin Fabian Lim <[email protected]> Co-authored-by: Finbarr Timbers <[email protected]> Co-authored-by: Hamish Ivison <[email protected]> Co-authored-by: Michael <[email protected]> Co-authored-by: Michael Noukhovitch <[email protected]> Co-authored-by: Saurabh Shah <[email protected]> Co-authored-by: Yu Chin Fabian Lim <[email protected]> Co-authored-by: Jacob Morrison <[email protected]>

garrett361 and others added 2 commits July 18, 2025 12:07

extra reporting

1f7ab08

prev-branch: padding-free-squashing-7

Merge branch 'main' into extra-reporting

9a93f04

hamishivi approved these changes Jul 21, 2025

Merge branch 'main' into extra-reporting

6588025

hamishivi merged commit 8048c9a into allenai:main Jul 21, 2025
3 checks passed

This was referenced Jul 24, 2025

main with reporting garrett361/open-instruct#32

Closed

fix avg_loss rename #828

Merged
