hamishivi
left a comment
The removal logic seems good, but I'm less sure about these mutable returns.
test GRPO job to make sure everything's good to go: https://beaker.allen.ai/orgs/ai2/workspaces/tulu-thinker/work/01K0ZDKD1N3ABZ9JMJ70HE8WD6?taskId=01K0ZDKD1VQATYD8SW6XCDKKBR&jobId=01K0ZDKD7Y68RMRZR2SKNA0BHA
Seemed like the job errored, also now there's a merge conflict?
hamishivi
left a comment
Managed to fix up the merge conflict and tested that this works fine.
Test run: https://beaker.allen.ai/orgs/ai2/workspaces/tulu-thinker/work/01K1817Q3MXPQ9BQQVXGW3SBKN?taskId=01K1817Q3TVQCBR3S3RYWNBEQB&jobId=01K1817Q7P352SCJFJAMSFST8N
* look at changes
* tweak
* style
* fix
* fix
* one more fix
* fix?
* verbose is way too verbose??
* update
* fix
* correct logging
* fix small wandb logging bug
---------
Co-authored-by: Hamish Ivison <hamishivi@gmail.com>
Fix the dataset source field for SFT, and also fix a couple of issues with the new verbose logging