@lewtun (Member) commented Feb 5, 2025

Adds GPQA Diamond and several important evaluation fixes (answer parsing, and an incompatibility between the latest vLLM and lighteval). I've also unified the Slurm scripts for evaluation so there is only one way to evaluate models.

TODO

@lewtun lewtun mentioned this pull request Feb 5, 2025
@lewtun lewtun changed the title from "[WIP] Add GPQA Diamond" to "Add GPQA Diamond and fix evaluation deps" Feb 6, 2025
@lewtun lewtun marked this pull request as ready for review February 6, 2025 13:06

```shell
# before
pip install -e ".[dev]"
# after
GIT_LFS_SKIP_SMUDGE=1 uv pip install -e ".[dev]" --link-mode=copy
```

Member Author:

Needed because otherwise uv cannot install lighteval due to a Git LFS file conflict.

Collaborator:

Ah, I hit this issue too and had reverted to pip. Glad you fixed it.

```shell
lighteval vllm $MODEL_ARGS "custom|$TASK|0|0" \
    --custom-tasks src/open_r1/evaluate.py \
    --use-chat-template \
    --system-prompt="Please reason step by step, and put your final answer within \boxed{}." \
```

Member Author:

Not needed for the DeepSeek models (gives a ~1 point gain if included).
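The system prompt above asks the model to put its final answer inside `\boxed{}`, which is what the answer parser later has to pull back out. As a rough illustration only (`extract_last_boxed` is a hypothetical helper, not lighteval's or math-verify's actual parsing code), extracting the last boxed answer while respecting nested braces might look like:

```python
from typing import Optional


def extract_last_boxed(text: str) -> Optional[str]:
    """Return the contents of the last \\boxed{...} in `text`, or None.

    Hypothetical helper for illustration: it scans forward from the last
    "\\boxed{" marker and counts brace depth so nested LaTeX such as
    \\frac{1}{2} is returned intact.
    """
    marker = "\\boxed{"
    start = text.rfind(marker)
    if start == -1:
        return None  # no boxed answer at all

    content_start = start + len(marker)
    depth = 1  # we are already inside the opening brace of \boxed{
    for i in range(content_start, len(text)):
        ch = text[i]
        if ch == "{":
            depth += 1
        elif ch == "}":
            depth -= 1
            if depth == 0:
                return text[content_start:i]
    return None  # braces never balanced, treat as unparseable
```

Taking the *last* `\boxed{}` is an assumption of this sketch; reasoning models often box intermediate results along the way, so a production parser may need more elaborate rules.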

@lewtun lewtun requested a review from edbeeching February 6, 2025 13:20
setup.py (outdated)

```python
"liger_kernel==0.5.2",
"lighteval @ git+https://github.com/huggingface/lighteval.git@0e462692436e1f0575bdb4c6ef63453ad9bde7d4#egg=lighteval[math]",
"math-verify>=0.3.3",  # Used for math verification in grpo
"lighteval @ git+https://github.com/huggingface/lighteval.git@3c9b0c9dde6718b23ef5b0f4960355f0d494bdfc#egg=lighteval[math]",
```
Member Author:

Bump to the latest commit once the vLLM fix for DDP is merged: huggingface/lighteval#541

Collaborator:

Done, it's 86f62259f105ae164f655e0b91c92a823a742724.
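Both lighteval pins use a PEP 508 direct reference to a pip-style VCS URL, `name @ git+<repo>@<commit>#egg=name[extras]`, so bumping lighteval comes down to swapping the commit SHA. A minimal sketch of pulling those fields apart, e.g. to sanity-check a bump; `parse_git_pin` and `GitPin` are hypothetical names, and the sketch assumes the ref is a hex commit SHA rather than a branch or tag:

```python
import re
from typing import NamedTuple, Tuple


class GitPin(NamedTuple):
    name: str
    repo: str
    commit: str
    extras: Tuple[str, ...]


# Hypothetical pattern for "name @ git+<repo>@<sha>[#egg=name[extras]]".
# Assumption: the ref is a 7-40 character lowercase hex commit SHA,
# not a branch or tag name.
_PIN_RE = re.compile(
    r"^(?P<name>[A-Za-z0-9_.\-]+)\s*@\s*git\+(?P<repo>\S+?)@(?P<commit>[0-9a-f]{7,40})"
    r"(?:#egg=[A-Za-z0-9_.\-]+(?:\[(?P<extras>[^\]]*)\])?)?$"
)


def parse_git_pin(spec: str) -> GitPin:
    """Split a commit-pinned git requirement string into its parts (illustrative)."""
    m = _PIN_RE.match(spec.strip())
    if m is None:
        raise ValueError(f"not a commit-pinned git requirement: {spec!r}")
    raw_extras = m.group("extras") or ""
    # "math" -> ("math",); "" -> ()
    extras = tuple(e.strip() for e in raw_extras.split(",") if e.strip())
    return GitPin(m.group("name"), m.group("repo"), m.group("commit"), extras)
```

Plain version specifiers such as `"math-verify>=0.3.3"` deliberately raise `ValueError` here, since they are not git pins.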

@lewtun lewtun merged commit cec57f3 into main Feb 6, 2025
1 check passed
@lewtun lewtun deleted the lewtun/add-gpqa-cmd branch February 6, 2025 14:24
GitMonkey0 pushed a commit to GitMonkey0/open-r1 that referenced this pull request Feb 24, 2025
* Add GPQA Diamond
* Add table
* Fix README
* Up
* Fixes
* Ignore logs
* Fix
* Pin deps
* Fix GRPO
* Add Llama 70B tables
* Restore dp
* Pin lighteval
* Use bfloat16
* Tune table
* Add note

3 participants