added gsm_plus #2103

Merged
lintangsutawika merged 9 commits into EleutherAI:main from ysjprojects:gsm-plus on Aug 5, 2024

@ysjprojects
Contributor

GSM-Plus Math benchmark
Paper: https://arxiv.org/abs/2402.19255

Strengths:

  • A newer, more demanding extension of GSM8K

@lintangsutawika
Contributor

Thanks!

Are you able to run a sanity check, maybe with models like LLaMA-2-7B and see if the eval results are similar to the paper?

@ysjprojects
Contributor Author

> Thanks!
>
> Are you able to run a sanity check, maybe with models like LLaMA-2-7B and see if the eval results are similar to the paper?

[Image attachment]

Ran LLaMA-2-7B; the results match the evals in the paper.
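
For readers who want to repeat this kind of sanity check, here is a minimal sketch of how the run might be assembled. It only builds the `lm_eval` command line and prints it; nothing is executed. The task name `gsm_plus` comes from this PR's commit messages, while the Hugging Face model id `meta-llama/Llama-2-7b-hf`, the batch size, and the helper name `build_lm_eval_command` are assumptions for illustration, since the thread does not state the exact checkpoint or settings used.

```python
import shlex


def build_lm_eval_command(model_id, tasks, batch_size=8, num_fewshot=None):
    """Build the argv for an lm-evaluation-harness run (nothing is executed).

    Hypothetical helper for illustration; it is not part of the harness.
    """
    if not tasks:
        raise ValueError("at least one task name is required")
    argv = [
        "lm_eval",
        "--model", "hf",                          # Hugging Face backend
        "--model_args", f"pretrained={model_id}",
        "--tasks", ",".join(tasks),               # comma-separated task list
        "--batch_size", str(batch_size),
    ]
    # Only pass --num_fewshot when the caller overrides the task default.
    if num_fewshot is not None:
        argv += ["--num_fewshot", str(num_fewshot)]
    return argv


# Assumed model id: the thread only says "llama-2-7b".
command = build_lm_eval_command("meta-llama/Llama-2-7b-hf", ["gsm_plus"])
print(shlex.join(command))
```

The printed command can be pasted into a shell where the harness is installed. The resulting accuracy would then be compared by hand against the figure the paper reports for the same model.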

@ysjprojects
Contributor Author

UPDATE:

  • Reverted to original GSM-Plus dataset for attribution
  • Added GSM-Plus_mini subtask

@lintangsutawika merged commit d8506db into EleutherAI:main on Aug 5, 2024
jmercat pushed a commit to TRI-ML/lm-evaluation-harness that referenced this pull request Sep 25, 2024
* added gsm_plus

* formatted dataset to have train-test-splits

* README.md for gsm-plus

* Update README.md

* GSM-Plus: added gsm_plus_mini

* GSM-Plus: attribution to original dataset

* Update README.md

* Update README.md

* Update README.md

---------

Co-authored-by: Lintang Sutawika <lintang@eleuther.ai>