[CI] Small Accuracy Eval Test for Deepseek Model #24259
mgoin merged 4 commits into vllm-project:main
Signed-off-by: yewentao256 <zhyanwentao@126.com>
Code Review
This pull request adds a new manual accuracy evaluation test for the Deepseek V2 Lite model on the GSM8K benchmark. The changes include a new Buildkite CI step, a model-specific configuration file, and an update to the list of models to be tested. My review found a critical issue in the new CI step configuration where file paths appear to be incorrect, which would likely cause the test to fail. I've provided a suggestion to correct the paths.
@mgoin CC
Might as well use a compressed version of the checkpoint to test multiple things at once and save time.
@mgoin Updated, thanks for the suggestions!
Purpose
Partially fixes #23354.
Comments on #24119 also asked for a unit test, so this PR adds a small eval test.
It may help catch some accuracy issues before they land.
For the CI test, we reuse the LM Eval Small Models step.
Test
pytest -s -v tests/evals/gsm8k/test_gsm8k_correctness.py --config-list-file=tests/evals/gsm8k/configs/models-small.txt --tp-size=1 -k DeepSeek-V2-Lite-Chat
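The command above reads a list of model configs from a text file and runs a GSM8K accuracy check for the selected model. As a rough illustration of that pattern only, here is a minimal sketch. It is not the code in test_gsm8k_correctness.py: the `EvalConfig` fields, the `tolerance` default, and both helper functions are hypothetical names chosen for this example, and the assumption that the list file holds one config name per line is not confirmed by the PR text.

```python
from dataclasses import dataclass


@dataclass
class EvalConfig:
    """Hypothetical per-model eval settings; field names are illustrative."""
    model_name: str
    accuracy_threshold: float  # minimum acceptable GSM8K accuracy
    tolerance: float = 0.05    # allowed slack below the threshold (assumed value)


def read_config_list(text: str) -> list[str]:
    """Return config names, one per line, skipping blanks and '#' comments."""
    names = []
    for line in text.splitlines():
        line = line.strip()
        if line and not line.startswith("#"):
            names.append(line)
    return names


def accuracy_ok(measured: float, cfg: EvalConfig) -> bool:
    """Pass when measured accuracy is at most `tolerance` below the threshold."""
    return measured >= cfg.accuracy_threshold - cfg.tolerance
```

Allowing a small tolerance is one way to keep a short eval run from failing on ordinary sampling noise while still flagging a real accuracy regression; the right margin depends on how many GSM8K questions the run covers.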