
[CI] Small Accuracy Eval Test for Deepseek Model #24259

Merged
mgoin merged 4 commits into vllm-project:main from neuralmagic:wye-small-eval-test-for-deepseek-model
Sep 16, 2025

Conversation

@yewentao256
Member

@yewentao256 yewentao256 commented Sep 4, 2025

Purpose

Partially fixes #23354.

Comments on #24119 also raised the need for a unit test, so this PR adds a small eval test.

This may help avoid some of the accuracy issues.

For the CI test, we reuse LM Eval Small Models.

Test

pytest -s -v tests/evals/gsm8k/test_gsm8k_correctness.py --config-list-file=tests/evals/gsm8k/configs/models-small.txt --tp-size=1 -k DeepSeek-V2-Lite-Chat

Evaluating: 100%|████████████████████████████████████████████████| 100/100 [00:10<00:00,  9.66it/s]
GSM8K Results for deepseek-ai/DeepSeek-V2-Lite-Chat:
  Accuracy: 0.660
  Expected: 0.300
  Questions: 100
  Invalid rate: 0.000
  Latency: 10.4s
  QPS: 9.7
✅ GSM8K test passed for deepseek-ai/DeepSeek-V2-Lite-Chat
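
For readers unfamiliar with how the fields in the log above relate to each other, here is a minimal sketch of the pass/fail logic. It is not the code in `test_gsm8k_correctness.py`; the names `EvalResult`, `summarize`, and `passes` are hypothetical, and both the "accuracy must be at least the expected value" rule and the QPS formula are assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass
class EvalResult:
    """Summary metrics for one eval run (field names are illustrative)."""
    accuracy: float
    invalid_rate: float
    latency_s: float
    qps: float


def summarize(predictions, labels, latency_s):
    """Compute summary metrics; a prediction of None marks an unparseable answer."""
    if not labels or len(predictions) != len(labels):
        raise ValueError("predictions and labels must be non-empty and equal in length")
    if latency_s <= 0:
        raise ValueError("latency_s must be positive")
    n = len(labels)
    correct = sum(p is not None and p == y for p, y in zip(predictions, labels))
    invalid = sum(p is None for p in predictions)
    # Assumption: throughput is simply questions divided by wall-clock latency.
    return EvalResult(correct / n, invalid / n, latency_s, n / latency_s)


def passes(result, expected_accuracy, tolerance=0.0):
    """Assumed rule: pass when accuracy is at least expected minus tolerance."""
    return result.accuracy >= expected_accuracy - tolerance
```

Under these assumptions, the run above (accuracy 0.660 against an expected 0.300) passes with a wide margin, which is the point of a small smoke-style eval: it flags a collapse in accuracy rather than small drifts.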

Signed-off-by: yewentao256 <zhyanwentao@126.com>
@yewentao256
Member Author

@WoosukKwon @houseroad CC

@mergify mergify bot added the ci/build and deepseek labels Sep 4, 2025
@yewentao256 yewentao256 added the ready label and removed the ci/build and deepseek labels Sep 4, 2025
@mergify mergify bot added the ci/build and deepseek labels Sep 4, 2025
Contributor

@gemini-code-assist gemini-code-assist bot left a comment


Code Review

This pull request adds a new manual accuracy evaluation test for the Deepseek V2 Lite model on the GSM8K benchmark. The changes include a new Buildkite CI step, a model-specific configuration file, and an update to the list of models to be tested. My review found a critical issue in the new CI step configuration where file paths appear to be incorrect, which would likely cause the test to fail. I've provided a suggestion to correct the paths.
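
A path problem like the one flagged in this review can be caught before the CI step runs. The sketch below is a hypothetical pre-check, not part of this PR: `missing_configs` is an illustrative name, and the list-file layout (one config filename per line, resolved relative to the list file's directory, with `#` comments allowed) is an assumption rather than the documented format of `models-small.txt`.

```python
from pathlib import Path


def missing_configs(list_file):
    """Return entries in list_file that do not resolve to an existing file.

    Assumes one config filename per line, relative to the directory that
    contains list_file. Blank lines and lines starting with '#' are skipped.
    """
    list_path = Path(list_file)
    base_dir = list_path.parent
    missing = []
    for raw in list_path.read_text().splitlines():
        entry = raw.strip()
        if not entry or entry.startswith("#"):
            continue
        if not (base_dir / entry).is_file():
            missing.append(entry)
    return missing
```

Running a check like this from the same working directory the CI step uses would show whether the paths resolve as intended; an empty list means every entry was found.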

Signed-off-by: yewentao256 <zhyanwentao@126.com>
@yewentao256
Member Author

@mgoin CC

@mgoin
Member

mgoin commented Sep 10, 2025

Might as well use a compressed version of the checkpoint to test multiple things at once and save time
What about https://huggingface.co/RedHatAI/DeepSeek-Coder-V2-Lite-Instruct-FP8?

Signed-off-by: yewentao256 <zhyanwentao@126.com>
@yewentao256
Member Author

yewentao256 commented Sep 11, 2025

Might as well use a compressed version of the checkpoint to test multiple things at once and save time What about https://huggingface.co/RedHatAI/DeepSeek-Coder-V2-Lite-Instruct-FP8?

@mgoin Updated, thanks for the suggestions!

@mgoin mgoin enabled auto-merge (squash) September 16, 2025 02:14
@mgoin mgoin merged commit 3c96e7b into vllm-project:main Sep 16, 2025
16 checks passed
@mgoin mgoin deleted the wye-small-eval-test-for-deepseek-model branch September 16, 2025 02:15
FeiDaLI pushed a commit to FeiDaLI/vllm that referenced this pull request Sep 25, 2025
Signed-off-by: yewentao256 <zhyanwentao@126.com>
choprahetarth pushed a commit to Tandemn-Labs/vllm that referenced this pull request Oct 11, 2025
Signed-off-by: yewentao256 <zhyanwentao@126.com>

Labels

ci/build, deepseek, ready


Development

Successfully merging this pull request may close these issues.

[RFC] How could we prevent model like R1 E2E accuracy down to 0?

2 participants