Implement integration tests in CI pipeline (won't be merged) #532
Closed
Author comment: this PR is deprecated.
Description
Based on the Integration Test requirements, this change adds a new accuracy test on TPU to the CI pipeline. The test covers the Llama-3.1-8B-Instruct and Llama-3.1-70B-Instruct models, and the test is modified to support comparing results against `EXPECTED_VALUE`. It also lets users pass `tensor-parallel-size` and `model-names` parameters for greater flexibility during execution.
Following the description of the vLLM PR vllm-project/vllm#18800, we have changed the lm_eval version used to `git+https://github.com/EleutherAI/lm-evaluation-harness.git@206b7722158f58c35b7ffcd53b035fdbdda5126d#egg=lm-eval[api]`.
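To make the `EXPECTED_VALUE` comparison and the two parameters concrete, here is a minimal sketch rather than the code in this PR. The helper names (`parse_args`, `within_tolerance`, `check_models`), the flag defaults, the absolute-tolerance check, and all numbers in the usage note are assumptions for illustration; the real test may parse its inputs and compare values differently.

```python
import argparse


def parse_args(argv=None):
    # Hypothetical CLI: flag names mirror the parameters named in the description,
    # but the defaults and types are assumptions.
    parser = argparse.ArgumentParser(description="TPU accuracy test (sketch)")
    parser.add_argument("--tensor-parallel-size", type=int, default=1)
    parser.add_argument("--model-names", nargs="+", required=True)
    return parser.parse_args(argv)


def within_tolerance(measured, expected, tol):
    # True when the measured score is strictly within +/- tol of the expected one.
    return abs(measured - expected) < tol


def check_models(measured_by_model, expected_by_model, tol):
    # Return the names of models that are missing or outside the tolerance band.
    failures = []
    for name, expected in expected_by_model.items():
        measured = measured_by_model.get(name)
        if measured is None or not within_tolerance(measured, expected, tol):
            failures.append(name)
    return failures
```

For example, `check_models({"model-a": 0.75}, {"model-a": 0.76}, tol=0.03)` returns an empty list, while a measured score of `0.70` against the same expected value would report `model-a` as a failure. These numbers are made up and are not results from this PR.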
In the future, we'll support collecting expected values on GPUs and comparing them with results from TPUs. We'll achieve this by enabling both GPUs and TPUs to read and write expected JSON files between Buildkite steps. We've already started implementing some of these files.
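The planned hand-off of expected values could look roughly like the sketch below: one step writes a JSON file keyed by model name and a later step reads it back. The file layout, the function names `write_expected` and `read_expected`, and the idea of one score per model are assumptions; how the file actually travels between Buildkite steps is outside this sketch.

```python
import json
from pathlib import Path


def write_expected(path, values):
    # Persist a {model_name: score} mapping; create parent directories if needed.
    path = Path(path)
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_text(json.dumps(values, indent=2, sort_keys=True))


def read_expected(path):
    # Load the mapping written by an earlier step.
    return json.loads(Path(path).read_text())
```

A GPU step would call `write_expected(...)` after its evaluation, and the TPU step would call `read_expected(...)` on the same file before comparing its own results.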
The testing logic is based on the vLLM source at https://github.com/vllm-project/vllm/blob/839ab00/tests/entrypoints/llm/test_accuracy.py, which we are using as the starting point for development.
Tests
Tested on the Buildkite agent.