[CI] Replace large models with tiny alternatives in tests #24057
Merged
27 commits
9318236 tahsintunan: add pre-commit check as first CI step to catch linting issues early
250b7e4 tahsintunan: replace arbitrary use of big llama models with smaller models
1c8e033 tahsintunan: split test_models into separate basic correctness and sliding window …
9ecbd1e tahsintunan: skip test_collective_rpc if num_gpu < tp
115af2e tahsintunan: enable prefix caching in test_sampling_params_e2e
82aa2b3 tahsintunan: refactor shutdown test to use explicit server termination
1749e88 tahsintunan: remove pre-commit check from buildkite
62e37dc tahsintunan: Merge branch 'main' into ci-tiny-models
9fefe3e tahsintunan: replace models with hmellor/tiny-random-LlamaForCausalLM
16ac0fe tahsintunan: Merge branch 'main' into ci-tiny-models
8bae25d tahsintunan: remove tiny-random-llama from test_basic_correctness due to tokenizat…
3429a6b tahsintunan: fix memory profiling test flakiness
86b35f9 tahsintunan: use small model to fix CI timeout
b13a504 tahsintunan: use opt-125m for TP correctness tests
3f103c3 tahsintunan: use meta-llama for SP tests
b892976 tahsintunan: Use opt-125m for pytorch checkpoint test
eaf9786 tahsintunan: use tiny-random-LlamaForCausalLM in SP tests
8c1707f njhill: Merge branch 'main' into ci-tiny-models
af5a75b tahsintunan: Merge branch 'main' into ci-tiny-models
a6cdfc3 hmellor: Merge commit '17edd8a' into pr/tahsintunan/24057
6d813f7 hmellor: ruff
2b7421a hmellor: Merge commit 'd6953be' into pr/tahsintunan/24057
5d2b46a hmellor: Merge branch 'main' into pr/tahsintunan/24057
4bacf51 hmellor: Don't use Pythia because it's max model len is too short
ad78423 hmellor: Revert one test which doesn't pass to unblock the rest
200df3e tahsintunan: Merge branch 'main' into ci-tiny-models
93bc36d tahsintunan: fix failing tests
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| | | @@ -1,37 +1,93 @@ |
| | |  # SPDX-License-Identifier: Apache-2.0 |
| | |  # SPDX-FileCopyrightText: Copyright contributors to the vLLM project |
| | | |
| | | +import signal |
| | | +import subprocess |
| | | +import sys |
| | | +import time |
| | | |
| | |  import openai |
| | |  import pytest |
| | | |
| | | -from ...utils import RemoteOpenAIServer |
| | | +from ...utils import get_open_port |
| | | |
| | | -MODEL_NAME = "meta-llama/Llama-3.2-1B-Instruct" |
| | | +MODEL_NAME = "hmellor/tiny-random-LlamaForCausalLM" |
| | | |
| | | |
| | |  @pytest.mark.asyncio |
| | |  async def test_shutdown_on_engine_failure(): |
| | | -    # dtype, max-len etc set so that this can run in CI |
| | | -    args = [ |
| | | -        "--dtype", |
| | | -        "bfloat16", |
| | | -        "--max-model-len", |
| | | -        "8192", |
| | | -        "--enforce-eager", |
| | | -        "--max-num-seqs", |
| | | -        "128", |
| | | -    ] |
| | | |
| | | -    with RemoteOpenAIServer(MODEL_NAME, args) as remote_server: |
| | | -        async with remote_server.get_async_client() as client: |
| | | -            with pytest.raises((openai.APIConnectionError, openai.InternalServerError)): |
| | | -                # Asking for lots of prompt logprobs will currently crash the |
| | | -                # engine. This may change in the future when that bug is fixed |
| | | -                prompt = "Hello " * 4000 |
| | | -                await client.completions.create( |
| | | -                    model=MODEL_NAME, prompt=prompt, extra_body={"prompt_logprobs": 10} |
| | | +    """Verify that API returns connection error when server process is killed. |
| | | |
| | | +    Starts a vLLM server, kills it to simulate a crash, then verifies that |
| | | +    subsequent API calls fail appropriately. |
| | | +    """ |
| | | |
| | | +    port = get_open_port() |
| | | |
| | | +    proc = subprocess.Popen( |
| | | +        [ |
| | | +            # dtype, max-len etc set so that this can run in CI |
| | | +            sys.executable, |
| | | +            "-m", |
| | | +            "vllm.entrypoints.openai.api_server", |
| | | +            "--model", |
| | | +            MODEL_NAME, |
| | | +            "--dtype", |
| | | +            "bfloat16", |
| | | +            "--max-model-len", |
| | | +            "128", |
| | | +            "--enforce-eager", |
| | | +            "--port", |
| | | +            str(port), |
| | | +            "--gpu-memory-utilization", |
| | | +            "0.05", |
| | | +            "--max-num-seqs", |
| | | +            "2", |
| | | +            "--disable-frontend-multiprocessing", |
| | | +        ], |
| | | +        stdout=subprocess.PIPE, |
| | | +        stderr=subprocess.PIPE, |
| | | +        text=True, |
| | | +        preexec_fn=lambda: signal.signal(signal.SIGINT, signal.SIG_IGN), |
| | | +    ) |
| | | |
| | | +    # Wait for server startup |
| | | +    start_time = time.time() |
| | | +    client = openai.AsyncOpenAI( |
| | | +        base_url=f"http://localhost:{port}/v1", |
| | | +        api_key="dummy", |
| | | +        max_retries=0, |
| | | +        timeout=10, |
| | | +    ) |
| | | |
| | | +    # Poll until server is ready |
| | | +    while time.time() - start_time < 30: |
| | | +        try: |
| | | +            await client.completions.create( |
| | | +                model=MODEL_NAME, prompt="Hello", max_tokens=1 |
| | | +            ) |
| | | +            break |
| | | +        except Exception: |
| | | +            time.sleep(0.5) |
| | | +            if proc.poll() is not None: |
| | | +                stdout, stderr = proc.communicate(timeout=1) |
| | | +                pytest.fail( |
| | | +                    f"Server died during startup. stdout: {stdout}, stderr: {stderr}" |
| | |                  ) |
| | | +    else: |
| | | +        proc.terminate() |
| | | +        proc.wait(timeout=5) |
| | | +        pytest.fail("Server failed to start in 30 seconds") |
| | | |
| | | +    # Kill server to simulate crash |
| | | +    proc.terminate() |
| | | +    time.sleep(1) |
| | | |
| | | +    # Verify API calls now fail |
| | | +    with pytest.raises((openai.APIConnectionError, openai.APIStatusError)): |
| | | +        await client.completions.create( |
| | | +            model=MODEL_NAME, prompt="This should fail", max_tokens=1 |
| | | +        ) |
| | | |
| | |          # Now the server should shut down |
| | | -        return_code = remote_server.proc.wait(timeout=8) |
| | | -        assert return_code is not None |
| | | +    return_code = proc.wait(timeout=5) |
| | | +    assert return_code is not None |
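
The startup polling in the new test can be factored into a small reusable helper. The following is a minimal sketch, not code from this PR: `wait_until_ready`, `probe`, and `is_alive` are hypothetical names chosen for illustration, and the sketch awaits `asyncio.sleep` between attempts (a swapped-in choice) where the test calls `time.sleep`, so the event loop is not blocked while waiting.

```python
import asyncio
import time
from typing import Awaitable, Callable


async def wait_until_ready(
    probe: Callable[[], Awaitable[object]],
    is_alive: Callable[[], bool],
    timeout: float = 30.0,
    interval: float = 0.5,
) -> None:
    """Poll `probe` until it succeeds, the process dies, or `timeout` elapses.

    Hypothetical helper: `probe` is any coroutine function that raises while
    the server is not ready; `is_alive` reports whether the server process
    is still running.
    """
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        try:
            await probe()
            return  # The probe succeeded, so the server is ready.
        except Exception:
            # Stop early if the server process has already exited.
            if not is_alive():
                raise RuntimeError("server died during startup")
            # Yield to the event loop between attempts.
            await asyncio.sleep(interval)
    raise TimeoutError(f"server not ready after {timeout} seconds")
```

In a test shaped like the one in the diff, this could be called as `await wait_until_ready(lambda: client.completions.create(model=MODEL_NAME, prompt="Hello", max_tokens=1), lambda: proc.poll() is None)`, with the caller turning `RuntimeError` or `TimeoutError` into `pytest.fail` after terminating the process.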