[CI/Build] Reorganize models tests #7820


Merged
merged 70 commits into from Sep 13, 2024

Conversation

DarkLight1337
Member

@DarkLight1337 DarkLight1337 commented Aug 23, 2024

To avoid timeout errors, split up the models tests as per #7439 (comment).

Also, I have moved the distributed basic correctness and model tests into the respective files for single-GPU case to avoid fragmentation of the test logic.

Note: The core_model flag isn't used in CI yet.
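
For context, a minimal sketch of how the flag could eventually be consumed, assuming core_model is registered as a pytest marker (the test name and CI invocation below are illustrative, not part of this PR):

# Illustrative only: assumes a registered pytest marker named "core_model".
import pytest


@pytest.mark.core_model  # assumed marker; not yet selected by any CI step
def test_model_basic_correctness():
    # Placeholder body; a real test would compare vLLM outputs against the
    # HF Transformers reference implementation.
    pass


# A future CI step could then run only the marked subset, e.g.:
#   pytest -m core_model tests/models/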

cc @khluu


👋 Hi! Thank you for contributing to the vLLM project.
Just a reminder: PRs do not trigger a full CI run by default. Instead, only the fastcheck CI runs, which consists of a small and essential subset of CI tests to quickly catch errors. You can run other CI tests on top of the default ones by unblocking the steps in your fast-check build on the Buildkite UI.

Once the PR is approved and ready to go, please make sure to run full CI as it is required to merge (or just use auto-merge).

To run full CI, you can do one of these:

  • Comment /ready on the PR
  • Add the ready label to the PR
  • Enable auto-merge

🚀

@DarkLight1337
Member Author

DarkLight1337 commented Aug 24, 2024

For some reason, splitting them up actually causes the multimodal tests to take longer than before...

Edit: It's probably because Qwen-VL tests are now being run as well. Let me disable them for now...

@DarkLight1337
Member Author

DarkLight1337 commented Aug 24, 2024

That didn't seem to reduce the test time. @khluu any ideas?

@DarkLight1337
Member Author

DarkLight1337 commented Aug 25, 2024

The longer test time also occurs for other PRs. I hope this PR didn't mess up the HF cache somehow...

@DarkLight1337 DarkLight1337 changed the title [DONT MERGE] [CI/Build] Reorganize models tests [CI/Build] Reorganize models tests Aug 26, 2024
@DarkLight1337 DarkLight1337 requested a review from simon-mo August 26, 2024 07:41
@DarkLight1337
Member Author

Seems to be a temporary network issue, since the test times are back to normal now. Let's merge this before working on core_model.

@DarkLight1337 DarkLight1337 added the ready ONLY add when PR is ready to merge/full CI is needed label Aug 26, 2024
@@ -311,11 +332,11 @@ steps:
- tests/distributed/
commands:
      - # the following commands are for the first node, with ip 192.168.10.10 (ray environment already set up)
-     - VLLM_TEST_SAME_HOST=0 torchrun --nnodes 2 --nproc-per-node=2 --rdzv_backend=c10d --rdzv_endpoint=192.168.10.10 distributed/test_same_node.py
+     - VLLM_TEST_SAME_HOST=0 torchrun --nnodes 2 --nproc-per-node=2 --rdzv_backend=c10d --rdzv_endpoint=192.168.10.10 distributed/test_same_node.py | grep -q 'Same node test passed'
Member


why add grep?

Member Author

@DarkLight1337 DarkLight1337 Sep 13, 2024


Since I have added an if __name__ == "__main__" guard to avoid executing the code during test collection, I use grep to ensure that the code inside the guard is actually run during the test.
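
As a rough sketch of the pattern described here (simplified and hypothetical; the real test body in distributed/test_same_node.py differs):

# distributed/test_same_node.py -- simplified sketch of the guard pattern only.

def test_same_node():
    # Placeholder for the actual same-node correctness check run across ranks.
    pass


if __name__ == "__main__":
    # Executed only when the file is launched directly (e.g. via torchrun in CI),
    # not when pytest merely collects the module. Printing a sentinel string lets
    # the CI command confirm the guarded code really ran:
    #   torchrun ... distributed/test_same_node.py | grep -q 'Same node test passed'
    test_same_node()
    print("Same node test passed")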

@youkaichao
Member

youkaichao commented Sep 13, 2024

LGTM in general, thanks for the huge efforts!

My only question is about forking a new process for every test (which is resolved in the comments).

@tlrmchlsmth
Collaborator

Nice reorganization

tlrmchlsmth added a commit to neuralmagic/nm-vllm that referenced this pull request Sep 13, 2024
dtrifiro pushed a commit to opendatahub-io/vllm that referenced this pull request Sep 16, 2024
Alvant pushed a commit to compressa-ai/vllm that referenced this pull request Oct 26, 2024
garg-amit pushed a commit to garg-amit/vllm that referenced this pull request Oct 28, 2024
LeiWang1999 pushed a commit to LeiWang1999/vllm-bitblas that referenced this pull request Mar 26, 2025
Labels
ready ONLY add when PR is ready to merge/full CI is needed
4 participants