[CI] Add Buildkite#2355

Merged
simon-mo merged 74 commits into vllm-project:main from simon-mo:ci-buildkite
Jan 14, 2024

Conversation

@simon-mo
Collaborator

@simon-mo simon-mo commented Jan 5, 2024

This PR adds a basic setup for the GPU CI environment. It should enable us to run our tests on L4 GPUs. As a developer, you can add new tests to .buildkite/test-pipeline.yaml.

Currently, I have all tests enabled, in addition to benchmarks. However, I don't want to block the merge of this PR on debugging model output (test_models.py) and tuning memory (test_attention.py), so I marked those tests as "soft fail" for now.

Please reference the latest build here as an example: https://buildkite.com/vllm/ci/builds/182. The end-to-end build time in the worst case (fresh Docker build, slow machine start) is about 1 hour; in the best case (Docker cached, machine available) it is about 15 minutes. Of course, if too many PRs are submitted at the same time, builds may need to wait in a queue. We are capped at 10 GPU machines for budget reasons. The full infrastructure setup is described and maintained in a separate repo: https://github.com/vllm-project/buildkite-ci.

Code changes in this PR are mostly in the .buildkite directory and the associated Dockerfile and setup.py. Everything else is done to make existing tests pass.

Future work includes:

  • Fix test models
  • Fix test kernels
  • Migrate lint, docs, and wheels to CPU only machines
  • Add tests for chat models (test chat template coverage)
  • Benchmark real models (instead of just OPT) on A100
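As a rough illustration of what adding a test to the pipeline looks like, here is a hypothetical .buildkite/test-pipeline.yaml fragment (the step labels and commands are made up for this sketch, not the actual contents of the PR; Buildkite's `soft_fail` option is what backs the "soft fail" behavior described above):

```yaml
# Hypothetical pipeline fragment; the real file is .buildkite/test-pipeline.yaml.
steps:
  - label: "Models Test"
    command: pytest -v -s tests/models
    soft_fail: true   # don't block the build while model outputs are being debugged
  - label: "Kernels Test"
    command: pytest -v -s tests/kernels
    soft_fail: true   # attention kernel memory tuning still in progress
```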

@simon-mo
Collaborator Author

note to self: the one last blocker is the memory requirement for the Hugging Face model. We can run models on 2xL4 in vLLM, but Hugging Face doesn't have a good TP strategy that's simple to use. Currently I'm trying accelerate to do offload, but if it doesn't work we might go to A100.

@simon-mo
Collaborator Author

simon-mo commented Jan 12, 2024

Todos:

  • Migrate lint, docs, and wheels to CPU-only machines
  • Add tests for chat models (test chat template coverage)

@simon-mo
Collaborator Author

I soft-failed the kernels and models tests. The models ran successfully with bfloat16, but some outputs don't match :(. The kernel is too difficult to tune.

Member

@zhuohan123 zhuohan123 left a comment

Thanks for the hard work Simon! Left some small comments.

Comment on lines +65 to +66
if max_tries == 0:
    raise RuntimeError("Server did not start") from err
Member


Suggested change
  if max_tries == 0:
      raise RuntimeError("Server did not start") from err
+ max_tries -= 1
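To make the bug behind this suggestion concrete: without the decrement, max_tries never reaches zero and the loop retries forever. A self-contained sketch of such a wait loop (function and parameter names here are illustrative, not the PR's actual code):

```python
import time


def wait_for_server(ping, max_tries=5, delay=0.0):
    """Poll `ping()` until it succeeds or the retry budget is exhausted."""
    while True:
        try:
            return ping()
        except ConnectionError as err:
            if max_tries == 0:
                raise RuntimeError("Server did not start") from err
            max_tries -= 1  # the suggested fix: actually consume the retry budget
            time.sleep(delay)
```

With the decrement in place, a server that comes up after a couple of failed pings is reached, while a server that never starts raises RuntimeError after the budget runs out.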


import pytest
import torch
import ray
Member


Just curious, why change to ray?

Collaborator Author


In the code below, I added a comment saying Ray gives much better logs for debugging purposes (I couldn't figure out the failures from multiprocessing).

Member


gotcha! makes a lot of sense.

  @pytest.mark.parametrize("model", MODELS)
- @pytest.mark.parametrize("dtype", ["float"])
+ # half is required to get this working on CI's L4 GPU
+ @pytest.mark.parametrize("dtype", ["half"])
Member


Let's keep this as float, since otherwise the test will fail on A100s.

Member


We can implement a simpler test for L4 in another PR.
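Neither approach is in this PR, but one way to reconcile the two environments in a follow-up would be to select the dtype from the GPU's memory budget. A sketch with a hypothetical helper and made-up thresholds (the 40 GB cutoff and function name are assumptions, not anything from the PR):

```python
# Hypothetical helper: choose the test dtype from the GPU's memory budget so
# the same test file can serve both CI's L4 (24 GB) and an A100 (40/80 GB).
def pick_test_dtype(gpu_memory_gb: float, threshold_gb: float = 40.0) -> str:
    # Small-memory CI GPUs need "half" to fit the models; larger GPUs can
    # afford "float" and keep the stricter output comparison.
    return "float" if gpu_memory_gb >= threshold_gb else "half"
```

The result could then feed a single `@pytest.mark.parametrize("dtype", [...])` list instead of hard-coding "float" or "half".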

@zhuohan123
Member

@simon-mo Let’s expedite this PR so that we can have CI working for all other PRs?

@simon-mo simon-mo merged commit 6e01e8c into vllm-project:main Jan 14, 2024
hongxiayang pushed a commit to hongxiayang/vllm that referenced this pull request Jan 18, 2024
hongxiayang pushed a commit to hongxiayang/vllm that referenced this pull request Feb 13, 2024