This repository was archived by the owner on Oct 11, 2024. It is now read-only.
forked from vllm-project/vllm
[Rel Eng] Upstream sync 2024 06 11 #298
Merged
93 commits
- 4b41095 [CI/Build] CMakeLists: build all extensions' cmake targets at the sam… (dtrifiro)
- 045812f [Kernel] Refactor CUTLASS kernels to always take scales that reside o… (tlrmchlsmth)
- db09745 [Kernel] Update Cutlass fp8 configs (#5144) (varun-sundar-rabindranath)
- 46b6b26 [Minor] Fix the path typo in loader.py: save_sharded_states.py -> sav… (dashanji)
- 5b5c2b9 [Bugfix] Fix call to init_logger in openai server (#4765) (NadavShmayo)
- cb6b7a0 [Feature][Kernel] Support bitsandbytes quantization and QLoRA (#4776) (chenqianfzh)
- 9c2a759 [Bugfix] Remove deprecated @abstractproperty (#5174) (zhuohan123)
- fd82eff [Bugfix]: Fix issues related to prefix caching example (#5177) (#5180) (Delviet)
- 5b6b8ed [BugFix] Prevent `LLM.encode` for non-generation Models (#5184) (robertgshaw2-redhat)
- 15650a3 Update test_ignore_eos (#4898) (simon-mo)
- dc64b07 [Frontend][OpenAI] Support for returning max_model_len on /v1/models … (Avinash-Raj)
- bfc6bc7 [Kernel][ROCm][AMD] enable fused topk_softmax kernel for moe layer (#… (divakar-amd)
- 5008643 [Misc] Simplify code and fix type annotations in `conftest.py` (#5118) (DarkLight1337)
- c070e44 [Core] Support image processor (#4197) (DarkLight1337)
- 314398c [Core] Remove unnecessary copies in flash attn backend (#5138) (Yard1)
- 1ebb772 [Kernel] Pass a device pointer into the quantize kernel for the scale… (tlrmchlsmth)
- 48e8e3f [CI/BUILD] enable intel queue for longer CPU tests (#4113) (zhouyuan)
- a6f0725 [Misc]: Implement CPU/GPU swapping in BlockManagerV2 (#3834) (Kaiyang-Chen)
- 198d784 New CI template on AWS stack (#5110) (khluu)
- 1923dcb [FRONTEND] OpenAI `tools` support named functions (#5032) (br3no)
- fa0bba2 [Bugfix] Support `prompt_logprobs==0` (#5217) (toslunar)
- d8b71e3 [Bugfix] Add warmup for prefix caching example (#5235) (zhuohan123)
- 1d88071 [Kernel] Enhance MoE benchmarking & tuning script (#4921) (WoosukKwon)
- 7899055 [Bugfix]: During testing, use pytest monkeypatch for safely overridin… (afeldman-nm)
- 0e8a84d [Bugfix] Fix torch.compile() error when using MultiprocessingGPUExecu… (zifeitong)
- 88368d3 [CI/Build] Add inputs tests (#5215) (DarkLight1337)
- 756340a [Bugfix] Fix a bug caused by pip install setuptools>=49.4.0 for CPU b… (DamonFool)
- 789553f [Kernel] Add back batch size 1536 and 3072 to MoE tuning (#5242) (WoosukKwon)
- c57b71e [CI/Build] Simplify model loading for `HfRunner` (#5251) (DarkLight1337)
- 14ec8df [CI/Build] Reducing CPU CI execution time (#5241) (bigPYJ1151)
- 3b6f9d6 [CI] mark AMD test as softfail to prevent blockage (#5256) (simon-mo)
- 06bcc97 [Misc] Add transformers version to collect_env.py (#5259) (mgoin)
- c3a46dd [Misc] update collect env (#5261) (youkaichao)
- c6bcf66 [Bugfix] Fix prompt_logprobs when SamplingParams.detokenize is set to… (zifeitong)
- f5d9197 [Misc] Add CustomOp interface for device portability (#5255) (WoosukKwon)
- bbfee0c [Misc] Fix docstring of get_attn_backend (#5271) (WoosukKwon)
- 47c1256 [Frontend] OpenAI API server: Add `add_special_tokens` to ChatComplet… (tomeras91)
- d619bd9 [CI] Add nightly benchmarks (#5260) (simon-mo)
- 2cf5911 [misc] benchmark_serving.py -- add ITL results and tweak TPOT results… (tlrmchlsmth)
- 8f5fafa [Kernel] Add GPU architecture guards to the CUTLASS w8a8 kernels to r… (tlrmchlsmth)
- 0770930 [Model] Correct Mixtral FP8 checkpoint loading (#5231) (comaniac)
- 8310e34 [BugFix] Apply get_cached_tokenizer to the tokenizer setter of LLM (#… (DriverSong)
- 6e32dd4 [Kernel] Re-tune Mixtral MoE configurations for FP8 on H100 (#5238) (pcmoritz)
- c2c62c8 [Docs] Add Sequoia as sponsors (#5287) (simon-mo)
- ee3104b [Speculative Decoding] Add `ProposerWorkerBase` abstract class (#5252) (njhill)
- 1680d99 [BugFix] Fix log message about default max model length (#5284) (njhill)
- efb32e1 [Bugfix] Make EngineArgs use named arguments for config construction … (mgoin)
- 9a28c64 [Bugfix][Frontend/Core] Don't log exception when AsyncLLMEngine grace… (wuisawesome)
- 2b27f72 [Misc] Skip for logits_scale == 1.0 (#5291) (WoosukKwon)
- 54d2690 [Docs] Add Ray Summit CFP (#5295) (simon-mo)
- cc2aaba [CI] Disable flash_attn backend for spec decode (#5286) (simon-mo)
- d72ae5b [Frontend][Core] Update Outlines Integration from `FSM` to `Guide` (#… (br3no)
- 08fd788 [CI/Build] Update vision tests (#5307) (DarkLight1337)
- cbfd3d9 Bugfix: fix broken of download models from modelscope (#5233) (liuyhwangyh)
- 7bb7e9b [Kernel] Retune Mixtral 8x22b configs for FP8 on H100 (#5294) (pcmoritz)
- fbd60f3 [Frontend] enable passing multiple LoRA adapters at once to generate(… (mgoldey)
- 14a49c2 [Core] Avoid copying prompt/output tokens if no penalties are used (#… (Yard1)
- a60515d [Core] Change LoRA embedding sharding to support loading methods (#5038) (Yard1)
- 653a080 [Misc] Missing error message for custom ops import (#5282) (DamonFool)
- 219a385 [Feature][Frontend]: Add support for `stream_options` in `ChatComplet… (Etelis)
- bd66622 [Misc][Utils] allow get_open_port to be called for multiple times (#5… (youkaichao)
- ed99ec9 [Kernel] Switch fp8 layers to use the CUTLASS kernels (#5183) (tlrmchlsmth)
- 50520b4 Remove Ray health check (#4693) (Yard1)
- 98744f9 Addition of lacked ignored_seq_groups in _schedule_chunked_prefill (#… (JamesLim-sy)
- 334e0a7 [Kernel] Dynamic Per-Token Activation Quantization (#5037) (dsikka)
- 17984a7 [Frontend] Add OpenAI Vision API Support (#5237) (ywang96)
- 3da0119 [Misc] Remove unused cuda_utils.h in CPU backend (#5345) (DamonFool)
- d65c3ab fix DbrxFusedNormAttention missing cache_config (#5340) (Calvinnncy97)
- e349c2d [Bug Fix] Fix the support check for FP8 CUTLASS (#5352) (cli99)
- 4d5b699 [Misc] Add args for selecting distributed executor to benchmarks (#5335) (BKitor)
- f12b636 [ROCm][AMD] Use pytorch sdpa math backend to do naive attention (#4965) (hongxiayang)
- 842974c [CI/Test] improve robustness of test (hf_runner) (#5347) (youkaichao)
- 2a16c03 [CI/Test] improve robustness of test (vllm_runner) (#5357) (youkaichao)
- f8fe956 [Misc][Breaking] Change FP8 checkpoint format from act_scale -> input… (mgoin)
- 550ed83 [Core][CUDA Graph] add output buffer for cudagraph (#5074) (youkaichao)
- 52a90dd [mis][ci/test] fix flaky test in test_sharded_state_loader.py (#5361) (youkaichao)
- d20586a [Kernel][Misc] Use TORCH_LIBRARY instead of PYBIND11_MODULE for custo… (bnellnm)
- 27e68e9 [Bugfix] Fix KeyError: 1 When Using LoRA adapters (#5164) (tdrunsri)
- 8f865f6 [Misc] Update to comply with the new `compressed-tensors` config (#5350) (dsikka)
- d3bd135 [Frontend][Misc] Enforce Pixel Values as Input Type for VLMs in API S… (ywang96)
- b21be06 [misc][typo] fix typo (#5372) (youkaichao)
- 1b41d11 [Misc] Improve error message when LoRA parsing fails (#5194) (DarkLight1337)
- f932e32 [Model] Initial support for LLaVA-NeXT (#4199) (DarkLight1337)
- e3f0b32 [Feature][Frontend]: Continued `stream_options` implementation also … (Etelis)
- f8392d6 [Bugfix] Fix LLaVA-NeXT (#5380) (DarkLight1337)
- 9d82433 [ci] Use small_cpu_queue for doc build (#5331) (khluu)
- a9bd95b [ci] Mount buildkite agent on Docker container to upload benchmark re… (khluu)
- 6823d9e [Docs] Add Docs on Limitations of VLM Support (#5383) (ywang96)
- ca0ae3c [Docs] Alphabetically sort sponsors (#5386) (WoosukKwon)
- 16be761 Bump version to v0.5.0 (#5384) (simon-mo)
- 1444822 format
- 2df326f updated test model logprobs
- 446a144 format
@@ -0,0 +1,26 @@

```bash
#!/usr/bin/env bash

set -euo pipefail

# Install system packages
apt update
apt install -y curl jq

# Install minijinja for templating
curl -sSfL https://github.com/mitsuhiko/minijinja/releases/latest/download/minijinja-cli-installer.sh | sh
source $HOME/.cargo/env

# If BUILDKITE_PULL_REQUEST != "false", then we check the PR labels using curl and jq
if [ "$BUILDKITE_PULL_REQUEST" != "false" ]; then
  PR_LABELS=$(curl -s "https://api.github.com/repos/vllm-project/vllm/pulls/$BUILDKITE_PULL_REQUEST" | jq -r '.labels[].name')

  if [[ $PR_LABELS == *"perf-benchmarks"* ]]; then
    echo "This PR has the 'perf-benchmarks' label. Proceeding with the nightly benchmarks."
  else
    echo "This PR does not have the 'perf-benchmarks' label. Skipping the nightly benchmarks."
    exit 0
  fi
fi

# Upload sample.yaml
buildkite-agent pipeline upload .buildkite/nightly-benchmarks/sample.yaml
```
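The label gate in the script above can be exercised offline, without calling the GitHub API. This is a minimal sketch, assuming the labels arrive as newline-separated names (the form `jq -r '.labels[].name'` prints); `should_run_benchmarks` is a hypothetical helper, not part of this PR. One deliberate difference: the script's `[[ $PR_LABELS == *"perf-benchmarks"* ]]` is a substring match, so a label such as `no-perf-benchmarks` would also pass it, whereas this sketch compares whole label names.

```bash
#!/usr/bin/env bash
# Hypothetical helper mirroring the PR-label gate in the kickoff script.
# Returns 0 (run the benchmarks) or 1 (skip them).
set -eu

should_run_benchmarks() {
    local pr_number="$1"  # value of BUILDKITE_PULL_REQUEST; "false" for non-PR builds
    local labels="$2"     # newline-separated label names, as printed by jq -r
    local label

    # Builds that are not tied to a pull request always run.
    if [ "$pr_number" = "false" ]; then
        return 0
    fi

    # Compare each whole label name instead of doing a substring match.
    while IFS= read -r label; do
        if [ "$label" = "perf-benchmarks" ]; then
            return 0
        fi
    done <<EOF
$labels
EOF
    return 1
}

# Example: a PR labelled "ci" and "perf-benchmarks" proceeds.
if should_run_benchmarks 123 "ci
perf-benchmarks"; then
    echo "run"
else
    echo "skip"
fi
```

Reading the labels line by line with `read` keeps the loop in the current shell, so `return` exits the function directly instead of leaving a subshell.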
@@ -0,0 +1,39 @@

```yaml
steps:
  # NOTE(simon): You can create separate blocks for different jobs
  - label: "A100: NVIDIA SMI"
    agents:
      queue: A100
    plugins:
      - kubernetes:
          podSpec:
            containers:
              # - image: us-central1-docker.pkg.dev/vllm-405802/vllm-ci-test-repo/vllm-test:$BUILDKITE_COMMIT
              # TODO(simon): check latest main branch or use the PR image.
              - image: us-central1-docker.pkg.dev/vllm-405802/vllm-ci-test-repo/vllm-test:45c35f0d58f4508bf43bd6af1d3d0d0ec0c915e6
                command:
                  - bash -c 'nvidia-smi && nvidia-smi topo -m && pwd && ls'
                resources:
                  limits:
                    nvidia.com/gpu: 8
                volumeMounts:
                  - name: devshm
                    mountPath: /dev/shm
            nodeSelector:
              nvidia.com/gpu.product: NVIDIA-A100-SXM4-80GB
            volumes:
              - name: devshm
                emptyDir:
                  medium: Memory
  # TODO(simon): bring H100 online
  # - label: "H100: NVIDIA SMI"
  #   agents:
  #     queue: H100
  #   plugins:
  #     - docker#v5.11.0:
  #         image: us-central1-docker.pkg.dev/vllm-405802/vllm-ci-test-repo/vllm-test:45c35f0d58f4508bf43bd6af1d3d0d0ec0c915e6
  #         command:
  #           - bash -c 'nvidia-smi && nvidia-smi topo -m'
  #         propagate-environment: true
  #         ipc: host
  #         gpus: all
```
@@ -0,0 +1,64 @@

```jinja
{% set docker_image = "public.ecr.aws/q9t5s3a7/vllm-ci-test-repo:$BUILDKITE_COMMIT" %}
{% set default_working_dir = "/vllm-workspace/tests" %}

steps:
  - label: ":docker: build image"
    agents:
      queue: cpu_queue
    commands:
      - "aws ecr-public get-login-password --region us-east-1 | docker login --username AWS --password-stdin public.ecr.aws/q9t5s3a7"
      - "docker build --build-arg max_jobs=16 --tag {{ docker_image }} --target test --progress plain ."
      - "docker push {{ docker_image }}"
    env:
      DOCKER_BUILDKIT: "1"
    retry:
      automatic:
        - exit_status: -1 # Agent was lost
          limit: 5
        - exit_status: -10 # Agent was lost
          limit: 5
  - wait

{% for step in steps %}
  - label: "{{ step.label }}"
    agents:
      {% if step.label == "Documentation Build" %}
      queue: small_cpu_queue
      {% elif step.no_gpu %}
      queue: cpu_queue
      {% elif step.num_gpus == 2 or step.num_gpus == 4 %}
      queue: gpu_4_queue
      {% else %}
      queue: gpu_1_queue
      {% endif %}
    soft_fail: true
    {% if step.parallelism %}
    parallelism: {{ step.parallelism }}
    {% endif %}
    retry:
      automatic:
        - exit_status: -1 # Agent was lost
          limit: 5
        - exit_status: -10 # Agent was lost
          limit: 5
    plugins:
      - docker#v5.2.0:
          image: {{ docker_image }}
          always-pull: true
          propagate-environment: true
          {% if not step.no_gpu %}
          gpus: all
          {% endif %}
          {% if step.label == "Benchmarks" %}
          mount-buildkite-agent: true
          {% endif %}
          command: ["bash", "-c", "cd {{ (step.working_dir or default_working_dir) | safe }} && {{ step.command or (step.commands | join(' && ')) | safe }}"]
          environment:
            - VLLM_USAGE_SOURCE=ci-test
            - HF_TOKEN
            {% if step.label == "Speculative decoding tests" %}
            - VLLM_ATTENTION_BACKEND=XFORMERS
            {% endif %}
          volumes:
            - /dev/shm:/dev/shm
{% endfor %}
```
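The template's branching is easy to lose among the Jinja tags. The sketch below restates two pieces of it in bash: which agent queue a step lands on, and how the `docker` plugin's `command` string is assembled. It assumes each step is described by the `label`, `no_gpu`, `num_gpus`, `working_dir` and `commands` fields the template reads; `select_queue` and `build_command` are hypothetical helpers for illustration, since the real pipeline evaluates this logic when the template is rendered. The step labels and test commands in the examples are made up.

```bash
#!/usr/bin/env bash
# Hypothetical restatement of two pieces of the Jinja template's logic.
set -eu

# Mirrors the {% if %}/{% elif %} chain under `agents:`; the first match wins.
select_queue() {
    local label="$1"     # step.label
    local no_gpu="$2"    # "true" when step.no_gpu is set, otherwise "false"
    local num_gpus="$3"  # step.num_gpus, or 0 when unset
    if [ "$label" = "Documentation Build" ]; then
        echo "small_cpu_queue"
    elif [ "$no_gpu" = "true" ]; then
        echo "cpu_queue"
    elif [ "$num_gpus" -eq 2 ] || [ "$num_gpus" -eq 4 ]; then
        echo "gpu_4_queue"
    else
        # Every other case, including an unlisted GPU count such as 8, lands here.
        echo "gpu_1_queue"
    fi
}

# Mirrors: cd <working_dir or default_working_dir> && <command, or commands joined by ' && '>
# A single step.command behaves like a one-element command list.
build_command() {
    local working_dir="${1:-/vllm-workspace/tests}"  # empty -> default_working_dir
    shift
    local joined="" cmd
    for cmd in "$@"; do
        if [ -z "$joined" ]; then
            joined="$cmd"
        else
            joined="$joined && $cmd"
        fi
    done
    printf 'cd %s && %s\n' "$working_dir" "$joined"
}

select_queue "Documentation Build" true 0
build_command "" "pytest -v -s test_example_a.py" "pytest -v -s test_example_b.py"
```

Because the label check comes first, "Documentation Build" is routed to `small_cpu_queue` regardless of its GPU settings, and the chained `&&` means any failing command stops the rest of the step.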