Compatible with Decapoda Research llama hf version #251
Merged
zhuohan123 merged 1 commit into vllm-project:main on Jun 26, 2023
Conversation
For the Decapoda Research llama HF version, the model's config.json contains:
"architectures": ["LLaMAForCausalLM"]
This spelling can be treated as an alias for the Llama architecture.
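In practice, this means treating the legacy "LLaMAForCausalLM" spelling as an alias for the canonical Llama architecture when resolving config.json. A minimal sketch of that idea, assuming a simple registry dict (the names below are illustrative, not the exact vLLM internals):

```python
# Illustrative sketch: map both architecture spellings to the same model class.
from transformers import AutoConfig

_ARCH_ALIASES = {
    "LlamaForCausalLM": "LlamaForCausalLM",   # canonical HF spelling
    "LLaMAForCausalLM": "LlamaForCausalLM",   # legacy Decapoda Research spelling
}


def resolve_architecture(model_name_or_path: str) -> str:
    """Return the canonical architecture name for a checkpoint's config.json."""
    config = AutoConfig.from_pretrained(model_name_or_path)
    for arch in config.architectures or []:
        if arch in _ARCH_ALIASES:
            return _ARCH_ALIASES[arch]
    raise ValueError(f"Unsupported architectures: {config.architectures}")
```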
zhuohan123 approved these changes on Jun 26, 2023
Member
zhuohan123 left a comment
LGTM! Thank you for your contribution to vLLM!
michaelfeil pushed a commit to michaelfeil/vllm that referenced this pull request on Jul 1, 2023
hongxiayang pushed a commit to hongxiayang/vllm that referenced this pull request on Feb 13, 2024
yukavio pushed a commit to yukavio/vllm that referenced this pull request on Jul 3, 2024
SUMMARY:
* make `magic-wand` version check robust
TEST PLAN:
Runs on remote push. NIGHTLY and RELEASE will be triggered manually against this branch.
```bash
andy@waldorf:~$ cat test.sh
#!/bin/bash
set -euo pipefail
MAGIC_WAND=$(pip3 show nm-magic-wand-nightly | grep "Version" | cut -d' ' -f2) || echo "nightly not installed"
if [ -z "$MAGIC_WAND" ]; then
MAGIC_WAND=$(pip3 show nm-magic-wand | grep "Version" | cut -d' ' -f2)
fi
echo ${MAGIC_WAND}
andy@waldorf:~$ ./test.sh
WARNING: Package(s) not found: nm-magic-wand-nightly
nightly not installed
0.2.2
andy@waldorf:~$ echo $?
0
```
when "nightly" is installed ...
```bash
andy@waldorf:~$ ./test.sh
0.2.2.20240520
```
---------
Co-authored-by: andy-neuma <andy@neuralmagic.com>
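For comparison, the same nightly-then-stable fallback can be written in Python with importlib.metadata. This is only an illustrative sketch; the helper function is hypothetical and just the package names come from the script above.

```python
# Illustrative sketch mirroring test.sh: prefer the nightly package, fall back
# to the stable one, and fail loudly only if neither is installed.
from importlib.metadata import PackageNotFoundError, version


def magic_wand_version() -> str:
    for package in ("nm-magic-wand-nightly", "nm-magic-wand"):
        try:
            return version(package)
        except PackageNotFoundError:
            continue
    raise RuntimeError("neither nm-magic-wand-nightly nor nm-magic-wand is installed")


if __name__ == "__main__":
    print(magic_wand_version())
```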
jikunshang pushed a commit to jikunshang/vllm that referenced this pull request on Sep 11, 2024
This PR fixes crashes observed on older Synapse builds, introduced with HabanaAI#227. Setting PT_COMPILE_ONLY_MODE is not supported in current or older public Synapse builds, but we should not crash because of it; instead, we should advise the user to use the latest build.

Previous behavior:

```
...
INFO 09-06 17:08:37 habana_executor.py:85] # HPU blocks: 10761, # CPU blocks: 910
INFO 09-06 17:08:37 habana_worker.py:201] Initializing cache engine took 47.29 GiB of device memory (54.34 GiB/94.62 GiB used) and -159.6 MiB of host memory (414.9 GiB/1007 GiB used)
[rank0]: Traceback (most recent call last):
[rank0]:   File "/software/users/kzawora/vllm-utils/vllm_hpu_simple_test.py", line 9, in <module>
[rank0]:     llm = LLM(model="facebook/opt-125m")
[rank0]:   File "/software/users/kzawora/vllm-fork/vllm/entrypoints/llm.py", line 155, in __init__
[rank0]:     self.llm_engine = LLMEngine.from_engine_args(
[rank0]:   File "/software/users/kzawora/vllm-fork/vllm/engine/llm_engine.py", line 456, in from_engine_args
[rank0]:     engine = cls(
[rank0]:   File "/software/users/kzawora/vllm-fork/vllm/engine/llm_engine.py", line 266, in __init__
[rank0]:     self._initialize_kv_caches()
[rank0]:   File "/software/users/kzawora/vllm-fork/vllm/engine/llm_engine.py", line 378, in _initialize_kv_caches
[rank0]:     self.model_executor.initialize_cache(num_gpu_blocks, num_cpu_blocks)
[rank0]:   File "/software/users/kzawora/vllm-fork/vllm/executor/habana_executor.py", line 89, in initialize_cache
[rank0]:     self.driver_worker.initialize_cache(num_gpu_blocks, num_cpu_blocks)
[rank0]:   File "/software/users/kzawora/vllm-fork/vllm/worker/habana_worker.py", line 202, in initialize_cache
[rank0]:     self._warm_up_model()
[rank0]:   File "/software/users/kzawora/vllm-fork/vllm/worker/habana_worker.py", line 220, in _warm_up_model
[rank0]:     self.model_runner.warmup_model(self.hpu_cache[0])
[rank0]:   File "/usr/local/lib/python3.10/dist-packages/torch/utils/_contextlib.py", line 115, in decorate_context
[rank0]:     return func(*args, **kwargs)
[rank0]:   File "/software/users/kzawora/vllm-fork/vllm/worker/habana_model_runner.py", line 1412, in warmup_model
[rank0]:     with compile_only_mode_context():
[rank0]:   File "/usr/lib/python3.10/contextlib.py", line 135, in __enter__
[rank0]:     return next(self.gen)
[rank0]:   File "/usr/local/lib/python3.10/dist-packages/habana_frameworks/torch/internal/bridge_config.py", line 20, in env_setting
[rank0]:     get_func = globals()['get_' + var.lower()]
[rank0]: KeyError: 'get_pt_compile_only_mode'
inc shutdown
inc shutdown
inc shutdown
inc shutdown
```

Current behavior:

```
...
INFO 09-06 17:06:42 habana_executor.py:85] # HPU blocks: 10761, # CPU blocks: 910
INFO 09-06 17:06:43 habana_worker.py:201] Initializing cache engine took 47.29 GiB of device memory (54.34 GiB/94.62 GiB used) and -143.7 MiB of host memory (415 GiB/1007 GiB used)
WARNING 09-06 17:06:43 habana_model_runner.py:1419] Cannot use PT_COMPILE_ONLY_MODE. Warmup time will be negatively impacted. Please update Gaudi Software Suite.
INFO 09-06 17:06:43 habana_model_runner.py:1336] [Warmup][Prompt][1/23] batch_size:2 seq_len:1024 free_mem:40.28 GiB
...
```
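The gist of the fix described above is to degrade gracefully: if the compile-only setting is unavailable, log a warning and continue instead of crashing during warmup. A minimal sketch of that pattern, assuming a placeholder `enter_compile_only_mode` callable that stands in for the Synapse bridge_config context manager seen in the traceback (this is an illustration, not the actual vllm-fork code):

```python
# Illustrative "warn instead of crash" wrapper; `enter_compile_only_mode`
# is a placeholder for the real Synapse compile-only-mode context manager.
import contextlib
import logging

logger = logging.getLogger(__name__)


@contextlib.contextmanager
def maybe_compile_only_mode(enter_compile_only_mode):
    """Enable compile-only mode when supported; otherwise warn and proceed."""
    with contextlib.ExitStack() as stack:
        try:
            stack.enter_context(enter_compile_only_mode())
        except KeyError:
            # Older public Synapse builds do not expose this setting.
            logger.warning(
                "Cannot use PT_COMPILE_ONLY_MODE. Warmup time will be "
                "negatively impacted. Please update Gaudi Software Suite.")
        yield
```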
billishyahao pushed a commit to billishyahao/vllm that referenced this pull request on Dec 31, 2024
* ROCm support for the MoE tuning script
  - add the ROCm Triton search space and pruning
  - Ray fix: use the device id for multi-GPU tuning
* use current_platform.is_rocm(), not is_navi()
wuhuikx pushed a commit to wuhuikx/vllm that referenced this pull request on Mar 27, 2025
…ct#251) (vllm-project#270)

### What this PR does / why we need it?
Backport: vllm-project/vllm-ascend#251

Add a dispatch job to assign jobs to devices dynamically. It includes the two stages below; the dispatch job spends roughly an extra `10s * parallel number + 30s` waiting for other jobs to launch their containers and release the lock.

- **Stage 1: Acquire lock.** Add a dispatch job that uses lockfile to acquire locks and then obtains a device number dynamically.
- **Stage 2.1: Launch container with dynamic device.** Pass the device number via the job output and start the container job with that device.
- **Stage 2.2: Release lock.** Once the job has started, release the lock.

In the backend, we use multiple paths to set up multiple self-hosted runners as a load balancer:

```
$ pwd
/home/action
$ ll | grep actions
drwx------ 6 action action 4096 Mar 7 08:55 actions-runner-01
drwx------ 6 action action 4096 Mar 7 08:55 actions-runner-02
drwx------ 6 action action 4096 Mar 7 08:55 actions-runner-03
drwx------ 6 action action 4096 Mar 7 08:56 actions-runner-04
drwx------ 4 action action 4096 Jan 24 22:08 actions-runner-05
drwx------ 4 action action 4096 Jan 24 22:08 actions-runner-06
```

```
adduser -G docker action
su action
pip3 install docker prettytable
sudo yum install procmail
```

### Does this PR introduce _any_ user-facing change?
NO

### How was this patch tested?
- CI passed
- E2E tested manually by triggering 3 jobs in parallel:
  - [1st job](https://github.com/vllm-project/vllm-ascend/actions/runs/13711345757/job/38348309297) dispatched to /dev/davinci2
  - [2nd job](https://github.com/vllm-project/vllm-ascend/actions/runs/13711348739/job/38348316250) dispatched to /dev/davinci3
  - [3rd job](https://github.com/vllm-project/vllm-ascend/actions/runs/13711351493/job/38348324551) dispatched to /dev/davinci4

Signed-off-by: Yikun Jiang <yikunkero@gmail.com>
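The dispatch scheme above (grab one free device under a lock, hand its index to the container job, release once the job has started) can be sketched in a few lines. This is only an illustration of the locking idea, assuming flock-style file locks and a hypothetical helper; the real workflow uses the `lockfile` utility inside a GitHub Actions dispatch job, and only the device paths come from the test log above.

```python
# Illustrative sketch of per-device lock dispatch; not the actual CI code.
import fcntl
import os
from contextlib import contextmanager

DEVICES = ["/dev/davinci2", "/dev/davinci3", "/dev/davinci4"]


@contextmanager
def acquire_free_device(lock_dir="/tmp/npu-locks"):
    """Yield the first device whose lock file can be taken without blocking."""
    os.makedirs(lock_dir, exist_ok=True)
    for dev in DEVICES:
        lock_path = os.path.join(lock_dir, os.path.basename(dev) + ".lock")
        fd = os.open(lock_path, os.O_CREAT | os.O_RDWR)
        try:
            fcntl.flock(fd, fcntl.LOCK_EX | fcntl.LOCK_NB)
        except BlockingIOError:
            os.close(fd)
            continue  # device is busy, try the next one
        try:
            yield dev  # pass the device number to the container job
        finally:
            fcntl.flock(fd, fcntl.LOCK_UN)
            os.close(fd)
        return
    raise RuntimeError("no free device available")
```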
amy-why-3459 pushed a commit to amy-why-3459/vllm that referenced this pull request on Sep 15, 2025
### What this PR does / why we need it?
Add a dispatch job to assign jobs to devices dynamically. It includes the two stages below; the dispatch job spends roughly an extra `10s * parallel number + 30s` waiting for other jobs to launch their containers and release the lock.

- **Stage 1: Acquire lock.** Add a dispatch job that uses lockfile to acquire locks and then obtains a device number dynamically.
- **Stage 2.1: Launch container with dynamic device.** Pass the device number via the job output and start the container job with that device.
- **Stage 2.2: Release lock.** Once the job has started, release the lock.

In the backend, we use multiple paths to set up multiple self-hosted runners as a load balancer:

```
$ pwd
/home/action
$ ll | grep actions
drwx------ 6 action action 4096 Mar 7 08:55 actions-runner-01
drwx------ 6 action action 4096 Mar 7 08:55 actions-runner-02
drwx------ 6 action action 4096 Mar 7 08:55 actions-runner-03
drwx------ 6 action action 4096 Mar 7 08:56 actions-runner-04
drwx------ 4 action action 4096 Jan 24 22:08 actions-runner-05
drwx------ 4 action action 4096 Jan 24 22:08 actions-runner-06
```

```
adduser -G docker action
su action
pip3 install docker prettytable
sudo yum install procmail
```

### Does this PR introduce _any_ user-facing change?
NO

### How was this patch tested?
- CI passed
- E2E tested manually by triggering 3 jobs in parallel:
  - [1st job](https://github.com/vllm-project/vllm-ascend/actions/runs/13711345757/job/38348309297) dispatched to /dev/davinci2
  - [2nd job](https://github.com/vllm-project/vllm-ascend/actions/runs/13711348739/job/38348316250) dispatched to /dev/davinci3
  - [3rd job](https://github.com/vllm-project/vllm-ascend/actions/runs/13711351493/job/38348324551) dispatched to /dev/davinci4

Signed-off-by: Yikun Jiang <yikunkero@gmail.com>
iwooook pushed a commit to moreh-dev/vllm that referenced this pull request on Nov 29, 2025
…llm-project#251)

* Check if the qwen-vl-utils import succeeded, and print a nice warning if not.
* Dependency added.

Co-authored-by: Salar Hosseini <159165450+skhorasganiTT@users.noreply.github.com>
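The import guard mentioned in that commit follows a common pattern; a minimal sketch, assuming the flag name and warning text (only the qwen-vl-utils package name comes from the commit message):

```python
# Illustrative import guard: warn once instead of failing at import time.
import logging

logger = logging.getLogger(__name__)

try:
    import qwen_vl_utils  # noqa: F401
    HAS_QWEN_VL_UTILS = True
except ImportError:
    HAS_QWEN_VL_UTILS = False
    logger.warning(
        "qwen-vl-utils is not installed; Qwen-VL preprocessing helpers will "
        "be unavailable. Install it with `pip install qwen-vl-utils`.")
```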