
Compatible with Decapoda Research llama hf version #251

Merged
zhuohan123 merged 1 commit into vllm-project:main from BasicCoder:patch-1
Jun 26, 2023


@BasicCoder
Contributor

The Decapoda Research LLaMA checkpoints in Hugging Face format declare the following in the model's config.json:
    "architectures": ["LLaMAForCausalLM"]
This name can be treated as an alias for the Llama architecture.
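
As a rough illustration of the aliasing idea, the sketch below maps both spellings of the architecture name to one model class. The registry dictionary, the placeholder class, and `resolve_model_class` are hypothetical names made up for this example and are not taken from the vLLM code base.

```python
class LlamaForCausalLM:
    """Placeholder standing in for a real model implementation."""


# Hypothetical registry: architecture name from config.json -> model class.
# The second key is the spelling found in the Decapoda Research config.json.
MODEL_REGISTRY = {
    "LlamaForCausalLM": LlamaForCausalLM,
    "LLaMAForCausalLM": LlamaForCausalLM,  # alias, resolves to the same class
}


def resolve_model_class(config: dict):
    """Return the class for the first known name in config["architectures"]."""
    architectures = config.get("architectures", [])
    for arch in architectures:
        if arch in MODEL_REGISTRY:
            return MODEL_REGISTRY[arch]
    raise ValueError(f"Unsupported architectures: {architectures}")
```

With an alias entry like this, a config that lists `LLaMAForCausalLM` resolves to the same class as one that lists `LlamaForCausalLM`, while unknown names still fail loudly.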
@zhuohan123 zhuohan123 self-requested a review June 26, 2023 15:11
Member

@zhuohan123 zhuohan123 left a comment

LGTM! Thank you for your contribution to vLLM!

@zhuohan123 zhuohan123 merged commit 471a7a4 into vllm-project:main Jun 26, 2023
michaelfeil pushed a commit to michaelfeil/vllm that referenced this pull request Jul 1, 2023
hongxiayang pushed a commit to hongxiayang/vllm that referenced this pull request Feb 13, 2024
yukavio pushed a commit to yukavio/vllm that referenced this pull request Jul 3, 2024
SUMMARY:
* make `magic-wand` version check robust

TEST PLAN:
Runs on remote push. NIGHTLY and RELEASE will be triggered manually
against this branch.

```bash
andy@waldorf:~$ cat test.sh 
#!/bin/bash

set -euo pipefail

MAGIC_WAND=$(pip3 show nm-magic-wand-nightly | grep "Version" | cut -d' ' -f2) || echo "nightly not installed" 
if [ -z "$MAGIC_WAND" ]; then
    MAGIC_WAND=$(pip3 show nm-magic-wand | grep "Version" | cut -d' ' -f2)
fi

echo ${MAGIC_WAND}

andy@waldorf:~$ ./test.sh 
WARNING: Package(s) not found: nm-magic-wand-nightly
nightly not installed
0.2.2
andy@waldorf:~$ echo $?
0
```

When "nightly" is installed:

```bash
andy@waldorf:~$ ./test.sh 
0.2.2.20240520
```
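
The same "try nightly, then fall back to the stable package" lookup can be sketched in Python with the standard-library `importlib.metadata` module. This is only an illustration of the fallback logic, not the script used in the commit; the helper name `first_installed_version` and its `lookup` parameter are assumptions introduced here.

```python
from importlib import metadata


def first_installed_version(candidates, lookup=metadata.version):
    """Return the version of the first installed distribution, else None.

    `candidates` is an ordered list of distribution names, for example
    ["nm-magic-wand-nightly", "nm-magic-wand"]. `lookup` is injectable so
    the function can be exercised without installing anything.
    """
    for name in candidates:
        try:
            return lookup(name)
        except metadata.PackageNotFoundError:
            # Not installed: fall through to the next candidate.
            continue
    return None
```

Returning `None` rather than raising leaves it to the caller to decide whether a missing package is an error, which mirrors the shell script exiting with status 0 when only the stable package is present.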

---------

Co-authored-by: andy-neuma <andy@neuralmagic.com>
jikunshang pushed a commit to jikunshang/vllm that referenced this pull request Sep 11, 2024
This PR fixes crashes on older Synapse builds that were introduced with
HabanaAI#227. Setting
PT_COMPILE_ONLY_MODE is not supported in current or older public Synapse
builds. Instead of crashing because of it, we should advise the
user to upgrade to the latest build.

Previous behavior:
```
...
INFO 09-06 17:08:37 habana_executor.py:85] # HPU blocks: 10761, # CPU blocks: 910
INFO 09-06 17:08:37 habana_worker.py:201] Initializing cache engine took 47.29 GiB of device memory (54.34 GiB/94.62 GiB used) and -159.6 MiB of host memory (414.9 GiB/1007 GiB used)
[rank0]: Traceback (most recent call last):
[rank0]:   File "/software/users/kzawora/vllm-utils/vllm_hpu_simple_test.py", line 9, in <module>
[rank0]:     llm = LLM(model="facebook/opt-125m")
[rank0]:   File "/software/users/kzawora/vllm-fork/vllm/entrypoints/llm.py", line 155, in __init__
[rank0]:     self.llm_engine = LLMEngine.from_engine_args(
[rank0]:   File "/software/users/kzawora/vllm-fork/vllm/engine/llm_engine.py", line 456, in from_engine_args
[rank0]:     engine = cls(
[rank0]:   File "/software/users/kzawora/vllm-fork/vllm/engine/llm_engine.py", line 266, in __init__
[rank0]:     self._initialize_kv_caches()
[rank0]:   File "/software/users/kzawora/vllm-fork/vllm/engine/llm_engine.py", line 378, in _initialize_kv_caches
[rank0]:     self.model_executor.initialize_cache(num_gpu_blocks, num_cpu_blocks)
[rank0]:   File "/software/users/kzawora/vllm-fork/vllm/executor/habana_executor.py", line 89, in initialize_cache
[rank0]:     self.driver_worker.initialize_cache(num_gpu_blocks, num_cpu_blocks)
[rank0]:   File "/software/users/kzawora/vllm-fork/vllm/worker/habana_worker.py", line 202, in initialize_cache
[rank0]:     self._warm_up_model()
[rank0]:   File "/software/users/kzawora/vllm-fork/vllm/worker/habana_worker.py", line 220, in _warm_up_model
[rank0]:     self.model_runner.warmup_model(self.hpu_cache[0])
[rank0]:   File "/usr/local/lib/python3.10/dist-packages/torch/utils/_contextlib.py", line 115, in decorate_context
[rank0]:     return func(*args, **kwargs)
[rank0]:   File "/software/users/kzawora/vllm-fork/vllm/worker/habana_model_runner.py", line 1412, in warmup_model
[rank0]:     with compile_only_mode_context():
[rank0]:   File "/usr/lib/python3.10/contextlib.py", line 135, in __enter__
[rank0]:     return next(self.gen)
[rank0]:   File "/usr/local/lib/python3.10/dist-packages/habana_frameworks/torch/internal/bridge_config.py", line 20, in env_setting
[rank0]:     get_func = globals()['get_' + var.lower()]
[rank0]: KeyError: 'get_pt_compile_only_mode'
inc shutdown
inc shutdown
inc shutdown
inc shutdown
```

Current behavior:

```
...
INFO 09-06 17:06:42 habana_executor.py:85] # HPU blocks: 10761, # CPU blocks: 910
INFO 09-06 17:06:43 habana_worker.py:201] Initializing cache engine took 47.29 GiB of device memory (54.34 GiB/94.62 GiB used) and -143.7 MiB of host memory (415 GiB/1007 GiB used)
WARNING 09-06 17:06:43 habana_model_runner.py:1419] Cannot use PT_COMPILE_ONLY_MODE. Warmup time will be negatively impacted. Please update Gaudi Software Suite.
INFO 09-06 17:06:43 habana_model_runner.py:1336] [Warmup][Prompt][1/23] batch_size:2 seq_len:1024 free_mem:40.28 GiB
...
```
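
The change in behavior shown above (a warning instead of a `KeyError` when entering the context) can be sketched as follows. This is a minimal illustration under stated assumptions, not the code from the commit: the two context managers are hypothetical stand-ins for the real `compile_only_mode_context`, and `warmup` is an invented helper.

```python
import contextlib
import logging

logger = logging.getLogger("warmup_sketch")


@contextlib.contextmanager
def unsupported_compile_only_mode():
    """Hypothetical stand-in for a build that lacks the setting."""
    raise KeyError("get_pt_compile_only_mode")
    yield  # unreachable; keeps this function a generator


@contextlib.contextmanager
def supported_compile_only_mode():
    """Hypothetical stand-in for a build that supports the setting."""
    yield


def warmup(context_factory, do_warmup):
    """Run do_warmup(), using the optional context only if it can be entered.

    Returns True when the context was active during warmup, False otherwise.
    """
    with contextlib.ExitStack() as stack:
        try:
            stack.enter_context(context_factory())
            active = True
        except KeyError:
            # Only the failure to enter the context is swallowed here;
            # errors raised by do_warmup() itself still propagate.
            logger.warning(
                "Cannot use PT_COMPILE_ONLY_MODE. Warmup time will be "
                "negatively impacted. Please update Gaudi Software Suite."
            )
            active = False
        do_warmup()
    return active
```

Using `ExitStack` keeps the `try` block limited to entering the context, so a `KeyError` raised inside the warmup body is not mistaken for a missing setting.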
billishyahao pushed a commit to billishyahao/vllm that referenced this pull request Dec 31, 2024
* rocm support for moe tuning script

- add rocm triton search space and pruning
- Ray fix: use device id for multi-gpu tuning

* current_platform.is_rocm(), not is_navi()
wuhuikx pushed a commit to wuhuikx/vllm that referenced this pull request Mar 27, 2025
…ct#251) (vllm-project#270)

### What this PR does / why we need it?
Backport: vllm-project/vllm-ascend#251

Add a dispatch job that assigns jobs to devices dynamically. It has two
stages, described below.

The dispatch job adds roughly `10s * parallel number + 30s` of extra
time while it waits for other jobs to launch their containers and release the lock.

- **Stage 1: Acquire lock** add a dispatch job that uses lockfile to
acquire a lock and then gets the device number dynamically
- **Stage 2.1: Launch container with dynamic device** pass the device
number via the job output and start the container job on that device
- **Stage 2.2: Release lock** once the job has started, release the lock.
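
The acquire, launch, release sequence above can be sketched in Python. The commit message describes a CI job built on the `lockfile` utility; the sketch below swaps in atomic exclusive file creation (`os.open` with `O_CREAT | O_EXCL`) to show the same idea. All names here (`file_lock`, `dispatch`, `demo`, the device numbers, and the `busy_devices` set) are hypothetical.

```python
import contextlib
import os
import tempfile
import time


@contextlib.contextmanager
def file_lock(path, timeout=5.0, poll=0.05):
    """Hold an exclusive lock by atomically creating `path`."""
    deadline = time.monotonic() + timeout
    while True:
        try:
            fd = os.open(path, os.O_CREAT | os.O_EXCL | os.O_WRONLY)
            break
        except FileExistsError:
            if time.monotonic() >= deadline:
                raise TimeoutError(f"could not acquire {path}")
            time.sleep(poll)
    os.close(fd)
    try:
        yield
    finally:
        os.remove(path)  # release the lock


def dispatch(lock_path, all_devices, busy_devices, launch):
    """Pick a free device under the lock, launch on it, then release."""
    with file_lock(lock_path):                      # Stage 1: acquire lock
        free = [d for d in all_devices if d not in busy_devices]
        if not free:
            return None
        device = free[0]
        launch(device)                              # Stage 2.1: start the job
        busy_devices.add(device)
    return device                                   # Stage 2.2: lock released


def demo():
    """Dispatch four jobs onto three hypothetical devices."""
    lock_path = os.path.join(tempfile.mkdtemp(), "dispatch.lock")
    busy, started = set(), []
    return [dispatch(lock_path, [2, 3, 4], busy, started.append) for _ in range(4)]
```

Holding the lock until the launch call returns is what prevents two parallel jobs from choosing the same device, at the cost of the extra waiting time mentioned above.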

On the backend, we use multiple paths to set up multiple self-hosted runners
that act as a load balancer:
```
$ pwd
/home/action
$ ll | grep actions
drwx------   6 action action 4096 Mar  7 08:55 actions-runner-01
drwx------   6 action action 4096 Mar  7 08:55 actions-runner-02
drwx------   6 action action 4096 Mar  7 08:55 actions-runner-03
drwx------   6 action action 4096 Mar  7 08:56 actions-runner-04
drwx------   4 action action 4096 Jan 24 22:08 actions-runner-05
drwx------   4 action action 4096 Jan 24 22:08 actions-runner-06
```

```
adduser -G docker action
su action
pip3 install docker prettytable
sudo yum install procmail
```

### Does this PR introduce _any_ user-facing change?
NO

### How was this patch tested?
- CI passed
- E2E tested manually by triggering 3 jobs in parallel:
- [1st
job](https://github.com/vllm-project/vllm-ascend/actions/runs/13711345757/job/38348309297)
dispatch to /dev/davinci2.
- [2nd
job](https://github.com/vllm-project/vllm-ascend/actions/runs/13711348739/job/38348316250)
dispatch to /dev/davinci3
- [3rd
job](https://github.com/vllm-project/vllm-ascend/actions/runs/13711351493/job/38348324551)
dispatch to /dev/davinci4


Signed-off-by: Yikun Jiang <yikunkero@gmail.com>
amy-why-3459 pushed a commit to amy-why-3459/vllm that referenced this pull request Sep 15, 2025
### What this PR does / why we need it?
Add a dispatch job that assigns jobs to devices dynamically. It has two
stages, described below.

The dispatch job adds roughly `10s * parallel number + 30s` of extra
time while it waits for other jobs to launch their containers and release the lock.

- **Stage 1: Acquire lock**
add a dispatch job that uses lockfile to acquire a lock and then gets the
device number dynamically
- **Stage 2.1: Launch container with dynamic device**
pass the device number via the job output and start the container job on
that device
- **Stage 2.2: Release lock**
once the job has started, release the lock.

On the backend, we use multiple paths to set up multiple self-hosted runners
that act as a load balancer:
```
$ pwd
/home/action
$ ll | grep actions
drwx------   6 action action 4096 Mar  7 08:55 actions-runner-01
drwx------   6 action action 4096 Mar  7 08:55 actions-runner-02
drwx------   6 action action 4096 Mar  7 08:55 actions-runner-03
drwx------   6 action action 4096 Mar  7 08:56 actions-runner-04
drwx------   4 action action 4096 Jan 24 22:08 actions-runner-05
drwx------   4 action action 4096 Jan 24 22:08 actions-runner-06
```

```
adduser -G docker action
su action
pip3 install docker prettytable
sudo yum install procmail
```

### Does this PR introduce _any_ user-facing change?
NO

### How was this patch tested?
- CI passed
- E2E tested manually by triggering 3 jobs in parallel:
- [1st
job](https://github.com/vllm-project/vllm-ascend/actions/runs/13711345757/job/38348309297)
dispatch to /dev/davinci2.
- [2nd
job](https://github.com/vllm-project/vllm-ascend/actions/runs/13711348739/job/38348316250)
dispatch to /dev/davinci3
- [3rd
job](https://github.com/vllm-project/vllm-ascend/actions/runs/13711351493/job/38348324551)
dispatch to /dev/davinci4

Signed-off-by: Yikun Jiang <yikunkero@gmail.com>
iwooook pushed a commit to moreh-dev/vllm that referenced this pull request Nov 29, 2025
…llm-project#251)

* Check if qwen-vl-utils import succeeded, print nice warning if not.
* Add dependency.
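
A guarded optional import of the kind described in the first bullet might look like the sketch below. It is an illustration only: the helper name `optional_import` and the warning text are invented here, and the importable module name for the `qwen-vl-utils` package is assumed to be `qwen_vl_utils`.

```python
import importlib
import logging

logger = logging.getLogger("optional_import_sketch")


def optional_import(module_name, install_hint):
    """Import a module if available; otherwise log a warning and return None."""
    try:
        return importlib.import_module(module_name)
    except ImportError:
        logger.warning(
            "Optional dependency %r is not available. %s",
            module_name,
            install_hint,
        )
        return None


# Assumed module name for the qwen-vl-utils package; callers check for None
# before using any functionality that depends on it.
qwen_vl_utils = optional_import(
    "qwen_vl_utils", "Install it with: pip install qwen-vl-utils"
)
```

Returning `None` lets the rest of the program start normally and fail only on the code paths that actually need the missing package.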

Co-authored-by: Salar Hosseini <159165450+skhorasganiTT@users.noreply.github.com>