Merge EmbeddedLLM/vllm-rocm into vLLM main by tjtanaa · Pull Request #1749 · vllm-project/vllm

tjtanaa · 2023-11-22T05:17:16Z

Checklist:

Merge changes from upstream vllm commit 094f716
Dynamic code path selection for CUDA or ROCm in PyTorch
Pass all unit tests
ROCm Dockerfile
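
The checklist item on dynamic code path selection can be illustrated with a minimal sketch. It assumes the check relies on `torch.version.hip`, which PyTorch sets to a version string on ROCm builds and to `None` otherwise. The helper names `is_hip` and `select_backend` are hypothetical and not taken from this PR, and the stand-in objects exist only so the sketch runs without PyTorch installed.

```python
from types import SimpleNamespace


def is_hip(version_info) -> bool:
    """Return True when the given version info reports a ROCm/HIP build."""
    # torch.version.hip is a version string on ROCm builds and None otherwise.
    return getattr(version_info, "hip", None) is not None


def select_backend(version_info) -> str:
    """Pick a label for the code path to use (hypothetical helper)."""
    if is_hip(version_info):
        return "rocm"
    if getattr(version_info, "cuda", None) is not None:
        return "cuda"
    return "cpu"


# Stand-ins for torch.version (assumed values, for illustration only).
rocm_build = SimpleNamespace(hip="5.7.0", cuda=None)
cuda_build = SimpleNamespace(hip=None, cuda="12.1")
cpu_build = SimpleNamespace(hip=None, cuda=None)
```

With PyTorch installed, the same helper could be called as `select_backend(torch.version)`.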

* port dtype_float16.cuh and cache_kernels.cu
* port dtype_bfloat16.cuh
* port attention_utils.cuh
* port more kernels
* fix typo
* add cuda_compat.h
* sync branches
* update
* update
* fixes
* cleanup
* update
* update
* update
* fmt
* cleanup
* refactor
* update
* detecting rocm and adding flag for compiling
* using asm volatile instead of hip api
* using asm volatile for type casting of f16

---------

Co-authored-by: Philipp Moritz <pcmoritz@gmail.com>
Co-authored-by: Amir Balwel <amoooori04@gmail.com>

simon-mo

Thank you for upstreaming this! We will review soon.

simon-mo · 2023-11-22T05:26:54Z

-
-ENTRYPOINT ["python3", "-m", "vllm.entrypoints.openai.api_server"]
-
+FROM rocm/pytorch:rocm5.7_ubuntu22.04_py3.10_pytorch_2.0.1


Please make a new Dockerfile.

hongxiayang · 2023-11-26T03:30:39Z

+    && git clone https://github.com/ROCmSoftwarePlatform/flash-attention.git \
+    && cd flash-attention \
+    && git submodule update --init \
+    && sed -i -e "s/--offload-arch=native/--offload-arch=$(/opt/rocm/llvm/bin/amdgpu-offload-arch)/g" setup.py \


Thank you for the pull request.
This line is a no-op, since I don't see any reference to offload-arch in the setup.py file.
As a result, when I tested this pull request by building the Docker image from this Dockerfile, the build failed.
Can you check the setup.py file?

/opt/rocm/bin/hipcc -I/app/libs/flash-attention/csrc/flash_attn_rocm -I/app/libs/flash-attention/csrc/flash_attn_rocm/src -I/app/libs/flash-attention/csrc/flash_attn_rocm/composable_kernel/include -I/app/libs/flash-attention/csrc/flash_attn_rocm/composable_kernel/library/include -I/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include -I/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/torch/csrc/api/include -I/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/TH -I/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/THC -I/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/THH -I/opt/rocm/include -I/opt/conda/envs/py_3.10/include/python3.10 -c -c /app/libs/flash-attention/csrc/flash_attn_rocm/src/flash_bwd_runner_batched_hdim32_bf16_causal_gfx9x_hip.hip -o /app/libs/flash-attention/build/temp.linux-x86_64-cpython-310/csrc/flash_attn_rocm/src/flash_bwd_runner_batched_hdim32_bf16_causal_gfx9x_hip.o -fPIC -D__HIP_PLATFORM_HCC__=1 -DUSE_ROCM=1 -DCUDA_HAS_FP16=1 -D__HIP_NO_HALF_OPERATORS__=1 -D__HIP_NO_HALF_CONVERSIONS__=1 -O3 -std=c++20 -DNDEBUG -U__CUDA_NO_HALF_OPERATORS__ -U__CUDA_NO_HALF_CONVERSIONS__ --offload-arch=native -DTORCH_API_INCLUDE_EXTENSION_H '-DPYBIND11_COMPILER_TYPE="_gcc"' '-DPYBIND11_STDLIB="_libstdcpp"' '-DPYBIND11_BUILD_ABI="_cxxabi1016"' -DTORCH_EXTENSION_NAME=flash_attn_2_cuda -D_GLIBCXX_USE_CXX11_ABI=1 -fno-gpu-rdc
clang++: error: cannot determine amdgcn architecture: /opt/rocm/llvm/bin/amdgpu-arch: ; consider passing it via '--offload-arch'
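
The failure mode described in the comment above, a substitution that silently matches nothing, can be guarded against. The following is a minimal sketch in Python rather than `sed`. The function name `replace_offload_arch` and the sample strings are hypothetical, and `gfx90a` is only an example architecture value, not one taken from this thread.

```python
def replace_offload_arch(text: str, arch: str) -> str:
    """Replace --offload-arch=native with a concrete arch, failing if absent."""
    needle = "--offload-arch=native"
    if needle not in text:
        # A sed substitution would exit successfully here and leave the
        # file untouched, so the problem only shows up later in the build.
        raise ValueError(f"{needle!r} not found; substitution would be a no-op")
    return text.replace(needle, f"--offload-arch={arch}")


# Hypothetical file contents, for illustration only.
with_flag = 'flags = ["-O3", "--offload-arch=native"]'
without_flag = 'flags = ["-O3"]'

patched = replace_offload_arch(with_flag, "gfx90a")
```

Failing loudly at the patch step makes it clear that the file being patched does not contain the flag, rather than surfacing the problem later as a compiler error.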

tanpinsiang · Nov 26, 2023

We implemented a temporary solution during the build process with ROCm/flash-attention@edc7698.
The issue with the hardcoded --offload-arch=native has since been resolved by commit ROCm/flash-attention@5f1ae07, so the temporary fix appears to be no longer necessary. Once we have tested the most recent version of flash-attention, we plan to revise the Dockerfile accordingly.

hongxiayang · Nov 27, 2023

Yes, use a specific commit of a named branch rather than the default branch, which might keep changing; that gives a stable and reproducible result. As for the name of the Dockerfile, you might want to rename it to Dockerfile.rocm_xxx, where xxx reflects the version of ROCm you are using.
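
Pinning a dependency to a commit can be sketched as follows. This is a hypothetical helper, not code from the PR: it only builds the `git` command lines for cloning a repository and checking out a fixed commit. The repository URL and the commit `5f1ae07` are the ones quoted in this thread; the destination directory name is an assumption.

```python
from typing import List


def pinned_clone_commands(repo_url: str, dest: str, commit: str) -> List[List[str]]:
    """Build git commands that clone a repo and check out one fixed commit."""
    return [
        ["git", "clone", repo_url, dest],
        # Checking out a commit hash keeps the build independent of
        # whatever the default branch points to later.
        ["git", "-C", dest, "checkout", commit],
        ["git", "-C", dest, "submodule", "update", "--init"],
    ]


commands = pinned_clone_commands(
    "https://github.com/ROCmSoftwarePlatform/flash-attention.git",
    "flash-attention",  # assumed destination directory
    "5f1ae07",
)
```

Each inner list could be passed to `subprocess.run(cmd, check=True)`, or the same three steps could be written as `RUN` commands in a Dockerfile.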

tjtanaa · 2023-11-29T16:02:42Z

@hongxiayang @WoosukKwon @simon-mo
I am closing this PR and will continue the work in PR https://github.com/EmbeddedLLM/vllm-rocm/pull/17
🙏

simon-mo · 2023-12-10T22:02:57Z

The full version is merged in #1836!

tjtanaa and others added 20 commits October 27, 2023 00:27

Imported ROCm flash_attn-related xformers modules

eea4631

Ported vLLM 0.2.x to ROCm

fc2d074

Added hip adaptation in squeezellm layer

998a80d

Merged latest vllm main branch

cddb9b2

Added multi-gpu support, workaround for safetensors weight loading problem and xformers license

726cddf

vLLM vllm-project#1531

bf999b1

Omitted certain block sizes

7f5cf5b

Forced contiguous qkv for flash attention1 support

edab2f4

Fixed whatever this is

1c1bb0f

Removed debug print

1815c0a

Updated readme

749bc86

Disabled awq for now

b4d6f2e

Adapt to ROCm flash-attention2 interface

9be4bba

Update readme

077c77c

Update readme

3a0eea4

Merge branch 'main' into v0.2.1.post1-rocm-dev

89e8cf4

Merge branch 'main' into v0.2.1.post1-rocm-dev

168b6e6

Update docker

343d234

Update readme

5abe1e5

tanpinsiang mentioned this pull request Nov 22, 2023

Merging with vLLM main branch EmbeddedLLM/vllm#12

Closed

simon-mo reviewed Nov 22, 2023

tjtanaa mentioned this pull request Nov 22, 2023

Roadmap EmbeddedLLM/vllm#4

Closed

15 tasks

hongxiayang reviewed Nov 26, 2023

tjtanaa mentioned this pull request Nov 29, 2023

[Continuation] Merge EmbeddedLLM/vllm-rocm into vLLM main #1836

Merged

6 tasks

tjtanaa closed this Nov 29, 2023

kliuae deleted the vllm-rocm-merge-to-vllm branch December 1, 2023 17:21

WoosukKwon mentioned this pull request Dec 10, 2023

[Do not merge] Hacks for the ROCm port #1314

Closed

tjtanaa wants to merge 20 commits into vllm-project:main from EmbeddedLLM:vllm-rocm-merge-to-vllm