Support Mono-InternVL with PyTorch backend #2727

Merged: 5 commits, Nov 11, 2024
14 changes: 7 additions & 7 deletions .github/CONTRIBUTING.md
@@ -1,6 +1,6 @@
-## Contributing to InternLM
+## Contributing to LMDeploy

-Welcome to the InternLM community, all kinds of contributions are welcomed, including but not limited to
+Welcome to the LMDeploy community, all kinds of contributions are welcomed, including but not limited to

**Fix bug**

@@ -56,7 +56,7 @@ upstream git@github.com:InternLM/lmdeploy.git (push)

#### 2. Configure pre-commit

-You should configure [pre-commit](https://pre-commit.com/#intro) in the local development environment to make sure the code style matches that of InternLM. **Note**: The following code should be executed under the lmdeploy directory.
+You should configure [pre-commit](https://pre-commit.com/#intro) in the local development environment to make sure the code style matches that of LMDeploy. **Note**: The following code should be executed under the lmdeploy directory.

```shell
pip install -U pre-commit
@@ -96,7 +96,7 @@ git checkout -b yhc/refactor_contributing_doc
In subsequent development, if the master branch of the local repository is behind the master branch of "upstream", we need to pull the upstream for synchronization, and then execute the above command:

```shell
-git pull upstream master
+git pull upstream main
```

#### 4. Commit the code and pass the unit test
@@ -151,7 +151,7 @@ Find more details about Pull Request description in [pull request guidelines](#p

<img src="https://user-images.githubusercontent.com/57566630/167307490-f9ebf9fa-63c0-4d83-8ba1-081ea169eb3a.png" width="1200">

-IternLM will run unit test for the posted Pull Request on different platforms (Linux, Window, Mac), based on different versions of Python, PyTorch, CUDA to make sure the code is correct. We can see the specific test information by clicking `Details` in the above image so that we can modify the code.
+LMDeploy will run unit test for the posted Pull Request on different platforms (Linux, Window, Mac), based on different versions of Python, PyTorch, CUDA to make sure the code is correct. We can see the specific test information by clicking `Details` in the above image so that we can modify the code.

(3) If the Pull Request passes the CI, then you can wait for the review from other developers. You'll modify the code based on the reviewer's comments, and repeat the steps [4](#4-commit-the-code-and-pass-the-unit-test)-[5](#5-push-the-code-to-remote) until all reviewers approve it. Then, we will merge it ASAP.

@@ -163,14 +163,14 @@ If your local branch conflicts with the latest master branch of "upstream", you'

```shell
git fetch --all --prune
-git rebase upstream/master
+git rebase upstream/main
```

or

```shell
git fetch --all --prune
-git merge upstream/master
+git merge upstream/main
```

If you are very good at handling conflicts, then you can use rebase to resolve conflicts, as this will keep your commit logs tidy. If you are not familiar with `rebase`, then you can use `merge` to resolve conflicts.
2 changes: 2 additions & 0 deletions README.md
@@ -26,6 +26,7 @@ ______________________________________________________________________
<details open>
<summary><b>2024</b></summary>

+- \[2024/11\] Support Mono-InternVL with PyTorch engine
- \[2024/10\] PyTorchEngine supports graph mode on ascend platform, doubling the inference speed
- \[2024/09\] LMDeploy PyTorchEngine adds support for [Huawei Ascend](./docs/en/get_started/ascend/get_started.md). See supported models [here](docs/en/supported_models/supported_models.md)
- \[2024/09\] LMDeploy PyTorchEngine achieves 1.3x faster on Llama3-8B inference by introducing CUDA graph
@@ -155,6 +156,7 @@ For detailed inference benchmarks in more devices and more settings, please refe
<li>DeepSeek-VL (7B)</li>
<li>InternVL-Chat (v1.1-v1.5)</li>
<li>InternVL2 (1B-76B)</li>
+<li>Mono-InternVL (2B)</li>
<li>MiniGeminiLlama (7B)</li>
<li>CogVLM-Chat (17B)</li>
<li>CogVLM2-Chat (19B)</li>
2 changes: 2 additions & 0 deletions README_zh-CN.md
@@ -26,6 +26,7 @@ ______________________________________________________________________
<details open>
<summary><b>2024</b></summary>

+- \[2024/11\] PyTorch engine 支持 Mono-InternVL 模型
- \[2024/10\] PyTorchEngine 在 ascend 平台上支持了图模式,推理性能提高了 1 倍
- \[2024/09\] LMDeploy PyTorchEngine 增加了对 [华为 Ascend](docs/zh_cn/get_started/ascend/get_started.md) 的支持。支持的模型请见[这里](docs/zh_cn/supported_models/supported_models.md)
- \[2024/09\] 通过引入 CUDA Graph,LMDeploy PyTorchEngine 在 Llama3-8B 推理上实现了 1.3 倍的加速
@@ -156,6 +157,7 @@ LMDeploy TurboMind 引擎拥有卓越的推理能力,在各种规模的模型
<li>DeepSeek-VL (7B)</li>
<li>InternVL-Chat (v1.1-v1.5)</li>
<li>InternVL2 (1B-76B)</li>
+<li>Mono-InternVL (2B)</li>
<li>MiniGeminiLlama (7B)</li>
<li>CogVLM-Chat (17B)</li>
<li>CogVLM2-Chat (19B)</li>
13 changes: 7 additions & 6 deletions docs/en/multi_modal/internvl.md
@@ -2,12 +2,13 @@

LMDeploy supports the following InternVL series of models, which are detailed in the table below:

-| Model | Size | Supported Inference Engine |
-| :---------: | :--------: | :------------------------: |
-| InternVL | 13B-19B | TurboMind |
-| InternVL1.5 | 2B-26B | TurboMind, PyTorch |
-| InternVL2 | 1B, 4B | PyTorch |
-| InternVL2 | 2B, 8B-76B | TurboMind, PyTorch |
+| Model | Size | Supported Inference Engine |
+| :-----------: | :--------: | :------------------------: |
+| InternVL | 13B-19B | TurboMind |
+| InternVL1.5 | 2B-26B | TurboMind, PyTorch |
+| InternVL2 | 1B, 4B | PyTorch |
+| InternVL2 | 2B, 8B-76B | TurboMind, PyTorch |
+| Mono-InternVL | 2B | PyTorch |

The next chapter demonstrates how to deploy an InternVL model using LMDeploy, with [InternVL2-8B](https://huggingface.co/OpenGVLab/InternVL2-8B) as an example.

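As a quick orientation for this PR, here is a minimal usage sketch of the newly supported model through LMDeploy's VLM pipeline. It is not part of the diff itself; it assumes the public `pipeline`, `PytorchEngineConfig`, and `load_image` APIs, and the image URL is only a placeholder.

```python
from lmdeploy import pipeline, PytorchEngineConfig
from lmdeploy.vl import load_image

# Mono-InternVL is only supported by the PyTorch engine,
# so select that backend explicitly.
pipe = pipeline('OpenGVLab/Mono-InternVL-2B',
                backend_config=PytorchEngineConfig())

# Placeholder image URL; substitute any local path or URL.
image = load_image('https://raw.githubusercontent.com/'
                   'open-mmlab/mmdeploy/main/tests/data/tiger.jpeg')
response = pipe(('describe this image', image))
print(response.text)
```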
1 change: 1 addition & 0 deletions docs/en/multi_modal/vl_pipeline.md
@@ -9,6 +9,7 @@ Currently, it supports the following models.
- [Yi-VL](https://huggingface.co/01-ai/Yi-VL-6B)
- [DeepSeek-VL](https://huggingface.co/deepseek-ai/deepseek-vl-7b-chat)
- [InternVL](https://huggingface.co/OpenGVLab/InternVL-Chat-V1-5)
+- [Mono-InternVL](https://huggingface.co/OpenGVLab/Mono-InternVL-2B)
- [MGM](https://huggingface.co/YanweiLi/MGM-7B)
- [XComposer](https://huggingface.co/internlm/internlm-xcomposer2-vl-7b)
- [CogVLM](https://github.com/InternLM/lmdeploy/tree/main/docs/en/multi_modal/cogvlm.md)
5 changes: 5 additions & 0 deletions docs/en/supported_models/supported_models.md
@@ -80,6 +80,7 @@ The TurboMind engine doesn't support window attention. Therefore, for models tha
| LLaVA(1.5,1.6) | 7B-34B | MLLM | Yes | Yes | Yes | No | - |
| InternVL(v1.5) | 2B-26B | MLLM | Yes | Yes | Yes | No | Yes |
| InternVL2 | 1B-40B | MLLM | Yes | Yes | Yes | No | - |
+| Mono-InternVL | 2B | MLLM | Yes\* | Yes | Yes | No | - |
| Gemma2 | 9B-27B | LLM | Yes | Yes | Yes | No | - |
| GLM4 | 9B | LLM | Yes | Yes | Yes | No | No |
| GLM-4V | 9B | MLLM | Yes | Yes | Yes | No | No |
@@ -88,6 +89,10 @@
| Phi-3.5-MoE | 16x3.8B | LLM | Yes | Yes | No | No | - |
| Phi-3.5-vision | 4.2B | MLLM | Yes | Yes | No | No | - |

+```{note}
+* Currently Mono-InternVL does not support FP16 due to numerical instability. Please use BF16 instead.
+```
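A hedged sketch of what this note implies in practice, assuming `PytorchEngineConfig` accepts a `dtype` argument (present in recent lmdeploy releases; verify against the version you run):

```python
from lmdeploy import pipeline, PytorchEngineConfig

# Force BF16 for Mono-InternVL; FP16 is reported to be numerically unstable.
pipe = pipeline('OpenGVLab/Mono-InternVL-2B',
                backend_config=PytorchEngineConfig(dtype='bfloat16'))
```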

## PyTorchEngine on Huawei Ascend Platform

| Model | Size | Type | FP16/BF16 | W4A16 |
17 changes: 9 additions & 8 deletions docs/zh_cn/multi_modal/internvl.md
@@ -2,14 +2,15 @@

LMDeploy 支持 InternVL 系列模型,具体如下:

-| Model | Size | Supported Inference Engine |
-| :---------: | :--------: | :------------------------: |
-| InternVL | 13B-19B | TurboMind |
-| InternVL1.5 | 2B-26B | TurboMind, PyTorch |
-| InternVL2 | 1B, 4B | PyTorch |
-| InternVL2 | 2B, 8B-76B | TurboMind, PyTorch |
-
-本文将以[InternVL2-8B](https://huggingface.co/OpenGVLab/InternVL2-8B)为例,演示使用 LMDeploy 部署 InternVL 系列模型的方法
+| Model | Size | Supported Inference Engine |
+| :-----------: | :--------: | :------------------------: |
+| InternVL | 13B-19B | TurboMind |
+| InternVL1.5 | 2B-26B | TurboMind, PyTorch |
+| InternVL2 | 1B, 4B | PyTorch |
+| InternVL2 | 2B, 8B-76B | TurboMind, PyTorch |
+| Mono-InternVL | 2B | PyTorch |
+
+本文将以[InternVL2-8B](https://huggingface.co/OpenGVLab/InternVL2-8B)为例,演示使用 LMDeploy 部署 InternVL 系列模型的方法。

## 安装

1 change: 1 addition & 0 deletions docs/zh_cn/multi_modal/vl_pipeline.md
@@ -9,6 +9,7 @@ LMDeploy 把视觉-语言模型(VLM)复杂的推理过程,抽象为简单
- [Yi-VL](https://huggingface.co/01-ai/Yi-VL-6B)
- [DeepSeek-VL](https://huggingface.co/deepseek-ai/deepseek-vl-7b-chat)
- [InternVL](https://huggingface.co/OpenGVLab/InternVL-Chat-V1-5)
+- [Mono-InternVL](https://huggingface.co/OpenGVLab/Mono-InternVL-2B)
- [MGM](https://huggingface.co/YanweiLi/MGM-7B)
- [XComposer](https://huggingface.co/internlm/internlm-xcomposer2-vl-7b)
- [CogVLM](https://github.com/InternLM/lmdeploy/tree/main/docs/zh_cn/multi_modal/cogvlm.md)
5 changes: 5 additions & 0 deletions docs/zh_cn/supported_models/supported_models.md
@@ -80,6 +80,7 @@ turbomind 引擎不支持 window attention。所以,对于应用了 window att
| LLaVA(1.5,1.6) | 7B-34B | MLLM | Yes | Yes | Yes | No | - |
| InternVL(v1.5) | 2B-26B | MLLM | Yes | Yes | Yes | No | Yes |
| InternVL2 | 1B-40B | MLLM | Yes | Yes | Yes | No | - |
+| Mono-InternVL | 2B | MLLM | Yes\* | Yes | Yes | No | - |
| Gemma2 | 9B-27B | LLM | Yes | Yes | Yes | No | - |
| GLM4 | 9B | LLM | Yes | Yes | Yes | No | No |
| GLM-4V | 9B | MLLM | Yes | Yes | Yes | No | No |
@@ -88,6 +89,10 @@
| Phi-3.5-MoE | 16x3.8B | LLM | Yes | Yes | No | No | - |
| Phi-3.5-vision | 4.2B | MLLM | Yes | Yes | No | No | - |

+```{note}
+* Currently Mono-InternVL does not support FP16 due to numerical instability. Please use BF16 instead.
+```

## PyTorchEngine 华为昇腾平台

| Model | Size | Type | FP16/BF16 | W4A16 |
3 changes: 2 additions & 1 deletion lmdeploy/model.py
@@ -578,7 +578,8 @@ def match(cls, model_path: str) -> Optional[str]:
model_path (str): the model path used for matching.
"""
path = model_path.lower()
-if 'internvl2' in path and 'internvl2-4b' not in path:
+if ('internvl2' in path
+        and 'internvl2-4b' not in path) or 'mono-internvl' in path:
return 'internvl2-internlm2'


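Read as a rule: a lowercased model path containing `mono-internvl`, or containing `internvl2` but not the 4B variant, maps to the `internvl2-internlm2` chat template. Below is a self-contained sketch, with a hypothetical helper name standing in for the classmethod above:

```python
from typing import Optional

def resolve_template(model_path: str) -> Optional[str]:
    """Hypothetical standalone version of the matching rule above."""
    path = model_path.lower()
    if ('internvl2' in path
            and 'internvl2-4b' not in path) or 'mono-internvl' in path:
        return 'internvl2-internlm2'
    return None

assert resolve_template('OpenGVLab/Mono-InternVL-2B') == 'internvl2-internlm2'
assert resolve_template('OpenGVLab/InternVL2-8B') == 'internvl2-internlm2'
assert resolve_template('OpenGVLab/InternVL2-4B') is None  # 4B is matched by a different template
```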
2 changes: 1 addition & 1 deletion lmdeploy/pytorch/models/baichuan.py
@@ -167,7 +167,7 @@ def __init__(self,
# build attention layer
self.self_attn = BaichuanAttention(config, dtype=dtype, device=device)

-# builf MLP
+# build MLP
self.mlp = MLP(config, dtype=dtype, device=device)

# build input layer norm
2 changes: 1 addition & 1 deletion lmdeploy/pytorch/models/chatglm2.py
@@ -279,7 +279,7 @@ def __init__(self,
# build attention layer
self.self_attention = SelfAttention(config, dtype=dtype, device=device)

-# builf MLP
+# build MLP
self.mlp = MLP(config, dtype=dtype, device=device)

# build input layer norm
2 changes: 1 addition & 1 deletion lmdeploy/pytorch/models/cogvlm.py
@@ -263,7 +263,7 @@ def __init__(self,
dtype=dtype,
device=device)

-# builf MLP
+# build MLP
self.mlp = VisionExpertMLP(config, dtype=dtype, device=device)

# build input layer norm
2 changes: 1 addition & 1 deletion lmdeploy/pytorch/models/dbrx.py
@@ -301,7 +301,7 @@ def __init__(self,
dtype=dtype,
device=device)

-# builf MLP
+# build MLP
self.ffn = DbrxFFN(config, dtype=dtype, device=device)

def forward(
2 changes: 1 addition & 1 deletion lmdeploy/pytorch/models/deepseek.py
@@ -250,7 +250,7 @@ def __init__(self,
# build attention layer
self.self_attn = DeepseekAttention(config, dtype=dtype, device=device)

-# builf MLP
+# build MLP
self.mlp = (DeepseekMoE(config, dtype=dtype, device=device) if
(config.n_routed_experts is not None
and layer_idx >= config.first_k_dense_replace
2 changes: 1 addition & 1 deletion lmdeploy/pytorch/models/falcon.py
@@ -179,7 +179,7 @@ def __init__(self,
dtype=dtype,
device=device)

-# builf MLP
+# build MLP
self.mlp = FalconMLP(config, dtype=dtype, device=device)

if not hasattr(config, 'num_ln_in_parallel_attn'):
2 changes: 1 addition & 1 deletion lmdeploy/pytorch/models/gemma.py
@@ -177,7 +177,7 @@ def __init__(self,
dtype=dtype,
device=device)

-# builf MLP
+# build MLP
self.mlp = GemmaMLP(config, dtype=dtype, device=device)

# build input layer norm
2 changes: 1 addition & 1 deletion lmdeploy/pytorch/models/internlm.py
@@ -161,7 +161,7 @@ def __init__(self,
# build attention layer
self.self_attn = InternLMAttention(config, dtype=dtype, device=device)

-# builf MLP
+# build MLP
self.mlp = InternLMMLP(config, dtype=dtype, device=device)

# build input layer norm
2 changes: 1 addition & 1 deletion lmdeploy/pytorch/models/internlm2.py
@@ -160,7 +160,7 @@ def __init__(self,
# build attention layer
self.attention = InternLM2Attention(config, dtype=dtype, device=device)

-# builf MLP
+# build MLP
self.feed_forward = InternLM2MLP(config, dtype=dtype, device=device)

# build input layer norm