[Feature] Support vision module w8a8 inference #2308

AllentDan · 2024-08-14T09:11:41Z

Run W8A8 on the vision module and AWQ on the LLM:

lmdeploy lite auto_awq /path/of/InternVL2-2B --work-dir InternVL2-2B-AWQ-VisionSmooth --calib-image tiger.jpeg
lmdeploy serve api_server InternVL2-2B-AWQ-VisionSmooth --model-format awq
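For context, the `auto_awq` step above stores the LLM weights as low-bit integers with one scale and one zero point per group of weights. The pure-Python sketch below shows only that group-wise asymmetric INT4 quantize/dequantize step. It omits AWQ's activation-aware scale search, and the helper names and the group size of 4 are illustrative assumptions (real checkpoints typically use larger groups, such as 128), not lmdeploy's implementation.

```python
def quantize_group_int4(weights, group_size=4):
    """Asymmetric INT4: one (scale, zero) pair per group of `group_size` weights.

    Hypothetical helper for illustration; returns codes in [0, 15].
    """
    qmax = 15  # unsigned 4-bit range is 0..15
    codes, scales, zeros = [], [], []
    for start in range(0, len(weights), group_size):
        group = weights[start:start + group_size]
        lo, hi = min(group), max(group)
        scale = (hi - lo) / qmax or 1.0  # fall back to 1.0 for a constant group
        zero = round(-lo / scale)        # integer code that represents 0.0
        codes.extend(min(max(round(v / scale) + zero, 0), qmax) for v in group)
        scales.append(scale)
        zeros.append(zero)
    return codes, scales, zeros


def dequantize_group_int4(codes, scales, zeros, group_size=4):
    """Inverse mapping: w ~ (q - zero) * scale, using the scale of each code's group."""
    return [(q - zeros[i // group_size]) * scales[i // group_size]
            for i, q in enumerate(codes)]


weights = [0.12, -0.40, 0.33, 0.05, 1.20, -0.90, 0.00, 0.45]
codes, scales, zeros = quantize_group_int4(weights)
restored = dequantize_group_int4(codes, scales, zeros)
max_err = max(abs(a - b) for a, b in zip(weights, restored))
print(codes)
print(f'max abs error: {max_err:.4f}')
```

As long as no code is clamped, each restored weight sits within half a step (`scale / 2`) of the original, so smaller groups (tighter value ranges) trade extra scale storage for lower error.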

Run W8A8 on both the LLM and the vision module:

lmdeploy lite smooth_quant /path/of/InternVL2-2B --work-dir InternVL2-2B-W8A8-Vision --calib-image tiger.jpeg
lmdeploy serve api_server InternVL2-2B-W8A8-Vision --backend pytorch
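The `smooth_quant` route relies on SmoothQuant-style smoothing: each input channel j of the activations is divided by s_j = max|X_j|^α / max|W_j|^(1 - α) and the matching weight row is multiplied by s_j, so X·W is unchanged while outlier channels become easier to quantize to INT8. The sketch below is a pure-Python illustration under stated assumptions (α = 0.5, symmetric INT8 with per-token activation scales and per-output-channel weight scales, toy 2×3 and 3×2 matrices); it is not the lmdeploy kernel.

```python
def matmul(a, b):
    """Reference float matmul on nested lists: (n x k) @ (k x m)."""
    return [[sum(a[i][t] * b[t][j] for t in range(len(b))) for j in range(len(b[0]))]
            for i in range(len(a))]


def smooth(x, w, alpha=0.5):
    """SmoothQuant-style migration: X[:, j] / s_j and W[j, :] * s_j."""
    k = len(w)
    s = []
    for j in range(k):
        act_max = max(abs(row[j]) for row in x)  # per-input-channel activation range
        w_max = max(abs(v) for v in w[j])        # matching weight row range
        s.append(act_max ** alpha / w_max ** (1 - alpha))
    x_s = [[row[j] / s[j] for j in range(k)] for row in x]
    w_s = [[v * s[j] for v in w[j]] for j in range(k)]
    return x_s, w_s


def quant_int8(values):
    """Symmetric INT8 with one scale for the whole vector: q = round(v / scale)."""
    scale = max(abs(v) for v in values) / 127 or 1.0
    return [max(-127, min(127, round(v / scale))) for v in values], scale


def w8a8_matmul(x, w):
    """Quantize each token row and each weight column, accumulate in integers, rescale."""
    xq = [quant_int8(row) for row in x]  # per-token activation scales
    wq = [quant_int8([w[t][j] for t in range(len(w))])  # per-output-channel weight scales
          for j in range(len(w[0]))]
    return [[sum(a * b for a, b in zip(qx, qw)) * sx * sw for qw, sw in wq]
            for qx, sx in xq]


def max_abs_err(a, b):
    return max(abs(p - q) for ra, rb in zip(a, b) for p, q in zip(ra, rb))


x = [[0.5, 12.0, -0.3], [-0.2, -9.0, 0.4]]  # input channel 1 is an outlier channel
w = [[0.8, -0.1], [0.05, 0.02], [-0.6, 0.9]]

ref = matmul(x, w)
x_s, w_s = smooth(x, w)
print('smoothing keeps X @ W:', max_abs_err(matmul(x_s, w_s), ref))
print('W8A8 error, no smoothing:', max_abs_err(w8a8_matmul(x, w), ref))
print('W8A8 error, smoothed:', max_abs_err(w8a8_matmul(x_s, w_s), ref))
```

On these toy numbers the smoothed W8A8 result stays closer to the float reference than quantizing the raw inputs, because channel 1 no longer dominates each token's INT8 range.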

Note

--tp is not supported: when accelerate dispatches the modules across GPUs, the Triton kernel cannot obtain the correct CUDA stream.
Only InternVL2-2B was verified.

Conflicts: lmdeploy/lite/apis/calibrate.py lmdeploy/pytorch/configurations/internvl.py

lvhan028 · 2024-09-05T03:05:08Z

lmdeploy/cli/utils.py

@@ -342,7 +352,8 @@ def calib_search_scale(parser):

        return parser.add_argument(
            '--search-scale',
-            type=bool,
+            action='store_true',


Is --search-scale time-consuming?

AllentDan · 2024-09-06

Yes, it is time-consuming, so it is off (False) by default.
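For reference, here is a standalone reproduction of why `type=bool` was swapped for `action='store_true'`: argparse passes the raw string to `bool()`, and any non-empty string, including `"False"`, is truthy. The two parsers below are minimal illustrations built with the standard `argparse` module, not the actual `calib_search_scale` helper in `lmdeploy/cli/utils.py`.

```python
import argparse

# Before the change: type=bool converts the raw string with bool(),
# and any non-empty string (including "False") is truthy.
old = argparse.ArgumentParser()
old.add_argument('--search-scale', type=bool, default=False)

# After the change: a plain on/off flag that is False unless passed.
new = argparse.ArgumentParser()
new.add_argument('--search-scale', action='store_true')

print(old.parse_args(['--search-scale', 'False']).search_scale)  # True, surprisingly
print(new.parse_args([]).search_scale)                           # False
print(new.parse_args(['--search-scale']).search_scale)           # True
```

With `store_true`, omitting the flag keeps the slow scale search off and passing `--search-scale` turns it on, matching the default described in the reply.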

lvhan028 · 2024-09-05T03:05:43Z

lmdeploy/cli/utils.py

+        """Add argument calib_image to parser."""
+
+        return parser.add_argument(
+            '--calib-image',


Only one image?
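One way to address this would be to let `--calib-image` accept several paths. The sketch below is hypothetical: `nargs='+'` is a standard argparse option, but this is not how the flag is defined in the PR, and `cat.jpg` is a made-up second file.

```python
import argparse

# Hypothetical variant of --calib-image that accepts one or more paths.
# nargs='+' collects every value after the flag into a list.
parser = argparse.ArgumentParser()
parser.add_argument(
    '--calib-image',
    type=str,
    nargs='+',
    default=None,
    help='One or more image paths used for vision calibration (hypothetical).')

args = parser.parse_args(['--calib-image', 'tiger.jpeg', 'cat.jpg'])
print(args.calib_image)  # ['tiger.jpeg', 'cat.jpg']
```

A single path still parses to a one-element list, so existing commands such as `--calib-image tiger.jpeg` would keep working; the calibration code would then need to iterate over the list.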

lvhan028 · 2024-09-14T03:40:29Z

@AllentDan @irexyc @RunningLeon
Let's discuss separating the vision, audio, and LLM parts after the Mid-Autumn Festival.

AllentDan · 2024-09-26T07:14:22Z

Accuracy on MMStar:

| Model | MMStar accuracy |
| --- | --- |
| InternVL2-2B | 0.498 |
| InternVL2-2B-AWQ | 0.495 |
| InternVL2-2B-AWQ-VisionW8A8 | 0.477 |
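To put the MMStar numbers in relative terms, the snippet below computes each model's drop from the unquantized InternVL2-2B baseline using only the three accuracies reported above; no other data is assumed.

```python
# MMStar accuracies reported above.
scores = {
    'InternVL2-2B': 0.498,
    'InternVL2-2B-AWQ': 0.495,
    'InternVL2-2B-AWQ-VisionW8A8': 0.477,
}

baseline = scores['InternVL2-2B']
for name, acc in scores.items():
    abs_drop = baseline - acc
    rel_drop = abs_drop / baseline * 100  # percent of the unquantized baseline
    print(f'{name:<30} {acc:.3f}  -{abs_drop:.3f}  ({rel_drop:.2f}% relative)')
```

This works out to roughly a 0.6% relative drop for AWQ on the LLM alone and about 4.2% once the vision module also runs in W8A8, on this single benchmark.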

AllentDan added 5 commits August 7, 2024 15:46

WIP

07c39dc

add a fused layer norm kernel

c1ecec9

Tried tp=2, but accelerate can't give triton the correct cuda stream

8f198ac

refactor lite and fuse awq and smooth_quant for vision model

e86504f

Merge branch 'main' into vision-w8a8

d994103


AllentDan changed the title ~~【Feature】Support vision module w8a8 inference~~ [Feature] Support vision module w8a8 inference Aug 14, 2024

AllentDan added 2 commits August 15, 2024 13:26

fix cli

d7f3226

fix

37cf54e

AllentDan mentioned this pull request Aug 26, 2024

More w8a8 models #2373

Closed

lvhan028 mentioned this pull request Aug 28, 2024

[Feature] LLaVA 1.5 WINT 8 quantization #2336

Open

lvhan028 reviewed Sep 5, 2024


lvhan028 requested a review from grimoire September 14, 2024 03:19

lvhan028 added the improvement label Sep 14, 2024

lvhan028 removed the request for review from grimoire September 14, 2024 03:19

AllentDan mentioned this pull request Sep 24, 2024

[Feature] Will multi-modal models support W8A8 quantization in the future? #2496

Open
