[Feature] Support vision module w8a8 inference #2308

Open: AllentDan wants to merge 7 commits into main

Collaborator

@AllentDan AllentDan commented Aug 14, 2024

Run W8A8 for the vision module and AWQ for the LLM:

lmdeploy lite auto_awq /path/of/InternVL2-2B --work-dir InternVL2-2B-AWQ-VisionSmooth --calib-image tiger.jpeg
lmdeploy serve api_server InternVL2-2B-AWQ-VisionSmooth --model-format awq

Run W8A8 for both the LLM and the vision module:

lmdeploy lite smooth_quant /path/of/InternVL2-2B --work-dir InternVL2-2B-W8A8-Vision --calib-image tiger.jpeg
lmdeploy serve api_server InternVL2-2B-W8A8-Vision --backend pytorch

Note

  • --tp is not supported, because the Triton kernel cannot get the correct stream when accelerate is used to dispatch modules.
  • Only InternVL2-2B has been verified.
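
For readers unfamiliar with the term, W8A8 in the commands above means that both weights and activations are stored as 8-bit integers. The following is a minimal pure-Python sketch of symmetric per-tensor int8 quantization, offered only as an illustration of the general idea. It is not lmdeploy's implementation; the function names `quantize_int8` and `w8a8_dot` are hypothetical, and the smoothing step of SmoothQuant is omitted.

```python
def quantize_int8(values):
    """Symmetric per-tensor quantization: return (int8 codes, scale)."""
    max_abs = max(abs(v) for v in values)
    # Map the largest magnitude to 127; fall back to 1.0 for an all-zero tensor.
    scale = max_abs / 127.0 if max_abs > 0 else 1.0
    codes = [max(-127, min(127, round(v / scale))) for v in values]
    return codes, scale


def w8a8_dot(weights, activations):
    """Dot product computed on int8 codes, then rescaled back to float."""
    w_q, w_scale = quantize_int8(weights)
    a_q, a_scale = quantize_int8(activations)
    # Integer accumulation, as an int8 kernel would do with a wider accumulator.
    acc = sum(w * a for w, a in zip(w_q, a_q))
    return acc * w_scale * a_scale


# Illustrative values only, not taken from any model.
weights = [0.5, -1.0, 0.25]
activations = [2.0, 1.0, -4.0]

exact = sum(w * a for w, a in zip(weights, activations))
approx = w8a8_dot(weights, activations)
print(f"exact={exact:.5f} approx={approx:.5f}")
```

The quantized result stays close to the floating-point one here, but the rounding error is the source of the accuracy loss that quantized models have to be evaluated for.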

@AllentDan AllentDan changed the title 【Feature】Support vision module w8a8 inference [Feature] Support vision module w8a8 inference Aug 14, 2024
@@ -342,7 +352,8 @@ def calib_search_scale(parser):

return parser.add_argument(
'--search-scale',
type=bool,
action='store_true',
Collaborator

Is search-scale time-consuming?

Collaborator Author

Yes, it is time-consuming. By default, it is False.
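
The hunk above lists both `type=bool` and `action='store_true'`, but the +/- markers did not survive, so it is unclear which line was removed and which was added. The sketch below assumes the intent is a plain boolean flag that is off unless passed. It uses only the standard `argparse` module; the parser name and help strings are hypothetical and the real lmdeploy helper may differ. Note that `argparse` does not accept `type=` together with `action='store_true'`.

```python
import argparse


def build_parser():
    """Hypothetical stand-in for the calibration argument helpers."""
    parser = argparse.ArgumentParser(prog='calibrate-sketch')
    # store_true gives a flag that is False unless it appears on the
    # command line, which matches "By default, it is False".
    parser.add_argument(
        '--search-scale',
        action='store_true',
        help='Search for the scale during calibration (slow).')
    # Single image path, as used in the commands in the PR description.
    parser.add_argument(
        '--calib-image',
        type=str,
        default=None,
        help='Path to an image used for vision calibration.')
    return parser


args_default = build_parser().parse_args([])
args_enabled = build_parser().parse_args(
    ['--search-scale', '--calib-image', 'tiger.jpeg'])
print(args_default.search_scale, args_enabled.search_scale)
```

With this shape, the slow search only runs when the user opts in explicitly.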

"""Add argument calib_image to parser."""

return parser.add_argument(
'--calib-image',
Collaborator

Only one image?

@lvhan028 lvhan028 requested a review from grimoire September 14, 2024 03:19
@lvhan028 lvhan028 removed the request for review from grimoire September 14, 2024 03:19
@lvhan028
Collaborator

@AllentDan @irexyc @RunningLeon
Let's discuss separating the vision part, audio part, and LLM part after the Mid-Autumn Festival.

@AllentDan
Collaborator Author

Accuracy on MMStar

InternVL2-2B    InternVL2-2B-AWQ    InternVL2-2B-AWQ-VisionW8A8
0.498           0.495               0.477
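
To put the MMStar numbers above in perspective, the short sketch below computes the absolute and relative accuracy drop of each quantized variant against the unquantized baseline. It uses only the three scores reported in this comment; the variable names are illustrative.

```python
# MMStar accuracy as reported in the comment above.
mmstar = {
    'InternVL2-2B': 0.498,
    'InternVL2-2B-AWQ': 0.495,
    'InternVL2-2B-AWQ-VisionW8A8': 0.477,
}

baseline = mmstar['InternVL2-2B']

# Absolute drop in accuracy relative to the unquantized model.
drops = {name: round(baseline - score, 3) for name, score in mmstar.items()}

# The same drop expressed as a percentage of the baseline accuracy.
relative = {
    name: round(100 * (baseline - score) / baseline, 1)
    for name, score in mmstar.items()
}

for name in mmstar:
    print(f"{name}: -{drops[name]:.3f} ({relative[name]:.1f}% relative)")
```

AWQ on the LLM alone costs 0.003, while adding W8A8 on the vision module brings the total drop to 0.021, about 4.2% of the baseline score.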
