[Quant] Molmo SupportsQuant#13336
Conversation
Signed-off-by: Kyle Sayers <kylesayrs@gmail.com>
Signed-off-by: Kyle Sayers <kylesayrs@gmail.com>
|
👋 Hi! Thank you for contributing to the vLLM project. 💬 Join our developer Slack at https://slack.vllm.ai to discuss your PR in #pr-reviews, coordinate on features in #feat- channels, or join special interest groups in #sig- channels. Just a reminder: PRs would not trigger full CI run by default. Instead, it would only run Once the PR is approved and ready to go, your PR reviewer(s) can run CI to test the changes comprehensively before merging. To run CI, PR reviewers can either: Add 🚀 |
|
|
||
| class MolmoVisionBackbone(nn.Module): | ||
| class MolmoVisionBackbone(nn.Module, SupportsQuant): | ||
| packed_modules_mapping = {"merged_linear": ["gate_proj", "up_proj"]} |
There was a problem hiding this comment.
Can we remove this from L1440 now that it's defined in the inner model?
There was a problem hiding this comment.
I believe that that line is still required by SupportsLoRA
Lines 342 to 343 in ce77eb9
There was a problem hiding this comment.
Theoretically, the packed_modules_mapping attribute should be defined at the lowest level possible, ie on every Module with packed modules.
I think in practice we should add the SupportsQuant and packed_modules_mapping to all classes which define a load_weights method, as it seems like these are the classes which are composable with other models
There was a problem hiding this comment.
I see, thanks for the explanation!
I think in practice we should add the
SupportsQuantandpacked_modules_mappingto all classes which define aload_weightsmethod, as it seems like these are the classes which are composable with other models
Sounds good to me!
Signed-off-by: Louis Ulmer <ulmerlouis@gmail.com>
Purpose
SupportsQuantinterface to Molo models whichpacked_modules_mappingconfigure_quant_configfunction after it has been added to a sufficient number of modelsconfigure_quant_configassumes that allpacked_modules_mappings will be declared prior to initialization. In reality, some models are submodels of each other, so their mappings can only be determined at init time.Prerequisites
SupportsQuantto phi3 and clip #13104Changes
SupportsQuantinterface toMolmoVisionBackbone,MolmoModel, andMolmoForCausalLM