[Misc] Clean up Kimi-audio whisper encoder loading #36903
Isotr0py merged 7 commits into vllm-project:main
Conversation
Signed-off-by: Isotr0py <mozf@mail2.sysu.edu.cn>
Code Review
This pull request introduces a subfolder option to the DefaultModelLoader and refactors the Kimi-audio model to use this new, cleaner mechanism for loading the Whisper encoder weights. This significantly simplifies the weight-loading logic. However, I've identified a critical issue with the new skip_prefixes configuration that would likely prevent any model weights from being loaded. Additionally, there's a high-severity maintainability concern due to a large block of duplicated code for weight loading.
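To illustrate the failure mode flagged above: if every checkpoint weight name matches a skip prefix, nothing is loaded at all. The `should_skip` helper and the example prefixes below are hypothetical stand-ins, not vLLM's actual API.

```python
def should_skip(name: str, skip_prefixes: list[str]) -> bool:
    """Return True if the weight name matches any skipped prefix."""
    return any(name.startswith(prefix) for prefix in skip_prefixes)


checkpoint_names = [
    "model.layers.0.qkv_proj.weight",
    "model.embed_tokens.weight",
]

# An over-broad prefix such as "model." matches every weight,
# so the surviving list is empty and no weights get loaded.
survivors_broad = [n for n in checkpoint_names if not should_skip(n, ["model."])]

# A targeted prefix only drops the intended component.
survivors_targeted = [
    n for n in checkpoint_names if not should_skip(n, ["model.embed_tokens."])
]
```

A targeted prefix leaves `model.layers.0.qkv_proj.weight` loadable, while the over-broad one filters everything out.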
```python
def load_weights(self, weights: Iterable[tuple[str, torch.Tensor]]) -> set[str]:
    stacked_params_mapping = [
        # (param_name, shard_name, shard_id)
        ("qkv_proj", "q_proj", "q"),
        ("qkv_proj", "k_proj", "k"),
        ("qkv_proj", "v_proj", "v"),
    ]
    params_dict = dict(self.named_parameters())
    loaded_params: set[str] = set()
    for name, loaded_weight in weights:
        for param_name, weight_name, shard_id in stacked_params_mapping:
            if weight_name not in name:
                continue
            name = name.replace(weight_name, param_name)
            # Skip loading extra bias for GPTQ models.
            if name.endswith(".bias") and name not in params_dict:
                continue
            param = params_dict[name]
            weight_loader = param.weight_loader
            weight_loader(param, loaded_weight, shard_id)
            break
        else:
            # Skip loading extra bias for GPTQ models.
            if name.endswith(".bias") and name not in params_dict:
                continue
            param = params_dict[name]
            weight_loader = getattr(param, "weight_loader", default_weight_loader)
            weight_loader(param, loaded_weight)
        loaded_params.add(name)
    return loaded_params
```
This load_weights method is nearly identical to the implementation in WhisperModel.load_weights. This significant code duplication can create maintenance challenges, as updates to one implementation may not be reflected in the other. To improve maintainability and code reuse, consider refactoring this logic into a shared utility function or a mixin class that can be used by both KimiAudioWhisperEncoder and WhisperModel.
Let me double-check model loading from a remote repo with weight downloading tomorrow.
Model loading from remote repo should work well now.
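For illustration, a subfolder-aware source resolves a component's weight files relative to the repo root plus the subfolder. This is a hedged sketch only; `resolve_weight_files`, the repo name, and the file layout below are hypothetical, not vLLM's actual loader API.

```python
import os


def resolve_weight_files(
    model_root: str, subfolder: str, files: list[str]
) -> list[str]:
    """Prepend the component subfolder (if any) to each weight file path."""
    base = os.path.join(model_root, subfolder) if subfolder else model_root
    return [os.path.join(base, f) for f in files]


# Whisper encoder weights resolved from a hypothetical subfolder.
encoder_paths = resolve_weight_files(
    "Kimi-Audio-7B", "whisper-large-v3", ["model.safetensors"]
)

# With no subfolder, paths resolve against the repo root as before.
root_paths = resolve_weight_files("Kimi-Audio-7B", "", ["model.safetensors"])
```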
Signed-off-by: Isotr0py <mozf@mail2.sysu.edu.cn> Signed-off-by: Athrael Soju <athrael.soju@gmail.com>
Purpose
Add `subfolder` to `DefaultModelLoader.Source`, which allows us to load model components from a subfolder, similar to vLLM-Omni's diffusion loader: https://github.com/vllm-project/vllm-omni/blob/36c6cb9634306f0f250d691b41da422a97d0aff1/vllm_omni/diffusion/model_loader/diffusers_loader.py#L50-L73

Test Plan
Test Result
Essential Elements of an Effective PR Description Checklist
Update `supported_models.md` and `examples` for a new model.