
Conversation

@BBuf (Collaborator) commented Mar 18, 2025

Motivation

Modifications

Checklist

@zhyncs zhyncs merged commit dd865be into main Mar 18, 2025
3 of 20 checks passed
@zhyncs zhyncs deleted the fix_fp8_w8a8_ci_test branch March 18, 2025 06:17
@qeternity (Contributor)

This commit has broken loading of older Marlin-packed models:

```
KeyError: 'model.layers.0.mlp.gate_up_proj.B'
```

Looking into it now.

@qeternity (Contributor) commented Mar 22, 2025

OK, so this is actually related to the deprecation of the SGLang types: the check_marlin_supported check in GPTQMarlinConfig now passes where it previously failed. Early Marlin reference code set the model config's quant method to gptq with flags like is_marlin_format. Now that we are using vllm.scalar_type.ScalarType instead of sglang.srt.layers.quantization.utils.ScalarType, the type check passes, so these checkpoints are routed to the GPTQ-Marlin path and hit this error; previously the failed type check forced use of the plain Marlin config.
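
To make the flip concrete, here is a minimal, hypothetical sketch of the dispatch (stand-in string types and a made-up weight_type field instead of the real ScalarType objects and config parsing; not the actual SGLang/vLLM source):

```python
# Hypothetical sketch of the quant-method dispatch; names are simplified.
SUPPORTED_MARLIN_TYPES = {"uint4b8", "uint8b128"}

def check_marlin_supported(weight_type: str) -> bool:
    # With the deprecated sglang ScalarType this comparison failed
    # (different class), so old checkpoints never took the GPTQ-Marlin path.
    return weight_type in SUPPORTED_MARLIN_TYPES

def pick_quant_method(hf_quant_cfg: dict) -> str:
    method = hf_quant_cfg.get("quant_method")
    if method == "gptq" and check_marlin_supported(hf_quant_cfg["weight_type"]):
        # Taken now that the vllm ScalarType check passes. The GPTQ-Marlin
        # loader expects GPTQ tensor names (qweight, scales, ...), so a
        # Marlin-packed checkpoint whose tensors are named like
        # 'model.layers.0.mlp.gate_up_proj.B' raises KeyError.
        return "gptq_marlin"
    # Old behavior: the failed check forced the plain Marlin config,
    # which understands the is_marlin_format layout.
    return "marlin"

print(pick_quant_method({"quant_method": "gptq", "weight_type": "uint4b8"}))
```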

Changing the model config's quant method to marlin resolves this.
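
For anyone hitting this before a fix lands, a sketch of that workaround; the path below is an example, while quantization_config / quant_method are the standard Hugging Face config fields:

```python
# Sketch: point the loader at the plain Marlin config by editing the
# checkpoint's config.json. Adjust the path to your model directory.
import json

cfg_path = "/models/my-marlin-model/config.json"
with open(cfg_path) as f:
    cfg = json.load(f)

cfg["quantization_config"]["quant_method"] = "marlin"  # was "gptq"

with open(cfg_path, "w") as f:
    json.dump(cfg, f, indent=2)
```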

@qeternity (Contributor)

Not sure if we want to fix this, or just have people change older configs, but PR here: #4675. Feel free to close it if out of scope.
