
[Model][Hardware][NV] Add support for ModelOpt static scaling checkpoints #5387

Closed

pavanimajety wants to merge 1 commit into vllm-project:main from pavanimajety:read_ammo_chkpoint

Conversation

@pavanimajety (Collaborator)

This change adds support for running ModelOpt FP8 checkpoints. In FP8 quantization mode, it converts key names from the ModelOpt checkpoint format to the key names vLLM recognizes.
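For illustration, a minimal sketch of that conversion, assuming the checkpoint is already loaded as an in-memory state dict. The helper names and structure here are illustrative, not the exact diff; only the replacement table appears in this PR:

from typing import Dict

import torch

# Replacement table from the PR: ModelOpt stores static scales as
# *_quantizer._amax tensors; vLLM expects weight_scale / act_scale names.
REPLACEMENTS = {
    "weight_quantizer._amax": "weight_scale",
    "input_quantizer._amax": "act_scale",
}

def convert_modelopt_key(key: str) -> str:
    # Map a ModelOpt "*_quantizer._amax" key to its vLLM scale name,
    # leaving all other keys untouched.
    for modelopt_suffix, vllm_suffix in REPLACEMENTS.items():
        if key.endswith(modelopt_suffix):
            return key[:-len(modelopt_suffix)] + vllm_suffix
    return key

def convert_modelopt_state_dict(
        state_dict: Dict[str, torch.Tensor]) -> Dict[str, torch.Tensor]:
    # Rebuild the state dict under vLLM-recognized key names.
    return {convert_modelopt_key(k): v for k, v in state_dict.items()}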

"""Replaces the names of *quantizer._amax to _scale."""
replacements = {
"weight_quantizer._amax": "weight_scale",
"input_quantizer._amax": "act_scale",
Collaborator

FYI, we just tweaked this to input_scale ahead of the v0.5.0 beta launch.

@robertgshaw2-redhat (Collaborator)

robertgshaw2-redhat commented Jun 10, 2024

Does this have to be implemented in llama.py, or could this logic be generic to all models and implemented in our existing Fp8LinearMethod?

"""Replaces the names of *quantizer._amax to _scale."""
replacements = {
"weight_quantizer._amax": "weight_scale",
"input_quantizer._amax": "act_scale",
Member

This has been updated such that act_scale -> input_scale in #5353.
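For reference, a sketch of the table after that rename (not the exact diff):

replacements = {
    "weight_quantizer._amax": "weight_scale",
    "input_quantizer._amax": "input_scale",  # renamed from act_scale in #5353
}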

weights_to_convert = []
vllm_state_dict = {}
for key, value in input_state_dict.items():
    if key.endswith("_amax"):
Member

It would be best to make this as specific as possible to avoid conflicts -- would if key.endswith("_quantizer._amax"): work?
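That is, a sketch of the tighter check (the loop body here is assumed, not from the diff):

for key, value in input_state_dict.items():
    # Match only quantizer amax tensors, not any key that merely ends
    # in "_amax".
    if key.endswith("_quantizer._amax"):
        weights_to_convert.append(key)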

else:
    return key, value

def _convert_ammo_weights(self, input_state_dict: Dict[str, torch.Tensor]):

ammo is not the product name. Let's use modelopt instead.
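For example, a sketch of the rename (signature only; the body is elided):

def _convert_modelopt_weights(self, input_state_dict: Dict[str, torch.Tensor]):
    # Convert ModelOpt checkpoint keys and scales to vLLM format.
    ...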
