
[Misc][Breaking] Change FP8 checkpoint format from act_scale -> input_scale #5353

Merged
mgoin merged 7 commits into vllm-project:main from neuralmagic:fp8-input-scale on Jun 8, 2024
Conversation

@mgoin (Member) commented on Jun 7, 2024

In tandem with neuralmagic/AutoFP8#11.

BREAKING CHANGE: Because a layer can have both input and output scales (`kv_scale` is an example of an "output" scale for `k_proj`+`v_proj`), the current name `act_scale` is ambiguous. We would like to be explicit here to properly support future formats, so we are proposing to rename `act_scale` -> `input_scale`.
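For illustration, the rename is a plain substring replacement on each checkpoint key (the layer path below is a hypothetical example, not taken from a real model):

```python
# Hypothetical FP8 checkpoint key (example layer path)
old_key = "model.layers.0.self_attn.qkv_proj.act_scale"

# The format change is a simple substring rename
new_key = old_key.replace("act_scale", "input_scale")
print(new_key)  # model.layers.0.self_attn.qkv_proj.input_scale
```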

Obviously this is a breaking change for existing checkpoints and against previous releases of vLLM. We think this is the right time to make such a change, before we "finalize" the beta with v0.5.0 next week.

Here is a script for rewriting checkpoints with act_scale to use the new input_scale format:

```python
import argparse
import json
import os

import safetensors.torch


def rename_tensors_in_directory(directory):
    for filename in os.listdir(directory):
        # Handle safetensors index files
        if filename.endswith('.safetensors.index.json'):
            # Load the index file
            index_file_path = os.path.join(directory, filename)
            print(f"Updating keys in {index_file_path}")
            with open(index_file_path, 'r') as f:
                index = json.load(f)

            # Rename the tensor names inside "weight_map";
            # the top-level "metadata" entry is left untouched
            index['weight_map'] = {
                name.replace('act_scale', 'input_scale'): location
                for name, location in index['weight_map'].items()
            }

            # Write the new index file
            with open(index_file_path, 'w') as f:
                json.dump(index, f, indent=2)

        # Handle safetensors files with tensor data
        elif filename.endswith('.safetensors'):
            # Load the tensors from the safetensors file
            data_file_path = os.path.join(directory, filename)
            print(f"Updating keys in {data_file_path}")
            tensors = safetensors.torch.load_file(data_file_path)

            # Rename tensors
            renamed_tensors = {
                name.replace('act_scale', 'input_scale'): tensor
                for name, tensor in tensors.items()
            }

            # Save the renamed tensors back to the same safetensors file
            safetensors.torch.save_file(renamed_tensors, data_file_path)

        else:
            print(f"Skipping {os.path.join(directory, filename)}")

    print(f"Tensors renamed and overwritten in the directory {directory}")


if __name__ == '__main__':
    parser = argparse.ArgumentParser(
        description='Rename act_scale tensors to input_scale in safetensors files.')
    parser.add_argument(
        'directory', type=str,
        help='The directory containing the safetensors files and index file.')
    args = parser.parse_args()
    rename_tensors_in_directory(args.directory)
```
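The index-rewriting step can be sketched on an in-memory `weight_map` without touching any files (tensor and shard names below are hypothetical examples):

```python
# Stdlib-only sketch of the index rewrite, applied to an in-memory
# weight_map; tensor and shard names are hypothetical examples.
weight_map = {
    "model.layers.0.mlp.gate_proj.weight": "model-00001.safetensors",
    "model.layers.0.mlp.gate_proj.act_scale": "model-00001.safetensors",
    "model.layers.0.mlp.gate_proj.weight_scale": "model-00001.safetensors",
}

# Rename every act_scale key; other keys pass through unchanged
renamed = {
    name.replace("act_scale", "input_scale"): location
    for name, location in weight_map.items()
}

print(sorted(renamed))
```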

@comaniac (Collaborator) left a comment
LGTM

@robertgshaw2-redhat (Collaborator)

Thanks Michael!

@mgoin mgoin merged commit c09dade into vllm-project:main Jun 8, 2024
@mgoin mgoin deleted the fp8-input-scale branch June 8, 2024 17:54
robertgshaw2-redhat pushed a commit to neuralmagic/nm-vllm that referenced this pull request Jun 9, 2024
dtrifiro pushed a commit to opendatahub-io/vllm that referenced this pull request Jun 10, 2024
robertgshaw2-redhat pushed a commit to neuralmagic/nm-vllm that referenced this pull request Jun 11, 2024
joerunde pushed a commit to joerunde/vllm that referenced this pull request Jun 17, 2024
xjpang pushed a commit to xjpang/vllm that referenced this pull request Jun 27, 2024
xjpang pushed a commit to xjpang/vllm that referenced this pull request Jul 8, 2024
xjpang pushed a commit to xjpang/vllm that referenced this pull request Jul 24, 2024