Skip to content

Add FP8 postprocess_measure.py#2238

Closed
astachowiczhabana wants to merge 1 commit into
huggingface:mainfrom
HabanaAI:dev/astachowicz/fp8_postprocess
Closed

Add FP8 postprocess_measure.py#2238
astachowiczhabana wants to merge 1 commit into
huggingface:mainfrom
HabanaAI:dev/astachowicz/fp8_postprocess

Conversation

@astachowiczhabana
Copy link
Copy Markdown
Collaborator

This pull request introduces an important update to the quantization workflow for FP8 models by requiring post-processing of measurement artifacts before quantization. The changes include a new helper script to ensure correct cache input mapping, updated documentation with usage instructions, and reminders throughout the quantization examples to use the post-processing step. These updates help prevent accuracy degradation by fixing scaling statistics and cache associations.

Quantization workflow improvements:

  • Added quantization_tools/postprocess_measure.py, a new script that post-processes measurement JSON/NPZ files to fix K/V cache tensor mappings for attention layers, including support for DeepSeek/MLA models.
  • Updated examples/text-generation/README.md to document the need for the post-processing step after measurement and before quantization, including example usage and rationale.

Documentation and user guidance:

  • Inserted reminders and instructions in all quantization example sections to run the post-processing script on measurement directories before quantizing models, ensuring users follow the correct workflow. [1] [2] [3] [4] [5] [6]

@github-actions
Copy link
Copy Markdown

github-actions Bot commented Sep 2, 2025

The code quality check failed, please run make style.

@HuggingFaceDocBuilderDev
Copy link
Copy Markdown

The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.

if f'model.layers.{layer_index}.self_attn.{v_cache_name}' in node_name:
oh_v_cache_input = node_info['inputs'][0]

if matmul_av_input != v_cache_input:
Copy link
Copy Markdown
Contributor

@yafshar yafshar Sep 2, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
if matmul_av_input != v_cache_input:
if matmul_av_input is not None and v_cache_input is not None and matmul_av_input != v_cache_input:

@astachowiczhabana I suggest to add validation to ensure inputs are not None

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

the same for below comparisons

description="Run the measurements parser", formatter_class=argparse.ArgumentDefaultsHelpFormatter
)
parser.add_argument(
"-m", "--measurements", type=str, help="full path to the directory of the measurements that should be fixed"
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
"-m", "--measurements", type=str, help="full path to the directory of the measurements that should be fixed"
"-m", "--measurements", type=str, required=True, help="full path to the directory of the measurements that should be fixed"

It sounds to me this arg is required otherwise it will fail

--bf16
```

(After this measurement run, execute the post-processing step before proceeding.)
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why not use the same sentence in all places below?

if f'model.layers.{layer_index}.self_attn.{v_cache_name}' in node_name:
oh_v_cache_input = node_info['inputs'][0]

if matmul_av_input != v_cache_input:
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe use the same validation "trick" as in other parts of OH, i.e.:

if (matmul_av_input is not None) ^ (v_cache_input is not None):
    ....

json_data['Nodes'][f'model.layers.{layer_index}.self_attn.{attn_name}.impl.matmul_av']['inputs'][1] = k_cache_input
else:
json_data['Nodes'][f'model.layers.{layer_index}.self_attn.attn.impl.matmul_av']['inputs'][1] = v_cache_input
if matmul_qk_input != k_cache_input:
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ditto

json_data['Nodes'][f'model.layers.{layer_index}.self_attn.attn.impl.matmul_qk']['inputs'][1] = k_cache_input

# Flash attention
if fsdpa_k_input != oh_k_cache_input:
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ditto

description="Run the measurements parser", formatter_class=argparse.ArgumentDefaultsHelpFormatter
)
parser.add_argument(
"-m", "--measurements", type=str, help="full path to the directory of the measurements that should be fixed"
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why not call it measurements_dir?

Comment on lines +100 to +101
measurements_paths_ranges = [measurement_path for measurement_path in measurements_paths if measurement_path.endswith(
".json") and 'MAXABS_HW' not in measurement_path and "mod_list" not in measurement_path]
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
measurements_paths_ranges = [measurement_path for measurement_path in measurements_paths if measurement_path.endswith(
".json") and 'MAXABS_HW' not in measurement_path and "mod_list" not in measurement_path]
measurements_paths_ranges = [
path
for path in measurements_paths
if path.endswith(".json") and 'MAXABS_HW' not in path and "mod_list" not in path
]

Comment on lines +102 to +103
measurements_paths_scales = [measurement_path for measurement_path in measurements_paths if measurement_path.endswith(
".json") and 'MAXABS_HW' in measurement_path and "mod_list" not in measurement_path]
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
measurements_paths_scales = [measurement_path for measurement_path in measurements_paths if measurement_path.endswith(
".json") and 'MAXABS_HW' in measurement_path and "mod_list" not in measurement_path]
measurements_paths_scales = [
path
for path in measurements_paths
if path.endswith(".json") and 'MAXABS_HW' in path and "mod_list" not in path
]

Comment on lines +107 to +108
fixed_json_path = os.path.join(
output_path, f"{measurement.split(os.sep)[-1]}")
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we need this code fragment: measurement.split(os.sep)[-1]? Shouldn't measurement only contain filenames?

If splitting is necessary, I would suggest to go with: os.path.basename instead:

Suggested change
fixed_json_path = os.path.join(
output_path, f"{measurement.split(os.sep)[-1]}")
fixed_json_path = os.path.join(output_path, os.path.basename(measurement))

print("measurement=", measurement, flush=True)
print("measurements_paths_scales=",
measurements_paths_scales, flush=True)
if measurement in measurements_paths_ranges + measurements_paths_scales:
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This will always be true - note that we are iterating over the same files in the outermost for loop:

    for measurement in measurements_paths_ranges + measurements_paths_scales:

print("measurement=", measurement, flush=True)
print("measurements_paths_scales=",
measurements_paths_scales, flush=True)
if measurement in measurements_paths_ranges + measurements_paths_scales:
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This part doesn't need the json_file and fixed_json_file to be open. Let's move this block out to a lower indentation level.

@pbielak
Copy link
Copy Markdown
Collaborator

pbielak commented Sep 16, 2025

@astachowiczhabana please close in favor of #2260

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants