Add FP8 postprocess_measure.py by astachowiczhabana · Pull Request #2238 · huggingface/optimum-habana

astachowiczhabana · 2025-09-02T17:59:09Z

This pull request introduces an important update to the quantization workflow for FP8 models by requiring post-processing of measurement artifacts before quantization. The changes include a new helper script to ensure correct cache input mapping, updated documentation with usage instructions, and reminders throughout the quantization examples to use the post-processing step. These updates help prevent accuracy degradation by fixing scaling statistics and cache associations.

Quantization workflow improvements:

Added quantization_tools/postprocess_measure.py, a new script that post-processes measurement JSON/NPZ files to fix K/V cache tensor mappings for attention layers, including support for DeepSeek/MLA models.
Updated examples/text-generation/README.md to document the need for the post-processing step after measurement and before quantization, including example usage and rationale.

Documentation and user guidance:

Inserted reminders and instructions in all quantization example sections to run the post-processing script on measurement directories before quantizing models, ensuring users follow the correct workflow.

github-actions · 2025-09-02T17:59:50Z

The code quality check failed. Please run make style.

HuggingFaceDocBuilderDev · 2025-09-02T18:03:01Z

The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.

yafshar · 2025-09-02T18:59:50Z

+            if f'model.layers.{layer_index}.self_attn.{v_cache_name}' in node_name:
+                oh_v_cache_input = node_info['inputs'][0]
+
+        if matmul_av_input != v_cache_input:


Suggested change

-        if matmul_av_input != v_cache_input:
+        if matmul_av_input is not None and v_cache_input is not None and matmul_av_input != v_cache_input:

@astachowiczhabana I suggest adding validation to ensure the inputs are not None.

The same applies to the comparisons below.
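A minimal sketch of what the suggested guard does, pulled out into a helper so the same check could be reused for each comparison. The helper name needs_remap and the sample values are hypothetical and are not part of the PR; the real script compares entries read from the measurement JSON.

```python
from typing import Any, Optional


def needs_remap(matmul_input: Optional[Any], cache_input: Optional[Any]) -> bool:
    """Return True only when both inputs were found and they differ."""
    if matmul_input is None or cache_input is None:
        # At least one node was not found in the measurement file,
        # so there is nothing to compare or overwrite.
        return False
    return matmul_input != cache_input


print(needs_remap([[1.0]], [[2.0]]))  # both present and different
print(needs_remap([[1.0]], [[1.0]]))  # both present and equal
print(needs_remap(None, [[2.0]]))     # one side missing
```

With this guard, a layer whose cache node is missing is skipped silently instead of having its matmul input overwritten with None.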

yafshar · 2025-09-02T19:05:27Z

+        description="Run the measurements parser", formatter_class=argparse.ArgumentDefaultsHelpFormatter
+    )
+    parser.add_argument(
+        "-m", "--measurements", type=str, help="full path to the directory of the measurements that should be fixed"


Suggested change

-        "-m", "--measurements", type=str, help="full path to the directory of the measurements that should be fixed"
+        "-m", "--measurements", type=str, required=True, help="full path to the directory of the measurements that should be fixed"

It seems to me this argument is required; otherwise the script will fail.
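For illustration, a small self-contained sketch of how argparse behaves once required=True is set. The build_parser wrapper and the /tmp/measurements path are assumptions made for this example; only the add_argument call mirrors the PR.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser(
        description="Run the measurements parser",
        formatter_class=argparse.ArgumentDefaultsHelpFormatter,
    )
    parser.add_argument(
        "-m",
        "--measurements",
        type=str,
        required=True,
        help="full path to the directory of the measurements that should be fixed",
    )
    return parser


# Normal invocation: the value is available as args.measurements.
args = build_parser().parse_args(["-m", "/tmp/measurements"])
print(args.measurements)

# Missing argument: argparse prints a usage error to stderr and exits,
# instead of the script failing later on a None path.
try:
    build_parser().parse_args([])
except SystemExit as exc:
    print("argparse exited with code", exc.code)
```

The benefit is an early, readable usage error rather than a failure deeper in the script when the path is None.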

pbielak · 2025-09-05T07:48:42Z

 --bf16
 ```

+(After this measurement run, execute the post-processing step before proceeding.)


Why not use the same sentence in all places below?

pbielak · 2025-09-05T07:51:35Z

+            if f'model.layers.{layer_index}.self_attn.{v_cache_name}' in node_name:
+                oh_v_cache_input = node_info['inputs'][0]
+
+        if matmul_av_input != v_cache_input:


Maybe use the same validation "trick" as in other parts of OH, i.e.:

if (matmul_av_input is not None) ^ (v_cache_input is not None): ....
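To make the difference between the two proposed guards concrete, here is a short sketch of what the XOR expression evaluates to. The function name exactly_one_found and the placeholder values are hypothetical; how the condition is acted on elsewhere in OH is not shown in this thread.

```python
def exactly_one_found(a, b) -> bool:
    """True when exactly one of the two values is not None."""
    return (a is not None) ^ (b is not None)


# Truth table for the XOR check.
for a, b in [("x", "y"), ("x", None), (None, "y"), (None, None)]:
    print(repr(a), repr(b), exactly_one_found(a, b))
```

Note that this flags the inconsistent case where only one of the two inputs was found, whereas the "is not None and ... is not None" form selects the case where both were found.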

pbielak · 2025-09-05T07:53:21Z

+                json_data['Nodes'][f'model.layers.{layer_index}.self_attn.{attn_name}.impl.matmul_av']['inputs'][1] = k_cache_input
+            else:
+                json_data['Nodes'][f'model.layers.{layer_index}.self_attn.attn.impl.matmul_av']['inputs'][1] = v_cache_input
+        if matmul_qk_input != k_cache_input:


pbielak · 2025-09-05T07:53:26Z

+            json_data['Nodes'][f'model.layers.{layer_index}.self_attn.attn.impl.matmul_qk']['inputs'][1] = k_cache_input
+
+        # Flash attention
+        if fsdpa_k_input != oh_k_cache_input:


pbielak · 2025-09-05T07:54:28Z

+        description="Run the measurements parser", formatter_class=argparse.ArgumentDefaultsHelpFormatter
+    )
+    parser.add_argument(
+        "-m", "--measurements", type=str, help="full path to the directory of the measurements that should be fixed"


Why not call it measurements_dir?

pbielak · 2025-09-05T07:59:24Z

+    measurements_paths_ranges = [measurement_path for measurement_path in measurements_paths if measurement_path.endswith(
+        ".json") and 'MAXABS_HW' not in measurement_path and "mod_list" not in measurement_path]


Suggested change

-    measurements_paths_ranges = [measurement_path for measurement_path in measurements_paths if measurement_path.endswith(
-        ".json") and 'MAXABS_HW' not in measurement_path and "mod_list" not in measurement_path]
+    measurements_paths_ranges = [
+        path
+        for path in measurements_paths
+        if path.endswith(".json") and 'MAXABS_HW' not in path and "mod_list" not in path
+    ]

pbielak · 2025-09-05T08:00:11Z

+    measurements_paths_scales = [measurement_path for measurement_path in measurements_paths if measurement_path.endswith(
+        ".json") and 'MAXABS_HW' in measurement_path and "mod_list" not in measurement_path]


Suggested change

-    measurements_paths_scales = [measurement_path for measurement_path in measurements_paths if measurement_path.endswith(
-        ".json") and 'MAXABS_HW' in measurement_path and "mod_list" not in measurement_path]
+    measurements_paths_scales = [
+        path
+        for path in measurements_paths
+        if path.endswith(".json") and 'MAXABS_HW' in path and "mod_list" not in path
+    ]
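Since the two comprehensions differ only in the MAXABS_HW test, one possible follow-up is to share the common filter. This is a sketch only: the helper name split_measurements and the sample file names are hypothetical and do not come from the PR.

```python
from typing import Iterable, List, Tuple


def split_measurements(paths: Iterable[str]) -> Tuple[List[str], List[str]]:
    """Split measurement JSON files into (ranges, scales).

    Files containing "mod_list" and non-JSON files are ignored; files whose
    path contains "MAXABS_HW" are treated as scales, the rest as ranges.
    """
    json_paths = [p for p in paths if p.endswith(".json") and "mod_list" not in p]
    ranges = [p for p in json_paths if "MAXABS_HW" not in p]
    scales = [p for p in json_paths if "MAXABS_HW" in p]
    return ranges, scales


# Hypothetical directory listing, for illustration only.
sample = [
    "hooks_maxabs.json",
    "hooks_maxabs_MAXABS_HW.json",
    "hooks_maxabs_mod_list.json",
    "hooks_maxabs.npz",
]
ranges, scales = split_measurements(sample)
print(ranges)
print(scales)
```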

pbielak · 2025-09-05T08:04:38Z

+        fixed_json_path = os.path.join(
+            output_path, f"{measurement.split(os.sep)[-1]}")


Do we need this code fragment: measurement.split(os.sep)[-1]? Shouldn't measurement only contain filenames?

If splitting is necessary, I would suggest using os.path.basename instead:

Suggested change

-        fixed_json_path = os.path.join(
-            output_path, f"{measurement.split(os.sep)[-1]}")
+        fixed_json_path = os.path.join(output_path, os.path.basename(measurement))
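A quick check that the two forms agree for a path built with os.path.join. The directory and file names below are made up for the example.

```python
import os

# Hypothetical input and output locations.
measurement = os.path.join("measurements", "run1", "hooks_maxabs.json")
output_path = "fixed"

via_split = measurement.split(os.sep)[-1]
via_basename = os.path.basename(measurement)
print(via_split == via_basename)

fixed_json_path = os.path.join(output_path, via_basename)
print(fixed_json_path)
```

Besides being shorter, os.path.basename follows the platform's path rules: on Windows it also handles "/" as a separator, which a plain split on os.sep does not.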

pbielak · 2025-09-05T08:07:16Z

+                print("measurement=", measurement, flush=True)
+                print("measurements_paths_scales=",
+                      measurements_paths_scales, flush=True)
+                if measurement in measurements_paths_ranges + measurements_paths_scales:


This will always be true - note that we are iterating over the same files in the outermost for loop:

for measurement in measurements_paths_ranges + measurements_paths_scales:
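A tiny sketch of why the membership test cannot fail. The two lists hold placeholder file names, not real measurement files.

```python
# Placeholder file names, for illustration only.
measurements_paths_ranges = ["a.json", "b.json"]
measurements_paths_scales = ["a_MAXABS_HW.json"]

all_paths = measurements_paths_ranges + measurements_paths_scales

# Every element drawn from all_paths is, by construction, a member of all_paths.
checks = [measurement in all_paths for measurement in all_paths]
print(all(checks))
```

So the if can be dropped and its body run unconditionally inside the loop.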

pbielak · 2025-09-05T08:08:49Z

+                print("measurement=", measurement, flush=True)
+                print("measurements_paths_scales=",
+                      measurements_paths_scales, flush=True)
+                if measurement in measurements_paths_ranges + measurements_paths_scales:


This part doesn't need the json_file and fixed_json_file to be open. Let's move this block out to a lower indentation level.

pbielak · 2025-09-16T10:53:27Z

@astachowiczhabana please close in favor of #2260

Add FP8 postprocess_measure.py

38cb2c1

yafshar reviewed Sep 2, 2025

karol-brejna-i assigned pbielak Sep 4, 2025

pbielak suggested changes Sep 5, 2025

pbielak mentioned this pull request Sep 16, 2025

Add measurements postprocessing script #2260

Merged

astachowiczhabana closed this Sep 17, 2025

4 participants
