Add FP8 postprocess_measure.py #2238
The code quality check failed, please run
    if f'model.layers.{layer_index}.self_attn.{v_cache_name}' in node_name:
        oh_v_cache_input = node_info['inputs'][0]

    if matmul_av_input != v_cache_input:
Suggested change:
    - if matmul_av_input != v_cache_input:
    + if matmul_av_input is not None and v_cache_input is not None and matmul_av_input != v_cache_input:
@astachowiczhabana I suggest adding validation to ensure the inputs are not None.
The same applies to the comparisons below.
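As an illustration of the guard suggested above, the comparison can be wrapped in a small helper so that every K/V check applies the same None handling. This is only a sketch: `needs_remap` is a hypothetical name, not a function from the PR, and the list values stand in for whatever the measurement file stores under `inputs`.

```python
from typing import Any, Optional


def needs_remap(matmul_input: Optional[Any], cache_input: Optional[Any]) -> bool:
    """Return True only when both inputs are present and they differ."""
    if matmul_input is None or cache_input is None:
        # One of the nodes was not found in the measurement file,
        # so there is nothing meaningful to compare.
        return False
    return matmul_input != cache_input


# Hypothetical values standing in for node_info['inputs'][...] entries.
print(needs_remap(None, [1.0]))    # False: matmul input was never found
print(needs_remap([1.0], [2.0]))   # True: both present and different
print(needs_remap([1.0], [1.0]))   # False: already consistent
```

Using one helper for the `matmul_av`, `matmul_qk`, and flash-attention comparisons would keep the validation identical across all of them.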
        description="Run the measurements parser", formatter_class=argparse.ArgumentDefaultsHelpFormatter
    )
    parser.add_argument(
        "-m", "--measurements", type=str, help="full path to the directory of the measurements that should be fixed"
Suggested change:
    - "-m", "--measurements", type=str, help="full path to the directory of the measurements that should be fixed"
    + "-m", "--measurements", type=str, required=True, help="full path to the directory of the measurements that should be fixed"
It seems to me this argument is required; otherwise the script will fail.
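To show the effect of `required=True`, here is a minimal, self-contained sketch of the parser. The `build_parser` wrapper and the `/tmp/measurements` path are assumptions made for the example; only the argument definition itself comes from the diff.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser(
        description="Run the measurements parser",
        formatter_class=argparse.ArgumentDefaultsHelpFormatter,
    )
    parser.add_argument(
        "-m",
        "--measurements",
        type=str,
        required=True,  # argparse exits with a usage error if the flag is missing
        help="full path to the directory of the measurements that should be fixed",
    )
    return parser


# Hypothetical invocation; the path is a placeholder.
args = build_parser().parse_args(["-m", "/tmp/measurements"])
print(args.measurements)
```

Without `required=True`, a missing flag leaves `args.measurements` as `None`, and the failure only appears later when the script tries to use it as a path.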
    --bf16
    ```

    (After this measurement run, execute the post-processing step before proceeding.)
Why not use the same sentence in all places below?
    if f'model.layers.{layer_index}.self_attn.{v_cache_name}' in node_name:
        oh_v_cache_input = node_info['inputs'][0]

    if matmul_av_input != v_cache_input:
Maybe use the same validation "trick" as in other parts of OH, i.e.:
    if (matmul_av_input is not None) ^ (v_cache_input is not None):
        ....

            json_data['Nodes'][f'model.layers.{layer_index}.self_attn.{attn_name}.impl.matmul_av']['inputs'][1] = k_cache_input
        else:
            json_data['Nodes'][f'model.layers.{layer_index}.self_attn.attn.impl.matmul_av']['inputs'][1] = v_cache_input
    if matmul_qk_input != k_cache_input:
        json_data['Nodes'][f'model.layers.{layer_index}.self_attn.attn.impl.matmul_qk']['inputs'][1] = k_cache_input

    # Flash attention
    if fsdpa_k_input != oh_k_cache_input:
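The XOR check proposed in the review comment above answers a different question from the `!=` comparison: it is true when exactly one of the two values is missing. The sketch below makes that explicit. `presence_mismatch` is a hypothetical helper name, and how the same idiom is used elsewhere in OH is not shown on this page.

```python
from typing import Any, Optional


def presence_mismatch(a: Optional[Any], b: Optional[Any]) -> bool:
    """True when exactly one of the two values is None."""
    # Both operands are plain bools, so ^ acts as a logical XOR.
    return (a is not None) ^ (b is not None)


# Enumerate the four presence combinations with placeholder values.
for a, b in [(None, None), ("x", None), (None, "y"), ("x", "y")]:
    print(a, b, presence_mismatch(a, b))
```

A check like this could flag an inconsistent measurement file (one node found, its counterpart missing) before any value comparison is attempted.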
        description="Run the measurements parser", formatter_class=argparse.ArgumentDefaultsHelpFormatter
    )
    parser.add_argument(
        "-m", "--measurements", type=str, help="full path to the directory of the measurements that should be fixed"
Why not call it measurements_dir?
    measurements_paths_ranges = [measurement_path for measurement_path in measurements_paths if measurement_path.endswith(
        ".json") and 'MAXABS_HW' not in measurement_path and "mod_list" not in measurement_path]
Suggested change:
    - measurements_paths_ranges = [measurement_path for measurement_path in measurements_paths if measurement_path.endswith(
    -     ".json") and 'MAXABS_HW' not in measurement_path and "mod_list" not in measurement_path]
    + measurements_paths_ranges = [
    +     path
    +     for path in measurements_paths
    +     if path.endswith(".json") and 'MAXABS_HW' not in path and "mod_list" not in path
    + ]
    measurements_paths_scales = [measurement_path for measurement_path in measurements_paths if measurement_path.endswith(
        ".json") and 'MAXABS_HW' in measurement_path and "mod_list" not in measurement_path]
Suggested change:
    - measurements_paths_scales = [measurement_path for measurement_path in measurements_paths if measurement_path.endswith(
    -     ".json") and 'MAXABS_HW' in measurement_path and "mod_list" not in measurement_path]
    + measurements_paths_scales = [
    +     path
    +     for path in measurements_paths
    +     if path.endswith(".json") and 'MAXABS_HW' in path and "mod_list" not in path
    + ]
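Since the two comprehensions share the `.json` and `mod_list` conditions and differ only in the `MAXABS_HW` test, one possible follow-up is to filter once and then split. This is a sketch under that assumption. `split_measurements` and the example file names are hypothetical and not taken from the PR.

```python
from typing import Iterable, List, Tuple


def split_measurements(paths: Iterable[str]) -> Tuple[List[str], List[str]]:
    """Split measurement JSON files into (ranges, scales) lists."""
    # Shared conditions: JSON files only, and no module-list files.
    json_paths = [p for p in paths if p.endswith(".json") and "mod_list" not in p]
    ranges = [p for p in json_paths if "MAXABS_HW" not in p]
    scales = [p for p in json_paths if "MAXABS_HW" in p]
    return ranges, scales


# Hypothetical file names used only to exercise the filter.
example = [
    "hist_0_1.json",
    "hist_0_1_MAXABS_HW.json",
    "hist_0_1_mod_list.json",
    "hist_0_1.npz",
]
ranges, scales = split_measurements(example)
print(ranges)
print(scales)
```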
    fixed_json_path = os.path.join(
        output_path, f"{measurement.split(os.sep)[-1]}")
Do we need this code fragment: measurement.split(os.sep)[-1]? Shouldn't measurement only contain filenames?
If splitting is necessary, I would suggest using os.path.basename instead:
Suggested change:
    - fixed_json_path = os.path.join(
    -     output_path, f"{measurement.split(os.sep)[-1]}")
    + fixed_json_path = os.path.join(output_path, os.path.basename(measurement))
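One concrete reason to prefer `os.path.basename` is separator handling. The sketch below uses the standard-library `posixpath` and `ntpath` modules (the implementations behind `os.path` on POSIX and Windows) so the difference can be seen on any platform. The example path is hypothetical.

```python
import ntpath
import posixpath

path = "out/measurements/file_a.json"  # hypothetical path using forward slashes

# Under POSIX rules, both approaches give the same file name.
posix_split = path.split(posixpath.sep)[-1]
posix_basename = posixpath.basename(path)
print(posix_split, posix_basename)

# Under Windows rules, os.sep is a backslash, so splitting on it leaves a
# forward-slash path untouched, while basename accepts both separators.
windows_split = path.split(ntpath.sep)[-1]
windows_basename = ntpath.basename(path)
print(windows_split)
print(windows_basename)
```

If `measurement` really holds bare file names, as the first question in the comment suggests, both forms are no-ops and the call can be dropped entirely.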
    print("measurement=", measurement, flush=True)
    print("measurements_paths_scales=",
          measurements_paths_scales, flush=True)
    if measurement in measurements_paths_ranges + measurements_paths_scales:
This will always be true. Note that we are iterating over the same files in the outermost for loop:
    for measurement in measurements_paths_ranges + measurements_paths_scales:

    print("measurement=", measurement, flush=True)
    print("measurements_paths_scales=",
          measurements_paths_scales, flush=True)
    if measurement in measurements_paths_ranges + measurements_paths_scales:
This part doesn't need the json_file and fixed_json_file to be open. Let's move this block out to a lower indentation level.
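The two comments above could be addressed together roughly as follows. This is a sketch of the loop shape only: `fix_nodes` is a hypothetical stand-in for the cache-input remapping, `postprocess` is an assumed wrapper name, and the follow-up step is reduced to recording the output path because the body of the original block is not visible here.

```python
import json
import os
from typing import List


def fix_nodes(data: dict) -> dict:
    """Hypothetical stand-in for the remapping step; returns data unchanged."""
    return data


def postprocess(measurements: List[str], output_path: str) -> List[str]:
    written = []
    for measurement in measurements:
        fixed_json_path = os.path.join(output_path, os.path.basename(measurement))
        # Keep both files open only while reading and writing the JSON.
        with open(measurement) as json_file, open(fixed_json_path, "w") as fixed_json_file:
            json.dump(fix_nodes(json.load(json_file)), fixed_json_file)
        # Follow-up work runs one indentation level lower, after both handles
        # are closed. No membership test is needed because `measurement`
        # always comes from the list being iterated.
        written.append(fixed_json_path)
    return written
```

Closing the files before the follow-up step also guarantees that the fixed JSON is fully written to disk if that step needs to read it back.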
@astachowiczhabana please close in favor of #2260
This pull request introduces an important update to the quantization workflow for FP8 models by requiring post-processing of measurement artifacts before quantization. The changes include a new helper script to ensure correct cache input mapping, updated documentation with usage instructions, and reminders throughout the quantization examples to use the post-processing step. These updates help prevent accuracy degradation by fixing scaling statistics and cache associations.
Quantization workflow improvements:
- Added quantization_tools/postprocess_measure.py, a new script that post-processes measurement JSON/NPZ files to fix K/V cache tensor mappings for attention layers, including support for DeepSeek/MLA models.
- Updated examples/text-generation/README.md to document the need for the post-processing step after measurement and before quantization, including example usage and rationale.
Documentation and user guidance: