change quant conf example to use act_maxabs_pow2_weights_pcs_opt_pow2 by HolyFalafel · Pull Request #763 · huggingface/optimum-habana

HolyFalafel · 2024-03-05T11:31:38Z

change quant conf example to use act_maxabs_pow2_weights_pcs_opt_pow2

regisss · 2024-03-05T11:53:35Z

@HolyFalafel What's the difference between both methods?

HuggingFaceDocBuilderDev · 2024-03-05T11:56:14Z

The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.

HolyFalafel · 2024-03-05T12:20:26Z

@regisss

maxabs_hw - The scale is calculated from the maxabs measurement and then aligned to the corresponding HW-accelerated scale.
maxabs_hw_opt_weight - The scale of the model params (weights) is chosen, from all possible HW-accelerated scales, as the one that gives the minimal mean squared error between the quantized and unquantized weights. The scale of activations is calculated in the same manner as in maxabs_hw.
maxabs_pow2 - The scale is calculated from the maxabs measurement and then rounded to a power of 2.
act_maxabs_pow2_weights_pcs_opt_pow2 - The scale of the model params (weights) is calculated per channel of the params tensor. The per-channel scale is calculated in the same manner as in maxabs_hw_opt_weight. The scale of activations is calculated in the same manner as in maxabs_pow2.
act_maxabs_hw_weights_pcs_maxabs_pow2 - The scale of the model params (weights) is calculated per channel of the params tensor. The per-channel scale is calculated in the same manner as in maxabs_pow2. The scale of activations is calculated in the same manner as in maxabs_hw.

The method we switched to gave us better accuracy.
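
To make the distinctions above concrete, here is a minimal Python sketch of three of the ideas: a maxabs scale rounded to a power of 2, a scale chosen by minimal mean squared error among power-of-2 candidates, and applying either rule per channel. It is not the toolkit's implementation. The full-scale value of 240.0, the choice to round up, the candidate exponent range, and the toy integer-grid quantizer are all assumptions made for illustration, and every function name is hypothetical.

```python
import math

# Assumption: 240.0 stands in for the largest magnitude the FP8 format can hold.
FULL_SCALE = 240.0


def maxabs_pow2_scale(values, full_scale=FULL_SCALE):
    """maxabs-style scale, rounded up to a power of two (rounding direction assumed)."""
    maxabs = max(abs(v) for v in values)
    if maxabs == 0.0:
        return 1.0  # all zeros: any scale represents the data exactly
    return 2.0 ** math.ceil(math.log2(maxabs / full_scale))


def fake_quantize(values, scale, full_scale=FULL_SCALE):
    """Toy quantizer: divide by scale, clip, round to an integer grid, scale back.

    Real FP8 rounding is non-uniform; the integer grid only keeps the sketch short.
    """
    out = []
    for v in values:
        clipped = max(-full_scale, min(full_scale, v / scale))
        out.append(round(clipped) * scale)
    return out


def opt_pow2_scale(values, exponents=range(-8, 9)):
    """Pick the power-of-two scale with the smallest mean squared error."""
    def mse(scale):
        quantized = fake_quantize(values, scale)
        return sum((q - v) ** 2 for q, v in zip(quantized, values)) / len(values)

    return min((2.0 ** e for e in exponents), key=mse)


def per_channel_scales(weight_rows, scale_fn):
    """Apply a scale rule to each channel (here: each row) separately."""
    return [scale_fn(row) for row in weight_rows]


weights = [[240.0, 1.0], [30.0, -15.0]]
per_tensor = maxabs_pow2_scale([w for row in weights for w in row])
per_channel = per_channel_scales(weights, maxabs_pow2_scale)
print("per-tensor scale:", per_tensor)
print("per-channel scales:", per_channel)
```

In this toy setup the second row has a much smaller range than the first, so a per-channel rule can give it a finer scale than the single per-tensor scale would.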

regisss · 2024-03-05T13:36:38Z

Thanks for the explanation!

Your branch is quite far from the head of the main branch (it still relies on Transformers v4.34), could you rebase it so that I can test it please?

HolyFalafel · 2024-03-05T13:42:19Z

I wanted to minimize the number of commits. I'll rebase

regisss · 2024-03-06T02:08:29Z

Just to make sure I test it correctly, I first need to measure with

QUANT_CONFIG=./quantization_config/maxabs_measure.json python ../gaudi_spawn.py --use_deepspeed --world_size 8 run_generation.py \
  --model_name_or_path meta-llama/Llama-2-70b-hf \
  --attn_softmax_bf16 \
  --use_hpu_graphs \
  --trim_logits \
  --use_kv_cache \
  --reuse_cache \
  --bf16 \
  --batch_size 1

and then run the quantized model with

QUANT_CONFIG=./quantization_config/act_maxabs_hw_weights_pcs_maxabs_pow2_quant.json python ../gaudi_spawn.py --use_deepspeed --world_size 8 run_generation.py \
  --model_name_or_path meta-llama/Llama-2-70b-hf \
  --attn_softmax_bf16 \
  --use_hpu_graphs \
  --trim_logits \
  --use_kv_cache \
  --reuse_cache \
  --bf16 \
  --batch_size 1 \
  --fp8

right?

regisss · 2024-03-06T02:14:59Z

The second command returns this error:

File "/root/workspace/fork/examples/text-generation/utils.py", line 246, in setup_distributed_model
    habana_quantization_toolkit.prep_model(model)
  File "/usr/local/lib/python3.10/dist-packages/habana_quantization_toolkit/prepare_quant/prepare_model.py", line 12, in prep_model
    prepare_model(model)  # registers hooks
  File "/usr/local/lib/python3.10/dist-packages/habana_quantization_toolkit/_hook_method/__init__.py", line 45, in prepare_model
    return quantize_hooks(model, mod_list)
  File "/usr/local/lib/python3.10/dist-packages/habana_quantization_toolkit/_hook_method/quantize.py", line 67, in quantize_hooks
    scale_convert_method = scale_convert_methods[scaling_method_name]
KeyError: 'act_maxabs_pts_pow2_weights_opt_pcs_pow2'

Is it supposed to work with v1.14 or should I wait for v1.15?
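
The traceback ends in a plain dictionary lookup, so a scaling method name that the installed toolkit does not know surfaces as a bare KeyError. The sketch below shows one way such a lookup could report the mismatch more readably. The registry, its contents, and the function name are hypothetical stand-ins and do not reflect the toolkit's actual table.

```python
# Hypothetical registry: method names mapped to placeholder handlers.
# The real toolkit's table and its contents are not shown in this thread.
SCALE_METHODS = {
    "maxabs_hw": "handler-a",
    "maxabs_pow2": "handler-b",
}


def resolve_scale_method(name, registry=SCALE_METHODS):
    """Look up a scaling method and fail with a readable message if it is missing."""
    if name not in registry:
        known = ", ".join(sorted(registry))
        raise ValueError(
            f"scaling method {name!r} is not available in this installation; "
            f"known methods: {known}"
        )
    return registry[name]


print(resolve_scale_method("maxabs_pow2"))
```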

HolyFalafel · 2024-03-06T06:02:11Z

regisss wrote: "Is it supposed to work with v1.14 or should I wait for v1.15?"

Yes, it's a change done in v1.15, so let's wait for it

regisss wrote: "Just to make sure I test it correctly, I first need to measure with [...] and then run the quantized model with [...] right?"

This looks correct. @bgoldberg-habana any remarks?

HolyFalafel · 2024-03-06T09:25:25Z

@regisss
These are the commands for the whole process:

Please note that we measure with run_lm_eval.py to capture more data.

Measurement:

QUANT_CONFIG=./quantization_config/maxabs_measure.json python ../gaudi_spawn.py --use_deepspeed --world_size 8 run_lm_eval.py \
  --model_name_or_path meta-llama/Llama-2-70b-hf \
  --attn_softmax_bf16 \
  --use_hpu_graphs \
  --trim_logits \
  --use_kv_cache \
  --reuse_cache \
  --bf16 \
  --batch_size 1 \
  -o 70b_bs1_measure.txt

Per Tensor Quantization (PTQ):

QUANT_CONFIG=./quantization_config/maxabs_quant.json python ../gaudi_spawn.py --use_deepspeed --world_size 8 run_lm_eval.py \
  --model_name_or_path meta-llama/Llama-2-70b-hf \
  --attn_softmax_bf16 \
  --use_hpu_graphs \
  --trim_logits \
  --use_kv_cache \
  --reuse_cache \
  --bf16 \
  --batch_size 1 \
  --fp8 \
  -o 70b_bs1_ptq_quant.txt

Per Channel Quantization (PCQ):

QUANT_CONFIG=./quantization_config/act_maxabs_hw_weights_pcs_maxabs_pow2_quant.json python ../gaudi_spawn.py --use_deepspeed --world_size 8 run_lm_eval.py \
  --model_name_or_path meta-llama/Llama-2-70b-hf \
  --attn_softmax_bf16 \
  --use_hpu_graphs \
  --trim_logits \
  --use_kv_cache \
  --reuse_cache \
  --bf16 \
  --batch_size 1 \
  --fp8 \
  -o 70b_bs1_pcq_quant.out

You can also run this on the 7B model.
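
Since the three commands differ only in the QUANT_CONFIG file, the --fp8 flag, and the output file, a small helper can keep the shared flags in one place. The following is a sketch only: the helper name and its parameters are hypothetical, it only builds and prints the shell lines, and it does not launch anything.

```python
import shlex

# Flags shared by every command in this thread.
COMMON_FLAGS = [
    "--attn_softmax_bf16",
    "--use_hpu_graphs",
    "--trim_logits",
    "--use_kv_cache",
    "--reuse_cache",
    "--bf16",
    "--batch_size", "1",
]


def build_command(quant_config, output, fp8=False,
                  model="meta-llama/Llama-2-70b-hf", world_size=8):
    """Return one shell line for a measurement or quantized run_lm_eval.py run."""
    argv = [
        "python", "../gaudi_spawn.py", "--use_deepspeed",
        "--world_size", str(world_size), "run_lm_eval.py",
        "--model_name_or_path", model, *COMMON_FLAGS,
    ]
    if fp8:
        argv.append("--fp8")  # only the quantized runs pass --fp8
    argv += ["-o", output]
    return f"QUANT_CONFIG={shlex.quote(quant_config)} {shlex.join(argv)}"


measure = build_command("./quantization_config/maxabs_measure.json",
                        "70b_bs1_measure.txt")
ptq = build_command("./quantization_config/maxabs_quant.json",
                    "70b_bs1_ptq_quant.txt", fp8=True)
print(measure)
print(ptq)
```

Passing a different value for the model parameter is how the same lines would be produced for another checkpoint.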

regisss

LGTM!

Waiting for the release of Synapse v1.15 to merge.

regisss · 2024-03-22T22:05:47Z

@HolyFalafel I guess we can close this PR as this change was present in #765 too?

HolyFalafel · 2024-03-24T05:44:25Z

Right

change quant conf example to use act_maxabs_pow2_weights_pcs_opt_pow2

3b887c8

HolyFalafel requested a review from regisss as a code owner March 5, 2024 11:31

regisss added the run-test Run CI for PRs from external contributors label Mar 5, 2024

Merge branch 'huggingface:main' into dev/dsemiat/quant_conf_example_p…re_1.10.0

733b9c0

regisss added run-test Run CI for PRs from external contributors and removed run-test Run CI for PRs from external contributors labels Mar 6, 2024

libinta added the synapse 1.15 label Mar 6, 2024

regisss added run-test Run CI for PRs from external contributors and removed run-test Run CI for PRs from external contributors labels Mar 11, 2024

regisss approved these changes Mar 11, 2024

regisss changed the base branch from main to synapse_1.15 March 22, 2024 21:52

regisss closed this Mar 22, 2024

HolyFalafel wants to merge 2 commits into huggingface:synapse_1.15 from HabanaAI:dev/dsemiat/quant_conf_example_pre_1.10.0 (#763)

5 participants
