change quant conf example to use act_maxabs_pow2_weights_pcs_opt_pow2 #763

Closed
HolyFalafel wants to merge 2 commits into huggingface:synapse_1.15 from HabanaAI:dev/dsemiat/quant_conf_example_pre_1.10.0

@HolyFalafel
Contributor

change quant conf example to use act_maxabs_pow2_weights_pcs_opt_pow2

@HolyFalafel HolyFalafel requested a review from regisss as a code owner March 5, 2024 11:31
@regisss regisss added the run-test Run CI for PRs from external contributors label Mar 5, 2024
@regisss
Collaborator

regisss commented Mar 5, 2024

@HolyFalafel What's the difference between both methods?

@HolyFalafel
Contributor Author

HolyFalafel commented Mar 5, 2024

@regisss

  • maxabs_hw - The scale is calculated from the maxabs measurement and then aligned to the corresponding HW-accelerated scale.
  • maxabs_hw_opt_weight - The scale of the model params (weights) is chosen, from all possible HW-accelerated scales, as the one that gives the minimal mean-square error between quantized and unquantized weights. The scale of activations is calculated in the same manner as in maxabs_hw.
  • maxabs_pow2 - The scale is calculated from the maxabs measurement and then rounded to a power of 2.
  • act_maxabs_pow2_weights_pcs_opt_pow2 - The scale of the model params (weights) is calculated per channel of the params tensor. The per-channel scale is calculated in the same manner as in maxabs_hw_opt_weight. The scale of activations is calculated in the same manner as in maxabs_pow2.
  • act_maxabs_hw_weights_pcs_maxabs_pow2 - The scale of the model params (weights) is calculated per channel of the params tensor. The per-channel scale is calculated in the same manner as in maxabs_pow2. The scale of activations is calculated in the same manner as in maxabs_hw.
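
For readers who want something concrete, here is a minimal sketch of the difference between one maxabs power-of-2 scale for a whole tensor and one per channel. It is not the toolkit's code. The function names, the `FP8_MAX` value, and the choice to round up (to avoid clipping) are all assumptions made for illustration, and each channel is assumed to contain at least one nonzero value.

```python
import math

# Assumed largest representable FP8 magnitude; adjust for the target format.
FP8_MAX = 240.0


def maxabs_scale(values, fp8_max=FP8_MAX):
    """Scale that maps the largest absolute value onto fp8_max."""
    return max(abs(v) for v in values) / fp8_max


def round_to_pow2(scale):
    """Round a positive scale up to the nearest power of two (assumed direction)."""
    return 2.0 ** math.ceil(math.log2(scale))


def per_tensor_pow2_scale(tensor):
    """One power-of-two scale for the whole 2-D tensor (a list of rows)."""
    flat = [v for row in tensor for v in row]
    return round_to_pow2(maxabs_scale(flat))


def per_channel_pow2_scales(tensor):
    """One power-of-two scale per channel (here: one per row)."""
    return [round_to_pow2(maxabs_scale(row)) for row in tensor]


weights = [
    [0.02, -0.01, 0.015],  # small-magnitude channel
    [3.0, -6.0, 1.2],      # large-magnitude channel
]
print("per tensor: ", per_tensor_pow2_scale(weights))
print("per channel:", per_channel_pow2_scales(weights))
```

With a single per-tensor scale, the small-magnitude channel has to share the scale dictated by the large one. The per-channel variant gives it a much smaller scale of its own.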

The method we switched to gave us better accuracy.
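
To illustrate the "opt" idea from the list above (pick the scale with the minimal mean-square error between quantized and unquantized weights), here is a rough sketch. Everything in it is an assumption: `fake_fp8` is a toy rounding function with an assumed 3-bit mantissa and a maximum of 240, the candidate set is simply a range of powers of two, and none of the names come from the toolkit.

```python
import math


def fake_fp8(x, max_val=240.0, mant_bits=3, min_exp=-6):
    """Round x onto a toy FP8-like grid (assumed layout, illustration only)."""
    if x == 0.0:
        return 0.0
    mag = min(abs(x), max_val)        # saturate instead of overflowing
    _, e = math.frexp(mag)            # mag = m * 2**e with 0.5 <= m < 1
    exp = max(e - 1, min_exp)         # clamp the exponent: subnormal range
    step = 2.0 ** (exp - mant_bits)   # spacing of representable values here
    return math.copysign(min(round(mag / step) * step, max_val), x)


def quant_mse(values, scale):
    """Mean squared error after dividing by scale, rounding, and rescaling."""
    return sum((v - scale * fake_fp8(v / scale)) ** 2 for v in values) / len(values)


def opt_pow2_scale(values, exponents=range(-20, 21)):
    """Pick the power-of-two scale with the lowest quantization MSE."""
    return min((2.0 ** k for k in exponents), key=lambda s: quant_mse(values, s))


def maxabs_pow2_scale(values, max_val=240.0):
    """Baseline: maxabs scale rounded up to a power of two."""
    return 2.0 ** math.ceil(math.log2(max(abs(v) for v in values) / max_val))


channel = [0.02, -0.01, 0.015, 0.0007, -0.0031]  # one made-up weight channel
best = opt_pow2_scale(channel)
base = maxabs_pow2_scale(channel)
print("opt scale:   ", best, "mse:", quant_mse(channel, best))
print("maxabs scale:", base, "mse:", quant_mse(channel, base))
```

Because the maxabs power-of-two scale is one of the candidates, the searched scale can never do worse than it on this error metric. Whether it does strictly better depends on the data and on the real number format.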

@regisss
Collaborator

regisss commented Mar 5, 2024

Thanks for the explanation!

Your branch is quite far from the head of the main branch (it still relies on Transformers v4.34), could you rebase it so that I can test it please?

@HolyFalafel
Contributor Author

I wanted to minimize the number of commits. I'll rebase.

@regisss regisss added run-test Run CI for PRs from external contributors and removed run-test Run CI for PRs from external contributors labels Mar 6, 2024
@regisss
Collaborator

regisss commented Mar 6, 2024

Just to make sure I test it correctly, I first need to measure with

QUANT_CONFIG=./quantization_config/maxabs_measure.json python ../gaudi_spawn.py --use_deepspeed --world_size 8 run_generation.py \
  --model_name_or_path meta-llama/Llama-2-70b-hf \
  --attn_softmax_bf16 \
  --use_hpu_graphs \
  --trim_logits \
  --use_kv_cache \
  --reuse_cache \
  --bf16 \
  --batch_size 1

and then run the quantized model with

QUANT_CONFIG=./quantization_config/act_maxabs_hw_weights_pcs_maxabs_pow2_quant.json python ../gaudi_spawn.py --use_deepspeed --world_size 8 run_generation.py \
  --model_name_or_path meta-llama/Llama-2-70b-hf \
  --attn_softmax_bf16 \
  --use_hpu_graphs \
  --trim_logits \
  --use_kv_cache \
  --reuse_cache \
  --bf16 \
  --batch_size 1 \
  --fp8

right?

@regisss
Collaborator

regisss commented Mar 6, 2024

The second command returns this error:

File "/root/workspace/fork/examples/text-generation/utils.py", line 246, in setup_distributed_model
    habana_quantization_toolkit.prep_model(model)
  File "/usr/local/lib/python3.10/dist-packages/habana_quantization_toolkit/prepare_quant/prepare_model.py", line 12, in prep_model
    prepare_model(model)  # registers hooks
  File "/usr/local/lib/python3.10/dist-packages/habana_quantization_toolkit/_hook_method/__init__.py", line 45, in prepare_model
    return quantize_hooks(model, mod_list)
  File "/usr/local/lib/python3.10/dist-packages/habana_quantization_toolkit/_hook_method/quantize.py", line 67, in quantize_hooks
    scale_convert_method = scale_convert_methods[scaling_method_name]
KeyError: 'act_maxabs_pts_pow2_weights_opt_pcs_pow2'

Is it supposed to work with v1.14 or should I wait for v1.15?
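
For context, the traceback shows a plain dictionary lookup failing on the scaling method name, which is what happens when a config names a method the installed release does not know. Below is a minimal sketch of a lookup that reports the supported names instead of a bare KeyError. The helper name and the registry contents are hypothetical and are not taken from habana_quantization_toolkit.

```python
def resolve_scaling_method(name, registry):
    """Return the handler for `name`, or fail with a message listing known methods."""
    try:
        return registry[name]
    except KeyError:
        known = ", ".join(sorted(registry)) or "<none>"
        raise ValueError(
            f"Unknown scaling method {name!r}; this installation supports: {known}"
        ) from None


# Hypothetical stand-in for an older release that lacks the newer method name.
older_registry = {"maxabs_hw": object(), "maxabs_pow2": object()}

try:
    resolve_scaling_method("act_maxabs_pts_pow2_weights_opt_pcs_pow2", older_registry)
except ValueError as err:
    print(err)
```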

@HolyFalafel
Contributor Author

> Is it supposed to work with v1.14 or should I wait for v1.15?

Yes, this change was made in v1.15, so let's wait for it.

> Just to make sure I test it correctly, I first need to measure with [the maxabs_measure.json command] and then run the quantized model with [the act_maxabs_hw_weights_pcs_maxabs_pow2_quant.json command with --fp8], right?

This looks correct. @bgoldberg-habana any remarks?

@HolyFalafel
Contributor Author

HolyFalafel commented Mar 6, 2024

@regisss
These are the commands for the whole process:

Please note that we measure with run_lm_eval.py to capture more data.

Measurement:

QUANT_CONFIG=./quantization_config/maxabs_measure.json python ../gaudi_spawn.py --use_deepspeed --world_size 8 run_lm_eval.py \
  --model_name_or_path meta-llama/Llama-2-70b-hf \
  --attn_softmax_bf16 \
  --use_hpu_graphs \
  --trim_logits \
  --use_kv_cache \
  --reuse_cache \
  --bf16 \
  --batch_size 1 \
  -o 70b_bs1_measure.txt

Per Tensor Quantization (PTQ):

QUANT_CONFIG=./quantization_config/maxabs_quant.json python ../gaudi_spawn.py --use_deepspeed --world_size 8 run_lm_eval.py \
  --model_name_or_path meta-llama/Llama-2-70b-hf \
  --attn_softmax_bf16 \
  --use_hpu_graphs \
  --trim_logits \
  --use_kv_cache \
  --reuse_cache \
  --bf16 \
  --batch_size 1 \
  --fp8 \
  -o 70b_bs1_ptq_quant.txt

Per Channel Quantization (PCQ):

QUANT_CONFIG=./quantization_config/act_maxabs_hw_weights_pcs_maxabs_pow2_quant.json python ../gaudi_spawn.py --use_deepspeed --world_size 8 run_lm_eval.py \
  --model_name_or_path meta-llama/Llama-2-70b-hf \
  --attn_softmax_bf16 \
  --use_hpu_graphs \
  --trim_logits \
  --use_kv_cache \
  --reuse_cache \
  --bf16 \
  --batch_size 1 \
  --fp8 \
  -o 70b_bs1_pcq_quant.out

You can also run this on the 7B model.
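
The three commands differ only in the QUANT_CONFIG file, the --fp8 flag, and the output file, so the whole process can be sketched as a small table-driven script. This is only a convenience sketch. The `STAGES` table and the `build_stage` helper are made-up names; the flags, paths, and file names are copied from the commands above. It prints the commands rather than launching them.

```python
import shlex

# Flags shared by all three runs, copied from the commands above.
COMMON_FLAGS = [
    "--model_name_or_path", "meta-llama/Llama-2-70b-hf",
    "--attn_softmax_bf16", "--use_hpu_graphs", "--trim_logits",
    "--use_kv_cache", "--reuse_cache", "--bf16", "--batch_size", "1",
]

# (stage name, quantization config, add --fp8?, output file)
STAGES = [
    ("measure", "maxabs_measure.json", False, "70b_bs1_measure.txt"),
    ("ptq", "maxabs_quant.json", True, "70b_bs1_ptq_quant.txt"),
    ("pcq", "act_maxabs_hw_weights_pcs_maxabs_pow2_quant.json", True,
     "70b_bs1_pcq_quant.out"),
]


def build_stage(config, fp8, output):
    """Return (extra environment, argv) for one stage of the process."""
    env = {"QUANT_CONFIG": f"./quantization_config/{config}"}
    argv = ["python", "../gaudi_spawn.py", "--use_deepspeed", "--world_size", "8",
            "run_lm_eval.py", *COMMON_FLAGS]
    if fp8:
        argv.append("--fp8")  # only the quantized runs use FP8
    argv += ["-o", output]
    return env, argv


for name, config, fp8, output in STAGES:
    env, argv = build_stage(config, fp8, output)
    print(f"[{name}] QUANT_CONFIG={env['QUANT_CONFIG']} {shlex.join(argv)}")
    # To launch for real, one could call
    # subprocess.run(argv, env={**os.environ, **env}, check=True)
```

The measurement stage has to come first, since both quantized runs rely on the measurements it produces.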

@regisss regisss added run-test Run CI for PRs from external contributors and removed run-test Run CI for PRs from external contributors labels Mar 11, 2024
Collaborator

@regisss regisss left a comment

LGTM!

Waiting for the release of Synapse v1.15 to merge.

@regisss regisss changed the base branch from main to synapse_1.15 March 22, 2024 21:52
@regisss
Collaborator

regisss commented Mar 22, 2024

@HolyFalafel I guess we can close this PR as this change was present in #765 too?

@regisss regisss closed this Mar 22, 2024
@HolyFalafel
Contributor Author

> @HolyFalafel I guess we can close this PR as this change was present in #765 too?

Right

Labels

run-test (Run CI for PRs from external contributors), synapse 1.15
