System Info

- CPU architecture: x86_64
- GPU properties
  - GPU name: NVIDIA A100
  - GPU memory size: 40G
- Libraries
  - TensorRT-LLM branch or tag: v0.8.0
  - TensorRT-LLM commit: 5955b8afbad
  - Container used: yes, `make -C docker release_build` on the v0.8.0 branch
- NVIDIA driver version: 525.89.02
- OS: Ubuntu 22.04

Who can help?

@Tracin

Information

Tasks

An officially supported task in the examples folder (such as GLUE/SQuAD, ...)

Reproduction

pip install transformers==4.33.0  # fix: https://huggingface.co/THUDM/chatglm2-6b/discussions/87

tp_size=1

python examples/chatglm/convert_checkpoint.py --model_dir ${hf_model_dir} \
    --tp_size ${tp_size} \
    --dtype float16 \
    --use_weight_only \
    --weight_only_precision int8 \
    --int8_kv_cache \
    --workers ${tp_size} \
    --output_dir ${quant_out_dir}/int8-kv8/${tp_size}-gpu/

trtllm-build --checkpoint_dir ${quant_out_dir}/int8-kv8/${tp_size}-gpu/ \
    --output_dir ${trt_out_dir}/int8-kv8/${tp_size}-gpu/ \
    --gemm_plugin float16 \
    --gpt_attention_plugin float16 \
    --context_fmha_fp32_acc enable \
    --remove_input_padding enable \
    --max_batch_size 128 \
    --max_input_len 2048 \
    --max_output_len 2048
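
For reference, a minimal sketch of the shell variables the commands above assume; the paths below are placeholders, not values from the original report:

hf_model_dir=/path/to/chatglm2-6b          # local Hugging Face checkpoint of THUDM/chatglm2-6b
quant_out_dir=/path/to/trtllm_checkpoints  # where convert_checkpoint.py writes the converted checkpoint
trt_out_dir=/path/to/trtllm_engines        # where trtllm-build writes the engine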

Expected behavior

build success

actual behavior

[TensorRT-LLM] TensorRT-LLM version: 0.8.00.8.0
Inferring chatglm version from path...
Chatglm version: chatglm2
Loading checkpoint shards: 100%|██████████| 7/7 [00:09<00:00, 1.36s/it]
Calibration: 100%|██████████| 64/64 [00:11<00:00, 5.42it/s]
Weights loaded. Total time: 00:06:07
Total time of converting checkpoints: 00:07:18
[TensorRT-LLM] TensorRT-LLM version: 0.8.0
[03/06/2024-03:40:23] [TRT-LLM] [I] Set bert_attention_plugin to float16.
[03/06/2024-03:40:23] [TRT-LLM] [I] Set gpt_attention_plugin to float16.
[03/06/2024-03:40:23] [TRT-LLM] [I] Set gemm_plugin to float16.
[03/06/2024-03:40:23] [TRT-LLM] [I] Set lookup_plugin to None.
[03/06/2024-03:40:23] [TRT-LLM] [I] Set lora_plugin to None.
[03/06/2024-03:40:23] [TRT-LLM] [I] Set context_fmha to True.
[03/06/2024-03:40:23] [TRT-LLM] [I] Set context_fmha_fp32_acc to True.
[03/06/2024-03:40:23] [TRT-LLM] [I] Set paged_kv_cache to True.
[03/06/2024-03:40:23] [TRT-LLM] [I] Set remove_input_padding to True.
[03/06/2024-03:40:23] [TRT-LLM] [I] Set use_custom_all_reduce to True.
[03/06/2024-03:40:23] [TRT-LLM] [I] Set multi_block_mode to False.
[03/06/2024-03:40:23] [TRT-LLM] [I] Set enable_xqa to True.
[03/06/2024-03:40:23] [TRT-LLM] [I] Set attention_qk_half_accumulation to False.
[03/06/2024-03:40:23] [TRT-LLM] [I] Set tokens_per_block to 128.
[03/06/2024-03:40:23] [TRT-LLM] [I] Set use_paged_context_fmha to False.
[03/06/2024-03:40:23] [TRT-LLM] [I] Set use_context_fmha_for_generation to False.
[03/06/2024-03:40:23] [TRT-LLM] [W] remove_input_padding is enabled, while max_num_tokens is not set, setting to max_batch_size*max_input_len.
It may not be optimal to set max_num_tokens=max_batch_size*max_input_len when remove_input_padding is enabled, because the number of packed input tokens are very likely to be smaller, we strongly recommend to set max_num_tokens according to your workloads.
Traceback (most recent call last):
  File "/usr/local/bin/trtllm-build", line 8, in <module>
    sys.exit(main())
  File "/usr/local/lib/python3.10/dist-packages/tensorrt_llm/commands/build.py", line 497, in main
    parallel_build(source, build_config, args.output_dir, workers,
  File "/usr/local/lib/python3.10/dist-packages/tensorrt_llm/commands/build.py", line 420, in parallel_build
    passed = build_and_save(rank, rank % workers, ckpt_dir,
  File "/usr/local/lib/python3.10/dist-packages/tensorrt_llm/commands/build.py", line 392, in build_and_save
    engine = build(build_config,
  File "/usr/local/lib/python3.10/dist-packages/tensorrt_llm/commands/build.py", line 272, in build
    model.load(weights)
  File "/usr/local/lib/python3.10/dist-packages/tensorrt_llm/models/modeling_utils.py", line 338, in load
    raise RuntimeError(err_msg)
RuntimeError: Provided tensor names are different from those expected by the engine.
Provided but not expected tensors: {'transformer.layers.2.attention.dense.act_scale', 'transformer.layers.25.attention.quantization_scaling_factor', 'transformer.layers.22.mlp.quantization_scaling_factor', 'transformer.layers.26.mlp.fc.act_scale', 'transformer.layers.24.attention.dense.act_scale', 'transformer.layers.18.mlp.fc.act_scale', 'transformer.layers.6.mlp.fc.act_scale', 'transformer.layers.19.mlp.fc.act_scale', 'transformer.layers.0.input_layernorm.scale_to_int', 'transformer.layers.4.attention.dense.act_scale', 'transformer.layers.21.mlp.fc.act_scale', 'transformer.layers.0.mlp.proj.act_scale', 'transformer.layers.16.post_layernorm.scale_to_int', 'transformer.layers.24.attention.quantization_scaling_factor', 'transformer.layers.17.attention.quantization_scaling_factor', 'transformer.layers.10.input_layernorm.scale_to_int', 'transformer.layers.0.mlp.fc.act_scale', 'transformer.layers.19.attention.quantization_scaling_factor', 'transformer.layers.15.mlp.fc.act_scale', 'transformer.layers.6.mlp.proj.act_scale', 'transformer.layers.9.attention.qkv.act_scale', 'transformer.layers.10.attention.dense.act_scale', 'transformer.layers.23.mlp.quantization_scaling_factor', 'transformer.layers.4.mlp.quantization_scaling_factor', 'transformer.layers.17.mlp.fc.act_scale', 'transformer.layers.21.input_layernorm.scale_to_int', 'transformer.layers.21.attention.dense.act_scale', 'transformer.layers.9.mlp.proj.act_scale', 'transformer.layers.1.mlp.proj.act_scale', 'transformer.layers.13.mlp.quantization_scaling_factor', 'transformer.layers.9.attention.dense.act_scale', 'transformer.layers.12.input_layernorm.scale_to_int', 'transformer.layers.21.attention.quantization_scaling_factor', 'transformer.layers.23.attention.quantization_scaling_factor', 'transformer.layers.14.mlp.quantization_scaling_factor', 'transformer.layers.16.input_layernorm.scale_to_int', 'transformer.layers.12.attention.quantization_scaling_factor', 'transformer.layers.11.attention.qkv.act_scale', 'transformer.layers.11.input_layernorm.scale_to_int', 'transformer.layers.26.post_layernorm.scale_to_int', 'transformer.layers.4.mlp.proj.act_scale', 'transformer.layers.5.mlp.fc.act_scale', 'transformer.layers.23.mlp.fc.act_scale', 'transformer.layers.26.attention.qkv.act_scale', 'transformer.layers.0.attention.quantization_scaling_factor', 'transformer.layers.2.attention.quantization_scaling_factor', 'transformer.layers.25.input_layernorm.scale_to_int', 'transformer.layers.19.input_layernorm.scale_to_int', 'transformer.layers.26.attention.quantization_scaling_factor', 'transformer.layers.21.mlp.proj.act_scale', 'transformer.layers.2.input_layernorm.scale_to_int', 'transformer.layers.25.mlp.proj.act_scale', 'transformer.layers.23.mlp.proj.act_scale', 'transformer.layers.15.attention.qkv.act_scale', 'transformer.layers.16.mlp.proj.act_scale', 'transformer.layers.8.mlp.proj.act_scale', 'transformer.layers.17.input_layernorm.scale_to_int', 'transformer.layers.1.attention.quantization_scaling_factor', 'transformer.layers.16.mlp.fc.act_scale', 'transformer.layers.1.attention.qkv.act_scale', 'transformer.layers.5.input_layernorm.scale_to_int', 'transformer.layers.4.mlp.fc.act_scale', 'transformer.layers.10.attention.quantization_scaling_factor', 'transformer.layers.9.mlp.quantization_scaling_factor', 'transformer.layers.22.mlp.proj.act_scale', 'transformer.layers.8.attention.dense.act_scale', 'transformer.layers.22.input_layernorm.scale_to_int', 'transformer.layers.27.attention.dense.act_scale', 'transformer.layers.27.attention.qkv.act_scale', 
'transformer.layers.3.input_layernorm.scale_to_int', 'transformer.layers.13.mlp.proj.act_scale', 'transformer.layers.24.mlp.proj.act_scale', 'transformer.layers.15.mlp.proj.act_scale', 'transformer.layers.22.post_layernorm.scale_to_int', 'transformer.layers.6.input_layernorm.scale_to_int', 'transformer.layers.19.mlp.quantization_scaling_factor', 'transformer.layers.8.mlp.quantization_scaling_factor', 'transformer.layers.13.post_layernorm.scale_to_int', 'transformer.layers.20.post_layernorm.scale_to_int', 'transformer.layers.11.attention.dense.act_scale', 'transformer.layers.1.mlp.quantization_scaling_factor', 'transformer.layers.20.attention.qkv.act_scale', 'transformer.layers.23.attention.dense.act_scale', 'transformer.layers.18.attention.dense.act_scale', 'transformer.layers.7.attention.quantization_scaling_factor', 'transformer.layers.22.attention.qkv.act_scale', 'transformer.layers.7.attention.qkv.act_scale', 'transformer.layers.26.mlp.quantization_scaling_factor', 'transformer.layers.22.mlp.fc.act_scale', 'transformer.layers.11.post_layernorm.scale_to_int', 'transformer.layers.2.post_layernorm.scale_to_int', 'transformer.layers.3.attention.qkv.act_scale', 'transformer.layers.17.post_layernorm.scale_to_int', 'transformer.layers.24.input_layernorm.scale_to_int', 'transformer.layers.10.mlp.quantization_scaling_factor', 'transformer.layers.3.post_layernorm.scale_to_int', 'transformer.layers.3.mlp.fc.act_scale', 'transformer.layers.12.mlp.proj.act_scale', 'transformer.layers.8.mlp.fc.act_scale', 'transformer.layers.4.attention.quantization_scaling_factor', 'transformer.layers.6.mlp.quantization_scaling_factor', 'transformer.layers.6.attention.quantization_scaling_factor', 'transformer.layers.27.mlp.proj.act_scale', 'transformer.layers.5.mlp.proj.act_scale', 'transformer.layers.12.mlp.fc.act_scale', 'transformer.layers.15.input_layernorm.scale_to_int', 'transformer.layers.24.post_layernorm.scale_to_int', 'transformer.layers.5.post_layernorm.scale_to_int', 'transformer.layers.23.post_layernorm.scale_to_int', 'transformer.layers.3.attention.dense.act_scale', 'transformer.layers.20.input_layernorm.scale_to_int', 'transformer.layers.7.mlp.fc.act_scale', 'transformer.layers.17.mlp.proj.act_scale', 'transformer.layers.20.attention.quantization_scaling_factor', 'transformer.layers.27.mlp.quantization_scaling_factor', 'transformer.layers.14.attention.quantization_scaling_factor', 'transformer.layers.11.attention.quantization_scaling_factor', 'transformer.layers.23.attention.qkv.act_scale', 'transformer.layers.17.attention.qkv.act_scale', 'transformer.layers.7.post_layernorm.scale_to_int', 'transformer.layers.9.post_layernorm.scale_to_int', 'transformer.layers.9.input_layernorm.scale_to_int', 'transformer.layers.14.mlp.fc.act_scale', 'transformer.layers.14.attention.qkv.act_scale', 'transformer.layers.3.mlp.quantization_scaling_factor', 'transformer.layers.0.mlp.quantization_scaling_factor', 'transformer.layers.18.post_layernorm.scale_to_int', 'transformer.layers.10.mlp.proj.act_scale', 'transformer.layers.7.mlp.quantization_scaling_factor', 'transformer.layers.13.attention.dense.act_scale', 'transformer.layers.17.mlp.quantization_scaling_factor', 'transformer.layers.27.attention.quantization_scaling_factor', 'transformer.layers.17.attention.dense.act_scale', 'transformer.layers.15.post_layernorm.scale_to_int', 'transformer.layers.18.attention.quantization_scaling_factor', 'transformer.layers.14.attention.dense.act_scale', 'transformer.layers.19.attention.qkv.act_scale', 
'transformer.layers.8.input_layernorm.scale_to_int', 'transformer.layers.24.attention.qkv.act_scale', 'transformer.layers.19.attention.dense.act_scale', 'transformer.layers.2.mlp.quantization_scaling_factor', 'transformer.layers.22.attention.dense.act_scale', 'transformer.layers.15.attention.dense.act_scale', 'transformer.layers.12.attention.qkv.act_scale', 'transformer.layers.25.mlp.fc.act_scale', 'transformer.layers.12.post_layernorm.scale_to_int', 'transformer.layers.26.attention.dense.act_scale', 'transformer.layers.13.input_layernorm.scale_to_int', 'transformer.layers.1.input_layernorm.scale_to_int', 'transformer.layers.10.mlp.fc.act_scale', 'transformer.layers.3.mlp.proj.act_scale', 'transformer.layers.11.mlp.proj.act_scale', 'transformer.layers.24.mlp.fc.act_scale', 'transformer.layers.23.input_layernorm.scale_to_int', 'transformer.layers.12.mlp.quantization_scaling_factor', 'transformer.layers.2.mlp.fc.act_scale', 'transformer.layers.4.attention.qkv.act_scale', 'transformer.layers.6.attention.qkv.act_scale', 'transformer.layers.9.mlp.fc.act_scale', 'transformer.layers.26.input_layernorm.scale_to_int', 'transformer.layers.19.mlp.proj.act_scale', 'transformer.layers.18.mlp.quantization_scaling_factor', 'transformer.layers.25.attention.qkv.act_scale', 'transformer.layers.21.post_layernorm.scale_to_int', 'transformer.layers.2.attention.qkv.act_scale', 'transformer.layers.15.mlp.quantization_scaling_factor', 'transformer.layers.7.input_layernorm.scale_to_int', 'transformer.layers.6.post_layernorm.scale_to_int', 'transformer.layers.18.input_layernorm.scale_to_int', 'transformer.layers.13.mlp.fc.act_scale', 'transformer.layers.14.mlp.proj.act_scale', 'transformer.layers.1.attention.dense.act_scale', 'transformer.layers.13.attention.quantization_scaling_factor', 'transformer.layers.10.attention.qkv.act_scale', 'transformer.layers.1.mlp.fc.act_scale', 'transformer.layers.7.attention.dense.act_scale', 'transformer.layers.22.attention.quantization_scaling_factor', 'transformer.layers.14.post_layernorm.scale_to_int', 'transformer.layers.6.attention.dense.act_scale', 'transformer.layers.24.mlp.quantization_scaling_factor', 'transformer.layers.9.attention.quantization_scaling_factor', 'transformer.layers.2.mlp.proj.act_scale', 'transformer.layers.13.attention.qkv.act_scale', 'transformer.layers.16.attention.dense.act_scale', 'transformer.layers.5.attention.qkv.act_scale', 'transformer.layers.5.attention.quantization_scaling_factor', 'transformer.layers.11.mlp.fc.act_scale', 'transformer.layers.3.attention.quantization_scaling_factor', 'transformer.layers.27.mlp.fc.act_scale', 'transformer.layers.20.attention.dense.act_scale', 'transformer.layers.21.mlp.quantization_scaling_factor', 'transformer.layers.25.attention.dense.act_scale', 'transformer.layers.8.post_layernorm.scale_to_int', 'transformer.layers.8.attention.qkv.act_scale', 'transformer.layers.15.attention.quantization_scaling_factor', 'transformer.layers.27.post_layernorm.scale_to_int', 'transformer.layers.7.mlp.proj.act_scale', 'transformer.layers.4.input_layernorm.scale_to_int', 'transformer.layers.0.post_layernorm.scale_to_int', 'transformer.layers.16.mlp.quantization_scaling_factor', 'transformer.layers.1.post_layernorm.scale_to_int', 'transformer.layers.20.mlp.quantization_scaling_factor', 'transformer.layers.16.attention.qkv.act_scale', 'transformer.layers.5.attention.dense.act_scale', 'transformer.layers.20.mlp.proj.act_scale', 'transformer.layers.21.attention.qkv.act_scale', 'transformer.layers.11.mlp.quantization_scaling_factor', 
'transformer.layers.0.attention.dense.act_scale', 'transformer.layers.25.mlp.quantization_scaling_factor', 'transformer.layers.18.mlp.proj.act_scale', 'transformer.layers.26.mlp.proj.act_scale', 'transformer.layers.5.mlp.quantization_scaling_factor', 'transformer.layers.20.mlp.fc.act_scale', 'transformer.layers.18.attention.qkv.act_scale', 'transformer.layers.16.attention.quantization_scaling_factor', 'transformer.layers.12.attention.dense.act_scale', 'transformer.layers.25.post_layernorm.scale_to_int', 'transformer.layers.8.attention.quantization_scaling_factor', 'transformer.layers.0.attention.qkv.act_scale', 'transformer.layers.10.post_layernorm.scale_to_int', 'transformer.layers.14.input_layernorm.scale_to_int', 'transformer.layers.19.post_layernorm.scale_to_int', 'transformer.layers.4.post_layernorm.scale_to_int', 'transformer.layers.27.input_layernorm.scale_to_int'}

additional notes

none

Could you share the config.json attached to the checkpoint?

Hello @Tracin, this is the config.json:
{ "architecture": "ChatGLMForCausalLM", "dtype": "float16", "logits_dtype": "float32", "num_hidden_layers": 28, "num_attention_heads": 32, "num_key_value_heads": 2, "hidden_size": 4096, "intermediate_size": 13696, "norm_epsilon": 1e-05, "vocab_size": 65024, "position_embedding_type": "rope_gptj", "max_position_embeddings": 32768, "hidden_act": "swiglu", "use_parallel_embedding": false, "embedding_sharding_dim": 0, "share_embedding_table": false, "quantization": { "quant_algo": "W8A16", "kv_cache_quant_algo": "INT8", "sq_use_plugin": true }, "mapping": { "world_size": 1, "tp_size": 1, "pp_size": 1 }, "chatglm_version": "chatglm2", "add_bias_linear": false, "add_qkv_bias": true, "apply_query_key_layer_scaling": false, "apply_residual_connection_post_layernorm": false, "rmsnorm": true, "rope_ratio": 1.0 }

@NaNAGISaSA We will fix this in the next update. Until then, you can build SmoothQuant (SQ) + INT8 KV cache, or weight-only with an FP16 KV cache.
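
For anyone hitting the same mismatch before the fix lands, a rough sketch of the two interim paths mentioned above. The weight-only command simply drops --int8_kv_cache from the report's command; the SmoothQuant flags (--smoothquant, --per_channel, --per_token) are assumed from the v0.8.0 chatglm example and should be verified against examples/chatglm/convert_checkpoint.py --help. Output directory names are placeholders.

# Interim option 1: weight-only INT8 weights with an FP16 KV cache (no --int8_kv_cache)
python examples/chatglm/convert_checkpoint.py --model_dir ${hf_model_dir} \
    --tp_size ${tp_size} \
    --dtype float16 \
    --use_weight_only \
    --weight_only_precision int8 \
    --workers ${tp_size} \
    --output_dir ${quant_out_dir}/int8-wo-fp16kv/${tp_size}-gpu/

# Interim option 2: SmoothQuant + INT8 KV cache (flag names assumed, not confirmed in this thread)
python examples/chatglm/convert_checkpoint.py --model_dir ${hf_model_dir} \
    --tp_size ${tp_size} \
    --dtype float16 \
    --smoothquant 0.5 \
    --per_channel \
    --per_token \
    --int8_kv_cache \
    --workers ${tp_size} \
    --output_dir ${quant_out_dir}/sq-int8kv/${tp_size}-gpu/

Either converted checkpoint can then be built with the same trtllm-build command from the report, pointing --checkpoint_dir at the matching output directory.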