### Describe the bug
Using this recipe:
```yaml
quant_stage:
  quant_modifiers:
    QuantizationModifier:
      ignore: ["lm_head", "re:.*layers.0..*", "re:.*layers.79..*"]
      config_groups:
        group_0:
          weights:
            num_bits: 8
            type: float
            strategy: channel
            dynamic: false
            symmetric: true
          input_activations:
            num_bits: 8
            type: float
            strategy: token
            dynamic: true
            symmetric: true
          targets: ["Linear"]
      kv_cache_scheme:
        num_bits: 8
        type: float
        strategy: tensor
        dynamic: false
        symmetric: true
```
Quantization fails with the error below. If I remove the kv_cache_scheme block, it works.
The model I am quantizing is Llama 3.3 70B.
### Expected behavior
Quantization should complete without error with the kv_cache_scheme applied, just as it does when the scheme is removed.
### Environment
Include all relevant environment information:
- OS [e.g. Ubuntu 20.04]:
- Python version [e.g. 3.7]:
- LLM Compressor version or commit hash [e.g. 0.1.0, f7245c8]: 0.4.1
- ML framework version(s) [e.g. torch 2.3.1]:
- Other Python package versions [e.g. vLLM, compressed-tensors, numpy, ONNX]:
- Other relevant environment information [e.g. hardware, CUDA version]: 2x H100 SXM
### To Reproduce
Run oneshot quantization of Llama 3.3 70B with the recipe above; a sketch of the invocation follows.
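A minimal sketch of the kind of script that triggers this, assuming the standard `oneshot` entry point; the model ID, dataset, and calibration settings are illustrative placeholders, not the exact values from my run:

```python
# Hypothetical reproduction sketch -- model ID, dataset, and calibration
# settings are illustrative placeholders, not the exact failing values.
from llmcompressor.transformers import oneshot

MODEL_ID = "meta-llama/Llama-3.3-70B-Instruct"  # assumption: any Llama 3.3 70B checkpoint

oneshot(
    model=MODEL_ID,
    dataset="open_platypus",       # any calibration dataset should reproduce it
    recipe="recipe.yaml",          # the recipe shown above
    max_seq_length=2048,
    num_calibration_samples=512,
)
```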
### Errors
Full traceback:
File "/root/venv/lib/python3.12/site-packages/llmcompressor/modifiers/quantization/quantization/base.py", line 112, in on_initialize
self._calibrate_if_possible(module)
File "/root/venv/lib/python3.12/site-packages/llmcompressor/modifiers/quantization/quantization/base.py", line 266, in _calibrate_if_possible
self._calibrate(module)
File "/root/venv/lib/python3.12/site-packages/llmcompressor/modifiers/quantization/quantization/base.py", line 314, in _calibrate
run_calibration_forward(
File "/root/venv/lib/python3.12/site-packages/llmcompressor/modifiers/utils/pytorch_helpers.py", line 82, in run_calibration_forward
forward_fn(batch, module=model)
File "/root/venv/lib/python3.12/site-packages/llmcompressor/pytorch/utils/helpers.py", line 394, in tensors_module_forward
return module(**tensors)
^^^^^^^^^^^^^^^^^
File "/root/venv/lib/python3.12/site-packages/torch/nn/modules/module.py", line 1739, in _wrapped_call_impl
return self._call_impl(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/root/venv/lib/python3.12/site-packages/torch/nn/modules/module.py", line 1750, in _call_impl
return forward_call(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/root/venv/lib/python3.12/site-packages/accelerate/hooks.py", line 176, in new_forward
output = module._old_forward(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/root/venv/lib/python3.12/site-packages/transformers/utils/deprecation.py", line 172, in wrapped_func
return func(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^
File "/root/venv/lib/python3.12/site-packages/transformers/models/llama/modeling_llama.py", line 853, in forward
outputs = self.model(
^^^^^^^^^^^
File "/root/venv/lib/python3.12/site-packages/torch/nn/modules/module.py", line 1739, in _wrapped_call_impl
return self._call_impl(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/root/venv/lib/python3.12/site-packages/torch/nn/modules/module.py", line 1750, in _call_impl
return forward_call(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/root/venv/lib/python3.12/site-packages/transformers/models/llama/modeling_llama.py", line 601, in forward
layer_outputs = decoder_layer(
^^^^^^^^^^^^^^
File "/root/venv/lib/python3.12/site-packages/torch/nn/modules/module.py", line 1739, in _wrapped_call_impl
return self._call_impl(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/root/venv/lib/python3.12/site-packages/torch/nn/modules/module.py", line 1750, in _call_impl
return forward_call(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/root/venv/lib/python3.12/site-packages/accelerate/hooks.py", line 176, in new_forward
output = module._old_forward(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/root/venv/lib/python3.12/site-packages/transformers/models/llama/modeling_llama.py", line 343, in forward
hidden_states, self_attn_weights = self.self_attn(
^^^^^^^^^^^^^^^
File "/root/venv/lib/python3.12/site-packages/torch/nn/modules/module.py", line 1739, in _wrapped_call_impl
return self._call_impl(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/root/venv/lib/python3.12/site-packages/torch/nn/modules/module.py", line 1845, in _call_impl
return inner()
^^^^^^^
File "/root/venv/lib/python3.12/site-packages/torch/nn/modules/module.py", line 1793, in inner
result = forward_call(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/root/venv/lib/python3.12/site-packages/accelerate/hooks.py", line 176, in new_forward
output = module._old_forward(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/root/venv/lib/python3.12/site-packages/transformers/models/llama/modeling_llama.py", line 287, in forward
key_states, value_states = past_key_value.update(key_states, value_states, self.layer_idx, cache_kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/root/venv/lib/python3.12/site-packages/llmcompressor/modifiers/quantization/cache.py", line 93, in update
q_key_states = self._quantize(
^^^^^^^^^^^^^^^
File "/root/venv/lib/python3.12/site-packages/llmcompressor/modifiers/quantization/cache.py", line 144, in _quantize
observer = self.k_observers[layer_idx]
~~~~~~~~~~~~~~~~^^^^^^^^^^^
IndexError: list index out of range
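My reading of the failure (speculation, not verified against the llm-compressor source): `k_observers` seems to hold one entry per quantized attention layer, while `layer_idx` indexes all 80 decoder layers, so ignoring layers 0 and 79 leaves the list two entries short. A self-contained sketch of that suspected mismatch, with hypothetical names:

```python
# Hypothetical sketch of the suspected mismatch -- names are illustrative,
# not the actual llm-compressor internals.
NUM_LAYERS = 80
ignored = {0, 79}  # layers excluded by the recipe's ignore patterns

# If KV observers are only created for the non-ignored layers...
k_observers = [f"observer_{i}" for i in range(NUM_LAYERS) if i not in ignored]
print(len(k_observers))  # 78

# ...but the shared quantized cache is updated with the raw decoder layer
# index, any layer_idx >= 78 walks off the end of the list.
for layer_idx in range(NUM_LAYERS):
    if layer_idx >= len(k_observers):
        print(f"layer_idx={layer_idx} -> IndexError: list index out of range")
```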