Conversation

@jixiongdeng (Contributor) commented on Nov 25, 2025

## Problem

As discussed in PR #1885, I am splitting the `disable_qkv_fusion` option out into its own PR.

The current model builder fuses `q_proj`, `k_proj`, and `v_proj` into a single `qkv_proj` by default, and this behavior cannot be controlled by the upstream quantization choice.
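For context, here is a minimal NumPy sketch of what packing Q/K/V means: the three projection weights are concatenated along the output dimension so one MatMul replaces three. This is illustrative only (the shapes are Llama-3.2-3B-like but hypothetical, and this is not the builder's actual code):

```python
import numpy as np

# Illustrative shapes: hidden size 3072, 24 query heads and 8 KV heads of dim 128.
hidden, q_out, kv_out = 3072, 3072, 1024

q_w = np.random.randn(hidden, q_out).astype(np.float32)
k_w = np.random.randn(hidden, kv_out).astype(np.float32)
v_w = np.random.randn(hidden, kv_out).astype(np.float32)

# Packed weight for a single qkv_proj MatMul: shape [hidden, q_out + 2 * kv_out].
qkv_w = np.concatenate([q_w, k_w, v_w], axis=1)

x = np.random.randn(1, hidden).astype(np.float32)
packed = x @ qkv_w
q, k, v = np.split(packed, [q_out, q_out + kv_out], axis=1)

# The packed MatMul computes the same projections as the three separate ones.
assert np.allclose(q, x @ q_w, atol=1e-4)
assert np.allclose(k, x @ k_w, atol=1e-4)
```

Quantizing the packed `qkv_proj` weight is not equivalent to quantizing `q_proj`, `k_proj`, and `v_proj` separately (e.g., per-tensor scales are shared across all three), which is why an upstream quantization flow may need the fusion disabled.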

## Solution

Added `disable_qkv_fusion` to `extra_options`; when set, it overrides `attention_attrs["use_packed_matmul"]`.
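A minimal runnable sketch of the gating logic, not the builder's actual code (the helper name `resolve_use_packed_matmul` and the `packed_matmul_supported` flag are hypothetical; `extra_options` values arrive as strings from the CLI):

```python
def resolve_use_packed_matmul(extra_options: dict, packed_matmul_supported: bool) -> bool:
    # Parse the string flag; fusion stays enabled unless explicitly disabled.
    disable = str(extra_options.get("disable_qkv_fusion", "false")).lower() in ("1", "true")
    # Pack Q/K/V into one MatMul only if supported AND not explicitly disabled.
    return packed_matmul_supported and not disable

# With disable_qkv_fusion=true on the command line, the packed MatMul is turned off.
attention_attrs = {}
attention_attrs["use_packed_matmul"] = resolve_use_packed_matmul(
    {"disable_qkv_fusion": "true"}, packed_matmul_supported=True
)
assert attention_attrs["use_packed_matmul"] is False
```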

Running example:

**Untied qkv_projs for 4-bit RTN on Llama-3.2-3B-Instruct:**

```
python src/python/py/models/builder.py -m meta-llama/Llama-3.2-3B-Instruct -p int4 -e cuda -o export_model/llama32_3bi_rtn_u4_untied_qkv --extra_options int4_algo_config=rtn disable_qkv_fusion=true
```
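One way to sanity-check the export is to scan the graph for the projection nodes. This is a hedged sketch: it assumes the builder wrote `model.onnx` into the `-o` directory, and node naming can vary by model and precision:

```python
import onnx

# Load the exported graph (assumes the builder's usual model.onnx output name).
model = onnx.load("export_model/llama32_3bi_rtn_u4_untied_qkv/model.onnx")
node_names = [node.name for node in model.graph.node]

# With disable_qkv_fusion=true, expect separate q/k/v projections and no packed qkv_proj.
print("packed qkv_proj present:", any("qkv_proj" in n for n in node_names))  # expect False
print("separate q_proj present:", any("q_proj" in n for n in node_names))    # expect True
```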

## Changes

### Modified Files

- `src/python/py/models/builder.py`
- `src/python/py/models/builders/base.py`
- `src/python/py/models/README.MD`

### Key Modifications

1. Incorporated `disable_qkv_fusion` into the assignment logic for `attention_attrs["use_packed_matmul"]`.
2. Added documentation.

@jixiongdeng (Contributor, Author) commented:

Rebased onto main & resolved the conflict. Thank you! @kunal-vaishnavi

@tianleiwu merged commit 4f37298 into main on Nov 26, 2025 (15 checks passed).
@tianleiwu deleted the jd/disable_fuse_qkv branch on November 26, 2025 at 21:55.
kunal-vaishnavi pushed a commit that referenced this pull request on Dec 5, 2025: …hoice (#1893)
