Closed
Description
This seems to be caused by two changes in the updated model config (a config sketch follows the references below):
- The RoPE type name changed to longrope
- The scaling factor list changed
Useful references:
ggml-org/llama.cpp#8262
ggml-org/llama.cpp#6849 (comment)
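For reference, a minimal sketch of inspecting the rope_scaling block that trips up older converters. This is not part of the conversion log; the field names assume the Hugging Face config.json layout for Phi-3-mini-128k-instruct, and the local path is a placeholder.

```python
# Sketch only: dump the rope_scaling section of the updated config.
# Assumes HF-style config.json; adjust the path to your local model copy.
import json

with open("Phi-3-mini-128k-instruct/config.json") as f:
    cfg = json.load(f)

rope = cfg.get("rope_scaling", {})
print(rope.get("type"))  # "longrope" in the updated revision (earlier revisions used "su")

# Instead of a single scaling factor there are now per-dimension factor lists:
long_factor = rope.get("long_factor", [])
short_factor = rope.get("short_factor", [])
print(len(long_factor), len(short_factor))
```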
Conversion log:
------------------------------------------------
| Measured: model.layers.31 (Attention) |
| Duration: 7.80 seconds |
| Completed step: 63/67 |
| Avg time / step (rolling): 9.28 seconds |
| Estimated remaining time: 0min 37sec |
| Last checkpoint layer: model.layers.29 (MLP) |
------------------------------------------------
-- Layer: model.layers.31 (MLP)
-- model.layers.31.mlp.gate_proj 0.05:3b_64g/0.95:2b_64g s4 2.13 bpw
-- model.layers.31.mlp.gate_proj 0.1:3b_64g/0.9:2b_64g s4 2.17 bpw
-- model.layers.31.mlp.gate_proj 0.1:4b_128g/0.9:3b_128g s4 3.16 bpw
-- model.layers.31.mlp.gate_proj 0.1:4b_32g/0.9:3b_32g s4 3.23 bpw
-- model.layers.31.mlp.gate_proj 1:4b_128g s4 4.04 bpw
-- model.layers.31.mlp.gate_proj 1:4b_32g s4 4.13 bpw
-- model.layers.31.mlp.gate_proj 0.1:5b_128g/0.9:4b_128g s4 4.16 bpw
-- model.layers.31.mlp.gate_proj 0.1:5b_32g/0.9:4b_32g s4 4.23 bpw
-- model.layers.31.mlp.gate_proj 0.1:6b_128g/0.9:5b_128g s4 5.16 bpw
-- model.layers.31.mlp.gate_proj 0.1:6b_32g/0.9:5b_32g s4 5.23 bpw
-- model.layers.31.mlp.gate_proj 1:6b_128g s4 6.04 bpw
-- model.layers.31.mlp.gate_proj 0.1:8b_128g/0.9:6b_128g s4 6.29 bpw
-- model.layers.31.mlp.gate_proj 1:8b_128g s4 8.04 bpw
-- model.layers.31.mlp.up_proj 0.05:3b_64g/0.95:2b_64g s4 2.13 bpw
-- model.layers.31.mlp.up_proj 0.25:3b_64g/0.75:2b_64g s4 2.32 bpw
-- model.layers.31.mlp.up_proj 0.3:3b_64g/0.7:2b_64g s4 2.38 bpw
-- model.layers.31.mlp.up_proj 0.25:4b_128g/0.75:3b_128g s4 3.29 bpw
-- model.layers.31.mlp.up_proj 0.25:4b_32g/0.75:3b_32g s4 3.38 bpw
-- model.layers.31.mlp.up_proj 1:4b_32g s4 4.13 bpw
-- model.layers.31.mlp.up_proj 0.25:5b_128g/0.75:4b_128g s4 4.29 bpw
-- model.layers.31.mlp.up_proj 0.25:5b_32g/0.75:4b_32g s4 4.38 bpw
-- model.layers.31.mlp.up_proj 0.25:6b_128g/0.75:5b_128g s4 5.29 bpw
-- model.layers.31.mlp.up_proj 0.25:6b_32g/0.75:5b_32g s4 5.38 bpw
-- model.layers.31.mlp.up_proj 1:6b_128g s4 6.04 bpw
-- model.layers.31.mlp.up_proj 0.1:8b_128g/0.9:6b_128g s4 6.29 bpw
-- model.layers.31.mlp.up_proj 1:8b_128g s4 8.04 bpw
-- model.layers.31.mlp.down_proj 0.05:6b_32g/0.2:3b_64g/0.75:2b_64g s4 2.48 bpw
-- model.layers.31.mlp.down_proj 0.05:5b_32g/0.95:3b_32g s4 3.24 bpw
-- model.layers.31.mlp.down_proj 0.05:5b_32g/0.95:4b_32g s4 4.19 bpw
-- model.layers.31.mlp.down_proj 0.05:8b_32g/0.1:4b_128g/0.85:3b_128g s4 3.41 bpw
-- model.layers.31.mlp.down_proj 0.05:8b_32g/0.1:4b_32g/0.85:3b_32g s4 3.49 bpw
-- model.layers.31.mlp.down_proj 0.05:8b_32g/0.95:4b_128g s4 4.25 bpw
-- model.layers.31.mlp.down_proj 0.05:8b_32g/0.95:4b_32g s4 4.34 bpw
-- model.layers.31.mlp.down_proj 0.05:8b_32g/0.1:5b_128g/0.85:4b_128g s4 4.36 bpw
-- model.layers.31.mlp.down_proj 0.05:8b_32g/0.1:5b_32g/0.85:4b_32g s4 4.44 bpw
-- model.layers.31.mlp.down_proj 0.05:8b_32g/0.1:6b_128g/0.85:5b_128g s4 5.31 bpw
-- model.layers.31.mlp.down_proj 0.05:8b_32g/0.1:6b_32g/0.85:5b_32g s4 5.39 bpw
-- model.layers.31.mlp.down_proj 0.05:8b_32g/0.95:6b_128g s4 6.15 bpw
-- model.layers.31.mlp.down_proj 0.15:8b_128g/0.85:6b_128g s4 6.35 bpw
-- model.layers.31.mlp.down_proj 1:8b_128g s4 8.04 bpw
-- 2.2469 bpw accuracy: 0.93468168
-- 2.3233 bpw accuracy: 0.93676452
-- 2.5957 bpw accuracy: 0.94465024
-- 2.9121 bpw accuracy: 0.94718373
-- 3.2851 bpw accuracy: 0.96705803
-- 3.3679 bpw accuracy: 0.96966901
-- 3.6207 bpw accuracy: 0.97334990
-- 4.1380 bpw accuracy: 0.98255626
-- 4.1991 bpw accuracy: 0.98405144
-- 4.2682 bpw accuracy: 0.98309226
-- 4.3510 bpw accuracy: 0.98517615
-- 5.2513 bpw accuracy: 0.99132111
-- 5.3341 bpw accuracy: 0.99250382
-- 6.0729 bpw accuracy: 0.99510243
-- 6.3082 bpw accuracy: 0.99555561
-- 6.8707 bpw accuracy: 0.99634729
-- 8.0374 bpw accuracy: 0.99851187
------------------------------------------------
| Measured: model.layers.31 (MLP) |
| Duration: 10.76 seconds |
| Completed step: 64/67 |
| Avg time / step (rolling): 9.29 seconds |
| Estimated remaining time: 0min 27sec |
| Last checkpoint layer: model.layers.29 (MLP) |
------------------------------------------------
-- Layer: model.norm (RMSNorm)
------------------------------------------------
| Measured: model.norm (RMSNorm) |
| Duration: 0.26 seconds |
| Completed step: 65/67 |
| Avg time / step (rolling): 8.52 seconds |
| Estimated remaining time: 0min 17sec |
| Last checkpoint layer: model.layers.29 (MLP) |
------------------------------------------------
-- Layer: lm_head (Linear)
------------------------------------------------
| Measured: lm_head (Linear) |
| Duration: 0.34 seconds |
| Completed step: 66/67 |
| Avg time / step (rolling): 7.51 seconds |
| Estimated remaining time: 0min 7sec |
| Last checkpoint layer: model.layers.29 (MLP) |
------------------------------------------------
-- Saving checkpoint...
-- Optimizing...
-- Optimizing: 1/ 240
-- Optimizing: 9/ 240
-- Optimizing: 17/ 240
-- Optimizing: 25/ 240
-- Optimizing: 33/ 240
-- Optimizing: 41/ 240
-- Optimizing: 49/ 240
-- Optimizing: 57/ 240
-- Optimizing: 65/ 240
-- Optimizing: 73/ 240
-- Optimizing: 80/ 240
-- Optimizing: 88/ 240
-- Optimizing: 96/ 240
-- Optimizing: 104/ 240
-- Optimizing: 112/ 240
-- Optimizing: 120/ 240
-- Optimizing: 128/ 240
-- Optimizing: 136/ 240
-- Optimizing: 144/ 240
-- Optimizing: 152/ 240
-- Optimizing: 160/ 240
-- Optimizing: 168/ 240
-- Optimizing: 176/ 240
-- Optimizing: 184/ 240
-- Optimizing: 192/ 240
-- Optimizing: 200/ 240
-- Optimizing: 208/ 240
-- Optimizing: 216/ 240
-- Optimizing: 224/ 240
-- Optimizing: 232/ 240
-- Optimizing: 240/ 240
-- max(err): 0.005406
-- error_norm: 1.485759
-- Quantization strategy:
-- model.layers.0.self_attn 6.6359 bpw - exp. error: 0.00218182
-- model.layers.0.mlp 8.0374 bpw - exp. error: 0.00114895
-- model.layers.1.self_attn 8.0418 bpw - exp. error: 0.00184583
-- model.layers.1.mlp 8.0374 bpw - exp. error: 0.00199654
-- model.layers.2.self_attn 8.0418 bpw - exp. error: 0.00177566
-- model.layers.2.mlp 6.0729 bpw - exp. error: 0.00249584
-- model.layers.3.self_attn 4.1930 bpw - exp. error: 0.00383048
-- model.layers.3.mlp 6.0729 bpw - exp. error: 0.00203851
-- model.layers.4.self_attn 6.6359 bpw - exp. error: 0.00102152
-- model.layers.4.mlp 6.3082 bpw - exp. error: 0.00182404
-- model.layers.5.self_attn 4.4013 bpw - exp. error: 0.00264310
-- model.layers.5.mlp 5.2513 bpw - exp. error: 0.00287902
-- model.layers.6.self_attn 4.4013 bpw - exp. error: 0.00337663
-- model.layers.6.mlp 6.8707 bpw - exp. error: 0.00146585
-- model.layers.7.self_attn 6.6359 bpw - exp. error: 0.00094822
-- model.layers.7.mlp 6.8707 bpw - exp. error: 0.00184917
-- model.layers.8.self_attn 6.6359 bpw - exp. error: 0.00114748
-- model.layers.8.mlp 6.0729 bpw - exp. error: 0.00230076
-- model.layers.9.self_attn 6.6359 bpw - exp. error: 0.00127157
-- model.layers.9.mlp 5.3341 bpw - exp. error: 0.00378097
-- model.layers.10.self_attn 6.6359 bpw - exp. error: 0.00155776
-- model.layers.10.mlp 6.3082 bpw - exp. error: 0.00244060
-- model.layers.11.self_attn 8.0418 bpw - exp. error: 0.00068859
-- model.layers.11.mlp 6.0729 bpw - exp. error: 0.00267253
-- model.layers.12.self_attn 6.6359 bpw - exp. error: 0.00177117
-- model.layers.12.mlp 6.8707 bpw - exp. error: 0.00214834
-- model.layers.13.self_attn 5.4640 bpw - exp. error: 0.00361148
-- model.layers.13.mlp 6.8707 bpw - exp. error: 0.00213348
-- model.layers.14.self_attn 6.0418 bpw - exp. error: 0.00148709
-- model.layers.14.mlp 6.0729 bpw - exp. error: 0.00155184
-- model.layers.15.self_attn 8.0418 bpw - exp. error: 0.00039677
-- model.layers.15.mlp 6.8707 bpw - exp. error: 0.00120598
-- model.layers.16.self_attn 6.6359 bpw - exp. error: 0.00103175
-- model.layers.16.mlp 6.3082 bpw - exp. error: 0.00161467
-- model.layers.17.self_attn 8.0418 bpw - exp. error: 0.00047822
-- model.layers.17.mlp 6.0729 bpw - exp. error: 0.00194863
-- model.layers.18.self_attn 6.0418 bpw - exp. error: 0.00202788
-- model.layers.18.mlp 5.2513 bpw - exp. error: 0.00404148
-- model.layers.19.self_attn 6.0418 bpw - exp. error: 0.00191705
-- model.layers.19.mlp 5.3341 bpw - exp. error: 0.00383573
-- model.layers.20.self_attn 6.6359 bpw - exp. error: 0.00128817
-- model.layers.20.mlp 5.3341 bpw - exp. error: 0.00428636
-- model.layers.21.self_attn 6.0418 bpw - exp. error: 0.00207416
-- model.layers.21.mlp 5.3341 bpw - exp. error: 0.00474077
-- model.layers.22.self_attn 6.0418 bpw - exp. error: 0.00207343
-- model.layers.22.mlp 6.3082 bpw - exp. error: 0.00300660
-- model.layers.23.self_attn 8.0418 bpw - exp. error: 0.00056060
-- model.layers.23.mlp 5.3341 bpw - exp. error: 0.00540571
-- model.layers.24.self_attn 6.6359 bpw - exp. error: 0.00141783
-- model.layers.24.mlp 6.0729 bpw - exp. error: 0.00354173
-- model.layers.25.self_attn 5.4640 bpw - exp. error: 0.00263537
-- model.layers.25.mlp 6.3082 bpw - exp. error: 0.00349990
-- model.layers.26.self_attn 6.6359 bpw - exp. error: 0.00133379
-- model.layers.26.mlp 8.0374 bpw - exp. error: 0.00102325
-- model.layers.27.self_attn 5.4640 bpw - exp. error: 0.00248246
-- model.layers.27.mlp 6.3082 bpw - exp. error: 0.00371280
-- model.layers.28.self_attn 6.0418 bpw - exp. error: 0.00244441
-- model.layers.28.mlp 8.0374 bpw - exp. error: 0.00109955
-- model.layers.29.self_attn 5.4640 bpw - exp. error: 0.00300564
-- model.layers.29.mlp 8.0374 bpw - exp. error: 0.00177070
-- model.layers.30.self_attn 6.6359 bpw - exp. error: 0.00173835
-- model.layers.30.mlp 8.0374 bpw - exp. error: 0.00135131
-- model.layers.31.self_attn 8.0418 bpw - exp. error: 0.00071250
-- model.layers.31.mlp 8.0374 bpw - exp. error: 0.00148813
-- sum(log(err)): -402.140137
-- max(err): 0.005406
-- Tokenizing samples...
-- Token embeddings again...
-- Quantizing...
-- Layer: model.layers.0 (Attention)
-- Linear: model.layers.0.self_attn.q_proj -> 1:6b_32g s4, 6.14 bpw
-- Linear: model.layers.0.self_attn.k_proj -> 1:6b_32g s4, 6.14 bpw
-- Linear: model.layers.0.self_attn.v_proj -> 1:8b_32g s4, 8.14 bpw
-- Linear: model.layers.0.self_attn.o_proj -> 1:6b_32g s4, 6.14 bpw
-- Module quantized, rfn_error: 0.002210
-- Layer: model.layers.0 (MLP)
-- Linear: model.layers.0.mlp.gate_proj -> 1:8b_128g s4, 8.04 bpw
-- Linear: model.layers.0.mlp.up_proj -> 1:8b_128g s4, 8.04 bpw
-- Linear: model.layers.0.mlp.down_proj -> 1:8b_128g s4, 8.04 bpw
-- Module quantized, rfn_error: 0.001247
-- Layer: model.layers.1 (Attention)
-- Linear: model.layers.1.self_attn.q_proj -> 1:8b_128g s4, 8.04 bpw
-- Linear: model.layers.1.self_attn.k_proj -> 1:8b_128g s4, 8.04 bpw
-- Linear: model.layers.1.self_attn.v_proj -> 1:8b_128g s4, 8.04 bpw
-- Linear: model.layers.1.self_attn.o_proj -> 1:8b_128g s4, 8.04 bpw
-- Module quantized, rfn_error: 0.001910
-- Layer: model.layers.1 (MLP)
-- Linear: model.layers.1.mlp.gate_proj -> 1:8b_128g s4, 8.04 bpw
-- Linear: model.layers.1.mlp.up_proj -> 1:8b_128g s4, 8.04 bpw
-- Linear: model.layers.1.mlp.down_proj -> 1:8b_128g s4, 8.04 bpw
-- Module quantized, rfn_error: 0.002304
-- Layer: model.layers.2 (Attention)
-- Linear: model.layers.2.self_attn.q_proj -> 1:8b_128g s4, 8.04 bpw
-- Linear: model.layers.2.self_attn.k_proj -> 1:8b_128g s4, 8.04 bpw
-- Linear: model.layers.2.self_attn.v_proj -> 1:8b_128g s4, 8.04 bpw
-- Linear: model.layers.2.self_attn.o_proj -> 1:8b_128g s4, 8.04 bpw
-- Module quantized, rfn_error: 0.001826
-- Layer: model.layers.2 (MLP)
-- Linear: model.layers.2.mlp.gate_proj -> 1:6b_128g s4, 6.04 bpw
-- Linear: model.layers.2.mlp.up_proj -> 1:6b_128g s4, 6.04 bpw
-- Linear: model.layers.2.mlp.down_proj -> 0.05:8b_32g/0.95:6b_128g s4, 6.15 bpw
-- Module quantized, rfn_error: 0.003184
-- Layer: model.layers.3 (Attention)
-- Linear: model.layers.3.self_attn.q_proj -> 0.1:5b_64g/0.9:4b_64g s4, 4.18 bpw
-- Linear: model.layers.3.self_attn.k_proj -> 0.1:5b_64g/0.9:4b_64g s4, 4.18 bpw
-- Linear: model.layers.3.self_attn.v_proj -> 0.1:5b_32g/0.9:4b_32g s4, 4.24 bpw
-- Linear: model.layers.3.self_attn.o_proj -> 0.1:5b_64g/0.9:4b_64g s4, 4.18 bpw
-- Module quantized, rfn_error: 0.004051
-- Layer: model.layers.3 (MLP)
-- Linear: model.layers.3.mlp.gate_proj -> 1:6b_128g s4, 6.04 bpw
-- Linear: model.layers.3.mlp.up_proj -> 1:6b_128g s4, 6.04 bpw
-- Linear: model.layers.3.mlp.down_proj -> 0.05:8b_32g/0.95:6b_128g s4, 6.15 bpw
-- Module quantized, rfn_error: 0.002333
-- Layer: model.layers.4 (Attention)
-- Linear: model.layers.4.self_attn.q_proj -> 1:6b_32g s4, 6.14 bpw
-- Linear: model.layers.4.self_attn.k_proj -> 1:6b_32g s4, 6.14 bpw
-- Linear: model.layers.4.self_attn.v_proj -> 1:8b_32g s4, 8.14 bpw
-- Linear: model.layers.4.self_attn.o_proj -> 1:6b_32g s4, 6.14 bpw
-- Module quantized, rfn_error: 0.001081
-- Layer: model.layers.4 (MLP)
-- Linear: model.layers.4.mlp.gate_proj -> 0.1:8b_128g/0.9:6b_128g s4, 6.29 bpw
-- Linear: model.layers.4.mlp.up_proj -> 0.1:8b_128g/0.9:6b_128g s4, 6.29 bpw
-- Linear: model.layers.4.mlp.down_proj -> 0.15:8b_128g/0.85:6b_128g s4, 6.35 bpw
-- Module quantized, rfn_error: 0.001737
-- Layer: model.layers.5 (Attention)
-- Linear: model.layers.5.self_attn.q_proj -> 0.1:5b_64g/0.9:4b_64g s4, 4.18 bpw
-- Linear: model.layers.5.self_attn.k_proj -> 0.1:5b_64g/0.9:4b_64g s4, 4.18 bpw
-- Linear: model.layers.5.self_attn.v_proj -> 1:5b_64g s4, 5.07 bpw
-- Linear: model.layers.5.self_attn.o_proj -> 0.1:5b_64g/0.9:4b_64g s4, 4.18 bpw
-- Module quantized, rfn_error: 0.002412
-- Layer: model.layers.5 (MLP)
-- Linear: model.layers.5.mlp.gate_proj -> 0.1:6b_128g/0.9:5b_128g s4, 5.16 bpw
-- Linear: model.layers.5.mlp.up_proj -> 0.25:6b_128g/0.75:5b_128g s4, 5.29 bpw
-- Linear: model.layers.5.mlp.down_proj -> 0.05:8b_32g/0.1:6b_128g/0.85:5b_128g s4, 5.31 bpw
-- Module quantized, rfn_error: 0.002792
-- Layer: model.layers.6 (Attention)
-- Linear: model.layers.6.self_attn.q_proj -> 0.1:5b_64g/0.9:4b_64g s4, 4.18 bpw
-- Linear: model.layers.6.self_attn.k_proj -> 0.1:5b_64g/0.9:4b_64g s4, 4.18 bpw
-- Linear: model.layers.6.self_attn.v_proj -> 1:5b_64g s4, 5.07 bpw
-- Linear: model.layers.6.self_attn.o_proj -> 0.1:5b_64g/0.9:4b_64g s4, 4.18 bpw
-- Module quantized, rfn_error: 0.003026
-- Layer: model.layers.6 (MLP)
-- Linear: model.layers.6.mlp.gate_proj -> 0.1:8b_128g/0.9:6b_128g s4, 6.29 bpw
-- Linear: model.layers.6.mlp.up_proj -> 0.1:8b_128g/0.9:6b_128g s4, 6.29 bpw
-- Linear: model.layers.6.mlp.down_proj -> 1:8b_128g s4, 8.04 bpw
-- Module quantized, rfn_error: 0.001388
-- Layer: model.layers.7 (Attention)
-- Linear: model.layers.7.self_attn.q_proj -> 1:6b_32g s4, 6.14 bpw
-- Linear: model.layers.7.self_attn.k_proj -> 1:6b_32g s4, 6.14 bpw
-- Linear: model.layers.7.self_attn.v_proj -> 1:8b_32g s4, 8.14 bpw
-- Linear: model.layers.7.self_attn.o_proj -> 1:6b_32g s4, 6.14 bpw
-- Module quantized, rfn_error: 0.000886
-- Layer: model.layers.7 (MLP)
-- Linear: model.layers.7.mlp.gate_proj -> 0.1:8b_128g/0.9:6b_128g s4, 6.29 bpw
-- Linear: model.layers.7.mlp.up_proj -> 0.1:8b_128g/0.9:6b_128g s4, 6.29 bpw
-- Linear: model.layers.7.mlp.down_proj -> 1:8b_128g s4, 8.04 bpw
-- Module quantized, rfn_error: 0.001762
-- Layer: model.layers.8 (Attention)
-- Linear: model.layers.8.self_attn.q_proj -> 1:6b_32g s4, 6.14 bpw
-- Linear: model.layers.8.self_attn.k_proj -> 1:6b_32g s4, 6.14 bpw
-- Linear: model.layers.8.self_attn.v_proj -> 1:8b_32g s4, 8.14 bpw
-- Linear: model.layers.8.self_attn.o_proj -> 1:6b_32g s4, 6.14 bpw
-- Module quantized, rfn_error: 0.001070
-- Layer: model.layers.8 (MLP)
-- Linear: model.layers.8.mlp.gate_proj -> 1:6b_128g s4, 6.04 bpw
-- Linear: model.layers.8.mlp.up_proj -> 1:6b_128g s4, 6.04 bpw
-- Linear: model.layers.8.mlp.down_proj -> 0.05:8b_32g/0.95:6b_128g s4, 6.15 bpw
-- Module quantized, rfn_error: 0.002282
-- Layer: model.layers.9 (Attention)
-- Linear: model.layers.9.self_attn.q_proj -> 1:6b_32g s4, 6.14 bpw
-- Linear: model.layers.9.self_attn.k_proj -> 1:6b_32g s4, 6.14 bpw
-- Linear: model.layers.9.self_attn.v_proj -> 1:8b_32g s4, 8.14 bpw
-- Linear: model.layers.9.self_attn.o_proj -> 1:6b_32g s4, 6.14 bpw
-- Module quantized, rfn_error: 0.001224
-- Layer: model.layers.9 (MLP)
-- Linear: model.layers.9.mlp.gate_proj -> 0.1:6b_32g/0.9:5b_32g s4, 5.23 bpw
-- Linear: model.layers.9.mlp.up_proj -> 0.25:6b_32g/0.75:5b_32g s4, 5.38 bpw
-- Linear: model.layers.9.mlp.down_proj -> 0.05:8b_32g/0.1:6b_32g/0.85:5b_32g s4, 5.39 bpw
-- Module quantized, rfn_error: 0.003722
-- Layer: model.layers.10 (Attention)
-- Linear: model.layers.10.self_attn.q_proj -> 1:6b_32g s4, 6.14 bpw
-- Linear: model.layers.10.self_attn.k_proj -> 1:6b_32g s4, 6.14 bpw
-- Linear: model.layers.10.self_attn.v_proj -> 1:8b_32g s4, 8.14 bpw
-- Linear: model.layers.10.self_attn.o_proj -> 1:6b_32g s4, 6.14 bpw
-- Module quantized, rfn_error: 0.001441
-- Layer: model.layers.10 (MLP)
-- Linear: model.layers.10.mlp.gate_proj -> 0.1:8b_128g/0.9:6b_128g s4, 6.29 bpw
-- Linear: model.layers.10.mlp.up_proj -> 0.1:8b_128g/0.9:6b_128g s4, 6.29 bpw
-- Linear: model.layers.10.mlp.down_proj -> 0.15:8b_128g/0.85:6b_128g s4, 6.35 bpw
-- Module quantized, rfn_error: 0.002382
-- Layer: model.layers.11 (Attention)
-- Linear: model.layers.11.self_attn.q_proj -> 1:8b_128g s4, 8.04 bpw
-- Linear: model.layers.11.self_attn.k_proj -> 1:8b_128g s4, 8.04 bpw
-- Linear: model.layers.11.self_attn.v_proj -> 1:8b_128g s4, 8.04 bpw
-- Linear: model.layers.11.self_attn.o_proj -> 1:8b_128g s4, 8.04 bpw
-- Module quantized, rfn_error: 0.000652
-- Layer: model.layers.11 (MLP)
-- Linear: model.layers.11.mlp.gate_proj -> 1:6b_128g s4, 6.04 bpw
-- Linear: model.layers.11.mlp.up_proj -> 1:6b_128g s4, 6.04 bpw
-- Linear: model.layers.11.mlp.down_proj -> 0.05:8b_32g/0.95:6b_128g s4, 6.15 bpw
-- Module quantized, rfn_error: 0.002618
-- Layer: model.layers.12 (Attention)
-- Linear: model.layers.12.self_attn.q_proj -> 1:6b_32g s4, 6.14 bpw
-- Linear: model.layers.12.self_attn.k_proj -> 1:6b_32g s4, 6.14 bpw
-- Linear: model.layers.12.self_attn.v_proj -> 1:8b_32g s4, 8.14 bpw
-- Linear: model.layers.12.self_attn.o_proj -> 1:6b_32g s4, 6.14 bpw
-- Module quantized, rfn_error: 0.001683
-- Layer: model.layers.12 (MLP)
-- Linear: model.layers.12.mlp.gate_proj -> 0.1:8b_128g/0.9:6b_128g s4, 6.29 bpw
-- Linear: model.layers.12.mlp.up_proj -> 0.1:8b_128g/0.9:6b_128g s4, 6.29 bpw
-- Linear: model.layers.12.mlp.down_proj -> 1:8b_128g s4, 8.04 bpw
-- Module quantized, rfn_error: 0.002034
-- Layer: model.layers.13 (Attention)
-- Linear: model.layers.13.self_attn.q_proj -> 0.1:6b_32g/0.9:5b_32g s4, 5.24 bpw
-- Linear: model.layers.13.self_attn.k_proj -> 0.1:6b_32g/0.9:5b_32g s4, 5.24 bpw
-- Linear: model.layers.13.self_attn.v_proj -> 1:6b_32g s4, 6.14 bpw
-- Linear: model.layers.13.self_attn.o_proj -> 0.1:6b_32g/0.9:5b_32g s4, 5.24 bpw
-- Module quantized, rfn_error: 0.003492
-- Layer: model.layers.13 (MLP)
-- Linear: model.layers.13.mlp.gate_proj -> 0.1:8b_128g/0.9:6b_128g s4, 6.29 bpw
-- Linear: model.layers.13.mlp.up_proj -> 0.1:8b_128g/0.9:6b_128g s4, 6.29 bpw
-- Linear: model.layers.13.mlp.down_proj -> 1:8b_128g s4, 8.04 bpw
-- Module quantized, rfn_error: 0.001966
-- Layer: model.layers.14 (Attention)
-- Linear: model.layers.14.self_attn.q_proj -> 1:6b_128g s4, 6.04 bpw
-- Linear: model.layers.14.self_attn.k_proj -> 1:6b_128g s4, 6.04 bpw
-- Linear: model.layers.14.self_attn.v_proj -> 1:6b_128g s4, 6.04 bpw
-- Linear: model.layers.14.self_attn.o_proj -> 1:6b_128g s4, 6.04 bpw
-- Module quantized, rfn_error: 0.001318
-- Layer: model.layers.14 (MLP)
-- Linear: model.layers.14.mlp.gate_proj -> 1:6b_128g s4, 6.04 bpw
-- Linear: model.layers.14.mlp.up_proj -> 1:6b_128g s4, 6.04 bpw
-- Linear: model.layers.14.mlp.down_proj -> 0.05:8b_32g/0.95:6b_128g s4, 6.15 bpw
-- Module quantized, rfn_error: 0.001441
-- Layer: model.layers.15 (Attention)
-- Linear: model.layers.15.self_attn.q_proj -> 1:8b_128g s4, 8.04 bpw
-- Linear: model.layers.15.self_attn.k_proj -> 1:8b_128g s4, 8.04 bpw
-- Linear: model.layers.15.self_attn.v_proj -> 1:8b_128g s4, 8.04 bpw
-- Linear: model.layers.15.self_attn.o_proj -> 1:8b_128g s4, 8.04 bpw
-- Module quantized, rfn_error: 0.000365
-- Layer: model.layers.15 (MLP)
-- Linear: model.layers.15.mlp.gate_proj -> 0.1:8b_128g/0.9:6b_128g s4, 6.29 bpw
-- Linear: model.layers.15.mlp.up_proj -> 0.1:8b_128g/0.9:6b_128g s4, 6.29 bpw
-- Linear: model.layers.15.mlp.down_proj -> 1:8b_128g s4, 8.04 bpw
-- Module quantized, rfn_error: 0.001103
-- Layer: model.layers.16 (Attention)
-- Linear: model.layers.16.self_attn.q_proj -> 1:6b_32g s4, 6.14 bpw
-- Linear: model.layers.16.self_attn.k_proj -> 1:6b_32g s4, 6.14 bpw
-- Linear: model.layers.16.self_attn.v_proj -> 1:8b_32g s4, 8.14 bpw
-- Linear: model.layers.16.self_attn.o_proj -> 1:6b_32g s4, 6.14 bpw
-- Module quantized, rfn_error: 0.000938
-- Layer: model.layers.16 (MLP)
-- Linear: model.layers.16.mlp.gate_proj -> 0.1:8b_128g/0.9:6b_128g s4, 6.29 bpw
-- Linear: model.layers.16.mlp.up_proj -> 0.1:8b_128g/0.9:6b_128g s4, 6.29 bpw
-- Linear: model.layers.16.mlp.down_proj -> 0.15:8b_128g/0.85:6b_128g s4, 6.35 bpw
-- Module quantized, rfn_error: 0.001508
-- Layer: model.layers.17 (Attention)
-- Linear: model.layers.17.self_attn.q_proj -> 1:8b_128g s4, 8.04 bpw
-- Linear: model.layers.17.self_attn.k_proj -> 1:8b_128g s4, 8.04 bpw
-- Linear: model.layers.17.self_attn.v_proj -> 1:8b_128g s4, 8.04 bpw
-- Linear: model.layers.17.self_attn.o_proj -> 1:8b_128g s4, 8.04 bpw
-- Module quantized, rfn_error: 0.000418
-- Saving checkpoint...
-- Layer: model.layers.17 (MLP)
-- Linear: model.layers.17.mlp.gate_proj -> 1:6b_128g s4, 6.04 bpw
-- Linear: model.layers.17.mlp.up_proj -> 1:6b_128g s4, 6.04 bpw
-- Linear: model.layers.17.mlp.down_proj -> 0.05:8b_32g/0.95:6b_128g s4, 6.15 bpw
-- Module quantized, rfn_error: 0.001869
-- Layer: model.layers.18 (Attention)
-- Linear: model.layers.18.self_attn.q_proj -> 1:6b_128g s4, 6.04 bpw
-- Linear: model.layers.18.self_attn.k_proj -> 1:6b_128g s4, 6.04 bpw
-- Linear: model.layers.18.self_attn.v_proj -> 1:6b_128g s4, 6.04 bpw
-- Linear: model.layers.18.self_attn.o_proj -> 1:6b_128g s4, 6.04 bpw
-- Module quantized, rfn_error: 0.001830
-- Layer: model.layers.18 (MLP)
-- Linear: model.layers.18.mlp.gate_proj -> 0.1:6b_128g/0.9:5b_128g s4, 5.16 bpw
-- Linear: model.layers.18.mlp.up_proj -> 0.25:6b_128g/0.75:5b_128g s4, 5.29 bpw
-- Linear: model.layers.18.mlp.down_proj -> 0.05:8b_32g/0.1:6b_128g/0.85:5b_128g s4, 5.31 bpw
-- Module quantized, rfn_error: 0.003908
-- Layer: model.layers.19 (Attention)
-- Linear: model.layers.19.self_attn.q_proj -> 1:6b_128g s4, 6.04 bpw
-- Linear: model.layers.19.self_attn.k_proj -> 1:6b_128g s4, 6.04 bpw
-- Linear: model.layers.19.self_attn.v_proj -> 1:6b_128g s4, 6.04 bpw
-- Linear: model.layers.19.self_attn.o_proj -> 1:6b_128g s4, 6.04 bpw
-- Module quantized, rfn_error: 0.001757
-- Layer: model.layers.19 (MLP)
-- Linear: model.layers.19.mlp.gate_proj -> 0.1:6b_32g/0.9:5b_32g s4, 5.23 bpw
-- Linear: model.layers.19.mlp.up_proj -> 0.25:6b_32g/0.75:5b_32g s4, 5.38 bpw
-- Linear: model.layers.19.mlp.down_proj -> 0.05:8b_32g/0.1:6b_32g/0.85:5b_32g s4, 5.39 bpw
-- Module quantized, rfn_error: 0.003729
-- Layer: model.layers.20 (Attention)
-- Linear: model.layers.20.self_attn.q_proj -> 1:6b_32g s4, 6.14 bpw
-- Linear: model.layers.20.self_attn.k_proj -> 1:6b_32g s4, 6.14 bpw
-- Linear: model.layers.20.self_attn.v_proj -> 1:8b_32g s4, 8.14 bpw
-- Linear: model.layers.20.self_attn.o_proj -> 1:6b_32g s4, 6.14 bpw
-- Module quantized, rfn_error: 0.001186
-- Layer: model.layers.20 (MLP)
-- Linear: model.layers.20.mlp.gate_proj -> 0.1:6b_32g/0.9:5b_32g s4, 5.23 bpw
-- Linear: model.layers.20.mlp.up_proj -> 0.25:6b_32g/0.75:5b_32g s4, 5.38 bpw
-- Linear: model.layers.20.mlp.down_proj -> 0.05:8b_32g/0.1:6b_32g/0.85:5b_32g s4, 5.39 bpw
-- Module quantized, rfn_error: 0.004217
-- Layer: model.layers.21 (Attention)
-- Linear: model.layers.21.self_attn.q_proj -> 1:6b_128g s4, 6.04 bpw
-- Linear: model.layers.21.self_attn.k_proj -> 1:6b_128g s4, 6.04 bpw
-- Linear: model.layers.21.self_attn.v_proj -> 1:6b_128g s4, 6.04 bpw
-- Linear: model.layers.21.self_attn.o_proj -> 1:6b_128g s4, 6.04 bpw
-- Module quantized, rfn_error: 0.001915
-- Layer: model.layers.21 (MLP)
-- Linear: model.layers.21.mlp.gate_proj -> 0.1:6b_32g/0.9:5b_32g s4, 5.23 bpw
-- Linear: model.layers.21.mlp.up_proj -> 0.25:6b_32g/0.75:5b_32g s4, 5.38 bpw
-- Linear: model.layers.21.mlp.down_proj -> 0.05:8b_32g/0.1:6b_32g/0.85:5b_32g s4, 5.39 bpw
-- Module quantized, rfn_error: 0.004769
-- Layer: model.layers.22 (Attention)
-- Linear: model.layers.22.self_attn.q_proj -> 1:6b_128g s4, 6.04 bpw
-- Linear: model.layers.22.self_attn.k_proj -> 1:6b_128g s4, 6.04 bpw
-- Linear: model.layers.22.self_attn.v_proj -> 1:6b_128g s4, 6.04 bpw
-- Linear: model.layers.22.self_attn.o_proj -> 1:6b_128g s4, 6.04 bpw
-- Module quantized, rfn_error: 0.002010
-- Layer: model.layers.22 (MLP)
-- Linear: model.layers.22.mlp.gate_proj -> 0.1:8b_128g/0.9:6b_128g s4, 6.29 bpw
-- Linear: model.layers.22.mlp.up_proj -> 0.1:8b_128g/0.9:6b_128g s4, 6.29 bpw
-- Linear: model.layers.22.mlp.down_proj -> 0.15:8b_128g/0.85:6b_128g s4, 6.35 bpw
-- Module quantized, rfn_error: 0.003114
-- Layer: model.layers.23 (Attention)
-- Linear: model.layers.23.self_attn.q_proj -> 1:8b_128g s4, 8.04 bpw
-- Linear: model.layers.23.self_attn.k_proj -> 1:8b_128g s4, 8.04 bpw
-- Linear: model.layers.23.self_attn.v_proj -> 1:8b_128g s4, 8.04 bpw
-- Linear: model.layers.23.self_attn.o_proj -> 1:8b_128g s4, 8.04 bpw
-- Module quantized, rfn_error: 0.000544
-- Layer: model.layers.23 (MLP)
-- Linear: model.layers.23.mlp.gate_proj -> 0.1:6b_32g/0.9:5b_32g s4, 5.23 bpw
-- Linear: model.layers.23.mlp.up_proj -> 0.25:6b_32g/0.75:5b_32g s4, 5.38 bpw
-- Linear: model.layers.23.mlp.down_proj -> 0.05:8b_32g/0.1:6b_32g/0.85:5b_32g s4, 5.39 bpw
-- Module quantized, rfn_error: 0.005750
-- Layer: model.layers.24 (Attention)
-- Linear: model.layers.24.self_attn.q_proj -> 1:6b_32g s4, 6.14 bpw
-- Linear: model.layers.24.self_attn.k_proj -> 1:6b_32g s4, 6.14 bpw
-- Linear: model.layers.24.self_attn.v_proj -> 1:8b_32g s4, 8.14 bpw
-- Linear: model.layers.24.self_attn.o_proj -> 1:6b_32g s4, 6.14 bpw
-- Module quantized, rfn_error: 0.001395
-- Layer: model.layers.24 (MLP)
-- Linear: model.layers.24.mlp.gate_proj -> 1:6b_128g s4, 6.04 bpw
-- Linear: model.layers.24.mlp.up_proj -> 1:6b_128g s4, 6.04 bpw
-- Linear: model.layers.24.mlp.down_proj -> 0.05:8b_32g/0.95:6b_128g s4, 6.15 bpw
-- Module quantized, rfn_error: 0.003878
-- Layer: model.layers.25 (Attention)
-- Linear: model.layers.25.self_attn.q_proj -> 0.1:6b_32g/0.9:5b_32g s4, 5.24 bpw
-- Linear: model.layers.25.self_attn.k_proj -> 0.1:6b_32g/0.9:5b_32g s4, 5.24 bpw
-- Linear: model.layers.25.self_attn.v_proj -> 1:6b_32g s4, 6.14 bpw
-- Linear: model.layers.25.self_attn.o_proj -> 0.1:6b_32g/0.9:5b_32g s4, 5.24 bpw
-- Module quantized, rfn_error: 0.002646
-- Layer: model.layers.25 (MLP)
-- Linear: model.layers.25.mlp.gate_proj -> 0.1:8b_128g/0.9:6b_128g s4, 6.29 bpw
-- Linear: model.layers.25.mlp.up_proj -> 0.1:8b_128g/0.9:6b_128g s4, 6.29 bpw
-- Linear: model.layers.25.mlp.down_proj -> 0.15:8b_128g/0.85:6b_128g s4, 6.35 bpw
-- Module quantized, rfn_error: 0.003885
-- Layer: model.layers.26 (Attention)
-- Linear: model.layers.26.self_attn.q_proj -> 1:6b_32g s4, 6.14 bpw
-- Linear: model.layers.26.self_attn.k_proj -> 1:6b_32g s4, 6.14 bpw
-- Linear: model.layers.26.self_attn.v_proj -> 1:8b_32g s4, 8.14 bpw
-- Linear: model.layers.26.self_attn.o_proj -> 1:6b_32g s4, 6.14 bpw
-- Module quantized, rfn_error: 0.001354
-- Layer: model.layers.26 (MLP)
-- Linear: model.layers.26.mlp.gate_proj -> 1:8b_128g s4, 8.04 bpw
-- Linear: model.layers.26.mlp.up_proj -> 1:8b_128g s4, 8.04 bpw
-- Linear: model.layers.26.mlp.down_proj -> 1:8b_128g s4, 8.04 bpw
-- Module quantized, rfn_error: 0.001154
-- Layer: model.layers.27 (Attention)
-- Linear: model.layers.27.self_attn.q_proj -> 0.1:6b_32g/0.9:5b_32g s4, 5.24 bpw
-- Linear: model.layers.27.self_attn.k_proj -> 0.1:6b_32g/0.9:5b_32g s4, 5.24 bpw
-- Linear: model.layers.27.self_attn.v_proj -> 1:6b_32g s4, 6.14 bpw
-- Linear: model.layers.27.self_attn.o_proj -> 0.1:6b_32g/0.9:5b_32g s4, 5.24 bpw
-- Module quantized, rfn_error: 0.002578
-- Layer: model.layers.27 (MLP)
-- Linear: model.layers.27.mlp.gate_proj -> 0.1:8b_128g/0.9:6b_128g s4, 6.29 bpw
-- Linear: model.layers.27.mlp.up_proj -> 0.1:8b_128g/0.9:6b_128g s4, 6.29 bpw
-- Linear: model.layers.27.mlp.down_proj -> 0.15:8b_128g/0.85:6b_128g s4, 6.35 bpw
-- Module quantized, rfn_error: 0.004201
-- Layer: model.layers.28 (Attention)
-- Linear: model.layers.28.self_attn.q_proj -> 1:6b_128g s4, 6.04 bpw
-- Linear: model.layers.28.self_attn.k_proj -> 1:6b_128g s4, 6.04 bpw
-- Linear: model.layers.28.self_attn.v_proj -> 1:6b_128g s4, 6.04 bpw
-- Linear: model.layers.28.self_attn.o_proj -> 1:6b_128g s4, 6.04 bpw
-- Module quantized, rfn_error: 0.002510
-- Layer: model.layers.28 (MLP)
-- Linear: model.layers.28.mlp.gate_proj -> 1:8b_128g s4, 8.04 bpw
-- Linear: model.layers.28.mlp.up_proj -> 1:8b_128g s4, 8.04 bpw
-- Linear: model.layers.28.mlp.down_proj -> 1:8b_128g s4, 8.04 bpw
-- Module quantized, rfn_error: 0.001251
-- Layer: model.layers.29 (Attention)
-- Linear: model.layers.29.self_attn.q_proj -> 0.1:6b_32g/0.9:5b_32g s4, 5.24 bpw
-- Linear: model.layers.29.self_attn.k_proj -> 0.1:6b_32g/0.9:5b_32g s4, 5.24 bpw
-- Linear: model.layers.29.self_attn.v_proj -> 1:6b_32g s4, 6.14 bpw
-- Linear: model.layers.29.self_attn.o_proj -> 0.1:6b_32g/0.9:5b_32g s4, 5.24 bpw
-- Module quantized, rfn_error: 0.003163
-- Layer: model.layers.29 (MLP)
-- Linear: model.layers.29.mlp.gate_proj -> 1:8b_128g s4, 8.04 bpw
-- Linear: model.layers.29.mlp.up_proj -> 1:8b_128g s4, 8.04 bpw
-- Linear: model.layers.29.mlp.down_proj -> 1:8b_128g s4, 8.04 bpw
-- Module quantized, rfn_error: 0.002406
-- Layer: model.layers.30 (Attention)
-- Linear: model.layers.30.self_attn.q_proj -> 1:6b_32g s4, 6.14 bpw
-- Linear: model.layers.30.self_attn.k_proj -> 1:6b_32g s4, 6.14 bpw
-- Linear: model.layers.30.self_attn.v_proj -> 1:8b_32g s4, 8.14 bpw
-- Linear: model.layers.30.self_attn.o_proj -> 1:6b_32g s4, 6.14 bpw
-- Module quantized, rfn_error: 0.001843
-- Layer: model.layers.30 (MLP)
-- Linear: model.layers.30.mlp.gate_proj -> 1:8b_128g s4, 8.04 bpw
-- Linear: model.layers.30.mlp.up_proj -> 1:8b_128g s4, 8.04 bpw
-- Linear: model.layers.30.mlp.down_proj -> 1:8b_128g s4, 8.04 bpw
-- Module quantized, rfn_error: 0.001549
-- Layer: model.layers.31 (Attention)
-- Linear: model.layers.31.self_attn.q_proj -> 1:8b_128g s4, 8.04 bpw
-- Linear: model.layers.31.self_attn.k_proj -> 1:8b_128g s4, 8.04 bpw
-- Linear: model.layers.31.self_attn.v_proj -> 1:8b_128g s4, 8.04 bpw
-- Linear: model.layers.31.self_attn.o_proj -> 1:8b_128g s4, 8.04 bpw
-- Module quantized, rfn_error: 0.000743
-- Layer: model.layers.31 (MLP)
-- Linear: model.layers.31.mlp.gate_proj -> 1:8b_128g s4, 8.04 bpw
-- Linear: model.layers.31.mlp.up_proj -> 1:8b_128g s4, 8.04 bpw
-- Linear: model.layers.31.mlp.down_proj -> 1:8b_128g s4, 8.04 bpw
-- Module quantized, rfn_error: 0.001628
-- Layer: model.norm (RMSNorm)
-- Module quantized, rfn_error: 0.000000
-- Layer: lm_head (Linear)
-- Linear: lm_head -> 0.15:8b_128g/0.85:6b_128g s4, 6.37 bpw
-- Module quantized, calibration perplexity (quant): 9.5581
-- Saving checkpoint...
-- Compiling output file...
-- Writing shard 1...
-- Creating directory models--microsoft--Phi-3-mini-128k-instruct-exl2/6.5bpw/
-- models--microsoft--Phi-3-mini-128k-instruct-exl2/6.5bpw/output.safetensors (3,068 MB)
-- Copying non-tensor files to output directory models--microsoft--Phi-3-mini-128k-instruct-exl2/6.5bpw/
-- .gitattributes
-- added_tokens.json
-- CODE_OF_CONDUCT.md
-- config.json
-- configuration_phi3.py
-- generation_config.json
-- LICENSE
-- model.safetensors.index.json
-- modeling_phi3.py
-- NOTICE.md
-- README.md
-- sample_finetune.py
-- SECURITY.md
-- special_tokens_map.json
-- tokenizer.json
-- tokenizer.model
-- tokenizer_config.json
-- Finished