Conversion to EXL2 of Phi-3 Mini 128k July update produces gibberish output #537

@SystemPanic

Description

Seems to be caused by two changes in the July update's config.json:

  • The RoPE scaling type name changed to longrope
  • The scaling factor lists changed
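
For anyone hitting the same thing, the quickest way to see what the converter is being fed is to dump the rope_scaling block from the downloaded config.json. A minimal sketch (not part of exllamav2; the path is a placeholder, and the field names assume the layout of the published Phi-3-mini-128k-instruct config):

```python
import json

# Minimal sketch: print the rope_scaling fields that the July update of
# Phi-3-mini-128k-instruct changed. CONFIG_PATH is a placeholder for wherever
# the model snapshot was downloaded.
CONFIG_PATH = "Phi-3-mini-128k-instruct/config.json"

with open(CONFIG_PATH) as f:
    cfg = json.load(f)

scaling = cfg.get("rope_scaling") or {}
print("rope_scaling type :", scaling.get("type"))        # "longrope" after the update
print("short_factor len  :", len(scaling.get("short_factor", [])))
print("long_factor len   :", len(scaling.get("long_factor", [])))
print("original ctx len  :", cfg.get("original_max_position_embeddings"))
```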

Useful references:

ggml-org/llama.cpp#8262
ggml-org/llama.cpp#6849 (comment)

Conversion log:

------------------------------------------------
| Measured: model.layers.31 (Attention)        |
| Duration: 7.80 seconds                       |
| Completed step: 63/67                        |
| Avg time / step (rolling): 9.28 seconds      |
| Estimated remaining time: 0min 37sec         |
| Last checkpoint layer: model.layers.29 (MLP) |
------------------------------------------------
 -- Layer: model.layers.31 (MLP)
 -- model.layers.31.mlp.gate_proj                      0.05:3b_64g/0.95:2b_64g s4                         2.13 bpw
 -- model.layers.31.mlp.gate_proj                      0.1:3b_64g/0.9:2b_64g s4                           2.17 bpw
 -- model.layers.31.mlp.gate_proj                      0.1:4b_128g/0.9:3b_128g s4                         3.16 bpw
 -- model.layers.31.mlp.gate_proj                      0.1:4b_32g/0.9:3b_32g s4                           3.23 bpw
 -- model.layers.31.mlp.gate_proj                      1:4b_128g s4                                       4.04 bpw
 -- model.layers.31.mlp.gate_proj                      1:4b_32g s4                                        4.13 bpw
 -- model.layers.31.mlp.gate_proj                      0.1:5b_128g/0.9:4b_128g s4                         4.16 bpw
 -- model.layers.31.mlp.gate_proj                      0.1:5b_32g/0.9:4b_32g s4                           4.23 bpw
 -- model.layers.31.mlp.gate_proj                      0.1:6b_128g/0.9:5b_128g s4                         5.16 bpw
 -- model.layers.31.mlp.gate_proj                      0.1:6b_32g/0.9:5b_32g s4                           5.23 bpw
 -- model.layers.31.mlp.gate_proj                      1:6b_128g s4                                       6.04 bpw
 -- model.layers.31.mlp.gate_proj                      0.1:8b_128g/0.9:6b_128g s4                         6.29 bpw
 -- model.layers.31.mlp.gate_proj                      1:8b_128g s4                                       8.04 bpw
 -- model.layers.31.mlp.up_proj                        0.05:3b_64g/0.95:2b_64g s4                         2.13 bpw
 -- model.layers.31.mlp.up_proj                        0.25:3b_64g/0.75:2b_64g s4                         2.32 bpw
 -- model.layers.31.mlp.up_proj                        0.3:3b_64g/0.7:2b_64g s4                           2.38 bpw
 -- model.layers.31.mlp.up_proj                        0.25:4b_128g/0.75:3b_128g s4                       3.29 bpw
 -- model.layers.31.mlp.up_proj                        0.25:4b_32g/0.75:3b_32g s4                         3.38 bpw
 -- model.layers.31.mlp.up_proj                        1:4b_32g s4                                        4.13 bpw
 -- model.layers.31.mlp.up_proj                        0.25:5b_128g/0.75:4b_128g s4                       4.29 bpw
 -- model.layers.31.mlp.up_proj                        0.25:5b_32g/0.75:4b_32g s4                         4.38 bpw
 -- model.layers.31.mlp.up_proj                        0.25:6b_128g/0.75:5b_128g s4                       5.29 bpw
 -- model.layers.31.mlp.up_proj                        0.25:6b_32g/0.75:5b_32g s4                         5.38 bpw
 -- model.layers.31.mlp.up_proj                        1:6b_128g s4                                       6.04 bpw
 -- model.layers.31.mlp.up_proj                        0.1:8b_128g/0.9:6b_128g s4                         6.29 bpw
 -- model.layers.31.mlp.up_proj                        1:8b_128g s4                                       8.04 bpw
 -- model.layers.31.mlp.down_proj                      0.05:6b_32g/0.2:3b_64g/0.75:2b_64g s4              2.48 bpw
 -- model.layers.31.mlp.down_proj                      0.05:5b_32g/0.95:3b_32g s4                         3.24 bpw
 -- model.layers.31.mlp.down_proj                      0.05:5b_32g/0.95:4b_32g s4                         4.19 bpw
 -- model.layers.31.mlp.down_proj                      0.05:8b_32g/0.1:4b_128g/0.85:3b_128g s4            3.41 bpw
 -- model.layers.31.mlp.down_proj                      0.05:8b_32g/0.1:4b_32g/0.85:3b_32g s4              3.49 bpw
 -- model.layers.31.mlp.down_proj                      0.05:8b_32g/0.95:4b_128g s4                        4.25 bpw
 -- model.layers.31.mlp.down_proj                      0.05:8b_32g/0.95:4b_32g s4                         4.34 bpw
 -- model.layers.31.mlp.down_proj                      0.05:8b_32g/0.1:5b_128g/0.85:4b_128g s4            4.36 bpw
 -- model.layers.31.mlp.down_proj                      0.05:8b_32g/0.1:5b_32g/0.85:4b_32g s4              4.44 bpw
 -- model.layers.31.mlp.down_proj                      0.05:8b_32g/0.1:6b_128g/0.85:5b_128g s4            5.31 bpw
 -- model.layers.31.mlp.down_proj                      0.05:8b_32g/0.1:6b_32g/0.85:5b_32g s4              5.39 bpw
 -- model.layers.31.mlp.down_proj                      0.05:8b_32g/0.95:6b_128g s4                        6.15 bpw
 -- model.layers.31.mlp.down_proj                      0.15:8b_128g/0.85:6b_128g s4                       6.35 bpw
 -- model.layers.31.mlp.down_proj                      1:8b_128g s4                                       8.04 bpw
 -- 2.2469 bpw  accuracy: 0.93468168
 -- 2.3233 bpw  accuracy: 0.93676452
 -- 2.5957 bpw  accuracy: 0.94465024
 -- 2.9121 bpw  accuracy: 0.94718373
 -- 3.2851 bpw  accuracy: 0.96705803
 -- 3.3679 bpw  accuracy: 0.96966901
 -- 3.6207 bpw  accuracy: 0.97334990
 -- 4.1380 bpw  accuracy: 0.98255626
 -- 4.1991 bpw  accuracy: 0.98405144
 -- 4.2682 bpw  accuracy: 0.98309226
 -- 4.3510 bpw  accuracy: 0.98517615
 -- 5.2513 bpw  accuracy: 0.99132111
 -- 5.3341 bpw  accuracy: 0.99250382
 -- 6.0729 bpw  accuracy: 0.99510243
 -- 6.3082 bpw  accuracy: 0.99555561
 -- 6.8707 bpw  accuracy: 0.99634729
 -- 8.0374 bpw  accuracy: 0.99851187
------------------------------------------------
| Measured: model.layers.31 (MLP)              |
| Duration: 10.76 seconds                      |
| Completed step: 64/67                        |
| Avg time / step (rolling): 9.29 seconds      |
| Estimated remaining time: 0min 27sec         |
| Last checkpoint layer: model.layers.29 (MLP) |
------------------------------------------------
 -- Layer: model.norm (RMSNorm)
------------------------------------------------
| Measured: model.norm (RMSNorm)               |
| Duration: 0.26 seconds                       |
| Completed step: 65/67                        |
| Avg time / step (rolling): 8.52 seconds      |
| Estimated remaining time: 0min 17sec         |
| Last checkpoint layer: model.layers.29 (MLP) |
------------------------------------------------
 -- Layer: lm_head (Linear)
------------------------------------------------
| Measured: lm_head (Linear)                   |
| Duration: 0.34 seconds                       |
| Completed step: 66/67                        |
| Avg time / step (rolling): 7.51 seconds      |
| Estimated remaining time: 0min 7sec          |
| Last checkpoint layer: model.layers.29 (MLP) |
------------------------------------------------
 -- Saving checkpoint...
 -- Optimizing...
 -- Optimizing:    1/ 240
 -- Optimizing:    9/ 240
 -- Optimizing:   17/ 240
 -- Optimizing:   25/ 240
 -- Optimizing:   33/ 240
 -- Optimizing:   41/ 240
 -- Optimizing:   49/ 240
 -- Optimizing:   57/ 240
 -- Optimizing:   65/ 240
 -- Optimizing:   73/ 240
 -- Optimizing:   80/ 240
 -- Optimizing:   88/ 240
 -- Optimizing:   96/ 240
 -- Optimizing:  104/ 240
 -- Optimizing:  112/ 240
 -- Optimizing:  120/ 240
 -- Optimizing:  128/ 240
 -- Optimizing:  136/ 240
 -- Optimizing:  144/ 240
 -- Optimizing:  152/ 240
 -- Optimizing:  160/ 240
 -- Optimizing:  168/ 240
 -- Optimizing:  176/ 240
 -- Optimizing:  184/ 240
 -- Optimizing:  192/ 240
 -- Optimizing:  200/ 240
 -- Optimizing:  208/ 240
 -- Optimizing:  216/ 240
 -- Optimizing:  224/ 240
 -- Optimizing:  232/ 240
 -- Optimizing:  240/ 240
 -- max(err): 0.005406
 -- error_norm: 1.485759
 -- Quantization strategy:
 --   model.layers.0.self_attn                           6.6359 bpw - exp. error: 0.00218182
 --   model.layers.0.mlp                                 8.0374 bpw - exp. error: 0.00114895
 --   model.layers.1.self_attn                           8.0418 bpw - exp. error: 0.00184583
 --   model.layers.1.mlp                                 8.0374 bpw - exp. error: 0.00199654
 --   model.layers.2.self_attn                           8.0418 bpw - exp. error: 0.00177566
 --   model.layers.2.mlp                                 6.0729 bpw - exp. error: 0.00249584
 --   model.layers.3.self_attn                           4.1930 bpw - exp. error: 0.00383048
 --   model.layers.3.mlp                                 6.0729 bpw - exp. error: 0.00203851
 --   model.layers.4.self_attn                           6.6359 bpw - exp. error: 0.00102152
 --   model.layers.4.mlp                                 6.3082 bpw - exp. error: 0.00182404
 --   model.layers.5.self_attn                           4.4013 bpw - exp. error: 0.00264310
 --   model.layers.5.mlp                                 5.2513 bpw - exp. error: 0.00287902
 --   model.layers.6.self_attn                           4.4013 bpw - exp. error: 0.00337663
 --   model.layers.6.mlp                                 6.8707 bpw - exp. error: 0.00146585
 --   model.layers.7.self_attn                           6.6359 bpw - exp. error: 0.00094822
 --   model.layers.7.mlp                                 6.8707 bpw - exp. error: 0.00184917
 --   model.layers.8.self_attn                           6.6359 bpw - exp. error: 0.00114748
 --   model.layers.8.mlp                                 6.0729 bpw - exp. error: 0.00230076
 --   model.layers.9.self_attn                           6.6359 bpw - exp. error: 0.00127157
 --   model.layers.9.mlp                                 5.3341 bpw - exp. error: 0.00378097
 --   model.layers.10.self_attn                          6.6359 bpw - exp. error: 0.00155776
 --   model.layers.10.mlp                                6.3082 bpw - exp. error: 0.00244060
 --   model.layers.11.self_attn                          8.0418 bpw - exp. error: 0.00068859
 --   model.layers.11.mlp                                6.0729 bpw - exp. error: 0.00267253
 --   model.layers.12.self_attn                          6.6359 bpw - exp. error: 0.00177117
 --   model.layers.12.mlp                                6.8707 bpw - exp. error: 0.00214834
 --   model.layers.13.self_attn                          5.4640 bpw - exp. error: 0.00361148
 --   model.layers.13.mlp                                6.8707 bpw - exp. error: 0.00213348
 --   model.layers.14.self_attn                          6.0418 bpw - exp. error: 0.00148709
 --   model.layers.14.mlp                                6.0729 bpw - exp. error: 0.00155184
 --   model.layers.15.self_attn                          8.0418 bpw - exp. error: 0.00039677
 --   model.layers.15.mlp                                6.8707 bpw - exp. error: 0.00120598
 --   model.layers.16.self_attn                          6.6359 bpw - exp. error: 0.00103175
 --   model.layers.16.mlp                                6.3082 bpw - exp. error: 0.00161467
 --   model.layers.17.self_attn                          8.0418 bpw - exp. error: 0.00047822
 --   model.layers.17.mlp                                6.0729 bpw - exp. error: 0.00194863
 --   model.layers.18.self_attn                          6.0418 bpw - exp. error: 0.00202788
 --   model.layers.18.mlp                                5.2513 bpw - exp. error: 0.00404148
 --   model.layers.19.self_attn                          6.0418 bpw - exp. error: 0.00191705
 --   model.layers.19.mlp                                5.3341 bpw - exp. error: 0.00383573
 --   model.layers.20.self_attn                          6.6359 bpw - exp. error: 0.00128817
 --   model.layers.20.mlp                                5.3341 bpw - exp. error: 0.00428636
 --   model.layers.21.self_attn                          6.0418 bpw - exp. error: 0.00207416
 --   model.layers.21.mlp                                5.3341 bpw - exp. error: 0.00474077
 --   model.layers.22.self_attn                          6.0418 bpw - exp. error: 0.00207343
 --   model.layers.22.mlp                                6.3082 bpw - exp. error: 0.00300660
 --   model.layers.23.self_attn                          8.0418 bpw - exp. error: 0.00056060
 --   model.layers.23.mlp                                5.3341 bpw - exp. error: 0.00540571
 --   model.layers.24.self_attn                          6.6359 bpw - exp. error: 0.00141783
 --   model.layers.24.mlp                                6.0729 bpw - exp. error: 0.00354173
 --   model.layers.25.self_attn                          5.4640 bpw - exp. error: 0.00263537
 --   model.layers.25.mlp                                6.3082 bpw - exp. error: 0.00349990
 --   model.layers.26.self_attn                          6.6359 bpw - exp. error: 0.00133379
 --   model.layers.26.mlp                                8.0374 bpw - exp. error: 0.00102325
 --   model.layers.27.self_attn                          5.4640 bpw - exp. error: 0.00248246
 --   model.layers.27.mlp                                6.3082 bpw - exp. error: 0.00371280
 --   model.layers.28.self_attn                          6.0418 bpw - exp. error: 0.00244441
 --   model.layers.28.mlp                                8.0374 bpw - exp. error: 0.00109955
 --   model.layers.29.self_attn                          5.4640 bpw - exp. error: 0.00300564
 --   model.layers.29.mlp                                8.0374 bpw - exp. error: 0.00177070
 --   model.layers.30.self_attn                          6.6359 bpw - exp. error: 0.00173835
 --   model.layers.30.mlp                                8.0374 bpw - exp. error: 0.00135131
 --   model.layers.31.self_attn                          8.0418 bpw - exp. error: 0.00071250
 --   model.layers.31.mlp                                8.0374 bpw - exp. error: 0.00148813
 -- sum(log(err)): -402.140137
 -- max(err): 0.005406
 -- Tokenizing samples...
 -- Token embeddings again...
 -- Quantizing...
 -- Layer: model.layers.0 (Attention)
 -- Linear: model.layers.0.self_attn.q_proj -> 1:6b_32g s4, 6.14 bpw
 -- Linear: model.layers.0.self_attn.k_proj -> 1:6b_32g s4, 6.14 bpw
 -- Linear: model.layers.0.self_attn.v_proj -> 1:8b_32g s4, 8.14 bpw
 -- Linear: model.layers.0.self_attn.o_proj -> 1:6b_32g s4, 6.14 bpw
 -- Module quantized, rfn_error: 0.002210
 -- Layer: model.layers.0 (MLP)
 -- Linear: model.layers.0.mlp.gate_proj -> 1:8b_128g s4, 8.04 bpw
 -- Linear: model.layers.0.mlp.up_proj -> 1:8b_128g s4, 8.04 bpw
 -- Linear: model.layers.0.mlp.down_proj -> 1:8b_128g s4, 8.04 bpw
 -- Module quantized, rfn_error: 0.001247
 -- Layer: model.layers.1 (Attention)
 -- Linear: model.layers.1.self_attn.q_proj -> 1:8b_128g s4, 8.04 bpw
 -- Linear: model.layers.1.self_attn.k_proj -> 1:8b_128g s4, 8.04 bpw
 -- Linear: model.layers.1.self_attn.v_proj -> 1:8b_128g s4, 8.04 bpw
 -- Linear: model.layers.1.self_attn.o_proj -> 1:8b_128g s4, 8.04 bpw
 -- Module quantized, rfn_error: 0.001910
 -- Layer: model.layers.1 (MLP)
 -- Linear: model.layers.1.mlp.gate_proj -> 1:8b_128g s4, 8.04 bpw
 -- Linear: model.layers.1.mlp.up_proj -> 1:8b_128g s4, 8.04 bpw
 -- Linear: model.layers.1.mlp.down_proj -> 1:8b_128g s4, 8.04 bpw
 -- Module quantized, rfn_error: 0.002304
 -- Layer: model.layers.2 (Attention)
 -- Linear: model.layers.2.self_attn.q_proj -> 1:8b_128g s4, 8.04 bpw
 -- Linear: model.layers.2.self_attn.k_proj -> 1:8b_128g s4, 8.04 bpw
 -- Linear: model.layers.2.self_attn.v_proj -> 1:8b_128g s4, 8.04 bpw
 -- Linear: model.layers.2.self_attn.o_proj -> 1:8b_128g s4, 8.04 bpw
 -- Module quantized, rfn_error: 0.001826
 -- Layer: model.layers.2 (MLP)
 -- Linear: model.layers.2.mlp.gate_proj -> 1:6b_128g s4, 6.04 bpw
 -- Linear: model.layers.2.mlp.up_proj -> 1:6b_128g s4, 6.04 bpw
 -- Linear: model.layers.2.mlp.down_proj -> 0.05:8b_32g/0.95:6b_128g s4, 6.15 bpw
 -- Module quantized, rfn_error: 0.003184
 -- Layer: model.layers.3 (Attention)
 -- Linear: model.layers.3.self_attn.q_proj -> 0.1:5b_64g/0.9:4b_64g s4, 4.18 bpw
 -- Linear: model.layers.3.self_attn.k_proj -> 0.1:5b_64g/0.9:4b_64g s4, 4.18 bpw
 -- Linear: model.layers.3.self_attn.v_proj -> 0.1:5b_32g/0.9:4b_32g s4, 4.24 bpw
 -- Linear: model.layers.3.self_attn.o_proj -> 0.1:5b_64g/0.9:4b_64g s4, 4.18 bpw
 -- Module quantized, rfn_error: 0.004051
 -- Layer: model.layers.3 (MLP)
 -- Linear: model.layers.3.mlp.gate_proj -> 1:6b_128g s4, 6.04 bpw
 -- Linear: model.layers.3.mlp.up_proj -> 1:6b_128g s4, 6.04 bpw
 -- Linear: model.layers.3.mlp.down_proj -> 0.05:8b_32g/0.95:6b_128g s4, 6.15 bpw
 -- Module quantized, rfn_error: 0.002333
 -- Layer: model.layers.4 (Attention)
 -- Linear: model.layers.4.self_attn.q_proj -> 1:6b_32g s4, 6.14 bpw
 -- Linear: model.layers.4.self_attn.k_proj -> 1:6b_32g s4, 6.14 bpw
 -- Linear: model.layers.4.self_attn.v_proj -> 1:8b_32g s4, 8.14 bpw
 -- Linear: model.layers.4.self_attn.o_proj -> 1:6b_32g s4, 6.14 bpw
 -- Module quantized, rfn_error: 0.001081
 -- Layer: model.layers.4 (MLP)
 -- Linear: model.layers.4.mlp.gate_proj -> 0.1:8b_128g/0.9:6b_128g s4, 6.29 bpw
 -- Linear: model.layers.4.mlp.up_proj -> 0.1:8b_128g/0.9:6b_128g s4, 6.29 bpw
 -- Linear: model.layers.4.mlp.down_proj -> 0.15:8b_128g/0.85:6b_128g s4, 6.35 bpw
 -- Module quantized, rfn_error: 0.001737
 -- Layer: model.layers.5 (Attention)
 -- Linear: model.layers.5.self_attn.q_proj -> 0.1:5b_64g/0.9:4b_64g s4, 4.18 bpw
 -- Linear: model.layers.5.self_attn.k_proj -> 0.1:5b_64g/0.9:4b_64g s4, 4.18 bpw
 -- Linear: model.layers.5.self_attn.v_proj -> 1:5b_64g s4, 5.07 bpw
 -- Linear: model.layers.5.self_attn.o_proj -> 0.1:5b_64g/0.9:4b_64g s4, 4.18 bpw
 -- Module quantized, rfn_error: 0.002412
 -- Layer: model.layers.5 (MLP)
 -- Linear: model.layers.5.mlp.gate_proj -> 0.1:6b_128g/0.9:5b_128g s4, 5.16 bpw
 -- Linear: model.layers.5.mlp.up_proj -> 0.25:6b_128g/0.75:5b_128g s4, 5.29 bpw
 -- Linear: model.layers.5.mlp.down_proj -> 0.05:8b_32g/0.1:6b_128g/0.85:5b_128g s4, 5.31 bpw
 -- Module quantized, rfn_error: 0.002792
 -- Layer: model.layers.6 (Attention)
 -- Linear: model.layers.6.self_attn.q_proj -> 0.1:5b_64g/0.9:4b_64g s4, 4.18 bpw
 -- Linear: model.layers.6.self_attn.k_proj -> 0.1:5b_64g/0.9:4b_64g s4, 4.18 bpw
 -- Linear: model.layers.6.self_attn.v_proj -> 1:5b_64g s4, 5.07 bpw
 -- Linear: model.layers.6.self_attn.o_proj -> 0.1:5b_64g/0.9:4b_64g s4, 4.18 bpw
 -- Module quantized, rfn_error: 0.003026
 -- Layer: model.layers.6 (MLP)
 -- Linear: model.layers.6.mlp.gate_proj -> 0.1:8b_128g/0.9:6b_128g s4, 6.29 bpw
 -- Linear: model.layers.6.mlp.up_proj -> 0.1:8b_128g/0.9:6b_128g s4, 6.29 bpw
 -- Linear: model.layers.6.mlp.down_proj -> 1:8b_128g s4, 8.04 bpw
 -- Module quantized, rfn_error: 0.001388
 -- Layer: model.layers.7 (Attention)
 -- Linear: model.layers.7.self_attn.q_proj -> 1:6b_32g s4, 6.14 bpw
 -- Linear: model.layers.7.self_attn.k_proj -> 1:6b_32g s4, 6.14 bpw
 -- Linear: model.layers.7.self_attn.v_proj -> 1:8b_32g s4, 8.14 bpw
 -- Linear: model.layers.7.self_attn.o_proj -> 1:6b_32g s4, 6.14 bpw
 -- Module quantized, rfn_error: 0.000886
 -- Layer: model.layers.7 (MLP)
 -- Linear: model.layers.7.mlp.gate_proj -> 0.1:8b_128g/0.9:6b_128g s4, 6.29 bpw
 -- Linear: model.layers.7.mlp.up_proj -> 0.1:8b_128g/0.9:6b_128g s4, 6.29 bpw
 -- Linear: model.layers.7.mlp.down_proj -> 1:8b_128g s4, 8.04 bpw
 -- Module quantized, rfn_error: 0.001762
 -- Layer: model.layers.8 (Attention)
 -- Linear: model.layers.8.self_attn.q_proj -> 1:6b_32g s4, 6.14 bpw
 -- Linear: model.layers.8.self_attn.k_proj -> 1:6b_32g s4, 6.14 bpw
 -- Linear: model.layers.8.self_attn.v_proj -> 1:8b_32g s4, 8.14 bpw
 -- Linear: model.layers.8.self_attn.o_proj -> 1:6b_32g s4, 6.14 bpw
 -- Module quantized, rfn_error: 0.001070
 -- Layer: model.layers.8 (MLP)
 -- Linear: model.layers.8.mlp.gate_proj -> 1:6b_128g s4, 6.04 bpw
 -- Linear: model.layers.8.mlp.up_proj -> 1:6b_128g s4, 6.04 bpw
 -- Linear: model.layers.8.mlp.down_proj -> 0.05:8b_32g/0.95:6b_128g s4, 6.15 bpw
 -- Module quantized, rfn_error: 0.002282
 -- Layer: model.layers.9 (Attention)
 -- Linear: model.layers.9.self_attn.q_proj -> 1:6b_32g s4, 6.14 bpw
 -- Linear: model.layers.9.self_attn.k_proj -> 1:6b_32g s4, 6.14 bpw
 -- Linear: model.layers.9.self_attn.v_proj -> 1:8b_32g s4, 8.14 bpw
 -- Linear: model.layers.9.self_attn.o_proj -> 1:6b_32g s4, 6.14 bpw
 -- Module quantized, rfn_error: 0.001224
 -- Layer: model.layers.9 (MLP)
 -- Linear: model.layers.9.mlp.gate_proj -> 0.1:6b_32g/0.9:5b_32g s4, 5.23 bpw
 -- Linear: model.layers.9.mlp.up_proj -> 0.25:6b_32g/0.75:5b_32g s4, 5.38 bpw
 -- Linear: model.layers.9.mlp.down_proj -> 0.05:8b_32g/0.1:6b_32g/0.85:5b_32g s4, 5.39 bpw
 -- Module quantized, rfn_error: 0.003722
 -- Layer: model.layers.10 (Attention)
 -- Linear: model.layers.10.self_attn.q_proj -> 1:6b_32g s4, 6.14 bpw
 -- Linear: model.layers.10.self_attn.k_proj -> 1:6b_32g s4, 6.14 bpw
 -- Linear: model.layers.10.self_attn.v_proj -> 1:8b_32g s4, 8.14 bpw
 -- Linear: model.layers.10.self_attn.o_proj -> 1:6b_32g s4, 6.14 bpw
 -- Module quantized, rfn_error: 0.001441
 -- Layer: model.layers.10 (MLP)
 -- Linear: model.layers.10.mlp.gate_proj -> 0.1:8b_128g/0.9:6b_128g s4, 6.29 bpw
 -- Linear: model.layers.10.mlp.up_proj -> 0.1:8b_128g/0.9:6b_128g s4, 6.29 bpw
 -- Linear: model.layers.10.mlp.down_proj -> 0.15:8b_128g/0.85:6b_128g s4, 6.35 bpw
 -- Module quantized, rfn_error: 0.002382
 -- Layer: model.layers.11 (Attention)
 -- Linear: model.layers.11.self_attn.q_proj -> 1:8b_128g s4, 8.04 bpw
 -- Linear: model.layers.11.self_attn.k_proj -> 1:8b_128g s4, 8.04 bpw
 -- Linear: model.layers.11.self_attn.v_proj -> 1:8b_128g s4, 8.04 bpw
 -- Linear: model.layers.11.self_attn.o_proj -> 1:8b_128g s4, 8.04 bpw
 -- Module quantized, rfn_error: 0.000652
 -- Layer: model.layers.11 (MLP)
 -- Linear: model.layers.11.mlp.gate_proj -> 1:6b_128g s4, 6.04 bpw
 -- Linear: model.layers.11.mlp.up_proj -> 1:6b_128g s4, 6.04 bpw
 -- Linear: model.layers.11.mlp.down_proj -> 0.05:8b_32g/0.95:6b_128g s4, 6.15 bpw
 -- Module quantized, rfn_error: 0.002618
 -- Layer: model.layers.12 (Attention)
 -- Linear: model.layers.12.self_attn.q_proj -> 1:6b_32g s4, 6.14 bpw
 -- Linear: model.layers.12.self_attn.k_proj -> 1:6b_32g s4, 6.14 bpw
 -- Linear: model.layers.12.self_attn.v_proj -> 1:8b_32g s4, 8.14 bpw
 -- Linear: model.layers.12.self_attn.o_proj -> 1:6b_32g s4, 6.14 bpw
 -- Module quantized, rfn_error: 0.001683
 -- Layer: model.layers.12 (MLP)
 -- Linear: model.layers.12.mlp.gate_proj -> 0.1:8b_128g/0.9:6b_128g s4, 6.29 bpw
 -- Linear: model.layers.12.mlp.up_proj -> 0.1:8b_128g/0.9:6b_128g s4, 6.29 bpw
 -- Linear: model.layers.12.mlp.down_proj -> 1:8b_128g s4, 8.04 bpw
 -- Module quantized, rfn_error: 0.002034
 -- Layer: model.layers.13 (Attention)
 -- Linear: model.layers.13.self_attn.q_proj -> 0.1:6b_32g/0.9:5b_32g s4, 5.24 bpw
 -- Linear: model.layers.13.self_attn.k_proj -> 0.1:6b_32g/0.9:5b_32g s4, 5.24 bpw
 -- Linear: model.layers.13.self_attn.v_proj -> 1:6b_32g s4, 6.14 bpw
 -- Linear: model.layers.13.self_attn.o_proj -> 0.1:6b_32g/0.9:5b_32g s4, 5.24 bpw
 -- Module quantized, rfn_error: 0.003492
 -- Layer: model.layers.13 (MLP)
 -- Linear: model.layers.13.mlp.gate_proj -> 0.1:8b_128g/0.9:6b_128g s4, 6.29 bpw
 -- Linear: model.layers.13.mlp.up_proj -> 0.1:8b_128g/0.9:6b_128g s4, 6.29 bpw
 -- Linear: model.layers.13.mlp.down_proj -> 1:8b_128g s4, 8.04 bpw
 -- Module quantized, rfn_error: 0.001966
 -- Layer: model.layers.14 (Attention)
 -- Linear: model.layers.14.self_attn.q_proj -> 1:6b_128g s4, 6.04 bpw
 -- Linear: model.layers.14.self_attn.k_proj -> 1:6b_128g s4, 6.04 bpw
 -- Linear: model.layers.14.self_attn.v_proj -> 1:6b_128g s4, 6.04 bpw
 -- Linear: model.layers.14.self_attn.o_proj -> 1:6b_128g s4, 6.04 bpw
 -- Module quantized, rfn_error: 0.001318
 -- Layer: model.layers.14 (MLP)
 -- Linear: model.layers.14.mlp.gate_proj -> 1:6b_128g s4, 6.04 bpw
 -- Linear: model.layers.14.mlp.up_proj -> 1:6b_128g s4, 6.04 bpw
 -- Linear: model.layers.14.mlp.down_proj -> 0.05:8b_32g/0.95:6b_128g s4, 6.15 bpw
 -- Module quantized, rfn_error: 0.001441
 -- Layer: model.layers.15 (Attention)
 -- Linear: model.layers.15.self_attn.q_proj -> 1:8b_128g s4, 8.04 bpw
 -- Linear: model.layers.15.self_attn.k_proj -> 1:8b_128g s4, 8.04 bpw
 -- Linear: model.layers.15.self_attn.v_proj -> 1:8b_128g s4, 8.04 bpw
 -- Linear: model.layers.15.self_attn.o_proj -> 1:8b_128g s4, 8.04 bpw
 -- Module quantized, rfn_error: 0.000365
 -- Layer: model.layers.15 (MLP)
 -- Linear: model.layers.15.mlp.gate_proj -> 0.1:8b_128g/0.9:6b_128g s4, 6.29 bpw
 -- Linear: model.layers.15.mlp.up_proj -> 0.1:8b_128g/0.9:6b_128g s4, 6.29 bpw
 -- Linear: model.layers.15.mlp.down_proj -> 1:8b_128g s4, 8.04 bpw
 -- Module quantized, rfn_error: 0.001103
 -- Layer: model.layers.16 (Attention)
 -- Linear: model.layers.16.self_attn.q_proj -> 1:6b_32g s4, 6.14 bpw
 -- Linear: model.layers.16.self_attn.k_proj -> 1:6b_32g s4, 6.14 bpw
 -- Linear: model.layers.16.self_attn.v_proj -> 1:8b_32g s4, 8.14 bpw
 -- Linear: model.layers.16.self_attn.o_proj -> 1:6b_32g s4, 6.14 bpw
 -- Module quantized, rfn_error: 0.000938
 -- Layer: model.layers.16 (MLP)
 -- Linear: model.layers.16.mlp.gate_proj -> 0.1:8b_128g/0.9:6b_128g s4, 6.29 bpw
 -- Linear: model.layers.16.mlp.up_proj -> 0.1:8b_128g/0.9:6b_128g s4, 6.29 bpw
 -- Linear: model.layers.16.mlp.down_proj -> 0.15:8b_128g/0.85:6b_128g s4, 6.35 bpw
 -- Module quantized, rfn_error: 0.001508
 -- Layer: model.layers.17 (Attention)
 -- Linear: model.layers.17.self_attn.q_proj -> 1:8b_128g s4, 8.04 bpw
 -- Linear: model.layers.17.self_attn.k_proj -> 1:8b_128g s4, 8.04 bpw
 -- Linear: model.layers.17.self_attn.v_proj -> 1:8b_128g s4, 8.04 bpw
 -- Linear: model.layers.17.self_attn.o_proj -> 1:8b_128g s4, 8.04 bpw
 -- Module quantized, rfn_error: 0.000418
 -- Saving checkpoint...
 -- Layer: model.layers.17 (MLP)
 -- Linear: model.layers.17.mlp.gate_proj -> 1:6b_128g s4, 6.04 bpw
 -- Linear: model.layers.17.mlp.up_proj -> 1:6b_128g s4, 6.04 bpw
 -- Linear: model.layers.17.mlp.down_proj -> 0.05:8b_32g/0.95:6b_128g s4, 6.15 bpw
 -- Module quantized, rfn_error: 0.001869
 -- Layer: model.layers.18 (Attention)
 -- Linear: model.layers.18.self_attn.q_proj -> 1:6b_128g s4, 6.04 bpw
 -- Linear: model.layers.18.self_attn.k_proj -> 1:6b_128g s4, 6.04 bpw
 -- Linear: model.layers.18.self_attn.v_proj -> 1:6b_128g s4, 6.04 bpw
 -- Linear: model.layers.18.self_attn.o_proj -> 1:6b_128g s4, 6.04 bpw
 -- Module quantized, rfn_error: 0.001830
 -- Layer: model.layers.18 (MLP)
 -- Linear: model.layers.18.mlp.gate_proj -> 0.1:6b_128g/0.9:5b_128g s4, 5.16 bpw
 -- Linear: model.layers.18.mlp.up_proj -> 0.25:6b_128g/0.75:5b_128g s4, 5.29 bpw
 -- Linear: model.layers.18.mlp.down_proj -> 0.05:8b_32g/0.1:6b_128g/0.85:5b_128g s4, 5.31 bpw
 -- Module quantized, rfn_error: 0.003908
 -- Layer: model.layers.19 (Attention)
 -- Linear: model.layers.19.self_attn.q_proj -> 1:6b_128g s4, 6.04 bpw
 -- Linear: model.layers.19.self_attn.k_proj -> 1:6b_128g s4, 6.04 bpw
 -- Linear: model.layers.19.self_attn.v_proj -> 1:6b_128g s4, 6.04 bpw
 -- Linear: model.layers.19.self_attn.o_proj -> 1:6b_128g s4, 6.04 bpw
 -- Module quantized, rfn_error: 0.001757
 -- Layer: model.layers.19 (MLP)
 -- Linear: model.layers.19.mlp.gate_proj -> 0.1:6b_32g/0.9:5b_32g s4, 5.23 bpw
 -- Linear: model.layers.19.mlp.up_proj -> 0.25:6b_32g/0.75:5b_32g s4, 5.38 bpw
 -- Linear: model.layers.19.mlp.down_proj -> 0.05:8b_32g/0.1:6b_32g/0.85:5b_32g s4, 5.39 bpw
 -- Module quantized, rfn_error: 0.003729
 -- Layer: model.layers.20 (Attention)
 -- Linear: model.layers.20.self_attn.q_proj -> 1:6b_32g s4, 6.14 bpw
 -- Linear: model.layers.20.self_attn.k_proj -> 1:6b_32g s4, 6.14 bpw
 -- Linear: model.layers.20.self_attn.v_proj -> 1:8b_32g s4, 8.14 bpw
 -- Linear: model.layers.20.self_attn.o_proj -> 1:6b_32g s4, 6.14 bpw
 -- Module quantized, rfn_error: 0.001186
 -- Layer: model.layers.20 (MLP)
 -- Linear: model.layers.20.mlp.gate_proj -> 0.1:6b_32g/0.9:5b_32g s4, 5.23 bpw
 -- Linear: model.layers.20.mlp.up_proj -> 0.25:6b_32g/0.75:5b_32g s4, 5.38 bpw
 -- Linear: model.layers.20.mlp.down_proj -> 0.05:8b_32g/0.1:6b_32g/0.85:5b_32g s4, 5.39 bpw
 -- Module quantized, rfn_error: 0.004217
 -- Layer: model.layers.21 (Attention)
 -- Linear: model.layers.21.self_attn.q_proj -> 1:6b_128g s4, 6.04 bpw
 -- Linear: model.layers.21.self_attn.k_proj -> 1:6b_128g s4, 6.04 bpw
 -- Linear: model.layers.21.self_attn.v_proj -> 1:6b_128g s4, 6.04 bpw
 -- Linear: model.layers.21.self_attn.o_proj -> 1:6b_128g s4, 6.04 bpw
 -- Module quantized, rfn_error: 0.001915
 -- Layer: model.layers.21 (MLP)
 -- Linear: model.layers.21.mlp.gate_proj -> 0.1:6b_32g/0.9:5b_32g s4, 5.23 bpw
 -- Linear: model.layers.21.mlp.up_proj -> 0.25:6b_32g/0.75:5b_32g s4, 5.38 bpw
 -- Linear: model.layers.21.mlp.down_proj -> 0.05:8b_32g/0.1:6b_32g/0.85:5b_32g s4, 5.39 bpw
 -- Module quantized, rfn_error: 0.004769
 -- Layer: model.layers.22 (Attention)
 -- Linear: model.layers.22.self_attn.q_proj -> 1:6b_128g s4, 6.04 bpw
 -- Linear: model.layers.22.self_attn.k_proj -> 1:6b_128g s4, 6.04 bpw
 -- Linear: model.layers.22.self_attn.v_proj -> 1:6b_128g s4, 6.04 bpw
 -- Linear: model.layers.22.self_attn.o_proj -> 1:6b_128g s4, 6.04 bpw
 -- Module quantized, rfn_error: 0.002010
 -- Layer: model.layers.22 (MLP)
 -- Linear: model.layers.22.mlp.gate_proj -> 0.1:8b_128g/0.9:6b_128g s4, 6.29 bpw
 -- Linear: model.layers.22.mlp.up_proj -> 0.1:8b_128g/0.9:6b_128g s4, 6.29 bpw
 -- Linear: model.layers.22.mlp.down_proj -> 0.15:8b_128g/0.85:6b_128g s4, 6.35 bpw
 -- Module quantized, rfn_error: 0.003114
 -- Layer: model.layers.23 (Attention)
 -- Linear: model.layers.23.self_attn.q_proj -> 1:8b_128g s4, 8.04 bpw
 -- Linear: model.layers.23.self_attn.k_proj -> 1:8b_128g s4, 8.04 bpw
 -- Linear: model.layers.23.self_attn.v_proj -> 1:8b_128g s4, 8.04 bpw
 -- Linear: model.layers.23.self_attn.o_proj -> 1:8b_128g s4, 8.04 bpw
 -- Module quantized, rfn_error: 0.000544
 -- Layer: model.layers.23 (MLP)
 -- Linear: model.layers.23.mlp.gate_proj -> 0.1:6b_32g/0.9:5b_32g s4, 5.23 bpw
 -- Linear: model.layers.23.mlp.up_proj -> 0.25:6b_32g/0.75:5b_32g s4, 5.38 bpw
 -- Linear: model.layers.23.mlp.down_proj -> 0.05:8b_32g/0.1:6b_32g/0.85:5b_32g s4, 5.39 bpw
 -- Module quantized, rfn_error: 0.005750
 -- Layer: model.layers.24 (Attention)
 -- Linear: model.layers.24.self_attn.q_proj -> 1:6b_32g s4, 6.14 bpw
 -- Linear: model.layers.24.self_attn.k_proj -> 1:6b_32g s4, 6.14 bpw
 -- Linear: model.layers.24.self_attn.v_proj -> 1:8b_32g s4, 8.14 bpw
 -- Linear: model.layers.24.self_attn.o_proj -> 1:6b_32g s4, 6.14 bpw
 -- Module quantized, rfn_error: 0.001395
 -- Layer: model.layers.24 (MLP)
 -- Linear: model.layers.24.mlp.gate_proj -> 1:6b_128g s4, 6.04 bpw
 -- Linear: model.layers.24.mlp.up_proj -> 1:6b_128g s4, 6.04 bpw
 -- Linear: model.layers.24.mlp.down_proj -> 0.05:8b_32g/0.95:6b_128g s4, 6.15 bpw
 -- Module quantized, rfn_error: 0.003878
 -- Layer: model.layers.25 (Attention)
 -- Linear: model.layers.25.self_attn.q_proj -> 0.1:6b_32g/0.9:5b_32g s4, 5.24 bpw
 -- Linear: model.layers.25.self_attn.k_proj -> 0.1:6b_32g/0.9:5b_32g s4, 5.24 bpw
 -- Linear: model.layers.25.self_attn.v_proj -> 1:6b_32g s4, 6.14 bpw
 -- Linear: model.layers.25.self_attn.o_proj -> 0.1:6b_32g/0.9:5b_32g s4, 5.24 bpw
 -- Module quantized, rfn_error: 0.002646
 -- Layer: model.layers.25 (MLP)
 -- Linear: model.layers.25.mlp.gate_proj -> 0.1:8b_128g/0.9:6b_128g s4, 6.29 bpw
 -- Linear: model.layers.25.mlp.up_proj -> 0.1:8b_128g/0.9:6b_128g s4, 6.29 bpw
 -- Linear: model.layers.25.mlp.down_proj -> 0.15:8b_128g/0.85:6b_128g s4, 6.35 bpw
 -- Module quantized, rfn_error: 0.003885
 -- Layer: model.layers.26 (Attention)
 -- Linear: model.layers.26.self_attn.q_proj -> 1:6b_32g s4, 6.14 bpw
 -- Linear: model.layers.26.self_attn.k_proj -> 1:6b_32g s4, 6.14 bpw
 -- Linear: model.layers.26.self_attn.v_proj -> 1:8b_32g s4, 8.14 bpw
 -- Linear: model.layers.26.self_attn.o_proj -> 1:6b_32g s4, 6.14 bpw
 -- Module quantized, rfn_error: 0.001354
 -- Layer: model.layers.26 (MLP)
 -- Linear: model.layers.26.mlp.gate_proj -> 1:8b_128g s4, 8.04 bpw
 -- Linear: model.layers.26.mlp.up_proj -> 1:8b_128g s4, 8.04 bpw
 -- Linear: model.layers.26.mlp.down_proj -> 1:8b_128g s4, 8.04 bpw
 -- Module quantized, rfn_error: 0.001154
 -- Layer: model.layers.27 (Attention)
 -- Linear: model.layers.27.self_attn.q_proj -> 0.1:6b_32g/0.9:5b_32g s4, 5.24 bpw
 -- Linear: model.layers.27.self_attn.k_proj -> 0.1:6b_32g/0.9:5b_32g s4, 5.24 bpw
 -- Linear: model.layers.27.self_attn.v_proj -> 1:6b_32g s4, 6.14 bpw
 -- Linear: model.layers.27.self_attn.o_proj -> 0.1:6b_32g/0.9:5b_32g s4, 5.24 bpw
 -- Module quantized, rfn_error: 0.002578
 -- Layer: model.layers.27 (MLP)
 -- Linear: model.layers.27.mlp.gate_proj -> 0.1:8b_128g/0.9:6b_128g s4, 6.29 bpw
 -- Linear: model.layers.27.mlp.up_proj -> 0.1:8b_128g/0.9:6b_128g s4, 6.29 bpw
 -- Linear: model.layers.27.mlp.down_proj -> 0.15:8b_128g/0.85:6b_128g s4, 6.35 bpw
 -- Module quantized, rfn_error: 0.004201
 -- Layer: model.layers.28 (Attention)
 -- Linear: model.layers.28.self_attn.q_proj -> 1:6b_128g s4, 6.04 bpw
 -- Linear: model.layers.28.self_attn.k_proj -> 1:6b_128g s4, 6.04 bpw
 -- Linear: model.layers.28.self_attn.v_proj -> 1:6b_128g s4, 6.04 bpw
 -- Linear: model.layers.28.self_attn.o_proj -> 1:6b_128g s4, 6.04 bpw
 -- Module quantized, rfn_error: 0.002510
 -- Layer: model.layers.28 (MLP)
 -- Linear: model.layers.28.mlp.gate_proj -> 1:8b_128g s4, 8.04 bpw
 -- Linear: model.layers.28.mlp.up_proj -> 1:8b_128g s4, 8.04 bpw
 -- Linear: model.layers.28.mlp.down_proj -> 1:8b_128g s4, 8.04 bpw
 -- Module quantized, rfn_error: 0.001251
 -- Layer: model.layers.29 (Attention)
 -- Linear: model.layers.29.self_attn.q_proj -> 0.1:6b_32g/0.9:5b_32g s4, 5.24 bpw
 -- Linear: model.layers.29.self_attn.k_proj -> 0.1:6b_32g/0.9:5b_32g s4, 5.24 bpw
 -- Linear: model.layers.29.self_attn.v_proj -> 1:6b_32g s4, 6.14 bpw
 -- Linear: model.layers.29.self_attn.o_proj -> 0.1:6b_32g/0.9:5b_32g s4, 5.24 bpw
 -- Module quantized, rfn_error: 0.003163
 -- Layer: model.layers.29 (MLP)
 -- Linear: model.layers.29.mlp.gate_proj -> 1:8b_128g s4, 8.04 bpw
 -- Linear: model.layers.29.mlp.up_proj -> 1:8b_128g s4, 8.04 bpw
 -- Linear: model.layers.29.mlp.down_proj -> 1:8b_128g s4, 8.04 bpw
 -- Module quantized, rfn_error: 0.002406
 -- Layer: model.layers.30 (Attention)
 -- Linear: model.layers.30.self_attn.q_proj -> 1:6b_32g s4, 6.14 bpw
 -- Linear: model.layers.30.self_attn.k_proj -> 1:6b_32g s4, 6.14 bpw
 -- Linear: model.layers.30.self_attn.v_proj -> 1:8b_32g s4, 8.14 bpw
 -- Linear: model.layers.30.self_attn.o_proj -> 1:6b_32g s4, 6.14 bpw
 -- Module quantized, rfn_error: 0.001843
 -- Layer: model.layers.30 (MLP)
 -- Linear: model.layers.30.mlp.gate_proj -> 1:8b_128g s4, 8.04 bpw
 -- Linear: model.layers.30.mlp.up_proj -> 1:8b_128g s4, 8.04 bpw
 -- Linear: model.layers.30.mlp.down_proj -> 1:8b_128g s4, 8.04 bpw
 -- Module quantized, rfn_error: 0.001549
 -- Layer: model.layers.31 (Attention)
 -- Linear: model.layers.31.self_attn.q_proj -> 1:8b_128g s4, 8.04 bpw
 -- Linear: model.layers.31.self_attn.k_proj -> 1:8b_128g s4, 8.04 bpw
 -- Linear: model.layers.31.self_attn.v_proj -> 1:8b_128g s4, 8.04 bpw
 -- Linear: model.layers.31.self_attn.o_proj -> 1:8b_128g s4, 8.04 bpw
 -- Module quantized, rfn_error: 0.000743
 -- Layer: model.layers.31 (MLP)
 -- Linear: model.layers.31.mlp.gate_proj -> 1:8b_128g s4, 8.04 bpw
 -- Linear: model.layers.31.mlp.up_proj -> 1:8b_128g s4, 8.04 bpw
 -- Linear: model.layers.31.mlp.down_proj -> 1:8b_128g s4, 8.04 bpw
 -- Module quantized, rfn_error: 0.001628
 -- Layer: model.norm (RMSNorm)
 -- Module quantized, rfn_error: 0.000000
 -- Layer: lm_head (Linear)
 -- Linear: lm_head -> 0.15:8b_128g/0.85:6b_128g s4, 6.37 bpw
 -- Module quantized, calibration perplexity (quant): 9.5581
 -- Saving checkpoint...
 -- Compiling output file...
 -- Writing shard 1...
 -- Creating directory models--microsoft--Phi-3-mini-128k-instruct-exl2/6.5bpw/
 --   models--microsoft--Phi-3-mini-128k-instruct-exl2/6.5bpw/output.safetensors (3,068 MB)
 -- Copying non-tensor files to output directory models--microsoft--Phi-3-mini-128k-instruct-exl2/6.5bpw/
 --   .gitattributes
 --   added_tokens.json
 --   CODE_OF_CONDUCT.md
 --   config.json
 --   configuration_phi3.py
 --   generation_config.json
 --   LICENSE
 --   model.safetensors.index.json
 --   modeling_phi3.py
 --   NOTICE.md
 --   README.md
 --   sample_finetune.py
 --   SECURITY.md
 --   special_tokens_map.json
 --   tokenizer.json
 --   tokenizer.model
 --   tokenizer_config.json
 -- Finished
