Closed as not planned
Labels: wontfix (This will not be worked on)
Description
Name and Version
build/bin/llama-cli --version
ggml_cuda_init: GGML_CUDA_FORCE_MMQ: no
ggml_cuda_init: GGML_CUDA_FORCE_CUBLAS: no
ggml_cuda_init: found 1 CUDA devices:
Device 0: NVIDIA GeForce RTX 4090, compute capability 8.9, VMM: yes
version: 4945 (9b169a4)
built with cc (GCC) 8.5.0 20210514 (Red Hat 8.5.0-23) for x86_64-redhat-linux
Operating systems
Linux
GGML backends
CUDA
Hardware
RTX 4090
Models
GPT2LMHeadModel
Problem description & steps to reproduce
When I convert the basic GPT2LMHeadModel with the convert_hf_to_gguf.py script, the conversion finishes without errors. But when I then load the resulting file with:
from llama_cpp import Llama
llama = Llama("model.path.gguf")
I get this error:
llama_model_load: error loading model: missing tensor 'output.weight'
Apparently the output layer was not converted, but why? Does anyone have an idea?
First Bad Commit
No response
Relevant log output
print_info: f_clamp_kqv = 0.0e+00
print_info: f_max_alibi_bias = 0.0e+00
print_info: f_logit_scale = 0.0e+00
print_info: f_attn_scale = 0.0e+00
print_info: n_ff = 3072
print_info: n_expert = 0
print_info: n_expert_used = 0
print_info: causal attn = 1
print_info: pooling type = 0
print_info: rope type = -1
print_info: rope scaling = linear
print_info: freq_base_train = 10000.0
print_info: freq_scale_train = 1
print_info: n_ctx_orig_yarn = 1024
print_info: rope_finetuned = unknown
print_info: ssm_d_conv = 0
print_info: ssm_d_inner = 0
print_info: ssm_d_state = 0
print_info: ssm_dt_rank = 0
print_info: ssm_dt_b_c_rms = 0
print_info: model type = 0.1B
print_info: model params = 124.44 M
print_info: general.name = Gpt2
print_info: vocab type = BPE
print_info: n_vocab = 50257
print_info: n_merges = 50000
print_info: BOS token = 50256 '<|endoftext|>'
print_info: EOS token = 50256 '<|endoftext|>'
print_info: EOT token = 50256 '<|endoftext|>'
print_info: LF token = 198 'Ċ'
print_info: EOG token = 50256 '<|endoftext|>'
print_info: max token length = 256
load_tensors: loading model tensors, this can take a while... (mmap = true)
load_tensors: layer 0 assigned to device CUDA0
load_tensors: layer 1 assigned to device CUDA0
load_tensors: layer 2 assigned to device CUDA0
load_tensors: layer 3 assigned to device CUDA0
load_tensors: layer 4 assigned to device CUDA0
load_tensors: layer 5 assigned to device CUDA0
load_tensors: layer 6 assigned to device CUDA0
load_tensors: layer 7 assigned to device CUDA0
load_tensors: layer 8 assigned to device CUDA0
load_tensors: layer 9 assigned to device CUDA0
load_tensors: layer 10 assigned to device CUDA0
load_tensors: layer 11 assigned to device CUDA0
load_tensors: layer 12 assigned to device CUDA0
llama_model_load: error loading model: missing tensor 'output.weight'
llama_model_load_from_file_impl: failed to load model