GPT2: llama_model_load: error loading model: missing tensor 'output.weight' #12567

@UncleBen420

Description

Name and Version

build/bin/llama-cli --version
ggml_cuda_init: GGML_CUDA_FORCE_MMQ: no
ggml_cuda_init: GGML_CUDA_FORCE_CUBLAS: no
ggml_cuda_init: found 1 CUDA devices:
Device 0: NVIDIA GeForce RTX 4090, compute capability 8.9, VMM: yes
version: 4945 (9b169a4)
built with cc (GCC) 8.5.0 20210514 (Red Hat 8.5.0-23) for x86_64-redhat-linux

Operating systems

Linux

GGML backends

CUDA

Hardware

RTX 4090

Models

GPT2LMHeadModel

Problem description & steps to reproduce

When I convert the basic GPT2LMHeadModel using the convert_hf_to_gguf.py script, it works perfectly, but then when I load it using:

from llama_cpp import Llama
llama = Llama("model.path.gguf")

I get this error:

llama_model_load: error loading model: missing tensor 'output.weight'

Obviously the output layer has not been converted, but why? Does anyone have an idea?
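For background: GPT2LMHeadModel ties its output projection (`lm_head.weight`) to the token embedding matrix (`transformer.wte.weight`), so a converter can legitimately omit a separate `output.weight` tensor and expect the loader to reuse the embedding. The snippet below is a minimal sketch of that kind of tied-embedding fallback; it is not llama.cpp's actual loader code, and the plain dict simply stands in for a model file's tensor table (GGUF-style names like `token_embd.weight` assumed for illustration).

```python
# Sketch of a tied-embedding fallback for GPT-2-style models:
# if 'output.weight' is absent, reuse the token embedding matrix.
# This is NOT llama.cpp's real loader, just an illustration of the idea.

def resolve_output_weight(tensors):
    """Return the output projection, falling back to the tied embedding."""
    if "output.weight" in tensors:
        return tensors["output.weight"]
    # GPT2LMHeadModel ties lm_head.weight to transformer.wte.weight,
    # so a converted file may ship only the embedding tensor.
    if "token_embd.weight" in tensors:
        return tensors["token_embd.weight"]
    raise KeyError("missing tensor 'output.weight'")

# A converted GPT-2 file typically contains only the embedding:
converted = {"token_embd.weight": [[0.1, 0.2], [0.3, 0.4]]}
print(resolve_output_weight(converted) is converted["token_embd.weight"])  # True
```

A loader without this fallback will fail exactly as in the log above, which suggests either an older llama.cpp build on the loading side or a converter/loader version mismatch.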

First Bad Commit

No response

Relevant log output

print_info: f_clamp_kqv      = 0.0e+00
print_info: f_max_alibi_bias = 0.0e+00
print_info: f_logit_scale    = 0.0e+00
print_info: f_attn_scale     = 0.0e+00
print_info: n_ff             = 3072
print_info: n_expert         = 0
print_info: n_expert_used    = 0
print_info: causal attn      = 1
print_info: pooling type     = 0
print_info: rope type        = -1
print_info: rope scaling     = linear
print_info: freq_base_train  = 10000.0
print_info: freq_scale_train = 1
print_info: n_ctx_orig_yarn  = 1024
print_info: rope_finetuned   = unknown
print_info: ssm_d_conv       = 0
print_info: ssm_d_inner      = 0
print_info: ssm_d_state      = 0
print_info: ssm_dt_rank      = 0
print_info: ssm_dt_b_c_rms   = 0
print_info: model type       = 0.1B
print_info: model params     = 124.44 M
print_info: general.name     = Gpt2
print_info: vocab type       = BPE
print_info: n_vocab          = 50257
print_info: n_merges         = 50000
print_info: BOS token        = 50256 '<|endoftext|>'
print_info: EOS token        = 50256 '<|endoftext|>'
print_info: EOT token        = 50256 '<|endoftext|>'
print_info: LF token         = 198 'Ċ'
print_info: EOG token        = 50256 '<|endoftext|>'
print_info: max token length = 256
load_tensors: loading model tensors, this can take a while... (mmap = true)
load_tensors: layer   0 assigned to device CUDA0
load_tensors: layer   1 assigned to device CUDA0
load_tensors: layer   2 assigned to device CUDA0
load_tensors: layer   3 assigned to device CUDA0
load_tensors: layer   4 assigned to device CUDA0
load_tensors: layer   5 assigned to device CUDA0
load_tensors: layer   6 assigned to device CUDA0
load_tensors: layer   7 assigned to device CUDA0
load_tensors: layer   8 assigned to device CUDA0
load_tensors: layer   9 assigned to device CUDA0
load_tensors: layer  10 assigned to device CUDA0
load_tensors: layer  11 assigned to device CUDA0
load_tensors: layer  12 assigned to device CUDA0
llama_model_load: error loading model: missing tensor 'output.weight'
llama_model_load_from_file_impl: failed to load model

Metadata

Assignees

No one assigned

    Labels

    wontfix: This will not be worked on
