How to use this model? #1717

Open
dzy1128 opened this issue Aug 30, 2024 · 2 comments

dzy1128 commented Aug 30, 2024

llama_model_loader: loaded meta data with 32 key-value pairs and 219 tensors from /data/huggingface/hub/models--city96--t5-v1_1-xxl-encoder-gguf/snapshots/005a6ea51a7d0b84d677b3e633bb52a8c85a83d9/./t5-v1_1-xxl-encoder-Q8_0.gguf (version GGUF V3 (latest))
llama_model_loader: Dumping metadata keys/values. Note: KV overrides do not apply in this output.
llama_model_loader: - kv 0: general.architecture str = t5encoder
llama_model_loader: - kv 1: general.type str = model
llama_model_loader: - kv 2: general.name str = T5 V1_1 Xxl
llama_model_loader: - kv 3: general.organization str = Google
llama_model_loader: - kv 4: general.finetune str = encoder-hf
llama_model_loader: - kv 5: general.basename str = t5-v1_1
llama_model_loader: - kv 6: general.size_label str = xxl
llama_model_loader: - kv 7: t5encoder.context_length u32 = 512
llama_model_loader: - kv 8: t5encoder.embedding_length u32 = 4096
llama_model_loader: - kv 9: t5encoder.feed_forward_length u32 = 10240
llama_model_loader: - kv 10: t5encoder.block_count u32 = 24
llama_model_loader: - kv 11: t5encoder.attention.head_count u32 = 64
llama_model_loader: - kv 12: t5encoder.attention.key_length u32 = 64
llama_model_loader: - kv 13: t5encoder.attention.value_length u32 = 64
llama_model_loader: - kv 14: t5encoder.attention.layer_norm_epsilon f32 = 0.000001
llama_model_loader: - kv 15: t5encoder.attention.relative_buckets_count u32 = 32
llama_model_loader: - kv 16: t5encoder.attention.layer_norm_rms_epsilon f32 = 0.000001
llama_model_loader: - kv 17: general.file_type u32 = 7
llama_model_loader: - kv 18: tokenizer.ggml.model str = t5
llama_model_loader: - kv 19: tokenizer.ggml.pre str = default
llama_model_loader: - kv 20: tokenizer.ggml.tokens arr[str,32128] = ["<pad>", "</s>", "<unk>", "▁", "X"...
llama_model_loader: - kv 21: tokenizer.ggml.scores arr[f32,32128] = [0.000000, 0.000000, 0.000000, -2.012...
llama_model_loader: - kv 22: tokenizer.ggml.token_type arr[i32,32128] = [3, 3, 2, 1, 1, 1, 1, 1, 1, 1, 1, 1, ...
llama_model_loader: - kv 23: tokenizer.ggml.add_space_prefix bool = true
llama_model_loader: - kv 24: tokenizer.ggml.remove_extra_whitespaces bool = true
llama_model_loader: - kv 25: tokenizer.ggml.precompiled_charsmap arr[u8,237539] = [0, 180, 2, 0, 0, 132, 0, 0, 0, 0, 0,...
llama_model_loader: - kv 26: tokenizer.ggml.eos_token_id u32 = 1
llama_model_loader: - kv 27: tokenizer.ggml.unknown_token_id u32 = 2
llama_model_loader: - kv 28: tokenizer.ggml.padding_token_id u32 = 0
llama_model_loader: - kv 29: tokenizer.ggml.add_bos_token bool = false
llama_model_loader: - kv 30: tokenizer.ggml.add_eos_token bool = true
llama_model_loader: - kv 31: general.quantization_version u32 = 2
llama_model_loader: - type f32: 50 tensors
llama_model_loader: - type q8_0: 169 tensors
llm_load_vocab: special tokens cache size = 3
llm_load_vocab: token to piece cache size = 0.2111 MB
llm_load_print_meta: format = GGUF V3 (latest)
llm_load_print_meta: arch = t5encoder
llm_load_print_meta: vocab type = UGM
llm_load_print_meta: n_vocab = 32128
llm_load_print_meta: n_merges = 0
llm_load_print_meta: vocab_only = 0
llm_load_print_meta: n_ctx_train = 512
llm_load_print_meta: n_embd = 4096
llm_load_print_meta: n_layer = 24
llm_load_print_meta: n_head = 64
llm_load_print_meta: n_head_kv = 64
llm_load_print_meta: n_rot = 64
llm_load_print_meta: n_swa = 0
llm_load_print_meta: n_embd_head_k = 64
llm_load_print_meta: n_embd_head_v = 64
llm_load_print_meta: n_gqa = 1
llm_load_print_meta: n_embd_k_gqa = 4096
llm_load_print_meta: n_embd_v_gqa = 4096
llm_load_print_meta: f_norm_eps = 0.0e+00
llm_load_print_meta: f_norm_rms_eps = 1.0e-06
llm_load_print_meta: f_clamp_kqv = 0.0e+00
llm_load_print_meta: f_max_alibi_bias = 0.0e+00
llm_load_print_meta: f_logit_scale = 0.0e+00
llm_load_print_meta: n_ff = 10240
llm_load_print_meta: n_expert = 0
llm_load_print_meta: n_expert_used = 0
llm_load_print_meta: causal attn = 1
llm_load_print_meta: pooling type = 0
llm_load_print_meta: rope type = -1
llm_load_print_meta: rope scaling = linear
llm_load_print_meta: freq_base_train = 10000.0
llm_load_print_meta: freq_scale_train = 1
llm_load_print_meta: n_ctx_orig_yarn = 512
llm_load_print_meta: rope_finetuned = unknown
llm_load_print_meta: ssm_d_conv = 0
llm_load_print_meta: ssm_d_inner = 0
llm_load_print_meta: ssm_d_state = 0
llm_load_print_meta: ssm_dt_rank = 0
llm_load_print_meta: ssm_dt_b_c_rms = 0
llm_load_print_meta: model type = ?B
llm_load_print_meta: model ftype = Q8_0
llm_load_print_meta: model params = 4.76 B
llm_load_print_meta: model size = 4.71 GiB (8.50 BPW)
llm_load_print_meta: general.name = T5 V1_1 Xxl
llm_load_print_meta: EOS token = 1 '</s>'
llm_load_print_meta: UNK token = 2 '<unk>'
llm_load_print_meta: PAD token = 0 '<pad>'
llm_load_print_meta: LF token = 3 '▁'
llm_load_print_meta: max token length = 20
llm_load_tensors: ggml ctx size = 0.10 MiB
llm_load_tensors: CPU buffer size = 4826.12 MiB
.................................................................................................
llama_new_context_with_model: n_ctx = 512
llama_new_context_with_model: n_batch = 512
llama_new_context_with_model: n_ubatch = 512
llama_new_context_with_model: flash_attn = 0
llama_new_context_with_model: freq_base = 10000.0
llama_new_context_with_model: freq_scale = 1
llama_kv_cache_init: CPU KV buffer size = 192.00 MiB
llama_new_context_with_model: KV self size = 192.00 MiB, K (f16): 96.00 MiB, V (f16): 96.00 MiB
llama_new_context_with_model: CPU output buffer size = 0.12 MiB
llama_new_context_with_model: CPU compute buffer size = 234.00 MiB
llama_new_context_with_model: graph nodes = 845
llama_new_context_with_model: graph splits = 1
AVX = 1 | AVX_VNNI = 0 | AVX2 = 1 | AVX512 = 0 | AVX512_VBMI = 0 | AVX512_VNNI = 0 | AVX512_BF16 = 0 | FMA = 1 | NEON = 0 | SVE = 0 | ARM_FMA = 0 | F16C = 1 | FP16_VA = 0 | WASM_SIMD = 0 | BLAS = 0 | SSE3 = 1 | SSSE3 = 1 | VSX = 0 | MATMUL_INT8 = 0 | LLAMAFILE = 1 |
Model metadata: {'tokenizer.ggml.eos_token_id': '1', 'general.quantization_version': '2', 'tokenizer.ggml.model': 't5', 'tokenizer.ggml.add_bos_token': 'false', 'tokenizer.ggml.remove_extra_whitespaces': 'true', 't5encoder.attention.layer_norm_rms_epsilon': '0.000001', 't5encoder.attention.relative_buckets_count': '32', 't5encoder.attention.layer_norm_epsilon': '0.000001', 'tokenizer.ggml.unknown_token_id': '2', 't5encoder.attention.value_length': '64', 'general.architecture': 't5encoder', 'general.file_type': '7', 't5encoder.context_length': '512', 't5encoder.feed_forward_length': '10240', 'tokenizer.ggml.padding_token_id': '0', 'general.basename': 't5-v1_1', 'tokenizer.ggml.pre': 'default', 'general.name': 'T5 V1_1 Xxl', 'general.finetune': 'encoder-hf', 't5encoder.attention.key_length': '64', 'general.type': 'model', 't5encoder.attention.head_count': '64', 'general.size_label': 'xxl', 'general.organization': 'Google', 't5encoder.embedding_length': '4096', 'tokenizer.ggml.add_eos_token': 'true', 'tokenizer.ggml.add_space_prefix': 'true', 't5encoder.block_count': '24'}
Using fallback chat format: llama-2
/tmp/pip-install-rx431hta/llama-cpp-python_78f9dcf2ce95424dbc2c1f7ebd107737/vendor/llama.cpp/src/llama.cpp:13908: GGML_ASSERT(lctx.is_encoding) failed
/usr/local/lib/python3.10/dist-packages/llama_cpp/lib/libggml.so(+0x1015b)[0x7f42c791a15b]
/usr/local/lib/python3.10/dist-packages/llama_cpp/lib/libggml.so(ggml_abort+0x15e)[0x7f42c791bd2e]
/usr/local/lib/python3.10/dist-packages/llama_cpp/lib/libllama.so(_ZN17llm_build_context16build_t5_encoderEv+0x11a0)[0x7f42c7b5ef90]
/usr/local/lib/python3.10/dist-packages/llama_cpp/lib/libllama.so(+0x80d92)[0x7f42c7ad7d92]
/usr/local/lib/python3.10/dist-packages/llama_cpp/lib/libllama.so(llama_decode+0x582)[0x7f42c7b35972]
/lib/x86_64-linux-gnu/libffi.so.8(+0x7e2e)[0x7f42c7e45e2e]
/lib/x86_64-linux-gnu/libffi.so.8(+0x4493)[0x7f42c7e42493]
/usr/lib/python3.10/lib-dynload/_ctypes.cpython-310-x86_64-linux-gnu.so(+0xa3e9)[0x7f42c7e603e9]
/usr/lib/python3.10/lib-dynload/_ctypes.cpython-310-x86_64-linux-gnu.so(+0x13302)[0x7f42c7e69302]
python3(_PyObject_MakeTpCall+0x25b)[0x55a5b37e152b]
python3(_PyEval_EvalFrameDefault+0x6f0b)[0x55a5b37da16b]
python3(_PyFunction_Vectorcall+0x7c)[0x55a5b37eb6ac]
python3(_PyEval_EvalFrameDefault+0x8cb)[0x55a5b37d3b2b]
python3(_PyFunction_Vectorcall+0x7c)[0x55a5b37eb6ac]
python3(_PyEval_EvalFrameDefault+0x8cb)[0x55a5b37d3b2b]
python3(+0x1785a2)[0x55a5b38085a2]
python3(_PyEval_EvalFrameDefault+0xac4)[0x55a5b37d3d24]
python3(+0x201a15)[0x55a5b3891a15]
python3(+0x15b909)[0x55a5b37eb909]
python3(_PyEval_EvalFrameDefault+0x6d5)[0x55a5b37d3935]
python3(+0x169251)[0x55a5b37f9251]
python3(PyObject_Call+0x122)[0x55a5b37f9f02]
python3(_PyEval_EvalFrameDefault+0x2a49)[0x55a5b37d5ca9]
python3(_PyFunction_Vectorcall+0x7c)[0x55a5b37eb6ac]
python3(PyObject_Call+0x122)[0x55a5b37f9f02]
python3(_PyEval_EvalFrameDefault+0x2a49)[0x55a5b37d5ca9]
python3(+0x169251)[0x55a5b37f9251]
python3(_PyEval_EvalFrameDefault+0x19b6)[0x55a5b37d4c16]
python3(+0x140096)[0x55a5b37d0096]
python3(PyEval_EvalCode+0x86)[0x55a5b38c5f66]
python3(+0x260e98)[0x55a5b38f0e98]
python3(+0x25a79b)[0x55a5b38ea79b]
python3(+0x260be5)[0x55a5b38f0be5]
python3(_PyRun_SimpleFileObject+0x1a8)[0x55a5b38f00c8]
python3(_PyRun_AnyFileObject+0x43)[0x55a5b38efd13]
python3(Py_RunMain+0x2be)[0x55a5b38e270e]
python3(Py_BytesMain+0x2d)[0x55a5b38b8dfd]
/lib/x86_64-linux-gnu/libc.so.6(+0x29d90)[0x7f42c8512d90]
/lib/x86_64-linux-gnu/libc.so.6(__libc_start_main+0x80)[0x7f42c8512e40]
python3(_start+0x25)[0x55a5b38b8cf5]
Aborted (core dumped)

Has anyone else had this problem?


dzy1128 commented Aug 30, 2024

from llama_cpp import Llama

llm = Llama(
    model_path="/data/app/comfyui/ComfyUI/models/clip/t5/t5-v1_1-xxl-encoder-Q8_0.gguf",
    chat_format="llama-2",
)
llm.create_chat_completion(
    messages=[
        {"role": "system", "content": "You are an assistant who perfectly describes images."},
        {"role": "user", "content": "Describe this image in detail please."},
    ]
)


ayttop commented Sep 3, 2024

Does this work with flux1-dev?
