examples : remove `finetune` and `train-text-from-scratch` #8669

ngxson · 2024-07-24T14:15:43Z

Ref:

These examples are no longer working and require too much efforts to maintain. Therefore, they need to be removed.

It's always sad to say goodbye, but we need to move on... (let's hope that we can bring it back one day)

NOTE: This PR also contains a small correction for export-lora/README

I have read the contributing guidelines
Self-reported review complexity:
- Low
- Medium
- High

JohannesGaessler · 2024-07-24T14:32:21Z

My immediate next priority will be to start working on training in ggml/llama.cpp. Though whether or not these examples are removed on master doesn't matter since they will still be in the git history. I definitely have a strong preference for not having broken code on master since that just needlessly clogs up the issues.

ngxson · 2024-07-24T14:38:33Z

Good to hear that @JohannesGaessler ! Yeah I also bookmarked some PR when things get removed.

I think that if you're rewriting the training or training / finetune examples, probably it's easier to start from a blank file instead of modifying the existing code (correct me if I'm wrong). I did that way when I rewrote export-lora. Too many things changed, so the old code base had very little interest to me.

jboero · 2024-07-25T11:59:29Z

Sad to see these were just removed. I used train-text-from-scratch in a few talks to show basics of how it works. Thanks @JohannesGaessler let me know if I can help in any way. I don't think train and finetune examples ever properly supported GPU usage anyway.

JohannesGaessler · 2024-07-25T12:06:26Z

What are your qualifications relevant to training?

jboero · 2024-07-25T12:12:16Z

What are your qualifications relevant to training?

I'm fluent in CUDA/OpenCL and know the codebase pretty well including the build automation. I may be most helpful as QA/tester if you need one. I also have a pretty good lab of various hardware devices for testing.

JohannesGaessler · 2024-07-25T13:31:23Z

One way that you could definitely help would be with code/PR review. As of right now it is kind of a bottleneck. I myself am part of the problem since I am mostly just working on llama.cpp/ggml as a hobby in my free time when I feel like it and my motivation to do things myself is simply much higher than my motivation to review the things that other people did.

jboero · 2024-07-25T13:53:04Z

That I'm happy to do! 👍

Anish-Mahambare · 2024-07-25T14:34:02Z

We can still use llama-finetune right?

ngxson · 2024-07-25T14:44:32Z

We can still use llama-finetune right?

llama-finetune produces gguf with lora_a not transposed, so output won't be accepted by llama-cli after #8332

In short: for now, use PEFT python if you want to finetune a model.

Anish-Mahambare · 2024-07-25T14:45:52Z

can PEFT python train a gguf model?

ngxson · 2024-07-25T14:49:50Z

can PEFT python train a gguf model?

I don't know, but it's always easier with base model from HF (transformers library)

Some people want to use llama-finetune because they think that it accepts quantized gguf. But in reality, you need f16 or f32 model, and at that point it's easier to take the original model from HF.

Also, llama-finetune does not support QLoRA. It exists for demo'ing ability to use ggml for finetuning LLM, so don't expect it to be performance or efficient (at least for now)

teleprint-me · 2024-07-26T03:47:19Z

I don't even have the words for all feels I'm feeling right now over this. I've been taking a ton of time attempting to learn Vulkan on the side so I could integrate it into this and it's just removed because no one updated it. That's just sad. IMHO, this has so much more value than converting models and has so much more potential in the long term. This is a short-sighted removal.

teleprint-me · 2024-07-27T02:01:57Z

I've been playing around with the code all evening and I have been unable to reproduce the reported reason for the removal.

The only thing I did notice was that the updates broke compatibility with the older pre-trained models. Other than that, it's working fine with the latest master branch.

I'm currently pre-training a model with one of my custom datasets and it's operating as expected.

Screencast.from.2024-07-26.21-55-50.webm

lin72h · 2024-07-27T11:23:44Z

I can confirm at least for llama2 model, the finetune function works pretty well

…#8669) * examples : remove finetune and train-text-from-scratch * fix build * update help message * fix small typo for export-lora

mounta11n · 2024-07-27T21:23:58Z

That's not cool! I used the “finetune” quite often. I can't believe it's being removed now. I also used the “train-from-scratch” a lot for educational purposes (for myself). Thank you @JohannesGaessler for all the work you have done and for the additional knowledge I have been able to gather thanks to your work (finetune and train from scratch). i have spent many valuable hours with these two programs.

JohannesGaessler · 2024-07-27T21:25:54Z

It's not me that made finetune/train-from-scratch. So far I have never touched the training code.

mounta11n · 2024-07-27T21:30:08Z

Oh, where did I get that assumption from? Well okay, there's no harm in thanking someone “for no reason” :D
Otherwise, thanks to the person or persons who put all the efforts into these two programs. : )

mounta11n · 2024-07-27T21:34:12Z

just to clarify. with "no reason" i meant in this narrow context. at least i know you are one of the main devs in this project.

teleprint-me · 2024-07-28T01:30:32Z

I'm attempting to work on it. I ran into a weird issue with it with one of the latest commits. Still haven't narrowed it down, though.

short output

ggml_vk_create_queue()
ggml_vulkan memory: ggml_backend_vk_host_buffer_type_alloc_buffer(243539968)
ggml_vulkan memory: ggml_vk_host_malloc(243540000)
ggml_vk_create_buffer(AMD Radeon RX 7600 XT (RADV NAVI33), 243540000, { HostVisible | HostCoherent | HostCached }, { HostVisible | HostCoherent })
print_params: n_vocab: 32768
print_params: n_ctx:   256
print_params: n_embd:  256
print_params: n_head:  8
print_params: n_ff:    768
print_params: n_layer: 16
print_params: n_rot:   32
main: total train_iterations 0
main: seen train_samples     0
main: seen train_tokens      0
main: completed train_epochs 0
main: model_size = 243648192 bytes (232.4 MB)
main: opt_size  = 365006976 bytes (348.1 MB)
main: opt iter 0
ggml_vk_get_device(0)
ggml_vulkan memory: ggml_backend_vk_host_buffer_type_alloc_buffer(536887296)
ggml_vulkan memory: ggml_vk_host_malloc(536887328)
ggml_vk_create_buffer(AMD Radeon RX 7600 XT (RADV NAVI33), 536887328, { HostVisible | HostCoherent | HostCached }, { HostVisible | HostCoherent })
main: input_size = 536887328 bytes (512.0 MB)
ggml_vk_get_device(0)
/mnt/valerie/forked/ggerganov/llama.cpp/ggml/src/ggml.c:17147: GGML_ASSERT(replacements->set.keys[k] == NULL) failed
ptrace: Operation not permitted.
No stack.
The program is not being run.
[1]    152976 IOT instruction (core dumped)  ./build/bin/llama-train-text-from-scratch --vocab-model  --ctx 256 --embd 256

I find it ironic that I finally figured this out and this happens right around the timing of this PR. lol. I have to laugh. Otherwise, I'll cry. All those hours spent towards getting here.

full output

21:20:25 | /mnt/valerie/forked/ggerganov/llama.cpp
(.venv) git:(mistral.cpp | Δ) λ ./build/bin/llama-train-text-from-scratch \
    --vocab-model models/ggml-vocab-mistral.gguf \
    --ctx 256 --embd 256 --head 8 --layer 16 \
    --checkpoint-in  /mnt/valerie/models/teleprint-me/valerie/v0.3/chk-valerie-v0.3-256x16-LATEST.gguf \
    --checkpoint-out /mnt/valerie/models/teleprint-me/valerie/v0.3/chk-valerie-v0.3-256x16-ITERATION.gguf \
    --model-out /mnt/valerie/models/teleprint-me/valerie/v0.3/ggml-valerie-v0.3-256x16-f16-ITERATION.gguf \
    --train-data "/mnt/valerie/datasets/valerie/cyberpunk/wiki/wiki-combined.md" \
    -t 16 -b 16 --seed 1 --adam-iter 2000 \
    --save-every 250 --n-gpu-layers 17
main: seed: 1
llama_model_loader: loaded meta data with 32 key-value pairs and 0 tensors from models/ggml-vocab-mistral.gguf (version GGUF V3 (latest))
llama_model_loader: Dumping metadata keys/values. Note: KV overrides do not apply in this output.
llama_model_loader: - kv   0:                       general.architecture str              = llama
llama_model_loader: - kv   1:                               general.type str              = model
llama_model_loader: - kv   2:                               general.name str              = Mistral 7B Instruct v0.3
llama_model_loader: - kv   3:                            general.version str              = v0.3
llama_model_loader: - kv   4:                           general.finetune str              = Instruct
llama_model_loader: - kv   5:                           general.basename str              = Mistral
llama_model_loader: - kv   6:                         general.size_label str              = 7B
llama_model_loader: - kv   7:                            general.license str              = apache-2.0
llama_model_loader: - kv   8:                          llama.block_count u32              = 32
llama_model_loader: - kv   9:                       llama.context_length u32              = 32768
llama_model_loader: - kv  10:                     llama.embedding_length u32              = 4096
llama_model_loader: - kv  11:                  llama.feed_forward_length u32              = 14336
llama_model_loader: - kv  12:                 llama.attention.head_count u32              = 32
llama_model_loader: - kv  13:              llama.attention.head_count_kv u32              = 8
llama_model_loader: - kv  14:                       llama.rope.freq_base f32              = 1000000.000000
llama_model_loader: - kv  15:     llama.attention.layer_norm_rms_epsilon f32              = 0.000010
llama_model_loader: - kv  16:                          general.file_type u32              = 1
llama_model_loader: - kv  17:                           llama.vocab_size u32              = 32768
llama_model_loader: - kv  18:                 llama.rope.dimension_count u32              = 128
llama_model_loader: - kv  19:            tokenizer.ggml.add_space_prefix bool             = true
llama_model_loader: - kv  20:                       tokenizer.ggml.model str              = llama
llama_model_loader: - kv  21:                         tokenizer.ggml.pre str              = default
llama_model_loader: - kv  22:                      tokenizer.ggml.tokens arr[str,32768]   = ["<unk>", "<s>", "</s>", "[INST]", "[...
llama_model_loader: - kv  23:                      tokenizer.ggml.scores arr[f32,32768]   = [-1000.000000, -1000.000000, -1000.00...
llama_model_loader: - kv  24:                  tokenizer.ggml.token_type arr[i32,32768]   = [3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, ...
llama_model_loader: - kv  25:                tokenizer.ggml.bos_token_id u32              = 1
llama_model_loader: - kv  26:                tokenizer.ggml.eos_token_id u32              = 2
llama_model_loader: - kv  27:            tokenizer.ggml.unknown_token_id u32              = 0
llama_model_loader: - kv  28:               tokenizer.ggml.add_bos_token bool             = true
llama_model_loader: - kv  29:               tokenizer.ggml.add_eos_token bool             = false
llama_model_loader: - kv  30:                    tokenizer.chat_template str              = {{ bos_token }}{% for message in mess...
llama_model_loader: - kv  31:               general.quantization_version u32              = 2
llm_load_vocab: special tokens cache size = 771
llm_load_vocab: token to piece cache size = 0.1731 MB
llm_load_print_meta: format           = GGUF V3 (latest)
llm_load_print_meta: arch             = llama
llm_load_print_meta: vocab type       = SPM
llm_load_print_meta: n_vocab          = 32768
llm_load_print_meta: n_merges         = 0
llm_load_print_meta: vocab_only       = 1
llm_load_print_meta: model type       = ?B
llm_load_print_meta: model ftype      = all F32
llm_load_print_meta: model params     = 0.00 K
llm_load_print_meta: model size       = 0.00 MiB (-nan BPW) 
llm_load_print_meta: general.name     = Mistral 7B Instruct v0.3
llm_load_print_meta: BOS token        = 1 '<s>'
llm_load_print_meta: EOS token        = 2 '</s>'
llm_load_print_meta: UNK token        = 0 '<unk>'
llm_load_print_meta: LF token         = 781 '<0x0A>'
llm_load_print_meta: max token length = 48
llama_model_load: vocab only - skipping tensors
llama_new_context_with_model: n_ctx      = 512
llama_new_context_with_model: n_batch    = 512
llama_new_context_with_model: n_ubatch   = 512
llama_new_context_with_model: flash_attn = 0
llama_new_context_with_model: freq_base  = 0.0
llama_new_context_with_model: freq_scale = 1
main: init model
gguf_init_from_file: failed to open '/mnt/valerie/models/teleprint-me/valerie/v0.3/chk-valerie-v0.3-256x16-LATEST.gguf': 'No such file or directory'
ggml_vk_instance_init()
ggml_vulkan: Found 1 Vulkan devices:
ggml_vk_print_gpu_info(0)
Vulkan0: AMD Radeon RX 7600 XT (RADV NAVI33) (radv) | uma: 0 | fp16: 1 | warp size: 64
ggml_vk_get_device(0)
Initializing new vk_device
ggml_vk_find_queue_family_index()
ggml_vk_find_queue_family_index()
ggml_vk_create_queue()
ggml_vk_load_shaders(AMD Radeon RX 7600 XT (RADV NAVI33))
ggml_vk_create_pipeline(AMD Radeon RX 7600 XT (RADV NAVI33), matmul_f32_l, main, 3, 56, (128,128,1), specialization_constants, 1)
ggml_vk_create_pipeline(AMD Radeon RX 7600 XT (RADV NAVI33), matmul_f32_m, main, 3, 56, (64,64,1), specialization_constants, 1)
ggml_vk_create_pipeline(AMD Radeon RX 7600 XT (RADV NAVI33), matmul_f32_s, main, 3, 56, (32,32,1), specialization_constants, 1)
ggml_vk_create_pipeline(AMD Radeon RX 7600 XT (RADV NAVI33), matmul_f32_aligned_l, main, 3, 56, (128,128,1), specialization_constants, 128)
ggml_vk_create_pipeline(AMD Radeon RX 7600 XT (RADV NAVI33), matmul_f32_aligned_m, main, 3, 56, (64,64,1), specialization_constants, 64)
ggml_vk_create_pipeline(AMD Radeon RX 7600 XT (RADV NAVI33), matmul_f32_aligned_s, main, 3, 56, (32,32,1), specialization_constants, 32)
ggml_vk_create_pipeline(AMD Radeon RX 7600 XT (RADV NAVI33), matmul_f32_f16_l, main, 3, 56, (128,128,1), specialization_constants, 1)
ggml_vk_create_pipeline(AMD Radeon RX 7600 XT (RADV NAVI33), matmul_f32_f16_m, main, 3, 56, (64,64,1), specialization_constants, 1)
ggml_vk_create_pipeline(AMD Radeon RX 7600 XT (RADV NAVI33), matmul_f32_f16_s, main, 3, 56, (32,32,1), specialization_constants, 1)
ggml_vk_create_pipeline(AMD Radeon RX 7600 XT (RADV NAVI33), matmul_f32_f16_aligned_l, main, 3, 56, (128,128,1), specialization_constants, 128)
ggml_vk_create_pipeline(AMD Radeon RX 7600 XT (RADV NAVI33), matmul_f32_f16_aligned_m, main, 3, 56, (64,64,1), specialization_constants, 64)
ggml_vk_create_pipeline(AMD Radeon RX 7600 XT (RADV NAVI33), matmul_f32_f16_aligned_s, main, 3, 56, (32,32,1), specialization_constants, 32)
ggml_vk_create_pipeline(AMD Radeon RX 7600 XT (RADV NAVI33), matmul_f16_l, main, 3, 56, (128,128,1), specialization_constants, 1)
ggml_vk_create_pipeline(AMD Radeon RX 7600 XT (RADV NAVI33), matmul_f16_m, main, 3, 56, (64,64,1), specialization_constants, 1)
ggml_vk_create_pipeline(AMD Radeon RX 7600 XT (RADV NAVI33), matmul_f16_s, main, 3, 56, (32,32,1), specialization_constants, 1)
ggml_vk_create_pipeline(AMD Radeon RX 7600 XT (RADV NAVI33), matmul_f16_aligned_l, main, 3, 56, (128,128,1), specialization_constants, 128)
ggml_vk_create_pipeline(AMD Radeon RX 7600 XT (RADV NAVI33), matmul_f16_aligned_m, main, 3, 56, (64,64,1), specialization_constants, 64)
ggml_vk_create_pipeline(AMD Radeon RX 7600 XT (RADV NAVI33), matmul_f16_aligned_s, main, 3, 56, (32,32,1), specialization_constants, 32)
ggml_vk_create_pipeline(AMD Radeon RX 7600 XT (RADV NAVI33), matmul_f16_f32_l, main, 3, 56, (128,128,1), specialization_constants, 1)
ggml_vk_create_pipeline(AMD Radeon RX 7600 XT (RADV NAVI33), matmul_f16_f32_m, main, 3, 56, (64,64,1), specialization_constants, 1)
ggml_vk_create_pipeline(AMD Radeon RX 7600 XT (RADV NAVI33), matmul_f16_f32_s, main, 3, 56, (32,32,1), specialization_constants, 1)
ggml_vk_create_pipeline(AMD Radeon RX 7600 XT (RADV NAVI33), matmul_f16_f32_aligned_l, main, 3, 56, (128,128,1), specialization_constants, 128)
ggml_vk_create_pipeline(AMD Radeon RX 7600 XT (RADV NAVI33), matmul_f16_f32_aligned_m, main, 3, 56, (64,64,1), specialization_constants, 64)
ggml_vk_create_pipeline(AMD Radeon RX 7600 XT (RADV NAVI33), matmul_f16_f32_aligned_s, main, 3, 56, (32,32,1), specialization_constants, 32)
ggml_vk_create_pipeline(AMD Radeon RX 7600 XT (RADV NAVI33), matmul_q4_0_f32_l, main, 3, 56, (128,128,1), specialization_constants, 128)
ggml_vk_create_pipeline(AMD Radeon RX 7600 XT (RADV NAVI33), matmul_q4_0_f32_m, main, 3, 56, (64,64,1), specialization_constants, 64)
ggml_vk_create_pipeline(AMD Radeon RX 7600 XT (RADV NAVI33), matmul_q4_0_f32_s, main, 3, 56, (32,32,1), specialization_constants, 32)
ggml_vk_create_pipeline(AMD Radeon RX 7600 XT (RADV NAVI33), matmul_q4_0_f32_aligned_l, main, 3, 56, (128,128,1), specialization_constants, 128)
ggml_vk_create_pipeline(AMD Radeon RX 7600 XT (RADV NAVI33), matmul_q4_0_f32_aligned_m, main, 3, 56, (64,64,1), specialization_constants, 64)
ggml_vk_create_pipeline(AMD Radeon RX 7600 XT (RADV NAVI33), matmul_q4_0_f32_aligned_s, main, 3, 56, (32,32,1), specialization_constants, 32)
ggml_vk_create_pipeline(AMD Radeon RX 7600 XT (RADV NAVI33), matmul_q4_1_f32_l, main, 3, 56, (128,128,1), specialization_constants, 128)
ggml_vk_create_pipeline(AMD Radeon RX 7600 XT (RADV NAVI33), matmul_q4_1_f32_m, main, 3, 56, (64,64,1), specialization_constants, 64)
ggml_vk_create_pipeline(AMD Radeon RX 7600 XT (RADV NAVI33), matmul_q4_1_f32_s, main, 3, 56, (32,32,1), specialization_constants, 32)
ggml_vk_create_pipeline(AMD Radeon RX 7600 XT (RADV NAVI33), matmul_q4_1_f32_aligned_l, main, 3, 56, (128,128,1), specialization_constants, 128)
ggml_vk_create_pipeline(AMD Radeon RX 7600 XT (RADV NAVI33), matmul_q4_1_f32_aligned_m, main, 3, 56, (64,64,1), specialization_constants, 64)
ggml_vk_create_pipeline(AMD Radeon RX 7600 XT (RADV NAVI33), matmul_q4_1_f32_aligned_s, main, 3, 56, (32,32,1), specialization_constants, 32)
ggml_vk_create_pipeline(AMD Radeon RX 7600 XT (RADV NAVI33), matmul_q5_0_f32_l, main, 3, 56, (128,128,1), specialization_constants, 128)
ggml_vk_create_pipeline(AMD Radeon RX 7600 XT (RADV NAVI33), matmul_q5_0_f32_m, main, 3, 56, (64,64,1), specialization_constants, 64)
ggml_vk_create_pipeline(AMD Radeon RX 7600 XT (RADV NAVI33), matmul_q5_0_f32_s, main, 3, 56, (32,32,1), specialization_constants, 32)
ggml_vk_create_pipeline(AMD Radeon RX 7600 XT (RADV NAVI33), matmul_q5_0_f32_aligned_l, main, 3, 56, (128,128,1), specialization_constants, 128)
ggml_vk_create_pipeline(AMD Radeon RX 7600 XT (RADV NAVI33), matmul_q5_0_f32_aligned_m, main, 3, 56, (64,64,1), specialization_constants, 64)
ggml_vk_create_pipeline(AMD Radeon RX 7600 XT (RADV NAVI33), matmul_q5_0_f32_aligned_s, main, 3, 56, (32,32,1), specialization_constants, 32)
ggml_vk_create_pipeline(AMD Radeon RX 7600 XT (RADV NAVI33), matmul_q5_1_f32_l, main, 3, 56, (128,128,1), specialization_constants, 128)
ggml_vk_create_pipeline(AMD Radeon RX 7600 XT (RADV NAVI33), matmul_q5_1_f32_m, main, 3, 56, (64,64,1), specialization_constants, 64)
ggml_vk_create_pipeline(AMD Radeon RX 7600 XT (RADV NAVI33), matmul_q5_1_f32_s, main, 3, 56, (32,32,1), specialization_constants, 32)
ggml_vk_create_pipeline(AMD Radeon RX 7600 XT (RADV NAVI33), matmul_q5_1_f32_aligned_l, main, 3, 56, (128,128,1), specialization_constants, 128)
ggml_vk_create_pipeline(AMD Radeon RX 7600 XT (RADV NAVI33), matmul_q5_1_f32_aligned_m, main, 3, 56, (64,64,1), specialization_constants, 64)
ggml_vk_create_pipeline(AMD Radeon RX 7600 XT (RADV NAVI33), matmul_q5_1_f32_aligned_s, main, 3, 56, (32,32,1), specialization_constants, 32)
ggml_vk_create_pipeline(AMD Radeon RX 7600 XT (RADV NAVI33), matmul_q8_0_f32_l, main, 3, 56, (128,128,1), specialization_constants, 128)
ggml_vk_create_pipeline(AMD Radeon RX 7600 XT (RADV NAVI33), matmul_q8_0_f32_m, main, 3, 56, (64,64,1), specialization_constants, 64)
ggml_vk_create_pipeline(AMD Radeon RX 7600 XT (RADV NAVI33), matmul_q8_0_f32_s, main, 3, 56, (32,32,1), specialization_constants, 32)
ggml_vk_create_pipeline(AMD Radeon RX 7600 XT (RADV NAVI33), matmul_q8_0_f32_aligned_l, main, 3, 56, (128,128,1), specialization_constants, 128)
ggml_vk_create_pipeline(AMD Radeon RX 7600 XT (RADV NAVI33), matmul_q8_0_f32_aligned_m, main, 3, 56, (64,64,1), specialization_constants, 64)
ggml_vk_create_pipeline(AMD Radeon RX 7600 XT (RADV NAVI33), matmul_q8_0_f32_aligned_s, main, 3, 56, (32,32,1), specialization_constants, 32)
ggml_vk_create_pipeline(AMD Radeon RX 7600 XT (RADV NAVI33), matmul_q2_k_f32_l, main, 3, 56, (128,128,1), specialization_constants, 128)
ggml_vk_create_pipeline(AMD Radeon RX 7600 XT (RADV NAVI33), matmul_q2_k_f32_m, main, 3, 56, (64,64,1), specialization_constants, 64)
ggml_vk_create_pipeline(AMD Radeon RX 7600 XT (RADV NAVI33), matmul_q2_k_f32_s, main, 3, 56, (32,32,1), specialization_constants, 32)
ggml_vk_create_pipeline(AMD Radeon RX 7600 XT (RADV NAVI33), matmul_q2_k_f32_aligned_l, main, 3, 56, (128,128,1), specialization_constants, 128)
ggml_vk_create_pipeline(AMD Radeon RX 7600 XT (RADV NAVI33), matmul_q2_k_f32_aligned_m, main, 3, 56, (64,64,1), specialization_constants, 64)
ggml_vk_create_pipeline(AMD Radeon RX 7600 XT (RADV NAVI33), matmul_q2_k_f32_aligned_s, main, 3, 56, (32,32,1), specialization_constants, 32)
ggml_vk_create_pipeline(AMD Radeon RX 7600 XT (RADV NAVI33), matmul_q3_k_f32_l, main, 3, 56, (128,128,1), specialization_constants, 128)
ggml_vk_create_pipeline(AMD Radeon RX 7600 XT (RADV NAVI33), matmul_q3_k_f32_m, main, 3, 56, (64,64,1), specialization_constants, 64)
ggml_vk_create_pipeline(AMD Radeon RX 7600 XT (RADV NAVI33), matmul_q3_k_f32_s, main, 3, 56, (32,32,1), specialization_constants, 32)
ggml_vk_create_pipeline(AMD Radeon RX 7600 XT (RADV NAVI33), matmul_q3_k_f32_aligned_l, main, 3, 56, (128,128,1), specialization_constants, 128)
ggml_vk_create_pipeline(AMD Radeon RX 7600 XT (RADV NAVI33), matmul_q3_k_f32_aligned_m, main, 3, 56, (64,64,1), specialization_constants, 64)
ggml_vk_create_pipeline(AMD Radeon RX 7600 XT (RADV NAVI33), matmul_q3_k_f32_aligned_s, main, 3, 56, (32,32,1), specialization_constants, 32)
ggml_vk_create_pipeline(AMD Radeon RX 7600 XT (RADV NAVI33), matmul_q4_k_f32_l, main, 3, 56, (128,128,1), specialization_constants, 128)
ggml_vk_create_pipeline(AMD Radeon RX 7600 XT (RADV NAVI33), matmul_q4_k_f32_m, main, 3, 56, (64,64,1), specialization_constants, 64)
ggml_vk_create_pipeline(AMD Radeon RX 7600 XT (RADV NAVI33), matmul_q4_k_f32_s, main, 3, 56, (32,32,1), specialization_constants, 32)
ggml_vk_create_pipeline(AMD Radeon RX 7600 XT (RADV NAVI33), matmul_q4_k_f32_aligned_l, main, 3, 56, (128,128,1), specialization_constants, 128)
ggml_vk_create_pipeline(AMD Radeon RX 7600 XT (RADV NAVI33), matmul_q4_k_f32_aligned_m, main, 3, 56, (64,64,1), specialization_constants, 64)
ggml_vk_create_pipeline(AMD Radeon RX 7600 XT (RADV NAVI33), matmul_q4_k_f32_aligned_s, main, 3, 56, (32,32,1), specialization_constants, 32)
ggml_vk_create_pipeline(AMD Radeon RX 7600 XT (RADV NAVI33), matmul_q5_k_f32_l, main, 3, 56, (128,128,1), specialization_constants, 128)
ggml_vk_create_pipeline(AMD Radeon RX 7600 XT (RADV NAVI33), matmul_q5_k_f32_m, main, 3, 56, (64,64,1), specialization_constants, 64)
ggml_vk_create_pipeline(AMD Radeon RX 7600 XT (RADV NAVI33), matmul_q5_k_f32_s, main, 3, 56, (32,32,1), specialization_constants, 32)
ggml_vk_create_pipeline(AMD Radeon RX 7600 XT (RADV NAVI33), matmul_q5_k_f32_aligned_l, main, 3, 56, (128,128,1), specialization_constants, 128)
ggml_vk_create_pipeline(AMD Radeon RX 7600 XT (RADV NAVI33), matmul_q5_k_f32_aligned_m, main, 3, 56, (64,64,1), specialization_constants, 64)
ggml_vk_create_pipeline(AMD Radeon RX 7600 XT (RADV NAVI33), matmul_q5_k_f32_aligned_s, main, 3, 56, (32,32,1), specialization_constants, 32)
ggml_vk_create_pipeline(AMD Radeon RX 7600 XT (RADV NAVI33), matmul_q6_k_f32_l, main, 3, 56, (128,128,1), specialization_constants, 128)
ggml_vk_create_pipeline(AMD Radeon RX 7600 XT (RADV NAVI33), matmul_q6_k_f32_m, main, 3, 56, (64,64,1), specialization_constants, 64)
ggml_vk_create_pipeline(AMD Radeon RX 7600 XT (RADV NAVI33), matmul_q6_k_f32_s, main, 3, 56, (32,32,1), specialization_constants, 32)
ggml_vk_create_pipeline(AMD Radeon RX 7600 XT (RADV NAVI33), matmul_q6_k_f32_aligned_l, main, 3, 56, (128,128,1), specialization_constants, 128)
ggml_vk_create_pipeline(AMD Radeon RX 7600 XT (RADV NAVI33), matmul_q6_k_f32_aligned_m, main, 3, 56, (64,64,1), specialization_constants, 64)
ggml_vk_create_pipeline(AMD Radeon RX 7600 XT (RADV NAVI33), matmul_q6_k_f32_aligned_s, main, 3, 56, (32,32,1), specialization_constants, 32)
ggml_vk_create_pipeline(AMD Radeon RX 7600 XT (RADV NAVI33), matmul_iq4_nl_f32_l, main, 3, 56, (128,128,1), specialization_constants, 128)
ggml_vk_create_pipeline(AMD Radeon RX 7600 XT (RADV NAVI33), matmul_iq4_nl_f32_m, main, 3, 56, (64,64,1), specialization_constants, 64)
ggml_vk_create_pipeline(AMD Radeon RX 7600 XT (RADV NAVI33), matmul_iq4_nl_f32_s, main, 3, 56, (32,32,1), specialization_constants, 32)
ggml_vk_create_pipeline(AMD Radeon RX 7600 XT (RADV NAVI33), matmul_iq4_nl_f32_aligned_l, main, 3, 56, (128,128,1), specialization_constants, 128)
ggml_vk_create_pipeline(AMD Radeon RX 7600 XT (RADV NAVI33), matmul_iq4_nl_f32_aligned_m, main, 3, 56, (64,64,1), specialization_constants, 64)
ggml_vk_create_pipeline(AMD Radeon RX 7600 XT (RADV NAVI33), matmul_iq4_nl_f32_aligned_s, main, 3, 56, (32,32,1), specialization_constants, 32)
ggml_vk_create_pipeline(AMD Radeon RX 7600 XT (RADV NAVI33), matmul_id_f32_l, main, 4, 52, (128,128,1), specialization_constants, 1)
ggml_vk_create_pipeline(AMD Radeon RX 7600 XT (RADV NAVI33), matmul_id_f32_m, main, 4, 52, (64,64,1), specialization_constants, 1)
ggml_vk_create_pipeline(AMD Radeon RX 7600 XT (RADV NAVI33), matmul_id_f32_s, main, 4, 52, (32,32,1), specialization_constants, 1)
ggml_vk_create_pipeline(AMD Radeon RX 7600 XT (RADV NAVI33), matmul_id_f32_aligned_l, main, 4, 52, (128,128,1), specialization_constants, 128)
ggml_vk_create_pipeline(AMD Radeon RX 7600 XT (RADV NAVI33), matmul_id_f32_aligned_m, main, 4, 52, (64,64,1), specialization_constants, 64)
ggml_vk_create_pipeline(AMD Radeon RX 7600 XT (RADV NAVI33), matmul_id_f32_aligned_s, main, 4, 52, (32,32,1), specialization_constants, 32)
ggml_vk_create_pipeline(AMD Radeon RX 7600 XT (RADV NAVI33), matmul_id_f16_l, main, 4, 52, (128,128,1), specialization_constants, 1)
ggml_vk_create_pipeline(AMD Radeon RX 7600 XT (RADV NAVI33), matmul_id_f16_m, main, 4, 52, (64,64,1), specialization_constants, 1)
ggml_vk_create_pipeline(AMD Radeon RX 7600 XT (RADV NAVI33), matmul_id_f16_s, main, 4, 52, (32,32,1), specialization_constants, 1)
ggml_vk_create_pipeline(AMD Radeon RX 7600 XT (RADV NAVI33), matmul_id_f16_aligned_l, main, 4, 52, (128,128,1), specialization_constants, 128)
ggml_vk_create_pipeline(AMD Radeon RX 7600 XT (RADV NAVI33), matmul_id_f16_aligned_m, main, 4, 52, (64,64,1), specialization_constants, 64)
ggml_vk_create_pipeline(AMD Radeon RX 7600 XT (RADV NAVI33), matmul_id_f16_aligned_s, main, 4, 52, (32,32,1), specialization_constants, 32)
ggml_vk_create_pipeline(AMD Radeon RX 7600 XT (RADV NAVI33), matmul_id_f16_f32_l, main, 4, 52, (128,128,1), specialization_constants, 1)
ggml_vk_create_pipeline(AMD Radeon RX 7600 XT (RADV NAVI33), matmul_id_f16_f32_m, main, 4, 52, (64,64,1), specialization_constants, 1)
ggml_vk_create_pipeline(AMD Radeon RX 7600 XT (RADV NAVI33), matmul_id_f16_f32_s, main, 4, 52, (32,32,1), specialization_constants, 1)
ggml_vk_create_pipeline(AMD Radeon RX 7600 XT (RADV NAVI33), matmul_id_f16_f32_aligned_l, main, 4, 52, (128,128,1), specialization_constants, 128)
ggml_vk_create_pipeline(AMD Radeon RX 7600 XT (RADV NAVI33), matmul_id_f16_f32_aligned_m, main, 4, 52, (64,64,1), specialization_constants, 64)
ggml_vk_create_pipeline(AMD Radeon RX 7600 XT (RADV NAVI33), matmul_id_f16_f32_aligned_s, main, 4, 52, (32,32,1), specialization_constants, 32)
ggml_vk_create_pipeline(AMD Radeon RX 7600 XT (RADV NAVI33), matmul_id_q4_0_f32_l, main, 4, 52, (128,128,1), specialization_constants, 128)
ggml_vk_create_pipeline(AMD Radeon RX 7600 XT (RADV NAVI33), matmul_id_q4_0_f32_m, main, 4, 52, (64,64,1), specialization_constants, 64)
ggml_vk_create_pipeline(AMD Radeon RX 7600 XT (RADV NAVI33), matmul_id_q4_0_f32_s, main, 4, 52, (32,32,1), specialization_constants, 32)
ggml_vk_create_pipeline(AMD Radeon RX 7600 XT (RADV NAVI33), matmul_id_q4_0_f32_aligned_l, main, 4, 52, (128,128,1), specialization_constants, 128)
ggml_vk_create_pipeline(AMD Radeon RX 7600 XT (RADV NAVI33), matmul_id_q4_0_f32_aligned_m, main, 4, 52, (64,64,1), specialization_constants, 64)
ggml_vk_create_pipeline(AMD Radeon RX 7600 XT (RADV NAVI33), matmul_id_q4_0_f32_aligned_s, main, 4, 52, (32,32,1), specialization_constants, 32)
ggml_vk_create_pipeline(AMD Radeon RX 7600 XT (RADV NAVI33), matmul_id_q4_1_f32_l, main, 4, 52, (128,128,1), specialization_constants, 128)
ggml_vk_create_pipeline(AMD Radeon RX 7600 XT (RADV NAVI33), matmul_id_q4_1_f32_m, main, 4, 52, (64,64,1), specialization_constants, 64)
ggml_vk_create_pipeline(AMD Radeon RX 7600 XT (RADV NAVI33), matmul_id_q4_1_f32_s, main, 4, 52, (32,32,1), specialization_constants, 32)
ggml_vk_create_pipeline(AMD Radeon RX 7600 XT (RADV NAVI33), matmul_id_q4_1_f32_aligned_l, main, 4, 52, (128,128,1), specialization_constants, 128)
ggml_vk_create_pipeline(AMD Radeon RX 7600 XT (RADV NAVI33), matmul_id_q4_1_f32_aligned_m, main, 4, 52, (64,64,1), specialization_constants, 64)
ggml_vk_create_pipeline(AMD Radeon RX 7600 XT (RADV NAVI33), matmul_id_q4_1_f32_aligned_s, main, 4, 52, (32,32,1), specialization_constants, 32)
ggml_vk_create_pipeline(AMD Radeon RX 7600 XT (RADV NAVI33), matmul_id_q5_0_f32_l, main, 4, 52, (128,128,1), specialization_constants, 128)
ggml_vk_create_pipeline(AMD Radeon RX 7600 XT (RADV NAVI33), matmul_id_q5_0_f32_m, main, 4, 52, (64,64,1), specialization_constants, 64)
ggml_vk_create_pipeline(AMD Radeon RX 7600 XT (RADV NAVI33), matmul_id_q5_0_f32_s, main, 4, 52, (32,32,1), specialization_constants, 32)
ggml_vk_create_pipeline(AMD Radeon RX 7600 XT (RADV NAVI33), matmul_id_q5_0_f32_aligned_l, main, 4, 52, (128,128,1), specialization_constants, 128)
ggml_vk_create_pipeline(AMD Radeon RX 7600 XT (RADV NAVI33), matmul_id_q5_0_f32_aligned_m, main, 4, 52, (64,64,1), specialization_constants, 64)
ggml_vk_create_pipeline(AMD Radeon RX 7600 XT (RADV NAVI33), matmul_id_q5_0_f32_aligned_s, main, 4, 52, (32,32,1), specialization_constants, 32)
ggml_vk_create_pipeline(AMD Radeon RX 7600 XT (RADV NAVI33), matmul_id_q5_1_f32_l, main, 4, 52, (128,128,1), specialization_constants, 128)
ggml_vk_create_pipeline(AMD Radeon RX 7600 XT (RADV NAVI33), matmul_id_q5_1_f32_m, main, 4, 52, (64,64,1), specialization_constants, 64)
ggml_vk_create_pipeline(AMD Radeon RX 7600 XT (RADV NAVI33), matmul_id_q5_1_f32_s, main, 4, 52, (32,32,1), specialization_constants, 32)
ggml_vk_create_pipeline(AMD Radeon RX 7600 XT (RADV NAVI33), matmul_id_q5_1_f32_aligned_l, main, 4, 52, (128,128,1), specialization_constants, 128)
ggml_vk_create_pipeline(AMD Radeon RX 7600 XT (RADV NAVI33), matmul_id_q5_1_f32_aligned_m, main, 4, 52, (64,64,1), specialization_constants, 64)
ggml_vk_create_pipeline(AMD Radeon RX 7600 XT (RADV NAVI33), matmul_id_q5_1_f32_aligned_s, main, 4, 52, (32,32,1), specialization_constants, 32)
ggml_vk_create_pipeline(AMD Radeon RX 7600 XT (RADV NAVI33), matmul_id_q8_0_f32_l, main, 4, 52, (128,128,1), specialization_constants, 128)
ggml_vk_create_pipeline(AMD Radeon RX 7600 XT (RADV NAVI33), matmul_id_q8_0_f32_m, main, 4, 52, (64,64,1), specialization_constants, 64)
ggml_vk_create_pipeline(AMD Radeon RX 7600 XT (RADV NAVI33), matmul_id_q8_0_f32_s, main, 4, 52, (32,32,1), specialization_constants, 32)
ggml_vk_create_pipeline(AMD Radeon RX 7600 XT (RADV NAVI33), matmul_id_q8_0_f32_aligned_l, main, 4, 52, (128,128,1), specialization_constants, 128)
ggml_vk_create_pipeline(AMD Radeon RX 7600 XT (RADV NAVI33), matmul_id_q8_0_f32_aligned_m, main, 4, 52, (64,64,1), specialization_constants, 64)
ggml_vk_create_pipeline(AMD Radeon RX 7600 XT (RADV NAVI33), matmul_id_q8_0_f32_aligned_s, main, 4, 52, (32,32,1), specialization_constants, 32)
ggml_vk_create_pipeline(AMD Radeon RX 7600 XT (RADV NAVI33), matmul_id_q2_k_f32_l, main, 4, 52, (128,128,1), specialization_constants, 128)
ggml_vk_create_pipeline(AMD Radeon RX 7600 XT (RADV NAVI33), matmul_id_q2_k_f32_m, main, 4, 52, (64,64,1), specialization_constants, 64)
ggml_vk_create_pipeline(AMD Radeon RX 7600 XT (RADV NAVI33), matmul_id_q2_k_f32_s, main, 4, 52, (32,32,1), specialization_constants, 32)
ggml_vk_create_pipeline(AMD Radeon RX 7600 XT (RADV NAVI33), matmul_id_q2_k_f32_aligned_l, main, 4, 52, (128,128,1), specialization_constants, 128)
ggml_vk_create_pipeline(AMD Radeon RX 7600 XT (RADV NAVI33), matmul_id_q2_k_f32_aligned_m, main, 4, 52, (64,64,1), specialization_constants, 64)
ggml_vk_create_pipeline(AMD Radeon RX 7600 XT (RADV NAVI33), matmul_id_q2_k_f32_aligned_s, main, 4, 52, (32,32,1), specialization_constants, 32)
ggml_vk_create_pipeline(AMD Radeon RX 7600 XT (RADV NAVI33), matmul_id_q3_k_f32_l, main, 4, 52, (128,128,1), specialization_constants, 128)
ggml_vk_create_pipeline(AMD Radeon RX 7600 XT (RADV NAVI33), matmul_id_q3_k_f32_m, main, 4, 52, (64,64,1), specialization_constants, 64)
ggml_vk_create_pipeline(AMD Radeon RX 7600 XT (RADV NAVI33), matmul_id_q3_k_f32_s, main, 4, 52, (32,32,1), specialization_constants, 32)
ggml_vk_create_pipeline(AMD Radeon RX 7600 XT (RADV NAVI33), matmul_id_q3_k_f32_aligned_l, main, 4, 52, (128,128,1), specialization_constants, 128)
ggml_vk_create_pipeline(AMD Radeon RX 7600 XT (RADV NAVI33), matmul_id_q3_k_f32_aligned_m, main, 4, 52, (64,64,1), specialization_constants, 64)
ggml_vk_create_pipeline(AMD Radeon RX 7600 XT (RADV NAVI33), matmul_id_q3_k_f32_aligned_s, main, 4, 52, (32,32,1), specialization_constants, 32)
ggml_vk_create_pipeline(AMD Radeon RX 7600 XT (RADV NAVI33), matmul_id_q4_k_f32_l, main, 4, 52, (128,128,1), specialization_constants, 128)
ggml_vk_create_pipeline(AMD Radeon RX 7600 XT (RADV NAVI33), matmul_id_q4_k_f32_m, main, 4, 52, (64,64,1), specialization_constants, 64)
ggml_vk_create_pipeline(AMD Radeon RX 7600 XT (RADV NAVI33), matmul_id_q4_k_f32_s, main, 4, 52, (32,32,1), specialization_constants, 32)
ggml_vk_create_pipeline(AMD Radeon RX 7600 XT (RADV NAVI33), matmul_id_q4_k_f32_aligned_l, main, 4, 52, (128,128,1), specialization_constants, 128)
ggml_vk_create_pipeline(AMD Radeon RX 7600 XT (RADV NAVI33), matmul_id_q4_k_f32_aligned_m, main, 4, 52, (64,64,1), specialization_constants, 64)
ggml_vk_create_pipeline(AMD Radeon RX 7600 XT (RADV NAVI33), matmul_id_q4_k_f32_aligned_s, main, 4, 52, (32,32,1), specialization_constants, 32)
ggml_vk_create_pipeline(AMD Radeon RX 7600 XT (RADV NAVI33), matmul_id_q5_k_f32_l, main, 4, 52, (128,128,1), specialization_constants, 128)
ggml_vk_create_pipeline(AMD Radeon RX 7600 XT (RADV NAVI33), matmul_id_q5_k_f32_m, main, 4, 52, (64,64,1), specialization_constants, 64)
ggml_vk_create_pipeline(AMD Radeon RX 7600 XT (RADV NAVI33), matmul_id_q5_k_f32_s, main, 4, 52, (32,32,1), specialization_constants, 32)
ggml_vk_create_pipeline(AMD Radeon RX 7600 XT (RADV NAVI33), matmul_id_q5_k_f32_aligned_l, main, 4, 52, (128,128,1), specialization_constants, 128)
ggml_vk_create_pipeline(AMD Radeon RX 7600 XT (RADV NAVI33), matmul_id_q5_k_f32_aligned_m, main, 4, 52, (64,64,1), specialization_constants, 64)
ggml_vk_create_pipeline(AMD Radeon RX 7600 XT (RADV NAVI33), matmul_id_q5_k_f32_aligned_s, main, 4, 52, (32,32,1), specialization_constants, 32)
ggml_vk_create_pipeline(AMD Radeon RX 7600 XT (RADV NAVI33), matmul_id_q6_k_f32_l, main, 4, 52, (128,128,1), specialization_constants, 128)
ggml_vk_create_pipeline(AMD Radeon RX 7600 XT (RADV NAVI33), matmul_id_q6_k_f32_m, main, 4, 52, (64,64,1), specialization_constants, 64)
ggml_vk_create_pipeline(AMD Radeon RX 7600 XT (RADV NAVI33), matmul_id_q6_k_f32_s, main, 4, 52, (32,32,1), specialization_constants, 32)
ggml_vk_create_pipeline(AMD Radeon RX 7600 XT (RADV NAVI33), matmul_id_q6_k_f32_aligned_l, main, 4, 52, (128,128,1), specialization_constants, 128)
ggml_vk_create_pipeline(AMD Radeon RX 7600 XT (RADV NAVI33), matmul_id_q6_k_f32_aligned_m, main, 4, 52, (64,64,1), specialization_constants, 64)
ggml_vk_create_pipeline(AMD Radeon RX 7600 XT (RADV NAVI33), matmul_id_q6_k_f32_aligned_s, main, 4, 52, (32,32,1), specialization_constants, 32)
ggml_vk_create_pipeline(AMD Radeon RX 7600 XT (RADV NAVI33), matmul_id_iq4_nl_f32_l, main, 4, 52, (128,128,1), specialization_constants, 128)
ggml_vk_create_pipeline(AMD Radeon RX 7600 XT (RADV NAVI33), matmul_id_iq4_nl_f32_m, main, 4, 52, (64,64,1), specialization_constants, 64)
ggml_vk_create_pipeline(AMD Radeon RX 7600 XT (RADV NAVI33), matmul_id_iq4_nl_f32_s, main, 4, 52, (32,32,1), specialization_constants, 32)
ggml_vk_create_pipeline(AMD Radeon RX 7600 XT (RADV NAVI33), matmul_id_iq4_nl_f32_aligned_l, main, 4, 52, (128,128,1), specialization_constants, 128)
ggml_vk_create_pipeline(AMD Radeon RX 7600 XT (RADV NAVI33), matmul_id_iq4_nl_f32_aligned_m, main, 4, 52, (64,64,1), specialization_constants, 64)
ggml_vk_create_pipeline(AMD Radeon RX 7600 XT (RADV NAVI33), matmul_id_iq4_nl_f32_aligned_s, main, 4, 52, (32,32,1), specialization_constants, 32)
ggml_vk_create_pipeline(AMD Radeon RX 7600 XT (RADV NAVI33), mul_mat_vec_f32_f32_f32, main, 3, 44, (1,1,1), specialization_constants, 1)
ggml_vk_create_pipeline(AMD Radeon RX 7600 XT (RADV NAVI33), mul_mat_vec_f16_f32_f32, main, 3, 44, (1,1,1), specialization_constants, 1)
ggml_vk_create_pipeline(AMD Radeon RX 7600 XT (RADV NAVI33), mul_mat_vec_q4_0_f32_f32, main, 3, 44, (1,1,1), specialization_constants, 1)
ggml_vk_create_pipeline(AMD Radeon RX 7600 XT (RADV NAVI33), mul_mat_vec_q4_1_f32_f32, main, 3, 44, (1,1,1), specialization_constants, 1)
ggml_vk_create_pipeline(AMD Radeon RX 7600 XT (RADV NAVI33), mul_mat_vec_q5_0_f32_f32, main, 3, 44, (1,1,1), specialization_constants, 1)
ggml_vk_create_pipeline(AMD Radeon RX 7600 XT (RADV NAVI33), mul_mat_vec_q5_1_f32_f32, main, 3, 44, (1,1,1), specialization_constants, 1)
ggml_vk_create_pipeline(AMD Radeon RX 7600 XT (RADV NAVI33), mul_mat_vec_q8_0_f32_f32, main, 3, 44, (1,1,1), specialization_constants, 1)
ggml_vk_create_pipeline(AMD Radeon RX 7600 XT (RADV NAVI33), mul_mat_vec_q2_k_f32_f32, main, 3, 44, (1,1,1), specialization_constants, 1)
ggml_vk_create_pipeline(AMD Radeon RX 7600 XT (RADV NAVI33), mul_mat_vec_q3_k_f32_f32, main, 3, 44, (1,1,1), specialization_constants, 1)
ggml_vk_create_pipeline(AMD Radeon RX 7600 XT (RADV NAVI33), mul_mat_vec_q4_k_f32_f32, main, 3, 44, (1,1,1), specialization_constants, 1)
ggml_vk_create_pipeline(AMD Radeon RX 7600 XT (RADV NAVI33), mul_mat_vec_q5_k_f32_f32, main, 3, 44, (1,1,1), specialization_constants, 1)
ggml_vk_create_pipeline(AMD Radeon RX 7600 XT (RADV NAVI33), mul_mat_vec_q6_k_f32_f32, main, 3, 44, (1,1,1), specialization_constants, 1)
ggml_vk_create_pipeline(AMD Radeon RX 7600 XT (RADV NAVI33), mul_mat_vec_iq4_nl_f32_f32, main, 3, 44, (1,1,1), specialization_constants, 1)
ggml_vk_create_pipeline(AMD Radeon RX 7600 XT (RADV NAVI33), mul_mat_vec_f32_f16_f32, main, 3, 44, (1,1,1), specialization_constants, 1)
ggml_vk_create_pipeline(AMD Radeon RX 7600 XT (RADV NAVI33), mul_mat_vec_f16_f16_f32, main, 3, 44, (1,1,1), specialization_constants, 1)
ggml_vk_create_pipeline(AMD Radeon RX 7600 XT (RADV NAVI33), mul_mat_vec_q4_0_f16_f32, main, 3, 44, (1,1,1), specialization_constants, 1)
ggml_vk_create_pipeline(AMD Radeon RX 7600 XT (RADV NAVI33), mul_mat_vec_q4_1_f16_f32, main, 3, 44, (1,1,1), specialization_constants, 1)
ggml_vk_create_pipeline(AMD Radeon RX 7600 XT (RADV NAVI33), mul_mat_vec_q5_0_f16_f32, main, 3, 44, (1,1,1), specialization_constants, 1)
ggml_vk_create_pipeline(AMD Radeon RX 7600 XT (RADV NAVI33), mul_mat_vec_q5_1_f16_f32, main, 3, 44, (1,1,1), specialization_constants, 1)
ggml_vk_create_pipeline(AMD Radeon RX 7600 XT (RADV NAVI33), mul_mat_vec_q8_0_f16_f32, main, 3, 44, (1,1,1), specialization_constants, 1)
ggml_vk_create_pipeline(AMD Radeon RX 7600 XT (RADV NAVI33), mul_mat_vec_q2_k_f16_f32, main, 3, 44, (1,1,1), specialization_constants, 1)
ggml_vk_create_pipeline(AMD Radeon RX 7600 XT (RADV NAVI33), mul_mat_vec_q3_k_f16_f32, main, 3, 44, (1,1,1), specialization_constants, 1)
ggml_vk_create_pipeline(AMD Radeon RX 7600 XT (RADV NAVI33), mul_mat_vec_q4_k_f16_f32, main, 3, 44, (1,1,1), specialization_constants, 1)
ggml_vk_create_pipeline(AMD Radeon RX 7600 XT (RADV NAVI33), mul_mat_vec_q5_k_f16_f32, main, 3, 44, (1,1,1), specialization_constants, 1)
ggml_vk_create_pipeline(AMD Radeon RX 7600 XT (RADV NAVI33), mul_mat_vec_q6_k_f16_f32, main, 3, 44, (1,1,1), specialization_constants, 1)
ggml_vk_create_pipeline(AMD Radeon RX 7600 XT (RADV NAVI33), mul_mat_vec_iq4_nl_f16_f32, main, 3, 44, (1,1,1), specialization_constants, 1)
ggml_vk_create_pipeline(AMD Radeon RX 7600 XT (RADV NAVI33), mul_mat_vec_id_f32_f32, main, 4, 36, (1,1,1), specialization_constants, 1)
ggml_vk_create_pipeline(AMD Radeon RX 7600 XT (RADV NAVI33), mul_mat_vec_id_f16_f32, main, 4, 36, (1,1,1), specialization_constants, 1)
ggml_vk_create_pipeline(AMD Radeon RX 7600 XT (RADV NAVI33), mul_mat_vec_id_q4_0_f32, main, 4, 36, (1,1,1), specialization_constants, 1)
ggml_vk_create_pipeline(AMD Radeon RX 7600 XT (RADV NAVI33), mul_mat_vec_id_q4_1_f32, main, 4, 36, (1,1,1), specialization_constants, 1)
ggml_vk_create_pipeline(AMD Radeon RX 7600 XT (RADV NAVI33), mul_mat_vec_id_q5_0_f32, main, 4, 36, (1,1,1), specialization_constants, 1)
ggml_vk_create_pipeline(AMD Radeon RX 7600 XT (RADV NAVI33), mul_mat_vec_id_q5_1_f32, main, 4, 36, (1,1,1), specialization_constants, 1)
ggml_vk_create_pipeline(AMD Radeon RX 7600 XT (RADV NAVI33), mul_mat_vec_id_q8_0_f32, main, 4, 36, (1,1,1), specialization_constants, 1)
ggml_vk_create_pipeline(AMD Radeon RX 7600 XT (RADV NAVI33), mul_mat_vec_id_q2_k_f32, main, 4, 36, (1,1,1), specialization_constants, 1)
ggml_vk_create_pipeline(AMD Radeon RX 7600 XT (RADV NAVI33), mul_mat_vec_id_q3_k_f32, main, 4, 36, (1,1,1), specialization_constants, 1)
ggml_vk_create_pipeline(AMD Radeon RX 7600 XT (RADV NAVI33), mul_mat_vec_id_q4_k_f32, main, 4, 36, (1,1,1), specialization_constants, 1)
ggml_vk_create_pipeline(AMD Radeon RX 7600 XT (RADV NAVI33), mul_mat_vec_id_q5_k_f32, main, 4, 36, (1,1,1), specialization_constants, 1)
ggml_vk_create_pipeline(AMD Radeon RX 7600 XT (RADV NAVI33), mul_mat_vec_id_q6_k_f32, main, 4, 36, (1,1,1), specialization_constants, 1)
ggml_vk_create_pipeline(AMD Radeon RX 7600 XT (RADV NAVI33), mul_mat_vec_id_iq4_nl_f32, main, 4, 36, (1,1,1), specialization_constants, 1)
ggml_vk_create_pipeline(AMD Radeon RX 7600 XT (RADV NAVI33), f32_to_f16, main, 2, 20, (4096,1,1), specialization_constants, 1)
ggml_vk_create_pipeline(AMD Radeon RX 7600 XT (RADV NAVI33), dequant_q4_0, main, 2, 20, (4096,1,1), specialization_constants, 1)
ggml_vk_create_pipeline(AMD Radeon RX 7600 XT (RADV NAVI33), dequant_q4_1, main, 2, 20, (4096,1,1), specialization_constants, 1)
ggml_vk_create_pipeline(AMD Radeon RX 7600 XT (RADV NAVI33), dequant_q5_0, main, 2, 20, (4096,1,1), specialization_constants, 1)
ggml_vk_create_pipeline(AMD Radeon RX 7600 XT (RADV NAVI33), dequant_q5_1, main, 2, 20, (4096,1,1), specialization_constants, 1)
ggml_vk_create_pipeline(AMD Radeon RX 7600 XT (RADV NAVI33), dequant_q8_0, main, 2, 20, (4096,1,1), specialization_constants, 1)
ggml_vk_create_pipeline(AMD Radeon RX 7600 XT (RADV NAVI33), dequant_q2_k, main, 2, 20, (16384,1,1), specialization_constants, 1)
ggml_vk_create_pipeline(AMD Radeon RX 7600 XT (RADV NAVI33), dequant_q3_k, main, 2, 20, (16384,1,1), specialization_constants, 1)
ggml_vk_create_pipeline(AMD Radeon RX 7600 XT (RADV NAVI33), dequant_q4_k, main, 2, 20, (8192,1,1), specialization_constants, 1)
ggml_vk_create_pipeline(AMD Radeon RX 7600 XT (RADV NAVI33), dequant_q5_k, main, 2, 20, (16384,1,1), specialization_constants, 1)
ggml_vk_create_pipeline(AMD Radeon RX 7600 XT (RADV NAVI33), dequant_q6_k, main, 2, 20, (16384,1,1), specialization_constants, 1)
ggml_vk_create_pipeline(AMD Radeon RX 7600 XT (RADV NAVI33), dequant_iq4_nl, main, 2, 20, (4096,1,1), specialization_constants, 1)
ggml_vk_create_pipeline(AMD Radeon RX 7600 XT (RADV NAVI33), get_rows_f32, main, 3, 112, (512,1,1), specialization_constants, 1)
ggml_vk_create_pipeline(AMD Radeon RX 7600 XT (RADV NAVI33), get_rows_f16, main, 3, 112, (512,1,1), specialization_constants, 1)
ggml_vk_create_pipeline(AMD Radeon RX 7600 XT (RADV NAVI33), get_rows_q4_0, main, 3, 112, (1024,1,1), specialization_constants, 1)
ggml_vk_create_pipeline(AMD Radeon RX 7600 XT (RADV NAVI33), get_rows_q4_1, main, 3, 112, (1024,1,1), specialization_constants, 1)
ggml_vk_create_pipeline(AMD Radeon RX 7600 XT (RADV NAVI33), get_rows_q5_0, main, 3, 112, (1024,1,1), specialization_constants, 1)
ggml_vk_create_pipeline(AMD Radeon RX 7600 XT (RADV NAVI33), get_rows_q5_1, main, 3, 112, (1024,1,1), specialization_constants, 1)
ggml_vk_create_pipeline(AMD Radeon RX 7600 XT (RADV NAVI33), get_rows_q8_0, main, 3, 112, (1024,1,1), specialization_constants, 1)
ggml_vk_create_pipeline(AMD Radeon RX 7600 XT (RADV NAVI33), get_rows_iq4_nl, main, 3, 112, (1024,1,1), specialization_constants, 1)
ggml_vk_create_pipeline(AMD Radeon RX 7600 XT (RADV NAVI33), get_rows_f32_f32, main, 3, 112, (512,1,1), specialization_constants, 1)
ggml_vk_create_pipeline(AMD Radeon RX 7600 XT (RADV NAVI33), get_rows_f16_f32, main, 3, 112, (512,1,1), specialization_constants, 1)
ggml_vk_create_pipeline(AMD Radeon RX 7600 XT (RADV NAVI33), get_rows_q4_0_f32, main, 3, 112, (1024,1,1), specialization_constants, 1)
ggml_vk_create_pipeline(AMD Radeon RX 7600 XT (RADV NAVI33), get_rows_q4_1_f32, main, 3, 112, (1024,1,1), specialization_constants, 1)
ggml_vk_create_pipeline(AMD Radeon RX 7600 XT (RADV NAVI33), get_rows_q5_0_f32, main, 3, 112, (1024,1,1), specialization_constants, 1)
ggml_vk_create_pipeline(AMD Radeon RX 7600 XT (RADV NAVI33), get_rows_q5_1_f32, main, 3, 112, (1024,1,1), specialization_constants, 1)
ggml_vk_create_pipeline(AMD Radeon RX 7600 XT (RADV NAVI33), get_rows_q8_0_f32, main, 3, 112, (1024,1,1), specialization_constants, 1)
ggml_vk_create_pipeline(AMD Radeon RX 7600 XT (RADV NAVI33), get_rows_iq4_nl_f32, main, 3, 112, (1024,1,1), specialization_constants, 1)
ggml_vk_create_pipeline(AMD Radeon RX 7600 XT (RADV NAVI33), split_k_reduce, main, 2, 8, (256,1,1), specialization_constants, 1)
ggml_vk_create_pipeline(AMD Radeon RX 7600 XT (RADV NAVI33), mul_mat_vec_p021_f16_f32, main, 3, 24, (1,1,1), specialization_constants, 1)
ggml_vk_create_pipeline(AMD Radeon RX 7600 XT (RADV NAVI33), mul_mat_vec_nc_f16_f32, main, 3, 28, (1,1,1), specialization_constants, 1)
ggml_vk_create_pipeline(AMD Radeon RX 7600 XT (RADV NAVI33), norm_f32, main, 2, 16, (1,1,1), specialization_constants, 1)
ggml_vk_create_pipeline(AMD Radeon RX 7600 XT (RADV NAVI33), rms_norm_f32, main, 2, 16, (1,1,1), specialization_constants, 1)
ggml_vk_create_pipeline(AMD Radeon RX 7600 XT (RADV NAVI33), cpy_f32_f32, main, 2, 80, (512,1,1), specialization_constants, 1)
ggml_vk_create_pipeline(AMD Radeon RX 7600 XT (RADV NAVI33), cpy_f32_f16, main, 2, 80, (512,1,1), specialization_constants, 1)
ggml_vk_create_pipeline(AMD Radeon RX 7600 XT (RADV NAVI33), cpy_f16_f16, main, 2, 80, (512,1,1), specialization_constants, 1)
ggml_vk_create_pipeline(AMD Radeon RX 7600 XT (RADV NAVI33), add_f32, main, 3, 112, (512,1,1), specialization_constants, 1)
ggml_vk_create_pipeline(AMD Radeon RX 7600 XT (RADV NAVI33), mul_f32, main, 3, 112, (512,1,1), specialization_constants, 1)
ggml_vk_create_pipeline(AMD Radeon RX 7600 XT (RADV NAVI33), div_f32, main, 3, 112, (512,1,1), specialization_constants, 1)
ggml_vk_create_pipeline(AMD Radeon RX 7600 XT (RADV NAVI33), scale_f32, main, 2, 80, (512,1,1), specialization_constants, 1)
ggml_vk_create_pipeline(AMD Radeon RX 7600 XT (RADV NAVI33), sqr_f32, main, 2, 80, (512,1,1), specialization_constants, 1)
ggml_vk_create_pipeline(AMD Radeon RX 7600 XT (RADV NAVI33), clamp_f32, main, 2, 80, (512,1,1), specialization_constants, 1)
ggml_vk_create_pipeline(AMD Radeon RX 7600 XT (RADV NAVI33), gelu_f32, main, 2, 16, (512,1,1), specialization_constants, 1)
ggml_vk_create_pipeline(AMD Radeon RX 7600 XT (RADV NAVI33), silu_f32, main, 2, 16, (512,1,1), specialization_constants, 1)
ggml_vk_create_pipeline(AMD Radeon RX 7600 XT (RADV NAVI33), relu_f32, main, 2, 16, (512,1,1), specialization_constants, 1)
ggml_vk_create_pipeline(AMD Radeon RX 7600 XT (RADV NAVI33), diag_mask_inf_f32, main, 2, 12, (512,1,1), specialization_constants, 1)
ggml_vk_create_pipeline(AMD Radeon RX 7600 XT (RADV NAVI33), soft_max_f32, main, 3, 28, (1,1,1), specialization_constants, 1)
ggml_vk_create_pipeline(AMD Radeon RX 7600 XT (RADV NAVI33), soft_max_f32_f16, main, 3, 28, (1,1,1), specialization_constants, 1)
ggml_vk_create_pipeline(AMD Radeon RX 7600 XT (RADV NAVI33), rope_norm_f32, main, 4, 44, (1,512,1), specialization_constants, 1)
ggml_vk_create_pipeline(AMD Radeon RX 7600 XT (RADV NAVI33), rope_norm_f16, main, 4, 44, (1,512,1), specialization_constants, 1)
ggml_vk_create_pipeline(AMD Radeon RX 7600 XT (RADV NAVI33), rope_neox_f32, main, 4, 44, (1,512,1), specialization_constants, 1)
ggml_vk_create_pipeline(AMD Radeon RX 7600 XT (RADV NAVI33), rope_neox_f16, main, 4, 44, (1,512,1), specialization_constants, 1)
ggml_vk_create_pipeline(AMD Radeon RX 7600 XT (RADV NAVI33), argsort_f32, main, 2, 12, (1024,1,1), specialization_constants, 1)
ggml_vk_create_pipeline(AMD Radeon RX 7600 XT (RADV NAVI33), sum_rows_f32, main, 2, 16, (1,1,1), specialization_constants, 1)
ggml_vk_create_queue()
ggml_vulkan memory: ggml_backend_vk_host_buffer_type_alloc_buffer(243539968)
ggml_vulkan memory: ggml_vk_host_malloc(243540000)
ggml_vk_create_buffer(AMD Radeon RX 7600 XT (RADV NAVI33), 243540000, { HostVisible | HostCoherent | HostCached }, { HostVisible | HostCoherent })
print_params: n_vocab: 32768
print_params: n_ctx:   256
print_params: n_embd:  256
print_params: n_head:  8
print_params: n_ff:    768
print_params: n_layer: 16
print_params: n_rot:   32
main: total train_iterations 0
main: seen train_samples     0
main: seen train_tokens      0
main: completed train_epochs 0
main: model_size = 243648192 bytes (232.4 MB)
main: opt_size  = 365006976 bytes (348.1 MB)
main: opt iter 0
ggml_vk_get_device(0)
ggml_vulkan memory: ggml_backend_vk_host_buffer_type_alloc_buffer(536887296)
ggml_vulkan memory: ggml_vk_host_malloc(536887328)
ggml_vk_create_buffer(AMD Radeon RX 7600 XT (RADV NAVI33), 536887328, { HostVisible | HostCoherent | HostCached }, { HostVisible | HostCoherent })
main: input_size = 536887328 bytes (512.0 MB)
ggml_vk_get_device(0)
/mnt/valerie/forked/ggerganov/llama.cpp/ggml/src/ggml.c:17147: GGML_ASSERT(replacements->set.keys[k] == NULL) failed
ptrace: Operation not permitted.
No stack.
The program is not being run.
[1]    152976 IOT instruction (core dumped)  ./build/bin/llama-train-text-from-scratch --vocab-model  --ctx 256 --embd 256

If I can get this working and there's concensus, I can open a PR with the updates and upgrades to the train text from scratch. I haven't used the finetune yet, but I could take a peek at it too.

I've been saying this for awhile, I am interested in training and fine-tuning with this framework. Lot of low hanging fruit here with this. I have so many ideas, but I'm one person with limited time and resources. Can only do so much at once.

@mounta11n I believe @xaedes is the original author of the training and tuning code. It's in the history for the commits.

If anyone is wondering, this is me offering to maintain it when I can.

jboero · 2024-08-05T18:10:53Z

ptrace: Operation not permitted.

It would probably help if you can enable ptrace on your system. Or if you can step through it in a debugger. Any update to your drivers or Vulkan stack in the same timeframe?

teleprint-me · 2024-08-07T18:58:10Z

ptrace is not safe, so I intentionally leave it disabled.

You can enable debugging with the build and step through with gdb which is what I've been doing.

cmake -B build -DCMAKE_BUILD_TYPE=Debug -DGGML_VULKAN=1 -DGGML_VULKAN_DEBUG=0 -DLLAMA_CURL=0 -DGGML_CCACHE=0
cmake --build build --config Debug -j 16

I haven't worked on it since this post, but the backtrace is simple enough.

gdb -ex=run --args build/bin/llama-train-text-from-scratch \
    --vocab-model models/ggml-vocab-mistral.gguf \
    --ctx 256 --embd 256 --head 8 --layer 16 \
    --checkpoint-in  /mnt/valerie/models/teleprint-me/valerie/v0.3/chk-valerie-v0.3-256x16-LATEST.gguf \
    --checkpoint-out /mnt/valerie/models/teleprint-me/valerie/v0.3/chk-valerie-v0.3-256x16-ITERATION.gguf \
    --model-out /mnt/valerie/models/teleprint-me/valerie/v0.3/ggml-valerie-v0.3-256x16-f16-ITERATION.gguf \
    --train-data "/mnt/valerie/datasets/valerie/cyberpunk/wiki/wiki-combined.md" \
    -t 16 -b 8 --seed 1 --adam-iter 50 \
    --save-every 10 --n-gpu-layers 17

Note that this issue has nothing to do with vulkan. It will trigger with a CPU only build.

Running the backtrace with gdb is super simple.

0x00007ffff6ea53f4 in ?? () from /usr/lib/libc.so.6
(gdb) bt
#0  0x00007ffff6ea53f4 in ?? () from /usr/lib/libc.so.6
#1  0x00007ffff6e4c120 in raise () from /usr/lib/libc.so.6
#2  0x00007ffff6e334c3 in abort () from /usr/lib/libc.so.6
#3  0x00007ffff74abfec in ggml_abort (file=0x7ffff763e1a8 "/mnt/valerie/forked/ggerganov/llama.cpp/ggml/src/ggml.c", line=17147, 
    fmt=0x7ffff763e436 "GGML_ASSERT(%s) failed") at /mnt/valerie/forked/ggerganov/llama.cpp/ggml/src/ggml.c:207
#4  0x00007ffff74e598a in ggml_build_backward_gradient_checkpointing (ctx=0x7ffff78f46c8 <g_state+296>, gf=0x7fffe8f99030, gb=0x7fffe903a0c0, 
    gb_tmp=0x7fffe90db150, checkpoints=0x555555876e10, n_checkpoints=26) at /mnt/valerie/forked/ggerganov/llama.cpp/ggml/src/ggml.c:17147
#5  0x00005555555bb16a in llama_build_train_graphs (model=0x7fffffffd690, alloc=0x5555562a8870, ctx=0x7ffff78f46c8 <g_state+296>, gf=0x7fffe8f99030, 
    gb=0x7fffe903a0c0, gb_tmp=0x7fffe90db150, logits=0x7fffffffd420, tokens_input=0x55555584b4c0, targets=0x55555584b630, n_tokens=256, n_batch=8, 
    enable_flash_attn=false, enable_checkpointing=true, measure_only=true)
    at /mnt/valerie/forked/ggerganov/llama.cpp/examples/train-text-from-scratch/train-text-from-scratch.cpp:402
#6  0x00005555555c07b9 in main (argc=31, argv=0x7fffffffda98)
    at /mnt/valerie/forked/ggerganov/llama.cpp/examples/train-text-from-scratch/train-text-from-scratch.cpp:1117
(gdb)

What I was able to discover was that commit 2b1f616 is where the breaking change occurs and that's because @slaren made some changes to the backend which seems to have affected the computational graph (the "dag"). I've done some cursory investigation into it, but put it on pause for a bit.

I've been taking some time off due to stress and personal/financial issues (seems counter-intuitive, but I won't be much use to anyone, including myself, if I burn out).

I'm also busy with gig work which is why I usually go MIA from time-to-time. I always check-in when I can, though.

WilliamTambellini · 2024-09-11T22:17:49Z

@ggerganov would you consider a PR if we fix/revamp finetune at least for llama2 and/or 3 ?

JohannesGaessler · 2024-09-12T06:50:12Z

Be aware that I'm currently working on training in general and that the API may change: ggerganov/ggml#949

ggerganov · 2024-09-12T07:00:18Z

I think given the ongoing work by @JohannesGaessler for general-purpose training capabilities it's better to not resurrect these examples yet and instead to wait for the new API and functionality to settle. Otherwise, we'll be dealing with too many conflicts and bug reports.

dgm3333 · 2024-11-03T06:57:36Z

Can I put in a large vote for the replacement finetuning example - I use this a lot and have just come back to do a new build to discover it's disappeared.
I would rather have a slow CPU only process which didn't work for Lllama3 (as implied in the PR which killed it) than nothing while waiting for an update!
Especially if as implied above @teleprint-me potentially resolved it.
BTW teleprint-me - do you have a branch with finetune working currently as I would like to build that for the moment - otherwise I will just stick with my old .exes
There are now a huge range of random llama examples, but this core functionality has been lost :-(

Incidentally shouldn't a PR removing a major feature have a bit more documented justification than this - it just points to this thread which doesn't appear have any conclusive support for removing it as not everyone could replicate the stated issues...
be6d7c0 it doesn't even point to what seems to be the original bug report #7643

I particularly love that the recommended "alternative" is to "use PEFT python if you want to finetune a model." This seems to be a python library which doesn't even function without someone learning about how the library functions and scratch writing a python application around it.

JohannesGaessler · 2024-11-03T09:08:52Z

The new GGML training code is mostly functional, see ggerganov/ggml#988 . The ETA for re-adding llama.cpp training support is 1-2 months I think.

teleprint-me · 2024-11-04T01:51:49Z

To be honest, I lost complete interest once this was removed and saw that there was little interest in keeping it around. I've reviewed the code and have read the docs, specifications, and codebase as a whole and one thing that @ngxson got right is that it is extremely complicated. It's made even more complicated now due to the fact that the original training code was removed.

I really appreciate the work that @JohannesGaessler is doing because it takes a certain level of dedication and persistence to keep pushing this forward. I understand why MNIST is being used, it's simpler. Much simpler. That's why my initial goal was to do an XOR model which is probably the simplest model you can create.

Due to the level of complexity involved which I recognize is due to the vast amount of support for multiple models, CPU, and GPU architectures, I decided to do something else instead. I rewound to the earliest commit possible for llama.cpp and began working on the original code that @ggerganov implemented which is way easier to parse and understand because it's only for LLaMa 1.

I have been working on this code for the past week and ended up creating my own model format in the process. I didn't originally intend to do this, but it made parsing the model file much simpler, coherent, and consistent without concerning myself with all of the complexity involved in the current code base.

I realized that there are issues with the LLaMa 1 tokenizer.model - it's possible I did something wrong, but the issues were consistent enough to prompt me to digress and focus completely on Mistral.

I chose Mistral for a variety of reasons as it uses a lot of modern features, the tokenizer.model is great and covers most use cases, and has probably the most permissive license as far as open models go. So, even though it is more complex as an architecture, I think I'm going to focus on the v0.1 model for now.

Once I get inference working on CPU, I'm going to implement a completely custom Vulkan only backend due to the fact that I need one in C for other purposes that I'm not ready to share. Even though @0cc4m has done amazing work with Vulkan backend, it's using C++ and they're already comfortable and familiar with the Vulkan API. After working on the fundementals in C, I was able to better understand the API as a result, so I think this path is useful for me as a learning experience.

So, my overarching goal of training and fine-tuning a LLM from scratch remains the same. My rationale for this is so that I can learn the fundamentals as I progress. I've noticed every time I do this and return to the ggml and llama.cpp code, I better comprehend what's happening and why certain design choices were made. @slaren I am truly impressed with your work and how you handled the complexity so well.

My overall hope is that I can provide more meaningful contributions in the future. As a result, once I get the basics up and running, I'm hoping to be able to export the custom model formats back to the modern GGML format so that they're compatible with inference.

As of now, it's too early to share anything. I will respect the MIT license and will open source what I've done once it's ready. This will allow me to learn at my own pace. If anyone is genuinely interested in my approach (which is nothing new), I'll be willing to open source it earlier than I originally planned on. The reason I haven't shared is because I don't want to release something that causes issues for the developers here, e.g. causing unnecessary confusion.

Sorry if this was long, I've had a rough go at it the past few years and this year has been especially difficult for me, and I tend to become a recluse because it's less stressful for me. I tend to be happier when I'm coding on my own and ignore the world around me. Gives me a chance to recharge if that makes any sense.

As a teaser - mostly on a more positive note and for fun, here's what I got so far.

22:32:32 | /mnt/valerie/forked/ggerganov/llama.cpp
(.venv) git:(alt.cpp | Δ) λ python -m gguf.mistral-to-gguf -m models/mistral-1/7B -c "config.json" -t 1
Loading model part: pytorch_model-00001-of-00002.bin
Loading model part: pytorch_model-00002-of-00002.bin
FILE start marker starts at 0
FILE start marker ends at 4

Writing hyperparameters section...
Aligned offset with 28 bytes of padding.
FILE 0xdeadbeef starts at 32
FILE 0xdeadbeef ends at 40
FILE size of 68 starts at 40
FILE size of 68 ends at 48
Config start at 48
Config end at 112

Writing tokenizer section...
Aligned offset with 16 bytes of padding.
FILE 0xbaddcafe starts at 128
FILE 0xbaddcafe ends at 136
FILE size of 332690 starts at 136
FILE size of 332690 ends at 144
Tokenizer starts at 144
Tokenizer ends at 332834

Writing tensor section...
Aligned offset with 30 bytes of padding.
FILE 0xfacefeed starts at 332864
FILE 0xfacefeed ends at 332872
FILE size of 14483480854 starts at 332872
FILE size of 14483480854 ends at 332880
Tensor data starts at offset: 332880
Writing 291 tensors with a total of 517 shape elements
Processing model part 1
Processing model part 2
Tensor data ends at offset: 14483813718
Aligned offset with 10 bytes of padding.
FILE end marker starts at 14483813728
FILE end marker ends at 14483813732

Model successfully written to models/mistral-1/7B/ggml-model-f16.gguf

And I'm able to read the model in Python and in C.

20:35:42 | /mnt/valerie/forked/ggerganov/llama.cpp
(.venv) git:(alt.cpp | Δ) λ python -m gguf.validator models/mistral-1/7B/ggml-model-f16.gguf
Opened file models/mistral-1/7B/ggml-model-f16.gguf
Reading start marker: 0
Start marker read successfully: 0x67676d6c
Reading section marker: 4
Aligned offset with 28 bytes of padding.
Section marker: 0xdeadbeef, Size: 68
Reading model parameters...
context_length: 8192, hidden_size: 4096, num_hidden_layers: 32, intermediate_size: 14336, num_attention_heads: 32, num_key_value_heads: 8, sliding_window: 4096, rope_theta: 10000.0, rms_norm_eps: 9.999999747378752e-06, head_size: 128, dtype: 1
Reading section marker: 112
Aligned offset with 16 bytes of padding.
Section marker: 0xbaddcafe, Size: 332690
Reading tokenizer...
Vocab size: 32000
Special tokens - BOS ID: 1, EOS ID: 2, PAD ID: -1, UNK ID: 0
Token ID 0: <unk> (length 5)
Token ID 1: <s> (length 3)
Token ID 2: </s> (length 4)
Token ID 3: <0x00> (length 6)
Token ID 4: <0x01> (length 6)
Token ID 31996: 執 (length 3)
Token ID 31997: 벨 (length 3)
Token ID 31998: ゼ (length 3)
Token ID 31999: 梦 (length 3)
...truncated for brevity
Reading section marker: 332834
Aligned offset with 30 bytes of padding.
Section marker: 0xfacefeed, Size: 14483480854
Reading tensor data...
Tensor count: 291, Shape count: 517
Tensor 0: model.embed_tokens.weight, Shape: [32000, 4096], Dtype: 1
Tensor 1: model.layers.0.self_attn.q_proj.weight, Shape: [4096, 4096], Dtype: 1
Tensor 2: model.layers.0.self_attn.k_proj.weight, Shape: [1024, 4096], Dtype: 1
Tensor 3: model.layers.0.self_attn.v_proj.weight, Shape: [1024, 4096], Dtype: 1
Tensor 4: model.layers.0.self_attn.o_proj.weight, Shape: [4096, 4096], Dtype: 1
Tensor 5: model.layers.0.mlp.gate_proj.weight, Shape: [14336, 4096], Dtype: 1
Tensor 6: model.layers.0.mlp.up_proj.weight, Shape: [14336, 4096], Dtype: 1
Tensor 7: model.layers.0.mlp.down_proj.weight, Shape: [4096, 14336], Dtype: 1
Tensor 8: model.layers.0.input_layernorm.weight, Shape: [4096], Dtype: 1
Tensor 9: model.layers.0.post_attention_layernorm.weight, Shape: [4096], Dtype: 1
Tensor 282: model.layers.31.self_attn.v_proj.weight, Shape: [1024, 4096], Dtype: 1
Tensor 283: model.layers.31.self_attn.o_proj.weight, Shape: [4096, 4096], Dtype: 1
Tensor 284: model.layers.31.mlp.gate_proj.weight, Shape: [14336, 4096], Dtype: 1
Tensor 285: model.layers.31.mlp.up_proj.weight, Shape: [14336, 4096], Dtype: 1
Tensor 286: model.layers.31.mlp.down_proj.weight, Shape: [4096, 14336], Dtype: 1
Tensor 287: model.layers.31.input_layernorm.weight, Shape: [4096], Dtype: 1
Tensor 288: model.layers.31.post_attention_layernorm.weight, Shape: [4096], Dtype: 1
Tensor 289: model.norm.weight, Shape: [4096], Dtype: 1
Tensor 290: lm_head.weight, Shape: [32000, 4096], Dtype: 1
...truncated for brevity
Reading end marker: 14483813718
Aligned offset with 10 bytes of padding.
End marker read successfully: 0xfffffff
File closed.

Obviously, the 7B is massive, but I think it could theoretically be fine-tuned in 8-bit format, but I expect some issues to pop up once I get there. It's nice because the context window is smaller, 8192 according to the paper. I know my 16GB GPU should be able to handle this, in theory. If I succeed, this should have a positive impact for those of us with tighter budgets and less compute availability.

examples : remove finetune and train-text-from-scratch

16c889b

ngxson requested a review from ggerganov July 24, 2024 14:15

github-actions bot added nix Issues specific to consuming flake.nix, or generally concerned with ❄ Nix-based llama.cpp deployment examples python python script changes devops improvements to build systems and github actions labels Jul 24, 2024

ngxson added 2 commits July 24, 2024 16:17

fix build

cedacf0

update help message

3c965b4

fix small typo for export-lora

38c7bf2

slaren approved these changes Jul 24, 2024

View reviewed changes

ggerganov approved these changes Jul 25, 2024

View reviewed changes

ngxson merged commit be6d7c0 into ggerganov:master Jul 25, 2024
55 checks passed

janchk mentioned this pull request Oct 21, 2024

Find common ground between what Jan is doing and what's happening in ggml with Training aifoundry-org/wiki#12

Open

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

examples : remove `finetune` and `train-text-from-scratch` #8669

examples : remove `finetune` and `train-text-from-scratch` #8669

ngxson commented Jul 24, 2024 •

edited

Loading

JohannesGaessler commented Jul 24, 2024

ngxson commented Jul 24, 2024 •

edited

Loading

jboero commented Jul 25, 2024

JohannesGaessler commented Jul 25, 2024

jboero commented Jul 25, 2024 •

edited

Loading

JohannesGaessler commented Jul 25, 2024

jboero commented Jul 25, 2024

Anish-Mahambare commented Jul 25, 2024

ngxson commented Jul 25, 2024

Anish-Mahambare commented Jul 25, 2024

ngxson commented Jul 25, 2024

teleprint-me commented Jul 26, 2024 •

edited

Loading

teleprint-me commented Jul 27, 2024

lin72h commented Jul 27, 2024

mounta11n commented Jul 27, 2024

JohannesGaessler commented Jul 27, 2024

mounta11n commented Jul 27, 2024

mounta11n commented Jul 27, 2024

teleprint-me commented Jul 28, 2024 •

edited

Loading

jboero commented Aug 5, 2024 •

edited

Loading

teleprint-me commented Aug 7, 2024 •

edited

Loading

WilliamTambellini commented Sep 11, 2024

JohannesGaessler commented Sep 12, 2024

ggerganov commented Sep 12, 2024

dgm3333 commented Nov 3, 2024 •

edited

Loading

JohannesGaessler commented Nov 3, 2024

teleprint-me commented Nov 4, 2024

examples : remove finetune and train-text-from-scratch #8669

examples : remove finetune and train-text-from-scratch #8669

Conversation

ngxson commented Jul 24, 2024 • edited Loading

JohannesGaessler commented Jul 24, 2024

ngxson commented Jul 24, 2024 • edited Loading

jboero commented Jul 25, 2024

JohannesGaessler commented Jul 25, 2024

jboero commented Jul 25, 2024 • edited Loading

JohannesGaessler commented Jul 25, 2024

jboero commented Jul 25, 2024

Anish-Mahambare commented Jul 25, 2024

ngxson commented Jul 25, 2024

Anish-Mahambare commented Jul 25, 2024

ngxson commented Jul 25, 2024

teleprint-me commented Jul 26, 2024 • edited Loading

teleprint-me commented Jul 27, 2024

lin72h commented Jul 27, 2024

mounta11n commented Jul 27, 2024

JohannesGaessler commented Jul 27, 2024

mounta11n commented Jul 27, 2024

mounta11n commented Jul 27, 2024

teleprint-me commented Jul 28, 2024 • edited Loading

jboero commented Aug 5, 2024 • edited Loading

teleprint-me commented Aug 7, 2024 • edited Loading

WilliamTambellini commented Sep 11, 2024

JohannesGaessler commented Sep 12, 2024

ggerganov commented Sep 12, 2024

dgm3333 commented Nov 3, 2024 • edited Loading

JohannesGaessler commented Nov 3, 2024

teleprint-me commented Nov 4, 2024

examples : remove `finetune` and `train-text-from-scratch` #8669

examples : remove `finetune` and `train-text-from-scratch` #8669

ngxson commented Jul 24, 2024 •

edited

Loading

ngxson commented Jul 24, 2024 •

edited

Loading

jboero commented Jul 25, 2024 •

edited

Loading

teleprint-me commented Jul 26, 2024 •

edited

Loading

teleprint-me commented Jul 28, 2024 •

edited

Loading

jboero commented Aug 5, 2024 •

edited

Loading

teleprint-me commented Aug 7, 2024 •

edited

Loading

dgm3333 commented Nov 3, 2024 •

edited

Loading