Merging tensors of larger models #1

Closed · kir-gadjello opened this issue on Mar 10, 2023 · 4 comments

Labels: enhancement (New feature or request)

Comments

@kir-gadjello (Contributor)

Currently, only LLaMA-7B is supported since I haven't figured out how to merge the tensors of the bigger models. However, in theory, you should be able to run 65B on a 64GB MacBook.
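For scale, a rough back-of-the-envelope check of that claim (assuming 4-bit quantized weights and ignoring the per-block scale overhead, the KV cache and activations):

```python
# Rough check: 65B weights at ~4 bits each fit comfortably in 64 GB of RAM.
n_params = 65e9
bytes_per_weight = 0.5  # 4-bit quantization
print(f"{n_params * bytes_per_weight / 2**30:.1f} GiB")  # ~30.3 GiB of weights
```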

It shouldn't be hard to merge tensors with my https://github.com/kir-gadjello/zipslicer library, but it's pure Python! If you want to keep the project pure C++, you might want to write a standalone gist script that uses zipslicer to unpack weight shards into binary files.
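For illustration, a minimal stand-in for such a standalone script might look like the sketch below. It uses plain `torch.load` instead of zipslicer, so each shard is loaded fully into RAM before its tensors are dumped to raw binary files; the output file-naming scheme here is made up, and zipslicer's lazy loading would avoid holding a whole shard in memory.

```python
import os
import sys

import torch

model_dir, out_dir = sys.argv[1], sys.argv[2]
os.makedirs(out_dir, exist_ok=True)

# Process the consolidated.NN.pth shards one at a time.
shards = sorted(f for f in os.listdir(model_dir)
                if f.startswith("consolidated.") and f.endswith(".pth"))
for shard in shards:
    part = torch.load(os.path.join(model_dir, shard), map_location="cpu")
    for name, tensor in part.items():
        out_path = os.path.join(out_dir, f"{shard}.{name}.bin")
        tensor.to(torch.float32).numpy().tofile(out_path)  # raw float32 bytes, native byte order
        print(f"wrote {out_path} shape={tuple(tensor.shape)}")
    del part  # release the shard before loading the next one
```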

@ggerganov (Owner)

Thanks! The bigger problem now is that I am out of disk space, haha!
Anyway, I'll try to figure something out later.

@theontho

Leave a tip jar to get @ggerganov a bigger SSD and/or MacBook :D

@eous commented on Mar 11, 2023

It's kind of pointless now, but I was able to merge the 30B and 65B models with this core bit of hackery added to the convert script.

```diff
+    fname_model = sys.argv[1] + "/consolidated." + str(i).zfill(2) + ".pth"
+    model_i = torch.load(fname_model, map_location="cpu")
+
+    # Since the models are split, we need to append the tensors changing the shape/size
+    for k, v in model_i.items():
+        if k in model:
+            if model[k].dtype != v.dtype:
+                print("ERROR: Tensor types do not match: ", model[k].dtype, " vs ", v.dtype)
+                sys.exit(1)
+            elif len(model[k].shape) == 1:
+                print("Skipping tensor: " + k + " with shape: ", v.shape, " and type: ", v.dtype)
+                continue
+            elif k == "output.weight":
+                print("Concatenating tensor: " + k + " with shape: ", v.shape, " and type: ", v.dtype)
+                model[k] = torch.cat((model[k], v), dim=0)
+                print("New shape: ", model[k].shape)
+                continue
+            elif "tok_embeddings" in k:
+                print("Concatenating tensor: " + k + " with shape: ", v.shape, " and type: ", v.dtype)
+                model[k] = torch.cat((model[k], v), dim=1)
+                print("New shape: ", model[k].shape)
+                continue
+            elif "attention.wo" in k:
+                print("Concatenating tensor: " + k + " with shape: ", v.shape, " and type: ", v.dtype)
+                model[k] = torch.cat((model[k], v), dim=1)
+                print("New shape: ", model[k].shape)
+                continue
+            elif "feed_forward.w2" in k:
+                print("Concatenating tensor: " + k + " with shape: ", v.shape, " and type: ", v.dtype)
+                model[k] = torch.cat((model[k], v), dim=1)
+                print("New shape: ", model[k].shape)
+            else:
+                print("Concatenating tensor: " + k + " with shape: ", v.shape, " and type: ", v.dtype, " with shape: ", model[k].shape)
+                model[k] = torch.cat((model[k], v), dim=0)
+                print("New shape: ", model[k].shape)
+        else:
+            print("Adding tensor: " + k + " with shape: ", v.shape, " and type: ", v.dtype)
+            model[k] = v
+    del model_i
```
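For context, the hunk assumes it runs inside a loop over the shard index `i`, with `model` holding the tensors merged so far. A hypothetical driver for it (variable names are guesses, not the actual convert script) could look like:

```python
import sys

import torch

n_parts = int(sys.argv[2])  # e.g. 4 for 30B, 8 for 65B

# Part 0 is the base dict; the hunk above folds parts 1..n_parts-1 into it.
model = torch.load(sys.argv[1] + "/consolidated.00.pth", map_location="cpu")

for i in range(1, n_parts):
    fname_model = sys.argv[1] + "/consolidated." + str(i).zfill(2) + ".pth"
    model_i = torch.load(fname_model, map_location="cpu")
    # ... concatenation logic from the diff above ...
    del model_i
```

The choice of concatenation axis mirrors how the original model-parallel checkpoints are sharded: column-parallel weights (wq/wk/wv, w1/w3, output) split along dim 0, row-parallel weights (attention.wo, feed_forward.w2) and the token embedding split along dim 1, and 1-D norm weights are replicated in every shard, so they can simply be skipped.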

@ggerganov (Owner)

Fixed with 007a8f6

On startup, we go through all the parts and merge them dynamically in the ggml buffers.
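Conceptually (a simplified numpy sketch, not the actual C++ loader), merging at load time means allocating the full-size destination tensor once and copying each part's slice into it at the right offset, instead of writing a merged checkpoint to disk first:

```python
import numpy as np

def merge_parts(parts, split_dim):
    """Copy per-shard slices into one preallocated buffer (split_dim is 0 or 1)."""
    full_shape = list(parts[0].shape)
    full_shape[split_dim] = sum(p.shape[split_dim] for p in parts)
    dst = np.empty(full_shape, dtype=parts[0].dtype)  # stands in for the ggml buffer
    offset = 0
    for p in parts:
        n = p.shape[split_dim]
        if split_dim == 0:
            dst[offset:offset + n, :] = p
        else:
            dst[:, offset:offset + n] = p
        offset += n
    return dst

# Two shards of a column-parallel weight, split along dim 0:
a = np.ones((1024, 4096), dtype=np.float32)
b = np.zeros((1024, 4096), dtype=np.float32)
assert merge_parts([a, b], split_dim=0).shape == (2048, 4096)
```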

gjmulder added the enhancement (New feature or request) label on Mar 15, 2023
nemtos pushed a commit to nemtos/llama.cpp that referenced this issue on Apr 9, 2023

Update the command for downloading the weights to use `curl`. `curl` is preinstalled on macOS, and the new command is equivalent to the `wget` version but avoids having to install `wget`. This should save people some time.
mqy added a commit to mqy/llama.cpp that referenced this issue May 26, 2023
mqy added a commit to mqy/llama.cpp that referenced this issue May 26, 2023
mqy added a commit to mqy/llama.cpp that referenced this issue May 29, 2023
mqy added a commit to mqy/llama.cpp that referenced this issue May 31, 2023
breaking change: delete original profile ggerganov#1 from q_f32 profiles
syoyo pushed a commit to syoyo/llama.cpp that referenced this issue May 31, 2023
mqy added a commit to mqy/llama.cpp that referenced this issue Jun 4, 2023
breaking change: delete original profile ggerganov#1 from q_f32 profiles
rooprob pushed a commit to rooprob/llama.cpp that referenced this issue Aug 2, 2023
funnbot pushed a commit to funnbot/llama.cpp that referenced this issue Aug 8, 2023
* kquants_iter for hipblas and add gfx803
* Update CMakeLists.txt with hipblas kquants_iter and DMMV_F16
* remove dmmv_f16 for now
dranger003 pushed a commit to dranger003/llama.cpp that referenced this issue Apr 8, 2024
HanClinto pushed a commit to HanClinto/llama.cpp that referenced this issue Jun 10, 2024
Oliver-Y added a commit to Oliver-Y/llama.cpp that referenced this issue Jul 23, 2024
* handle a Chinese word formed of 3 Chinese characters where the first 2 do not form a word

* tokenizer-fix

* E5 Pretokenizer bugfix

* whitespace fix

* remove extra wpm

---------

Co-authored-by: Mike Fan <[email protected]>
Co-authored-by: Oliver Ye <[email protected]>
cunnie added a commit to cunnie/llama.cpp that referenced this issue Aug 3, 2024
When `llama-batched-bench` is invoked _without_ setting `-npl` ("number of parallel prompts"), it segfaults.

The segfault is caused by invoking `max_element()` on a zero-length vector, `n_pl`.

This commit addresses that by first checking whether the number of parallel prompts is zero and, if so, setting the maximum sequence size to 1; otherwise it is set to the result of `max_element()` as before.

This fixes the following crash, seen when running `lldb build/bin/llama-batched-bench -- -m models/Meta-Llama-3-8B.gguf`:

```
* thread #1, queue = 'com.apple.main-thread', stop reason = EXC_BAD_ACCESS (code=1, address=0x0)
    frame #0: 0x000000010000366c llama-batched-bench`main(argc=3, argv=0x000000016fdff268) at batched-bench.cpp:72:28
   69  	    llama_context_params ctx_params = llama_context_params_from_gpt_params(params);
   70
   71  	    // ensure enough sequences are available
-> 72  	    ctx_params.n_seq_max = *std::max_element(n_pl.begin(), n_pl.end());
```
ggerganov added a commit that referenced this issue Aug 4, 2024
* [example] batched-bench "segmentation fault" (same commit message as above)

* Update examples/batched-bench/batched-bench.cpp

Co-authored-by: compilade <[email protected]>

---------

Co-authored-by: Georgi Gerganov <[email protected]>
Co-authored-by: compilade <[email protected]>
ggerganov pushed a commit that referenced this issue Aug 6, 2024
slaren mentioned this issue on Aug 15, 2024
jeroen-mostert pushed a commit to jeroen-mostert/llama.cpp that referenced this issue Aug 30, 2024
jeroen-mostert pushed a commit to jeroen-mostert/llama.cpp that referenced this issue Aug 30, 2024
ykhrustalev referenced this issue in ykhrustalev/llama.cpp on Sep 26, 2024

* Fixed a bug where debug code was included in the release build, resulting in an undefined-function error.

* Change the path of the QNN library when building in a termux environment

* Revert "Change the path of the QNN library when building in termux environment"

This reverts commit c6e26a3.

* Changed so that GGML_QNN_DEFAULT_LIB_SEARCH_PATH can be set from command-line arguments