From 9b4802f73268a12c0f057f3598c65bf3f5704bba Mon Sep 17 00:00:00 2001
From: Daniel Han <danielhanchen@gmail.com>
Date: Sun, 16 Jun 2024 04:32:21 +1000
Subject: [PATCH 1/5] Nightly (#649)
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

* Update llama.py

* offload

* Update llama.py

* Update llama.py

* Update llama.py

* Update llama.py

* Update llama.py

* Update llama.py

* Update llama.py

* continued pretraining trainer

* Update trainer.py

* Update trainer.py

* Update trainer.py

* Update trainer.py

* is_bfloat16_supported

* Update __init__.py

* Update README.md

* Update llama.py

* is_bfloat16_supported

* Update __init__.py

* Mistral v3

* Phi 3 medium

* Update chat_templates.py

* Update chat_templates.py

* Phi-3

* Update save.py

* Update README.md

Mistral v3 to Mistral v0.3

* Untrained tokens

* Update tokenizer_utils.py

* Update tokenizer_utils.py

* Update tokenizer_utils.py

* Update tokenizer_utils.py

* Update tokenizer_utils.py

* Update tokenizer_utils.py

* Update tokenizer_utils.py

* Update tokenizer_utils.py

* Update tokenizer_utils.py

* Update tokenizer_utils.py

* Update tokenizer_utils.py

* Update tokenizer_utils.py

* Update tokenizer_utils.py

* Update tokenizer_utils.py

* Update tokenizer_utils.py

* Update tokenizer_utils.py

* Update tokenizer_utils.py

* Update tokenizer_utils.py

* Update tokenizer_utils.py

* Update llama.py

* Update tokenizer_utils.py

* Update tokenizer_utils.py

* Update tokenizer_utils.py

* Update tokenizer_utils.py

* Update save.py

* Update save.py

* Update save.py

* checkpoint

* Update _utils.py

* Update tokenizer_utils.py

* Update tokenizer_utils.py

* Update tokenizer_utils.py

* Update llama.py

* accelerate

* Update _utils.py

* Update _utils.py

* Update _utils.py

* Update _utils.py

* Update _utils.py

* Update _utils.py

* Update _utils.py

* Update tokenizer_utils.py

* train_dataloader

* Update llama.py

* Update llama.py

* Update llama.py

* use_fast_convert

* Update save.py

* Update save.py

* Update save.py

* Update save.py

* remove_special_tokens

* Ollama

* Update chat_templates.py

* Update chat_templates.py

* Update chat_templates.py

* Update llama.py

* Update chat_templates.py

* Support bfloat16 GGUF

* Update save.py

* Update llama.py

* fast_forward_inference

* Update mapper.py

* Update loader.py

* Update llama.py

* Update tokenizer_utils.py

* info

* edits

* Create chat template

* Fix tokenizer

* Update tokenizer_utils.py

* fix case where gguf saving fails due to first_conversion dtype (#630)

* Support revision parameter in FastLanguageModel.from_pretrained (#629)

* support `revision` parameter

* match unsloth formatting of named parameters

* clears any selected_adapters before calling internal_model.save_pretrained (#609)

* Update __init__.py (#602)

Check for incompatible modules before importing unsloth

* Fixed unsloth/tokenizer_utils.py for chat training (#604)

* Add GGML saving option to Unsloth for easier Ollama model creation and testing. (#345)

* Add save to llama.cpp GGML to save.py.

* Fix conversion command and path of convert to GGML function.

* Add autosaving lora to the GGML function

* Create lora save function for conversion to GGML

* Test fix #2 for saving lora

* Test fix #3 to save the LoRA adapters to convert to GGML

* Remove unwanted tokenizer saving for conversion to GGML and add a few print statements.

* The tokenizer was needed for saving, so added it back; also made the code more Unsloth-style by using positional arguments, and added a few messages.

* Positional arguments didn't work out, so reverted to the older version of the code and added a few comments.

* Test fix 1 for arch

* Test fix 2: new Mistral error.

* Test fix 3

* Revert to old version for testing.

* Upload issue test fix 1

* Fix 2 uploading ggml

* Positional args added.

* Temporarily remove positional args

* Fix upload again!!!

* Add print statements and fix link

* Make the calling name better

* Create local saving for GGML

* Add choosing directory to save local GGML.

* Fix small variable error in the save_to_custom_dir function

* docs: Add LoraConfig parameters documentation (#619)

* llama.cpp failing (#371)

llama.cpp is failing to generate quantized versions of the trained models.

Error:

```bash
You might have to compile llama.cpp yourself, then run this again.
You do not need to close this Python program. Run the following commands in a new terminal:
You must run this in the same folder as you're saving your model.
git clone https://github.com/ggerganov/llama.cpp
cd llama.cpp && make clean && LLAMA_CUDA=1 make all -j
Once that's done, redo the quantization.
```

But when I clone it with `--recursive`, it works.

Co-authored-by: Daniel Han <danielhanchen@gmail.com>

* fix libcuda_dirs import for triton 3.0 (#227)

* fix libcuda_dirs import for triton 3.0

* Update __init__.py

* Update __init__.py

---------

Co-authored-by: Daniel Han <danielhanchen@gmail.com>

* Update save.py

* Update __init__.py

* Update fast_lora.py

* Update save.py

* Update save.py

* Update save.py

* Update loader.py

* Update save.py

* Update save.py

* quantize now llama-quantize

* Update chat_templates.py

* Update loader.py

* Update mapper.py

* Update __init__.py

* embedding size

* Update qwen2.py

* docs

* Update README.md

* Update qwen2.py

* README: Fix minor typo. (#559)

* README: Fix minor typo.

One-character typo fix while reading.

* Update README.md

---------

Co-authored-by: Daniel Han <danielhanchen@gmail.com>

* Update mistral.py

* Update qwen2.py

* Update qwen2.py

* Update qwen2.py

* Update llama.py

* Update llama.py

* Update llama.py

* Update README.md

* FastMistralModel

* Update mistral.py

* Update mistral.py

* Update mistral.py

* Update mistral.py

* Update mistral.py

* Auto check rope scaling

* Update llama.py

* Update llama.py

* Update llama.py

* GPU support

* Typo

* Update gemma.py

* gpu

* Multiple GGUF saving

* Update save.py

* Update save.py

* check PEFT and base

* Update llama.py

* Update llama.py

* Update llama.py

* Update llama.py

* Update llama.py

* Update chat_templates.py

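A rough sketch of the "check PEFT and base" rule in loader.py, simplified to a local folder: a repo may hold either `config.json` (base model) or `adapter_config.json` (LoRA adapter) but not both, and a PEFT repo points at its base model through `base_model_name_or_path`. The helpers `classify_repo` and `base_model_of` and the folder contents are hypothetical; the patch itself probes with `AutoConfig.from_pretrained` and `PeftConfig.from_pretrained` rather than checking files, which also covers Hub repos.

```python
import json
import os
import tempfile


def classify_repo(path):
    """Return "model" or "peft" for a local folder (hypothetical helper).

    Mirrors the loader.py rule: exactly one of config.json (base model)
    or adapter_config.json (LoRA adapter) must be present.
    """
    is_model = os.path.isfile(os.path.join(path, "config.json"))
    is_peft = os.path.isfile(os.path.join(path, "adapter_config.json"))
    if is_model and is_peft:
        raise RuntimeError(f"{path} has both config.json and adapter_config.json")
    if not is_model and not is_peft:
        raise RuntimeError(f"{path} has neither config.json nor adapter_config.json")
    return "model" if is_model else "peft"


def base_model_of(path):
    """Read base_model_name_or_path from a local adapter_config.json."""
    with open(os.path.join(path, "adapter_config.json")) as f:
        return json.load(f)["base_model_name_or_path"]


with tempfile.TemporaryDirectory() as folder:
    with open(os.path.join(folder, "adapter_config.json"), "w") as f:
        json.dump({"base_model_name_or_path": "my-org/my-base-model"}, f)
    kind = classify_repo(folder)   # "peft"
    base = base_model_of(folder)   # "my-org/my-base-model"
```

For a PEFT folder, the base model name found this way is what the loader then resolves and loads its `AutoConfig` from.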
---------

Co-authored-by: Michael Han <107991372+shimmyshimmer@users.noreply.github.com>
Co-authored-by: Eliot Hall <60240707+chrehall68@users.noreply.github.com>
Co-authored-by: Rickard Edén <rickardeden@gmail.com>
Co-authored-by: XiaoYang <xyangk@gmail.com>
Co-authored-by: Oseltamivir <58582368+Oseltamivir@users.noreply.github.com>
Co-authored-by: mahiatlinux <110882203+mahiatlinux@users.noreply.github.com>
Co-authored-by: Sébastien De Greef <sebdg@binarycompany.com>
Co-authored-by: Alberto Ferrer <albertof@barrahome.org>
Co-authored-by: Thomas Viehmann <tv.github-private@beamnet.de>
Co-authored-by: Walter Korman <lemurware@gmail.com>
---
 unsloth/chat_templates.py |  1 +
 unsloth/models/llama.py   | 33 +++++++++++++++++++++++++++++++--
 unsloth/models/loader.py  | 36 ++++++++++++++++++++++++++----------
 3 files changed, 58 insertions(+), 12 deletions(-)

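Sketch of the pass-through check that the llama.py hunk adds to `get_peft_model`, assuming both configurations are available as plain dicts (`existing` stands in for `model.peft_config["default"].to_dict()`). `same_lora_config`, `LORA_SCALAR_KEYS` and the sample values are hypothetical; only the compared keys, the order-insensitive `target_modules` comparison and the LoftQ rule come from the patch.

```python
# Scalar LoRA settings compared one-to-one in the patch.
LORA_SCALAR_KEYS = [
    "r", "lora_alpha", "lora_dropout",
    "bias", "layers_to_transform", "layers_pattern",
    "use_rslora", "modules_to_save", "init_lora_weights",
]


def _loftq_unused(cfg):
    # The patch treats both {} and None as "no LoftQ".
    return cfg == {} or cfg is None


def same_lora_config(existing, requested):
    """True if a requested LoRA setup matches an already attached adapter."""
    # Every scalar setting must match exactly.
    if any(existing[key] != requested[key] for key in LORA_SCALAR_KEYS):
        return False
    # An empty symmetric difference means the same modules, in any order.
    if set(existing["target_modules"]) ^ set(requested["target_modules"]):
        return False
    # LoftQ must be unused on both sides.
    return _loftq_unused(existing.get("loftq_config")) and _loftq_unused(requested.get("loftq_config"))


existing = {
    "r": 16, "lora_alpha": 16, "lora_dropout": 0.0, "bias": "none",
    "layers_to_transform": None, "layers_pattern": None, "use_rslora": False,
    "modules_to_save": None, "init_lora_weights": True,
    "target_modules": ["q_proj", "k_proj", "v_proj"], "loftq_config": {},
}
requested = dict(existing, target_modules=["v_proj", "q_proj", "k_proj"], loftq_config=None)

print(same_lora_config(existing, requested))            # True: same modules, different order
print(same_lora_config(existing, dict(existing, r=8)))  # False: rank differs
```

Passing the requested values in a dict is a choice of this sketch; the patch instead looks up each local argument by name with `eval(param)`.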
diff --git a/unsloth/chat_templates.py b/unsloth/chat_templates.py
index 2e3761f567..a2a02d7e6e 100644
--- a/unsloth/chat_templates.py
+++ b/unsloth/chat_templates.py
@@ -528,6 +528,7 @@ def get_chat_template(
         chat_template, stop_word = chat_template
         assert(type(chat_template) is str)
         assert(type(stop_word) is str)
+        ollama_modelfile = None
 
     elif type(chat_template) is str:
 
diff --git a/unsloth/models/llama.py b/unsloth/models/llama.py
index f2f79de8c9..3d969d7d31 100644
--- a/unsloth/models/llama.py
+++ b/unsloth/models/llama.py
@@ -1423,9 +1423,38 @@ def get_peft_model(
         transformers_set_seed(random_state)
 
         if isinstance(model, PeftModelForCausalLM):
-            raise TypeError(
-                "Unsloth: Your model already has LoRA adapters. No need to run this again!"
+            # Check if exactly the same and then pass through!
+            assert(hasattr(model, "peft_config"))
+
+            peft_config = model.peft_config["default"].to_dict()
+            check_parameters = [
+                "r", "lora_alpha", "lora_dropout",
+                "bias", "layers_to_transform", "layers_pattern",
+                "use_rslora", "modules_to_save", "init_lora_weights",
+            ]
+            check_all = True
+            for param in check_parameters:
+                check_all = check_all and (peft_config[param] == eval(param))
+            pass
+            check_all = check_all and (
+                len(set(peft_config["target_modules"]) ^ set(target_modules)) == 0
             )
+            check_all = check_all and (
+                (loftq_config == {} or loftq_config is None) and \
+                (peft_config["loftq_config"] == {} or peft_config["loftq_config"] is None)
+            )
+
+            if check_all:
+                # Simply pass through!
+                logger.warning(
+                    "Unsloth: Already have LoRA adapters! We shall skip this step."
+                )
+                return model
+            else:
+                raise TypeError(
+                    "Unsloth: Your model already has LoRA adapters. Your new parameters are different."
+                )
+            pass
         pass
 
         if loftq_config is None: loftq_config = {}
diff --git a/unsloth/models/loader.py b/unsloth/models/loader.py
index de1e2e57bf..d7c0f0760f 100644
--- a/unsloth/models/loader.py
+++ b/unsloth/models/loader.py
@@ -91,21 +91,37 @@ def from_pretrained(
         model_name = _get_model_name(model_name, load_in_4bit)
 
         # First check if it's a normal model via AutoConfig
-        is_peft = False
         try:
             model_config = AutoConfig.from_pretrained(model_name, token = token, revision = revision)
-            is_peft = False
+            is_model = True
+        except:
+            is_model = False
+        try:
+            peft_config = PeftConfig.from_pretrained(model_name, token = token, revision = revision)
+            is_peft = True
         except:
-            try:
-                # Most likely a PEFT model
-                peft_config = PeftConfig.from_pretrained(model_name, token = token, revision = revision)
-            except:
-                raise RuntimeError(f"Unsloth: `{model_name}` is not a full model or a PEFT model.")
-            
+            is_peft = False
+
+        # Cannot be both!
+        if is_model and is_peft:
+            raise RuntimeError(
+                "Unsloth: Your repo has a LoRA adapter and a base model.\n"\
+                "You have 2 files `config.json` and `adapter_config.json`.\n"\
+                "We must only allow one config file.\n"\
+                "Please separate the LoRA and base models to 2 repos."
+            )
+        elif not is_model and not is_peft:
+            raise RuntimeError(
+                f"Unsloth: `{model_name}` is not a base model or a PEFT model.\n"\
+                "We could not locate a `config.json` or `adapter_config.json` file"
+            )
+        pass
+
+        # Get base model for PEFT:
+        if is_peft:
             # Check base model again for PEFT
             model_name = _get_model_name(peft_config.base_model_name_or_path, load_in_4bit)
-            model_config = AutoConfig.from_pretrained(model_name, token = token)
-            is_peft = True
+            model_config = AutoConfig.from_pretrained(model_name, token = token, revision = revision)
         pass
 
         model_type = model_config.model_type

From 8bc65baa343b11acb30a0eda47df6c1cc3d1f8e6 Mon Sep 17 00:00:00 2001
From: Jason Crawford <jason@arcadalabs.com>
Date: Sat, 15 Jun 2024 15:38:27 -0600
Subject: [PATCH 2/5] Fix bug in save.py where a string quantization_method is
 split into characters, preventing GGUF from saving

---
 unsloth/save.py | 8 ++++++--
 1 file changed, 6 insertions(+), 2 deletions(-)

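Why the old one-liner broke string input: `list()` on a string iterates over its characters, so `"q4_k_m"` became six one-character entries before the later loops ran. The two functions below are hypothetical stand-ins for the before/after logic in `save_to_gguf`.

```python
def to_list_buggy(quantization_method):
    # Pre-patch behaviour: anything that is not a list goes through list(),
    # which splits a string into its characters.
    return quantization_method if type(quantization_method) is list else list(quantization_method)


def to_list_fixed(quantization_method):
    # Behaviour after this patch: a bare string becomes a one-element list.
    if isinstance(quantization_method, list):
        return quantization_method
    elif isinstance(quantization_method, str):
        return [quantization_method]
    else:
        return list(quantization_method)


print(to_list_buggy("q4_k_m"))  # ['q', '4', '_', 'k', '_', 'm']
print(to_list_fixed("q4_k_m"))  # ['q4_k_m']
```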
diff --git a/unsloth/save.py b/unsloth/save.py
index 940feb40f9..820f8280e9 100644
--- a/unsloth/save.py
+++ b/unsloth/save.py
@@ -853,8 +853,12 @@ def save_to_gguf(
     model_dtype = "f16" if model_dtype == "float16" else "bf16"
 
     # Convert quantization_method to list
-    quantization_method = \
-        quantization_method if type(quantization_method) is list else list(quantization_method)
+    if isinstance(quantization_method, list):
+        quantization_method_list = quantization_method
+    elif isinstance(quantization_method, str):
+        quantization_method_list = [quantization_method]
+    else:
+        quantization_method_list = list(quantization_method)
 
     # Check if bfloat16 is supported
     if model_dtype == "bf16" and not torch.cuda.is_bf16_supported():

From 9ef774861c52fb526f471e17227653ee1ede3aac Mon Sep 17 00:00:00 2001
From: Jason Crawford <jason@arcadalabs.com>
Date: Sat, 15 Jun 2024 16:39:38 -0600
Subject: [PATCH 3/5] Use the new quantization_method_list variable, which the
 previous commit introduced but never used

---
 unsloth/save.py | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/unsloth/save.py b/unsloth/save.py
index 820f8280e9..21cfc3ef0b 100644
--- a/unsloth/save.py
+++ b/unsloth/save.py
@@ -875,7 +875,7 @@ def save_to_gguf(
     pass
 
     # Check I quants
-    for quant_method in quantization_method: 
+    for quant_method in quantization_method_list:
         if quant_method.startswith("iq2"):
             raise RuntimeError("Unsloth: Currently iq2 type quantizations aren't supported yet - sorry!")
     pass
@@ -890,7 +890,7 @@ def save_to_gguf(
 
     # Map quant methods
     new_quantization_method = []
-    for quant_method in quantization_method:
+    for quant_method in quantization_method_list:
         if   quant_method == "not_quantized":  quantization_method = model_dtype
         elif quant_method == "fast_quantized": quantization_method = "q8_0"
         elif quant_method == "quantized":      quantization_method = "q4_k_m"

From c9eea99f8ddfd6e6c872bddecb038ae6e63759ca Mon Sep 17 00:00:00 2001
From: Jason Crawford <jason@arcadalabs.com>
Date: Sat, 15 Jun 2024 17:10:29 -0600
Subject: [PATCH 4/5] Check the type of quantization_method and raise a
 TypeError if it is not a list or string

---
 unsloth/save.py | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/unsloth/save.py b/unsloth/save.py
index 21cfc3ef0b..787b22450a 100644
--- a/unsloth/save.py
+++ b/unsloth/save.py
@@ -858,7 +858,7 @@ def save_to_gguf(
     elif isinstance(quantization_method, str):
         quantization_method_list = [quantization_method]
     else:
-        quantization_method_list = list(quantization_method)
+        raise TypeError("quantization_method should be either a string or a list")
 
     # Check if bfloat16 is supported
     if model_dtype == "bf16" and not torch.cuda.is_bf16_supported():

From f52045a016e3fbc0f1ba6c6112a911b73b46fec0 Mon Sep 17 00:00:00 2001
From: Daniel Han <danielhanchen@gmail.com>
Date: Sun, 16 Jun 2024 14:41:22 +1000
Subject: [PATCH 5/5] Update save.py

---
 unsloth/save.py | 16 ++++++++--------
 1 file changed, 8 insertions(+), 8 deletions(-)

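After this series, `save_to_gguf` accepts a string, list, or tuple, raises `TypeError` for anything else, rejects `iq2*` methods, and maps the `not_quantized` / `fast_quantized` / `quantized` aliases. The function below condenses those steps into one hypothetical helper; treating unmapped names as pass-through is an assumption, since the rest of the mapping loop is not shown in this hunk.

```python
def normalize_quantization_methods(quantization_method, model_dtype="f16"):
    """Hypothetical condensation of save_to_gguf's input handling."""
    # Accept a string, a list, or a tuple; reject anything else.
    if isinstance(quantization_method, list):
        methods = quantization_method
    elif isinstance(quantization_method, str):
        methods = [quantization_method]
    elif isinstance(quantization_method, tuple):
        methods = list(quantization_method)
    else:
        raise TypeError("Unsloth: quantization_method can only be a string or a list of strings")

    # Same rejection as the "Check I quants" loop.
    for method in methods:
        if method.startswith("iq2"):
            raise RuntimeError("Unsloth: Currently iq2 type quantizations aren't supported yet - sorry!")

    # Aliases from the "Map quant methods" loop; other names pass through (assumption).
    aliases = {"not_quantized": model_dtype, "fast_quantized": "q8_0", "quantized": "q4_k_m"}
    return [aliases.get(method, method) for method in methods]


print(normalize_quantization_methods("quantized"))                    # ['q4_k_m']
print(normalize_quantization_methods(("fast_quantized", "q5_k_m")))  # ['q8_0', 'q5_k_m']
```

The sketch keeps the mapped result in a separate list. In the patch the name `quantization_method` is rebound inside the mapping loop; that does not disturb the iteration itself, because `for` already holds an iterator over the original list. Neither the patch nor the sketch checks that list elements are strings, so a non-string element fails at `.startswith`.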
diff --git a/unsloth/save.py b/unsloth/save.py
index 787b22450a..f8f884a9d3 100644
--- a/unsloth/save.py
+++ b/unsloth/save.py
@@ -853,13 +853,13 @@ def save_to_gguf(
     model_dtype = "f16" if model_dtype == "float16" else "bf16"
 
     # Convert quantization_method to list
-    if isinstance(quantization_method, list):
-        quantization_method_list = quantization_method
-    elif isinstance(quantization_method, str):
-        quantization_method_list = [quantization_method]
+    if   isinstance(quantization_method, list):  pass
+    elif isinstance(quantization_method, str):   quantization_method = [ quantization_method, ]
+    elif isinstance(quantization_method, tuple): quantization_method = list(quantization_method)
     else:
-        raise TypeError("quantization_method should be either a string or a list")
-
+        raise TypeError("Unsloth: quantization_method can only be a string or a list of strings")
+    pass
+    
     # Check if bfloat16 is supported
     if model_dtype == "bf16" and not torch.cuda.is_bf16_supported():
         logger.warning(
@@ -875,7 +875,7 @@ def save_to_gguf(
     pass
 
     # Check I quants
-    for quant_method in quantization_method_list:
+    for quant_method in quantization_method: 
         if quant_method.startswith("iq2"):
             raise RuntimeError("Unsloth: Currently iq2 type quantizations aren't supported yet - sorry!")
     pass
@@ -890,7 +890,7 @@ def save_to_gguf(
 
     # Map quant methods
     new_quantization_method = []
-    for quant_method in quantization_method_list:
+    for quant_method in quantization_method:
         if   quant_method == "not_quantized":  quantization_method = model_dtype
         elif quant_method == "fast_quantized": quantization_method = "q8_0"
         elif quant_method == "quantized":      quantization_method = "q4_k_m"