From 9b4802f73268a12c0f057f3598c65bf3f5704bba Mon Sep 17 00:00:00 2001
From: Daniel Han
Date: Sun, 16 Jun 2024 04:32:21 +1000
Subject: [PATCH 1/5] Nightly (#649)
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

* Update llama.py
* offload
* Update llama.py
* Update llama.py
* Update llama.py
* Update llama.py
* Update llama.py
* Update llama.py
* Update llama.py
* continued pretraining trainer
* Update trainer.py
* Update trainer.py
* Update trainer.py
* Update trainer.py
* is_bfloat16_supported
* Update __init__.py
* Update README.md
* Update llama.py
* is_bfloat16_supported
* Update __init__.py
* Mistral v3
* Phi 3 medium
* Update chat_templates.py
* Update chat_templates.py
* Phi-3
* Update save.py
* Update README.md
  Mistral v3 to Mistral v0.3
* Untrained tokens
* Update tokenizer_utils.py
* Update tokenizer_utils.py
* Update tokenizer_utils.py
* Update tokenizer_utils.py
* Update tokenizer_utils.py
* Update tokenizer_utils.py
* Update tokenizer_utils.py
* Update tokenizer_utils.py
* Update tokenizer_utils.py
* Update tokenizer_utils.py
* Update tokenizer_utils.py
* Update tokenizer_utils.py
* Update tokenizer_utils.py
* Update tokenizer_utils.py
* Update tokenizer_utils.py
* Update tokenizer_utils.py
* Update tokenizer_utils.py
* Update tokenizer_utils.py
* Update tokenizer_utils.py
* Update llama.py
* Update tokenizer_utils.py
* Update tokenizer_utils.py
* Update tokenizer_utils.py
* Update tokenizer_utils.py
* Update save.py
* Update save.py
* Update save.py
* checkpoint
* Update _utils.py
* Update tokenizer_utils.py
* Update tokenizer_utils.py
* Update tokenizer_utils.py
* Update llama.py
* accelerate
* Update _utils.py
* Update _utils.py
* Update _utils.py
* Update _utils.py
* Update _utils.py
* Update _utils.py
* Update _utils.py
* Update tokenizer_utils.py
* train_dataloader
* Update llama.py
* Update llama.py
* Update llama.py
* use_fast_convert
* Update save.py
* Update save.py
* Update save.py
* Update save.py
* remove_special_tokens
* Ollama
* Update chat_templates.py
* Update chat_templates.py
* Update chat_templates.py
* Update llama.py
* Update chat_templates.py
* Support bfloat16 GGUF
* Update save.py
* Update llama.py
* fast_forward_inference
* Update mapper.py
* Update loader.py
* Update llama.py
* Update tokenizer_utils.py
* info
* edits
* Create chat template
* Fix tokenizer
* Update tokenizer_utils.py
* fix case where gguf saving fails due to first_conversion dtype (#630)
* Support revision parameter in FastLanguageModel.from_pretrained (#629)
* support `revision` parameter
* match unsloth formatting of named parameters
* clears any selected_adapters before calling internal_model.save_pretrained (#609)
* Update __init__.py (#602)
  Check for incompatible modules before importing unsloth
* Fixed unsloth/tokenizer_utils.py for chat training (#604)
* Add GGML saving option to Unsloth for easier Ollama model creation and testing. (#345)
* Add save to llama.cpp GGML to save.py.
* Fix conversion command and path of convert to GGML function.
* Add autosaving lora to the GGML function
* Create lora save function for conversion to GGML
* Test fix #2 for saving lora
* Test fix #3 to save the lora adapters to convert to GGML
* Remove unwated tokenizer saving for conversion to ggml and added a few print statements.
* Needed tokenizer for saving, added it back, also made it more unslothy style by having positional arguments, and added a few messages.
* Positional arguments didn't work out, so reverted to older version of the code, and added a few comments.
* Test fix 1 for arch
* Test fix 2 new Mistral error.
* Test fix 3
* Revert to old version for testing.
* Upload issue test fix 1
* Fix 2 uploading ggml
* Positional ags added.
* Temporray remove positional args
* Fix upload again!!!
* Add print statements and fix link
* Make the calling name better
* Create local saving for GGML
* Add choosing directory to save local GGML.
* Fix lil variable error in the save_to_custom_dir func
* docs: Add LoraConfig parameters documentation (#619)
* llama.cpp failing (#371)
  llama.cpp is failing to generate quantize versions for the trained models.
  Error:
  ```bash
  You might have to compile llama.cpp yourself, then run this again.
  You do not need to close this Python program. Run the following commands in a new terminal:
  You must run this in the same folder as you're saving your model.
  git clone https://github.com/ggerganov/llama.cpp
  cd llama.cpp && make clean && LLAMA_CUDA=1 make all -j
  Once that's done, redo the quantization.
  ```
  But when i do clone this with recursive it works.
  Co-authored-by: Daniel Han
* fix libcuda_dirs import for triton 3.0 (#227)
* fix libcuda_dirs import for triton 3.0
* Update __init__.py
* Update __init__.py
  ---------
  Co-authored-by: Daniel Han
* Update save.py
* Update __init__.py
* Update fast_lora.py
* Update save.py
* Update save.py
* Update save.py
* Update loader.py
* Update save.py
* Update save.py
* quantize now llama-quantize
* Update chat_templates.py
* Update loader.py
* Update mapper.py
* Update __init__.py
* embedding size
* Update qwen2.py
* docs
* Update README.md
* Update qwen2.py
* README: Fix minor typo. (#559)
* README: Fix minor typo.
  One-character typo fix while reading.
* Update README.md
  ---------
  Co-authored-by: Daniel Han
* Update mistral.py
* Update qwen2.py
* Update qwen2.py
* Update qwen2.py
* Update llama.py
* Update llama.py
* Update llama.py
* Update README.md
* FastMistralModel
* Update mistral.py
* Update mistral.py
* Update mistral.py
* Update mistral.py
* Update mistral.py
* Auto check rope scaling
* Update llama.py
* Update llama.py
* Update llama.py
* GPU support
* Typo
* Update gemma.py
* gpu
* Multiple GGUF saving
* Update save.py
* Update save.py
* check PEFT and base
* Update llama.py
* Update llama.py
* Update llama.py
* Update llama.py
* Update llama.py
* Update chat_templates.py

---------

Co-authored-by: Michael Han <107991372+shimmyshimmer@users.noreply.github.com>
Co-authored-by: Eliot Hall <60240707+chrehall68@users.noreply.github.com>
Co-authored-by: Rickard Edén
Co-authored-by: XiaoYang
Co-authored-by: Oseltamivir <58582368+Oseltamivir@users.noreply.github.com>
Co-authored-by: mahiatlinux <110882203+mahiatlinux@users.noreply.github.com>
Co-authored-by: Sébastien De Greef
Co-authored-by: Alberto Ferrer
Co-authored-by: Thomas Viehmann
Co-authored-by: Walter Korman
---
 unsloth/chat_templates.py |  1 +
 unsloth/models/llama.py   | 33 +++++++++++++++++++++++++++++++--
 unsloth/models/loader.py  | 36 ++++++++++++++++++++++++++----------
 3 files changed, 58 insertions(+), 12 deletions(-)

diff --git a/unsloth/chat_templates.py b/unsloth/chat_templates.py
index 2e3761f567..a2a02d7e6e 100644
--- a/unsloth/chat_templates.py
+++ b/unsloth/chat_templates.py
@@ -528,6 +528,7 @@ def get_chat_template(
         chat_template, stop_word = chat_template
         assert(type(chat_template) is str)
         assert(type(stop_word) is str)
+        ollama_modelfile = None
 
     elif type(chat_template) is str:
 
diff --git a/unsloth/models/llama.py b/unsloth/models/llama.py
index f2f79de8c9..3d969d7d31 100644
--- a/unsloth/models/llama.py
+++ b/unsloth/models/llama.py
@@ -1423,9 +1423,38 @@ def get_peft_model(
         transformers_set_seed(random_state)
 
         if isinstance(model, PeftModelForCausalLM):
-            raise TypeError(
-                "Unsloth: Your model already has LoRA adapters. No need to run this again!"
+            # Check if exactly the same and then pass through!
+            assert(hasattr(model, "peft_config"))
+
+            peft_config = model.peft_config["default"].to_dict()
+            check_parameters = [
+                "r", "lora_alpha", "lora_dropout",
+                "bias", "layers_to_transform", "layers_pattern",
+                "use_rslora", "modules_to_save", "init_lora_weights",
+            ]
+            check_all = True
+            for param in check_parameters:
+                check_all = check_all and (peft_config[param] == eval(param))
+            pass
+            check_all = check_all and (
+                len(set(peft_config["target_modules"]) ^ set(target_modules)) == 0
             )
+            check_all = check_all and (
+                (loftq_config == {} or loftq_config is None) and \
+                (peft_config["loftq_config"] == {} or peft_config["loftq_config"] is None)
+            )
+
+            if check_all:
+                # Simply pass through!
+                logger.warning(
+                    "Unsloth: Already have LoRA adapters! We shall skip this step."
+                )
+                return model
+            else:
+                raise TypeError(
+                    "Unsloth: Your model already has LoRA adapters. Your new parameters are different."
+                )
+            pass
         pass
 
         if loftq_config is None: loftq_config = {}
diff --git a/unsloth/models/loader.py b/unsloth/models/loader.py
index de1e2e57bf..d7c0f0760f 100644
--- a/unsloth/models/loader.py
+++ b/unsloth/models/loader.py
@@ -91,21 +91,37 @@ def from_pretrained(
         model_name = _get_model_name(model_name, load_in_4bit)
 
         # First check if it's a normal model via AutoConfig
-        is_peft = False
         try:
             model_config = AutoConfig.from_pretrained(model_name, token = token, revision = revision)
-            is_peft = False
+            is_model = True
+        except:
+            is_model = False
+        try:
+            peft_config = PeftConfig .from_pretrained(model_name, token = token, revision = revision)
+            is_peft = True
         except:
-            try:
-                # Most likely a PEFT model
-                peft_config = PeftConfig.from_pretrained(model_name, token = token, revision = revision)
-            except:
-                raise RuntimeError(f"Unsloth: `{model_name}` is not a full model or a PEFT model.")
-
+            is_peft = False
+
+        # Cannot be both!
+        if is_model and is_peft:
+            raise RuntimeError(
+                "Unsloth: You repo has a LoRA adapter and a base model.\n"\
+                "You have 2 files `config.json` and `adapter_config.json`.\n"\
+                "We must only allow one config file.\n"\
+                "Please separate the LoRA and base models to 2 repos."
+            )
+        elif not is_model and not is_peft:
+            raise RuntimeError(
+                f"Unsloth: `{model_name}` is not a base model or a PEFT model.\n"\
+                "We could not locate a `config.json` or `adapter_config.json` file"
+            )
+        pass
+
+        # Get base model for PEFT:
+        if is_peft:
             # Check base model again for PEFT
             model_name = _get_model_name(peft_config.base_model_name_or_path, load_in_4bit)
-            model_config = AutoConfig.from_pretrained(model_name, token = token)
-            is_peft = True
+            model_config = AutoConfig.from_pretrained(model_name, token = token, revision = revision)
         pass
 
         model_type = model_config.model_type

From 8bc65baa343b11acb30a0eda47df6c1cc3d1f8e6 Mon Sep 17 00:00:00 2001
From: Jason Crawford
Date: Sat, 15 Jun 2024 15:38:27 -0600
Subject: [PATCH 2/5] Fix bug in save.py with interpreting quantization_method as a string that prevents GGUF from saving

---
 unsloth/save.py | 8 ++++++--
 1 file changed, 6 insertions(+), 2 deletions(-)

diff --git a/unsloth/save.py b/unsloth/save.py
index 940feb40f9..820f8280e9 100644
--- a/unsloth/save.py
+++ b/unsloth/save.py
@@ -853,8 +853,12 @@ def save_to_gguf(
     model_dtype = "f16" if model_dtype == "float16" else "bf16"
 
     # Convert quantization_method to list
-    quantization_method = \
-        quantization_method if type(quantization_method) is list else list(quantization_method)
+    if isinstance(quantization_method, list):
+        quantization_method_list = quantization_method
+    elif isinstance(quantization_method, str):
+        quantization_method_list = [quantization_method]
+    else:
+        quantization_method_list = list(quantization_method)
 
     # Check if bfloat16 is supported
     if model_dtype == "bf16" and not torch.cuda.is_bf16_supported():

From 9ef774861c52fb526f471e17227653ee1ede3aac Mon Sep 17 00:00:00 2001
From: Jason Crawford
Date: Sat, 15 Jun 2024 16:39:38 -0600
Subject: [PATCH 3/5] Implemented better list management and then forgot to actually call the new list variable, fixed

---
 unsloth/save.py | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/unsloth/save.py b/unsloth/save.py
index 820f8280e9..21cfc3ef0b 100644
--- a/unsloth/save.py
+++ b/unsloth/save.py
@@ -875,7 +875,7 @@ def save_to_gguf(
     pass
 
     # Check I quants
-    for quant_method in quantization_method:
+    for quant_method in quantization_method_list:
         if quant_method.startswith("iq2"):
             raise RuntimeError("Unsloth: Currently iq2 type quantizations aren't supported yet - sorry!")
     pass
@@ -890,7 +890,7 @@ def save_to_gguf(
 
     # Map quant methods
     new_quantization_method = []
-    for quant_method in quantization_method:
+    for quant_method in quantization_method_list:
         if quant_method == "not_quantized": quantization_method = model_dtype
         elif quant_method == "fast_quantized": quantization_method = "q8_0"
         elif quant_method == "quantized": quantization_method = "q4_k_m"

From c9eea99f8ddfd6e6c872bddecb038ae6e63759ca Mon Sep 17 00:00:00 2001
From: Jason Crawford
Date: Sat, 15 Jun 2024 17:10:29 -0600
Subject: [PATCH 4/5] Check type of given quantization method and return type error if not list or string

---
 unsloth/save.py | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/unsloth/save.py b/unsloth/save.py
index 21cfc3ef0b..787b22450a 100644
--- a/unsloth/save.py
+++ b/unsloth/save.py
@@ -858,7 +858,7 @@ def save_to_gguf(
     elif isinstance(quantization_method, str):
         quantization_method_list = [quantization_method]
     else:
-        quantization_method_list = list(quantization_method)
+        raise TypeError("quantization_method should be either a string or a list")
 
     # Check if bfloat16 is supported
     if model_dtype == "bf16" and not torch.cuda.is_bf16_supported():

From f52045a016e3fbc0f1ba6c6112a911b73b46fec0 Mon Sep 17 00:00:00 2001
From: Daniel Han
Date: Sun, 16 Jun 2024 14:41:22 +1000
Subject: [PATCH 5/5] Update save.py

---
 unsloth/save.py | 16 ++++++++--------
 1 file changed, 8 insertions(+), 8 deletions(-)

diff --git a/unsloth/save.py b/unsloth/save.py
index 787b22450a..f8f884a9d3 100644
--- a/unsloth/save.py
+++ b/unsloth/save.py
@@ -853,13 +853,13 @@ def save_to_gguf(
     model_dtype = "f16" if model_dtype == "float16" else "bf16"
 
     # Convert quantization_method to list
-    if isinstance(quantization_method, list):
-        quantization_method_list = quantization_method
-    elif isinstance(quantization_method, str):
-        quantization_method_list = [quantization_method]
+    if isinstance(quantization_method, list): pass
+    elif isinstance(quantization_method, str): quantization_method = [ quantization_method, ]
+    elif isinstance(quantization_method, tuple): quantization_method = list(quantization_method)
     else:
-        raise TypeError("quantization_method should be either a string or a list")
-
+        raise TypeError("Unsloth: quantization_method can only be a string or a list of strings")
+    pass
+
     # Check if bfloat16 is supported
     if model_dtype == "bf16" and not torch.cuda.is_bf16_supported():
         logger.warning(
@@ -875,7 +875,7 @@ def save_to_gguf(
     pass
 
     # Check I quants
-    for quant_method in quantization_method_list:
+    for quant_method in quantization_method:
         if quant_method.startswith("iq2"):
             raise RuntimeError("Unsloth: Currently iq2 type quantizations aren't supported yet - sorry!")
     pass
@@ -890,7 +890,7 @@ def save_to_gguf(
 
     # Map quant methods
     new_quantization_method = []
-    for quant_method in quantization_method_list:
+    for quant_method in quantization_method:
         if quant_method == "not_quantized": quantization_method = model_dtype
         elif quant_method == "fast_quantized": quantization_method = "q8_0"
         elif quant_method == "quantized": quantization_method = "q4_k_m"
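
Review note on patches 2/5 through 5/5: the underlying problem is that `list("q4_k_m")` splits a string into single characters, so a bare string passed as `quantization_method` never matched any quant name. The behaviour the series ends up with can be sketched in isolation as below. This is only an illustration under assumptions: `normalize_quantization_method` is a hypothetical helper name and does not exist in `unsloth/save.py`, where the checks are written inline in `save_to_gguf`.

```python
def normalize_quantization_method(quantization_method):
    """Return quantization_method as a list of strings.

    Hypothetical standalone helper mirroring the inline checks that
    PATCH 5/5 leaves in save_to_gguf; not part of the unsloth API.
    """
    if isinstance(quantization_method, str):
        # Wrap rather than call list(): list("q8_0") yields
        # ['q', '8', '_', '0'], which is the bug PATCH 2/5 addresses.
        return [quantization_method]
    if isinstance(quantization_method, (list, tuple)):
        # Copy so that callers keep their original object untouched.
        return list(quantization_method)
    raise TypeError(
        "Unsloth: quantization_method can only be a string or a list of strings"
    )


if __name__ == "__main__":
    print(normalize_quantization_method("q4_k_m"))
    print(normalize_quantization_method(("q8_0", "q4_k_m")))
```

Unlike the inline version, this sketch returns a copy of a list argument instead of reusing it; that is a design choice of the example, not something the patch does.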