Ollama by danielhanchen · Pull Request #665 · unslothai/unsloth

danielhanchen · 2024-06-18T14:25:42Z

No description provided.

…ops pipelines. (#623)

* Update llama.py
* offload
* Update llama.py
* Update llama.py
* Update llama.py
* Update llama.py
* Update llama.py
* Update llama.py
* Update llama.py
* continued pretraining trainer
* Update trainer.py
* Update trainer.py
* Update trainer.py
* Update trainer.py
* is_bfloat16_supported
* Update __init__.py
* Update README.md
* Update llama.py
* is_bfloat16_supported
* Update __init__.py
* Mistral v3
* Phi 3 medium
* Update chat_templates.py
* Update chat_templates.py
* Phi-3
* Update save.py
* Update README.md
Mistral v3 to Mistral v0.3
* Untrained tokens
* Update tokenizer_utils.py
* Update tokenizer_utils.py
* Update tokenizer_utils.py
* Update tokenizer_utils.py
* Update tokenizer_utils.py
* Update tokenizer_utils.py
* Update tokenizer_utils.py
* Update tokenizer_utils.py
* Update tokenizer_utils.py
* Update tokenizer_utils.py
* Update tokenizer_utils.py
* Update tokenizer_utils.py
* Update tokenizer_utils.py
* Update tokenizer_utils.py
* Update tokenizer_utils.py
* Update tokenizer_utils.py
* Update tokenizer_utils.py
* Update tokenizer_utils.py
* Update tokenizer_utils.py
* Update llama.py
* Update tokenizer_utils.py
* Update tokenizer_utils.py
* Update tokenizer_utils.py
* Update tokenizer_utils.py
* Update save.py
* Update save.py
* Update save.py
* checkpoint
* Update _utils.py
* Update tokenizer_utils.py
* Update tokenizer_utils.py
* Update tokenizer_utils.py
* Update llama.py
* accelerate
* Update _utils.py
* Update _utils.py
* Update _utils.py
* Update _utils.py
* Update _utils.py
* Update _utils.py
* Update _utils.py
* Update tokenizer_utils.py
* train_dataloader
* Update llama.py
* Update llama.py
* Update llama.py
* use_fast_convert
* Update save.py
* Update save.py
* Update save.py
* Update save.py
* remove_special_tokens
* Ollama
* Update chat_templates.py
* Update chat_templates.py
* Update chat_templates.py
* Update llama.py
* Update chat_templates.py
* Support bfloat16 GGUF
* Update save.py
* Update llama.py
* fast_forward_inference
* Update mapper.py
* Update loader.py
* Update llama.py
* Update tokenizer_utils.py
* info
* edits
* Create chat template
* Fix tokenizer
* Update tokenizer_utils.py
* fix case where gguf saving fails due to first_conversion dtype (unslothai#630)
* Support revision parameter in FastLanguageModel.from_pretrained (unslothai#629)
* support `revision` parameter
* match unsloth formatting of named parameters
* clears any selected_adapters before calling internal_model.save_pretrained (unslothai#609)
* Update __init__.py (unslothai#602)
Check for incompatible modules before importing unsloth
* Fixed unsloth/tokenizer_utils.py for chat training (unslothai#604)
* Add GGML saving option to Unsloth for easier Ollama model creation and testing. (unslothai#345)
* Add save to llama.cpp GGML to save.py.
* Fix conversion command and path of convert to GGML function.
* Add autosaving lora to the GGML function
* Create lora save function for conversion to GGML
* Test fix unslothai#2 for saving lora
* Test fix unslothai#3 to save the lora adapters to convert to GGML
* Remove unwated tokenizer saving for conversion to ggml and added a few print statements.
* Needed tokenizer for saving, added it back, also made it more unslothy style by having positional arguments, and added a few messages.
* Positional arguments didn't work out, so reverted to older version of the code, and added a few comments.
* Test fix 1 for arch
* Test fix 2 new Mistral error.
* Test fix 3
* Revert to old version for testing.
* Upload issue test fix 1
* Fix 2 uploading ggml
* Positional ags added.
* Temporray remove positional args
* Fix upload again!!!
* Add print statements and fix link
* Make the calling name better
* Create local saving for GGML
* Add choosing directory to save local GGML.
* Fix lil variable error in the save_to_custom_dir func
* docs: Add LoraConfig parameters documentation (unslothai#619)
* llama.cpp failing (unslothai#371)
llama.cpp is failing to generate quantize versions for the trained models.
Error:
```bash
You might have to compile llama.cpp yourself, then run this again.
You do not need to close this Python program. Run the following commands in a new terminal:
You must run this in the same folder as you're saving your model.
git clone https://github.com/ggerganov/llama.cpp
cd llama.cpp && make clean && LLAMA_CUDA=1 make all -j
Once that's done, redo the quantization.
```
But when i do clone this with recursive it works.
Co-authored-by: Daniel Han <danielhanchen@gmail.com>
* fix libcuda_dirs import for triton 3.0 (unslothai#227)
* fix libcuda_dirs import for triton 3.0
* Update __init__.py
* Update __init__.py
---------
Co-authored-by: Daniel Han <danielhanchen@gmail.com>
* Update save.py
* Update __init__.py
* Update fast_lora.py
* Update save.py
* Update save.py
* Update save.py
* Update loader.py
* Update save.py
* Update save.py
* quantize now llama-quantize
* Update chat_templates.py
* Update loader.py
* Update mapper.py
* Update __init__.py
* embedding size
* Update qwen2.py
* docs
* Update README.md
* Update qwen2.py
* README: Fix minor typo. (unslothai#559)
* README: Fix minor typo.
One-character typo fix while reading.
* Update README.md
---------
Co-authored-by: Daniel Han <danielhanchen@gmail.com>
* Update mistral.py
* Update qwen2.py
* Update qwen2.py
* Update qwen2.py
* Update llama.py
* Update llama.py
* Update llama.py
* Update README.md
* FastMistralModel
* Update mistral.py
* Update mistral.py
* Update mistral.py
* Update mistral.py
* Update mistral.py
* Auto check rope scaling
* Update llama.py
* Update llama.py
* Update llama.py
* GPU support
* Typo
* Update gemma.py
* gpu
* Multiple GGUF saving
* Update save.py
* Update save.py
* check PEFT and base
* Update llama.py
* Update llama.py
* Update llama.py
* Update llama.py
* Update llama.py
* Update chat_templates.py
* Fix breaking bug in save.py with interpreting quantization_method as a string when saving to gguf (unslothai#651)
* Nightly (unslothai#649)
* Update llama.py
* offload
* Update llama.py
* Update llama.py
* Update llama.py
* Update llama.py
* Update llama.py
* Update llama.py
* Update llama.py
* continued pretraining trainer
* Update trainer.py
* Update trainer.py
* Update trainer.py
* Update trainer.py
* is_bfloat16_supported
* Update __init__.py
* Update README.md
* Update llama.py
* is_bfloat16_supported
* Update __init__.py
* Mistral v3
* Phi 3 medium
* Update chat_templates.py
* Update chat_templates.py
* Phi-3
* Update save.py
* Update README.md
Mistral v3 to Mistral v0.3
* Untrained tokens
* Update tokenizer_utils.py
* Update tokenizer_utils.py
* Update tokenizer_utils.py
* Update tokenizer_utils.py
* Update tokenizer_utils.py
* Update tokenizer_utils.py
* Update tokenizer_utils.py
* Update tokenizer_utils.py
* Update tokenizer_utils.py
* Update tokenizer_utils.py
* Update tokenizer_utils.py
* Update tokenizer_utils.py
* Update tokenizer_utils.py
* Update tokenizer_utils.py
* Update tokenizer_utils.py
* Update tokenizer_utils.py
* Update tokenizer_utils.py
* Update tokenizer_utils.py
* Update tokenizer_utils.py
* Update llama.py
* Update tokenizer_utils.py
* Update tokenizer_utils.py
* Update tokenizer_utils.py
* Update tokenizer_utils.py
* Update save.py
* Update save.py
* Update save.py
* checkpoint
* Update _utils.py
* Update tokenizer_utils.py
* Update tokenizer_utils.py
* Update tokenizer_utils.py
* Update llama.py
* accelerate
* Update _utils.py
* Update _utils.py
* Update _utils.py
* Update _utils.py
* Update _utils.py
* Update _utils.py
* Update _utils.py
* Update tokenizer_utils.py
* train_dataloader
* Update llama.py
* Update llama.py
* Update llama.py
* use_fast_convert
* Update save.py
* Update save.py
* Update save.py
* Update save.py
* remove_special_tokens
* Ollama
* Update chat_templates.py
* Update chat_templates.py
* Update chat_templates.py
* Update llama.py
* Update chat_templates.py
* Support bfloat16 GGUF
* Update save.py
* Update llama.py
* fast_forward_inference
* Update mapper.py
* Update loader.py
* Update llama.py
* Update tokenizer_utils.py
* info
* edits
* Create chat template
* Fix tokenizer
* Update tokenizer_utils.py
* fix case where gguf saving fails due to first_conversion dtype (unslothai#630)
* Support revision parameter in FastLanguageModel.from_pretrained (unslothai#629)
* support `revision` parameter
* match unsloth formatting of named parameters
* clears any selected_adapters before calling internal_model.save_pretrained (unslothai#609)
* Update __init__.py (unslothai#602)
Check for incompatible modules before importing unsloth
* Fixed unsloth/tokenizer_utils.py for chat training (unslothai#604)
* Add GGML saving option to Unsloth for easier Ollama model creation and testing. (unslothai#345)
* Add save to llama.cpp GGML to save.py.
* Fix conversion command and path of convert to GGML function.
* Add autosaving lora to the GGML function
* Create lora save function for conversion to GGML
* Test fix unslothai#2 for saving lora
* Test fix unslothai#3 to save the lora adapters to convert to GGML
* Remove unwated tokenizer saving for conversion to ggml and added a few print statements.
* Needed tokenizer for saving, added it back, also made it more unslothy style by having positional arguments, and added a few messages.
* Positional arguments didn't work out, so reverted to older version of the code, and added a few comments.
* Test fix 1 for arch
* Test fix 2 new Mistral error.
* Test fix 3
* Revert to old version for testing.
* Upload issue test fix 1
* Fix 2 uploading ggml
* Positional ags added.
* Temporray remove positional args
* Fix upload again!!!
* Add print statements and fix link
* Make the calling name better
* Create local saving for GGML
* Add choosing directory to save local GGML.
* Fix lil variable error in the save_to_custom_dir func
* docs: Add LoraConfig parameters documentation (unslothai#619)
* llama.cpp failing (unslothai#371)
llama.cpp is failing to generate quantize versions for the trained models.
Error:
```bash
You might have to compile llama.cpp yourself, then run this again.
You do not need to close this Python program. Run the following commands in a new terminal:
You must run this in the same folder as you're saving your model.
git clone https://github.com/ggerganov/llama.cpp
cd llama.cpp && make clean && LLAMA_CUDA=1 make all -j
Once that's done, redo the quantization.
```
But when i do clone this with recursive it works.
Co-authored-by: Daniel Han <danielhanchen@gmail.com>
* fix libcuda_dirs import for triton 3.0 (unslothai#227)
* fix libcuda_dirs import for triton 3.0
* Update __init__.py
* Update __init__.py
---------
Co-authored-by: Daniel Han <danielhanchen@gmail.com>
* Update save.py
* Update __init__.py
* Update fast_lora.py
* Update save.py
* Update save.py
* Update save.py
* Update loader.py
* Update save.py
* Update save.py
* quantize now llama-quantize
* Update chat_templates.py
* Update loader.py
* Update mapper.py
* Update __init__.py
* embedding size
* Update qwen2.py
* docs
* Update README.md
* Update qwen2.py
* README: Fix minor typo. (unslothai#559)
* README: Fix minor typo.
One-character typo fix while reading.
* Update README.md
---------
Co-authored-by: Daniel Han <danielhanchen@gmail.com>
* Update mistral.py
* Update qwen2.py
* Update qwen2.py
* Update qwen2.py
* Update llama.py
* Update llama.py
* Update llama.py
* Update README.md
* FastMistralModel
* Update mistral.py
* Update mistral.py
* Update mistral.py
* Update mistral.py
* Update mistral.py
* Auto check rope scaling
* Update llama.py
* Update llama.py
* Update llama.py
* GPU support
* Typo
* Update gemma.py
* gpu
* Multiple GGUF saving
* Update save.py
* Update save.py
* check PEFT and base
* Update llama.py
* Update llama.py
* Update llama.py
* Update llama.py
* Update llama.py
* Update chat_templates.py
---------
Co-authored-by: Michael Han <107991372+shimmyshimmer@users.noreply.github.com>
Co-authored-by: Eliot Hall <60240707+chrehall68@users.noreply.github.com>
Co-authored-by: Rickard Edén <rickardeden@gmail.com>
Co-authored-by: XiaoYang <xyangk@gmail.com>
Co-authored-by: Oseltamivir <58582368+Oseltamivir@users.noreply.github.com>
Co-authored-by: mahiatlinux <110882203+mahiatlinux@users.noreply.github.com>
Co-authored-by: Sébastien De Greef <sebdg@binarycompany.com>
Co-authored-by: Alberto Ferrer <albertof@barrahome.org>
Co-authored-by: Thomas Viehmann <tv.github-private@beamnet.de>
Co-authored-by: Walter Korman <lemurware@gmail.com>
* Fix bug in save.py with interpreting quantization_method as a string that prevents GGUF from saving
* Implemented better list management and then forgot to actually call the new list variable, fixed
* Check type of given quantization method and return type error if not list or string
* Update save.py
---------
Co-authored-by: Daniel Han <danielhanchen@gmail.com>
Co-authored-by: Michael Han <107991372+shimmyshimmer@users.noreply.github.com>
Co-authored-by: Eliot Hall <60240707+chrehall68@users.noreply.github.com>
Co-authored-by: Rickard Edén <rickardeden@gmail.com>
Co-authored-by: XiaoYang <xyangk@gmail.com>
Co-authored-by: Oseltamivir <58582368+Oseltamivir@users.noreply.github.com>
Co-authored-by: mahiatlinux <110882203+mahiatlinux@users.noreply.github.com>
Co-authored-by: Sébastien De Greef <sebdg@binarycompany.com>
Co-authored-by: Alberto Ferrer <albertof@barrahome.org>
Co-authored-by: Thomas Viehmann <tv.github-private@beamnet.de>
Co-authored-by: Walter Korman <lemurware@gmail.com>
* Revert "Fix breaking bug in save.py with interpreting quantization_method as …" (unslothai#652)
This reverts commit 506cb68.
* Revert "Revert "Fix breaking bug in save.py with interpreting quantization_me…" (unslothai#653)
This reverts commit 2f48cc9.
* Update llama.py
* peft
* patch
* Update loader.py
* retrain
* Update llama.py
* Update llama.py
* Update llama.py
* Update llama.py
* Update llama.py
* Update llama.py
* Update llama.py
* Update llama.py
* Update llama.py
* Update llama.py
* offload
* Update llama.py
* Create a starter script for command-line training to integrate in ML ops pipelines. (unslothai#623)
* Update chat_templates.py
* Ollama
* Update chat_templates.py
* Update chat_templates.py
* Update chat_templates.py
* Update chat_templates.py
* Update chat_templates.py
* Update chat_templates.py
* Update chat_templates.py
* Update chat_templates.py
* Update chat_templates.py
* Update chat_templates.py
---------
Co-authored-by: Michael Han <107991372+shimmyshimmer@users.noreply.github.com>
Co-authored-by: Eliot Hall <60240707+chrehall68@users.noreply.github.com>
Co-authored-by: Rickard Edén <rickardeden@gmail.com>
Co-authored-by: XiaoYang <xyangk@gmail.com>
Co-authored-by: Oseltamivir <58582368+Oseltamivir@users.noreply.github.com>
Co-authored-by: mahiatlinux <110882203+mahiatlinux@users.noreply.github.com>
Co-authored-by: Sébastien De Greef <sebdg@binarycompany.com>
Co-authored-by: Alberto Ferrer <albertof@barrahome.org>
Co-authored-by: Thomas Viehmann <tv.github-private@beamnet.de>
Co-authored-by: Walter Korman <lemurware@gmail.com>
Co-authored-by: ArcadaLabs-Jason <52756218+ArcadaLabs-Jason@users.noreply.github.com>
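Several entries above (the save.py fix for "interpreting quantization_method as a string", "Check type of given quantization method and return type error if not list or string", and "Multiple GGUF saving") concern accepting either a single quantization method or a list of them. A minimal sketch of such a check follows. `normalize_quantization_methods` is a hypothetical helper, not the actual save.py code, and the sketch assumes the original bug was a bare string being iterated character by character:

```python
from typing import List, Union


def normalize_quantization_methods(quantization_method: Union[str, List[str]]) -> List[str]:
    """Return the requested GGUF quantization methods as a list of strings.

    Hypothetical helper for illustration; not unsloth's actual save.py code.
    """
    if isinstance(quantization_method, str):
        # Assumed failure mode: looping over the bare string "q8_0" would
        # yield "q", "8", "_", "0" instead of one method name.
        return [quantization_method]
    if isinstance(quantization_method, list):
        for method in quantization_method:
            if not isinstance(method, str):
                raise TypeError(f"quantization methods must be strings, got {method!r}")
        # Copy so later changes do not mutate the caller's list.
        return list(quantization_method)
    raise TypeError(
        "quantization_method must be a str or a list of str, "
        f"got {type(quantization_method).__name__}"
    )


# Usage: one method and several methods both come back as a list.
print(normalize_quantization_methods("q4_k_m"))            # ['q4_k_m']
print(normalize_quantization_methods(["q4_k_m", "q8_0"]))  # ['q4_k_m', 'q8_0']
```

Returning a list in every case lets the saving loop iterate the result the same way whether one method or several were requested.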

* Fix num_logits_to_keep on transformers >= 4.51 + compile loss_function

Two follow-ups to the fused-forward work landed in unsloth-zoo PR #665.

1. unsloth_fast_generate (models/llama.py): transformers 4.51 renamed num_logits_to_keep to logits_to_keep. Previously we unconditionally set kwargs['num_logits_to_keep'] = 1, which transformers 4.57's _validate_model_kwargs rejects with:
ValueError: The following `model_kwargs` are not used by the model: ['num_logits_to_keep']
This blocked model.generate() on Llama / Mistral. Now we inspect the runtime forward signature and use whichever spelling it accepts; if a caller still passes the legacy name, we promote it to the new spelling instead of stripping it.

2. patch_loss_functions (models/loader.py): the single internal call site passed torch_compile=False. UnslothForCausalLMLoss is small (label shift + Triton CE), so torch.compile folds the elementwise prep into one launch and removes per-step Python overhead. The torch < 2.4 fallback inside patch_loss_functions still routes through torch._disable_dynamo, so older torch versions are unaffected.

Verified:
- Llama 3.2 1B + model.generate() no longer raises; emits a sensible 16-token continuation.
- Gemma3 1B GRPO smoke (max_steps=3) returns bit-identical losses 0.256 / 0.4393 / 0.2031 vs pre-fix; train_runtime 409s (vs 415s pre-fix, within noise).
- unsloth-zoo test_compiler_rewriter_exhaustive + test_fused_forward_install pass (96 passed) on this combination.

Related: unslothai/unsloth-zoo PR for the compiler.py single-matmul backport.

* Revert loader.py loss-compile flip; correct rename-version comment

Drop the patch_loss_functions(torch_compile=True) flip. Tracing the loss call chain:
UnslothForCausalLMLoss
-> unsloth_fixed_cross_entropy
-> _fast_cross_entropy_loss
-> Fast_CrossEntropyLoss.apply (torch.autograd.Function wrapping Triton)

torch.compile treats custom autograd.Function.apply as an opaque op and breaks the graph at the boundary. The only Python it can actually compile in the loss function is the label-shift + ignore-fill prep (three elementwise ops), and the per-call dynamo guard overhead is on the same order as that prep. An empirical Gemma3 1B GRPO smoke run (max_steps=3) showed no meaningful runtime delta (415s vs 409s, within noise), and the flip risked dragging the outer compiled training step into recompiles when the inner guards drift. Keep torch_compile=False; the Triton kernel is the work, and it is unchanged either way.

Also: the inline comment in unsloth_fast_generate said the kwarg rename landed in transformers 4.51. The actual decorator (@deprecate_kwarg) was tagged version="4.50" and present through 4.51.x, then removed in 4.52+. Correct the comment. No behaviour change.
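The signature check described in the first commit above can be sketched with the standard library's `inspect.signature`. This is a minimal, hypothetical version: `set_logits_to_keep` and the two stand-in `forward` functions are illustrative names, not unsloth or transformers code. It assumes the model's forward lists the kwarg as an explicit parameter; a forward that only accepts `**kwargs` matches neither spelling and is left alone:

```python
import inspect
from typing import Any, Callable, Dict

LEGACY_NAME = "num_logits_to_keep"  # spelling before the rename
NEW_NAME = "logits_to_keep"         # spelling after the rename


def set_logits_to_keep(forward: Callable[..., Any], kwargs: Dict[str, Any], value: int = 1) -> Dict[str, Any]:
    """Return a copy of kwargs using whichever spelling `forward` accepts.

    Hypothetical sketch of the approach described in the commit message.
    """
    params = inspect.signature(forward).parameters
    kwargs = dict(kwargs)  # work on a copy; leave the caller's dict untouched
    if NEW_NAME in params:
        if LEGACY_NAME in kwargs:
            # Promote the legacy name instead of stripping it; an explicit
            # new-style value, if the caller also passed one, wins.
            legacy_value = kwargs.pop(LEGACY_NAME)
            kwargs.setdefault(NEW_NAME, legacy_value)
        kwargs.setdefault(NEW_NAME, value)
    elif LEGACY_NAME in params:
        kwargs.setdefault(LEGACY_NAME, value)
    # If neither name is an explicit parameter, the copy is returned as-is.
    return kwargs


# Stand-in forward signatures, for illustration only.
def new_forward(input_ids=None, logits_to_keep=0, **kwargs): ...
def old_forward(input_ids=None, num_logits_to_keep=0, **kwargs): ...


print(set_logits_to_keep(new_forward, {}))                         # {'logits_to_keep': 1}
print(set_logits_to_keep(old_forward, {}))                         # {'num_logits_to_keep': 1}
print(set_logits_to_keep(new_forward, {"num_logits_to_keep": 4}))  # {'logits_to_keep': 4}
```

Checking the runtime signature, rather than comparing transformers version strings, avoids having to pin down exactly which release renamed the kwarg, which is the detail the second commit had to correct.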

danielhanchen added 30 commits May 19, 2024 16:22

Update llama.py

7df08c4

offload

ba5b6ce

Update llama.py

a07057e

Update llama.py

4be9063

Update llama.py

3dc3d3f

Update llama.py

f1cc1e8

Update llama.py

5cb531a

Update llama.py

6bd8e60

Update llama.py

d1d57ff

continued pretraining trainer

7470f67

Update trainer.py

da9c1a6

Update trainer.py

2c68f56

Update trainer.py

217bf9d

Update trainer.py

6e85384

is_bfloat16_supported

77f9c51

Update __init__.py

c0e1d27

Update README.md

2b23b93

Update llama.py

902e23a

Merge branch 'main' into nightly

98f41ce

is_bfloat16_supported

3193cac

Update __init__.py

dfeaf4b

Mistral v3

1e84090

Merge branch 'main' into nightly

f63f32b

Phi 3 medium

57ad8e7

Update chat_templates.py

2b994b2

Update chat_templates.py

ff8171f

Phi-3

5ca8b58

Merge branch 'main' into nightly

98c2e81

Merge branch 'main' into nightly

3817660

Merge branch 'main' into nightly

f858145

danielhanchen and others added 27 commits June 16, 2024 22:54

retrain

17a9fb3

Update llama.py

25cb17e

Update llama.py

3fba0f5

Update llama.py

798aa1e

Update llama.py

c6142d0

Update llama.py

7236dfc

Update llama.py

37f9abd

Update llama.py

7618197

Update llama.py

b4907f3

Update llama.py

2714a8b

Update llama.py

771a0d0

Merge branch 'main' into nightly

af88eda

offload

92c7d58

Update llama.py

2eacd2d

Create a starter script for command-line training to integrate in ML ops pipelines. (#623)

b957061

Update chat_templates.py

4dba6c5

Ollama

f2e4b83

Update chat_templates.py

a5367a1

Update chat_templates.py

85062ff

Update chat_templates.py

12bd3cf

Update chat_templates.py

4417417

Update chat_templates.py

be02d97

Update chat_templates.py

0a1ee7a

Update chat_templates.py

7195152

Update chat_templates.py

676b20b

Update chat_templates.py

89b7807

Update chat_templates.py

563afa9

danielhanchen merged commit 8770308 into main Jun 18, 2024


Ollama #665

danielhanchen merged 213 commits into main from nightly

danielhanchen commented Jun 18, 2024


12 participants
