Conversation
danielhanchen
left a comment
Great work again! Just some comments :)
unsloth/models/granite.py
Outdated
| Q = Q.transpose(1, 2)
| K = K.transpose(1, 2)
| V = V.transpose(1, 2)
| sw = getattr(self.config, "sliding_window", None)
@Datta0 Is sliding window attention necessary for Granite?
Datta0 replied:
Um, not necessary. Removing it.
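For context, sliding window attention just restricts each query to the most recent `sliding_window` keys on top of the usual causal mask. A minimal sketch of the masking (the helper name is illustrative, not from the PR):

```python
import torch

def sliding_window_causal_mask(seq_len, sliding_window=None):
    # Causal mask: query i may attend to keys j <= i.
    mask = torch.ones(seq_len, seq_len, dtype=torch.bool).tril()
    if sliding_window is not None:
        # SWA: additionally require j >= i - (sliding_window - 1),
        # i.e. only the last `sliding_window` keys are visible.
        mask &= torch.ones(seq_len, seq_len, dtype=torch.bool).triu(
            diagonal=-(sliding_window - 1)
        )
    return mask

m = sliding_window_causal_mask(5, sliding_window=2)
```

With `sliding_window=None` this degenerates to plain causal attention, which is why dropping SWA for Granite is harmless here.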
| if past_key_value is not None:
|     kv_seq_len += past_key_value[0].shape[-2]
|
| assert position_embeddings is not None
I remember you said we must pass in the position embeddings - did we calculate the cos and sine matrices in RoPE incorrectly?
Datta0 replied:
Oh, this is just validation. We calculate the sin and cos here and pass them on.
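For reference, a minimal sketch of how rotary cos/sin tables are typically built before being passed down as `position_embeddings` (illustrative, not the PR's exact code):

```python
import torch

def rope_cos_sin(seq_len, head_dim, base=10000.0):
    # Inverse frequencies over the even dimensions.
    inv_freq = 1.0 / (base ** (torch.arange(0, head_dim, 2).float() / head_dim))
    t = torch.arange(seq_len).float()
    freqs = torch.outer(t, inv_freq)         # (seq_len, head_dim // 2)
    emb = torch.cat((freqs, freqs), dim=-1)  # (seq_len, head_dim)
    return emb.cos(), emb.sin()

cos, sin = rope_cos_sin(seq_len=8, head_dim=64)
# These (cos, sin) are what the decoder layer's assert checks for.
```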
| pass
|
| def GraniteDecoderLayer_fast_forward(
Can we inherit from LlamaDecoderLayer_fast_forward? [Actually scratch that - I forgot Granite has a residual multiplier]
I'm assuming it's because of position_embeddings.
unsloth/models/granite.py
Outdated
| use_cache=use_cache,
| padding_mask=padding_mask,
| position_embeddings = position_embeddings,
| _flag_for_generation=True,
I don't think flagging it for generation is a good idea - we have to set this dynamically.
Datta0 replied:
Oh, this is inspired by gemma2. Should we set it to what we see in the config?
unsloth/models/granite.py
Outdated
| Vn = self.paged_attention_V[:kv_seq_len].permute(1, 2, 0, 3)
|
| # Handle sliding windows
| sliding_window = self.config.sliding_window if hasattr(self.config, "sliding_window") else self.config.max_position_embeddings
Is SWA necessary in Granite?
unsloth/models/granite.py
Outdated
| do_prefill = not hasattr(decoder_layer.self_attn, "paged_attention"),
| position_embeddings = position_embeddings,
| )
| hidden_states = residual + hidden_states * self.config.residual_multiplier
Technically we could use addmm to fuse this entirely into 1 op
Datta0 replied:
I resorted to using torch.add since we don't have any matmul here. Thanks for the suggestion :)
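For reference, `torch.add` takes an `alpha` multiplier on its second operand, so the scale-and-add collapses into a single op (values below are illustrative):

```python
import torch

residual = torch.randn(2, 4, 8)
hidden_states = torch.randn(2, 4, 8)
residual_multiplier = 0.22  # illustrative; the real value comes from the config

# One fused op: residual + hidden_states * residual_multiplier
out = torch.add(residual, hidden_states, alpha=residual_multiplier)
```

This matches `residual + hidden_states * self.config.residual_multiplier` from the quoted diff while avoiding the separate elementwise multiply.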
| @staticmethod
| def post_patch(model):
We can ignore this I think (if it's a copy from Llama) - it should auto inherit it (I think)
Datta0 replied:
Correct me if I'm wrong, but wouldn't tied word embeddings mandate handling this separately?
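On tied word embeddings: when `tie_word_embeddings=True`, the LM head shares its weight tensor with the input embedding, which is why a post-patch step may need to treat it specially. A minimal sketch (names illustrative):

```python
import torch.nn as nn

vocab_size, hidden = 128, 16
embed_tokens = nn.Embedding(vocab_size, hidden)
lm_head = nn.Linear(hidden, vocab_size, bias=False)

# Tie the weights: head and embedding now point at the same Parameter,
# so patching / offloading one silently affects the other.
lm_head.weight = embed_tokens.weight
tied = lm_head.weight.data_ptr() == embed_tokens.weight.data_ptr()
```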
| IS_GEMMA = self.config.model_type.startswith("gemma")
| IS_GEMMA2 = self.config.model_type.startswith("gemma2")
| IS_COHERE = self.config.model_type.startswith("cohere")
| IS_GRANITE = self.config.model_type.startswith("granite")
Fix up spacing to make all the equal signs spaced evenly :)
| pass
|
| if IS_GRANITE:
So this is really a must?
Datta0 replied:
Yeah, IIRC Granite's forward calculates it here and passes it on; without this it throws an error (I don't remember exactly which error, unfortunately).
| logit_scaling = getattr(self.config, "logit_scale", 0)
| if self.config.model_type == "granite":
|     # granite uses logit_scaling as key and they divide by the scale unlike cohere
|     logit_scaling = 1 / getattr(self.config, "logits_scaling", 1)
Oh interesting - can you confirm it's not Cohere-style logit scaling? Thanks :)
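To make the direction concrete: Cohere multiplies logits by `logit_scale`, while Granite divides by `logits_scaling`, so storing the reciprocal lets a single multiply path serve both (values below are illustrative):

```python
import torch

logits = torch.randn(2, 5)

cohere_logit_scale = 0.125     # illustrative config value
granite_logits_scaling = 8.0   # illustrative config value

cohere_out = logits * cohere_logit_scale       # Cohere: multiply by scale
granite_out = logits / granite_logits_scaling  # Granite: divide by scale

# Dividing by s equals multiplying by 1/s, hence
# logit_scaling = 1 / config.logits_scaling in the quoted diff.
reciprocal_out = logits * (1.0 / granite_logits_scaling)
```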
Thanks again! I'll do some testing on my side and might change some parts! Will add this into nightly!
* Granite support (#1218)
  * [WIP] Support for Granite
  * Fixup inference
  * Cleanup flex attention
  * remove sliding window
  * Use torch.add for residual multiplier
  * Llama 3.3

Co-authored-by: Edd <68678137+Erland366@users.noreply.github.com>
Co-authored-by: Datta Nimmaturi <datta.nimmaturi@nutanix.com>
Co-authored-by: Edwin Fennell <edwinfennell1@gmail.com>
Co-authored-by: root <root@ieeres.chu.cam.ac.uk>
Co-authored-by: Uday Girish Maradana <einsteingirish@gmail.com>
Co-authored-by: cell-dame <122996026+dame-cell@users.noreply.github.com>
Co-authored-by: Zewen Shen <zewen.public@gmail.com>
Co-authored-by: Michael Han <107991372+shimmyshimmer@users.noreply.github.com>
<68678137+Erland366@users.noreply.github.com> Co-authored-by: Datta Nimmaturi <datta.nimmaturi@nutanix.com> Co-authored-by: Edwin Fennell <edwinfennell1@gmail.com> Co-authored-by: root <root@ieeres.chu.cam.ac.uk> Co-authored-by: Uday Girish Maradana <einsteingirish@gmail.com> Co-authored-by: cell-dame <122996026+dame-cell@users.noreply.github.com> * Update README.md Unsloth Dynamic 4-bit Quantization Update * Fix vision model tokenizer padding side. * Update vision.py --------- Co-authored-by: Daniel Han <danielhanchen@gmail.com> Co-authored-by: Edd <68678137+Erland366@users.noreply.github.com> Co-authored-by: Datta Nimmaturi <datta.nimmaturi@nutanix.com> Co-authored-by: Edwin Fennell <edwinfennell1@gmail.com> Co-authored-by: root <root@ieeres.chu.cam.ac.uk> Co-authored-by: Uday Girish Maradana <einsteingirish@gmail.com> Co-authored-by: cell-dame <122996026+dame-cell@users.noreply.github.com> Co-authored-by: Michael Han <107991372+shimmyshimmer@users.noreply.github.com> * Add citation section to README.md (#1377) * Add citation section to README.md * Update README.md --------- Co-authored-by: Daniel Han <danielhanchen@gmail.com> * Granite support (#1218) * [WIP] Support for Granite * Fixup inference * Cleanup flex attention * remove sliding window * Use torch.add for residual multiplier * Llama 3.3 * Update llama.py * Update llama.py * fullgraph * Fix loader.py to work on Windows (#1453) * Update README.md Llama 3.3 + Reddit * Update README.md Apple ML Cross Entropy * Update README.md Removing double citation * Fix loader.py to work on Windows --------- Co-authored-by: Michael Han <107991372+shimmyshimmer@users.noreply.github.com> * Update save.py warning message (#1425) * Update README.md Llama 3.3 + Reddit * Update README.md Apple ML Cross Entropy * Update README.md Removing double citation * Update save.py warning message --------- 
Co-authored-by: Michael Han <107991372+shimmyshimmer@users.noreply.github.com> * Change _fix_chat_template in case a template has both endif and endfor (#1388) * Update llama and derivatives to pass position embeddings explicitly for transformers v4.47+ (#1442) * Update save.py * Update llama.py * Update mistral.py * Update llama.py * Update llama.py * Update llama.py * Update llama.py * Update llama.py * Update llama.py * Temp fix * Update _utils.py --------- Co-authored-by: Edd <68678137+Erland366@users.noreply.github.com> Co-authored-by: Datta Nimmaturi <datta.nimmaturi@nutanix.com> Co-authored-by: Edwin Fennell <edwinfennell1@gmail.com> Co-authored-by: root <root@ieeres.chu.cam.ac.uk> Co-authored-by: Uday Girish Maradana <einsteingirish@gmail.com> Co-authored-by: cell-dame <122996026+dame-cell@users.noreply.github.com> Co-authored-by: Zewen Shen <zewen.public@gmail.com> Co-authored-by: Michael Han <107991372+shimmyshimmer@users.noreply.github.com> Co-authored-by: Scott Phillips <polygonguru@gmail.com> Co-authored-by: qingy1337 <qxli2@students.everettcc.edu> Co-authored-by: Giulia Baldini <44327645+giuliabaldini@users.noreply.github.com>
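The Granite support commits above mention using `torch.add` for the residual multiplier (Granite scales each block's output before adding it back to the residual stream, unlike Llama's plain addition). A minimal sketch of that pattern, assuming a scalar `residual_multiplier` like the field in Granite's config:

```python
import torch

def apply_residual(residual, hidden_states, residual_multiplier):
    # Granite scales the attention/MLP output by a residual multiplier
    # before adding it back to the residual stream. torch.add's `alpha`
    # argument fuses the scale into the addition in one op.
    return torch.add(residual, hidden_states, alpha = residual_multiplier)
```

This is an illustrative sketch, not the exact unsloth implementation; the real decoder layer applies it once after attention and once after the MLP.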
* Update _utils.py * Update pyproject.toml * Name Error Bug Fix - import from packaging.version import Version (#1468) * Version * Update pyproject.toml * Update pyproject.toml * Version * Update pyproject.toml * Update pyproject.toml * dependencies * Update pyproject.toml * Update pyproject.toml * Update pyproject.toml * Update pyproject.toml * Update mistral.py * Update pyproject.toml * Update pyproject.toml * Update pyproject.toml * Update granite.py * Update cohere.py * Triton windows * Update gemma2.py * Update pyproject.toml * Update _utils.py * Update pyproject.toml --------- Co-authored-by: Yonghye Kwon <developer.0hye@gmail.com>
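The log above also records the fix "cast logits to float32 in cross_entropy_forward" (#1254). A hedged sketch of the idea, using the standard `torch.nn.functional` API rather than the Triton kernel the repo actually patches:

```python
import torch
import torch.nn.functional as F

def cross_entropy_fp32(logits, labels):
    # Half-precision logits can overflow or lose precision in the
    # log-sum-exp inside cross entropy, so cast to float32 first.
    # The kernel fix in #1254 applies the same cast before reduction.
    return F.cross_entropy(logits.float(), labels)
```

Only the cast is the point here; the production code does this inside a fused Triton kernel for speed.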
Cross Entropy * Update llama.py * Update llama.py * Update llama.py * Update __init__.py * Update __init__.py * Update _utils.py * Update _utils.py * Update _utils.py * Update _utils.py * Update _utils.py * Update _utils.py * Update _utils.py * Update _utils.py * Update mapper.py * Update _utils.py * Update _utils.py * Update _utils.py * Update _utils.py * Update _utils.py * Update _utils.py * Update _utils.py * Update _utils.py * Update _utils.py * patch_fast_lora * vision * Update fast_lora.py * Update _utils.py * Update _utils.py * Vision * Update trainer.py * Update save.py * FastBaseVisionModel * Update loader_utils.py * Update vision.py * Update loader.py * Update vision.py * Update loader.py * Update vision.py * Update _utils.py * tokenizer_name * Update loader.py * Update vision.py * Update save.py * Update save.py * Update vision.py * Update vision.py * Update vision.py * Update vision.py * Update vision.py * Update vision.py * Update _utils.py * Update loader.py * kwargs * logits * Update llama.py * Update llama.py * Update llama.py * Update _utils.py * Update _utils.py * Update _utils.py * error * Update _utils.py * Update _utils.py * Update _utils.py * Update _utils.py * Update _utils.py * Update _utils.py * Update _utils.py * Update _utils.py * Update _utils.py * Update _utils.py * Update _utils.py * Update _utils.py * Update _utils.py * Update _utils.py * Update _utils.py * Update loader.py * Update llama.py * Update vision.py * Update loader.py * Old torch versions * Update loader.py * Update loader.py * prints * recheck * Update loader.py * Update loader.py * Update _utils.py * Update _utils.py * Update mapper.py * Feat/kto (#1316) * Add PatchKTOTrainer and update model imports * Update dpo.py * Update __init__.py * Delete unsloth/models/kto.py --------- Co-authored-by: Daniel Han <danielhanchen@gmail.com> * Fix orpo/dpo trainer (#1286) * change the colab notebook for dpo zephyr and orpo * use original tokenizer * Update README.md * Update README.md 
--------- Co-authored-by: Daniel Han <danielhanchen@gmail.com> * skip modules * Update vision.py * Update llama.py * Update llama.py * Update llama.py * Update llama.py * Update llama.py * Update llama.py * Update llama.py * Update llama.py * Update llama.py * Update llama.py * Update llama.py * Fix llama.cpp * Update save.py * Update save.py * Update vision.py * Update save.py * Update save.py * Update save.py * Update save.py * Update save.py * Update save.py * Update save.py * Update _utils.py * Update save.py * Update save.py * Update mapper.py * modules --------- Co-authored-by: Edd <68678137+Erland366@users.noreply.github.com> Co-authored-by: Datta Nimmaturi <datta.nimmaturi@nutanix.com> Co-authored-by: Edwin Fennell <edwinfennell1@gmail.com> Co-authored-by: root <root@ieeres.chu.cam.ac.uk> Co-authored-by: Uday Girish Maradana <einsteingirish@gmail.com> Co-authored-by: cell-dame <122996026+dame-cell@users.noreply.github.com> * Update README.md Unsloth Dynamic 4-bit Quantization Update * Fix vision model tokenizer padding side. 
* Update vision.py --------- Co-authored-by: Daniel Han <danielhanchen@gmail.com> Co-authored-by: Edd <68678137+Erland366@users.noreply.github.com> Co-authored-by: Datta Nimmaturi <datta.nimmaturi@nutanix.com> Co-authored-by: Edwin Fennell <edwinfennell1@gmail.com> Co-authored-by: root <root@ieeres.chu.cam.ac.uk> Co-authored-by: Uday Girish Maradana <einsteingirish@gmail.com> Co-authored-by: cell-dame <122996026+dame-cell@users.noreply.github.com> Co-authored-by: Michael Han <107991372+shimmyshimmer@users.noreply.github.com> * Add citation section to README.md (#1377) * Add citation section to README.md * Update README.md --------- Co-authored-by: Daniel Han <danielhanchen@gmail.com> * Granite support (#1218) * [WIP] Support for Granite * Fixup inference * Cleanup flex attention * remove sliding window * Use torch.add for residual multiplier * Llama 3.3 * Update llama.py * Update llama.py * fullgraph * Fix loader.py to work on Windows (#1453) * Update README.md Llama 3.3 + Reddit * Update README.md Apple ML Cross Entropy * Update README.md Removing double citation * Fix loader.py to work on Windows --------- Co-authored-by: Michael Han <107991372+shimmyshimmer@users.noreply.github.com> * Update save.py warning message (#1425) * Update README.md Llama 3.3 + Reddit * Update README.md Apple ML Cross Entropy * Update README.md Removing double citation * Update save.py warning message --------- Co-authored-by: Michael Han <107991372+shimmyshimmer@users.noreply.github.com> * Change _fix_chat_template in case a template has both endif and endfor (#1388) * Update llama and derivatives to pass position embeddings explicitly for transformers v4.47+ (#1442) * Update save.py * Update llama.py * Update mistral.py * Update llama.py * Update llama.py * Update llama.py * Update llama.py * Update llama.py * Update llama.py * Temp fix * Update _utils.py * Update _utils.py * 
Update pyproject.toml * Name Error Bug Fix - import from packaging.version import Version (#1468) * Version * Update pyproject.toml * Update pyproject.toml * Version * Update pyproject.toml * Update pyproject.toml * dependencies * Update pyproject.toml * Update pyproject.toml * Update pyproject.toml * Update pyproject.toml * Update mistral.py * Update pyproject.toml * Update pyproject.toml * Update pyproject.toml * Update granite.py * Update cohere.py * Triton windows * Update gemma2.py * Update pyproject.toml * Update _utils.py * Update pyproject.toml * Residual & LoRA * Update loader.py * Update loader.py * Update loader.py * Update loader.py * Bug fix * Update loader.py * Update loader.py * Update loader.py * Update _utils.py * Update loader.py --------- Co-authored-by: Edd <68678137+Erland366@users.noreply.github.com> Co-authored-by: Datta Nimmaturi <datta.nimmaturi@nutanix.com> Co-authored-by: Edwin Fennell <edwinfennell1@gmail.com> Co-authored-by: root <root@ieeres.chu.cam.ac.uk> Co-authored-by: Uday Girish Maradana <einsteingirish@gmail.com> Co-authored-by: cell-dame <122996026+dame-cell@users.noreply.github.com> Co-authored-by: Zewen Shen <zewen.public@gmail.com> Co-authored-by: Michael Han <107991372+shimmyshimmer@users.noreply.github.com> Co-authored-by: Scott Phillips <polygonguru@gmail.com> Co-authored-by: qingy1337 <qxli2@students.everettcc.edu> Co-authored-by: Giulia Baldini <44327645+giuliabaldini@users.noreply.github.com> Co-authored-by: Yonghye Kwon <developer.0hye@gmail.com>
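One Granite-specific detail raised in the review (and in the "Use torch.add for residual multiplier" commit) is that Granite scales each block's output by `config.residual_multiplier` before the residual add. A minimal sketch of that pattern, assuming a standalone helper — `add_residual` is an illustrative name, not Unsloth's actual API:

```python
# Hedged sketch of Granite's scaled residual connection.
# torch.add with alpha performs residual + alpha * block_out in one call,
# avoiding the intermediate tensor that `residual + m * block_out` allocates.
import torch

def add_residual(residual: torch.Tensor, block_out: torch.Tensor,
                 residual_multiplier: float) -> torch.Tensor:
    return torch.add(residual, block_out, alpha=residual_multiplier)

x   = torch.ones(2, 3)
out = add_residual(x, torch.full((2, 3), 2.0), residual_multiplier=0.5)
# each element: 1 + 0.5 * 2 = 2
```

For plain Llama-style models `residual_multiplier` is effectively 1, which is why the decoder layer could not simply inherit `LlamaDecoderLayer_fast_forward`.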
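The review also questions the line `sw = getattr(self.config, "sliding_window", None)`. The pattern exists so the same attention path works whether or not a config defines the attribute (Granite's does not, so the code falls back to `max_position_embeddings`). A hedged sketch of that lookup, with `SimpleNamespace` standing in for a real Hugging Face config object:

```python
# Illustrative sliding-window fallback, mirroring the diff's getattr pattern.
from types import SimpleNamespace

def effective_window(config) -> int:
    # getattr returns None when the config has no sliding_window attribute.
    sw = getattr(config, "sliding_window", None)
    return sw if sw is not None else config.max_position_embeddings

granite_like = SimpleNamespace(max_position_embeddings=4096)
mistral_like = SimpleNamespace(sliding_window=2048, max_position_embeddings=32768)

assert effective_window(granite_like) == 4096   # falls back: no SWA
assert effective_window(mistral_like) == 2048   # uses the sliding window
```

Since Granite never sets `sliding_window`, the branch was dead code there, which is why the review concluded it could be removed.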
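The #1254 fix in the changelog casts half-precision logits to float32 before computing the loss, so the softmax/log-sum-exp math runs at full precision. A minimal sketch of the idea using plain `F.cross_entropy` — not Unsloth's fused Triton kernel:

```python
# Hedged sketch of the #1254 fix: upcast fp16/bf16 logits to float32
# for the loss computation only; model activations stay half precision.
import torch
import torch.nn.functional as F

def cross_entropy_fp32(logits: torch.Tensor, labels: torch.Tensor) -> torch.Tensor:
    return F.cross_entropy(logits.float(), labels)

logits = torch.randn(4, 10, dtype=torch.float16)
labels = torch.randint(0, 10, (4,))
loss   = cross_entropy_fp32(logits, labels)
```

The cast is cheap relative to the matmul that produced the logits, and it prevents overflow in the exponentials that fp16 cannot represent.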