2024 Release #96

Merged
danielhanchen merged 82 commits into main from nightly
Jan 18, 2024

Conversation

@danielhanchen

Member

No description provided.

danielhanchen added a commit to shimmyshimmer/unsloth-staging-4 that referenced this pull request Feb 6, 2024
commit 35f2ab4a8b4deecbbbe9fbd95f4efde8694233db
Author: Daniel Han <danielhanchen@gmail.com>
Date:   Sun Feb 4 17:35:56 2024 +1100

    2x faster inference (#151)

    * Update fast_lora.py

    * Update fast_lora.py

    * Update fast_lora.py

    * Update fast_lora.py

    * Update fast_lora.py

    * Update fast_lora.py

    * Update llama.py

    * Update fast_lora.py

    * Update llama.py

    * Update fast_lora.py

    * Update fast_lora.py

    * Update fast_lora.py

    * Update fast_lora.py

    * Update fast_lora.py

    * Update fast_lora.py

    * Update fast_lora.py

    * Update fast_lora.py

    * Update fast_lora.py

    * Update swiglu.py

    * Update fast_lora.py

    * Update swiglu.py

    * Update fast_lora.py

    * Update fast_lora.py

    * Update fast_lora.py

    * Update fast_lora.py

    * Update fast_lora.py

    * Update fast_lora.py

    * Update fast_lora.py

    * Update fast_lora.py

    * Update fast_lora.py

    * Update fast_lora.py

    * Update fast_lora.py

    * Update fast_lora.py

    * Update fast_lora.py

    * Update fast_lora.py

    * Update fast_lora.py

    * Update fast_lora.py

    * Update save.py

    * Update fast_lora.py

    * Update utils.py

    * Update llama.py

    * Update fast_lora.py

    * Update swiglu.py

    * Update save.py

    * Update save.py

    * Update llama.py

    * Update llama.py

    * Update llama.py

    * Update llama.py

    * Revert "Update llama.py"

    This reverts commit a208ec46e012cf470ecefe6268a66358215df7b6.

    * Update llama.py

    * Works?

    * Update pyproject.toml

    * Update fast_lora.py

    * Update fast_lora.py

    * Update fast_lora.py

    * Update fast_lora.py

    * Update fast_lora.py

    * Swiglu

    * Update swiglu.py

    * Update fast_lora.py

    * Update fast_lora.py

    * Update fast_lora.py

    * Update fast_lora.py

    * Update swiglu.py

    * Update fast_lora.py

    * Update fast_lora.py

    * Update fast_lora.py

    * Update fast_lora.py

    * Update fast_lora.py

    * Update fast_lora.py

    * Update fast_lora.py

    * Update fast_lora.py

    * attention_mask

    * Update llama.py

    * Update llama.py

    * labels

    * Update mistral.py

    * Update llama.py

    * attention mask

    * Update save.py

    * Update save.py

    * Update mistral.py

    * attention mask

    * Update llama.py

    * Update llama.py

    * Update mistral.py

    * Update llama.py

    * Update llama.py

    * Update llama.py

    * Update dpo.py

    * Patch saving

    * Update save.py

    * Update save.py

    * patch_saving_functions

    * Update save.py

    * Update save.py

    * Update save.py

    * Update save.py

    * Update save.py

    * Update save.py

    * Update save.py

    * Update save.py

    * print

    * Mistral patch

    * Update mistral.py

    * Update save.py

    * saving

    * Update llama.py

    * Update llama.py

    * Fast inference repatch

    * Update llama.py

    * Update utils.py

    * Update utils.py

    * Update utils.py

    * Update mistral.py

    * Update __init__.py

    * Fix inference

    * Update mistral.py

    * fast lm_head

    * Remove fast path

    * Update rope_embedding.py

    * Update loader.py

    * LlamaAttention_fast_forward_inference

    * if past_key_value is not None and q_len == 1:

    * revert inference

    * Update loader.py

    * past_key_value

    * Update llama.py

    * Update llama.py

    * Fix SDPA

    * Update llama.py

    * padding

    * Inference

    * Update llama.py

    * Revert

    * Update mistral.py

    * faster inference

    * inference

    * Update llama.py

    * Update llama.py

    * Update llama.py

    * Update llama.py

    * Update llama.py

    * Update llama.py

    * Update llama.py

    * Update llama.py

    * inference

    * Update llama.py

    * Update utils.py

    * faster inference

    * Update llama.py

    * revert

    * lm_head

    * Update llama.py

    * inference

    * Update llama.py

    * Update llama.py

    * Update llama.py

    * Update llama.py

    * Update llama.py

    * Update llama.py

    * Update llama.py

    * Update llama.py

    * Update llama.py

    * Update llama.py

    * Update mistral.py

    * Update llama.py

    * faster inference

    * Update llama.py

    * fast inference

    * Update llama.py

    * Update llama.py

    * Update mistral.py

    * Update llama.py

    * Update llama.py

    * Update llama.py

    * Update llama.py

    * torch compile

    * past_key_values

    * Update llama.py

    * Update llama.py

    * Update llama.py

    * Update llama.py

    * Update llama.py

    * Update llama.py

    * Update llama.py

    * Update llama.py

    * Update llama.py

    * Update llama.py

    * Update llama.py

    * Update llama.py

    * Update llama.py

    * Update llama.py

    * Update llama.py

    * Update utils.py

    * Update utils.py

    * Update utils.py

    * Update utils.py

    * Update llama.py

    * fast inference + saving config.json

    * Update llama.py

    * Update llama.py

    * Update llama.py

    * Update llama.py

    * Update llama.py

    * Update mistral.py

    * fast inference again

    * more temp matrices

    * Update llama.py

    * Update llama.py

    * Update llama.py

    * Update llama.py

    * Update llama.py

    * fast inference

    * Update mistral.py

    * Update llama.py

    * SDPA

    * attention_mask

    * New version

    * Update llama.py

    * Update llama.py

    * Update llama.py

    * Update llama.py

    * Update llama.py

    * Update llama.py

    * Update llama.py

    * Update llama.py

    * Update llama.py

    * Update llama.py

    * Update llama.py

    * Update llama.py

    * Update llama.py

    * Update llama.py

    * Update llama.py

    * Update llama.py

    * Update llama.py

    * Update llama.py

    * Update llama.py

    * Update llama.py

    * Update utils.py

    * Update utils.py

commit 051a73b0e63d3ae3acd7c4d962349280f69bbdb0
Author: Daniel Han <danielhanchen@gmail.com>
Date:   Wed Jan 31 04:03:37 2024 +1100

    Hotfix - fix inference (#146)

    * faster saving & inference

    * Update llama.py

    * Update save.py

    * Update llama.py

    * Update llama.py

    * Update llama.py

    * Update llama.py

    * Update mistral.py

    * Update llama.py

    * Update llama.py

    * Update llama.py

    * Update llama.py

    * Update llama.py

    * fast inference

    * Update llama.py

    * Update save.py

    * Update llama.py

    * Mistral correct RoPE scaling

    * Max sequence lengths

    * Apache 2

    * fast_linear_forward

    * Update utils.py

    * Update utils.py

    * No print

    * Update utils.py

    * Update utils.py

    * inference

    * Update llama.py

    * Fast inference RoPE

    * Update llama.py

    * Update llama.py

    * RoPE

    * Update llama.py

    * Update llama.py

    * Update llama.py

    * Update llama.py

    * Update llama.py

    * Update llama.py

    * Update llama.py

    * Update llama.py

    * LoRA

    * Fast LoRA saving

    * Update llama.py

    * hidden_states

    * q_len == 1

    * q_len issue

    * Update mistral.py

    * Update mistral.py

    * incorrect inference

    * Update to transformers 4.37

    * Graceful FA2 error + torch 2.1.1

    * Update mapper.py

    * Update pyproject.toml

    * Fix saving and bnb-4bit

    * Update fast_lora.py

    * Update fast_lora.py

    * remove patching

    * Update llama.py

    * Update llama.py

    * Update swiglu.py

    * Repatch

    * Update fast_lora.py

    * Update fast_lora.py

    * Update fast_lora.py

    * Update fast_lora.py

    * Update fast_lora.py

    * Update fast_lora.py

    * Update fast_lora.py

    * Update fast_lora.py

    * Update fast_lora.py

    * Update fast_lora.py

    * Update llama.py

    * Update fast_lora.py

    * Update llama.py

    * Update fast_lora.py

    * Update fast_lora.py

    * Update fast_lora.py

    * Update fast_lora.py

    * Update fast_lora.py

    * Update fast_lora.py

    * Update fast_lora.py

    * Update fast_lora.py

    * Update fast_lora.py

    * Update swiglu.py

    * Update fast_lora.py

    * Update swiglu.py

    * Update fast_lora.py

    * Update fast_lora.py

    * Update fast_lora.py

    * Update fast_lora.py

    * Update fast_lora.py

    * Update fast_lora.py

    * Update fast_lora.py

    * Update fast_lora.py

    * Update fast_lora.py

    * Update fast_lora.py

    * Update fast_lora.py

    * Update fast_lora.py

    * Update fast_lora.py

    * Update fast_lora.py

    * Update fast_lora.py

    * Update fast_lora.py

    * Update save.py

    * Update fast_lora.py

    * Update utils.py

    * Update llama.py

    * Update fast_lora.py

    * Update swiglu.py

    * Update save.py

    * Update save.py

    * Update llama.py

    * Update llama.py

    * Update llama.py

    * Update llama.py

    * Revert "Update llama.py"

    This reverts commit a208ec46e012cf470ecefe6268a66358215df7b6.

    * Update llama.py

    * Works?

    * Update pyproject.toml

    * Update fast_lora.py

    * Update fast_lora.py

    * Update fast_lora.py

    * Update fast_lora.py

    * Update fast_lora.py

    * Swiglu

    * Update swiglu.py

    * Update fast_lora.py

    * Update fast_lora.py

    * Update fast_lora.py

    * Update fast_lora.py

    * Update swiglu.py

    * Update fast_lora.py

    * Update fast_lora.py

    * Update fast_lora.py

    * Update fast_lora.py

    * Update fast_lora.py

    * Update fast_lora.py

    * Update fast_lora.py

    * Update fast_lora.py

    * attention_mask

    * Update llama.py

    * Update llama.py

    * labels

    * Update mistral.py

    * Update llama.py

    * attention mask

    * Update save.py

    * Update save.py

    * Update mistral.py

    * attention mask

    * Update llama.py

    * Update llama.py

    * Update mistral.py

    * Update llama.py

    * Update llama.py

    * Update llama.py

    * Update dpo.py

    * Patch saving

    * Update save.py

    * Update save.py

    * patch_saving_functions

    * Update save.py

    * Update save.py

    * Update save.py

    * Update save.py

    * Update save.py

    * Update save.py

    * Update save.py

    * Update save.py

    * print

    * Mistral patch

    * Update mistral.py

    * Update save.py

    * saving

    * Update llama.py

    * Update llama.py

    * Fast inference repatch

    * Update llama.py

    * Update utils.py

    * Update utils.py

    * Update utils.py

    * Update mistral.py

    * Update __init__.py

    * Fix inference

    * Update mistral.py

    * fast lm_head

    * Remove fast path

    * Update rope_embedding.py

    * Update loader.py

    * LlamaAttention_fast_forward_inference

    * if past_key_value is not None and q_len == 1:

    * revert inference

    * Update loader.py

    * past_key_value

commit 05624642802c7f90dcc7aeea0e1c8d447cde006e
Author: Daniel Han <danielhanchen@gmail.com>
Date:   Mon Jan 29 17:49:54 2024 +1100

    Fix inference attention mask (#142)

    * faster saving & inference

    * Update llama.py

    * Update save.py

    * Update llama.py

    * Update llama.py

    * Update llama.py

    * Update llama.py

    * Update mistral.py

    * Update llama.py

    * Update llama.py

    * Update llama.py

    * Update llama.py

    * Update llama.py

    * fast inference

    * Update llama.py

    * Update save.py

    * Update llama.py

    * Mistral correct RoPE scaling

    * Max sequence lengths

    * Apache 2

    * fast_linear_forward

    * Update utils.py

    * Update utils.py

    * No print

    * Update utils.py

    * Update utils.py

    * inference

    * Update llama.py

    * Fast inference RoPE

    * Update llama.py

    * Update llama.py

    * RoPE

    * Update llama.py

    * Update llama.py

    * Update llama.py

    * Update llama.py

    * Update llama.py

    * Update llama.py

    * Update llama.py

    * Update llama.py

    * LoRA

    * Fast LoRA saving

    * Update llama.py

    * hidden_states

    * q_len == 1

    * q_len issue

    * Update mistral.py

    * Update mistral.py

    * incorrect inference

    * Update to transformers 4.37

    * Graceful FA2 error + torch 2.1.1

    * Update mapper.py

    * Update pyproject.toml

    * Fix saving and bnb-4bit

    * Update fast_lora.py

    * Update fast_lora.py

    * remove patching

    * Update llama.py

    * Update llama.py

    * Update swiglu.py

    * Repatch

    * Update fast_lora.py

    * Update fast_lora.py

    * Update fast_lora.py

    * Update fast_lora.py

    * Update fast_lora.py

    * Update fast_lora.py

    * Update fast_lora.py

    * Update fast_lora.py

    * Update fast_lora.py

    * Update fast_lora.py

    * Update llama.py

    * Update fast_lora.py

    * Update llama.py

    * Update fast_lora.py

    * Update fast_lora.py

    * Update fast_lora.py

    * Update fast_lora.py

    * Update fast_lora.py

    * Update fast_lora.py

    * Update fast_lora.py

    * Update fast_lora.py

    * Update fast_lora.py

    * Update swiglu.py

    * Update fast_lora.py

    * Update swiglu.py

    * Update fast_lora.py

    * Update fast_lora.py

    * Update fast_lora.py

    * Update fast_lora.py

    * Update fast_lora.py

    * Update fast_lora.py

    * Update fast_lora.py

    * Update fast_lora.py

    * Update fast_lora.py

    * Update fast_lora.py

    * Update fast_lora.py

    * Update fast_lora.py

    * Update fast_lora.py

    * Update fast_lora.py

    * Update fast_lora.py

    * Update fast_lora.py

    * Update save.py

    * Update fast_lora.py

    * Update utils.py

    * Update llama.py

    * Update fast_lora.py

    * Update swiglu.py

    * Update save.py

    * Update save.py

    * Update llama.py

    * Update llama.py

    * Update llama.py

    * Update llama.py

    * Revert "Update llama.py"

    This reverts commit a208ec46e012cf470ecefe6268a66358215df7b6.

    * Update llama.py

    * Works?

    * Update pyproject.toml

    * Update fast_lora.py

    * Update fast_lora.py

    * Update fast_lora.py

    * Update fast_lora.py

    * Update fast_lora.py

    * Swiglu

    * Update swiglu.py

    * Update fast_lora.py

    * Update fast_lora.py

    * Update fast_lora.py

    * Update fast_lora.py

    * Update swiglu.py

    * Update fast_lora.py

    * Update fast_lora.py

    * Update fast_lora.py

    * Update fast_lora.py

    * Update fast_lora.py

    * Update fast_lora.py

    * Update fast_lora.py

    * Update fast_lora.py

    * attention_mask

    * Update llama.py

    * Update llama.py

    * labels

    * Update mistral.py

    * Update llama.py

    * attention mask

    * Update save.py

    * Update save.py

    * Update mistral.py

    * attention mask

    * Update llama.py

    * Update llama.py

    * Update mistral.py

    * Update llama.py

    * Update llama.py

    * Update llama.py

    * Update dpo.py

    * Patch saving

    * Update save.py

    * Update save.py

    * patch_saving_functions

    * Update save.py

    * Update save.py

    * Update save.py

    * Update save.py

    * Update save.py

    * Update save.py

    * Update save.py

    * Update save.py

    * print

    * Mistral patch

    * Update mistral.py

    * Update save.py

    * saving

    * Update llama.py

    * Update llama.py

commit 206a9b65f090bd71ccaad7dd88b67ba2bfde0b58
Author: Daniel Han <danielhanchen@gmail.com>
Date:   Mon Jan 29 03:45:07 2024 +1100

    Nightly (#140)

    * faster saving & inference

    * Update llama.py

    * Update save.py

    * Update llama.py

    * Update llama.py

    * Update llama.py

    * Update llama.py

    * Update mistral.py

    * Update llama.py

    * Update llama.py

    * Update llama.py

    * Update llama.py

    * Update llama.py

    * fast inference

    * Update llama.py

    * Update save.py

    * Update llama.py

    * Mistral correct RoPE scaling

    * Max sequence lengths

    * Apache 2

    * fast_linear_forward

    * Update utils.py

    * Update utils.py

    * No print

    * Update utils.py

    * Update utils.py

    * inference

    * Update llama.py

    * Fast inference RoPE

    * Update llama.py

    * Update llama.py

    * RoPE

    * Update llama.py

    * Update llama.py

    * Update llama.py

    * Update llama.py

    * Update llama.py

    * Update llama.py

    * Update llama.py

    * Update llama.py

    * LoRA

    * Fast LoRA saving

    * Update llama.py

    * hidden_states

    * q_len == 1

    * q_len issue

    * Update mistral.py

    * Update mistral.py

    * incorrect inference

    * Update to transformers 4.37

    * Graceful FA2 error + torch 2.1.1

    * Update mapper.py

    * Update pyproject.toml

    * Fix saving and bnb-4bit

    * Update fast_lora.py

    * Update fast_lora.py

    * remove patching

    * Update llama.py

    * Update llama.py

    * Update swiglu.py

    * Repatch

    * Update fast_lora.py

    * Update fast_lora.py

    * Update fast_lora.py

    * Update fast_lora.py

    * Update fast_lora.py

    * Update fast_lora.py

    * Update fast_lora.py

    * Update fast_lora.py

    * Update fast_lora.py

    * Update fast_lora.py

    * Update llama.py

    * Update fast_lora.py

    * Update llama.py

    * Update fast_lora.py

    * Update fast_lora.py

    * Update fast_lora.py

    * Update fast_lora.py

    * Update fast_lora.py

    * Update fast_lora.py

    * Update fast_lora.py

    * Update fast_lora.py

    * Update fast_lora.py

    * Update swiglu.py

    * Update fast_lora.py

    * Update swiglu.py

    * Update fast_lora.py

    * Update fast_lora.py

    * Update fast_lora.py

    * Update fast_lora.py

    * Update fast_lora.py

    * Update fast_lora.py

    * Update fast_lora.py

    * Update fast_lora.py

    * Update fast_lora.py

    * Update fast_lora.py

    * Update fast_lora.py

    * Update fast_lora.py

    * Update fast_lora.py

    * Update fast_lora.py

    * Update fast_lora.py

    * Update fast_lora.py

    * Update save.py

    * Update fast_lora.py

    * Update utils.py

    * Update llama.py

    * Update fast_lora.py

    * Update swiglu.py

    * Update save.py

    * Update save.py

    * Update llama.py

    * Update llama.py

    * Update llama.py

    * Update llama.py

    * Revert "Update llama.py"

    This reverts commit a208ec46e012cf470ecefe6268a66358215df7b6.

    * Update llama.py

    * Works?

    * Update pyproject.toml

    * Update fast_lora.py

    * Update fast_lora.py

    * Update fast_lora.py

    * Update fast_lora.py

    * Update fast_lora.py

    * Swiglu

    * Update swiglu.py

    * Update fast_lora.py

    * Update fast_lora.py

    * Update fast_lora.py

    * Update fast_lora.py

    * Update swiglu.py

    * Update fast_lora.py

    * Update fast_lora.py

    * Update fast_lora.py

    * Update fast_lora.py

    * Update fast_lora.py

    * Update fast_lora.py

    * Update fast_lora.py

    * Update fast_lora.py

    * attention_mask

    * Update llama.py

    * Update llama.py

    * labels

    * Update mistral.py

    * Update llama.py

    * attention mask

    * Update save.py

    * Update save.py

    * Update mistral.py

    * attention mask

    * Update llama.py

    * Update llama.py

    * Update mistral.py

    * Update llama.py

    * Update llama.py

    * Update llama.py

    * Update dpo.py

    * Patch saving

    * Update save.py

    * Update save.py

    * patch_saving_functions

    * Update save.py

    * Update save.py

    * Update save.py

    * Update save.py

    * Update save.py

    * Update save.py

    * Update save.py

    * Update save.py

    * print

    * Mistral patch

    * Update mistral.py

    * Update save.py

    * saving

commit 8faf469f028a05852b2dc29ec8df1f36998fab33
Author: Daniel Han <danielhanchen@gmail.com>
Date:   Mon Jan 29 02:52:39 2024 +1100

    Fix saving issues (#139)

    * faster saving & inference

    * Update llama.py

    * Update save.py

    * Update llama.py

    * Update llama.py

    * Update llama.py

    * Update llama.py

    * Update mistral.py

    * Update llama.py

    * Update llama.py

    * Update llama.py

    * Update llama.py

    * Update llama.py

    * fast inference

    * Update llama.py

    * Update save.py

    * Update llama.py

    * Mistral correct RoPE scaling

    * Max sequence lengths

    * Apache 2

    * fast_linear_forward

    * Update utils.py

    * Update utils.py

    * No print

    * Update utils.py

    * Update utils.py

    * inference

    * Update llama.py

    * Fast inference RoPE

    * Update llama.py

    * Update llama.py

    * RoPE

    * Update llama.py

    * Update llama.py

    * Update llama.py

    * Update llama.py

    * Update llama.py

    * Update llama.py

    * Update llama.py

    * Update llama.py

    * LoRA

    * Fast LoRA saving

    * Update llama.py

    * hidden_states

    * q_len == 1

    * q_len issue

    * Update mistral.py

    * Update mistral.py

    * incorrect inference

    * Update to transformers 4.37

    * Graceful FA2 error + torch 2.1.1

    * Update mapper.py

    * Update pyproject.toml

    * Fix saving and bnb-4bit

    * Update fast_lora.py

    * Update fast_lora.py

    * remove patching

    * Update llama.py

    * Update llama.py

    * Update swiglu.py

    * Repatch

    * Update fast_lora.py

    * Update fast_lora.py

    * Update fast_lora.py

    * Update fast_lora.py

    * Update fast_lora.py

    * Update fast_lora.py

    * Update fast_lora.py

    * Update fast_lora.py

    * Update fast_lora.py

    * Update fast_lora.py

    * Update llama.py

    * Update fast_lora.py

    * Update llama.py

    * Update fast_lora.py

    * Update fast_lora.py

    * Update fast_lora.py

    * Update fast_lora.py

    * Update fast_lora.py

    * Update fast_lora.py

    * Update fast_lora.py

    * Update fast_lora.py

    * Update fast_lora.py

    * Update swiglu.py

    * Update fast_lora.py

    * Update swiglu.py

    * Update fast_lora.py

    * Update fast_lora.py

    * Update fast_lora.py

    * Update fast_lora.py

    * Update fast_lora.py

    * Update fast_lora.py

    * Update fast_lora.py

    * Update fast_lora.py

    * Update fast_lora.py

    * Update fast_lora.py

    * Update fast_lora.py

    * Update fast_lora.py

    * Update fast_lora.py

    * Update fast_lora.py

    * Update fast_lora.py

    * Update fast_lora.py

    * Update save.py

    * Update fast_lora.py

    * Update utils.py

    * Update llama.py

    * Update fast_lora.py

    * Update swiglu.py

    * Update save.py

    * Update save.py

    * Update llama.py

    * Update llama.py

    * Update llama.py

    * Update llama.py

    * Revert "Update llama.py"

    This reverts commit a208ec46e012cf470ecefe6268a66358215df7b6.

    * Update llama.py

    * Works?

    * Update pyproject.toml

    * Update fast_lora.py

    * Update fast_lora.py

    * Update fast_lora.py

    * Update fast_lora.py

    * Update fast_lora.py

    * Swiglu

    * Update swiglu.py

    * Update fast_lora.py

    * Update fast_lora.py

    * Update fast_lora.py

    * Update fast_lora.py

    * Update swiglu.py

    * Update fast_lora.py

    * Update fast_lora.py

    * Update fast_lora.py

    * Update fast_lora.py

    * Update fast_lora.py

    * Update fast_lora.py

    * Update fast_lora.py

    * Update fast_lora.py

    * attention_mask

    * Update llama.py

    * Update llama.py

    * labels

    * Update mistral.py

    * Update llama.py

    * attention mask

    * Update save.py

    * Update save.py

    * Update mistral.py

    * attention mask

    * Update llama.py

    * Update llama.py

    * Update mistral.py

    * Update llama.py

    * Update llama.py

    * Update llama.py

    * Update dpo.py

    * Patch saving

    * Update save.py

    * Update save.py

    * patch_saving_functions

    * Update save.py

    * Update save.py

    * Update save.py

    * Update save.py

    * Update save.py

    * Update save.py

    * Update save.py

    * Update save.py

    * print

commit 1ecc0185a5759c7a0c95dfc96aceea5023cebdfc
Author: Daniel Han <danielhanchen@gmail.com>
Date:   Sun Jan 28 04:30:29 2024 +1100

    1 more bug (#138)

    * faster saving & inference

    * Update llama.py

    * Update save.py

    * Update llama.py

    * Update llama.py

    * Update llama.py

    * Update llama.py

    * Update mistral.py

    * Update llama.py

    * Update llama.py

    * Update llama.py

    * Update llama.py

    * Update llama.py

    * fast inference

    * Update llama.py

    * Update save.py

    * Update llama.py

    * Mistral correct RoPE scaling

    * Max sequence lengths

    * Apache 2

    * fast_linear_forward

    * Update utils.py

    * Update utils.py

    * No print

    * Update utils.py

    * Update utils.py

    * inference

    * Update llama.py

    * Fast inference RoPE

    * Update llama.py

    * Update llama.py

    * RoPE

    * Update llama.py

    * Update llama.py

    * Update llama.py

    * Update llama.py

    * Update llama.py

    * Update llama.py

    * Update llama.py

    * Update llama.py

    * LoRA

    * Fast LoRA saving

    * Update llama.py

    * hidden_states

    * q_len == 1

    * q_len issue

    * Update mistral.py

    * Update mistral.py

    * incorrect inference

    * Update to transformers 4.37

    * Graceful FA2 error + torch 2.1.1

    * Update mapper.py

    * Update pyproject.toml

    * Fix saving and bnb-4bit

    * Update fast_lora.py

    * Update fast_lora.py

    * remove patching

    * Update llama.py

    * Update llama.py

    * Update swiglu.py

    * Repatch

    * Update fast_lora.py

    * Update fast_lora.py

    * Update fast_lora.py

    * Update fast_lora.py

    * Update fast_lora.py

    * Update fast_lora.py

    * Update fast_lora.py

    * Update fast_lora.py

    * Update fast_lora.py

    * Update fast_lora.py

    * Update llama.py

    * Update fast_lora.py

    * Update llama.py

    * Update fast_lora.py

    * Update fast_lora.py

    * Update fast_lora.py

    * Update fast_lora.py

    * Update fast_lora.py

    * Update fast_lora.py

    * Update fast_lora.py

    * Update fast_lora.py

    * Update fast_lora.py

    * Update swiglu.py

    * Update fast_lora.py

    * Update swiglu.py

    * Update fast_lora.py

    * Update fast_lora.py

    * Update fast_lora.py

    * Update fast_lora.py

    * Update fast_lora.py

    * Update fast_lora.py

    * Update fast_lora.py

    * Update fast_lora.py

    * Update fast_lora.py

    * Update fast_lora.py

    * Update fast_lora.py

    * Update fast_lora.py

    * Update fast_lora.py

    * Update fast_lora.py

    * Update fast_lora.py

    * Update fast_lora.py

    * Update save.py

    * Update fast_lora.py

    * Update utils.py

    * Update llama.py

    * Update fast_lora.py

    * Update swiglu.py

    * Update save.py

    * Update save.py

    * Update llama.py

    * Update llama.py

    * Update llama.py

    * Update llama.py

    * Revert "Update llama.py"

    This reverts commit a208ec46e012cf470ecefe6268a66358215df7b6.

    * Update llama.py

    * Works?

    * Update pyproject.toml

    * Update fast_lora.py

    * Update fast_lora.py

    * Update fast_lora.py

    * Update fast_lora.py

    * Update fast_lora.py

    * Swiglu

    * Update swiglu.py

    * Update fast_lora.py

    * Update fast_lora.py

    * Update fast_lora.py

    * Update fast_lora.py

    * Update swiglu.py

    * Update fast_lora.py

    * Update fast_lora.py

    * Update fast_lora.py

    * Update fast_lora.py

    * Update fast_lora.py

    * Update fast_lora.py

    * Update fast_lora.py

    * Update fast_lora.py

    * attention_mask

    * Update llama.py

    * Update llama.py

    * labels

    * Update mistral.py

    * Update llama.py

    * attention mask

    * Update save.py

    * Update save.py

commit cd32ba76b71adf3317ede9de7d1cf6f30ad3bf0d
Author: Daniel Han <danielhanchen@gmail.com>
Date:   Sun Jan 28 04:20:06 2024 +1100

    Fix bugs + more accurate Swiglu (#137)

    * faster saving & inference

    * Update llama.py

    * Update save.py

    * Update llama.py

    * Update llama.py

    * Update llama.py

    * Update llama.py

    * Update mistral.py

    * Update llama.py

    * Update llama.py

    * Update llama.py

    * Update llama.py

    * Update llama.py

    * fast inference

    * Update llama.py

    * Update save.py

    * Update llama.py

    * Mistral correct RoPE scaling

    * Max sequence lengths

    * Apache 2

    * fast_linear_forward

    * Update utils.py

    * Update utils.py

    * No print

    * Update utils.py

    * Update utils.py

    * inference

    * Update llama.py

    * Fast inference RoPE

    * Update llama.py

    * Update llama.py

    * RoPE

    * Update llama.py

    * Update llama.py

    * Update llama.py

    * Update llama.py

    * Update llama.py

    * Update llama.py

    * Update llama.py

    * Update llama.py

    * LoRA

    * Fast LoRA saving

    * Update llama.py

    * hidden_states

    * q_len == 1

    * q_len issue

    * Update mistral.py

    * Update mistral.py

    * incorrect inference

    * Update to transformers 4.37

    * Graceful FA2 error + torch 2.1.1

    * Update mapper.py

    * Update pyproject.toml

    * Fix saving and bnb-4bit

    * Update fast_lora.py

    * Update fast_lora.py

    * remove patching

    * Update llama.py

    * Update llama.py

    * Update swiglu.py

    * Repatch

    * Update fast_lora.py

    * Update fast_lora.py

    * Update fast_lora.py

    * Update fast_lora.py

    * Update fast_lora.py

    * Update fast_lora.py

    * Update fast_lora.py

    * Update fast_lora.py

    * Update fast_lora.py

    * Update fast_lora.py

    * Update llama.py

    * Update fast_lora.py

    * Update llama.py

    * Update fast_lora.py

    * Update fast_lora.py

    * Update fast_lora.py

    * Update fast_lora.py

    * Update fast_lora.py

    * Update fast_lora.py

    * Update fast_lora.py

    * Update fast_lora.py

    * Update fast_lora.py

    * Update swiglu.py

    * Update fast_lora.py

    * Update swiglu.py

    * Update fast_lora.py

    * Update fast_lora.py

    * Update fast_lora.py

    * Update fast_lora.py

    * Update fast_lora.py

    * Update fast_lora.py

    * Update fast_lora.py

    * Update fast_lora.py

    * Update fast_lora.py

    * Update fast_lora.py

    * Update fast_lora.py

    * Update fast_lora.py

    * Update fast_lora.py

    * Update fast_lora.py

    * Update fast_lora.py

    * Update fast_lora.py

    * Update save.py

    * Update fast_lora.py

    * Update utils.py

    * Update llama.py

    * Update fast_lora.py

    * Update swiglu.py

    * Update save.py

    * Update save.py

    * Update llama.py

    * Update llama.py

    * Update llama.py

    * Update llama.py

    * Revert "Update llama.py"

    This reverts commit a208ec46e012cf470ecefe6268a66358215df7b6.

    * Update llama.py

    * Works?

    * Update pyproject.toml

    * Update fast_lora.py

    * Update fast_lora.py

    * Update fast_lora.py

    * Update fast_lora.py

    * Update fast_lora.py

    * Swiglu

    * Update swiglu.py

    * Update fast_lora.py

    * Update fast_lora.py

    * Update fast_lora.py

    * Update fast_lora.py

    * Update swiglu.py

    * Update fast_lora.py

    * Update fast_lora.py

    * Update fast_lora.py

    * Update fast_lora.py

    * Update fast_lora.py

    * Update fast_lora.py

    * Update fast_lora.py

    * Update fast_lora.py

    * attention_mask

    * Update llama.py

    * Update llama.py

    * labels

    * Update mistral.py

    * Update llama.py

    * attention mask

commit 89daa0efcc38c7690abbb8170b5d9f3d364796ce
Author: Daniel Han <danielhanchen@gmail.com>
Date:   Sat Jan 27 04:50:22 2024 +1100

    Inference bug fix (#134)

    * faster saving & inference

    * Update llama.py

    * Update save.py

    * Update llama.py

    * Update llama.py

    * Update llama.py

    * Update llama.py

    * Update mistral.py

    * Update llama.py

    * Update llama.py

    * Update llama.py

    * Update llama.py

    * Update llama.py

    * fast inference

    * Update llama.py

    * Update save.py

    * Update llama.py

    * Mistral correct RoPE scaling

    * Max sequence lengths

    * Apache 2

    * fast_linear_forward

    * Update utils.py

    * Update utils.py

    * No print

    * Update utils.py

    * Update utils.py

    * inference

    * Update llama.py

    * Fast inference RoPE

    * Update llama.py

    * Update llama.py

    * RoPE

    * Update llama.py

    * Update llama.py

    * Update llama.py

    * Update llama.py

    * Update llama.py

    * Update llama.py

    * Update llama.py

    * Update llama.py

    * LoRA

    * Fast LoRA saving

    * Update llama.py

    * hidden_states

    * q_len == 1

    * q_len issue

    * Update mistral.py

    * Update mistral.py

    * incorrect inference

    * Update to transformers 4.37

    * Graceful FA2 error + torch 2.1.1

    * Update mapper.py

    * Update pyproject.toml

    * Fix saving and bnb-4bit

    * Update fast_lora.py

    * Update fast_lora.py

    * remove patching

    * Update llama.py

    * Update llama.py

    * Update swiglu.py

    * Repatch

    * Update fast_lora.py

    * Update fast_lora.py

    * Update fast_lora.py

    * Update fast_lora.py

    * Update fast_lora.py

    * Update fast_lora.py

    * Update fast_lora.py

    * Update fast_lora.py

    * Update fast_lora.py

    * Update fast_lora.py

    * Update llama.py

    * Update fast_lora.py

    * Update llama.py

    * Update fast_lora.py

    * Update fast_lora.py

    * Update fast_lora.py

    * Update fast_lora.py

    * Update fast_lora.py

    * Update fast_lora.py

    * Update fast_lora.py

    * Update fast_lora.py

    * Update fast_lora.py

    * Update swiglu.py

    * Update fast_lora.py

    * Update swiglu.py

    * Update fast_lora.py

    * Update fast_lora.py

    * Update fast_lora.py

    * Update fast_lora.py

    * Update fast_lora.py

    * Update fast_lora.py

    * Update fast_lora.py

    * Update fast_lora.py

    * Update fast_lora.py

    * Update fast_lora.py

    * Update fast_lora.py

    * Update fast_lora.py

    * Update fast_lora.py

    * Update fast_lora.py

    * Update fast_lora.py

    * Update fast_lora.py

    * Update save.py

    * Update fast_lora.py

    * Update utils.py

    * Update llama.py

    * Update fast_lora.py

    * Update swiglu.py

    * Update save.py

    * Update save.py

    * Update llama.py

    * Update llama.py

    * Update llama.py

    * Update llama.py

    * Revert "Update llama.py"

    This reverts commit a208ec46e012cf470ecefe6268a66358215df7b6.

    * Update llama.py

commit 87a7ef1049f6fca409a0673f51f4758e0aff248d
Author: Daniel Han <danielhanchen@gmail.com>
Date:   Sat Jan 27 04:47:54 2024 +1100

    More bug fixes (#133)

    * faster saving & inference

    * Update llama.py

    * Update save.py

    * Update llama.py

    * Update llama.py

    * Update llama.py

    * Update llama.py

    * Update mistral.py

    * Update llama.py

    * Update llama.py

    * Update llama.py

    * Update llama.py

    * Update llama.py

    * fast inference

    * Update llama.py

    * Update save.py

    * Update llama.py

    * Mistral correct RoPE scaling

    * Max sequence lengths

    * Apache 2

    * fast_linear_forward

    * Update utils.py

    * Update utils.py

    * No print

    * Update utils.py

    * Update utils.py

    * inference

    * Update llama.py

    * Fast inference RoPE

    * Update llama.py

    * Update llama.py

    * RoPE

    * Update llama.py

    * Update llama.py

    * Update llama.py

    * Update llama.py

    * Update llama.py

    * Update llama.py

    * Update llama.py

    * Update llama.py

    * LoRA

    * Fast LoRA saving

    * Update llama.py

    * hidden_states

    * q_len == 1

    * q_len issue

    * Update mistral.py

    * Update mistral.py

    * incorrect inference

    * Update to transformers 4.37

    * Graceful FA2 error + torch 2.1.1

    * Update mapper.py

    * Update pyproject.toml

    * Fix saving and bnb-4bit

    * Update fast_lora.py

    * Update fast_lora.py

    * remove patching

    * Update llama.py

    * Update llama.py

    * Update swiglu.py

    * Repatch

    * Update fast_lora.py

    * Update fast_lora.py

    * Update fast_lora.py

    * Update fast_lora.py

    * Update fast_lora.py

    * Update fast_lora.py

    * Update fast_lora.py

    * Update fast_lora.py

    * Update fast_lora.py

    * Update fast_lora.py

    * Update llama.py

    * Update fast_lora.py

    * Update llama.py

    * Update fast_lora.py

    * Update fast_lora.py

    * Update fast_lora.py

    * Update fast_lora.py

    * Update fast_lora.py

    * Update fast_lora.py

    * Update fast_lora.py

    * Update fast_lora.py

    * Update fast_lora.py

    * Update swiglu.py

    * Update fast_lora.py

    * Update swiglu.py

    * Update fast_lora.py

    * Update fast_lora.py

    * Update fast_lora.py

    * Update fast_lora.py

    * Update fast_lora.py

    * Update fast_lora.py

    * Update fast_lora.py

    * Update fast_lora.py

    * Update fast_lora.py

    * Update fast_lora.py

    * Update fast_lora.py

    * Update fast_lora.py

    * Update fast_lora.py

    * Update fast_lora.py

    * Update fast_lora.py

    * Update fast_lora.py

    * Update save.py

    * Update fast_lora.py

    * Update utils.py

    * Update llama.py

    * Update fast_lora.py

    * Update swiglu.py

    * Update save.py

    * Update save.py

    * Update llama.py

    * Update llama.py

    * Update llama.py

commit 3d67790901696e953171f64b4bf9d980780051a0
Author: Daniel Han <danielhanchen@gmail.com>
Date:   Fri Jan 26 04:19:17 2024 +1100

    Fix bugs (#129)

    * faster saving & inference

    * Update llama.py

    * Update save.py

    * Update llama.py

    * Update llama.py

    * Update llama.py

    * Update llama.py

    * Update mistral.py

    * Update llama.py

    * Update llama.py

    * Update llama.py

    * Update llama.py

    * Update llama.py

    * fast inference

    * Update llama.py

    * Update save.py

    * Update llama.py

    * Mistral correct RoPE scaling

    * Max sequence lengths

    * Apache 2

    * fast_linear_forward

    * Update utils.py

    * Update utils.py

    * No print

    * Update utils.py

    * Update utils.py

    * inference

    * Update llama.py

    * Fast inference RoPE

    * Update llama.py

    * Update llama.py

    * RoPE

    * Update llama.py

    * Update llama.py

    * Update llama.py

    * Update llama.py

    * Update llama.py

    * Update llama.py

    * Update llama.py

    * Update llama.py

    * LoRA

    * Fast LoRA saving

    * Update llama.py

    * hidden_states

    * q_len == 1

    * q_len issue

    * Update mistral.py

    * Update mistral.py

    * incorrect inference

    * Update to transformers 4.37

    * Graceful FA2 error + torch 2.1.1

    * Update mapper.py

    * Update pyproject.toml

    * Fix saving and bnb-4bit

    * Update fast_lora.py

    * Update fast_lora.py

    * remove patching

    * Update llama.py

    * Update llama.py

    * Update swiglu.py

    * Repatch

    * Update fast_lora.py

commit a833f403462e9cfc1f96b3b84d9da15d7d8db5ee
Author: Daniel Han <danielhanchen@gmail.com>
Date:   Tue Jan 23 03:55:24 2024 +1100

    2-4x faster native HF inference (#119)

    * faster saving & inference

    * Update llama.py

    * Update save.py

    * Update llama.py

    * Update llama.py

    * Update llama.py

    * Update llama.py

    * Update mistral.py

    * Update llama.py

    * Update llama.py

    * Update llama.py

    * Update llama.py

    * Update llama.py

    * fast inference

    * Update llama.py

    * Update save.py

    * Update llama.py

    * Mistral correct RoPE scaling

    * Max sequence lengths

    * Apache 2

    * fast_linear_forward

    * Update utils.py

    * Update utils.py

    * No print

    * Update utils.py

    * Update utils.py

    * inference

    * Update llama.py

    * Fast inference RoPE

    * Update llama.py

    * Update llama.py

    * RoPE

    * Update llama.py

    * Update llama.py

    * Update llama.py

    * Update llama.py

    * Update llama.py

    * Update llama.py

    * Update llama.py

    * Update llama.py

    * LoRA

    * Fast LoRA saving

commit b370c9c8aacc31a7845404566dd95dfa8c0e3bac
Author: Daniel Han <danielhanchen@gmail.com>
Date:   Sun Jan 21 22:20:22 2024 +1100

    Hotfix (#118)

    * faster saving & inference

    * Update llama.py

    * Update save.py

    * Update llama.py

    * Update llama.py

    * Update llama.py

    * Update llama.py

    * Update mistral.py

    * Update llama.py

    * Update llama.py

    * Update llama.py

    * Update llama.py

    * Update llama.py

commit 57a5b5a49da588b1db8e9a988cc985dc20393d34
Author: Daniel Han-Chen <danielhanchen@gmail.com>
Date:   Sun Jan 21 05:00:37 2024 +1100

    Update save.py

commit 5145a61e69ab9b3035465f649e1c1e5aae749f8f
Author: Daniel Han-Chen <danielhanchen@gmail.com>
Date:   Sun Jan 21 04:21:54 2024 +1100

    Update save.py

commit a7bd8d119c16433de4f8b6a36903ef7131f225e5
Author: Daniel Han-Chen <danielhanchen@gmail.com>
Date:   Sun Jan 21 04:13:03 2024 +1100

    Update save.py

commit be4b97e7d89074b6dd1d2e984fa429051d328192
Author: Daniel Han <danielhanchen@gmail.com>
Date:   Sun Jan 21 03:43:49 2024 +1100

    Fixed saving! (#113)

    * Fix tokenizer, dropout, bias for LoRA

    * Update loader.py

    * Fix LoRA downcasting

    * Update _utils.py

    * Saving to GGUF

    * fix

    * colab_quantize_to_gguf

    * move save modules

    * save module

    * Update __init__.py

    * Update save.py

    * Temp downgrade due to TRL issue

    * Fix up bugs

    * Faster saving + other changes

    * Update llama.py

    * Saving modules

    * spelling

    * Update llama.py

    * Update save.py

    * Update save.py

    * Update loader.py

    * Update llama.py

    * patch saving

    * Update save.py

    * Update save.py

    * Update save.py

    * patch saving

    * Update save.py

    * Update save.py

    * Update save.py

    * Update save.py

    * Update save.py

    * Update save.py

    * Update save.py

    * Update save.py

    * Update save.py

    * Update save.py

    * Update save.py

    * Update save.py

    * Update save.py

    * Update save.py

    * Update save.py

    * original_model

    * Update save.py

    * Update save.py

    * Update save.py

    * Update save.py

    * Update save.py

    * Update save.py

    * Update save.py

    * Update save.py

    * Update save.py

    * Update save.py

    * Update save.py

    * Update save.py

    * Update save.py

    * Update save.py

    * Update save.py

    * Update save.py

    * Update save.py

    * Update save.py

    * Update save.py

    * Update save.py

    * Update save.py

    * Update save.py

    * Update save.py

    * saving to RAM leakage?

    * Update save.py

    * new_save_directory

    * Update save.py

    * Update save.py

    * Update save.py

    * Update save.py

    * Update pyproject.toml

    * Update pyproject.toml

    * Update pyproject.toml

    * Quick fixes

    * Update llama.py

    * Update llama.py

    * Update dpo.py

    * Update dpo.py

    * Update llama.py

    * Update save.py

    * getattr

    * RSLoRA and LoftQ direct support

    * Update llama.py

    * Update llama.py

    * Update llama.py

    * Fix DPO + GGUF

    * Fix quantization_method

    * Fix quantization_config

    * patch model

    * Update llama.py

    * Update llama.py

    * Update llama.py

    * Update save.py

    * Update save.py

    * tokenizer_save_settings

    * Update save.py

    * quantization and loftq

    * Update save.py

    * Update llama.py

    * Update save.py

    * upload_to_huggingface

    * Update save.py

    * Update save.py

    * Update save.py

    * Update save.py

    * Update save.py

    * Update save.py

commit abb462be71e8cf01ad989dca0efaa17441113651
Author: Daniel Han <danielhanchen@gmail.com>
Date:   Sat Jan 20 23:23:00 2024 +1100

    Hotfix for Jan 2024 Release (#110)

    * Fix tokenizer, dropout, bias for LoRA

    * Update loader.py

    * Fix LoRA downcasting

    * Update _utils.py

    * Saving to GGUF

    * fix

    * colab_quantize_to_gguf

    * move save modules

    * save module

    * Update __init__.py

    * Update save.py

    * Temp downgrade due to TRL issue

    * Fix up bugs

    * Faster saving + other changes

    * Update llama.py

    * Saving modules

    * spelling

    * Update llama.py

    * Update save.py

    * Update save.py

    * Update loader.py

    * Update llama.py

    * patch saving

    * Update save.py

    * Update save.py

    * Update save.py

    * patch saving

    * Update save.py

    * Update save.py

    * Update save.py

    * Update save.py

    * Update save.py

    * Update save.py

    * Update save.py

    * Update save.py

    * Update save.py

    * Update save.py

    * Update save.py

    * Update save.py

    * Update save.py

    * Update save.py

    * Update save.py

    * original_model

    * Update save.py

    * Update save.py

    * Update save.py

    * Update save.py

    * Update save.py

    * Update save.py

    * Update save.py

    * Update save.py

    * Update save.py

    * Update save.py

    * Update save.py

    * Update save.py

    * Update save.py

    * Update save.py

    * Update save.py

    * Update save.py

    * Update save.py

    * Update save.py

    * Update save.py

    * Update save.py

    * Update save.py

    * Update save.py

    * Update save.py

    * saving to RAM leakage?

    * Update save.py

    * new_save_directory

    * Update save.py

    * Update save.py

    * Update save.py

    * Update save.py

    * Update pyproject.toml

    * Update pyproject.toml

    * Update pyproject.toml

    * Quick fixes

    * Update llama.py

    * Update llama.py

    * Update dpo.py

    * Update dpo.py

    * Update llama.py

    * Update save.py

    * getattr

    * RSLoRA and LoftQ direct support

    * Update llama.py

    * Update llama.py

    * Update llama.py

    * Fix DPO + GGUF

    * Fix quantization_method

    * Fix quantization_config

    * patch model

    * Update llama.py

    * Update llama.py

    * Update llama.py

    * Update save.py

    * Update save.py

    * tokenizer_save_settings

    * Update save.py

    * quantization and loftq

    * Update save.py

    * Update llama.py

    * Update save.py

commit 31e2d71720e64b854145d7779833b7d2d3d4177e
Author: Daniel Han <danielhanchen@gmail.com>
Date:   Sat Jan 20 04:25:06 2024 +1100

    Quick fixes (#106)

    * Fix tokenizer, dropout, bias for LoRA

    * Update loader.py

    * Fix LoRA downcasting

    * Update _utils.py

    * Saving to GGUF

    * fix

    * colab_quantize_to_gguf

    * move save modules

    * save module

    * Update __init__.py

    * Update save.py

    * Temp downgrade due to TRL issue

    * Fix up bugs

    * Faster saving + other changes

    * Update llama.py

    * Saving modules

    * spelling

    * Update llama.py

    * Update save.py

    * Update save.py

    * Update loader.py

    * Update llama.py

    * patch saving

    * Update save.py

    * Update save.py

    * Update save.py

    * patch saving

    * Update save.py

    * Update save.py

    * Update save.py

    * Update save.py

    * Update save.py

    * Update save.py

    * Update save.py

    * Update save.py

    * Update save.py

    * Update save.py

    * Update save.py

    * Update save.py

    * Update save.py

    * Update save.py

    * Update save.py

    * original_model

    * Update save.py

    * Update save.py

    * Update save.py

    * Update save.py

    * Update save.py

    * Update save.py

    * Update save.py

    * Update save.py

    * Update save.py

    * Update save.py

    * Update save.py

    * Update save.py

    * Update save.py

    * Update save.py

    * Update save.py

    * Update save.py

    * Update save.py

    * Update save.py

    * Update save.py

    * Update save.py

    * Update save.py

    * Update save.py

    * Update save.py

    * saving to RAM leakage?

    * Update save.py

    * new_save_directory

    * Update save.py

    * Update save.py

    * Update save.py

    * Update save.py

    * Update pyproject.toml

    * Update pyproject.toml

    * Update pyproject.toml

    * Quick fixes

    * Update llama.py

    * Update llama.py

    * Update dpo.py

    * Update dpo.py

    * Update llama.py

    * Update save.py

    * getattr

    * RSLoRA and LoftQ direct support

    * Update llama.py

    * Update llama.py

    * Update llama.py

    * Fix DPO + GGUF

commit 8846337e5c8c2f206a4ac8fe6d239f3d1221f7ac
Author: Daniel Han-Chen <danielhanchen@gmail.com>
Date:   Sat Jan 20 02:30:31 2024 +1100

    Update _utils.py

commit d378df87e5f3945474915a098c9aa58313465064
Merge: c1e7480 920e3c2
Author: Daniel Han-Chen <danielhanchen@gmail.com>
Date:   Fri Jan 19 23:15:38 2024 +1100

    Merge branch 'main' of https://github.com/unslothai/unsloth

commit c1e7480ac2ad0e5efa05e84fe0997619ccdd86a4
Author: Daniel Han-Chen <danielhanchen@gmail.com>
Date:   Fri Jan 19 23:15:20 2024 +1100

    Revert quantization methods

commit 920e3c2ea07a044addeb7c3fa8be6f0189cb7f84
Author: Daniel Han <danielhanchen@gmail.com>
Date:   Fri Jan 19 22:57:22 2024 +1100

    getattr issues (#103)

    * Fix tokenizer, dropout, bias for LoRA

    * Update loader.py

    * Fix LoRA downcasting

    * Update _utils.py

    * Saving to GGUF

    * fix

    * colab_quantize_to_gguf

    * move save modules

    * save module

    * Update __init__.py

    * Update save.py

    * Temp downgrade due to TRL issue

    * Fix up bugs

    * Faster saving + other changes

    * Update llama.py

    * Saving modules

    * spelling

    * Update llama.py

    * Update save.py

    * Update save.py

    * Update loader.py

    * Update llama.py

    * patch saving

    * Update save.py

    * Update save.py

    * Update save.py

    * patch saving

    * Update save.py

    * Update save.py

    * Update save.py

    * Update save.py

    * Update save.py

    * Update save.py

    * Update save.py

    * Update save.py

    * Update save.py

    * Update save.py

    * Update save.py

    * Update save.py

    * Update save.py

    * Update save.py

    * Update save.py

    * original_model

    * Update save.py

    * Update save.py

    * Update save.py

    * Update save.py

    * Update save.py

    * Update save.py

    * Update save.py

    * Update save.py

    * Update save.py

    * Update save.py

    * Update save.py

    * Update save.py

    * Update save.py

    * Update save.py

    * Update save.py

    * Update save.py

    * Update save.py

    * Update save.py

    * Update save.py

    * Update save.py

    * Update save.py

    * Update save.py

    * Update save.py

    * saving to RAM leakage?

    * Update save.py

    * new_save_directory

    * Update save.py

    * Update save.py

    * Update save.py

    * Update save.py

    * Update pyproject.toml

    * Update pyproject.toml

    * Update pyproject.toml

    * Quick fixes

    * Update llama.py

    * Update llama.py

    * Update dpo.py

    * Update dpo.py

    * Update llama.py

    * Update save.py

    * getattr

commit fc25ab0df032f8ee5ea750f27c68d63f49d2d9a9
Author: Daniel Han <danielhanchen@gmail.com>
Date:   Fri Jan 19 22:52:30 2024 +1100

    Quick fixes (#101)

    * Fix tokenizer, dropout, bias for LoRA

    * Update loader.py

    * Fix LoRA downcasting

    * Update _utils.py

    * Saving to GGUF

    * fix

    * colab_quantize_to_gguf

    * move save modules

    * save module

    * Update __init__.py

    * Update save.py

    * Temp downgrade due to TRL issue

    * Fix up bugs

    * Faster saving + other changes

    * Update llama.py

    * Saving modules

    * spelling

    * Update llama.py

    * Update save.py

    * Update save.py

    * Update loader.py

    * Update llama.py

    * patch saving

    * Update save.py

    * Update save.py

    * Update save.py

    * patch saving

    * Update save.py

    * Update save.py

    * Update save.py

    * Update save.py

    * Update save.py

    * Update save.py

    * Update save.py

    * Update save.py

    * Update save.py

    * Update save.py

    * Update save.py

    * Update save.py

    * Update save.py

    * Update save.py

    * Update save.py

    * original_model

    * Update save.py

    * Update save.py

    * Update save.py

    * Update save.py

    * Update save.py

    * Update save.py

    * Update save.py

    * Update save.py

    * Update save.py

    * Update save.py

    * Update save.py

    * Update save.py

    * Update save.py

    * Update save.py

    * Update save.py

    * Update save.py

    * Update save.py

    * Update save.py

    * Update save.py

    * Update save.py

    * Update save.py

    * Update save.py

    * Update save.py

    * saving to RAM leakage?

    * Update save.py

    * new_save_directory

    * Update save.py

    * Update save.py

    * Update save.py

    * Update save.py

    * Update pyproject.toml

    * Update pyproject.toml

    * Update pyproject.toml

    * Quick fixes

    * Update llama.py

    * Update llama.py

    * Update dpo.py

    * Update dpo.py

    * Update llama.py

    * Update save.py

commit b8b1eafda35d124046e11766aeeb6343957e0daf
Author: Daniel Han <danielhanchen@gmail.com>
Date:   Fri Jan 19 04:51:19 2024 +1100

    2024 Release (#96)

    * Fix tokenizer, dropout, bias for LoRA

    * Update loader.py

    * Fix LoRA downcasting

    * Update _utils.py

    * Saving to GGUF

    * fix

    * colab_quantize_to_gguf

    * move save modules

    * save module

    * Update __init__.py

    * Update save.py

    * Temp downgrade due to TRL issue

    * Fix up bugs

    * Faster saving + other changes

    * Update llama.py

    * Saving modules

    * spelling

    * Update llama.py

    * Update save.py

    * Update save.py

    * Update loader.py

    * Update llama.py

    * patch saving

    * Update save.py

    * Update save.py

    * Update save.py

    * patch saving

    * Update save.py

    * Update save.py

    * Update save.py

    * Update save.py

    * Update save.py

    * Update save.py

    * Update save.py

    * Update save.py

    * Update save.py

    * Update save.py

    * Update save.py

    * Update save.py

    * Update save.py

    * Update save.py

    * Update save.py

    * original_model

    * Update save.py

    * Update save.py

    * Update save.py

    * Update save.py

    * Update save.py

    * Update save.py

    * Update save.py

    * Update save.py

    * Update save.py

    * Update save.py

    * Update save.py

    * Update save.py

    * Update save.py

    * Update save.py

    * Update save.py

    * Update save.py

    * Update save.py

    * Update save.py

    * Update save.py

    * Update save.py

    * Update save.py

    * Update save.py

    * Update save.py

    * saving to RAM leakage?

    * Update save.py

    * new_save_directory

    * Update save.py

    * Update save.py

    * Update save.py

    * Update save.py

    * Update pyproject.toml

    * Update pyproject.toml

    * Update pyproject.toml

commit 4112eb4a3df4c0911e36211b47381086c963b4e0
Author: Daniel Han-Chen <danielhanchen@gmail.com>
Date:   Fri Jan 19 03:41:00 2024 +1100

    Update pyproject.toml

commit 59d74753362ff59e664cb6d650b564511e6e20f3
Author: Daniel Han-Chen <danielhanchen@gmail.com>
Date:   Fri Jan 19 03:35:17 2024 +1100

    Update pyproject.toml

commit c1ac4d2707574868767345e76ebe49c8353f9057
Author: Daniel Han <danielhanchen@gmail.com>
Date:   Thu Jan 11 04:08:03 2024 +1100

    Fix some bugs (#83)

    * Fix tokenizer, dropout, bias for LoRA

    * Update loader.py

    * Fix LoRA downcasting

    * Update _utils.py

    * Saving to GGUF

    * fix

    * colab_quantize_to_gguf

    * move save modules

    * save module

    * Update __init__.py

    * Update save.py

    * Temp downgrade due to TRL issue

    * Fix up bugs

commit d3887c7fd93d9b910bf6ee3ab3c7fd485fc55e46
Author: Daniel Han <danielhanchen@gmail.com>
Date:   Wed Jan 10 23:10:48 2024 +1100

    Update README.md (#81)

commit b5d94d9a0ad9532494e1b3c7badbb94fa92c50eb
Author: shimmy <107991372+shimmyshimmer@users.noreply.github.com>
Date:   Wed Jan 10 23:10:23 2024 +1100

    Discord button redo (#80)

commit 01d7f58e11373ab07b9282a42bc14f542dbdabf0
Author: shimmy <107991372+shimmyshimmer@users.noreply.github.com>
Date:   Wed Jan 10 23:02:20 2024 +1100

    Update logos (#79)

    * HF Perf Button

    * Update README.md

    Adding new buttons cleanup

    * Update README.md

    * Delete images/Discord.png

    * Delete images/try live demo green.png

    * new transparent logos

    * Revamping page

    * Revamp mainpage

    * Update README.md

    * Update README.md

commit 9faaf5b388e025f8ffc302450a12ffb84e7e1750
Author: Daniel Han <danielhanchen@gmail.com>
Date:   Wed Jan 10 20:03:01 2024 +1100

    Create FUNDING.yml (#78)

commit 82e6fece0b78011707090639823d2d7acf5a3864
Author: Daniel Han-Chen <danielhanchen@gmail.com>
Date:   Wed Jan 10 01:02:44 2024 +1100

    fix_tokenizer

commit b52278199b7ae2764f242622275bb8a85ba7b721
Author: Daniel Han-Chen <danielhanchen@gmail.com>
Date:   Tue Jan 9 23:40:43 2024 +1100

    check_tokenizer

danielhanchen added a commit that referenced this pull request Feb 6, 2024
* HF Perf Button

* Update README.md

Adding new buttons cleanup

* Update README.md

* Delete images/Discord.png

* Delete images/try live demo green.png

* new transparent logos

* Revamping page

* Revamp mainpage

* Update README.md

* Update README.md

* Update README.md

* Update README.md

* Update README.md

* Update README.md

* Update README.md

* finetune button

* Delete start free finetune button.png

* free finetune button

* Add files via upload

* Update README.md

* Update README.md

* Add files via upload

* Add files via upload

* Update README.md

* Add files via upload

* Update README.md

* Update README.md

* Update README.md

* Update README.md

* Update README.md

* Update README.md

* Update README.md

* Update README.md

* Update README.md

* Update README.md

* Squashed commit of the following:

commit 35f2ab4a8b4deecbbbe9fbd95f4efde8694233db
Author: Daniel Han <danielhanchen@gmail.com>
Date:   Sun Feb 4 17:35:56 2024 +1100

    2x faster inference (#151)

    * Update fast_lora.py

    * Update fast_lora.py

    * Update fast_lora.py

    * Update fast_lora.py

    * Update fast_lora.py

    * Update fast_lora.py

    * Update llama.py

    * Update fast_lora.py

    * Update llama.py

    * Update fast_lora.py

    * Update fast_lora.py

    * Update fast_lora.py

    * Update fast_lora.py

    * Update fast_lora.py

    * Update fast_lora.py

    * Update fast_lora.py

    * Update fast_lora.py

    * Update fast_lora.py

    * Update swiglu.py

    * Update fast_lora.py

    * Update swiglu.py

    * Update fast_lora.py

    * Update fast_lora.py

    * Update fast_lora.py

    * Update fast_lora.py

    * Update fast_lora.py

    * Update fast_lora.py

    * Update fast_lora.py

    * Update fast_lora.py

    * Update fast_lora.py

    * Update fast_lora.py

    * Update fast_lora.py

    * Update fast_lora.py

    * Update fast_lora.py

    * Update fast_lora.py

    * Update fast_lora.py

    * Update fast_lora.py

    * Update save.py

    * Update fast_lora.py

    * Update utils.py

    * Update llama.py

    * Update fast_lora.py

    * Update swiglu.py

    * Update save.py

    * Update save.py

    * Update llama.py

    * Update llama.py

    * Update llama.py

    * Update llama.py

    * Revert "Update llama.py"

    This reverts commit a208ec46e012cf470ecefe6268a66358215df7b6.

    * Update llama.py

    * Works?

    * Update pyproject.toml

    * Update fast_lora.py

    * Update fast_lora.py

    * Update fast_lora.py

    * Update fast_lora.py

    * Update fast_lora.py

    * Swiglu

    * Update swiglu.py

    * Update fast_lora.py

    * Update fast_lora.py

    * Update fast_lora.py

    * Update fast_lora.py

    * Update swiglu.py

    * Update fast_lora.py

    * Update fast_lora.py

    * Update fast_lora.py

    * Update fast_lora.py

    * Update fast_lora.py

    * Update fast_lora.py

    * Update fast_lora.py

    * Update fast_lora.py

    * attention_mask

    * Update llama.py

    * Update llama.py

    * labels

    * Update mistral.py

    * Update llama.py

    * attention mask

    * Update save.py

    * Update save.py

    * Update mistral.py

    * attention mask

    * Update llama.py

    * Update llama.py

    * Update mistral.py

    * Update llama.py

    * Update llama.py

    * Update llama.py

    * Update dpo.py

    * Patch saving

    * Update save.py

    * Update save.py

    * patch_saving_functions

    * Update save.py

    * Update save.py

    * Update save.py

    * Update save.py

    * Update save.py

    * Update save.py

    * Update save.py

    * Update save.py

    * print

    * Mistral patch

    * Update mistral.py

    * Update save.py

    * saving

    * Update llama.py

    * Update llama.py

    * Fast inference repatch

    * Update llama.py

    * Update utils.py

    * Update utils.py

    * Update utils.py

    * Update mistral.py

    * Update __init__.py

    * Fix inference

    * Update mistral.py

    * fast lm_head

    * Remove fast path

    * Update rope_embedding.py

    * Update loader.py

    * LlamaAttention_fast_forward_inference

    * if past_key_value is not None and q_len == 1:

    * revert inference

    * Update loader.py

    * past_key_value

    * Update llama.py

    * Update llama.py

    * Fix SDPA

    * Update llama.py

    * padding

    * Inference

    * Update llama.py

    * Revert

    * Update mistral.py

    * faster inference

    * inference

    * Update llama.py

    * Update llama.py

    * Update llama.py

    * Update llama.py

    * Update llama.py

    * Update llama.py

    * Update llama.py

    * Update llama.py

    * inference

    * Update llama.py

    * Update utils.py

    * faster inference

    * Update llama.py

    * revert

    * lm_head

    * Update llama.py

    * inference

    * Update llama.py

    * Update llama.py

    * Update llama.py

    * Update llama.py

    * Update llama.py

    * Update llama.py

    * Update llama.py

    * Update llama.py

    * Update llama.py

    * Update llama.py

    * Update mistral.py

    * Update llama.py

    * faster inference

    * Update llama.py

    * fast inference

    * Update llama.py

    * Update llama.py

    * Update mistral.py

    * Update llama.py

    * Update llama.py

    * Update llama.py

    * Update llama.py

    * torch compile

    * past_key_values

    * Update llama.py

    * Update llama.py

    * Update llama.py

    * Update llama.py

    * Update llama.py

    * Update llama.py

    * Update llama.py

    * Update llama.py

    * Update llama.py

    * Update llama.py

    * Update llama.py

    * Update llama.py

    * Update llama.py

    * Update llama.py

    * Update llama.py

    * Update utils.py

    * Update utils.py

    * Update utils.py

    * Update utils.py

    * Update llama.py

    * fast inference + saving config.json

    * Update llama.py

    * Update llama.py

    * Update llama.py

    * Update llama.py

    * Update llama.py

    * Update mistral.py

    * fast inference again

    * more temp matrices

    * Update llama.py

    * Update llama.py

    * Update llama.py

    * Update llama.py

    * Update llama.py

    * fast inference

    * Update mistral.py

    * Update llama.py

    * SDPA

    * attention_mask

    * New version

    * Update llama.py

    * Update llama.py

    * Update llama.py

    * Update llama.py

    * Update llama.py

    * Update llama.py

    * Update llama.py

    * Update llama.py

    * Update llama.py

    * Update llama.py

    * Update llama.py

    * Update llama.py

    * Update llama.py

    * Update llama.py

    * Update llama.py

    * Update llama.py

    * Update llama.py

    * Update llama.py

    * Update llama.py

    * Update llama.py

    * Update utils.py

    * Update utils.py

commit 051a73b0e63d3ae3acd7c4d962349280f69bbdb0
Author: Daniel Han <danielhanchen@gmail.com>
Date:   Wed Jan 31 04:03:37 2024 +1100

    Hotfix - fix inference (#146)

    * faster saving & inference

    * Update llama.py

    * Update save.py

    * Update llama.py

    * Update llama.py

    * Update llama.py

    * Update llama.py

    * Update mistral.py

    * Update llama.py

    * Update llama.py

    * Update llama.py

    * Update llama.py

    * Update llama.py

    * fast inference

    * Update llama.py

    * Update save.py

    * Update llama.py

    * Mistral correct RoPE scaling

    * Max sequence lengths

    * Apache 2

    * fast_linear_forward

    * Update utils.py

    * Update utils.py

    * No print

    * Update utils.py

    * Update utils.py

    * inference

    * Update llama.py

    * Fast inference RoPE

    * Update llama.py

    * Update llama.py

    * RoPE

    * Update llama.py

    * Update llama.py

    * Update llama.py

    * Update llama.py

    * Update llama.py

    * Update llama.py

    * Update llama.py

    * Update llama.py

    * LoRA

    * Fast LoRA saving

    * Update llama.py

    * hidden_states

    * q_len == 1

    * q_len issue

    * Update mistral.py

    * Update mistral.py

    * incorrect inference

    * Update to transformers 4.37

    * Graceful FA2 error + torch 2.1.1

    * Update mapper.py

    * Update pyproject.toml

    * Fix saving and bnb-4bit

    * Update fast_lora.py

    * Update fast_lora.py

    * remove patching

    * Update llama.py

    * Update llama.py

    * Update swiglu.py

    * Repatch

    * Update fast_lora.py

    * Update fast_lora.py

    * Update fast_lora.py

    * Update fast_lora.py

    * Update fast_lora.py

    * Update fast_lora.py

    * Update fast_lora.py

    * Update fast_lora.py

    * Update fast_lora.py

    * Update fast_lora.py

    * Update llama.py

    * Update fast_lora.py

    * Update llama.py

    * Update fast_lora.py

    * Update fast_lora.py

    * Update fast_lora.py

    * Update fast_lora.py

    * Update fast_lora.py

    * Update fast_lora.py

    * Update fast_lora.py

    * Update fast_lora.py

    * Update fast_lora.py

    * Update swiglu.py

    * Update fast_lora.py

    * Update swiglu.py

    * Update fast_lora.py

    * Update fast_lora.py

    * Update fast_lora.py

    * Update fast_lora.py

    * Update fast_lora.py

    * Update fast_lora.py

    * Update fast_lora.py

    * Update fast_lora.py

    * Update fast_lora.py

    * Update fast_lora.py

    * Update fast_lora.py

    * Update fast_lora.py

    * Update fast_lora.py

    * Update fast_lora.py

    * Update fast_lora.py

    * Update fast_lora.py

    * Update save.py

    * Update fast_lora.py

    * Update utils.py

    * Update llama.py

    * Update fast_lora.py

    * Update swiglu.py

    * Update save.py

    * Update save.py

    * Update llama.py

    * Update llama.py

    * Update llama.py

    * Update llama.py

    * Revert "Update llama.py"

    This reverts commit a208ec46e012cf470ecefe6268a66358215df7b6.

    * Update llama.py

    * Works?

    * Update pyproject.toml

    * Update fast_lora.py

    * Update fast_lora.py

    * Update fast_lora.py

    * Update fast_lora.py

    * Update fast_lora.py

    * Swiglu

    * Update swiglu.py

    * Update fast_lora.py

    * Update fast_lora.py

    * Update fast_lora.py

    * Update fast_lora.py

    * Update swiglu.py

    * Update fast_lora.py

    * Update fast_lora.py

    * Update fast_lora.py

    * Update fast_lora.py

    * Update fast_lora.py

    * Update fast_lora.py

    * Update fast_lora.py

    * Update fast_lora.py

    * attention_mask

    * Update llama.py

    * Update llama.py

    * labels

    * Update mistral.py

    * Update llama.py

    * attention mask

    * Update save.py

    * Update save.py

    * Update mistral.py

    * attention mask

    * Update llama.py

    * Update llama.py

    * Update mistral.py

    * Update llama.py

    * Update llama.py

    * Update llama.py

    * Update dpo.py

    * Patch saving

    * Update save.py

    * Update save.py

    * patch_saving_functions

    * Update save.py

    * Update save.py

    * Update save.py

    * Update save.py

    * Update save.py

    * Update save.py

    * Update save.py

    * Update save.py

    * print

    * Mistral patch

    * Update mistral.py

    * Update save.py

    * saving

    * Update llama.py

    * Update llama.py

    * Fast inference repatch

    * Update llama.py

    * Update utils.py

    * Update utils.py

    * Update utils.py

    * Update mistral.py

    * Update __init__.py

    * Fix inference

    * Update mistral.py

    * fast lm_head

    * Remove fast path

    * Update rope_embedding.py

    * Update loader.py

    * LlamaAttention_fast_forward_inference

    * if past_key_value is not None and q_len == 1:

    * revert inference

    * Update loader.py

    * past_key_value

commit 05624642802c7f90dcc7aeea0e1c8d447cde006e
Author: Daniel Han <danielhanchen@gmail.com>
Date:   Mon Jan 29 17:49:54 2024 +1100

    Fix inference attention mask (#142)

    * faster saving & inference

    * Update llama.py

    * Update save.py

    * Update llama.py

    * Update llama.py

    * Update llama.py

    * Update llama.py

    * Update mistral.py

    * Update llama.py

    * Update llama.py

    * Update llama.py

    * Update llama.py

    * Update llama.py

    * fast inference

    * Update llama.py

    * Update save.py

    * Update llama.py

    * Mistral correct RoPE scaling

    * Max sequence lengths

    * Apache 2

    * fast_linear_forward

    * Update utils.py

    * Update utils.py

    * No print

    * Update utils.py

    * Update utils.py

    * inference

    * Update llama.py

    * Fast inference RoPE

    * Update llama.py

    * Update llama.py

    * RoPE

    * Update llama.py

    * Update llama.py

    * Update llama.py

    * Update llama.py

    * Update llama.py

    * Update llama.py

    * Update llama.py

    * Update llama.py

    * LoRA

    * Fast LoRA saving

    * Update llama.py

    * hidden_states

    * q_len == 1

    * q_len issue

    * Update mistral.py

    * Update mistral.py

    * incorrect inference

    * Update to transformers 4.37

    * Graceful FA2 error + torch 2.1.1

    * Update mapper.py

    * Update pyproject.toml

    * Fix saving and bnb-4bit

    * Update fast_lora.py

    * Update fast_lora.py

    * remove patching

    * Update llama.py

    * Update llama.py

    * Update swiglu.py

    * Repatch

    * Update fast_lora.py

    * Update fast_lora.py

    * Update fast_lora.py

    * Update fast_lora.py

    * Update fast_lora.py

    * Update fast_lora.py

    * Update fast_lora.py

    * Update fast_lora.py

    * Update fast_lora.py

    * Update fast_lora.py

    * Update llama.py

    * Update fast_lora.py

    * Update llama.py

    * Update fast_lora.py

    * Update fast_lora.py

    * Update fast_lora.py

    * Update fast_lora.py

    * Update fast_lora.py

    * Update fast_lora.py

    * Update fast_lora.py

    * Update fast_lora.py

    * Update fast_lora.py

    * Update swiglu.py

    * Update fast_lora.py

    * Update swiglu.py

    * Update fast_lora.py

    * Update fast_lora.py

    * Update fast_lora.py

    * Update fast_lora.py

    * Update fast_lora.py

    * Update fast_lora.py

    * Update fast_lora.py

    * Update fast_lora.py

    * Update fast_lora.py

    * Update fast_lora.py

    * Update fast_lora.py

    * Update fast_lora.py

    * Update fast_lora.py

    * Update fast_lora.py

    * Update fast_lora.py

    * Update fast_lora.py

    * Update save.py

    * Update fast_lora.py

    * Update utils.py

    * Update llama.py

    * Update fast_lora.py

    * Update swiglu.py

    * Update save.py

    * Update save.py

    * Update llama.py

    * Update llama.py

    * Update llama.py

    * Update llama.py

    * Revert "Update llama.py"

    This reverts commit a208ec46e012cf470ecefe6268a66358215df7b6.

    * Update llama.py

    * Works?

    * Update pyproject.toml

    * Update fast_lora.py

    * Update fast_lora.py

    * Update fast_lora.py

    * Update fast_lora.py

    * Update fast_lora.py

    * Swiglu

    * Update swiglu.py

    * Update fast_lora.py

    * Update fast_lora.py

    * Update fast_lora.py

    * Update fast_lora.py

    * Update swiglu.py

    * Update fast_lora.py

    * Update fast_lora.py

    * Update fast_lora.py

    * Update fast_lora.py

    * Update fast_lora.py

    * Update fast_lora.py

    * Update fast_lora.py

    * Update fast_lora.py

    * attention_mask

    * Update llama.py

    * Update llama.py

    * labels

    * Update mistral.py

    * Update llama.py

    * attention mask

    * Update save.py

    * Update save.py

    * Update mistral.py

    * attention mask

    * Update llama.py

    * Update llama.py

    * Update mistral.py

    * Update llama.py

    * Update llama.py

    * Update llama.py

    * Update dpo.py

    * Patch saving

    * Update save.py

    * Update save.py

    * patch_saving_functions

    * Update save.py

    * Update save.py

    * Update save.py

    * Update save.py

    * Update save.py

    * Update save.py

    * Update save.py

    * Update save.py

    * print

    * Mistral patch

    * Update mistral.py

    * Update save.py

    * saving

    * Update llama.py

    * Update llama.py

commit 206a9b65f090bd71ccaad7dd88b67ba2bfde0b58
Author: Daniel Han <danielhanchen@gmail.com>
Date:   Mon Jan 29 03:45:07 2024 +1100

    Nightly (#140)

    * faster saving & inference

    * Update llama.py

    * Update save.py

    * Update llama.py

    * Update llama.py

    * Update llama.py

    * Update llama.py

    * Update mistral.py

    * Update llama.py

    * Update llama.py

    * Update llama.py

    * Update llama.py

    * Update llama.py

    * fast inference

    * Update llama.py

    * Update save.py

    * Update llama.py

    * Mistral correct RoPE scaling

    * Max sequence lengths

    * Apache 2

    * fast_linear_forward

    * Update utils.py

    * Update utils.py

    * No print

    * Update utils.py

    * Update utils.py

    * inference

    * Update llama.py

    * Fast inference RoPE

    * Update llama.py

    * Update llama.py

    * RoPE

    * Update llama.py

    * Update llama.py

    * Update llama.py

    * Update llama.py

    * Update llama.py

    * Update llama.py

    * Update llama.py

    * Update llama.py

    * LoRA

    * Fast LoRA saving

    * Update llama.py

    * hidden_states

    * q_len == 1

    * q_len issue

    * Update mistral.py

    * Update mistral.py

    * incorrect inference

    * Update to transformers 4.37

    * Graceful FA2 error + torch 2.1.1

    * Update mapper.py

    * Update pyproject.toml

    * Fix saving and bnb-4bit

    * Update fast_lora.py

    * Update fast_lora.py

    * remove patching

    * Update llama.py

    * Update llama.py

    * Update swiglu.py

    * Repatch

    * Update fast_lora.py

    * Update fast_lora.py

    * Update fast_lora.py

    * Update fast_lora.py

    * Update fast_lora.py

    * Update fast_lora.py

    * Update fast_lora.py

    * Update fast_lora.py

    * Update fast_lora.py

    * Update fast_lora.py

    * Update llama.py

    * Update fast_lora.py

    * Update llama.py

    * Update fast_lora.py

    * Update fast_lora.py

    * Update fast_lora.py

    * Update fast_lora.py

    * Update fast_lora.py

    * Update fast_lora.py

    * Update fast_lora.py

    * Update fast_lora.py

    * Update fast_lora.py

    * Update swiglu.py

    * Update fast_lora.py

    * Update swiglu.py

    * Update fast_lora.py

    * Update fast_lora.py

    * Update fast_lora.py

    * Update fast_lora.py

    * Update fast_lora.py

    * Update fast_lora.py

    * Update fast_lora.py

    * Update fast_lora.py

    * Update fast_lora.py

    * Update fast_lora.py

    * Update fast_lora.py

    * Update fast_lora.py

    * Update fast_lora.py

    * Update fast_lora.py

    * Update fast_lora.py

    * Update fast_lora.py

    * Update save.py

    * Update fast_lora.py

    * Update utils.py

    * Update llama.py

    * Update fast_lora.py

    * Update swiglu.py

    * Update save.py

    * Update save.py

    * Update llama.py

    * Update llama.py

    * Update llama.py

    * Update llama.py

    * Revert "Update llama.py"

    This reverts commit a208ec46e012cf470ecefe6268a66358215df7b6.

    * Update llama.py

    * Works?

    * Update pyproject.toml

    * Update fast_lora.py

    * Update fast_lora.py

    * Update fast_lora.py

    * Update fast_lora.py

    * Update fast_lora.py

    * Swiglu

    * Update swiglu.py

    * Update fast_lora.py

    * Update fast_lora.py

    * Update fast_lora.py

    * Update fast_lora.py

    * Update swiglu.py

    * Update fast_lora.py

    * Update fast_lora.py

    * Update fast_lora.py

    * Update fast_lora.py

    * Update fast_lora.py

    * Update fast_lora.py

    * Update fast_lora.py

    * Update fast_lora.py

    * attention_mask

    * Update llama.py

    * Update llama.py

    * labels

    * Update mistral.py

    * Update llama.py

    * attention mask

    * Update save.py

    * Update save.py

    * Update mistral.py

    * attention mask

    * Update llama.py

    * Update llama.py

    * Update mistral.py

    * Update llama.py

    * Update llama.py

    * Update llama.py

    * Update dpo.py

    * Patch saving

    * Update save.py

    * Update save.py

    * patch_saving_functions

    * Update save.py

    * Update save.py

    * Update save.py

    * Update save.py

    * Update save.py

    * Update save.py

    * Update save.py

    * Update save.py

    * print

    * Mistral patch

    * Update mistral.py

    * Update save.py

    * saving

commit 8faf469f028a05852b2dc29ec8df1f36998fab33
Author: Daniel Han <danielhanchen@gmail.com>
Date:   Mon Jan 29 02:52:39 2024 +1100

    Fix saving issues (#139)

    * faster saving & inference

    * Update llama.py

    * Update save.py

    * Update llama.py

    * Update llama.py

    * Update llama.py

    * Update llama.py

    * Update mistral.py

    * Update llama.py

    * Update llama.py

    * Update llama.py

    * Update llama.py

    * Update llama.py

    * fast inference

    * Update llama.py

    * Update save.py

    * Update llama.py

    * Mistral correct RoPE scaling

    * Max sequence lengths

    * Apache 2

    * fast_linear_forward

    * Update utils.py

    * Update utils.py

    * No print

    * Update utils.py

    * Update utils.py

    * inference

    * Update llama.py

    * Fast inference RoPE

    * Update llama.py

    * Update llama.py

    * RoPE

    * Update llama.py

    * Update llama.py

    * Update llama.py

    * Update llama.py

    * Update llama.py

    * Update llama.py

    * Update llama.py

    * Update llama.py

    * LoRA

    * Fast LoRA saving

    * Update llama.py

    * hidden_states

    * q_len == 1

    * q_len issue

    * Update mistral.py

    * Update mistral.py

    * incorrect inference

    * Update to transformers 4.37

    * Graceful FA2 error + torch 2.1.1

    * Update mapper.py

    * Update pyproject.toml

    * Fix saving and bnb-4bit

    * Update fast_lora.py

    * Update fast_lora.py

    * remove patching

    * Update llama.py

    * Update llama.py

    * Update swiglu.py

    * Repatch

    * Update fast_lora.py

    * Update fast_lora.py

    * Update fast_lora.py

    * Update fast_lora.py

    * Update fast_lora.py

    * Update fast_lora.py

    * Update fast_lora.py

    * Update fast_lora.py

    * Update fast_lora.py

    * Update fast_lora.py

    * Update llama.py

    * Update fast_lora.py

    * Update llama.py

    * Update fast_lora.py

    * Update fast_lora.py

    * Update fast_lora.py

    * Update fast_lora.py

    * Update fast_lora.py

    * Update fast_lora.py

    * Update fast_lora.py

    * Update fast_lora.py

    * Update fast_lora.py

    * Update swiglu.py

    * Update fast_lora.py

    * Update swiglu.py

    * Update fast_lora.py

    * Update fast_lora.py

    * Update fast_lora.py

    * Update fast_lora.py

    * Update fast_lora.py

    * Update fast_lora.py

    * Update fast_lora.py

    * Update fast_lora.py

    * Update fast_lora.py

    * Update fast_lora.py

    * Update fast_lora.py

    * Update fast_lora.py

    * Update fast_lora.py

    * Update fast_lora.py

    * Update fast_lora.py

    * Update fast_lora.py

    * Update save.py

    * Update fast_lora.py

    * Update utils.py

    * Update llama.py

    * Update fast_lora.py

    * Update swiglu.py

    * Update save.py

    * Update save.py

    * Update llama.py

    * Update llama.py

    * Update llama.py

    * Update llama.py

    * Revert "Update llama.py"

    This reverts commit a208ec46e012cf470ecefe6268a66358215df7b6.

    * Update llama.py

    * Works?

    * Update pyproject.toml

    * Update fast_lora.py

    * Update fast_lora.py

    * Update fast_lora.py

    * Update fast_lora.py

    * Update fast_lora.py

    * Swiglu

    * Update swiglu.py

    * Update fast_lora.py

    * Update fast_lora.py

    * Update fast_lora.py

    * Update fast_lora.py

    * Update swiglu.py

    * Update fast_lora.py

    * Update fast_lora.py

    * Update fast_lora.py

    * Update fast_lora.py

    * Update fast_lora.py

    * Update fast_lora.py

    * Update fast_lora.py

    * Update fast_lora.py

    * attention_mask

    * Update llama.py

    * Update llama.py

    * labels

    * Update mistral.py

    * Update llama.py

    * attention mask

    * Update save.py

    * Update save.py

    * Update mistral.py

    * attention mask

    * Update llama.py

    * Update llama.py

    * Update mistral.py

    * Update llama.py

    * Update llama.py

    * Update llama.py

    * Update dpo.py

    * Patch saving

    * Update save.py

    * Update save.py

    * patch_saving_functions

    * Update save.py

    * Update save.py

    * Update save.py

    * Update save.py

    * Update save.py

    * Update save.py

    * Update save.py

    * Update save.py

    * print

commit 1ecc0185a5759c7a0c95dfc96aceea5023cebdfc
Author: Daniel Han <danielhanchen@gmail.com>
Date:   Sun Jan 28 04:30:29 2024 +1100

    1 more bug (#138)

    * faster saving & inference

    * Update llama.py

    * Update save.py

    * Update llama.py

    * Update llama.py

    * Update llama.py

    * Update llama.py

    * Update mistral.py

    * Update llama.py

    * Update llama.py

    * Update llama.py

    * Update llama.py

    * Update llama.py

    * fast inference

    * Update llama.py

    * Update save.py

    * Update llama.py

    * Mistral correct RoPE scaling

    * Max sequence lengths

    * Apache 2

    * fast_linear_forward

    * Update utils.py

    * Update utils.py

    * No print

    * Update utils.py

    * Update utils.py

    * inference

    * Update llama.py

    * Fast inference RoPE

    * Update llama.py

    * Update llama.py

    * RoPE

    * Update llama.py

    * Update llama.py

    * Update llama.py

    * Update llama.py

    * Update llama.py

    * Update llama.py

    * Update llama.py

    * Update llama.py

    * LoRA

    * Fast LoRA saving

    * Update llama.py

    * hidden_states

    * q_len == 1

    * q_len issue

    * Update mistral.py

    * Update mistral.py

    * incorrect inference

    * Update to transformers 4.37

    * Graceful FA2 error + torch 2.1.1

    * Update mapper.py

    * Update pyproject.toml

    * Fix saving and bnb-4bit

    * Update fast_lora.py

    * Update fast_lora.py

    * remove patching

    * Update llama.py

    * Update llama.py

    * Update swiglu.py

    * Repatch

    * Update fast_lora.py

    * Update fast_lora.py

    * Update fast_lora.py

    * Update fast_lora.py

    * Update fast_lora.py

    * Update fast_lora.py

    * Update fast_lora.py

    * Update fast_lora.py

    * Update fast_lora.py

    * Update fast_lora.py

    * Update llama.py

    * Update fast_lora.py

    * Update llama.py

    * Update fast_lora.py

    * Update fast_lora.py

    * Update fast_lora.py

    * Update fast_lora.py

    * Update fast_lora.py

    * Update fast_lora.py

    * Update fast_lora.py

    * Update fast_lora.py

    * Update fast_lora.py

    * Update swiglu.py

    * Update fast_lora.py

    * Update swiglu.py

    * Update fast_lora.py

    * Update fast_lora.py

    * Update fast_lora.py

    * Update fast_lora.py

    * Update fast_lora.py

    * Update fast_lora.py

    * Update fast_lora.py

    * Update fast_lora.py

    * Update fast_lora.py

    * Update fast_lora.py

    * Update fast_lora.py

    * Update fast_lora.py

    * Update fast_lora.py

    * Update fast_lora.py

    * Update fast_lora.py

    * Update fast_lora.py

    * Update save.py

    * Update fast_lora.py

    * Update utils.py

    * Update llama.py

    * Update fast_lora.py

    * Update swiglu.py

    * Update save.py

    * Update save.py

    * Update llama.py

    * Update llama.py

    * Update llama.py

    * Update llama.py

    * Revert "Update llama.py"

    This reverts commit a208ec46e012cf470ecefe6268a66358215df7b6.

    * Update llama.py

    * Works?

    * Update pyproject.toml

    * Update fast_lora.py

    * Update fast_lora.py

    * Update fast_lora.py

    * Update fast_lora.py

    * Update fast_lora.py

    * Swiglu

    * Update swiglu.py

    * Update fast_lora.py

    * Update fast_lora.py

    * Update fast_lora.py

    * Update fast_lora.py

    * Update swiglu.py

    * Update fast_lora.py

    * Update fast_lora.py

    * Update fast_lora.py

    * Update fast_lora.py

    * Update fast_lora.py

    * Update fast_lora.py

    * Update fast_lora.py

    * Update fast_lora.py

    * attention_mask

    * Update llama.py

    * Update llama.py

    * labels

    * Update mistral.py

    * Update llama.py

    * attention mask

    * Update save.py

    * Update save.py

commit cd32ba76b71adf3317ede9de7d1cf6f30ad3bf0d
Author: Daniel Han <danielhanchen@gmail.com>
Date:   Sun Jan 28 04:20:06 2024 +1100

    Fix bugs + more accurate Swiglu (#137)

    * faster saving & inference

    * Update llama.py

    * Update save.py

    * Update llama.py

    * Update llama.py

    * Update llama.py

    * Update llama.py

    * Update mistral.py

    * Update llama.py

    * Update llama.py

    * Update llama.py

    * Update llama.py

    * Update llama.py

    * fast inference

    * Update llama.py

    * Update save.py

    * Update llama.py

    * Mistral correct RoPE scaling

    * Max sequence lengths

    * Apache 2

    * fast_linear_forward

    * Update utils.py

    * Update utils.py

    * No print

    * Update utils.py

    * Update utils.py

    * inference

    * Update llama.py

    * Fast inference RoPE

    * Update llama.py

    * Update llama.py

    * RoPE

    * Update llama.py

    * Update llama.py

    * Update llama.py

    * Update llama.py

    * Update llama.py

    * Update llama.py

    * Update llama.py

    * Update llama.py

    * LoRA

    * Fast LoRA saving

    * Update llama.py

    * hidden_states

    * q_len == 1

    * q_len issue

    * Update mistral.py

    * Update mistral.py

    * incorrect inference

    * Update to transformers 4.37

    * Graceful FA2 error + torch 2.1.1

    * Update mapper.py

    * Update pyproject.toml

    * Fix saving and bnb-4bit

    * Update fast_lora.py

    * Update fast_lora.py

    * remove patching

    * Update llama.py

    * Update llama.py

    * Update swiglu.py

    * Repatch

    * Update fast_lora.py

    * Update fast_lora.py

    * Update fast_lora.py

    * Update fast_lora.py

    * Update fast_lora.py

    * Update fast_lora.py

    * Update fast_lora.py

    * Update fast_lora.py

    * Update fast_lora.py

    * Update fast_lora.py

    * Update llama.py

    * Update fast_lora.py

    * Update llama.py

    * Update fast_lora.py

    * Update fast_lora.py

    * Update fast_lora.py

    * Update fast_lora.py

    * Update fast_lora.py

    * Update fast_lora.py

    * Update fast_lora.py

    * Update fast_lora.py

    * Update fast_lora.py

    * Update swiglu.py

    * Update fast_lora.py

    * Update swiglu.py

    * Update fast_lora.py

    * Update fast_lora.py

    * Update fast_lora.py

    * Update fast_lora.py

    * Update fast_lora.py

    * Update fast_lora.py

    * Update fast_lora.py

    * Update fast_lora.py

    * Update fast_lora.py

    * Update fast_lora.py

    * Update fast_lora.py

    * Update fast_lora.py

    * Update fast_lora.py

    * Update fast_lora.py

    * Update fast_lora.py

    * Update fast_lora.py

    * Update save.py

    * Update fast_lora.py

    * Update utils.py

    * Update llama.py

    * Update fast_lora.py

    * Update swiglu.py

    * Update save.py

    * Update save.py

    * Update llama.py

    * Update llama.py

    * Update llama.py

    * Update llama.py

    * Revert "Update llama.py"

    This reverts commit a208ec46e012cf470ecefe6268a66358215df7b6.

    * Update llama.py

    * Works?

    * Update pyproject.toml

    * Update fast_lora.py

    * Update fast_lora.py

    * Update fast_lora.py

    * Update fast_lora.py

    * Update fast_lora.py

    * Swiglu

    * Update swiglu.py

    * Update fast_lora.py

    * Update fast_lora.py

    * Update fast_lora.py

    * Update fast_lora.py

    * Update swiglu.py

    * Update fast_lora.py

    * Update fast_lora.py

    * Update fast_lora.py

    * Update fast_lora.py

    * Update fast_lora.py

    * Update fast_lora.py

    * Update fast_lora.py

    * Update fast_lora.py

    * attention_mask

    * Update llama.py

    * Update llama.py

    * labels

    * Update mistral.py

    * Update llama.py

    * attention mask

commit 89daa0efcc38c7690abbb8170b5d9f3d364796ce
Author: Daniel Han <danielhanchen@gmail.com>
Date:   Sat Jan 27 04:50:22 2024 +1100

    Inference bug fix (#134)

    * faster saving & inference

    * Update llama.py

    * Update save.py

    * Update llama.py

    * Update llama.py

    * Update llama.py

    * Update llama.py

    * Update mistral.py

    * Update llama.py

    * Update llama.py

    * Update llama.py

    * Update llama.py

    * Update llama.py

    * fast inference

    * Update llama.py

    * Update save.py

    * Update llama.py

    * Mistral correct RoPE scaling

    * Max sequence lengths

    * Apache 2

    * fast_linear_forward

    * Update utils.py

    * Update utils.py

    * No print

    * Update utils.py

    * Update utils.py

    * inference

    * Update llama.py

    * Fast inference RoPE

    * Update llama.py

    * Update llama.py

    * RoPE

    * Update llama.py

    * Update llama.py

    * Update llama.py

    * Update llama.py

    * Update llama.py

    * Update llama.py

    * Update llama.py

    * Update llama.py

    * LoRA

    * Fast LoRA saving

    * Update llama.py

    * hidden_states

    * q_len == 1

    * q_len issue

    * Update mistral.py

    * Update mistral.py

    * incorrect inference

    * Update to transformers 4.37

    * Graceful FA2 error + torch 2.1.1

    * Update mapper.py

    * Update pyproject.toml

    * Fix saving and bnb-4bit

    * Update fast_lora.py

    * Update fast_lora.py

    * remove patching

    * Update llama.py

    * Update llama.py

    * Update swiglu.py

    * Repatch

    * Update fast_lora.py

    * Update fast_lora.py

    * Update fast_lora.py

    * Update fast_lora.py

    * Update fast_lora.py

    * Update fast_lora.py

    * Update fast_lora.py

    * Update fast_lora.py

    * Update fast_lora.py

    * Update fast_lora.py

    * Update llama.py

    * Update fast_lora.py

    * Update llama.py

    * Update fast_lora.py

    * Update fast_lora.py

    * Update fast_lora.py

    * Update fast_lora.py

    * Update fast_lora.py

    * Update fast_lora.py

    * Update fast_lora.py

    * Update fast_lora.py

    * Update fast_lora.py

    * Update swiglu.py

    * Update fast_lora.py

    * Update swiglu.py

    * Update fast_lora.py

    * Update fast_lora.py

    * Update fast_lora.py

    * Update fast_lora.py

    * Update fast_lora.py

    * Update fast_lora.py

    * Update fast_lora.py

    * Update fast_lora.py

    * Update fast_lora.py

    * Update fast_lora.py

    * Update fast_lora.py

    * Update fast_lora.py

    * Update fast_lora.py

    * Update fast_lora.py

    * Update fast_lora.py

    * Update fast_lora.py

    * Update save.py

    * Update fast_lora.py

    * Update utils.py

    * Update llama.py

    * Update fast_lora.py

    * Update swiglu.py

    * Update save.py

    * Update save.py

    * Update llama.py

    * Update llama.py

    * Update llama.py

    * Update llama.py

    * Revert "Update llama.py"

    This reverts commit a208ec46e012cf470ecefe6268a66358215df7b6.

    * Update llama.py

commit 87a7ef1049f6fca409a0673f51f4758e0aff248d
Author: Daniel Han <danielhanchen@gmail.com>
Date:   Sat Jan 27 04:47:54 2024 +1100

    More bug fixes (#133)

    * faster saving & inference

    * Update llama.py

    * Update save.py

    * Update llama.py

    * Update llama.py

    * Update llama.py

    * Update llama.py

    * Update mistral.py

    * Update llama.py

    * Update llama.py

    * Update llama.py

    * Update llama.py

    * Update llama.py

    * fast inference

    * Update llama.py

    * Update save.py

    * Update llama.py

    * Mistral correct RoPE scaling

    * Max sequence lengths

    * Apache 2

    * fast_linear_forward

    * Update utils.py

    * Update utils.py

    * No print

    * Update utils.py

    * Update utils.py

    * inference

    * Update llama.py

    * Fast inference RoPE

    * Update llama.py

    * Update llama.py

    * RoPE

    * Update llama.py

    * Update llama.py

    * Update llama.py

    * Update llama.py

    * Update llama.py

    * Update llama.py

    * Update llama.py

    * Update llama.py

    * LoRA

    * Fast LoRA saving

    * Update llama.py

    * hidden_states

    * q_len == 1

    * q_len issue

    * Update mistral.py

    * Update mistral.py

    * incorrect inference

    * Update to transformers 4.37

    * Graceful FA2 error + torch 2.1.1

    * Update mapper.py

    * Update pyproject.toml

    * Fix saving and bnb-4bit

    * Update fast_lora.py

    * Update fast_lora.py

    * remove patching

    * Update llama.py

    * Update llama.py

    * Update swiglu.py

    * Repatch

    * Update fast_lora.py

    * Update fast_lora.py

    * Update fast_lora.py

    * Update fast_lora.py

    * Update fast_lora.py

    * Update fast_lora.py

    * Update fast_lora.py

    * Update fast_lora.py

    * Update fast_lora.py

    * Update fast_lora.py

    * Update llama.py

    * Update fast_lora.py

    * Update llama.py

    * Update fast_lora.py

    * Update fast_lora.py

    * Update fast_lora.py

    * Update fast_lora.py

    * Update fast_lora.py

    * Update fast_lora.py

    * Update fast_lora.py

    * Update fast_lora.py

    * Update fast_lora.py

    * Update swiglu.py

    * Update fast_lora.py

    * Update swiglu.py

    * Update fast_lora.py

    * Update fast_lora.py

    * Update fast_lora.py

    * Update fast_lora.py

    * Update fast_lora.py

    * Update fast_lora.py

    * Update fast_lora.py

    * Update fast_lora.py

    * Update fast_lora.py

    * Update fast_lora.py

    * Update fast_lora.py

    * Update fast_lora.py

    * Update fast_lora.py

    * Update fast_lora.py

    * Update fast_lora.py

    * Update fast_lora.py

    * Update save.py

    * Update fast_lora.py

    * Update utils.py

    * Update llama.py

    * Update fast_lora.py

    * Update swiglu.py

    * Update save.py

    * Update save.py

    * Update llama.py

    * Update llama.py

    * Update llama.py

commit 3d67790901696e953171f64b4bf9d980780051a0
Author: Daniel Han <danielhanchen@gmail.com>
Date:   Fri Jan 26 04:19:17 2024 +1100

    Fix bugs (#129)

    * faster saving & inference

    * Update llama.py

    * Update save.py

    * Update llama.py

    * Update llama.py

    * Update llama.py

    * Update llama.py

    * Update mistral.py

    * Update llama.py

    * Update llama.py

    * Update llama.py

    * Update llama.py

    * Update llama.py

    * fast inference

    * Update llama.py

    * Update save.py

    * Update llama.py

    * Mistral correct RoPE scaling

    * Max sequence lengths

    * Apache 2

    * fast_linear_forward

    * Update utils.py

    * Update utils.py

    * No print

    * Update utils.py

    * Update utils.py

    * inference

    * Update llama.py

    * Fast inference RoPE

    * Update llama.py

    * Update llama.py

    * RoPE

    * Update llama.py

    * Update llama.py

    * Update llama.py

    * Update llama.py

    * Update llama.py

    * Update llama.py

    * Update llama.py

    * Update llama.py

    * LoRA

    * Fast LoRA saving

    * Update llama.py

    * hidden_states

    * q_len == 1

    * q_len issue

    * Update mistral.py

    * Update mistral.py

    * incorrect inference

    * Update to transformers 4.37

    * Graceful FA2 error + torch 2.1.1

    * Update mapper.py

    * Update pyproject.toml

    * Fix saving and bnb-4bit

    * Update fast_lora.py

    * Update fast_lora.py

    * remove patching

    * Update llama.py

    * Update llama.py

    * Update swiglu.py

    * Repatch

    * Update fast_lora.py

commit a833f403462e9cfc1f96b3b84d9da15d7d8db5ee
Author: Daniel Han <danielhanchen@gmail.com>
Date:   Tue Jan 23 03:55:24 2024 +1100

    2-4x faster native HF inference (#119)

    * faster saving & inference

    * Update llama.py

    * Update save.py

    * Update llama.py

    * Update llama.py

    * Update llama.py

    * Update llama.py

    * Update mistral.py

    * Update llama.py

    * Update llama.py

    * Update llama.py

    * Update llama.py

    * Update llama.py

    * fast inference

    * Update llama.py

    * Update save.py

    * Update llama.py

    * Mistral correct RoPE scaling

    * Max sequence lengths

    * Apache 2

    * fast_linear_forward

    * Update utils.py

    * Update utils.py

    * No print

    * Update utils.py

    * Update utils.py

    * inference

    * Update llama.py

    * Fast inference RoPE

    * Update llama.py

    * Update llama.py

    * RoPE

    * Update llama.py

    * Update llama.py

    * Update llama.py

    * Update llama.py

    * Update llama.py

    * Update llama.py

    * Update llama.py

    * Update llama.py

    * LoRA

    * Fast LoRA saving

commit b370c9c8aacc31a7845404566dd95dfa8c0e3bac
Author: Daniel Han <danielhanchen@gmail.com>
Date:   Sun Jan 21 22:20:22 2024 +1100

    Hotfix (#118)

    * faster saving & inference

    * Update llama.py

    * Update save.py

    * Update llama.py

    * Update llama.py

    * Update llama.py

    * Update llama.py

    * Update mistral.py

    * Update llama.py

    * Update llama.py

    * Update llama.py

    * Update llama.py

    * Update llama.py

commit 57a5b5a49da588b1db8e9a988cc985dc20393d34
Author: Daniel Han-Chen <danielhanchen@gmail.com>
Date:   Sun Jan 21 05:00:37 2024 +1100

    Update save.py

commit 5145a61e69ab9b3035465f649e1c1e5aae749f8f
Author: Daniel Han-Chen <danielhanchen@gmail.com>
Date:   Sun Jan 21 04:21:54 2024 +1100

    Update save.py

commit a7bd8d119c16433de4f8b6a36903ef7131f225e5
Author: Daniel Han-Chen <danielhanchen@gmail.com>
Date:   Sun Jan 21 04:13:03 2024 +1100

    Update save.py

commit be4b97e7d89074b6dd1d2e984fa429051d328192
Author: Daniel Han <danielhanchen@gmail.com>
Date:   Sun Jan 21 03:43:49 2024 +1100

    Fixed saving! (#113)

    * Fix tokenizer, dropout, bias for LoRA

    * Update loader.py

    * Fix LoRA downcasting

    * Update _utils.py

    * Saving to GGUF

    * fix

    * colab_quantize_to_gguf

    * move save modules

    * save module

    * Update __init__.py

    * Update save.py

    * Temp downgrade due to TRL issue

    * Fix up bugs

    * Faster saving + other changes

    * Update llama.py

    * Saving modules

    * spelling

    * Update llama.py

    * Update save.py

    * Update save.py

    * Update loader.py

    * Update llama.py

    * patch saving

    * Update save.py

    * Update save.py

    * Update save.py

    * patch saving

    * Update save.py

    * Update save.py

    * Update save.py

    * Update save.py

    * Update save.py

    * Update save.py

    * Update save.py

    * Update save.py

    * Update save.py

    * Update save.py

    * Update save.py

    * Update save.py

    * Update save.py

    * Update save.py

    * Update save.py

    * original_model

    * Update save.py

    * Update save.py

    * Update save.py

    * Update save.py

    * Update save.py

    * Update save.py

    * Update save.py

    * Update save.py

    * Update save.py

    * Update save.py

    * Update save.py

    * Update save.py

    * Update save.py

    * Update save.py

    * Update save.py

    * Update save.py

    * Update save.py

    * Update save.py

    * Update save.py

    * Update save.py

    * Update save.py

    * Update save.py

    * Update save.py

    * saving to RAM leakage?

    * Update save.py

    * new_save_directory

    * Update save.py

    * Update save.py

    * Update save.py

    * Update save.py

    * Update pyproject.toml

    * Update pyproject.toml

    * Update pyproject.toml

    * Quick fixes

    * Update llama.py

    * Update llama.py

    * Update dpo.py

    * Update dpo.py

    * Update llama.py

    * Update save.py

    * getattr

    * RSLoRA and LoftQ direct support

    * Update llama.py

    * Update llama.py

    * Update llama.py

    * Fix DPO + GGUF

    * Fix quantization_method

    * Fix quantization_config

    * patch model

    * Update llama.py

    * Update llama.py

    * Update llama.py

    * Update save.py

    * Update save.py

    * tokenizer_save_settings

    * Update save.py

    * quantization and loftq

    * Update save.py

    * Update llama.py

    * Update save.py

    * upload_to_huggingface

    * Update save.py

    * Update save.py

    * Update save.py

    * Update save.py

    * Update save.py

    * Update save.py

commit abb462be71e8cf01ad989dca0efaa17441113651
Author: Daniel Han <danielhanchen@gmail.com>
Date:   Sat Jan 20 23:23:00 2024 +1100

    Hotfix for Jan 2024 Release (#110)

    * Fix tokenizer, dropout, bias for LoRA

    * Update loader.py

    * Fix LoRA downcasting

    * Update _utils.py

    * Saving to GGUF

    * fix

    * colab_quantize_to_gguf

    * move save modules

    * save module

    * Update __init__.py

    * Update save.py

    * Temp downgrade due to TRL issue

    * Fix up bugs

    * Faster saving + other changes

    * Update llama.py

    * Saving modules

    * spelling

    * Update llama.py

    * Update save.py

    * Update save.py

    * Update loader.py

    * Update llama.py

    * patch saving

    * Update save.py

    * Update save.py

    * Update save.py

    * patch saving

    * Update save.py

    * Update save.py

    * Update save.py

    * Update save.py

    * Update save.py

    * Update save.py

    * Update save.py

    * Update save.py

    * Update save.py

    * Update save.py

    * Update save.py

    * Update save.py

    * Update save.py

    * Update save.py

    * Update save.py

    * original_model

    * Update save.py

    * Update save.py

    * Update save.py

    * Update save.py

    * Update save.py

    * Update save.py

    * Update save.py

    * Update save.py

    * Update save.py

    * Update save.py

    * Update save.py

    * Update save.py

    * Update save.py

    * Update save.py

    * Update save.py

    * Update save.py

    * Update save.py

    * Update save.py

    * Update save.py

    * Update save.py

    * Update save.py

    * Update save.py

    * Update save.py

    * saving to RAM leakage?

    * Update save.py

    * new_save_directory

    * Update save.py

    * Update save.py

    * Update save.py

    * Update save.py

    * Update pyproject.toml

    * Update pyproject.toml

    * Update pyproject.toml

    * Quick fixes

    * Update llama.py

    * Update llama.py

    * Update dpo.py

    * Update dpo.py

    * Update llama.py

    * Update save.py

    * getattr

    * RSLoRA and LoftQ direct support

    * Update llama.py

    * Update llama.py

    * Update llama.py

    * Fix DPO + GGUF

    * Fix quantization_method

    * Fix quantization_config

    * patch model

    * Update llama.py

    * Update llama.py

    * Update llama.py

    * Update save.py

    * Update save.py

    * tokenizer_save_settings

    * Update save.py

    * quantization and loftq

    * Update save.py

    * Update llama.py

    * Update save.py

commit 31e2d71720e64b854145d7779833b7d2d3d4177e
Author: Daniel Han <danielhanchen@gmail.com>
Date:   Sat Jan 20 04:25:06 2024 +1100

    Quick fixes (#106)

    * Fix tokenizer, dropout, bias for LoRA

    * Update loader.py

    * Fix LoRA downcasting

    * Update _utils.py

    * Saving to GGUF

    * fix

    * colab_quantize_to_gguf

    * move save modules

    * save module

    * Update __init__.py

    * Update save.py

    * Temp downgrade due to TRL issue

    * Fix up bugs

    * Faster saving + other changes

    * Update llama.py

    * Saving modules

    * spelling

    * Update llama.py

    * Update save.py

    * Update save.py

    * Update loader.py

    * Update llama.py

    * patch saving

    * Update save.py

    * Update save.py

    * Update save.py

    * patch saving

    * Update save.py

    * Update save.py

    * Update save.py

    * Update save.py

    * Update save.py

    * Update save.py

    * Update save.py

    * Update save.py

    * Update save.py

    * Update save.py

    * Update save.py

    * Update save.py

    * Update save.py

    * Update save.py

    * Update save.py

    * original_model

    * Update save.py

    * Update save.py

    * Update save.py

    * Update save.py

    * Update save.py

    * Update save.py

    * Update save.py

    * Update save.py

    * Update save.py

    * Update save.py

    * Update save.py

    * Update save.py

    * Update save.py

    * Update save.py

    * Update save.py

    * Update save.py

    * Update save.py

    * Update save.py

    * Update save.py

    * Update save.py

    * Update save.py

    * Update save.py

    * Update save.py

    * saving to RAM leakage?

    * Update save.py

    * new_save_directory

    * Update save.py

    * Update save.py

    * Update save.py

    * Update save.py

    * Update pyproject.toml

    * Update pyproject.toml

    * Update pyproject.toml

    * Quick fixes

    * Update llama.py

    * Update llama.py

    * Update dpo.py

    * Update dpo.py

    * Update llama.py

    * Update save.py

    * getattr

    * RSLoRA and LoftQ direct support

    * Update llama.py

    * Update llama.py

    * Update llama.py

    * Fix DPO + GGUF

commit 8846337e5c8c2f206a4ac8fe6d239f3d1221f7ac
Author: Daniel Han-Chen <danielhanchen@gmail.com>
Date:   Sat Jan 20 02:30:31 2024 +1100

    Update _utils.py

commit d378df87e5f3945474915a098c9aa58313465064
Merge: c1e7480 920e3c2
Author: Daniel Han-Chen <danielhanchen@gmail.com>
Date:   Fri Jan 19 23:15:38 2024 +1100

    Merge branch 'main' of https://github.com/unslothai/unsloth

commit c1e7480ac2ad0e5efa05e84fe0997619ccdd86a4
Author: Daniel Han-Chen <danielhanchen@gmail.com>
Date:   Fri Jan 19 23:15:20 2024 +1100

    Revert quantization methods

commit 920e3c2ea07a044addeb7c3fa8be6f0189cb7f84
Author: Daniel Han <danielhanchen@gmail.com>
Date:   Fri Jan 19 22:57:22 2024 +1100

    getattr issues (#103)

    * Fix tokenizer, dropout, bias for LoRA

    * Update loader.py

    * Fix LoRA downcasting

    * Update _utils.py

    * Saving to GGUF

    * fix

    * colab_quantize_to_gguf

    * move save modules

    * save module

    * Update __init__.py

    * Update save.py

    * Temp downgrade due to TRL issue

    * Fix up bugs

    * Faster saving + other changes

    * Update llama.py

    * Saving modules

    * spelling

    * Update llama.py

    * Update save.py

    * Update save.py

    * Update loader.py

    * Update llama.py

    * patch saving

    * Update save.py

    * Update save.py

    * Update save.py

    * patch saving

    * Update save.py

    * Update save.py

    * Update save.py

    * Update save.py

    * Update save.py

    * Update save.py

    * Update save.py

    * Update save.py

    * Update save.py

    * Update save.py

    * Update save.py

    * Update save.py

    * Update save.py

    * Update save.py

    * Update save.py

    * original_model

    * Update save.py

    * Update save.py

    * Update save.py

    * Update save.py

    * Update save.py

    * Update save.py

    * Update save.py

    * Update save.py

    * Update save.py

    * Update save.py

    * Update save.py

    * Update save.py

    * Update save.py

    * Update save.py

    * Update save.py

    * Update save.py

    * Update save.py

    * Update save.py

    * Update save.py

    * Update save.py

    * Update save.py

    * Update save.py

    * Update save.py

    * saving to RAM leakage?

    * Update save.py

    * new_save_directory

    * Update save.py

    * Update save.py

    * Update save.py

    * Update save.py

    * Update pyproject.toml

    * Update pyproject.toml

    * Update pyproject.toml

    * Quick fixes

    * Update llama.py

    * Update llama.py

    * Update dpo.py

    * Update dpo.py

    * Update llama.py

    * Update save.py

    * getattr

commit fc25ab0df032f8ee5ea750f27c68d63f49d2d9a9
Author: Daniel Han <danielhanchen@gmail.com>
Date:   Fri Jan 19 22:52:30 2024 +1100

    Quick fixes (#101)

    * Fix tokenizer, dropout, bias for LoRA

    * Update loader.py

    * Fix LoRA downcasting

    * Update _utils.py

    * Saving to GGUF

    * fix

    * colab_quantize_to_gguf

    * move save modules

    * save module

    * Update __init__.py

    * Update save.py

    * Temp downgrade due to TRL issue

    * Fix up bugs

    * Faster saving + other changes

    * Update llama.py

    * Saving modules

    * spelling

    * Update llama.py

    * Update save.py

    * Update save.py

    * Update loader.py

    * Update llama.py

    * patch saving

    * Update save.py

    * Update save.py

    * Update save.py

    * patch saving

    * Update save.py

    * Update save.py

    * Update save.py

    * Update save.py

    * Update save.py

    * Update save.py

    * Update save.py

    * Update save.py

    * Update save.py

    * Update save.py

    * Update save.py

    * Update save.py

    * Update save.py

    * Update save.py

    * Update save.py

    * original_model

    * Update save.py

    * Update save.py

    * Update save.py

    * Update save.py

    * Update save.py

    * Update save.py

    * Update save.py

    * Update save.py

    * Update save.py

    * Update save.py

    * Update save.py

    * Update save.py

    * Update save.py

    * Update save.py

    * Update save.py

    * Update save.py

    * Update save.py

    * Update save.py

    * Update save.py

    * Update save.py

    * Update save.py

    * Update save.py

    * Update save.py

    * saving to RAM leakage?

    * Update save.py

    * new_save_directory

    * Update save.py

    * Update save.py

    * Update save.py

    * Update save.py

    * Update pyproject.toml

    * Update pyproject.toml

    * Update pyproject.toml

    * Quick fixes

    * Update llama.py

    * Update llama.py

    * Update dpo.py

    * Update dpo.py

    * Update llama.py

    * Update save.py

commit b8b1eafda35d124046e11766aeeb6343957e0daf
Author: Daniel Han <danielhanchen@gmail.com>
Date:   Fri Jan 19 04:51:19 2024 +1100

    2024 Release (#96)

    * Fix tokenizer, dropout, bias for LoRA

    * Update loader.py

    * Fix LoRA downcasting

    * Update _utils.py

    * Saving to GGUF

    * fix

    * colab_quantize_to_gguf

    * move save modules

    * save module

    * Update __init__.py

    * Update save.py

    * Temp downgrade due to TRL issue

    * Fix up bugs

    * Faster saving + other changes

    * Update llama.py

    * Saving modules

    * spelling

    * Update llama.py

    * Update save.py

    * Update save.py

    * Update loader.py

    * Update llama.py

    * patch saving

    * Update save.py

    * Update save.py

    * Update save.py

    * patch saving

    * Update save.py

    * Update save.py

    * Update save.py

    * Update save.py

    * Update save.py

    * Update save.py

    * Update save.py

    * Update save.py

    * Update save.py

    * Update save.py

    * Update save.py

    * Update save.py

    * Update save.py

    * Update save.py

    * Update save.py

    * original_model

    * Update save.py

    * Update save.py

    * Update save.py

    * Update save.py

    * Update save.py

    * Update save.py

    * Update save.py

    * Update save.py

    * Update save.py

    * Update save.py

    * Update save.py

    * Update save.py

    * Update save.py

    * Update save.py

    * Update save.py

    * Update save.py

    * Update save.py

    * Update save.py

    * Update save.py

    * Update save.py

    * Update save.py

    * Update save.py

    * Update save.py

    * saving to RAM leakage?

    * Update save.py

    * new_save_directory

    * Update save.py

    * Update save.py

    * Update save.py

    * Update save.py

    * Update pyproject.toml

    * Update pyproject.toml

    * Update pyproject.toml

commit 4112eb4a3df4c0911e36211b47381086c963b4e0
Author: Daniel Han-Chen <danielhanchen@gmail.com>
Date:   Fri Jan 19 03:41:00 2024 +1100

    Update pyproject.toml

commit 59d74753362ff59e664cb6d650b564511e6e20f3
Author: Daniel Han-Chen <danielhanchen@gmail.com>
Date:   Fri Jan 19 03:35:17 2024 +1100

    Update pyproject.toml

commit c1ac4d2707574868767345e76ebe49c8353f9057
Author: Daniel Han <danielhanchen@gmail.com>
Date:   Thu Jan 11 04:08:03 2024 +1100

    Fix some bugs (#83)

    * Fix tokenizer, dropout, bias for LoRA

    * Update loader.py

    * Fix LoRA downcasting

    * Update _utils.py

    * Saving to GGUF

    * fix

    * colab_quantize_to_gguf

    * move save modules

    * save module

    * Update __init__.py

    * Update save.py

    * Temp downgrade due to TRL issue

    * Fix up bugs

commit d3887c7fd93d9b910bf6ee3ab3c7fd485fc55e46
Author: Daniel Han <danielhanchen@gmail.com>
Date:   Wed Jan 10 23:10:48 2024 +1100

    Update README.md (#81)

commit b5d94d9a0ad9532494e1b3c7badbb94fa92c50eb
Author: shimmy <107991372+shimmyshimmer@users.noreply.github.com>
Date:   Wed Jan 10 23:10:23 2024 +1100

    Discord button redo (#80)

commit 01d7f58e11373ab07b9282a42bc14f542dbdabf0
Author: shimmy <107991372+shimmyshimmer@users.noreply.github.com>
Date:   Wed Jan 10 23:02:20 2024 +1100

    Update logos (#79)

    * HF Perf Button

    * Update README.md

    Adding new buttons cleanup

    * Update README.md

    * Delete images/Discord.png

    * Delete images/try live demo green.png

    * new transparent logos

    * Revamping page

    * Revamp mainpage

    * Update README.md

    * Update README.md

commit 9faaf5b388e025f8ffc302450a12ffb84e7e1750
Author: Daniel Han <danielhanchen@gmail.com>
Date:   Wed Jan 10 20:03:01 2024 +1100

    Create FUNDING.yml (#78)

commit 82e6fece0b78011707090639823d2d7acf5a3864
Author: Daniel Han-Chen <danielhanchen@gmail.com>
Date:   Wed Jan 10 01:02:44 2024 +1100

    fix_tokenizer

commit b52278199b7ae2764f242622275bb8a85ba7b721
Author: Daniel Han-Chen <danielhanchen@gmail.com>
Date:   Tue Jan 9 23:40:43 2024 +1100

    check_tokenizer

---------

Co-authored-by: Daniel Han <danielhanchen@gmail.com>

cm2435 pushed a commit to cm2435/unsloth that referenced this pull request Feb 26, 2024
* HF Perf Button

* Update README.md

Adding new buttons cleanup

* Update README.md

* Delete images/Discord.png

* Delete images/try live demo green.png

* new transparent logos

* Revamping page

* Revamp mainpage

* Update README.md

* Update README.md

* Update README.md

* Update README.md

* Update README.md

* Update README.md

* Update README.md

* finetune button

* Delete start free finetune button.png

* free finetune button

* Add files via upload

* Update README.md

* Update README.md

* Add files via upload

* Add files via upload

* Update README.md

* Add files via upload

* Update README.md

* Update README.md

* Update README.md

* Update README.md

* Update README.md

* Update README.md

* Update README.md

* Update README.md

* Update README.md

* Update README.md

* Squashed commit of the following:

commit 35f2ab4a8b4deecbbbe9fbd95f4efde8694233db
Author: Daniel Han <danielhanchen@gmail.com>
Date:   Sun Feb 4 17:35:56 2024 +1100

    2x faster inference (#151)

    * Update fast_lora.py

    * Update fast_lora.py

    * Update fast_lora.py

    * Update fast_lora.py

    * Update fast_lora.py

    * Update fast_lora.py

    * Update llama.py

    * Update fast_lora.py

    * Update llama.py

    * Update fast_lora.py

    * Update fast_lora.py

    * Update fast_lora.py

    * Update fast_lora.py

    * Update fast_lora.py

    * Update fast_lora.py

    * Update fast_lora.py

    * Update fast_lora.py

    * Update fast_lora.py

    * Update swiglu.py

    * Update fast_lora.py

    * Update swiglu.py

    * Update fast_lora.py

    * Update fast_lora.py

    * Update fast_lora.py

    * Update fast_lora.py

    * Update fast_lora.py

    * Update fast_lora.py

    * Update fast_lora.py

    * Update fast_lora.py

    * Update fast_lora.py

    * Update fast_lora.py

    * Update fast_lora.py

    * Update fast_lora.py

    * Update fast_lora.py

    * Update fast_lora.py

    * Update fast_lora.py

    * Update fast_lora.py

    * Update save.py

    * Update fast_lora.py

    * Update utils.py

    * Update llama.py

    * Update fast_lora.py

    * Update swiglu.py

    * Update save.py

    * Update save.py

    * Update llama.py

    * Update llama.py

    * Update llama.py

    * Update llama.py

    * Revert "Update llama.py"

    This reverts commit a208ec46e012cf470ecefe6268a66358215df7b6.

    * Update llama.py

    * Works?

    * Update pyproject.toml

    * Update fast_lora.py

    * Update fast_lora.py

    * Update fast_lora.py

    * Update fast_lora.py

    * Update fast_lora.py

    * Swiglu

    * Update swiglu.py

    * Update fast_lora.py

    * Update fast_lora.py

    * Update fast_lora.py

    * Update fast_lora.py

    * Update swiglu.py

    * Update fast_lora.py

    * Update fast_lora.py

    * Update fast_lora.py

    * Update fast_lora.py

    * Update fast_lora.py

    * Update fast_lora.py

    * Update fast_lora.py

    * Update fast_lora.py

    * attention_mask

    * Update llama.py

    * Update llama.py

    * labels

    * Update mistral.py

    * Update llama.py

    * attention mask

    * Update save.py

    * Update save.py

    * Update mistral.py

    * attention mask

    * Update llama.py

    * Update llama.py

    * Update mistral.py

    * Update llama.py

    * Update llama.py

    * Update llama.py

    * Update dpo.py

    * Patch saving

    * Update save.py

    * Update save.py

    * patch_saving_functions

    * Update save.py

    * Update save.py

    * Update save.py

    * Update save.py

    * Update save.py

    * Update save.py

    * Update save.py

    * Update save.py

    * print

    * Mistral patch

    * Update mistral.py

    * Update save.py

    * saving

    * Update llama.py

    * Update llama.py

    * Fast inference repatch

    * Update llama.py

    * Update utils.py

    * Update utils.py

    * Update utils.py

    * Update mistral.py

    * Update __init__.py

    * Fix inference

    * Update mistral.py

    * fast lm_head

    * Remove fast path

    * Update rope_embedding.py

    * Update loader.py

    * LlamaAttention_fast_forward_inference

    * if past_key_value is not None and q_len == 1:

    * revert inference

    * Update loader.py

    * past_key_value

    * Update llama.py

    * Update llama.py

    * Fix SDPA

    * Update llama.py

    * padding

    * Inference

    * Update llama.py

    * Revert

    * Update mistral.py

    * faster inference

    * inference

    * Update llama.py

    * Update llama.py

    * Update llama.py

    * Update llama.py

    * Update llama.py

    * Update llama.py

    * Update llama.py

    * Update llama.py

    * inference

    * Update llama.py

    * Update utils.py

    * faster inference

    * Update llama.py

    * revert

    * lm_head

    * Update llama.py

    * inference

    * Update llama.py

    * Update llama.py

    * Update llama.py

    * Update llama.py

    * Update llama.py

    * Update llama.py

    * Update llama.py

    * Update llama.py

    * Update llama.py

    * Update llama.py

    * Update mistral.py

    * Update llama.py

    * faster inference

    * Update llama.py

    * fast inference

    * Update llama.py

    * Update llama.py

    * Update mistral.py

    * Update llama.py

    * Update llama.py

    * Update llama.py

    * Update llama.py

    * torch compile

    * past_key_values

    * Update llama.py

    * Update llama.py

    * Update llama.py

    * Update llama.py

    * Update llama.py

    * Update llama.py

    * Update llama.py

    * Update llama.py

    * Update llama.py

    * Update llama.py

    * Update llama.py

    * Update llama.py

    * Update llama.py

    * Update llama.py

    * Update llama.py

    * Update utils.py

    * Update utils.py

    * Update utils.py

    * Update utils.py

    * Update llama.py

    * fast inference + saving config.json

    * Update llama.py

    * Update llama.py

    * Update llama.py

    * Update llama.py

    * Update llama.py

    * Update mistral.py

    * fast inference again

    * more temp matrices

    * Update llama.py

    * Update llama.py

    * Update llama.py

    * Update llama.py

    * Update llama.py

    * fast inference

    * Update mistral.py

    * Update llama.py

    * SDPA

    * attention_mask

    * New version

    * Update llama.py

    * Update llama.py

    * Update llama.py

    * Update llama.py

    * Update llama.py

    * Update llama.py

    * Update llama.py

    * Update llama.py

    * Update llama.py

    * Update llama.py

    * Update llama.py

    * Update llama.py

    * Update llama.py

    * Update llama.py

    * Update llama.py

    * Update llama.py

    * Update llama.py

    * Update llama.py

    * Update llama.py

    * Update llama.py

    * Update utils.py

    * Update utils.py

commit 051a73b0e63d3ae3acd7c4d962349280f69bbdb0
Author: Daniel Han <danielhanchen@gmail.com>
Date:   Wed Jan 31 04:03:37 2024 +1100

    Hotfix - fix inference (#146)

    * faster saving & inference

    * Update llama.py

    * Update save.py

    * Update llama.py

    * Update llama.py

    * Update llama.py

    * Update llama.py

    * Update mistral.py

    * Update llama.py

    * Update llama.py

    * Update llama.py

    * Update llama.py

    * Update llama.py

    * fast inference

    * Update llama.py

    * Update save.py

    * Update llama.py

    * Mistral correct RoPE scaling

    * Max sequence lengths

    * Apache 2

    * fast_linear_forward

    * Update utils.py

    * Update utils.py

    * No print

    * Update utils.py

    * Update utils.py

    * inference

    * Update llama.py

    * Fast inference RoPE

    * Update llama.py

    * Update llama.py

    * RoPE

    * Update llama.py

    * Update llama.py

    * Update llama.py

    * Update llama.py

    * Update llama.py

    * Update llama.py

    * Update llama.py

    * Update llama.py

    * LoRA

    * Fast LoRA saving

    * Update llama.py

    * hidden_states

    * q_len == 1

    * q_len issue

    * Update mistral.py

    * Update mistral.py

    * incorrect inference

    * Update to transformers 4.37

    * Graceful FA2 error + torch 2.1.1

    * Update mapper.py

    * Update pyproject.toml

    * Fix saving and bnb-4bit

    * Update fast_lora.py

    * Update fast_lora.py

    * remove patching

    * Update llama.py

    * Update llama.py

    * Update swiglu.py

    * Repatch

    * Update fast_lora.py

    * Update fast_lora.py

    * Update fast_lora.py

    * Update fast_lora.py

    * Update fast_lora.py

    * Update fast_lora.py

    * Update fast_lora.py

    * Update fast_lora.py

    * Update fast_lora.py

    * Update fast_lora.py

    * Update llama.py

    * Update fast_lora.py

    * Update llama.py

    * Update fast_lora.py

    * Update fast_lora.py

    * Update fast_lora.py

    * Update fast_lora.py

    * Update fast_lora.py

    * Update fast_lora.py

    * Update fast_lora.py

    * Update fast_lora.py

    * Update fast_lora.py

    * Update swiglu.py

    * Update fast_lora.py

    * Update swiglu.py

    * Update fast_lora.py

    * Update fast_lora.py

    * Update fast_lora.py

    * Update fast_lora.py

    * Update fast_lora.py

    * Update fast_lora.py

    * Update fast_lora.py

    * Update fast_lora.py

    * Update fast_lora.py

    * Update fast_lora.py

    * Update fast_lora.py

    * Update fast_lora.py

    * Update fast_lora.py

    * Update fast_lora.py

    * Update fast_lora.py

    * Update fast_lora.py

    * Update save.py

    * Update fast_lora.py

    * Update utils.py

    * Update llama.py

    * Update fast_lora.py

    * Update swiglu.py

    * Update save.py

    * Update save.py

    * Update llama.py

    * Update llama.py

    * Update llama.py

    * Update llama.py

    * Revert "Update llama.py"

    This reverts commit a208ec46e012cf470ecefe6268a66358215df7b6.

    * Update llama.py

    * Works?

    * Update pyproject.toml

    * Update fast_lora.py

    * Update fast_lora.py

    * Update fast_lora.py

    * Update fast_lora.py

    * Update fast_lora.py

    * Swiglu

    * Update swiglu.py

    * Update fast_lora.py

    * Update fast_lora.py

    * Update fast_lora.py

    * Update fast_lora.py

    * Update swiglu.py

    * Update fast_lora.py

    * Update fast_lora.py

    * Update fast_lora.py

    * Update fast_lora.py

    * Update fast_lora.py

    * Update fast_lora.py

    * Update fast_lora.py

    * Update fast_lora.py

    * attention_mask

    * Update llama.py

    * Update llama.py

    * labels

    * Update mistral.py

    * Update llama.py

    * attention mask

    * Update save.py

    * Update save.py

    * Update mistral.py

    * attention mask

    * Update llama.py

    * Update llama.py

    * Update mistral.py

    * Update llama.py

    * Update llama.py

    * Update llama.py

    * Update dpo.py

    * Patch saving

    * Update save.py

    * Update save.py

    * patch_saving_functions

    * Update save.py

    * Update save.py

    * Update save.py

    * Update save.py

    * Update save.py

    * Update save.py

    * Update save.py

    * Update save.py

    * print

    * Mistral patch

    * Update mistral.py

    * Update save.py

    * saving

    * Update llama.py

    * Update llama.py

    * Fast inference repatch

    * Update llama.py

    * Update utils.py

    * Update utils.py

    * Update utils.py

    * Update mistral.py

    * Update __init__.py

    * Fix inference

    * Update mistral.py

    * fast lm_head

    * Remove fast path

    * Update rope_embedding.py

    * Update loader.py

    * LlamaAttention_fast_forward_inference

    * if past_key_value is not None and q_len == 1:

    * revert inference

    * Update loader.py

    * past_key_value

commit 05624642802c7f90dcc7aeea0e1c8d447cde006e
Author: Daniel Han <danielhanchen@gmail.com>
Date:   Mon Jan 29 17:49:54 2024 +1100

    Fix inference attention mask (#142)

    * faster saving & inference

    * Update llama.py

    * Update save.py

    * Update llama.py

    * Update llama.py

    * Update llama.py

    * Update llama.py

    * Update mistral.py

    * Update llama.py

    * Update llama.py

    * Update llama.py

    * Update llama.py

    * Update llama.py

    * fast inference

    * Update llama.py

    * Update save.py

    * Update llama.py

    * Mistral correct RoPE scaling

    * Max sequence lengths

    * Apache 2

    * fast_linear_forward

    * Update utils.py

    * Update utils.py

    * No print

    * Update utils.py

    * Update utils.py

    * inference

    * Update llama.py

    * Fast inference RoPE

    * Update llama.py

    * Update llama.py

    * RoPE

    * Update llama.py

    * Update llama.py

    * Update llama.py

    * Update llama.py

    * Update llama.py

    * Update llama.py

    * Update llama.py

    * Update llama.py

    * LoRA

    * Fast LoRA saving

    * Update llama.py

    * hidden_states

    * q_len == 1

    * q_len issue

    * Update mistral.py

    * Update mistral.py

    * incorrect inference

    * Update to transformers 4.37

    * Graceful FA2 error + torch 2.1.1

    * Update mapper.py

    * Update pyproject.toml

    * Fix saving and bnb-4bit

    * Update fast_lora.py

    * Update fast_lora.py

    * remove patching

    * Update llama.py

    * Update llama.py

    * Update swiglu.py

    * Repatch

    * Update fast_lora.py

    * Update fast_lora.py

    * Update fast_lora.py

    * Update fast_lora.py

    * Update fast_lora.py

    * Update fast_lora.py

    * Update fast_lora.py

    * Update fast_lora.py

    * Update fast_lora.py

    * Update fast_lora.py

    * Update llama.py

    * Update fast_lora.py

    * Update llama.py

    * Update fast_lora.py

    * Update fast_lora.py

    * Update fast_lora.py

    * Update fast_lora.py

    * Update fast_lora.py

    * Update fast_lora.py

    * Update fast_lora.py

    * Update fast_lora.py

    * Update fast_lora.py

    * Update swiglu.py

    * Update fast_lora.py

    * Update swiglu.py

    * Update fast_lora.py

    * Update fast_lora.py

    * Update fast_lora.py

    * Update fast_lora.py

    * Update fast_lora.py

    * Update fast_lora.py

    * Update fast_lora.py

    * Update fast_lora.py

    * Update fast_lora.py

    * Update fast_lora.py

    * Update fast_lora.py

    * Update fast_lora.py

    * Update fast_lora.py

    * Update fast_lora.py

    * Update fast_lora.py

    * Update fast_lora.py

    * Update save.py

    * Update fast_lora.py

    * Update utils.py

    * Update llama.py

    * Update fast_lora.py

    * Update swiglu.py

    * Update save.py

    * Update save.py

    * Update llama.py

    * Update llama.py

    * Update llama.py

    * Update llama.py

    * Revert "Update llama.py"

    This reverts commit a208ec46e012cf470ecefe6268a66358215df7b6.

    * Update llama.py

    * Works?

    * Update pyproject.toml

    * Update fast_lora.py

    * Update fast_lora.py

    * Update fast_lora.py

    * Update fast_lora.py

    * Update fast_lora.py

    * Swiglu

    * Update swiglu.py

    * Update fast_lora.py

    * Update fast_lora.py

    * Update fast_lora.py

    * Update fast_lora.py

    * Update swiglu.py

    * Update fast_lora.py

    * Update fast_lora.py

    * Update fast_lora.py

    * Update fast_lora.py

    * Update fast_lora.py

    * Update fast_lora.py

    * Update fast_lora.py

    * Update fast_lora.py

    * attention_mask

    * Update llama.py

    * Update llama.py

    * labels

    * Update mistral.py

    * Update llama.py

    * attention mask

    * Update save.py

    * Update save.py

    * Update mistral.py

    * attention mask

    * Update llama.py

    * Update llama.py

    * Update mistral.py

    * Update llama.py

    * Update llama.py

    * Update llama.py

    * Update dpo.py

    * Patch saving

    * Update save.py

    * Update save.py

    * patch_saving_functions

    * Update save.py

    * Update save.py

    * Update save.py

    * Update save.py

    * Update save.py

    * Update save.py

    * Update save.py

    * Update save.py

    * print

    * Mistral patch

    * Update mistral.py

    * Update save.py

    * saving

    * Update llama.py

    * Update llama.py

commit 206a9b65f090bd71ccaad7dd88b67ba2bfde0b58
Author: Daniel Han <danielhanchen@gmail.com>
Date:   Mon Jan 29 03:45:07 2024 +1100

    Nightly (#140)

    * faster saving & inference

    * Update llama.py

    * Update save.py

    * Update llama.py

    * Update llama.py

    * Update llama.py

    * Update llama.py

    * Update mistral.py

    * Update llama.py

    * Update llama.py

    * Update llama.py

    * Update llama.py

    * Update llama.py

    * fast inference

    * Update llama.py

    * Update save.py

    * Update llama.py

    * Mistral correct RoPE scaling

    * Max sequence lengths

    * Apache 2

    * fast_linear_forward

    * Update utils.py

    * Update utils.py

    * No print

    * Update utils.py

    * Update utils.py

    * inference

    * Update llama.py

    * Fast inference RoPE

    * Update llama.py

    * Update llama.py

    * RoPE

    * Update llama.py

    * Update llama.py

    * Update llama.py

    * Update llama.py

    * Update llama.py

    * Update llama.py

    * Update llama.py

    * Update llama.py

    * LoRA

    * Fast LoRA saving

    * Update llama.py

    * hidden_states

    * q_len == 1

    * q_len issue

    * Update mistral.py

    * Update mistral.py

    * incorrect inference

    * Update to transformers 4.37

    * Graceful FA2 error + torch 2.1.1

    * Update mapper.py

    * Update pyproject.toml

    * Fix saving and bnb-4bit

    * Update fast_lora.py

    * Update fast_lora.py

    * remove patching

    * Update llama.py

    * Update llama.py

    * Update swiglu.py

    * Repatch

    * Update fast_lora.py

    * Update fast_lora.py

    * Update fast_lora.py

    * Update fast_lora.py

    * Update fast_lora.py

    * Update fast_lora.py

    * Update fast_lora.py

    * Update fast_lora.py

    * Update fast_lora.py

    * Update fast_lora.py

    * Update llama.py

    * Update fast_lora.py

    * Update llama.py

    * Update fast_lora.py

    * Update fast_lora.py

    * Update fast_lora.py

    * Update fast_lora.py

    * Update fast_lora.py

    * Update fast_lora.py

    * Update fast_lora.py

    * Update fast_lora.py

    * Update fast_lora.py

    * Update swiglu.py

    * Update fast_lora.py

    * Update swiglu.py

    * Update fast_lora.py

    * Update fast_lora.py

    * Update fast_lora.py

    * Update fast_lora.py

    * Update fast_lora.py

    * Update fast_lora.py

    * Update fast_lora.py

    * Update fast_lora.py

    * Update fast_lora.py

    * Update fast_lora.py

    * Update fast_lora.py

    * Update fast_lora.py

    * Update fast_lora.py

    * Update fast_lora.py

    * Update fast_lora.py

    * Update fast_lora.py

    * Update save.py

    * Update fast_lora.py

    * Update utils.py

    * Update llama.py

    * Update fast_lora.py

    * Update swiglu.py

    * Update save.py

    * Update save.py

    * Update llama.py

    * Update llama.py

    * Update llama.py

    * Update llama.py

    * Revert "Update llama.py"

    This reverts commit a208ec46e012cf470ecefe6268a66358215df7b6.

    * Update llama.py

    * Works?

    * Update pyproject.toml

    * Update fast_lora.py

    * Update fast_lora.py

    * Update fast_lora.py

    * Update fast_lora.py

    * Update fast_lora.py

    * Swiglu

    * Update swiglu.py

    * Update fast_lora.py

    * Update fast_lora.py

    * Update fast_lora.py

    * Update fast_lora.py

    * Update swiglu.py

    * Update fast_lora.py

    * Update fast_lora.py

    * Update fast_lora.py

    * Update fast_lora.py

    * Update fast_lora.py

    * Update fast_lora.py

    * Update fast_lora.py

    * Update fast_lora.py

    * attention_mask

    * Update llama.py

    * Update llama.py

    * labels

    * Update mistral.py

    * Update llama.py

    * attention mask

    * Update save.py

    * Update save.py

    * Update mistral.py

    * attention mask

    * Update llama.py

    * Update llama.py

    * Update mistral.py

    * Update llama.py

    * Update llama.py

    * Update llama.py

    * Update dpo.py

    * Patch saving

    * Update save.py

    * Update save.py

    * patch_saving_functions

    * Update save.py

    * Update save.py

    * Update save.py

    * Update save.py

    * Update save.py

    * Update save.py

    * Update save.py

    * Update save.py

    * print

    * Mistral patch

    * Update mistral.py

    * Update save.py

    * saving

commit 8faf469f028a05852b2dc29ec8df1f36998fab33
Author: Daniel Han <danielhanchen@gmail.com>
Date:   Mon Jan 29 02:52:39 2024 +1100

    Fix saving issues (#139)

    * faster saving & inference

    * Update llama.py

    * Update save.py

    * Update llama.py

    * Update llama.py

    * Update llama.py

    * Update llama.py

    * Update mistral.py

    * Update llama.py

    * Update llama.py

    * Update llama.py

    * Update llama.py

    * Update llama.py

    * fast inference

    * Update llama.py

    * Update save.py

    * Update llama.py

    * Mistral correct RoPE scaling

    * Max sequence lengths

    * Apache 2

    * fast_linear_forward

    * Update utils.py

    * Update utils.py

    * No print

    * Update utils.py

    * Update utils.py

    * inference

    * Update llama.py

    * Fast inference RoPE

    * Update llama.py

    * Update llama.py

    * RoPE

    * Update llama.py

    * Update llama.py

    * Update llama.py

    * Update llama.py

    * Update llama.py

    * Update llama.py

    * Update llama.py

    * Update llama.py

    * LoRA

    * Fast LoRA saving

    * Update llama.py

    * hidden_states

    * q_len == 1

    * q_len issue

    * Update mistral.py

    * Update mistral.py

    * incorrect inference

    * Update to transformers 4.37

    * Graceful FA2 error + torch 2.1.1

    * Update mapper.py

    * Update pyproject.toml

    * Fix saving and bnb-4bit

    * Update fast_lora.py

    * Update fast_lora.py

    * remove patching

    * Update llama.py

    * Update llama.py

    * Update swiglu.py

    * Repatch

    * Update fast_lora.py

    * Update fast_lora.py

    * Update fast_lora.py

    * Update fast_lora.py

    * Update fast_lora.py

    * Update fast_lora.py

    * Update fast_lora.py

    * Update fast_lora.py

    * Update fast_lora.py

    * Update fast_lora.py

    * Update llama.py

    * Update fast_lora.py

    * Update llama.py

    * Update fast_lora.py

    * Update fast_lora.py

    * Update fast_lora.py

    * Update fast_lora.py

    * Update fast_lora.py

    * Update fast_lora.py

    * Update fast_lora.py

    * Update fast_lora.py

    * Update fast_lora.py

    * Update swiglu.py

    * Update fast_lora.py

    * Update swiglu.py

    * Update fast_lora.py

    * Update fast_lora.py

    * Update fast_lora.py

    * Update fast_lora.py

    * Update fast_lora.py

    * Update fast_lora.py

    * Update fast_lora.py

    * Update fast_lora.py

    * Update fast_lora.py

    * Update fast_lora.py

    * Update fast_lora.py

    * Update fast_lora.py

    * Update fast_lora.py

    * Update fast_lora.py

    * Update fast_lora.py

    * Update fast_lora.py

    * Update save.py

    * Update fast_lora.py

    * Update utils.py

    * Update llama.py

    * Update fast_lora.py

    * Update swiglu.py

    * Update save.py

    * Update save.py

    * Update llama.py

    * Update llama.py

    * Update llama.py

    * Update llama.py

    * Revert "Update llama.py"

    This reverts commit a208ec46e012cf470ecefe6268a66358215df7b6.

    * Update llama.py

    * Works?

    * Update pyproject.toml

    * Update fast_lora.py

    * Update fast_lora.py

    * Update fast_lora.py

    * Update fast_lora.py

    * Update fast_lora.py

    * Swiglu

    * Update swiglu.py

    * Update fast_lora.py

    * Update fast_lora.py

    * Update fast_lora.py

    * Update fast_lora.py

    * Update swiglu.py

    * Update fast_lora.py

    * Update fast_lora.py

    * Update fast_lora.py

    * Update fast_lora.py

    * Update fast_lora.py

    * Update fast_lora.py

    * Update fast_lora.py

    * Update fast_lora.py

    * attention_mask

    * Update llama.py

    * Update llama.py

    * labels

    * Update mistral.py

    * Update llama.py

    * attention mask

    * Update save.py

    * Update save.py

    * Update mistral.py

    * attention mask

    * Update llama.py

    * Update llama.py

    * Update mistral.py

    * Update llama.py

    * Update llama.py

    * Update llama.py

    * Update dpo.py

    * Patch saving

    * Update save.py

    * Update save.py

    * patch_saving_functions

    * Update save.py

    * Update save.py

    * Update save.py

    * Update save.py

    * Update save.py

    * Update save.py

    * Update save.py

    * Update save.py

    * print

commit 1ecc0185a5759c7a0c95dfc96aceea5023cebdfc
Author: Daniel Han <danielhanchen@gmail.com>
Date:   Sun Jan 28 04:30:29 2024 +1100

    1 more bug (#138)

    * faster saving & inference

    * Update llama.py

    * Update save.py

    * Update llama.py

    * Update llama.py

    * Update llama.py

    * Update llama.py

    * Update mistral.py

    * Update llama.py

    * Update llama.py

    * Update llama.py

    * Update llama.py

    * Update llama.py

    * fast inference

    * Update llama.py

    * Update save.py

    * Update llama.py

    * Mistral correct RoPE scaling

    * Max sequence lengths

    * Apache 2

    * fast_linear_forward

    * Update utils.py

    * Update utils.py

    * No print

    * Update utils.py

    * Update utils.py

    * inference

    * Update llama.py

    * Fast inference RoPE

    * Update llama.py

    * Update llama.py

    * RoPE

    * Update llama.py

    * Update llama.py

    * Update llama.py

    * Update llama.py

    * Update llama.py

    * Update llama.py

    * Update llama.py

    * Update llama.py

    * LoRA

    * Fast LoRA saving

    * Update llama.py

    * hidden_states

    * q_len == 1

    * q_len issue

    * Update mistral.py

    * Update mistral.py

    * incorrect inference

    * Update to transformers 4.37

    * Graceful FA2 error + torch 2.1.1

    * Update mapper.py

    * Update pyproject.toml

    * Fix saving and bnb-4bit

    * Update fast_lora.py

    * Update fast_lora.py

    * remove patching

    * Update llama.py

    * Update llama.py

    * Update swiglu.py

    * Repatch

    * Update fast_lora.py

    * Update fast_lora.py

    * Update fast_lora.py

    * Update fast_lora.py

    * Update fast_lora.py

    * Update fast_lora.py

    * Update fast_lora.py

    * Update fast_lora.py

    * Update fast_lora.py

    * Update fast_lora.py

    * Update llama.py

    * Update fast_lora.py

    * Update llama.py

    * Update fast_lora.py

    * Update fast_lora.py

    * Update fast_lora.py

    * Update fast_lora.py

    * Update fast_lora.py

    * Update fast_lora.py

    * Update fast_lora.py

    * Update fast_lora.py

    * Update fast_lora.py

    * Update swiglu.py

    * Update fast_lora.py

    * Update swiglu.py

    * Update fast_lora.py

    * Update fast_lora.py

    * Update fast_lora.py

    * Update fast_lora.py

    * Update fast_lora.py

    * Update fast_lora.py

    * Update fast_lora.py

    * Update fast_lora.py

    * Update fast_lora.py

    * Update fast_lora.py

    * Update fast_lora.py

    * Update fast_lora.py

    * Update fast_lora.py

    * Update fast_lora.py

    * Update fast_lora.py

    * Update fast_lora.py

    * Update save.py

    * Update fast_lora.py

    * Update utils.py

    * Update llama.py

    * Update fast_lora.py

    * Update swiglu.py

    * Update save.py

    * Update save.py

    * Update llama.py

    * Update llama.py

    * Update llama.py

    * Update llama.py

    * Revert "Update llama.py"

    This reverts commit a208ec46e012cf470ecefe6268a66358215df7b6.

    * Update llama.py

    * Works?

    * Update pyproject.toml

    * Update fast_lora.py

    * Update fast_lora.py

    * Update fast_lora.py

    * Update fast_lora.py

    * Update fast_lora.py

    * Swiglu

    * Update swiglu.py

    * Update fast_lora.py

    * Update fast_lora.py

    * Update fast_lora.py

    * Update fast_lora.py

    * Update swiglu.py

    * Update fast_lora.py

    * Update fast_lora.py

    * Update fast_lora.py

    * Update fast_lora.py

    * Update fast_lora.py

    * Update fast_lora.py

    * Update fast_lora.py

    * Update fast_lora.py

    * attention_mask

    * Update llama.py

    * Update llama.py

    * labels

    * Update mistral.py

    * Update llama.py

    * attention mask

    * Update save.py

    * Update save.py

commit cd32ba76b71adf3317ede9de7d1cf6f30ad3bf0d
Author: Daniel Han <danielhanchen@gmail.com>
Date:   Sun Jan 28 04:20:06 2024 +1100

    Fix bugs + more accurate Swiglu (#137)

    * faster saving & inference

    * Update llama.py

    * Update save.py

    * Update llama.py

    * Update llama.py

    * Update llama.py

    * Update llama.py

    * Update mistral.py

    * Update llama.py

    * Update llama.py

    * Update llama.py

    * Update llama.py

    * Update llama.py

    * fast inference

    * Update llama.py

    * Update save.py

    * Update llama.py

    * Mistral correct RoPE scaling

    * Max sequence lengths

    * Apache 2

    * fast_linear_forward

    * Update utils.py

    * Update utils.py

    * No print

    * Update utils.py

    * Update utils.py

    * inference

    * Update llama.py

    * Fast inference RoPE

    * Update llama.py

    * Update llama.py

    * RoPE

    * Update llama.py

    * Update llama.py

    * Update llama.py

    * Update llama.py

    * Update llama.py

    * Update llama.py

    * Update llama.py

    * Update llama.py

    * LoRA

    * Fast LoRA saving

    * Update llama.py

    * hidden_states

    * q_len == 1

    * q_len issue

    * Update mistral.py

    * Update mistral.py

    * incorrect inference

    * Update to transformers 4.37

    * Graceful FA2 error + torch 2.1.1

    * Update mapper.py

    * Update pyproject.toml

    * Fix saving and bnb-4bit

    * Update fast_lora.py

    * Update fast_lora.py

    * remove patching

    * Update llama.py

    * Update llama.py

    * Update swiglu.py

    * Repatch

    * Update fast_lora.py

    * Update fast_lora.py

    * Update fast_lora.py

    * Update fast_lora.py

    * Update fast_lora.py

    * Update fast_lora.py

    * Update fast_lora.py

    * Update fast_lora.py

    * Update fast_lora.py

    * Update fast_lora.py

    * Update llama.py

    * Update fast_lora.py

    * Update llama.py

    * Update fast_lora.py

    * Update fast_lora.py

    * Update fast_lora.py

    * Update fast_lora.py

    * Update fast_lora.py

    * Update fast_lora.py

    * Update fast_lora.py

    * Update fast_lora.py

    * Update fast_lora.py

    * Update swiglu.py

    * Update fast_lora.py

    * Update swiglu.py

    * Update fast_lora.py

    * Update fast_lora.py

    * Update fast_lora.py

    * Update fast_lora.py

    * Update fast_lora.py

    * Update fast_lora.py

    * Update fast_lora.py

    * Update fast_lora.py

    * Update fast_lora.py

    * Update fast_lora.py

    * Update fast_lora.py

    * Update fast_lora.py

    * Update fast_lora.py

    * Update fast_lora.py

    * Update fast_lora.py

    * Update fast_lora.py

    * Update save.py

    * Update fast_lora.py

    * Update utils.py

    * Update llama.py

    * Update fast_lora.py

    * Update swiglu.py

    * Update save.py

    * Update save.py

    * Update llama.py

    * Update llama.py

    * Update llama.py

    * Update llama.py

    * Revert "Update llama.py"

    This reverts commit a208ec46e012cf470ecefe6268a66358215df7b6.

    * Update llama.py

    * Works?

    * Update pyproject.toml

    * Update fast_lora.py

    * Update fast_lora.py

    * Update fast_lora.py

    * Update fast_lora.py

    * Update fast_lora.py

    * Swiglu

    * Update swiglu.py

    * Update fast_lora.py

    * Update fast_lora.py

    * Update fast_lora.py

    * Update fast_lora.py

    * Update swiglu.py

    * Update fast_lora.py

    * Update fast_lora.py

    * Update fast_lora.py

    * Update fast_lora.py

    * Update fast_lora.py

    * Update fast_lora.py

    * Update fast_lora.py

    * Update fast_lora.py

    * attention_mask

    * Update llama.py

    * Update llama.py

    * labels

    * Update mistral.py

    * Update llama.py

    * attention mask

commit 89daa0efcc38c7690abbb8170b5d9f3d364796ce
Author: Daniel Han <danielhanchen@gmail.com>
Date:   Sat Jan 27 04:50:22 2024 +1100

    Inference bug fix (#134)

    * faster saving & inference

    * Update llama.py

    * Update save.py

    * Update llama.py

    * Update llama.py

    * Update llama.py

    * Update llama.py

    * Update mistral.py

    * Update llama.py

    * Update llama.py

    * Update llama.py

    * Update llama.py

    * Update llama.py

    * fast inference

    * Update llama.py

    * Update save.py

    * Update llama.py

    * Mistral correct RoPE scaling

    * Max sequence lengths

    * Apache 2

    * fast_linear_forward

    * Update utils.py

    * Update utils.py

    * No print

    * Update utils.py

    * Update utils.py

    * inference

    * Update llama.py

    * Fast inference RoPE

    * Update llama.py

    * Update llama.py

    * RoPE

    * Update llama.py

    * Update llama.py

    * Update llama.py

    * Update llama.py

    * Update llama.py

    * Update llama.py

    * Update llama.py

    * Update llama.py

    * LoRA

    * Fast LoRA saving

    * Update llama.py

    * hidden_states

    * q_len == 1

    * q_len issue

    * Update mistral.py

    * Update mistral.py

    * incorrect inference

    * Update to transformers 4.37

    * Graceful FA2 error + torch 2.1.1

    * Update mapper.py

    * Update pyproject.toml

    * Fix saving and bnb-4bit

    * Update fast_lora.py

    * Update fast_lora.py

    * remove patching

    * Update llama.py

    * Update llama.py

    * Update swiglu.py

    * Repatch

    * Update fast_lora.py

    * Update fast_lora.py

    * Update fast_lora.py

    * Update fast_lora.py

    * Update fast_lora.py

    * Update fast_lora.py

    * Update fast_lora.py

    * Update fast_lora.py

    * Update fast_lora.py

    * Update fast_lora.py

    * Update llama.py

    * Update fast_lora.py

    * Update llama.py

    * Update fast_lora.py

    * Update fast_lora.py

    * Update fast_lora.py

    * Update fast_lora.py

    * Update fast_lora.py

    * Update fast_lora.py

    * Update fast_lora.py

    * Update fast_lora.py

    * Update fast_lora.py

    * Update swiglu.py

    * Update fast_lora.py

    * Update swiglu.py

    * Update fast_lora.py

    * Update fast_lora.py

    * Update fast_lora.py

    * Update fast_lora.py

    * Update fast_lora.py

    * Update fast_lora.py

    * Update fast_lora.py

    * Update fast_lora.py

    * Update fast_lora.py

    * Update fast_lora.py

    * Update fast_lora.py

    * Update fast_lora.py

    * Update fast_lora.py

    * Update fast_lora.py

    * Update fast_lora.py

    * Update fast_lora.py

    * Update save.py

    * Update fast_lora.py

    * Update utils.py

    * Update llama.py

    * Update fast_lora.py

    * Update swiglu.py

    * Update save.py

    * Update save.py

    * Update llama.py

    * Update llama.py

    * Update llama.py

    * Update llama.py

    * Revert "Update llama.py"

    This reverts commit a208ec46e012cf470ecefe6268a66358215df7b6.

    * Update llama.py

commit 87a7ef1049f6fca409a0673f51f4758e0aff248d
Author: Daniel Han <danielhanchen@gmail.com>
Date:   Sat Jan 27 04:47:54 2024 +1100

    More bug fixes (#133)

    * faster saving & inference

    * Update llama.py

    * Update save.py

    * Update llama.py

    * Update llama.py

    * Update llama.py

    * Update llama.py

    * Update mistral.py

    * Update llama.py

    * Update llama.py

    * Update llama.py

    * Update llama.py

    * Update llama.py

    * fast inference

    * Update llama.py

    * Update save.py

    * Update llama.py

    * Mistral correct RoPE scaling

    * Max sequence lengths

    * Apache 2

    * fast_linear_forward

    * Update utils.py

    * Update utils.py

    * No print

    * Update utils.py

    * Update utils.py

    * inference

    * Update llama.py

    * Fast inference RoPE

    * Update llama.py

    * Update llama.py

    * RoPE

    * Update llama.py

    * Update llama.py

    * Update llama.py

    * Update llama.py

    * Update llama.py

    * Update llama.py

    * Update llama.py

    * Update llama.py

    * LoRA

    * Fast LoRA saving

    * Update llama.py

    * hidden_states

    * q_len == 1

    * q_len issue

    * Update mistral.py

    * Update mistral.py

    * incorrect inference

    * Update to transformers 4.37

    * Graceful FA2 error + torch 2.1.1

    * Update mapper.py

    * Update pyproject.toml

    * Fix saving and bnb-4bit

    * Update fast_lora.py

    * Update fast_lora.py

    * remove patching

    * Update llama.py

    * Update llama.py

    * Update swiglu.py

    * Repatch

    * Update fast_lora.py

    * Update fast_lora.py

    * Update fast_lora.py

    * Update fast_lora.py

    * Update fast_lora.py

    * Update fast_lora.py

    * Update fast_lora.py

    * Update fast_lora.py

    * Update fast_lora.py

    * Update fast_lora.py

    * Update llama.py

    * Update fast_lora.py

    * Update llama.py

    * Update fast_lora.py

    * Update fast_lora.py

    * Update fast_lora.py

    * Update fast_lora.py

    * Update fast_lora.py

    * Update fast_lora.py

    * Update fast_lora.py

    * Update fast_lora.py

    * Update fast_lora.py

    * Update swiglu.py

    * Update fast_lora.py

    * Update swiglu.py

    * Update fast_lora.py

    * Update fast_lora.py

    * Update fast_lora.py

    * Update fast_lora.py

    * Update fast_lora.py

    * Update fast_lora.py

    * Update fast_lora.py

    * Update fast_lora.py

    * Update fast_lora.py

    * Update fast_lora.py

    * Update fast_lora.py

    * Update fast_lora.py

    * Update fast_lora.py

    * Update fast_lora.py

    * Update fast_lora.py

    * Update fast_lora.py

    * Update save.py

    * Update fast_lora.py

    * Update utils.py

    * Update llama.py

    * Update fast_lora.py

    * Update swiglu.py

    * Update save.py

    * Update save.py

    * Update llama.py

    * Update llama.py

    * Update llama.py

commit 3d67790901696e953171f64b4bf9d980780051a0
Author: Daniel Han <danielhanchen@gmail.com>
Date:   Fri Jan 26 04:19:17 2024 +1100

    Fix bugs (#129)

    * faster saving & inference

    * Update llama.py

    * Update save.py

    * Update llama.py

    * Update llama.py

    * Update llama.py

    * Update llama.py

    * Update mistral.py

    * Update llama.py

    * Update llama.py

    * Update llama.py

    * Update llama.py

    * Update llama.py

    * fast inference

    * Update llama.py

    * Update save.py

    * Update llama.py

    * Mistral correct RoPE scaling

    * Max sequence lengths

    * Apache 2

    * fast_linear_forward

    * Update utils.py

    * Update utils.py

    * No print

    * Update utils.py

    * Update utils.py

    * inference

    * Update llama.py

    * Fast inference RoPE

    * Update llama.py

    * Update llama.py

    * RoPE

    * Update llama.py

    * Update llama.py

    * Update llama.py

    * Update llama.py

    * Update llama.py

    * Update llama.py

    * Update llama.py

    * Update llama.py

    * LoRA

    * Fast LoRA saving

    * Update llama.py

    * hidden_states

    * q_len == 1

    * q_len issue

    * Update mistral.py

    * Update mistral.py

    * incorrect inference

    * Update to transformers 4.37

    * Graceful FA2 error + torch 2.1.1

    * Update mapper.py

    * Update pyproject.toml

    * Fix saving and bnb-4bit

    * Update fast_lora.py

    * Update fast_lora.py

    * remove patching

    * Update llama.py

    * Update llama.py

    * Update swiglu.py

    * Repatch

    * Update fast_lora.py

commit a833f403462e9cfc1f96b3b84d9da15d7d8db5ee
Author: Daniel Han <danielhanchen@gmail.com>
Date:   Tue Jan 23 03:55:24 2024 +1100

    2-4x faster native HF inference (#119)

    * faster saving & inference

    * Update llama.py

    * Update save.py

    * Update llama.py

    * Update llama.py

    * Update llama.py

    * Update llama.py

    * Update mistral.py

    * Update llama.py

    * Update llama.py

    * Update llama.py

    * Update llama.py

    * Update llama.py

    * fast inference

    * Update llama.py

    * Update save.py

    * Update llama.py

    * Mistral correct RoPE scaling

    * Max sequence lengths

    * Apache 2

    * fast_linear_forward

    * Update utils.py

    * Update utils.py

    * No print

    * Update utils.py

    * Update utils.py

    * inference

    * Update llama.py

    * Fast inference RoPE

    * Update llama.py

    * Update llama.py

    * RoPE

    * Update llama.py

    * Update llama.py

    * Update llama.py

    * Update llama.py

    * Update llama.py

    * Update llama.py

    * Update llama.py

    * Update llama.py

    * LoRA

    * Fast LoRA saving

commit b370c9c8aacc31a7845404566dd95dfa8c0e3bac
Author: Daniel Han <danielhanchen@gmail.com>
Date:   Sun Jan 21 22:20:22 2024 +1100

    Hotfix (#118)

    * faster saving & inference

    * Update llama.py

    * Update save.py

    * Update llama.py

    * Update llama.py

    * Update llama.py

    * Update llama.py

    * Update mistral.py

    * Update llama.py

    * Update llama.py

    * Update llama.py

    * Update llama.py

    * Update llama.py

commit 57a5b5a49da588b1db8e9a988cc985dc20393d34
Author: Daniel Han-Chen <danielhanchen@gmail.com>
Date:   Sun Jan 21 05:00:37 2024 +1100

    Update save.py

commit 5145a61e69ab9b3035465f649e1c1e5aae749f8f
Author: Daniel Han-Chen <danielhanchen@gmail.com>
Date:   Sun Jan 21 04:21:54 2024 +1100

    Update save.py

commit a7bd8d119c16433de4f8b6a36903ef7131f225e5
Author: Daniel Han-Chen <danielhanchen@gmail.com>
Date:   Sun Jan 21 04:13:03 2024 +1100

    Update save.py

commit be4b97e7d89074b6dd1d2e984fa429051d328192
Author: Daniel Han <danielhanchen@gmail.com>
Date:   Sun Jan 21 03:43:49 2024 +1100

    Fixed saving! (#113)

    * Fix tokenizer, dropout, bias for LoRA

    * Update loader.py

    * Fix LoRA downcasting

    * Update _utils.py

    * Saving to GGUF

    * fix

    * colab_quantize_to_gguf

    * move save modules

    * save module

    * Update __init__.py

    * Update save.py

    * Temp downgrade due to TRL issue

    * Fix up bugs

    * Faster saving + other changes

    * Update llama.py

    * Saving modules

    * spelling

    * Update llama.py

    * Update save.py

    * Update save.py

    * Update loader.py

    * Update llama.py

    * patch saving

    * Update save.py

    * Update save.py

    * Update save.py

    * patch saving

    * Update save.py

    * Update save.py

    * Update save.py

    * Update save.py

    * Update save.py

    * Update save.py

    * Update save.py

    * Update save.py

    * Update save.py

    * Update save.py

    * Update save.py

    * Update save.py

    * Update save.py

    * Update save.py

    * Update save.py

    * original_model

    * Update save.py

    * Update save.py

    * Update save.py

    * Update save.py

    * Update save.py

    * Update save.py

    * Update save.py

    * Update save.py

    * Update save.py

    * Update save.py

    * Update save.py

    * Update save.py

    * Update save.py

    * Update save.py

    * Update save.py

    * Update save.py

    * Update save.py

    * Update save.py

    * Update save.py

    * Update save.py

    * Update save.py

    * Update save.py

    * Update save.py

    * saving to RAM leakage?

    * Update save.py

    * new_save_directory

    * Update save.py

    * Update save.py

    * Update save.py

    * Update save.py

    * Update pyproject.toml

    * Update pyproject.toml

    * Update pyproject.toml

    * Quick fixes

    * Update llama.py

    * Update llama.py

    * Update dpo.py

    * Update dpo.py

    * Update llama.py

    * Update save.py

    * getattr

    * RSLoRA and LoftQ direct support

    * Update llama.py

    * Update llama.py

    * Update llama.py

    * Fix DPO + GGUF

    * Fix quantization_method

    * Fix quantization_config

    * patch model

    * Update llama.py

    * Update llama.py

    * Update llama.py

    * Update save.py

    * Update save.py

    * tokenizer_save_settings

    * Update save.py

    * quantization and loftq

    * Update save.py

    * Update llama.py

    * Update save.py

    * upload_to_huggingface

    * Update save.py

    * Update save.py

    * Update save.py

    * Update save.py

    * Update save.py

    * Update save.py

commit abb462be71e8cf01ad989dca0efaa17441113651
Author: Daniel Han <danielhanchen@gmail.com>
Date:   Sat Jan 20 23:23:00 2024 +1100

    Hotfix for Jan 2024 Release (#110)

    * Fix tokenizer, dropout, bias for LoRA

    * Update loader.py

    * Fix LoRA downcasting

    * Update _utils.py

    * Saving to GGUF

    * fix

    * colab_quantize_to_gguf

    * move save modules

    * save module

    * Update __init__.py

    * Update save.py

    * Temp downgrade due to TRL issue

    * Fix up bugs

    * Faster saving + other changes

    * Update llama.py

    * Saving modules

    * spelling

    * Update llama.py

    * Update save.py

    * Update save.py

    * Update loader.py

    * Update llama.py

    * patch saving

    * Update save.py

    * Update save.py

    * Update save.py

    * patch saving

    * Update save.py

    * Update save.py

    * Update save.py

    * Update save.py

    * Update save.py

    * Update save.py

    * Update save.py

    * Update save.py

    * Update save.py

    * Update save.py

    * Update save.py

    * Update save.py

    * Update save.py

    * Update save.py

    * Update save.py

    * original_model

    * Update save.py

    * Update save.py

    * Update save.py

    * Update save.py

    * Update save.py

    * Update save.py

    * Update save.py

    * Update save.py

    * Update save.py

    * Update save.py

    * Update save.py

    * Update save.py

    * Update save.py

    * Update save.py

    * Update save.py

    * Update save.py

    * Update save.py

    * Update save.py

    * Update save.py

    * Update save.py

    * Update save.py

    * Update save.py

    * Update save.py

    * saving to RAM leakage?

    * Update save.py

    * new_save_directory

    * Update save.py

    * Update save.py

    * Update save.py

    * Update save.py

    * Update pyproject.toml

    * Update pyproject.toml

    * Update pyproject.toml

    * Quick fixes

    * Update llama.py

    * Update llama.py

    * Update dpo.py

    * Update dpo.py

    * Update llama.py

    * Update save.py

    * getattr

    * RSLoRA and LoftQ direct support

    * Update llama.py

    * Update llama.py

    * Update llama.py

    * Fix DPO + GGUF

    * Fix quantization_method

    * Fix quantization_config

    * patch model

    * Update llama.py

    * Update llama.py

    * Update llama.py

    * Update save.py

    * Update save.py

    * tokenizer_save_settings

    * Update save.py

    * quantization and loftq

    * Update save.py

    * Update llama.py

    * Update save.py

commit 31e2d71720e64b854145d7779833b7d2d3d4177e
Author: Daniel Han <danielhanchen@gmail.com>
Date:   Sat Jan 20 04:25:06 2024 +1100

    Quick fixes (#106)

    * Fix tokenizer, dropout, bias for LoRA

    * Update loader.py

    * Fix LoRA downcasting

    * Update _utils.py

    * Saving to GGUF

    * fix

    * colab_quantize_to_gguf

    * move save modules

    * save module

    * Update __init__.py

    * Update save.py

    * Temp downgrade due to TRL issue

    * Fix up bugs

    * Faster saving + other changes

    * Update llama.py

    * Saving modules

    * spelling

    * Update llama.py

    * Update save.py

    * Update save.py

    * Update loader.py

    * Update llama.py

    * patch saving

    * Update save.py

    * Update save.py

    * Update save.py

    * patch saving

    * Update save.py

    * Update save.py

    * Update save.py

    * Update save.py

    * Update save.py

    * Update save.py

    * Update save.py

    * Update save.py

    * Update save.py

    * Update save.py

    * Update save.py

    * Update save.py

    * Update save.py

    * Update save.py

    * Update save.py

    * original_model

    * Update save.py

    * Update save.py

    * Update save.py

    * Update save.py

    * Update save.py

    * Update save.py

    * Update save.py

    * Update save.py

    * Update save.py

    * Update save.py

    * Update save.py

    * Update save.py

    * Update save.py

    * Update save.py

    * Update save.py

    * Update save.py

    * Update save.py

    * Update save.py

    * Update save.py

    * Update save.py

    * Update save.py

    * Update save.py

    * Update save.py

    * saving to RAM leakage?

    * Update save.py

    * new_save_directory

    * Update save.py

    * Update save.py

    * Update save.py

    * Update save.py

    * Update pyproject.toml

    * Update pyproject.toml

    * Update pyproject.toml

    * Quick fixes

    * Update llama.py

    * Update llama.py

    * Update dpo.py

    * Update dpo.py

    * Update llama.py

    * Update save.py

    * getattr

    * RSLoRA and LoftQ direct support

    * Update llama.py

    * Update llama.py

    * Update llama.py

    * Fix DPO + GGUF

commit 8846337e5c8c2f206a4ac8fe6d239f3d1221f7ac
Author: Daniel Han-Chen <danielhanchen@gmail.com>
Date:   Sat Jan 20 02:30:31 2024 +1100

    Update _utils.py

commit d378df87e5f3945474915a098c9aa58313465064
Merge: c1e7480 920e3c2
Author: Daniel Han-Chen <danielhanchen@gmail.com>
Date:   Fri Jan 19 23:15:38 2024 +1100

    Merge branch 'main' of https://github.com/unslothai/unsloth

commit c1e7480ac2ad0e5efa05e84fe0997619ccdd86a4
Author: Daniel Han-Chen <danielhanchen@gmail.com>
Date:   Fri Jan 19 23:15:20 2024 +1100

    Revert quantization methods

commit 920e3c2ea07a044addeb7c3fa8be6f0189cb7f84
Author: Daniel Han <danielhanchen@gmail.com>
Date:   Fri Jan 19 22:57:22 2024 +1100

    getattr issues (#103)

    * Fix tokenizer, dropout, bias for LoRA

    * Update loader.py

    * Fix LoRA downcasting

    * Update _utils.py

    * Saving to GGUF

    * fix

    * colab_quantize_to_gguf

    * move save modules

    * save module

    * Update __init__.py

    * Update save.py

    * Temp downgrade due to TRL issue

    * Fix up bugs

    * Faster saving + other changes

    * Update llama.py

    * Saving modules

    * spelling

    * Update llama.py

    * Update save.py

    * Update save.py

    * Update loader.py

    * Update llama.py

    * patch saving

    * Update save.py

    * Update save.py

    * Update save.py

    * patch saving

    * Update save.py

    * Update save.py

    * Update save.py

    * Update save.py

    * Update save.py

    * Update save.py

    * Update save.py

    * Update save.py

    * Update save.py

    * Update save.py

    * Update save.py

    * Update save.py

    * Update save.py

    * Update save.py

    * Update save.py

    * original_model

    * Update save.py

    * Update save.py

    * Update save.py

    * Update save.py

    * Update save.py

    * Update save.py

    * Update save.py

    * Update save.py

    * Update save.py

    * Update save.py

    * Update save.py

    * Update save.py

    * Update save.py

    * Update save.py

    * Update save.py

    * Update save.py

    * Update save.py

    * Update save.py

    * Update save.py

    * Update save.py

    * Update save.py

    * Update save.py

    * Update save.py

    * saving to RAM leakage?

    * Update save.py

    * new_save_directory

    * Update save.py

    * Update save.py

    * Update save.py

    * Update save.py

    * Update pyproject.toml

    * Update pyproject.toml

    * Update pyproject.toml

    * Quick fixes

    * Update llama.py

    * Update llama.py

    * Update dpo.py

    * Update dpo.py

    * Update llama.py

    * Update save.py

    * getattr

commit fc25ab0df032f8ee5ea750f27c68d63f49d2d9a9
Author: Daniel Han <danielhanchen@gmail.com>
Date:   Fri Jan 19 22:52:30 2024 +1100

    Quick fixes (#101)

    * Fix tokenizer, dropout, bias for LoRA

    * Update loader.py

    * Fix LoRA downcasting

    * Update _utils.py

    * Saving to GGUF

    * fix

    * colab_quantize_to_gguf

    * move save modules

    * save module

    * Update __init__.py

    * Update save.py

    * Temp downgrade due to TRL issue

    * Fix up bugs

    * Faster saving + other changes

    * Update llama.py

    * Saving modules

    * spelling

    * Update llama.py

    * Update save.py

    * Update save.py

    * Update loader.py

    * Update llama.py

    * patch saving

    * Update save.py

    * Update save.py

    * Update save.py

    * patch saving

    * Update save.py

    * Update save.py

    * Update save.py

    * Update save.py

    * Update save.py

    * Update save.py

    * Update save.py

    * Update save.py

    * Update save.py

    * Update save.py

    * Update save.py

    * Update save.py

    * Update save.py

    * Update save.py

    * Update save.py

    * original_model

    * Update save.py

    * Update save.py

    * Update save.py

    * Update save.py

    * Update save.py

    * Update save.py

    * Update save.py

    * Update save.py

    * Update save.py

    * Update save.py

    * Update save.py

    * Update save.py

    * Update save.py

    * Update save.py

    * Update save.py

    * Update save.py

    * Update save.py

    * Update save.py

    * Update save.py

    * Update save.py

    * Update save.py

    * Update save.py

    * Update save.py

    * saving to RAM leakage?

    * Update save.py

    * new_save_directory

    * Update save.py

    * Update save.py

    * Update save.py

    * Update save.py

    * Update pyproject.toml

    * Update pyproject.toml

    * Update pyproject.toml

    * Quick fixes

    * Update llama.py

    * Update llama.py

    * Update dpo.py

    * Update dpo.py

    * Update llama.py

    * Update save.py

commit b8b1eafda35d124046e11766aeeb6343957e0daf
Author: Daniel Han <danielhanchen@gmail.com>
Date:   Fri Jan 19 04:51:19 2024 +1100

    2024 Release (#96)

    * Fix tokenizer, dropout, bias for LoRA

    * Update loader.py

    * Fix LoRA downcasting

    * Update _utils.py

    * Saving to GGUF

    * fix

    * colab_quantize_to_gguf

    * move save modules

    * save module

    * Update __init__.py

    * Update save.py

    * Temp downgrade due to TRL issue

    * Fix up bugs

    * Faster saving + other changes

    * Update llama.py

    * Saving modules

    * spelling

    * Update llama.py

    * Update save.py

    * Update save.py

    * Update loader.py

    * Update llama.py

    * patch saving

    * Update save.py

    * Update save.py

    * Update save.py

    * patch saving

    * Update save.py

    * Update save.py

    * Update save.py

    * Update save.py

    * Update save.py

    * Update save.py

    * Update save.py

    * Update save.py

    * Update save.py

    * Update save.py

    * Update save.py

    * Update save.py

    * Update save.py

    * Update save.py

    * Update save.py

    * original_model

    * Update save.py

    * Update save.py

    * Update save.py

    * Update save.py

    * Update save.py

    * Update save.py

    * Update save.py

    * Update save.py

    * Update save.py

    * Update save.py

    * Update save.py

    * Update save.py

    * Update save.py

    * Update save.py

    * Update save.py

    * Update save.py

    * Update save.py

    * Update save.py

    * Update save.py

    * Update save.py

    * Update save.py

    * Update save.py

    * Update save.py

    * saving to RAM leakage?

    * Update save.py

    * new_save_directory

    * Update save.py

    * Update save.py

    * Update save.py

    * Update save.py

    * Update pyproject.toml

    * Update pyproject.toml

    * Update pyproject.toml

commit 4112eb4a3df4c0911e36211b47381086c963b4e0
Author: Daniel Han-Chen <danielhanchen@gmail.com>
Date:   Fri Jan 19 03:41:00 2024 +1100

    Update pyproject.toml

commit 59d74753362ff59e664cb6d650b564511e6e20f3
Author: Daniel Han-Chen <danielhanchen@gmail.com>
Date:   Fri Jan 19 03:35:17 2024 +1100

    Update pyproject.toml

commit c1ac4d2707574868767345e76ebe49c8353f9057
Author: Daniel Han <danielhanchen@gmail.com>
Date:   Thu Jan 11 04:08:03 2024 +1100

    Fix some bugs (#83)

    * Fix tokenizer, dropout, bias for LoRA

    * Update loader.py

    * Fix LoRA downcasting

    * Update _utils.py

    * Saving to GGUF

    * fix

    * colab_quantize_to_gguf

    * move save modules

    * save module

    * Update __init__.py

    * Update save.py

    * Temp downgrade due to TRL issue

    * Fix up bugs

commit d3887c7fd93d9b910bf6ee3ab3c7fd485fc55e46
Author: Daniel Han <danielhanchen@gmail.com>
Date:   Wed Jan 10 23:10:48 2024 +1100

    Update README.md (#81)

commit b5d94d9a0ad9532494e1b3c7badbb94fa92c50eb
Author: shimmy <107991372+shimmyshimmer@users.noreply.github.com>
Date:   Wed Jan 10 23:10:23 2024 +1100

    Discord button redo (#80)

commit 01d7f58e11373ab07b9282a42bc14f542dbdabf0
Author: shimmy <107991372+shimmyshimmer@users.noreply.github.com>
Date:   Wed Jan 10 23:02:20 2024 +1100

    Update logos (#79)

    * HF Perf Button

    * Update README.md

    Adding new buttons cleanup

    * Update README.md

    * Delete images/Discord.png

    * Delete images/try live demo green.png

    * new transparent logos

    * Revamping page

    * Revamp mainpage

    * Update README.md

    * Update README.md

commit 9faaf5b388e025f8ffc302450a12ffb84e7e1750
Author: Daniel Han <danielhanchen@gmail.com>
Date:   Wed Jan 10 20:03:01 2024 +1100

    Create FUNDING.yml (#78)

commit 82e6fece0b78011707090639823d2d7acf5a3864
Author: Daniel Han-Chen <danielhanchen@gmail.com>
Date:   Wed Jan 10 01:02:44 2024 +1100

    fix_tokenizer

commit b52278199b7ae2764f242622275bb8a85ba7b721
Author: Daniel Han-Chen <danielhanchen@gmail.com>
Date:   Tue Jan 9 23:40:43 2024 +1100

    check_tokenizer

---------

Co-authored-by: Daniel Han <danielhanchen@gmail.com>
mmathew23 added a commit to mmathew23/unsloth that referenced this pull request Jun 8, 2025
* gemma require gradients fix

* Update peft_utils.py

---------

Co-authored-by: Daniel Han <danielhanchen@gmail.com>
mmathew23 added a commit to mmathew23/unsloth that referenced this pull request Jun 8, 2025
* Update dataset_utils.py

* Update dataset_utils.py

* Update compiler.py

* Update compiler.py

* Update compiler.py

* Update compiler.py

* Update loss_utils.py

* Update loss_utils.py

* gpu_memory_utilization

* Update temporary_patches.py

* Update vision_utils.py

* Update vision_utils.py

* Update vision_utils.py

* Update vision_utils.py

* Update vision_utils.py

* Update vision_utils.py

* Update vision_utils.py

* Update vision_utils.py

* train on completions VLMs

* Update dataset_utils.py

* Update dataset_utils.py

* Update dataset_utils.py

* Update dataset_utils.py

* VLM train only on completions

* Update loss_utils.py

* Update dataset_utils.py

* Update compiler.py

* Update compiler.py

* Update compiler.py

* Update compiler.py

* Update compiler.py

* Update compiler.py

* Update compiler.py

* Update saving_utils.py

* Update llama_cpp.py

* Update llama_cpp.py

* Update saving_utils.py

* Update saving_utils.py

* Update __init__.py

* Update compiler.py

* Update loss_utils.py

* Update compiler.py

* Update loss_utils.py

* Update loss_utils.py

* Update llama_cpp.py

* Update loss_utils.py

* Update compiler.py

* Update llama_cpp.py

* Update compiler.py

* Update vllm_utils.py

* Update rl_replacements.py

* Update rl_replacements.py

* Update rl_replacements.py

* Update rl_replacements.py

* Update rl_replacements.py

* Update rl_replacements.py

* Update training_utils.py

* Update dataset_utils.py

* Update dataset_utils.py

* Revert "Update dataset_utils.py"

This reverts commit 3b690ad.

* Update temporary_patches.py

* Update temporary_patches.py

* Update temporary_patches.py

* Update temporary_patches.py

* Update temporary_patches.py

* Update temporary_patches.py

* Update compiler.py

* Update compiler.py

* Remove prints

* Update compiler.py

* Update saving_utils.py

* Update temporary_patches.py

* Update __init__.py

* Update pyproject.toml

* Update vllm_utils.py

* bug fix unslothai#2008 unsloth issue - load_in_4bit = True + fast_inference = True (unslothai#79)

* bug fix unslothai#2008 unsloth

* non-quant dtype fix

* Update vllm_utils.py

---------

Co-authored-by: Daniel Han <danielhanchen@gmail.com>

* Update dataset_utils.py

* Update compiler.py

* Update temporary_patches.py

* Gemma 3 fixes

* Update temporary_patches.py

* Update compiler.py

* Update compiler.py

* Gemma 3 fixes

* Update patching_utils.py

* Update compiler.py

* Update compiler.py

* Update patching_utils.py

* Update temporary_patches.py

* Update compiler.py

* Update compiler.py

* Update temporary_patches.py

* Update temporary_patches.py

* Update temporary_patches.py

* Update temporary_patches.py

* Update temporary_patches.py

* Update temporary_patches.py

* Update temporary_patches.py

* Update temporary_patches.py

* Update compiler.py

* Update compiler.py

* Update compiler.py

* Update compiler.py

* Update compiler.py

* Update compiler.py

* Update compiler.py

* Update compiler.py

* Update compiler.py

* Update compiler.py

* Update compiler.py

* Update compiler.py

* Update compiler.py

* Update compiler.py

* Update compiler.py

* Update compiler.py

* Update compiler.py

* Update compiler.py

* Update compiler.py

* compiler

* Update gradient_checkpointing.py

* Update temporary_patches.py

* Update temporary_patches.py

* Update temporary_patches.py

* Update temporary_patches.py

* Update temporary_patches.py

* Update temporary_patches.py

* Update temporary_patches.py

* Update temporary_patches.py

* Update temporary_patches.py

* Update temporary_patches.py

* Update temporary_patches.py

* Update temporary_patches.py

* Update temporary_patches.py

* Update temporary_patches.py

* Update temporary_patches.py

* Update temporary_patches.py

* Update temporary_patches.py

* Update temporary_patches.py

* Update temporary_patches.py

* causal mask dtype

* Fix checkpoint and save from local file (unslothai#74)

* Enhance gradient checkpointing and add original model ID retrieval in saving utilities

* In case adapter_config.json as well

* Update patching_utils.py

* Update patching_utils.py

* Update temporary_patches.py

* Update temporary_patches.py

* Update compiler.py

* Update loss_utils.py

* Update compiler.py

* Update vllm_utils.py

* Update compiler.py

* Update peft_utils.py

* Update rl_replacements.py

* Update vllm_utils.py

* Update temporary_patches.py

* Update temporary_patches.py

* Update temporary_patches.py

* Update temporary_patches.py

* Update temporary_patches.py

* Update temporary_patches.py

* Update temporary_patches.py

* Update temporary_patches.py

* Update temporary_patches.py

* Update temporary_patches.py

* Update temporary_patches.py

* Update temporary_patches.py

* Update temporary_patches.py

* Update temporary_patches.py

* Update compiler.py

* Update vllm_lora_worker_manager.py

* Update utils.py

* Update temporary_patches.py

* Update temporary_patches.py

* Update temporary_patches.py

* Update temporary_patches.py

* Update temporary_patches.py

* Update temporary_patches.py

* Update temporary_patches.py

* Update temporary_patches.py

* Update temporary_patches.py

* Update temporary_patches.py

* Update vllm_utils.py

* Update vllm_utils.py

* Update vllm_utils.py

* Update vllm_utils.py

* Update dataset_utils.py

* bidirectional attention

* Update vllm_utils.py

* Update __init__.py

* Update temporary_patches.py

* Update temporary_patches.py

* Update temporary_patches.py

* Update vllm_utils.py

* Update vllm_utils.py

* Update vllm_utils.py

* Update vllm_utils.py

* Update vllm_utils.py

* Update vllm_utils.py

* Update vllm_lora_worker_manager.py

* Update vllm_lora_worker_manager.py

* Update vllm_lora_worker_manager.py

* Update temporary_patches.py

* Update temporary_patches.py

* Update temporary_patches.py

* Update temporary_patches.py

* Update loss_utils.py

* Update loss_utils.py

* Update loss_utils.py

* Update loss_utils.py

* Update loss_utils.py

* Update __init__.py

* fix: AsyncLLMEngine bugs (unslothai#82)

* fixed a typo in L119, removing unnecessary len() (unslothai#84)

Co-authored-by: Xiaochen Zhu <xz479@cl.cam.ac.uk>

* Fix gradient checkpointing warning filter implementation

* Input grads fix for gemma3 (unslothai#96)

* gemma require gradients fix

* Update peft_utils.py

---------

Co-authored-by: Daniel Han <danielhanchen@gmail.com>

* Update vision_utils.py

* Vision requires grad

* Check SDPA for Mistral / Pixtral

* Update compiler.py

* Update vision_utils.py

* Update vision_utils.py

* Update vision_utils.py

* Update __init__.py

* Update vision_utils.py

* Update vision_utils.py

* Update vision_utils.py

* Update vision_utils.py

* Update vision_utils.py

* Update vision_utils.py

* Update vision_utils.py

* Update vision_utils.py

* Update vision_utils.py

* Update vision_utils.py

* Update vision_utils.py

* Update vision_utils.py

* Update vision_utils.py

* Update vision_utils.py

* Update vllm_utils.py (unslothai#99)

Fix bugs in generate_batches.py. Original output = [] will result in duplication of results.

* Update vision_utils.py

* Fixes to support IterableDataset (unslothai#98)

* Support Iterable Datasets

* Update dataset_utils.py

* Update dataset_utils.py

* Update dataset_utils.py

* Update dataset_utils.py

* Preserve batch size from iterable dataset

* Preserve batch size from iterable dataset

* Support train_on_response_only with IterableDataset

* Support train_on_response_only with IterableDataset

* Support train_on_response_only with IterableDataset

* Support train_on_response_only with IterableDataset

---------

Co-authored-by: Mukkesh Ganesh <mukmckenzie@gmail.com>
Co-authored-by: Edd <68678137+Erland366@users.noreply.github.com>
Co-authored-by: Brad Hilton <brad.hilton.nw@gmail.com>
Co-authored-by: SpaceHunter <30568250+SpaceHunterInf@users.noreply.github.com>
Co-authored-by: Xiaochen Zhu <xz479@cl.cam.ac.uk>
Co-authored-by: Roland Tannous <rolandtannous@gonovel.co>
Co-authored-by: DoubleMathew <mmathew23@gmail.com>
Co-authored-by: Michael Han <107991372+shimmyshimmer@users.noreply.github.com>
Co-authored-by: Qian Wu <121997440+5k5000@users.noreply.github.com>
Co-authored-by: marcandrelarochelle <marcandrelarochelle1820@gmail.com>
mmathew23 added a commit to mmathew23/unsloth that referenced this pull request Jun 8, 2025
* Update vision_utils.py

* Update vision_utils.py

* train on completions VLMs

* Update dataset_utils.py

* Update dataset_utils.py

* Update dataset_utils.py

* Update dataset_utils.py

* VLM train only on completions

* Update loss_utils.py

* Update dataset_utils.py

* Update compiler.py

* Update compiler.py

* Update compiler.py

* Update compiler.py

* Update compiler.py

* Update compiler.py

* Update compiler.py

* Update saving_utils.py

* Update llama_cpp.py

* Update llama_cpp.py

* Update saving_utils.py

* Update saving_utils.py

* Update __init__.py

* Update compiler.py

* Update loss_utils.py

* Update compiler.py

* Update loss_utils.py

* Update loss_utils.py

* Update llama_cpp.py

* Update loss_utils.py

* Update compiler.py

* Update llama_cpp.py

* Update compiler.py

* Update vllm_utils.py

* Update rl_replacements.py

* Update rl_replacements.py

* Update rl_replacements.py

* Update rl_replacements.py

* Update rl_replacements.py

* Update rl_replacements.py

* Update training_utils.py

* Update dataset_utils.py

* Update dataset_utils.py

* Revert "Update dataset_utils.py"

This reverts commit 3b690ad.

* Update temporary_patches.py

* Update temporary_patches.py

* Update temporary_patches.py

* Update temporary_patches.py

* Update temporary_patches.py

* Update temporary_patches.py

* Update compiler.py

* Update compiler.py

* Remove prints

* Update compiler.py

* Update saving_utils.py

* Update temporary_patches.py

* Update __init__.py

* Update pyproject.toml

* Update vllm_utils.py

* bug fix unslothai#2008 unsloth issue - load_in_4bit = True + fast_inference = True (unslothai#79)

* bug fix unslothai#2008 unsloth

* non-quant dtype fix

* Update vllm_utils.py

---------

Co-authored-by: Daniel Han <danielhanchen@gmail.com>

* Update dataset_utils.py

* Update compiler.py

* Update temporary_patches.py

* Gemma 3 fixes

* Update temporary_patches.py

* Update compiler.py

* Update compiler.py

* Gemma 3 fixes

* Update patching_utils.py

* Update compiler.py

* Update compiler.py

* Update patching_utils.py

* Update temporary_patches.py

* Update compiler.py

* Update compiler.py

* Update temporary_patches.py

* Update temporary_patches.py

* Update temporary_patches.py

* Update temporary_patches.py

* Update temporary_patches.py

* Update temporary_patches.py

* Update temporary_patches.py

* Update temporary_patches.py

* Update compiler.py

* Update compiler.py

* Update compiler.py

* Update compiler.py

* Update compiler.py

* Update compiler.py

* Update compiler.py

* Update compiler.py

* Update compiler.py

* Update compiler.py

* Update compiler.py

* Update compiler.py

* Update compiler.py

* Update compiler.py

* Update compiler.py

* Update compiler.py

* Update compiler.py

* Update compiler.py

* Update compiler.py

* compiler

* Update gradient_checkpointing.py

* Update temporary_patches.py

* Update temporary_patches.py

* Update temporary_patches.py

* Update temporary_patches.py

* Update temporary_patches.py

* Update temporary_patches.py

* Update temporary_patches.py

* Update temporary_patches.py

* Update temporary_patches.py

* Update temporary_patches.py

* Update temporary_patches.py

* Update temporary_patches.py

* Update temporary_patches.py

* Update temporary_patches.py

* Update temporary_patches.py

* Update temporary_patches.py

* Update temporary_patches.py

* Update temporary_patches.py

* Update temporary_patches.py

* causal mask dtype

* Fix checkpoint and save from local file (unslothai#74)

* Enhance gradient checkpointing and add original model ID retrieval in saving utilities

* In case adapter_config.json as well

* Update patching_utils.py

* Update patching_utils.py

* Update temporary_patches.py

* Update temporary_patches.py

* Update compiler.py

* Update loss_utils.py

* Update compiler.py

* Update vllm_utils.py

* Update compiler.py

* Update peft_utils.py

* Update rl_replacements.py

* Update vllm_utils.py

* Update temporary_patches.py

* Update temporary_patches.py

* Update temporary_patches.py

* Update temporary_patches.py

* Update temporary_patches.py

* Update temporary_patches.py

* Update temporary_patches.py

* Update temporary_patches.py

* Update temporary_patches.py

* Update temporary_patches.py

* Update temporary_patches.py

* Update temporary_patches.py

* Update temporary_patches.py

* Update temporary_patches.py

* Update compiler.py

* Update vllm_lora_worker_manager.py

* Update utils.py

* Update temporary_patches.py

* Update temporary_patches.py

* Update temporary_patches.py

* Update temporary_patches.py

* Update temporary_patches.py

* Update temporary_patches.py

* Update temporary_patches.py

* Update temporary_patches.py

* Update temporary_patches.py

* Update temporary_patches.py

* Update vllm_utils.py

* Update vllm_utils.py

* Update vllm_utils.py

* Update vllm_utils.py

* Update dataset_utils.py

* bidirectional attention

* Update vllm_utils.py

* Update __init__.py

* Update temporary_patches.py

* Update temporary_patches.py

* Update temporary_patches.py

* Update vllm_utils.py

* Update vllm_utils.py

* Update vllm_utils.py

* Update vllm_utils.py

* Update vllm_utils.py

* Update vllm_utils.py

* Update vllm_lora_worker_manager.py

* Update vllm_lora_worker_manager.py

* Update vllm_lora_worker_manager.py

* Update temporary_patches.py

* Update temporary_patches.py

* Update temporary_patches.py

* Update temporary_patches.py

* Update loss_utils.py

* Update loss_utils.py

* Update loss_utils.py

* Update loss_utils.py

* Update loss_utils.py

* Update __init__.py

* fix: AsyncLLMEngine bugs (unslothai#82)

* fixed a typo in L119, removing unnecessary len() (unslothai#84)

Co-authored-by: Xiaochen Zhu <xz479@cl.cam.ac.uk>

* Fix gradient checkpointing warning filter implementation

* Input grads fix for gemma3 (unslothai#96)

* gemma require gradients fix

* Update peft_utils.py

---------

Co-authored-by: Daniel Han <danielhanchen@gmail.com>

* Update vision_utils.py

* Vision requires grad

* Check SDPA for Mistral / Pixtral

* Update compiler.py

* Update vision_utils.py

* Update vision_utils.py

* Update vision_utils.py

* Update __init__.py

* Update vision_utils.py

* Update vision_utils.py

* Update vision_utils.py

* Update vision_utils.py

* Update vision_utils.py

* Update vision_utils.py

* Update vision_utils.py

* Update vision_utils.py

* Update vision_utils.py

* Update vision_utils.py

* Update vision_utils.py

* Update vision_utils.py

* Update vision_utils.py

* Update vision_utils.py

* Update vllm_utils.py (unslothai#99)

Fix bugs in generate_batches.py. Original output = [] will result in duplication of results.

* Update vision_utils.py

* Fixes to support IterableDataset (unslothai#98)

* Support Iterable Datasets

* Update dataset_utils.py

* Update dataset_utils.py

* Update dataset_utils.py

* Update dataset_utils.py

* Preserve batch size from iterable dataset

* Preserve batch size from iterable dataset

* Support train_on_response_only with IterableDataset

* Support train_on_response_only with IterableDataset

* Support train_on_response_only with IterableDataset

* Support train_on_response_only with IterableDataset

* Update vllm_utils.py

* Create vllm_rlhf_utils.py

* Update vllm_rlhf_utils.py

* Update vllm_rlhf_utils.py

* Update vllm_rlhf_utils.py

* Update vllm_rlhf_utils.py

* Update vllm_rlhf_utils.py

* Update vllm_rlhf_utils.py

* Update vllm_rlhf_utils.py

* Update vllm_rlhf_utils.py

* Update vllm_rlhf_utils.py

* vLLM for Qwen 3

* Update vllm_utils.py

* Update vllm_utils.py

---------

Co-authored-by: Mukkesh Ganesh <mukmckenzie@gmail.com>
Co-authored-by: Edd <68678137+Erland366@users.noreply.github.com>
Co-authored-by: Brad Hilton <brad.hilton.nw@gmail.com>
Co-authored-by: SpaceHunter <30568250+SpaceHunterInf@users.noreply.github.com>
Co-authored-by: Xiaochen Zhu <xz479@cl.cam.ac.uk>
Co-authored-by: Roland Tannous <rolandtannous@gonovel.co>
Co-authored-by: DoubleMathew <mmathew23@gmail.com>
Co-authored-by: Michael Han <107991372+shimmyshimmer@users.noreply.github.com>
Co-authored-by: Qian Wu <121997440+5k5000@users.noreply.github.com>
Co-authored-by: marcandrelarochelle <marcandrelarochelle1820@gmail.com>
mmathew23 added a commit to mmathew23/unsloth that referenced this pull request Jun 8, 2025
* Update compiler.py

* Update compiler.py

* Update compiler.py

* Update saving_utils.py

* Update llama_cpp.py

* Update llama_cpp.py

* Update saving_utils.py

* Update saving_utils.py

* Update __init__.py

* Update compiler.py

* Update loss_utils.py

* Update compiler.py

* Update loss_utils.py

* Update loss_utils.py

* Update llama_cpp.py

* Update loss_utils.py

* Update compiler.py

* Update llama_cpp.py

* Update compiler.py

* Update vllm_utils.py

* Update rl_replacements.py

* Update rl_replacements.py

* Update rl_replacements.py

* Update rl_replacements.py

* Update rl_replacements.py

* Update rl_replacements.py

* Update training_utils.py

* Update dataset_utils.py

* Update dataset_utils.py

* Revert "Update dataset_utils.py"

This reverts commit 3b690ad.

* Update temporary_patches.py

* Update temporary_patches.py

* Update temporary_patches.py

* Update temporary_patches.py

* Update temporary_patches.py

* Update temporary_patches.py

* Update compiler.py

* Update compiler.py

* Remove prints

* Update compiler.py

* Update saving_utils.py

* Update temporary_patches.py

* Update __init__.py

* Update pyproject.toml

* Update vllm_utils.py

* bug fix unslothai#2008 unsloth issue - load_in_4bit = True + fast_inference = True (unslothai#79)

* bug fix unslothai#2008 unsloth

* non-quant dtype fix

* Update vllm_utils.py

---------

Co-authored-by: Daniel Han <danielhanchen@gmail.com>

* Update dataset_utils.py

* Update compiler.py

* Update temporary_patches.py

* Gemma 3 fixes

* Update temporary_patches.py

* Update compiler.py

* Update compiler.py

* Gemma 3 fixes

* Update patching_utils.py

* Update compiler.py

* Update compiler.py

* Update patching_utils.py

* Update temporary_patches.py

* Update compiler.py

* Update compiler.py

* Update temporary_patches.py

* Update temporary_patches.py

* Update temporary_patches.py

* Update temporary_patches.py

* Update temporary_patches.py

* Update temporary_patches.py

* Update temporary_patches.py

* Update temporary_patches.py

* Update compiler.py

* Update compiler.py

* Update compiler.py

* Update compiler.py

* Update compiler.py

* Update compiler.py

* Update compiler.py

* Update compiler.py

* Update compiler.py

* Update compiler.py

* Update compiler.py

* Update compiler.py

* Update compiler.py

* Update compiler.py

* Update compiler.py

* Update compiler.py

* Update compiler.py

* Update compiler.py

* Update compiler.py

* compiler

* Update gradient_checkpointing.py

* Update temporary_patches.py

* Update temporary_patches.py

* Update temporary_patches.py

* Update temporary_patches.py

* Update temporary_patches.py

* Update temporary_patches.py

* Update temporary_patches.py

* Update temporary_patches.py

* Update temporary_patches.py

* Update temporary_patches.py

* Update temporary_patches.py

* Update temporary_patches.py

* Update temporary_patches.py

* Update temporary_patches.py

* Update temporary_patches.py

* Update temporary_patches.py

* Update temporary_patches.py

* Update temporary_patches.py

* Update temporary_patches.py

* causal mask dtype

* Fix checkpoint and save from local file (unslothai#74)

* Enhance gradient checkpointing and add original model ID retrieval in saving utilities

* In case adapter_config.json as well

* Update patching_utils.py

* Update patching_utils.py

* Update temporary_patches.py

* Update temporary_patches.py

* Update compiler.py

* Update loss_utils.py

* Update compiler.py

* Update vllm_utils.py

* Update compiler.py

* Update peft_utils.py

* Update rl_replacements.py

* Update vllm_utils.py

* Update temporary_patches.py

* Update temporary_patches.py

* Update temporary_patches.py

* Update temporary_patches.py

* Update temporary_patches.py

* Update temporary_patches.py

* Update temporary_patches.py

* Update temporary_patches.py

* Update temporary_patches.py

* Update temporary_patches.py

* Update temporary_patches.py

* Update temporary_patches.py

* Update temporary_patches.py

* Update temporary_patches.py

* Update compiler.py

* Update vllm_lora_worker_manager.py

* Update utils.py

* Update temporary_patches.py

* Update temporary_patches.py

* Update temporary_patches.py

* Update temporary_patches.py

* Update temporary_patches.py

* Update temporary_patches.py

* Update temporary_patches.py

* Update temporary_patches.py

* Update temporary_patches.py

* Update temporary_patches.py

* Update vllm_utils.py

* Update vllm_utils.py

* Update vllm_utils.py

* Update vllm_utils.py

* Update dataset_utils.py

* bidirectional attention

* Update vllm_utils.py

* Update __init__.py

* Update temporary_patches.py

* Update temporary_patches.py

* Update temporary_patches.py

* Update vllm_utils.py

* Update vllm_utils.py

* Update vllm_utils.py

* Update vllm_utils.py

* Update vllm_utils.py

* Update vllm_utils.py

* Update vllm_lora_worker_manager.py

* Update vllm_lora_worker_manager.py

* Update vllm_lora_worker_manager.py

* Update temporary_patches.py

* Update temporary_patches.py

* Update temporary_patches.py

* Update temporary_patches.py

* Update loss_utils.py

* Update loss_utils.py

* Update loss_utils.py

* Update loss_utils.py

* Update loss_utils.py

* Update __init__.py

* fix: AsyncLLMEngine bugs (unslothai#82)

* fixed a typo in L119, removing unnecessary len() (unslothai#84)

Co-authored-by: Xiaochen Zhu <xz479@cl.cam.ac.uk>

* Fix gradient checkpointing warning filter implementation

* Input grads fix for gemma3 (unslothai#96)

* gemma require gradients fix

* Update peft_utils.py

---------

Co-authored-by: Daniel Han <danielhanchen@gmail.com>

* Update vision_utils.py

* Vision requires grad

* Check SDPA for Mistral / Pixtral

* Update compiler.py

* Update vision_utils.py

* Update vision_utils.py

* Update vision_utils.py

* Update __init__.py

* Update vision_utils.py

* Update vision_utils.py

* Update vision_utils.py

* Update vision_utils.py

* Update vision_utils.py

* Update vision_utils.py

* Update vision_utils.py

* Update vision_utils.py

* Update vision_utils.py

* Update vision_utils.py

* Update vision_utils.py

* Update vision_utils.py

* Update vision_utils.py

* Update vision_utils.py

* Update vllm_utils.py (unslothai#99)

Fix bugs in generate_batches.py. The original output = [] results in duplicated results.

* Update vision_utils.py

* Fixes to support IterableDataset (unslothai#98)

* Support Iterable Datasets

* Update dataset_utils.py

* Update dataset_utils.py

* Update dataset_utils.py

* Update dataset_utils.py

* Preserve batch size from iterable dataset

* Preserve batch size from iterable dataset

* Support train_on_response_only with IterableDataset

* Support train_on_response_only with IterableDataset

* Support train_on_response_only with IterableDataset

* Support train_on_response_only with IterableDataset

* Update vllm_utils.py

* Create vllm_rlhf_utils.py

* Update vllm_rlhf_utils.py

* Update vllm_rlhf_utils.py

* Update vllm_rlhf_utils.py

* Update vllm_rlhf_utils.py

* Update vllm_rlhf_utils.py

* Update vllm_rlhf_utils.py

* Update vllm_rlhf_utils.py

* Update vllm_rlhf_utils.py

* Update vllm_rlhf_utils.py

* vLLM for Qwen 3

* Update vllm_utils.py

* Update vllm_utils.py

* Update vllm_utils.py

* Update vllm_utils.py

* Update vllm_utils.py

* Update vllm_utils.py

* Update vllm_utils.py

* Update vllm_utils.py

* Update vllm_utils.py

* Update vllm_utils.py

* Update vllm_utils.py

* Update vllm_utils.py

* Update vllm_utils.py

* Update vllm_utils.py

* Update vllm_utils.py

---------

Co-authored-by: Mukkesh Ganesh <mukmckenzie@gmail.com>
Co-authored-by: Edd <68678137+Erland366@users.noreply.github.com>
Co-authored-by: Brad Hilton <brad.hilton.nw@gmail.com>
Co-authored-by: SpaceHunter <30568250+SpaceHunterInf@users.noreply.github.com>
Co-authored-by: Xiaochen Zhu <xz479@cl.cam.ac.uk>
Co-authored-by: Roland Tannous <rolandtannous@gonovel.co>
Co-authored-by: DoubleMathew <mmathew23@gmail.com>
Co-authored-by: Michael Han <107991372+shimmyshimmer@users.noreply.github.com>
Co-authored-by: Qian Wu <121997440+5k5000@users.noreply.github.com>
Co-authored-by: marcandrelarochelle <marcandrelarochelle1820@gmail.com>
mmathew23 added a commit to mmathew23/unsloth that referenced this pull request Jun 8, 2025
* bug fix unslothai#2008 unsloth issue - load_in_4bit = True + fast_inference = True (unslothai#79)

* bug fix unslothai#2008 unsloth

* non-quant dtype fix

* Update vllm_utils.py

---------

Co-authored-by: Daniel Han <danielhanchen@gmail.com>

* Update dataset_utils.py

* Update compiler.py

* Update temporary_patches.py

* Gemma 3 fixes

* Update temporary_patches.py

* Update compiler.py

* Update compiler.py

* Gemma 3 fixes

* Update patching_utils.py

* Update compiler.py

* Update compiler.py

* Update patching_utils.py

* Update temporary_patches.py

* Update compiler.py

* Update compiler.py

* Update temporary_patches.py

* Update temporary_patches.py

* Update temporary_patches.py

* Update temporary_patches.py

* Update temporary_patches.py

* Update temporary_patches.py

* Update temporary_patches.py

* Update temporary_patches.py

* Update compiler.py

* Update compiler.py

* Update compiler.py

* Update compiler.py

* Update compiler.py

* Update compiler.py

* Update compiler.py

* Update compiler.py

* Update compiler.py

* Update compiler.py

* Update compiler.py

* Update compiler.py

* Update compiler.py

* Update compiler.py

* Update compiler.py

* Update compiler.py

* Update compiler.py

* Update compiler.py

* Update compiler.py

* compiler

* Update gradient_checkpointing.py

* Update temporary_patches.py

* Update temporary_patches.py

* Update temporary_patches.py

* Update temporary_patches.py

* Update temporary_patches.py

* Update temporary_patches.py

* Update temporary_patches.py

* Update temporary_patches.py

* Update temporary_patches.py

* Update temporary_patches.py

* Update temporary_patches.py

* Update temporary_patches.py

* Update temporary_patches.py

* Update temporary_patches.py

* Update temporary_patches.py

* Update temporary_patches.py

* Update temporary_patches.py

* Update temporary_patches.py

* Update temporary_patches.py

* causal mask dtype

* Fix checkpoint and save from local file (unslothai#74)

* Enhance gradient checkpointing and add original model ID retrieval in saving utilities

* In case adapter_config.json as well

* Update patching_utils.py

* Update patching_utils.py

* Update temporary_patches.py

* Update temporary_patches.py

* Update compiler.py

* Update loss_utils.py

* Update compiler.py

* Update vllm_utils.py

* Update compiler.py

* Update peft_utils.py

* Update rl_replacements.py

* Update vllm_utils.py

* Update temporary_patches.py

* Update temporary_patches.py

* Update temporary_patches.py

* Update temporary_patches.py

* Update temporary_patches.py

* Update temporary_patches.py

* Update temporary_patches.py

* Update temporary_patches.py

* Update temporary_patches.py

* Update temporary_patches.py

* Update temporary_patches.py

* Update temporary_patches.py

* Update temporary_patches.py

* Update temporary_patches.py

* Update compiler.py

* Update vllm_lora_worker_manager.py

* Update utils.py

* Update temporary_patches.py

* Update temporary_patches.py

* Update temporary_patches.py

* Update temporary_patches.py

* Update temporary_patches.py

* Update temporary_patches.py

* Update temporary_patches.py

* Update temporary_patches.py

* Update temporary_patches.py

* Update temporary_patches.py

* Update vllm_utils.py

* Update vllm_utils.py

* Update vllm_utils.py

* Update vllm_utils.py

* Update dataset_utils.py

* bidirectional attention

* Update vllm_utils.py

* Update __init__.py

* Update temporary_patches.py

* Update temporary_patches.py

* Update temporary_patches.py

* Update vllm_utils.py

* Update vllm_utils.py

* Update vllm_utils.py

* Update vllm_utils.py

* Update vllm_utils.py

* Update vllm_utils.py

* Update vllm_lora_worker_manager.py

* Update vllm_lora_worker_manager.py

* Update vllm_lora_worker_manager.py

* Update temporary_patches.py

* Update temporary_patches.py

* Update temporary_patches.py

* Update temporary_patches.py

* Update loss_utils.py

* Update loss_utils.py

* Update loss_utils.py

* Update loss_utils.py

* Update loss_utils.py

* Update __init__.py

* fix: AsyncLLMEngine bugs (unslothai#82)

* fixed a typo in L119, removing unnecessary len() (unslothai#84)

Co-authored-by: Xiaochen Zhu <xz479@cl.cam.ac.uk>

* Fix gradient checkpointing warning filter implementation

* Input grads fix for gemma3 (unslothai#96)

* gemma require gradients fix

* Update peft_utils.py

---------

Co-authored-by: Daniel Han <danielhanchen@gmail.com>

* Update vision_utils.py

* Vision requires grad

* Check SDPA for Mistral / Pixtral

* Update compiler.py

* Update vision_utils.py

* Update vision_utils.py

* Update vision_utils.py

* Update __init__.py

* Update vision_utils.py

* Update vision_utils.py

* Update vision_utils.py

* Update vision_utils.py

* Update vision_utils.py

* Update vision_utils.py

* Update vision_utils.py

* Update vision_utils.py

* Update vision_utils.py

* Update vision_utils.py

* Update vision_utils.py

* Update vision_utils.py

* Update vision_utils.py

* Update vision_utils.py

* Update vllm_utils.py (unslothai#99)

Fix bugs in generate_batches.py. The original output = [] results in duplicated results.

* Update vision_utils.py

* Fixes to support IterableDataset (unslothai#98)

* Support Iterable Datasets

* Update dataset_utils.py

* Update dataset_utils.py

* Update dataset_utils.py

* Update dataset_utils.py

* Preserve batch size from iterable dataset

* Preserve batch size from iterable dataset

* Support train_on_response_only with IterableDataset

* Support train_on_response_only with IterableDataset

* Support train_on_response_only with IterableDataset

* Support train_on_response_only with IterableDataset

* Update vllm_utils.py

* Create vllm_rlhf_utils.py

* Update vllm_rlhf_utils.py

* Update vllm_rlhf_utils.py

* Update vllm_rlhf_utils.py

* Update vllm_rlhf_utils.py

* Update vllm_rlhf_utils.py

* Update vllm_rlhf_utils.py

* Update vllm_rlhf_utils.py

* Update vllm_rlhf_utils.py

* Update vllm_rlhf_utils.py

* vLLM for Qwen 3

* Update vllm_utils.py

* Update vllm_utils.py

* Update vllm_utils.py

* Update vllm_utils.py

* Update vllm_utils.py

* Update vllm_utils.py

* Update vllm_utils.py

* Update vllm_utils.py

* Update vllm_utils.py

* Update vllm_utils.py

* Update vllm_utils.py

* Update vllm_utils.py

* Update vllm_utils.py

* Update vllm_utils.py

* Update vllm_utils.py

* Update vllm_utils.py

* Update vllm_utils.py

* Update vllm_utils.py

* Update vllm_utils.py

* Update vllm_utils.py

* Update vllm_utils.py

* Update vllm_utils.py

* Update vllm_utils.py

* Update vllm_utils.py

* Update vllm_utils.py

* Update vllm_utils.py

* Update vllm_utils.py

* Update vllm_utils.py

* Update vllm_utils.py

* Update vllm_utils.py

* Update vllm_utils.py

* Update vllm_utils.py

* Update compiler.py

* Update vllm_utils.py

* Update vllm_utils.py

* Update vllm_utils.py

* Update vllm_utils.py

* Update vllm_utils.py

* Update vllm_utils.py

* Update vllm_utils.py

* Update vllm_utils.py

* Update vllm_utils.py

* Update vllm_utils.py

* Update vllm_utils.py

* Update vllm_utils.py

* Update vllm_utils.py

* Update vllm_utils.py

* Update rl_replacements.py

* Update rl_replacements.py

* Update rl_replacements.py

* Update rl_replacements.py

* Swap space reduce

* Update vllm_utils.py

* Update vllm_utils.py

* Update rl_replacements.py

* Update vllm_utils.py

* Update vllm_utils.py

* Update vllm_utils.py

* Update vllm_utils.py

* Update __init__.py

---------

Co-authored-by: Mukkesh Ganesh <mukmckenzie@gmail.com>
Co-authored-by: Edd <68678137+Erland366@users.noreply.github.com>
Co-authored-by: Brad Hilton <brad.hilton.nw@gmail.com>
Co-authored-by: SpaceHunter <30568250+SpaceHunterInf@users.noreply.github.com>
Co-authored-by: Xiaochen Zhu <xz479@cl.cam.ac.uk>
Co-authored-by: Roland Tannous <rolandtannous@gonovel.co>
Co-authored-by: DoubleMathew <mmathew23@gmail.com>
Co-authored-by: Michael Han <107991372+shimmyshimmer@users.noreply.github.com>
Co-authored-by: Qian Wu <121997440+5k5000@users.noreply.github.com>
Co-authored-by: marcandrelarochelle <marcandrelarochelle1820@gmail.com>
mmathew23 added a commit to mmathew23/unsloth that referenced this pull request Jun 8, 2025
* Update compiler.py

* Update patching_utils.py

* Update temporary_patches.py

* Update compiler.py

* Update compiler.py

* Update temporary_patches.py

* Update temporary_patches.py

* Update temporary_patches.py

* Update temporary_patches.py

* Update temporary_patches.py

* Update temporary_patches.py

* Update temporary_patches.py

* Update temporary_patches.py

* Update compiler.py

* Update compiler.py

* Update compiler.py

* Update compiler.py

* Update compiler.py

* Update compiler.py

* Update compiler.py

* Update compiler.py

* Update compiler.py

* Update compiler.py

* Update compiler.py

* Update compiler.py

* Update compiler.py

* Update compiler.py

* Update compiler.py

* Update compiler.py

* Update compiler.py

* Update compiler.py

* Update compiler.py

* compiler

* Update gradient_checkpointing.py

* Update temporary_patches.py

* Update temporary_patches.py

* Update temporary_patches.py

* Update temporary_patches.py

* Update temporary_patches.py

* Update temporary_patches.py

* Update temporary_patches.py

* Update temporary_patches.py

* Update temporary_patches.py

* Update temporary_patches.py

* Update temporary_patches.py

* Update temporary_patches.py

* Update temporary_patches.py

* Update temporary_patches.py

* Update temporary_patches.py

* Update temporary_patches.py

* Update temporary_patches.py

* Update temporary_patches.py

* Update temporary_patches.py

* causal mask dtype

* Fix checkpoint and save from local file (unslothai#74)

* Enhance gradient checkpointing and add original model ID retrieval in saving utilities

* In case adapter_config.json as well

* Update patching_utils.py

* Update patching_utils.py

* Update temporary_patches.py

* Update temporary_patches.py

* Update compiler.py

* Update loss_utils.py

* Update compiler.py

* Update vllm_utils.py

* Update compiler.py

* Update peft_utils.py

* Update rl_replacements.py

* Update vllm_utils.py

* Update temporary_patches.py

* Update temporary_patches.py

* Update temporary_patches.py

* Update temporary_patches.py

* Update temporary_patches.py

* Update temporary_patches.py

* Update temporary_patches.py

* Update temporary_patches.py

* Update temporary_patches.py

* Update temporary_patches.py

* Update temporary_patches.py

* Update temporary_patches.py

* Update temporary_patches.py

* Update temporary_patches.py

* Update compiler.py

* Update vllm_lora_worker_manager.py

* Update utils.py

* Update temporary_patches.py

* Update temporary_patches.py

* Update temporary_patches.py

* Update temporary_patches.py

* Update temporary_patches.py

* Update temporary_patches.py

* Update temporary_patches.py

* Update temporary_patches.py

* Update temporary_patches.py

* Update temporary_patches.py

* Update vllm_utils.py

* Update vllm_utils.py

* Update vllm_utils.py

* Update vllm_utils.py

* Update dataset_utils.py

* bidirectional attention

* Update vllm_utils.py

* Update __init__.py

* Update temporary_patches.py

* Update temporary_patches.py

* Update temporary_patches.py

* Update vllm_utils.py

* Update vllm_utils.py

* Update vllm_utils.py

* Update vllm_utils.py

* Update vllm_utils.py

* Update vllm_utils.py

* Update vllm_lora_worker_manager.py

* Update vllm_lora_worker_manager.py

* Update vllm_lora_worker_manager.py

* Update temporary_patches.py

* Update temporary_patches.py

* Update temporary_patches.py

* Update temporary_patches.py

* Update loss_utils.py

* Update loss_utils.py

* Update loss_utils.py

* Update loss_utils.py

* Update loss_utils.py

* Update __init__.py

* fix: AsyncLLMEngine bugs (unslothai#82)

* fixed a typo in L119, removing unnecessary len() (unslothai#84)

Co-authored-by: Xiaochen Zhu <xz479@cl.cam.ac.uk>

* Fix gradient checkpointing warning filter implementation

* Input grads fix for gemma3 (unslothai#96)

* gemma require gradients fix

* Update peft_utils.py

---------

Co-authored-by: Daniel Han <danielhanchen@gmail.com>

* Update vision_utils.py

* Vision requires grad

* Check SDPA for Mistral / Pixtral

* Update compiler.py

* Update vision_utils.py

* Update vision_utils.py

* Update vision_utils.py

* Update __init__.py

* Update vision_utils.py

* Update vision_utils.py

* Update vision_utils.py

* Update vision_utils.py

* Update vision_utils.py

* Update vision_utils.py

* Update vision_utils.py

* Update vision_utils.py

* Update vision_utils.py

* Update vision_utils.py

* Update vision_utils.py

* Update vision_utils.py

* Update vision_utils.py

* Update vision_utils.py

* Update vllm_utils.py (unslothai#99)

Fix bugs in generate_batches.py. The original output = [] results in duplicated results.

* Update vision_utils.py

* Fixes to support IterableDataset (unslothai#98)

* Support Iterable Datasets

* Update dataset_utils.py

* Update dataset_utils.py

* Update dataset_utils.py

* Update dataset_utils.py

* Preserve batch size from iterable dataset

* Preserve batch size from iterable dataset

* Support train_on_response_only with IterableDataset

* Support train_on_response_only with IterableDataset

* Support train_on_response_only with IterableDataset

* Support train_on_response_only with IterableDataset

* Update vllm_utils.py

* Create vllm_rlhf_utils.py

* Update vllm_rlhf_utils.py

* Update vllm_rlhf_utils.py

* Update vllm_rlhf_utils.py

* Update vllm_rlhf_utils.py

* Update vllm_rlhf_utils.py

* Update vllm_rlhf_utils.py

* Update vllm_rlhf_utils.py

* Update vllm_rlhf_utils.py

* Update vllm_rlhf_utils.py

* vLLM for Qwen 3

* Update vllm_utils.py

* Update vllm_utils.py

* Update vllm_utils.py

* Update vllm_utils.py

* Update vllm_utils.py

* Update vllm_utils.py

* Update vllm_utils.py

* Update vllm_utils.py

* Update vllm_utils.py

* Update vllm_utils.py

* Update vllm_utils.py

* Update vllm_utils.py

* Update vllm_utils.py

* Update vllm_utils.py

* Update vllm_utils.py

* Update vllm_utils.py

* Update vllm_utils.py

* Update vllm_utils.py

* Update vllm_utils.py

* Update vllm_utils.py

* Update vllm_utils.py

* Update vllm_utils.py

* Update vllm_utils.py

* Update vllm_utils.py

* Update vllm_utils.py

* Update vllm_utils.py

* Update vllm_utils.py

* Update vllm_utils.py

* Update vllm_utils.py

* Update vllm_utils.py

* Update vllm_utils.py

* Update vllm_utils.py

* Update compiler.py

* Update vllm_utils.py

* Update vllm_utils.py

* Update vllm_utils.py

* Update vllm_utils.py

* Update vllm_utils.py

* Update vllm_utils.py

* Update vllm_utils.py

* Update vllm_utils.py

* Update vllm_utils.py

* Update vllm_utils.py

* Update vllm_utils.py

* Update vllm_utils.py

* Update vllm_utils.py

* Update vllm_utils.py

* Update rl_replacements.py

* Update rl_replacements.py

* Update rl_replacements.py

* Update rl_replacements.py

* Swap space reduce

* Update vllm_utils.py

* Update vllm_utils.py

* Update rl_replacements.py

* Update vllm_utils.py

* Update vllm_utils.py

* Update vllm_utils.py

* Update vllm_utils.py

* Update __init__.py

* Update rl_replacements.py

* Update vllm_utils.py

* Update vllm_utils.py

* Update vllm_utils.py

* Update vllm_utils.py

* Update rl_replacements.py

* Update vllm_utils.py

* Update rl_replacements.py

* Revert "Update rl_replacements.py"

This reverts commit c0a4022.

* Update __init__.py

---------

Co-authored-by: Edd <68678137+Erland366@users.noreply.github.com>
Co-authored-by: Brad Hilton <brad.hilton.nw@gmail.com>
Co-authored-by: SpaceHunter <30568250+SpaceHunterInf@users.noreply.github.com>
Co-authored-by: Xiaochen Zhu <xz479@cl.cam.ac.uk>
Co-authored-by: Roland Tannous <rolandtannous@gonovel.co>
Co-authored-by: DoubleMathew <mmathew23@gmail.com>
Co-authored-by: Michael Han <107991372+shimmyshimmer@users.noreply.github.com>
Co-authored-by: Qian Wu <121997440+5k5000@users.noreply.github.com>
Co-authored-by: marcandrelarochelle <marcandrelarochelle1820@gmail.com>
mmathew23 added a commit to mmathew23/unsloth that referenced this pull request Jun 8, 2025
* Update temporary_patches.py

* Update temporary_patches.py

* Update temporary_patches.py

* Update temporary_patches.py

* Update temporary_patches.py

* Update temporary_patches.py

* Update temporary_patches.py

* Update temporary_patches.py

* Update temporary_patches.py

* Update temporary_patches.py

* Update temporary_patches.py

* Update temporary_patches.py

* Update temporary_patches.py

* Update temporary_patches.py

* Update temporary_patches.py

* Update temporary_patches.py

* Update temporary_patches.py

* causal mask dtype

* Fix checkpoint and save from local file (unslothai#74)

* Enhance gradient checkpointing and add original model ID retrieval in saving utilities

* In case adapter_config.json as well

* Update patching_utils.py

* Update patching_utils.py

* Update temporary_patches.py

* Update temporary_patches.py

* Update compiler.py

* Update loss_utils.py

* Update compiler.py

* Update vllm_utils.py

* Update compiler.py

* Update peft_utils.py

* Update rl_replacements.py

* Update vllm_utils.py

* Update temporary_patches.py

* Update temporary_patches.py

* Update temporary_patches.py

* Update temporary_patches.py

* Update temporary_patches.py

* Update temporary_patches.py

* Update temporary_patches.py

* Update temporary_patches.py

* Update temporary_patches.py

* Update temporary_patches.py

* Update temporary_patches.py

* Update temporary_patches.py

* Update temporary_patches.py

* Update temporary_patches.py

* Update compiler.py

* Update vllm_lora_worker_manager.py

* Update utils.py

* Update temporary_patches.py

* Update temporary_patches.py

* Update temporary_patches.py

* Update temporary_patches.py

* Update temporary_patches.py

* Update temporary_patches.py

* Update temporary_patches.py

* Update temporary_patches.py

* Update temporary_patches.py

* Update temporary_patches.py

* Update vllm_utils.py

* Update vllm_utils.py

* Update vllm_utils.py

* Update vllm_utils.py

* Update dataset_utils.py

* bidirectional attention

* Update vllm_utils.py

* Update __init__.py

* Update temporary_patches.py

* Update temporary_patches.py

* Update temporary_patches.py

* Update vllm_utils.py

* Update vllm_utils.py

* Update vllm_utils.py

* Update vllm_utils.py

* Update vllm_utils.py

* Update vllm_utils.py

* Update vllm_lora_worker_manager.py

* Update vllm_lora_worker_manager.py

* Update vllm_lora_worker_manager.py

* Update temporary_patches.py

* Update temporary_patches.py

* Update temporary_patches.py

* Update temporary_patches.py

* Update loss_utils.py

* Update loss_utils.py

* Update loss_utils.py

* Update loss_utils.py

* Update loss_utils.py

* Update __init__.py

* fix: AsyncLLMEngine bugs (unslothai#82)

* fixed a typo in L119, removing unnecessary len() (unslothai#84)

Co-authored-by: Xiaochen Zhu <xz479@cl.cam.ac.uk>

* Fix gradient checkpointing warning filter implementation

* Input grads fix for gemma3 (unslothai#96)

* gemma require gradients fix

* Update peft_utils.py

---------

Co-authored-by: Daniel Han <danielhanchen@gmail.com>

* Update vision_utils.py

* Vision requires grad

* Check SDPA for Mistral / Pixtral

* Update compiler.py

* Update vision_utils.py

* Update vision_utils.py

* Update vision_utils.py

* Update __init__.py

* Update vision_utils.py

* Update vision_utils.py

* Update vision_utils.py

* Update vision_utils.py

* Update vision_utils.py

* Update vision_utils.py

* Update vision_utils.py

* Update vision_utils.py

* Update vision_utils.py

* Update vision_utils.py

* Update vision_utils.py

* Update vision_utils.py

* Update vision_utils.py

* Update vision_utils.py

* Update vllm_utils.py (unslothai#99)

Fix bugs in generate_batches.py. The original output = [] results in duplicated results.

* Update vision_utils.py

* Fixes to support IterableDataset (unslothai#98)

* Support Iterable Datasets

* Update dataset_utils.py

* Update dataset_utils.py

* Update dataset_utils.py

* Update dataset_utils.py

* Preserve batch size from iterable dataset

* Preserve batch size from iterable dataset

* Support train_on_response_only with IterableDataset

* Support train_on_response_only with IterableDataset

* Support train_on_response_only with IterableDataset

* Support train_on_response_only with IterableDataset

* Update vllm_utils.py

* Create vllm_rlhf_utils.py

* Update vllm_rlhf_utils.py

* Update vllm_rlhf_utils.py

* Update vllm_rlhf_utils.py

* Update vllm_rlhf_utils.py

* Update vllm_rlhf_utils.py

* Update vllm_rlhf_utils.py

* Update vllm_rlhf_utils.py

* Update vllm_rlhf_utils.py

* Update vllm_rlhf_utils.py

* vLLM for Qwen 3

* Update vllm_utils.py

* Update vllm_utils.py

* Update vllm_utils.py

* Update vllm_utils.py

* Update vllm_utils.py

* Update vllm_utils.py

* Update vllm_utils.py

* Update vllm_utils.py

* Update vllm_utils.py

* Update vllm_utils.py

* Update vllm_utils.py

* Update vllm_utils.py

* Update vllm_utils.py

* Update vllm_utils.py

* Update vllm_utils.py

* Update vllm_utils.py

* Update vllm_utils.py

* Update vllm_utils.py

* Update vllm_utils.py

* Update vllm_utils.py

* Update vllm_utils.py

* Update vllm_utils.py

* Update vllm_utils.py

* Update vllm_utils.py

* Update vllm_utils.py

* Update vllm_utils.py

* Update vllm_utils.py

* Update vllm_utils.py

* Update vllm_utils.py

* Update vllm_utils.py

* Update vllm_utils.py

* Update vllm_utils.py

* Update compiler.py

* Update vllm_utils.py

* Update vllm_utils.py

* Update vllm_utils.py

* Update vllm_utils.py

* Update vllm_utils.py

* Update vllm_utils.py

* Update vllm_utils.py

* Update vllm_utils.py

* Update vllm_utils.py

* Update vllm_utils.py

* Update vllm_utils.py

* Update vllm_utils.py

* Update vllm_utils.py

* Update vllm_utils.py

* Update rl_replacements.py

* Update rl_replacements.py

* Update rl_replacements.py

* Update rl_replacements.py

* Swap space reduce

* Update vllm_utils.py

* Update vllm_utils.py

* Update rl_replacements.py

* Update vllm_utils.py

* Update vllm_utils.py

* Update vllm_utils.py

* Update vllm_utils.py

* Update __init__.py

* Update rl_replacements.py

* Update vllm_utils.py

* Update vllm_utils.py

* Update vllm_utils.py

* Update vllm_utils.py

* Update rl_replacements.py

* Update vllm_utils.py

* Update rl_replacements.py

* Revert "Update rl_replacements.py"

This reverts commit c0a4022.

* Update __init__.py

* Update patching_utils.py

* Update compiler.py

* Update compiler.py

* Update compiler.py

* Update compiler.py

* Update compiler.py

* Update compiler.py

* Update compiler.py

* Update compiler.py

* Fixes

* Update temporary_patches.py

* Update temporary_patches.py

* Update compiler.py

* Update compiler.py

* Update compiler.py

* Update compiler.py

* Update compiler.py

* Update compiler.py

* Update compiler.py

* Update temporary_patches.py

* Update temporary_patches.py

* Update temporary_patches.py

* Update compiler.py

* revert

* Update temporary_patches.py

* Update temporary_patches.py

* Update temporary_patches.py

* Update temporary_patches.py

* Update temporary_patches.py

* Update temporary_patches.py

* Update __init__.py

* Update compiler.py

* Update temporary_patches.py

* Update compiler.py

* Update temporary_patches.py

---------

Co-authored-by: Edd <68678137+Erland366@users.noreply.github.com>
Co-authored-by: Brad Hilton <brad.hilton.nw@gmail.com>
Co-authored-by: SpaceHunter <30568250+SpaceHunterInf@users.noreply.github.com>
Co-authored-by: Xiaochen Zhu <xz479@cl.cam.ac.uk>
Co-authored-by: Roland Tannous <rolandtannous@gonovel.co>
Co-authored-by: DoubleMathew <mmathew23@gmail.com>
Co-authored-by: Michael Han <107991372+shimmyshimmer@users.noreply.github.com>
Co-authored-by: Qian Wu <121997440+5k5000@users.noreply.github.com>
Co-authored-by: marcandrelarochelle <marcandrelarochelle1820@gmail.com>
mmathew23 added a commit to mmathew23/unsloth that referenced this pull request Jun 25, 2025
* gemma require gradients fix

* Update peft_utils.py

---------

Co-authored-by: Daniel Han <danielhanchen@gmail.com>
mmathew23 added a commit to mmathew23/unsloth that referenced this pull request Jun 25, 2025
* Update dataset_utils.py

* Update dataset_utils.py

* Update compiler.py

* Update compiler.py

* Update compiler.py

* Update compiler.py

* Update loss_utils.py

* Update loss_utils.py

* gpu_memory_utilization

* Update temporary_patches.py

* Update vision_utils.py

* Update vision_utils.py

* Update vision_utils.py

* Update vision_utils.py

* Update vision_utils.py

* Update vision_utils.py

* Update vision_utils.py

* Update vision_utils.py

* train on completions VLMs

* Update dataset_utils.py

* Update dataset_utils.py

* Update dataset_utils.py

* Update dataset_utils.py

* VLM train only on completions

* Update loss_utils.py

* Update dataset_utils.py

* Update compiler.py

* Update compiler.py

* Update compiler.py

* Update compiler.py

* Update compiler.py

* Update compiler.py

* Update compiler.py

* Update saving_utils.py

* Update llama_cpp.py

* Update llama_cpp.py

* Update saving_utils.py

* Update saving_utils.py

* Update __init__.py

* Update compiler.py

* Update loss_utils.py

* Update compiler.py

* Update loss_utils.py

* Update loss_utils.py

* Update llama_cpp.py

* Update loss_utils.py

* Update compiler.py

* Update llama_cpp.py

* Update compiler.py

* Update vllm_utils.py

* Update rl_replacements.py

* Update rl_replacements.py

* Update rl_replacements.py

* Update rl_replacements.py

* Update rl_replacements.py

* Update rl_replacements.py

* Update training_utils.py

* Update dataset_utils.py

* Update dataset_utils.py

* Revert "Update dataset_utils.py"

This reverts commit 3b690ad.

* Update temporary_patches.py

* Update temporary_patches.py

* Update temporary_patches.py

* Update temporary_patches.py

* Update temporary_patches.py

* Update temporary_patches.py

* Update compiler.py

* Update compiler.py

* Remove prints

* Update compiler.py

* Update saving_utils.py

* Update temporary_patches.py

* Update __init__.py

* Update pyproject.toml

* Update vllm_utils.py

* bug fix unslothai#2008 unsloth issue - load_in_4bit = True + fast_inference = True (unslothai#79)

* bug fix unslothai#2008 unsloth

* non-quant dtype fix

* Update vllm_utils.py

---------

Co-authored-by: Daniel Han <danielhanchen@gmail.com>

* Update dataset_utils.py

* Update compiler.py

* Update temporary_patches.py

* Gemma 3 fixes

* Update temporary_patches.py

* Update compiler.py

* Update compiler.py

* Gemma 3 fixes

* Update patching_utils.py

* Update compiler.py

* Update compiler.py

* Update patching_utils.py

* Update temporary_patches.py

* Update compiler.py

* Update compiler.py

* Update temporary_patches.py

* Update temporary_patches.py

* Update temporary_patches.py

* Update temporary_patches.py

* Update temporary_patches.py

* Update temporary_patches.py

* Update temporary_patches.py

* Update temporary_patches.py

* Update compiler.py

* Update compiler.py

* Update compiler.py

* Update compiler.py

* Update compiler.py

* Update compiler.py

* Update compiler.py

* Update compiler.py

* Update compiler.py

* Update compiler.py

* Update compiler.py

* Update compiler.py

* Update compiler.py

* Update compiler.py

* Update compiler.py

* Update compiler.py

* Update compiler.py

* Update compiler.py

* Update compiler.py

* compiler

* Update gradient_checkpointing.py

* Update temporary_patches.py

* Update temporary_patches.py

* Update temporary_patches.py

* Update temporary_patches.py

* Update temporary_patches.py

* Update temporary_patches.py

* Update temporary_patches.py

* Update temporary_patches.py

* Update temporary_patches.py

* Update temporary_patches.py

* Update temporary_patches.py

* Update temporary_patches.py

* Update temporary_patches.py

* Update temporary_patches.py

* Update temporary_patches.py

* Update temporary_patches.py

* Update temporary_patches.py

* Update temporary_patches.py

* Update temporary_patches.py

* causal mask dtype

* Fix checkpoint and save from local file (unslothai#74)

* Enhance gradient checkpointing and add original model ID retrieval in saving utilities

* In case adapter_config.json as well

* Update patching_utils.py

* Update patching_utils.py

* Update temporary_patches.py

* Update temporary_patches.py

* Update compiler.py

* Update loss_utils.py

* Update compiler.py

* Update vllm_utils.py

* Update compiler.py

* Update peft_utils.py

* Update rl_replacements.py

* Update vllm_utils.py

* Update temporary_patches.py

* Update temporary_patches.py

* Update temporary_patches.py

* Update temporary_patches.py

* Update temporary_patches.py

* Update temporary_patches.py

* Update temporary_patches.py

* Update temporary_patches.py

* Update temporary_patches.py

* Update temporary_patches.py

* Update temporary_patches.py

* Update temporary_patches.py

* Update temporary_patches.py

* Update temporary_patches.py

* Update compiler.py

* Update vllm_lora_worker_manager.py

* Update utils.py

* Update temporary_patches.py

* Update temporary_patches.py

* Update temporary_patches.py

* Update temporary_patches.py

* Update temporary_patches.py

* Update temporary_patches.py

* Update temporary_patches.py

* Update temporary_patches.py

* Update temporary_patches.py

* Update temporary_patches.py

* Update vllm_utils.py

* Update vllm_utils.py

* Update vllm_utils.py

* Update vllm_utils.py

* Update dataset_utils.py

* bidirectional attention

* Update vllm_utils.py

* Update __init__.py

* Update temporary_patches.py

* Update temporary_patches.py

* Update temporary_patches.py

* Update vllm_utils.py

* Update vllm_utils.py

* Update vllm_utils.py

* Update vllm_utils.py

* Update vllm_utils.py

* Update vllm_utils.py

* Update vllm_lora_worker_manager.py

* Update vllm_lora_worker_manager.py

* Update vllm_lora_worker_manager.py

* Update temporary_patches.py

* Update temporary_patches.py

* Update temporary_patches.py

* Update temporary_patches.py

* Update loss_utils.py

* Update loss_utils.py

* Update loss_utils.py

* Update loss_utils.py

* Update loss_utils.py

* Update __init__.py

* fix: AsyncLLMEngine bugs (unslothai#82)

* fixed a typo in L119, removing unnecessary len() (unslothai#84)

Co-authored-by: Xiaochen Zhu <xz479@cl.cam.ac.uk>

* Fix gradient checkpointing warning filter implementation

* Input grads fix for gemma3 (unslothai#96)

* gemma require gradients fix

* Update peft_utils.py

---------

Co-authored-by: Daniel Han <danielhanchen@gmail.com>

* Update vision_utils.py

* Vision requires grad

* Check SDPA for Mistral / Pixtral

* Update compiler.py

* Update vision_utils.py

* Update vision_utils.py

* Update vision_utils.py

* Update __init__.py

* Update vision_utils.py

* Update vision_utils.py

* Update vision_utils.py

* Update vision_utils.py

* Update vision_utils.py

* Update vision_utils.py

* Update vision_utils.py

* Update vision_utils.py

* Update vision_utils.py

* Update vision_utils.py

* Update vision_utils.py

* Update vision_utils.py

* Update vision_utils.py

* Update vision_utils.py

* Update vllm_utils.py (unslothai#99)

Fix bugs in generate_batches.py. Original output = [] will result in duplication of results.

* Update vision_utils.py

* Fixes to support IterableDataset (unslothai#98)

* Support Iterable Datasets

* Update dataset_utils.py

* Update dataset_utils.py

* Update dataset_utils.py

* Update dataset_utils.py

* Preserve batch size from iterable dataset

* Preserve batch size from iterable dataset

* Support train_on_response_only with IterableDataset

* Support train_on_response_only with IterableDataset

* Support train_on_response_only with IterableDataset

* Support train_on_response_only with IterableDataset

---------

Co-authored-by: Mukkesh Ganesh <mukmckenzie@gmail.com>
Co-authored-by: Edd <68678137+Erland366@users.noreply.github.com>
Co-authored-by: Brad Hilton <brad.hilton.nw@gmail.com>
Co-authored-by: SpaceHunter <30568250+SpaceHunterInf@users.noreply.github.com>
Co-authored-by: Xiaochen Zhu <xz479@cl.cam.ac.uk>
Co-authored-by: Roland Tannous <rolandtannous@gonovel.co>
Co-authored-by: DoubleMathew <mmathew23@gmail.com>
Co-authored-by: Michael Han <107991372+shimmyshimmer@users.noreply.github.com>
Co-authored-by: Qian Wu <121997440+5k5000@users.noreply.github.com>
Co-authored-by: marcandrelarochelle <marcandrelarochelle1820@gmail.com>
mmathew23 added a commit to mmathew23/unsloth that referenced this pull request Jun 25, 2025
* Update vision_utils.py

* Update vision_utils.py

* train on completions VLMs

* Update dataset_utils.py

* Update dataset_utils.py

* Update dataset_utils.py

* Update dataset_utils.py

* VLM train only on completions

* Update loss_utils.py

* Update dataset_utils.py

* Update compiler.py

* Update compiler.py

* Update compiler.py

* Update compiler.py

* Update compiler.py

* Update compiler.py

* Update compiler.py

* Update saving_utils.py

* Update llama_cpp.py

* Update llama_cpp.py

* Update saving_utils.py

* Update saving_utils.py

* Update __init__.py

* Update compiler.py

* Update loss_utils.py

* Update compiler.py

* Update loss_utils.py

* Update loss_utils.py

* Update llama_cpp.py

* Update loss_utils.py

* Update compiler.py

* Update llama_cpp.py

* Update compiler.py

* Update vllm_utils.py

* Update rl_replacements.py

* Update rl_replacements.py

* Update rl_replacements.py

* Update rl_replacements.py

* Update rl_replacements.py

* Update rl_replacements.py

* Update training_utils.py

* Update dataset_utils.py

* Update dataset_utils.py

* Revert "Update dataset_utils.py"

This reverts commit 3b690ad.

* Update temporary_patches.py

* Update temporary_patches.py

* Update temporary_patches.py

* Update temporary_patches.py

* Update temporary_patches.py

* Update temporary_patches.py

* Update compiler.py

* Update compiler.py

* Remove prints

* Update compiler.py

* Update saving_utils.py

* Update temporary_patches.py

* Update __init__.py

* Update pyproject.toml

* Update vllm_utils.py

* bug fix unslothai#2008 unsloth issue - load_in_4bit = True + fast_inference = True (unslothai#79)

* bug fix unslothai#2008 unsloth

* non-quant dtype fix

* Update vllm_utils.py

---------

Co-authored-by: Daniel Han <danielhanchen@gmail.com>

* Update dataset_utils.py

* Update compiler.py

* Update temporary_patches.py

* Gemma 3 fixes

* Update temporary_patches.py

* Update compiler.py

* Update compiler.py

* Gemma 3 fixes

* Update patching_utils.py

* Update compiler.py

* Update compiler.py

* Update patching_utils.py

* Update temporary_patches.py

* Update compiler.py

* Update compiler.py

* Update temporary_patches.py

* Update temporary_patches.py

* Update temporary_patches.py

* Update temporary_patches.py

* Update temporary_patches.py

* Update temporary_patches.py

* Update temporary_patches.py

* Update temporary_patches.py

* Update compiler.py

* Update compiler.py

* Update compiler.py

* Update compiler.py

* Update compiler.py

* Update compiler.py

* Update compiler.py

* Update compiler.py

* Update compiler.py

* Update compiler.py

* Update compiler.py

* Update compiler.py

* Update compiler.py

* Update compiler.py

* Update compiler.py

* Update compiler.py

* Update compiler.py

* Update compiler.py

* Update compiler.py

* compiler

* Update gradient_checkpointing.py

* Update temporary_patches.py

* Update temporary_patches.py

* Update temporary_patches.py

* Update temporary_patches.py

* Update temporary_patches.py

* Update temporary_patches.py

* Update temporary_patches.py

* Update temporary_patches.py

* Update temporary_patches.py

* Update temporary_patches.py

* Update temporary_patches.py

* Update temporary_patches.py

* Update temporary_patches.py

* Update temporary_patches.py

* Update temporary_patches.py

* Update temporary_patches.py

* Update temporary_patches.py

* Update temporary_patches.py

* Update temporary_patches.py

* causal mask dtype

* Fix checkpoint and save from local file (unslothai#74)

* Enhance gradient checkpointing and add original model ID retrieval in saving utilities

* In case adapter_config.json as well

* Update patching_utils.py

* Update patching_utils.py

* Update temporary_patches.py

* Update temporary_patches.py

* Update compiler.py

* Update loss_utils.py

* Update compiler.py

* Update vllm_utils.py

* Update compiler.py

* Update peft_utils.py

* Update rl_replacements.py

* Update vllm_utils.py

* Update temporary_patches.py

* Update temporary_patches.py

* Update temporary_patches.py

* Update temporary_patches.py

* Update temporary_patches.py

* Update temporary_patches.py

* Update temporary_patches.py

* Update temporary_patches.py

* Update temporary_patches.py

* Update temporary_patches.py

* Update temporary_patches.py

* Update temporary_patches.py

* Update temporary_patches.py

* Update temporary_patches.py

* Update compiler.py

* Update vllm_lora_worker_manager.py

* Update utils.py

* Update temporary_patches.py

* Update temporary_patches.py

* Update temporary_patches.py

* Update temporary_patches.py

* Update temporary_patches.py

* Update temporary_patches.py

* Update temporary_patches.py

* Update temporary_patches.py

* Update temporary_patches.py

* Update temporary_patches.py

* Update vllm_utils.py

* Update vllm_utils.py

* Update vllm_utils.py

* Update vllm_utils.py

* Update dataset_utils.py

* bidirectional attention

* Update vllm_utils.py

* Update __init__.py

* Update temporary_patches.py

* Update temporary_patches.py

* Update temporary_patches.py

* Update vllm_utils.py

* Update vllm_utils.py

* Update vllm_utils.py

* Update vllm_utils.py

* Update vllm_utils.py

* Update vllm_utils.py

* Update vllm_lora_worker_manager.py

* Update vllm_lora_worker_manager.py

* Update vllm_lora_worker_manager.py

* Update temporary_patches.py

* Update temporary_patches.py

* Update temporary_patches.py

* Update temporary_patches.py

* Update loss_utils.py

* Update loss_utils.py

* Update loss_utils.py

* Update loss_utils.py

* Update loss_utils.py

* Update __init__.py

* fix: AsyncLLMEngine bugs (unslothai#82)

* fixed a typo in L119, removing unnecessary len() (unslothai#84)

Co-authored-by: Xiaochen Zhu <xz479@cl.cam.ac.uk>

* Fix gradient checkpointing warning filter implementation

* Input grads fix for gemma3 (unslothai#96)

* gemma require gradients fix

* Update peft_utils.py

---------

Co-authored-by: Daniel Han <danielhanchen@gmail.com>

* Update vision_utils.py

* Vision requires grad

* Check SDPA for Mistral / Pixtral

* Update compiler.py

* Update vision_utils.py

* Update vision_utils.py

* Update vision_utils.py

* Update __init__.py

* Update vision_utils.py

* Update vision_utils.py

* Update vision_utils.py

* Update vision_utils.py

* Update vision_utils.py

* Update vision_utils.py

* Update vision_utils.py

* Update vision_utils.py

* Update vision_utils.py

* Update vision_utils.py

* Update vision_utils.py

* Update vision_utils.py

* Update vision_utils.py

* Update vision_utils.py

* Update vllm_utils.py (unslothai#99)

Fix bugs in generate_batches.py. Original output = [] will result in duplication of results.

* Update vision_utils.py

* Fixes to support IterableDataset (unslothai#98)

* Support Iterable Datasets

* Update dataset_utils.py

* Update dataset_utils.py

* Update dataset_utils.py

* Update dataset_utils.py

* Preserve batch size from iterable dataset

* Preserve batch size from iterable dataset

* Support train_on_response_only with IterableDataset

* Support train_on_response_only with IterableDataset

* Support train_on_response_only with IterableDataset

* Support train_on_response_only with IterableDataset

* Update vllm_utils.py

* Create vllm_rlhf_utils.py

* Update vllm_rlhf_utils.py

* Update vllm_rlhf_utils.py

* Update vllm_rlhf_utils.py

* Update vllm_rlhf_utils.py

* Update vllm_rlhf_utils.py

* Update vllm_rlhf_utils.py

* Update vllm_rlhf_utils.py

* Update vllm_rlhf_utils.py

* Update vllm_rlhf_utils.py

* vLLM for Qwen 3

* Update vllm_utils.py

* Update vllm_utils.py

---------

Co-authored-by: Mukkesh Ganesh <mukmckenzie@gmail.com>
Co-authored-by: Edd <68678137+Erland366@users.noreply.github.com>
Co-authored-by: Brad Hilton <brad.hilton.nw@gmail.com>
Co-authored-by: SpaceHunter <30568250+SpaceHunterInf@users.noreply.github.com>
Co-authored-by: Xiaochen Zhu <xz479@cl.cam.ac.uk>
Co-authored-by: Roland Tannous <rolandtannous@gonovel.co>
Co-authored-by: DoubleMathew <mmathew23@gmail.com>
Co-authored-by: Michael Han <107991372+shimmyshimmer@users.noreply.github.com>
Co-authored-by: Qian Wu <121997440+5k5000@users.noreply.github.com>
Co-authored-by: marcandrelarochelle <marcandrelarochelle1820@gmail.com>
mmathew23 added a commit to mmathew23/unsloth that referenced this pull request Jun 25, 2025
* Update compiler.py

* Update compiler.py

* Update compiler.py

* Update saving_utils.py

* Update llama_cpp.py

* Update llama_cpp.py

* Update saving_utils.py

* Update saving_utils.py

* Update __init__.py

* Update compiler.py

* Update loss_utils.py

* Update compiler.py

* Update loss_utils.py

* Update loss_utils.py

* Update llama_cpp.py

* Update loss_utils.py

* Update compiler.py

* Update llama_cpp.py

* Update compiler.py

* Update vllm_utils.py

* Update rl_replacements.py

* Update rl_replacements.py

* Update rl_replacements.py

* Update rl_replacements.py

* Update rl_replacements.py

* Update rl_replacements.py

* Update training_utils.py

* Update dataset_utils.py

* Update dataset_utils.py

* Revert "Update dataset_utils.py"

This reverts commit 3b690ad.

* Update temporary_patches.py

* Update temporary_patches.py

* Update temporary_patches.py

* Update temporary_patches.py

* Update temporary_patches.py

* Update temporary_patches.py

* Update compiler.py

* Update compiler.py

* Remove prints

* Update compiler.py

* Update saving_utils.py

* Update temporary_patches.py

* Update __init__.py

* Update pyproject.toml

* Update vllm_utils.py

* bug fix unslothai#2008 unsloth issue - load_in_4bit = True + fast_inference = True (unslothai#79)

* bug fix unslothai#2008 unsloth

* non-quant dtype fix

* Update vllm_utils.py

---------

Co-authored-by: Daniel Han <danielhanchen@gmail.com>

* Update dataset_utils.py

* Update compiler.py

* Update temporary_patches.py

* Gemma 3 fixes

* Update temporary_patches.py

* Update compiler.py

* Update compiler.py

* Gemma 3 fixes

* Update patching_utils.py

* Update compiler.py

* Update compiler.py

* Update patching_utils.py

* Update temporary_patches.py

* Update compiler.py

* Update compiler.py

* Update temporary_patches.py

* Update temporary_patches.py

* Update temporary_patches.py

* Update temporary_patches.py

* Update temporary_patches.py

* Update temporary_patches.py

* Update temporary_patches.py

* Update temporary_patches.py

* Update compiler.py

* Update compiler.py

* Update compiler.py

* Update compiler.py

* Update compiler.py

* Update compiler.py

* Update compiler.py

* Update compiler.py

* Update compiler.py

* Update compiler.py

* Update compiler.py

* Update compiler.py

* Update compiler.py

* Update compiler.py

* Update compiler.py

* Update compiler.py

* Update compiler.py

* Update compiler.py

* Update compiler.py

* compiler

* Update gradient_checkpointing.py

* Update temporary_patches.py

* Update temporary_patches.py

* Update temporary_patches.py

* Update temporary_patches.py

* Update temporary_patches.py

* Update temporary_patches.py

* Update temporary_patches.py

* Update temporary_patches.py

* Update temporary_patches.py

* Update temporary_patches.py

* Update temporary_patches.py

* Update temporary_patches.py

* Update temporary_patches.py

* Update temporary_patches.py

* Update temporary_patches.py

* Update temporary_patches.py

* Update temporary_patches.py

* Update temporary_patches.py

* Update temporary_patches.py

* causal mask dtype

* Fix checkpoint and save from local file (unslothai#74)

* Enhance gradient checkpointing and add original model ID retrieval in saving utilities

* In case adapter_config.json as well

* Update patching_utils.py

* Update patching_utils.py

* Update temporary_patches.py

* Update temporary_patches.py

* Update compiler.py

* Update loss_utils.py

* Update compiler.py

* Update vllm_utils.py

* Update compiler.py

* Update peft_utils.py

* Update rl_replacements.py

* Update vllm_utils.py

* Update temporary_patches.py

* Update temporary_patches.py

* Update temporary_patches.py

* Update temporary_patches.py

* Update temporary_patches.py

* Update temporary_patches.py

* Update temporary_patches.py

* Update temporary_patches.py

* Update temporary_patches.py

* Update temporary_patches.py

* Update temporary_patches.py

* Update temporary_patches.py

* Update temporary_patches.py

* Update temporary_patches.py

* Update compiler.py

* Update vllm_lora_worker_manager.py

* Update utils.py

* Update temporary_patches.py

* Update temporary_patches.py

* Update temporary_patches.py

* Update temporary_patches.py

* Update temporary_patches.py

* Update temporary_patches.py

* Update temporary_patches.py

* Update temporary_patches.py

* Update temporary_patches.py

* Update temporary_patches.py

* Update vllm_utils.py

* Update vllm_utils.py

* Update vllm_utils.py

* Update vllm_utils.py

* Update dataset_utils.py

* bidirectional attention

* Update vllm_utils.py

* Update __init__.py

* Update temporary_patches.py

* Update temporary_patches.py

* Update temporary_patches.py

* Update vllm_utils.py

* Update vllm_utils.py

* Update vllm_utils.py

* Update vllm_utils.py

* Update vllm_utils.py

* Update vllm_utils.py

* Update vllm_lora_worker_manager.py

* Update vllm_lora_worker_manager.py

* Update vllm_lora_worker_manager.py

* Update temporary_patches.py

* Update temporary_patches.py

* Update temporary_patches.py

* Update temporary_patches.py

* Update loss_utils.py

* Update loss_utils.py

* Update loss_utils.py

* Update loss_utils.py

* Update loss_utils.py

* Update __init__.py

* fix: AsyncLLMEngine bugs (unslothai#82)

* fixed a typo in L119, removing unnecessary len() (unslothai#84)

Co-authored-by: Xiaochen Zhu <xz479@cl.cam.ac.uk>

* Fix gradient checkpointing warning filter implementation

* Input grads fix for gemma3 (unslothai#96)

* gemma require gradients fix

* Update peft_utils.py

---------

Co-authored-by: Daniel Han <danielhanchen@gmail.com>

* Update vision_utils.py

* Vision requires grad

* Check SDPA for Mistral / Pixtral

* Update compiler.py

* Update vision_utils.py

* Update vision_utils.py

* Update vision_utils.py

* Update __init__.py

* Update vision_utils.py

* Update vision_utils.py

* Update vision_utils.py

* Update vision_utils.py

* Update vision_utils.py

* Update vision_utils.py

* Update vision_utils.py

* Update vision_utils.py

* Update vision_utils.py

* Update vision_utils.py

* Update vision_utils.py

* Update vision_utils.py

* Update vision_utils.py

* Update vision_utils.py

* Update vllm_utils.py (unslothai#99)

Fix bugs in generate_batches.py. Original output = [] will result in duplication of results.

* Update vision_utils.py

* Fixes to support IterableDataset (unslothai#98)

* Support Iterable Datasets

* Update dataset_utils.py

* Update dataset_utils.py

* Update dataset_utils.py

* Update dataset_utils.py

* Preserve batch size from iterable dataset

* Preserve batch size from iterable dataset

* Support train_on_response_only with IterableDataset

* Support train_on_response_only with IterableDataset

* Support train_on_response_only with IterableDataset

* Support train_on_response_only with IterableDataset

* Update vllm_utils.py

* Create vllm_rlhf_utils.py

* Update vllm_rlhf_utils.py

* Update vllm_rlhf_utils.py

* Update vllm_rlhf_utils.py

* Update vllm_rlhf_utils.py

* Update vllm_rlhf_utils.py

* Update vllm_rlhf_utils.py

* Update vllm_rlhf_utils.py

* Update vllm_rlhf_utils.py

* Update vllm_rlhf_utils.py

* vLLM for Qwen 3

* Update vllm_utils.py

* Update vllm_utils.py

* Update vllm_utils.py

* Update vllm_utils.py

* Update vllm_utils.py

* Update vllm_utils.py

* Update vllm_utils.py

* Update vllm_utils.py

* Update vllm_utils.py

* Update vllm_utils.py

* Update vllm_utils.py

* Update vllm_utils.py

* Update vllm_utils.py

* Update vllm_utils.py

* Update vllm_utils.py

---------

Co-authored-by: Mukkesh Ganesh <mukmckenzie@gmail.com>
Co-authored-by: Edd <68678137+Erland366@users.noreply.github.com>
Co-authored-by: Brad Hilton <brad.hilton.nw@gmail.com>
Co-authored-by: SpaceHunter <30568250+SpaceHunterInf@users.noreply.github.com>
Co-authored-by: Xiaochen Zhu <xz479@cl.cam.ac.uk>
Co-authored-by: Roland Tannous <rolandtannous@gonovel.co>
Co-authored-by: DoubleMathew <mmathew23@gmail.com>
Co-authored-by: Michael Han <107991372+shimmyshimmer@users.noreply.github.com>
Co-authored-by: Qian Wu <121997440+5k5000@users.noreply.github.com>
Co-authored-by: marcandrelarochelle <marcandrelarochelle1820@gmail.com>
mmathew23 added a commit to mmathew23/unsloth that referenced this pull request Jun 25, 2025
* bug fix unslothai#2008 unsloth issue - load_in_4bit = True + fast_inference = True (unslothai#79)

* bug fix unslothai#2008 unsloth

* non-quant dtype fix

* Update vllm_utils.py

---------

Co-authored-by: Daniel Han <danielhanchen@gmail.com>

* Update dataset_utils.py

* Update compiler.py

* Update temporary_patches.py

* Gemma 3 fixes

* Update temporary_patches.py

* Update compiler.py

* Update compiler.py

* Gemma 3 fixes

* Update patching_utils.py

* Update compiler.py

* Update compiler.py

* Update patching_utils.py

* Update temporary_patches.py

* Update compiler.py

* Update compiler.py

* Update temporary_patches.py

* Update temporary_patches.py

* Update temporary_patches.py

* Update temporary_patches.py

* Update temporary_patches.py

* Update temporary_patches.py

* Update temporary_patches.py

* Update temporary_patches.py

* Update compiler.py

* Update compiler.py

* Update compiler.py

* Update compiler.py

* Update compiler.py

* Update compiler.py

* Update compiler.py

* Update compiler.py

* Update compiler.py

* Update compiler.py

* Update compiler.py

* Update compiler.py

* Update compiler.py

* Update compiler.py

* Update compiler.py

* Update compiler.py

* Update compiler.py

* Update compiler.py

* Update compiler.py

* compiler

* Update gradient_checkpointing.py

* Update temporary_patches.py

* Update temporary_patches.py

* Update temporary_patches.py

* Update temporary_patches.py

* Update temporary_patches.py

* Update temporary_patches.py

* Update temporary_patches.py

* Update temporary_patches.py

* Update temporary_patches.py

* Update temporary_patches.py

* Update temporary_patches.py

* Update temporary_patches.py

* Update temporary_patches.py

* Update temporary_patches.py

* Update temporary_patches.py

* Update temporary_patches.py

* Update temporary_patches.py

* Update temporary_patches.py

* Update temporary_patches.py

* causal mask dtype

* Fix checkpoint and save from local file (unslothai#74)

* Enhance gradient checkpointing and add original model ID retrieval in saving utilities

* In case adapter_config.json as well

* Update patching_utils.py

* Update patching_utils.py

* Update temporary_patches.py

* Update temporary_patches.py

* Update compiler.py

* Update loss_utils.py

* Update compiler.py

* Update vllm_utils.py

* Update compiler.py

* Update peft_utils.py

* Update rl_replacements.py

* Update vllm_utils.py

* Update temporary_patches.py

* Update temporary_patches.py

* Update temporary_patches.py

* Update temporary_patches.py

* Update temporary_patches.py

* Update temporary_patches.py

* Update temporary_patches.py

* Update temporary_patches.py

* Update temporary_patches.py

* Update temporary_patches.py

* Update temporary_patches.py

* Update temporary_patches.py

* Update temporary_patches.py

* Update temporary_patches.py

* Update compiler.py

* Update vllm_lora_worker_manager.py

* Update utils.py

* Update temporary_patches.py

* Update temporary_patches.py

* Update temporary_patches.py

* Update temporary_patches.py

* Update temporary_patches.py

* Update temporary_patches.py

* Update temporary_patches.py

* Update temporary_patches.py

* Update temporary_patches.py

* Update temporary_patches.py

* Update vllm_utils.py

* Update vllm_utils.py

* Update vllm_utils.py

* Update vllm_utils.py

* Update dataset_utils.py

* bidirectional attention

* Update vllm_utils.py

* Update __init__.py

* Update temporary_patches.py

* Update temporary_patches.py

* Update temporary_patches.py

* Update vllm_utils.py

* Update vllm_utils.py

* Update vllm_utils.py

* Update vllm_utils.py

* Update vllm_utils.py

* Update vllm_utils.py

* Update vllm_lora_worker_manager.py

* Update vllm_lora_worker_manager.py

* Update vllm_lora_worker_manager.py

* Update temporary_patches.py

* Update temporary_patches.py

* Update temporary_patches.py

* Update temporary_patches.py

* Update loss_utils.py

* Update loss_utils.py

* Update loss_utils.py

* Update loss_utils.py

* Update loss_utils.py

* Update __init__.py

* fix: AsyncLLMEngine bugs (unslothai#82)

* fixed a typo in L119, removing unnecessary len() (unslothai#84)

Co-authored-by: Xiaochen Zhu <xz479@cl.cam.ac.uk>

* Fix gradient checkpointing warning filter implementation

* Input grads fix for gemma3 (unslothai#96)

* gemma require gradients fix

* Update peft_utils.py

---------

Co-authored-by: Daniel Han <danielhanchen@gmail.com>

* Update vision_utils.py

* Vision requires grad

* Check SDPA for Mistral / Pixtral

* Update compiler.py

* Update vision_utils.py

* Update vision_utils.py

* Update vision_utils.py

* Update __init__.py

* Update vision_utils.py

* Update vision_utils.py

* Update vision_utils.py

* Update vision_utils.py

* Update vision_utils.py

* Update vision_utils.py

* Update vision_utils.py

* Update vision_utils.py

* Update vision_utils.py

* Update vision_utils.py

* Update vision_utils.py

* Update vision_utils.py

* Update vision_utils.py

* Update vision_utils.py

* Update vllm_utils.py (unslothai#99)

Fix bugs in generate_batches.py. Original output = [] will result in duplication of results.

* Update vision_utils.py

* Fixes to support IterableDataset (unslothai#98)

* Support Iterable Datasets

* Update dataset_utils.py

* Update dataset_utils.py

* Update dataset_utils.py

* Update dataset_utils.py

* Preserve batch size from iterable dataset

* Preserve batch size from iterable dataset

* Support train_on_response_only with IterableDataset

* Support train_on_response_only with IterableDataset

* Support train_on_response_only with IterableDataset

* Support train_on_response_only with IterableDataset

* Update vllm_utils.py

* Create vllm_rlhf_utils.py

* Update vllm_rlhf_utils.py

* Update vllm_rlhf_utils.py

* Update vllm_rlhf_utils.py

* Update vllm_rlhf_utils.py

* Update vllm_rlhf_utils.py

* Update vllm_rlhf_utils.py

* Update vllm_rlhf_utils.py

* Update vllm_rlhf_utils.py

* Update vllm_rlhf_utils.py

* vLLM for Qwen 3

* Update vllm_utils.py

* Update vllm_utils.py

* Update vllm_utils.py

* Update vllm_utils.py

* Update vllm_utils.py

* Update vllm_utils.py

* Update vllm_utils.py

* Update vllm_utils.py

* Update vllm_utils.py

* Update vllm_utils.py

* Update vllm_utils.py

* Update vllm_utils.py

* Update vllm_utils.py

* Update vllm_utils.py

* Update vllm_utils.py

* Update vllm_utils.py

* Update vllm_utils.py

* Update vllm_utils.py

* Update vllm_utils.py

* Update vllm_utils.py

* Update vllm_utils.py

* Update vllm_utils.py

* Update vllm_utils.py

* Update vllm_utils.py

* Update vllm_utils.py

* Update vllm_utils.py

* Update vllm_utils.py

* Update vllm_utils.py

* Update vllm_utils.py

* Update vllm_utils.py

* Update vllm_utils.py

* Update vllm_utils.py

* Update compiler.py

* Update vllm_utils.py

* Update vllm_utils.py

* Update vllm_utils.py

* Update vllm_utils.py

* Update vllm_utils.py

* Update vllm_utils.py

* Update vllm_utils.py

* Update vllm_utils.py

* Update vllm_utils.py

* Update vllm_utils.py

* Update vllm_utils.py

* Update vllm_utils.py

* Update vllm_utils.py

* Update vllm_utils.py

* Update rl_replacements.py

* Update rl_replacements.py

* Update rl_replacements.py

* Update rl_replacements.py

* Swap space reduce

* Update vllm_utils.py

* Update vllm_utils.py

* Update rl_replacements.py

* Update vllm_utils.py

* Update vllm_utils.py

* Update vllm_utils.py

* Update vllm_utils.py

* Update __init__.py

---------

Co-authored-by: Mukkesh Ganesh <mukmckenzie@gmail.com>
Co-authored-by: Edd <68678137+Erland366@users.noreply.github.com>
Co-authored-by: Brad Hilton <brad.hilton.nw@gmail.com>
Co-authored-by: SpaceHunter <30568250+SpaceHunterInf@users.noreply.github.com>
Co-authored-by: Xiaochen Zhu <xz479@cl.cam.ac.uk>
Co-authored-by: Roland Tannous <rolandtannous@gonovel.co>
Co-authored-by: DoubleMathew <mmathew23@gmail.com>
Co-authored-by: Michael Han <107991372+shimmyshimmer@users.noreply.github.com>
Co-authored-by: Qian Wu <121997440+5k5000@users.noreply.github.com>
Co-authored-by: marcandrelarochelle <marcandrelarochelle1820@gmail.com>
mmathew23 added a commit to mmathew23/unsloth that referenced this pull request Jun 25, 2025
* Update compiler.py

* Update patching_utils.py

* Update temporary_patches.py

* Update compiler.py

* Update compiler.py

* Update temporary_patches.py

* Update temporary_patches.py

* Update temporary_patches.py

* Update temporary_patches.py

* Update temporary_patches.py

* Update temporary_patches.py

* Update temporary_patches.py

* Update temporary_patches.py

* Update compiler.py

* Update compiler.py

* Update compiler.py

* Update compiler.py

* Update compiler.py

* Update compiler.py

* Update compiler.py

* Update compiler.py

* Update compiler.py

* Update compiler.py

* Update compiler.py

* Update compiler.py

* Update compiler.py

* Update compiler.py

* Update compiler.py

* Update compiler.py

* Update compiler.py

* Update compiler.py

* Update compiler.py

* compiler

* Update gradient_checkpointing.py

* Update temporary_patches.py

* Update temporary_patches.py

* Update temporary_patches.py

* Update temporary_patches.py

* Update temporary_patches.py

* Update temporary_patches.py

* Update temporary_patches.py

* Update temporary_patches.py

* Update temporary_patches.py

* Update temporary_patches.py

* Update temporary_patches.py

* Update temporary_patches.py

* Update temporary_patches.py

* Update temporary_patches.py

* Update temporary_patches.py

* Update temporary_patches.py

* Update temporary_patches.py

* Update temporary_patches.py

* Update temporary_patches.py

* causal mask dtype

* Fix checkpoint and save from local file (unslothai#74)

* Enhance gradient checkpointing and add original model ID retrieval in saving utilities

* In case adapter_config.json as well

* Update patching_utils.py

* Update patching_utils.py

* Update temporary_patches.py

* Update temporary_patches.py

* Update compiler.py

* Update loss_utils.py

* Update compiler.py

* Update vllm_utils.py

* Update compiler.py

* Update peft_utils.py

* Update rl_replacements.py

* Update vllm_utils.py

* Update temporary_patches.py

* Update temporary_patches.py

* Update temporary_patches.py

* Update temporary_patches.py

* Update temporary_patches.py

* Update temporary_patches.py

* Update temporary_patches.py

* Update temporary_patches.py

* Update temporary_patches.py

* Update temporary_patches.py

* Update temporary_patches.py

* Update temporary_patches.py

* Update temporary_patches.py

* Update temporary_patches.py

* Update compiler.py

* Update vllm_lora_worker_manager.py

* Update utils.py

* Update temporary_patches.py

* Update temporary_patches.py

* Update temporary_patches.py

* Update temporary_patches.py

* Update temporary_patches.py

* Update temporary_patches.py

* Update temporary_patches.py

* Update temporary_patches.py

* Update temporary_patches.py

* Update temporary_patches.py

* Update vllm_utils.py

* Update vllm_utils.py

* Update vllm_utils.py

* Update vllm_utils.py

* Update dataset_utils.py

* bidirectional attention

* Update vllm_utils.py

* Update __init__.py

* Update temporary_patches.py

* Update temporary_patches.py

* Update temporary_patches.py

* Update vllm_utils.py

* Update vllm_utils.py

* Update vllm_utils.py

* Update vllm_utils.py

* Update vllm_utils.py

* Update vllm_utils.py

* Update vllm_lora_worker_manager.py

* Update vllm_lora_worker_manager.py

* Update vllm_lora_worker_manager.py

* Update temporary_patches.py

* Update temporary_patches.py

* Update temporary_patches.py

* Update temporary_patches.py

* Update loss_utils.py

* Update loss_utils.py

* Update loss_utils.py

* Update loss_utils.py

* Update loss_utils.py

* Update __init__.py

* fix: AsyncLLMEngine bugs (unslothai#82)

* fixed a typo in L119, removing unnecessary len() (unslothai#84)

Co-authored-by: Xiaochen Zhu <xz479@cl.cam.ac.uk>

* Fix gradient checkpointing warning filter implementation

* Input grads fix for gemma3 (unslothai#96)

* gemma require gradients fix

* Update peft_utils.py

---------

Co-authored-by: Daniel Han <danielhanchen@gmail.com>

* Update vision_utils.py

* Vision requires grad

* Check SDPA for Mistral / Pixtral

* Update compiler.py

* Update vision_utils.py

* Update vision_utils.py

* Update vision_utils.py

* Update __init__.py

* Update vision_utils.py

* Update vision_utils.py

* Update vision_utils.py

* Update vision_utils.py

* Update vision_utils.py

* Update vision_utils.py

* Update vision_utils.py

* Update vision_utils.py

* Update vision_utils.py

* Update vision_utils.py

* Update vision_utils.py

* Update vision_utils.py

* Update vision_utils.py

* Update vision_utils.py

* Update vllm_utils.py (unslothai#99)

Fix bugs in generate_batches.py. Original output = [] will result in duplication of results.

* Update vision_utils.py

* Fixes to support IterableDataset (unslothai#98)

* Support Iterable Datasets

* Update dataset_utils.py

* Update dataset_utils.py

* Update dataset_utils.py

* Update dataset_utils.py

* Preserve batch size from iterable dataset

* Preserve batch size from iterable dataset

* Support train_on_response_only with IterableDataset

* Support train_on_response_only with IterableDataset

* Support train_on_response_only with IterableDataset

* Support train_on_response_only with IterableDataset

* Update vllm_utils.py

* Create vllm_rlhf_utils.py

* Update vllm_rlhf_utils.py

* Update vllm_rlhf_utils.py

* Update vllm_rlhf_utils.py

* Update vllm_rlhf_utils.py

* Update vllm_rlhf_utils.py

* Update vllm_rlhf_utils.py

* Update vllm_rlhf_utils.py

* Update vllm_rlhf_utils.py

* Update vllm_rlhf_utils.py

* vLLM for Qwen 3

* Update vllm_utils.py

* Update vllm_utils.py

* Update vllm_utils.py

* Update vllm_utils.py

* Update vllm_utils.py

* Update vllm_utils.py

* Update vllm_utils.py

* Update vllm_utils.py

* Update vllm_utils.py

* Update vllm_utils.py

* Update vllm_utils.py

* Update vllm_utils.py

* Update vllm_utils.py

* Update vllm_utils.py

* Update vllm_utils.py

* Update vllm_utils.py

* Update vllm_utils.py

* Update vllm_utils.py

* Update vllm_utils.py

* Update vllm_utils.py

* Update vllm_utils.py

* Update vllm_utils.py

* Update vllm_utils.py

* Update vllm_utils.py

* Update vllm_utils.py

* Update vllm_utils.py

* Update vllm_utils.py

* Update vllm_utils.py

* Update vllm_utils.py

* Update vllm_utils.py

* Update vllm_utils.py

* Update vllm_utils.py

* Update compiler.py

* Update vllm_utils.py

* Update vllm_utils.py

* Update vllm_utils.py

* Update vllm_utils.py

* Update vllm_utils.py

* Update vllm_utils.py

* Update vllm_utils.py

* Update vllm_utils.py

* Update vllm_utils.py

* Update vllm_utils.py

* Update vllm_utils.py

* Update vllm_utils.py

* Update vllm_utils.py

* Update vllm_utils.py

* Update rl_replacements.py

* Update rl_replacements.py

* Update rl_replacements.py

* Update rl_replacements.py

* Swap space reduce

* Update vllm_utils.py

* Update vllm_utils.py

* Update rl_replacements.py

* Update vllm_utils.py

* Update vllm_utils.py

* Update vllm_utils.py

* Update vllm_utils.py

* Update __init__.py

* Update rl_replacements.py

* Update vllm_utils.py

* Update vllm_utils.py

* Update vllm_utils.py

* Update vllm_utils.py

* Update rl_replacements.py

* Update vllm_utils.py

* Update rl_replacements.py

* Revert "Update rl_replacements.py"

This reverts commit c0a4022.

* Update __init__.py

---------

Co-authored-by: Edd <68678137+Erland366@users.noreply.github.com>
Co-authored-by: Brad Hilton <brad.hilton.nw@gmail.com>
Co-authored-by: SpaceHunter <30568250+SpaceHunterInf@users.noreply.github.com>
Co-authored-by: Xiaochen Zhu <xz479@cl.cam.ac.uk>
Co-authored-by: Roland Tannous <rolandtannous@gonovel.co>
Co-authored-by: DoubleMathew <mmathew23@gmail.com>
Co-authored-by: Michael Han <107991372+shimmyshimmer@users.noreply.github.com>
Co-authored-by: Qian Wu <121997440+5k5000@users.noreply.github.com>
Co-authored-by: marcandrelarochelle <marcandrelarochelle1820@gmail.com>
mmathew23 added a commit to mmathew23/unsloth that referenced this pull request Jun 25, 2025
* Update temporary_patches.py

* Update temporary_patches.py

* Update temporary_patches.py

* Update temporary_patches.py

* Update temporary_patches.py

* Update temporary_patches.py

* Update temporary_patches.py

* Update temporary_patches.py

* Update temporary_patches.py

* Update temporary_patches.py

* Update temporary_patches.py

* Update temporary_patches.py

* Update temporary_patches.py

* Update temporary_patches.py

* Update temporary_patches.py

* Update temporary_patches.py

* Update temporary_patches.py

* causal mask dtype

* Fix checkpoint and save from local file (unslothai#74)

* Enhance gradient checkpointing and add original model ID retrieval in saving utilities

* In case adapter_config.json as well

* Update patching_utils.py

* Update patching_utils.py

* Update temporary_patches.py

* Update temporary_patches.py

* Update compiler.py

* Update loss_utils.py

* Update compiler.py

* Update vllm_utils.py

* Update compiler.py

* Update peft_utils.py

* Update rl_replacements.py

* Update vllm_utils.py

* Update temporary_patches.py

* Update temporary_patches.py

* Update temporary_patches.py

* Update temporary_patches.py

* Update temporary_patches.py

* Update temporary_patches.py

* Update temporary_patches.py

* Update temporary_patches.py

* Update temporary_patches.py

* Update temporary_patches.py

* Update temporary_patches.py

* Update temporary_patches.py

* Update temporary_patches.py

* Update temporary_patches.py

* Update compiler.py

* Update vllm_lora_worker_manager.py

* Update utils.py

* Update temporary_patches.py

* Update temporary_patches.py

* Update temporary_patches.py

* Update temporary_patches.py

* Update temporary_patches.py

* Update temporary_patches.py

* Update temporary_patches.py

* Update temporary_patches.py

* Update temporary_patches.py

* Update temporary_patches.py

* Update vllm_utils.py

* Update vllm_utils.py

* Update vllm_utils.py

* Update vllm_utils.py

* Update dataset_utils.py

* bidirectional attention

* Update vllm_utils.py

* Update __init__.py

* Update temporary_patches.py

* Update temporary_patches.py

* Update temporary_patches.py

* Update vllm_utils.py

* Update vllm_utils.py

* Update vllm_utils.py

* Update vllm_utils.py

* Update vllm_utils.py

* Update vllm_utils.py

* Update vllm_lora_worker_manager.py

* Update vllm_lora_worker_manager.py

* Update vllm_lora_worker_manager.py

* Update temporary_patches.py

* Update temporary_patches.py

* Update temporary_patches.py

* Update temporary_patches.py

* Update loss_utils.py

* Update loss_utils.py

* Update loss_utils.py

* Update loss_utils.py

* Update loss_utils.py

* Update __init__.py

* fix: AsyncLLMEngine bugs (unslothai#82)

* fixed a typo in L119, removing unnecessary len() (unslothai#84)

Co-authored-by: Xiaochen Zhu <xz479@cl.cam.ac.uk>

* Fix gradient checkpointing warning filter implementation

* Input grads fix for gemma3 (unslothai#96)

* gemma require gradients fix

* Update peft_utils.py

---------

Co-authored-by: Daniel Han <danielhanchen@gmail.com>

* Update vision_utils.py

* Vision requires grad

* Check SDPA for Mistral / Pixtral

* Update compiler.py

* Update vision_utils.py

* Update vision_utils.py

* Update vision_utils.py

* Update __init__.py

* Update vision_utils.py

* Update vision_utils.py

* Update vision_utils.py

* Update vision_utils.py

* Update vision_utils.py

* Update vision_utils.py

* Update vision_utils.py

* Update vision_utils.py

* Update vision_utils.py

* Update vision_utils.py

* Update vision_utils.py

* Update vision_utils.py

* Update vision_utils.py

* Update vision_utils.py

* Update vllm_utils.py (unslothai#99)

Fix bugs in generate_batches.py. Original output = [] will result in duplication of results.

* Update vision_utils.py

* Fixes to support IterableDataset (unslothai#98)

* Support Iterable Datasets

* Update dataset_utils.py

* Update dataset_utils.py

* Update dataset_utils.py

* Update dataset_utils.py

* Preserve batch size from iterable dataset

* Preserve batch size from iterable dataset

* Support train_on_response_only with IterableDataset

* Support train_on_response_only with IterableDataset

* Support train_on_response_only with IterableDataset

* Support train_on_response_only with IterableDataset

* Update vllm_utils.py

* Create vllm_rlhf_utils.py

* Update vllm_rlhf_utils.py

* Update vllm_rlhf_utils.py

* Update vllm_rlhf_utils.py

* Update vllm_rlhf_utils.py

* Update vllm_rlhf_utils.py

* Update vllm_rlhf_utils.py

* Update vllm_rlhf_utils.py

* Update vllm_rlhf_utils.py

* Update vllm_rlhf_utils.py

* vLLM for Qwen 3

* Update vllm_utils.py

* Update vllm_utils.py

* Update vllm_utils.py

* Update vllm_utils.py

* Update vllm_utils.py

* Update vllm_utils.py

* Update vllm_utils.py

* Update vllm_utils.py

* Update vllm_utils.py

* Update vllm_utils.py

* Update vllm_utils.py

* Update vllm_utils.py

* Update vllm_utils.py

* Update vllm_utils.py

* Update vllm_utils.py

* Update vllm_utils.py

* Update vllm_utils.py

* Update vllm_utils.py

* Update vllm_utils.py

* Update vllm_utils.py

* Update vllm_utils.py

* Update vllm_utils.py

* Update vllm_utils.py

* Update vllm_utils.py

* Update vllm_utils.py

* Update vllm_utils.py

* Update vllm_utils.py

* Update vllm_utils.py

* Update vllm_utils.py

* Update vllm_utils.py

* Update vllm_utils.py

* Update vllm_utils.py

* Update compiler.py

* Update vllm_utils.py

* Update vllm_utils.py

* Update vllm_utils.py

* Update vllm_utils.py

* Update vllm_utils.py

* Update vllm_utils.py

* Update vllm_utils.py

* Update vllm_utils.py

* Update vllm_utils.py

* Update vllm_utils.py

* Update vllm_utils.py

* Update vllm_utils.py

* Update vllm_utils.py

* Update vllm_utils.py

* Update rl_replacements.py

* Update rl_replacements.py

* Update rl_replacements.py

* Update rl_replacements.py

* Swap space reduce

* Update vllm_utils.py

* Update vllm_utils.py

* Update rl_replacements.py

* Update vllm_utils.py

* Update vllm_utils.py

* Update vllm_utils.py

* Update vllm_utils.py

* Update __init__.py

* Update rl_replacements.py

* Update vllm_utils.py

* Update vllm_utils.py

* Update vllm_utils.py

* Update vllm_utils.py

* Update rl_replacements.py

* Update vllm_utils.py

* Update rl_replacements.py

* Revert "Update rl_replacements.py"

This reverts commit c0a4022.

* Update __init__.py

* Update patching_utils.py

* Update compiler.py

* Update compiler.py

* Update compiler.py

* Update compiler.py

* Update compiler.py

* Update compiler.py

* Update compiler.py

* Update compiler.py

* Fixes

* Update temporary_patches.py

* Update temporary_patches.py

* Update compiler.py

* Update compiler.py

* Update compiler.py

* Update compiler.py

* Update compiler.py

* Update compiler.py

* Update compiler.py

* Update temporary_patches.py

* Update temporary_patches.py

* Update temporary_patches.py

* Update compiler.py

* revert

* Update temporary_patches.py

* Update temporary_patches.py

* Update temporary_patches.py

* Update temporary_patches.py

* Update temporary_patches.py

* Update temporary_patches.py

* Update __init__.py

* Update compiler.py

* Update temporary_patches.py

* Update compiler.py

* Update temporary_patches.py

---------

Co-authored-by: Edd <68678137+Erland366@users.noreply.github.com>
Co-authored-by: Brad Hilton <brad.hilton.nw@gmail.com>
Co-authored-by: SpaceHunter <30568250+SpaceHunterInf@users.noreply.github.com>
Co-authored-by: Xiaochen Zhu <xz479@cl.cam.ac.uk>
Co-authored-by: Roland Tannous <rolandtannous@gonovel.co>
Co-authored-by: DoubleMathew <mmathew23@gmail.com>
Co-authored-by: Michael Han <107991372+shimmyshimmer@users.noreply.github.com>
Co-authored-by: Qian Wu <121997440+5k5000@users.noreply.github.com>
Co-authored-by: marcandrelarochelle <marcandrelarochelle1820@gmail.com>
mmathew23 added a commit to mmathew23/unsloth that referenced this pull request Jul 7, 2025
* gemma require gradients fix

* Update peft_utils.py

---------

Co-authored-by: Daniel Han <danielhanchen@gmail.com>
mmathew23 added a commit to mmathew23/unsloth that referenced this pull request Jul 7, 2025
* Update dataset_utils.py

* Update dataset_utils.py

* Update compiler.py

* Update compiler.py

* Update compiler.py

* Update compiler.py

* Update loss_utils.py

* Update loss_utils.py

* gpu_memory_utilization

* Update temporary_patches.py

* Update vision_utils.py

* Update vision_utils.py

* Update vision_utils.py

* Update vision_utils.py

* Update vision_utils.py

* Update vision_utils.py

* Update vision_utils.py

* Update vision_utils.py

* train on completions VLMs

* Update dataset_utils.py

* Update dataset_utils.py

* Update dataset_utils.py

* Update dataset_utils.py

* VLM train only on completions

* Update loss_utils.py

* Update dataset_utils.py

* Update compiler.py

* Update compiler.py

* Update compiler.py

* Update compiler.py

* Update compiler.py

* Update compiler.py

* Update compiler.py

* Update saving_utils.py

* Update llama_cpp.py

* Update llama_cpp.py

* Update saving_utils.py

* Update saving_utils.py

* Update __init__.py

* Update compiler.py

* Update loss_utils.py

* Update compiler.py

* Update loss_utils.py

* Update loss_utils.py

* Update llama_cpp.py

* Update loss_utils.py

* Update compiler.py

* Update llama_cpp.py

* Update compiler.py

* Update vllm_utils.py

* Update rl_replacements.py

* Update rl_replacements.py

* Update rl_replacements.py

* Update rl_replacements.py

* Update rl_replacements.py

* Update rl_replacements.py

* Update training_utils.py

* Update dataset_utils.py

* Update dataset_utils.py

* Revert "Update dataset_utils.py"

This reverts commit 7da25fe.

* Update temporary_patches.py

* Update temporary_patches.py

* Update temporary_patches.py

* Update temporary_patches.py

* Update temporary_patches.py

* Update temporary_patches.py

* Update compiler.py

* Update compiler.py

* Remove prints

* Update compiler.py

* Update saving_utils.py

* Update temporary_patches.py

* Update __init__.py

* Update pyproject.toml

* Update vllm_utils.py

* bug fix unslothai#2008 unsloth issue - load_in_4bit = True + fast_inference = True (unslothai#79)

* bug fix unslothai#2008 unsloth

* non-quant dtype fix

* Update vllm_utils.py

---------

Co-authored-by: Daniel Han <danielhanchen@gmail.com>

* Update dataset_utils.py

* Update compiler.py

* Update temporary_patches.py

* Gemma 3 fixes

* Update temporary_patches.py

* Update compiler.py

* Update compiler.py

* Gemma 3 fixes

* Update patching_utils.py

* Update compiler.py

* Update compiler.py

* Update patching_utils.py

* Update temporary_patches.py

* Update compiler.py

* Update compiler.py

* Update temporary_patches.py

* Update temporary_patches.py

* Update temporary_patches.py

* Update temporary_patches.py

* Update temporary_patches.py

* Update temporary_patches.py

* Update temporary_patches.py

* Update temporary_patches.py

* Update compiler.py

* Update compiler.py

* Update compiler.py

* Update compiler.py

* Update compiler.py

* Update compiler.py

* Update compiler.py

* Update compiler.py

* Update compiler.py

* Update compiler.py

* Update compiler.py

* Update compiler.py

* Update compiler.py

* Update compiler.py

* Update compiler.py

* Update compiler.py

* Update compiler.py

* Update compiler.py

* Update compiler.py

* compiler

* Update gradient_checkpointing.py

* Update temporary_patches.py

* Update temporary_patches.py

* Update temporary_patches.py

* Update temporary_patches.py

* Update temporary_patches.py

* Update temporary_patches.py

* Update temporary_patches.py

* Update temporary_patches.py

* Update temporary_patches.py

* Update temporary_patches.py

* Update temporary_patches.py

* Update temporary_patches.py

* Update temporary_patches.py

* Update temporary_patches.py

* Update temporary_patches.py

* Update temporary_patches.py

* Update temporary_patches.py

* Update temporary_patches.py

* Update temporary_patches.py

* causal mask dtype

* Fix checkpoint and save from local file (unslothai#74)

* Enhance gradient checkpointing and add original model ID retrieval in saving utilities

* In case adapter_config.json as well

* Update patching_utils.py

* Update patching_utils.py

* Update temporary_patches.py

* Update temporary_patches.py

* Update compiler.py

* Update loss_utils.py

* Update compiler.py

* Update vllm_utils.py

* Update compiler.py

* Update peft_utils.py

* Update rl_replacements.py

* Update vllm_utils.py

* Update temporary_patches.py

* Update temporary_patches.py

* Update temporary_patches.py

* Update temporary_patches.py

* Update temporary_patches.py

* Update temporary_patches.py

* Update temporary_patches.py

* Update temporary_patches.py

* Update temporary_patches.py

* Update temporary_patches.py

* Update temporary_patches.py

* Update temporary_patches.py

* Update temporary_patches.py

* Update temporary_patches.py

* Update compiler.py

* Update vllm_lora_worker_manager.py

* Update utils.py

* Update temporary_patches.py

* Update temporary_patches.py

* Update temporary_patches.py

* Update temporary_patches.py

* Update temporary_patches.py

* Update temporary_patches.py

* Update temporary_patches.py

* Update temporary_patches.py

* Update temporary_patches.py

* Update temporary_patches.py

* Update vllm_utils.py

* Update vllm_utils.py

* Update vllm_utils.py

* Update vllm_utils.py

* Update dataset_utils.py

* bidirectional attention

* Update vllm_utils.py

* Update __init__.py

* Update temporary_patches.py

* Update temporary_patches.py

* Update temporary_patches.py

* Update vllm_utils.py

* Update vllm_utils.py

* Update vllm_utils.py

* Update vllm_utils.py

* Update vllm_utils.py

* Update vllm_utils.py

* Update vllm_lora_worker_manager.py

* Update vllm_lora_worker_manager.py

* Update vllm_lora_worker_manager.py

* Update temporary_patches.py

* Update temporary_patches.py

* Update temporary_patches.py

* Update temporary_patches.py

* Update loss_utils.py

* Update loss_utils.py

* Update loss_utils.py

* Update loss_utils.py

* Update loss_utils.py

* Update __init__.py

* fix: AsyncLLMEngine bugs (unslothai#82)

* fixed a typo in L119, removing unnecessary len() (unslothai#84)

Co-authored-by: Xiaochen Zhu <xz479@cl.cam.ac.uk>

* Fix gradient checkpointing warning filter implementation

* Input grads fix for gemma3 (unslothai#96)

* gemma require gradients fix

* Update peft_utils.py

---------

Co-authored-by: Daniel Han <danielhanchen@gmail.com>

* Update vision_utils.py

* Vision requires grad

* Check SDPA for Mistral / Pixtral

* Update compiler.py

* Update vision_utils.py

* Update vision_utils.py

* Update vision_utils.py

* Update __init__.py

* Update vision_utils.py

* Update vision_utils.py

* Update vision_utils.py

* Update vision_utils.py

* Update vision_utils.py

* Update vision_utils.py

* Update vision_utils.py

* Update vision_utils.py

* Update vision_utils.py

* Update vision_utils.py

* Update vision_utils.py

* Update vision_utils.py

* Update vision_utils.py

* Update vision_utils.py

* Update vllm_utils.py (unslothai#99)

Fix bugs in generate_batches.py. Original output = [] will result in duplication of results.

* Update vision_utils.py

* Fixes to support IterableDataset (unslothai#98)

* Support Iterable Datasets

* Update dataset_utils.py

* Update dataset_utils.py

* Update dataset_utils.py

* Update dataset_utils.py

* Preserve batch size from iterable dataset

* Preserve batch size from iterable dataset

* Support train_on_response_only with IterableDataset

* Support train_on_response_only with IterableDataset

* Support train_on_response_only with IterableDataset

* Support train_on_response_only with IterableDataset

---------

Co-authored-by: Mukkesh Ganesh <mukmckenzie@gmail.com>
Co-authored-by: Edd <68678137+Erland366@users.noreply.github.com>
Co-authored-by: Brad Hilton <brad.hilton.nw@gmail.com>
Co-authored-by: SpaceHunter <30568250+SpaceHunterInf@users.noreply.github.com>
Co-authored-by: Xiaochen Zhu <xz479@cl.cam.ac.uk>
Co-authored-by: Roland Tannous <rolandtannous@gonovel.co>
Co-authored-by: DoubleMathew <mmathew23@gmail.com>
Co-authored-by: Michael Han <107991372+shimmyshimmer@users.noreply.github.com>
Co-authored-by: Qian Wu <121997440+5k5000@users.noreply.github.com>
Co-authored-by: marcandrelarochelle <marcandrelarochelle1820@gmail.com>
mmathew23 added a commit to mmathew23/unsloth that referenced this pull request Jul 7, 2025
* Update vision_utils.py

* Update vision_utils.py

* train on completions VLMs

* Update dataset_utils.py

* Update dataset_utils.py

* Update dataset_utils.py

* Update dataset_utils.py

* VLM train only on completions

* Update loss_utils.py

* Update dataset_utils.py

* Update compiler.py

* Update compiler.py

* Update compiler.py

* Update compiler.py

* Update compiler.py

* Update compiler.py

* Update compiler.py

* Update saving_utils.py

* Update llama_cpp.py

* Update llama_cpp.py

* Update saving_utils.py

* Update saving_utils.py

* Update __init__.py

* Update compiler.py

* Update loss_utils.py

* Update compiler.py

* Update loss_utils.py

* Update loss_utils.py

* Update llama_cpp.py

* Update loss_utils.py

* Update compiler.py

* Update llama_cpp.py

* Update compiler.py

* Update vllm_utils.py

* Update rl_replacements.py

* Update rl_replacements.py

* Update rl_replacements.py

* Update rl_replacements.py

* Update rl_replacements.py

* Update rl_replacements.py

* Update training_utils.py

* Update dataset_utils.py

* Update dataset_utils.py

* Revert "Update dataset_utils.py"

This reverts commit 7da25fe.

* Update temporary_patches.py

* Update temporary_patches.py

* Update temporary_patches.py

* Update temporary_patches.py

* Update temporary_patches.py

* Update temporary_patches.py

* Update compiler.py

* Update compiler.py

* Remove prints

* Update compiler.py

* Update saving_utils.py

* Update temporary_patches.py

* Update __init__.py

* Update pyproject.toml

* Update vllm_utils.py

* bug fix unslothai#2008 unsloth issue - load_in_4bit = True + fast_inference = True (unslothai#79)

* bug fix unslothai#2008 unsloth

* non-quant dtype fix

* Update vllm_utils.py

---------

Co-authored-by: Daniel Han <danielhanchen@gmail.com>

* Update dataset_utils.py

* Update compiler.py

* Update temporary_patches.py

* Gemma 3 fixes

* Update temporary_patches.py

* Update compiler.py

* Update compiler.py

* Gemma 3 fixes

* Update patching_utils.py

* Update compiler.py

* Update compiler.py

* Update patching_utils.py

* Update temporary_patches.py

* Update compiler.py

* Update compiler.py

* Update temporary_patches.py

* Update temporary_patches.py

* Update temporary_patches.py

* Update temporary_patches.py

* Update temporary_patches.py

* Update temporary_patches.py

* Update temporary_patches.py

* Update temporary_patches.py

* Update compiler.py

* Update compiler.py

* Update compiler.py

* Update compiler.py

* Update compiler.py

* Update compiler.py

* Update compiler.py

* Update compiler.py

* Update compiler.py

* Update compiler.py

* Update compiler.py

* Update compiler.py

* Update compiler.py

* Update compiler.py

* Update compiler.py

* Update compiler.py

* Update compiler.py

* Update compiler.py

* Update compiler.py

* compiler

* Update gradient_checkpointing.py

* Update temporary_patches.py

* Update temporary_patches.py

* Update temporary_patches.py

* Update temporary_patches.py

* Update temporary_patches.py

* Update temporary_patches.py

* Update temporary_patches.py

* Update temporary_patches.py

* Update temporary_patches.py

* Update temporary_patches.py

* Update temporary_patches.py

* Update temporary_patches.py

* Update temporary_patches.py

* Update temporary_patches.py

* Update temporary_patches.py

* Update temporary_patches.py

* Update temporary_patches.py

* Update temporary_patches.py

* Update temporary_patches.py

* causal mask dtype

* Fix checkpoint and save from local file (unslothai#74)

* Enhance gradient checkpointing and add original model ID retrieval in saving utilities

* In case adapter_config.json as well

* Update patching_utils.py

* Update patching_utils.py

* Update temporary_patches.py

* Update temporary_patches.py

* Update compiler.py

* Update loss_utils.py

* Update compiler.py

* Update vllm_utils.py

* Update compiler.py

* Update peft_utils.py

* Update rl_replacements.py

* Update vllm_utils.py

* Update temporary_patches.py

* Update temporary_patches.py

* Update temporary_patches.py

* Update temporary_patches.py

* Update temporary_patches.py

* Update temporary_patches.py

* Update temporary_patches.py

* Update temporary_patches.py

* Update temporary_patches.py

* Update temporary_patches.py

* Update temporary_patches.py

* Update temporary_patches.py

* Update temporary_patches.py

* Update temporary_patches.py

* Update compiler.py

* Update vllm_lora_worker_manager.py

* Update utils.py

* Update temporary_patches.py

* Update temporary_patches.py

* Update temporary_patches.py

* Update temporary_patches.py

* Update temporary_patches.py

* Update temporary_patches.py

* Update temporary_patches.py

* Update temporary_patches.py

* Update temporary_patches.py

* Update temporary_patches.py

* Update vllm_utils.py

* Update vllm_utils.py

* Update vllm_utils.py

* Update vllm_utils.py

* Update dataset_utils.py

* bidirectional attention

* Update vllm_utils.py

* Update __init__.py

* Update temporary_patches.py

* Update temporary_patches.py

* Update temporary_patches.py

* Update vllm_utils.py

* Update vllm_utils.py

* Update vllm_utils.py

* Update vllm_utils.py

* Update vllm_utils.py

* Update vllm_utils.py

* Update vllm_lora_worker_manager.py

* Update vllm_lora_worker_manager.py

* Update vllm_lora_worker_manager.py

* Update temporary_patches.py

* Update temporary_patches.py

* Update temporary_patches.py

* Update temporary_patches.py

* Update loss_utils.py

* Update loss_utils.py

* Update loss_utils.py

* Update loss_utils.py

* Update loss_utils.py

* Update __init__.py

* fix: AsyncLLMEngine bugs (unslothai#82)

* fixed a typo in L119, removing unnecessary len() (unslothai#84)

Co-authored-by: Xiaochen Zhu <xz479@cl.cam.ac.uk>

* Fix gradient checkpointing warning filter implementation

* Input grads fix for gemma3 (unslothai#96)

* gemma require gradients fix

* Update peft_utils.py

---------

Co-authored-by: Daniel Han <danielhanchen@gmail.com>

* Update vision_utils.py

* Vision requires grad

* Check SDPA for Mistral / Pixtral

* Update compiler.py

* Update vision_utils.py

* Update vision_utils.py

* Update vision_utils.py

* Update __init__.py

* Update vision_utils.py

* Update vision_utils.py

* Update vision_utils.py

* Update vision_utils.py

* Update vision_utils.py

* Update vision_utils.py

* Update vision_utils.py

* Update vision_utils.py

* Update vision_utils.py

* Update vision_utils.py

* Update vision_utils.py

* Update vision_utils.py

* Update vision_utils.py

* Update vision_utils.py

* Update vllm_utils.py (unslothai#99)

Fix bugs in generate_batches.py. Original output = [] will result in duplication of results.

* Update vision_utils.py

* Fixes to support IterableDataset (unslothai#98)

* Support Iterable Datasets

* Update dataset_utils.py

* Update dataset_utils.py

* Update dataset_utils.py

* Update dataset_utils.py

* Preserve batch size from iterable dataset

* Preserve batch size from iterable dataset

* Support train_on_response_only with IterableDataset

* Support train_on_response_only with IterableDataset

* Support train_on_response_only with IterableDataset

* Support train_on_response_only with IterableDataset

* Update vllm_utils.py

* Create vllm_rlhf_utils.py

* Update vllm_rlhf_utils.py

* Update vllm_rlhf_utils.py

* Update vllm_rlhf_utils.py

* Update vllm_rlhf_utils.py

* Update vllm_rlhf_utils.py

* Update vllm_rlhf_utils.py

* Update vllm_rlhf_utils.py

* Update vllm_rlhf_utils.py

* Update vllm_rlhf_utils.py

* vLLM for Qwen 3

* Update vllm_utils.py

* Update vllm_utils.py

---------

Co-authored-by: Mukkesh Ganesh <mukmckenzie@gmail.com>
Co-authored-by: Edd <68678137+Erland366@users.noreply.github.com>
Co-authored-by: Brad Hilton <brad.hilton.nw@gmail.com>
Co-authored-by: SpaceHunter <30568250+SpaceHunterInf@users.noreply.github.com>
Co-authored-by: Xiaochen Zhu <xz479@cl.cam.ac.uk>
Co-authored-by: Roland Tannous <rolandtannous@gonovel.co>
Co-authored-by: DoubleMathew <mmathew23@gmail.com>
Co-authored-by: Michael Han <107991372+shimmyshimmer@users.noreply.github.com>
Co-authored-by: Qian Wu <121997440+5k5000@users.noreply.github.com>
Co-authored-by: marcandrelarochelle <marcandrelarochelle1820@gmail.com>

mmathew23 added a commit to mmathew23/unsloth that referenced this pull request Jul 7, 2025
* Update compiler.py

* Update compiler.py

* Update compiler.py

* Update saving_utils.py

* Update llama_cpp.py

* Update llama_cpp.py

* Update saving_utils.py

* Update saving_utils.py

* Update __init__.py

* Update compiler.py

* Update loss_utils.py

* Update compiler.py

* Update loss_utils.py

* Update loss_utils.py

* Update llama_cpp.py

* Update loss_utils.py

* Update compiler.py

* Update llama_cpp.py

* Update compiler.py

* Update vllm_utils.py

* Update rl_replacements.py

* Update rl_replacements.py

* Update rl_replacements.py

* Update rl_replacements.py

* Update rl_replacements.py

* Update rl_replacements.py

* Update training_utils.py

* Update dataset_utils.py

* Update dataset_utils.py

* Revert "Update dataset_utils.py"

This reverts commit 7da25fe.

* Update temporary_patches.py

* Update temporary_patches.py

* Update temporary_patches.py

* Update temporary_patches.py

* Update temporary_patches.py

* Update temporary_patches.py

* Update compiler.py

* Update compiler.py

* Remove prints

* Update compiler.py

* Update saving_utils.py

* Update temporary_patches.py

* Update __init__.py

* Update pyproject.toml

* Update vllm_utils.py

* bug fix unslothai#2008 unsloth issue - load_in_4bit = True + fast_inference = True (unslothai#79)

* bug fix unslothai#2008 unsloth

* non-quant dtype fix

* Update vllm_utils.py

---------

Co-authored-by: Daniel Han <danielhanchen@gmail.com>

* Update dataset_utils.py

* Update compiler.py

* Update temporary_patches.py

* Gemma 3 fixes

* Update temporary_patches.py

* Update compiler.py

* Update compiler.py

* Gemma 3 fixes

* Update patching_utils.py

* Update compiler.py

* Update compiler.py

* Update patching_utils.py

* Update temporary_patches.py

* Update compiler.py

* Update compiler.py

* Update temporary_patches.py

* Update temporary_patches.py

* Update temporary_patches.py

* Update temporary_patches.py

* Update temporary_patches.py

* Update temporary_patches.py

* Update temporary_patches.py

* Update temporary_patches.py

* Update compiler.py

* Update compiler.py

* Update compiler.py

* Update compiler.py

* Update compiler.py

* Update compiler.py

* Update compiler.py

* Update compiler.py

* Update compiler.py

* Update compiler.py

* Update compiler.py

* Update compiler.py

* Update compiler.py

* Update compiler.py

* Update compiler.py

* Update compiler.py

* Update compiler.py

* Update compiler.py

* Update compiler.py

* compiler

* Update gradient_checkpointing.py

* Update temporary_patches.py

* Update temporary_patches.py

* Update temporary_patches.py

* Update temporary_patches.py

* Update temporary_patches.py

* Update temporary_patches.py

* Update temporary_patches.py

* Update temporary_patches.py

* Update temporary_patches.py

* Update temporary_patches.py

* Update temporary_patches.py

* Update temporary_patches.py

* Update temporary_patches.py

* Update temporary_patches.py

* Update temporary_patches.py

* Update temporary_patches.py

* Update temporary_patches.py

* Update temporary_patches.py

* Update temporary_patches.py

* causal mask dtype

* Fix checkpoint and save from local file (unslothai#74)

* Enhance gradient checkpointing and add original model ID retrieval in saving utilities

* In case adapter_config.json as well

* Update patching_utils.py

* Update patching_utils.py

* Update temporary_patches.py

* Update temporary_patches.py

* Update compiler.py

* Update loss_utils.py

* Update compiler.py

* Update vllm_utils.py

* Update compiler.py

* Update peft_utils.py

* Update rl_replacements.py

* Update vllm_utils.py

* Update temporary_patches.py

* Update temporary_patches.py

* Update temporary_patches.py

* Update temporary_patches.py

* Update temporary_patches.py

* Update temporary_patches.py

* Update temporary_patches.py

* Update temporary_patches.py

* Update temporary_patches.py

* Update temporary_patches.py

* Update temporary_patches.py

* Update temporary_patches.py

* Update temporary_patches.py

* Update temporary_patches.py

* Update compiler.py

* Update vllm_lora_worker_manager.py

* Update utils.py

* Update temporary_patches.py

* Update temporary_patches.py

* Update temporary_patches.py

* Update temporary_patches.py

* Update temporary_patches.py

* Update temporary_patches.py

* Update temporary_patches.py

* Update temporary_patches.py

* Update temporary_patches.py

* Update temporary_patches.py

* Update vllm_utils.py

* Update vllm_utils.py

* Update vllm_utils.py

* Update vllm_utils.py

* Update dataset_utils.py

* bidirectional attention

* Update vllm_utils.py

* Update __init__.py

* Update temporary_patches.py

* Update temporary_patches.py

* Update temporary_patches.py

* Update vllm_utils.py

* Update vllm_utils.py

* Update vllm_utils.py

* Update vllm_utils.py

* Update vllm_utils.py

* Update vllm_utils.py

* Update vllm_lora_worker_manager.py

* Update vllm_lora_worker_manager.py

* Update vllm_lora_worker_manager.py

* Update temporary_patches.py

* Update temporary_patches.py

* Update temporary_patches.py

* Update temporary_patches.py

* Update loss_utils.py

* Update loss_utils.py

* Update loss_utils.py

* Update loss_utils.py

* Update loss_utils.py

* Update __init__.py

* fix: AsyncLLMEngine bugs (unslothai#82)

* fixed a typo in L119, removing unnecessary len() (unslothai#84)

Co-authored-by: Xiaochen Zhu <xz479@cl.cam.ac.uk>

* Fix gradient checkpointing warning filter implementation

* Input grads fix for gemma3 (unslothai#96)

* gemma require gradients fix

* Update peft_utils.py

---------

Co-authored-by: Daniel Han <danielhanchen@gmail.com>

* Update vision_utils.py

* Vision requires grad

* Check SDPA for Mistral / Pixtral

* Update compiler.py

* Update vision_utils.py

* Update vision_utils.py

* Update vision_utils.py

* Update __init__.py

* Update vision_utils.py

* Update vision_utils.py

* Update vision_utils.py

* Update vision_utils.py

* Update vision_utils.py

* Update vision_utils.py

* Update vision_utils.py

* Update vision_utils.py

* Update vision_utils.py

* Update vision_utils.py

* Update vision_utils.py

* Update vision_utils.py

* Update vision_utils.py

* Update vision_utils.py

* Update vllm_utils.py (unslothai#99)

Fix bugs in generate_batches.py. Original output = [] will result in duplication of results.

* Update vision_utils.py

* Fixes to support IterableDataset (unslothai#98)

* Support Iterable Datasets

* Update dataset_utils.py

* Update dataset_utils.py

* Update dataset_utils.py

* Update dataset_utils.py

* Preserve batch size from iterable dataset

* Preserve batch size from iterable dataset

* Support train_on_response_only with IterableDataset

* Support train_on_response_only with IterableDataset

* Support train_on_response_only with IterableDataset

* Support train_on_response_only with IterableDataset

* Update vllm_utils.py

* Create vllm_rlhf_utils.py

* Update vllm_rlhf_utils.py

* Update vllm_rlhf_utils.py

* Update vllm_rlhf_utils.py

* Update vllm_rlhf_utils.py

* Update vllm_rlhf_utils.py

* Update vllm_rlhf_utils.py

* Update vllm_rlhf_utils.py

* Update vllm_rlhf_utils.py

* Update vllm_rlhf_utils.py

* vLLM for Qwen 3

* Update vllm_utils.py

* Update vllm_utils.py

* Update vllm_utils.py

* Update vllm_utils.py

* Update vllm_utils.py

* Update vllm_utils.py

* Update vllm_utils.py

* Update vllm_utils.py

* Update vllm_utils.py

* Update vllm_utils.py

* Update vllm_utils.py

* Update vllm_utils.py

* Update vllm_utils.py

* Update vllm_utils.py

* Update vllm_utils.py

---------

Co-authored-by: Mukkesh Ganesh <mukmckenzie@gmail.com>
Co-authored-by: Edd <68678137+Erland366@users.noreply.github.com>
Co-authored-by: Brad Hilton <brad.hilton.nw@gmail.com>
Co-authored-by: SpaceHunter <30568250+SpaceHunterInf@users.noreply.github.com>
Co-authored-by: Xiaochen Zhu <xz479@cl.cam.ac.uk>
Co-authored-by: Roland Tannous <rolandtannous@gonovel.co>
Co-authored-by: DoubleMathew <mmathew23@gmail.com>
Co-authored-by: Michael Han <107991372+shimmyshimmer@users.noreply.github.com>
Co-authored-by: Qian Wu <121997440+5k5000@users.noreply.github.com>
Co-authored-by: marcandrelarochelle <marcandrelarochelle1820@gmail.com>

mmathew23 added a commit to mmathew23/unsloth that referenced this pull request Jul 7, 2025
* bug fix unslothai#2008 unsloth issue - load_in_4bit = True + fast_inference = True (unslothai#79)

* bug fix unslothai#2008 unsloth

* non-quant dtype fix

* Update vllm_utils.py

---------

Co-authored-by: Daniel Han <danielhanchen@gmail.com>

* Update dataset_utils.py

* Update compiler.py

* Update temporary_patches.py

* Gemma 3 fixes

* Update temporary_patches.py

* Update compiler.py

* Update compiler.py

* Gemma 3 fixes

* Update patching_utils.py

* Update compiler.py

* Update compiler.py

* Update patching_utils.py

* Update temporary_patches.py

* Update compiler.py

* Update compiler.py

* Update temporary_patches.py

* Update temporary_patches.py

* Update temporary_patches.py

* Update temporary_patches.py

* Update temporary_patches.py

* Update temporary_patches.py

* Update temporary_patches.py

* Update temporary_patches.py

* Update compiler.py

* Update compiler.py

* Update compiler.py

* Update compiler.py

* Update compiler.py

* Update compiler.py

* Update compiler.py

* Update compiler.py

* Update compiler.py

* Update compiler.py

* Update compiler.py

* Update compiler.py

* Update compiler.py

* Update compiler.py

* Update compiler.py

* Update compiler.py

* Update compiler.py

* Update compiler.py

* Update compiler.py

* compiler

* Update gradient_checkpointing.py

* Update temporary_patches.py

* Update temporary_patches.py

* Update temporary_patches.py

* Update temporary_patches.py

* Update temporary_patches.py

* Update temporary_patches.py

* Update temporary_patches.py

* Update temporary_patches.py

* Update temporary_patches.py

* Update temporary_patches.py

* Update temporary_patches.py

* Update temporary_patches.py

* Update temporary_patches.py

* Update temporary_patches.py

* Update temporary_patches.py

* Update temporary_patches.py

* Update temporary_patches.py

* Update temporary_patches.py

* Update temporary_patches.py

* causal mask dtype

* Fix checkpoint and save from local file (unslothai#74)

* Enhance gradient checkpointing and add original model ID retrieval in saving utilities

* In case adapter_config.json as well

* Update patching_utils.py

* Update patching_utils.py

* Update temporary_patches.py

* Update temporary_patches.py

* Update compiler.py

* Update loss_utils.py

* Update compiler.py

* Update vllm_utils.py

* Update compiler.py

* Update peft_utils.py

* Update rl_replacements.py

* Update vllm_utils.py

* Update temporary_patches.py

* Update temporary_patches.py

* Update temporary_patches.py

* Update temporary_patches.py

* Update temporary_patches.py

* Update temporary_patches.py

* Update temporary_patches.py

* Update temporary_patches.py

* Update temporary_patches.py

* Update temporary_patches.py

* Update temporary_patches.py

* Update temporary_patches.py

* Update temporary_patches.py

* Update temporary_patches.py

* Update compiler.py

* Update vllm_lora_worker_manager.py

* Update utils.py

* Update temporary_patches.py

* Update temporary_patches.py

* Update temporary_patches.py

* Update temporary_patches.py

* Update temporary_patches.py

* Update temporary_patches.py

* Update temporary_patches.py

* Update temporary_patches.py

* Update temporary_patches.py

* Update temporary_patches.py

* Update vllm_utils.py

* Update vllm_utils.py

* Update vllm_utils.py

* Update vllm_utils.py

* Update dataset_utils.py

* bidirectional attention

* Update vllm_utils.py

* Update __init__.py

* Update temporary_patches.py

* Update temporary_patches.py

* Update temporary_patches.py

* Update vllm_utils.py

* Update vllm_utils.py

* Update vllm_utils.py

* Update vllm_utils.py

* Update vllm_utils.py

* Update vllm_utils.py

* Update vllm_lora_worker_manager.py

* Update vllm_lora_worker_manager.py

* Update vllm_lora_worker_manager.py

* Update temporary_patches.py

* Update temporary_patches.py

* Update temporary_patches.py

* Update temporary_patches.py

* Update loss_utils.py

* Update loss_utils.py

* Update loss_utils.py

* Update loss_utils.py

* Update loss_utils.py

* Update __init__.py

* fix: AsyncLLMEngine bugs (unslothai#82)

* fixed a typo in L119, removing unnecessary len() (unslothai#84)

Co-authored-by: Xiaochen Zhu <xz479@cl.cam.ac.uk>

* Fix gradient checkpointing warning filter implementation

* Input grads fix for gemma3 (unslothai#96)

* gemma require gradients fix

* Update peft_utils.py

---------

Co-authored-by: Daniel Han <danielhanchen@gmail.com>

* Update vision_utils.py

* Vision requires grad

* Check SDPA for Mistral / Pixtral

* Update compiler.py

* Update vision_utils.py

* Update vision_utils.py

* Update vision_utils.py

* Update __init__.py

* Update vision_utils.py

* Update vision_utils.py

* Update vision_utils.py

* Update vision_utils.py

* Update vision_utils.py

* Update vision_utils.py

* Update vision_utils.py

* Update vision_utils.py

* Update vision_utils.py

* Update vision_utils.py

* Update vision_utils.py

* Update vision_utils.py

* Update vision_utils.py

* Update vision_utils.py

* Update vllm_utils.py (unslothai#99)

Fix bugs in generate_batches.py. Original output = [] will result in duplication of results.

* Update vision_utils.py

* Fixes to support IterableDataset (unslothai#98)

* Support Iterable Datasets

* Update dataset_utils.py

* Update dataset_utils.py

* Update dataset_utils.py

* Update dataset_utils.py

* Preserve batch size from iterable dataset

* Preserve batch size from iterable dataset

* Support train_on_response_only with IterableDataset

* Support train_on_response_only with IterableDataset

* Support train_on_response_only with IterableDataset

* Support train_on_response_only with IterableDataset

* Update vllm_utils.py

* Create vllm_rlhf_utils.py

* Update vllm_rlhf_utils.py

* Update vllm_rlhf_utils.py

* Update vllm_rlhf_utils.py

* Update vllm_rlhf_utils.py

* Update vllm_rlhf_utils.py

* Update vllm_rlhf_utils.py

* Update vllm_rlhf_utils.py

* Update vllm_rlhf_utils.py

* Update vllm_rlhf_utils.py

* vLLM for Qwen 3

* Update vllm_utils.py

* Update vllm_utils.py

* Update vllm_utils.py

* Update vllm_utils.py

* Update vllm_utils.py

* Update vllm_utils.py

* Update vllm_utils.py

* Update vllm_utils.py

* Update vllm_utils.py

* Update vllm_utils.py

* Update vllm_utils.py

* Update vllm_utils.py

* Update vllm_utils.py

* Update vllm_utils.py

* Update vllm_utils.py

* Update vllm_utils.py

* Update vllm_utils.py

* Update vllm_utils.py

* Update vllm_utils.py

* Update vllm_utils.py

* Update vllm_utils.py

* Update vllm_utils.py

* Update vllm_utils.py

* Update vllm_utils.py

* Update vllm_utils.py

* Update vllm_utils.py

* Update vllm_utils.py

* Update vllm_utils.py

* Update vllm_utils.py

* Update vllm_utils.py

* Update vllm_utils.py

* Update vllm_utils.py

* Update compiler.py

* Update vllm_utils.py

* Update vllm_utils.py

* Update vllm_utils.py

* Update vllm_utils.py

* Update vllm_utils.py

* Update vllm_utils.py

* Update vllm_utils.py

* Update vllm_utils.py

* Update vllm_utils.py

* Update vllm_utils.py

* Update vllm_utils.py

* Update vllm_utils.py

* Update vllm_utils.py

* Update vllm_utils.py

* Update rl_replacements.py

* Update rl_replacements.py

* Update rl_replacements.py

* Update rl_replacements.py

* Swap space reduce

* Update vllm_utils.py

* Update vllm_utils.py

* Update rl_replacements.py

* Update vllm_utils.py

* Update vllm_utils.py

* Update vllm_utils.py

* Update vllm_utils.py

* Update __init__.py

---------

Co-authored-by: Mukkesh Ganesh <mukmckenzie@gmail.com>
Co-authored-by: Edd <68678137+Erland366@users.noreply.github.com>
Co-authored-by: Brad Hilton <brad.hilton.nw@gmail.com>
Co-authored-by: SpaceHunter <30568250+SpaceHunterInf@users.noreply.github.com>
Co-authored-by: Xiaochen Zhu <xz479@cl.cam.ac.uk>
Co-authored-by: Roland Tannous <rolandtannous@gonovel.co>
Co-authored-by: DoubleMathew <mmathew23@gmail.com>
Co-authored-by: Michael Han <107991372+shimmyshimmer@users.noreply.github.com>
Co-authored-by: Qian Wu <121997440+5k5000@users.noreply.github.com>
Co-authored-by: marcandrelarochelle <marcandrelarochelle1820@gmail.com>

mmathew23 added a commit to mmathew23/unsloth that referenced this pull request Jul 7, 2025
* Update compiler.py

* Update patching_utils.py

* Update temporary_patches.py

* Update compiler.py

* Update compiler.py

* Update temporary_patches.py

* Update temporary_patches.py

* Update temporary_patches.py

* Update temporary_patches.py

* Update temporary_patches.py

* Update temporary_patches.py

* Update temporary_patches.py

* Update temporary_patches.py

* Update compiler.py

* Update compiler.py

* Update compiler.py

* Update compiler.py

* Update compiler.py

* Update compiler.py

* Update compiler.py

* Update compiler.py

* Update compiler.py

* Update compiler.py

* Update compiler.py

* Update compiler.py

* Update compiler.py

* Update compiler.py

* Update compiler.py

* Update compiler.py

* Update compiler.py

* Update compiler.py

* Update compiler.py

* compiler

* Update gradient_checkpointing.py

* Update temporary_patches.py

* Update temporary_patches.py

* Update temporary_patches.py

* Update temporary_patches.py

* Update temporary_patches.py

* Update temporary_patches.py

* Update temporary_patches.py

* Update temporary_patches.py

* Update temporary_patches.py

* Update temporary_patches.py

* Update temporary_patches.py

* Update temporary_patches.py

* Update temporary_patches.py

* Update temporary_patches.py

* Update temporary_patches.py

* Update temporary_patches.py

* Update temporary_patches.py

* Update temporary_patches.py

* Update temporary_patches.py

* causal mask dtype

* Fix checkpoint and save from local file (unslothai#74)

* Enhance gradient checkpointing and add original model ID retrieval in saving utilities

* In case adapter_config.json as well

* Update patching_utils.py

* Update patching_utils.py

* Update temporary_patches.py

* Update temporary_patches.py

* Update compiler.py

* Update loss_utils.py

* Update compiler.py

* Update vllm_utils.py

* Update compiler.py

* Update peft_utils.py

* Update rl_replacements.py

* Update vllm_utils.py

* Update temporary_patches.py

* Update temporary_patches.py

* Update temporary_patches.py

* Update temporary_patches.py

* Update temporary_patches.py

* Update temporary_patches.py

* Update temporary_patches.py

* Update temporary_patches.py

* Update temporary_patches.py

* Update temporary_patches.py

* Update temporary_patches.py

* Update temporary_patches.py

* Update temporary_patches.py

* Update temporary_patches.py

* Update compiler.py

* Update vllm_lora_worker_manager.py

* Update utils.py

* Update temporary_patches.py

* Update temporary_patches.py

* Update temporary_patches.py

* Update temporary_patches.py

* Update temporary_patches.py

* Update temporary_patches.py

* Update temporary_patches.py

* Update temporary_patches.py

* Update temporary_patches.py

* Update temporary_patches.py

* Update vllm_utils.py

* Update vllm_utils.py

* Update vllm_utils.py

* Update vllm_utils.py

* Update dataset_utils.py

* bidirectional attention

* Update vllm_utils.py

* Update __init__.py

* Update temporary_patches.py

* Update temporary_patches.py

* Update temporary_patches.py

* Update vllm_utils.py

* Update vllm_utils.py

* Update vllm_utils.py

* Update vllm_utils.py

* Update vllm_utils.py

* Update vllm_utils.py

* Update vllm_lora_worker_manager.py

* Update vllm_lora_worker_manager.py

* Update vllm_lora_worker_manager.py

* Update temporary_patches.py

* Update temporary_patches.py

* Update temporary_patches.py

* Update temporary_patches.py

* Update loss_utils.py

* Update loss_utils.py

* Update loss_utils.py

* Update loss_utils.py

* Update loss_utils.py

* Update __init__.py

* fix: AsyncLLMEngine bugs (unslothai#82)

* fixed a typo in L119, removing unnecessary len() (unslothai#84)

Co-authored-by: Xiaochen Zhu <xz479@cl.cam.ac.uk>

* Fix gradient checkpointing warning filter implementation

* Input grads fix for gemma3 (unslothai#96)

* gemma require gradients fix

* Update peft_utils.py

---------

Co-authored-by: Daniel Han <danielhanchen@gmail.com>

* Update vision_utils.py

* Vision requires grad

* Check SDPA for Mistral / Pixtral

* Update compiler.py

* Update vision_utils.py

* Update vision_utils.py

* Update vision_utils.py

* Update __init__.py

* Update vision_utils.py

* Update vision_utils.py

* Update vision_utils.py

* Update vision_utils.py

* Update vision_utils.py

* Update vision_utils.py

* Update vision_utils.py

* Update vision_utils.py

* Update vision_utils.py

* Update vision_utils.py

* Update vision_utils.py

* Update vision_utils.py

* Update vision_utils.py

* Update vision_utils.py

* Update vllm_utils.py (unslothai#99)

Fix bugs in generate_batches.py. Original output = [] will result in duplication of results.

* Update vision_utils.py

* Fixes to support IterableDataset (unslothai#98)

* Support Iterable Datasets

* Update dataset_utils.py

* Update dataset_utils.py

* Update dataset_utils.py

* Update dataset_utils.py

* Preserve batch size from iterable dataset

* Preserve batch size from iterable dataset

* Support train_on_response_only with IterableDataset

* Support train_on_response_only with IterableDataset

* Support train_on_response_only with IterableDataset

* Support train_on_response_only with IterableDataset

* Update vllm_utils.py

* Create vllm_rlhf_utils.py

* Update vllm_rlhf_utils.py

* Update vllm_rlhf_utils.py

* Update vllm_rlhf_utils.py

* Update vllm_rlhf_utils.py

* Update vllm_rlhf_utils.py

* Update vllm_rlhf_utils.py

* Update vllm_rlhf_utils.py

* Update vllm_rlhf_utils.py

* Update vllm_rlhf_utils.py

* vLLM for Qwen 3

* Update vllm_utils.py

* Update vllm_utils.py

* Update vllm_utils.py

* Update vllm_utils.py

* Update vllm_utils.py

* Update vllm_utils.py

* Update vllm_utils.py

* Update vllm_utils.py

* Update vllm_utils.py

* Update vllm_utils.py

* Update vllm_utils.py

* Update vllm_utils.py

* Update vllm_utils.py

* Update vllm_utils.py

* Update vllm_utils.py

* Update vllm_utils.py

* Update vllm_utils.py

* Update vllm_utils.py

* Update vllm_utils.py

* Update vllm_utils.py

* Update vllm_utils.py

* Update vllm_utils.py

* Update vllm_utils.py

* Update vllm_utils.py

* Update vllm_utils.py

* Update vllm_utils.py

* Update vllm_utils.py

* Update vllm_utils.py

* Update vllm_utils.py

* Update vllm_utils.py

* Update vllm_utils.py

* Update vllm_utils.py

* Update compiler.py

* Update vllm_utils.py

* Update vllm_utils.py

* Update vllm_utils.py

* Update vllm_utils.py

* Update vllm_utils.py

* Update vllm_utils.py

* Update vllm_utils.py

* Update vllm_utils.py

* Update vllm_utils.py

* Update vllm_utils.py

* Update vllm_utils.py

* Update vllm_utils.py

* Update vllm_utils.py

* Update vllm_utils.py

* Update rl_replacements.py

* Update rl_replacements.py

* Update rl_replacements.py

* Update rl_replacements.py

* Swap space reduce

* Update vllm_utils.py

* Update vllm_utils.py

* Update rl_replacements.py

* Update vllm_utils.py

* Update vllm_utils.py

* Update vllm_utils.py

* Update vllm_utils.py

* Update __init__.py

* Update rl_replacements.py

* Update vllm_utils.py

* Update vllm_utils.py

* Update vllm_utils.py

* Update vllm_utils.py

* Update rl_replacements.py

* Update vllm_utils.py

* Update rl_replacements.py

* Revert "Update rl_replacements.py"

This reverts commit c0a4022.

* Update __init__.py

---------

Co-authored-by: Edd <68678137+Erland366@users.noreply.github.com>
Co-authored-by: Brad Hilton <brad.hilton.nw@gmail.com>
Co-authored-by: SpaceHunter <30568250+SpaceHunterInf@users.noreply.github.com>
Co-authored-by: Xiaochen Zhu <xz479@cl.cam.ac.uk>
Co-authored-by: Roland Tannous <rolandtannous@gonovel.co>
Co-authored-by: DoubleMathew <mmathew23@gmail.com>
Co-authored-by: Michael Han <107991372+shimmyshimmer@users.noreply.github.com>
Co-authored-by: Qian Wu <121997440+5k5000@users.noreply.github.com>
Co-authored-by: marcandrelarochelle <marcandrelarochelle1820@gmail.com>

mmathew23 added a commit to mmathew23/unsloth that referenced this pull request Jul 7, 2025
* Update temporary_patches.py

* Update temporary_patches.py

* Update temporary_patches.py

* Update temporary_patches.py

* Update temporary_patches.py

* Update temporary_patches.py

* Update temporary_patches.py

* Update temporary_patches.py

* Update temporary_patches.py

* Update temporary_patches.py

* Update temporary_patches.py

* Update temporary_patches.py

* Update temporary_patches.py

* Update temporary_patches.py

* Update temporary_patches.py

* Update temporary_patches.py

* Update temporary_patches.py

* causal mask dtype

* Fix checkpoint and save from local file (unslothai#74)

* Enhance gradient checkpointing and add original model ID retrieval in saving utilities

* In case adapter_config.json as well

* Update patching_utils.py

* Update patching_utils.py

* Update temporary_patches.py

* Update temporary_patches.py

* Update compiler.py

* Update loss_utils.py

* Update compiler.py

* Update vllm_utils.py

* Update compiler.py

* Update peft_utils.py

* Update rl_replacements.py

* Update vllm_utils.py

* Update temporary_patches.py

* Update temporary_patches.py

* Update temporary_patches.py

* Update temporary_patches.py

* Update temporary_patches.py

* Update temporary_patches.py

* Update temporary_patches.py

* Update temporary_patches.py

* Update temporary_patches.py

* Update temporary_patches.py

* Update temporary_patches.py

* Update temporary_patches.py

* Update temporary_patches.py

* Update temporary_patches.py

* Update compiler.py

* Update vllm_lora_worker_manager.py

* Update utils.py

* Update temporary_patches.py

* Update temporary_patches.py

* Update temporary_patches.py

* Update temporary_patches.py

* Update temporary_patches.py

* Update temporary_patches.py

* Update temporary_patches.py

* Update temporary_patches.py

* Update temporary_patches.py

* Update temporary_patches.py

* Update vllm_utils.py

* Update vllm_utils.py

* Update vllm_utils.py

* Update vllm_utils.py

* Update dataset_utils.py

* bidirectional attention

* Update vllm_utils.py

* Update __init__.py

* Update temporary_patches.py

* Update temporary_patches.py

* Update temporary_patches.py

* Update vllm_utils.py

* Update vllm_utils.py

* Update vllm_utils.py

* Update vllm_utils.py

* Update vllm_utils.py

* Update vllm_utils.py

* Update vllm_lora_worker_manager.py

* Update vllm_lora_worker_manager.py

* Update vllm_lora_worker_manager.py

* Update temporary_patches.py

* Update temporary_patches.py

* Update temporary_patches.py

* Update temporary_patches.py

* Update loss_utils.py

* Update loss_utils.py

* Update loss_utils.py

* Update loss_utils.py

* Update loss_utils.py

* Update __init__.py

* fix: AsyncLLMEngine bugs (unslothai#82)

* fixed a typo in L119, removing unnecessary len() (unslothai#84)

Co-authored-by: Xiaochen Zhu <xz479@cl.cam.ac.uk>

* Fix gradient checkpointing warning filter implementation

* Input grads fix for gemma3 (unslothai#96)

* gemma require gradients fix

* Update peft_utils.py

---------

Co-authored-by: Daniel Han <danielhanchen@gmail.com>

* Update vision_utils.py

* Vision requires grad

* Check SDPA for Mistral / Pixtral

* Update compiler.py

* Update vision_utils.py

* Update vision_utils.py

* Update vision_utils.py

* Update __init__.py

* Update vision_utils.py

* Update vision_utils.py

* Update vision_utils.py

* Update vision_utils.py

* Update vision_utils.py

* Update vision_utils.py

* Update vision_utils.py

* Update vision_utils.py

* Update vision_utils.py

* Update vision_utils.py

* Update vision_utils.py

* Update vision_utils.py

* Update vision_utils.py

* Update vision_utils.py

* Update vllm_utils.py (unslothai#99)

Fix bugs in generate_batches.py. Original output = [] will result in duplication of results.

* Update vision_utils.py

* Fixes to support IterableDataset (unslothai#98)

* Support Iterable Datasets

* Update dataset_utils.py

* Update dataset_utils.py

* Update dataset_utils.py

* Update dataset_utils.py

* Preserve batch size from iterable dataset

* Preserve batch size from iterable dataset

* Support train_on_response_only with IterableDataset

* Support train_on_response_only with IterableDataset

* Support train_on_response_only with IterableDataset

* Support train_on_response_only with IterableDataset

* Update vllm_utils.py

* Create vllm_rlhf_utils.py

* Update vllm_rlhf_utils.py

* Update vllm_rlhf_utils.py

* Update vllm_rlhf_utils.py

* Update vllm_rlhf_utils.py

* Update vllm_rlhf_utils.py

* Update vllm_rlhf_utils.py

* Update vllm_rlhf_utils.py

* Update vllm_rlhf_utils.py

* Update vllm_rlhf_utils.py

* vLLM for Qwen 3

* Update vllm_utils.py

* Update vllm_utils.py

* Update vllm_utils.py

* Update vllm_utils.py

* Update vllm_utils.py

* Update vllm_utils.py

* Update vllm_utils.py

* Update vllm_utils.py

* Update vllm_utils.py

* Update vllm_utils.py

* Update vllm_utils.py

* Update vllm_utils.py

* Update vllm_utils.py

* Update vllm_utils.py

* Update vllm_utils.py

* Update vllm_utils.py

* Update vllm_utils.py

* Update vllm_utils.py

* Update vllm_utils.py

* Update vllm_utils.py

* Update vllm_utils.py

* Update vllm_utils.py

* Update vllm_utils.py

* Update vllm_utils.py

* Update vllm_utils.py

* Update vllm_utils.py

* Update vllm_utils.py

* Update vllm_utils.py

* Update vllm_utils.py

* Update vllm_utils.py

* Update vllm_utils.py

* Update vllm_utils.py

* Update compiler.py

* Update vllm_utils.py

* Update vllm_utils.py

* Update vllm_utils.py

* Update vllm_utils.py

* Update vllm_utils.py

* Update vllm_utils.py

* Update vllm_utils.py

* Update vllm_utils.py

* Update vllm_utils.py

* Update vllm_utils.py

* Update vllm_utils.py

* Update vllm_utils.py

* Update vllm_utils.py

* Update vllm_utils.py

* Update rl_replacements.py

* Update rl_replacements.py

* Update rl_replacements.py

* Update rl_replacements.py

* Swap space reduce

* Update vllm_utils.py

* Update vllm_utils.py

* Update rl_replacements.py

* Update vllm_utils.py

* Update vllm_utils.py

* Update vllm_utils.py

* Update vllm_utils.py

* Update __init__.py

* Update rl_replacements.py

* Update vllm_utils.py

* Update vllm_utils.py

* Update vllm_utils.py

* Update vllm_utils.py

* Update rl_replacements.py

* Update vllm_utils.py

* Update rl_replacements.py

* Revert "Update rl_replacements.py"

This reverts commit c0a4022.

* Update __init__.py

* Update patching_utils.py

* Update compiler.py

* Update compiler.py

* Update compiler.py

* Update compiler.py

* Update compiler.py

* Update compiler.py

* Update compiler.py

* Update compiler.py

* Fixes

* Update temporary_patches.py

* Update temporary_patches.py

* Update compiler.py

* Update compiler.py

* Update compiler.py

* Update compiler.py

* Update compiler.py

* Update compiler.py

* Update compiler.py

* Update temporary_patches.py

* Update temporary_patches.py

* Update temporary_patches.py

* Update compiler.py

* revert

* Update temporary_patches.py

* Update temporary_patches.py

* Update temporary_patches.py

* Update temporary_patches.py

* Update temporary_patches.py

* Update temporary_patches.py

* Update __init__.py

* Update compiler.py

* Update temporary_patches.py

* Update compiler.py

* Update temporary_patches.py

---------

Co-authored-by: Edd <68678137+Erland366@users.noreply.github.com>
Co-authored-by: Brad Hilton <brad.hilton.nw@gmail.com>
Co-authored-by: SpaceHunter <30568250+SpaceHunterInf@users.noreply.github.com>
Co-authored-by: Xiaochen Zhu <xz479@cl.cam.ac.uk>
Co-authored-by: Roland Tannous <rolandtannous@gonovel.co>
Co-authored-by: DoubleMathew <mmathew23@gmail.com>
Co-authored-by: Michael Han <107991372+shimmyshimmer@users.noreply.github.com>
Co-authored-by: Qian Wu <121997440+5k5000@users.noreply.github.com>
Co-authored-by: marcandrelarochelle <marcandrelarochelle1820@gmail.com>

rolandtannous added a commit to rolandtannous/unsloth that referenced this pull request Mar 11, 2026
…-ordered

Added the inference defaults for models

rolandtannous added a commit that referenced this pull request Mar 12, 2026

Stanley00 pushed a commit to stanley-fork/unsloth that referenced this pull request Mar 12, 2026
…-ordered

Added the inference defaults for models

abiswas-realadvice pushed a commit to abiswas-realadvice/unsloth that referenced this pull request May 14, 2026
* Fix tokenizer, dropout, bias for LoRA

* Update loader.py

* Fix LoRA downcasting

* Update _utils.py

* Saving to GGUF

* fix

* colab_quantize_to_gguf

* move save modules

* save module

* Update __init__.py

* Update save.py

* Temp downgrade due to TRL issue

* Fix up bugs

* Faster saving + other changes

* Update llama.py

* Saving modules

* spelling

* Update llama.py

* Update save.py

* Update save.py

* Update loader.py

* Update llama.py

* patch saving

* Update save.py

* Update save.py

* Update save.py

* patch saving

* Update save.py

* Update save.py

* Update save.py

* Update save.py

* Update save.py

* Update save.py

* Update save.py

* Update save.py

* Update save.py

* Update save.py

* Update save.py

* Update save.py

* Update save.py

* Update save.py

* Update save.py

* original_model

* Update save.py

* Update save.py

* Update save.py

* Update save.py

* Update save.py

* Update save.py

* Update save.py

* Update save.py

* Update save.py

* Update save.py

* Update save.py

* Update save.py

* Update save.py

* Update save.py

* Update save.py

* Update save.py

* Update save.py

* Update save.py

* Update save.py

* Update save.py

* Update save.py

* Update save.py

* Update save.py

* saving to RAM leakage?

* Update save.py

* new_save_directory

* Update save.py

* Update save.py

* Update save.py

* Update save.py

* Update pyproject.toml

* Update pyproject.toml

* Update pyproject.toml

abiswas-realadvice pushed a commit to abiswas-realadvice/unsloth that referenced this pull request May 14, 2026
* HF Perf Button

* Update README.md

Adding new buttons cleanup

* Update README.md

* Delete images/Discord.png

* Delete images/try live demo green.png

* new transparent logos

* Revamping page

* Revamp mainpage

* Update README.md

* Update README.md

* Update README.md

* Update README.md

* Update README.md

* Update README.md

* Update README.md

* finetune button

* Delete start free finetune button.png

* free finetune button

* Add files via upload

* Update README.md

* Update README.md

* Add files via upload

* Add files via upload

* Update README.md

* Add files via upload

* Update README.md

* Update README.md

* Update README.md

* Update README.md

* Update README.md

* Update README.md

* Update README.md

* Update README.md

* Update README.md

* Update README.md

* Squashed commit of the following:

commit efa0d2332ebc6d8f215aec07d5cc9907f4e84f34
Author: Daniel Han <danielhanchen@gmail.com>
Date:   Sun Feb 4 17:35:56 2024 +1100

    2x faster inference (#151)

    * Update fast_lora.py

    * Update fast_lora.py

    * Update fast_lora.py

    * Update fast_lora.py

    * Update fast_lora.py

    * Update fast_lora.py

    * Update llama.py

    * Update fast_lora.py

    * Update llama.py

    * Update fast_lora.py

    * Update fast_lora.py

    * Update fast_lora.py

    * Update fast_lora.py

    * Update fast_lora.py

    * Update fast_lora.py

    * Update fast_lora.py

    * Update fast_lora.py

    * Update fast_lora.py

    * Update swiglu.py

    * Update fast_lora.py

    * Update swiglu.py

    * Update fast_lora.py

    * Update fast_lora.py

    * Update fast_lora.py

    * Update fast_lora.py

    * Update fast_lora.py

    * Update fast_lora.py

    * Update fast_lora.py

    * Update fast_lora.py

    * Update fast_lora.py

    * Update fast_lora.py

    * Update fast_lora.py

    * Update fast_lora.py

    * Update fast_lora.py

    * Update fast_lora.py

    * Update fast_lora.py

    * Update fast_lora.py

    * Update save.py

    * Update fast_lora.py

    * Update utils.py

    * Update llama.py

    * Update fast_lora.py

    * Update swiglu.py

    * Update save.py

    * Update save.py

    * Update llama.py

    * Update llama.py

    * Update llama.py

    * Update llama.py

    * Revert "Update llama.py"

    This reverts commit a208ec46e012cf470ecefe6268a66358215df7b6.

    * Update llama.py

    * Works?

    * Update pyproject.toml

    * Update fast_lora.py

    * Update fast_lora.py

    * Update fast_lora.py

    * Update fast_lora.py

    * Update fast_lora.py

    * Swiglu

    * Update swiglu.py

    * Update fast_lora.py

    * Update fast_lora.py

    * Update fast_lora.py

    * Update fast_lora.py

    * Update swiglu.py

    * Update fast_lora.py

    * Update fast_lora.py

    * Update fast_lora.py

    * Update fast_lora.py

    * Update fast_lora.py

    * Update fast_lora.py

    * Update fast_lora.py

    * Update fast_lora.py

    * attention_mask

    * Update llama.py

    * Update llama.py

    * labels

    * Update mistral.py

    * Update llama.py

    * attention mask

    * Update save.py

    * Update save.py

    * Update mistral.py

    * attention mask

    * Update llama.py

    * Update llama.py

    * Update mistral.py

    * Update llama.py

    * Update llama.py

    * Update llama.py

    * Update dpo.py

    * Patch saving

    * Update save.py

    * Update save.py

    * patch_saving_functions

    * Update save.py

    * Update save.py

    * Update save.py

    * Update save.py

    * Update save.py

    * Update save.py

    * Update save.py

    * Update save.py

    * print

    * Mistral patch

    * Update mistral.py

    * Update save.py

    * saving

    * Update llama.py

    * Update llama.py

    * Fast inference repatch

    * Update llama.py

    * Update utils.py

    * Update utils.py

    * Update utils.py

    * Update mistral.py

    * Update __init__.py

    * Fix inference

    * Update mistral.py

    * fast lm_head

    * Remove fast path

    * Update rope_embedding.py

    * Update loader.py

    * LlamaAttention_fast_forward_inference

    * if past_key_value is not None and q_len == 1:

    * revert inference

    * Update loader.py

    * past_key_value

    * Update llama.py

    * Update llama.py

    * Fix SDPA

    * Update llama.py

    * padding

    * Inference

    * Update llama.py

    * Revert

    * Update mistral.py

    * faster inference

    * inference

    * Update llama.py

    * Update llama.py

    * Update llama.py

    * Update llama.py

    * Update llama.py

    * Update llama.py

    * Update llama.py

    * Update llama.py

    * inference

    * Update llama.py

    * Update utils.py

    * faster inference

    * Update llama.py

    * revert

    * lm_head

    * Update llama.py

    * inference

    * Update llama.py

    * Update llama.py

    * Update llama.py

    * Update llama.py

    * Update llama.py

    * Update llama.py

    * Update llama.py

    * Update llama.py

    * Update llama.py

    * Update llama.py

    * Update mistral.py

    * Update llama.py

    * faster inference

    * Update llama.py

    * fast inference

    * Update llama.py

    * Update llama.py

    * Update mistral.py

    * Update llama.py

    * Update llama.py

    * Update llama.py

    * Update llama.py

    * torch compile

    * past_key_values

    * Update llama.py

    * Update llama.py

    * Update llama.py

    * Update llama.py

    * Update llama.py

    * Update llama.py

    * Update llama.py

    * Update llama.py

    * Update llama.py

    * Update llama.py

    * Update llama.py

    * Update llama.py

    * Update llama.py

    * Update llama.py

    * Update llama.py

    * Update utils.py

    * Update utils.py

    * Update utils.py

    * Update utils.py

    * Update llama.py

    * fast inference + saving config.json

    * Update llama.py

    * Update llama.py

    * Update llama.py

    * Update llama.py

    * Update llama.py

    * Update mistral.py

    * fast inference again

    * more temp matrices

    * Update llama.py

    * Update llama.py

    * Update llama.py

    * Update llama.py

    * Update llama.py

    * fast inference

    * Update mistral.py

    * Update llama.py

    * SDPA

    * attention_mask

    * New version

    * Update llama.py

    * Update llama.py

    * Update llama.py

    * Update llama.py

    * Update llama.py

    * Update llama.py

    * Update llama.py

    * Update llama.py

    * Update llama.py

    * Update llama.py

    * Update llama.py

    * Update llama.py

    * Update llama.py

    * Update llama.py

    * Update llama.py

    * Update llama.py

    * Update llama.py

    * Update llama.py

    * Update llama.py

    * Update llama.py

    * Update utils.py

    * Update utils.py

commit 2f55935f941eb61816b145575389f91dde4e00f7
Author: Daniel Han <danielhanchen@gmail.com>
Date:   Wed Jan 31 04:03:37 2024 +1100

    Hotfix - fix inference (#146)

    * faster saving & inference

    * Update llama.py

    * Update save.py

    * Update llama.py

    * Update llama.py

    * Update llama.py

    * Update llama.py

    * Update mistral.py

    * Update llama.py

    * Update llama.py

    * Update llama.py

    * Update llama.py

    * Update llama.py

    * fast inference

    * Update llama.py

    * Update save.py

    * Update llama.py

    * Mistral correct RoPE scaling

    * Max sequence lengths

    * Apache 2

    * fast_linear_forward

    * Update utils.py

    * Update utils.py

    * No print

    * Update utils.py

    * Update utils.py

    * inference

    * Update llama.py

    * Fast inference RoPE

    * Update llama.py

    * Update llama.py

    * RoPE

    * Update llama.py

    * Update llama.py

    * Update llama.py

    * Update llama.py

    * Update llama.py

    * Update llama.py

    * Update llama.py

    * Update llama.py

    * LoRA

    * Fast LoRA saving

    * Update llama.py

    * hidden_states

    * q_len == 1

    * q_len issue

    * Update mistral.py

    * Update mistral.py

    * incorrect inference

    * Update to transformers 4.37

    * Graceful FA2 error + torch 2.1.1

    * Update mapper.py

    * Update pyproject.toml

    * Fix saving and bnb-4bit

    * Update fast_lora.py

    * Update fast_lora.py

    * remove patching

    * Update llama.py

    * Update llama.py

    * Update swiglu.py

    * Repatch

    * Update fast_lora.py

    * Update fast_lora.py

    * Update fast_lora.py

    * Update fast_lora.py

    * Update fast_lora.py

    * Update fast_lora.py

    * Update fast_lora.py

    * Update fast_lora.py

    * Update fast_lora.py

    * Update fast_lora.py

    * Update llama.py

    * Update fast_lora.py

    * Update llama.py

    * Update fast_lora.py

    * Update fast_lora.py

    * Update fast_lora.py

    * Update fast_lora.py

    * Update fast_lora.py

    * Update fast_lora.py

    * Update fast_lora.py

    * Update fast_lora.py

    * Update fast_lora.py

    * Update swiglu.py

    * Update fast_lora.py

    * Update swiglu.py

    * Update fast_lora.py

    * Update fast_lora.py

    * Update fast_lora.py

    * Update fast_lora.py

    * Update fast_lora.py

    * Update fast_lora.py

    * Update fast_lora.py

    * Update fast_lora.py

    * Update fast_lora.py

    * Update fast_lora.py

    * Update fast_lora.py

    * Update fast_lora.py

    * Update fast_lora.py

    * Update fast_lora.py

    * Update fast_lora.py

    * Update fast_lora.py

    * Update save.py

    * Update fast_lora.py

    * Update utils.py

    * Update llama.py

    * Update fast_lora.py

    * Update swiglu.py

    * Update save.py

    * Update save.py

    * Update llama.py

    * Update llama.py

    * Update llama.py

    * Update llama.py

    * Revert "Update llama.py"

    This reverts commit a208ec46e012cf470ecefe6268a66358215df7b6.

    * Update llama.py

    * Works?

    * Update pyproject.toml

    * Update fast_lora.py

    * Update fast_lora.py

    * Update fast_lora.py

    * Update fast_lora.py

    * Update fast_lora.py

    * Swiglu

    * Update swiglu.py

    * Update fast_lora.py

    * Update fast_lora.py

    * Update fast_lora.py

    * Update fast_lora.py

    * Update swiglu.py

    * Update fast_lora.py

    * Update fast_lora.py

    * Update fast_lora.py

    * Update fast_lora.py

    * Update fast_lora.py

    * Update fast_lora.py

    * Update fast_lora.py

    * Update fast_lora.py

    * attention_mask

    * Update llama.py

    * Update llama.py

    * labels

    * Update mistral.py

    * Update llama.py

    * attention mask

    * Update save.py

    * Update save.py

    * Update mistral.py

    * attention mask

    * Update llama.py

    * Update llama.py

    * Update mistral.py

    * Update llama.py

    * Update llama.py

    * Update llama.py

    * Update dpo.py

    * Patch saving

    * Update save.py

    * Update save.py

    * patch_saving_functions

    * Update save.py

    * Update save.py

    * Update save.py

    * Update save.py

    * Update save.py

    * Update save.py

    * Update save.py

    * Update save.py

    * print

    * Mistral patch

    * Update mistral.py

    * Update save.py

    * saving

    * Update llama.py

    * Update llama.py

    * Fast inference repatch

    * Update llama.py

    * Update utils.py

    * Update utils.py

    * Update utils.py

    * Update mistral.py

    * Update __init__.py

    * Fix inference

    * Update mistral.py

    * fast lm_head

    * Remove fast path

    * Update rope_embedding.py

    * Update loader.py

    * LlamaAttention_fast_forward_inference

    * if past_key_value is not None and q_len == 1:

    * revert inference

    * Update loader.py

    * past_key_value

commit a3a2ad93821cede32723843dfb3dfbfe0387d25e
Author: Daniel Han <danielhanchen@gmail.com>
Date:   Mon Jan 29 17:49:54 2024 +1100

    Fix inference attention mask (#142)

    * faster saving & inference

    * Update llama.py

    * Update save.py

    * Update llama.py

    * Update llama.py

    * Update llama.py

    * Update llama.py

    * Update mistral.py

    * Update llama.py

    * Update llama.py

    * Update llama.py

    * Update llama.py

    * Update llama.py

    * fast inference

    * Update llama.py

    * Update save.py

    * Update llama.py

    * Mistral correct RoPE scaling

    * Max sequence lengths

    * Apache 2

    * fast_linear_forward

    * Update utils.py

    * Update utils.py

    * No print

    * Update utils.py

    * Update utils.py

    * inference

    * Update llama.py

    * Fast inference RoPE

    * Update llama.py

    * Update llama.py

    * RoPE

    * Update llama.py

    * Update llama.py

    * Update llama.py

    * Update llama.py

    * Update llama.py

    * Update llama.py

    * Update llama.py

    * Update llama.py

    * LoRA

    * Fast LoRA saving

    * Update llama.py

    * hidden_states

    * q_len == 1

    * q_len issue

    * Update mistral.py

    * Update mistral.py

    * incorrect inference

    * Update to transformers 4.37

    * Graceful FA2 error + torch 2.1.1

    * Update mapper.py

    * Update pyproject.toml

    * Fix saving and bnb-4bit

    * Update fast_lora.py

    * Update fast_lora.py

    * remove patching

    * Update llama.py

    * Update llama.py

    * Update swiglu.py

    * Repatch

    * Update fast_lora.py

    * Update fast_lora.py

    * Update fast_lora.py

    * Update fast_lora.py

    * Update fast_lora.py

    * Update fast_lora.py

    * Update fast_lora.py

    * Update fast_lora.py

    * Update fast_lora.py

    * Update fast_lora.py

    * Update llama.py

    * Update fast_lora.py

    * Update llama.py

    * Update fast_lora.py

    * Update fast_lora.py

    * Update fast_lora.py

    * Update fast_lora.py

    * Update fast_lora.py

    * Update fast_lora.py

    * Update fast_lora.py

    * Update fast_lora.py

    * Update fast_lora.py

    * Update swiglu.py

    * Update fast_lora.py

    * Update swiglu.py

    * Update fast_lora.py

    * Update fast_lora.py

    * Update fast_lora.py

    * Update fast_lora.py

    * Update fast_lora.py

    * Update fast_lora.py

    * Update fast_lora.py

    * Update fast_lora.py

    * Update fast_lora.py

    * Update fast_lora.py

    * Update fast_lora.py

    * Update fast_lora.py

    * Update fast_lora.py

    * Update fast_lora.py

    * Update fast_lora.py

    * Update fast_lora.py

    * Update save.py

    * Update fast_lora.py

    * Update utils.py

    * Update llama.py

    * Update fast_lora.py

    * Update swiglu.py

    * Update save.py

    * Update save.py

    * Update llama.py

    * Update llama.py

    * Update llama.py

    * Update llama.py

    * Revert "Update llama.py"

    This reverts commit a208ec46e012cf470ecefe6268a66358215df7b6.

    * Update llama.py

    * Works?

    * Update pyproject.toml

    * Update fast_lora.py

    * Update fast_lora.py

    * Update fast_lora.py

    * Update fast_lora.py

    * Update fast_lora.py

    * Swiglu

    * Update swiglu.py

    * Update fast_lora.py

    * Update fast_lora.py

    * Update fast_lora.py

    * Update fast_lora.py

    * Update swiglu.py

    * Update fast_lora.py

    * Update fast_lora.py

    * Update fast_lora.py

    * Update fast_lora.py

    * Update fast_lora.py

    * Update fast_lora.py

    * Update fast_lora.py

    * Update fast_lora.py

    * attention_mask

    * Update llama.py

    * Update llama.py

    * labels

    * Update mistral.py

    * Update llama.py

    * attention mask

    * Update save.py

    * Update save.py

    * Update mistral.py

    * attention mask

    * Update llama.py

    * Update llama.py

    * Update mistral.py

    * Update llama.py

    * Update llama.py

    * Update llama.py

    * Update dpo.py

    * Patch saving

    * Update save.py

    * Update save.py

    * patch_saving_functions

    * Update save.py

    * Update save.py

    * Update save.py

    * Update save.py

    * Update save.py

    * Update save.py

    * Update save.py

    * Update save.py

    * print

    * Mistral patch

    * Update mistral.py

    * Update save.py

    * saving

    * Update llama.py

    * Update llama.py

commit 90309ca8dcb06f0611c1bde4a61eb08fb7317993
Author: Daniel Han <danielhanchen@gmail.com>
Date:   Mon Jan 29 03:45:07 2024 +1100

    Nightly (#140)

    * faster saving & inference

    * Update llama.py

    * Update save.py

    * Update llama.py

    * Update llama.py

    * Update llama.py

    * Update llama.py

    * Update mistral.py

    * Update llama.py

    * Update llama.py

    * Update llama.py

    * Update llama.py

    * Update llama.py

    * fast inference

    * Update llama.py

    * Update save.py

    * Update llama.py

    * Mistral correct RoPE scaling

    * Max sequence lengths

    * Apache 2

    * fast_linear_forward

    * Update utils.py

    * Update utils.py

    * No print

    * Update utils.py

    * Update utils.py

    * inference

    * Update llama.py

    * Fast inference RoPE

    * Update llama.py

    * Update llama.py

    * RoPE

    * Update llama.py

    * Update llama.py

    * Update llama.py

    * Update llama.py

    * Update llama.py

    * Update llama.py

    * Update llama.py

    * Update llama.py

    * LoRA

    * Fast LoRA saving

    * Update llama.py

    * hidden_states

    * q_len == 1

    * q_len issue

    * Update mistral.py

    * Update mistral.py

    * incorrect inference

    * Update to transformers 4.37

    * Graceful FA2 error + torch 2.1.1

    * Update mapper.py

    * Update pyproject.toml

    * Fix saving and bnb-4bit

    * Update fast_lora.py

    * Update fast_lora.py

    * remove patching

    * Update llama.py

    * Update llama.py

    * Update swiglu.py

    * Repatch

    * Update fast_lora.py

    * Update fast_lora.py

    * Update fast_lora.py

    * Update fast_lora.py

    * Update fast_lora.py

    * Update fast_lora.py

    * Update fast_lora.py

    * Update fast_lora.py

    * Update fast_lora.py

    * Update fast_lora.py

    * Update llama.py

    * Update fast_lora.py

    * Update llama.py

    * Update fast_lora.py

    * Update fast_lora.py

    * Update fast_lora.py

    * Update fast_lora.py

    * Update fast_lora.py

    * Update fast_lora.py

    * Update fast_lora.py

    * Update fast_lora.py

    * Update fast_lora.py

    * Update swiglu.py

    * Update fast_lora.py

    * Update swiglu.py

    * Update fast_lora.py

    * Update fast_lora.py

    * Update fast_lora.py

    * Update fast_lora.py

    * Update fast_lora.py

    * Update fast_lora.py

    * Update fast_lora.py

    * Update fast_lora.py

    * Update fast_lora.py

    * Update fast_lora.py

    * Update fast_lora.py

    * Update fast_lora.py

    * Update fast_lora.py

    * Update fast_lora.py

    * Update fast_lora.py

    * Update fast_lora.py

    * Update save.py

    * Update fast_lora.py

    * Update utils.py

    * Update llama.py

    * Update fast_lora.py

    * Update swiglu.py

    * Update save.py

    * Update save.py

    * Update llama.py

    * Update llama.py

    * Update llama.py

    * Update llama.py

    * Revert "Update llama.py"

    This reverts commit a208ec46e012cf470ecefe6268a66358215df7b6.

    * Update llama.py

    * Works?

    * Update pyproject.toml

    * Update fast_lora.py

    * Update fast_lora.py

    * Update fast_lora.py

    * Update fast_lora.py

    * Update fast_lora.py

    * Swiglu

    * Update swiglu.py

    * Update fast_lora.py

    * Update fast_lora.py

    * Update fast_lora.py

    * Update fast_lora.py

    * Update swiglu.py

    * Update fast_lora.py

    * Update fast_lora.py

    * Update fast_lora.py

    * Update fast_lora.py

    * Update fast_lora.py

    * Update fast_lora.py

    * Update fast_lora.py

    * Update fast_lora.py

    * attention_mask

    * Update llama.py

    * Update llama.py

    * labels

    * Update mistral.py

    * Update llama.py

    * attention mask

    * Update save.py

    * Update save.py

    * Update mistral.py

    * attention mask

    * Update llama.py

    * Update llama.py

    * Update mistral.py

    * Update llama.py

    * Update llama.py

    * Update llama.py

    * Update dpo.py

    * Patch saving

    * Update save.py

    * Update save.py

    * patch_saving_functions

    * Update save.py

    * Update save.py

    * Update save.py

    * Update save.py

    * Update save.py

    * Update save.py

    * Update save.py

    * Update save.py

    * print

    * Mistral patch

    * Update mistral.py

    * Update save.py

    * saving

commit a16bc73e8077fd3c6a034741ae782bcfeb9fa278
Author: Daniel Han <danielhanchen@gmail.com>
Date:   Mon Jan 29 02:52:39 2024 +1100

    Fix saving issues (#139)

    * faster saving & inference

    * Update llama.py

    * Update save.py

    * Update llama.py

    * Update llama.py

    * Update llama.py

    * Update llama.py

    * Update mistral.py

    * Update llama.py

    * Update llama.py

    * Update llama.py

    * Update llama.py

    * Update llama.py

    * fast inference

    * Update llama.py

    * Update save.py

    * Update llama.py

    * Mistral correct RoPE scaling

    * Max sequence lengths

    * Apache 2

    * fast_linear_forward

    * Update utils.py

    * Update utils.py

    * No print

    * Update utils.py

    * Update utils.py

    * inference

    * Update llama.py

    * Fast inference RoPE

    * Update llama.py

    * Update llama.py

    * RoPE

    * Update llama.py

    * Update llama.py

    * Update llama.py

    * Update llama.py

    * Update llama.py

    * Update llama.py

    * Update llama.py

    * Update llama.py

    * LoRA

    * Fast LoRA saving

    * Update llama.py

    * hidden_states

    * q_len == 1

    * q_len issue

    * Update mistral.py

    * Update mistral.py

    * incorrect inference

    * Update to transformers 4.37

    * Graceful FA2 error + torch 2.1.1

    * Update mapper.py

    * Update pyproject.toml

    * Fix saving and bnb-4bit

    * Update fast_lora.py

    * Update fast_lora.py

    * remove patching

    * Update llama.py

    * Update llama.py

    * Update swiglu.py

    * Repatch

    * Update fast_lora.py

    * Update fast_lora.py

    * Update fast_lora.py

    * Update fast_lora.py

    * Update fast_lora.py

    * Update fast_lora.py

    * Update fast_lora.py

    * Update fast_lora.py

    * Update fast_lora.py

    * Update fast_lora.py

    * Update llama.py

    * Update fast_lora.py

    * Update llama.py

    * Update fast_lora.py

    * Update fast_lora.py

    * Update fast_lora.py

    * Update fast_lora.py

    * Update fast_lora.py

    * Update fast_lora.py

    * Update fast_lora.py

    * Update fast_lora.py

    * Update fast_lora.py

    * Update swiglu.py

    * Update fast_lora.py

    * Update swiglu.py

    * Update fast_lora.py

    * Update fast_lora.py

    * Update fast_lora.py

    * Update fast_lora.py

    * Update fast_lora.py

    * Update fast_lora.py

    * Update fast_lora.py

    * Update fast_lora.py

    * Update fast_lora.py

    * Update fast_lora.py

    * Update fast_lora.py

    * Update fast_lora.py

    * Update fast_lora.py

    * Update fast_lora.py

    * Update fast_lora.py

    * Update fast_lora.py

    * Update save.py

    * Update fast_lora.py

    * Update utils.py

    * Update llama.py

    * Update fast_lora.py

    * Update swiglu.py

    * Update save.py

    * Update save.py

    * Update llama.py

    * Update llama.py

    * Update llama.py

    * Update llama.py

    * Revert "Update llama.py"

    This reverts commit a208ec46e012cf470ecefe6268a66358215df7b6.

    * Update llama.py

    * Works?

    * Update pyproject.toml

    * Update fast_lora.py

    * Update fast_lora.py

    * Update fast_lora.py

    * Update fast_lora.py

    * Update fast_lora.py

    * Swiglu

    * Update swiglu.py

    * Update fast_lora.py

    * Update fast_lora.py

    * Update fast_lora.py

    * Update fast_lora.py

    * Update swiglu.py

    * Update fast_lora.py

    * Update fast_lora.py

    * Update fast_lora.py

    * Update fast_lora.py

    * Update fast_lora.py

    * Update fast_lora.py

    * Update fast_lora.py

    * Update fast_lora.py

    * attention_mask

    * Update llama.py

    * Update llama.py

    * labels

    * Update mistral.py

    * Update llama.py

    * attention mask

    * Update save.py

    * Update save.py

    * Update mistral.py

    * attention mask

    * Update llama.py

    * Update llama.py

    * Update mistral.py

    * Update llama.py

    * Update llama.py

    * Update llama.py

    * Update dpo.py

    * Patch saving

    * Update save.py

    * Update save.py

    * patch_saving_functions

    * Update save.py

    * Update save.py

    * Update save.py

    * Update save.py

    * Update save.py

    * Update save.py

    * Update save.py

    * Update save.py

    * print

commit af332245543b1f9ac129b67e5c350047c967846d
Author: Daniel Han <danielhanchen@gmail.com>
Date:   Sun Jan 28 04:30:29 2024 +1100

    1 more bug (#138)

    * faster saving & inference

    * Update llama.py

    * Update save.py

    * Update llama.py

    * Update llama.py

    * Update llama.py

    * Update llama.py

    * Update mistral.py

    * Update llama.py

    * Update llama.py

    * Update llama.py

    * Update llama.py

    * Update llama.py

    * fast inference

    * Update llama.py

    * Update save.py

    * Update llama.py

    * Mistral correct RoPE scaling

    * Max sequence lengths

    * Apache 2

    * fast_linear_forward

    * Update utils.py

    * Update utils.py

    * No print

    * Update utils.py

    * Update utils.py

    * inference

    * Update llama.py

    * Fast inference RoPE

    * Update llama.py

    * Update llama.py

    * RoPE

    * Update llama.py

    * Update llama.py

    * Update llama.py

    * Update llama.py

    * Update llama.py

    * Update llama.py

    * Update llama.py

    * Update llama.py

    * LoRA

    * Fast LoRA saving

    * Update llama.py

    * hidden_states

    * q_len == 1

    * q_len issue

    * Update mistral.py

    * Update mistral.py

    * incorrect inference

    * Update to transformers 4.37

    * Graceful FA2 error + torch 2.1.1

    * Update mapper.py

    * Update pyproject.toml

    * Fix saving and bnb-4bit

    * Update fast_lora.py

    * Update fast_lora.py

    * remove patching

    * Update llama.py

    * Update llama.py

    * Update swiglu.py

    * Repatch

    * Update fast_lora.py

    * Update fast_lora.py

    * Update fast_lora.py

    * Update fast_lora.py

    * Update fast_lora.py

    * Update fast_lora.py

    * Update fast_lora.py

    * Update fast_lora.py

    * Update fast_lora.py

    * Update fast_lora.py

    * Update llama.py

    * Update fast_lora.py

    * Update llama.py

    * Update fast_lora.py

    * Update fast_lora.py

    * Update fast_lora.py

    * Update fast_lora.py

    * Update fast_lora.py

    * Update fast_lora.py

    * Update fast_lora.py

    * Update fast_lora.py

    * Update fast_lora.py

    * Update swiglu.py

    * Update fast_lora.py

    * Update swiglu.py

    * Update fast_lora.py

    * Update fast_lora.py

    * Update fast_lora.py

    * Update fast_lora.py

    * Update fast_lora.py

    * Update fast_lora.py

    * Update fast_lora.py

    * Update fast_lora.py

    * Update fast_lora.py

    * Update fast_lora.py

    * Update fast_lora.py

    * Update fast_lora.py

    * Update fast_lora.py

    * Update fast_lora.py

    * Update fast_lora.py

    * Update fast_lora.py

    * Update save.py

    * Update fast_lora.py

    * Update utils.py

    * Update llama.py

    * Update fast_lora.py

    * Update swiglu.py

    * Update save.py

    * Update save.py

    * Update llama.py

    * Update llama.py

    * Update llama.py

    * Update llama.py

    * Revert "Update llama.py"

    This reverts commit a208ec46e012cf470ecefe6268a66358215df7b6.

    * Update llama.py

    * Works?

    * Update pyproject.toml

    * Update fast_lora.py

    * Update fast_lora.py

    * Update fast_lora.py

    * Update fast_lora.py

    * Update fast_lora.py

    * Swiglu

    * Update swiglu.py

    * Update fast_lora.py

    * Update fast_lora.py

    * Update fast_lora.py

    * Update fast_lora.py

    * Update swiglu.py

    * Update fast_lora.py

    * Update fast_lora.py

    * Update fast_lora.py

    * Update fast_lora.py

    * Update fast_lora.py

    * Update fast_lora.py

    * Update fast_lora.py

    * Update fast_lora.py

    * attention_mask

    * Update llama.py

    * Update llama.py

    * labels

    * Update mistral.py

    * Update llama.py

    * attention mask

    * Update save.py

    * Update save.py

commit e2bbd3819e0899e09787a985cd11c08961f09c09
Author: Daniel Han <danielhanchen@gmail.com>
Date:   Sun Jan 28 04:20:06 2024 +1100

    Fix bugs + more accurate Swiglu (#137)

    * faster saving & inference

    * Update llama.py

    * Update save.py

    * Update llama.py

    * Update llama.py

    * Update llama.py

    * Update llama.py

    * Update mistral.py

    * Update llama.py

    * Update llama.py

    * Update llama.py

    * Update llama.py

    * Update llama.py

    * fast inference

    * Update llama.py

    * Update save.py

    * Update llama.py

    * Mistral correct RoPE scaling

    * Max sequence lengths

    * Apache 2

    * fast_linear_forward

    * Update utils.py

    * Update utils.py

    * No print

    * Update utils.py

    * Update utils.py

    * inference

    * Update llama.py

    * Fast inference RoPE

    * Update llama.py

    * Update llama.py

    * RoPE

    * Update llama.py

    * Update llama.py

    * Update llama.py

    * Update llama.py

    * Update llama.py

    * Update llama.py

    * Update llama.py

    * Update llama.py

    * LoRA

    * Fast LoRA saving

    * Update llama.py

    * hidden_states

    * q_len == 1

    * q_len issue

    * Update mistral.py

    * Update mistral.py

    * incorrect inference

    * Update to transformers 4.37

    * Graceful FA2 error + torch 2.1.1

    * Update mapper.py

    * Update pyproject.toml

    * Fix saving and bnb-4bit

    * Update fast_lora.py

    * Update fast_lora.py

    * remove patching

    * Update llama.py

    * Update llama.py

    * Update swiglu.py

    * Repatch

    * Update fast_lora.py

    * Update fast_lora.py

    * Update fast_lora.py

    * Update fast_lora.py

    * Update fast_lora.py

    * Update fast_lora.py

    * Update fast_lora.py

    * Update fast_lora.py

    * Update fast_lora.py

    * Update fast_lora.py

    * Update llama.py

    * Update fast_lora.py

    * Update llama.py

    * Update fast_lora.py

    * Update fast_lora.py

    * Update fast_lora.py

    * Update fast_lora.py

    * Update fast_lora.py

    * Update fast_lora.py

    * Update fast_lora.py

    * Update fast_lora.py

    * Update fast_lora.py

    * Update swiglu.py

    * Update fast_lora.py

    * Update swiglu.py

    * Update fast_lora.py

    * Update fast_lora.py

    * Update fast_lora.py

    * Update fast_lora.py

    * Update fast_lora.py

    * Update fast_lora.py

    * Update fast_lora.py

    * Update fast_lora.py

    * Update fast_lora.py

    * Update fast_lora.py

    * Update fast_lora.py

    * Update fast_lora.py

    * Update fast_lora.py

    * Update fast_lora.py

    * Update fast_lora.py

    * Update fast_lora.py

    * Update save.py

    * Update fast_lora.py

    * Update utils.py

    * Update llama.py

    * Update fast_lora.py

    * Update swiglu.py

    * Update save.py

    * Update save.py

    * Update llama.py

    * Update llama.py

    * Update llama.py

    * Update llama.py

    * Revert "Update llama.py"

    This reverts commit a208ec46e012cf470ecefe6268a66358215df7b6.

    * Update llama.py

    * Works?

    * Update pyproject.toml

    * Update fast_lora.py

    * Update fast_lora.py

    * Update fast_lora.py

    * Update fast_lora.py

    * Update fast_lora.py

    * Swiglu

    * Update swiglu.py

    * Update fast_lora.py

    * Update fast_lora.py

    * Update fast_lora.py

    * Update fast_lora.py

    * Update swiglu.py

    * Update fast_lora.py

    * Update fast_lora.py

    * Update fast_lora.py

    * Update fast_lora.py

    * Update fast_lora.py

    * Update fast_lora.py

    * Update fast_lora.py

    * Update fast_lora.py

    * attention_mask

    * Update llama.py

    * Update llama.py

    * labels

    * Update mistral.py

    * Update llama.py

    * attention mask

commit a81aff286f1e67c82b2a5105679c85866f624629
Author: Daniel Han <danielhanchen@gmail.com>
Date:   Sat Jan 27 04:50:22 2024 +1100

    Inference bug fix (#134)

    * faster saving & inference

    * Update llama.py

    * Update save.py

    * Update llama.py

    * Update llama.py

    * Update llama.py

    * Update llama.py

    * Update mistral.py

    * Update llama.py

    * Update llama.py

    * Update llama.py

    * Update llama.py

    * Update llama.py

    * fast inference

    * Update llama.py

    * Update save.py

    * Update llama.py

    * Mistral correct RoPE scaling

    * Max sequence lengths

    * Apache 2

    * fast_linear_forward

    * Update utils.py

    * Update utils.py

    * No print

    * Update utils.py

    * Update utils.py

    * inference

    * Update llama.py

    * Fast inference RoPE

    * Update llama.py

    * Update llama.py

    * RoPE

    * Update llama.py

    * Update llama.py

    * Update llama.py

    * Update llama.py

    * Update llama.py

    * Update llama.py

    * Update llama.py

    * Update llama.py

    * LoRA

    * Fast LoRA saving

    * Update llama.py

    * hidden_states

    * q_len == 1

    * q_len issue

    * Update mistral.py

    * Update mistral.py

    * incorrect inference

    * Update to transformers 4.37

    * Graceful FA2 error + torch 2.1.1

    * Update mapper.py

    * Update pyproject.toml

    * Fix saving and bnb-4bit

    * Update fast_lora.py

    * Update fast_lora.py

    * remove patching

    * Update llama.py

    * Update llama.py

    * Update swiglu.py

    * Repatch

    * Update fast_lora.py

    * Update fast_lora.py

    * Update fast_lora.py

    * Update fast_lora.py

    * Update fast_lora.py

    * Update fast_lora.py

    * Update fast_lora.py

    * Update fast_lora.py

    * Update fast_lora.py

    * Update fast_lora.py

    * Update llama.py

    * Update fast_lora.py

    * Update llama.py

    * Update fast_lora.py

    * Update fast_lora.py

    * Update fast_lora.py

    * Update fast_lora.py

    * Update fast_lora.py

    * Update fast_lora.py

    * Update fast_lora.py

    * Update fast_lora.py

    * Update fast_lora.py

    * Update swiglu.py

    * Update fast_lora.py

    * Update swiglu.py

    * Update fast_lora.py

    * Update fast_lora.py

    * Update fast_lora.py

    * Update fast_lora.py

    * Update fast_lora.py

    * Update fast_lora.py

    * Update fast_lora.py

    * Update fast_lora.py

    * Update fast_lora.py

    * Update fast_lora.py

    * Update fast_lora.py

    * Update fast_lora.py

    * Update fast_lora.py

    * Update fast_lora.py

    * Update fast_lora.py

    * Update fast_lora.py

    * Update save.py

    * Update fast_lora.py

    * Update utils.py

    * Update llama.py

    * Update fast_lora.py

    * Update swiglu.py

    * Update save.py

    * Update save.py

    * Update llama.py

    * Update llama.py

    * Update llama.py

    * Update llama.py

    * Revert "Update llama.py"

    This reverts commit a208ec46e012cf470ecefe6268a66358215df7b6.

    * Update llama.py

commit 7da0c50f757b6b2d9cbe660ee68d23700f2e2b0d
Author: Daniel Han <danielhanchen@gmail.com>
Date:   Sat Jan 27 04:47:54 2024 +1100

    More bug fixes (#133)

    * faster saving & inference

    * Update llama.py

    * Update save.py

    * Update llama.py

    * Update llama.py

    * Update llama.py

    * Update llama.py

    * Update mistral.py

    * Update llama.py

    * Update llama.py

    * Update llama.py

    * Update llama.py

    * Update llama.py

    * fast inference

    * Update llama.py

    * Update save.py

    * Update llama.py

    * Mistral correct RoPE scaling

    * Max sequence lengths

    * Apache 2

    * fast_linear_forward

    * Update utils.py

    * Update utils.py

    * No print

    * Update utils.py

    * Update utils.py

    * inference

    * Update llama.py

    * Fast inference RoPE

    * Update llama.py

    * Update llama.py

    * RoPE

    * Update llama.py

    * Update llama.py

    * Update llama.py

    * Update llama.py

    * Update llama.py

    * Update llama.py

    * Update llama.py

    * Update llama.py

    * LoRA

    * Fast LoRA saving

    * Update llama.py

    * hidden_states

    * q_len == 1

    * q_len issue

    * Update mistral.py

    * Update mistral.py

    * incorrect inference

    * Update to transformers 4.37

    * Graceful FA2 error + torch 2.1.1

    * Update mapper.py

    * Update pyproject.toml

    * Fix saving and bnb-4bit

    * Update fast_lora.py

    * Update fast_lora.py

    * remove patching

    * Update llama.py

    * Update llama.py

    * Update swiglu.py

    * Repatch

    * Update fast_lora.py

    * Update fast_lora.py

    * Update fast_lora.py

    * Update fast_lora.py

    * Update fast_lora.py

    * Update fast_lora.py

    * Update fast_lora.py

    * Update fast_lora.py

    * Update fast_lora.py

    * Update fast_lora.py

    * Update llama.py

    * Update fast_lora.py

    * Update llama.py

    * Update fast_lora.py

    * Update fast_lora.py

    * Update fast_lora.py

    * Update fast_lora.py

    * Update fast_lora.py

    * Update fast_lora.py

    * Update fast_lora.py

    * Update fast_lora.py

    * Update fast_lora.py

    * Update swiglu.py

    * Update fast_lora.py

    * Update swiglu.py

    * Update fast_lora.py

    * Update fast_lora.py

    * Update fast_lora.py

    * Update fast_lora.py

    * Update fast_lora.py

    * Update fast_lora.py

    * Update fast_lora.py

    * Update fast_lora.py

    * Update fast_lora.py

    * Update fast_lora.py

    * Update fast_lora.py

    * Update fast_lora.py

    * Update fast_lora.py

    * Update fast_lora.py

    * Update fast_lora.py

    * Update fast_lora.py

    * Update save.py

    * Update fast_lora.py

    * Update utils.py

    * Update llama.py

    * Update fast_lora.py

    * Update swiglu.py

    * Update save.py

    * Update save.py

    * Update llama.py

    * Update llama.py

    * Update llama.py

commit 62fae3aa740869db2fe1522ea38b334ef090d5e7
Author: Daniel Han <danielhanchen@gmail.com>
Date:   Fri Jan 26 04:19:17 2024 +1100

    Fix bugs (#129)

    * faster saving & inference

    * Update llama.py

    * Update save.py

    * Update llama.py

    * Update llama.py

    * Update llama.py

    * Update llama.py

    * Update mistral.py

    * Update llama.py

    * Update llama.py

    * Update llama.py

    * Update llama.py

    * Update llama.py

    * fast inference

    * Update llama.py

    * Update save.py

    * Update llama.py

    * Mistral correct RoPE scaling

    * Max sequence lengths

    * Apache 2

    * fast_linear_forward

    * Update utils.py

    * Update utils.py

    * No print

    * Update utils.py

    * Update utils.py

    * inference

    * Update llama.py

    * Fast inference RoPE

    * Update llama.py

    * Update llama.py

    * RoPE

    * Update llama.py

    * Update llama.py

    * Update llama.py

    * Update llama.py

    * Update llama.py

    * Update llama.py

    * Update llama.py

    * Update llama.py

    * LoRA

    * Fast LoRA saving

    * Update llama.py

    * hidden_states

    * q_len == 1

    * q_len issue

    * Update mistral.py

    * Update mistral.py

    * incorrect inference

    * Update to transformers 4.37

    * Graceful FA2 error + torch 2.1.1

    * Update mapper.py

    * Update pyproject.toml

    * Fix saving and bnb-4bit

    * Update fast_lora.py

    * Update fast_lora.py

    * remove patching

    * Update llama.py

    * Update llama.py

    * Update swiglu.py

    * Repatch

    * Update fast_lora.py

commit 04f8771821a57fda5109d60b0fe49bb31d0df15b
Author: Daniel Han <danielhanchen@gmail.com>
Date:   Tue Jan 23 03:55:24 2024 +1100

    2-4x faster native HF inference (#119)

    * faster saving & inference

    * Update llama.py

    * Update save.py

    * Update llama.py

    * Update llama.py

    * Update llama.py

    * Update llama.py

    * Update mistral.py

    * Update llama.py

    * Update llama.py

    * Update llama.py

    * Update llama.py

    * Update llama.py

    * fast inference

    * Update llama.py

    * Update save.py

    * Update llama.py

    * Mistral correct RoPE scaling

    * Max sequence lengths

    * Apache 2

    * fast_linear_forward

    * Update utils.py

    * Update utils.py

    * No print

    * Update utils.py

    * Update utils.py

    * inference

    * Update llama.py

    * Fast inference RoPE

    * Update llama.py

    * Update llama.py

    * RoPE

    * Update llama.py

    * Update llama.py

    * Update llama.py

    * Update llama.py

    * Update llama.py

    * Update llama.py

    * Update llama.py

    * Update llama.py

    * LoRA

    * Fast LoRA saving

commit 3a9b2dee98fd0547789da9b68e765f054484abc4
Author: Daniel Han <danielhanchen@gmail.com>
Date:   Sun Jan 21 22:20:22 2024 +1100

    Hotfix (#118)

    * faster saving & inference

    * Update llama.py

    * Update save.py

    * Update llama.py

    * Update llama.py

    * Update llama.py

    * Update llama.py

    * Update mistral.py

    * Update llama.py

    * Update llama.py

    * Update llama.py

    * Update llama.py

    * Update llama.py

commit a6f4fb007510aeb2a86500d874f2117e81853d7e
Author: Daniel Han-Chen <danielhanchen@gmail.com>
Date:   Sun Jan 21 05:00:37 2024 +1100

    Update save.py

commit 705cac03576fe2fff3923841c102a8bd6b72a65b
Author: Daniel Han-Chen <danielhanchen@gmail.com>
Date:   Sun Jan 21 04:21:54 2024 +1100

    Update save.py

commit 16edcb3be2c328f3377aff6555e6435b28980a52
Author: Daniel Han-Chen <danielhanchen@gmail.com>
Date:   Sun Jan 21 04:13:03 2024 +1100

    Update save.py

commit 3d05a74b12edd39638aacf3b44eca65818c6708a
Author: Daniel Han <danielhanchen@gmail.com>
Date:   Sun Jan 21 03:43:49 2024 +1100

    Fixed saving! (#113)

    * Fix tokenizer, dropout, bias for LoRA

    * Update loader.py

    * Fix LoRA downcasting

    * Update _utils.py

    * Saving to GGUF

    * fix

    * colab_quantize_to_gguf

    * move save modules

    * save module

    * Update __init__.py

    * Update save.py

    * Temp downgrade due to TRL issue

    * Fix up bugs

    * Faster saving + other changes

    * Update llama.py

    * Saving modules

    * spelling

    * Update llama.py

    * Update save.py

    * Update save.py

    * Update loader.py

    * Update llama.py

    * patch saving

    * Update save.py

    * Update save.py

    * Update save.py

    * patch saving

    * Update save.py

    * Update save.py

    * Update save.py

    * Update save.py

    * Update save.py

    * Update save.py

    * Update save.py

    * Update save.py

    * Update save.py

    * Update save.py

    * Update save.py

    * Update save.py

    * Update save.py

    * Update save.py

    * Update save.py

    * original_model

    * Update save.py

    * Update save.py

    * Update save.py

    * Update save.py

    * Update save.py

    * Update save.py

    * Update save.py

    * Update save.py

    * Update save.py

    * Update save.py

    * Update save.py

    * Update save.py

    * Update save.py

    * Update save.py

    * Update save.py

    * Update save.py

    * Update save.py

    * Update save.py

    * Update save.py

    * Update save.py

    * Update save.py

    * Update save.py

    * Update save.py

    * saving to RAM leakage?

    * Update save.py

    * new_save_directory

    * Update save.py

    * Update save.py

    * Update save.py

    * Update save.py

    * Update pyproject.toml

    * Update pyproject.toml

    * Update pyproject.toml

    * Quick fixes

    * Update llama.py

    * Update llama.py

    * Update dpo.py

    * Update dpo.py

    * Update llama.py

    * Update save.py

    * getattr

    * RSLoRA and LoftQ direct support

    * Update llama.py

    * Update llama.py

    * Update llama.py

    * Fix DPO + GGUF

    * Fix quantization_method

    * Fix quantization_config

    * patch model

    * Update llama.py

    * Update llama.py

    * Update llama.py

    * Update save.py

    * Update save.py

    * tokenizer_save_settings

    * Update save.py

    * quantization and loftq

    * Update save.py

    * Update llama.py

    * Update save.py

    * upload_to_huggingface

    * Update save.py

    * Update save.py

    * Update save.py

    * Update save.py

    * Update save.py

    * Update save.py

commit bb05d6b6e2af2c8807ae4842dcbc2805c9356599
Author: Daniel Han <danielhanchen@gmail.com>
Date:   Sat Jan 20 23:23:00 2024 +1100

    Hotfix for Jan 2024 Release (#110)

    * Fix tokenizer, dropout, bias for LoRA

    * Update loader.py

    * Fix LoRA downcasting

    * Update _utils.py

    * Saving to GGUF

    * fix

    * colab_quantize_to_gguf

    * move save modules

    * save module

    * Update __init__.py

    * Update save.py

    * Temp downgrade due to TRL issue

    * Fix up bugs

    * Faster saving + other changes

    * Update llama.py

    * Saving modules

    * spelling

    * Update llama.py

    * Update save.py

    * Update save.py

    * Update loader.py

    * Update llama.py

    * patch saving

    * Update save.py

    * Update save.py

    * Update save.py

    * patch saving

    * Update save.py

    * Update save.py

    * Update save.py

    * Update save.py

    * Update save.py

    * Update save.py

    * Update save.py

    * Update save.py

    * Update save.py

    * Update save.py

    * Update save.py

    * Update save.py

    * Update save.py

    * Update save.py

    * Update save.py

    * original_model

    * Update save.py

    * Update save.py

    * Update save.py

    * Update save.py

    * Update save.py

    * Update save.py

    * Update save.py

    * Update save.py

    * Update save.py

    * Update save.py

    * Update save.py

    * Update save.py

    * Update save.py

    * Update save.py

    * Update save.py

    * Update save.py

    * Update save.py

    * Update save.py

    * Update save.py

    * Update save.py

    * Update save.py

    * Update save.py

    * Update save.py

    * saving to RAM leakage?

    * Update save.py

    * new_save_directory

    * Update save.py

    * Update save.py

    * Update save.py

    * Update save.py

    * Update pyproject.toml

    * Update pyproject.toml

    * Update pyproject.toml

    * Quick fixes

    * Update llama.py

    * Update llama.py

    * Update dpo.py

    * Update dpo.py

    * Update llama.py

    * Update save.py

    * getattr

    * RSLoRA and LoftQ direct support

    * Update llama.py

    * Update llama.py

    * Update llama.py

    * Fix DPO + GGUF

    * Fix quantization_method

    * Fix quantization_config

    * patch model

    * Update llama.py

    * Update llama.py

    * Update llama.py

    * Update save.py

    * Update save.py

    * tokenizer_save_settings

    * Update save.py

    * quantization and loftq

    * Update save.py

    * Update llama.py

    * Update save.py

commit 12e75c93d040f99d5a0cc4c4ee162d804c9fbbf4
Author: Daniel Han <danielhanchen@gmail.com>
Date:   Sat Jan 20 04:25:06 2024 +1100

    Quick fixes (#106)

    * Fix tokenizer, dropout, bias for LoRA

    * Update loader.py

    * Fix LoRA downcasting

    * Update _utils.py

    * Saving to GGUF

    * fix

    * colab_quantize_to_gguf

    * move save modules

    * save module

    * Update __init__.py

    * Update save.py

    * Temp downgrade due to TRL issue

    * Fix up bugs

    * Faster saving + other changes

    * Update llama.py

    * Saving modules

    * spelling

    * Update llama.py

    * Update save.py

    * Update save.py

    * Update loader.py

    * Update llama.py

    * patch saving

    * Update save.py

    * Update save.py

    * Update save.py

    * patch saving

    * Update save.py

    * Update save.py

    * Update save.py

    * Update save.py

    * Update save.py

    * Update save.py

    * Update save.py

    * Update save.py

    * Update save.py

    * Update save.py

    * Update save.py

    * Update save.py

    * Update save.py

    * Update save.py

    * Update save.py

    * original_model

    * Update save.py

    * Update save.py

    * Update save.py

    * Update save.py

    * Update save.py

    * Update save.py

    * Update save.py

    * Update save.py

    * Update save.py

    * Update save.py

    * Update save.py

    * Update save.py

    * Update save.py

    * Update save.py

    * Update save.py

    * Update save.py

    * Update save.py

    * Update save.py

    * Update save.py

    * Update save.py

    * Update save.py

    * Update save.py

    * Update save.py

    * saving to RAM leakage?

    * Update save.py

    * new_save_directory

    * Update save.py

    * Update save.py

    * Update save.py

    * Update save.py

    * Update pyproject.toml

    * Update pyproject.toml

    * Update pyproject.toml

    * Quick fixes

    * Update llama.py

    * Update llama.py

    * Update dpo.py

    * Update dpo.py

    * Update llama.py

    * Update save.py

    * getattr

    * RSLoRA and LoftQ direct support

    * Update llama.py

    * Update llama.py

    * Update llama.py

    * Fix DPO + GGUF

commit 52b5ef31e0cdd96d5b980a1581d3c26c5b89c86c
Author: Daniel Han-Chen <danielhanchen@gmail.com>
Date:   Sat Jan 20 02:30:31 2024 +1100

    Update _utils.py

commit 1a19c38675a35e6121fa4a95438525f306bca26b
Merge: 0a52390 0d6e52b
Author: Daniel Han-Chen <danielhanchen@gmail.com>
Date:   Fri Jan 19 23:15:38 2024 +1100

    Merge branch 'main' of https://github.com/unslothai/unsloth

commit 0a52390ac29a78399b033349070fe1d1280bd296
Author: Daniel Han-Chen <danielhanchen@gmail.com>
Date:   Fri Jan 19 23:15:20 2024 +1100

    Revert quantization methods

commit 0d6e52b5c7723ed5c78b54c9a6eb67a1997f6038
Author: Daniel Han <danielhanchen@gmail.com>
Date:   Fri Jan 19 22:57:22 2024 +1100

    getattr issues (#103)

    * Fix tokenizer, dropout, bias for LoRA

    * Update loader.py

    * Fix LoRA downcasting

    * Update _utils.py

    * Saving to GGUF

    * fix

    * colab_quantize_to_gguf

    * move save modules

    * save module

    * Update __init__.py

    * Update save.py

    * Temp downgrade due to TRL issue

    * Fix up bugs

    * Faster saving + other changes

    * Update llama.py

    * Saving modules

    * spelling

    * Update llama.py

    * Update save.py

    * Update save.py

    * Update loader.py

    * Update llama.py

    * patch saving

    * Update save.py

    * Update save.py

    * Update save.py

    * patch saving

    * Update save.py

    * Update save.py

    * Update save.py

    * Update save.py

    * Update save.py

    * Update save.py

    * Update save.py

    * Update save.py

    * Update save.py

    * Update save.py

    * Update save.py

    * Update save.py

    * Update save.py

    * Update save.py

    * Update save.py

    * original_model

    * Update save.py

    * Update save.py

    * Update save.py

    * Update save.py

    * Update save.py

    * Update save.py

    * Update save.py

    * Update save.py

    * Update save.py

    * Update save.py

    * Update save.py

    * Update save.py

    * Update save.py

    * Update save.py

    * Update save.py

    * Update save.py

    * Update save.py

    * Update save.py

    * Update save.py

    * Update save.py

    * Update save.py

    * Update save.py

    * Update save.py

    * saving to RAM leakage?

    * Update save.py

    * new_save_directory

    * Update save.py

    * Update save.py

    * Update save.py

    * Update save.py

    * Update pyproject.toml

    * Update pyproject.toml

    * Update pyproject.toml

    * Quick fixes

    * Update llama.py

    * Update llama.py

    * Update dpo.py

    * Update dpo.py

    * Update llama.py

    * Update save.py

    * getattr

commit b3fcea642127ee381a3cf19d33fb8910d066643c
Author: Daniel Han <danielhanchen@gmail.com>
Date:   Fri Jan 19 22:52:30 2024 +1100

    Quick fixes (#101)

    * Fix tokenizer, dropout, bias for LoRA

    * Update loader.py

    * Fix LoRA downcasting

    * Update _utils.py

    * Saving to GGUF

    * fix

    * colab_quantize_to_gguf

    * move save modules

    * save module

    * Update __init__.py

    * Update save.py

    * Temp downgrade due to TRL issue

    * Fix up bugs

    * Faster saving + other changes

    * Update llama.py

    * Saving modules

    * spelling

    * Update llama.py

    * Update save.py

    * Update save.py

    * Update loader.py

    * Update llama.py

    * patch saving

    * Update save.py

    * Update save.py

    * Update save.py

    * patch saving

    * Update save.py

    * Update save.py

    * Update save.py

    * Update save.py

    * Update save.py

    * Update save.py

    * Update save.py

    * Update save.py

    * Update save.py

    * Update save.py

    * Update save.py

    * Update save.py

    * Update save.py

    * Update save.py

    * Update save.py

    * original_model

    * Update save.py

    * Update save.py

    * Update save.py

    * Update save.py

    * Update save.py

    * Update save.py

    * Update save.py

    * Update save.py

    * Update save.py

    * Update save.py

    * Update save.py

    * Update save.py

    * Update save.py

    * Update save.py

    * Update save.py

    * Update save.py

    * Update save.py

    * Update save.py

    * Update save.py

    * Update save.py

    * Update save.py

    * Update save.py

    * Update save.py

    * saving to RAM leakage?

    * Update save.py

    * new_save_directory

    * Update save.py

    * Update save.py

    * Update save.py

    * Update save.py

    * Update pyproject.toml

    * Update pyproject.toml

    * Update pyproject.toml

    * Quick fixes

    * Update llama.py

    * Update llama.py

    * Update dpo.py

    * Update dpo.py

    * Update llama.py

    * Update save.py

commit d691516ab9d64ea61b0af277f3955336a434694d
Author: Daniel Han <danielhanchen@gmail.com>
Date:   Fri Jan 19 04:51:19 2024 +1100

    2024 Release (#96)

    * Fix tokenizer, dropout, bias for LoRA

    * Update loader.py

    * Fix LoRA downcasting

    * Update _utils.py

    * Saving to GGUF

    * fix

    * colab_quantize_to_gguf

    * move save modules

    * save module

    * Update __init__.py

    * Update save.py

    * Temp downgrade due to TRL issue

    * Fix up bugs

    * Faster saving + other changes

    * Update llama.py

    * Saving modules

    * spelling

    * Update llama.py

    * Update save.py

    * Update save.py

    * Update loader.py

    * Update llama.py

    * patch saving

    * Update save.py

    * Update save.py

    * Update save.py

    * patch saving

    * Update save.py

    * Update save.py

    * Update save.py

    * Update save.py

    * Update save.py

    * Update save.py

    * Update save.py

    * Update save.py

    * Update save.py

    * Update save.py

    * Update save.py

    * Update save.py

    * Update save.py

    * Update save.py

    * Update save.py

    * original_model

    * Update save.py

    * Update save.py

    * Update save.py

    * Update save.py

    * Update save.py

    * Update save.py

    * Update save.py

    * Update save.py

    * Update save.py

    * Update save.py

    * Update save.py

    * Update save.py

    * Update save.py

    * Update save.py

    * Update save.py

    * Update save.py

    * Update save.py

    * Update save.py

    * Update save.py

    * Update save.py

    * Update save.py

    * Update save.py

    * Update save.py

    * saving to RAM leakage?

    * Update save.py

    * new_save_directory

    * Update save.py

    * Update save.py

    * Update save.py

    * Update save.py

    * Update pyproject.toml

    * Update pyproject.toml

    * Update pyproject.toml

commit 9e2dec16fb29ee97572b4431e892e3f7ca867422
Author: Daniel Han-Chen <danielhanchen@gmail.com>
Date:   Fri Jan 19 03:41:00 2024 +1100

    Update pyproject.toml

commit 396c7245dda2c913e6b97729fd34e7551dc8e9fa
Author: Daniel Han-Chen <danielhanchen@gmail.com>
Date:   Fri Jan 19 03:35:17 2024 +1100

    Update pyproject.toml

commit 738e91591f3fb39ce03238134fd0d82a84f4b2e3
Author: Daniel Han <danielhanchen@gmail.com>
Date:   Thu Jan 11 04:08:03 2024 +1100

    Fix some bugs (#83)

    * Fix tokenizer, dropout, bias for LoRA

    * Update loader.py

    * Fix LoRA downcasting

    * Update _utils.py

    * Saving to GGUF

    * fix

    * colab_quantize_to_gguf

    * move save modules

    * save module

    * Update __init__.py

    * Update save.py

    * Temp downgrade due to TRL issue

    * Fix up bugs

commit a1da50b5ce53f8e57a1b01db607b32f4d0d862e5
Author: Daniel Han <danielhanchen@gmail.com>
Date:   Wed Jan 10 23:10:48 2024 +1100

    Update README.md (#81)

commit 606e8a928440f396601c1d57a003c0401ba26ec0
Author: shimmy <107991372+shimmyshimmer@users.noreply.github.com>
Date:   Wed Jan 10 23:10:23 2024 +1100

    Discord button redo (#80)

commit 0169294ffb19fdb877170529381f25bd0f83fc3c
Author: shimmy <107991372+shimmyshimmer@users.noreply.github.com>
Date:   Wed Jan 10 23:02:20 2024 +1100

    Update logos (#79)

    * HF Perf Button

    * Update README.md

    Adding new buttons cleanup

    * Update README.md

    * Delete images/Discord.png

    * Delete images/try live demo green.png

    * new transparent logos

    * Revamping page

    * Revamp mainpage

    * Update README.md

    * Update README.md

commit b2a8c33430e4a31cf7baafe184d448bb50595bb1
Author: Daniel Han <danielhanchen@gmail.com>
Date:   Wed Jan 10 20:03:01 2024 +1100

    Create FUNDING.yml (#78)

commit c9c1abf29045b3831f62099ff03c5b54b99522a6
Author: Daniel Han-Chen <danielhanchen@gmail.com>
Date:   Wed Jan 10 01:02:44 2024 +1100

    fix_tokenizer

commit 6efffb46e42543986c637690a045092226af5d61
Author: Daniel Han-Chen <danielhanchen@gmail.com>
Date:   Tue Jan 9 23:40:43 2024 +1100

    check_tokenizer

---------

Co-authored-by: Daniel Han <danielhanchen@gmail.com>
danielhanchen added a commit that referenced this pull request May 19, 2026
Three feedback items rolled in:

1. gemini-code-assist (medium): the __IMAGES__ sentinel stripper in
   safetensors_agentic.py used `rsplit("\n__IMAGES__:", 1)`, which
   leaves the marker visible to the model when the sentinel appears at
   the very start of the result (no leading newline) and when multiple
   sentinels follow each other back-to-back. Switched to a
   `split("__IMAGES__:", 1)[0].rstrip()` cut so the first occurrence
   truncates the entire image block. Two new tests pin both edge
   cases: leading sentinel and consecutive sentinels.

2. gemini-code-assist (medium): `_detect_safetensors_features` had a
   bare `except Exception: pass` around the gpt-oss override probe.
   Replaced with `logger.debug(..., exc_info=True)` so unexpected
   classifier failures are at least visible in the structured log.

3. CodeQL py/stack-trace-exposure (alerts #95 and #96, CWE-209): the
   safetensors tool stream and non-streaming tool completion paths
   passed `_friendly_error(e)` into the SSE/JSON error response. The
   helper itself never leaks a raw traceback, but with `tb` and `e` in
   the same scope CodeQL flags the taint sink. Tightened both
   handlers to log the exception server-side (logger.exception) and
   emit a constant "An internal error occurred." string over the
   wire. The GGUF tool stream handler is left as-is because it talks
   to a managed llama-server with a known error surface that
   `_friendly_error` already classifies safely.
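
The sentinel cut from item 1 and the error handling from item 3 can be sketched as below. This is a minimal illustration, not the code in safetensors_agentic.py: the function names `strip_images_old`, `strip_images_new`, and `run_tool_safely`, the logger name, and the dict-shaped response are all assumptions made for the example. Only the two split expressions and the constant error string come from the commit message.

```python
import logging

# Hypothetical logger name, chosen for this sketch only.
logger = logging.getLogger("tool_loop_sketch")

SENTINEL = "__IMAGES__:"


def strip_images_old(result: str) -> str:
    """Previous behaviour: cut at the last newline-prefixed sentinel."""
    return result.rsplit("\n" + SENTINEL, 1)[0]


def strip_images_new(result: str) -> str:
    """New behaviour: cut at the first sentinel, wherever it appears."""
    return result.split(SENTINEL, 1)[0].rstrip()


def run_tool_safely(tool, *args):
    """Run a tool; log failures server-side and return a constant message.

    The dict shape is an assumption standing in for the SSE/JSON response.
    """
    try:
        return {"result": tool(*args)}
    except Exception:
        # The full traceback goes to the server log only.
        logger.exception("tool call failed")
        return {"error": "An internal error occurred."}


leading = "__IMAGES__:a.png"
consecutive = "done\n__IMAGES__:a.png\n__IMAGES__:b.png"

# The old cut leaves the marker visible in both edge cases.
print(SENTINEL in strip_images_old(leading))      # True
print(SENTINEL in strip_images_old(consecutive))  # True

# The new cut removes the whole image block.
print(repr(strip_images_new(leading)))      # ''
print(repr(strip_images_new(consecutive)))  # 'done'

print(run_tool_safely(lambda: 1 / 0))  # {'error': 'An internal error occurred.'}
```

Cutting at the first occurrence does mean a literal `__IMAGES__:` inside ordinary tool output would also truncate the result; the sketch assumes the sentinel never appears there.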

Tests: 43 tool-loop + 11 capability-advertise + 190 adjacent
regression tests all green locally.

Also merged origin/main (47 commits) so the branch ships against the
current main rather than its base SHA from a week ago.