bug fix #2008 unsloth issue - load_in_4bit = True + fast_inference = True #79
  quantization_config = getattr(config, "quantization_config", {})
  kwargs = dict()
- if quantization_config != {}:
+ if quantization_config != {} or bnb_config:
What if there's no quantization involved and we're doing BF16 LoRA or something? `compute_dtype` will still not be declared... Might want to check that?

@Datta0 This doesn't change the flow for non-quantized cases. It only comes into effect when loading custom models that are BF16 on Hugging Face while the user sets `load_in_4bit` and `fast_inference` to True.

This doesn't change the flow for non-quantized cases, but that is where the problem lies. `compute_dtype` would be an undeclared variable that is referenced a few lines later. I'm saying maybe we should set `compute_dtype` outside the `if quantization ...` block so that it is always available.

I see what you mean, yes. Regardless of the type of quantization config, `compute_dtype` is always set to the same dtype value passed to this method. I'll update this accordingly. Thanks for the catch!

LGTM! Thanks for the changes :)
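
To make the scoping concern in this thread concrete, here is a minimal sketch of the two variants discussed. It is not the actual code from the PR: the function names `load_kwargs_buggy` and `load_kwargs_fixed`, the `"quantized"` key, and the use of a plain string for the dtype are all hypothetical stand-ins chosen so the example runs without any ML libraries. Only the `getattr(config, "quantization_config", {})` lookup, the `kwargs = dict()` line, and the `if quantization_config != {} or bnb_config:` condition come from the diff above.

```python
from types import SimpleNamespace


def load_kwargs_buggy(config, dtype, bnb_config=None):
    """Hypothetical variant: compute_dtype is only bound inside the branch."""
    quantization_config = getattr(config, "quantization_config", {})
    kwargs = dict()
    if quantization_config != {} or bnb_config:
        compute_dtype = dtype  # only bound on the quantized path
        kwargs["quantized"] = True  # placeholder for quant-specific setup
    kwargs["compute_dtype"] = compute_dtype  # fails if the branch was skipped
    return kwargs


def load_kwargs_fixed(config, dtype, bnb_config=None):
    """Hypothetical variant: compute_dtype is bound before the branch."""
    quantization_config = getattr(config, "quantization_config", {})
    kwargs = dict()
    compute_dtype = dtype  # bound on every path
    if quantization_config != {} or bnb_config:
        kwargs["quantized"] = True  # placeholder for quant-specific setup
    kwargs["compute_dtype"] = compute_dtype
    return kwargs


# A BF16 checkpoint: its config carries no quantization_config attribute.
bf16_config = SimpleNamespace()

try:
    load_kwargs_buggy(bf16_config, "bfloat16")
    buggy_error = None
except UnboundLocalError as exc:
    buggy_error = type(exc).__name__

# Non-quantized path (e.g. BF16 LoRA): branch skipped, dtype still available.
fixed_plain = load_kwargs_fixed(bf16_config, "bfloat16")

# BF16 checkpoint with a 4-bit request: bnb_config alone triggers the branch.
fixed_4bit = load_kwargs_fixed(
    bf16_config, "bfloat16", bnb_config={"load_in_4bit": True}
)
```

In this sketch the non-quantized call fails in the first variant only because the assignment sits inside the `if`. Moving it above the branch leaves the quantized path unchanged and makes the name available everywhere, which is the change the reviewer asked for.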

Thank you for the PR. We will review it.

Confirmed working for me.

maybe replace with

Thanks, appreciate it! I'll add this to nightly and push a mini release later today!

* Update compiler.py * debugging * remove debugging * num items in batch * Update compiler.py * Update compiler.py * Update compiler.py * Update compiler.py * Update compiler.py * Update compiler.py * Update compiler.py * Update compiler.py * Update compiler.py * Update compiler.py * Update compiler.py * Update compiler.py * Update compiler.py * Update compiler.py * Update compiler.py * Update compiler.py * Update compiler.py * Update compiler.py * Update compiler.py * Update compiler.py * Update compiler.py * Update compiler.py * Update compiler.py * Update compiler.py * Update compiler.py * logs * Update patching_utils.py * VLM attention mask * Update loss_utils.py * Update loss_utils.py * Update loss_utils.py * Update loss_utils.py * Update loss_utils.py * Update loss_utils.py * Recheck * Update compiler.py * Update patching_utils.py * Update patching_utils.py * Update patching_utils.py * Update patching_utils.py * Update compiler.py * Update patching_utils.py * suppress errors * Update compiler.py * Update patching_utils.py * Update compiler.py * Update patching_utils.py * Update patching_utils.py * Update patching_utils.py * Update peft_utils.py * Update compiler.py * Update loss_utils.py * Update loss_utils.py * Update compiler.py * Update compiler.py * Update compiler.py * Update compiler.py * Update compiler.py * Update compiler.py * Update compiler.py * bug fixes * Update compiler.py * Update compiler.py * Update vision_utils.py * Update compiler.py * Update loss_utils.py * Update loss_utils.py * Update loss_utils.py * Update loss_utils.py * Bug fixes * Update dataset_utils.py * Update dataset_utils.py * Update dataset_utils.py * Update dataset_utils.py * Update dataset_utils.py * Update compiler.py * Update compiler.py * Update compiler.py * Update compiler.py * Update loss_utils.py * Update loss_utils.py * gpu_memory_utilization * Update temporary_patches.py * Update vision_utils.py * Update vision_utils.py * Update vision_utils.py * Update 
vision_utils.py * Update vision_utils.py * Update vision_utils.py * Update vision_utils.py * Update vision_utils.py * train on completions VLMs * Update dataset_utils.py * Update dataset_utils.py * Update dataset_utils.py * Update dataset_utils.py * VLM train only on completions * Update loss_utils.py * Update dataset_utils.py * Update compiler.py * Update compiler.py * Update compiler.py * Update compiler.py * Update compiler.py * Update compiler.py * Update compiler.py * Update saving_utils.py * Update llama_cpp.py * Update llama_cpp.py * Update saving_utils.py * Update saving_utils.py * Update __init__.py * Update compiler.py * Update loss_utils.py * Update compiler.py * Update loss_utils.py * Update loss_utils.py * Update llama_cpp.py * Update loss_utils.py * Update compiler.py * Update llama_cpp.py * Update compiler.py * Update vllm_utils.py * Update rl_replacements.py * Update rl_replacements.py * Update rl_replacements.py * Update rl_replacements.py * Update rl_replacements.py * Update rl_replacements.py * Update training_utils.py * Update dataset_utils.py * Update dataset_utils.py * Revert "Update dataset_utils.py" This reverts commit 3b690ad. 
* Update temporary_patches.py * Update temporary_patches.py * Update temporary_patches.py * Update temporary_patches.py * Update temporary_patches.py * Update temporary_patches.py * Update compiler.py * Update compiler.py * Remove prints * Update compiler.py * Update saving_utils.py * Update temporary_patches.py * Update __init__.py * Update pyproject.toml * Update vllm_utils.py * bug fix #2008 unsloth issue - load_in_4bit = True + fast_inference = True (#79) * bug fix #2008 unsloth * non-quant dtype fix * Update vllm_utils.py --------- Co-authored-by: Daniel Han <danielhanchen@gmail.com> * Update dataset_utils.py * Update compiler.py * Update temporary_patches.py * Gemma 3 fixes * Update temporary_patches.py * Update compiler.py * Update compiler.py * Gemma 3 fixes * Update patching_utils.py * Update compiler.py * Update compiler.py * Update patching_utils.py * Update temporary_patches.py * Update compiler.py * Update compiler.py * Update temporary_patches.py * Update temporary_patches.py * Update temporary_patches.py * Update temporary_patches.py * Update temporary_patches.py * Update temporary_patches.py * Update temporary_patches.py * Update temporary_patches.py * Update compiler.py * Update compiler.py * Update compiler.py * Update compiler.py * Update compiler.py * Update compiler.py * Update compiler.py * Update compiler.py * Update compiler.py * Update compiler.py * Update compiler.py * Update compiler.py * Update compiler.py * Update compiler.py * Update compiler.py * Update compiler.py * Update compiler.py * Update compiler.py * Update compiler.py * compiler * Update gradient_checkpointing.py * Update temporary_patches.py * Update temporary_patches.py * Update temporary_patches.py * Update temporary_patches.py * Update temporary_patches.py * Update temporary_patches.py * Update temporary_patches.py * Update temporary_patches.py * Update temporary_patches.py * Update temporary_patches.py * Update temporary_patches.py * Update temporary_patches.py * Update 
temporary_patches.py * Update temporary_patches.py * Update temporary_patches.py * Update temporary_patches.py * Update temporary_patches.py * Update temporary_patches.py * Update temporary_patches.py * causal mask dtype * Fix checkpoint and save from local file (#74) * Enhance gradient checkpointing and add original model ID retrieval in saving utilities * In case adapter_config.json as well * Update patching_utils.py * Update patching_utils.py * Update temporary_patches.py * Update temporary_patches.py * Update compiler.py * Update loss_utils.py * Update compiler.py * Update vllm_utils.py * Update compiler.py * Update peft_utils.py * Update rl_replacements.py * Update vllm_utils.py * Update temporary_patches.py * Update temporary_patches.py * Update temporary_patches.py * Update temporary_patches.py * Update temporary_patches.py * Update temporary_patches.py * Update temporary_patches.py * Update temporary_patches.py * Update temporary_patches.py * Update temporary_patches.py * Update temporary_patches.py * Update temporary_patches.py * Update temporary_patches.py * Update temporary_patches.py --------- Co-authored-by: Mukkesh Ganesh <mukmckenzie@gmail.com> Co-authored-by: Edd <68678137+Erland366@users.noreply.github.com>
* Update vision_utils.py * Fixes to support IterableDataset (#98) * Support Iterable Datasets * Update dataset_utils.py * Update dataset_utils.py * Update dataset_utils.py * Update dataset_utils.py * Preserve batch size from iterable dataset * Preserve batch size from iterable dataset * Support train_on_response_only with IterableDataset * Support train_on_response_only with IterableDataset * Support train_on_response_only with IterableDataset * Support train_on_response_only with IterableDataset * Update vllm_utils.py * Create vllm_rlhf_utils.py * Update vllm_rlhf_utils.py * Update vllm_rlhf_utils.py * Update vllm_rlhf_utils.py * Update vllm_rlhf_utils.py * Update vllm_rlhf_utils.py * Update vllm_rlhf_utils.py * Update vllm_rlhf_utils.py * Update vllm_rlhf_utils.py * Update vllm_rlhf_utils.py * vLLM for Qwen 3 * Update vllm_utils.py * Update vllm_utils.py --------- Co-authored-by: Mukkesh Ganesh <mukmckenzie@gmail.com> Co-authored-by: Edd <68678137+Erland366@users.noreply.github.com> Co-authored-by: Brad Hilton <brad.hilton.nw@gmail.com> Co-authored-by: SpaceHunter <30568250+SpaceHunterInf@users.noreply.github.com> Co-authored-by: Xiaochen Zhu <xz479@cl.cam.ac.uk> Co-authored-by: Roland Tannous <rolandtannous@gonovel.co> Co-authored-by: DoubleMathew <mmathew23@gmail.com> Co-authored-by: Michael Han <107991372+shimmyshimmer@users.noreply.github.com> Co-authored-by: Qian Wu <121997440+5k5000@users.noreply.github.com> Co-authored-by: marcandrelarochelle <marcandrelarochelle1820@gmail.com>
This PR addresses unsloth issue #2008: https://github.com/unslothai/unsloth/issues/2008
While the previous release fixes the vLLM component of issue unslothai/unsloth#2008, the process still errors out for custom models because the on-the-fly bnb_config is not passed to the convert_vllm_to_huggingface method in unsloth_zoo's vllm_utils.py.
This PR modifies vllm_utils.py to accept the bnb_config generated on the fly and pass it to the convert_vllm_to_huggingface method, where it is parsed for quantization settings.
I have chosen not to bundle it with the model config, since custom models might also have their own bnb configs if they are already 4-bit quantized. Hence the if and elif for parsing the quantization_config and the generated bnb_config.
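To illustrate the branching described above, here is a minimal sketch, not the actual code in vllm_utils.py. The helper name `resolve_quantization_kwargs`, the returned dictionary layout, and the use of plain dicts and string dtypes are assumptions made for illustration; only the `bnb_4bit_quant_type` and `bnb_4bit_use_double_quant` keys are taken from the standard bitsandbytes configuration. It also sets compute_dtype outside the branches, as suggested in the review, so the value exists for non-quantized loads too.

```python
from types import SimpleNamespace


def resolve_quantization_kwargs(config, dtype, bnb_config=None):
    """Hypothetical helper: decide which 4-bit settings to parse."""
    # Set outside the branches so it is always defined, including for
    # non-quantized loads such as BF16 LoRA.
    compute_dtype = dtype

    # Settings stored in the model's own config, if it ships quantized.
    quantization_config = getattr(config, "quantization_config", None) or {}

    if quantization_config:
        # The model's own quantization config is used when present.
        source, settings = "model_config", quantization_config
    elif bnb_config:
        # Otherwise fall back to the config generated on the fly.
        source, settings = "on_the_fly", bnb_config
    else:
        # No quantization involved at all.
        source, settings = None, {}

    kwargs = {}
    if settings:
        kwargs["quant_type"] = settings.get("bnb_4bit_quant_type")
        kwargs["compress_statistics"] = settings.get("bnb_4bit_use_double_quant")

    return {"source": source, "compute_dtype": compute_dtype, "kwargs": kwargs}


# Illustrative inputs (assumed shapes, not real model configs).
prequantized = SimpleNamespace(
    quantization_config={
        "bnb_4bit_quant_type": "nf4",
        "bnb_4bit_use_double_quant": True,
    }
)
bf16_model = SimpleNamespace()
on_the_fly = {"bnb_4bit_quant_type": "nf4", "bnb_4bit_use_double_quant": True}

print(resolve_quantization_kwargs(bf16_model, "bfloat16", on_the_fly)["source"])
```

Keeping the two sources separate, rather than merging the generated bnb_config into the model config, means a pre-quantized model's settings are never overwritten by the generated ones.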
NOTE: This PR needs to be merged together with unslothai/unsloth#2039 in unsloth, where llama.py is edited to handle this additional configuration.
Code Snippet:
Output before fix:
Output after fix: