[DOCS] Lora without regret #4181

Merged: kashif merged 24 commits into huggingface:main from burtenshaw:lora-without-regret on Oct 3, 2025

Conversation

@burtenshaw burtenshaw commented Sep 30, 2025

Collaborator

This is a draft PR for a docs page on implementing the blog post 'LoRA Without Regret' in TRL.

@edbeeching is going to review and share a script.
@sergiopaniego

Example with SFT:

hf jobs uv run \
    --flavor a100-large \
    --timeout 8h \
    --secrets HF_TOKEN \
    "https://gist.githubusercontent.com/burtenshaw/fce24305833f2ecacfe8da181901d345/raw/sft_lora.py" \
    --model_name_or_path Qwen/Qwen2.5-3B-Instruct \
    --dataset_name open-thoughts/OpenThoughts-114k \
    --learning_rate 2.0e-5 \
    --num_train_epochs 1 \
    --packing \
    --per_device_train_batch_size 2 \
    --gradient_accumulation_steps 16 \
    --gradient_checkpointing \
    --eval_strategy no \
    --use_peft \
    --lora_r 256 \
    --lora_alpha 16 \
    --lora_target_modules all-linear \
    --output_dir Qwen2.5-3B-OpenThoughts-LoRA \
    --report_to trackio \
    --push_to_hub
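
As a rough sanity check on the SFT command above, the following sketch works out two quantities implied by its flags: the effective batch size per optimizer step and the LoRA scaling factor. It assumes a single GPU (`num_devices=1` is an assumption; the `a100-large` flavor is not described in this thread) and the standard LoRA scaling `lora_alpha / r`, which applies when rank-stabilized scaling is not enabled. The helper names are hypothetical.

```python
def effective_batch_size(per_device: int, grad_accum: int, num_devices: int = 1) -> int:
    """Number of sequences that contribute to one optimizer step."""
    return per_device * grad_accum * num_devices


def lora_scaling(lora_alpha: float, r: int) -> float:
    """Standard LoRA scaling applied to the low-rank update (alpha / r)."""
    return lora_alpha / r


# Values taken from the SFT command above; num_devices=1 is an assumption.
sft_batch = effective_batch_size(per_device=2, grad_accum=16, num_devices=1)
sft_scale = lora_scaling(lora_alpha=16, r=256)

print(sft_batch)  # 32
print(sft_scale)  # 0.0625
```

With `--packing`, each of those 32 items is a packed sequence rather than a single training example, so the number of examples per step can be higher.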

Example with GRPO:

hf jobs uv run \
    --flavor a100-large \
    --timeout 6h \
    --secrets HF_TOKEN \
    "https://gist.githubusercontent.com/burtenshaw/f3fd519cb7efd647254c60b6b904cbcb/raw/c688abe1a9487090bb931b51ecec12c6737cdc52/grpo_lora.py" \
    --model_name_or_path Qwen/Qwen2.5-VL-3B-Instruct \
    --output_dir grpo-Qwen2.5-VL-3B-Instruct-LoRA \
    --learning_rate 1e-5 \
    --gradient_checkpointing \
    --torch_dtype bfloat16 \
    --max_prompt_length 2048 \
    --max_completion_length 1024 \
    --use_vllm \
    --vllm_mode colocate \
    --use_peft \
    --lora_r 16 \
    --lora_alpha 16 \
    --lora_target_modules all-linear \
    --log_completions \
    --report_to trackio \
    --push_to_hub
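
The two commands use very different ranks (`--lora_r 256` for SFT, `--lora_r 16` for GRPO). A minimal sketch of what that means per adapted linear layer: LoRA adds a matrix A of shape (r, d_in) and a matrix B of shape (d_out, r), so r * (d_in + d_out) trainable parameters. The 2048 x 2048 layer below is a made-up illustration, not the dimensions of either Qwen model, and the helper name is hypothetical.

```python
def lora_params(d_in: int, d_out: int, r: int) -> int:
    """Trainable parameters LoRA adds to one linear layer: A (r x d_in) plus B (d_out x r)."""
    return r * (d_in + d_out)


# Hypothetical 2048 x 2048 projection, for illustration only.
d_in, d_out = 2048, 2048
full = d_in * d_out

for r in (16, 256):
    added = lora_params(d_in, d_out, r)
    print(f"r={r}: {added} LoRA params, {added / full:.1%} of the full weight")
# r=16: 65536 LoRA params, 1.6% of the full weight
# r=256: 1048576 LoRA params, 25.0% of the full weight
```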

@HuggingFaceDocBuilderDev

The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.

@sergiopaniego sergiopaniego left a comment

Member

First pass, will go back after finishing the referenced blog

Comment thread docs/source/_toctree.yml Outdated
Comment thread docs/source/lora_without_regret.md Outdated
```bash
# TODO: local command
```

To run th script locally, you will need to have `uv` installed. Check out the [uv documentation](https://docs.astral.sh/uv/) for more details.

Member

Suggested change:
- To run th script locally, you will need to have `uv` installed. Check out the [uv documentation](https://docs.astral.sh/uv/) for more details.
+ To run the script locally, you will need to have `uv` installed. Check out the [uv documentation](https://docs.astral.sh/uv/) for more details.

Member

I don't think we need uv to run a local script.

Collaborator Author

I was going to use a custom uv-based script. I'll use the standard TRL scripts instead.

Comment thread docs/source/lora_without_regret.md Outdated

@sergiopaniego sergiopaniego left a comment

Member

Thanks for the development @burtenshaw !! 🙌
Adding some more comments. Maybe we could add pointers to the blogs for each key finding.

Comment thread docs/source/lora_without_regret.md
burtenshaw and others added 6 commits October 2, 2025 12:59
Co-authored-by: Quentin Gallouédec <45557362+qgallouedec@users.noreply.github.com>
Co-authored-by: sergiopaniego <sergiopaniegoblanco@gmail.com>
Comment thread docs/source/lora_without_regret.md

@sergiopaniego sergiopaniego left a comment

Member

awesome!!! just a few ideas and we're good to go :)

Comment thread docs/source/_toctree.yml Outdated
Comment thread docs/source/lora_without_regret.md Outdated
Comment thread docs/source/lora_without_regret.md
```bash
uv run "https://huggingface.co/datasets/burtenshaw/lora-without-regrets/resolve/main/grpo.py" \
    --model_name_or_path Qwen/Qwen3-0.6B \
```

Member

By default this model operates in "think" mode and thus produces many more tokens than the 4096 you've allocated. The best thing to do would be to copy the dataset (or make a subset) with a chat_template_kwargs column that has {"enable_thinking": false} if you want to only optimise the non-reasoning mode.

Alternatively you could pick a model like Gemma3 which doesn't reason.
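
One way to read the suggestion above, sketched on plain Python dictionaries so it runs without any dependencies: take a subset of the rows and attach a `chat_template_kwargs` field that disables thinking. The function name, the `limit` parameter, and the sample rows are hypothetical. Whether the trainer forwards `chat_template_kwargs` to the chat template should be checked against the TRL version in use.

```python
from typing import Any, Optional


def with_thinking_disabled(
    rows: list[dict[str, Any]], limit: Optional[int] = None
) -> list[dict[str, Any]]:
    """Return a copy of (a subset of) rows with thinking disabled per example."""
    subset = rows if limit is None else rows[:limit]
    # Build new dicts so the original rows are left untouched.
    return [
        {**row, "chat_template_kwargs": {"enable_thinking": False}}
        for row in subset
    ]


# Hypothetical sample rows standing in for the real dataset.
rows = [
    {"problem": "What is 2 + 2?"},
    {"problem": "Factor x^2 - 1."},
    {"problem": "Solve 3x = 9."},
]
subset = with_thinking_disabled(rows, limit=2)

print(len(subset))                        # 2
print(subset[0]["chat_template_kwargs"])  # {'enable_thinking': False}
```

With the Datasets library, the same per-row dictionary could instead be returned from a function passed to `Dataset.map`.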

@burtenshaw burtenshaw Oct 2, 2025

Collaborator Author

My mistake. The script and model choice don't align. In the SmolLM3 and reasoning script I do:

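def make_conversation(example):
    # Build the single-turn prompt from the dataset's "problem" field.
    prompt = [{"role": "user", "content": example["problem"]}]
    # Return the new column explicitly so it does not rely on in-place mutation of `example`.
    return {"prompt": prompt, "chat_template_kwargs": {"enable_thinking": False}}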

I'll update the script now on the hub: https://huggingface.co/datasets/burtenshaw/lora-without-regrets/blob/main/grpo.py

Comment thread docs/source/lora_without_regret.md Outdated

@sergiopaniego sergiopaniego left a comment

Member

btw there is https://huggingface.co/datasets/trl-lib/documentation-images in case you want to use it.

Co-authored-by: lewtun <lewis.c.tunstall@gmail.com>
Comment thread docs/source/lora_without_regret.md Outdated
Co-authored-by: Kashif Rasul <kashif.rasul@gmail.com>
Comment thread docs/source/lora_without_regret.md Outdated
burtenshaw and others added 5 commits October 3, 2025 05:57

@qgallouedec qgallouedec left a comment

Member

LGTM with a few minor suggestions.

Feel free to merge even if you don't apply all the suggestions; we can still refine later :)

Comment thread docs/source/_toctree.yml Outdated
Comment thread docs/source/lora_without_regret.md Outdated
burtenshaw and others added 3 commits October 3, 2025 19:52
Co-authored-by: Quentin Gallouédec <45557362+qgallouedec@users.noreply.github.com>
@burtenshaw

Collaborator Author

@qgallouedec Thanks for the review. I've responded but you'll need to merge.

@kashif kashif merged commit 1eff7da into huggingface:main Oct 3, 2025
1 check passed
qgallouedec added a commit that referenced this pull request Oct 6, 2025
Co-authored-by: Quentin Gallouédec <45557362+qgallouedec@users.noreply.github.com>
Co-authored-by: sergiopaniego <sergiopaniegoblanco@gmail.com>
Co-authored-by: lewtun <lewis.c.tunstall@gmail.com>
Co-authored-by: Kashif Rasul <kashif.rasul@gmail.com>