[DOCS] Lora without regret by burtenshaw · Pull Request #4181 · huggingface/trl

burtenshaw · 2025-09-30T14:27:31Z

This is a draft PR for a docs page that implements the blog post 'LoRA Without Regret' in TRL.

@edbeeching is going to review and share a script.
@sergiopaniego

example with sft

hf jobs uv run \
    --flavor a100-large \
    --timeout 8h \
    --secrets HF_TOKEN \
    "https://gist.githubusercontent.com/burtenshaw/fce24305833f2ecacfe8da181901d345/raw/sft_lora.py" \
    --model_name_or_path Qwen/Qwen2.5-3B-Instruct \
    --dataset_name open-thoughts/OpenThoughts-114k \
    --learning_rate 2.0e-5 \
    --num_train_epochs 1 \
    --packing \
    --per_device_train_batch_size 2 \
    --gradient_accumulation_steps 16 \
    --gradient_checkpointing \
    --eval_strategy no \
    --use_peft \
    --lora_r 256 \
    --lora_alpha 16 \
    --lora_target_modules all-linear \
    --output_dir Qwen2.5-3B-OpenThoughts-LoRA \
    --report_to trackio \
    --push_to_hub
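The SFT flags imply two derived quantities that are easy to misread from the command alone: the effective batch size per optimizer step and the scale applied to the LoRA update. The sketch below computes both. It assumes a single GPU on the `a100-large` flavor and PEFT's default (non-rsLoRA) scaling of `lora_alpha / r`. The helper names are hypothetical.

```python
import math


def effective_batch_size(per_device_batch: int, grad_accum_steps: int, num_devices: int = 1) -> int:
    """Sequences contributing to each optimizer step (hypothetical helper)."""
    return per_device_batch * grad_accum_steps * num_devices


def lora_scaling(lora_alpha: float, r: int, use_rslora: bool = False) -> float:
    """Factor applied to the LoRA update: alpha / r by default, alpha / sqrt(r) with rsLoRA."""
    return lora_alpha / math.sqrt(r) if use_rslora else lora_alpha / r


# Values taken from the SFT command above; a single device is an assumption.
sft_batch = effective_batch_size(per_device_batch=2, grad_accum_steps=16)
sft_scale = lora_scaling(lora_alpha=16, r=256)

print(f"effective batch size: {sft_batch}")        # 32
print(f"LoRA scaling (alpha / r): {sft_scale}")    # 0.0625
```

With `--packing`, each of those sequences is a packed block that may contain several examples, so the number of examples seen per step can be higher than the batch size suggests.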

example with grpo

hf jobs uv run \
    --flavor a100-large \
    --timeout 6h \
    --secrets HF_TOKEN \
    "https://gist.githubusercontent.com/burtenshaw/f3fd519cb7efd647254c60b6b904cbcb/raw/c688abe1a9487090bb931b51ecec12c6737cdc52/grpo_lora.py" \
    --model_name_or_path Qwen/Qwen2.5-VL-3B-Instruct \
    --output_dir grpo-Qwen2.5-VL-3B-Instruct-LoRA \
    --learning_rate 1e-5 \
    --gradient_checkpointing \
    --torch_dtype bfloat16 \
    --max_prompt_length 2048 \
    --max_completion_length 1024 \
    --use_vllm \
    --vllm_mode colocate \
    --use_peft \
    --lora_r 16 \
    --lora_alpha 16 \
    --lora_target_modules all-linear \
    --log_completions \
    --report_to trackio \
    --push_to_hub
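The two commands use very different adapter ranks (`--lora_r 256` for SFT, `--lora_r 16` for GRPO). A LoRA adapter on a linear layer with weight shape `d_out × d_in` adds `r * (d_in + d_out)` trainable parameters, from matrices `A` (`r × d_in`) and `B` (`d_out × r`). The sketch below compares both ranks on a hypothetical 2048 × 2048 projection. The layer size is an illustrative assumption and is not taken from either model's config.

```python
def lora_param_count(d_in: int, d_out: int, r: int) -> int:
    """Trainable parameters LoRA adds to one linear layer: A is (r, d_in), B is (d_out, r)."""
    return r * d_in + d_out * r


# Hypothetical square projection; substitute real shapes from a model's config.
D_IN = D_OUT = 2048
full = D_IN * D_OUT  # parameters in the frozen base weight

for rank in (16, 256):  # --lora_r values from the GRPO and SFT commands
    added = lora_param_count(D_IN, D_OUT, rank)
    print(f"r={rank:>3}: {added:>9,} adapter params ({added / full:.2%} of the frozen weight)")
```

With `--lora_target_modules all-linear`, PEFT attaches an adapter to every linear layer except the output head, so this per-layer count is multiplied across all targeted modules.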

HuggingFaceDocBuilderDev · 2025-09-30T14:31:45Z

The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.

sergiopaniego

First pass, will go back after finishing the referenced blog

sergiopaniego · 2025-09-30T14:49:57Z

+# TODO: local command
+```
+
+To run th script locally, you will need to have `uv` installed. Check out the [uv documentation](https://docs.astral.sh/uv/) for more details.


Suggested change

To run th script locally, you will need to have `uv` installed. Check out the [uv documentation](https://docs.astral.sh/uv/) for more details.

To run the script locally, you will need to have `uv` installed. Check out the [uv documentation](https://docs.astral.sh/uv/) for more details.

I don't think we need uv to run the script locally.

I was going to use a custom uv based script. I'll use the standard trl scripts instead.

sergiopaniego

Thanks for the development @burtenshaw !! 🙌
Adding some more comments. Maybe we could add pointers to the blogs for each key finding.

Co-authored-by: Quentin Gallouédec <45557362+qgallouedec@users.noreply.github.com> Co-authored-by: sergiopaniego <sergiopaniegoblanco@gmail.com>

sergiopaniego

awesome!!! just a few ideas and we're good to go :)

lewtun · 2025-10-02T16:30:38Z

+```bash
+
+uv run "https://huggingface.co/datasets/burtenshaw/lora-without-regrets/resolve/main/grpo.py" \
+    --model_name_or_path Qwen/Qwen3-0.6B \


By default this model operates in "think" mode and thus produces many more tokens than the 4096 you've allocated. The best thing to do would be to copy the dataset (or make a subset) with a `chat_template_kwargs` column that has `{"enable_thinking": false}` if you want to only optimise the non-reasoning mode.

Alternatively, you could pick a model like Gemma 3, which doesn't reason.
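The first suggestion above, copying the dataset (or a subset) with an extra `chat_template_kwargs` column, can be sketched in plain Python. This is a minimal illustration over a list of dicts, assuming rows with a `problem` field as in the `make_conversation` snippet; with the `datasets` library the same per-row logic would typically go through `Dataset.map`. The helper name and sample rows are hypothetical.

```python
from typing import Any, Optional


def with_non_thinking_kwargs(rows: list[dict[str, Any]], limit: Optional[int] = None) -> list[dict[str, Any]]:
    """Return a copy (optionally the first `limit` rows) of `rows` with a
    chat_template_kwargs column that disables thinking mode. Hypothetical helper."""
    subset = rows if limit is None else rows[:limit]
    # Build new dicts so the original rows are left untouched.
    return [{**row, "chat_template_kwargs": {"enable_thinking": False}} for row in subset]


# Hypothetical sample rows.
rows = [
    {"problem": "What is 2 + 2?"},
    {"problem": "Factor x^2 - 1."},
    {"problem": "Solve 3x = 9."},
]
subset = with_non_thinking_kwargs(rows, limit=2)
print(subset[0])
```

The per-row kwargs are intended to be forwarded to the chat template (as `enable_thinking=False`) when the prompt is rendered; whether a trainer does this automatically depends on the TRL version in use.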

My mistake. The script and model choice don't align. In the SmolLM3 and reasoning script I do:

def make_conversation(example):
    prompt = [{"role": "user", "content": example["problem"]}]
    example["chat_template_kwargs"] = {"enable_thinking": False}
    return {"prompt": prompt}

I'll update the script now on the hub: https://huggingface.co/datasets/burtenshaw/lora-without-regrets/blob/main/grpo.py

sergiopaniego

btw there is https://huggingface.co/datasets/trl-lib/documentation-images in case you want to use it.

Co-authored-by: lewtun <lewis.c.tunstall@gmail.com>

Co-authored-by: Kashif Rasul <kashif.rasul@gmail.com>

qgallouedec

Lgtm with a few minor suggestions

Feel free to merge even if you don't apply all the suggestions; we can still refine later :)

Co-authored-by: Quentin Gallouédec <45557362+qgallouedec@users.noreply.github.com>

burtenshaw · 2025-10-03T18:34:04Z

@qgallouedec Thanks for the review. I've responded but you'll need to merge.

Co-authored-by: Quentin Gallouédec <45557362+qgallouedec@users.noreply.github.com> Co-authored-by: sergiopaniego <sergiopaniegoblanco@gmail.com> Co-authored-by: lewtun <lewis.c.tunstall@gmail.com> Co-authored-by: Kashif Rasul <kashif.rasul@gmail.com>

burtenshaw added 3 commits September 30, 2025 16:22

add first draft of blog post

4e33b88

add to toc

a9fcac0

Merge branch 'main' into lora-without-regret

55a35ad

sergiopaniego reviewed Sep 30, 2025


Comment thread docs/source/lora_without_regret.md

Comment thread docs/source/lora_without_regret.md

Comment thread docs/source/lora_without_regret.md Outdated

Comment thread docs/source/lora_without_regret.md Outdated

Comment thread docs/source/lora_without_regret.md Outdated

qgallouedec reviewed Oct 1, 2025


Comment thread docs/source/lora_without_regret.md Outdated

qgallouedec reviewed Oct 1, 2025


Comment thread docs/source/lora_without_regret.md

qgallouedec reviewed Oct 1, 2025


Comment thread docs/source/lora_without_regret.md Outdated

qgallouedec reviewed Oct 1, 2025


Comment thread docs/source/lora_without_regret.md Outdated

qgallouedec reviewed Oct 1, 2025


Comment thread docs/source/lora_without_regret.md Outdated

qgallouedec reviewed Oct 1, 2025


Comment thread docs/source/lora_without_regret.md Outdated

qgallouedec reviewed Oct 1, 2025


Comment thread docs/source/lora_without_regret.md Outdated

burtenshaw and others added 6 commits October 2, 2025 12:59

respond to feedback

ed094bf

Co-authored-by: Quentin Gallouédec <45557362+qgallouedec@users.noreply.github.com> Co-authored-by: sergiopaniego <sergiopaniegoblanco@gmail.com>

restructure and add commands in uv and jobs

d1675fc

use latex math

21c78c3

add the figure to the docs page

64dd3f5

fix steps

0584915

remove jobs from prose

0726576

lewtun reviewed Oct 2, 2025


Comment thread docs/source/lora_without_regret.md

add the parameters to the docs page

63a5d21

sergiopaniego approved these changes Oct 2, 2025


Comment thread docs/source/_toctree.yml Outdated

Comment thread docs/source/lora_without_regret.md Outdated

Comment thread docs/source/lora_without_regret.md

burtenshaw added 4 commits October 2, 2025 17:52

add actual lora commands

46e3255

respond to review

fd5eb14

add take aways

fc85021

add memory figure

3e1942d

lewtun reviewed Oct 2, 2025


Comment thread docs/source/lora_without_regret.md Outdated

sergiopaniego reviewed Oct 2, 2025


Update docs/source/lora_without_regret.md

23238d7

Co-authored-by: lewtun <lewis.c.tunstall@gmail.com>

kashif reviewed Oct 2, 2025


Update docs/source/lora_without_regret.md

75ecba0

Co-authored-by: Kashif Rasul <kashif.rasul@gmail.com>

burtenshaw commented Oct 3, 2025


Comment thread docs/source/lora_without_regret.md Outdated

burtenshaw and others added 5 commits October 3, 2025 05:57

Update docs/source/lora_without_regret.md

a56672d

Apply suggestions from code review

081636e

Co-authored-by: Kashif Rasul <kashif.rasul@gmail.com>

Update docs/source/lora_without_regret.md

087f100

Co-authored-by: Kashif Rasul <kashif.rasul@gmail.com>

add python examples to tabs

c547533

typos

c10527a

qgallouedec approved these changes Oct 3, 2025


burtenshaw and others added 3 commits October 3, 2025 19:52

move in menu

d81f44a

Apply suggestions from code review

5a59a8a

Co-authored-by: Quentin Gallouédec <45557362+qgallouedec@users.noreply.github.com>

fix tabs

27f373a

kashif merged commit 1eff7da into huggingface:main Oct 3, 2025
1 check passed

