[DOCS] Lora without regret #4181
The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.
sergiopaniego
left a comment
First pass, will go back after finishing the referenced blog
| # TODO: local command | ||
| ``` | ||
| To run th script locally, you will need to have `uv` installed. Check out the [uv documentation](https://docs.astral.sh/uv/) for more details. |
| To run th script locally, you will need to have `uv` installed. Check out the [uv documentation](https://docs.astral.sh/uv/) for more details. | |
| To run the script locally, you will need to have `uv` installed. Check out the [uv documentation](https://docs.astral.sh/uv/) for more details. |
I don't think we need uv to run the script locally.
I was going to use a custom uv-based script. I'll use the standard TRL scripts instead.
sergiopaniego
left a comment
Thanks for the development @burtenshaw !! 🙌
Adding some more comments. Maybe we could add pointers to the blogs for each key finding.
Co-authored-by: Quentin Gallouédec <45557362+qgallouedec@users.noreply.github.com> Co-authored-by: sergiopaniego <sergiopaniegoblanco@gmail.com>
sergiopaniego
left a comment
awesome!!! just a few ideas and we're good to go :)
| ```bash | ||
| uv run "https://huggingface.co/datasets/burtenshaw/lora-without-regrets/resolve/main/grpo.py" \ | ||
| --model_name_or_path Qwen/Qwen3-0.6B \ |
By default this model operates in "think" mode and thus produces many more tokens than the 4096 you've allocated. The best thing to do would be to copy the dataset (or make a subset) with a chat_template_kwargs column that has {"enable_thinking": false} if you want to only optimise the non-reasoning mode.
Alternatively you could pick a model like Gemma3 which doesn't reason.
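The first option can be sketched roughly as follows. This is a minimal illustration, not the script used in the PR: the helper name `to_non_thinking_example` is hypothetical, and it assumes each row stores its question in a `problem` field. Only the `chat_template_kwargs` column and the `enable_thinking` key come from the comment above.

```python
# Minimal sketch: add a per-row `chat_template_kwargs` entry that turns
# thinking mode off. Plain dicts stand in for dataset rows here.

def to_non_thinking_example(example: dict) -> dict:
    """Return a new row with a chat-style prompt and thinking disabled.

    Assumes the row has a `problem` field (an assumption for illustration).
    The input row is not modified.
    """
    return {
        **example,
        "prompt": [{"role": "user", "content": example["problem"]}],
        "chat_template_kwargs": {"enable_thinking": False},
    }


# Toy rows standing in for a real dataset.
rows = [{"problem": "What is 2 + 2?"}, {"problem": "Factor x^2 - 1."}]
converted = [to_non_thinking_example(row) for row in rows]
print(converted[0]["chat_template_kwargs"])
```

Because the function returns every key explicitly instead of mutating its input, it should also be usable as the callable passed to `datasets.Dataset.map` when building the copied dataset or subset.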
My mistake. The script and model choice don't align. In the SmolLM3 and reasoning script I do:

    def make_conversation(example):
        prompt = [{"role": "user", "content": example["problem"]}]
        example["chat_template_kwargs"] = {"enable_thinking": False}
        return {"prompt": prompt}

I'll update the script now on the hub: https://huggingface.co/datasets/burtenshaw/lora-without-regrets/blob/main/grpo.py
sergiopaniego
left a comment
btw there is https://huggingface.co/datasets/trl-lib/documentation-images in case you want to use it.
Co-authored-by: lewtun <lewis.c.tunstall@gmail.com>
Co-authored-by: Kashif Rasul <kashif.rasul@gmail.com>
qgallouedec
left a comment
Lgtm with a few minor suggestions
Feel free to merge even if you don't apply all the suggestions; we can still refine later :)
Co-authored-by: Quentin Gallouédec <45557362+qgallouedec@users.noreply.github.com>
@qgallouedec Thanks for the review. I've responded but you'll need to merge.
Co-authored-by: Quentin Gallouédec <45557362+qgallouedec@users.noreply.github.com> Co-authored-by: sergiopaniego <sergiopaniegoblanco@gmail.com> Co-authored-by: lewtun <lewis.c.tunstall@gmail.com> Co-authored-by: Kashif Rasul <kashif.rasul@gmail.com>
This is a draft PR for a docs page to implement the blog post 'LoRA Without Regret' in TRL.
@edbeeching is going to review and share a script.
@sergiopaniego
example with sft
example with grpo