Merged
73 commits
28122f0
Support multimodal tool responses in environment_factory for VLM trai…
sergiopaniego Mar 20, 2026
ba1a30e
Merge branch 'main' into multimodal-tool-responses
sergiopaniego Mar 20, 2026
786b8f9
Merge branch 'main' into multimodal-tool-responses
sergiopaniego Mar 20, 2026
09ef5d4
Fix VLM processor support for tool calling and add CARLA VLM example
sergiopaniego Mar 20, 2026
6ded5b8
Merge branch 'multimodal-tool-responses' of github.com:huggingface/tr…
sergiopaniego Mar 20, 2026
b03cc2b
Expand image tokens in tool suffix IDs and collect tool images for fo…
sergiopaniego Mar 20, 2026
6ea2430
Expand image tokens in tool suffix IDs and collect tool images for fo…
sergiopaniego Mar 20, 2026
429b548
Merge branch 'multimodal-tool-responses' of github.com:huggingface/tr…
sergiopaniego Mar 20, 2026
e4a00b7
Merge branch 'multimodal-tool-responses' of github.com:huggingface/tr…
sergiopaniego Mar 20, 2026
0755a85
Merge branch 'multimodal-tool-responses' of github.com:huggingface/tr…
sergiopaniego Mar 20, 2026
cdf153a
Update docs
sergiopaniego Mar 23, 2026
a96e3e3
Merge branch 'main' of github.com:huggingface/trl into multimodal-too…
sergiopaniego Mar 23, 2026
f49a63a
Add debug
sergiopaniego Mar 23, 2026
79fd46f
Merge remote-tracking branch 'origin/main' into multimodal-tool-respo…
sergiopaniego Mar 23, 2026
dd87fd1
Fix VLM forward pass: return images from _generate, build mm_token_ty…
sergiopaniego Mar 23, 2026
df71f29
Fix image boundary truncation, tool_mask sync, and image logging with…
sergiopaniego Mar 23, 2026
ba9252d
Clean up PR: extract helpers, remove debug prints, use dynamic token …
sergiopaniego Mar 23, 2026
b464134
Use _is_vlm from SFTTrainer convention, simplify vision token detecti…
sergiopaniego Mar 24, 2026
9714659
Merge branch 'main' into multimodal-tool-responses
sergiopaniego Mar 24, 2026
6408ca4
Merge branch 'main' into multimodal-tool-responses
sergiopaniego Mar 24, 2026
164f035
precommit
sergiopaniego Mar 24, 2026
e74a7f6
Merge branch 'multimodal-tool-responses' of github.com:huggingface/tr…
sergiopaniego Mar 24, 2026
dc147a8
Merge branch 'multimodal-tool-responses' of github.com:huggingface/tr…
sergiopaniego Mar 24, 2026
ed83a97
Merge branch 'multimodal-tool-responses' of github.com:huggingface/tr…
sergiopaniego Mar 24, 2026
2f49156
Fix undefined images variable in rollout_func code path
sergiopaniego Mar 24, 2026
bf3c35e
Pass tool response images to generation in tool loop for VLM visual f…
sergiopaniego Mar 24, 2026
99748e3
Use consistent tokenization path for prefix and full IDs in VLM _get_…
sergiopaniego Mar 24, 2026
4029708
precommit
sergiopaniego Mar 24, 2026
738c17e
Fix VLM image path to only use image_processor for tool images, prese…
sergiopaniego Mar 24, 2026
61ee09f
Replace getattr chain with direct _is_vlm conditional for max_positio…
sergiopaniego Mar 24, 2026
af3a536
Replace getattr chain with direct _is_vlm conditional for max_positio…
sergiopaniego Mar 24, 2026
3e5260c
Merge branch 'multimodal-tool-responses' of github.com:huggingface/tr…
sergiopaniego Mar 24, 2026
419f6ce
Increase default max-steps to 100 for carla_vlm.py
sergiopaniego Mar 24, 2026
f51a32b
Align RLOO image extraction check with GRPO for consistency
sergiopaniego Mar 24, 2026
9f320d5
Handle VLM max_position_embeddings for vLLM server mode
sergiopaniego Mar 24, 2026
92e1440
Clamp max_length to 0 in _truncate_at_image_boundary to prevent negat…
sergiopaniego Mar 24, 2026
312885b
Normalize tool message content before image/text branch split in _get…
sergiopaniego Mar 24, 2026
ac5291b
Propagate num_images None-safety fix to RLOO for consistency
sergiopaniego Mar 24, 2026
24db4f9
Default trackio
sergiopaniego Mar 24, 2026
20dbe94
Update position
sergiopaniego Mar 24, 2026
1be3989
Update
sergiopaniego Mar 24, 2026
da155cf
Update based on cursor
sergiopaniego Mar 24, 2026
19d5147
Update
sergiopaniego Mar 24, 2026
468bdc8
Update carla_vlm.py defaults: model to 0.8B, image-size to 256, max-c…
sergiopaniego Mar 25, 2026
75c7cf5
Merge remote-tracking branch 'origin/main' into multimodal-tool-respo…
sergiopaniego Mar 25, 2026
4f4b554
Merge remote-tracking branch 'origin/main' into multimodal-tool-respo…
sergiopaniego Mar 26, 2026
f9a7a22
Merge remote-tracking branch 'origin/main' into HEAD
sergiopaniego Mar 27, 2026
aa7bc9d
Fix replay buffer unpack, move VLM parse_response branching, add trun…
sergiopaniego Mar 27, 2026
43cc407
Align GFPO and replay buffer trainers with _generate changes
sergiopaniego Mar 27, 2026
638447f
Merge branch 'main' of github.com:huggingface/trl into multimodal-too…
sergiopaniego Mar 27, 2026
24c14bb
Merge branch 'main' into multimodal-tool-responses
sergiopaniego Mar 30, 2026
a9281d8
Merge remote-tracking branch 'origin/main' into multimodal-tool-respo…
sergiopaniego Apr 1, 2026
35881e3
Merge branch 'multimodal-tool-responses' of github.com:huggingface/tr…
sergiopaniego Apr 1, 2026
9195853
Support multimodal observations from environment reset
sergiopaniego Apr 1, 2026
0bd2457
nits
sergiopaniego Apr 1, 2026
a32d200
Update trl/trainer/grpo_trainer.py
sergiopaniego Apr 1, 2026
51fa724
Update trl/trainer/grpo_trainer.py
sergiopaniego Apr 1, 2026
11c2ba6
Update trl/trainer/grpo_trainer.py
sergiopaniego Apr 1, 2026
c5acc81
build_mm_token_type_ids simplified
sergiopaniego Apr 1, 2026
eaf92a5
Merge branch 'multimodal-tool-responses' of github.com:huggingface/tr…
sergiopaniego Apr 1, 2026
52249fd
nit
sergiopaniego Apr 1, 2026
010dc72
Merge branch 'main' into multimodal-tool-responses
sergiopaniego Apr 1, 2026
1ea1b5f
Fix tests
sergiopaniego Apr 1, 2026
b440783
extended
sergiopaniego Apr 1, 2026
cf38723
Merge branch 'main' into multimodal-tool-responses
sergiopaniego Apr 1, 2026
30b2be3
Avoid mutating original prompts during VLM content normalization
sergiopaniego Apr 2, 2026
9d8b4ab
Merge branch 'main' into multimodal-tool-responses
sergiopaniego Apr 2, 2026
826e4c1
Update trl/trainer/grpo_trainer.py
sergiopaniego Apr 2, 2026
36a631c
Update trl/trainer/grpo_trainer.py
sergiopaniego Apr 2, 2026
e1711f9
removed defensive code
sergiopaniego Apr 2, 2026
4775495
Merge branch 'multimodal-tool-responses' of github.com:huggingface/tr…
sergiopaniego Apr 2, 2026
9a5d3c4
Update trl/trainer/grpo_trainer.py
sergiopaniego Apr 2, 2026
cad17f3
update based on cursor
sergiopaniego Apr 2, 2026
1 change: 1 addition & 0 deletions docs/source/example_overview.md
@@ -103,6 +103,7 @@ These scripts demonstrate how to train models with [OpenEnv](openenv) environmen
| [`examples/scripts/openenv/browsergym.py`](https://github.com/huggingface/trl/blob/main/examples/scripts/openenv/browsergym.py) | GRPO training with the BrowserGym environment for VLMs. |
| [`examples/scripts/openenv/browsergym_llm.py`](https://github.com/huggingface/trl/blob/main/examples/scripts/openenv/browsergym_llm.py) | GRPO training with the BrowserGym environment for LLMs. |
| [`examples/scripts/openenv/carla.py`](https://github.com/huggingface/trl/blob/main/examples/scripts/openenv/carla.py) | GRPO training with the CARLA environment for autonomous driving. |
| [`examples/scripts/openenv/carla_vlm.py`](https://github.com/huggingface/trl/blob/main/examples/scripts/openenv/carla_vlm.py) | GRPO training with CARLA for VLMs with multimodal tool responses (camera images). |

## Distributed Training (for scripts)
