Update wordle.py example with masking of env tokens by sergiopaniego · Pull Request #4895 · huggingface/trl

sergiopaniego · 2026-01-26T15:01:11Z

What does this PR do?

Some results:

The position_reward could be renamed to something like partial_reward.

One idea:
The model is Qwen3-1.7B, and it achieves a correct reward of around 0.3. This does not mean that it wins 30% of the games; rather, for a given answer, about 30% of the letters are correct.
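The difference between a per-letter reward and a win rate can be made concrete with a minimal sketch. It assumes a partial reward equal to the fraction of letters in the correct position. `partial_reward` and `win_rate` are hypothetical names used only for illustration, and the actual `position_reward` in wordle.py may score letters differently.

```python
def partial_reward(guess: str, target: str) -> float:
    """Hypothetical per-letter reward: fraction of letters in the correct position."""
    guess, target = guess.lower(), target.lower()
    if len(guess) != len(target):
        return 0.0
    hits = sum(g == t for g, t in zip(guess, target))
    return hits / len(target)


def win_rate(games: list[tuple[str, str]]) -> float:
    """Fraction of games whose final guess exactly matches the target word."""
    if not games:
        return 0.0
    return sum(g.lower() == t.lower() for g, t in games) / len(games)


# Two lost games (final guess, target), each with several letters in the right place.
games = [("crane", "crate"), ("slate", "plate")]
mean_partial = sum(partial_reward(g, t) for g, t in games) / len(games)
print(mean_partial)  # 0.8
print(win_rate(games))  # 0.0
```

Both games are lost, so the win rate is 0.0 even though the mean partial reward is 0.8; a mean reward of around 0.3 should be read the same way.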

Before submitting

This PR fixes a typo or improves the docs (you can dismiss the other checks if that's the case).
Did you read the contributor guideline, Pull Request section?
Was this discussed/approved via a GitHub issue? Please add a link to it if that's the case.
Did you make sure to update the documentation with your changes?
Did you write any new necessary tests?

Who can review?

@kashif @qgallouedec


HuggingFaceDocBuilderDev · 2026-01-26T15:04:06Z

The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.

qgallouedec · 2026-01-26T15:06:08Z

trl/trainer/grpo_trainer.py

        else:
            tool_mask = None

+        env_mask = extra_fields.pop("env_mask", None)


Would something like this work?

tool_mask = extra_fields.pop("env_mask", None)

These tokens are processed in essentially the same way as tool tokens, so this might simplify things.
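Reusing one mask works because, presumably, env tokens and tool tokens both only need to be excluded from the loss. A minimal sketch under stated assumptions: the mask convention (1 for model-generated tokens, 0 for environment-injected tokens such as Wordle feedback), plain Python lists instead of tensors, and the helper names `apply_env_mask` and `masked_mean` are all hypothetical and do not reproduce the GRPOTrainer code.

```python
from typing import Optional


def apply_env_mask(completion_mask: list[int], env_mask: Optional[list[int]]) -> list[int]:
    """Drop environment-injected tokens from the loss.

    Assumed (hypothetical) convention: env_mask[i] == 1 for tokens the model
    generated and 0 for tokens inserted by the environment.
    """
    if env_mask is None:
        return list(completion_mask)
    if len(env_mask) != len(completion_mask):
        raise ValueError("env_mask and completion_mask must have the same length")
    return [c * e for c, e in zip(completion_mask, env_mask)]


def masked_mean(values: list[float], mask: list[int]) -> float:
    """Average of values over the positions where mask == 1."""
    count = sum(mask)
    if count == 0:
        return 0.0
    return sum(v * m for v, m in zip(values, mask)) / count


# Mirrors the suggestion: the env mask is popped into the same variable as the tool mask.
extra_fields = {"env_mask": [1, 1, 0, 0, 1]}
tool_mask = extra_fields.pop("env_mask", None)

per_token_loss = [1.0, 2.0, 10.0, 10.0, 3.0]
mask = apply_env_mask([1, 1, 1, 1, 1], tool_mask)
print(mask)  # [1, 1, 0, 0, 1]
print(masked_mean(per_token_loss, mask))  # 2.0
```

The two environment tokens carry a large loss in this toy example, but once masked they no longer affect the average.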


sergiopaniego · 2026-01-28T15:22:27Z

I've reviewed the rest of the scripts and notebooks that use OpenEnv, and they seem fine, since they add the environment feedback to the prompt.

I still need to upload the updated Wordle notebook (WIP).

qgallouedec · 2026-01-28T18:07:35Z

@albertvillanova moved everything related to vLLM into a separate class in #4700, so you'll have to deal with some conflicts 😬

albertvillanova · 2026-01-29T08:11:22Z

@albertvillanova moved everything related to vLLM into a separate class in #4700, so you'll have to deal with some conflicts 😬

Thanks for the ping, @qgallouedec!

@sergiopaniego, since I am familiar with the changes introduced in #4700, I can take care of resolving the conflicts in this PR if that helps. 🤗

sergiopaniego · 2026-01-29T08:23:37Z

I'd really appreciate some help with that @albertvillanova!! 😄

albertvillanova · 2026-01-29T08:46:31Z

Done!

albertvillanova

Thanks! Just some comments below.

albertvillanova · 2026-01-29T09:20:28Z

examples/scripts/openenv/wordle.py

-        report_to="trackio",
+        log_completions=True,
+        report_to="wandb",


Why the change from trackio to wandb?


albertvillanova · 2026-01-29T09:38:24Z

trl/experimental/openenv/utils.py

    return sampling_params


+def _build_server_generation_kwargs(


This new _build_server_generation_kwargs function duplicates logic from _build_colocate_sampling_params. Maybe we could refactor these to reduce duplication.
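One way to address the duplication, sketched under assumptions: both helpers call a single private function for the settings they share, and each adds only its mode-specific fields. `GenerationSettings`, `_common_sampling_kwargs`, `build_colocate_kwargs`, `build_server_kwargs` and the `structured_outputs_regex` payload key are hypothetical names; the real helpers in trl/experimental/openenv/utils.py read from the trainer and may need different fields.

```python
from dataclasses import dataclass
from typing import Any, Optional


@dataclass
class GenerationSettings:
    """Hypothetical stand-in for the attributes both helpers read from the trainer."""

    temperature: float = 1.0
    top_p: float = 1.0
    top_k: int = -1
    min_p: float = 0.0
    repetition_penalty: float = 1.0
    max_completion_length: int = 256
    structured_outputs_regex: Optional[str] = None


def _common_sampling_kwargs(cfg: GenerationSettings) -> dict[str, Any]:
    """Single place that maps the settings shared by colocate and server mode."""
    return {
        "temperature": cfg.temperature,
        "top_p": cfg.top_p,
        "top_k": cfg.top_k,
        "min_p": cfg.min_p,
        "repetition_penalty": cfg.repetition_penalty,
        "max_tokens": cfg.max_completion_length,
    }


def build_colocate_kwargs(cfg: GenerationSettings) -> dict[str, Any]:
    # In colocate mode, these kwargs could be unpacked into vllm.SamplingParams(**kwargs).
    return _common_sampling_kwargs(cfg)


def build_server_kwargs(cfg: GenerationSettings) -> dict[str, Any]:
    # Server mode reuses the shared settings and only adds what differs.
    kwargs = _common_sampling_kwargs(cfg)
    if cfg.structured_outputs_regex:
        kwargs["structured_outputs_regex"] = cfg.structured_outputs_regex  # hypothetical key
    return kwargs
```

With this layout, a new sampling option only has to be added in one place for both modes to pick it up.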

albertvillanova

As discussed privately, please note that:

The trainer no longer has a structured_outputs_regex attribute.
Instead, it is now an attribute of VLLMGeneration.

So, in these lines of code:

trl/trl/experimental/openenv/utils.py, lines 35 to 36 in 25c5d51:

    if trainer.structured_outputs_regex:
        structured_outputs = StructuredOutputsParams(regex=trainer.structured_outputs_regex)

it may be better to use:

    if trainer.vllm_generation.structured_outputs_regex:
        structured_outputs = StructuredOutputsParams(regex=trainer.vllm_generation.structured_outputs_regex)

Co-authored-by: Albert Villanova del Moral <8515462+albertvillanova@users.noreply.github.com>

qgallouedec

Everything related to GRPO and vLLM LGTM! Nice work, @sergiopaniego!

sergiopaniego · 2026-01-30T15:29:39Z

Results using trackio with the latest changes: https://huggingface.co/spaces/sergiopaniego/Wordle-GRPO

TODO: Upload the updated wordle notebook

sergiopaniego added 3 commits January 21, 2026 16:30

Wordle updated with feedback

ab26853

Merge branch 'main' of github.com:huggingface/trl into updated-wordle

1b191ab

Added env_mask for env tokens management inside the trainer and updated script

d146445

qgallouedec reviewed Jan 26, 2026


sergiopaniego added 5 commits January 26, 2026 16:12

Code quality

8050107

Merge branch 'main' into updated-wordle

1b7e297

env_mask simplified in grpo_trainer

3dac8bd

Merge branch 'updated-wordle' of github.com:huggingface/trl into updated-wordle

8f91893

Support for server mode in rollouts

cedcfd2

Merge branch 'main' into updated-wordle

83767fb

albertvillanova reviewed Jan 29, 2026


Remove gradient_checkpointing

25c5d51

albertvillanova reviewed Jan 29, 2026


qgallouedec and others added 2 commits January 29, 2026 10:59

Apply suggestion from @albertvillanova

a7b71d1

Co-authored-by: Albert Villanova del Moral <8515462+albertvillanova@users.noreply.github.com>

minimize diff

538dc51

qgallouedec approved these changes Jan 29, 2026


sergiopaniego added 3 commits January 30, 2026 11:00

Merge branch 'main' into updated-wordle

394b0ad

Refactor

435ef12

Update wordle script

7a3c203

sergiopaniego mentioned this pull request Jan 30, 2026

Add Wordle example with Qwen3 thinking activated #4936

Draft

5 tasks

sergiopaniego added 3 commits February 2, 2026 12:34

Merge branch 'main' into updated-wordle

ae1ba09

Update notebook

8fbace0

Merge branch 'main' into updated-wordle

669a322

sergiopaniego merged commit a03c2fc into main Feb 2, 2026
11 of 13 checks passed

sergiopaniego deleted the updated-wordle branch February 2, 2026 15:27

cmunley1 mentioned this pull request Feb 6, 2026

Update NeMo-Gym to use env_mask #4986

Merged

5 tasks

4 participants
