Fix microbatch loss scale when loss_agg_mode is "token-mean" #336
Conversation
Summary of Changes
Hello @yanxi-chen, I'm Gemini Code Assist! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed. This pull request corrects an inconsistency in how loss scaling is applied across microbatches within the Trinity-RFT framework when using "token-mean" loss aggregation. It introduces an experimental configuration flag that, when enabled, adjusts the microbatch loss scaling to be proportional to the number of response tokens, aligning it with the intended behavior of token-mean aggregation. This change is currently in a work-in-progress state, pending validation experiments.
Code Review
This pull request introduces an experimental feature to fix the microbatch loss scaling for "token-mean" loss aggregation. The changes, controlled by a new `fix_actor_microbatch_loss_scale` flag, correctly adjust the loss scaling logic in `dp_actor.py` to be proportional to the number of tokens in a microbatch. The implementation is clear and well-contained. My review includes one suggestion to improve robustness by handling a potential division-by-zero edge case, which would prevent crashes during training.
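For illustration only, a guard along the following lines (hypothetical helper and variable names, not the exact change suggested in the review) would avoid a division by zero when a batch contains no response tokens:

```python
# Hypothetical guard (illustrates the review suggestion; not the actual diff):
# avoid dividing by zero when a minibatch happens to contain no response tokens.
def safe_token_scale(response_tokens_in_microbatch: int,
                     response_tokens_in_minibatch: int) -> float:
    if response_tokens_in_minibatch <= 0:
        # Degenerate case: no response tokens anywhere in the minibatch.
        # Returning 0.0 skips the contribution; raising an error or falling
        # back to sequence-based scaling are other reasonable choices.
        return 0.0
    return response_tokens_in_microbatch / response_tokens_in_minibatch
```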
/unittest-module-trainer
/unittest-module-common |
Description
Motivation:
In Trinity-RFT (and also in verl), `loss_agg_mode = "token-mean"` only applies to the loss calculation within a microbatch, whereas the loss scaling / weighting across the multiple microbatches of the same minibatch is still determined by the number of sequences, as shown in `dp_actor.py`. This deviates from the desired behavior of token-mean loss aggregation; a toy example of the discrepancy is sketched below.
Solution:
This PR adds an experimental config parameter `fix_actor_microbatch_loss_scale` (bool). When it is set to True and `loss_agg_mode` is "token-mean", the loss scale of each microbatch is set proportional to its total number of response tokens, which mathematically matches the desired behavior of token-mean loss aggregation over a mini-batch; see the sketch after this paragraph.
The default value of `fix_actor_microbatch_loss_scale` is currently False, so the default behavior of Trinity is kept unchanged for now, until this fix has been thoroughly validated and shown to be beneficial.
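As a rough illustration of the change, the sketch below contrasts the two scaling rules; the function and variable names are hypothetical and do not mirror the actual `dp_actor.py` code:

```python
# Minimal sketch of the intended scaling (hypothetical names, not the actual
# dp_actor.py code). Each microbatch loss is a token-mean over that microbatch;
# the question is how to weight microbatches when accumulating gradients.
def microbatch_loss_scale(
    response_tokens_in_microbatch: int,
    sequences_in_microbatch: int,
    response_tokens_in_minibatch: int,
    sequences_in_minibatch: int,
    fix_actor_microbatch_loss_scale: bool,
    loss_agg_mode: str = "token-mean",
) -> float:
    if fix_actor_microbatch_loss_scale and loss_agg_mode == "token-mean":
        # Fixed behavior: weight each microbatch by its share of response
        # tokens, so the accumulated gradient matches a token-mean over the
        # whole minibatch.
        return response_tokens_in_microbatch / response_tokens_in_minibatch
    # Original behavior: weight each microbatch by its share of sequences.
    return sequences_in_microbatch / sequences_in_minibatch


# Accumulation over microbatches then looks roughly like:
#   loss = token_mean_loss_of_microbatch * microbatch_loss_scale(...)
#   loss.backward()
```

With `fix_actor_microbatch_loss_scale=False` the scale reduces to the original sequence-share weighting, so existing runs are unaffected.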
Experimental results:
Marginal difference from the original implementation in a single-turn math reasoning task.
[TODO] Try out multi-step agentic tasks (where sequence lengths could be more diverse).
Limitations:
Only applicable to the DP actor, not to the critic or the Megatron actor.
Checklist
Please check the following items before the code is ready to be reviewed.