Conversation

@yanxi-chen (Collaborator) commented Oct 22, 2025

Description

Motivation:

In Trinity-RFT (and also in verl), loss_agg_mode = "token-mean" is applied only to loss calculation within a microbatch, whereas the loss scaling / weighting across the microbatches of the same minibatch is still determined by the number of sequences, as shown in dp_actor.py. This deviates from the desired behavior of token-mean loss aggregation.
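Concretely (in our own notation, not the repo's): suppose a minibatch of $N$ sequences with $T$ total response tokens is split into microbatches, where microbatch $m$ contains $n_m$ sequences with $T_m$ response tokens, and let $\bar{\ell}_m$ be the token-mean loss computed within microbatch $m$. The desired token-mean loss over the minibatch is

$$
\mathcal{L}_{\text{token-mean}} = \frac{\sum_{m}\sum_{i \in m}\sum_{t} \ell_{i,t}}{\sum_{m} T_m} = \sum_{m} \frac{T_m}{T}\,\bar{\ell}_m,
$$

whereas weighting each microbatch by its sequence count gives

$$
\mathcal{L}_{\text{current}} = \sum_{m} \frac{n_m}{N}\,\bar{\ell}_m,
$$

which coincides with the desired loss only when every microbatch has the same average number of response tokens per sequence ($T_m / n_m = T / N$ for all $m$).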

Solution:

This PR adds an experimental config parameter fix_actor_microbatch_loss_scale (bool). When it is set to True and loss_agg_mode is "token-mean", the loss scale of each microbatch is set proportional to its total number of response tokens, which mathematically matches the desired behavior of token-mean loss aggregation over a minibatch.

The default value of fix_actor_microbatch_loss_scale is currently False; we keep the default behavior of Trinity unchanged for now, until this fix has been thoroughly validated and shown to be beneficial.
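Below is a minimal sketch of the two scaling modes (our illustration, not the actual dp_actor.py code; `micro_batches`, `response_mask`, and `compute_token_losses` are hypothetical names):

```python
import torch

def accumulate_minibatch_loss(micro_batches, compute_token_losses, fix_scale=True):
    """Backpropagate each microbatch so that the accumulated gradient matches
    the chosen aggregation. Each element of `micro_batches` is assumed to hold
    a `response_mask` tensor of shape (batch, seq_len); `compute_token_losses`
    is assumed to return per-token losses of the same shape."""
    total_sequences = sum(mb["response_mask"].shape[0] for mb in micro_batches)
    total_tokens = sum(mb["response_mask"].sum().item() for mb in micro_batches)

    for mb in micro_batches:
        mask = mb["response_mask"].float()
        token_losses = compute_token_losses(mb)                # (batch, seq_len)
        micro_loss = (token_losses * mask).sum() / mask.sum()  # token-mean within the microbatch

        if fix_scale:
            # fix_actor_microbatch_loss_scale=True: weight the microbatch by its
            # share of response tokens, recovering the exact minibatch token-mean.
            scale = mask.sum().item() / total_tokens
        else:
            # Original behavior: weight the microbatch by its share of sequences.
            scale = mask.shape[0] / total_sequences

        (micro_loss * scale).backward()
```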

Experimental results:

The fix makes only a marginal difference from the original implementation in a single-turn math reasoning task.

[Screenshots of experiment metrics (captured 2025-10-31) omitted]

[TODO] Try out multi-step agentic tasks (where sequence lengths could be more diverse).

Limitations:

The fix is only applicable to the DP actor, not to the critic or the Megatron actor.

Checklist

Please check the following items before the code is ready for review.

  • Code has passed all tests
  • Docstrings have been added/updated in Google Style
  • Documentation has been updated
  • Code is ready for review

@gemini-code-assist (bot) commented
Summary of Changes

Hello @yanxi-chen, I'm Gemini Code Assist! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request aims to correct an inconsistency in how loss scaling is applied across microbatches within the Trinity-RFT framework when using "token-mean" loss aggregation. It introduces an experimental configuration flag that, when enabled, will adjust microbatch loss scaling to be proportional to the number of response tokens, aligning it with the intended behavior of token-mean aggregation. This change is currently in a Work-In-Progress state, pending validation experiments.

Highlights

  • Loss Scaling Correction: This pull request addresses an issue where the loss scaling for microbatches was incorrectly determined by the number of sequences, even when loss_agg_mode was set to "token-mean".
  • New Configuration Parameter: A new experimental configuration parameter, fix_actor_microbatch_loss_scale, has been introduced. When set to True and loss_agg_mode is "token-mean", the loss scale for each microbatch will be proportional to its total number of response tokens.
  • Default Behavior Maintained: The fix_actor_microbatch_loss_scale parameter defaults to False, ensuring that the existing behavior of Trinity remains unchanged until the fix is thoroughly validated.
  • Scope of the Fix: The proposed fix is currently applicable only to the DP actor and does not extend to the critic or Megatron actor.

@gemini-code-assist (bot) left a comment

Code Review

This pull request introduces an experimental feature to fix the microbatch loss scaling for "token-mean" loss aggregation. The changes, controlled by a new fix_actor_microbatch_loss_scale flag, correctly adjust the loss scaling logic in dp_actor.py to be proportional to the number of tokens in a microbatch. The implementation is clear and well-contained. My review includes one suggestion to improve robustness by handling a potential division-by-zero edge case, which would prevent crashes during training.
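The edge case in question: a microbatch whose response mask sums to zero makes the token-mean denominator zero, producing NaN/inf losses. One possible guard, sketched with the same hypothetical names as above (not necessarily the exact change adopted in this PR):

```python
token_count = mask.sum()
# Clamp the denominators so a microbatch with zero response tokens
# contributes zero loss instead of NaN/inf.
micro_loss = (token_losses * mask).sum() / token_count.clamp(min=1)
scale = token_count.item() / max(total_tokens, 1)
```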

@yanxi-chen changed the title from "[WIP] Fix microbatch loss scale when loss_agg_mode is "token-mean"" to "Fix microbatch loss scale when loss_agg_mode is "token-mean"" on Oct 31, 2025
@pan-x-c (Collaborator) commented Oct 31, 2025

/unittest-module-trainer

@pan-x-c (Collaborator) commented Oct 31, 2025

/unittest-module-trainer

@pan-x-c (Collaborator) commented Oct 31, 2025

/unittest-module-common

@pan-x-c (Collaborator) commented Nov 3, 2025

/unittest-module-common

@github-actions (bot) commented Nov 3, 2025

Summary

| Tests 📝 | Passed ✅ | Failed ❌ | Skipped ⏭️ | Other ❓ | Flaky 🍂 | Duration ⏱️ |
| --- | --- | --- | --- | --- | --- | --- |
| 33 | 31 | 2 | 0 | 0 | 0 | 4m 58s |

Failed Tests

| Failed Test ❌ | Fail Message |
| --- | --- |
| tests/common/vllm_test.py::TestModelLen_0::test_model_len | The test failed in the call phase due to an assertion error |
| tests/common/vllm_test.py::TestModelLen_1::test_model_len | The test failed in the call phase due to an assertion error |

Tests

| Test Name | Status | Duration |
| --- | --- | --- |
| tests/common/config_test.py::TestConfig::test_all_examples_are_valid | ✅ | 40.5s |
| tests/common/config_test.py::TestConfig::test_config_flatten | ✅ | 39ms |
| tests/common/config_test.py::TestConfig::test_continue_from_checkpoint_is_valid | ✅ | 187ms |
| tests/common/config_test.py::TestConfig::test_default_workflow | ✅ | 88ms |
| tests/common/config_test.py::TestConfig::test_load_default_config | ✅ | 3.9s |
| tests/common/config_test.py::TestConfig::test_max_token_len_per_gpu_set_correctly | ✅ | 89ms |
| tests/common/config_test.py::TestConfig::test_update_config_from_ray_cluster | ✅ | 2.0s |
| tests/common/experience_test.py::TestEID::test_eid_properties | ✅ | 1ms |
| tests/common/experience_test.py::TestExperience::test_action_mask_and_logprobs_type | ✅ | 1ms |
| tests/common/experience_test.py::TestExperience::test_assertions | ✅ | 1ms |
| tests/common/experience_test.py::TestExperience::test_dpo_experience | ✅ | 1ms |
| tests/common/experience_test.py::TestExperience::test_gather | ✅ | 1ms |
| tests/common/experience_test.py::TestExperience::test_hf_datasets_conversion | ✅ | 18ms |
| tests/common/experience_test.py::TestExperience::test_multi_turn_experience | ✅ | 1ms |
| tests/common/experience_test.py::TestExperience::test_serialize_deserialize | ✅ | 2ms |
| tests/common/experience_test.py::TestExperience::test_single_turn_experience | ✅ | 1ms |
| tests/common/experience_test.py::TestExperience::test_to_dict | ✅ | 1ms |
| tests/common/experience_test.py::TestExperienceConversion::test_batch_conversion | ✅ | 1ms |
| tests/common/experience_test.py::TestExperienceConversion::test_dpo_experience_batch_conversion | ✅ | 1ms |
| tests/common/experience_test.py::TestExperienceConversion::test_experience_model_experience_conversion | ✅ | 1ms |
| tests/common/experience_test.py::TestExperienceConversion::test_gather_experiences_with_custom_fields | ✅ | 1ms |
| tests/common/experience_test.py::TestExperienceConversion::test_multiturn_experience_batch_converstion | ✅ | 1ms |
| tests/common/vllm_test.py::ModelWrapperTest_0::test_generate | ✅ | 53.9s |
| tests/common/vllm_test.py::ModelWrapperTest_1::test_generate | ✅ | 31.4s |
| tests/common/vllm_test.py::ModelWrapperTest_2::test_generate | ✅ | 44.2s |
| tests/common/vllm_test.py::TestModelLen_0::test_model_len | ❌ | 17.5s |
| tests/common/vllm_test.py::TestModelLen_1::test_model_len | ❌ | 17.1s |
| tests/common/vllm_test.py::TestAPIServer::test_api | ✅ | 22.7s |
| tests/common/vllm_test.py::TestAsyncAPIServer::test_api_async | ✅ | 22.7s |
| tests/common/vllm_test.py::TestTokenizer::test_action_mask | ✅ | 249ms |
| tests/common/vllm_test.py::TestTokenizer::test_action_mask_with_tools | ✅ | 247ms |
| tests/common/vllm_test.py::TestAPIServerToolCall_0_deepseek_r1::test_api_tool_calls | ✅ | 19.5s |
| tests/common/vllm_test.py::TestAPIServerToolCall_1::test_api_tool_calls | ✅ | 17.8s |

Github Test Reporter by CTRF 💚

@pan-x-c merged commit d9799d9 into modelscope:main on Nov 3, 2025
1 check passed