[trainer] feat: ReMax support using reward model for baseline #3780
Merged
wuxibin89 merged 1 commit into verl-project:main from HollowMan6:remax-reward on Oct 17, 2025