support reward modeling and ppo #2093
@@ -98,7 +99,7 @@ def __init__(self,
         optimizers=optimizers,
         preprocess_logits_for_metrics=preprocess_logits_for_metrics,
         **kwargs)
-    if not self.label_names:
+    if not hasattr(self, 'label_names') or not self.label_names:
PPOv2Trainer does not have the label_names attribute, so check that it exists before reading it.
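The guard in this hunk can be illustrated in isolation. The following is only a minimal sketch: the two classes are hypothetical stand-ins (not the real trainers), and the `getattr` variant is an equivalent alternative, not what the PR uses.

```python
class TrainerWithLabels:
    """Hypothetical trainer that defines label_names (possibly empty)."""

    def __init__(self, label_names=None):
        self.label_names = label_names or []


class TrainerWithoutLabels:
    """Hypothetical trainer with no label_names attribute at all,
    standing in for the PPOv2Trainer case described above."""


def label_names_missing_or_empty(trainer) -> bool:
    # Same condition as the patched line: hasattr is checked first, so
    # trainer.label_names is never read when the attribute is absent.
    return not hasattr(trainer, 'label_names') or not trainer.label_names


def label_names_missing_or_empty_getattr(trainer) -> bool:
    # Equivalent spelling: getattr with a default avoids the AttributeError.
    return not getattr(trainer, 'label_names', None)
```

Both helpers return True for a missing or empty attribute and False once label_names holds at least one name.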
@@ -98,7 +98,8 @@ def prepare_model(model, args: SftArguments):
 if args.resume_from_checkpoint is None:
     handle_target_modules(model, args)
     handle_modules_to_save(model, args)
-if args.init_lora_weights and args.init_lora_weights.lower() in ('true', 'false'):
+if args.init_lora_weights and isinstance(args.init_lora_weights,
+                                         str) and args.init_lora_weights.lower() in ('true', 'false'):
Avoids an error when init_lora_weights is not a string, which can occur with reward_model_args.
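To show why the isinstance check matters, here is a small sketch comparing the old and new conditions on plain values. The helper names are hypothetical, and the assumption that the value may arrive as a real bool rather than a string is inferred from the patch, not stated in the PR.

```python
def is_bool_string_unguarded(value) -> bool:
    # Old condition: calls .lower() on any truthy value, so a real bool
    # such as True raises AttributeError ('bool' has no attribute 'lower').
    return bool(value and value.lower() in ('true', 'false'))


def is_bool_string(value) -> bool:
    # Patched condition: .lower() is only reached for str values.
    return bool(value) and isinstance(value, str) and value.lower() in ('true', 'false')


def unguarded_raises(value) -> bool:
    # Helper for demonstration: reports whether the old condition fails.
    try:
        is_bool_string_unguarded(value)
    except AttributeError:
        return True
    return False
```

With these helpers, `is_bool_string('True')` is True, `is_bool_string(True)` is False, and `unguarded_raises(True)` is True, which is the failure the added check avoids.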
@@ -1196,6 +1196,10 @@ def _init_training_args(self) -> None:
 if 'accelerator_config' in parameters:
     kwargs['accelerator_config'] = {'dispatch_batches': False}

+metric_for_best_model = 'rouge-l' if self.predict_with_generate else 'loss'
+if hasattr(self, 'rlhf_type') and self.rlhf_type == 'ppo':
+    metric_for_best_model = None
PPO training reports no metrics that start with "eval", so metric_for_best_model is set to None here.
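The selection logic in this hunk can be sketched as a standalone function. This is an illustration only: the function name is hypothetical, and passing `rlhf_type=None` stands in for an arguments object that has no rlhf_type attribute (the case the patch covers with hasattr).

```python
from typing import Optional


def select_metric_for_best_model(predict_with_generate: bool,
                                 rlhf_type: Optional[str] = None) -> Optional[str]:
    # Default choice, as in the hunk: a generation metric when predicting
    # with generate, otherwise the loss.
    metric = 'rouge-l' if predict_with_generate else 'loss'
    # PPO reports no "eval"-prefixed metrics, so no best-model metric is set.
    if rlhf_type == 'ppo':
        metric = None
    return metric
```

Under these assumptions, the result is 'loss' or 'rouge-l' for non-PPO runs and None whenever rlhf_type is 'ppo', regardless of predict_with_generate.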
PR type
PR information
Support reward modeling for LLM and MLLM.
Support PPO for LLM.