
⚠️ Add warning guidelines and update codebase to follow best practices #2350

Draft · wants to merge 21 commits into main

Conversation

@qgallouedec (Member) commented Nov 11, 2024

What does this PR do?

Warnings that were previously triggered during normal operation have been removed, which reduces noise in our CI pipeline. This change brings us closer to our goal of zero warnings.

CI Warnings Count (Dev Dependencies):

  • Before PR: 558
  • After PR: 466

A follow-up PR is planned to address the remaining legitimate warnings.
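For reference, the guidelines applied throughout this PR boil down to three rules, repeated in the review comments below: warnings should not indicate normal behavior, warnings must be actionable, and warnings should use an explicit category. A minimal sketch of all three (the names `max_seq_length` and `load_lora_weights` here are hypothetical placeholders, not TRL APIs):

```python
import warnings

# Rule 1: don't warn on normal behavior — silently applying a documented
# default needs no warning at all.
max_seq_length = 512

# Rules 2 and 3: make the message actionable and pass an explicit category.
def load_lora_weights(folder_has_weights: bool) -> None:
    if not folder_has_weights:
        warnings.warn(
            "Trying to load LoRA weights but no LoRA weights found. Set "
            "`use_lora=False` or check that `pytorch_lora_weights.safetensors` "
            "exists in the model folder.",
            UserWarning,  # explicit instead of relying on the implicit default
        )
```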

Before submitting

  • This PR fixes a typo or improves the docs (you can dismiss the other checks if that's the case).
  • Did you read the contributor guideline,
    Pull Request section?
  • Was this discussed/approved via a GitHub issue? Please add a link
    to it if that's the case.
  • Did you make sure to update the documentation with your changes? Here are the
    documentation guidelines.
  • Did you write any new necessary tests?

Who can review?

Anyone in the community is free to review the PR once the tests have passed. Feel free to tag
members/contributors who may be interested in your PR.

Comment on lines -397 to -400
warnings.warn(
"You passed a model_id to the BCOTrainer. This will automatically create an "
"`AutoModelForCausalLM` or a `PeftModel` (if you passed a `peft_config`) for you."
)
@qgallouedec (Member Author) commented Nov 11, 2024

Warnings should not indicate normal behavior.

Comment on lines -404 to -407
warnings.warn(
"You passed a ref model_id to the BCOTrainer. This will automatically create an "
"`AutoModelForCausalLM`"
)
@qgallouedec (Member Author) commented Nov 11, 2024

Warnings should not indicate normal behavior.

@HuggingFaceDocBuilderDev

The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.

@@ -99,7 +99,8 @@
if model_config.use_peft and model_config.lora_task_type != "SEQ_CLS":
warnings.warn(
"You are using a `task_type` that is different than `SEQ_CLS` for PEFT. This will lead to silent bugs"
" Make sure to pass --lora_task_type SEQ_CLS when using this script with PEFT."
" Make sure to pass --lora_task_type SEQ_CLS when using this script with PEFT.",
UserWarning,
@qgallouedec (Member Author) commented Nov 11, 2024

Use the appropriate warning type.
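For context, `warnings.warn` defaults to `UserWarning` when no category is given; passing the category explicitly documents intent and lets consumers filter precisely. A small illustration using only stdlib behavior:

```python
import warnings

warnings.warn("message")                    # category defaults to UserWarning
warnings.warn("message", UserWarning)       # explicit: easier to grep and audit
warnings.warn("deprecated", FutureWarning)  # the right category for deprecations

# Consumers can then act on one category at a time, e.g. fail CI on
# deprecation warnings only:
warnings.filterwarnings("error", category=FutureWarning)
```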

@@ -296,7 +296,8 @@ def randn_tensor(
warnings.warn(
f"The passed generator was created on 'cpu' even though a tensor on {device} was expected."
f" Tensors will be created on 'cpu' and then moved to {device}. Note that one can probably"
f" slighly speed up this function by passing a generator that was created on the {device} device."
f" slighly speed up this function by passing a generator that was created on the {device} device.",
UserWarning,
@qgallouedec (Member Author) commented Nov 11, 2024

Use the appropriate warning type.

Comment on lines -762 to +767
if np.array(predictions[:, 0] == predictions[:, 1], dtype=float).sum() > 0:
equal_predictions_count = np.array(predictions[:, 0] == predictions[:, 1], dtype=float).sum()
if equal_predictions_count > 0:
warnings.warn(
f"There are {np.array(predictions[:, 0] == predictions[:, 1]).sum()} out of {len(predictions[:, 0])} instances where the predictions for both options are equal. As a consequence the accuracy can be misleading."
f"There are {equal_predictions_count} out of {len(predictions[:, 0])} instances where the predictions for "
"both options are equal. As a consequence the accuracy can be misleading.",
UserWarning,
@qgallouedec (Member Author) commented Nov 11, 2024

Warnings must be actionable.
Warnings should not indicate normal behavior.
Use the appropriate warning type.

This warning is still not actionable. Any idea how to solve this?
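To illustrate the distinction (hypothetical messages, not from the codebase): a warning is actionable when it tells the user what to do next, not merely what happened.

```python
import warnings

# Not actionable: reports a fact with no next step.
warnings.warn("Some predictions for both options are equal.", UserWarning)

# Actionable: states the consequence and a concrete remedy.
warnings.warn(
    "12 out of 100 instances have equal predictions for both options, so the "
    "reported accuracy can be misleading. Consider inspecting or filtering "
    "out these instances before computing the metric.",
    UserWarning,
)
```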

Comment on lines -148 to +150
warnings.warn("install rich to display text")
return
raise ImportError(
"The `rich` library is required to display text with formatting. "
"Install it using `pip install rich`."
)
@qgallouedec (Member Author) commented Nov 12, 2024

Warnings should not indicate normal behavior.
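The pattern adopted here, sketched in isolation (this is not the actual TRL helper, just the shape of the fix): check for the optional dependency and raise instead of warning and silently returning.

```python
import importlib.util

def print_rich_text(text: str) -> None:
    # Fail loudly: a warning followed by `return` is easy to miss and leaves
    # the caller without the output they asked for.
    if importlib.util.find_spec("rich") is None:
        raise ImportError(
            "The `rich` library is required to display text with formatting. "
            "Install it using `pip install rich`."
        )
    from rich.console import Console
    Console().print(text)
```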

Comment on lines -170 to +174
warnings.warn("install rich to display tokens")
return
raise ImportError(
"The `rich` library is required to display tokens with formatting. "
"Install it using `pip install rich`."
)
@qgallouedec (Member Author)

Warnings should not indicate normal behavior.

Comment on lines -195 to +201
warnings.warn("install rich to display colour legend")
return
raise ImportError(
"The `rich` library is required to display colour legends with formatting. "
"Install it using `pip install rich`."
)
@qgallouedec (Member Author)

Warnings should not indicate normal behavior.

Comment on lines -811 to +813
"If you are aware that the pretrained model has no lora weights to it, ignore this message. "
"Otherwise please check the if `pytorch_lora_weights.safetensors` exists in the model folder."
"Trying to load LoRA weights but no LoRA weights found. Set `use_lora=False` or check that "
"`pytorch_lora_weights.safetensors` exists in the model folder.",
UserWarning,
@qgallouedec (Member Author)

Warnings must be actionable.
Warnings should not indicate normal behavior.

Comment on lines -142 to -145
if self.log_with not in ["wandb", "tensorboard"]:
warnings.warn(
"Accelerator tracking only supports image logging if `log_with` is set to 'wandb' or 'tensorboard'."
)
@qgallouedec (Member Author)

Warnings should not indicate normal behavior.

Comment on lines -147 to -148
if self.log_with == "wandb" and not is_torchvision_available():
warnings.warn("Wandb image logging requires torchvision to be installed")
@qgallouedec (Member Author)

Warnings should not indicate normal behavior.

Comment on lines -576 to +572
"You set `output_router_logits` to True in the model config, but `router_aux_loss_coef` is set to 0.0,"
" meaning the auxiliary loss will not be used."
"You set `output_router_logits` to `True` in the model config, but `router_aux_loss_coef` is set to "
"`0.0`, meaning the auxiliary loss will not be used. Either set `router_aux_loss_coef` to a value "
"greater than `0.0`, or set `output_router_logits` to `False` if you don't want to use the auxiliary "
"loss.",
UserWarning,
@qgallouedec (Member Author)

Warnings must be actionable.
Use the appropriate warning type.

@@ -705,7 +700,6 @@ def make_inputs_require_grad(module, input, output):
self.running = RunningMoments(accelerator=self.accelerator)

if self.embedding_func is None:
warnings.warn("You did not pass `embedding_func` underlying distribution matching feature is deactivated.")
@qgallouedec (Member Author)

Warnings should not indicate normal behavior.

Comment on lines -878 to +872
if not os.path.isfile(running_file):
warnings.warn(f"Missing file {running_file}. Will use a new running delta value for BCO loss calculation")
else:
if os.path.isfile(running_file):
@qgallouedec (Member Author)

Warnings should not indicate normal behavior.

Comment on lines -885 to +877
if not os.path.isfile(running_file):
warnings.warn(f"Missing file {clf_file}. Will use a new UDM classifier for BCO loss calculation")
else:
if os.path.isfile(running_file):
@qgallouedec (Member Author)

Warnings should not indicate normal behavior.

Comment on lines -1353 to -1366
if not self.use_dpo_data_collator:
warnings.warn(
"prediction_step is only implemented for DPODataCollatorWithPadding, and you passed a datacollator that is different than "
"DPODataCollatorWithPadding - you might see unexpected behavior. Alternatively, you can implement your own prediction_step method if you are using a custom data collator"
)
@qgallouedec (Member Author)

Warnings should not indicate normal behavior.

Comment on lines -881 to -896
if not self.use_dpo_data_collator:
warnings.warn(
"prediction_step is only implemented for DPODataCollatorWithPadding, and you passed a datacollator that is different than "
"DPODataCollatorWithPadding - you might see unexpected behavior. Alternatively, you can implement your own prediction_step method if you are using a custom data collator"
)
@qgallouedec (Member Author)

Warnings should not indicate normal behavior.

Comment on lines -835 to -851
if not self.use_dpo_data_collator:
warnings.warn(
"compute_loss is only implemented for DPODataCollatorWithPadding, and you passed a datacollator that is different than "
"DPODataCollatorWithPadding - you might see unexpected behavior. Alternatively, you can implement your own prediction_step method if you are using a custom data collator"
)

@qgallouedec (Member Author)

Warnings should not indicate normal behavior.

Comment on lines -308 to -316
if self.cpo_alpha > 0:
warnings.warn(
"You are using CPO-SimPO method because you set a non-zero cpo_alpha. "
"This will result in the CPO-SimPO method "
"(https://github.com/fe1ixxu/CPO_SIMPO/tree/main). "
"If you want to use a pure SimPO method, please set cpo_alpha to 0."
)
@qgallouedec (Member Author)

Warnings should not indicate normal behavior.

Comment on lines -302 to +304
"You set `output_router_logits` to True in the model config, but `router_aux_loss_coef` is set to 0.0,"
" meaning the auxiliary loss will not be used."
"You set `output_router_logits` to `True` in the model config, but `router_aux_loss_coef` is set to "
"`0.0`, meaning the auxiliary loss will not be used. Either set `router_aux_loss_coef` to a value "
"greater than `0.0`, or set `output_router_logits` to `False` if you don't want to use the auxiliary "
"loss.",
UserWarning,
@qgallouedec (Member Author)

Warnings must be actionable.
Use the appropriate warning type.

Comment on lines -1355 to -1370
if not self.use_dpo_data_collator:
warnings.warn(
"prediction_step is only implemented for DPODataCollatorWithPadding, and you passed a datacollator that is different than "
"DPODataCollatorWithPadding - you might see unexpected behavior. Alternatively, you can implement your own prediction_step method if you are using a custom data collator"
)
@qgallouedec (Member Author)

Warnings should not indicate normal behavior.

Comment on lines -161 to +162
"Ignoring `judge` and using `reward_model`."
"Ignoring `judge` and using `reward_model`.",
UserWarning,
@qgallouedec (Member Author)

Use the appropriate warning type.

Comment on lines -154 to -159
warnings.warn(
"You passed a model_id to the ORPOTrainer. This will automatically create an "
"`AutoModelForCausalLM` or a `PeftModel` (if you passed a `peft_config`) for you."
)
@qgallouedec (Member Author)

Warnings should not indicate normal behavior.

"You set `output_router_logits` to True in the model config, but `router_aux_loss_coef` is set to 0.0,"
" meaning the auxiliary loss will not be used."
"You set `output_router_logits` to `True` in the model config, but `router_aux_loss_coef` is set to "
"`0.0`, meaning the auxiliary loss will not be used. Either set `router_aux_loss_coef` to a value "
"greater than `0.0`, or set `output_router_logits` to `False` if you don't want to use the auxiliary "
"loss.",
UserWarning,
@qgallouedec (Member Author)

Warnings must be actionable.
Use the appropriate warning type.

Comment on lines -850 to -865
if not self.use_dpo_data_collator:
warnings.warn(
"compute_loss is only implemented for DPODataCollatorWithPadding, and you passed a datacollator that is different than "
"DPODataCollatorWithPadding - you might see unexpected behavior. Alternatively, you can implement your own prediction_step method if you are using a custom data collator"
)
@qgallouedec (Member Author)

Warnings should not indicate normal behavior.

Comment on lines -138 to -141
if type(args) is TrainingArguments:
warnings.warn(
"Using `transformers.TrainingArguments` for `args` is deprecated and will be removed in a future version. Please use `RewardConfig` instead.",
FutureWarning,
@qgallouedec (Member Author)

It has been deprecated for more than a year (#748)
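For background, the usual lifecycle that makes this removal safe, as a self-contained sketch (the classes below are stand-ins for the real `transformers.TrainingArguments` and `RewardConfig`):

```python
import warnings

class TrainingArguments:  # stand-in for transformers.TrainingArguments
    pass

class RewardConfig(TrainingArguments):  # stand-in for trl.RewardConfig
    pass

def check_args(args):
    # During the deprecation window: accept the old type but warn loudly.
    if type(args) is TrainingArguments:
        warnings.warn(
            "Using `transformers.TrainingArguments` for `args` is deprecated "
            "and will be removed in a future version. Please use "
            "`RewardConfig` instead.",
            FutureWarning,
        )
    # After the window (what this PR does): delete the branch above entirely,
    # so only `RewardConfig` remains supported.
```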

@@ -126,9 +126,7 @@ def __init__(
formatting_func: Optional[Callable] = None,
):
if args is None:
output_dir = "tmp_trainer"
warnings.warn(f"No `SFTConfig` passed, using `output_dir={output_dir}`.")
@qgallouedec (Member Author)

Warnings should not indicate normal behavior.

Comment on lines -273 to -287
if not self.use_reward_data_collator:
warnings.warn(
"The current compute_loss is implemented for RewardDataCollatorWithPadding,"
" if you are using a custom data collator make sure you know what you are doing or"
" implement your own compute_loss method."
)
@qgallouedec (Member Author)

Warnings should not indicate normal behavior.

@@ -189,7 +173,7 @@ def __init__(
"A processing_class must be specified when using the default RewardDataCollatorWithPadding"
)
if max_length is None:
max_length = 512 if type(args) is TrainingArguments or args.max_length is None else args.max_length
@qgallouedec (Member Author)

It has been deprecated for more than a year (#748).

Comment on lines -174 to +158
"please update to the latest version of peft to use `gradient_checkpointing_kwargs`."
"please update to the latest version of peft to use `gradient_checkpointing_kwargs`.",
UserWarning,
@qgallouedec (Member Author)

Use the appropriate warning type.

Comment on lines -156 to -159
warnings.warn(
"You passed a model_id to the SFTTrainer. This will automatically create an "
"`AutoModelForCausalLM` or a `PeftModel` (if you passed a `peft_config`) for you."
)
@qgallouedec (Member Author)

Warnings should not indicate normal behavior.

Comment on lines -246 to -249
warnings.warn(
f"You didn't pass a `max_seq_length` argument to the SFTTrainer, this will default to {args.max_seq_length}"
)

@qgallouedec (Member Author)

Warnings should not indicate normal behavior.

Comment on lines -307 to +300
"You passed a processing_class with `padding_side` not equal to `right` to the SFTTrainer. This might lead to some unexpected behaviour due to "
"overflow issues when training a model in half-precision. You might consider adding `processing_class.padding_side = 'right'` to your code."
"You passed a processing_class with `padding_side` not equal to `right` to the SFTTrainer. This might "
"lead to some unexpected behaviour due to overflow issues when training a model in half-precision. "
"You might consider adding `processing_class.padding_side = 'right'` to your code.",
UserWarning,
@qgallouedec (Member Author)

Warnings must be actionable.
Use the appropriate warning type.

This warning is still not actionable. Any idea?

Comment on lines -331 to -333
warnings.warn(
"You passed `packing=True` to the SFTTrainer/SFTConfig, and you are training your model with `max_steps` strategy. The dataset will be iterated until the `max_steps` are reached."
)
@qgallouedec (Member Author)

Warnings should not indicate normal behavior.

Comment on lines -367 to +359
"You passed a dataset that is already processed (contains an `input_ids` field) together with a valid formatting function. Therefore `formatting_func` will be ignored."
"You passed a dataset that is already processed (contains an `input_ids` field) together with a "
"valid formatting function. Therefore `formatting_func` will be ignored. Either remove the "
"`formatting_func` or pass a dataset that is not already processed.",
UserWarning,
@qgallouedec (Member Author)

Warnings must be actionable.
Use the appropriate warning type.

Comment on lines -441 to +437
"You passed `remove_unused_columns=False` on a non-packed dataset. This might create some issues with the default collator and yield to errors. If you want to "
f"inspect dataset other columns (in this case {extra_columns}), you can subclass `DataCollatorForLanguageModeling` in case you used the default collator and create your own data collator in order to inspect the unused dataset columns."
"You passed `remove_unused_columns=False` on a non-packed dataset. This might create some issues with "
"the default collator and yield to errors. If you want to inspect dataset other columns (in this "
f"case {extra_columns}), you can subclass `DataCollatorForLanguageModeling` in case you used the "
"default collator and create your own data collator in order to inspect the unused dataset columns.",
UserWarning,
@qgallouedec (Member Author)

Use the appropriate warning type.

Comment on lines -138 to +139
"To avoid this, set the pad_token_id to a different value."
"To avoid this, set the pad_token_id to a different value.",
UserWarning,
@qgallouedec (Member Author)

Warnings should not indicate normal behavior.
Use the appropriate warning type.

This warning can still occur in normal behavior. Any idea?

Comment on lines 161 to 165
f"Could not find response key `{self.response_template}` in the "
f'following instance: {self.tokenizer.decode(batch["input_ids"][i])} '
f"This instance will be ignored in loss calculation. "
f"Note, if this happens often, consider increasing the `max_seq_length`."
f"Could not find response key `{self.response_template}` in the following instance: "
f"{self.tokenizer.decode(batch["input_ids"][i])}. This instance will be ignored in loss "
"calculation. Note, if this happens often, consider increasing the `max_seq_length`.",
UserWarning,
@qgallouedec (Member Author) commented Nov 12, 2024

Warnings should not indicate normal behavior.
Use the appropriate warning type.

This warning can still occur in normal behavior. Any idea?

Comment on lines 188 to 192
f"Could not find response key `{self.response_template}` in the "
f'following instance: {self.tokenizer.decode(batch["input_ids"][i])} '
f"This instance will be ignored in loss calculation. "
f"Note, if this happens often, consider increasing the `max_seq_length`."
f"Could not find response key `{self.response_template}` in the following instance: "
f"{self.tokenizer.decode(batch["input_ids"][i])}. This instance will be ignored in loss "
"calculation. Note, if this happens often, consider increasing the `max_seq_length`.",
UserWarning,
@qgallouedec (Member Author)

Warnings should not indicate normal behavior.
Use the appropriate warning type.

This warning can still occur in normal behavior. Any idea?

Comment on lines -594 to -600

if tokenizer.eos_token_id is None:
warnings.warn(
"The passed tokenizer does not have an EOS token. We will use the passed eos_token_id instead which corresponds"
f" to {eos_token_id}. If this is not the correct EOS token, make sure to pass the correct eos_token_id."
)

@qgallouedec (Member Author)

Warnings should not indicate normal behavior.

Comment on lines 203 to 207
f"Could not find instruction key `{self.instruction_template}` in the "
f'following instance: {self.tokenizer.decode(batch["input_ids"][i])} '
f"This instance will be ignored in loss calculation. "
f"Note, if this happens often, consider increasing the `max_seq_length`."
f"Could not find instruction key `{self.instruction_template}` in the following instance: "
f"{self.tokenizer.decode(batch["input_ids"][i])}. This instance will be ignored in loss "
"calculation. Note, if this happens often, consider increasing the `max_seq_length`.",
UserWarning,
@qgallouedec (Member Author)

Warnings should not indicate normal behavior.
Use the appropriate warning type.

This warning can still occur in normal behavior. Any idea?
