Fix UNEXPECTED lm_head.weight warning when loading a CausalLM as a reward model by albertvillanova · Pull Request #5295 · huggingface/trl

albertvillanova · 2026-03-17T10:29:26Z

Fix UNEXPECTED lm_head.weight warning when loading a CausalLM as a reward model.

This PR refines how warnings are suppressed when loading a Causal Language Model (CausalLM) checkpoint into an AutoModelForSequenceClassification. The changes ensure that both expected "missing" and "unexpected" key warnings are suppressed, improving clarity for users and maintaining compatibility across different transformers library versions.

Problem

Loading a CausalLM checkpoint into AutoModelForSequenceClassification (as RewardTrainer does when model is a string) produces two harmless LOAD REPORT warnings since transformers v4.57.0:

MISSING score.weight — new seq-clf head, not in the checkpoint, randomly initialized.
UNEXPECTED lm_head.weight — causal LM head, in the checkpoint but absent from seq-clf.
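As a rough illustration of where the two statuses come from, the sketch below compares two hand-written key sets. The key names other than score.weight and lm_head.weight, and the helper diff_keys, are made up for this example and are not part of transformers or TRL.

```python
# Illustrative key sets only; a real checkpoint holds many more tensors.
checkpoint_keys = {"model.embed_tokens.weight", "model.norm.weight", "lm_head.weight"}
seqcls_model_keys = {"model.embed_tokens.weight", "model.norm.weight", "score.weight"}


def diff_keys(checkpoint, model):
    """Return (missing, unexpected) as sorted lists of key names.

    missing:    keys the model defines but the checkpoint does not provide.
    unexpected: keys the checkpoint provides but the model does not define.
    """
    missing = sorted(model - checkpoint)
    unexpected = sorted(checkpoint - model)
    return missing, unexpected


missing, unexpected = diff_keys(checkpoint_keys, seqcls_model_keys)
print("MISSING:", missing)        # MISSING: ['score.weight']
print("UNEXPECTED:", unexpected)  # UNEXPECTED: ['lm_head.weight']
```

Both entries are expected whenever the checkpoint and the target model differ only in their task head, which is why they are safe to silence in this specific case.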

suppress_seqcls_warning already suppressed the MISSING side.

However, the UNEXPECTED warning is not suppressed yet: https://github.com/huggingface/trl/actions/runs/23131343032/job/67185204370

Qwen2ForSequenceClassification LOAD REPORT from: trl-internal-testing/tiny-Qwen2ForCausalLM-2.5
Key            | Status     |  | 
---------------+------------+--+-
lm_head.weight | UNEXPECTED |  | 

Notes:
- UNEXPECTED	:can be ignored when loading from different task/architecture; not ok if you expect identical arch.

Solution

This PR extends suppress_seqcls_warning to also suppress the UNEXPECTED side, and renames the internal helpers to reflect that they now cover both directions.

Changes

Warning suppression improvements:

Updated the warning suppression logic to handle both "missing" (score.weight) and "unexpected" (lm_head.weight) keys when loading a CausalLM checkpoint into a sequence classification model, providing clearer comments and more robust regex patterns.
Enhanced the context manager for newer transformers versions (>= 4.57.0) to ignore both missing and unexpected keys by updating _keys_to_ignore_on_load_missing and introducing _keys_to_ignore_on_load_unexpected on GenericForSequenceClassification.
Updated the version-aware suppression wrapper to use the new context manager for recent transformers versions and the improved logging filter for older versions, ensuring consistent behavior.
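The scoped override described above might look roughly like the following sketch. StandInSeqClsModel is a hypothetical placeholder for GenericForSequenceClassification, the names ignore_cross_arch_keys and keys_to_report are invented for this example, and the two regex patterns are assumptions rather than the exact ones used in the PR.

```python
import re
from contextlib import contextmanager


class StandInSeqClsModel:
    """Hypothetical stand-in for GenericForSequenceClassification."""

    _keys_to_ignore_on_load_missing = None
    _keys_to_ignore_on_load_unexpected = None


@contextmanager
def ignore_cross_arch_keys(model_cls=StandInSeqClsModel):
    """Temporarily extend both ignore lists, then restore the previous values."""
    old_missing = model_cls._keys_to_ignore_on_load_missing
    old_unexpected = model_cls._keys_to_ignore_on_load_unexpected
    # Assumed patterns: the new seq-clf head and the causal LM head.
    model_cls._keys_to_ignore_on_load_missing = list(old_missing or []) + [r"^score\.weight$"]
    model_cls._keys_to_ignore_on_load_unexpected = list(old_unexpected or []) + [r"^lm_head\."]
    try:
        yield
    finally:
        # Restore even if loading raises, so the override never leaks.
        model_cls._keys_to_ignore_on_load_missing = old_missing
        model_cls._keys_to_ignore_on_load_unexpected = old_unexpected


def keys_to_report(keys, ignore_patterns):
    """Drop every key that matches one of the ignore patterns."""
    patterns = ignore_patterns or []
    return [key for key in keys if not any(re.search(p, key) for p in patterns)]


with ignore_cross_arch_keys():
    inside = keys_to_report(
        ["lm_head.weight", "model.extra.weight"],
        StandInSeqClsModel._keys_to_ignore_on_load_unexpected,
    )
after = StandInSeqClsModel._keys_to_ignore_on_load_unexpected

print(inside)  # ['model.extra.weight']
print(after)   # None
```

Restoring the previous values in the finally block keeps the change local to the load call, so a key that is unexpected for any other reason is still reported elsewhere.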

Note

Low Risk
Low risk: only adjusts warning-suppression logic during reward-model loading to also ignore expected lm_head.* unexpected keys on newer transformers versions.

Overview
When loading a CausalLM checkpoint via AutoModelForSequenceClassification in RewardTrainer, the PR expands warning suppression to cover both expected cross-architecture load reports: missing score.weight and (in transformers>=4.57.0) unexpected lm_head.* keys.

It renames/clarifies the internal context managers and, for newer transformers, temporarily sets both GenericForSequenceClassification._keys_to_ignore_on_load_missing and _keys_to_ignore_on_load_unexpected, while keeping the logging-filter fallback for older versions.
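For the older-version fallback, a logging filter can be attached for the duration of the load and removed afterwards. The sketch below uses only the standard library. The logger name, the message text, the simplified pattern, and the helpers filter_log_messages, parse_version and uses_scoped_override are all assumptions made for illustration, not the PR's actual code.

```python
import logging
import re
from contextlib import contextmanager


@contextmanager
def filter_log_messages(logger, pattern):
    """Temporarily drop records on `logger` whose message matches `pattern`."""
    compiled = re.compile(pattern)

    class _Filter(logging.Filter):
        def filter(self, record):
            return not compiled.search(record.getMessage())

    log_filter = _Filter()
    logger.addFilter(log_filter)
    try:
        yield
    finally:
        logger.removeFilter(log_filter)


def parse_version(version):
    """Parse the leading 'X.Y.Z' of a version string into a tuple of ints."""
    return tuple(int(part) for part in re.match(r"(\d+)\.(\d+)\.(\d+)", version).groups())


def uses_scoped_override(version):
    """True when the scoped class-attribute override applies (>= 4.57.0)."""
    return parse_version(version) >= (4, 57, 0)


class ListHandler(logging.Handler):
    """Collects formatted messages so the effect of the filter is visible."""

    def __init__(self):
        super().__init__()
        self.messages = []

    def emit(self, record):
        self.messages.append(record.getMessage())


logger = logging.getLogger("demo.loader")  # hypothetical logger name
logger.setLevel(logging.WARNING)
logger.propagate = False
handler = ListHandler()
logger.addHandler(handler)

with filter_log_messages(logger, r"newly initialized"):
    logger.warning("Some weights are newly initialized: ['score.weight']")  # dropped
    logger.warning("Unrelated warning")                                     # kept
logger.warning("newly initialized again")  # kept: the filter is gone by now

print(handler.messages)
print(uses_scoped_override("4.57.0"), uses_scoped_override("4.56.2"))
```

A version-aware wrapper would then pick the scoped override when uses_scoped_override returns True and the logging filter otherwise.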

Written by Cursor Bugbot for commit a91d7b1. This will update automatically on new commits.


HuggingFaceDocBuilderDev · 2026-03-17T10:32:09Z

The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.

qgallouedec

Perfect



albertvillanova added 5 commits March 17, 2026 11:17

Update comment

6350418

Update suppress_from_pretrained_warning

704bd64

Update ignore_seqcls_score_missing_key

81e26d6

Rename ignore_seqcls_score_missing_key to _ignore_seqcls_cross_arch_keys

0686133

Rename suppress_from_pretrained_warning to _suppress_seqcls_cross_arch_keys

e11d4e5

Revert because UNEXPECTED was not emitted for transformers < 4.57.0

a91d7b1

qgallouedec approved these changes Mar 17, 2026


albertvillanova merged commit 52cd0cc into huggingface:main Mar 17, 2026
11 of 12 checks passed

songhappy pushed a commit to songhappy/trl that referenced this pull request Apr 20, 2026

Fix UNEXPECTED lm_head.weight warning when loading a CausalLM as a reward model (huggingface#5295)

d5dad5e

albertvillanova merged 6 commits into huggingface:main from albertvillanova:fix-load-report-unexpected
