Add sequence classification capability to Granite models #44215

Open
jmriosal wants to merge 8 commits into huggingface:main from jmriosal:add_seq_class_head_granite
@jmriosal jmriosal commented Feb 22, 2026

What does this PR do?

Add sequence classification capabilities to the family of Granite models (Granite, GraniteMoe, GraniteMoeHybrid, and GraniteMoeShared).

Fixes #44214, #35720

Why

The Granite models currently provide only the base model and causal language modeling heads. Adding a sequence classification head brings them in line with other models in the library.

Proposed solution and description of changes

The following ForSequenceClassification classes were added:

  • GraniteForSequenceClassification
  • GraniteMoeForSequenceClassification
  • GraniteMoeHybridForSequenceClassification
  • GraniteMoeSharedForSequenceClassification

These classes use the existing GenericForSequenceClassification, following the established pattern seen in many other models in the library. Code changes were minimal and keep the logic consistent across similar models. Changes were implemented in the modular_*.py files, and the modeling_*.py files were then generated automatically using utils/modular_model_converter.py.
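
The pattern described above can be illustrated with a dependency-free toy. This is only a sketch of the mixin idea, not the library's code: every class name prefixed with Toy is a hypothetical stand-in, and the last-token pooling and the scoring rule are assumptions made for the example. The point it shows is that the shared head logic lives in one generic class, so each model-specific class has an empty body.

```python
class ToyPreTrainedModel:
    """Hypothetical stand-in for a model-specific base such as GranitePreTrainedModel."""

    def __init__(self, num_labels):
        self.num_labels = num_labels

    def encode(self, token_ids):
        # Pretend backbone: one "hidden state" (a float) per input token.
        return [float(t) for t in token_ids]


class ToyGenericForSequenceClassification:
    """Hypothetical stand-in for a shared head mixin: pooling and scoring are written once."""

    def classify(self, token_ids):
        hidden = self.encode(token_ids)
        pooled = hidden[-1]  # assumption for this sketch: pool the last token
        # One score per label; a real head would apply a learned linear layer instead.
        return [pooled * (label + 1) for label in range(self.num_labels)]


class ToyGraniteForSequenceClassification(ToyGenericForSequenceClassification, ToyPreTrainedModel):
    """Empty body: behavior comes entirely from the mixin and the base class."""


model = ToyGraniteForSequenceClassification(num_labels=3)
scores = model.classify([4, 7, 2])
print(scores)
```

Listing the mixin first in the bases means its methods take precedence in the method resolution order, while construction falls through to the model-specific base.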

Updated __all__ exports with new classes in each model module.

The Auto Model Registry (MODEL_FOR_SEQUENCE_CLASSIFICATION_MAPPING_NAMES) has been updated so that the new classes can be auto-loaded via AutoModelForSequenceClassification.
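
As a rough sketch of what such a registry update amounts to, the snippet below models the mapping as an ordered dictionary from a model-type string to a class name. The model-type keys are an assumption inferred from the model folder names, and `resolve_class_name` is a hypothetical helper for illustration, not a function from the library.

```python
from collections import OrderedDict

# Stand-in for the entries added to the sequence classification mapping.
# Keys are assumed model-type strings; values are the class names from this PR.
GRANITE_SEQ_CLS_MAPPING_NAMES = OrderedDict(
    [
        ("granite", "GraniteForSequenceClassification"),
        ("granitemoe", "GraniteMoeForSequenceClassification"),
        ("granitemoehybrid", "GraniteMoeHybridForSequenceClassification"),
        ("granitemoeshared", "GraniteMoeSharedForSequenceClassification"),
    ]
)


def resolve_class_name(model_type: str) -> str:
    """Hypothetical lookup: map a config's model type to its classification class name."""
    try:
        return GRANITE_SEQ_CLS_MAPPING_NAMES[model_type]
    except KeyError:
        raise ValueError(f"No sequence classification class registered for {model_type!r}") from None


print(resolve_class_name("granitemoe"))
```

Without an entry for a given model type, a lookup of this kind has nothing to return, which is why the registry change is needed in addition to the new classes themselves.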

New features usage

After this PR, users should be able to load any Granite model variant for sequence classification as follows:

from transformers import AutoModelForSequenceClassification

model = AutoModelForSequenceClassification.from_pretrained("granite-model-id")  # placeholder model ID
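
A slightly fuller sketch of how the loaded model might be used for inference follows. The helper name `classify_text` and its defaults are hypothetical, and the model ID is whatever Granite checkpoint the caller supplies. Note that when a checkpoint contains no classification head, the head weights are newly initialized, so the predictions are not meaningful until the model is fine-tuned.

```python
def classify_text(model_id: str, text: str, num_labels: int = 2) -> int:
    """Return the index of the highest-scoring label for `text`."""
    # Imported lazily so that defining this helper does not require the heavy dependencies.
    import torch
    from transformers import AutoModelForSequenceClassification, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(model_id)
    # num_labels sets the size of the classification head.
    model = AutoModelForSequenceClassification.from_pretrained(model_id, num_labels=num_labels)
    model.eval()

    # A single sequence avoids any need for padding in this sketch.
    inputs = tokenizer(text, return_tensors="pt")
    with torch.no_grad():
        logits = model(**inputs).logits  # shape: (1, num_labels)
    return int(logits.argmax(dim=-1).item())
```

Calling `classify_text("granite-model-id", "some text")` with a real checkpoint ID in place of the placeholder would download the weights and return a label index.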

Before submitting

Who can review?

Anyone in the community is free to review the PR once the tests have passed.

@ArthurZucker @Cyrilvallez

  Created ForSequenceClassification classes for Granite, GraniteMoe, GraniteMoeHybrid, GraniteMoeShared using the existing GenericForSequenceClassification mixin pattern. Implementation in modular_*.py

  Updated __all__ exports in each model module

  Registered all new classes in auto/modeling_auto.py
@jmriosal
Author

Working on adding tests for the new ForSequenceClassification classes.

…ng_*.py files, following the same pattern as other models in the library
@jmriosal
Author

jmriosal commented Feb 23, 2026

The CI check_repository_consistency job is failing because of a pre-existing mismatch, introduced in PR #44176, between the modular and modeling files for GraniteMoeHybrid: modeling_granitemoehybrid.py was modified directly, but modular_granitemoehybrid.py was not. This mismatch exists on the main branch and is not caused by these changes.

As a consequence, the following check FAILS:

python utils/check_modular_conversion.py --files src/transformers/models/granitemoehybrid/modular_granitemoehybrid.py

Any advice?

Added prepare_config_and_inputs_for_sequence_classification() method to provide the correct input format for create_and_check_for_sequence_classification
…teMoeHybridModelTester available from BambaModelTester.
…ModelTester (inherited by GraniteMoeHybridModelTester)
@github-actions
Contributor

[For maintainers] Suggested jobs to run (before merge)

run-slow: auto, granite, granitemoe, granitemoehybrid, granitemoeshared

Collaborator

@ArthurZucker ArthurZucker left a comment

LGTM, but you can also use GenericForSequenceClassification directly!
Kind of up to you, but it's generic, so it's compatible!

@ArthurZucker
Collaborator

Don't worry, we'll fix this one on main!

Development

Successfully merging this pull request may close these issues.

Add sequence classification capabilities to the Granite models

2 participants