
@lematt1991 (Contributor) commented Sep 15, 2025

What does this PR do?

ModernBertPreTrainedModel currently references its subclasses when initializing weights. This breaks things when you try to create a new model that inherits from this class and use utils/modular_model_converter.py, since the converter will pull in the subclasses as dependencies, for example:

# This causes an error because `ModernBertPreTrainedModel` hasn't been defined yet!
class ModernBertForSequenceClassification(ModernBertPreTrainedModel):
    ...

class ModernBertPreTrainedModel(PreTrainedModel):
    ...
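
For reference, the pattern in question looks roughly like this inside _init_weights (a simplified sketch of the kind of subclass checks being discussed, not ModernBert's exact init code, which uses its own truncated-normal helpers and per-layer standard deviations):

import torch.nn as nn
from transformers import PreTrainedModel

class ModernBertPreTrainedModel(PreTrainedModel):
    def _init_weights(self, module):
        std = self.config.initializer_range
        if isinstance(module, nn.Linear):
            module.weight.data.normal_(mean=0.0, std=std)
        # These branches reference classes that are only defined later in the
        # same file, so the modular converter treats them as dependencies:
        elif isinstance(module, ModernBertForMaskedLM):
            module.decoder.weight.data.normal_(mean=0.0, std=std)
        elif isinstance(module, ModernBertForSequenceClassification):
            module.classifier.weight.data.normal_(mean=0.0, std=std)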

This PR removes all references to subclasses inside ModernBertPreTrainedModel, breaking the circular reference.

Who can review?

Anyone in the community is free to review the PR once the tests have passed. Feel free to tag
members/contributors who may be interested in your PR.

CC @ArthurZucker and @tomaarsen who reviewed #35158

@Rocketknight1 (Member)

cc @Cyrilvallez in case the modular converter can do anything in these cases

@Cyrilvallez (Member) commented Sep 16, 2025

Humm, those patterns are indeed discouraged, but it probably means that you need to overwrite the method explicitly in your modular @lematt1991, as otherwise you will still end up with unnecessary code (those same branches won't import dependencies, but will still exist) 🤔 Could you expand a bit more on your issue before we make a decision on this PR?

@Rocketknight1 modular cannot know whether the user intends to add the dependencies or not, unfortunately - it simply checks whether it finds an object that it does not know about and that is not redefined in modular, and if that's the case, it imports it as a dependency. For _init_weights of PreTrainedModel, this pattern of checking downstream classes unfortunately exists in some models, but I am against making an exception to the modular rules for this method specifically, as it's quite hard to maintain and very unexpected in some cases. And we found a set of rules that make modular sound without magic, so let's avoid it! The method should instead be redefined if needed! (Or the original model logic can be changed if more practical, as in this PR, but let's wait and see if it's worth it in this case!)

@lematt1991 (Contributor, Author)

Could you expand a bit more on your issue

@Cyrilvallez thanks for your response! I'm trying to create a new model that has ModernBertModel as a component. To do this, I create a subclass of it and use utils/modular_model_converter.py to pull it in:

class MyModelModernBertModel(ModernBertModel):
    ...

class MyModel(PreTrainedModel):
    def __init__(self, ...):
        self.text_encoder = MyModelModernBertModel(...)

After running modular_model_converter without this PR, it pulls in all of the other subclasses (ModernBertForMaskedLM, ModernBertForSequenceClassification, etc.) because they are referenced in the _init_weights method. After this PR, they are excluded.

as otherwise you will still end-up with unnecessary code (those same branches won't import dependencies, but will still exist)

Sorry, I'm not sure I follow. With this fix we no longer end up with the extra classes (ModernBertForMaskedLM, ModernBertForSequenceClassification, etc.). You can test by creating a new file (src/transformers/models/test/modular_test.py) with:

from ..modernbert.modeling_modernbert import ModernBertModel

class MyModernBertModel(ModernBertModel): ...

And then running python utils/modular_model_converter.py test. You can check that there aren't any import errors with ruff check src/transformers/models/test/ (without this PR, the ruff check should fail).

@Cyrilvallez (Member) commented Sep 16, 2025

with this fix we no longer end up with the (ModernBertForMaskedLM, ModernBertForSequenceClassification, etc) classes

Yes for sure, but then the _init_weights which is created still has branches like elif isinstance(module, PreTrainedModel) and hasattr(module, "decoder"): etc., which are useless and should not be there.
So the proper way to do it would be to overwrite that method to only init what's needed, without having unreachable branches.

And doing so would not require any changes to the current ModernBert

@lematt1991 (Contributor, Author)

Ah I see what you're saying.

but then the _init_weights which is created still has branches like elif isinstance(module, PreTrainedModel) and hasattr(module, "decoder"): etc., which are useless and should not be there.

They are useless, but they aren't harming anything. I think the problem with the proposed solution is that I need to copy ModernBertModel in its entirety, and then subclass my copy of ModernBertPreTrainedModel, which overrides _init_weights to disregard the modules I don't care about. Basically this - unless I'm still misunderstanding something.

As opposed to the one-liner I mentioned here.

@lematt1991 (Contributor, Author)

@Cyrilvallez, one more argument I'd like to make in favor of this change is that it preserves the abstraction of utils/modular_model_converter.py.

utils/modular_model_converter.py allows you to import components from other models, while maintaining the "single file policy".

Fundamentally, what I want to do is import ModernBertModel to use in the new model I'm implementing. If it weren't for this "single file policy" in transformers, I would import only ModernBertModel, but because of this issue I need to "look inside the implementation" of ModernBert, copy and paste all of ModernBertModel, and then override the _init_weights method from its parent class.

@Cyrilvallez (Member) commented Sep 17, 2025

We never want to add useless code in Transformers, even if it's harmless hahaha. It makes the code hard to read and significantly increases the maintenance burden! We put a lot of effort into keeping the code as minimal as possible. In your case, you can simply do this in modular:

class NewPreTrainedModel(ModernBertPreTrainedModel):
    def _init_weights(self, module):
        ...

class NewModel(ModernBertModel):
    pass

and it will work out.
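
For example, that overridden _init_weights could look roughly like this (a minimal sketch assuming the new model only needs generic Linear/Embedding/LayerNorm initialization; this is not ModernBert's exact init scheme, which uses its own truncated-normal helpers):

import torch.nn as nn

class NewPreTrainedModel(ModernBertPreTrainedModel):
    def _init_weights(self, module):
        # Only initialize the module types this model actually contains;
        # there are no branches for heads that don't exist in the new model.
        std = self.config.initializer_range
        if isinstance(module, nn.Linear):
            module.weight.data.normal_(mean=0.0, std=std)
            if module.bias is not None:
                module.bias.data.zero_()
        elif isinstance(module, nn.Embedding):
            module.weight.data.normal_(mean=0.0, std=std)
        elif isinstance(module, nn.LayerNorm):
            if module.weight is not None:
                module.weight.data.fill_(1.0)
            if module.bias is not None:
                module.bias.data.zero_()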

So you don't have to redefine anything more than you would with the change in this PR, only the _init_weights method. And as I said, we want to redefine it anyway, to avoid useless branches!

@lematt1991 (Contributor, Author)

Ah you're right that is much simpler, thank you!

I do still think this is worth fixing, since anyone trying to import ModernBertModel in the future will run into the same issue. What do you think about refactoring so that the classifier/decoder layers are initialized only in the subclasses? This way there won't be any useless code after modular_model_converter.py runs.
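
Roughly, I mean something like this (a hypothetical sketch of the refactor, not the actual diff; attribute names like model and decoder follow the existing ModernBert head):

import torch.nn as nn

class ModernBertForMaskedLM(ModernBertPreTrainedModel):
    def __init__(self, config):
        super().__init__(config)
        self.model = ModernBertModel(config)
        self.decoder = nn.Linear(config.hidden_size, config.vocab_size)
        self.post_init()

    def _init_weights(self, module):
        # The base class keeps only generic initialization and no longer needs
        # to know this subclass exists; head-specific init lives with the head.
        super()._init_weights(module)
        if module is self.decoder:
            module.weight.data.normal_(mean=0.0, std=self.config.initializer_range)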

@github-actions (Contributor)

[For maintainers] Suggested jobs to run (before merge)

run-slow: modernbert, modernbert_decoder

@Cyrilvallez (Member) commented Sep 17, 2025

What I don't like so much about your original change is that we lose the explicitness, i.e. we don't know which classes are impacted by those branches anymore. To circumvent the modular issue and keep the explicitness, we could use name checks though; then I'd be in favor of this PR 🤗
So things such as elif module.__class__.__name__ == "ModernBertForMaskedLM" instead.
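
In context, that would look something like this (a sketch of the name-check pattern only, not a final diff; the real method would keep ModernBert's own init helpers):

from transformers import PreTrainedModel

class ModernBertPreTrainedModel(PreTrainedModel):
    def _init_weights(self, module):
        std = self.config.initializer_range
        # String comparison: the impacted class stays explicit in the code, but
        # modular no longer pulls the subclass in as a dependency.
        if module.__class__.__name__ == "ModernBertForMaskedLM":
            module.decoder.weight.data.normal_(mean=0.0, std=std)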

@lematt1991 (Contributor, Author)

@Cyrilvallez, what do you think about the latest changes? This pushes specific weight initialization into the sub-classes

@Cyrilvallez (Member)

It completely changes the inheritance graph in a way that we don't do 🙂
Sorry, but as the original issue is moot (it was simply a slight misunderstanding about how modular should be used), I don't think we want that... Now that I think about it more, switching to string matching is not really something we want either, as class instance checks are much more pythonic 🙂
So I actually don't think we want any change to ModernBert... Sorry for the back and forth, and thanks a lot for opening the PR anyway!

@lematt1991 closed this Sep 17, 2025