
@lematt1991 (Contributor) commented Sep 15, 2025

What does this PR do?

ModernBertPreTrainedModel currently references its subclasses when initializing weights. This breaks things when you try to create a new model that inherits from this class and use utils/modular_model_converter.py, since the converter will pull in the subclasses as dependencies, for example:

# This causes an error because `ModernBertPreTrainedModel` hasn't been defined yet!
class ModernBertForSequenceClassification(ModernBertPreTrainedModel):
    ...

class ModernBertPreTrainedModel(PreTrainedModel):
    ...
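
For reference, the pattern in question looks roughly like this inside _init_weights (a simplified sketch of the kind of subclass checks being discussed, not ModernBert's exact init code, which uses its own truncated-normal helpers and per-layer standard deviations):

import torch.nn as nn
from transformers import PreTrainedModel

class ModernBertPreTrainedModel(PreTrainedModel):
    def _init_weights(self, module):
        std = self.config.initializer_range
        if isinstance(module, nn.Linear):
            module.weight.data.normal_(mean=0.0, std=std)
        # These branches reference classes that are only defined later in the
        # same file, so the modular converter treats them as dependencies:
        elif isinstance(module, ModernBertForMaskedLM):
            module.decoder.weight.data.normal_(mean=0.0, std=std)
        elif isinstance(module, ModernBertForSequenceClassification):
            module.classifier.weight.data.normal_(mean=0.0, std=std)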

This PR removes all references to subclasses inside ModernBertPreTrainedModel, breaking the circular reference.

Who can review?

Anyone in the community is free to review the PR once the tests have passed. Feel free to tag
members/contributors who may be interested in your PR.

CC @ArthurZucker and @tomaarsen who reviewed #35158

@Rocketknight1 (Member)

cc @Cyrilvallez in case the modular converter can do anything in these cases

@Cyrilvallez (Member) commented Sep 16, 2025

Humm, those patterns are indeed discouraged, but it probably means that you need to overwrite the method explicitly in your modular @lematt1991, as otherwise you will still end up with unnecessary code (those same branches won't import dependencies, but will still exist) 🤔 Could you expand a bit more on your issue before we make a decision on this PR?

@Rocketknight1 modular cannot know whether the user intends to add the dependencies or not, unfortunately - it simply checks whether it finds an object that it does not know about and that is not redefined in modular, and if that's the case, it imports it as a dependency. For _init_weights of PreTrainedModel, this pattern of checking downstream classes unfortunately exists in some models, but I am against making an exception to the modular rules for this method specifically, as it's quite hard to maintain and very unexpected in some cases. And we found a set of rules that make modular sound without magic, so let's avoid it! The method should instead be redefined if needed! (Or the original model logic can be changed if more practical, as in this PR, but let's wait and see if it's worth it in this case!)

@lematt1991 (Contributor, Author)

Could you expand a bit more on your issue

@Cyrilvallez thanks for your response! I'm trying to create a new model that has ModernBertModel as a component. To do this, I create a subclass of it and use utils/modular_model_converter.py to pull it in:

class MyModelModernBertModel(ModernBertModel):
    ...

class MyModel(PreTrainedModel):
    def __init__(self, ...):
        self.text_encoder = MyModelModernBertModel(...)

After running modular_model_converter without this PR, it pulls in all of the other subclasses (ModernBertForMaskedLM, ModernBertForSequenceClassification, etc.) because they are referenced in the _init_weights method. After this PR, they are excluded.

as otherwise you will still end-up with unnecessary code (those same branches won't import dependencies, but will still exist)

Sorry, I'm not sure I follow. With this fix we no longer end up with the extra classes (ModernBertForMaskedLM, ModernBertForSequenceClassification, etc.). You can test by creating a new file (src/transformers/models/test/modular_test.py) with:

from ..modernbert.modeling_modernbert import ModernBertModel

class MyModernBertModel(ModernBertModel): ...

And then running python utils/modular_model_converter.py test. You can check that there aren't any import errors with ruff check src/transformers/models/test/ (without this PR, the ruff check should fail).

@Cyrilvallez (Member) commented Sep 16, 2025

with this fix we no longer end up with the (ModernBertForMaskedLM, ModernBertForSequenceClassification, etc) classes

Yes for sure, but then the _init_weights which is created still has branches like elif isinstance(module, PreTrainedModel) and hasattr(module, "decoder"): etc., which are useless and should not be there.
So the proper way to do it would be to overwrite that method to only init what's needed, without having unreachable branches.

And doing so would not require any changes to the current ModernBert

@lematt1991 (Contributor, Author)

Ah I see what you're saying.

but then the _init_weights which is created still has branches like elif isinstance(module, PreTrainedModel) and hasattr(module, "decoder"): etc., which are useless and should not be there.

They are useless, but they aren't harming anything. I think the problem with the proposed solution is that I need to copy ModernBertModel in its entirety, and then subclass my copy of ModernBertPreTrainedModel, which overrides _init_weights to disregard the modules I don't care about. Basically this - unless I'm still misunderstanding something.

As opposed to the one-liner I mentioned here.

@lematt1991 (Contributor, Author)

@Cyrilvallez, one more argument I'd like to make in favor of this change is that it preserves the abstraction of utils/modular_model_converter.py.

utils/modular_model_converter.py allows you to import components from other models, while maintaining the "single file policy".

Fundamentally, what I want to do is import ModernBertModel to use in the new model I'm implementing. If it weren't for this "single file policy" in transformers, I would import only ModernBertModel, but because of this issue I need to "look inside the implementation" of ModernBert, copy and paste all of ModernBertModel, and then override the _init_weights method from its parent class.

@Cyrilvallez (Member) commented Sep 17, 2025

We never want to add useless code in Transformers, even if it's harmless hahaha. It makes the code hard to read and significantly increases the maintenance burden! We put a lot of effort into keeping the code as minimal as possible. In your case, you can simply do this in modular:

class NewPreTrainedModel(ModernBertPreTrainedModel):
    def _init_weights(self, module):
        ...

class NewModel(ModernBertModel):
    pass

and it will work out.
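
For example, that overridden _init_weights could look roughly like this (a minimal sketch assuming the new model only needs generic Linear/Embedding/LayerNorm initialization; this is not ModernBert's exact init scheme, which uses its own truncated-normal helpers):

import torch.nn as nn

class NewPreTrainedModel(ModernBertPreTrainedModel):
    def _init_weights(self, module):
        # Only initialize the module types this model actually contains;
        # there are no branches for heads that don't exist in the new model.
        std = self.config.initializer_range
        if isinstance(module, nn.Linear):
            module.weight.data.normal_(mean=0.0, std=std)
            if module.bias is not None:
                module.bias.data.zero_()
        elif isinstance(module, nn.Embedding):
            module.weight.data.normal_(mean=0.0, std=std)
        elif isinstance(module, nn.LayerNorm):
            if module.weight is not None:
                module.weight.data.fill_(1.0)
            if module.bias is not None:
                module.bias.data.zero_()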

So you don't have to redefine anything more than you would with the change in this PR, only the _init_weights method. And as I said, we want to redefine it anyway, to avoid useless branches!

@lematt1991 (Contributor, Author)

Ah you're right that is much simpler, thank you!

I do still think this is worth fixing, since anyone trying to import ModernBertModel in the future will run into the same issue. What do you think about refactoring so that the classifier/decoder layers are initialized only in the subclasses? This way there won't be any useless code after modular_model_converter.py runs.
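
Roughly, I mean something like this (a hypothetical sketch of the refactor, not the actual diff; attribute names like model and decoder follow the existing ModernBert head):

import torch.nn as nn

class ModernBertForMaskedLM(ModernBertPreTrainedModel):
    def __init__(self, config):
        super().__init__(config)
        self.model = ModernBertModel(config)
        self.decoder = nn.Linear(config.hidden_size, config.vocab_size)
        self.post_init()

    def _init_weights(self, module):
        # The base class keeps only generic initialization and no longer needs
        # to know this subclass exists; head-specific init lives with the head.
        super()._init_weights(module)
        if module is self.decoder:
            module.weight.data.normal_(mean=0.0, std=self.config.initializer_range)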

@github-actions (Contributor)

[For maintainers] Suggested jobs to run (before merge)

run-slow: modernbert, modernbert_decoder

@Cyrilvallez (Member) commented Sep 17, 2025

What I don't like so much about your original change is that we lose the explicitness, i.e. we don't know which classes are impacted by those branches anymore. To circumvent the modular issue and keep the explicitness, we could use name checks though; then I'd be in favor of this PR 🤗
So things such as elif module.__class__.__name__ == "ModernBertForMaskedLM" instead.
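
In context, that would look something like this (a sketch of the name-check pattern only, not a final diff; the real method would keep ModernBert's own init helpers):

from transformers import PreTrainedModel

class ModernBertPreTrainedModel(PreTrainedModel):
    def _init_weights(self, module):
        std = self.config.initializer_range
        # String comparison: the impacted class stays explicit in the code, but
        # modular no longer pulls the subclass in as a dependency.
        if module.__class__.__name__ == "ModernBertForMaskedLM":
            module.decoder.weight.data.normal_(mean=0.0, std=std)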

@lematt1991 (Contributor, Author)

@Cyrilvallez, what do you think about the latest changes? This pushes specific weight initialization into the sub-classes

@Cyrilvallez (Member)

It completely changes the inheritance graph in a way that we don't do 🙂
Sorry, but as the original issue is moot (it was simply a slight misunderstanding about how modular should be used), I don't think we want that... Now that I think about it more, switching to string matching is not really something we want either, as class instance checks are much more pythonic 🙂
So I actually don't think we want any change to ModernBert... Sorry for the back and forth, and thanks a lot for opening the PR anyway!

@lematt1991 closed this Sep 17, 2025