This repository was archived by the owner on Apr 8, 2025. It is now read-only.

@bogdankostic (Contributor) commented Oct 21, 2020

This PR should fix #519. @brandenchan @stefan-it @PhilipMay

Summary of the bug

After training an ELECTRA model, saving it, and loading it again later, the loaded model produced different (much worse) predictions than the trained model.

Root of the bug

In this transformers PR, summary_use_proj was set to True by default, which means that a linear layer is stacked on top of the pooling step. As a result, each time an ELECTRA model is loaded in FARM, the weights of this linear layer are randomly initialized rather than taken from the trained model.

Fix

Set summary_use_proj to False so that the linear layer is not added.
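
To illustrate the mechanism, here is a minimal, library-free sketch. It is not FARM or transformers code: ToyPooledModel, load_from_checkpoint, and the checkpoint layout are hypothetical stand-ins, and the only assumption carried over from the description above is that the projection layer gets fresh random weights on every load while the rest of the model is restored.

```python
import random


class ToyPooledModel:
    """Hypothetical stand-in for an encoder plus pooling head."""

    def __init__(self, encoder_weights, use_proj):
        self.encoder_weights = list(encoder_weights)
        dim = len(self.encoder_weights)
        rng = random.Random()  # unseeded: new values on every construction
        # Assumption for illustration: the projection is never restored from
        # the checkpoint, so it is re-initialised each time a model is built.
        self.proj = (
            [[rng.gauss(0.0, 0.02) for _ in range(dim)] for _ in range(dim)]
            if use_proj
            else None
        )

    def forward(self, features):
        # "Pooling" here is just an element-wise product with restored weights.
        pooled = [w * x for w, x in zip(self.encoder_weights, features)]
        if self.proj is None:
            return pooled
        # Optional linear layer stacked on top of the pooled output.
        return [sum(r * p for r, p in zip(row, pooled)) for row in self.proj]


def load_from_checkpoint(checkpoint, use_proj):
    # The hypothetical checkpoint stores only the encoder weights.
    return ToyPooledModel(checkpoint["encoder_weights"], use_proj=use_proj)


checkpoint = {"encoder_weights": [0.5, -1.0, 2.0]}
features = [1.0, 2.0, 3.0]

# With the projection enabled, two loads of the same checkpoint disagree.
with_proj_a = load_from_checkpoint(checkpoint, use_proj=True).forward(features)
with_proj_b = load_from_checkpoint(checkpoint, use_proj=True).forward(features)

# With the projection disabled, every load gives the same output.
without_proj_a = load_from_checkpoint(checkpoint, use_proj=False).forward(features)
without_proj_b = load_from_checkpoint(checkpoint, use_proj=False).forward(features)

print("with projection, loads agree:   ", with_proj_a == with_proj_b)
print("without projection, loads agree:", without_proj_a == without_proj_b)
```

In this toy setup, turning the projection off plays the role of setting summary_use_proj to False: nothing that affects the output is left to random initialization at load time.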

@Timoeller self-requested a review October 21, 2020 15:12

@tholor (Member) left a comment

Nice finding! This was a nasty one :)

@Timoeller (Contributor) left a comment

Looking good. Thanks for the detailed description @bogdankostic

What a big headache this behaviour created, and what a small change was needed to fix it.

@Timoeller merged commit dac388a into master Oct 21, 2020

Development

Successfully merging this pull request may close these issues.

Getting different predictions on different runs with same ELECTRA model.

4 participants