ElectraForMultipleChoice #4954
Codecov Report
@@ Coverage Diff @@
## master #4954 +/- ##
==========================================
+ Coverage 77.24% 77.33% +0.08%
==========================================
Files 133 133
Lines 22134 22166 +32
==========================================
+ Hits 17097 17141 +44
+ Misses 5037 5025 -12
Continue to review full report at Codecov.
sgugger
left a comment
Thanks for your PR! I think there is a way to make it more consistent with the other models for multiple choice and avoid introducing a new class, would you mind looking into it?
src/transformers/modeling_electra.py
Outdated
        return outputs  # (loss), start_logits, end_logits, (hidden_states), (attentions)


class ElectraPooler(nn.Module):
Instead of introducing a new class, I'd rather use the existing SequenceSummary, which can do the pool + dropout (see XLNetForMultipleChoice for an example). It will require adding a few things to the config to make it work (see XLNetConfig for an example).
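For readers unfamiliar with the "pool + dropout" step mentioned above, here is a minimal sketch in plain PyTorch. `FirstTokenSummary` is a hypothetical stand-in written for illustration, not the library's SequenceSummary, and its layer order and default dropout value are assumptions.

```python
import torch
from torch import nn


class FirstTokenSummary(nn.Module):
    """Hypothetical stand-in: pool the first token, project, activate, drop out."""

    def __init__(self, hidden_size: int, dropout_prob: float = 0.1):
        super().__init__()
        # Projection keeps the hidden size, so no num_labels is involved here.
        self.proj = nn.Linear(hidden_size, hidden_size)
        self.activation = nn.GELU()
        self.dropout = nn.Dropout(dropout_prob)

    def forward(self, hidden_states: torch.Tensor) -> torch.Tensor:
        # hidden_states: (batch, seq_len, hidden_size)
        pooled = hidden_states[:, 0]  # vector of the first token in each sequence
        pooled = self.activation(self.proj(pooled))
        return self.dropout(pooled)  # (batch, hidden_size)


torch.manual_seed(0)
summary = FirstTokenSummary(hidden_size=8).eval()  # eval() disables dropout
hidden_states = torch.randn(4, 5, 8)  # 4 sequences, 5 tokens, hidden size 8
pooled = summary(hidden_states)
print(pooled.shape)
```

The point of the sketch is only that the pooled output keeps the hidden size, which leaves the final scoring layer to the model class.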
> I think there is a way to make it more consistent with the other models for multiple choice and avoid introducing a new class, would you mind looking into it?
Sure, I wasn't aware of SequenceSummary. I'll try to use it instead of Pooler
But couldn't you just use the ElectraClassificationHead for that pooling 🤔:
transformers/src/transformers/modeling_electra.py
Lines 346 to 362 in 86578bb
class ElectraClassificationHead(nn.Module):
    """Head for sentence-level classification tasks."""

    def __init__(self, config):
        super().__init__()
        self.dense = nn.Linear(config.hidden_size, config.hidden_size)
        self.dropout = nn.Dropout(config.hidden_dropout_prob)
        self.out_proj = nn.Linear(config.hidden_size, config.num_labels)

    def forward(self, features, **kwargs):
        x = features[:, 0, :]  # take <s> token (equiv. to [CLS])
        x = self.dropout(x)
        x = self.dense(x)
        x = get_activation("gelu")(x)  # although BERT uses tanh here, it seems Electra authors used gelu here
        x = self.dropout(x)
        x = self.out_proj(x)
        return x
Sadly no, because it adds a final linear layer that projects to num_labels, whereas multiple choice needs a single score per choice.
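The shape mismatch behind this objection can be sketched as follows. The sizes, the random `pooled` tensor, and the names `label_head` and `choice_scorer` are illustrative assumptions, not code from the PR.

```python
import torch
from torch import nn

batch_size, num_choices, hidden_size, num_labels = 2, 4, 8, 3
torch.manual_seed(0)

# One pooled vector per (example, choice) pair, since the choices are
# flattened into the batch dimension before encoding.
pooled = torch.randn(batch_size * num_choices, hidden_size)

# A head ending in Linear(hidden_size, num_labels) yields num_labels scores
# for every choice, which is not what multiple choice needs.
label_head = nn.Linear(hidden_size, num_labels)
label_logits = label_head(pooled)  # (batch * num_choices, num_labels)

# Multiple choice: one score per choice, then regroup the scores per example.
choice_scorer = nn.Linear(hidden_size, 1)
choice_logits = choice_scorer(pooled).view(-1, num_choices)  # (batch, num_choices)

# The loss is a cross-entropy over the choices of each example.
labels = torch.tensor([1, 3])  # index of the correct choice for each example
loss = nn.CrossEntropyLoss()(choice_logits, labels)
print(label_logits.shape, choice_logits.shape)
```

With a single-output scorer the logits regroup cleanly into one row per example, so the label is simply the index of the correct choice.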
LGTM, @LysandreJik it's good for your review.
LysandreJik
left a comment
LGTM, thanks @patil-suraj
Hi @LysandreJik, could you tell me why these tests are failing now? Thanks.
Looks like @LysandreJik deleted a function by mistake during the merge (the
My bad, sorry about that. Thanks for the fix!
This PR adds ElectraForMultipleChoice, one of the models missing from this project. Since multiple choice needs pooled outputs, it also adds an ElectraPooler class.
@sgugger, @LysandreJik