plan to release SWAG code? #38

Closed
eveliao opened this issue Nov 3, 2018 · 1 comment

Comments

eveliao commented Nov 3, 2018

Hi, I just want to know if you plan to release fine-tuning and evaluation code for the SWAG dataset.
If not, is the training procedure the same as for MRPC (more specifically, label 0 for distractors and 1 for the gold ending)?

@jacobdevlin-google
Contributor

For maintainability reasons we don't plan on releasing more code than what we've already released (except for the gradient accumulation code that we've promised). You could train it as binary classification, but we actually did something different: softmax over the logits from the four endings of each example. This only requires a few lines of code, but it does require changing the input processing.

Let's assume your batch size is 8 and your sequence length is 128. Each SWAG example has 4 entries: the correct ending and 3 incorrect ones.

  • Instead of your input_fn returning an input_ids of size [128], it should return one of size [4, 128]. Same for the input mask and segment IDs. So for each example, you will generate the sequences "predicate ending0", "predicate ending1", "predicate ending2", "predicate ending3". Also return a scalar label, an integer in the range [0, 3], indicating which ending is the gold one (see the input-side sketch after this list).

  • After batching, your model_fn will get an input of shape [8, 4, 128]. Reshape it to [32, 128] before passing it into BertModel, i.e., BERT will consider each (predicate, ending) sequence independently.

  • Compute the logits as in run_classifier.py, but your "classifier layer" will just be a vector of size [768] (or whatever your hidden size is).

  • Now you have a set of logits of size [32]. Reshape these back into [8, 4] and then compute tf.nn.log_softmax() over the 4 endings of each example. Now you have log probabilities of shape [8, 4] over the 4 endings and a label tensor of shape [8], so compute the loss exactly as you would for a classification problem (see the model-side sketch below).
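Concretely, here is a minimal sketch of the input-side change. The helper name convert_single_swag_example and the feature dict layout are illustrative, not part of run_classifier.py; it only assumes tokenization.py from this repo:

```python
import tokenization  # tokenization.py from this repo

MAX_SEQ_LENGTH = 128

def convert_single_swag_example(predicate, endings, label, tokenizer):
  """Returns features of shape [4, 128] plus a scalar label in [0, 3]."""
  all_input_ids, all_input_mask, all_segment_ids = [], [], []
  for ending in endings:  # 4 endings: 1 gold, 3 distractors
    tokens_a = tokenizer.tokenize(predicate)
    tokens_b = tokenizer.tokenize(ending)
    tokens = ["[CLS]"] + tokens_a + ["[SEP]"] + tokens_b + ["[SEP]"]
    segment_ids = [0] * (len(tokens_a) + 2) + [1] * (len(tokens_b) + 1)
    input_ids = tokenizer.convert_tokens_to_ids(tokens)
    # Truncate and pad each choice to the full sequence length.
    input_ids = input_ids[:MAX_SEQ_LENGTH]
    segment_ids = segment_ids[:MAX_SEQ_LENGTH]
    input_mask = [1] * len(input_ids)
    padding = [0] * (MAX_SEQ_LENGTH - len(input_ids))
    all_input_ids.append(input_ids + padding)
    all_input_mask.append(input_mask + padding)
    all_segment_ids.append(segment_ids + padding)
  return {
      "input_ids": all_input_ids,      # [4, 128]
      "input_mask": all_input_mask,    # [4, 128]
      "segment_ids": all_segment_ids,  # [4, 128]
      "label": label,                  # integer in [0, 3]
  }
```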
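And a minimal sketch of the model-side steps. The function create_swag_model and its signature are illustrative; it assumes modeling.py from this repo and TF 1.x, as in run_classifier.py:

```python
import tensorflow as tf
import modeling  # modeling.py from this repo

def create_swag_model(bert_config, is_training, features,
                      num_choices=4, max_seq_length=128):
  # After batching, each feature has shape [batch_size, 4, 128].
  input_ids = features["input_ids"]
  input_mask = features["input_mask"]
  segment_ids = features["segment_ids"]
  labels = features["label"]  # [batch_size], integers in [0, 3]

  # Flatten to [batch_size * 4, 128] so BERT sees every ending independently.
  flat_ids = tf.reshape(input_ids, [-1, max_seq_length])
  flat_mask = tf.reshape(input_mask, [-1, max_seq_length])
  flat_segs = tf.reshape(segment_ids, [-1, max_seq_length])

  model = modeling.BertModel(
      config=bert_config,
      is_training=is_training,
      input_ids=flat_ids,
      input_mask=flat_mask,
      token_type_ids=flat_segs)

  pooled = model.get_pooled_output()  # [batch_size * 4, hidden_size]
  hidden_size = pooled.shape[-1].value

  # The "classifier layer" is just a single vector of size [hidden_size].
  output_weights = tf.get_variable(
      "output_weights", [1, hidden_size],
      initializer=tf.truncated_normal_initializer(stddev=0.02))
  output_bias = tf.get_variable(
      "output_bias", [1], initializer=tf.zeros_initializer())

  logits = tf.matmul(pooled, output_weights, transpose_b=True)  # [batch*4, 1]
  logits = tf.nn.bias_add(logits, output_bias)
  logits = tf.reshape(logits, [-1, num_choices])  # back to [batch_size, 4]

  # Softmax over the 4 endings of each example, then the usual classification loss.
  log_probs = tf.nn.log_softmax(logits, axis=-1)  # [batch_size, 4]
  one_hot_labels = tf.one_hot(labels, depth=num_choices, dtype=tf.float32)
  per_example_loss = -tf.reduce_sum(one_hot_labels * log_probs, axis=-1)
  loss = tf.reduce_mean(per_example_loss)
  return loss, per_example_loss, logits
```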
