Shift labels during finetuning and provide correct padding mask #101
base: main
Conversation
This change makes me slightly nervous as there will be consequences for the results we report in the paper. We should therefore spend some time evaluating the key metrics before and after this change. To do this, we should do the following:
- Slice phi-2 25% on alpaca, 2048 calibration samples [no change in ppl expected]
- RFT with hyperparams `--ppl-eval-dataset alpaca --finetune-dataset alpaca --finetune-train-nsamples 8000 --finetune-train-seqlen 1024 --finetune-train-batch-size 3 --lora-alpha 10 --lora-dropout 0.05 --lora-r 32 --eval-steps 128 --lora-target-modules attn_head_and_mlp` (NB not the lm head as well, as this wasn't done in the paper) [change expected]
- LM eval on `--tasks piqa` should suffice [change expected]
The LM eval on piqa before the change should match exactly what we report in Table 10 in the appendix of the paper.
@@ -62,6 +62,21 @@ def get_train_dataloader(self) -> DataLoader:

    def get_eval_dataloader(self, _) -> DataLoader:
        return self.test_loader

    def compute_loss(self, model, inputs, return_outputs=False):
As this is most of the calculation for ppl, can we make this one and our gpu_utils implementation consistent, and add a test? Better still would be to call a common subroutine from here and from the ppl calc implementation. NB Pashmina's ppl refactoring PR should go in first.
Yes, the perplexity calculation should have the same changes; good idea to combine them.
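For illustration, here is a minimal sketch of what such a shared subroutine could look like (the name `masked_next_token_loss` and its signature are hypothetical, not code from this repository): shift the labels and the attention mask by one position and average the cross-entropy over the non-padding positions only.

```python
import torch
import torch.nn.functional as F


def masked_next_token_loss(
    logits: torch.Tensor,         # (batch, seq_len, vocab_size)
    labels: torch.Tensor,         # (batch, seq_len)
    attention_mask: torch.Tensor  # (batch, seq_len): 1 for real tokens, 0 for padding
) -> torch.Tensor:
    """Mean cross-entropy over non-padding next-token predictions (hypothetical helper)."""
    # Logits at position i predict the token at position i + 1, so drop the last
    # logit and the first label/mask entry to align predictions with targets.
    shift_logits = logits[..., :-1, :]
    shift_labels = labels[..., 1:]
    shift_mask = attention_mask[..., 1:]

    # Per-token loss, then average over the unmasked (non-padding) positions only.
    per_token = F.cross_entropy(
        shift_logits.reshape(-1, shift_logits.size(-1)),
        shift_labels.reshape(-1),
        reduction="none",
    )
    per_token = per_token * shift_mask.reshape(-1)
    return per_token.sum() / shift_mask.sum()
```

Both `compute_loss` and the perplexity calculation could then call the same routine; perplexity would simply be `torch.exp` of this mean loss over the evaluation set, which keeps the two code paths consistent by construction.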
        labels = inputs.pop('labels')
        attention_mask = inputs["attention_mask"]
        outputs = model(**inputs)
        labels = labels[..., 1:].contiguous()
Nit: we should double check whether `contiguous` is required here, as we don't use it in our ppl implementation.
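For what it's worth, `.contiguous()` only matters if the shifted tensor is later flattened with `.view()`; both `.reshape()` and `F.cross_entropy` accept non-contiguous tensors. A small standalone illustration (plain PyTorch, unrelated to the repository code):

```python
import torch

labels = torch.randint(0, 100, (4, 16))  # (batch, seq_len)
shifted = labels[..., 1:]                # slicing the last dim of a batched tensor

print(shifted.is_contiguous())           # False when batch size > 1
shifted.reshape(-1)                      # fine: reshape copies if it has to
try:
    shifted.view(-1)                     # RuntimeError: view requires contiguous memory
except RuntimeError as err:
    print(err)
shifted.contiguous().view(-1)            # fine after an explicit copy
```

So whether `.contiguous()` is needed here depends on whether the loss code flattens with `.view()` or with `.reshape()`.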
So after some digging I found that our [...] Instead the [...]
In the change that is added here the loss is being called under the [...]
This PR makes sure that the labels are shifted by one position when computing the loss, so that each position predicts the next token, and provides a correct mask that masks out the padding tokens while keeping the other special tokens.
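Picking up the reviewer's request for a test, here is a minimal sketch of what such a test could check, assuming the hypothetical `masked_next_token_loss` helper sketched earlier in the thread: the vectorised loss should match a naive per-position cross-entropy that skips padded label positions.

```python
import torch
import torch.nn.functional as F

# Assumes the hypothetical masked_next_token_loss(logits, labels, attention_mask)
# sketched earlier in this thread is available from wherever it ends up living.

def test_masked_next_token_loss_matches_naive_loop():
    torch.manual_seed(0)
    batch, seq_len, vocab = 2, 8, 32
    logits = torch.randn(batch, seq_len, vocab)
    labels = torch.randint(0, vocab, (batch, seq_len))
    attention_mask = torch.ones(batch, seq_len, dtype=torch.long)
    attention_mask[0, -3:] = 0  # pretend the first sequence is padded at the end

    # Naive reference: logits at position i are scored against the label at i + 1,
    # and positions whose label is padding are skipped entirely.
    terms = []
    for b in range(batch):
        for i in range(seq_len - 1):
            if attention_mask[b, i + 1] == 1:
                terms.append(
                    F.cross_entropy(logits[b, i].unsqueeze(0), labels[b, i + 1].unsqueeze(0))
                )
    expected = torch.stack(terms).mean()

    actual = masked_next_token_loss(logits, labels, attention_mask)
    torch.testing.assert_close(actual, expected)
```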