An officially supported task in the examples folder (such as GLUE/SQuAD, ...)
My own task or dataset (give details below)
Reproduction
The eval_loss of the same set of data from the same model (gpt-neo, flan-t5, llama, ...) differs when using different batch sizes.
import torch
from transformers import AutoModel, AutoTokenizer, AutoModelForCausalLM, AutoModelForSeq2SeqLM

# padding_side belongs on the tokenizer, not the model
tokenizer = AutoTokenizer.from_pretrained("EleutherAI/gpt-neo-125m", padding_side="left")
model = AutoModelForCausalLM.from_pretrained("EleutherAI/gpt-neo-125m")

# bug also happens on flan-t5
# tokenizer = AutoTokenizer.from_pretrained("google/flan-t5-large")
# model = AutoModelForSeq2SeqLM.from_pretrained("google/flan-t5-large")

# set pad token
if tokenizer.pad_token is None:
    tokenizer.pad_token = tokenizer.eos_token

# For the following inputs, eval_loss is different when using different batch sizes
samples = [
"Sheldon: So if a photon is directed through a plane with two slits in it and either slit is observed it will not go through both slits. If it's unobserved it will, however, if it's observed after it's left the plane but before it hits its target, it will not have gone through both slits. Leonard: Agreed, what's your point? Sheldon: There's no point, I just think it's a good idea for a tee-shirt. Leonard: Excuse me? Receptionist: Hang on. Leonard: One across is Aegean, eight down is Nabakov, twenty-six across is MCM, fourteen down is… move your finger… phylum, which makes fourteen across Port-au-Prince. See, Papa Doc's capital idea, that's Port-au-Prince. Haiti. Receptionist: Can I help you? Leonard: Yes. Um, is this the High IQ sperm bank? Receptionist: If you have to ask, maybe you shouldn't be here. Sheldon: I think this is the place. Receptionist: Fill these out. Leonard: Thank-you. We'll be right back. Receptionist: Oh, take your time. I'll just finish my crossword puzzle. Oh wait. (They sit and begin to fill in forms). Sheldon: Leonard, I don't think I can do this. Leonard: What, are you kidding? You're a semi-pro. Sheldon: No. We are committing genetic fraud. There's no guarantee that our sperm is going to generate high IQ offspring, think about that. Sheldon: So if a photon is directed through a plane with two slits in it and either slit is observed it will not go through both slits. If it's unobserved it will, however, if it's observed after it's left the plane but before it hits its target, it will not have gone through both slits. Leonard: Agreed, what's your point? Sheldon: There's no point, I just think it's a good idea for a tee-shirt. Leonard: Excuse me?",
"Sheldon: Are you still mad about the sperm bank? Leonard: No. Sheldon: You want to hear an interesting thing about stairs? Leonard: Not really. Sheldon: If the height of a single step is off by as little as two millimetres, most people will trip. Leonard: I don't care. Two millimetres? That doesn't seem right. Sheldon: No, it's true, I did a series of experiments when I was twelve, my father broke his clavicle. Leonard: Is that why they sent you to boarding school? Sheldon: No, that was the result of my work with lasers. Leonard: New neighbour? Sheldon: Evidently. Leonard: Significant improvement over the old neighbour. Sheldon: Two hundred pound transvestite with a skin condition, yes she is. Penny: Oh, hi! Leonard: Hi. Sheldon: Hi. Leonard: Hi. Sheldon: Hi. Penny: Hi? Leonard: We don't mean to interrupt, we live across the hall. Penny: Oh, that's nice. Leonard: Oh… uh… no… we don't live together… um… we live together but in separate, heterosexual bedrooms. Penny: Oh, okay, well, guess I'm your new neighbour, Penny. Leonard: Leonard, Sheldon. Penny: Hi. Leonard: Hi. Sheldon: Hi. Penny: Hi. Leonard: Hi. Well, uh, oh, welcome to the building. Penny: Thankyou, maybe we can have coffee sometime. Leonard: Oh, great. Penny: Great. Sheldon: Great. Leonard: Great. Well, bye. Penny: Bye. Sheldon: Bye. Leonard: Bye. Leonard: Should we have invited her for lunch? Sheldon: No. We're going to start Season Two of Battlestar Galactica. Leonard: We already watched the Season Two DVDs. Sheldon: Not with commentary. Leonard: I think we should be good neighbours, invite her over, make her feel welcome. Sheldon: We never invited Louis-slash-Louise over. Leonard: Well, then that was wrong of us. Sheldon: Are you still mad about the sperm bank? Leonard: No. Sheldon: You want to hear an interesting thing about stairs? Leonard: Not really.",
"Leonard: Okay, well, make yourself at home. Penny: Okay, thankyou. Leonard: You're very welcome. Penny: This looks like some serious stuff, Leonard, did you do this? Sheldon: Actually that's my work. Penny: Wow. Sheldon: Yeah, well, it's just some quantum mechanics, with a little string theory doodling around the edges. That part there, that's just a joke, it's a spoof of the Bourne-Oppenheimer approximation. Penny: So you're like, one of those, beautiful mind genius guys. Sheldon: Yeah. Penny: This is really impressive. Leonard: I have a board. If you like boards, this is my board. Penny: Holy smokes. Sheldon: If by holy smokes you mean a derivative restatement of the kind of stuff you can find scribbled on the wall of any men's room at MIT, sure. Leonard: What? Sheldon: Oh, come on. Who hasn't seen this differential below “here I sit broken hearted?” Leonard: At least I didn't have to invent twenty-six dimensions just to make the math come out. Sheldon: I didn't invent them, they're there. Leonard: In what universe? Sheldon: In all of them, that is the point. Penny: Uh, do you guys mind if I start? Sheldon: Um, Penny, that's where I sit. Penny: So, sit next to me. Sheldon: No, I sit there. Penny: What's the difference? Sheldon: What's the difference? Leonard: Here we go. Sheldon: In the winter that seat is close enough to the radiator to remain warm, and yet not so close as to cause perspiration. In the summer it's directly in the path of a cross breeze created by open windows there, and there. Leonard: Okay, well, make yourself at home. Penny: Okay, thankyou. Leonard: You're very welcome. Penny: This looks like some serious stuff, Leonard, did you do this?",
"Leonard: Uh, there it goes, it sticks, I'm sorry. Penny: Okay. Thanks. Leonard: You're welcome, oh, you're going to step right, okay, I'll…. Penny: Hey, Leonard? Leonard: The hair products are Sheldon's. Penny: Um, okay. Can I ask you a favour. Leonard: A favour? Sure, you can ask me a favour, I would do you a favour for you. Penny: It's okay if you say no. Leonard: Oh, I'll probably say yes. Penny: It's just not the kind of thing you ask a guy you've just met. Leonard: Wow. Leonard: Uh, there it goes, it sticks, I'm sorry. Penny: Okay. Thanks. Leonard: You're welcome, oh, you're going to step right, okay, I'll…. Penny: Hey, Leonard?"
]
model.eval()
with torch.no_grad():
    # feed all data in one batch
    all_batch_samples = tokenizer(samples, return_tensors="pt", padding="max_length", max_length=480, truncation=True)
    labels = all_batch_samples["input_ids"].clone()
    labels[labels == tokenizer.pad_token_id] = -100
    all_batch_samples["labels"] = labels
    outputs = model(**all_batch_samples)
    all_loss = outputs.loss

    # feed one data sample per batch (batch size is 1)
    losses = []
    for i in range(len(all_batch_samples["input_ids"])):
        batch_samples = tokenizer(samples[i], return_tensors="pt", padding="max_length", max_length=480, truncation=True)
        labels = batch_samples["input_ids"].clone()
        labels[labels == tokenizer.pad_token_id] = -100
        batch_samples["labels"] = labels
        for k, v in batch_samples.items():
            # always true
            assert (all_batch_samples[k][i] == batch_samples[k]).all()
        losses.append(model(**batch_samples).loss)
    losses = torch.stack(losses)

print(f"BS=1: {losses.mean()}", "*" * 5, f"BS=all: {all_loss}", "*" * 5, f"Losses: {losses}")
# BS=1: 3.6513803005218506 ***** BS=all: 3.6280925273895264 ***** Losses: tensor([3.5703, 3.4178, 3.8621, 3.7554])
Expected behavior
I think the loss should be exactly the same with different batch sizes. I wonder why the deviation happens.
This is because these are causal LM models, where the loss is computed across the non-padding tokens.
The loss (returned from the model's forward) is the total loss divided by the number of non-padding tokens sent to the model.
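This averaging behavior can be seen in plain PyTorch: with labels set to -100 at padding positions, `cross_entropy` with the default `reduction="mean"` divides by the number of non-ignored positions only. A minimal sketch with toy tensors (shapes and values are made up for illustration):

```python
import torch
import torch.nn.functional as F

# Toy logits for a batch of 2 sequences, 4 positions each, vocab size 5.
logits = torch.randn(2, 4, 5)
labels = torch.tensor([[1, 2, 3, 4],
                       [1, 2, -100, -100]])  # padding positions masked with -100

# Default reduction="mean" averages over the *non-ignored* positions only.
mean_loss = F.cross_entropy(logits.reshape(-1, 5), labels.reshape(-1),
                            ignore_index=-100)

# Equivalent: sum over non-ignored positions, divide by their count.
sum_loss = F.cross_entropy(logits.reshape(-1, 5), labels.reshape(-1),
                           ignore_index=-100, reduction="sum")
n_valid = (labels != -100).sum()
assert torch.allclose(mean_loss, sum_loss / n_valid)
```

Because the divisor is the token count of whatever is in the batch, the batch-level mean is a token-weighted average, not the mean of the per-example means.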
In your case (4 examples), they have 438, 461, 423 and 183 non-padding tokens, a total of 1505.
For each single example, the (averaged) loss is 2.5674, 2.7242, 2.9536 and 2.3945. Multiplying by the corresponding number of non-padding tokens, we get 1124.5172, 1255.8704, 1249.3870 and 438.1949. Summing them gives the total loss of 4067.9697.
Divided by 1505 (the total number of non-padding tokens in the batch), we get 4067.9697 / 1505 = 2.7031, which is the loss we get when sending the batch to the model. (There is a slight precision issue above, but it's fine)
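The arithmetic above can be checked in a few lines (using the per-example losses and token counts quoted above):

```python
# Per-example mean losses and non-padding token counts from the explanation above.
per_example = [2.5674, 2.7242, 2.9536, 2.3945]
tokens = [438, 461, 423, 183]

# Weight each example's mean loss by its token count, then re-average.
total = sum(l * n for l, n in zip(per_example, tokens))
batch_loss = total / sum(tokens)
print(batch_loss)  # ≈ 2.7030, matching 2.7031 up to rounding
```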
This is known behavior and not a real issue. However, if you want full control, you can call the model's forward without labels and compute the loss in your own code.
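A minimal sketch of that approach in plain PyTorch: `per_example_loss` is a hypothetical helper (not a transformers API), and the label shift mirrors what causal LM models do internally when `labels` are passed. Demo tensors are synthetic.

```python
import torch
import torch.nn.functional as F

def per_example_loss(logits, labels, pad_token_id):
    """Mean cross-entropy per example, counting only non-padding tokens."""
    # Shift for causal LM: position t predicts token t+1.
    shift_logits = logits[:, :-1, :]
    shift_labels = labels[:, 1:].clone()
    mask = shift_labels != pad_token_id
    shift_labels[~mask] = 0  # any valid index; these positions are masked out below
    token_loss = F.cross_entropy(
        shift_logits.transpose(1, 2), shift_labels, reduction="none"
    )  # shape: (batch, seq_len - 1)
    token_loss = token_loss * mask
    return token_loss.sum(dim=1) / mask.sum(dim=1)

# Synthetic demo: batch of 2, 5 positions, vocab size 7; pad id 0.
logits = torch.randn(2, 5, 7)
labels = torch.tensor([[3, 1, 4, 1, 5],
                       [2, 6, 5, 0, 0]])
losses = per_example_loss(logits, labels, pad_token_id=0)
```

In a real run you would get `logits` from `model(input_ids=..., attention_mask=...)` without passing `labels`, then aggregate the per-example losses however you like (unweighted mean, token-weighted mean, etc.).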
System Info
transformers version: 4.30.0

Who can help?
@ArthurZucker @younesbelkada