About padding of sentences less than the maximum length #23

Open
yseyableach opened this issue Jul 15, 2021 · 1 comment

Comments

@yseyableach

Hello, @ruifanxu.
I would like to ask about the padding of sentences that are shorter than the maximum length.
```python
def mask(self, seqs):
    device = next(self.parameters()).device
    batch_size, seq_len = seqs.shape
    # upper-triangular "future" mask
    mask = torch.triu(torch.ones((seq_len, seq_len), dtype=torch.long), diagonal=1).to(device)  # [seq_len, seq_len]
    # True where the token is a pad
    pad = torch.eq(seqs, self.padding_idx)  # [n, seq_len]
    # combine: pad positions are always masked, otherwise keep the future mask
    mask = torch.where(pad[:, None, None, :], 1, mask[None, None, :, :]).to(device)  # [n, 1, seq_len, seq_len]
    return mask > 0  # [n, 1, seq_len, seq_len]
```
Suppose the longest sentence is 200 tokens.
My understanding is that sentences shorter than 200 tokens should be zero-padded so that the padding does not affect the attention calculation or the generation of the target sequence.
So, if a sentence has only three words, the remaining 197 positions (steps) should be 0 (zero padding); otherwise the context tensor ([n, step, model_dim]) would contain model_dim values at steps that have no real input.
However, the code above does not seem to apply any zero padding, only the future mask. Am I misunderstanding something? Thank you.
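For concreteness, a minimal sketch of the padding the questioner describes (the token ids and `max_len` are hypothetical; in the question `max_len` would be 200), applied before a batch ever reaches `mask()`:

```python
import torch

max_len = 10          # 200 in the question; 10 here so the output stays readable
tokens = [3, 8, 6]    # hypothetical ids for a 3-word sentence
padded = tokens + [0] * (max_len - len(tokens))   # zero padding up to max_len
seqs = torch.tensor([padded])                     # [1, max_len], ready for mask()
print(seqs)  # tensor([[3, 8, 6, 0, 0, 0, 0, 0, 0, 0]])
```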

Looking forward to your reply! Thank you.

@ruifan831
Contributor

The data passed into this function has already been padded, so sentences shorter than the maximum length are padded with 0.
Before constructing the mask, pad = torch.eq(seqs, self.padding_idx) returns a boolean matrix where each position indicates whether that token is a pad.
This variable is then used to build the final mask:
mask = torch.where(pad[:,None,None,:],1,mask[None,None,:,:]).to(device) # [n, 1, seq_len, seq_len]
This line checks each element of pad and writes 1 where it is true; otherwise it keeps the value at the same position of the future mask.
Hence, it is not just a future mask: it also masks the positions that are pads. Hope this answers your question.
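A minimal sketch illustrating this behaviour. The `Toy` wrapper class, the example token ids, and padding_idx = 0 are assumptions for the demo, not part of the repository code; the `mask()` body is the one quoted above.

```python
import torch
from torch import nn


class Toy(nn.Module):
    """Hypothetical wrapper so the mask() method can be run on its own (padding_idx=0 assumed)."""

    def __init__(self, padding_idx=0):
        super().__init__()
        self.padding_idx = padding_idx
        # dummy parameter so next(self.parameters()) has something to return
        self.dummy = nn.Parameter(torch.zeros(1))

    def mask(self, seqs):
        device = next(self.parameters()).device
        batch_size, seq_len = seqs.shape
        mask = torch.triu(torch.ones((seq_len, seq_len), dtype=torch.long), diagonal=1).to(device)
        pad = torch.eq(seqs, self.padding_idx)
        mask = torch.where(pad[:, None, None, :], 1, mask[None, None, :, :]).to(device)
        return mask > 0


# Two sentences, max length 5; the second has 3 real tokens followed by 2 pads (id 0).
seqs = torch.tensor([[4, 7, 2, 9, 5],
                     [3, 8, 6, 0, 0]])
m = Toy().mask(seqs)          # shape [2, 1, 5, 5]; True = position is blocked
print(m[1, 0].long())
# tensor([[0, 1, 1, 1, 1],
#         [0, 0, 1, 1, 1],
#         [0, 0, 0, 1, 1],
#         [0, 0, 0, 1, 1],
#         [0, 0, 0, 1, 1]])
# The last two columns are all 1: every query position is blocked from attending
# to the padded steps, in addition to the usual upper-triangular future mask.
```

In the mask for the second sentence, the pad columns are fully masked, which is the combined padding + future masking described in the reply.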
