About padding of sentences less than the maximum length #23

Open
yseyableach opened this issue Jul 15, 2021 · 1 comment

Comments

@yseyableach

Hello, @ruifanxu.
I would like to ask about the padding of sentences that are shorter than the maximum length.
```python
def mask(self, seqs):
    device = next(self.parameters()).device
    batch_size, seq_len = seqs.shape
    # upper-triangular "future" mask
    mask = torch.triu(torch.ones((seq_len, seq_len), dtype=torch.long), diagonal=1).to(device)  # [seq_len, seq_len]
    # True where the token is a pad
    pad = torch.eq(seqs, self.padding_idx)  # [n, seq_len]
    # combine: pad positions are always masked, otherwise keep the future mask
    mask = torch.where(pad[:, None, None, :], 1, mask[None, None, :, :]).to(device)  # [n, 1, seq_len, seq_len]
    return mask > 0  # [n, 1, seq_len, seq_len]
```
Suppose the longest sentence is 200 tokens.
My understanding is that sentences shorter than 200 tokens should be zero-padded so that the padding does not affect the attention calculation or the generation of the target sequence.
So, if a sentence has only three words, the remaining 197 positions (steps) should be 0 (zero padding); otherwise the context tensor ([n, step, model_dim]) would contain model_dim values at steps that have no real input.
However, the code above does not seem to apply any zero padding, only the future mask. Am I misunderstanding something? Thank you.
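For concreteness, a minimal sketch of the padding the questioner describes (the token ids and `max_len` are hypothetical; in the question `max_len` would be 200), applied before a batch ever reaches `mask()`:

```python
import torch

max_len = 10          # 200 in the question; 10 here so the output stays readable
tokens = [3, 8, 6]    # hypothetical ids for a 3-word sentence
padded = tokens + [0] * (max_len - len(tokens))   # zero padding up to max_len
seqs = torch.tensor([padded])                     # [1, max_len], ready for mask()
print(seqs)  # tensor([[3, 8, 6, 0, 0, 0, 0, 0, 0, 0]])
```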

Looking forward to your reply! Thank you.

@ruifan831
Contributor

The data passed into this function has already been padded, so sentences shorter than the maximum length are padded with 0.
Before constructing the mask, pad = torch.eq(seqs, self.padding_idx) returns a boolean matrix where each position indicates whether that token is a pad.
This variable is then used to build the final mask:
mask = torch.where(pad[:,None,None,:],1,mask[None,None,:,:]).to(device) # [n, 1, seq_len, seq_len]
This line checks each element of pad and writes 1 where it is true; otherwise it keeps the value at the same position of the future mask.
Hence, it is not just a future mask: it also masks the positions that are pads. Hope this answers your question.
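A minimal sketch illustrating this behaviour. The `Toy` wrapper class, the example token ids, and padding_idx = 0 are assumptions for the demo, not part of the repository code; the `mask()` body is the one quoted above.

```python
import torch
from torch import nn


class Toy(nn.Module):
    """Hypothetical wrapper so the mask() method can be run on its own (padding_idx=0 assumed)."""

    def __init__(self, padding_idx=0):
        super().__init__()
        self.padding_idx = padding_idx
        # dummy parameter so next(self.parameters()) has something to return
        self.dummy = nn.Parameter(torch.zeros(1))

    def mask(self, seqs):
        device = next(self.parameters()).device
        batch_size, seq_len = seqs.shape
        mask = torch.triu(torch.ones((seq_len, seq_len), dtype=torch.long), diagonal=1).to(device)
        pad = torch.eq(seqs, self.padding_idx)
        mask = torch.where(pad[:, None, None, :], 1, mask[None, None, :, :]).to(device)
        return mask > 0


# Two sentences, max length 5; the second has 3 real tokens followed by 2 pads (id 0).
seqs = torch.tensor([[4, 7, 2, 9, 5],
                     [3, 8, 6, 0, 0]])
m = Toy().mask(seqs)          # shape [2, 1, 5, 5]; True = position is blocked
print(m[1, 0].long())
# tensor([[0, 1, 1, 1, 1],
#         [0, 0, 1, 1, 1],
#         [0, 0, 0, 1, 1],
#         [0, 0, 0, 1, 1],
#         [0, 0, 0, 1, 1]])
# The last two columns are all 1: every query position is blocked from attending
# to the padded steps, in addition to the usual upper-triangular future mask.
```

In the mask for the second sentence, the pad columns are fully masked, which is the combined padding + future masking described in the reply.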
