Conformer Modules #58
Conversation
See also my other comments (they don't show up in the "Files changed" view because they got outdated due to the changed indent). Mark them as "resolved" once you have resolved them.
Btw, as usual, also see the failing tests.
Merged now. Let's further improve this in later PRs or commits.
A question on the ConformerConvSubsample: I think this is not exactly described in the original paper, right? So where exactly is it described? What did you base this on?
Yes, it is not described in the original paper. The one implemented here was based on ESPnet code. See here:
But the original paper does reference other papers regarding the preprocessing. I think it is explained in those other papers, or not? Also, as far as I see, your implementation here uses max-pooling. But this is different from ESPnet: there, they don't use any pooling but just striding instead. Where did you take the pooling from?
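For concreteness, here is a minimal PyTorch sketch (not the code from this PR; layer names and sizes are only illustrative) contrasting the two variants discussed here: an ESPnet-style frontend that subsamples purely via stride-2 convolutions, versus a variant that keeps stride 1 and subsamples with max-pooling instead.

```python
import torch
from torch import nn


class StridedConvSubsample(nn.Module):
    """ESPnet-style: subsample the time axis by factor 4 via stride-2 convolutions."""

    def __init__(self, channels: int):
        super().__init__()
        self.conv = nn.Sequential(
            nn.Conv2d(1, channels, kernel_size=3, stride=2),
            nn.ReLU(),
            nn.Conv2d(channels, channels, kernel_size=3, stride=2),
            nn.ReLU(),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, time, feature) -> add a channel dim for Conv2d
        x = self.conv(x.unsqueeze(1))  # (batch, channels, time', feat')
        b, c, t, f = x.shape
        # flatten channels and reduced feature dim back into one feature axis
        return x.transpose(1, 2).reshape(b, t, c * f)


class PooledConvSubsample(nn.Module):
    """Pooling variant: stride-1 convolutions, subsampling done by max-pooling."""

    def __init__(self, channels: int):
        super().__init__()
        self.conv = nn.Sequential(
            nn.Conv2d(1, channels, kernel_size=3, stride=1, padding=1),
            nn.ReLU(),
            nn.MaxPool2d(kernel_size=2, stride=2),
            nn.Conv2d(channels, channels, kernel_size=3, stride=1, padding=1),
            nn.ReLU(),
            nn.MaxPool2d(kernel_size=2, stride=2),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        x = self.conv(x.unsqueeze(1))
        b, c, t, f = x.shape
        return x.transpose(1, 2).reshape(b, t, c * f)
```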
Also, in ESPnet there are some other variants, including VGG2L (and I think a VGG-style frontend was also mentioned in the Conformer paper, or in the one it refers to?). VGG2L also looks similar to @christophmluscher 's hybrid baseline?
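For comparison, a rough sketch of what a VGG2L-style frontend looks like (again a hedged illustration; the class name and channel sizes are assumptions, not the exact ESPnet implementation): two VGG blocks of two stride-1 convolutions each, with max-pooling doing the time/feature reduction.

```python
import torch
from torch import nn


class VGG2LLikeFrontend(nn.Module):
    """VGG-style frontend: two conv blocks, each followed by 2x2 max-pooling."""

    def __init__(self, channels=(64, 128)):
        super().__init__()
        c1, c2 = channels
        self.block1 = nn.Sequential(
            nn.Conv2d(1, c1, kernel_size=3, padding=1), nn.ReLU(),
            nn.Conv2d(c1, c1, kernel_size=3, padding=1), nn.ReLU(),
            nn.MaxPool2d(kernel_size=2, stride=2),
        )
        self.block2 = nn.Sequential(
            nn.Conv2d(c1, c2, kernel_size=3, padding=1), nn.ReLU(),
            nn.Conv2d(c2, c2, kernel_size=3, padding=1), nn.ReLU(),
            nn.MaxPool2d(kernel_size=2, stride=2),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, time, feature)
        x = self.block2(self.block1(x.unsqueeze(1)))  # (batch, c2, time/4, feat/4)
        b, c, t, f = x.shape
        return x.transpose(1, 2).reshape(b, t, c * f)
```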
Further, in ESPnet (…). This is usually …. Then, for the self-attention, it usually uses ….
Also related: espnet/espnet#2816, espnet/espnet#2684
OK, then it is not exactly the same. You are right, they use striding instead.
Fixes #54.
This is a draft now. It still requires the implementation of MultiHeadSelfAttention from #52.