Attention layer output #9
Comments
@krayush07 you are right. Have you tested the implementation by changing the attention to the hidden vectors?
@krayush07 - In the paper, this is how the projection is done:
@heisenbugfix I agree with what you mentioned in the previous comment. However, the final attention is applied to the hidden states and NOT to the projected vectors. As per my understanding, these are the steps to apply attention:

1. Project the RNN hidden states through a one-layer MLP: u_t = tanh(W h_t + b).
2. Score each projected vector against a learned context vector: score_t = u_t · u_w.
3. Normalize the scores over time with a softmax to obtain the attention weights α_t.
4. Compute the final sentence vector as the weighted sum of the hidden states: s = Σ_t α_t h_t.

I find a mismatch at the 4th step in your code. Please correct me if I am wrong.
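For reference, a minimal framework-agnostic NumPy sketch of those four steps (the names W, b, and context are illustrative, not the repository's actual variables); the last line sums the original hidden states rather than the projected vectors:

```python
import numpy as np

def paper_style_attention(hidden, W, b, context):
    """Attention as described in the steps above.

    hidden : (time, hidden_dim) RNN outputs h_t
    W, b   : projection parameters, W is (hidden_dim, attn_dim), b is (attn_dim,)
    context: (attn_dim,) learned context vector u_w
    Returns the sentence vector as a weighted sum of the hidden states.
    """
    # Step 1: project the hidden states, u_t = tanh(W h_t + b)
    u = np.tanh(hidden @ W + b)                       # (time, attn_dim)
    # Step 2: score each time step against the context vector
    scores = u @ context                              # (time,)
    # Step 3: softmax over time to get attention weights
    alphas = np.exp(scores - scores.max())
    alphas /= alphas.sum()                            # (time,)
    # Step 4: weighted sum of the ORIGINAL hidden states, not the projected u_t
    return (alphas[:, None] * hidden).sum(axis=0)     # (hidden_dim,)
```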
@krayush07 Ah, I get it. Thanks for clarifying.
@krayush07 I think it's more of a personal choice where to apply the attention weights. In the paper, the authors project the hidden state to the same dimension, compute attention, and apply it to the hidden state. However, in this implementation he projects the hidden state to a lower dimension to compute attention.
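The practical consequence of that choice is the dimensionality of the output vector. A quick shape check with toy sizes, reusing the illustrative names from the sketch above:

```python
import numpy as np

# Toy sizes: time steps, hidden dim, attention dim (illustrative only).
rng = np.random.default_rng(0)
T, H, A = 5, 8, 4
hidden  = rng.normal(size=(T, H))
W, b    = rng.normal(size=(H, A)), np.zeros(A)
context = rng.normal(size=A)

u = np.tanh(hidden @ W + b)
alphas = np.exp(u @ context)
alphas /= alphas.sum()

paper_output     = (alphas[:, None] * hidden).sum(axis=0)  # weighted sum of hidden states
projected_output = (alphas[:, None] * u).sum(axis=0)       # weighted sum of projected vectors

print(paper_output.shape)      # (8,) -> hidden_dim, as in the paper
print(projected_output.shape)  # (4,) -> attn_dim, as this implementation reportedly does
```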
The method task_specific_attention applies attention to the projected vectors instead of the hidden vectors (the outputs of the RNN cell). Has this been done purposefully, or has the detail of the attention mechanism from the paper been missed, where the final sentence vector is the weighted summation of the hidden states and NOT of the inner projected vectors?
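For concreteness, a hypothetical sketch of what a paper-faithful version of such a function might look like, assuming a TensorFlow 1.x graph-style API with inputs shaped (batch, time, hidden); the function name, signature, and variable names are illustrative and not taken from the repository:

```python
import tensorflow as tf

def paper_style_task_attention(inputs, attention_size, scope=None):
    """Hypothetical paper-faithful attention (not the repository's actual
    task_specific_attention). inputs: (batch, time, hidden)."""
    with tf.variable_scope(scope or "attention"):
        # u_t = tanh(W h_t + b): project every hidden vector
        projection = tf.layers.dense(inputs, attention_size, activation=tf.tanh)
        # learned context vector u_w
        context = tf.get_variable("attention_context", shape=[attention_size])
        # alignment scores and softmax over the time axis
        scores = tf.tensordot(projection, context, axes=[[2], [0]])        # (batch, time)
        alphas = tf.nn.softmax(scores, axis=1)                             # (batch, time)
        # weighted sum of the ORIGINAL hidden vectors, as described in the paper
        return tf.reduce_sum(inputs * tf.expand_dims(alphas, -1), axis=1)  # (batch, hidden)
```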