A question in function simple_context #9

Open
wai7niu8 opened this issue Jul 25, 2016 · 1 comment
Comments

@wai7niu8

Hi,
In train.ipynb, I'm confused about this line:
activation_energies = activation_energies + -1e20*K.expand_dims(1.-K.cast(mask[:, :maxlend],'float32'),1)
I think this line is unnecessary (I may be wrong); could you explain it in detail?
Also, when computing the attention weights, I think we should only use the current word's ht (the hidden state at decoding time step t), but in the function simple_context it uses all of the headline words' ht at every time step?
Finally, could you point me to the paper or other references on how to implement the attention layer? I'm not particularly familiar with it. Thank you.

@udibr
Owner

udibr commented Jul 25, 2016

The first line in the README file gives a link to the paper on which the code is based.
Please read it several times from start to finish until you feel you understand it.
Also read the references it gives.

The line you asked about reduces the energy by a huge value at every position where the mask is zero in the part of the input (0:maxlend) that came from the article.
Later I take a softmax of the energy, so positions where the mask was zero end up with almost zero weight.
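
Below is a minimal numpy sketch (not the repo's Keras code) of the effect described above: adding a huge negative number to the energies wherever the mask is 0 means the following softmax assigns those positions essentially zero weight, so padded article tokens cannot receive attention.

```python
import numpy as np

def softmax(x):
    # numerically stable softmax over the last axis
    e = np.exp(x - x.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

# energies over 5 article positions; the last two are padding (mask == 0)
activation_energies = np.array([1.0, 2.0, 0.5, 3.0, 1.5])
mask = np.array([1.0, 1.0, 1.0, 0.0, 0.0])

# same trick as the line from train.ipynb: push masked positions to -1e20
masked = activation_energies + -1e20 * (1.0 - mask)
weights = softmax(masked)
print(weights)  # the last two weights are ~0; only real tokens share the attention
```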

simple_context works on all the time steps at once, rather than on one decoder step at a time.
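
A hedged sketch, again in plain numpy, of what "all the time steps at once" means: instead of looping over decoder positions, the energies are computed as one matrix of shape (decode steps, article length) with a single matrix product, and the softmax and weighted sum are applied row-wise. The variable names (head, desc) and the plain dot-product scoring are illustrative assumptions, not the repo's exact implementation.

```python
import numpy as np

def softmax(x):
    # numerically stable softmax over the last axis
    e = np.exp(x - x.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

maxlend, maxlenh, n_hidden = 4, 3, 8
rng = np.random.default_rng(0)
desc = rng.standard_normal((maxlend, n_hidden))  # hidden states over the article (input) part
head = rng.standard_normal((maxlenh, n_hidden))  # hidden states over the headline (output) part
mask = np.array([1.0, 1.0, 1.0, 0.0])            # last article position is padding

# one matrix of energies: row t scores decoder step t against every article word
activation_energies = head @ desc.T               # shape (maxlenh, maxlend)
activation_energies += -1e20 * (1.0 - mask)       # mask broadcast over every decoder step
weights = softmax(activation_energies)            # row-wise attention weights
context = weights @ desc                          # one context vector per decoder step
print(context.shape)                              # (3, 8)
```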
