4 changes: 2 additions & 2 deletions posts/2015-08-Understanding-LSTMs/index.html
@@ -169,12 +169,12 @@ <h2 id="the-core-idea-behind-lstms">The Core Idea Behind LSTMs</h2>
<p>The sigmoid layer outputs numbers between zero and one, describing how much of each component should be let through. A value of zero means “let nothing through,” while a value of one means “let everything through!”</p>
<p>An LSTM has three of these gates, to protect and control the cell state.</p>
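The elementwise gating idea can be sketched in a few lines of NumPy (the vector values below are made up for illustration):

```python
import numpy as np

def sigmoid(z):
    # Squashes each component into the open interval (0, 1).
    return 1.0 / (1.0 + np.exp(-z))

state = np.array([0.5, -1.2, 3.0])          # some cell-state components
gate = sigmoid(np.array([-10.0, 0.0, 10.0]))  # roughly [0, 0.5, 1]

# Elementwise product: first component is blocked,
# second is halved, third passes through almost unchanged.
gated = gate * state
```

Because the product is elementwise, each gate value independently controls how much of the corresponding state component is let through.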
<h2 id="step-by-step-lstm-walk-through">Step-by-Step LSTM Walk Through</h2>
<p>The first step in our LSTM is to decide what information we’re going to throw away from the cell state. This decision is made by a sigmoid layer called the “forget gate layer.” It looks at <span class="math">\(h_{t-1}\)</span> and <span class="math">\(x_t\)</span>, and outputs a number between <span class="math">\(0\)</span> and <span class="math">\(1\)</span> for each number in the cell state <span class="math">\(C_{t-1}\)</span>. A <span class="math">\(1\)</span> represents “completely keep this” while a <span class="math">\(0\)</span> represents “completely get rid of this.”</p>
<p>The first step in our LSTM is to decide what information we’re going to throw away from the cell state. This decision is made by a sigmoid layer called the “<b>forget gate</b> layer.” It looks at <span class="math">\(h_{t-1}\)</span> and <span class="math">\(x_t\)</span>, and outputs a number between <span class="math">\(0\)</span> and <span class="math">\(1\)</span> for each number in the cell state <span class="math">\(C_{t-1}\)</span>. A <span class="math">\(1\)</span> represents “completely keep this” while a <span class="math">\(0\)</span> represents “completely get rid of this.”</p>
<p>Let’s go back to our example of a language model trying to predict the next word based on all the previous ones. In such a problem, the cell state might include the gender of the present subject, so that the correct pronouns can be used. When we see a new subject, we want to forget the gender of the old subject.</p>
<div style="width:90%; margin-left:auto; margin-right:auto; margin-bottom:8px; margin-top:8px;">
<img src="img/LSTM3-focus-f.png" alt>
</div>
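The forget gate described above can be sketched as follows; the sizes and the random weights are purely illustrative, not trained values:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

rng = np.random.default_rng(0)
hidden, inputs = 4, 3                 # hypothetical dimensions
W_f = rng.standard_normal((hidden, hidden + inputs))  # forget-gate weights
b_f = np.zeros(hidden)                                # forget-gate bias

h_prev = rng.standard_normal(hidden)  # h_{t-1}
x_t = rng.standard_normal(inputs)     # current input
C_prev = rng.standard_normal(hidden)  # cell state C_{t-1}

# f_t = sigmoid(W_f . [h_{t-1}, x_t] + b_f), one value in (0, 1)
# per component of the cell state.
f_t = sigmoid(W_f @ np.concatenate([h_prev, x_t]) + b_f)

# Each cell-state component is scaled by how much we keep it.
C_after_forget = f_t * C_prev
```

Note that `f_t` has the same shape as the cell state, so the "keep or forget" decision is made separately for every component.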
<p>The next step is to decide what new information we’re going to store in the cell state. This has two parts. First, a sigmoid layer called the “input gate layer” decides which values we’ll update. Next, a tanh layer creates a vector of new candidate values, <span class="math">\(\tilde{C}_t\)</span>, that could be added to the state. In the next step, we’ll combine these two to create an update to the state.</p>
<p>The next step is to decide what new information we’re going to store in the cell state. This has two parts. First, a sigmoid layer called the “<b>input gate</b> layer” decides which values we’ll update. Next, a tanh layer creates a vector of new candidate values, <span class="math">\(\tilde{C}_t\)</span>, that could be added to the state. In the next step, we’ll combine these two to create an update to the state.</p>
<p>In the example of our language model, we’d want to add the gender of the new subject to the cell state, to replace the old one we’re forgetting.</p>
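These two parts of the second step can be sketched the same way; again the dimensions and random weights are hypothetical stand-ins for trained parameters:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

rng = np.random.default_rng(1)
hidden, inputs = 4, 3                 # hypothetical dimensions
concat = rng.standard_normal(hidden + inputs)  # stands in for [h_{t-1}, x_t]

W_i = rng.standard_normal((hidden, hidden + inputs))  # input-gate weights
b_i = np.zeros(hidden)
W_C = rng.standard_normal((hidden, hidden + inputs))  # candidate weights
b_C = np.zeros(hidden)

# Input gate: i_t = sigmoid(W_i . [h_{t-1}, x_t] + b_i)
# decides, per component, how much of the candidate to write.
i_t = sigmoid(W_i @ concat + b_i)

# Candidate values: C~_t = tanh(W_C . [h_{t-1}, x_t] + b_C),
# new content in (-1, 1) that could be added to the state.
C_tilde = np.tanh(W_C @ concat + b_C)

# The update that the next step adds into the cell state.
state_update = i_t * C_tilde
```

The sigmoid output chooses *where* to write and the tanh output chooses *what* to write; their product is the term added to the cell state in the following step.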
<div style="width:90%; margin-left:auto; margin-right:auto; margin-bottom:8px; margin-top:8px;">
<img src="img/LSTM3-focus-i.png" alt>