For each epoch, we plot accuracy versus iteration and loss versus iteration. Finally, we concatenate the loss and accuracy curves of all five epochs into a single plot for clarity.
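The concatenation step can be sketched in plain Python. Each epoch's per-iteration values are appended after the previous epoch's, with a running offset so the iteration axis keeps counting across epochs. The function name and the toy values below are hypothetical; they are not the code or data behind this report's plots.

```python
from typing import List, Tuple


def concat_epoch_curves(per_epoch: List[List[float]]) -> Tuple[List[int], List[float]]:
    """Join per-epoch curves (one value per iteration) into one series.

    Iteration indices keep counting across epochs, so epoch k's curve starts
    right after the last iteration of epoch k - 1.
    """
    xs: List[int] = []
    ys: List[float] = []
    for curve in per_epoch:
        offset = len(xs)  # number of iterations already placed on the axis
        xs.extend(offset + i for i in range(len(curve)))
        ys.extend(curve)
    return xs, ys


# Toy loss values (hypothetical, not measured), three iterations per epoch.
losses = [[0.9, 0.6, 0.5], [0.45, 0.4, 0.38], [0.36, 0.35, 0.35]]
iters, loss = concat_epoch_curves(losses)
print(iters)  # [0, 1, 2, 3, 4, 5, 6, 7, 8]
```

The resulting `(iters, loss)` pair can then be passed to a plotting call such as matplotlib's `plt.plot(iters, loss)`; the same function works for the accuracy curves.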
We trained the model three times to show the variance of the UAS score on the test data. The resulting UAS scores (%) are shown in the table below.
Epoch | Experiment 1 | Experiment 2 | Experiment 3 |
---|---|---|---|
1 | 76.89 | 76.26 | 76.67 |
2 | 81.16 | 80.48 | 80.92 |
3 | 83.16 | 82.34 | 83.13 |
4 | 84.07 | 83.88 | 83.89 |
5 | 84.80 | 83.91 | 84.79 |
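Since each configuration was trained three times, a per-epoch mean and standard deviation summarize the run-to-run variance. A short sketch using Python's `statistics` module on the values from the table (the dictionary name is illustrative); with only three runs, the standard deviation is a rough estimate.

```python
from statistics import mean, stdev

# Test UAS (%) from the table above: one entry per epoch, one value per run.
uas_runs = {
    1: [76.89, 76.26, 76.67],
    2: [81.16, 80.48, 80.92],
    3: [83.16, 82.34, 83.13],
    4: [84.07, 83.88, 83.89],
    5: [84.80, 83.91, 84.79],
}

for epoch, scores in uas_runs.items():
    # stdev() is the sample standard deviation (n - 1 in the denominator).
    print(f"epoch {epoch}: mean = {mean(scores):.2f}, std = {stdev(scores):.2f}")
```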
Findings
From the per-epoch loss and accuracy plots, we can see that in epoch 1 the accuracy climbs quickly and the loss declines quickly. In epochs 2 and 3, both the rise in accuracy and the decline in loss slow down, but the trend continues. From epoch 4 onward, accuracy stops climbing and loss stops decreasing; both curves fluctuate randomly. So 3 epochs of training should be appropriate for this model, although the test UAS in the table above still improves in epochs 4 and 5, by smaller margins than in the earlier epochs.
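The diminishing returns can also be checked against the test UAS table: averaging the three runs per epoch and differencing consecutive epochs gives the gain contributed by each additional epoch. A minimal sketch using the values from the table:

```python
from statistics import mean

# Test UAS (%) per epoch (rows) for the three runs (columns), from the table.
uas_runs = [
    [76.89, 76.26, 76.67],
    [81.16, 80.48, 80.92],
    [83.16, 82.34, 83.13],
    [84.07, 83.88, 83.89],
    [84.80, 83.91, 84.79],
]
means = [mean(run) for run in uas_runs]

# Gain of each epoch over the previous one; shrinking gains indicate a plateau.
gains = [round(b - a, 2) for a, b in zip(means, means[1:])]
print(gains)  # [4.25, 2.02, 1.07, 0.55]
```

The mean gain shrinks after every epoch but stays positive through epoch 5, which is consistent with the flattening curves described above.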
In my new parser model, TwoHiddenParserModel, I tried a different network design and different hyperparameters. It differs from the original model in the following ways:
- There are 2 hidden layers, with l1_hidden_size = 500 and l2_hidden_size = 200.
- The activation function is changed from cube to ReLU.
- The l1_hidden_size is changed from 200 to 500.
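The architecture change can be illustrated with a minimal pure-Python forward pass, assuming a standard feed-forward design (concatenated input features, two fully connected hidden layers, then one output score per transition). Only `l1_hidden_size` and `l2_hidden_size` come from the report; `Dense`, `TwoHiddenMLP`, `n_in`, `n_classes` and the random initialization are hypothetical, and a real implementation would use a framework such as PyTorch with trained weights.

```python
import random
from typing import Callable, List

Vector = List[float]


def cube(x: float) -> float:
    return x ** 3  # activation used by the original model


def relu(x: float) -> float:
    return max(0.0, x)  # activation used by the new model


class Dense:
    """Fully connected layer y = Wx + b with small random weights."""

    def __init__(self, n_in: int, n_out: int, rng: random.Random) -> None:
        self.W = [[rng.uniform(-0.1, 0.1) for _ in range(n_in)] for _ in range(n_out)]
        self.b = [0.0] * n_out

    def __call__(self, x: Vector) -> Vector:
        return [sum(w * xi for w, xi in zip(row, x)) + bj
                for row, bj in zip(self.W, self.b)]


class TwoHiddenMLP:
    """input -> hidden 1 (act) -> hidden 2 (act) -> transition scores."""

    def __init__(self, n_in: int, n_classes: int, l1_hidden_size: int = 500,
                 l2_hidden_size: int = 200,
                 act: Callable[[float], float] = relu, seed: int = 0) -> None:
        rng = random.Random(seed)
        self.l1 = Dense(n_in, l1_hidden_size, rng)
        self.l2 = Dense(l1_hidden_size, l2_hidden_size, rng)
        self.out = Dense(l2_hidden_size, n_classes, rng)
        self.act = act

    def forward(self, x: Vector) -> Vector:
        h1 = [self.act(v) for v in self.l1(x)]
        h2 = [self.act(v) for v in self.l2(h1)]
        return self.out(h2)  # unnormalized scores (logits), one per transition


model = TwoHiddenMLP(n_in=10, n_classes=3)
scores = model.forward([0.5] * 10)
```

Passing `act=cube` instead of the default `relu` swaps back to the original activation, which makes it easy to compare the two designs with everything else held fixed.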
We then measure the test UAS score of the old and new models after each training epoch. For reference, the table also includes the results of a model with a single hidden layer that uses the ReLU activation function.
Epoch | Single Hidden Layer + Cube (Old) | Single Hidden Layer + ReLU (Reference) | Two Hidden Layers + ReLU (New) |
---|---|---|---|
1 | 76.89 | 82.33 | 82.48 |
2 | 81.86 | 85.48 | 86.48 |
3 | 83.16 | 86.51 | 87.34 |
4 | 84.07 | 87.57 | 87.82 |
5 | 84.80 | 88.00 | 88.48 |
From the results above, we can see that changing the activation function from cube to ReLU improves the UAS score substantially, from 84.80 to 88.00 after epoch 5. Going from one hidden layer to two improves performance slightly further, to 88.48.
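The per-epoch effect of each change can be computed directly from the comparison table. A short sketch (list names are illustrative):

```python
# Test UAS (%) per epoch, copied from the comparison table above.
cube_1 = [76.89, 81.86, 83.16, 84.07, 84.80]  # one hidden layer + cube (old)
relu_1 = [82.33, 85.48, 86.51, 87.57, 88.00]  # one hidden layer + ReLU (reference)
relu_2 = [82.48, 86.48, 87.34, 87.82, 88.48]  # two hidden layers + ReLU (new)

for epoch, (a, b, c) in enumerate(zip(cube_1, relu_1, relu_2), start=1):
    # b - a isolates the activation change; c - b isolates the extra layer.
    print(f"epoch {epoch}: cube->ReLU {b - a:+.2f}, 1->2 layers {c - b:+.2f}")
```

Across the five epochs, switching to ReLU adds between 3.20 and 5.44 points, while the second hidden layer adds between 0.15 and 1.00. Since the reference model's hidden size is not stated, part of the one-versus-two-layer difference may also come from the change of l1_hidden_size.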
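The output of parsing the sentence "The big dog ate my homework" is shown below. The parser makes mistakes on the first half of the sentence: it attaches "the" and "big" to "ate" instead of to "dog", and labels the arc from "ate" to "dog" as compound, although "dog" is the subject of "ate".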
----
buffer: ['the', 'big', 'dog', 'ate', 'my', 'homework']
stack: ['<root>']
action: shift
----
buffer: ['big', 'dog', 'ate', 'my', 'homework']
stack: ['<root>', 'the']
action: shift
----
buffer: ['dog', 'ate', 'my', 'homework']
stack: ['<root>', 'the', 'big']
action: shift
----
buffer: ['ate', 'my', 'homework']
stack: ['<root>', 'the', 'big', 'dog']
action: shift
----
buffer: ['my', 'homework']
stack: ['<root>', 'the', 'big', 'dog', 'ate']
action: left arc, <d>:compound
----
buffer: ['my', 'homework']
stack: ['<root>', 'the', 'big', 'ate']
action: left arc, <d>:amod
----
buffer: ['my', 'homework']
stack: ['<root>', 'the', 'ate']
action: left arc, <d>:det
----
buffer: ['my', 'homework']
stack: ['<root>', 'ate']
action: shift
----
buffer: ['homework']
stack: ['<root>', 'ate', 'my']
action: shift
----
buffer: []
stack: ['<root>', 'ate', 'my', 'homework']
action: left arc, <d>:nmod:poss
----
buffer: []
stack: ['<root>', 'ate', 'homework']
action: right arc, <d>:dep
----
buffer: []
stack: ['<root>', 'ate']
action: right arc, <d>:root
Resulting (predicted) dependency tree:
<root>
└── ate
    ├── the
    ├── big
    ├── dog
    └── homework
        └── my
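The trace above follows the arc-standard transition system: shift moves the first buffer word onto the stack, left arc makes the top of the stack the head of the item below it and removes that item, and right arc makes the second item the head of the top item and removes the top. A minimal replay sketch, assuming labels are dropped (unlabeled arcs) and that word forms are unique so they can serve as dictionary keys; `replay` is a hypothetical helper, not part of the parser code.

```python
from typing import Dict, List


def replay(words: List[str], actions: List[str]) -> Dict[str, str]:
    """Apply arc-standard transitions and return {dependent: head}."""
    stack = ["<root>"]
    buffer = list(words)
    heads: Dict[str, str] = {}
    for action in actions:
        if action == "shift":
            stack.append(buffer.pop(0))
        elif action == "left arc":
            # Top of stack becomes head of the second item, which is removed.
            dependent = stack.pop(-2)
            heads[dependent] = stack[-1]
        elif action == "right arc":
            # Second item becomes head of the top item, which is removed.
            dependent = stack.pop()
            heads[dependent] = stack[-1]
        else:
            raise ValueError(f"unknown action: {action}")
    return heads


words = ["the", "big", "dog", "ate", "my", "homework"]
predicted_actions = (["shift"] * 4 + ["left arc"] * 3 + ["shift"] * 2
                     + ["left arc", "right arc", "right arc"])
print(replay(words, predicted_actions))
# {'dog': 'ate', 'big': 'ate', 'the': 'ate', 'my': 'homework', 'homework': 'ate', 'ate': '<root>'}
```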
The correct transition sequence for this sentence (arc labels omitted) should be:
----
buffer: ['the', 'big', 'dog', 'ate', 'my', 'homework']
stack: ['<root>']
action: shift
----
buffer: ['big', 'dog', 'ate', 'my', 'homework']
stack: ['<root>', 'the']
action: shift
----
buffer: ['dog', 'ate', 'my', 'homework']
stack: ['<root>', 'the', 'big']
action: shift
----
buffer: ['ate', 'my', 'homework']
stack: ['<root>', 'the', 'big', 'dog']
action: left arc
----
buffer: ['ate', 'my', 'homework']
stack: ['<root>', 'the', 'dog']
action: left arc
----
buffer: ['ate', 'my', 'homework']
stack: ['<root>', 'dog']
action: shift
----
buffer: ['my', 'homework']
stack: ['<root>', 'dog', 'ate']
action: left arc
----
buffer: ['my', 'homework']
stack: ['<root>', 'ate']
action: shift
----
buffer: ['homework']
stack: ['<root>', 'ate', 'my']
action: shift
----
buffer: []
stack: ['<root>', 'ate', 'my', 'homework']
action: left arc
----
buffer: []
stack: ['<root>', 'ate', 'homework']
action: right arc
----
buffer: []
stack: ['<root>', 'ate']
action: right arc
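UAS ignores arc labels, so comparing the two sequences only requires each word's head. A small sketch computing UAS for this single sentence, with the heads transcribed by hand from the predicted and correct transition sequences (function and variable names are hypothetical); it is the same kind of metric as the test UAS in the tables, applied to one sentence.

```python
from typing import Dict


def uas(predicted: Dict[str, str], gold: Dict[str, str]) -> float:
    """Unlabeled attachment score: % of words whose predicted head is correct."""
    correct = sum(predicted.get(word) == head for word, head in gold.items())
    return 100.0 * correct / len(gold)


# Heads read off the two transition sequences ({dependent: head}).
predicted = {"the": "ate", "big": "ate", "dog": "ate",
             "ate": "<root>", "my": "homework", "homework": "ate"}
gold = {"the": "dog", "big": "dog", "dog": "ate",
        "ate": "<root>", "my": "homework", "homework": "ate"}

print(f"UAS = {uas(predicted, gold):.2f}")  # UAS = 66.67
```

Only the heads of "the" and "big" are wrong, so 4 of 6 attachments are correct.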