-
Notifications
You must be signed in to change notification settings - Fork 4
/
log_transformer
192 lines (192 loc) · 32.6 KB
/
log_transformer
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
nohup: ignoring input
| distributed init (rank 0): tcp://localhost:11547
| distributed init (rank 3): tcp://localhost:11547
| distributed init (rank 1): tcp://localhost:11547
| distributed init (rank 2): tcp://localhost:11547
Namespace(adam_betas='(0.9, 0.98)', adam_eps=1e-08, adaptive_softmax_cutoff=None, arch='transformer_vaswani_wmt_en_de_big', attention_dropout=0.0, clip_norm=0.0, criterion='label_smoothed_cross_entropy', data='data-raw', decoder_attention_heads=16, decoder_embed_dim=1024, decoder_embed_path=None, decoder_ffn_embed_dim=4096, decoder_layers=6, decoder_learned_pos=False, decoder_normalize_before=False, device_id=0, distributed_backend='nccl', distributed_init_method='tcp://localhost:11547', distributed_port=-1, distributed_rank=0, distributed_world_size=4, dropout=0.3, encoder_attention_heads=16, encoder_embed_dim=1024, encoder_embed_path=None, encoder_ffn_embed_dim=4096, encoder_layers=6, encoder_learned_pos=False, encoder_normalize_before=False, fp16=False, keep_interval_updates=-1, label_smoothing=0.1, left_pad_source='True', left_pad_target='False', log_format=None, log_interval=1000, lr=[0.0005], lr_scheduler='inverse_sqrt', lr_shrink=0.1, max_epoch=0, max_sentences=None, max_sentences_valid=None, max_source_positions=1024, max_target_positions=1024, max_tokens=3000, max_update=0, min_loss_scale=0.0001, min_lr=1e-09, momentum=0.99, no_epoch_checkpoints=False, no_progress_bar=False, no_save=False, no_token_positional_embeddings=False, optimizer='adam', raw_text=True, relu_dropout=0.0, restore_file='checkpoint_last.pt', save_dir='checkpoints/transformer', save_interval=1, save_interval_updates=0, seed=1, sentence_avg=False, share_all_embeddings=True, share_decoder_input_output_embed=False, skip_invalid_size_inputs_valid_test=False, source_lang='ch', target_lang='en', task='translation', train_subset='train', update_freq=[32], valid_subset='valid', validate_interval=1, warmup_init_lr=1e-07, warmup_updates=4000, weight_decay=0.0)
| [ch] dictionary: 30000 types
| [en] dictionary: 30000 types
| data-raw train 1252977 examples
| data-raw valid 1664 examples
| model transformer_vaswani_wmt_en_de_big, criterion LabelSmoothedCrossEntropyCriterion
| num. model params: 207077376
| training on 4 GPUs
| max tokens per GPU = 3000 and max sentences per GPU = None
| epoch 001: 1000 / 3233 loss=14.514, nll_loss=14.391, ppl=21484.29, wps=5840, ups=0.0, wpb=354342, bsz=12574, num_updates=31, lr=3.97423e-06, gnorm=0.207, clip=100%, oom=0, wall=1881
| epoch 001: 2000 / 3233 loss=13.624, nll_loss=13.392, ppl=10749.05, wps=5880, ups=0.0, wpb=353990, bsz=12482, num_updates=62, lr=7.84845e-06, gnorm=0.144, clip=100%, oom=0, wall=3733
| epoch 001: 3000 / 3233 loss=13.117, nll_loss=12.824, ppl=7249.59, wps=5892, ups=0.0, wpb=354015, bsz=12407, num_updates=93, lr=1.17227e-05, gnorm=0.116, clip=100%, oom=0, wall=5588
| epoch 001 | loss 13.015 | nll_loss 12.709 | ppl 6695.03 | wps 5893 | ups 0.0 | wpb 353968 | bsz 12403 | num_updates 101 | lr 1.27225e-05 | gnorm 0.111 | clip 100% | oom 0 | wall 6067
| epoch 001 | valid on 'valid' subset | valid_loss 10.4658 | valid_nll_loss 9.83665 | valid_ppl 914.38 | num_updates 101
| epoch 002: 1000 / 3233 loss=11.537, nll_loss=11.054, ppl=2125.65, wps=4874, ups=0.0, wpb=354388, bsz=12500, num_updates=132, lr=1.65967e-05, gnorm=0.095, clip=100%, oom=0, wall=8320
| epoch 002: 2000 / 3233 loss=11.294, nll_loss=10.779, ppl=1756.80, wps=5356, ups=0.0, wpb=354240, bsz=12362, num_updates=163, lr=2.04709e-05, gnorm=0.088, clip=100%, oom=0, wall=10167
| epoch 002: 3000 / 3233 loss=11.086, nll_loss=10.537, ppl=1485.71, wps=5532, ups=0.0, wpb=354116, bsz=12437, num_updates=194, lr=2.43452e-05, gnorm=0.087, clip=100%, oom=0, wall=12019
| epoch 002 | loss 11.038 | nll_loss 10.481 | ppl 1428.97 | wps 5560 | ups 0.0 | wpb 354087 | bsz 12406 | num_updates 202 | lr 2.5345e-05 | gnorm 0.086 | clip 100% | oom 0 | wall 12499
| epoch 002 | valid on 'valid' subset | valid_loss 9.43088 | valid_nll_loss 8.60303 | valid_ppl 388.84 | num_updates 202 | best 9.43088
| epoch 003: 1000 / 3233 loss=10.329, nll_loss=9.636, ppl=795.75, wps=4925, ups=0.0, wpb=354192, bsz=12513, num_updates=233, lr=2.92192e-05, gnorm=0.082, clip=100%, oom=0, wall=14728
| epoch 003: 2000 / 3233 loss=10.231, nll_loss=9.513, ppl=730.38, wps=5383, ups=0.0, wpb=353886, bsz=12457, num_updates=264, lr=3.30934e-05, gnorm=0.079, clip=100%, oom=0, wall=16575
| epoch 003: 3000 / 3233 loss=10.152, nll_loss=9.412, ppl=681.27, wps=5551, ups=0.0, wpb=354017, bsz=12386, num_updates=295, lr=3.69676e-05, gnorm=0.076, clip=100%, oom=0, wall=18430
| epoch 003 | loss 10.138 | nll_loss 9.394 | ppl 672.76 | wps 5581 | ups 0.0 | wpb 354079 | bsz 12403 | num_updates 303 | lr 3.79674e-05 | gnorm 0.077 | clip 100% | oom 0 | wall 18907
| epoch 003 | valid on 'valid' subset | valid_loss 9.12231 | valid_nll_loss 8.17805 | valid_ppl 289.63 | num_updates 303 | best 9.12231
| epoch 004: 1000 / 3233 loss=9.886, nll_loss=9.074, ppl=539.04, wps=4970, ups=0.0, wpb=354638, bsz=12618, num_updates=334, lr=4.18417e-05, gnorm=0.075, clip=100%, oom=0, wall=21119
| epoch 004: 2000 / 3233 loss=9.814, nll_loss=8.990, ppl=508.46, wps=5416, ups=0.0, wpb=354145, bsz=12426, num_updates=365, lr=4.57159e-05, gnorm=0.074, clip=100%, oom=0, wall=22961
| epoch 004: 3000 / 3233 loss=9.762, nll_loss=8.928, ppl=486.95, wps=5573, ups=0.0, wpb=354147, bsz=12433, num_updates=396, lr=4.95901e-05, gnorm=0.073, clip=100%, oom=0, wall=24817
| epoch 004 | loss 9.750 | nll_loss 8.913 | ppl 482.20 | wps 5595 | ups 0.0 | wpb 354064 | bsz 12407 | num_updates 404 | lr 5.05899e-05 | gnorm 0.073 | clip 100% | oom 0 | wall 25298
| epoch 004 | valid on 'valid' subset | valid_loss 8.88035 | valid_nll_loss 7.86992 | valid_ppl 233.93 | num_updates 404 | best 8.88035
| epoch 005: 1000 / 3233 loss=9.512, nll_loss=8.641, ppl=399.16, wps=4831, ups=0.0, wpb=354242, bsz=12596, num_updates=435, lr=5.44641e-05, gnorm=0.072, clip=100%, oom=0, wall=27571
| epoch 005: 2000 / 3233 loss=9.461, nll_loss=8.582, ppl=383.16, wps=5330, ups=0.0, wpb=354270, bsz=12423, num_updates=466, lr=5.83384e-05, gnorm=0.071, clip=100%, oom=0, wall=29419
| epoch 005: 3000 / 3233 loss=9.400, nll_loss=8.513, ppl=365.39, wps=5521, ups=0.0, wpb=354177, bsz=12410, num_updates=497, lr=6.22126e-05, gnorm=0.070, clip=100%, oom=0, wall=31264
| epoch 005 | loss 9.383 | nll_loss 8.494 | ppl 360.56 | wps 5550 | ups 0.0 | wpb 354089 | bsz 12406 | num_updates 505 | lr 6.32124e-05 | gnorm 0.070 | clip 100% | oom 0 | wall 31741
| epoch 005 | valid on 'valid' subset | valid_loss 8.63879 | valid_nll_loss 7.61566 | valid_ppl 196.13 | num_updates 505 | best 8.63879
| epoch 006: 1000 / 3233 loss=9.121, nll_loss=8.199, ppl=293.78, wps=4855, ups=0.0, wpb=354163, bsz=12438, num_updates=536, lr=6.70866e-05, gnorm=0.069, clip=100%, oom=0, wall=34003
| epoch 006: 2000 / 3233 loss=9.067, nll_loss=8.136, ppl=281.32, wps=5341, ups=0.0, wpb=353985, bsz=12437, num_updates=567, lr=7.09608e-05, gnorm=0.068, clip=100%, oom=0, wall=35850
| epoch 006: 3000 / 3233 loss=9.032, nll_loss=8.095, ppl=273.50, wps=5524, ups=0.0, wpb=353977, bsz=12394, num_updates=598, lr=7.48351e-05, gnorm=0.068, clip=100%, oom=0, wall=37700
| epoch 006 | loss 9.020 | nll_loss 8.080 | ppl 270.68 | wps 5555 | ups 0.0 | wpb 354075 | bsz 12407 | num_updates 606 | lr 7.58349e-05 | gnorm 0.068 | clip 100% | oom 0 | wall 38180
| epoch 006 | valid on 'valid' subset | valid_loss 8.56563 | valid_nll_loss 7.50863 | valid_ppl 182.11 | num_updates 606 | best 8.56563
| epoch 007: 1000 / 3233 loss=8.816, nll_loss=7.848, ppl=230.43, wps=5807, ups=0.0, wpb=355143, bsz=12610, num_updates=637, lr=7.97091e-05, gnorm=0.068, clip=100%, oom=0, wall=40076
| epoch 007: 2000 / 3233 loss=8.786, nll_loss=7.815, ppl=225.21, wps=5862, ups=0.0, wpb=354386, bsz=12419, num_updates=668, lr=8.35833e-05, gnorm=0.067, clip=100%, oom=0, wall=41928
| epoch 007: 3000 / 3233 loss=8.740, nll_loss=7.761, ppl=216.99, wps=5880, ups=0.0, wpb=354191, bsz=12437, num_updates=699, lr=8.74575e-05, gnorm=0.067, clip=100%, oom=0, wall=43782
| epoch 007 | loss 8.728 | nll_loss 7.748 | ppl 214.98 | wps 5881 | ups 0.0 | wpb 354077 | bsz 12405 | num_updates 707 | lr 8.84573e-05 | gnorm 0.066 | clip 100% | oom 0 | wall 44261
| epoch 007 | valid on 'valid' subset | valid_loss 8.24814 | valid_nll_loss 7.14979 | valid_ppl 142.00 | num_updates 707 | best 8.24814
| epoch 008: 1000 / 3233 loss=8.541, nll_loss=7.535, ppl=185.48, wps=4893, ups=0.0, wpb=353904, bsz=12339, num_updates=738, lr=9.23316e-05, gnorm=0.066, clip=100%, oom=0, wall=46503
| epoch 008: 2000 / 3233 loss=8.521, nll_loss=7.511, ppl=182.38, wps=5362, ups=0.0, wpb=353936, bsz=12326, num_updates=769, lr=9.62058e-05, gnorm=0.065, clip=100%, oom=0, wall=48354
| epoch 008: 3000 / 3233 loss=8.475, nll_loss=7.459, ppl=175.90, wps=5538, ups=0.0, wpb=354061, bsz=12362, num_updates=800, lr=0.00010008, gnorm=0.065, clip=100%, oom=0, wall=50207
| epoch 008 | loss 8.467 | nll_loss 7.450 | ppl 174.83 | wps 5570 | ups 0.0 | wpb 354074 | bsz 12405 | num_updates 808 | lr 0.00010108 | gnorm 0.065 | clip 100% | oom 0 | wall 50682
| epoch 008 | valid on 'valid' subset | valid_loss 8.17919 | valid_nll_loss 7.09035 | valid_ppl 136.27 | num_updates 808 | best 8.17919
| epoch 009: 1000 / 3233 loss=8.409, nll_loss=7.379, ppl=166.47, wps=4972, ups=0.0, wpb=354528, bsz=12536, num_updates=839, lr=0.000104954, gnorm=0.065, clip=100%, oom=0, wall=52892
| epoch 009: 2000 / 3233 loss=8.347, nll_loss=7.310, ppl=158.65, wps=5407, ups=0.0, wpb=354256, bsz=12479, num_updates=870, lr=0.000108828, gnorm=0.064, clip=100%, oom=0, wall=54744
| epoch 009: 3000 / 3233 loss=8.293, nll_loss=7.249, ppl=152.09, wps=5572, ups=0.0, wpb=354212, bsz=12437, num_updates=901, lr=0.000112702, gnorm=0.064, clip=100%, oom=0, wall=56594
| epoch 009 | loss 8.284 | nll_loss 7.238 | ppl 151.00 | wps 5594 | ups 0.0 | wpb 354081 | bsz 12406 | num_updates 909 | lr 0.000113702 | gnorm 0.063 | clip 100% | oom 0 | wall 57074
| epoch 009 | valid on 'valid' subset | valid_loss 8.00769 | valid_nll_loss 6.88914 | valid_ppl 118.53 | num_updates 909 | best 8.00769
| epoch 010: 1000 / 3233 loss=8.141, nll_loss=7.074, ppl=134.78, wps=5101, ups=0.0, wpb=353879, bsz=12442, num_updates=940, lr=0.000117577, gnorm=0.063, clip=100%, oom=0, wall=59225
| epoch 010: 2000 / 3233 loss=8.097, nll_loss=7.024, ppl=130.17, wps=5490, ups=0.0, wpb=354363, bsz=12429, num_updates=971, lr=0.000121451, gnorm=0.062, clip=100%, oom=0, wall=61076
| epoch 010: 3000 / 3233 loss=8.066, nll_loss=6.988, ppl=126.92, wps=5630, ups=0.0, wpb=354125, bsz=12407, num_updates=1002, lr=0.000125325, gnorm=0.062, clip=100%, oom=0, wall=62924
| epoch 010 | loss 8.059 | nll_loss 6.980 | ppl 126.24 | wps 5652 | ups 0.0 | wpb 354082 | bsz 12407 | num_updates 1010 | lr 0.000126325 | gnorm 0.062 | clip 100% | oom 0 | wall 63402
| epoch 010 | valid on 'valid' subset | valid_loss 7.87823 | valid_nll_loss 6.73769 | valid_ppl 106.72 | num_updates 1010 | best 7.87823
| epoch 011: 1000 / 3233 loss=7.933, nll_loss=6.835, ppl=114.20, wps=5220, ups=0.0, wpb=353857, bsz=12340, num_updates=1041, lr=0.000130199, gnorm=0.061, clip=100%, oom=0, wall=65504
| epoch 011: 2000 / 3233 loss=7.980, nll_loss=6.887, ppl=118.32, wps=5542, ups=0.0, wpb=353822, bsz=12282, num_updates=1072, lr=0.000134073, gnorm=0.061, clip=100%, oom=0, wall=67361
| epoch 011: 3000 / 3233 loss=7.944, nll_loss=6.846, ppl=115.02, wps=5667, ups=0.0, wpb=354055, bsz=12383, num_updates=1103, lr=0.000137947, gnorm=0.061, clip=100%, oom=0, wall=69212
| epoch 011 | loss 7.934 | nll_loss 6.834 | ppl 114.05 | wps 5687 | ups 0.0 | wpb 354070 | bsz 12405 | num_updates 1111 | lr 0.000138947 | gnorm 0.061 | clip 100% | oom 0 | wall 69691
| epoch 011 | valid on 'valid' subset | valid_loss 7.83937 | valid_nll_loss 6.68482 | valid_ppl 102.88 | num_updates 1111 | best 7.83937
| epoch 012: 1000 / 3233 loss=7.770, nll_loss=6.647, ppl=100.22, wps=5197, ups=0.0, wpb=353913, bsz=12593, num_updates=1142, lr=0.000142821, gnorm=0.060, clip=100%, oom=0, wall=71801
| epoch 012: 2000 / 3233 loss=7.751, nll_loss=6.625, ppl=98.67, wps=5551, ups=0.0, wpb=354101, bsz=12492, num_updates=1173, lr=0.000146696, gnorm=0.060, clip=100%, oom=0, wall=73645
| epoch 012: 3000 / 3233 loss=7.750, nll_loss=6.623, ppl=98.57, wps=5679, ups=0.0, wpb=354160, bsz=12439, num_updates=1204, lr=0.00015057, gnorm=0.059, clip=100%, oom=0, wall=75490
| epoch 012 | loss 7.747 | nll_loss 6.620 | ppl 98.34 | wps 5697 | ups 0.0 | wpb 354081 | bsz 12399 | num_updates 1212 | lr 0.00015157 | gnorm 0.059 | clip 100% | oom 0 | wall 75968
| epoch 012 | valid on 'valid' subset | valid_loss 7.76365 | valid_nll_loss 6.57877 | valid_ppl 95.59 | num_updates 1212 | best 7.76365
| epoch 013: 1000 / 3233 loss=7.683, nll_loss=6.543, ppl=93.24, wps=5779, ups=0.0, wpb=354607, bsz=12610, num_updates=1243, lr=0.000155444, gnorm=0.059, clip=100%, oom=0, wall=77870
| epoch 013: 2000 / 3233 loss=7.646, nll_loss=6.501, ppl=90.56, wps=5863, ups=0.0, wpb=354126, bsz=12431, num_updates=1274, lr=0.000159318, gnorm=0.058, clip=100%, oom=0, wall=79713
| epoch 013: 3000 / 3233 loss=7.627, nll_loss=6.478, ppl=89.13, wps=5891, ups=0.0, wpb=354048, bsz=12386, num_updates=1305, lr=0.000163192, gnorm=0.058, clip=100%, oom=0, wall=81557
| epoch 013 | loss 7.616 | nll_loss 6.466 | ppl 88.40 | wps 5896 | ups 0.0 | wpb 354078 | bsz 12411 | num_updates 1313 | lr 0.000164192 | gnorm 0.058 | clip 100% | oom 0 | wall 82033
| epoch 013 | valid on 'valid' subset | valid_loss 7.66644 | valid_nll_loss 6.48338 | valid_ppl 89.47 | num_updates 1313 | best 7.66644
| epoch 014: 1000 / 3233 loss=7.523, nll_loss=6.358, ppl=82.03, wps=5776, ups=0.0, wpb=354621, bsz=12476, num_updates=1344, lr=0.000168066, gnorm=0.058, clip=100%, oom=0, wall=83937
| epoch 014: 2000 / 3233 loss=7.496, nll_loss=6.327, ppl=80.30, wps=5857, ups=0.0, wpb=354343, bsz=12425, num_updates=1375, lr=0.000171941, gnorm=0.057, clip=100%, oom=0, wall=85784
| epoch 014: 3000 / 3233 loss=7.480, nll_loss=6.309, ppl=79.29, wps=5879, ups=0.0, wpb=354020, bsz=12398, num_updates=1406, lr=0.000175815, gnorm=0.057, clip=100%, oom=0, wall=87633
| epoch 014 | loss 7.477 | nll_loss 6.305 | ppl 79.07 | wps 5885 | ups 0.0 | wpb 354077 | bsz 12406 | num_updates 1414 | lr 0.000176815 | gnorm 0.057 | clip 100% | oom 0 | wall 88110
| epoch 014 | valid on 'valid' subset | valid_loss 7.61302 | valid_nll_loss 6.42623 | valid_ppl 86.00 | num_updates 1414 | best 7.61302
| epoch 015: 1000 / 3233 loss=7.382, nll_loss=6.195, ppl=73.26, wps=5830, ups=0.0, wpb=354126, bsz=12255, num_updates=1445, lr=0.000180689, gnorm=0.057, clip=100%, oom=0, wall=89993
| epoch 015: 2000 / 3233 loss=7.361, nll_loss=6.172, ppl=72.10, wps=5893, ups=0.0, wpb=354225, bsz=12302, num_updates=1476, lr=0.000184563, gnorm=0.056, clip=100%, oom=0, wall=91836
| epoch 015: 3000 / 3233 loss=7.345, nll_loss=6.152, ppl=71.13, wps=5911, ups=0.0, wpb=354186, bsz=12383, num_updates=1507, lr=0.000188437, gnorm=0.056, clip=100%, oom=0, wall=93682
| epoch 015 | loss 7.339 | nll_loss 6.146 | ppl 70.81 | wps 5909 | ups 0.0 | wpb 354076 | bsz 12402 | num_updates 1515 | lr 0.000189437 | gnorm 0.056 | clip 100% | oom 0 | wall 94162
| epoch 015 | valid on 'valid' subset | valid_loss 7.52435 | valid_nll_loss 6.31967 | valid_ppl 79.87 | num_updates 1515 | best 7.52435
| epoch 016: 1000 / 3233 loss=7.255, nll_loss=6.048, ppl=66.15, wps=5807, ups=0.0, wpb=354652, bsz=12471, num_updates=1546, lr=0.000193311, gnorm=0.055, clip=100%, oom=0, wall=96055
| epoch 016: 2000 / 3233 loss=7.255, nll_loss=6.049, ppl=66.20, wps=5875, ups=0.0, wpb=354583, bsz=12404, num_updates=1577, lr=0.000197186, gnorm=0.055, clip=100%, oom=0, wall=97904
| epoch 016: 3000 / 3233 loss=7.239, nll_loss=6.030, ppl=65.36, wps=5903, ups=0.0, wpb=354089, bsz=12387, num_updates=1608, lr=0.00020106, gnorm=0.055, clip=100%, oom=0, wall=99740
| epoch 016 | loss 7.233 | nll_loss 6.023 | ppl 65.04 | wps 5907 | ups 0.0 | wpb 354084 | bsz 12409 | num_updates 1616 | lr 0.00020206 | gnorm 0.055 | clip 100% | oom 0 | wall 100216
| epoch 016 | valid on 'valid' subset | valid_loss 7.42005 | valid_nll_loss 6.19589 | valid_ppl 73.31 | num_updates 1616 | best 7.42005
| epoch 017: 1000 / 3233 loss=7.121, nll_loss=5.895, ppl=59.51, wps=5823, ups=0.0, wpb=354513, bsz=12419, num_updates=1647, lr=0.000205934, gnorm=0.055, clip=100%, oom=0, wall=102103
| epoch 017: 2000 / 3233 loss=7.109, nll_loss=5.880, ppl=58.91, wps=5885, ups=0.0, wpb=354360, bsz=12361, num_updates=1678, lr=0.000209808, gnorm=0.054, clip=100%, oom=0, wall=103949
| epoch 017: 3000 / 3233 loss=7.097, nll_loss=5.867, ppl=58.38, wps=5901, ups=0.0, wpb=354188, bsz=12419, num_updates=1709, lr=0.000213682, gnorm=0.054, clip=100%, oom=0, wall=105797
| epoch 017 | loss 7.092 | nll_loss 5.862 | ppl 58.15 | wps 5902 | ups 0.0 | wpb 354077 | bsz 12407 | num_updates 1717 | lr 0.000214682 | gnorm 0.054 | clip 100% | oom 0 | wall 106275
| epoch 017 | valid on 'valid' subset | valid_loss 7.40647 | valid_nll_loss 6.18115 | valid_ppl 72.56 | num_updates 1717 | best 7.40647
| epoch 018: 1000 / 3233 loss=6.992, nll_loss=5.747, ppl=53.69, wps=5843, ups=0.0, wpb=354902, bsz=12576, num_updates=1748, lr=0.000218556, gnorm=0.053, clip=100%, oom=0, wall=108158
| epoch 018: 2000 / 3233 loss=6.998, nll_loss=5.753, ppl=53.94, wps=5900, ups=0.0, wpb=354357, bsz=12461, num_updates=1779, lr=0.000222431, gnorm=0.053, clip=100%, oom=0, wall=109999
| epoch 018: 3000 / 3233 loss=7.012, nll_loss=5.768, ppl=54.50, wps=5922, ups=0.0, wpb=354203, bsz=12424, num_updates=1810, lr=0.000226305, gnorm=0.053, clip=100%, oom=0, wall=111838
| epoch 018 | loss 7.008 | nll_loss 5.763 | ppl 54.31 | wps 5925 | ups 0.0 | wpb 354076 | bsz 12403 | num_updates 1818 | lr 0.000227305 | gnorm 0.053 | clip 100% | oom 0 | wall 112311
| epoch 018 | valid on 'valid' subset | valid_loss 7.29706 | valid_nll_loss 6.05298 | valid_ppl 66.39 | num_updates 1818 | best 7.29706
| epoch 019: 1000 / 3233 loss=6.907, nll_loss=5.647, ppl=50.12, wps=5838, ups=0.0, wpb=354287, bsz=12525, num_updates=1849, lr=0.000231179, gnorm=0.053, clip=100%, oom=0, wall=114192
| epoch 019: 2000 / 3233 loss=6.905, nll_loss=5.645, ppl=50.03, wps=5907, ups=0.0, wpb=353994, bsz=12407, num_updates=1880, lr=0.000235053, gnorm=0.052, clip=100%, oom=0, wall=116026
| epoch 019: 3000 / 3233 loss=6.897, nll_loss=5.635, ppl=49.70, wps=5929, ups=0.0, wpb=354114, bsz=12414, num_updates=1911, lr=0.000238927, gnorm=0.052, clip=100%, oom=0, wall=117866
| epoch 019 | loss 6.896 | nll_loss 5.633 | ppl 49.64 | wps 5930 | ups 0.0 | wpb 354080 | bsz 12409 | num_updates 1919 | lr 0.000239927 | gnorm 0.052 | clip 100% | oom 0 | wall 118342
| epoch 019 | valid on 'valid' subset | valid_loss 7.34923 | valid_nll_loss 6.09578 | valid_ppl 68.39 | num_updates 1919 | best 7.29706
| epoch 020: 1000 / 3233 loss=6.862, nll_loss=5.595, ppl=48.35, wps=5944, ups=0.0, wpb=354318, bsz=12570, num_updates=1950, lr=0.000243801, gnorm=0.052, clip=100%, oom=0, wall=120190
| epoch 020: 2000 / 3233 loss=6.847, nll_loss=5.576, ppl=47.71, wps=5947, ups=0.0, wpb=354212, bsz=12422, num_updates=1981, lr=0.000247675, gnorm=0.051, clip=100%, oom=0, wall=122034
| epoch 020: 3000 / 3233 loss=6.830, nll_loss=5.557, ppl=47.07, wps=5953, ups=0.0, wpb=354120, bsz=12380, num_updates=2012, lr=0.00025155, gnorm=0.051, clip=100%, oom=0, wall=123874
| epoch 020 | loss 6.826 | nll_loss 5.553 | ppl 46.95 | wps 5953 | ups 0.0 | wpb 354075 | bsz 12401 | num_updates 2020 | lr 0.00025255 | gnorm 0.051 | clip 100% | oom 0 | wall 124349
| epoch 020 | valid on 'valid' subset | valid_loss 7.1458 | valid_nll_loss 5.88021 | valid_ppl 58.90 | num_updates 2020 | best 7.1458
| epoch 021: 1000 / 3233 loss=6.773, nll_loss=5.491, ppl=44.98, wps=5842, ups=0.0, wpb=354140, bsz=12540, num_updates=2051, lr=0.000256424, gnorm=0.051, clip=100%, oom=0, wall=126229
| epoch 021: 2000 / 3233 loss=6.772, nll_loss=5.490, ppl=44.94, wps=5904, ups=0.0, wpb=354027, bsz=12409, num_updates=2082, lr=0.000260298, gnorm=0.051, clip=100%, oom=0, wall=128067
| epoch 021: 3000 / 3233 loss=6.769, nll_loss=5.486, ppl=44.83, wps=5931, ups=0.0, wpb=354025, bsz=12411, num_updates=2113, lr=0.000264172, gnorm=0.051, clip=100%, oom=0, wall=129901
| epoch 021 | loss 6.770 | nll_loss 5.487 | ppl 44.84 | wps 5931 | ups 0.0 | wpb 354080 | bsz 12410 | num_updates 2121 | lr 0.000265172 | gnorm 0.051 | clip 100% | oom 0 | wall 130380
| epoch 021 | valid on 'valid' subset | valid_loss 7.16764 | valid_nll_loss 5.8903 | valid_ppl 59.31 | num_updates 2121 | best 7.1458
| epoch 022: 1000 / 3233 loss=6.714, nll_loss=5.421, ppl=42.84, wps=5982, ups=0.0, wpb=354722, bsz=12517, num_updates=2152, lr=0.000269046, gnorm=0.050, clip=100%, oom=0, wall=132218
| epoch 022: 2000 / 3233 loss=6.710, nll_loss=5.417, ppl=42.73, wps=5978, ups=0.0, wpb=354171, bsz=12400, num_updates=2183, lr=0.00027292, gnorm=0.050, clip=100%, oom=0, wall=134053
| epoch 022: 3000 / 3233 loss=6.692, nll_loss=5.396, ppl=42.10, wps=5974, ups=0.0, wpb=354217, bsz=12443, num_updates=2214, lr=0.000276795, gnorm=0.050, clip=100%, oom=0, wall=135894
| epoch 022 | loss 6.688 | nll_loss 5.392 | ppl 41.99 | wps 5973 | ups 0.0 | wpb 354084 | bsz 12406 | num_updates 2222 | lr 0.000277794 | gnorm 0.050 | clip 100% | oom 0 | wall 136367
| epoch 022 | valid on 'valid' subset | valid_loss 7.07068 | valid_nll_loss 5.7786 | valid_ppl 54.89 | num_updates 2222 | best 7.07068
| epoch 023: 1000 / 3233 loss=6.675, nll_loss=5.377, ppl=41.57, wps=5730, ups=0.0, wpb=354406, bsz=12433, num_updates=2253, lr=0.000281669, gnorm=0.050, clip=100%, oom=0, wall=138284
| epoch 023: 2000 / 3233 loss=6.670, nll_loss=5.371, ppl=41.39, wps=5847, ups=0.0, wpb=354350, bsz=12440, num_updates=2284, lr=0.000285543, gnorm=0.050, clip=100%, oom=0, wall=140124
| epoch 023: 3000 / 3233 loss=6.652, nll_loss=5.351, ppl=40.80, wps=5882, ups=0.0, wpb=354071, bsz=12428, num_updates=2315, lr=0.000289417, gnorm=0.049, clip=100%, oom=0, wall=141965
| epoch 023 | loss 6.648 | nll_loss 5.346 | ppl 40.67 | wps 5888 | ups 0.0 | wpb 354078 | bsz 12403 | num_updates 2323 | lr 0.000290417 | gnorm 0.049 | clip 100% | oom 0 | wall 142440
| epoch 023 | valid on 'valid' subset | valid_loss 7.16557 | valid_nll_loss 5.88419 | valid_ppl 59.06 | num_updates 2323 | best 7.07068
| epoch 024: 1000 / 3233 loss=6.584, nll_loss=5.273, ppl=38.67, wps=5712, ups=0.0, wpb=355211, bsz=12619, num_updates=2354, lr=0.000294291, gnorm=0.049, clip=100%, oom=0, wall=144368
| epoch 024: 2000 / 3233 loss=6.567, nll_loss=5.253, ppl=38.13, wps=5823, ups=0.0, wpb=354272, bsz=12479, num_updates=2385, lr=0.000298165, gnorm=0.049, clip=100%, oom=0, wall=146212
| epoch 024: 3000 / 3233 loss=6.551, nll_loss=5.235, ppl=37.65, wps=5865, ups=0.0, wpb=354074, bsz=12398, num_updates=2416, lr=0.00030204, gnorm=0.049, clip=100%, oom=0, wall=148055
| epoch 024 | loss 6.547 | nll_loss 5.230 | ppl 37.53 | wps 5875 | ups 0.0 | wpb 354072 | bsz 12407 | num_updates 2424 | lr 0.000303039 | gnorm 0.049 | clip 100% | oom 0 | wall 148527
| epoch 024 | valid on 'valid' subset | valid_loss 6.9938 | valid_nll_loss 5.6895 | valid_ppl 51.61 | num_updates 2424 | best 6.9938
| epoch 025: 1000 / 3233 loss=6.519, nll_loss=5.197, ppl=36.68, wps=5539, ups=0.0, wpb=354591, bsz=12320, num_updates=2455, lr=0.000306914, gnorm=0.048, clip=100%, oom=0, wall=150511
| epoch 025: 2000 / 3233 loss=6.510, nll_loss=5.187, ppl=36.42, wps=5747, ups=0.0, wpb=354556, bsz=12510, num_updates=2486, lr=0.000310788, gnorm=0.048, clip=100%, oom=0, wall=152351
| epoch 025: 3000 / 3233 loss=6.505, nll_loss=5.181, ppl=36.27, wps=5824, ups=0.0, wpb=354154, bsz=12387, num_updates=2517, lr=0.000314662, gnorm=0.048, clip=100%, oom=0, wall=154182
| epoch 025 | loss 6.503 | nll_loss 5.178 | ppl 36.21 | wps 5835 | ups 0.0 | wpb 354081 | bsz 12406 | num_updates 2525 | lr 0.000315662 | gnorm 0.048 | clip 100% | oom 0 | wall 154656
| epoch 025 | valid on 'valid' subset | valid_loss 7.07606 | valid_nll_loss 5.7883 | valid_ppl 55.27 | num_updates 2525 | best 6.9938
| epoch 026: 1000 / 3233 loss=6.498, nll_loss=5.170, ppl=36.00, wps=5686, ups=0.0, wpb=354570, bsz=12456, num_updates=2556, lr=0.000319536, gnorm=0.048, clip=100%, oom=0, wall=156589
| epoch 026: 2000 / 3233 loss=6.480, nll_loss=5.150, ppl=35.51, wps=5824, ups=0.0, wpb=354130, bsz=12457, num_updates=2587, lr=0.00032341, gnorm=0.048, clip=100%, oom=0, wall=158426
| epoch 026: 3000 / 3233 loss=6.464, nll_loss=5.132, ppl=35.06, wps=5871, ups=0.0, wpb=354042, bsz=12388, num_updates=2618, lr=0.000327285, gnorm=0.047, clip=100%, oom=0, wall=160263
| epoch 026 | loss 6.463 | nll_loss 5.131 | ppl 35.05 | wps 5876 | ups 0.0 | wpb 354075 | bsz 12405 | num_updates 2626 | lr 0.000328284 | gnorm 0.047 | clip 100% | oom 0 | wall 160741
| epoch 026 | valid on 'valid' subset | valid_loss 7.00241 | valid_nll_loss 5.69371 | valid_ppl 51.76 | num_updates 2626 | best 6.9938
| epoch 027: 1000 / 3233 loss=6.433, nll_loss=5.096, ppl=34.21, wps=5779, ups=0.0, wpb=354342, bsz=12458, num_updates=2657, lr=0.000332159, gnorm=0.047, clip=100%, oom=0, wall=162642
| epoch 027: 2000 / 3233 loss=6.424, nll_loss=5.087, ppl=33.98, wps=5877, ups=0.0, wpb=354336, bsz=12473, num_updates=2688, lr=0.000336033, gnorm=0.047, clip=100%, oom=0, wall=164480
| epoch 027: 3000 / 3233 loss=6.412, nll_loss=5.073, ppl=33.67, wps=5916, ups=0.0, wpb=354072, bsz=12365, num_updates=2719, lr=0.000339907, gnorm=0.047, clip=100%, oom=0, wall=166307
| epoch 027 | loss 6.408 | nll_loss 5.069 | ppl 33.56 | wps 5920 | ups 0.0 | wpb 354080 | bsz 12407 | num_updates 2727 | lr 0.000340907 | gnorm 0.047 | clip 100% | oom 0 | wall 166783
| epoch 027 | valid on 'valid' subset | valid_loss 7.00674 | valid_nll_loss 5.69114 | valid_ppl 51.67 | num_updates 2727 | best 6.9938
| epoch 028: 1000 / 3233 loss=6.370, nll_loss=5.025, ppl=32.55, wps=5619, ups=0.0, wpb=354531, bsz=12408, num_updates=2758, lr=0.000344781, gnorm=0.047, clip=100%, oom=0, wall=168739
| epoch 028: 2000 / 3233 loss=6.350, nll_loss=5.003, ppl=32.06, wps=5788, ups=0.0, wpb=354237, bsz=12421, num_updates=2789, lr=0.000348655, gnorm=0.047, clip=100%, oom=0, wall=170577
| epoch 028: 3000 / 3233 loss=6.339, nll_loss=4.989, ppl=31.76, wps=5847, ups=0.0, wpb=354021, bsz=12377, num_updates=2820, lr=0.00035253, gnorm=0.046, clip=100%, oom=0, wall=172413
| epoch 028 | loss 6.335 | nll_loss 4.985 | ppl 31.68 | wps 5858 | ups 0.0 | wpb 354081 | bsz 12406 | num_updates 2828 | lr 0.000353529 | gnorm 0.046 | clip 100% | oom 0 | wall 172887
| epoch 028 | valid on 'valid' subset | valid_loss 6.93591 | valid_nll_loss 5.6087 | valid_ppl 48.80 | num_updates 2828 | best 6.93591
| epoch 029: 1000 / 3233 loss=6.299, nll_loss=4.943, ppl=30.75, wps=4618, ups=0.0, wpb=354427, bsz=12289, num_updates=2859, lr=0.000357404, gnorm=0.046, clip=100%, oom=0, wall=175266
| epoch 029: 2000 / 3233 loss=6.302, nll_loss=4.947, ppl=30.86, wps=5194, ups=0.0, wpb=353905, bsz=12391, num_updates=2890, lr=0.000361278, gnorm=0.046, clip=100%, oom=0, wall=177112
| epoch 029: 3000 / 3233 loss=6.298, nll_loss=4.942, ppl=30.74, wps=5427, ups=0.0, wpb=354064, bsz=12367, num_updates=2921, lr=0.000365152, gnorm=0.046, clip=100%, oom=0, wall=178954
| epoch 029 | loss 6.296 | nll_loss 4.941 | ppl 30.71 | wps 5464 | ups 0.0 | wpb 354080 | bsz 12404 | num_updates 2929 | lr 0.000366152 | gnorm 0.046 | clip 100% | oom 0 | wall 179432
| epoch 029 | valid on 'valid' subset | valid_loss 6.89303 | valid_nll_loss 5.56065 | valid_ppl 47.20 | num_updates 2929 | best 6.89303
| epoch 030: 1000 / 3233 loss=6.280, nll_loss=4.923, ppl=30.33, wps=5744, ups=0.0, wpb=354019, bsz=12389, num_updates=2960, lr=0.000370026, gnorm=0.046, clip=100%, oom=0, wall=181342
| epoch 030: 2000 / 3233 loss=6.303, nll_loss=4.947, ppl=30.85, wps=5860, ups=0.0, wpb=354006, bsz=12420, num_updates=2991, lr=0.0003739, gnorm=0.046, clip=100%, oom=0, wall=183177
| epoch 030: 3000 / 3233 loss=6.284, nll_loss=4.926, ppl=30.40, wps=5887, ups=0.0, wpb=353978, bsz=12393, num_updates=3022, lr=0.000377774, gnorm=0.046, clip=100%, oom=0, wall=185024
| epoch 030 | loss 6.279 | nll_loss 4.920 | ppl 30.27 | wps 5891 | ups 0.0 | wpb 354073 | bsz 12407 | num_updates 3030 | lr 0.000378774 | gnorm 0.045 | clip 100% | oom 0 | wall 185502
| epoch 030 | valid on 'valid' subset | valid_loss 6.93997 | valid_nll_loss 5.62386 | valid_ppl 49.31 | num_updates 3030 | best 6.89303
| epoch 031: 1000 / 3233 loss=6.257, nll_loss=4.896, ppl=29.78, wps=5858, ups=0.0, wpb=353883, bsz=12261, num_updates=3061, lr=0.000382648, gnorm=0.045, clip=100%, oom=0, wall=187375
| epoch 031: 2000 / 3233 loss=6.231, nll_loss=4.867, ppl=29.18, wps=5919, ups=0.0, wpb=354165, bsz=12237, num_updates=3092, lr=0.000386523, gnorm=0.045, clip=100%, oom=0, wall=189212
| epoch 031: 3000 / 3233 loss=6.215, nll_loss=4.848, ppl=28.80, wps=5932, ups=0.0, wpb=354103, bsz=12399, num_updates=3123, lr=0.000390397, gnorm=0.045, clip=100%, oom=0, wall=191054
| epoch 031 | loss 6.215 | nll_loss 4.848 | ppl 28.80 | wps 5933 | ups 0.0 | wpb 354085 | bsz 12406 | num_updates 3131 | lr 0.000391397 | gnorm 0.045 | clip 100% | oom 0 | wall 191530
| epoch 031 | valid on 'valid' subset | valid_loss 6.88285 | valid_nll_loss 5.55712 | valid_ppl 47.08 | num_updates 3131 | best 6.88285
| epoch 032: 1000 / 3233 loss=6.207, nll_loss=4.838, ppl=28.59, wps=5753, ups=0.0, wpb=354298, bsz=12678, num_updates=3162, lr=0.000395271, gnorm=0.045, clip=100%, oom=0, wall=193439
| epoch 032: 2000 / 3233 loss=6.200, nll_loss=4.829, ppl=28.43, wps=5852, ups=0.0, wpb=353654, bsz=12510, num_updates=3193, lr=0.000399145, gnorm=0.045, clip=100%, oom=0, wall=195277
| epoch 032: 3000 / 3233 loss=6.187, nll_loss=4.815, ppl=28.14, wps=5902, ups=0.0, wpb=354155, bsz=12366, num_updates=3224, lr=0.000403019, gnorm=0.045, clip=100%, oom=0, wall=197111
| epoch 032 | loss 6.185 | nll_loss 4.813 | ppl 28.10 | wps 5905 | ups 0.0 | wpb 354069 | bsz 12402 | num_updates 3232 | lr 0.000404019 | gnorm 0.045 | clip 100% | oom 0 | wall 197586
| epoch 032 | valid on 'valid' subset | valid_loss 6.86361 | valid_nll_loss 5.53661 | valid_ppl 46.42 | num_updates 3232 | best 6.86361
| epoch 033: 1000 / 3233 loss=6.137, nll_loss=4.759, ppl=27.07, wps=5765, ups=0.0, wpb=354781, bsz=12449, num_updates=3263, lr=0.000407893, gnorm=0.044, clip=100%, oom=0, wall=199494
| epoch 033: 2000 / 3233 loss=6.140, nll_loss=4.762, ppl=27.14, wps=5859, ups=0.0, wpb=354377, bsz=12471, num_updates=3294, lr=0.000411768, gnorm=0.044, clip=100%, oom=0, wall=201336
| epoch 033: 3000 / 3233 loss=6.175, nll_loss=4.802, ppl=27.89, wps=5895, ups=0.0, wpb=354103, bsz=12411, num_updates=3325, lr=0.000415642, gnorm=0.044, clip=100%, oom=0, wall=203172
| epoch 033 | loss 6.173 | nll_loss 4.799 | ppl 27.84 | wps 5904 | ups 0.0 | wpb 354087 | bsz 12410 | num_updates 3333 | lr 0.000416642 | gnorm 0.044 | clip 100% | oom 0 | wall 203644
| epoch 033 | valid on 'valid' subset | valid_loss 7.01394 | valid_nll_loss 5.68997 | valid_ppl 51.62 | num_updates 3333 | best 6.86361
| epoch 034: 1000 / 3233 loss=6.165, nll_loss=4.789, ppl=27.65, wps=5480, ups=0.0, wpb=354177, bsz=12417, num_updates=3364, lr=0.000420516, gnorm=0.044, clip=100%, oom=0, wall=205647
| epoch 034: 2000 / 3233 loss=6.163, nll_loss=4.786, ppl=27.60, wps=5723, ups=0.0, wpb=354002, bsz=12304, num_updates=3395, lr=0.00042439, gnorm=0.044, clip=100%, oom=0, wall=207478
| epoch 034: 3000 / 3233 loss=6.157, nll_loss=4.780, ppl=27.48, wps=5805, ups=0.0, wpb=354093, bsz=12375, num_updates=3426, lr=0.000428264, gnorm=0.044, clip=100%, oom=0, wall=209317
| epoch 034 | loss 6.156 | nll_loss 4.778 | ppl 27.45 | wps 5815 | ups 0.0 | wpb 354079 | bsz 12403 | num_updates 3434 | lr 0.000429264 | gnorm 0.044 | clip 100% | oom 0 | wall 209793
| epoch 034 | valid on 'valid' subset | valid_loss 6.91425 | valid_nll_loss 5.58023 | valid_ppl 47.84 | num_updates 3434 | best 6.86361
| epoch 035: 1000 / 3233 loss=6.107, nll_loss=4.723, ppl=26.42, wps=5674, ups=0.0, wpb=353891, bsz=12404, num_updates=3465, lr=0.000433138, gnorm=0.044, clip=100%, oom=0, wall=211727
| epoch 035: 2000 / 3233 loss=6.099, nll_loss=4.715, ppl=26.26, wps=5817, ups=0.0, wpb=353547, bsz=12387, num_updates=3496, lr=0.000437013, gnorm=0.044, clip=100%, oom=0, wall=213562
| epoch 035: 3000 / 3233 loss=6.102, nll_loss=4.719, ppl=26.34, wps=5876, ups=0.0, wpb=354117, bsz=12396, num_updates=3527, lr=0.000440887, gnorm=0.044, clip=100%, oom=0, wall=215398
| epoch 035 | loss 6.104 | nll_loss 4.721 | ppl 26.37 | wps 5882 | ups 0.0 | wpb 354074 | bsz 12406 | num_updates 3535 | lr 0.000441887 | gnorm 0.044 | clip 100% | oom 0 | wall 215873
| epoch 035 | valid on 'valid' subset | valid_loss 6.84885 | valid_nll_loss 5.49734 | valid_ppl 45.17 | num_updates 3535 | best 6.84885
| epoch 036: 1000 / 3233 loss=6.082, nll_loss=4.694, ppl=25.88, wps=5389, ups=0.0, wpb=353959, bsz=12312, num_updates=3566, lr=0.000445761, gnorm=0.043, clip=100%, oom=0, wall=217909
| epoch 036: 2000 / 3233 loss=6.069, nll_loss=4.680, ppl=25.64, wps=5679, ups=0.0, wpb=354198, bsz=12507, num_updates=3597, lr=0.000449635, gnorm=0.043, clip=100%, oom=0, wall=219740
| epoch 036: 3000 / 3233 loss=6.068, nll_loss=4.679, ppl=25.62, wps=5775, ups=0.0, wpb=354055, bsz=12419, num_updates=3628, lr=0.000453509, gnorm=0.043, clip=100%, oom=0, wall=221575