This repository has been archived by the owner on Nov 17, 2023. It is now read-only.
Validation perplexity on a word language model went up from 127 to 186 with the 02/21 nightly build. I tested with the cu90 PyPI package, version 1.5.0b20190221.
Looking into the commits that went in led me to the change to the softmax accumulation type in #14098.
I reverted commit 862cbc6 on top of 1.5.0b20190221 and built a static library using the static library build script; as expected, validation perplexity came back down to 127.
Another change on 03/12, #14219, changed the default data type to fp32, which reduced the validation perplexity to 140. This was an unintended consequence and not the right fix.
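For readers unfamiliar with why an accumulation type can matter at all, here is a toy sketch in plain Python. It is not the MXNet kernel and does not reproduce the change in #14098; the helper names `round_to` and `softmax_denominator` are hypothetical, and the sequential summation order is an assumption made only to keep the effect easy to see. It rounds the running sum of a softmax denominator to half precision (`'e'`) or single precision (`'f'`) after every addition.

```python
import math
import struct


def round_to(fmt, x):
    """Round a Python float to the IEEE format named by a struct code
    ('e' = float16, 'f' = float32) and return it as a Python float."""
    return struct.unpack(fmt, struct.pack(fmt, x))[0]


def softmax_denominator(logits, fmt):
    """Sequentially accumulate sum(exp(x - max)) with the running total
    rounded to the given format after each addition (illustrative only)."""
    m = max(logits)
    total = 0.0
    for x in logits:
        term = round_to(fmt, math.exp(x - m))
        total = round_to(fmt, total + term)
    return total


# 4096 equal logits: every exp(x - max) term is exactly 1.0.
logits = [0.0] * 4096
denom_fp16 = softmax_denominator(logits, "e")
denom_fp32 = softmax_denominator(logits, "f")
print(denom_fp16, denom_fp32)
```

With a half-precision accumulator the running sum stops growing once it reaches 2048, because 2049 is not representable in float16 and rounds back to 2048, so every probability comes out twice as large as it should. A single-precision accumulator gives the exact sum for this input.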
Environment info (Required)
Deep Learning Ubuntu Base AMI id: ami-0ff00f007c727c376
Instance Type: P2.16X
What to do:
Create a P2.16X EC2 instance or similar with the AMI ID above (the latest when I tested).
Run the script using `python word_language_model/word_language_model.py --gpus 8 --nhid 650 --emsize 650 --dropout 0.5 --epochs 40 --data word_language_model/data/ptb. --mode imperative --kvstore device`
Run the same command again after installing the newer build with `sudo pip install mxnet-cu90==1.5.0b20190221`.
Output:
Version: 1.5.0b20190220
hostname ; pip show mxnet-cu90 ; tail -n 10 lstm_ptb_imperative.log
ip-172-31-29-103
DEPRECATION: Python 2.7 will reach the end of its life on January 1st, 2020. Please upgrade your Python as Python 2.7 won't be maintained after that date. A future version of pip will drop support for Python 2.7.
Name: mxnet-cu90
Version: 1.5.0b20190220
INFO:root:[Epoch 38] time cost 26.20s, valid loss 4.84, valid ppl 126.83
INFO:root:test loss 4.79, test ppl 120.65
INFO:root:[Epoch 39] time cost 26.03s, valid loss 4.84, valid ppl 126.66
INFO:root:test loss 4.79, test ppl 120.26
INFO:root:Best test loss 4.79, test ppl 120.26
Version: 1.5.0b20190221
hostname ; pip show mxnet-cu90 ; tail -n 10 lstm_ptb_imperative.log
ip-172-31-63-76
Name: mxnet-cu90
Version: 1.5.0b20190221
Summary: MXNet is an ultra-scalable deep learning framework. This version uses CUDA-9.0.
Home-page: https://github.com/apache/incubator-mxnet
Author: UNKNOWN
Author-email: UNKNOWN
License: Apache 2.0
Location: /usr/local/lib/python2.7/dist-packages
Requires: numpy, requests, graphviz
You are using pip version 9.0.1, however version 19.0.3 is available.
You should consider upgrading via the 'pip install --upgrade pip' command.
INFO:root:test loss 5.23, test ppl 186.97
INFO:root:[Epoch 36] time cost 25.76s, valid loss 5.27, valid ppl 194.12
INFO:root:test loss 5.23, test ppl 186.37
INFO:root:[Epoch 37] time cost 25.64s, valid loss 5.24, valid ppl 189.45
INFO:root:test loss 5.20, test ppl 181.52
INFO:root:[Epoch 38] time cost 26.20s, valid loss 5.24, valid ppl 189.03
INFO:root:test loss 5.20, test ppl 180.95
INFO:root:[Epoch 39] time cost 25.59s, valid loss 5.23, valid ppl 185.90
INFO:root:test loss 5.18, test ppl 177.53
INFO:root:Best test loss 5.18, test ppl 177.53
Version: 1.5.0b20190313 -> This gives a perplexity of 140.
hostname ; pip show mxnet-cu90 ; tail -n 10 lstm_ptb_imperative.log ; echo "commit-hash" ; cat /usr/local/lib/python2.7/dist-packages/mxnet/COMMIT_HASH
ip-172-31-29-103
DEPRECATION: Python 2.7 will reach the end of its life on January 1st, 2020. Please upgrade your Python as Python 2.7 won't be maintained after that date. A future version of pip will drop support for Python 2.7.
Name: mxnet-cu90
Version: 1.5.0b20190313
Summary: MXNet is an ultra-scalable deep learning framework. This version uses CUDA-9.0.
Home-page: https://github.com/apache/incubator-mxnet
Author: UNKNOWN
Author-email: UNKNOWN
License: Apache 2.0
Location: /usr/local/lib/python2.7/dist-packages
Requires: numpy, requests, graphviz
Required-by:
INFO:root:test loss 4.92, test ppl 136.69
INFO:root:[Epoch 36] time cost 25.91s, valid loss 4.96, valid ppl 142.06
INFO:root:test loss 4.91, test ppl 135.81
INFO:root:[Epoch 37] time cost 25.51s, valid loss 4.95, valid ppl 141.45
INFO:root:test loss 4.91, test ppl 135.16
INFO:root:[Epoch 38] time cost 25.57s, valid loss 4.95, valid ppl 141.29
INFO:root:test loss 4.91, test ppl 134.99
INFO:root:[Epoch 39] time cost 26.08s, valid loss 4.95, valid ppl 140.87
INFO:root:test loss 4.90, test ppl 134.48
INFO:root:Best test loss 4.90, test ppl 134.48
commit-hash
4432af1f47a439517eff9a21bef23ef7ae5e4aa4
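To compare runs without reading the log tails by eye, the per-epoch validation numbers can be pulled out with a small parser. This is a minimal sketch that assumes the log line format shown in the output above; `parse_valid_ppl` is a hypothetical helper, not part of the benchmark script. It also checks that the logged perplexity is consistent with `exp(loss)`, allowing for the loss being printed to two decimals.

```python
import math
import re

# Matches lines such as:
# INFO:root:[Epoch 39] time cost 26.03s, valid loss 4.84, valid ppl 126.66
EPOCH_RE = re.compile(
    r"\[Epoch (\d+)\] time cost [\d.]+s, "
    r"valid loss ([\d.]+), valid ppl ([\d.]+)"
)


def parse_valid_ppl(log_text):
    """Return {epoch: (valid_loss, valid_ppl)} for every epoch line found."""
    results = {}
    for match in EPOCH_RE.finditer(log_text):
        epoch = int(match.group(1))
        results[epoch] = (float(match.group(2)), float(match.group(3)))
    return results


# Sample lines copied from the 1.5.0b20190220 output above.
sample = """INFO:root:[Epoch 38] time cost 26.20s, valid loss 4.84, valid ppl 126.83
INFO:root:test loss 4.79, test ppl 120.65
INFO:root:[Epoch 39] time cost 26.03s, valid loss 4.84, valid ppl 126.66
"""

parsed = parse_valid_ppl(sample)
for epoch, (loss, ppl) in sorted(parsed.items()):
    # The printed loss is rounded, so only expect agreement to ~0.005.
    consistent = abs(math.log(ppl) - loss) < 0.005
    print(epoch, loss, ppl, consistent)
```

Running the same helper over `lstm_ptb_imperative.log` from each build gives the final-epoch values quoted in this report (126.66, 185.90, and 140.87).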
What have you tried:
Reverted the softmax commit 862cbc6 on top of the 1.5.0b20190221 build and reran the test; validation perplexity dropped back to 127.
nswamy changed the title from "Increased Validation Perplexity on word language model" to "Increased Loss/Validation Perplexity on word language model" on Apr 22, 2019.
Additional setup for the reproduction steps: install the baseline build and clone the benchmark repository with `sudo pip install mxnet-cu90==1.5.0b20190220 && git clone https://github.com/awslabs/deeplearning-benchmark`
Use the script https://github.com/awslabs/deeplearning-benchmark/blob/master/word_language_model/word_language_model_train.py
I have a PR in progress and have verified that validation perplexity drops to 127 with it:
https://github.com/apache/incubator-mxnet/compare/master...nswamy:softmax_fix?expand=1