
Validation perplexity is 146.71 at the end of training (24 epochs) #3

Open
ygoncharov opened this issue Feb 7, 2016 · 12 comments

@ygoncharov

(it should get ~82 on valid and ~79 on test)

$ python main.py --dataset ptb

.....

epoch: [24] [ 250/ 265] loss: 3.466149
Valid: loss: 5.225354, perplexity: 185.927017
{'perplexity': 83.749542031012467, 'epoch': 24, 'valid_perplexity': 146.71359295576036, 'learning_rate': 0.5}
[] Saving checkpoints...
Test: loss: 4.836956, perplexity: 126.084908
[] Test loss: 4.954320, perplexity: 141.786226
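
(Note: the perplexities in the log are just exp of the corresponding average cross-entropy losses; a quick sanity check in Python, using the test loss reported above:)

```python
import math

# Value copied from the log output above.
test_loss = 4.954320
print(math.exp(test_loss))  # ~141.79, matching the reported test perplexity
```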

@carpedm20
Owner

I'm working on this issue, and I don't think the current implementation differs from the original model. I checked the model's validity by comparing the losses on a single batch during the early epochs, and there were no differences. I also checked that the perplexity on the training set goes down to 90.

[figure: training loss curve]

One thing I'm working on is changing the testing algorithm, which differs from the original. The original code calculates the perplexity over all test data in a single forward pass, whereas this repo calculates the perplexity of the test data the same way as the training data, i.e. batch-averaged perplexity. This should reduce the perplexity somewhat, but I'm not sure it will make the results comparable.

If you find any other differences, feel free to share them with me 😄
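
To make the distinction above concrete, here is a small numpy sketch (illustrative only, not this repo's actual evaluation code) contrasting one reading of batch-averaged perplexity with whole-corpus perplexity; the losses and token counts are made up:

```python
import numpy as np

# Hypothetical per-batch average cross-entropy losses and token counts on the test set.
batch_losses = np.array([4.9, 5.1, 4.7])
batch_tokens = np.array([700, 700, 350])

# One reading of "batch-averaged perplexity": exponentiate each batch's loss, then average.
batch_averaged_ppl = np.mean(np.exp(batch_losses))

# Corpus-level perplexity (the original code's convention, per the comment above):
# exponentiate the token-weighted average loss over the whole test set in one pass.
corpus_ppl = np.exp(np.sum(batch_losses * batch_tokens) / np.sum(batch_tokens))

print(batch_averaged_ppl, corpus_ppl)  # the two conventions generally give different numbers
```

Because the two conventions aggregate the losses differently, numbers computed one way are not directly comparable to numbers computed the other way.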

@carpedm20 carpedm20 added the bug label Feb 7, 2016
@carpedm20 carpedm20 self-assigned this Feb 7, 2016
@yoonkim

yoonkim commented Feb 8, 2016

Cool stuff!
I noticed in the README that you are using 100/150 hidden units for the small/large models respectively. I actually use 300/650 hidden units, so this might explain the difference in performance. Also, it seems like you are using RMSProp? I've found vanilla SGD with a starting learning rate of 1.0 (halved every time the perplexity does not improve on the dev set) to work much better than other optimization methods, including RMSProp.

Hope this helps.
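
A minimal sketch of the schedule described above, assuming hypothetical train_one_epoch and evaluate_perplexity helpers and in-scope model/data objects (not this repo's actual functions):

```python
# SGD with a starting learning rate of 1.0, halved whenever the
# validation perplexity fails to improve (as described above).
learning_rate = 1.0
best_valid_ppl = float("inf")
max_epochs = 25  # assumed value

for epoch in range(max_epochs):
    train_one_epoch(model, train_data, learning_rate)    # assumed helper
    valid_ppl = evaluate_perplexity(model, valid_data)   # assumed helper

    if valid_ppl < best_valid_ppl:
        best_valid_ppl = valid_ppl
    else:
        learning_rate /= 2.0  # halve the learning rate on a dev-set plateau
```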

@carpedm20
Owner

@yoonkim Hi! Thanks for sharing your great work; I enjoyed the paper very much! Actually, the README was outdated and I had forgotten to update it (it's fixed now); the code already uses the same hidden units, optimizer, and decay you mentioned.

@yoonkim

yoonkim commented Feb 8, 2016

Ah ok! A few other things to check may be:

  • batch size
  • parameter initialization (a rough sketch of the latter follows below)
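
For the second bullet, a rough numpy sketch of uniform parameter initialization; the 0.05 range and the shapes are assumptions for illustration (the original Torch code exposes a similar param_init setting), not values read from this repo:

```python
import numpy as np

init_scale = 0.05  # assumed initialization range, not taken from this repo

# Hypothetical parameter shapes, for illustration only.
lstm_weights = np.random.uniform(-init_scale, init_scale, size=(650, 4 * 650))
char_embeddings = np.random.uniform(-init_scale, init_scale, size=(51, 15))
```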

@carpedm20
Owner

Thanks! I'll dig into those. Also, what was the perplexity on the training set at the end of training?

@yoonkim

yoonkim commented Feb 8, 2016

I think it should be a lot lower. I don't recall the numbers exactly, but since the dataset is small and the model has a lot of capacity (even with dropout), training PPL should be well below 50.

@nileshkulkarni

nileshkulkarni commented Jun 2, 2016

@carpedm20 Hi,
Did you find any pointers on this issue of high test perplexity? I was trying to debug it, and any help would be appreciated.

@yss4

yss4 commented Jul 4, 2016

@carpedm20 Hello, thanks for sharing your code on GitHub. I also noticed that the problem of high perplexity on the PTB test set is still unresolved. Have you had a chance to deal with this issue, or do you have any pointers for fixing it? Thanks in advance.

@carpedm20
Owner

@nileshkulkarni @yss4 No, I haven't found the cause of the problem yet, and I'm not working on this project right now. But if you notice any code that differs from the original paper, please share it and I'll take a look.

@mkroutikov

mkroutikov commented Sep 16, 2016

@carpedm20 This implementation is NOT identical to the original.

Interested readers can have a look at my code here:
https://github.com/mkroutikov/tf-lstm-char-cnn
which does reproduce Yoon Kim's result in TF.

@hejunqing

I ran the code yesterday and got an averaged validation PPL of 156.097 and an averaged test PPL of 149.565. So I am reading your code and the original. The first difference I found is the criterion: yours is cross-entropy (CE) while the original uses NLL. Does it matter?
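
On the CE-vs-NLL question: with a softmax output and index (one-hot) targets, cross-entropy and the negative log-likelihood of the log-softmax are the same quantity, so the naming difference alone should not change the loss. A minimal numpy check:

```python
import numpy as np

logits = np.array([2.0, 0.5, -1.0])  # scores for a toy 3-word vocabulary
target = 1                           # index of the true next word

# Cross-entropy computed from softmax probabilities.
probs = np.exp(logits) / np.sum(np.exp(logits))
ce = -np.log(probs[target])

# NLL computed from log-softmax outputs.
log_probs = logits - np.log(np.sum(np.exp(logits)))
nll = -log_probs[target]

print(ce, nll)  # identical up to floating-point error
```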

@guanghuixu

Thanks for sharing your code. I want to know how I can train a model at the word level. I found your code has settings like (use_char = True, use_word = False). Is it enough to set use_word = True? Looking forward to your answer, thank you.
