This repository has been archived by the owner on Aug 3, 2022. It is now read-only.

Bug in CPU training (related to TensorFlow)? #2

Closed
tim5go opened this issue Jul 24, 2017 · 5 comments

Comments

@tim5go

tim5go commented Jul 24, 2017

As I observed, an out-of-vocabulary error is thrown when using "embedding_lookup".
The error looks like:

InvalidArgumentError (see above for traceback): indices[0,1,3] = 6501 is not in [0, 6342)
[[Node: model_1/embedding_lookup = Gather[Tindices=DT_INT64, Tparams=DT_FLOAT, _class=["loc:@model/embedding"], validate_indices=true, _device="/job:localhost/replica:0/task:0/cpu:0"](model/embedding/read, _recv_model_1/inputs_0)]]

See geek-ai/irgan#9

Nothing seems wrong with your code; it sounds like a known issue in TensorFlow (CPU version).
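
For reference, here is a minimal standalone sketch (assuming TensorFlow 1.x graph mode; this is not the repository's code, and the names/shapes are made up) that reproduces this class of error when the lookup is pinned to the CPU:

```python
import tensorflow as tf

vocab_size = 6342  # matches the upper bound shown in the traceback
embedding = tf.get_variable("embedding", [vocab_size, 128])

# 6501 lies outside [0, 6342), mimicking the failing index in the error message.
inputs = tf.constant([[1, 6501]], dtype=tf.int64)

with tf.device("/cpu:0"):
    embedded = tf.nn.embedding_lookup(embedding, inputs)

with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    sess.run(embedded)  # CPU Gather validates indices and raises InvalidArgumentError here
```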

@indiejoseph
Owner

Did you obtain the TFRecord training file via data_helper? Line 149 uses dict.get, so any OOV index is replaced with 0 (the UNK index); there should be no way to get an index outside the vocabulary specified in vocab.pkl.
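
To illustrate the behaviour being described, a toy sketch (hypothetical names; the actual logic lives in data_helper.py around line 149) of how dict.get keeps every emitted index inside the vocabulary:

```python
UNK_INDEX = 0  # index reserved for unknown characters

# Toy vocabulary standing in for the one loaded from vocab.pkl.
vocab = {"<unk>": 0, "香": 1, "港": 2}

def chars_to_ids(text):
    # dict.get falls back to UNK_INDEX for any out-of-vocabulary character,
    # so every emitted index stays inside [0, len(vocab)).
    return [vocab.get(ch, UNK_INDEX) for ch in text]

print(chars_to_ids("香港人"))  # -> [1, 2, 0]; "人" is OOV and maps to UNK
```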

@tim5go
Author

tim5go commented Jul 24, 2017

I used: python train.py
So I assumed it was already using data_helper.
I will try switching to a GPU environment to see whether the problem persists.

BTW, may I know the corpus size for your char-rnn model? It seems to be quite RAM-consuming.
For 2GB of Apple Daily raw text, there is a spike of about 60GB of memory consumption (although it goes down later).

It would be nice to know your original corpus size ><"

@tim5go
Author

tim5go commented Jul 24, 2017

After changing the line:
https://github.com/indiejoseph/doc-han-att/blob/master/model.py#L30
from cpu to gpu, the error goes away.

^^
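
For anyone else hitting this, the change amounts to swapping the device string on the block that wraps the embedding lookup. A hedged sketch (illustrative names only, not the repository's actual model.py; note that GPU Gather does not validate indices, so out-of-range ids typically come back as zero vectors instead of raising):

```python
import tensorflow as tf

vocab_size, embed_dim = 6342, 128
inputs = tf.placeholder(tf.int64, [None, None], name="inputs")

# Before: with tf.device("/cpu:0"): ...  -> CPU Gather validates indices and
# raises InvalidArgumentError for any index >= vocab_size.
with tf.device("/gpu:0"):
    embedding = tf.get_variable("embedding", [vocab_size, embed_dim])
    embedded = tf.nn.embedding_lookup(embedding, inputs)
```

Of course this only hides the symptom; if indices really are out of range, the underlying vocabulary mismatch is still worth tracking down.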

@indiejoseph
Owner

I used <1GB of Apple Daily news text to train the char-rnn LM.
It is so weird; I trained this model successfully on CPU. What is your TensorFlow version?

@tim5go
Author

tim5go commented Jul 25, 2017

Um... maybe one of the reasons is that my vocabulary size is smaller than yours (6790): I only got 6342 from a 300MB corpus.
Also, I am using TensorFlow 1.1 with Python 3.
