in training sm_cnn, ValueError: could not convert string to float: '<pad>' #142

liudonglei · 2018-08-18T07:57:56Z

$ python train.py --mode static --gpu 1
Note: You are using GPU for training
Dataset TREC Mode static
VOCAB num 13
LABEL.target_class: 13
LABELS: ['', '2', '0', '7', '3', '1', '8', '4', '5', '9', '6', '\t', '.']
Train instance 53417
Dev instance 1148
Test instance 1517
Shift model to GPU
Time Epoch Iteration Progress (%Epoch) Loss Dev/Loss Accuracy Dev/Accuracy
Traceback (most recent call last):
  File "train.py", line 147, in <module>
    for batch_idx, batch in enumerate(train_iter):
  File "/home/dm/anaconda3/envs/theano.3/lib/python3.6/site-packages/torchtext/data/iterator.py", line 151, in __iter__
    self.train)
  File "/home/dm/anaconda3/envs/theano.3/lib/python3.6/site-packages/torchtext/data/batch.py", line 27, in __init__
    setattr(self, name, field.process(batch, device=device, train=train))
  File "/home/dm/anaconda3/envs/theano.3/lib/python3.6/site-packages/torchtext/data/field.py", line 188, in process
    tensor = self.numericalize(padded, device=device, train=train)
  File "/home/dm/anaconda3/envs/theano.3/lib/python3.6/site-packages/torchtext/data/field.py", line 308, in numericalize
    arr = self.postprocessing(arr, None, train)
  File "/home/dm/anaconda3/envs/theano.3/lib/python3.6/site-packages/torchtext/data/pipeline.py", line 37, in __call__
    x = pipe.call(x, *args)
  File "/home/dm/anaconda3/envs/theano.3/lib/python3.6/site-packages/torchtext/data/pipeline.py", line 52, in call
    return [self.convert_token(tok, *args) for tok in x]
  File "/home/dm/anaconda3/envs/theano.3/lib/python3.6/site-packages/torchtext/data/pipeline.py", line 52, in <listcomp>
    return [self.convert_token(tok, *args) for tok in x]
  File "train.py", line 62, in <lambda>
    postprocessing=data.Pipeline(lambda arr, _, train: [float(y) for y in arr]))
  File "train.py", line 62, in <listcomp>
    postprocessing=data.Pipeline(lambda arr, _, train: [float(y) for y in arr]))
ValueError: could not convert string to float: '<pad>'

liudonglei · 2018-09-27T07:17:32Z

(castor) [ldl@402 sm_cnn 15:15:35] $ python train.py --mode static --no_cuda
Dataset TREC Mode static
VOCAB num 13
LABEL.target_class: 13
LABELS: ['', '2', '0', '7', '3', '1', '8', '4', '5', '9', '6', '\t', '.']
Train instance 53417
Dev instance 1148
Test instance 1517
Time Epoch Iteration Progress (%Epoch) Loss Dev/Loss Accuracy Dev/Accuracy
Traceback (most recent call last):
  File "train.py", line 147, in <module>
    for batch_idx, batch in enumerate(train_iter):
  File "/home/ldl/anaconda2/envs/castor/lib/python3.6/site-packages/torchtext/data/iterator.py", line 151, in __iter__
    self.train)
  File "/home/ldl/anaconda2/envs/castor/lib/python3.6/site-packages/torchtext/data/batch.py", line 27, in __init__
    setattr(self, name, field.process(batch, device=device, train=train))
  File "/home/ldl/anaconda2/envs/castor/lib/python3.6/site-packages/torchtext/data/field.py", line 188, in process
    tensor = self.numericalize(padded, device=device, train=train)
  File "/home/ldl/anaconda2/envs/castor/lib/python3.6/site-packages/torchtext/data/field.py", line 308, in numericalize
    arr = self.postprocessing(arr, None, train)
  File "/home/ldl/anaconda2/envs/castor/lib/python3.6/site-packages/torchtext/data/pipeline.py", line 37, in __call__
    x = pipe.call(x, *args)
  File "/home/ldl/anaconda2/envs/castor/lib/python3.6/site-packages/torchtext/data/pipeline.py", line 52, in call
    return [self.convert_token(tok, *args) for tok in x]
  File "/home/ldl/anaconda2/envs/castor/lib/python3.6/site-packages/torchtext/data/pipeline.py", line 52, in <listcomp>
    return [self.convert_token(tok, *args) for tok in x]
  File "train.py", line 62, in <lambda>
    postprocessing=data.Pipeline(lambda arr, _, train: [float(y) for y in arr]))
  File "train.py", line 62, in <listcomp>
    postprocessing=data.Pipeline(lambda arr, _, train: [float(y) for y in arr]))
ValueError: could not convert string to float: '<pad>'
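
For readers following along, here is a minimal sketch of why the last frame fails, with no torchtext involved. It copies the lambda from train.py line 62 and feeds it a row that contains the string '<pad>' named in the issue title. The sample rows are made up for illustration, and treating '<pad>' as the padding token is an assumption.

```python
# Same shape as the postprocessing lambda shown at train.py line 62 in the traceback.
to_floats = lambda arr, _, train: [float(y) for y in arr]

# A row made only of numeric strings converts cleanly.
clean_row = to_floats(["0.5", "1", "2.25"], None, True)
print(clean_row)

# A row that contains the literal string '<pad>' (assumed here to be the
# padding token) fails as soon as float() reaches that element.
try:
    to_floats(["0.5", "<pad>", "<pad>"], None, True)
except ValueError as err:
    message = str(err)
    print(message)
```

If this is what happens in the real run, it suggests that a field expected to hold only numbers was padded like a text field, which would point at how the dataset columns are mapped to fields rather than at the lambda itself.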

Impavidity · 2018-09-27T17:58:48Z

Hey @liudonglei, as I understand it, you are using your own dataset, right?
Can you post your dataset format in this thread? It would make the issue easier for me to understand.

liudonglei · 2018-09-28T09:01:32Z

@Impavidity Not my own dataset. I am just trying the sm_cnn model on the TrecQA dataset from your Castor-data repo. I followed all the steps in Castor/README.md and Castor/sm_cnn/README.md.

SawanKumar28 · 2018-11-16T08:04:03Z

Hi @liudonglei, were you able to resolve this issue? I am facing the same issue.

liudonglei · 2018-11-17T14:31:35Z

> Hi @liudonglei, were you able to resolve this issue? I am facing the same issue.

Sorry, I wasn't. I am unfamiliar with the torchtext package this repo uses.

liudonglei · 2018-12-26T12:35:41Z

@rosequ
@SawanKumar28
Hi, today I tried this repo again and fixed the problem.
The problem comes from the way trec_dataset.py uses torchtext.data.TabularDataset. I don't know exactly why; it may be some issue with Python class inheritance.
After half a day of debugging, I traced it to trec_dataset.py and borrowed similar code from the blog post http://mlexplained.com/2018/02/08/a-comprehensive-tutorial-to-torchtext to make the repo work.

You can just replace trec_dataset.py with the code below:

----the right trec_dataset.py file ----
from torchtext import data


class TrecDataset:
    dirname = 'data'

    @classmethod
    def splits(cls, question_id, question_field, answer_field, external_field, label_field):
        # Fields are matched to the TSV columns in this order.
        tv_datafields = [('qid', question_id), ('label', label_field), ('question', question_field),
                         ('answer', answer_field), ('ext_feat', external_field)]

        train, dev, test = data.TabularDataset.splits(
            path="data",  # the root directory where the data lies
            train='trecqa.train.tsv', validation='trecqa.dev.tsv', test='trecqa.test.tsv',
            format='tsv',
            # skip_header=True,  # pass this if your files have a header row, so it is not processed as data
            fields=tv_datafields)
        return train, dev, test
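
Since the fields above are matched to the TSV columns by position, a quick sanity check on the data files may help anyone who still sees odd LABELS output. The sketch below is a hypothetical helper (`find_bad_rows` is not part of Castor or torchtext) that uses only the standard library. It assumes each row should have exactly the five tab-separated columns listed in `tv_datafields`.

```python
import csv

# Column names taken from tv_datafields above; the count is what gets checked.
EXPECTED_COLUMNS = ["qid", "label", "question", "answer", "ext_feat"]


def find_bad_rows(path, expected=len(EXPECTED_COLUMNS)):
    """Return (line_number, column_count) for every row with an unexpected column count."""
    bad = []
    with open(path, newline="", encoding="utf-8") as handle:
        # QUOTE_NONE keeps quote characters inside questions and answers as plain text.
        reader = csv.reader(handle, delimiter="\t", quoting=csv.QUOTE_NONE)
        for line_number, row in enumerate(reader, start=1):
            if len(row) != expected:
                bad.append((line_number, len(row)))
    return bad
```

Calling `find_bad_rows("data/trecqa.train.tsv")` should return an empty list if every row has five columns. Any entries it returns are rows worth inspecting by hand.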

liudonglei changed the title ~~in training sm_cnn, what the <PAD> meaning?~~ in training sm_cnn, ValueError: could not convert string to float: '<pad>' Aug 18, 2018

tuzhucheng assigned rosequ Aug 19, 2018

liudonglei mentioned this issue Sep 27, 2018

Dataset path mismatch #143

Open
