terminate called after throwing an instance of 'std::out_of_range' #8663

Closed
VikingMew opened this issue Nov 15, 2017 · 4 comments · Fixed by #14433

Comments

@VikingMew

Description

Training an mx.mod.BucketingModule aborts with "terminate called after throwing an instance of 'std::out_of_range'" (what(): _Map_base::at, i.e. a failed std::unordered_map::at lookup) as soon as the module switches to a new bucket.

Environment info (Required)

Python 2.7.13
NumPy 1.13.1
MXNet 0.12rc0

Steps to reproduce

import argparse
import logging

import mxnet as mx
import numpy as np
from mxnet import metric
from mxnet.initializer import Uniform

import mxdleio  # reporter's private data-loading module; provides NCBucketIter

parser = argparse.ArgumentParser(description="Train RNN on NC data",
                                 formatter_class=argparse.ArgumentDefaultsHelpFormatter)
parser.add_argument('--batch-size', type=int, default=32,
                    help='the batch size.')


def custom_fit(self, train_data, eval_metric='acc', kvstore='local', optimizer='sgd',
               optimizer_params=(('learning_rate', 0.01),), initializer=Uniform(0.01),
               arg_params=None, aux_params=None, allow_missing=False, force_rebind=False,
               force_init=False, begin_epoch=0, num_epoch=None, ):
    self.bind(data_shapes=train_data.provide_data, label_shapes=train_data.provide_label,
              for_training=True, force_rebind=force_rebind)

    self.init_params(initializer=initializer, arg_params=arg_params, aux_params=aux_params,
                     allow_missing=allow_missing, force_init=force_init)
    self.init_optimizer(kvstore=kvstore, optimizer=optimizer,
                        optimizer_params=optimizer_params)

    if not isinstance(eval_metric, metric.EvalMetric):
        eval_metric = metric.create(eval_metric)

    # training loop
    for epoch in range(begin_epoch, num_epoch):
        eval_metric.reset()
        nbatch = 0
        data_iter = iter(train_data)
        end_of_batch = False
        next_data_batch = next(data_iter)
        while not end_of_batch:
            data_batch = next_data_batch

            logging.info(data_batch.data[0].shape)
            logging.info(data_batch.provide_data)
            logging.info(data_batch.label[0].shape)
            logging.info(data_batch.provide_label)

            assert self.binded and self.params_initialized
            self.switch_bucket(data_batch.bucket_key, data_batch.provide_data,
                               data_batch.provide_label)
            raise NotImplementedError  # debugging stop: the log shows the crash already happens in switch_bucket above
            self._curr_module.forward(data_batch, is_train=True)
            self.backward()

            self.update()
            try:
                # pre fetch next batch
                next_data_batch = next(data_iter)
                self.prepare(next_data_batch)
            except StopIteration:
                end_of_batch = True

            self.update_metric(eval_metric, data_batch.label)

            nbatch += 1
        # sync aux params across devices
        arg_params, aux_params = self.get_params()
        self.set_params(arg_params, aux_params)

        # end of 1 epoch, reset the data-iter for another epoch
        train_data.reset()


if __name__ == '__main__':
    import os

    head = '%(asctime)-15s %(message)s'
    logging.basicConfig(level=logging.DEBUG, format=head)
    np.set_printoptions(precision=4, suppress=True, linewidth=150, threshold=np.inf)
    logging.info("pid=%s", os.getpid())

    args = parser.parse_args()
    logging.info("%s", args)

    buckets = [2000, 3500]
    # TRAIN_PATH is the reporter's dataset path; it is not defined in this snippet
    data_train = mxdleio.NCBucketIter(TRAIN_PATH, args.batch_size, neps=0, buckets=buckets,
                                      shuffle=True)
    stack = mx.rnn.SequentialRNNCell()
    stack.add(mx.rnn.FusedRNNCell(512, num_layers=1, mode='lstm', prefix='lstm_l3', bidirectional=True))


    def sym_gen(seq_len):
        data = mx.sym.Variable('data')
        label = mx.sym.Variable('softmax_label')
        logging.info("seq_len: %s", seq_len)
        stack.reset()
        outputs, states = stack.unroll(seq_len, inputs=data, merge_outputs=True)
        pred = mx.sym.Reshape(outputs, shape=(-1, 1024))
        pred = mx.sym.FullyConnected(data=pred, num_hidden=42)
        l = mx.sym.Reshape(data=label, shape=(-1,))
        l = mx.sym.Cast(data=l, dtype='int32')

        sm = mx.sym.WarpCTC(data=pred, label=l, label_length=seq_len, input_length=seq_len)
        return sm, ('data',), ('softmax_label',)


    contexts = mx.gpu(0)
    model = mx.mod.BucketingModule(
        sym_gen=sym_gen,
        default_bucket_key=data_train.default_bucket_key,
        context=contexts)

    custom_fit(model,
               train_data=data_train,
               num_epoch=1,
               )

Result


2017-11-15 14:56:20,641 pid=17540
2017-11-15 14:56:20,643 Namespace(batch_size=10)
2017-11-15 14:56:20,696 Data Length: 2972
2017-11-15 14:56:20,696 Total Samples: 281241
2017-11-15 14:56:20,696 Input Size: 80
2017-11-15 14:56:20,697 seq_len: 3500
lib/python2.7/site-packages/mxnet/rnn/rnn_cell.py:675: UserWarning: NTC layout detected. Consider using TNC for FusedRNNCell for faster speed
  warnings.warn("NTC layout detected. Consider using "
2017-11-15 14:56:20,699 seq_len: 3500
2017-11-15 14:56:25,232 (10L, 2000L, 80L)
2017-11-15 14:56:25,234 [DataDesc[data,(10L, 2000L, 80L),<type 'numpy.float32'>,NTC]]
2017-11-15 14:56:25,234 (10L, 2000L, 1L)
2017-11-15 14:56:25,234 [DataDesc[softmax_label,(10L, 2000L, 1L),<type 'numpy.float32'>,NTC]]
2017-11-15 14:56:25,234 seq_len: 2000
terminate called after throwing an instance of 'std::out_of_range'
  what():  _Map_base::at
script: line 17: 17540 Aborted                 python minimal_sym_task2.py --batch-size 10
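
A likely mechanism, inferred from the fix discussed later in this thread rather than confirmed here: symbols created without an explicit name get auto-generated names from a process-wide counter, so two invocations of sym_gen can produce different parameter names for the same network structure, and the bucket switch then looks up a name that is missing from the shared parameter map. A minimal sketch illustrating the auto-naming (the printed names are examples and depend on how many symbols were created earlier in the process):

import mxnet as mx

def make_fc():
    data = mx.sym.Variable('data')
    # No explicit name: MXNet auto-generates one from a process-wide counter.
    return mx.sym.FullyConnected(data=data, num_hidden=1)

print(make_fc().list_arguments())  # e.g. ['data', 'fullyconnected0_weight', 'fullyconnected0_bias']
print(make_fc().list_arguments())  # e.g. ['data', 'fullyconnected1_weight', 'fullyconnected1_bias']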


@CXMA479

CXMA479 commented Dec 28, 2017

A similar problem has been spotted when I tried to execute code like this (on version 0.11.1 and also on 1.0.0):

# stack and batch_size are defined elsewhere in the full script
def gen_sym(seq_len):
    data = mx.sym.Variable('data')
    label = mx.sym.Variable('label')  # NT
    output, _ = stack.unroll(seq_len, inputs=data, merge_outputs=False, layout='NT')
    pred = output[seq_len - 1]
    pred = mx.sym.reshape(pred, shape=(batch_size, -1))
    pred_e = mx.sym.FullyConnected(data=pred, num_hidden=1)
    sym = mx.sym.broadcast_add(pred_e, label)
    sym = mx.sym.MakeLoss(sym)
    return sym, ('data',), ('label',)

However, if I replace pred_e with pred in the broadcast_add call, it works fine.

Error info

terminate called after throwing an instance of 'std::out_of_range'
  what():  _Map_base::at
Aborted (core dumped)

The point where this error occurs seems unpredictable: it can be thrown before the first forward or after several update loops. I inserted a print statement above the first line of the forward_backward function in base_module.py; some runs looked like this:

################## SAMPLE 1  ######################
$ python main.py 
WARNING: discarded 0 sentences longer than the largest bucket.
WARNING: discarded 0 sentences longer than the largest bucket.
begin one forward_backward, data_batch shape: (10L, 9L)
begin one forward_backward, data_batch shape: (10L, 9L)
begin one forward_backward, data_batch shape: (10L, 9L)
[xxxx-xx-xx 21:19:07,066] {/usr/local/lib/python2.7/dist-packages/mxnet-0.11.1-py2.7.egg/mxnet/callback.py:166} INFO - Epoch[0] Batch [2]	Speed: 188.98 samples/sec	mse=0.329261
begin one forward_backward, data_batch shape: (10L, 9L)
begin one forward_backward, data_batch shape: (10L, 9L)
terminate called after throwing an instance of 'std::out_of_range'
  what():  _Map_base::at
Aborted (core dumped)
################## SAMPLE 2  ######################
$ python main.py 
WARNING: discarded 0 sentences longer than the largest bucket.
WARNING: discarded 0 sentences longer than the largest bucket.
begin one forward_backward, data_batch shape: (10L, 6L)
terminate called after throwing an instance of 'std::out_of_range'
  what():  _Map_base::at
Aborted (core dumped)

@opringle
Contributor

opringle commented Jan 10, 2018

@CXMA479 I found the solution is to make sure you define a name for each symbol. Try changing it to pred_e = mx.sym.FullyConnected(data=pred, num_hidden=1, name='prediction_layer').

I've documented this problem in detail in #9405.
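
Applied to the gen_sym above, the fix would look roughly like this. A sketch only: stack and batch_size are assumed to be defined as in the original snippet, and the name values other than 'prediction_layer' are illustrative:

def gen_sym(seq_len):
    data = mx.sym.Variable('data')
    label = mx.sym.Variable('label')  # NT
    output, _ = stack.unroll(seq_len, inputs=data, merge_outputs=False, layout='NT')
    pred = output[seq_len - 1]
    # Explicit names keep parameter names identical across buckets.
    pred = mx.sym.reshape(pred, shape=(batch_size, -1), name='pred_reshape')
    pred_e = mx.sym.FullyConnected(data=pred, num_hidden=1, name='prediction_layer')
    sym = mx.sym.broadcast_add(pred_e, label, name='pred_plus_label')
    sym = mx.sym.MakeLoss(sym, name='loss')
    return sym, ('data',), ('label',)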

@anirudh2290
Member

@eric-haibin-lin Please tag: "Exception Handling", "Symbol", "Question".

@anirudh2290
Member

@eric-haibin-lin also, please tag "Bug" for the crash.
