This repository has been archived by the owner on Nov 17, 2023. It is now read-only.

BucketingModule causes uncaught exception if symbol name not specified #9405

Closed
opringle opened this issue Jan 12, 2018 · 3 comments · Fixed by #10094

Comments

@opringle
Contributor

opringle commented Jan 12, 2018

Description

When using the BucketingModule together with mx.rnn.BucketSentenceIter, I hit an uncaught exception if I do not specify the symbol name in either the mx.sym.Embedding or the mx.sym.FullyConnected symbol, even though every symbol in the network has a unique name.
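
For reference, the only call that differs between the working and failing symbol generators in the example below (a minimal contrast taken from the repro script; the comments are mine):

# works: the layer carries an explicit, deterministic name
fc = mx.sym.FullyConnected(data=reshape, num_hidden=len(vocab), name='pred')
# crashes during training under BucketingModule: MXNet auto-generates the name
fc = mx.sym.FullyConnected(data=reshape, num_hidden=len(vocab))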

Environment info (Required)

----------Python Info----------
Version      : 3.6.3
Compiler     : GCC 4.2.1 Compatible Apple LLVM 9.0.0 (clang-900.0.37)
Build        : ('default', 'Oct  4 2017 06:09:15')
Arch         : ('64bit', '')
------------Pip Info-----------
Version      : 9.0.1
Directory    : /Users/opringle/Envs/finn-dl/lib/python3.6/site-packages/pip
----------MXNet Info-----------
Version      : 1.0.0
Directory    : /Users/opringle/Envs/finn-dl/lib/python3.6/site-packages/mxnet
Commit Hash   : 2b67436802b750e15b9fbfdf275958c1000be6a8
----------System Info----------
Platform     : Darwin-17.3.0-x86_64-i386-64bit
system       : Darwin
node         : MacBook-Pro-2.local
release      : 17.3.0
version      : Darwin Kernel Version 17.3.0: Thu Nov  9 18:09:22 PST 2017; root:xnu-4570.31.3~1/RELEASE_X86_64
----------Hardware Info----------
machine      : x86_64
processor    : i386
b'machdep.cpu.brand_string: Intel(R) Core(TM) i5-7360U CPU @ 2.30GHz'
b'machdep.cpu.features: FPU VME DE PSE TSC MSR PAE MCE CX8 APIC SEP MTRR PGE MCA CMOV PAT PSE36 CLFSH DS ACPI MMX FXSR SSE SSE2 SS HTT TM PBE SSE3 PCLMULQDQ DTES64 MON DSCPL VMX SMX EST TM2 SSSE3 FMA CX16 TPR PDCM SSE4.1 SSE4.2 x2APIC MOVBE POPCNT AES PCID XSAVE OSXSAVE SEGLIM64 TSCTMR AVX1.0 RDRAND F16C'
b'machdep.cpu.leaf7_features: SMEP ERMS RDWRFSGS TSC_THREAD_OFFSET BMI1 HLE AVX2 BMI2 INVPCID RTM SMAP RDSEED ADX IPT SGX FPU_CSDS MPX CLFSOPT'
b'machdep.cpu.extfeatures: SYSCALL XD 1GBPAGE EM64T LAHF LZCNT PREFETCHW RDTSCP TSCI'
----------Network Test----------
Setting timeout: 10
Timing for MXNet: https://github.com/apache/incubator-mxnet, DNS: 0.0077 sec, LOAD: 0.4726 sec.
Timing for Gluon Tutorial(en): http://gluon.mxnet.io, DNS: 0.0263 sec, LOAD: 0.0633 sec.
Timing for Gluon Tutorial(cn): https://zh.gluon.ai, DNS: 0.0897 sec, LOAD: 0.0599 sec.
Timing for FashionMNIST: https://apache-mxnet.s3-accelerate.dualstack.amazonaws.com/gluon/dataset/fashion-mnist/train-labels-idx1-ubyte.gz, DNS: 0.0259 sec, LOAD: 0.1184 sec.
Timing for PYPI: https://pypi.python.org/pypi/pip, DNS: 0.0139 sec, LOAD: 0.2371 sec.
Timing for Conda: https://repo.continuum.io/pkgs/free/, DNS: 0.0305 sec, LOAD: 0.2018 sec.

I'm using Python.

Error Message:

libc++abi.dylib: terminating with uncaught exception of type std::out_of_range: unordered_map::at: key not found
Abort trap: 6

Minimum reproducible example

import mxnet as mx
import random

#synthetically create 1000 encoded sentences
encoded_sentences = []
for i in list(range(1000)):
    sentence_length = random.randint(1,20)
    sentence = random.sample(range(20), sentence_length)
    encoded_sentences.append(sentence)

#define hyperparameters/info
vocab = list(set(index for sentence in encoded_sentences for index in sentence))
batch_size = 18
num_embed = 10
num_hidden = 3
buckets = [5,10,15,20]
epochs = 15

#use mxnet bucketing iterator, label is next value in sentence
train_iter = mx.rnn.BucketSentenceIter(sentences = encoded_sentences,
                                       batch_size = batch_size,
                                       buckets = buckets,
                                       data_name = 'data',
                                       label_name = 'softmax_label')

# define the recurrent cell outside sym_gen so its parameters are shared across every bucket's unrolled symbol
r_cell = mx.rnn.LSTMCell(num_hidden=num_hidden)

#define a network symbol based on sentence length
def sym_gen1(seq_len):

    print("-" * 50)

    #define input shape so we can infer layer shapes as we go
    data_shape = (batch_size, seq_len)
    label_shape = (batch_size, seq_len)

    data = mx.sym.Variable('data')
    print("\ndata shape: ", data.infer_shape(data=data_shape)[1][0], "\n")

    label = mx.sym.Variable('softmax_label')
    print("\nlabel shape: ", label.infer_shape(softmax_label=label_shape)[1][0], "\n")

    embed = mx.sym.Embedding(data=data, input_dim=len(vocab), output_dim=num_embed, name='embed')
    print("\nembed layer shape: ", embed.infer_shape(data=data_shape)[1][0], "\n")

    output, states = r_cell.unroll(seq_len, inputs=embed, merge_outputs=True)
    print("\nconcatenated recurrent layer shape: ", output.infer_shape(data=data_shape)[1][0], "after ", seq_len, " unrolls\n")

    reshape = mx.sym.Reshape(output, shape=(-1, num_hidden))
    print("\nafter reshaping: ", reshape.infer_shape(data=data_shape)[1][0], "\n")

    fc = mx.sym.FullyConnected(data=reshape, num_hidden=len(vocab), name='pred')
    print("\nfully connected layer shape: ", fc.infer_shape(data=data_shape)[1][0], "\n")

    label = mx.sym.Reshape(label, shape=(-1,))
    print("\nlabel shape after reshaping: ", label.infer_shape(softmax_label=label_shape)[1][0], "\n")

    pred = mx.sym.SoftmaxOutput(data=fc, label=label, name='softmax')
    print("\nsoftmax shape: ", pred.infer_shape(data=data_shape, softmax_label = label_shape)[1][0], "\n")

    return pred, ('data',), ('softmax_label',)

# define an identical symbol, but remove the name argument from the fully connected layer
def sym_gen2(seq_len):

    print("-" * 50)

    #define input shape so we can infer layer shapes as we go
    data_shape = (batch_size, seq_len)
    label_shape = (batch_size, seq_len)

    data = mx.sym.Variable('data')
    print("\ndata shape: ", data.infer_shape(data=data_shape)[1][0], "\n")

    label = mx.sym.Variable('softmax_label')
    print("\nlabel shape: ", label.infer_shape(softmax_label=label_shape)[1][0], "\n")

    embed = mx.sym.Embedding(data=data, input_dim=len(vocab), output_dim=num_embed, name='embed')
    print("\nembed layer shape: ", embed.infer_shape(data=data_shape)[1][0], "\n")

    output, states = r_cell.unroll(seq_len, inputs=embed, merge_outputs=True)
    print("\nconcatenated recurrent layer shape: ", output.infer_shape(data=data_shape)[1][0], "after ", seq_len, " unrolls\n")

    reshape = mx.sym.Reshape(output, shape=(-1, num_hidden))
    print("\nafter reshaping: ", reshape.infer_shape(data=data_shape)[1][0], "\n")

    fc = mx.sym.FullyConnected(data=reshape, num_hidden=len(vocab))
    print("\nfully connected layer shape: ", fc.infer_shape(data=data_shape)[1][0], "\n")

    label = mx.sym.Reshape(label, shape=(-1,))
    print("\nlabel shape after reshaping: ", label.infer_shape(softmax_label=label_shape)[1][0], "\n")

    pred = mx.sym.SoftmaxOutput(data=fc, label=label, name='softmax')
    print("\nsoftmax shape: ", pred.infer_shape(data=data_shape, softmax_label = label_shape)[1][0], "\n")

    return pred, ('data',), ('softmax_label',)


#create a trainable bucketing module on cpu
model = mx.mod.BucketingModule(sym_gen=sym_gen1, default_bucket_key=train_iter.default_bucket_key, context=mx.cpu())
model_not_working = mx.mod.BucketingModule(sym_gen=sym_gen2, default_bucket_key=train_iter.default_bucket_key, context=mx.cpu())

#fit the first module
print("\n\n\nFITTING FIRST MODULE...\n\n\n")
metric = mx.metric.create('loss')
model.bind(data_shapes=train_iter.provide_data,
           label_shapes=train_iter.provide_label)
model.init_params()
model.init_optimizer(optimizer='Adam')
for epoch in range(epochs):
    train_iter.reset()
    metric.reset()
    for batch in train_iter:
        model.forward(batch, is_train=True)             # compute predictions
        model.backward()                                # compute gradients
        model.update()                                  # update parameters
        model.update_metric(metric, batch.label)        # update metric
    print('\n', 'Epoch %d, Training %s' % (epoch, metric.get()))

#fit the second module
print("\n\n\nFITTING SECOND MODULE...\n\n\n")
metric = mx.metric.create('loss')
model_not_working.bind(data_shapes=train_iter.provide_data,
                       label_shapes=train_iter.provide_label)
model_not_working.init_params()
model_not_working.init_optimizer(optimizer='Adam')
for epoch in range(epochs):
    train_iter.reset()
    metric.reset()
    for batch in train_iter:
        model_not_working.forward(batch, is_train=True)             # compute predictions
        model_not_working.backward()                                # compute gradients
        model_not_working.update()                                  # update parameters
        model_not_working.update_metric(metric, batch.label)        # update metric
    print('\n', 'Epoch %d, Training %s' % (epoch, metric.get()))

Steps to reproduce

  1. Run the script
  2. You will see the module fit correctly with sym_gen1() but fail with sym_gen2(). The only difference between the two is that the name argument has been removed from fc = mx.sym.FullyConnected(data=reshape, num_hidden=len(vocab), name='pred').

What have you tried to solve it?

  1. I worked around the problem by passing a value to the name parameter when creating the symbol; a sketch follows below.
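
A minimal sketch of that workaround, under my assumption that the crash comes from MXNet auto-generating symbol names (fullyconnected0, fullyconnected1, ...) that differ between the per-bucket symbols, so the shared parameter lookup misses:

# fixed names give identical parameter names (embed_weight, pred_weight, pred_bias)
# in every bucket's symbol, so the BucketingModule can share weights across buckets
embed = mx.sym.Embedding(data=data, input_dim=len(vocab), output_dim=num_embed, name='embed')
fc = mx.sym.FullyConnected(data=reshape, num_hidden=len(vocab), name='pred')
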
opringle changed the title from "BucketingModule causes error if symbol name not specified" to "BucketingModule causes uncaught exception if symbol name not specified" on Jan 13, 2018
@tdomhan
Contributor

tdomhan commented Feb 14, 2018

+1 for this. This is actually a very annoying bug.

@rajanksin
Contributor

@sandeep-krishnamurthy : Please tag: Bug

@ShownX

ShownX commented Mar 3, 2018

Facing the same problem. However, in Gluon a name cannot be specified.
