This repository has been archived by the owner on Nov 17, 2023. It is now read-only.

BucketingModule causes uncaught exception if symbol name not specified #9405

Closed
opringle opened this issue Jan 12, 2018 · 3 comments · Fixed by #10094

Comments

@opringle
Contributor

opringle commented Jan 12, 2018

Description

When using the BucketingModule together with mx.rnn.BucketSentenceIter, I hit an uncaught exception if I do not specify the symbol name in either the mx.sym.Embedding or the mx.sym.FullyConnected symbol, even though every symbol in the network has a unique name.
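
For reference, the only call that differs between the working and failing symbol generators in the example below (a minimal contrast taken from the repro script; the comments are mine):

# works: the layer carries an explicit, deterministic name
fc = mx.sym.FullyConnected(data=reshape, num_hidden=len(vocab), name='pred')
# crashes during training under BucketingModule: MXNet auto-generates the name
fc = mx.sym.FullyConnected(data=reshape, num_hidden=len(vocab))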

Environment info (Required)

----------Python Info----------
Version      : 3.6.3
Compiler     : GCC 4.2.1 Compatible Apple LLVM 9.0.0 (clang-900.0.37)
Build        : ('default', 'Oct  4 2017 06:09:15')
Arch         : ('64bit', '')
------------Pip Info-----------
Version      : 9.0.1
Directory    : /Users/opringle/Envs/finn-dl/lib/python3.6/site-packages/pip
----------MXNet Info-----------
Version      : 1.0.0
Directory    : /Users/opringle/Envs/finn-dl/lib/python3.6/site-packages/mxnet
Commit Hash   : 2b67436802b750e15b9fbfdf275958c1000be6a8
----------System Info----------
Platform     : Darwin-17.3.0-x86_64-i386-64bit
system       : Darwin
node         : MacBook-Pro-2.local
release      : 17.3.0
version      : Darwin Kernel Version 17.3.0: Thu Nov  9 18:09:22 PST 2017; root:xnu-4570.31.3~1/RELEASE_X86_64
----------Hardware Info----------
machine      : x86_64
processor    : i386
b'machdep.cpu.brand_string: Intel(R) Core(TM) i5-7360U CPU @ 2.30GHz'
b'machdep.cpu.features: FPU VME DE PSE TSC MSR PAE MCE CX8 APIC SEP MTRR PGE MCA CMOV PAT PSE36 CLFSH DS ACPI MMX FXSR SSE SSE2 SS HTT TM PBE SSE3 PCLMULQDQ DTES64 MON DSCPL VMX SMX EST TM2 SSSE3 FMA CX16 TPR PDCM SSE4.1 SSE4.2 x2APIC MOVBE POPCNT AES PCID XSAVE OSXSAVE SEGLIM64 TSCTMR AVX1.0 RDRAND F16C'
b'machdep.cpu.leaf7_features: SMEP ERMS RDWRFSGS TSC_THREAD_OFFSET BMI1 HLE AVX2 BMI2 INVPCID RTM SMAP RDSEED ADX IPT SGX FPU_CSDS MPX CLFSOPT'
b'machdep.cpu.extfeatures: SYSCALL XD 1GBPAGE EM64T LAHF LZCNT PREFETCHW RDTSCP TSCI'
----------Network Test----------
Setting timeout: 10
Timing for MXNet: https://github.com/apache/incubator-mxnet, DNS: 0.0077 sec, LOAD: 0.4726 sec.
Timing for Gluon Tutorial(en): http://gluon.mxnet.io, DNS: 0.0263 sec, LOAD: 0.0633 sec.
Timing for Gluon Tutorial(cn): https://zh.gluon.ai, DNS: 0.0897 sec, LOAD: 0.0599 sec.
Timing for FashionMNIST: https://apache-mxnet.s3-accelerate.dualstack.amazonaws.com/gluon/dataset/fashion-mnist/train-labels-idx1-ubyte.gz, DNS: 0.0259 sec, LOAD: 0.1184 sec.
Timing for PYPI: https://pypi.python.org/pypi/pip, DNS: 0.0139 sec, LOAD: 0.2371 sec.
Timing for Conda: https://repo.continuum.io/pkgs/free/, DNS: 0.0305 sec, LOAD: 0.2018 sec.

I'm using Python.

Error Message:

libc++abi.dylib: terminating with uncaught exception of type std::out_of_range: unordered_map::at: key not found
Abort trap: 6

Minimum reproducible example

import mxnet as mx
import random

#synthetically create 1000 encoded sentences
encoded_sentences = []
for i in list(range(1000)):
    sentence_length = random.randint(1,20)
    sentence = random.sample(range(20), sentence_length)
    encoded_sentences.append(sentence)

#define hyperparameters/info
vocab = list(set(index for sentence in encoded_sentences for index in sentence))
batch_size = 18
num_embed = 10
num_hidden = 3
buckets = [5,10,15,20]
epochs = 15

#use mxnet bucketing iterator, label is next value in sentence
train_iter = mx.rnn.BucketSentenceIter(sentences = encoded_sentences,
                                       batch_size = batch_size,
                                       buckets = buckets,
                                       data_name = 'data',
                                       label_name = 'softmax_label')

# define the recurrent cell outside sym_gen so its parameters are shared across every bucket's unrolled symbol
r_cell = mx.rnn.LSTMCell(num_hidden=num_hidden)

#define a network symbol based on sentence length
def sym_gen1(seq_len):

    print("-" * 50)

    #define input shape so we can infer layer shapes as we go
    data_shape = (batch_size, seq_len)
    label_shape = (batch_size, seq_len)

    data = mx.sym.Variable('data')
    print("\ndata shape: ", data.infer_shape(data=data_shape)[1][0], "\n")

    label = mx.sym.Variable('softmax_label')
    print("\nlabel shape: ", label.infer_shape(softmax_label=label_shape)[1][0], "\n")

    embed = mx.sym.Embedding(data=data, input_dim=len(vocab), output_dim=num_embed, name='embed')
    print("\nembed layer shape: ", embed.infer_shape(data=data_shape)[1][0], "\n")

    output, states = r_cell.unroll(seq_len, inputs=embed, merge_outputs=True)
    print("\nconcatenated recurrent layer shape: ", output.infer_shape(data=data_shape)[1][0], "after ", seq_len, " unrolls\n")

    reshape = mx.sym.Reshape(output, shape=(-1, num_hidden))
    print("\nafter reshaping: ", reshape.infer_shape(data=data_shape)[1][0], "\n")

    fc = mx.sym.FullyConnected(data=reshape, num_hidden=len(vocab), name='pred')
    print("\nfully connected layer shape: ", fc.infer_shape(data=data_shape)[1][0], "\n")

    label = mx.sym.Reshape(label, shape=(-1,))
    print("\nlabel shape after reshaping: ", label.infer_shape(softmax_label=label_shape)[1][0], "\n")

    pred = mx.sym.SoftmaxOutput(data=fc, label=label, name='softmax')
    print("\nsoftmax shape: ", pred.infer_shape(data=data_shape, softmax_label = label_shape)[1][0], "\n")

    return pred, ('data',), ('softmax_label',)

# define an identical symbol, but remove the name argument from the fully connected layer
def sym_gen2(seq_len):

    print("-" * 50)

    #define input shape so we can infer layer shapes as we go
    data_shape = (batch_size, seq_len)
    label_shape = (batch_size, seq_len)

    data = mx.sym.Variable('data')
    print("\ndata shape: ", data.infer_shape(data=data_shape)[1][0], "\n")

    label = mx.sym.Variable('softmax_label')
    print("\nlabel shape: ", label.infer_shape(softmax_label=label_shape)[1][0], "\n")

    embed = mx.sym.Embedding(data=data, input_dim=len(vocab), output_dim=num_embed, name='embed')
    print("\nembed layer shape: ", embed.infer_shape(data=data_shape)[1][0], "\n")

    output, states = r_cell.unroll(seq_len, inputs=embed, merge_outputs=True)
    print("\nconcatenated recurrent layer shape: ", output.infer_shape(data=data_shape)[1][0], "after ", seq_len, " unrolls\n")

    reshape = mx.sym.Reshape(output, shape=(-1, num_hidden))
    print("\nafter reshaping: ", reshape.infer_shape(data=data_shape)[1][0], "\n")

    fc = mx.sym.FullyConnected(data=reshape, num_hidden=len(vocab))
    print("\nfully connected layer shape: ", fc.infer_shape(data=data_shape)[1][0], "\n")

    label = mx.sym.Reshape(label, shape=(-1,))
    print("\nlabel shape after reshaping: ", label.infer_shape(softmax_label=label_shape)[1][0], "\n")

    pred = mx.sym.SoftmaxOutput(data=fc, label=label, name='softmax')
    print("\nsoftmax shape: ", pred.infer_shape(data=data_shape, softmax_label = label_shape)[1][0], "\n")

    return pred, ('data',), ('softmax_label',)


#create a trainable bucketing module on cpu
model = mx.mod.BucketingModule(sym_gen=sym_gen1, default_bucket_key=train_iter.default_bucket_key, context=mx.cpu())
model_not_working = mx.mod.BucketingModule(sym_gen=sym_gen2, default_bucket_key=train_iter.default_bucket_key, context=mx.cpu())

#fit the first module
print("\n\n\nFITTING FIRST MODULE...\n\n\n")
metric = mx.metric.create('loss')
model.bind(data_shapes=train_iter.provide_data,
           label_shapes=train_iter.provide_label)
model.init_params()
model.init_optimizer(optimizer='Adam')
for epoch in range(epochs):
    train_iter.reset()
    metric.reset()
    for batch in train_iter:
        model.forward(batch, is_train=True)             # compute predictions
        model.backward()                                # compute gradients
        model.update()                                  # update parameters
        model.update_metric(metric, batch.label)        # update metric
    print('\n', 'Epoch %d, Training %s' % (epoch, metric.get()))

#fit the second module
print("\n\n\nFITTING SECOND MODULE...\n\n\n")
metric = mx.metric.create('loss')
model_not_working.bind(data_shapes=train_iter.provide_data,
                       label_shapes=train_iter.provide_label)
model_not_working.init_params()
model_not_working.init_optimizer(optimizer='Adam')
for epoch in range(epochs):
    train_iter.reset()
    metric.reset()
    for batch in train_iter:
        model_not_working.forward(batch, is_train=True)             # compute predictions
        model_not_working.backward()                                # compute gradients
        model_not_working.update()                                  # update parameters
        model_not_working.update_metric(metric, batch.label)        # update metric
    print('\n', 'Epoch %d, Training %s' % (epoch, metric.get()))

Steps to reproduce

  1. Run the script
  2. You will see the module fit correctly with sym_gen1() but fail with sym_gen2(). The only difference between the two is that the name argument has been removed from fc = mx.sym.FullyConnected(data=reshape, num_hidden=len(vocab), name='pred').

What have you tried to solve it?

  1. I worked around the problem by passing a value to the name parameter when creating the symbol; a sketch follows below.
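
A minimal sketch of that workaround, under my assumption that the crash comes from MXNet auto-generating symbol names (fullyconnected0, fullyconnected1, ...) that differ between the per-bucket symbols, so the shared parameter lookup misses:

# fixed names give identical parameter names (embed_weight, pred_weight, pred_bias)
# in every bucket's symbol, so the BucketingModule can share weights across buckets
embed = mx.sym.Embedding(data=data, input_dim=len(vocab), output_dim=num_embed, name='embed')
fc = mx.sym.FullyConnected(data=reshape, num_hidden=len(vocab), name='pred')
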
opringle changed the title from "BucketingModule causes error if symbol name not specified" to "BucketingModule causes uncaught exception if symbol name not specified" on Jan 13, 2018
@tdomhan
Contributor

tdomhan commented Feb 14, 2018

+1 for this. This is actually a very annoying bug.

@rajanksin
Contributor

@sandeep-krishnamurthy : Please tag: Bug

@ShownX

ShownX commented Mar 3, 2018

Facing the same problem. However, in Gluon a name cannot be specified.
