This repository has been archived by the owner on Nov 17, 2023. It is now read-only.

ImageRecordIOParser std::bad_alloc Error when or after decoding #5525

Closed
ysh329 opened this issue Mar 22, 2017 · 6 comments

@ysh329
Contributor

ysh329 commented Mar 22, 2017

Environment info

Jetson TX1, 4 GB RAM, Ubuntu 16.04 64-bit, MXNet 0.94

  1. RAM should be sufficient. I first ruled out memory as the cause: I built a .rec file smaller than 1 MB using im2rec.py, and the same error still occurs.
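One way to check the "RAM is enough" assumption right before launching the script is to read the kernel's own numbers. The sketch below is only an illustration and is not part of the original report: the helper names `parse_meminfo` and `available_mib` are made up here, and it assumes a Linux system that exposes `/proc/meminfo` (older kernels may lack the `MemAvailable` field, in which case it falls back to `MemFree`).

```python
import os


def parse_meminfo(text):
    """Parse /proc/meminfo-style text into a dict of field name -> size in kB."""
    fields = {}
    for line in text.splitlines():
        name, sep, rest = line.partition(":")
        if not sep:
            continue  # skip lines that are not "Name: value" pairs
        parts = rest.split()
        if parts and parts[0].isdigit():
            fields[name.strip()] = int(parts[0])
    return fields


def available_mib(text):
    """Return the available memory in MiB, preferring MemAvailable over MemFree."""
    info = parse_meminfo(text)
    kb = info.get("MemAvailable", info.get("MemFree", 0))
    return kb / 1024.0


# Only read the real file when it exists (i.e. on Linux).
if os.path.exists("/proc/meminfo"):
    with open("/proc/meminfo") as f:
        print("available MiB: %.1f" % available_mib(f.read()))
```

Running this once before training and once while the iterators are being created gives a rough idea of how much headroom is left. It does not by itself show where the allocation fails.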

I use the Caltech-256 data set from mxnet/example/image-classification/data/caltech256.sh and train a DeepID model with a script based on mxnet/example/image-classification/train_cifar10.py.

Error Message:

(py-mxnet-SSD) yuanshuai@tegra-ubuntu:~/sdcard/code/mxnet_inference/deepid$ python train_cifar10_resize224.py
INFO:root:start with arguments Namespace(batch_size=128, benchmark=0, data_nthreads=4, data_train='/home/yuanshuai/sdcard/code/mxnet/example/image-classification/data/caltech256-train.rec', data_val='/home/yuanshuai/sdcard/code/mxnet/example/image-classification/data/caltech256-val.rec', disp_batches=20, gpus='0', image_shape='3,224,224', kv_store='device', load_epoch=None, lr=0.0005, lr_factor=0.1, lr_step_epochs='200,250', max_random_aspect_ratio=0, max_random_h=36, max_random_l=50, max_random_rotate_angle=0, max_random_s=50, max_random_scale=1, max_random_shear_ratio=0, min_random_scale=1, model_prefix='./deepid-caltech-256', mom=0.9, monitor=0, network=None, num_classes=256, num_epochs=1, num_examples=25574, num_layers=None, optimizer='sgd', pad_size=4, random_crop=1, random_mirror=1, rgb_mean='123.68,116.779,103.939', test_io=0, top_k=0, wd=0.0001)
[09:58:25] src/io/iter_image_recordio.cc:209: ImageRecordIOParser: /home/yuanshuai/sdcard/code/mxnet/example/image-classification/data/caltech256-train.rec, use 1 threads for decoding..
[09:58:30] src/io/iter_image_recordio.cc:209: ImageRecordIOParser: /home/yuanshuai/sdcard/code/mxnet/example/image-classification/data/caltech256-val.rec, use 1 threads for decoding..
Killed
(py-mxnet-SSD) yuanshuai@tegra-ubuntu:~/sdcard/code/mxnet_inference/deepid$ python train_cifar10_resize224.py 
INFO:root:start with arguments Namespace(batch_size=128, benchmark=0, data_nthreads=4, data_train='/home/yuanshuai/sdcard/code/mxnet/example/image-classification/data/caltech256-train.rec', data_val='/home/yuanshuai/sdcard/code/mxnet/example/image-classification/data/caltech256-val.rec', disp_batches=20, gpus='0', image_shape='3,224,224', kv_store='device', load_epoch=None, lr=0.0005, lr_factor=0.1, lr_step_epochs='200,250', max_random_aspect_ratio=0, max_random_h=36, max_random_l=50, max_random_rotate_angle=0, max_random_s=50, max_random_scale=1, max_random_shear_ratio=0, min_random_scale=1, model_prefix='./deepid-caltech-256', mom=0.9, monitor=0, network=None, num_classes=256, num_epochs=1, num_examples=25574, num_layers=None, optimizer='sgd', pad_size=4, random_crop=1, random_mirror=1, rgb_mean='123.68,116.779,103.939', test_io=0, top_k=0, wd=0.0001)
[09:58:41] src/io/iter_image_recordio.cc:209: ImageRecordIOParser: /home/yuanshuai/sdcard/code/mxnet/example/image-classification/data/caltech256-train.rec, use 1 threads for decoding..
[09:58:47] src/io/iter_image_recordio.cc:209: ImageRecordIOParser: /home/yuanshuai/sdcard/code/mxnet/example/image-classification/data/caltech256-val.rec, use 1 threads for decoding..
terminate called after throwing an instance of 'std::bad_alloc'
  what():  std::bad_alloc
Aborted
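The size of the .rec file on disk says little about how much memory the decoded batches need. As a rough worked example (not a diagnosis of this crash), the sketch below computes the size of one decoded batch from the `batch_size=128` and `image_shape='3,224,224'` values in the log above. It assumes each pixel value is stored as a 4-byte float32 and ignores labels; the function name `batch_bytes` is made up for this illustration.

```python
def batch_bytes(batch_size, image_shape, bytes_per_value=4):
    """Bytes needed for one decoded batch (assumes float32 values, labels ignored)."""
    channels, height, width = image_shape
    return batch_size * channels * height * width * bytes_per_value


# Values taken from the Namespace(...) line in the log.
shape = tuple(int(v) for v in "3,224,224".split(","))
one_batch = batch_bytes(128, shape)
print(one_batch, "bytes =", one_batch / 2.0 ** 20, "MiB")
```

Under these assumptions a single batch takes 73.5 MiB. The train and validation iterators each hold their own buffers and may prefetch more than one batch, and on the Jetson TX1 the CPU and GPU share the same 4 GB of physical memory, so the real footprint can be several times this figure even when the .rec file is tiny.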

Key Code

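import os
import argparse
# Assumption: `data` and `fit` are the helper modules that the upstream
# train_cifar10.py imports from mxnet/example/image-classification/common.
from common import data, fit

def download_cifar10():
    # Despite its name, this function returns the local Caltech-256 .rec files;
    # the CIFAR-10 downloads are commented out.
    data_dir="data"
    fnames = (os.path.join(data_dir, "cifar10_train.rec"),
              os.path.join(data_dir, "cifar10_val.rec"))
    #download_file('http://data.mxnet.io/data/cifar10/cifar10_val.rec', fnames[1])
    #download_file('http://data.mxnet.io/data/cifar10/cifar10_train.rec', fnames[0])
    fnames = list(fnames)
    fnames[0] = '/home/yuanshuai/sdcard/code/mxnet/example/image-classification/data/caltech256-train.rec'
    fnames[1] = '/home/yuanshuai/sdcard/code/mxnet/example/image-classification/data/caltech256-val.rec'
    return fnames

if __name__ == '__main__':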
    # download data
    (train_fname, val_fname) = download_cifar10()

    # parse args
    parser = argparse.ArgumentParser(description="train cifar10",
                                     formatter_class=argparse.ArgumentDefaultsHelpFormatter)
    fit.add_fit_args(parser)
    data.add_data_args(parser)
    data.add_data_aug_args(parser)
    data.set_data_aug_level(parser, 2)
    parser.set_defaults(
        # network
        #network        = 'resnet',
        #num_layers     = 110,
        # data
        data_train     = train_fname,
        data_val       = val_fname,
        num_classes    = 256,
        num_examples  = 25574,
        image_shape    = '3,224,224',#90
        pad_size       = 4,
        # train
        gpus           = '0',
        batch_size     = 128,
        num_epochs     = 1,#300
        lr             = .0005,#.05
        lr_step_epochs = '200,250',
        model_prefix   = './deepid-caltech-256'
    )
    args = parser.parse_args()

    # load network
    from importlib import import_module
    #net = import_module('symbols.'+args.network)
    #sym = net.get_symbol(**vars(args))
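    # get_symbol builds the DeepID network; it is defined elsewhere in the
    # full script and is not shown in this excerpt.
    sym = get_symbol(256)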

    # train
    fit.fit(args, sym, data.get_rec_iter)

Complete Code

@piiswrong
Contributor

piiswrong commented Mar 22, 2017

@ptrendx

@ysh329 ysh329 changed the title ImageRecordIOParser std::bad_alloc Error when decoding ImageRecordIOParser std::bad_alloc Error when or after decoding Mar 22, 2017
@ysh329
Contributor Author

ysh329 commented Mar 22, 2017

This looks similar to the following issues, which are still open: 😢
#4299
#2099
#2113

@ptrendx
Member

ptrendx commented Mar 22, 2017

Could you try setting the environment variable MXNET_ENGINE_TYPE to NaiveEngine, setting DEBUG to 1 in config.mk, rebuilding, and running the script through gdb? Once the error occurs, type bt and paste the output here. This will produce a better call stack, so we will be able to see where the issue actually occurs.
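The runtime part of that request could be prepared as in the sketch below. It is only an illustration: the helper names `debug_env` and `gdb_command` are made up here, the script name is the one from the log in the first comment, and the `DEBUG = 1` change in config.mk plus the rebuild still have to be done separately. The sketch only builds the environment and the gdb command line; it does not launch anything.

```python
import os
import sys


def debug_env(base_env=None):
    """Copy the environment and select MXNet's synchronous NaiveEngine."""
    env = dict(os.environ if base_env is None else base_env)
    env["MXNET_ENGINE_TYPE"] = "NaiveEngine"
    return env


def gdb_command(script, python=sys.executable):
    """Build the argument list for running a Python script under gdb."""
    return ["gdb", "--args", python, script]


# Print the equivalent shell command instead of executing it.
cmd = gdb_command("train_cifar10_resize224.py")
print("MXNET_ENGINE_TYPE=%s %s" % (debug_env()["MXNET_ENGINE_TYPE"], " ".join(cmd)))
```

Inside gdb, `run` starts the script, and `bt` prints the backtrace once the process stops on the abort.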

@jdhao

jdhao commented Apr 10, 2017

@ysh329, I ran into a similar problem when I tried to use MNISTIter; see #2270 for reference.

@KeyKy
Contributor

KeyKy commented Aug 13, 2017

I hit the same memory problem when training SSD or ImageNet models using .rec files.

@szha
Member

szha commented Nov 12, 2017

This issue is closed due to lack of activity in the last 90 days. Feel free to ping me to reopen if this is still an active issue. Thanks!
Also, do please check out our forum (and Chinese version) for general "how-to" questions.

@szha szha closed this as completed Nov 12, 2017