Failed to export gluon model zoo vision model to symbol file #9835

kevinthesun · 2018-02-20T06:28:40Z

Description

Gluon block export function returned error while trying to export mobilenet, alexnet, densenet, squeezenet and resnet_v2 models. MXNet is built from source with mkldnn.

MXNet commit hash:
af0c3b4

Build config:

# Licensed to the Apache Software Foundation (ASF) under one
# or more contributor license agreements.  See the NOTICE file
# distributed with this work for additional information
# regarding copyright ownership.  The ASF licenses this file
# to you under the Apache License, Version 2.0 (the
# "License"); you may not use this file except in compliance
# with the License.  You may obtain a copy of the License at
#
#   http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing,
# software distributed under the License is distributed on an
# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
# KIND, either express or implied.  See the License for the
# specific language governing permissions and limitations
# under the License.

#-------------------------------------------------------------------------------
#  Template configuration for compiling mxnet
#
#  If you want to change the configuration, please use the following
#  steps. Assume you are on the root directory of mxnet. First copy the this
#  file so that any local changes will be ignored by git
#
#  $ cp make/config.mk .
#
#  Next modify the according entries, and then compile by
#
#  $ make
#
#  or build in parallel with 8 threads
#
#  $ make -j8
#-------------------------------------------------------------------------------

#---------------------
# choice of compiler
#--------------------

export CC = gcc
export CXX = g++
export NVCC = nvcc

# whether compile with options for MXNet developer
DEV = 0

# whether compile with debug
DEBUG = 0

# whether compile with profiler
USE_PROFILER =

# whether to turn on segfault signal handler to log the stack trace
USE_SIGNAL_HANDLER =

# the additional link flags you want to add
ADD_LDFLAGS =

# the additional compile flags you want to add
ADD_CFLAGS =

#---------------------------------------------
# matrix computation libraries for CPU/GPU
#---------------------------------------------

# whether use CUDA during compile
USE_CUDA = 0

# add the path to CUDA library to link and compile flag
# if you have already add them to environment variable, leave it as NONE
# USE_CUDA_PATH = /usr/local/cuda
USE_CUDA_PATH = NONE

# whether to enable CUDA runtime compilation
ENABLE_CUDA_RTC = 1

# whether use CuDNN R3 library
USE_CUDNN = 0

#whether to use NCCL library
USE_NCCL = 0
#add the path to NCCL library
USE_NCCL_PATH = NONE

# whether use opencv during compilation
# you can disable it, however, you will not able to use
# imbin iterator
USE_OPENCV = 1

#whether use libjpeg-turbo for image decode without OpenCV wrapper
USE_LIBJPEG_TURBO = 0
#add the path to libjpeg-turbo library
USE_LIBJPEG_TURBO_PATH = NONE

# use openmp for parallelization
USE_OPENMP = 1

# MKL ML Library for Intel CPU/Xeon Phi
# Please refer to MKL_README.md for details

# MKL ML Library folder, need to be root for /usr/local
# Change to User Home directory for standard user
# For USE_BLAS!=mkl only
MKLML_ROOT=/usr/local

# whether use MKL2017 library
USE_MKL2017 = 1

# whether use MKL2017 experimental feature for high performance
# Prerequisite USE_MKL2017=1
USE_MKL2017_EXPERIMENTAL = 0

# whether use NNPACK library
USE_NNPACK = 0

# choose the version of blas you want to use
# can be: mkl, blas, atlas, openblas
# in default use atlas for linux while apple for osx
#UNAME_S := $(shell uname -s)
#ifeq ($(UNAME_S), Darwin)
#USE_BLAS = apple
#else
#USE_BLAS = atlas
#endif
USE_BLAS = mkl

# whether use lapack during compilation
# only effective when compiled with blas versions openblas/apple/atlas/mkl
USE_LAPACK = 1

# path to lapack library in case of a non-standard installation
USE_LAPACK_PATH =

# add path to intel library, you may need it for MKL, if you did not add the path
# to environment variable
USE_INTEL_PATH = NONE

# If use MKL only for BLAS, choose static link automatically to allow python wrapper
ifeq ($(USE_BLAS), mkl)
USE_STATIC_MKL = 1
else
USE_STATIC_MKL = NONE
endif

#----------------------------
# Settings for power and arm arch
#----------------------------
ARCH := $(shell uname -a)
ifneq (,$(filter $(ARCH), armv6l armv7l powerpc64le ppc64le aarch64))
	USE_SSE=0
else
	USE_SSE=1
endif

#----------------------------
# distributed computing
#----------------------------

# whether or not to enable multi-machine supporting
USE_DIST_KVSTORE = 0

# whether or not allow to read and write HDFS directly. If yes, then hadoop is
# required
USE_HDFS = 0

# path to libjvm.so. required if USE_HDFS=1
LIBJVM=$(JAVA_HOME)/jre/lib/amd64/server

# whether or not allow to read and write AWS S3 directly. If yes, then
# libcurl4-openssl-dev is required, it can be installed on Ubuntu by
# sudo apt-get install -y libcurl4-openssl-dev
USE_S3 = 0

#----------------------------
# performance settings
#----------------------------
# Use operator tuning
USE_OPERATOR_TUNING = 1

# Use gperftools if found
USE_GPERFTOOLS = 1

# Use JEMalloc if found, and not using gperftools
USE_JEMALLOC = 1

#----------------------------
# additional operators
#----------------------------

# path to folders containing projects specific operators that you don't want to put in src/operators
EXTRA_OPERATORS =

#----------------------------
# other features
#----------------------------

# Create C++ interface package
USE_CPP_PACKAGE = 0

#----------------------------
# plugins
#----------------------------

# whether to use caffe integration. This requires installing caffe.
# You also need to add CAFFE_PATH/build/lib to your LD_LIBRARY_PATH
# CAFFE_PATH = $(HOME)/caffe
# MXNET_PLUGINS += plugin/caffe/caffe.mk

# WARPCTC_PATH = $(HOME)/warp-ctc
# MXNET_PLUGINS += plugin/warpctc/warpctc.mk

# whether to use sframe integration. This requires build sframe
# [email protected]:dato-code/SFrame.git
# SFRAME_PATH = $(HOME)/SFrame
# MXNET_PLUGINS += plugin/sframe/plugin.mk

Error Message:

Traceback (most recent call last):
  File "test.py", line 18, in <module>
    block.export(model)
  File "/home/ubuntu/mxnet/python/mxnet/gluon/block.py", line 558, in export
    ndarray.save('%s-%04d.params'%(path, epoch), arg_dict)
  File "/home/ubuntu/mxnet/python/mxnet/ndarray/utils.py", line 236, in save
    keys))
  File "/home/ubuntu/mxnet/python/mxnet/base.py", line 148, in check_call
    raise MXNetError(py_str(_LIB.MXGetLastError()))
mxnet.base.MXNetError: [06:26:59] src/operator/tensor/./././elemwise_unary_op.h:301: Check failed: inputs[0].dptr_ == outputs[0].dptr_ (0x7fbc19d37000 vs. 0x7fbc19d39000) 

Stack trace returned 10 entries:
[bt] (0) /home/ubuntu/mxnet/python/mxnet/../../lib/libmxnet.so(dmlc::StackTrace[abi:cxx11]()+0x5b) [0x7fbcedf144fb]
[bt] (1) /home/ubuntu/mxnet/python/mxnet/../../lib/libmxnet.so(dmlc::LogMessageFatal::~LogMessageFatal()+0x28) [0x7fbcedf15518]
[bt] (2) /home/ubuntu/mxnet/python/mxnet/../../lib/libmxnet.so(void mxnet::op::UnaryOp::IdentityCompute<mshadow::cpu>(nnvm::NodeAttrs const&, mxnet::OpContext const&, std::vector<mxnet::TBlob, std::allocator<mxnet::TBlob> > const&, std::vector<mxnet::OpReqType, std::allocator<mxnet::OpReqType> > const&, std::vector<mxnet::TBlob, std::allocator<mxnet::TBlob> > const&)+0xa99) [0x7fbcee3b8c19]
[bt] (3) /home/ubuntu/mxnet/python/mxnet/../../lib/libmxnet.so(mxnet::imperative::PushFCompute(std::function<void (nnvm::NodeAttrs const&, mxnet::OpContext const&, std::vector<mxnet::TBlob, std::allocator<mxnet::TBlob> > const&, std::vector<mxnet::OpReqType, std::allocator<mxnet::OpReqType> > const&, std::vector<mxnet::TBlob, std::allocator<mxnet::TBlob> > const&)> const&, nnvm::Op const*, nnvm::NodeAttrs const&, mxnet::Context const&, std::vector<mxnet::engine::Var*, std::allocator<mxnet::engine::Var*> > const&, std::vector<mxnet::engine::Var*, std::allocator<mxnet::engine::Var*> > const&, std::vector<mxnet::Resource, std::allocator<mxnet::Resource> > const&, std::vector<mxnet::NDArray*, std::allocator<mxnet::NDArray*> > const&, std::vector<mxnet::NDArray*, std::allocator<mxnet::NDArray*> > const&, std::vector<unsigned int, std::allocator<unsigned int> > const&, std::vector<mxnet::OpReqType, std::allocator<mxnet::OpReqType> > const&)::{lambda(mxnet::RunContext)#1}::operator()(mxnet::RunContext) const+0x1067) [0x7fbcf06f7ec7]
[bt] (4) /home/ubuntu/mxnet/python/mxnet/../../lib/libmxnet.so(std::_Function_handler<void (mxnet::RunContext), mxnet::engine::ThreadedEngine::BulkAppend(std::function<void (mxnet::RunContext)>, mxnet::Context, std::vector<mxnet::engine::Var*, std::allocator<mxnet::engine::Var*> > const&, std::vector<mxnet::engine::Var*, std::allocator<mxnet::engine::Var*> > const&)::{lambda(mxnet::RunContext)#1}>::_M_invoke(std::_Any_data const&, mxnet::RunContext&&)+0x68) [0x7fbcf0b6ba98]
[bt] (5) /home/ubuntu/mxnet/python/mxnet/../../lib/libmxnet.so(std::_Function_handler<void (mxnet::RunContext), mxnet::engine::ThreadedEngine::BulkAppend(std::function<void (mxnet::RunContext)>, mxnet::Context, std::vector<mxnet::engine::Var*, std::allocator<mxnet::engine::Var*> > const&, std::vector<mxnet::engine::Var*, std::allocator<mxnet::engine::Var*> > const&)::{lambda(mxnet::RunContext)#1}>::_M_invoke(std::_Any_data const&, mxnet::RunContext&&)+0x47) [0x7fbcf0b6ba77]
[bt] (6) /home/ubuntu/mxnet/python/mxnet/../../lib/libmxnet.so(std::_Function_handler<void (mxnet::RunContext, mxnet::engine::CallbackOnComplete), mxnet::engine::ThreadedEngine::BulkFlush()::{lambda(mxnet::RunContext, mxnet::engine::CallbackOnComplete)#1}>::_M_invoke(std::_Any_data const&, mxnet::RunContext&&, mxnet::engine::CallbackOnComplete&&)+0x4b) [0x7fbcf0b5794b]
[bt] (7) /home/ubuntu/mxnet/python/mxnet/../../lib/libmxnet.so(mxnet::engine::ThreadedEngine::ExecuteOprBlock(mxnet::RunContext, mxnet::engine::OprBlock*)+0x2be) [0x7fbcf0b5c2ee]
[bt] (8) /home/ubuntu/mxnet/python/mxnet/../../lib/libmxnet.so(std::_Function_handler<void (std::shared_ptr<mxnet::engine::ThreadPool::SimpleEvent>), mxnet::engine::ThreadedEnginePerDevice::PushToExecute(mxnet::engine::OprBlock*, bool)::{lambda()#1}::operator()() const::{lambda(std::shared_ptr<mxnet::engine::ThreadPool::SimpleEvent>)#1}>::_M_invoke(std::_Any_data const&, std::shared_ptr<mxnet::engine::ThreadPool::SimpleEvent>&&)+0x133) [0x7fbcf0b79d73]
[bt] (9) /home/ubuntu/mxnet/python/mxnet/../../lib/libmxnet.so(std::thread::_Impl<std::_Bind_simple<std::function<void (std::shared_ptr<mxnet::engine::ThreadPool::SimpleEvent>)> (std::shared_ptr<mxnet::engine::ThreadPool::SimpleEvent>)> >::_M_run()+0x4a) [0x7fbcf0b7590a]

Minimum reproducible example

import numpy as np
import mxnet as mx

from mxnet.gluon.model_zoo.vision import get_model

model = "mobilenet1.0"
batch_size = 1

image_shape = (3, 224, 224)
data_shape = (batch_size,) + image_shape

data_array = np.random.uniform(0, 255, size=data_shape).astype("float32")
mx_data = mx.nd.array(data_array)

block = get_model(model, pretrained=True)
block.hybridize()
block(mx_data)
block.export(model)

The text was updated successfully, but these errors were encountered:

szha · 2018-02-20T06:48:46Z

@zheng-da would you take a look?

zheng-da · 2018-04-20T07:05:34Z

i just saw this message. @ashokei is working on this.

pengzhao-intel · 2018-05-10T06:22:29Z

@szha @marcoabreu we encountered the same issues and block our fruther works now.
Could you help review and merge @zheng-da's PR?
Thanks a lot.

marcoabreu added the Model Zoo label Feb 20, 2018

zheng-da mentioned this issue Apr 23, 2018

handle fallback correctly for write inplace when the array is MKLDNN. #10651

Merged

5 tasks

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Failed to export gluon model zoo vision model to symbol file #9835

Failed to export gluon model zoo vision model to symbol file #9835

kevinthesun commented Feb 20, 2018 •

edited

Loading

szha commented Feb 20, 2018

zheng-da commented Apr 20, 2018

pengzhao-intel commented May 10, 2018

Failed to export gluon model zoo vision model to symbol file #9835

Failed to export gluon model zoo vision model to symbol file #9835

Comments

kevinthesun commented Feb 20, 2018 • edited Loading

Description

Build config:

Error Message:

Minimum reproducible example

szha commented Feb 20, 2018

zheng-da commented Apr 20, 2018

pengzhao-intel commented May 10, 2018

kevinthesun commented Feb 20, 2018 •

edited

Loading