Integrate MKLDNN.
Update MXNet for MKLDNN.

Enable MKLDNN Relu.

Fix a compilation error.

Change Makefile for MKLDNN.

Remove infer storage in convolution.

Update MXNet for MKLDNN.

Support MKLDNN storage type in python.

Update activation.

Add MKLDNN base classes.

Implement MKLDNN fully connected.

Add MKLDNN convolution.

Update MKLDNN interface in NDArray.

MKLDNN convolution handle CreateMKLDNNData failure.

Add another GetMKLDNNData in NDArray.

Have mkldnn define the data format.

Create output MKLDNN memory explicitly for FC.

Fix a bug in NDArray.

Fix a bug in GetWeightDesc.

Convert data layout if necessary in FC.

remove unnecessary print in MKLDNN convolution.

Add MKLDNN deconvolution.

Add MKLDNNStream to manage primitives and memories.

Use MKLDNNStream to register memory in NDArray.

Use MKLDNNStream to manage resources in operators.

Handle kAddTo in MKLDNN operators.

Fix a bug in deconvolution.

Fix bugs in NDArray.

Revert "Fix bugs in NDArray."

This reverts commit f5624a4.

Fix a bug in NDArray.

Fix a bug in NDArray.

Reorder MKLDNN memory to default format in SetTBlob.

Disable MKLDNN correctly.

Fix a bug in activation.

Reshape of NDArray supports MKLDNN.

Fix a memory ref bug in NDArray.

Reshape NDArray in MKLDNN FullyConnected.

Fix data format conversion.

Create MKLDNN NDArray in python.

Support Slice for MKLDNN NDArray.

Reduce the overhead of summing the result to the output array.

Avoid unnecessary memory copy in NDArray.

Fix a bug in data reordering.

Fix a bug in NDArray.

Don't hard code MKLDNN type.

Support dilation in MKLDNN convolution.

Fix a bug in sum results.

Rewrite GetMKLDNNData.

Add prepare_mkldnn.sh

Enable MKLDNN activation.

Fix a bug on FullyConnected.

Handle 3 dims for MKLDNN NDArray.

Fix a bug in MKLDNN FC.

Support MKLDNN storage in KV store.

Fix a bug in executor for non-default NDArray.

Fix a link error in cast_storage.cc.

Remove unnecessary function def

Fall back to default storage if the type isn't supported by MKLDNN.

Use NDArray for MKLDNN in python.

Reshape output of MKLDNN convolution.

Fix a bug in NDArray.

Support more operations in MKLDNN NDArray.

Fix a bug in deconvolution.

Fix bugs in MKLDNN deconvolution.

We still need to compute bias correctly.

Have elemwise binary ops fall back to default for MKLDNN.

Limit the cases that MKLDNN operations are called.

Force the layout of mkldnn::memory from NDArray.

Add MKLDNN softmax.

Fix output storage type of MKLDNN softmax.

Add MKLDNN sum.

Fix a bug in elemwise sum.

Fix a bug in MKLDNN softmax.

Fix a bug in imperative.

Clean up dispatch modes.

Remove redundant code.

MKLDNN Pooling Op integration

MKLDNN Pooling Op integration add missing file

fix mkldnn pooling op workspace issue

handle workspace in MKLDNN pooling correctly.

Use a non-MKLDNN op for testing.

Allow to share arguments and their gradients between executors.

Avoid using MKLDNN pooling when it's not supported.

Support MKLDNN properly.

Choose MKLDNN softmax more carefully.

Fix a bug in MKLDNN pooling.

Fall back if MKLDNN pooling isn't supported.

Fix a bug in Slice of NDArray.

Use int32 for workspace memory.

Exclude MKLDNN act with tanh.

Have two Reshape functions in NDArray.

Copy data for NDArray with diff shapes.

Add MKLDNN copy.

Add MKLDNN version of elemwise_add.

Add MKLDNN version of Flatten.

add mkldnn support for concat

simplify MKLDNN Flatten.

Enable MKLDNN deconvolution with bias.

Fix a bug in CuDNN deconvolution.

avoid using MKLDNNStorage when it's not defined.

Remove ./cudnn_lrn-inl.h

Fix for make lint.

add mkldnn support for concat

fix the coding style for the mkldnn concat PR

Only add input data for MKLDNN concat backward

Remove unnecessary TODO.

remove unnecessary __repr__ in MKLNDArray.

better condition check for readability.

Use macro when including mkldnn.hpp.

Revert "Use CoreOpRunner for refactored Ops."

This reverts commit a28586f.

Fix a bug in test core.

Limit MKLDNN ops being used.

Fix complaints from "make pylint"

Move ContainStorage to common/utils.h

Limit MKLDNN concat being used.

Add license.

Fix amalgamation

Fix compilation error in mkldnn_ops-inl.h

Fix a bug in deconvolution.

Fix a bug in pooling.

MKLDNN ops allocate temp mem.

Fix a bug in pooling.

Allocate align memory from temp space.

Have parameter gradients stored in the default storage.

Handle all cases in CopyFrom.

Ensure NDArray returns memory with right memory descriptors.

use auto to define memory in the operator.

Use raw pointer for mkldnn memory.

Move more code to mkldnn_base.cc

Fix a compilation error.

Address review comments.

fix a bug in activation backward.

Miss a macro in mkldnn_base.cc

Fix a bug in data iterator in examples.

Avoid memory allocation in ReshapeMKLDNN.

Avoid memory allocation in storage cast.

Fix a bug in cast storage.

Handle sliced MKLDNN NDArray.

Use memcpy if NDArray uses default format.

Revert "Limit MKLDNN ops being used."

This reverts commit 75e2ae5.

Ensure mkldnn act backward has the same input layout.

Fix a bug in mkldnn activation.

Use MKLDNN sum in more cases.

Improve perf of reorder.

Avoid memory reorder in conv and deconv.

Avoid unnecessary storage cast in fallback path.

Revert "Use MKLDNN sum in more cases."

This reverts commit 7a21ebc.

Handle sliced ndarray in more cases.

Fix a complaint from make lint.

Update Jenkins to test MKLDNN.

debug compiling mkldnn.

Use MKLDNN sum in more cases.

Add mkldnn as a submodule.

Compile with mkldnn in 3rdparty.

Fix some coding styles.

write the path to mkldnn lib in libmxnet.so.

use rpath with $ORIGIN.

Pack all lib files in Jenkins.

pack and unpack mxnet with MKLDNN.

Update Jenkinsfile

Update Jenkinsfile

Add mkldnn batch normalization

Fix bugs in BN.

Avoid memory allocation in MKLDNNCopy.

only use MKLDNN BatchNorm for special cases.

MKLDNN BatchNorm doesn't work well on the default layout.

Add MKL-DNN based LRN

Code Style Changes

Fix a bug in BN.

Fix a bug in LRN.

Handle non-default storage in memory plan.

Fix coding style.

Fix a compilation error without mkldnn.

Fix some coding styles for batch norm

Improve forward of convolution.

Add openmp and simd support to BN operator

Retrieve MKLDNN Conv primitive based on signature.

Retrieve Act primitive based on its signature.

Fix a bug in pooling.

Disable some MKLDNN activation and pooling.

Cast MKLDNN storage with diff data type.

Check if it's a view of NDArray.

Reshaped and sliced arrays share the same chunks.

Implement caching MKLDNN Act correctly.

Fix a bug in check_consistency.

Fix a potential bug when destroying NDArray.

Fix bugs when allocating mem in NDArray.

Fix coding style.

Add macro when using mkldnn in ndarray.

Fix a compilation error.

Fix a bug in concat.

Remove MKLDNNStorage.

handle diff layouts in CopyFromToDnsImpl.

Fallback correctly.

Force weight grad to use default layout.

Reorder weight arrays in (de)conv for faster inference.

Avoid caching TBlob from NDArray.

This commit may add some overhead of managing NDArray for each fallback.

Fix a bug in Flatten.

handle ndarray with default layout in mkldnn BN correctly.

Align to page when mkldnn is enabled.

Use default mem alloc for mkldnn.

Reuse NDArrays.

Support WriteInplace for sum.

fix complaints from "make lint".

Avoid reallocation in NDArray.

Handle weight arrays with special MKLDNN layouts.

Remove unnecessary GetWeights.

Fix compilation error without MKLDNN.

Fix a bug in (de)conv for weight arrays.

Fix a minor bug in MKLDNN conv.

Fix a bug in MKLDNNOpSignature.

Reimplement fallback for MKLDNN ops.

Fix a bug in FallbackExecutor.

Add params in hashcode.

Invalidate data in outputs to accelerate.

Fix a minor bug.

Update mkldnn_base-inl.h

Add primitive caching for Pooling forward computation

Add hashcode in pooling parameters.

Support NDArray copy with types unsupported by MKLDNN.

Avoid using MKLDNN concat for negative dimension.

Fix make lint complaint.

Disable mkldnn avg pooling for now.

Fix a compile warning.

Fix compile error when MKLDNN is disabled.

OP primitive cache: use memory as signature for MKLDNN storage type

Remove MKLDNN array in python.

Disable Clang tests in Jenkins.

Use mklml dockers to test mkldnn.

Update MKLDNN repo to zhengda's mkldnn repo.

Update MKLDNN repo to ashok's.

Fix a bug in fallback.

Change avg pooling algorithm to pooling_avg_include_padding

Fix a code style in mkldnn pooling.

Temp fix a bug in FC.

Revert "Disable Clang tests in Jenkins."

This reverts commit b4efa8f.

Rebase and Refactor deconv (#20)

* rebase to Da Zheng's refactor branch (Jan. 14); add signature for mkldnn Deconv and modify class MKLDNNDeconvForward

* fix make lint complaints

A simple way of caching BN inference.

cache BN forward for both training and inference.

Fix some minor problems in BN.

Fix a bug in caching BN.

force building with avx2 in Jenkins.

Remove the remaining MKLDNNStorageType

Some minor updates in NDArray.

a lot of updates to address comments.

minor changes.
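Many of the commits above ("Reorder MKLDNN memory to default format in SetTBlob", "Improve perf of reorder", "Avoid memory reorder in conv and deconv") deal with converting tensors between MKLDNN's blocked layouts (such as nChw8c) and MXNet's default NCHW layout. A rough NumPy sketch of what such a reorder does conceptually — the helper names `to_nchw8c`/`to_nchw` are illustrative only; the real library does this inside its reorder primitives:

```python
import numpy as np

def to_nchw8c(x, block=8):
    """Reorder a default NCHW tensor into a blocked nChw8c-style layout,
    where the channel dimension is split into blocks of `block`."""
    n, c, h, w = x.shape
    assert c % block == 0
    # (N, C) -> (N, C/8, 8), then move the channel block innermost:
    # (N, C/8, 8, H, W) -> (N, C/8, H, W, 8)
    return x.reshape(n, c // block, block, h, w).transpose(0, 1, 3, 4, 2)

def to_nchw(x_blocked):
    """Invert the blocked layout back to the default NCHW format."""
    n, cb, h, w, block = x_blocked.shape
    # Move the channel block back next to the outer channel dim, then merge.
    return x_blocked.transpose(0, 1, 4, 2, 3).reshape(n, cb * block, h, w)

x = np.arange(2 * 16 * 4 * 4, dtype=np.float32).reshape(2, 16, 4, 4)
roundtrip = to_nchw(to_nchw8c(x))
```

The round trip is lossless, which is why falling back to a non-MKLDNN operator (as several commits above do) only costs a reorder, not correctness.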
zheng-da committed Jan 17, 2018
1 parent 4f67086 commit 6181a55
Showing 71 changed files with 5,911 additions and 606 deletions.
4 changes: 4 additions & 0 deletions .gitmodules
@@ -22,3 +22,7 @@
[submodule "3rdparty/googletest"]
path = 3rdparty/googletest
url = https://github.com/google/googletest.git
[submodule "3rdparty/mkldnn"]
path = 3rdparty/mkldnn
url = https://github.com/ashokei/mkl-dnn.git
branch = master
1 change: 1 addition & 0 deletions 3rdparty/mkldnn
Submodule mkldnn added at e9ef04
55 changes: 27 additions & 28 deletions Jenkinsfile
@@ -6,6 +6,7 @@
mx_lib = 'lib/libmxnet.so, lib/libmxnet.a, dmlc-core/libdmlc.a, nnvm/lib/libnnvm.a'
// mxnet cmake libraries, in cmake builds we do not produce a libnvvm static library by default.
mx_cmake_lib = 'build/libmxnet.so, build/libmxnet.a, build/dmlc-core/libdmlc.a'
mx_mkldnn_lib = 'lib/libmxnet.so, lib/libmxnet.a, lib/libiomp5.so, lib/libmklml_gnu.so, lib/libmkldnn.so, lib/libmkldnn.so.0, lib/libmklml_intel.so, dmlc-core/libdmlc.a, nnvm/lib/libnnvm.a'
// command to start a docker container
docker_run = 'tests/ci_build/ci_build.sh'
// timeout in minutes
@@ -143,15 +144,15 @@ def python3_gpu_ut(docker_type) {
}

// Python 2
def python2_mklml_ut(docker_type) {
def python2_mkldnn_ut(docker_type) {
timeout(time: max_time, unit: 'MINUTES') {
sh "${docker_run} ${docker_type} find . -name '*.pyc' -type f -delete"
sh "${docker_run} ${docker_type} PYTHONPATH=./python/ nosetests-2.7 --with-timer --verbose tests/python/cpu"
}
}

// Python 3
def python3_mklml_ut(docker_type) {
def python3_mkldnn_ut(docker_type) {
timeout(time: max_time, unit: 'MINUTES') {
sh "${docker_run} ${docker_type} find . -name '*.pyc' -type f -delete"
sh "${docker_run} ${docker_type} PYTHONPATH=./python/ nosetests-3.4 --with-timer --verbose tests/python/cpu"
@@ -225,21 +226,20 @@ try {
}
}
},
'CPU: MKLML': {
'CPU: MKLDNN': {
node('mxnetlinux-cpu') {
ws('workspace/build-mklml-cpu') {
ws('workspace/build-mkldnn-cpu') {
init_git()
def flag = """ \
DEV=1 \
USE_PROFILER=1 \
USE_CPP_PACKAGE=1 \
USE_BLAS=openblas \
USE_MKL2017=1 \
USE_MKL2017_EXPERIMENTAL=1 \
USE_MKLDNN=1 \
-j\$(nproc)
"""
make("cpu_mklml", flag)
pack_lib('mklml_cpu')
pack_lib('mkldnn_cpu', mx_mkldnn_lib)
}
}
},
@@ -260,24 +260,23 @@
}
}
},
'GPU: MKLML': {
'GPU: MKLDNN': {
node('mxnetlinux-cpu') {
ws('workspace/build-mklml-gpu') {
ws('workspace/build-mkldnn-gpu') {
init_git()
def flag = """ \
DEV=1 \
USE_PROFILER=1 \
USE_CPP_PACKAGE=1 \
USE_BLAS=openblas \
USE_MKL2017=1 \
USE_MKL2017_EXPERIMENTAL=1 \
USE_MKLDNN=1 \
USE_CUDA=1 \
USE_CUDA_PATH=/usr/local/cuda \
USE_CUDNN=1 \
-j\$(nproc)
"""
make("build_cuda", flag)
pack_lib('mklml_gpu')
pack_lib('mkldnn_gpu', mx_mkldnn_lib)
}
}
},
@@ -424,43 +423,43 @@
}
}
},
'Python2: MKLML-CPU': {
'Python2: MKLDNN-CPU': {
node('mxnetlinux-cpu') {
ws('workspace/ut-python2-mklml-cpu') {
ws('workspace/ut-python2-mkldnn-cpu') {
init_git()
unpack_lib('mklml_cpu')
unpack_lib('mkldnn_cpu', mx_mkldnn_lib)
python2_ut('cpu_mklml')
python2_mklml_ut('cpu_mklml')
python2_mkldnn_ut('cpu_mklml')
}
}
},
'Python2: MKLML-GPU': {
'Python2: MKLDNN-GPU': {
node('mxnetlinux-gpu') {
ws('workspace/ut-python2-mklml-gpu') {
ws('workspace/ut-python2-mkldnn-gpu') {
init_git()
unpack_lib('mklml_gpu')
unpack_lib('mkldnn_gpu', mx_mkldnn_lib)
python2_gpu_ut('gpu_mklml')
python2_mklml_ut('gpu_mklml')
python2_mkldnn_ut('gpu_mklml')
}
}
},
'Python3: MKLML-CPU': {
'Python3: MKLDNN-CPU': {
node('mxnetlinux-cpu') {
ws('workspace/ut-python3-mklml-cpu') {
ws('workspace/ut-python3-mkldnn-cpu') {
init_git()
unpack_lib('mklml_cpu')
unpack_lib('mkldnn_cpu', mx_mkldnn_lib)
python3_ut('cpu_mklml')
python3_mklml_ut('cpu_mklml')
python3_mkldnn_ut('cpu_mklml')
}
}
},
'Python3: MKLML-GPU': {
'Python3: MKLDNN-GPU': {
node('mxnetlinux-gpu') {
ws('workspace/ut-python3-mklml-gpu') {
ws('workspace/ut-python3-mkldnn-gpu') {
init_git()
unpack_lib('mklml_gpu')
unpack_lib('mkldnn_gpu', mx_mkldnn_lib)
python3_gpu_ut('gpu_mklml')
python3_mklml_ut('gpu_mklml')
python3_mkldnn_ut('gpu_mklml')
}
}
},
44 changes: 21 additions & 23 deletions Makefile
@@ -42,11 +42,11 @@ endif
# use customized config file
include $(config)

ifeq ($(USE_MKL2017), 1)
# must run ./prepare_mkl before including mshadow.mk
RETURN_STRING := $(shell ./prepare_mkl.sh $(MKLML_ROOT))
MKLROOT := $(firstword $(RETURN_STRING))
export USE_MKLML = $(lastword $(RETURN_STRING))
ifeq ($(USE_MKLDNN), 1)
RETURN_STRING := $(shell ./prepare_mkldnn.sh $(MKLDNN_ROOT))
MKLDNNROOT := $(firstword $(RETURN_STRING))
MKLROOT := $(lastword $(RETURN_STRING))
export USE_MKLML = 1
endif

include mshadow/make/mshadow.mk
@@ -114,23 +114,20 @@ ifeq ($(USE_NNPACK), 1)
LDFLAGS += -lnnpack
endif

ifeq ($(USE_MKL2017), 1)
CFLAGS += -DMXNET_USE_MKL2017=1
ifeq ($(USE_MKLDNN), 1)
CFLAGS += -DMXNET_USE_MKLDNN=1
CFLAGS += -DUSE_MKL=1
CFLAGS += -I$(ROOTDIR)/src/operator/mkl/
CFLAGS += -I$(MKLML_ROOT)/include
LDFLAGS += -L$(MKLML_ROOT)/lib
ifeq ($(USE_MKL2017_EXPERIMENTAL), 1)
CFLAGS += -DMKL_EXPERIMENTAL=1
else
CFLAGS += -DMKL_EXPERIMENTAL=0
endif
ifeq ($(UNAME_S), Darwin)
LDFLAGS += -lmklml
else
LDFLAGS += -Wl,--as-needed -lmklml_intel -lmklml_gnu
CFLAGS += -I$(ROOTDIR)/src/operator/nn/mkldnn/
ifneq ($(MKLDNNROOT), $(MKLROOT))
CFLAGS += -I$(MKLROOT)/include
LDFLAGS += -L$(MKLROOT)/lib
endif
LDFLAGS += -liomp5
CFLAGS += -I$(MKLDNNROOT)/include
LDFLAGS += -L$(MKLDNNROOT)/lib -lmkldnn -Wl,-rpath,'$${ORIGIN}'
endif

ifeq ($(BN_DEBUG), 1)
CFLAGS += -DMXNET_BN_DEBUG=1
endif

ifeq ($(USE_OPERATOR_TUNING), 1)
@@ -144,7 +141,7 @@ endif
# - for Ubuntu, installing atlas will not automatically install the atlas provided lapack library
# silently switching lapack off instead of letting the build fail because of backward compatibility
ifeq ($(USE_LAPACK), 1)
ifeq ($(USE_BLAS),$(filter $(USE_BLAS),blas openblas atlas))
ifeq ($(USE_BLAS),$(filter $(USE_BLAS),blas openblas atlas mkl))
ifeq (,$(wildcard /lib/liblapack.a))
ifeq (,$(wildcard /usr/lib/liblapack.a))
ifeq (,$(wildcard /usr/lib64/liblapack.a))
@@ -162,7 +159,7 @@ ifeq ($(USE_LAPACK), 1)
ifneq ($(USE_LAPACK_PATH), )
LDFLAGS += -L$(USE_LAPACK_PATH)
endif
ifeq ($(USE_BLAS),$(filter $(USE_BLAS),blas openblas atlas))
ifeq ($(USE_BLAS),$(filter $(USE_BLAS),blas openblas atlas mkl))
LDFLAGS += -llapack
endif
CFLAGS += -DMXNET_USE_LAPACK
@@ -552,7 +549,8 @@ clean: cyclean $(EXTRA_PACKAGES_CLEAN)
else
clean: cyclean testclean $(EXTRA_PACKAGES_CLEAN)
$(RM) -r build lib bin *~ */*~ */*/*~ */*/*/*~ R-package/NAMESPACE R-package/man R-package/R/mxnet_generated.R \
R-package/inst R-package/src/image_recordio.h R-package/src/*.o R-package/src/*.so mxnet_*.tar.gz
R-package/inst R-package/src/image_recordio.h R-package/src/*.o R-package/src/*.so mxnet_*.tar.gz \
external/mkldnn/install/*
cd $(DMLC_CORE); $(MAKE) clean; cd -
cd $(PS_PATH); $(MAKE) clean; cd -
cd $(NNVM_PATH); $(MAKE) clean; cd -
2 changes: 1 addition & 1 deletion amalgamation/mxnet_predict0.cc
@@ -66,7 +66,7 @@
#include "src/operator/operator_util.cc"
#include "src/operator/nn/activation.cc"
#include "src/operator/nn/batch_norm.cc"
#include "src/operator/concat.cc"
#include "src/operator/nn/concat.cc"
#include "src/operator/nn/convolution.cc"
#include "src/operator/nn/deconvolution.cc"
#include "src/operator/nn/dropout.cc"
3 changes: 2 additions & 1 deletion example/image-classification/common/data.py
@@ -112,7 +112,8 @@ def get_rec_iter(args, kv=None):
image_shape = tuple([int(l) for l in args.image_shape.split(',')])
if 'benchmark' in args and args.benchmark:
data_shape = (args.batch_size,) + image_shape
train = SyntheticDataIter(args.num_classes, data_shape, 500, np.float32)
train = SyntheticDataIter(args.num_classes, data_shape,
args.num_examples / args.batch_size, np.float32)
return (train, None)
if kv:
(rank, nworker) = (kv.rank, kv.num_workers)
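The data.py hunk above replaces a hardcoded 500 batches with `args.num_examples / args.batch_size`. A hedged side note, not part of the patch: in Python 3 true division returns a float, so if the iterator expects an integer batch count, floor division is the safer form. The variable names below stand in for the `args` fields and the values are illustrative:

```python
# Sketch of the epoch-size computation in the patched get_rec_iter.
num_examples = 50000   # stands in for args.num_examples
batch_size = 128       # stands in for args.batch_size

# True division (as written in the patch) yields a float in Python 3:
epoch_size = num_examples / batch_size          # 390.625

# Floor division yields the integer batch count most iterators expect:
batches_per_epoch = num_examples // batch_size  # 390
```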