Mkldnn #2

Closed. Wants to merge 75 commits.
Changes from all commits (75 commits)
a10bc72
Use NNVM for convolution.
zheng-da Sep 27, 2017
cf6293f
Fix bugs in CuDNN convolution.
zheng-da Sep 27, 2017
cba26a4
Use NNVM for activation.
zheng-da Sep 27, 2017
f7a9e72
Rename the input macro for convolution.
zheng-da Sep 27, 2017
dcafbb3
Use NNVM for batch norm.
zheng-da Sep 28, 2017
94143ab
Use NNVM for FullyConnected.
zheng-da Sep 28, 2017
ecb33ea
Use NNVM for SoftmaxActivation.
zheng-da Sep 28, 2017
d45e0d3
Use NNVM for dropout.
zheng-da Sep 29, 2017
b73e762
Use NNVM for Pooling.
zheng-da Sep 30, 2017
df581ef
Use NNVM for Deconvolution and Upsampling.
zheng-da Oct 2, 2017
8491f72
Fix a bug in deconvolution.
zheng-da Oct 2, 2017
ac317cd
Handle aux states in batch norm.
zheng-da Oct 8, 2017
672ce33
Fix GPU versions of the operators.
zheng-da Oct 13, 2017
a5cadd4
Fix coding style.
zheng-da Oct 16, 2017
54f699e
Fix bugs in CuDNN convolution.
zheng-da Oct 17, 2017
9147a51
Fix bugs in other CuDNN operators.
zheng-da Oct 17, 2017
10ae5f0
enable depthwise convolution.
zheng-da Oct 17, 2017
2279f93
Move CuDNN code to src/operator/nn/cudnn
zheng-da Oct 17, 2017
b713524
Fix a bug in convolution.
zheng-da Oct 17, 2017
2a7d71e
Remove MKL code.
zheng-da Oct 20, 2017
cfd862b
Update MXNet for MKLDNN.
zheng-da Oct 21, 2017
063b504
Enable MKLDNN Relu.
zheng-da Oct 24, 2017
f1431e9
Merge remote-tracking branch 'mxnet_master/master' into mkldnn
zheng-da Oct 24, 2017
756ec14
Change Makefile for MKLDNN.
zheng-da Oct 24, 2017
604300c
Temporarily disable part of dropout.
zheng-da Oct 25, 2017
f1f27d1
Remove infer storage in convolution.
zheng-da Oct 25, 2017
df73da1
Update MXNet for MKLDNN.
zheng-da Oct 25, 2017
6da4528
Support MKLDNN storage type in python.
zheng-da Oct 31, 2017
1b3e210
Update activation.
zheng-da Oct 25, 2017
b2a8b60
Add MKLDNN base classes.
zheng-da Oct 26, 2017
5f52ccb
Implement MKLDNN fully connected.
zheng-da Oct 28, 2017
7c7fe62
Add MKLDNN convolution.
zheng-da Oct 31, 2017
7c2fb77
Update MKLDNN interface in NDArray.
zheng-da Nov 2, 2017
2b58bfc
MKLDNN convolution handle CreateMKLDNNData failure.
zheng-da Nov 2, 2017
560eb0d
Add another GetMKLDNNData in NDArray.
zheng-da Nov 3, 2017
92f58c5
Have mkldnn to define the data format.
zheng-da Nov 3, 2017
5128cb4
Create output MKLDNN memory explicitly for FC.
zheng-da Nov 3, 2017
9dd1b02
Fix a bug in NDArray.
zheng-da Nov 3, 2017
bf8b782
Fix a bug in GetWeightDesc.
zheng-da Nov 3, 2017
4b511c8
Convert data layout if necessary in FC.
zheng-da Nov 3, 2017
9e42bd4
remove unnecessary print in MKLDNN convolution.
zheng-da Nov 3, 2017
ed0e5d4
Add MKLDNN deconvolution.
zheng-da Nov 2, 2017
a211fe3
Add MKLDNNStream to manage primitives and memories.
zheng-da Nov 6, 2017
b4dd48b
Use MKLDNNStream to register memory in NDArray.
zheng-da Nov 6, 2017
13fcb9b
Use MKLDNNStream to manage resources in operators.
zheng-da Nov 6, 2017
beb8505
Handle kAddTo in MKLDNN operators.
zheng-da Nov 7, 2017
cd53fb4
Fix a bug in deconvolution.
zheng-da Nov 7, 2017
f5624a4
Fix bugs in NDArray.
zheng-da Nov 7, 2017
40c6e42
Revert "Fix bugs in NDArray."
zheng-da Nov 7, 2017
62655dc
Fix a bug in NDArray.
zheng-da Nov 7, 2017
8335d27
Fix a bug in NDArray.
zheng-da Nov 7, 2017
131a141
Reorder MKLDNN memory to default format in SetTBlob.
zheng-da Nov 8, 2017
61ac839
Disable MKLDNN correctly.
zheng-da Nov 8, 2017
64cf57c
Fix a bug in activation.
zheng-da Nov 8, 2017
4a2a98b
Reshape of NDArray supports MKLDNN.
zheng-da Nov 8, 2017
8d5ad60
Fix a memory ref bug in NDArray.
zheng-da Nov 8, 2017
e83c9c0
Reshape NDArray in MKLDNN FullyConnected.
zheng-da Nov 8, 2017
97a6910
Fix data format conversion.
zheng-da Nov 8, 2017
f87d8b9
Create MKLDNN NDArray in python.
zheng-da Nov 10, 2017
1b97bc7
Support Slice for MKLDNN NDArray.
zheng-da Nov 11, 2017
3cad7c9
Reduce the overhead of summing the result to the output array.
zheng-da Nov 11, 2017
0044a9a
Avoid unnecessary memory copy in NDArray.
zheng-da Nov 11, 2017
1494a44
Fix a bug in data reordering.
zheng-da Nov 13, 2017
8f7da06
Fix a bug in NDArray.
zheng-da Nov 14, 2017
ac06afe
Don't hard code MKLDNN type.
zheng-da Nov 14, 2017
ca6b1f7
Support dilation in MKLDNN convolution.
zheng-da Nov 16, 2017
39cc06a
Merge branch 'mkldnn' of https://github.com/zheng-da/incubator-mxnet …
zheng-da Nov 16, 2017
ee28ebe
Fix a bug in sum results.
zheng-da Nov 16, 2017
70d5b75
Rewrite GetMKLDNNData.
zheng-da Nov 16, 2017
4996db5
Add prepare_mkldnn.sh
zheng-da Nov 16, 2017
40841b6
Enable MKLDNN activation.
zheng-da Nov 16, 2017
10412f5
Fix a bug on FullyConnected.
zheng-da Nov 17, 2017
2819797
Handle 3 dims for MKLDNN NDArray.
zheng-da Nov 17, 2017
2832750
add mkldnn_concat.cc
wentingj Dec 4, 2017
94e4dcc
declare concat func
wentingj Dec 4, 2017
Makefile (44 changes: 19 additions, 25 deletions)
@@ -40,11 +40,11 @@ endif
# use customized config file
include $(config)

ifeq ($(USE_MKL2017), 1)
# must run ./prepare_mkl before including mshadow.mk
RETURN_STRING := $(shell ./prepare_mkl.sh $(MKLML_ROOT))
MKLROOT := $(firstword $(RETURN_STRING))
export USE_MKLML = $(lastword $(RETURN_STRING))
ifeq ($(USE_MKLDNN), 1)
RETURN_STRING := $(shell ./prepare_mkldnn.sh $(MKLDNN_ROOT))
MKLDNNROOT := $(firstword $(RETURN_STRING))
MKLROOT := $(lastword $(RETURN_STRING))
export USE_MKLML = 1
endif

include mshadow/make/mshadow.mk
@@ -112,23 +112,16 @@ ifeq ($(USE_NNPACK), 1)
LDFLAGS += -lnnpack
endif

ifeq ($(USE_MKL2017), 1)
CFLAGS += -DMXNET_USE_MKL2017=1
ifeq ($(USE_MKLDNN), 1)
CFLAGS += -DMXNET_USE_MKLDNN=1
CFLAGS += -DUSE_MKL=1
CFLAGS += -I$(ROOTDIR)/src/operator/mkl/
CFLAGS += -I$(MKLML_ROOT)/include
LDFLAGS += -L$(MKLML_ROOT)/lib
ifeq ($(USE_MKL2017_EXPERIMENTAL), 1)
CFLAGS += -DMKL_EXPERIMENTAL=1
else
CFLAGS += -DMKL_EXPERIMENTAL=0
endif
ifeq ($(UNAME_S), Darwin)
LDFLAGS += -lmklml
else
LDFLAGS += -Wl,--as-needed -lmklml_intel -lmklml_gnu
CFLAGS += -I$(ROOTDIR)/src/operator/nn/mkldnn/
ifneq ($(MKLDNNROOT), $(MKLROOT))
CFLAGS += -I$(MKLROOT)/include
LDFLAGS += -L$(MKLROOT)/lib
endif
LDFLAGS += -liomp5
CFLAGS += -I$(MKLDNNROOT)/include
LDFLAGS += -L$(MKLDNNROOT)/lib -lmkldnn
endif

# verify existence of separate lapack library when using blas/openblas/atlas
@@ -138,7 +131,7 @@ endif
# - for Ubuntu, installing atlas will not automatically install the atlas provided lapack library
# silently switching lapack off instead of letting the build fail because of backward compatibility
ifeq ($(USE_LAPACK), 1)
ifeq ($(USE_BLAS),$(filter $(USE_BLAS),blas openblas atlas))
ifeq ($(USE_BLAS),$(filter $(USE_BLAS),blas openblas atlas mkl))
ifeq (,$(wildcard /lib/liblapack.a))
ifeq (,$(wildcard /usr/lib/liblapack.a))
ifeq (,$(wildcard $(USE_LAPACK_PATH)/liblapack.a))
@@ -154,7 +147,7 @@ ifeq ($(USE_LAPACK), 1)
ifneq ($(USE_LAPACK_PATH), )
LDFLAGS += -L$(USE_LAPACK_PATH)
endif
ifeq ($(USE_BLAS),$(filter $(USE_BLAS),blas openblas atlas))
ifeq ($(USE_BLAS),$(filter $(USE_BLAS),blas openblas atlas mkl))
LDFLAGS += -llapack
endif
CFLAGS += -DMXNET_USE_LAPACK
@@ -280,9 +273,9 @@ endif

all: lib/libmxnet.a lib/libmxnet.so $(BIN) extra-packages

SRC = $(wildcard src/*/*/*.cc src/*/*.cc src/*.cc)
SRC = $(wildcard src/*/*/*/*.cc src/*/*/*.cc src/*/*.cc src/*.cc)
OBJ = $(patsubst %.cc, build/%.o, $(SRC))
CUSRC = $(wildcard src/*/*/*.cu src/*/*.cu src/*.cu)
CUSRC = $(wildcard src/*/*/*/*.cu src/*/*/*.cu src/*/*.cu src/*.cu)
CUOBJ = $(patsubst %.cu, build/%_gpu.o, $(CUSRC))

# extra operators
@@ -521,7 +514,8 @@ clean: cyclean $(EXTRA_PACKAGES_CLEAN)
else
clean: cyclean testclean $(EXTRA_PACKAGES_CLEAN)
$(RM) -r build lib bin *~ */*~ */*/*~ */*/*/*~ R-package/NAMESPACE R-package/man R-package/R/mxnet_generated.R \
R-package/inst R-package/src/image_recordio.h R-package/src/*.o R-package/src/*.so mxnet_*.tar.gz
R-package/inst R-package/src/image_recordio.h R-package/src/*.o R-package/src/*.so mxnet_*.tar.gz \
external/mkldnn/install/*
cd $(DMLC_CORE); $(MAKE) clean; cd -
cd $(PS_PATH); $(MAKE) clean; cd -
cd $(NNVM_PATH); $(MAKE) clean; cd -
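The new USE_MKLDNN block replaces the old USE_MKL2017 path: prepare_mkldnn.sh resolves MKLDNNROOT and MKLROOT, and the build adds -DMXNET_USE_MKLDNN=1 together with the MKLDNN include and library paths. A minimal sketch of how C++ sources can key off that macro, matching the guards used in ndarray.h below; the helper and main function here are hypothetical, and only the MXNET_USE_MKLDNN macro and mkldnn.hpp come from this PR:

#include <cstdio>

#if MXNET_USE_MKLDNN == 1
#include <mkldnn.hpp>  // found via -I$(MKLDNNROOT)/include added by the Makefile change
#endif

// Hypothetical helper: reports whether this build was compiled with MKLDNN enabled.
inline bool BuiltWithMKLDNN() {
#if MXNET_USE_MKLDNN == 1
  return true;   // compiled only when the Makefile sets USE_MKLDNN=1
#else
  return false;  // plain CPU build
#endif
}

int main() {
  std::printf("MKLDNN support: %s\n", BuiltWithMKLDNN() ? "yes" : "no");
  return 0;
}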
include/mxnet/ndarray.h (151 changes: 49 additions, 102 deletions)
@@ -34,12 +34,12 @@
#include <map>
#include <string>
#include <memory>
#if MXNET_USE_MKLDNN == 1
#include <mkldnn.hpp>
#endif
#include "./base.h"
#include "./storage.h"
#include "./engine.h"
#if MKL_EXPERIMENTAL == 1
#include <mkl_memory.h>
#endif
// check c++11
#if DMLC_USE_CXX11 == 0
#error "cxx11 was required for ndarray module"
@@ -60,8 +60,12 @@ enum NDArrayStorageType {
kDefaultStorage, // dense
kRowSparseStorage, // row sparse
kCSRStorage, // csr
#if MXNET_USE_MKLDNN == 1
kMKLDNNStorage, // MKLDNN
#endif
};

class MKLDNNMemory;

/*!
* \brief ndarray interface
@@ -70,9 +74,6 @@ class NDArray {
public:
/*! \brief default constructor */
NDArray() {
#if MKL_EXPERIMENTAL == 1
Mkl_mem_ = MKLMemHolder::create();
#endif
}
/*!
* \brief constructs a new dynamic NDArray
@@ -86,56 +87,14 @@
: ptr_(std::make_shared<Chunk>(shape, ctx, delay_alloc, dtype)),
shape_(shape), dtype_(dtype), storage_type_(kDefaultStorage),
entry_({nullptr, 0, 0}) {
#if MKL_EXPERIMENTAL == 1
Mkl_mem_ = std::make_shared<MKLMemHolder>();
#endif
}
/*! \brief constructor for NDArray with storage type
*/
NDArray(const NDArrayStorageType stype, const TShape &shape, Context ctx,
bool delay_alloc = true, int dtype = mshadow::default_type_flag,
std::vector<int> aux_types = {}, std::vector<TShape> aux_shapes = {},
TShape storage_shape = TShape(mshadow::Shape1(0)))
: shape_(shape), dtype_(dtype), storage_type_(stype),
entry_({nullptr, 0, 0}) {
// Assign default aux types if not given
if (aux_types.size() == 0) {
if (stype == kRowSparseStorage) {
aux_types = {mshadow::kInt64};
} else if (stype == kCSRStorage) {
aux_types = {mshadow::kInt64, mshadow::kInt64};
} else {
LOG(FATAL) << "Unknown storage type " << stype;
}
}
// Assign default shapes if not given
// unknown shapes are intialized as {0} such that Size() would return 0
if (aux_shapes.size() == 0) {
if (stype == kRowSparseStorage) {
aux_shapes = {TShape(mshadow::Shape1(0))};
} else if (stype == kCSRStorage) {
// aux shapes for indptr and indices
aux_shapes = {TShape(mshadow::Shape1(0)), TShape(mshadow::Shape1(0))};
} else {
LOG(FATAL) << "Unknown storage type " << stype;
}
}
if (storage_shape.Size() == 0) {
if (stype == kRowSparseStorage) {
storage_shape = shape;
storage_shape[0] = aux_shapes[rowsparse::kIdx][0];
} else if (stype == kCSRStorage) {
storage_shape = aux_shapes[csr::kIdx];
} else {
LOG(FATAL) << "Unknown storage type " << stype;
}
}
ptr_ = std::make_shared<Chunk>(stype, storage_shape, ctx, delay_alloc,
dtype, aux_types, aux_shapes);
#if MKL_EXPERIMENTAL == 1
Mkl_mem_ = std::make_shared<MKLMemHolder>();
#endif
}
TShape storage_shape = TShape(mshadow::Shape1(0)));

/*!
* \brief constructing a static NDArray that shares data with TBlob
* Use with caution: allocate ONLY ONE NDArray for each TBlob,
@@ -147,9 +106,6 @@ class NDArray {
: ptr_(std::make_shared<Chunk>(data, dev_id)), shape_(data.shape_),
dtype_(data.type_flag_), storage_type_(kDefaultStorage),
entry_({nullptr, 0, 0}) {
#if MKL_EXPERIMENTAL == 1
Mkl_mem_ = std::make_shared<MKLMemHolder>();
#endif
}

/*!
@@ -166,9 +122,6 @@
const TBlob &data, const std::vector<TBlob> &aux_data, int dev_id)
: ptr_(std::make_shared<Chunk>(stype, data, aux_data, dev_id)), shape_(shape),
dtype_(data.type_flag_), storage_type_(stype), entry_({nullptr, 0, 0}) {
#if MKL_EXPERIMENTAL == 1
Mkl_mem_ = std::make_shared<MKLMemHolder>();
#endif
}


@@ -253,9 +206,6 @@ class NDArray {
<< "Unexpected storage type: " << stype;
res = TBlob(dptr, shape, ptr_->aux_handles[i].ctx.dev_mask(), type);
});
#if MKL_EXPERIMENTAL == 1
res.Mkl_mem_ = Mkl_mem_;
#endif
return res;
}
/*!
@@ -497,12 +447,6 @@ class NDArray {
CHECK_GE(ptr_->shandle.size,
shape.Size() * mshadow::mshadow_sizeof(dtype))
<< "NDArray.AsArray: target memory size is bigger";
#if MKL_EXPERIMENTAL == 1
if (Mkl_mem_ != nullptr) {
// convert prv to cpu
Mkl_mem_->check_and_prv_to_cpu(ptr_->shandle.dptr);
}
#endif
NDArray ret = *this;
ret.shape_ = shape;
ret.dtype_ = dtype;
@@ -574,6 +518,31 @@ class NDArray {
<< "CheckAndAllocAuxData is not intended for kDefaultStorage";
ptr_->CheckAndAllocAuxData(i, aux_shape);
}

#if MXNET_USE_MKLDNN == 1
/*
* This function returns mkldnn::memory with the default primitive_desc.
*/
std::shared_ptr<const mkldnn::memory> GetMKLDNNData() const;
/*
* This function returns mkldnn::memory with the given primitive_desc
* as long as the array size meets the required size in the given primitive_desc.
*/
std::shared_ptr<const mkldnn::memory> GetMKLDNNData(
const mkldnn::memory::primitive_desc &desc) const;
/*
* This function returns mkldnn::memory with the given primitive_desc.
* The returned mkldnn::memory will have the same physical layout as
* the given primitive_desc.
*/
std::shared_ptr<const mkldnn::memory> GetMKLDNNDataReorder(
const mkldnn::memory::primitive_desc &desc) const;

void CopyFrom(const mkldnn::memory &mem);
std::shared_ptr<mkldnn::memory> CreateMKLDNNData(
const mkldnn::memory::primitive_desc &desc);
#endif

/*!
* \brief Save list of ndarray into the Stream.x
* \param fo The stream of output.
@@ -608,6 +577,12 @@ class NDArray {
for csr, aux_handles[0] = indptr, aux_handles[1] = indices
*/
std::vector<Storage::Handle> aux_handles;

#if MXNET_USE_MKLDNN == 1
/*! This is created when data is stored in MKLDNN format.
*/
std::shared_ptr<mkldnn::memory> Mkl_mem_;
#endif
/*! \brief variable from engine */
Engine::VarHandle var;
/*!
@@ -774,20 +749,14 @@ class NDArray {
// storage shape is also updated
// if data is already allocated, try reuse the storage. Otherwise, free the current one
// and allocate new storage
inline void CheckAndAllocData(const TShape &shape, int dtype) {
CHECK_NE(aux_shapes.size(), 0) << "data is expected to be allocated after aux_data";
auto dbytes = shape.Size() * mshadow::mshadow_sizeof(dtype);
if (shandle.size < dbytes) {
// free storage if necessary and alloc again
if (shandle.size > 0) Storage::Get()->Free(shandle);
// init storage
shandle = Storage::Get()->Alloc(dbytes, ctx);
}
// init shape
storage_shape = shape;
// delay_alloc is only set when data storage handle is present
delay_alloc = false;
}
void CheckAndAllocData(const TShape &shape, int dtype);

#if MXNET_USE_MKLDNN == 1
// Have MKL memory reference to the data in the default storage
// or create memory for MKLDNN.
void SetMKLMem(const TShape &shape, int dtype);
#endif

// create storage handle for aux data based on shape
// this function assumes ctx, aux shapes and aux types are set
// aux shape is also updated
@@ -828,30 +797,8 @@ class NDArray {
}
}; // struct Chunk

void SetTBlob() const {
CHECK(ptr_ != nullptr);
TShape shape = shape_;
char *dptr = static_cast<char*>(ptr_->shandle.dptr);
auto stype = storage_type();
if (stype == kDefaultStorage) {
dptr += byte_offset_;
} else if (stype == kCSRStorage || stype == kRowSparseStorage) {
shape = storage_shape();
} else {
LOG(FATAL) << "unknown storage type " << stype;
}
tblob_.dptr_ = dptr;
tblob_.shape_ = shape;
tblob_.type_flag_ = dtype_;
tblob_.SetDLTensor(ptr_->shandle.ctx.dev_mask(), ptr_->shandle.ctx.dev_id);
#if MKL_EXPERIMENTAL == 1
tblob_.Mkl_mem_ = Mkl_mem_;
#endif
}
void SetTBlob() const;

#if MKL_EXPERIMENTAL == 1
std::shared_ptr<MKLMemHolder> Mkl_mem_;
#endif
/*! \brief internal data of NDArray */
std::shared_ptr<Chunk> ptr_{nullptr};
/*! \brief shape of current NDArray */
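For orientation, here is a minimal sketch of how an MKLDNN-aware operator might call the accessors this diff adds to NDArray (GetMKLDNNData, GetMKLDNNDataReorder, CreateMKLDNNData, CopyFrom). The function name, parameters, and the elided primitive setup are hypothetical; only the NDArray method signatures come from the header above:

#include <memory>
#include <mkldnn.hpp>
#include <mxnet/ndarray.h>

// Hypothetical forward helper showing the intended call pattern.
void ExampleForward(const mxnet::NDArray &data, const mxnet::NDArray &weight,
                    mxnet::NDArray *out,
                    const mkldnn::memory::primitive_desc &weight_pd,
                    const mkldnn::memory::primitive_desc &out_pd) {
  // Input data described by its default primitive_desc.
  std::shared_ptr<const mkldnn::memory> data_mem = data.GetMKLDNNData();
  // Weights reordered, if necessary, into the layout the primitive expects.
  std::shared_ptr<const mkldnn::memory> weight_mem =
      weight.GetMKLDNNDataReorder(weight_pd);
  // Output memory created (or reused) with the requested primitive_desc.
  // Per commit 2b58bfc this can fail; a caller may then compute into a
  // temporary mkldnn::memory and copy it back with out->CopyFrom(...).
  std::shared_ptr<mkldnn::memory> out_mem = out->CreateMKLDNNData(out_pd);
  // ... build the MKLDNN primitive over data_mem/weight_mem/out_mem and submit
  // it through the MKLDNNStream introduced elsewhere in this PR ...
  (void)data_mem; (void)weight_mem; (void)out_mem;
}

The split between the two getters mirrors the comments in the header: GetMKLDNNData returns memory described by the default (or a size-compatible given) primitive_desc, while GetMKLDNNDataReorder guarantees the physical layout of the primitive_desc passed in.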