diff --git a/docs/static_site/src/pages/api/faq/caffe.md b/docs/static_site/src/pages/api/faq/caffe.md
index f05b9087aca4..147ffd7bc428 100644
--- a/docs/static_site/src/pages/api/faq/caffe.md
+++ b/docs/static_site/src/pages/api/faq/caffe.md
@@ -26,65 +26,11 @@ permalink: /api/faq/caffe
 
 Key topics covered include the following:
 
-- [Converting Caffe trained models to MXNet](#converting-caffe-trained-models-to-mxnet)
 - [Calling Caffe operators in MXNet](#calling-caffe-operators-in-mxnet)
 
-## Converting Caffe trained models to MXNet
-
-The conversion tool is available at
-[tools/caffe_converter](https://github.com/dmlc/mxnet/tree/master/tools/caffe_converter). For
-the remainder of this section, we assume we are in the `tools/caffe_converter`
-directory.
-
-### How to build
-
-If Caffe's python package is installed, i.e. we can run `import caffe` in
-python, then we are ready to go.
-
-For example, we can use the
-[AWS Deep Learning AMI](https://aws.amazon.com/marketplace/pp/B06VSPXKDX) with
-both Caffe and MXNet installed.
-
-Otherwise, we can install the
-[Google protobuf](https://developers.google.com/protocol-buffers/?hl=en)
-compiler and its python binding. This is easier to install, but the converter
-may run more slowly.
-
-1. Install the compiler:
-   - Linux: install `protobuf-compiler`, e.g. `sudo apt-get install
-     protobuf-compiler` for Ubuntu and `sudo yum install protobuf-compiler` for
-     Red Hat/Fedora.
-   - Windows: download the win32 build of
-     [protobuf](https://github.com/google/protobuf/releases). Make sure to
-     download the version that corresponds to the version of the python binding
-     in the next step. Extract it to any location, then add that location to
-     your `PATH`.
-   - Mac OS X: `brew install protobuf`
-
-2. Install the python binding with either `conda install -c conda-forge protobuf`
-   or `pip install protobuf`.
-
-3. Compile the Caffe proto definition. Run `make` on Linux or Mac OS X, or
-   `make_win32.bat` on Windows.
-
-### How to use
-
-There are three tools:
-
-- `convert_symbol.py` : converts a Caffe model definition in protobuf into
-  MXNet's Symbol in JSON format.
-- `convert_model.py` : converts Caffe model parameters into MXNet's NDArray format.
-- `convert_mean.py` : converts a Caffe input mean file into MXNet's NDArray format.
-
-In addition, there are two tools:
-- `convert_caffe_modelzoo.py` : downloads and converts models from the Caffe
-  model zoo.
-- `test_converter.py` : tests the converted models by checking the prediction
-  accuracy.
-
 ## Calling Caffe operators in MXNet
 
-Besides converting Caffe models, MXNet supports calling most Caffe operators,
+MXNet supports calling most Caffe operators,
 including network layers, data layers, and loss functions, directly. This is
 particularly useful when customized operators are implemented in Caffe, since
 then we do not need to re-implement them in MXNet.
@@ -201,8 +147,3 @@ train = mx.io.CaffeDataIter(
     num_examples = 60000,
 )
 ```
-
-### Put it all together
-
-The complete example is available at
-[example/caffe](https://github.com/dmlc/mxnet/blob/master/example/caffe/)
diff --git a/example/ssd/README.md b/example/ssd/README.md
index dcb15f4e47a8..f85b90049213 100644
--- a/example/ssd/README.md
+++ b/example/ssd/README.md
@@ -36,7 +36,6 @@ The arXiv paper is available [here](http://arxiv.org/abs/1512.02325).
 This example is intended to reproduce the nice detector while fully utilizing
 the remarkable traits of MXNet.
-* Model [converter](#convert-caffemodel) from caffe is available now!
 * The result is almost identical to the original version.
 However, due to different implementation details, the results might differ slightly.
 
 Due to permission issues, this example is maintained separately in this
 [repository](https://github.com/zhreshold/mxnet-ssd). Please use that repository's
 [issue tracker](https://github.com/zhreshold/mxnet-ssd/issues) for example-specific issues.
@@ -261,19 +260,6 @@ Useful when loading python symbol is not available.
 python deploy.py --num-class 20
 ```
 
-### Convert caffe model
-A converter from Caffe is available at `/path/to/incubator-mxnet/example/ssd/tools/caffe_converter`.
-
-It is specifically modified to handle the custom layers in caffe-ssd. Usage:
-```
-cd /path/to/incubator-mxnet/example/ssd/tools/caffe_converter
-make
-python convert_model.py deploy.prototxt name_of_pretrained_caffe_model.caffemodel ssd_converted
-# use this model in deploy mode, without loading from the python symbol (layer names are inconsistent)
-python demo.py --prefix ssd_converted --epoch 1 --deploy
-```
-There is no guarantee that conversion will always work, but it is adequate for now.
-
 ### Legacy models
 Since the new interface for composing networks was introduced, the old models have
 inconsistent names for weights. You can still load a previous model by renaming the
 symbol file to `legacy_xxx.py`.
diff --git a/example/ssd/tools/caffe_converter/.gitignore b/example/ssd/tools/caffe_converter/.gitignore
deleted file mode 100644
index 7804af1597c2..000000000000
--- a/example/ssd/tools/caffe_converter/.gitignore
+++ /dev/null
@@ -1,5 +0,0 @@
-model/
-*.caffemodel
-*.prototxt
-*.json
-*.params
diff --git a/example/ssd/tools/caffe_converter/Makefile b/example/ssd/tools/caffe_converter/Makefile
deleted file mode 100644
index d39945b7cc33..000000000000
--- a/example/ssd/tools/caffe_converter/Makefile
+++ /dev/null
@@ -1,33 +0,0 @@
-# Licensed to the Apache Software Foundation (ASF) under one
-# or more contributor license agreements. See the NOTICE file
-# distributed with this work for additional information
-# regarding copyright ownership. The ASF licenses this file
-# to you under the Apache License, Version 2.0 (the
-# "License"); you may not use this file except in compliance
-# with the License. You may obtain a copy of the License at
-#
-#   http://www.apache.org/licenses/LICENSE-2.0
-#
-# Unless required by applicable law or agreed to in writing,
-# software distributed under the License is distributed on an
-# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
-# KIND, either express or implied. See the License for the
-# specific language governing permissions and limitations
-# under the License.
-
-ifndef PROTOC
-DEPS_PROTOC=../../deps/bin/protoc
-ifneq ("$(wildcard $(DEPS_PROTOC))","")
-PROTOC = $(DEPS_PROTOC)
-else
-PROTOC = protoc
-endif
-endif
-
-all: caffe_pb2.py
-
-clean:
-	rm caffe_pb2.py*
-
-caffe_pb2.py:
-	$(PROTOC) --python_out=./ ./caffe.proto
diff --git a/example/ssd/tools/caffe_converter/README.md b/example/ssd/tools/caffe_converter/README.md
deleted file mode 100644
index 3d4ab13f4694..000000000000
--- a/example/ssd/tools/caffe_converter/README.md
+++ /dev/null
@@ -1,37 +0,0 @@
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-# Convert Caffe Model to MXNet Format
-
-This folder contains the source code for this tool.
-
-If Caffe with python binding is installed, we can use the following command to
-convert a ResNet-50 pretrained model.
-
-```bash
-python convert_caffe_modelzoo.py resnet-50
-```
-
-Please refer to
-[docs/faq/caffe.md](../../docs/faq/caffe.md) for more details.
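For context, a model produced by this converter can be loaded back into MXNet for inference roughly as follows. This is a minimal sketch rather than part of the removed tools: the `ssd_converted` prefix and epoch `1` mirror the converter invocation shown above, and the 1x3x300x300 input shape is an assumption matching the common SSD-300 configuration.

```python
import mxnet as mx

# Load the symbol and parameters written by convert_model.py
# ('ssd_converted' prefix and epoch 1 follow the usage shown above).
sym, arg_params, aux_params = mx.model.load_checkpoint('ssd_converted', 1)

# Bind a module for inference only; the 300x300 input shape is an
# assumption matching the default SSD-300 configuration.
mod = mx.mod.Module(symbol=sym, label_names=None)
mod.bind(for_training=False, data_shapes=[('data', (1, 3, 300, 300))])
mod.set_params(arg_params, aux_params)
```

Because this loads the converted graph directly from the saved symbol JSON, it sidesteps the layer-name inconsistency noted above when loading through the python symbol.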
- -### How to use -To convert ssd caffemodels, Use: `python convert_model.py prototxt caffemodel outputprefix` - -### Note - -Use this converter for ssd caffemodels only. General converter is available in `mxnet/tools/caffe_converter`. diff --git a/example/ssd/tools/caffe_converter/caffe.proto b/example/ssd/tools/caffe_converter/caffe.proto deleted file mode 100644 index 7dfe073fb75c..000000000000 --- a/example/ssd/tools/caffe_converter/caffe.proto +++ /dev/null @@ -1,1938 +0,0 @@ -// Licensed to the Apache Software Foundation (ASF) under one -// or more contributor license agreements. See the NOTICE file -// distributed with this work for additional information -// regarding copyright ownership. The ASF licenses this file -// to you under the Apache License, Version 2.0 (the -// "License"); you may not use this file except in compliance -// with the License. You may obtain a copy of the License at -// -// http://www.apache.org/licenses/LICENSE-2.0 -// -// Unless required by applicable law or agreed to in writing, -// software distributed under the License is distributed on an -// "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY -// KIND, either express or implied. See the License for the -// specific language governing permissions and limitations -// under the License. - -syntax = "proto2"; - -package caffe; - -// Specifies the shape (dimensions) of a Blob. -message BlobShape { - repeated int64 dim = 1 [packed = true]; -} - -message BlobProto { - optional BlobShape shape = 7; - repeated float data = 5 [packed = true]; - repeated float diff = 6 [packed = true]; - repeated double double_data = 8 [packed = true]; - repeated double double_diff = 9 [packed = true]; - - // 4D dimensions -- deprecated. Use "shape" instead. - optional int32 num = 1 [default = 0]; - optional int32 channels = 2 [default = 0]; - optional int32 height = 3 [default = 0]; - optional int32 width = 4 [default = 0]; -} - -// The BlobProtoVector is simply a way to pass multiple blobproto instances -// around. -message BlobProtoVector { - repeated BlobProto blobs = 1; -} - -message Datum { - optional int32 channels = 1; - optional int32 height = 2; - optional int32 width = 3; - // the actual image data, in bytes - optional bytes data = 4; - optional int32 label = 5; - // Optionally, the datum could also hold float data. - repeated float float_data = 6; - // If true data contains an encoded image that need to be decoded - optional bool encoded = 7 [default = false]; -} - -// The label (display) name and label id. -message LabelMapItem { - // Both name and label are required. - optional string name = 1; - optional int32 label = 2; - // display_name is optional. - optional string display_name = 3; -} - -message LabelMap { - repeated LabelMapItem item = 1; -} - -// Sample a bbox in the normalized space [0, 1] with provided constraints. -message Sampler { - // Minimum scale of the sampled bbox. - optional float min_scale = 1 [default = 1.]; - // Maximum scale of the sampled bbox. - optional float max_scale = 2 [default = 1.]; - - // Minimum aspect ratio of the sampled bbox. - optional float min_aspect_ratio = 3 [default = 1.]; - // Maximum aspect ratio of the sampled bbox. - optional float max_aspect_ratio = 4 [default = 1.]; -} - -// Constraints for selecting sampled bbox. -message SampleConstraint { - // Minimum Jaccard overlap between sampled bbox and all bboxes in - // AnnotationGroup. - optional float min_jaccard_overlap = 1; - // Maximum Jaccard overlap between sampled bbox and all bboxes in - // AnnotationGroup. 
- optional float max_jaccard_overlap = 2; - - // Minimum coverage of sampled bbox by all bboxes in AnnotationGroup. - optional float min_sample_coverage = 3; - // Maximum coverage of sampled bbox by all bboxes in AnnotationGroup. - optional float max_sample_coverage = 4; - - // Minimum coverage of all bboxes in AnnotationGroup by sampled bbox. - optional float min_object_coverage = 5; - // Maximum coverage of all bboxes in AnnotationGroup by sampled bbox. - optional float max_object_coverage = 6; -} - -// Sample a batch of bboxes with provided constraints. -message BatchSampler { - // Use original image as the source for sampling. - optional bool use_original_image = 1 [default = true]; - - // Constraints for sampling bbox. - optional Sampler sampler = 2; - - // Constraints for determining if a sampled bbox is positive or negative. - optional SampleConstraint sample_constraint = 3; - - // If provided, break when found certain number of samples satisfing the - // sample_constraint. - optional uint32 max_sample = 4; - - // Maximum number of trials for sampling to avoid infinite loop. - optional uint32 max_trials = 5 [default = 100]; -} - -// Condition for emitting annotations. -message EmitConstraint { - enum EmitType { - CENTER = 0; - MIN_OVERLAP = 1; - } - optional EmitType emit_type = 1 [default = CENTER]; - // If emit_type is MIN_OVERLAP, provide the emit_overlap. - optional float emit_overlap = 2; -} - -// The normalized bounding box [0, 1] w.r.t. the input image size. -message NormalizedBBox { - optional float xmin = 1; - optional float ymin = 2; - optional float xmax = 3; - optional float ymax = 4; - optional int32 label = 5; - optional bool difficult = 6; - optional float score = 7; - optional float size = 8; -} - -// Annotation for each object instance. -message Annotation { - optional int32 instance_id = 1 [default = 0]; - optional NormalizedBBox bbox = 2; -} - -// Group of annotations for a particular label. -message AnnotationGroup { - optional int32 group_label = 1; - repeated Annotation annotation = 2; -} - -// An extension of Datum which contains "rich" annotations. -message AnnotatedDatum { - enum AnnotationType { - BBOX = 0; - } - optional Datum datum = 1; - // If there are "rich" annotations, specify the type of annotation. - // Currently it only supports bounding box. - // If there are no "rich" annotations, use label in datum instead. - optional AnnotationType type = 2; - // Each group contains annotation for a particular class. - repeated AnnotationGroup annotation_group = 3; -} - -message FillerParameter { - // The filler type. - optional string type = 1 [default = 'constant']; - optional float value = 2 [default = 0]; // the value in constant filler - optional float min = 3 [default = 0]; // the min value in uniform filler - optional float max = 4 [default = 1]; // the max value in uniform filler - optional float mean = 5 [default = 0]; // the mean value in Gaussian filler - optional float std = 6 [default = 1]; // the std value in Gaussian filler - // The expected number of non-zero output weights for a given input in - // Gaussian filler -- the default -1 means don't perform sparsification. - optional int32 sparse = 7 [default = -1]; - // Normalize the filler variance by fan_in, fan_out, or their average. - // Applies to 'xavier' and 'msra' fillers. 
- enum VarianceNorm { - FAN_IN = 0; - FAN_OUT = 1; - AVERAGE = 2; - } - optional VarianceNorm variance_norm = 8 [default = FAN_IN]; -} - -message NetParameter { - optional string name = 1; // consider giving the network a name - // DEPRECATED. See InputParameter. The input blobs to the network. - repeated string input = 3; - // DEPRECATED. See InputParameter. The shape of the input blobs. - repeated BlobShape input_shape = 8; - - // 4D input dimensions -- deprecated. Use "input_shape" instead. - // If specified, for each input blob there should be four - // values specifying the num, channels, height and width of the input blob. - // Thus, there should be a total of (4 * #input) numbers. - repeated int32 input_dim = 4; - - // Whether the network will force every layer to carry out backward operation. - // If set False, then whether to carry out backward is determined - // automatically according to the net structure and learning rates. - optional bool force_backward = 5 [default = false]; - // The current "state" of the network, including the phase, level, and stage. - // Some layers may be included/excluded depending on this state and the states - // specified in the layers' include and exclude fields. - optional NetState state = 6; - - // Print debugging information about results while running Net::Forward, - // Net::Backward, and Net::Update. - optional bool debug_info = 7 [default = false]; - - // The layers that make up the net. Each of their configurations, including - // connectivity and behavior, is specified as a LayerParameter. - repeated LayerParameter layer = 100; // ID 100 so layers are printed last. - - // DEPRECATED: use 'layer' instead. - repeated V1LayerParameter layers = 2; -} - -// NOTE -// Update the next available ID when you add a new SolverParameter field. -// -// SolverParameter next available ID: 44 (last added: plateau_winsize) -message SolverParameter { - ////////////////////////////////////////////////////////////////////////////// - // Specifying the train and test networks - // - // Exactly one train net must be specified using one of the following fields: - // train_net_param, train_net, net_param, net - // One or more test nets may be specified using any of the following fields: - // test_net_param, test_net, net_param, net - // If more than one test net field is specified (e.g., both net and - // test_net are specified), they will be evaluated in the field order given - // above: (1) test_net_param, (2) test_net, (3) net_param/net. - // A test_iter must be specified for each test_net. - // A test_level and/or a test_stage may also be specified for each test_net. - ////////////////////////////////////////////////////////////////////////////// - - // Proto filename for the train net, possibly combined with one or more - // test nets. - optional string net = 24; - // Inline train net param, possibly combined with one or more test nets. - optional NetParameter net_param = 25; - - optional string train_net = 1; // Proto filename for the train net. - repeated string test_net = 2; // Proto filenames for the test nets. - optional NetParameter train_net_param = 21; // Inline train net params. - repeated NetParameter test_net_param = 22; // Inline test net params. - - // The states for the train/test nets. Must be unspecified or - // specified once per net. - // - // By default, all states will have solver = true; - // train_state will have phase = TRAIN, - // and all test_state's will have phase = TEST. - // Other defaults are set according to the NetState defaults. 
- optional NetState train_state = 26; - repeated NetState test_state = 27; - - // Evaluation type. - optional string eval_type = 41 [default = "classification"]; - // ap_version: different ways of computing Average Precision. - // Check https://sanchom.wordpress.com/tag/average-precision/ for details. - // 11point: the 11-point interpolated average precision. Used in VOC2007. - // MaxIntegral: maximally interpolated AP. Used in VOC2012/ILSVRC. - // Integral: the natural integral of the precision-recall curve. - optional string ap_version = 42 [default = "Integral"]; - // If true, display per class result. - optional bool show_per_class_result = 44 [default = false]; - - // The number of iterations for each test net. - repeated int32 test_iter = 3; - - // The number of iterations between two testing phases. - optional int32 test_interval = 4 [default = 0]; - optional bool test_compute_loss = 19 [default = false]; - // If true, run an initial test pass before the first iteration, - // ensuring memory availability and printing the starting value of the loss. - optional bool test_initialization = 32 [default = true]; - optional float base_lr = 5; // The base learning rate - // the number of iterations between displaying info. If display = 0, no info - // will be displayed. - optional int32 display = 6; - // Display the loss averaged over the last average_loss iterations - optional int32 average_loss = 33 [default = 1]; - optional int32 max_iter = 7; // the maximum number of iterations - // accumulate gradients over `iter_size` x `batch_size` instances - optional int32 iter_size = 36 [default = 1]; - - // The learning rate decay policy. The currently implemented learning rate - // policies are as follows: - // - fixed: always return base_lr. - // - step: return base_lr * gamma ^ (floor(iter / step)) - // - exp: return base_lr * gamma ^ iter - // - inv: return base_lr * (1 + gamma * iter) ^ (- power) - // - multistep: similar to step but it allows non uniform steps defined by - // stepvalue - // - poly: the effective learning rate follows a polynomial decay, to be - // zero by the max_iter. return base_lr (1 - iter/max_iter) ^ (power) - // - sigmoid: the effective learning rate follows a sigmod decay - // return base_lr ( 1/(1 + exp(-gamma * (iter - stepsize)))) - // - plateau: decreases lr - // if the minimum loss isn't updated for 'plateau_winsize' iters - // - // where base_lr, max_iter, gamma, step, stepvalue and power are defined - // in the solver parameter protocol buffer, and iter is the current iteration. - optional string lr_policy = 8; - optional float gamma = 9; // The parameter to compute the learning rate. - optional float power = 10; // The parameter to compute the learning rate. - optional float momentum = 11; // The momentum value. - optional float weight_decay = 12; // The weight decay. - // regularization types supported: L1 and L2 - // controlled by weight_decay - optional string regularization_type = 29 [default = "L2"]; - // the stepsize for learning rate policy "step" - optional int32 stepsize = 13; - // the stepsize for learning rate policy "multistep" - repeated int32 stepvalue = 34; - // the stepsize for learning rate policy "plateau" - repeated int32 plateau_winsize = 43; - - // Set clip_gradients to >= 0 to clip parameter gradients to that L2 norm, - // whenever their actual L2 norm is larger. 
- optional float clip_gradients = 35 [default = -1]; - - optional int32 snapshot = 14 [default = 0]; // The snapshot interval - optional string snapshot_prefix = 15; // The prefix for the snapshot. - // whether to snapshot diff in the results or not. Snapshotting diff will help - // debugging but the final protocol buffer size will be much larger. - optional bool snapshot_diff = 16 [default = false]; - enum SnapshotFormat { - HDF5 = 0; - BINARYPROTO = 1; - } - optional SnapshotFormat snapshot_format = 37 [default = BINARYPROTO]; - // the mode solver will use: 0 for CPU and 1 for GPU. Use GPU in default. - enum SolverMode { - CPU = 0; - GPU = 1; - } - optional SolverMode solver_mode = 17 [default = GPU]; - // the device_id will that be used in GPU mode. Use device_id = 0 in default. - optional int32 device_id = 18 [default = 0]; - // If non-negative, the seed with which the Solver will initialize the Caffe - // random number generator -- useful for reproducible results. Otherwise, - // (and by default) initialize using a seed derived from the system clock. - optional int64 random_seed = 20 [default = -1]; - - // type of the solver - optional string type = 40 [default = "SGD"]; - - // numerical stability for RMSProp, AdaGrad and AdaDelta and Adam - optional float delta = 31 [default = 1e-8]; - // parameters for the Adam solver - optional float momentum2 = 39 [default = 0.999]; - - // RMSProp decay value - // MeanSquare(t) = rms_decay*MeanSquare(t-1) + (1-rms_decay)*SquareGradient(t) - optional float rms_decay = 38 [default = 0.99]; - - // If true, print information about the state of the net that may help with - // debugging learning problems. - optional bool debug_info = 23 [default = false]; - - // If false, don't save a snapshot after training finishes. - optional bool snapshot_after_train = 28 [default = true]; - - // DEPRECATED: old solver enum types, use string instead - enum SolverType { - SGD = 0; - NESTEROV = 1; - ADAGRAD = 2; - RMSPROP = 3; - ADADELTA = 4; - ADAM = 5; - } - // DEPRECATED: use type instead of solver_type - optional SolverType solver_type = 30 [default = SGD]; -} - -// A message that stores the solver snapshots -message SolverState { - optional int32 iter = 1; // The current iteration - optional string learned_net = 2; // The file that stores the learned net. - repeated BlobProto history = 3; // The history for sgd solvers - optional int32 current_step = 4 [default = 0]; // The current step for learning rate - optional float minimum_loss = 5 [default = 1E38]; // Historical minimum loss - optional int32 iter_last_event = 6 [default = 0]; // The iteration when last lr-update or min_loss-update happend -} - -enum Phase { - TRAIN = 0; - TEST = 1; -} - -message NetState { - optional Phase phase = 1 [default = TEST]; - optional int32 level = 2 [default = 0]; - repeated string stage = 3; -} - -message NetStateRule { - // Set phase to require the NetState have a particular phase (TRAIN or TEST) - // to meet this rule. - optional Phase phase = 1; - - // Set the minimum and/or maximum levels in which the layer should be used. - // Leave undefined to meet the rule regardless of level. - optional int32 min_level = 2; - optional int32 max_level = 3; - - // Customizable sets of stages to include or exclude. - // The net must have ALL of the specified stages and NONE of the specified - // "not_stage"s to meet the rule. - // (Use multiple NetStateRules to specify conjunctions of stages.) 
- repeated string stage = 4; - repeated string not_stage = 5; -} - -// Specifies training parameters (multipliers on global learning constants, -// and the name and other settings used for weight sharing). -message ParamSpec { - // The names of the parameter blobs -- useful for sharing parameters among - // layers, but never required otherwise. To share a parameter between two - // layers, give it a (non-empty) name. - optional string name = 1; - - // Whether to require shared weights to have the same shape, or just the same - // count -- defaults to STRICT if unspecified. - optional DimCheckMode share_mode = 2; - enum DimCheckMode { - // STRICT (default) requires that num, channels, height, width each match. - STRICT = 0; - // PERMISSIVE requires only the count (num*channels*height*width) to match. - PERMISSIVE = 1; - } - - // The multiplier on the global learning rate for this parameter. - optional float lr_mult = 3 [default = 1.0]; - - // The multiplier on the global weight decay for this parameter. - optional float decay_mult = 4 [default = 1.0]; -} - -// NOTE -// Update the next available ID when you add a new LayerParameter field. -// -// LayerParameter next available layer-specific ID: 147 (last added: recurrent_param) -message LayerParameter { - optional string name = 1; // the layer name - optional string type = 2; // the layer type - repeated string bottom = 3; // the name of each bottom blob - repeated string top = 4; // the name of each top blob - - // The train / test phase for computation. - optional Phase phase = 10; - - // The amount of weight to assign each top blob in the objective. - // Each layer assigns a default value, usually of either 0 or 1, - // to each top blob. - repeated float loss_weight = 5; - - // Specifies training parameters (multipliers on global learning constants, - // and the name and other settings used for weight sharing). - repeated ParamSpec param = 6; - - // The blobs containing the numeric parameters of the layer. - repeated BlobProto blobs = 7; - - // Specifies whether to backpropagate to each bottom. If unspecified, - // Caffe will automatically infer whether each input needs backpropagation - // to compute parameter gradients. If set to true for some inputs, - // backpropagation to those inputs is forced; if set false for some inputs, - // backpropagation to those inputs is skipped. - // - // The size must be either 0 or equal to the number of bottoms. - repeated bool propagate_down = 11; - - // Rules controlling whether and when a layer is included in the network, - // based on the current NetState. You may specify a non-zero number of rules - // to include OR exclude, but not both. If no include or exclude rules are - // specified, the layer is always included. If the current NetState meets - // ANY (i.e., one or more) of the specified rules, the layer is - // included/excluded. - repeated NetStateRule include = 8; - repeated NetStateRule exclude = 9; - - // Parameters for data pre-processing. - optional TransformationParameter transform_param = 100; - - // Parameters shared by loss layers. - optional LossParameter loss_param = 101; - - // Layer type-specific parameters. - // - // Note: certain layers may have more than one computational engine - // for their implementation. These layers include an Engine type and - // engine parameter for selecting the implementation. - // The default for the engine is set by the ENGINE switch at compile-time. 
- optional AccuracyParameter accuracy_param = 102; - optional AnnotatedDataParameter annotated_data_param = 200; - optional ArgMaxParameter argmax_param = 103; - optional BatchNormParameter batch_norm_param = 139; - optional BiasParameter bias_param = 141; - optional ConcatParameter concat_param = 104; - optional ContrastiveLossParameter contrastive_loss_param = 105; - optional ConvolutionParameter convolution_param = 106; - optional CropParameter crop_param = 144; - optional DataParameter data_param = 107; - optional DetectionEvaluateParameter detection_evaluate_param = 205; - optional DetectionOutputParameter detection_output_param = 204; - optional DropoutParameter dropout_param = 108; - optional DummyDataParameter dummy_data_param = 109; - optional EltwiseParameter eltwise_param = 110; - optional ELUParameter elu_param = 140; - optional EmbedParameter embed_param = 137; - optional ExpParameter exp_param = 111; - optional FlattenParameter flatten_param = 135; - optional HDF5DataParameter hdf5_data_param = 112; - optional HDF5OutputParameter hdf5_output_param = 113; - optional HingeLossParameter hinge_loss_param = 114; - optional ImageDataParameter image_data_param = 115; - optional InfogainLossParameter infogain_loss_param = 116; - optional InnerProductParameter inner_product_param = 117; - optional InputParameter input_param = 143; - optional LogParameter log_param = 134; - optional LRNParameter lrn_param = 118; - optional MemoryDataParameter memory_data_param = 119; - optional MultiBoxLossParameter multibox_loss_param = 201; - optional MVNParameter mvn_param = 120; - optional NormalizeParameter norm_param = 206; - optional ParameterParameter parameter_param = 145; - optional PermuteParameter permute_param = 202; - optional PoolingParameter pooling_param = 121; - optional PowerParameter power_param = 122; - optional PReLUParameter prelu_param = 131; - optional PriorBoxParameter prior_box_param = 203; - optional PythonParameter python_param = 130; - optional RecurrentParameter recurrent_param = 146; - optional ReductionParameter reduction_param = 136; - optional ReLUParameter relu_param = 123; - optional ReshapeParameter reshape_param = 133; - optional ScaleParameter scale_param = 142; - optional SigmoidParameter sigmoid_param = 124; - optional SoftmaxParameter softmax_param = 125; - optional SPPParameter spp_param = 132; - optional SliceParameter slice_param = 126; - optional TanHParameter tanh_param = 127; - optional ThresholdParameter threshold_param = 128; - optional TileParameter tile_param = 138; - optional VideoDataParameter video_data_param = 207; - optional WindowDataParameter window_data_param = 129; -} - -// Message that stores parameters used to apply transformation -// to the data layer's data -message TransformationParameter { - // For data pre-processing, we can do simple scaling and subtracting the - // data mean, if provided. Note that the mean subtraction is always carried - // out before scaling. - optional float scale = 1 [default = 1]; - // Specify if we want to randomly mirror data. - optional bool mirror = 2 [default = false]; - // Specify if we would like to randomly crop an image. 
- optional uint32 crop_size = 3 [default = 0]; - optional uint32 crop_h = 11 [default = 0]; - optional uint32 crop_w = 12 [default = 0]; - - // mean_file and mean_value cannot be specified at the same time - optional string mean_file = 4; - // if specified can be repeated once (would subtract it from all the channels) - // or can be repeated the same number of times as channels - // (would subtract them from the corresponding channel) - repeated float mean_value = 5; - // Force the decoded image to have 3 color channels. - optional bool force_color = 6 [default = false]; - // Force the decoded image to have 1 color channels. - optional bool force_gray = 7 [default = false]; - // Resize policy - optional ResizeParameter resize_param = 8; - // Noise policy - optional NoiseParameter noise_param = 9; - // Distortion policy - optional DistortionParameter distort_param = 13; - // Expand policy - optional ExpansionParameter expand_param = 14; - // Constraint for emitting the annotation after transformation. - optional EmitConstraint emit_constraint = 10; -} - -// Message that stores parameters used by data transformer for resize policy -message ResizeParameter { - //Probability of using this resize policy - optional float prob = 1 [default = 1]; - - enum Resize_mode { - WARP = 1; - FIT_SMALL_SIZE = 2; - FIT_LARGE_SIZE_AND_PAD = 3; - } - optional Resize_mode resize_mode = 2 [default = WARP]; - optional uint32 height = 3 [default = 0]; - optional uint32 width = 4 [default = 0]; - // A parameter used to update bbox in FIT_SMALL_SIZE mode. - optional uint32 height_scale = 8 [default = 0]; - optional uint32 width_scale = 9 [default = 0]; - - enum Pad_mode { - CONSTANT = 1; - MIRRORED = 2; - REPEAT_NEAREST = 3; - } - // Padding mode for BE_SMALL_SIZE_AND_PAD mode and object centering - optional Pad_mode pad_mode = 5 [default = CONSTANT]; - // if specified can be repeated once (would fill all the channels) - // or can be repeated the same number of times as channels - // (would use it them to the corresponding channel) - repeated float pad_value = 6; - - enum Interp_mode { //Same as in OpenCV - LINEAR = 1; - AREA = 2; - NEAREST = 3; - CUBIC = 4; - LANCZOS4 = 5; - } - //interpolation for for resizing - repeated Interp_mode interp_mode = 7; -} - -message SaltPepperParameter { - //Percentage of pixels - optional float fraction = 1 [default = 0]; - repeated float value = 2; -} - -// Message that stores parameters used by data transformer for transformation -// policy -message NoiseParameter { - //Probability of using this resize policy - optional float prob = 1 [default = 0]; - // Histogram equalized - optional bool hist_eq = 2 [default = false]; - // Color inversion - optional bool inverse = 3 [default = false]; - // Grayscale - optional bool decolorize = 4 [default = false]; - // Gaussian blur - optional bool gauss_blur = 5 [default = false]; - - // JPEG compression quality (-1 = no compression) - optional float jpeg = 6 [default = -1]; - - // Posterization - optional bool posterize = 7 [default = false]; - - // Erosion - optional bool erode = 8 [default = false]; - - // Salt-and-pepper noise - optional bool saltpepper = 9 [default = false]; - - optional SaltPepperParameter saltpepper_param = 10; - - // Local histogram equalization - optional bool clahe = 11 [default = false]; - - // Color space conversion - optional bool convert_to_hsv = 12 [default = false]; - - // Color space conversion - optional bool convert_to_lab = 13 [default = false]; -} - -// Message that stores parameters used by data 
transformer for distortion policy -message DistortionParameter { - // The probability of adjusting brightness. - optional float brightness_prob = 1 [default = 0.0]; - // Amount to add to the pixel values within [-delta, delta]. - // The possible value is within [0, 255]. Recommend 32. - optional float brightness_delta = 2 [default = 0.0]; - - // The probability of adjusting contrast. - optional float contrast_prob = 3 [default = 0.0]; - // Lower bound for random contrast factor. Recommend 0.5. - optional float contrast_lower = 4 [default = 0.0]; - // Upper bound for random contrast factor. Recommend 1.5. - optional float contrast_upper = 5 [default = 0.0]; - - // The probability of adjusting hue. - optional float hue_prob = 6 [default = 0.0]; - // Amount to add to the hue channel within [-delta, delta]. - // The possible value is within [0, 180]. Recommend 36. - optional float hue_delta = 7 [default = 0.0]; - - // The probability of adjusting saturation. - optional float saturation_prob = 8 [default = 0.0]; - // Lower bound for the random saturation factor. Recommend 0.5. - optional float saturation_lower = 9 [default = 0.0]; - // Upper bound for the random saturation factor. Recommend 1.5. - optional float saturation_upper = 10 [default = 0.0]; - - // The probability of randomly order the image channels. - optional float random_order_prob = 11 [default = 0.0]; -} - -// Message that stores parameters used by data transformer for expansion policy -message ExpansionParameter { - //Probability of using this expansion policy - optional float prob = 1 [default = 1]; - - // The ratio to expand the image. - optional float max_expand_ratio = 2 [default = 1.]; -} - -// Message that stores parameters shared by loss layers -message LossParameter { - // If specified, ignore instances with the given label. - optional int32 ignore_label = 1; - // How to normalize the loss for loss layers that aggregate across batches, - // spatial dimensions, or other dimensions. Currently only implemented in - // SoftmaxWithLoss and SigmoidCrossEntropyLoss layers. - enum NormalizationMode { - // Divide by the number of examples in the batch times spatial dimensions. - // Outputs that receive the ignore label will NOT be ignored in computing - // the normalization factor. - FULL = 0; - // Divide by the total number of output locations that do not take the - // ignore_label. If ignore_label is not set, this behaves like FULL. - VALID = 1; - // Divide by the batch size. - BATCH_SIZE = 2; - // Do not normalize the loss. - NONE = 3; - } - // For historical reasons, the default normalization for - // SigmoidCrossEntropyLoss is BATCH_SIZE and *not* VALID. - optional NormalizationMode normalization = 3 [default = VALID]; - // Deprecated. Ignored if normalization is specified. If normalization - // is not specified, then setting this to false will be equivalent to - // normalization = BATCH_SIZE to be consistent with previous behavior. - optional bool normalize = 2; -} - -// Messages that store parameters used by individual layer types follow, in -// alphabetical order. - -message AccuracyParameter { - // When computing accuracy, count as correct by comparing the true label to - // the top k scoring classes. By default, only compare to the top scoring - // class (i.e. argmax). - optional uint32 top_k = 1 [default = 1]; - - // The "label" axis of the prediction blob, whose argmax corresponds to the - // predicted label -- may be negative to index from the end (e.g., -1 for the - // last axis). 
For example, if axis == 1 and the predictions are - // (N x C x H x W), the label blob is expected to contain N*H*W ground truth - // labels with integer values in {0, 1, ..., C-1}. - optional int32 axis = 2 [default = 1]; - - // If specified, ignore instances with the given label. - optional int32 ignore_label = 3; -} - -message AnnotatedDataParameter { - // Define the sampler. - repeated BatchSampler batch_sampler = 1; - // Store label name and label id in LabelMap format. - optional string label_map_file = 2; - // If provided, it will replace the AnnotationType stored in each - // AnnotatedDatum. - optional AnnotatedDatum.AnnotationType anno_type = 3; -} - -message ArgMaxParameter { - // If true produce pairs (argmax, maxval) - optional bool out_max_val = 1 [default = false]; - optional uint32 top_k = 2 [default = 1]; - // The axis along which to maximise -- may be negative to index from the - // end (e.g., -1 for the last axis). - // By default ArgMaxLayer maximizes over the flattened trailing dimensions - // for each index of the first / num dimension. - optional int32 axis = 3; -} - -message ConcatParameter { - // The axis along which to concatenate -- may be negative to index from the - // end (e.g., -1 for the last axis). Other axes must have the - // same dimension for all the bottom blobs. - // By default, ConcatLayer concatenates blobs along the "channels" axis (1). - optional int32 axis = 2 [default = 1]; - - // DEPRECATED: alias for "axis" -- does not support negative indexing. - optional uint32 concat_dim = 1 [default = 1]; -} - -message BatchNormParameter { - // If false, accumulate global mean/variance values via a moving average. If - // true, use those accumulated values instead of computing mean/variance - // across the batch. - optional bool use_global_stats = 1; - // How much does the moving average decay each iteration? - optional float moving_average_fraction = 2 [default = .999]; - // Small value to add to the variance estimate so that we don't divide by - // zero. - optional float eps = 3 [default = 1e-5]; -} - -message BiasParameter { - // The first axis of bottom[0] (the first input Blob) along which to apply - // bottom[1] (the second input Blob). May be negative to index from the end - // (e.g., -1 for the last axis). - // - // For example, if bottom[0] is 4D with shape 100x3x40x60, the output - // top[0] will have the same shape, and bottom[1] may have any of the - // following shapes (for the given value of axis): - // (axis == 0 == -4) 100; 100x3; 100x3x40; 100x3x40x60 - // (axis == 1 == -3) 3; 3x40; 3x40x60 - // (axis == 2 == -2) 40; 40x60 - // (axis == 3 == -1) 60 - // Furthermore, bottom[1] may have the empty shape (regardless of the value of - // "axis") -- a scalar bias. - optional int32 axis = 1 [default = 1]; - - // (num_axes is ignored unless just one bottom is given and the bias is - // a learned parameter of the layer. Otherwise, num_axes is determined by the - // number of axes by the second bottom.) - // The number of axes of the input (bottom[0]) covered by the bias - // parameter, or -1 to cover all axes of bottom[0] starting from `axis`. - // Set num_axes := 0, to add a zero-axis Blob: a scalar. - optional int32 num_axes = 2 [default = 1]; - - // (filler is ignored unless just one bottom is given and the bias is - // a learned parameter of the layer.) - // The initialization for the learned bias parameter. - // Default is the zero (0) initialization, resulting in the BiasLayer - // initially performing the identity operation. 
- optional FillerParameter filler = 3; -} - -message ContrastiveLossParameter { - // margin for dissimilar pair - optional float margin = 1 [default = 1.0]; - // The first implementation of this cost did not exactly match the cost of - // Hadsell et al 2006 -- using (margin - d^2) instead of (margin - d)^2. - // legacy_version = false (the default) uses (margin - d)^2 as proposed in the - // Hadsell paper. New models should probably use this version. - // legacy_version = true uses (margin - d^2). This is kept to support / - // reproduce existing models and results - optional bool legacy_version = 2 [default = false]; -} - -message ConvolutionParameter { - optional uint32 num_output = 1; // The number of outputs for the layer - optional bool bias_term = 2 [default = true]; // whether to have bias terms - - // Pad, kernel size, and stride are all given as a single value for equal - // dimensions in all spatial dimensions, or once per spatial dimension. - repeated uint32 pad = 3; // The padding size; defaults to 0 - repeated uint32 kernel_size = 4; // The kernel size - repeated uint32 stride = 6; // The stride; defaults to 1 - // Factor used to dilate the kernel, (implicitly) zero-filling the resulting - // holes. (Kernel dilation is sometimes referred to by its use in the - // algorithme à trous from Holschneider et al. 1987.) - repeated uint32 dilation = 18; // The dilation; defaults to 1 - - // For 2D convolution only, the *_h and *_w versions may also be used to - // specify both spatial dimensions. - optional uint32 pad_h = 9 [default = 0]; // The padding height (2D only) - optional uint32 pad_w = 10 [default = 0]; // The padding width (2D only) - optional uint32 kernel_h = 11; // The kernel height (2D only) - optional uint32 kernel_w = 12; // The kernel width (2D only) - optional uint32 stride_h = 13; // The stride height (2D only) - optional uint32 stride_w = 14; // The stride width (2D only) - - optional uint32 group = 5 [default = 1]; // The group size for group conv - - optional FillerParameter weight_filler = 7; // The filler for the weight - optional FillerParameter bias_filler = 8; // The filler for the bias - enum Engine { - DEFAULT = 0; - CAFFE = 1; - CUDNN = 2; - } - optional Engine engine = 15 [default = DEFAULT]; - - // The axis to interpret as "channels" when performing convolution. - // Preceding dimensions are treated as independent inputs; - // succeeding dimensions are treated as "spatial". - // With (N, C, H, W) inputs, and axis == 1 (the default), we perform - // N independent 2D convolutions, sliding C-channel (or (C/g)-channels, for - // groups g>1) filters across the spatial axes (H, W) of the input. - // With (N, C, D, H, W) inputs, and axis == 1, we perform - // N independent 3D convolutions, sliding (C/g)-channels - // filters across the spatial axes (D, H, W) of the input. - optional int32 axis = 16 [default = 1]; - - // Whether to force use of the general ND convolution, even if a specific - // implementation for blobs of the appropriate number of spatial dimensions - // is available. (Currently, there is only a 2D-specific convolution - // implementation; for input blobs with num_axes != 2, this option is - // ignored and the ND implementation will be used.) - optional bool force_nd_im2col = 17 [default = false]; -} - -message CropParameter { - // To crop, elements of the first bottom are selected to fit the dimensions - // of the second, reference bottom. 
The crop is configured by - // - the crop `axis` to pick the dimensions for cropping - // - the crop `offset` to set the shift for all/each dimension - // to align the cropped bottom with the reference bottom. - // All dimensions up to but excluding `axis` are preserved, while - // the dimensions including and trailing `axis` are cropped. - // If only one `offset` is set, then all dimensions are offset by this amount. - // Otherwise, the number of offsets must equal the number of cropped axes to - // shift the crop in each dimension accordingly. - // Note: standard dimensions are N,C,H,W so the default is a spatial crop, - // and `axis` may be negative to index from the end (e.g., -1 for the last - // axis). - optional int32 axis = 1 [default = 2]; - repeated uint32 offset = 2; -} - -message DataParameter { - enum DB { - LEVELDB = 0; - LMDB = 1; - } - // Specify the data source. - optional string source = 1; - // Specify the batch size. - optional uint32 batch_size = 4; - // The rand_skip variable is for the data layer to skip a few data points - // to avoid all asynchronous sgd clients to start at the same point. The skip - // point would be set as rand_skip * rand(0,1). Note that rand_skip should not - // be larger than the number of keys in the database. - // DEPRECATED. Each solver accesses a different subset of the database. - optional uint32 rand_skip = 7 [default = 0]; - optional DB backend = 8 [default = LEVELDB]; - // DEPRECATED. See TransformationParameter. For data pre-processing, we can do - // simple scaling and subtracting the data mean, if provided. Note that the - // mean subtraction is always carried out before scaling. - optional float scale = 2 [default = 1]; - optional string mean_file = 3; - // DEPRECATED. See TransformationParameter. Specify if we would like to randomly - // crop an image. - optional uint32 crop_size = 5 [default = 0]; - // DEPRECATED. See TransformationParameter. Specify if we want to randomly mirror - // data. - optional bool mirror = 6 [default = false]; - // Force the encoded image to have 3 color channels - optional bool force_encoded_color = 9 [default = false]; - // Prefetch queue (Number of batches to prefetch to host memory, increase if - // data access bandwidth varies). - optional uint32 prefetch = 10 [default = 4]; -} - -// Message that store parameters used by DetectionEvaluateLayer -message DetectionEvaluateParameter { - // Number of classes that are actually predicted. Required! - optional uint32 num_classes = 1; - // Label id for background class. Needed for sanity check so that - // background class is neither in the ground truth nor the detections. - optional uint32 background_label_id = 2 [default = 0]; - // Threshold for deciding true/false positive. - optional float overlap_threshold = 3 [default = 0.5]; - // If true, also consider difficult ground truth for evaluation. - optional bool evaluate_difficult_gt = 4 [default = true]; - // A file which contains a list of names and sizes with same order - // of the input DB. The file is in the following format: - // name height width - // ... - // If provided, we will scale the prediction and ground truth NormalizedBBox - // for evaluation. - optional string name_size_file = 5; - // The resize parameter used in converting NormalizedBBox to original image. - optional ResizeParameter resize_param = 6; -} - -message NonMaximumSuppressionParameter { - // Threshold to be used in nms. - optional float nms_threshold = 1 [default = 0.3]; - // Maximum number of results to be kept. 
- optional int32 top_k = 2; - // Parameter for adaptive nms. - optional float eta = 3 [default = 1.0]; -} - -message SaveOutputParameter { - // Output directory. If not empty, we will save the results. - optional string output_directory = 1; - // Output name prefix. - optional string output_name_prefix = 2; - // Output format. - // VOC - PASCAL VOC output format. - // COCO - MS COCO output format. - optional string output_format = 3; - // If you want to output results, must also provide the following two files. - // Otherwise, we will ignore saving results. - // label map file. - optional string label_map_file = 4; - // A file which contains a list of names and sizes with same order - // of the input DB. The file is in the following format: - // name height width - // ... - optional string name_size_file = 5; - // Number of test images. It can be less than the lines specified in - // name_size_file. For example, when we only want to evaluate on part - // of the test images. - optional uint32 num_test_image = 6; - // The resize parameter used in saving the data. - optional ResizeParameter resize_param = 7; -} - -// Message that store parameters used by DetectionOutputLayer -message DetectionOutputParameter { - // Number of classes to be predicted. Required! - optional uint32 num_classes = 1; - // If true, bounding box are shared among different classes. - optional bool share_location = 2 [default = true]; - // Background label id. If there is no background class, - // set it as -1. - optional int32 background_label_id = 3 [default = 0]; - // Parameters used for non maximum suppression. - optional NonMaximumSuppressionParameter nms_param = 4; - // Parameters used for saving detection results. - optional SaveOutputParameter save_output_param = 5; - // Type of coding method for bbox. - optional PriorBoxParameter.CodeType code_type = 6 [default = CORNER]; - // If true, variance is encoded in target; otherwise we need to adjust the - // predicted offset accordingly. - optional bool variance_encoded_in_target = 8 [default = false]; - // Number of total bboxes to be kept per image after nms step. - // -1 means keeping all bboxes after nms step. - optional int32 keep_top_k = 7 [default = -1]; - // Only consider detections whose confidences are larger than a threshold. - // If not provided, consider all boxes. - optional float confidence_threshold = 9; - // If true, visualize the detection results. - optional bool visualize = 10 [default = false]; - // The threshold used to visualize the detection results. - optional float visualize_threshold = 11; - // If provided, save outputs to video file. - optional string save_file = 12; -} - -message DropoutParameter { - optional float dropout_ratio = 1 [default = 0.5]; // dropout ratio -} - -// DummyDataLayer fills any number of arbitrarily shaped blobs with random -// (or constant) data generated by "Fillers" (see "message FillerParameter"). -message DummyDataParameter { - // This layer produces N >= 1 top blobs. DummyDataParameter must specify 1 or N - // shape fields, and 0, 1 or N data_fillers. - // - // If 0 data_fillers are specified, ConstantFiller with a value of 0 is used. - // If 1 data_filler is specified, it is applied to all top blobs. If N are - // specified, the ith is applied to the ith top blob. - repeated FillerParameter data_filler = 1; - repeated BlobShape shape = 6; - - // 4D dimensions -- deprecated. Use "shape" instead. 
- repeated uint32 num = 2; - repeated uint32 channels = 3; - repeated uint32 height = 4; - repeated uint32 width = 5; -} - -message EltwiseParameter { - enum EltwiseOp { - PROD = 0; - SUM = 1; - MAX = 2; - } - optional EltwiseOp operation = 1 [default = SUM]; // element-wise operation - repeated float coeff = 2; // blob-wise coefficient for SUM operation - - // Whether to use an asymptotically slower (for >2 inputs) but stabler method - // of computing the gradient for the PROD operation. (No effect for SUM op.) - optional bool stable_prod_grad = 3 [default = true]; -} - -// Message that stores parameters used by ELULayer -message ELUParameter { - // Described in: - // Clevert, D.-A., Unterthiner, T., & Hochreiter, S. (2015). Fast and Accurate - // Deep Network Learning by Exponential Linear Units (ELUs). arXiv - optional float alpha = 1 [default = 1]; -} - -// Message that stores parameters used by EmbedLayer -message EmbedParameter { - optional uint32 num_output = 1; // The number of outputs for the layer - // The input is given as integers to be interpreted as one-hot - // vector indices with dimension num_input. Hence num_input should be - // 1 greater than the maximum possible input value. - optional uint32 input_dim = 2; - - optional bool bias_term = 3 [default = true]; // Whether to use a bias term - optional FillerParameter weight_filler = 4; // The filler for the weight - optional FillerParameter bias_filler = 5; // The filler for the bias - -} - -// Message that stores parameters used by ExpLayer -message ExpParameter { - // ExpLayer computes outputs y = base ^ (shift + scale * x), for base > 0. - // Or if base is set to the default (-1), base is set to e, - // so y = exp(shift + scale * x). - optional float base = 1 [default = -1.0]; - optional float scale = 2 [default = 1.0]; - optional float shift = 3 [default = 0.0]; -} - -/// Message that stores parameters used by FlattenLayer -message FlattenParameter { - // The first axis to flatten: all preceding axes are retained in the output. - // May be negative to index from the end (e.g., -1 for the last axis). - optional int32 axis = 1 [default = 1]; - - // The last axis to flatten: all following axes are retained in the output. - // May be negative to index from the end (e.g., the default -1 for the last - // axis). - optional int32 end_axis = 2 [default = -1]; -} - -// Message that stores parameters used by HDF5DataLayer -message HDF5DataParameter { - // Specify the data source. - optional string source = 1; - // Specify the batch size. - optional uint32 batch_size = 2; - - // Specify whether to shuffle the data. - // If shuffle == true, the ordering of the HDF5 files is shuffled, - // and the ordering of data within any given HDF5 file is shuffled, - // but data between different files are not interleaved; all of a file's - // data are output (in a random order) before moving onto another file. - optional bool shuffle = 3 [default = false]; -} - -message HDF5OutputParameter { - optional string file_name = 1; -} - -message HingeLossParameter { - enum Norm { - L1 = 1; - L2 = 2; - } - // Specify the Norm to use L1 or L2 - optional Norm norm = 1 [default = L1]; -} - -message ImageDataParameter { - // Specify the data source. - optional string source = 1; - // Specify the batch size. - optional uint32 batch_size = 4 [default = 1]; - // The rand_skip variable is for the data layer to skip a few data points - // to avoid all asynchronous sgd clients to start at the same point. The skip - // point would be set as rand_skip * rand(0,1). 
Note that rand_skip should not - // be larger than the number of keys in the database. - optional uint32 rand_skip = 7 [default = 0]; - // Whether or not ImageLayer should shuffle the list of files at every epoch. - optional bool shuffle = 8 [default = false]; - // It will also resize images if new_height or new_width are not zero. - optional uint32 new_height = 9 [default = 0]; - optional uint32 new_width = 10 [default = 0]; - // Specify if the images are color or gray - optional bool is_color = 11 [default = true]; - // DEPRECATED. See TransformationParameter. For data pre-processing, we can do - // simple scaling and subtracting the data mean, if provided. Note that the - // mean subtraction is always carried out before scaling. - optional float scale = 2 [default = 1]; - optional string mean_file = 3; - // DEPRECATED. See TransformationParameter. Specify if we would like to randomly - // crop an image. - optional uint32 crop_size = 5 [default = 0]; - // DEPRECATED. See TransformationParameter. Specify if we want to randomly mirror - // data. - optional bool mirror = 6 [default = false]; - optional string root_folder = 12 [default = ""]; -} - -message InfogainLossParameter { - // Specify the infogain matrix source. - optional string source = 1; -} - -message InnerProductParameter { - optional uint32 num_output = 1; // The number of outputs for the layer - optional bool bias_term = 2 [default = true]; // whether to have bias terms - optional FillerParameter weight_filler = 3; // The filler for the weight - optional FillerParameter bias_filler = 4; // The filler for the bias - - // The first axis to be lumped into a single inner product computation; - // all preceding axes are retained in the output. - // May be negative to index from the end (e.g., -1 for the last axis). - optional int32 axis = 5 [default = 1]; - // Specify whether to transpose the weight matrix or not. - // If transpose == true, any operations will be performed on the transpose - // of the weight matrix. The weight matrix itself is not going to be transposed - // but rather the transfer flag of operations will be toggled accordingly. - optional bool transpose = 6 [default = false]; -} - -message InputParameter { - // This layer produces N >= 1 top blob(s) to be assigned manually. - // Define N shapes to set a shape for each top. - // Define 1 shape to set the same shape for every top. - // Define no shape to defer to reshaping manually. - repeated BlobShape shape = 1; -} - -// Message that stores parameters used by LogLayer -message LogParameter { - // LogLayer computes outputs y = log_base(shift + scale * x), for base > 0. 
- // Or if base is set to the default (-1), base is set to e, - // so y = ln(shift + scale * x) = log_e(shift + scale * x) - optional float base = 1 [default = -1.0]; - optional float scale = 2 [default = 1.0]; - optional float shift = 3 [default = 0.0]; -} - -// Message that stores parameters used by LRNLayer -message LRNParameter { - optional uint32 local_size = 1 [default = 5]; - optional float alpha = 2 [default = 1.]; - optional float beta = 3 [default = 0.75]; - enum NormRegion { - ACROSS_CHANNELS = 0; - WITHIN_CHANNEL = 1; - } - optional NormRegion norm_region = 4 [default = ACROSS_CHANNELS]; - optional float k = 5 [default = 1.]; - enum Engine { - DEFAULT = 0; - CAFFE = 1; - CUDNN = 2; - } - optional Engine engine = 6 [default = DEFAULT]; -} - -message MemoryDataParameter { - optional uint32 batch_size = 1; - optional uint32 channels = 2; - optional uint32 height = 3; - optional uint32 width = 4; -} - -// Message that store parameters used by MultiBoxLossLayer -message MultiBoxLossParameter { - // Localization loss type. - enum LocLossType { - L2 = 0; - SMOOTH_L1 = 1; - } - optional LocLossType loc_loss_type = 1 [default = SMOOTH_L1]; - // Confidence loss type. - enum ConfLossType { - SOFTMAX = 0; - LOGISTIC = 1; - } - optional ConfLossType conf_loss_type = 2 [default = SOFTMAX]; - // Weight for localization loss. - optional float loc_weight = 3 [default = 1.0]; - // Number of classes to be predicted. Required! - optional uint32 num_classes = 4; - // If true, bounding box are shared among different classes. - optional bool share_location = 5 [default = true]; - // Matching method during training. - enum MatchType { - BIPARTITE = 0; - PER_PREDICTION = 1; - } - optional MatchType match_type = 6 [default = PER_PREDICTION]; - // If match_type is PER_PREDICTION, use overlap_threshold to - // determine the extra matching bboxes. - optional float overlap_threshold = 7 [default = 0.5]; - // Use prior for matching. - optional bool use_prior_for_matching = 8 [default = true]; - // Background label id. - optional uint32 background_label_id = 9 [default = 0]; - // If true, also consider difficult ground truth. - optional bool use_difficult_gt = 10 [default = true]; - // If true, perform negative mining. - // DEPRECATED: use mining_type instead. - optional bool do_neg_mining = 11; - // The negative/positive ratio. - optional float neg_pos_ratio = 12 [default = 3.0]; - // The negative overlap upperbound for the unmatched predictions. - optional float neg_overlap = 13 [default = 0.5]; - // Type of coding method for bbox. - optional PriorBoxParameter.CodeType code_type = 14 [default = CORNER]; - // If true, encode the variance of prior box in the loc loss target instead of - // in bbox. - optional bool encode_variance_in_target = 16 [default = false]; - // If true, map all object classes to agnostic class. It is useful for learning - // objectness detector. - optional bool map_object_to_agnostic = 17 [default = false]; - // If true, ignore cross boundary bbox during matching. - // Cross boundary bbox is a bbox who is outside of the image region. - optional bool ignore_cross_boundary_bbox = 18 [default = false]; - // If true, only backpropagate on corners which are inside of the image - // region when encode_type is CORNER or CORNER_SIZE. - optional bool bp_inside = 19 [default = false]; - // Mining type during training. - // NONE : use all negatives. - // MAX_NEGATIVE : select negatives based on the score. 
- // HARD_EXAMPLE : select hard examples based on "Training Region-based Object Detectors with Online Hard Example Mining", Shrivastava et al.
- enum MiningType {
- NONE = 0;
- MAX_NEGATIVE = 1;
- HARD_EXAMPLE = 2;
- }
- optional MiningType mining_type = 20 [default = MAX_NEGATIVE];
- // Parameters used for non-maximum suppression during hard example mining.
- optional NonMaximumSuppressionParameter nms_param = 21;
- optional int32 sample_size = 22 [default = 64];
- optional bool use_prior_for_nms = 23 [default = false];
-}
-
-message MVNParameter {
- // This parameter can be set to false to normalize mean only
- optional bool normalize_variance = 1 [default = true];
-
- // This parameter can be set to true to perform DNN-like MVN
- optional bool across_channels = 2 [default = false];
-
- // Epsilon for not dividing by zero while normalizing variance
- optional float eps = 3 [default = 1e-9];
-}
-
-// Message that stores parameters used by NormalizeLayer
-message NormalizeParameter {
- optional bool across_spatial = 1 [default = true];
- // Initial value of scale. Default is 1.0 for all
- optional FillerParameter scale_filler = 2;
- // Whether or not scale parameters are shared across channels.
- optional bool channel_shared = 3 [default = true];
- // Epsilon for not dividing by zero while normalizing variance
- optional float eps = 4 [default = 1e-10];
-}
-
-message ParameterParameter {
- optional BlobShape shape = 1;
-}
-
-message PermuteParameter {
- // The new orders of the axes of data. Notice it should be within
- // the same range as the input data, and it starts from 0.
- // Do not provide repeated order.
- repeated uint32 order = 1;
-}
-
-message PoolingParameter {
- enum PoolMethod {
- MAX = 0;
- AVE = 1;
- STOCHASTIC = 2;
- }
- optional PoolMethod pool = 1 [default = MAX]; // The pooling method
- // Pad, kernel size, and stride are all given as a single value for equal
- // dimensions in height and width or as Y, X pairs.
- optional uint32 pad = 4 [default = 0]; // The padding size (equal in Y, X)
- optional uint32 pad_h = 9 [default = 0]; // The padding height
- optional uint32 pad_w = 10 [default = 0]; // The padding width
- optional uint32 kernel_size = 2; // The kernel size (square)
- optional uint32 kernel_h = 5; // The kernel height
- optional uint32 kernel_w = 6; // The kernel width
- optional uint32 stride = 3 [default = 1]; // The stride (equal in Y, X)
- optional uint32 stride_h = 7; // The stride height
- optional uint32 stride_w = 8; // The stride width
- enum Engine {
- DEFAULT = 0;
- CAFFE = 1;
- CUDNN = 2;
- }
- optional Engine engine = 11 [default = DEFAULT];
- // If global_pooling then it will pool over the size of the bottom by doing
- // kernel_h = bottom->height and kernel_w = bottom->width
- optional bool global_pooling = 12 [default = false];
-}
-
-message PowerParameter {
- // PowerLayer computes outputs y = (shift + scale * x) ^ power.
- optional float power = 1 [default = 1.0];
- optional float scale = 2 [default = 1.0];
- optional float shift = 3 [default = 0.0];
-}
-
-// Message that stores parameters used by PriorBoxLayer
-message PriorBoxParameter {
- // Encode/decode type.
- enum CodeType {
- CORNER = 1;
- CENTER_SIZE = 2;
- CORNER_SIZE = 3;
- }
- // Minimum box size (in pixels). Required!
- repeated float min_size = 1;
- // Maximum box size (in pixels). Required!
- repeated float max_size = 2;
- // Various aspect ratios. Duplicate ratios will be ignored.
- // If none is provided, we use default ratio 1 (see the sketch below).
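The aspect-ratio rule documented in the comment above (always include ratio 1, skip duplicates, and, when `flip` is set, also emit each reciprocal) can be summarized in a few lines of Python. This is only an illustration of the documented behaviour, not code from the converter; the function name is ours:

```python
def expand_aspect_ratios(aspect_ratio, flip=True, eps=1e-6):
    """Illustrative sketch of PriorBoxLayer's documented ratio handling."""
    ratios = [1.0]  # the default ratio 1 is always present
    for ratio in aspect_ratio:
        if any(abs(ratio - r) < eps for r in ratios):
            continue  # duplicate ratios are ignored
        ratios.append(ratio)
        if flip:
            ratios.append(1.0 / ratio)  # flip adds the reciprocal 1/r
    return ratios

print(expand_aspect_ratios([1.0, 2.0]))  # [1.0, 2.0, 0.5]
```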
- repeated float aspect_ratio = 3; - // If true, will flip each aspect ratio. - // For example, if there is aspect ratio "r", - // we will generate aspect ratio "1.0/r" as well. - optional bool flip = 4 [default = true]; - // If true, will clip the prior so that it is within [0, 1] - optional bool clip = 5 [default = false]; - // Variance for adjusting the prior bboxes. - repeated float variance = 6; - // By default, we calculate img_height, img_width, step_x, step_y based on - // bottom[0] (feat) and bottom[1] (img). Unless these values are explicitely - // provided. - // Explicitly provide the img_size. - optional uint32 img_size = 7; - // Either img_size or img_h/img_w should be specified; not both. - optional uint32 img_h = 8; - optional uint32 img_w = 9; - - // Explicitly provide the step size. - optional float step = 10; - // Either step or step_h/step_w should be specified; not both. - optional float step_h = 11; - optional float step_w = 12; - - // Offset to the top left corner of each cell. - optional float offset = 13 [default = 0.5]; -} - -message PythonParameter { - optional string module = 1; - optional string layer = 2; - // This value is set to the attribute `param_str` of the `PythonLayer` object - // in Python before calling the `setup()` method. This could be a number, - // string, dictionary in Python dict format, JSON, etc. You may parse this - // string in `setup` method and use it in `forward` and `backward`. - optional string param_str = 3 [default = '']; - // Whether this PythonLayer is shared among worker solvers during data parallelism. - // If true, each worker solver sequentially run forward from this layer. - // This value should be set true if you are using it as a data layer. - optional bool share_in_parallel = 4 [default = false]; -} - -// Message that stores parameters used by RecurrentLayer -message RecurrentParameter { - // The dimension of the output (and usually hidden state) representation -- - // must be explicitly set to non-zero. - optional uint32 num_output = 1 [default = 0]; - - optional FillerParameter weight_filler = 2; // The filler for the weight - optional FillerParameter bias_filler = 3; // The filler for the bias - - // Whether to enable displaying debug_info in the unrolled recurrent net. - optional bool debug_info = 4 [default = false]; - - // Whether to add as additional inputs (bottoms) the initial hidden state - // blobs, and add as additional outputs (tops) the final timestep hidden state - // blobs. The number of additional bottom/top blobs required depends on the - // recurrent architecture -- e.g., 1 for RNNs, 2 for LSTMs. - optional bool expose_hidden = 5 [default = false]; -} - -// Message that stores parameters used by ReductionLayer -message ReductionParameter { - enum ReductionOp { - SUM = 1; - ASUM = 2; - SUMSQ = 3; - MEAN = 4; - } - - optional ReductionOp operation = 1 [default = SUM]; // reduction operation - - // The first axis to reduce to a scalar -- may be negative to index from the - // end (e.g., -1 for the last axis). - // (Currently, only reduction along ALL "tail" axes is supported; reduction - // of axis M through N, where N < num_axes - 1, is unsupported.) - // Suppose we have an n-axis bottom Blob with shape: - // (d0, d1, d2, ..., d(m-1), dm, d(m+1), ..., d(n-1)). - // If axis == m, the output Blob will have shape - // (d0, d1, d2, ..., d(m-1)), - // and the ReductionOp operation is performed (d0 * d1 * d2 * ... * d(m-1)) - // times, each including (dm * d(m+1) * ... * d(n-1)) individual data. 
- // If axis == 0 (the default), the output Blob always has the empty shape - // (count 1), performing reduction across the entire input -- - // often useful for creating new loss functions. - optional int32 axis = 2 [default = 0]; - - optional float coeff = 3 [default = 1.0]; // coefficient for output -} - -// Message that stores parameters used by ReLULayer -message ReLUParameter { - // Allow non-zero slope for negative inputs to speed up optimization - // Described in: - // Maas, A. L., Hannun, A. Y., & Ng, A. Y. (2013). Rectifier nonlinearities - // improve neural network acoustic models. In ICML Workshop on Deep Learning - // for Audio, Speech, and Language Processing. - optional float negative_slope = 1 [default = 0]; - enum Engine { - DEFAULT = 0; - CAFFE = 1; - CUDNN = 2; - } - optional Engine engine = 2 [default = DEFAULT]; -} - -message ReshapeParameter { - // Specify the output dimensions. If some of the dimensions are set to 0, - // the corresponding dimension from the bottom layer is used (unchanged). - // Exactly one dimension may be set to -1, in which case its value is - // inferred from the count of the bottom blob and the remaining dimensions. - // For example, suppose we want to reshape a 2D blob "input" with shape 2 x 8: - // - // layer { - // type: "Reshape" bottom: "input" top: "output" - // reshape_param { ... } - // } - // - // If "input" is 2D with shape 2 x 8, then the following reshape_param - // specifications are all equivalent, producing a 3D blob "output" with shape - // 2 x 2 x 4: - // - // reshape_param { shape { dim: 2 dim: 2 dim: 4 } } - // reshape_param { shape { dim: 0 dim: 2 dim: 4 } } - // reshape_param { shape { dim: 0 dim: 2 dim: -1 } } - // reshape_param { shape { dim: 0 dim:-1 dim: 4 } } - // - optional BlobShape shape = 1; - - // axis and num_axes control the portion of the bottom blob's shape that are - // replaced by (included in) the reshape. By default (axis == 0 and - // num_axes == -1), the entire bottom blob shape is included in the reshape, - // and hence the shape field must specify the entire output shape. - // - // axis may be non-zero to retain some portion of the beginning of the input - // shape (and may be negative to index from the end; e.g., -1 to begin the - // reshape after the last axis, including nothing in the reshape, - // -2 to include only the last axis, etc.). - // - // For example, suppose "input" is a 2D blob with shape 2 x 8. - // Then the following ReshapeLayer specifications are all equivalent, - // producing a blob "output" with shape 2 x 2 x 4: - // - // reshape_param { shape { dim: 2 dim: 2 dim: 4 } } - // reshape_param { shape { dim: 2 dim: 4 } axis: 1 } - // reshape_param { shape { dim: 2 dim: 4 } axis: -3 } - // - // num_axes specifies the extent of the reshape. - // If num_axes >= 0 (and axis >= 0), the reshape will be performed only on - // input axes in the range [axis, axis+num_axes]. - // num_axes may also be -1, the default, to include all remaining axes - // (starting from axis). - // - // For example, suppose "input" is a 2D blob with shape 2 x 8. - // Then the following ReshapeLayer specifications are equivalent, - // producing a blob "output" with shape 1 x 2 x 8. 
- //
- // reshape_param { shape { dim: 1 dim: 2 dim: 8 } }
- // reshape_param { shape { dim: 1 dim: 2 } num_axes: 1 }
- // reshape_param { shape { dim: 1 } num_axes: 0 }
- //
- // On the other hand, these would produce output blob shape 2 x 1 x 8:
- //
- // reshape_param { shape { dim: 2 dim: 1 dim: 8 } }
- // reshape_param { shape { dim: 1 } axis: 1 num_axes: 0 }
- //
- optional int32 axis = 2 [default = 0];
- optional int32 num_axes = 3 [default = -1];
-}
-
-message ScaleParameter {
- // The first axis of bottom[0] (the first input Blob) along which to apply
- // bottom[1] (the second input Blob). May be negative to index from the end
- // (e.g., -1 for the last axis).
- //
- // For example, if bottom[0] is 4D with shape 100x3x40x60, the output
- // top[0] will have the same shape, and bottom[1] may have any of the
- // following shapes (for the given value of axis):
- // (axis == 0 == -4) 100; 100x3; 100x3x40; 100x3x40x60
- // (axis == 1 == -3) 3; 3x40; 3x40x60
- // (axis == 2 == -2) 40; 40x60
- // (axis == 3 == -1) 60
- // Furthermore, bottom[1] may have the empty shape (regardless of the value of
- // "axis") -- a scalar multiplier.
- optional int32 axis = 1 [default = 1];
-
- // (num_axes is ignored unless just one bottom is given and the scale is
- // a learned parameter of the layer. Otherwise, num_axes is determined by the
- // number of axes of the second bottom.)
- // The number of axes of the input (bottom[0]) covered by the scale
- // parameter, or -1 to cover all axes of bottom[0] starting from `axis`.
- // Set num_axes := 0, to multiply with a zero-axis Blob: a scalar.
- optional int32 num_axes = 2 [default = 1];
-
- // (filler is ignored unless just one bottom is given and the scale is
- // a learned parameter of the layer.)
- // The initialization for the learned scale parameter.
- // Default is the unit (1) initialization, resulting in the ScaleLayer
- // initially performing the identity operation.
- optional FillerParameter filler = 3;
-
- // Whether to also learn a bias (equivalent to a ScaleLayer+BiasLayer, but
- // may be more efficient). Initialized with bias_filler (defaults to 0).
- optional bool bias_term = 4 [default = false];
- optional FillerParameter bias_filler = 5;
-}
-
-message SigmoidParameter {
- enum Engine {
- DEFAULT = 0;
- CAFFE = 1;
- CUDNN = 2;
- }
- optional Engine engine = 1 [default = DEFAULT];
-}
-
-message SliceParameter {
- // The axis along which to slice -- may be negative to index from the end
- // (e.g., -1 for the last axis).
- // By default, SliceLayer slices blobs along the "channels" axis (1).
- optional int32 axis = 3 [default = 1];
- repeated uint32 slice_point = 2;
-
- // DEPRECATED: alias for "axis" -- does not support negative indexing.
- optional uint32 slice_dim = 1 [default = 1];
-}
-
-// Message that stores parameters used by SoftmaxLayer, SoftmaxWithLossLayer
-message SoftmaxParameter {
- enum Engine {
- DEFAULT = 0;
- CAFFE = 1;
- CUDNN = 2;
- }
- optional Engine engine = 1 [default = DEFAULT];
-
- // The axis along which to perform the softmax -- may be negative to index
- // from the end (e.g., -1 for the last axis).
- // Any other axes will be evaluated as independent softmaxes.
- optional int32 axis = 2 [default = 1];
-}
-
-message TanHParameter {
- enum Engine {
- DEFAULT = 0;
- CAFFE = 1;
- CUDNN = 2;
- }
- optional Engine engine = 1 [default = DEFAULT];
-}
-
-// Message that stores parameters used by TileLayer
-message TileParameter {
- // The index of the axis to tile (see the sketch below).
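TileLayer's effect is easy to state in NumPy terms: the blob is copied `tiles` times along the chosen axis. A minimal illustrative sketch of that semantics (not converter code; the function name is ours):

```python
import numpy as np

def tile_layer(x, axis=1, tiles=2):
    """Illustrative sketch of TileLayer: stack `tiles` copies of x along `axis`."""
    return np.concatenate([x] * tiles, axis=axis)

print(tile_layer(np.ones((2, 3)), axis=1, tiles=3).shape)  # (2, 9)
```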
- optional int32 axis = 1 [default = 1]; - - // The number of copies (tiles) of the blob to output. - optional int32 tiles = 2; -} - -// Message that stores parameters used by ThresholdLayer -message ThresholdParameter { - optional float threshold = 1 [default = 0]; // Strictly positive values -} - -message VideoDataParameter{ - enum VideoType { - WEBCAM = 0; - VIDEO = 1; - } - optional VideoType video_type = 1 [default = WEBCAM]; - optional int32 device_id = 2 [default = 0]; - optional string video_file = 3; - // Number of frames to be skipped before processing a frame. - optional uint32 skip_frames = 4 [default = 0]; -} - -message WindowDataParameter { - // Specify the data source. - optional string source = 1; - // For data pre-processing, we can do simple scaling and subtracting the - // data mean, if provided. Note that the mean subtraction is always carried - // out before scaling. - optional float scale = 2 [default = 1]; - optional string mean_file = 3; - // Specify the batch size. - optional uint32 batch_size = 4; - // Specify if we would like to randomly crop an image. - optional uint32 crop_size = 5 [default = 0]; - // Specify if we want to randomly mirror data. - optional bool mirror = 6 [default = false]; - // Foreground (object) overlap threshold - optional float fg_threshold = 7 [default = 0.5]; - // Background (non-object) overlap threshold - optional float bg_threshold = 8 [default = 0.5]; - // Fraction of batch that should be foreground objects - optional float fg_fraction = 9 [default = 0.25]; - // Amount of contextual padding to add around a window - // (used only by the window_data_layer) - optional uint32 context_pad = 10 [default = 0]; - // Mode for cropping out a detection window - // warp: cropped window is warped to a fixed size and aspect ratio - // square: the tightest square around the window is cropped - optional string crop_mode = 11 [default = "warp"]; - // cache_images: will load all images in memory for faster access - optional bool cache_images = 12 [default = false]; - // append root_folder to locate images - optional string root_folder = 13 [default = ""]; -} - -message SPPParameter { - enum PoolMethod { - MAX = 0; - AVE = 1; - STOCHASTIC = 2; - } - optional uint32 pyramid_height = 1; - optional PoolMethod pool = 2 [default = MAX]; // The pooling method - enum Engine { - DEFAULT = 0; - CAFFE = 1; - CUDNN = 2; - } - optional Engine engine = 6 [default = DEFAULT]; -} - -// DEPRECATED: use LayerParameter. 
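The `V1LayerParameter` message that follows is the legacy format whose numeric layer types the converter checks later (e.g. `layer_type == 4` for `CONVOLUTION` in `convert_model.py` further down). A small illustrative lookup, assuming a `caffe_pb2` module generated from this `caffe.proto`:

```python
# Illustrative only: resolve V1 numeric layer types to names via the
# generated caffe_pb2 module (an assumption; compile caffe.proto first).
import caffe_pb2

v1 = caffe_pb2.V1LayerParameter
print(v1.LayerType.Name(4), v1.LayerType.Name(14), v1.LayerType.Name(39))
# CONVOLUTION INNER_PRODUCT DECONVOLUTION
```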
-message V1LayerParameter { - repeated string bottom = 2; - repeated string top = 3; - optional string name = 4; - repeated NetStateRule include = 32; - repeated NetStateRule exclude = 33; - enum LayerType { - NONE = 0; - ABSVAL = 35; - ACCURACY = 1; - ARGMAX = 30; - BNLL = 2; - CONCAT = 3; - CONTRASTIVE_LOSS = 37; - CONVOLUTION = 4; - DATA = 5; - DECONVOLUTION = 39; - DROPOUT = 6; - DUMMY_DATA = 32; - EUCLIDEAN_LOSS = 7; - ELTWISE = 25; - EXP = 38; - FLATTEN = 8; - HDF5_DATA = 9; - HDF5_OUTPUT = 10; - HINGE_LOSS = 28; - IM2COL = 11; - IMAGE_DATA = 12; - INFOGAIN_LOSS = 13; - INNER_PRODUCT = 14; - LRN = 15; - MEMORY_DATA = 29; - MULTINOMIAL_LOGISTIC_LOSS = 16; - MVN = 34; - POOLING = 17; - POWER = 26; - RELU = 18; - SIGMOID = 19; - SIGMOID_CROSS_ENTROPY_LOSS = 27; - SILENCE = 36; - SOFTMAX = 20; - SOFTMAX_LOSS = 21; - SPLIT = 22; - SLICE = 33; - TANH = 23; - WINDOW_DATA = 24; - THRESHOLD = 31; - } - optional LayerType type = 5; - repeated BlobProto blobs = 6; - repeated string param = 1001; - repeated DimCheckMode blob_share_mode = 1002; - enum DimCheckMode { - STRICT = 0; - PERMISSIVE = 1; - } - repeated float blobs_lr = 7; - repeated float weight_decay = 8; - repeated float loss_weight = 35; - optional AccuracyParameter accuracy_param = 27; - optional ArgMaxParameter argmax_param = 23; - optional ConcatParameter concat_param = 9; - optional ContrastiveLossParameter contrastive_loss_param = 40; - optional ConvolutionParameter convolution_param = 10; - optional DataParameter data_param = 11; - optional DropoutParameter dropout_param = 12; - optional DummyDataParameter dummy_data_param = 26; - optional EltwiseParameter eltwise_param = 24; - optional ExpParameter exp_param = 41; - optional HDF5DataParameter hdf5_data_param = 13; - optional HDF5OutputParameter hdf5_output_param = 14; - optional HingeLossParameter hinge_loss_param = 29; - optional ImageDataParameter image_data_param = 15; - optional InfogainLossParameter infogain_loss_param = 16; - optional InnerProductParameter inner_product_param = 17; - optional LRNParameter lrn_param = 18; - optional MemoryDataParameter memory_data_param = 22; - optional MVNParameter mvn_param = 34; - optional PoolingParameter pooling_param = 19; - optional PowerParameter power_param = 21; - optional ReLUParameter relu_param = 30; - optional SigmoidParameter sigmoid_param = 38; - optional SoftmaxParameter softmax_param = 39; - optional SliceParameter slice_param = 31; - optional TanHParameter tanh_param = 37; - optional ThresholdParameter threshold_param = 25; - optional WindowDataParameter window_data_param = 20; - optional TransformationParameter transform_param = 36; - optional LossParameter loss_param = 42; - optional V0LayerParameter layer = 1; -} - -// DEPRECATED: V0LayerParameter is the old way of specifying layer parameters -// in Caffe. We keep this message type around for legacy support. -message V0LayerParameter { - optional string name = 1; // the layer name - optional string type = 2; // the string to specify the layer type - - // Parameters to specify layers with inner products. 
- optional uint32 num_output = 3; // The number of outputs for the layer - optional bool biasterm = 4 [default = true]; // whether to have bias terms - optional FillerParameter weight_filler = 5; // The filler for the weight - optional FillerParameter bias_filler = 6; // The filler for the bias - - optional uint32 pad = 7 [default = 0]; // The padding size - optional uint32 kernelsize = 8; // The kernel size - optional uint32 group = 9 [default = 1]; // The group size for group conv - optional uint32 stride = 10 [default = 1]; // The stride - enum PoolMethod { - MAX = 0; - AVE = 1; - STOCHASTIC = 2; - } - optional PoolMethod pool = 11 [default = MAX]; // The pooling method - optional float dropout_ratio = 12 [default = 0.5]; // dropout ratio - - optional uint32 local_size = 13 [default = 5]; // for local response norm - optional float alpha = 14 [default = 1.]; // for local response norm - optional float beta = 15 [default = 0.75]; // for local response norm - optional float k = 22 [default = 1.]; - - // For data layers, specify the data source - optional string source = 16; - // For data pre-processing, we can do simple scaling and subtracting the - // data mean, if provided. Note that the mean subtraction is always carried - // out before scaling. - optional float scale = 17 [default = 1]; - optional string meanfile = 18; - // For data layers, specify the batch size. - optional uint32 batchsize = 19; - // For data layers, specify if we would like to randomly crop an image. - optional uint32 cropsize = 20 [default = 0]; - // For data layers, specify if we want to randomly mirror data. - optional bool mirror = 21 [default = false]; - - // The blobs containing the numeric parameters of the layer - repeated BlobProto blobs = 50; - // The ratio that is multiplied on the global learning rate. If you want to - // set the learning ratio for one blob, you need to set it for all blobs. - repeated float blobs_lr = 51; - // The weight decay that is multiplied on the global weight decay. - repeated float weight_decay = 52; - - // The rand_skip variable is for the data layer to skip a few data points - // to avoid all asynchronous sgd clients to start at the same point. The skip - // point would be set as rand_skip * rand(0,1). Note that rand_skip should not - // be larger than the number of keys in the database. - optional uint32 rand_skip = 53 [default = 0]; - - // Fields related to detection (det_*) - // foreground (object) overlap threshold - optional float det_fg_threshold = 54 [default = 0.5]; - // background (non-object) overlap threshold - optional float det_bg_threshold = 55 [default = 0.5]; - // Fraction of batch that should be foreground objects - optional float det_fg_fraction = 56 [default = 0.25]; - - // optional bool OBSOLETE_can_clobber = 57 [default = true]; - - // Amount of contextual padding to add around a window - // (used only by the window_data_layer) - optional uint32 det_context_pad = 58 [default = 0]; - - // Mode for cropping out a detection window - // warp: cropped window is warped to a fixed size and aspect ratio - // square: the tightest square around the window is cropped - optional string det_crop_mode = 59 [default = "warp"]; - - // For ReshapeLayer, one needs to specify the new dimensions. - optional int32 new_num = 60 [default = 0]; - optional int32 new_channels = 61 [default = 0]; - optional int32 new_height = 62 [default = 0]; - optional int32 new_width = 63 [default = 0]; - - // Whether or not ImageLayer should shuffle the list of files at every epoch. 
- // It will also resize images if new_height or new_width are not zero.
- optional bool shuffle_images = 64 [default = false];
-
- // For ConcatLayer, one needs to specify the dimension for concatenation, and
- // the other dimensions must be the same for all the bottom blobs.
- // By default it will concatenate blobs along the channels dimension.
- optional uint32 concat_dim = 65 [default = 1];
-
- optional HDF5OutputParameter hdf5_output_param = 1001;
-}
-
-message PReLUParameter {
- // Parametric ReLU described in K. He et al, Delving Deep into Rectifiers:
- // Surpassing Human-Level Performance on ImageNet Classification, 2015.
-
- // Initial value of a_i. Default is a_i=0.25 for all i.
- optional FillerParameter filler = 1;
- // Whether or not slope parameters are shared across channels.
- optional bool channel_shared = 2 [default = false];
-}
diff --git a/example/ssd/tools/caffe_converter/caffe_parse/parse_from_protobuf.py b/example/ssd/tools/caffe_converter/caffe_parse/parse_from_protobuf.py
deleted file mode 100644
index 862049a770b1..000000000000
--- a/example/ssd/tools/caffe_converter/caffe_parse/parse_from_protobuf.py
+++ /dev/null
@@ -1,55 +0,0 @@
-# Licensed to the Apache Software Foundation (ASF) under one
-# or more contributor license agreements. See the NOTICE file
-# distributed with this work for additional information
-# regarding copyright ownership. The ASF licenses this file
-# to you under the Apache License, Version 2.0 (the
-# "License"); you may not use this file except in compliance
-# with the License. You may obtain a copy of the License at
-#
-# http://www.apache.org/licenses/LICENSE-2.0
-#
-# Unless required by applicable law or agreed to in writing,
-# software distributed under the License is distributed on an
-# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
-# KIND, either express or implied. See the License for the
-# specific language governing permissions and limitations
-# under the License.
-
-from google.protobuf import text_format
-import numpy as np
-import caffe_parse.caffe_pb2 as caffe_pb2
-
-
-def parse_caffemodel(file_path):
-    """
-    Parses the trained .caffemodel file
-
-    file_path: /path/to/trained-model.caffemodel
-
-    returns: layers
-    """
-    with open(file_path, 'rb') as f:
-        contents = f.read()
-
-    net_param = caffe_pb2.NetParameter()
-    net_param.ParseFromString(contents)
-
-    layers = find_layers(net_param)
-    return layers
-
-
-def find_layers(net_param):
-    # prefer the legacy V1 field if present, otherwise the current one
-    if len(net_param.layers) > 0:
-        return net_param.layers
-    elif len(net_param.layer) > 0:
-        return net_param.layer
-    else:
-        raise Exception("Couldn't find layers")
-
-
-def main():
-    layers = parse_caffemodel('xxx.caffemodel')
-
-
-if __name__ == '__main__':
-    main()
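To show how this standalone parser was meant to be driven, here is a hypothetical invocation. It assumes `caffe_pb2.py` has already been generated from `caffe.proto` (e.g. via `protoc --python_out=caffe_parse caffe.proto`); the model file name is the script's own placeholder:

```python
# Hypothetical usage of parse_caffemodel() above; assumes the generated
# caffe_parse/caffe_pb2.py module is importable.
from caffe_parse.parse_from_protobuf import parse_caffemodel

layers = parse_caffemodel('xxx.caffemodel')  # placeholder model file name
for layer in layers:
    print(layer.name, layer.type, len(layer.blobs))
```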
diff --git a/example/ssd/tools/caffe_converter/caffe_parser.py b/example/ssd/tools/caffe_converter/caffe_parser.py
deleted file mode 100644
index cff6fd590701..000000000000
--- a/example/ssd/tools/caffe_converter/caffe_parser.py
+++ /dev/null
@@ -1,80 +0,0 @@
-# Licensed to the Apache Software Foundation (ASF) under one
-# or more contributor license agreements. See the NOTICE file
-# distributed with this work for additional information
-# regarding copyright ownership. The ASF licenses this file
-# to you under the Apache License, Version 2.0 (the
-# "License"); you may not use this file except in compliance
-# with the License. You may obtain a copy of the License at
-#
-# http://www.apache.org/licenses/LICENSE-2.0
-#
-# Unless required by applicable law or agreed to in writing,
-# software distributed under the License is distributed on an
-# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
-# KIND, either express or implied. See the License for the
-# specific language governing permissions and limitations
-# under the License.
-
-"""Parse caffe's protobuf
-"""
-import re
-try:
-    import caffe
-    from caffe.proto import caffe_pb2
-    use_caffe = True
-except ImportError:
-    try:
-        import caffe_pb2
-    except ImportError:
-        raise ImportError('caffe_pb2 not found; compile it first with: '
-                          'protoc --python_out=./ ./caffe.proto')
-    use_caffe = False
-
-from google.protobuf import text_format
-
-def read_prototxt(fname):
-    """Return the caffe_pb2.NetParameter object defined in a prototxt file
-    """
-    proto = caffe_pb2.NetParameter()
-    with open(fname, 'r') as f:
-        text_format.Merge(str(f.read()), proto)
-    return proto
-
-def get_layers(proto):
-    """Return the layers in a caffe_pb2.NetParameter object
-    """
-    if len(proto.layer):
-        return proto.layer
-    elif len(proto.layers):
-        return proto.layers
-    else:
-        raise ValueError('Invalid proto file.')
-
-def read_caffemodel(prototxt_fname, caffemodel_fname):
-    """Return the layers defined in a binary caffemodel file, together with
-    their names when the caffe python package is available
-    """
-    if use_caffe:
-        caffe.set_mode_cpu()
-        net = caffe.Net(prototxt_fname, caffemodel_fname, caffe.TEST)
-        layer_names = net._layer_names
-        layers = net.layers
-        return (layers, layer_names)
-    else:
-        proto = caffe_pb2.NetParameter()
-        with open(caffemodel_fname, 'rb') as f:
-            proto.ParseFromString(f.read())
-        return (get_layers(proto), None)
-
-def layer_iter(layers, layer_names):
-    """Yield (name, type, blobs) for every layer, in either parsing mode"""
-    if use_caffe:
-        for layer_idx, layer in enumerate(layers):
-            layer_name = re.sub('[-/]', '_', layer_names[layer_idx])
-            layer_type = layer.type
-            layer_blobs = layer.blobs
-            yield (layer_name, layer_type, layer_blobs)
-    else:
-        for layer in layers:
-            layer_name = re.sub('[-/]', '_', layer.name)
-            layer_type = layer.type
-            layer_blobs = layer.blobs
-            yield (layer_name, layer_type, layer_blobs)
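A hypothetical walk over a model's layers using this parser; the prototxt/caffemodel file names are placeholders, not files shipped with the tool:

```python
# Hypothetical usage of caffe_parser above; file names are placeholders.
import caffe_parser

layers, names = caffe_parser.read_caffemodel('deploy.prototxt', 'model.caffemodel')
for layer_name, layer_type, layer_blobs in caffe_parser.layer_iter(layers, names):
    print(layer_name, layer_type, len(layer_blobs))
```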
diff --git a/example/ssd/tools/caffe_converter/caffe_proto_utils.py b/example/ssd/tools/caffe_converter/caffe_proto_utils.py
deleted file mode 100644
index 45978e7dc59d..000000000000
--- a/example/ssd/tools/caffe_converter/caffe_proto_utils.py
+++ /dev/null
@@ -1,204 +0,0 @@
-# Licensed to the Apache Software Foundation (ASF) under one
-# or more contributor license agreements. See the NOTICE file
-# distributed with this work for additional information
-# regarding copyright ownership. The ASF licenses this file
-# to you under the Apache License, Version 2.0 (the
-# "License"); you may not use this file except in compliance
-# with the License. You may obtain a copy of the License at
-#
-# http://www.apache.org/licenses/LICENSE-2.0
-#
-# Unless required by applicable law or agreed to in writing,
-# software distributed under the License is distributed on an
-# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
-# KIND, either express or implied. See the License for the
-# specific language governing permissions and limitations
-# under the License.
-
-"""Helper functions for parsing caffe prototxt into a workable DAG
-"""
-
-
-def process_network_proto(caffe_root, deploy_proto):
-    """
-    Runs the caffe upgrade tool on the prototxt to create a prototxt in the latest format.
-    This enables us to work only with the latest structures, instead of supporting all the variants
-
-    :param caffe_root: link to caffe root folder, where the upgrade tool is located
-    :param deploy_proto: name of the original prototxt file
-    :return: name of new processed prototxt file
-    """
-    processed_deploy_proto = deploy_proto + ".processed"
-
-    from shutil import copyfile
-    copyfile(deploy_proto, processed_deploy_proto)
-
-    # run upgrade tool on new file name (same output file)
-    import os
-    upgrade_tool_command_line = caffe_root + '/build/tools/upgrade_net_proto_text.bin ' \
-                                + processed_deploy_proto + ' ' + processed_deploy_proto
-    os.system(upgrade_tool_command_line)
-
-    return processed_deploy_proto
-
-
-class LayerRecord(object):
-    """
-    A record which describes basic layer parameters
-    """
-
-    def __init__(self, layer_def):
-
-        self.layer_def = layer_def
-        self.name = layer_def.name
-        self.type = layer_def.type
-
-        # keep filter, stride and pad
-        if layer_def.type == 'Convolution':
-            if LayerRecord._is_iterable(layer_def.convolution_param.kernel_size):
-                self.filter = list(layer_def.convolution_param.kernel_size)
-            else:
-                self.filter = list([layer_def.convolution_param.kernel_size])
-            if len(self.filter) == 1:
-                self.filter *= 2
-            if LayerRecord._is_iterable(layer_def.convolution_param.pad):
-                self.pad = list(layer_def.convolution_param.pad)
-            else:
-                self.pad = list([layer_def.convolution_param.pad])
-            if len(self.pad) == 0:
-                self.pad = [0, 0]
-            elif len(self.pad) == 1:
-                self.pad *= 2
-            if LayerRecord._is_iterable(layer_def.convolution_param.stride):
-                self.stride = list(layer_def.convolution_param.stride)
-            else:
-                self.stride = list([layer_def.convolution_param.stride])
-            if len(self.stride) == 0:
-                self.stride = [1, 1]
-            elif len(self.stride) == 1:
-                self.stride *= 2
-
-        elif layer_def.type == 'Pooling':
-            self.filter = [layer_def.pooling_param.kernel_size]
-            if len(self.filter) == 1:
-                self.filter *= 2
-            self.pad = [layer_def.pooling_param.pad]
-            if len(self.pad) == 0:
-                self.pad = [0, 0]
-            elif len(self.pad) == 1:
-                self.pad *= 2
-            self.stride = [layer_def.pooling_param.stride]
-            if len(self.stride) == 0:
-                self.stride = [1, 1]
-            elif len(self.stride) == 1:
-                self.stride *= 2
-
-        else:
-            self.filter = [0, 0]
-            self.pad = [0, 0]
-            self.stride = [1, 1]
-
-        # keep tops
-        self.tops = list(layer_def.top)
-
-        # keep bottoms
-        self.bottoms = list(layer_def.bottom)
-
-        # list of parent layers
-        self.parents = []
-
-        # list of child layers
-        self.children = []
-
-    @staticmethod
-    def _is_iterable(obj):
-        return hasattr(obj, '__iter__')
-
-def read_network_dag(processed_deploy_prototxt):
-    """
-    Reads from the caffe prototxt the network structure
-    :param processed_deploy_prototxt: name of prototxt to load, preferably the prototxt should
-        be processed before using a call to process_network_proto()
-    :return: network_def, layer_name_to_record, top_to_layers
-        network_def: caffe network structure, gives access to *all* the network information
-        layer_name_to_record: *ordered* dictionary which maps between layer name and a structure which
-            describes in a simple form the layer parameters
-        top_to_layers: dictionary which maps a blob name to an ordered list of layers which output it;
-            when a top is used several times, like in in-place layers, the list will contain all the layers
-            by order of appearance
-    """
-
-    from caffe.proto import caffe_pb2
-    from google.protobuf import text_format
-    from collections import OrderedDict
-
-    # load prototxt file
-    network_def =
caffe_pb2.NetParameter() - with open(processed_deploy_prototxt, 'r') as proto_file: - text_format.Merge(str(proto_file.read()), network_def) - - # map layer name to layer record - layer_name_to_record = OrderedDict() - for layer_def in network_def.layer: - if (len(layer_def.include) == 0) or \ - (caffe_pb2.TEST in [item.phase for item in layer_def.include]): - - layer_name_to_record[layer_def.name] = LayerRecord(layer_def) - - top_to_layers = dict() - for layer in network_def.layer: - # no specific phase, or TEST phase is specifically asked for - if (len(layer.include) == 0) or (caffe_pb2.TEST in [item.phase for item in layer.include]): - for top in layer.top: - if top not in top_to_layers: - top_to_layers[top] = list() - top_to_layers[top].append(layer.name) - - # find parents and children of all layers - for child_layer_name in layer_name_to_record.keys(): # pylint: disable=too-many-nested-blocks - child_layer_def = layer_name_to_record[child_layer_name] - for bottom in child_layer_def.bottoms: - if bottom in top_to_layers: - for parent_layer_name in top_to_layers[bottom]: - if parent_layer_name in layer_name_to_record: - parent_layer_def = layer_name_to_record[parent_layer_name] - if parent_layer_def not in child_layer_def.parents: - child_layer_def.parents.append(parent_layer_def) - if child_layer_def not in parent_layer_def.children: - parent_layer_def.children.append(child_layer_def) - - # update filter, strid, pad for maxout "structures" - for layer_name in layer_name_to_record.keys(): - layer_def = layer_name_to_record[layer_name] - if layer_def.type == 'Eltwise' and \ - len(layer_def.parents) == 1 and \ - layer_def.parents[0].type == 'Slice' and \ - len(layer_def.parents[0].parents) == 1 and \ - layer_def.parents[0].parents[0].type in ['Convolution', 'InnerProduct']: - layer_def.filter = layer_def.parents[0].parents[0].filter - layer_def.stride = layer_def.parents[0].parents[0].stride - layer_def.pad = layer_def.parents[0].parents[0].pad - - return network_def, layer_name_to_record, top_to_layers - - -def read_caffe_mean(caffe_mean_file): - """ - Reads caffe formatted mean file - :param caffe_mean_file: path to caffe mean file, presumably with 'binaryproto' suffix - :return: mean image, converted from BGR to RGB format - """ - - import caffe_parser - import numpy as np - mean_blob = caffe_parser.caffe_pb2.BlobProto() - with open(caffe_mean_file, 'rb') as f: - mean_blob.ParseFromString(f.read()) - - img_mean_np = np.array(mean_blob.data) - img_mean_np = img_mean_np.reshape(mean_blob.channels, mean_blob.height, mean_blob.width) - - # swap channels from Caffe BGR to RGB - img_mean_np[[0, 2], :, :] = img_mean_np[[2, 0], :, :] - - return img_mean_np diff --git a/example/ssd/tools/caffe_converter/compare_layers.py b/example/ssd/tools/caffe_converter/compare_layers.py deleted file mode 100644 index 9509027797aa..000000000000 --- a/example/ssd/tools/caffe_converter/compare_layers.py +++ /dev/null @@ -1,365 +0,0 @@ -# Licensed to the Apache Software Foundation (ASF) under one -# or more contributor license agreements. See the NOTICE file -# distributed with this work for additional information -# regarding copyright ownership. The ASF licenses this file -# to you under the Apache License, Version 2.0 (the -# "License"); you may not use this file except in compliance -# with the License. 
You may obtain a copy of the License at -# -# http://www.apache.org/licenses/LICENSE-2.0 -# -# Unless required by applicable law or agreed to in writing, -# software distributed under the License is distributed on an -# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY -# KIND, either express or implied. See the License for the -# specific language governing permissions and limitations -# under the License. - -"""Test converted models layer by layer -""" -import os -import argparse -import logging -import mxnet as mx -import cv2 -import numpy as np - -logging.basicConfig(level=logging.INFO) - - -def read_image(img_path, image_dims=None, mean=None): - """ - Reads an image from file path or URL, optionally resizing to given image dimensions and - subtracting mean. - :param img_path: path to file, or url to download - :param image_dims: image dimensions to resize to, or None - :param mean: mean file to subtract, or None - :return: loaded image, in RGB format - """ - - import urllib - - filename = img_path.split("/")[-1] - if img_path.startswith('http'): - urllib.urlretrieve(img_path, filename) - img = cv2.imread(filename) - else: - img = cv2.imread(img_path) - - img = cv2.cvtColor(img, cv2.COLOR_BGR2RGB) - - if image_dims is not None: - img = cv2.resize(img, image_dims) # resize to image_dims to fit model - img = np.rollaxis(img, 2) # change to (c, h, w) order - img = img[np.newaxis, :] # extend to (n, c, h, w) - if mean is not None: - mean = np.array(mean) - if mean.shape == (3,): - mean = mean[np.newaxis, :, np.newaxis, np.newaxis] # extend to (n, c, 1, 1) - img = img.astype(np.float32) - mean # subtract mean - - return img - - -def _ch_dev(arg_params, aux_params, ctx): - """ - Changes device of given mxnet arguments - :param arg_params: arguments - :param aux_params: auxiliary parameters - :param ctx: new device context - :return: arguments and auxiliary parameters on new device - """ - new_args = dict() - new_auxs = dict() - for k, v in arg_params.items(): - new_args[k] = v.as_in_context(ctx) - for k, v in aux_params.items(): - new_auxs[k] = v.as_in_context(ctx) - return new_args, new_auxs - - -def convert_and_compare_caffe_to_mxnet(image_url, gpu, caffe_prototxt_path, caffe_model_path, - caffe_mean, mean_diff_allowed, max_diff_allowed): - """ - Run the layer comparison on a caffe model, given its prototxt, weights and mean. 
- The comparison is done by inferring on a given image using both caffe and mxnet model - :param image_url: image file or url to run inference on - :param gpu: gpu to use, -1 for cpu - :param caffe_prototxt_path: path to caffe prototxt - :param caffe_model_path: path to caffe weights - :param caffe_mean: path to caffe mean file - """ - - import caffe - from caffe_proto_utils import read_network_dag, process_network_proto, read_caffe_mean - from convert_model import convert_model - - if isinstance(caffe_mean, str): - caffe_mean = read_caffe_mean(caffe_mean) - elif caffe_mean is None: - pass - elif len(caffe_mean) == 3: - # swap channels from Caffe BGR to RGB - caffe_mean = caffe_mean[::-1] - - # get caffe root location, this is needed to run the upgrade network utility, so we only need - # to support parsing of latest caffe - caffe_root = os.path.dirname(os.path.dirname(caffe.__path__[0])) - caffe_prototxt_path = process_network_proto(caffe_root, caffe_prototxt_path) - - _, layer_name_to_record, top_to_layers = read_network_dag(caffe_prototxt_path) - - caffe.set_mode_cpu() - caffe_net = caffe.Net(caffe_prototxt_path, caffe_model_path, caffe.TEST) - - image_dims = tuple(caffe_net.blobs['data'].shape)[2:4] - - logging.info('getting image %s', image_url) - img_rgb = read_image(image_url, image_dims, caffe_mean) - img_bgr = img_rgb[:, ::-1, :, :] - - caffe_net.blobs['data'].reshape(*img_bgr.shape) - caffe_net.blobs['data'].data[...] = img_bgr - _ = caffe_net.forward() - - # read sym and add all outputs - sym, arg_params, aux_params, _ = convert_model(caffe_prototxt_path, caffe_model_path) - sym = sym.get_internals() - - # now mxnet - if gpu < 0: - ctx = mx.cpu(0) - else: - ctx = mx.gpu(gpu) - - arg_params, aux_params = _ch_dev(arg_params, aux_params, ctx) - arg_params["data"] = mx.nd.array(img_rgb, ctx) - arg_params["prob_label"] = mx.nd.empty((1,), ctx) - exe = sym.bind(ctx, arg_params, args_grad=None, grad_req="null", aux_states=aux_params) - exe.forward(is_train=False) - - compare_layers_from_nets(caffe_net, arg_params, aux_params, exe, layer_name_to_record, - top_to_layers, mean_diff_allowed, max_diff_allowed) - - return - - -def _bfs(root_node, process_node): - """ - Implementation of Breadth-first search (BFS) on caffe network DAG - :param root_node: root node of caffe network DAG - :param process_node: function to run on each node - """ - - from collections import deque - - seen_nodes = set() - next_nodes = deque() - - seen_nodes.add(root_node) - next_nodes.append(root_node) - - while next_nodes: - current_node = next_nodes.popleft() - - # process current node - process_node(current_node) - - for child_node in current_node.children: - if child_node not in seen_nodes: - seen_nodes.add(child_node) - next_nodes.append(child_node) - - -def compare_layers_from_nets(caffe_net, arg_params, aux_params, exe, layer_name_to_record, - top_to_layers, mean_diff_allowed, max_diff_allowed): - """ - Compare layer by layer of a caffe network with mxnet network - :param caffe_net: loaded caffe network - :param arg_params: arguments - :param aux_params: auxiliary parameters - :param exe: mxnet model - :param layer_name_to_record: map between caffe layer and information record - :param top_to_layers: map between caffe blob name to layers which outputs it (including inplace) - :param mean_diff_allowed: mean difference allowed between caffe blob and mxnet blob - :param max_diff_allowed: max difference allowed between caffe blob and mxnet blob - """ - - import re - - log_format = ' {0:<40} {1:<40} {2:<8} 
{3:>10} {4:>10} {5:<1}' - - compare_layers_from_nets.is_first_convolution = True - - def _compare_blob(caf_blob, mx_blob, caf_name, mx_name, blob_type, note): - diff = np.abs(mx_blob - caf_blob) - diff_mean = diff.mean() - diff_max = diff.max() - logging.info(log_format.format(caf_name, mx_name, blob_type, '%4.5f' % diff_mean, - '%4.5f' % diff_max, note)) - assert diff_mean < mean_diff_allowed - assert diff_max < max_diff_allowed - - def _process_layer_parameters(layer): - - logging.debug('processing layer %s of type %s', layer.name, layer.type) - - normalized_layer_name = re.sub('[-/]', '_', layer.name) - - # handle weight and bias of convolution and fully-connected layers - if layer.name in caffe_net.params and layer.type in ['Convolution', 'InnerProduct', - 'Deconvolution']: - - has_bias = len(caffe_net.params[layer.name]) > 1 - - mx_name_weight = '{}_weight'.format(normalized_layer_name) - mx_beta = arg_params[mx_name_weight].asnumpy() - - # first convolution should change from BGR to RGB - if layer.type == 'Convolution' and compare_layers_from_nets.is_first_convolution: - compare_layers_from_nets.is_first_convolution = False - - # if RGB or RGBA - if mx_beta.shape[1] == 3 or mx_beta.shape[1] == 4: - # Swapping BGR of caffe into RGB in mxnet - mx_beta[:, [0, 2], :, :] = mx_beta[:, [2, 0], :, :] - - caf_beta = caffe_net.params[layer.name][0].data - _compare_blob(caf_beta, mx_beta, layer.name, mx_name_weight, 'weight', '') - - if has_bias: - mx_name_bias = '{}_bias'.format(normalized_layer_name) - mx_gamma = arg_params[mx_name_bias].asnumpy() - caf_gamma = caffe_net.params[layer.name][1].data - _compare_blob(caf_gamma, mx_gamma, layer.name, mx_name_bias, 'bias', '') - - elif layer.name in caffe_net.params and layer.type == 'Scale': - - if 'scale' in normalized_layer_name: - bn_name = normalized_layer_name.replace('scale', 'bn') - elif 'sc' in normalized_layer_name: - bn_name = normalized_layer_name.replace('sc', 'bn') - else: - assert False, 'Unknown name convention for bn/scale' - - beta_name = '{}_beta'.format(bn_name) - gamma_name = '{}_gamma'.format(bn_name) - - mx_beta = arg_params[beta_name].asnumpy() - caf_beta = caffe_net.params[layer.name][1].data - _compare_blob(caf_beta, mx_beta, layer.name, beta_name, 'mov_mean', '') - - mx_gamma = arg_params[gamma_name].asnumpy() - caf_gamma = caffe_net.params[layer.name][0].data - _compare_blob(caf_gamma, mx_gamma, layer.name, gamma_name, 'mov_var', '') - - elif layer.name in caffe_net.params and layer.type == 'BatchNorm': - - mean_name = '{}_moving_mean'.format(normalized_layer_name) - var_name = '{}_moving_var'.format(normalized_layer_name) - - caf_rescale_factor = caffe_net.params[layer.name][2].data - - mx_mean = aux_params[mean_name].asnumpy() - caf_mean = caffe_net.params[layer.name][0].data / caf_rescale_factor - _compare_blob(caf_mean, mx_mean, layer.name, mean_name, 'mean', '') - - mx_var = aux_params[var_name].asnumpy() - caf_var = caffe_net.params[layer.name][1].data / caf_rescale_factor - _compare_blob(caf_var, mx_var, layer.name, var_name, 'var', - 'expect 1e-04 change due to cudnn eps') - - elif layer.type in ['Input', 'Pooling', 'ReLU', 'Eltwise', 'Softmax', 'LRN', 'Concat', - 'Dropout', 'Crop']: - # no parameters to check for these layers - pass - - else: - logging.warn('No handling for layer %s of type %s, should we ignore it?', layer.name, - layer.type) - - return - - def _process_layer_output(caffe_blob_name): - - logging.debug('processing blob %s', caffe_blob_name) - - # skip blobs not originating from actual layers, 
e.g. artificial split layers added by caffe - if caffe_blob_name not in top_to_layers: - return - - caf_blob = caffe_net.blobs[caffe_blob_name].data - - # data should change from BGR to RGB - if caffe_blob_name == 'data': - - # if RGB or RGBA - if caf_blob.shape[1] == 3 or caf_blob.shape[1] == 4: - # Swapping BGR of caffe into RGB in mxnet - caf_blob[:, [0, 2], :, :] = caf_blob[:, [2, 0], :, :] - mx_name = 'data' - - else: - # get last layer name which outputs this blob name - last_layer_name = top_to_layers[caffe_blob_name][-1] - normalized_last_layer_name = re.sub('[-/]', '_', last_layer_name) - mx_name = '{}_output'.format(normalized_last_layer_name) - if 'scale' in mx_name: - mx_name = mx_name.replace('scale', 'bn') - elif 'sc' in mx_name: - mx_name = mx_name.replace('sc', 'bn') - - if mx_name not in exe.output_dict: - logging.error('mxnet blob %s is missing, time to extend the compare tool..', mx_name) - return - - mx_blob = exe.output_dict[mx_name].asnumpy() - _compare_blob(caf_blob, mx_blob, caffe_blob_name, mx_name, 'output', '') - - return - - # check layer parameters - logging.info('\n***** Network Parameters '.ljust(140, '*')) - logging.info(log_format.format('CAFFE', 'MXNET', 'Type', 'Mean(diff)', 'Max(diff)', 'Note')) - first_layer_name = layer_name_to_record.keys()[0] - _bfs(layer_name_to_record[first_layer_name], _process_layer_parameters) - - # check layer output - logging.info('\n***** Network Outputs '.ljust(140, '*')) - logging.info(log_format.format('CAFFE', 'MXNET', 'Type', 'Mean(diff)', 'Max(diff)', 'Note')) - for caffe_blob_name in caffe_net.blobs.keys(): - _process_layer_output(caffe_blob_name) - - return - - -def main(): - """Entrypoint for compare_layers""" - - parser = argparse.ArgumentParser( - description='Tool for testing caffe to mxnet conversion layer by layer') - parser.add_argument('--image_url', type=str, - default='https://github.com/dmlc/web-data/raw/master/mxnet/doc/'\ - 'tutorials/python/predict_image/cat.jpg', - help='input image to test inference, can be either file path or url') - parser.add_argument('--caffe_prototxt_path', type=str, - default='./model.prototxt', - help='path to caffe prototxt') - parser.add_argument('--caffe_model_path', type=str, - default='./model.caffemodel', - help='path to caffe weights') - parser.add_argument('--caffe_mean', type=str, - default='./model_mean.binaryproto', - help='path to caffe mean file') - parser.add_argument('--mean_diff_allowed', type=int, default=1e-03, - help='mean difference allowed between caffe blob and mxnet blob') - parser.add_argument('--max_diff_allowed', type=int, default=1e-01, - help='max difference allowed between caffe blob and mxnet blob') - parser.add_argument('--gpu', type=int, default=-1, help='the gpu id used for predict') - args = parser.parse_args() - convert_and_compare_caffe_to_mxnet(args.image_url, args.gpu, args.caffe_prototxt_path, - args.caffe_model_path, args.caffe_mean, - args.mean_diff_allowed, args.max_diff_allowed) - -if __name__ == '__main__': - main() diff --git a/example/ssd/tools/caffe_converter/convert_mean.py b/example/ssd/tools/caffe_converter/convert_mean.py deleted file mode 100644 index 3b6dc42a7afc..000000000000 --- a/example/ssd/tools/caffe_converter/convert_mean.py +++ /dev/null @@ -1,63 +0,0 @@ -# Licensed to the Apache Software Foundation (ASF) under one -# or more contributor license agreements. See the NOTICE file -# distributed with this work for additional information -# regarding copyright ownership. 
The ASF licenses this file -# to you under the Apache License, Version 2.0 (the -# "License"); you may not use this file except in compliance -# with the License. You may obtain a copy of the License at -# -# http://www.apache.org/licenses/LICENSE-2.0 -# -# Unless required by applicable law or agreed to in writing, -# software distributed under the License is distributed on an -# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY -# KIND, either express or implied. See the License for the -# specific language governing permissions and limitations -# under the License. - -"""Convert caffe mean -""" -import argparse -import mxnet as mx -import numpy as np -import caffe_parser - -def convert_mean(binaryproto_fname, output=None): - """Convert caffe mean - - Parameters - ---------- - binaryproto_fname : str - Filename of the mean - output : str, optional - Save the mean into mxnet's format - - Returns - ------- - NDArray - Mean in ndarray - """ - mean_blob = caffe_parser.caffe_pb2.BlobProto() - with open(binaryproto_fname, 'rb') as f: - mean_blob.ParseFromString(f.read()) - - img_mean_np = np.array(mean_blob.data) - img_mean_np = img_mean_np.reshape( - mean_blob.channels, mean_blob.height, mean_blob.width - ) - # swap channels from Caffe BGR to RGB - img_mean_np[[0, 2], :, :] = img_mean_np[[2, 0], :, :] - nd = mx.nd.array(img_mean_np) - if output is not None: - mx.nd.save(output, {"mean_image": nd}) - return nd - -def main(): - parser = argparse.ArgumentParser(description='Convert caffe mean') - parser.add_argument('binaryproto_fname', help='Filename of the mean') - parser.add_argument('output', help='The name of the output file') - args = parser.parse_args() - convert_mean(args.binaryproto_fname, args.output) - -if __name__ == '__main__': - main() diff --git a/example/ssd/tools/caffe_converter/convert_model.py b/example/ssd/tools/caffe_converter/convert_model.py deleted file mode 100644 index 97bd5fa13e10..000000000000 --- a/example/ssd/tools/caffe_converter/convert_model.py +++ /dev/null @@ -1,224 +0,0 @@ -# Licensed to the Apache Software Foundation (ASF) under one -# or more contributor license agreements. See the NOTICE file -# distributed with this work for additional information -# regarding copyright ownership. The ASF licenses this file -# to you under the Apache License, Version 2.0 (the -# "License"); you may not use this file except in compliance -# with the License. You may obtain a copy of the License at -# -# http://www.apache.org/licenses/LICENSE-2.0 -# -# Unless required by applicable law or agreed to in writing, -# software distributed under the License is distributed on an -# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY -# KIND, either express or implied. See the License for the -# specific language governing permissions and limitations -# under the License. 
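Before moving on to the model converter below, a hypothetical round trip through the `convert_mean` tool shown above; the file names are placeholders:

```python
# Hypothetical usage of convert_mean() above; file names are placeholders.
import mxnet as mx
from convert_mean import convert_mean

mean_nd = convert_mean('mean.binaryproto', output='mean.nd')
print(mean_nd.shape)  # (channels, height, width), channels already swapped to RGB
mean_loaded = mx.nd.load('mean.nd')['mean_image']  # reload the saved NDArray
```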
-
-from __future__ import print_function
-import argparse
-import sys
-import caffe_parser
-import mxnet as mx
-import numpy as np
-from convert_symbol import convert_symbol
-
-def convert_model(prototxt_fname, caffemodel_fname, output_prefix=None):
-    """Convert caffe model
-
-    Parameters
-    ----------
-
-    prototxt_fname : str
-        Filename of the prototxt model definition
-    caffemodel_fname : str
-        Filename of the binary caffe model
-    output_prefix : str, optional
-        If given, then save the converted MXNet model into output_prefix+'.json' and
-        output_prefix+'.params'
-
-    Returns
-    -------
-    sym : Symbol
-        Symbol converted from prototxt
-    arg_params : list of NDArray
-        Argument parameters
-    aux_params : list of NDArray
-        Aux parameters
-    input_dim : tuple
-        Input dimension
-    """
-    sym, input_dim = convert_symbol(prototxt_fname)
-    arg_shapes, _, aux_shapes = sym.infer_shape(data=tuple(input_dim))
-    arg_names = sym.list_arguments()
-    aux_names = sym.list_auxiliary_states()
-    arg_shape_dic = dict(zip(arg_names, arg_shapes))
-    aux_shape_dic = dict(zip(aux_names, aux_shapes))
-    arg_params = {}
-    aux_params = {}
-    first_conv = True
-
-    layers, names = caffe_parser.read_caffemodel(prototxt_fname, caffemodel_fname)
-    layer_iter = caffe_parser.layer_iter(layers, names)
-    layers_proto = caffe_parser.get_layers(caffe_parser.read_prototxt(prototxt_fname))
-
-    for layer_name, layer_type, layer_blobs in layer_iter:
-        if layer_type == 'Convolution' or layer_type == 'InnerProduct' \
-           or layer_type == 4 or layer_type == 14 or layer_type == 'PReLU' \
-           or layer_type == 'Deconvolution' or layer_type == 39 or layer_type == 'Normalize':
-            if layer_type == 'PReLU':
-                assert (len(layer_blobs) == 1)
-                wmat = layer_blobs[0].data
-                weight_name = layer_name + '_gamma'
-                arg_params[weight_name] = mx.nd.zeros(wmat.shape)
-                arg_params[weight_name][:] = wmat
-                continue
-            if layer_type == 'Normalize':
-                assert (len(layer_blobs) == 1)
-                weight_name = layer_name + '_scale'
-                wmat = layer_blobs[0].data
-                arg_params[weight_name] = mx.nd.zeros((1, len(wmat), 1, 1))
-                arg_params[weight_name][:] = np.array(list(wmat)).reshape((1, len(wmat), 1, 1))
-                continue
-            wmat_dim = []
-            if getattr(layer_blobs[0].shape, 'dim', None) is not None:
-                if len(layer_blobs[0].shape.dim) > 0:
-                    wmat_dim = layer_blobs[0].shape.dim
-                else:
-                    wmat_dim = [layer_blobs[0].num, layer_blobs[0].channels,
-                                layer_blobs[0].height, layer_blobs[0].width]
-            else:
-                wmat_dim = list(layer_blobs[0].shape)
-            wmat = np.array(layer_blobs[0].data).reshape(wmat_dim)
-
-            channels = wmat_dim[1]
-            if channels == 3 or channels == 4:  # RGB or RGBA
-                if first_conv:
-                    # Swapping BGR of caffe into RGB in mxnet
-                    wmat[:, [0, 2], :, :] = wmat[:, [2, 0], :, :]
-
-            assert(wmat.flags['C_CONTIGUOUS'] is True)
-            sys.stdout.write('converting layer {0}, wmat shape = {1}'.format(
-                layer_name, wmat.shape))
-            if len(layer_blobs) == 2:
-                bias = np.array(layer_blobs[1].data)
-                bias = bias.reshape((bias.shape[0], 1))
-                assert(bias.flags['C_CONTIGUOUS'] is True)
-                bias_name = layer_name + "_bias"
-
-                if bias_name not in arg_shape_dic:
-                    print(bias_name + ' not found in arg_shape_dic.')
-                    continue
-                bias = bias.reshape(arg_shape_dic[bias_name])
-                arg_params[bias_name] = mx.nd.zeros(bias.shape)
-                arg_params[bias_name][:] = bias
-                sys.stdout.write(', bias shape = {}'.format(bias.shape))
-
-            sys.stdout.write('\n')
-            sys.stdout.flush()
-            wmat = wmat.reshape((wmat.shape[0], -1))
-            weight_name = layer_name + "_weight"
-
-            if weight_name not in arg_shape_dic:
-                print(weight_name + ' not found in
arg_shape_dic.') - continue - wmat = wmat.reshape(arg_shape_dic[weight_name]) - arg_params[weight_name] = mx.nd.zeros(wmat.shape) - arg_params[weight_name][:] = wmat - - - if first_conv and (layer_type == 'Convolution' or layer_type == 4): - first_conv = False - - elif layer_type == 'Scale': - if 'scale' in layer_name: - bn_name = layer_name.replace('scale', 'bn') - elif 'sc' in layer_name: - bn_name = layer_name.replace('sc', 'bn') - else: - assert False, 'Unknown name convention for bn/scale' - - gamma = np.array(layer_blobs[0].data) - beta = np.array(layer_blobs[1].data) - # beta = np.expand_dims(beta, 1) - beta_name = '{}_beta'.format(bn_name) - gamma_name = '{}_gamma'.format(bn_name) - - beta = beta.reshape(arg_shape_dic[beta_name]) - gamma = gamma.reshape(arg_shape_dic[gamma_name]) - arg_params[beta_name] = mx.nd.zeros(beta.shape) - arg_params[gamma_name] = mx.nd.zeros(gamma.shape) - arg_params[beta_name][:] = beta - arg_params[gamma_name][:] = gamma - - assert gamma.flags['C_CONTIGUOUS'] is True - assert beta.flags['C_CONTIGUOUS'] is True - print('converting scale layer, beta shape = {}, gamma shape = {}'.format( - beta.shape, gamma.shape)) - elif layer_type == 'BatchNorm': - bn_name = layer_name - mean = np.array(layer_blobs[0].data) - var = np.array(layer_blobs[1].data) - rescale_factor = layer_blobs[2].data[0] - if rescale_factor != 0: - rescale_factor = 1 / rescale_factor - mean_name = '{}_moving_mean'.format(bn_name) - var_name = '{}_moving_var'.format(bn_name) - mean = mean.reshape(aux_shape_dic[mean_name]) - var = var.reshape(aux_shape_dic[var_name]) - aux_params[mean_name] = mx.nd.zeros(mean.shape) - aux_params[var_name] = mx.nd.zeros(var.shape) - # Get the original epsilon - for idx, layer in enumerate(layers_proto): - if layer.name == bn_name: - bn_index = idx - eps_caffe = layers_proto[bn_index].batch_norm_param.eps - # Compensate for the epsilon shift performed in convert_symbol - eps_symbol = float(sym.attr_dict()[bn_name + '_moving_mean']['eps']) - eps_correction = eps_caffe - eps_symbol - # Fill parameters - aux_params[mean_name][:] = mean * rescale_factor - aux_params[var_name][:] = var * rescale_factor + eps_correction - assert var.flags['C_CONTIGUOUS'] is True - assert mean.flags['C_CONTIGUOUS'] is True - print('converting batchnorm layer, mean shape = {}, var shape = {}'.format( - mean.shape, var.shape)) - - fix_gamma = layers_proto[bn_index+1].type != 'Scale' - if fix_gamma: - gamma_name = '{}_gamma'.format(bn_name) - gamma = np.array(np.ones(arg_shape_dic[gamma_name])) - beta_name = '{}_beta'.format(bn_name) - beta = np.array(np.zeros(arg_shape_dic[beta_name])) - arg_params[beta_name] = mx.nd.zeros(beta.shape) - arg_params[gamma_name] = mx.nd.zeros(gamma.shape) - arg_params[beta_name][:] = beta - arg_params[gamma_name][:] = gamma - assert gamma.flags['C_CONTIGUOUS'] is True - assert beta.flags['C_CONTIGUOUS'] is True - - else: - print('\tskipping layer {} of type {}'.format(layer_name, layer_type)) - assert len(layer_blobs) == 0 - - if output_prefix is not None: - model = mx.mod.Module(symbol=sym, label_names=None) - model.bind(data_shapes=[('data', tuple(input_dim))]) - model.init_params(arg_params=arg_params, aux_params=aux_params) - model.save_checkpoint(output_prefix, 0) - - return sym, arg_params, aux_params, input_dim - -def main(): - parser = argparse.ArgumentParser( - description='Caffe prototxt to mxnet model parameter converter.') - parser.add_argument('prototxt', help='The prototxt filename') - parser.add_argument('caffemodel', help='The binary 
caffemodel filename') - parser.add_argument('save_model_name', help='The name of the output model prefix') - args = parser.parse_args() - - convert_model(args.prototxt, args.caffemodel, args.save_model_name) - print ('Saved model successfully to {}'.format(args.save_model_name)) - -if __name__ == '__main__': - main() diff --git a/example/ssd/tools/caffe_converter/convert_symbol.py b/example/ssd/tools/caffe_converter/convert_symbol.py deleted file mode 100644 index 17fdcd996b96..000000000000 --- a/example/ssd/tools/caffe_converter/convert_symbol.py +++ /dev/null @@ -1,394 +0,0 @@ -# Licensed to the Apache Software Foundation (ASF) under one -# or more contributor license agreements. See the NOTICE file -# distributed with this work for additional information -# regarding copyright ownership. The ASF licenses this file -# to you under the Apache License, Version 2.0 (the -# "License"); you may not use this file except in compliance -# with the License. You may obtain a copy of the License at -# -# http://www.apache.org/licenses/LICENSE-2.0 -# -# Unless required by applicable law or agreed to in writing, -# software distributed under the License is distributed on an -# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY -# KIND, either express or implied. See the License for the -# specific language governing permissions and limitations -# under the License. - -from __future__ import print_function -import argparse -import re -import caffe_parser - -def _get_input(proto): - """Get input size - """ - layer = caffe_parser.get_layers(proto) - if len(proto.input_dim) > 0: - input_dim = proto.input_dim - elif len(proto.input_shape) > 0: - input_dim = proto.input_shape[0].dim - elif layer[0].type == "Input": - input_dim = layer[0].input_param.shape[0].dim - layer.pop(0) - else: - raise ValueError('Cannot find input size') - - assert layer[0].type != "Input", 'only support single input' - # We assume the first bottom blob of first layer is the output from data layer - input_name = layer[0].bottom[0] - return input_name, input_dim, layer - -def _convert_conv_param(param): - """ - Convert convolution layer parameter from Caffe to MXNet - """ - param_string = "num_filter=%d" % param.num_output - - pad_w = 0 - pad_h = 0 - if isinstance(param.pad, int): - pad = param.pad - param_string += ", pad=(%d, %d)" % (pad, pad) - else: - if len(param.pad) > 0: - pad = param.pad[0] - param_string += ", pad=(%d, %d)" % (pad, pad) - else: - if isinstance(param.pad_w, int): - pad_w = param.pad_w - if isinstance(param.pad_h, int): - pad_h = param.pad_h - param_string += ", pad=(%d, %d)" % (pad_h, pad_w) - - if isinstance(param.kernel_size, int): - kernel_size = param.kernel_size - param_string += ", kernel=(%d,%d)" % (kernel_size, kernel_size) - else: - if len(param.kernel_size) > 0: - kernel_size = param.kernel_size[0] - param_string += ", kernel=(%d,%d)" % (kernel_size, kernel_size) - else: - assert isinstance(param.kernel_w, int) - kernel_w = param.kernel_w - assert isinstance(param.kernel_h, int) - kernel_h = param.kernel_h - param_string += ", kernel=(%d,%d)" % (kernel_h, kernel_w) - - stride = 1 - if isinstance(param.stride, int): - stride = param.stride - else: - stride = 1 if len(param.stride) == 0 else param.stride[0] - - param_string += ", stride=(%d,%d)" % (stride, stride) - - dilate = 1 - if hasattr(param, 'dilation'): - if isinstance(param.dilation, int): - dilate = param.dilation - else: - dilate = 1 if len(param.dilation) == 0 else param.dilation[0] - - param_string += ", no_bias=%s" % (not 
param.bias_term)
-
-    # deal with dilation. Won't be in deconvolution
-    if dilate > 1:
-        param_string += ", dilate=(%d, %d)" % (dilate, dilate)
-
-    if isinstance(param.group, int):
-        if param.group != 1:
-            param_string += ", num_group=%d" % param.group
-
-    return param_string
-
-def _convert_pooling_param(param):
-    """Convert the pooling layer parameter
-    """
-    param_string = "pooling_convention='full', "
-    if param.global_pooling:
-        param_string += "global_pool=True, kernel=(1,1)"
-    else:
-        param_string += "pad=(%d,%d), kernel=(%d,%d), stride=(%d,%d)" % (
-            param.pad, param.pad, param.kernel_size, param.kernel_size,
-            param.stride, param.stride)
-    if param.pool == 0:
-        param_string += ", pool_type='max'"
-    elif param.pool == 1:
-        param_string += ", pool_type='avg'"
-    else:
-        raise ValueError("Unknown Pooling Method!")
-    return param_string
-
-def _find_layer(layers, name):
-    for layer in layers:
-        if layer.name == name:
-            return layer
-    return None
-
-def _parse_proto(prototxt_fname):
-    """Parse Caffe prototxt into symbol string
-    """
-    proto = caffe_parser.read_prototxt(prototxt_fname)
-
-    # process data layer
-    input_name, input_dim, layers = _get_input(proto)
-    # only support single input, so always use `data` as the input data
-    mapping = {input_name: 'data'}
-    need_flatten = {input_name: False}
-    symbol_string = "import mxnet as mx\ndata = mx.symbol.Variable(name='data')\n"
-
-    flatten_count = 0
-    output_name = ""
-    prev_name = None
-
-    # convert the rest of the layers one by one
-    for i, layer in enumerate(layers):
-        type_string = ''
-        param_string = ''
-        skip_layer = False
-        bottom_order = []
-        name = re.sub('[-/]', '_', layer.name)
-        if layer.type == 'Convolution' or layer.type == 4:
-            type_string = 'mx.symbol.Convolution'
-            param_string = _convert_conv_param(layer.convolution_param)
-            need_flatten[name] = True
-        if layer.type == 'Deconvolution' or layer.type == 39:
-            type_string = 'mx.symbol.Deconvolution'
-            param_string = _convert_conv_param(layer.convolution_param)
-            need_flatten[name] = True
-        if layer.type == 'Pooling' or layer.type == 17:
-            type_string = 'mx.symbol.Pooling'
-            param_string = _convert_pooling_param(layer.pooling_param)
-            need_flatten[name] = True
-        if layer.type == 'ReLU' or layer.type == 18:
-            type_string = 'mx.symbol.Activation'
-            param_string = "act_type='relu'"
-            need_flatten[name] = need_flatten[mapping[layer.bottom[0]]]
-        if layer.type == 'TanH' or layer.type == 23:
-            type_string = 'mx.symbol.Activation'
-            param_string = "act_type='tanh'"
-            need_flatten[name] = need_flatten[mapping[layer.bottom[0]]]
-        if layer.type == 'Sigmoid' or layer.type == 19:
-            type_string = 'mx.symbol.Activation'
-            param_string = "act_type='sigmoid'"
-            need_flatten[name] = need_flatten[mapping[layer.bottom[0]]]
-        if layer.type == 'LRN' or layer.type == 15:
-            type_string = 'mx.symbol.LRN'
-            param = layer.lrn_param
-            param_string = "alpha=%f, beta=%f, knorm=%f, nsize=%d" % (
-                param.alpha, param.beta, param.k, param.local_size)
-            need_flatten[name] = True
-        if layer.type == 'InnerProduct' or layer.type == 14:
-            type_string = 'mx.symbol.FullyConnected'
-            param = layer.inner_product_param
-            param_string = "num_hidden=%d, no_bias=%s" % (
-                param.num_output, not param.bias_term)
-            need_flatten[name] = False
-        if layer.type == 'Dropout' or layer.type == 6:
-            type_string = 'mx.symbol.Dropout'
-            param = layer.dropout_param
-            param_string = "p=%f" % param.dropout_ratio
-            need_flatten[name] = need_flatten[mapping[layer.bottom[0]]]
-        if layer.type == 'Softmax' or layer.type == 20:
-            if 
layer.softmax_param.axis == 2: - symbol_string += "%s = mx.symbol.transpose(%s, axes=(0,2,1))\n" %\ - (mapping[layer.bottom[0]], mapping[layer.bottom[0]]) - type_string = 'mx.symbol.SoftmaxActivation' - param_string = "mode='channel'" - need_flatten[name] = False - else: - type_string = 'mx.symbol.SoftmaxOutput' - if layer.type == 'Flatten' or layer.type == 8: - if 'softmax' in layer.bottom[0]: - prev_name = re.sub('[-/]', '_', layers[i-1].name) - skip_layer = True - else: - type_string = 'mx.symbol.Flatten' - need_flatten[name] = False - if layer.type == 'Split' or layer.type == 22: - type_string = 'split' # will process later - if layer.type == 'Concat' or layer.type == 3: - type_string = 'mx.symbol.Concat' - need_flatten[name] = True - if layer.type == 'Crop': - type_string = 'mx.symbol.Crop' - need_flatten[name] = True - param_string = 'center_crop=True' - if layer.type == 'BatchNorm': - type_string = 'mx.symbol.BatchNorm' - param = layer.batch_norm_param - # CuDNN requires eps to be greater than 1e-05 - # We compensate for this change in convert_model - epsilon = param.eps - if (epsilon <= 1e-05): - epsilon = 1e-04 - # if next layer is scale, don't fix gamma - fix_gamma = layers[i+1].type != 'Scale' - param_string = 'use_global_stats=%s, fix_gamma=%s, eps=%f' % ( - param.use_global_stats, fix_gamma, epsilon) - need_flatten[name] = need_flatten[mapping[layer.bottom[0]]] - if layer.type == 'Scale': - assert layers[i-1].type == 'BatchNorm' - need_flatten[name] = need_flatten[mapping[layer.bottom[0]]] - skip_layer = True - prev_name = re.sub('[-/]', '_', layers[i-1].name) - if layer.type == 'PReLU': - type_string = 'mx.symbol.LeakyReLU' - param = layer.prelu_param - param_string = "act_type='prelu', slope=%f" % param.filler.value - need_flatten[name] = need_flatten[mapping[layer.bottom[0]]] - if layer.type == 'Eltwise': - type_string = 'mx.symbol.broadcast_add' - param_string = "" - need_flatten[name] = False - if layer.type == 'Reshape': - type_string = 'mx.symbol.Reshape' - param = layer.reshape_param - param_string = 'shape=(' + ','.join([str(x) for x in list(param.shape.dim)]) + ')' - need_flatten[name] = True - if layer.type == 'AbsVal': - type_string = 'mx.symbol.abs' - need_flatten[name] = need_flatten[mapping[layer.bottom[0]]] - if layer.type == 'Normalize': - bottom = re.sub('[-/]', '_', layer.bottom[0]) - conv_layer = _find_layer(layers, bottom) - assert conv_layer is not None - param = layer.norm_param - assert not param.across_spatial and not param.channel_shared - assert param.scale_filler.type == 'constant' - if conv_layer.type == 'Convolution': - scale_name = "%s_scale" % name - symbol_string += "%s=mx.sym.Variable(name='%s', shape=(1, %d, 1, 1), init=mx.init.Constant(%f))\n" % \ - (scale_name, scale_name, conv_layer.convolution_param.num_output, - param.scale_filler.value) - symbol_string += "%s=mx.symbol.L2Normalization(name='%s', data=%s, mode='channel')\n" %\ - (name, name, mapping[layer.bottom[0]]) - symbol_string += "%s=mx.symbol.broadcast_mul(lhs=%s, rhs=%s)\n" %\ - (name, scale_name, name) - type_string = 'split' - need_flatten[name] = True - else: - raise ValueError('Unknown/Invalid normalize layer!') - if layer.type == 'Permute': - type_string = 'mx.symbol.transpose' - param_string = "axes=(%s)" % (','.join([str(x) for x in layer.permute_param.order])) - need_flatten[name] = True - from_name = '' - if layer.type == 'PriorBox': - param = layer.prior_box_param - if layer.bottom[0] == 'data': - bottom_order = [1] - else: - bottom_order = [0] - try: - import math - 
min_size = param.min_size[0] / input_dim[2] - max_size = math.sqrt(param.min_size[0] * param.max_size[0]) / input_dim[2] - sizes = '(%f, %f)' %(min_size, max_size) - except AttributeError: - min_size = param.min_size[0] / input_dim[2] - sizes = '(%f)' %(min_size) - ars = list(param.aspect_ratio) - ratios = [1.] - for ar in ars: - ratios.append(ar) - if param.flip: - ratios.append(1. / ar) - ratios_string = '(' + ','.join(str(x) for x in ratios) + ')' - clip = param.clip - if (param.step_h > 0 or param.step_w > 0): - step_h = param.step_h - step_w = param.step_w - elif param.step > 0: - step_h = param.step - step_w = param.step - else: - step_h = -1 - step_w = -1 - finput_dimh = float(input_dim[2]) - finput_dimw = float(input_dim[3]) - step = '(%f, %f)' % (step_h / finput_dimh, step_w / finput_dimw) - assert param.offset == 0.5, "currently only support offset = 0.5" - symbol_string += '%s = mx.contrib.symbol.MultiBoxPrior(%s, sizes=%s, ratios=%s, clip=%s, steps=%s, name="%s")\n' % \ - (name, mapping[layer.bottom[0]], sizes, ratios_string, clip, step, name) - symbol_string += '%s = mx.symbol.Flatten(data=%s)\n' % (name, name) - type_string = 'split' - need_flatten[name] = False - if layer.type == 'DetectionOutput': - bottom_order = [1, 0, 2] - param = layer.detection_output_param - assert param.share_location == True - assert param.background_label_id == 0 - nms_param = param.nms_param - type_string = 'mx.contrib.symbol.MultiBoxDetection' - param_string = "nms_threshold=%f, nms_topk=%d, clip=False" % \ - (nms_param.nms_threshold, nms_param.top_k) - if skip_layer: - assert len(layer.bottom) == 1 - symbol_string += "%s = %s\n" % (name, prev_name) - elif type_string == '': - raise ValueError('Unknown layer %s!' % layer.type) - elif type_string != 'split': - bottom = layer.bottom - if param_string != "": - param_string = ", " + param_string - if len(bottom) == 1: - # print(need_flatten) - if need_flatten[mapping[bottom[0]]] and type_string == 'mx.symbol.FullyConnected': - flatten_name = "flatten_%d" % flatten_count - symbol_string += "%s=mx.symbol.Flatten(name='%s', data=%s)\n" % ( - flatten_name, flatten_name, mapping[bottom[0]]) - flatten_count += 1 - need_flatten[flatten_name] = False - bottom[0] = flatten_name - mapping[bottom[0]] = bottom[0] - symbol_string += "%s = %s(name='%s', data=%s %s)\n" % ( - name, type_string, name, mapping[bottom[0]], param_string) - else: - if not bottom_order: - bottom_order = range(len(bottom)) - symbol_string += "%s = %s(name='%s', *[%s] %s)\n" % \ - (name, type_string, name, ','.join([mapping[bottom[x]] for x in bottom_order]), param_string) - if layer.type == 'Concat' and layer.concat_param.axis == 2: - symbol_string += "%s = mx.symbol.Reshape(data=%s, shape=(0, -1, 4), name='%s')\n" %\ - (name, name, name) - for j in range(len(layer.top)): - mapping[layer.top[j]] = name - output_name = name - return symbol_string, output_name, input_dim - -def convert_symbol(prototxt_fname): - """Convert caffe model definition into Symbol - - Parameters - ---------- - prototxt_fname : str - Filename of the prototxt file - - Returns - ------- - Symbol - Converted Symbol - tuple - Input shape - """ - sym, output_name, input_dim = _parse_proto(prototxt_fname) - exec(sym) # pylint: disable=exec-used - _locals = locals() - exec("ret = " + output_name, globals(), _locals) # pylint: disable=exec-used - ret = _locals['ret'] - return ret, input_dim - -def main(): - parser = argparse.ArgumentParser( - description='Convert caffe prototxt into Symbol') - parser.add_argument('prototxt', 
help='The prototxt filename')
-    parser.add_argument('output', help='filename for the output json file')
-    args = parser.parse_args()
-
-    sym, _ = convert_symbol(args.prototxt)
-    sym.save(args.output)
-
-if __name__ == '__main__':
-    main()
diff --git a/example/ssd/tools/caffe_converter/make_win32.bat b/example/ssd/tools/caffe_converter/make_win32.bat
deleted file mode 100644
index 1ee8e89f018f..000000000000
--- a/example/ssd/tools/caffe_converter/make_win32.bat
+++ /dev/null
@@ -1,20 +0,0 @@
-rem Licensed to the Apache Software Foundation (ASF) under one
-rem or more contributor license agreements. See the NOTICE file
-rem distributed with this work for additional information
-rem regarding copyright ownership. The ASF licenses this file
-rem to you under the Apache License, Version 2.0 (the
-rem "License"); you may not use this file except in compliance
-rem with the License. You may obtain a copy of the License at
-rem
-rem http://www.apache.org/licenses/LICENSE-2.0
-rem
-rem Unless required by applicable law or agreed to in writing,
-rem software distributed under the License is distributed on an
-rem "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
-rem KIND, either express or implied. See the License for the
-rem specific language governing permissions and limitations
-rem under the License.
-
-@protoc --python_out=./ ./caffe_parse/caffe.proto
-@echo done.
-@pause
diff --git a/example/ssd/tools/caffe_converter/mean_image.py b/example/ssd/tools/caffe_converter/mean_image.py
deleted file mode 100644
index e07c6fb281c0..000000000000
--- a/example/ssd/tools/caffe_converter/mean_image.py
+++ /dev/null
@@ -1,67 +0,0 @@
-# Licensed to the Apache Software Foundation (ASF) under one
-# or more contributor license agreements. See the NOTICE file
-# distributed with this work for additional information
-# regarding copyright ownership. The ASF licenses this file
-# to you under the Apache License, Version 2.0 (the
-# "License"); you may not use this file except in compliance
-# with the License. You may obtain a copy of the License at
-#
-# http://www.apache.org/licenses/LICENSE-2.0
-#
-# Unless required by applicable law or agreed to in writing,
-# software distributed under the License is distributed on an
-# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
-# KIND, either express or implied. See the License for the
-# specific language governing permissions and limitations
-# under the License.
-
-import mxnet as mx
-import numpy as np
-import argparse
-
-caffe_flag = True
-try:
-    import caffe
-    from caffe.proto import caffe_pb2
-except ImportError:
-    caffe_flag = False
-    import caffe_parse.caffe_pb2
-
-
-def protoBlobFileToND(proto_file):
-    data = ''
-    # open in binary mode so ParseFromString receives raw protobuf bytes
-    file = open(proto_file, "rb")
-    if not file:
-        raise Exception("ERROR (" + proto_file + ")!")
-    data = file.read()
-    file.close()
-
-    if caffe_flag:
-        mean_blob = caffe.proto.caffe_pb2.BlobProto()
-    else:
-        mean_blob = caffe_parse.caffe_pb2.BlobProto()
-
-    mean_blob.ParseFromString(data)
-    img_mean_np = np.array(mean_blob.data)
-    img_mean_np = img_mean_np.reshape(
-        mean_blob.channels, mean_blob.height, mean_blob.width
-    )
-    # swap channels from Caffe BGR to RGB; copy first, otherwise the second
-    # assignment would read the channel that was just overwritten
-    img_mean_np2 = img_mean_np.copy()
-    img_mean_np[0] = img_mean_np2[2]
-    img_mean_np[2] = img_mean_np2[0]
-    return mx.nd.array(img_mean_np)
-
-
-def main():
-    parser = argparse.ArgumentParser(description='Caffe mean image (binaryproto) to MXNet NDArray converter.\
-        Note that only basic functions are implemented. 
You are welcome to contribute to this file.')
-    parser.add_argument('mean_image_proto', help='The protobuf file in Caffe format')
-    parser.add_argument('save_name', help='The name of the output file prefix')
-    args = parser.parse_args()
-    nd = protoBlobFileToND(args.mean_image_proto)
-    mx.nd.save(args.save_name + ".nd", {"mean_image": nd})
-
-
-if __name__ == '__main__':
-    main()
diff --git a/tools/caffe_converter/.gitignore b/tools/caffe_converter/.gitignore
deleted file mode 100644
index 322dff360126..000000000000
--- a/tools/caffe_converter/.gitignore
+++ /dev/null
@@ -1,2 +0,0 @@
-model/
-Cat-hd-wallpapers.jpg
diff --git a/tools/caffe_converter/Makefile b/tools/caffe_converter/Makefile
deleted file mode 100644
index d39945b7cc33..000000000000
--- a/tools/caffe_converter/Makefile
+++ /dev/null
@@ -1,33 +0,0 @@
-# Licensed to the Apache Software Foundation (ASF) under one
-# or more contributor license agreements. See the NOTICE file
-# distributed with this work for additional information
-# regarding copyright ownership. The ASF licenses this file
-# to you under the Apache License, Version 2.0 (the
-# "License"); you may not use this file except in compliance
-# with the License. You may obtain a copy of the License at
-#
-# http://www.apache.org/licenses/LICENSE-2.0
-#
-# Unless required by applicable law or agreed to in writing,
-# software distributed under the License is distributed on an
-# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
-# KIND, either express or implied. See the License for the
-# specific language governing permissions and limitations
-# under the License.
-
-ifndef PROTOC
-DEPS_PROTOC=../../deps/bin/protoc
-ifneq ("$(wildcard $(DEPS_PROTOC))","")
-PROTOC = $(DEPS_PROTOC)
-else
-PROTOC = protoc
-endif
-endif
-
-all: caffe_pb2.py
-
-clean:
-	rm caffe_pb2.py*
-
-caffe_pb2.py:
-	$(PROTOC) --python_out=./ ./caffe.proto
diff --git a/tools/caffe_converter/README.md b/tools/caffe_converter/README.md
deleted file mode 100644
index 22164d550e9d..000000000000
--- a/tools/caffe_converter/README.md
+++ /dev/null
@@ -1,30 +0,0 @@
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-# Convert Caffe Model to Mxnet Format
-
-This folder contains the source codes for this tool.
-
-If Caffe with python binding is installed, we can use the following command to
-convert a Resnet-50 pretrained model.
-
-```bash
-python convert_caffe_modelzoo.py resnet-50
-```
-
-Please refer to
-[docs/faq/caffe.md](../../docs/faq/caffe.md) for more details.
diff --git a/tools/caffe_converter/caffe.proto b/tools/caffe_converter/caffe.proto
deleted file mode 100644
index b8299eeea9ae..000000000000
--- a/tools/caffe_converter/caffe.proto
+++ /dev/null
@@ -1,1416 +0,0 @@
-// Licensed to the Apache Software Foundation (ASF) under one
-// or more contributor license agreements. See the NOTICE file
-// distributed with this work for additional information
-// regarding copyright ownership. The ASF licenses this file
-// to you under the Apache License, Version 2.0 (the
-// "License"); you may not use this file except in compliance
-// with the License. You may obtain a copy of the License at
-//
-// http://www.apache.org/licenses/LICENSE-2.0
-//
-// Unless required by applicable law or agreed to in writing,
-// software distributed under the License is distributed on an
-// "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
-// KIND, either express or implied. See the License for the
-// specific language governing permissions and limitations
-// under the License.
- -syntax = "proto2"; - -package caffe; - -// Specifies the shape (dimensions) of a Blob. -message BlobShape { - repeated int64 dim = 1 [packed = true]; -} - -message BlobProto { - optional BlobShape shape = 7; - repeated float data = 5 [packed = true]; - repeated float diff = 6 [packed = true]; - repeated double double_data = 8 [packed = true]; - repeated double double_diff = 9 [packed = true]; - - // 4D dimensions -- deprecated. Use "shape" instead. - optional int32 num = 1 [default = 0]; - optional int32 channels = 2 [default = 0]; - optional int32 height = 3 [default = 0]; - optional int32 width = 4 [default = 0]; -} - -// The BlobProtoVector is simply a way to pass multiple blobproto instances -// around. -message BlobProtoVector { - repeated BlobProto blobs = 1; -} - -message Datum { - optional int32 channels = 1; - optional int32 height = 2; - optional int32 width = 3; - // the actual image data, in bytes - optional bytes data = 4; - optional int32 label = 5; - // Optionally, the datum could also hold float data. - repeated float float_data = 6; - // If true data contains an encoded image that need to be decoded - optional bool encoded = 7 [default = false]; -} - -message FillerParameter { - // The filler type. - optional string type = 1 [default = 'constant']; - optional float value = 2 [default = 0]; // the value in constant filler - optional float min = 3 [default = 0]; // the min value in uniform filler - optional float max = 4 [default = 1]; // the max value in uniform filler - optional float mean = 5 [default = 0]; // the mean value in Gaussian filler - optional float std = 6 [default = 1]; // the std value in Gaussian filler - // The expected number of non-zero output weights for a given input in - // Gaussian filler -- the default -1 means don't perform sparsification. - optional int32 sparse = 7 [default = -1]; - // Normalize the filler variance by fan_in, fan_out, or their average. - // Applies to 'xavier' and 'msra' fillers. - enum VarianceNorm { - FAN_IN = 0; - FAN_OUT = 1; - AVERAGE = 2; - } - optional VarianceNorm variance_norm = 8 [default = FAN_IN]; -} - -message NetParameter { - optional string name = 1; // consider giving the network a name - // DEPRECATED. See InputParameter. The input blobs to the network. - repeated string input = 3; - // DEPRECATED. See InputParameter. The shape of the input blobs. - repeated BlobShape input_shape = 8; - - // 4D input dimensions -- deprecated. Use "input_shape" instead. - // If specified, for each input blob there should be four - // values specifying the num, channels, height and width of the input blob. - // Thus, there should be a total of (4 * #input) numbers. - repeated int32 input_dim = 4; - - // Whether the network will force every layer to carry out backward operation. - // If set False, then whether to carry out backward is determined - // automatically according to the net structure and learning rates. - optional bool force_backward = 5 [default = false]; - // The current "state" of the network, including the phase, level, and stage. - // Some layers may be included/excluded depending on this state and the states - // specified in the layers' include and exclude fields. - optional NetState state = 6; - - // Print debugging information about results while running Net::Forward, - // Net::Backward, and Net::Update. - optional bool debug_info = 7 [default = false]; - - // The layers that make up the net. Each of their configurations, including - // connectivity and behavior, is specified as a LayerParameter. 
- repeated LayerParameter layer = 100; // ID 100 so layers are printed last. - - // DEPRECATED: use 'layer' instead. - repeated V1LayerParameter layers = 2; -} - -// NOTE -// Update the next available ID when you add a new SolverParameter field. -// -// SolverParameter next available ID: 41 (last added: type) -message SolverParameter { - ////////////////////////////////////////////////////////////////////////////// - // Specifying the train and test networks - // - // Exactly one train net must be specified using one of the following fields: - // train_net_param, train_net, net_param, net - // One or more test nets may be specified using any of the following fields: - // test_net_param, test_net, net_param, net - // If more than one test net field is specified (e.g., both net and - // test_net are specified), they will be evaluated in the field order given - // above: (1) test_net_param, (2) test_net, (3) net_param/net. - // A test_iter must be specified for each test_net. - // A test_level and/or a test_stage may also be specified for each test_net. - ////////////////////////////////////////////////////////////////////////////// - - // Proto filename for the train net, possibly combined with one or more - // test nets. - optional string net = 24; - // Inline train net param, possibly combined with one or more test nets. - optional NetParameter net_param = 25; - - optional string train_net = 1; // Proto filename for the train net. - repeated string test_net = 2; // Proto filenames for the test nets. - optional NetParameter train_net_param = 21; // Inline train net params. - repeated NetParameter test_net_param = 22; // Inline test net params. - - // The states for the train/test nets. Must be unspecified or - // specified once per net. - // - // By default, all states will have solver = true; - // train_state will have phase = TRAIN, - // and all test_state's will have phase = TEST. - // Other defaults are set according to the NetState defaults. - optional NetState train_state = 26; - repeated NetState test_state = 27; - - // The number of iterations for each test net. - repeated int32 test_iter = 3; - - // The number of iterations between two testing phases. - optional int32 test_interval = 4 [default = 0]; - optional bool test_compute_loss = 19 [default = false]; - // If true, run an initial test pass before the first iteration, - // ensuring memory availability and printing the starting value of the loss. - optional bool test_initialization = 32 [default = true]; - optional float base_lr = 5; // The base learning rate - // the number of iterations between displaying info. If display = 0, no info - // will be displayed. - optional int32 display = 6; - // Display the loss averaged over the last average_loss iterations - optional int32 average_loss = 33 [default = 1]; - optional int32 max_iter = 7; // the maximum number of iterations - // accumulate gradients over `iter_size` x `batch_size` instances - optional int32 iter_size = 36 [default = 1]; - - // The learning rate decay policy. The currently implemented learning rate - // policies are as follows: - // - fixed: always return base_lr. - // - step: return base_lr * gamma ^ (floor(iter / step)) - // - exp: return base_lr * gamma ^ iter - // - inv: return base_lr * (1 + gamma * iter) ^ (- power) - // - multistep: similar to step but it allows non uniform steps defined by - // stepvalue - // - poly: the effective learning rate follows a polynomial decay, to be - // zero by the max_iter. 
return base_lr (1 - iter/max_iter) ^ (power) - // - sigmoid: the effective learning rate follows a sigmod decay - // return base_lr ( 1/(1 + exp(-gamma * (iter - stepsize)))) - // - // where base_lr, max_iter, gamma, step, stepvalue and power are defined - // in the solver parameter protocol buffer, and iter is the current iteration. - optional string lr_policy = 8; - optional float gamma = 9; // The parameter to compute the learning rate. - optional float power = 10; // The parameter to compute the learning rate. - optional float momentum = 11; // The momentum value. - optional float weight_decay = 12; // The weight decay. - // regularization types supported: L1 and L2 - // controlled by weight_decay - optional string regularization_type = 29 [default = "L2"]; - // the stepsize for learning rate policy "step" - optional int32 stepsize = 13; - // the stepsize for learning rate policy "multistep" - repeated int32 stepvalue = 34; - - // Set clip_gradients to >= 0 to clip parameter gradients to that L2 norm, - // whenever their actual L2 norm is larger. - optional float clip_gradients = 35 [default = -1]; - - optional int32 snapshot = 14 [default = 0]; // The snapshot interval - optional string snapshot_prefix = 15; // The prefix for the snapshot. - // whether to snapshot diff in the results or not. Snapshotting diff will help - // debugging but the final protocol buffer size will be much larger. - optional bool snapshot_diff = 16 [default = false]; - enum SnapshotFormat { - HDF5 = 0; - BINARYPROTO = 1; - } - optional SnapshotFormat snapshot_format = 37 [default = BINARYPROTO]; - // the mode solver will use: 0 for CPU and 1 for GPU. Use GPU in default. - enum SolverMode { - CPU = 0; - GPU = 1; - } - optional SolverMode solver_mode = 17 [default = GPU]; - // the device_id will that be used in GPU mode. Use device_id = 0 in default. - optional int32 device_id = 18 [default = 0]; - // If non-negative, the seed with which the Solver will initialize the Caffe - // random number generator -- useful for reproducible results. Otherwise, - // (and by default) initialize using a seed derived from the system clock. - optional int64 random_seed = 20 [default = -1]; - - // type of the solver - optional string type = 40 [default = "SGD"]; - - // numerical stability for RMSProp, AdaGrad and AdaDelta and Adam - optional float delta = 31 [default = 1e-8]; - // parameters for the Adam solver - optional float momentum2 = 39 [default = 0.999]; - - // RMSProp decay value - // MeanSquare(t) = rms_decay*MeanSquare(t-1) + (1-rms_decay)*SquareGradient(t) - optional float rms_decay = 38; - - // If true, print information about the state of the net that may help with - // debugging learning problems. - optional bool debug_info = 23 [default = false]; - - // If false, don't save a snapshot after training finishes. - optional bool snapshot_after_train = 28 [default = true]; - - // DEPRECATED: old solver enum types, use string instead - enum SolverType { - SGD = 0; - NESTEROV = 1; - ADAGRAD = 2; - RMSPROP = 3; - ADADELTA = 4; - ADAM = 5; - } - // DEPRECATED: use type instead of solver_type - optional SolverType solver_type = 30 [default = SGD]; -} - -// A message that stores the solver snapshots -message SolverState { - optional int32 iter = 1; // The current iteration - optional string learned_net = 2; // The file that stores the learned net. 
- repeated BlobProto history = 3; // The history for sgd solvers - optional int32 current_step = 4 [default = 0]; // The current step for learning rate -} - -enum Phase { - TRAIN = 0; - TEST = 1; -} - -message NetState { - optional Phase phase = 1 [default = TEST]; - optional int32 level = 2 [default = 0]; - repeated string stage = 3; -} - -message NetStateRule { - // Set phase to require the NetState have a particular phase (TRAIN or TEST) - // to meet this rule. - optional Phase phase = 1; - - // Set the minimum and/or maximum levels in which the layer should be used. - // Leave undefined to meet the rule regardless of level. - optional int32 min_level = 2; - optional int32 max_level = 3; - - // Customizable sets of stages to include or exclude. - // The net must have ALL of the specified stages and NONE of the specified - // "not_stage"s to meet the rule. - // (Use multiple NetStateRules to specify conjunctions of stages.) - repeated string stage = 4; - repeated string not_stage = 5; -} - -// Specifies training parameters (multipliers on global learning constants, -// and the name and other settings used for weight sharing). -message ParamSpec { - // The names of the parameter blobs -- useful for sharing parameters among - // layers, but never required otherwise. To share a parameter between two - // layers, give it a (non-empty) name. - optional string name = 1; - - // Whether to require shared weights to have the same shape, or just the same - // count -- defaults to STRICT if unspecified. - optional DimCheckMode share_mode = 2; - enum DimCheckMode { - // STRICT (default) requires that num, channels, height, width each match. - STRICT = 0; - // PERMISSIVE requires only the count (num*channels*height*width) to match. - PERMISSIVE = 1; - } - - // The multiplier on the global learning rate for this parameter. - optional float lr_mult = 3 [default = 1.0]; - - // The multiplier on the global weight decay for this parameter. - optional float decay_mult = 4 [default = 1.0]; -} - -// NOTE -// Update the next available ID when you add a new LayerParameter field. -// -// LayerParameter next available layer-specific ID: 147 (last added: recurrent_param) -message LayerParameter { - optional string name = 1; // the layer name - optional string type = 2; // the layer type - repeated string bottom = 3; // the name of each bottom blob - repeated string top = 4; // the name of each top blob - - // The train / test phase for computation. - optional Phase phase = 10; - - // The amount of weight to assign each top blob in the objective. - // Each layer assigns a default value, usually of either 0 or 1, - // to each top blob. - repeated float loss_weight = 5; - - // Specifies training parameters (multipliers on global learning constants, - // and the name and other settings used for weight sharing). - repeated ParamSpec param = 6; - - // The blobs containing the numeric parameters of the layer. - repeated BlobProto blobs = 7; - - // Specifies whether to backpropagate to each bottom. If unspecified, - // Caffe will automatically infer whether each input needs backpropagation - // to compute parameter gradients. If set to true for some inputs, - // backpropagation to those inputs is forced; if set false for some inputs, - // backpropagation to those inputs is skipped. - // - // The size must be either 0 or equal to the number of bottoms. - repeated bool propagate_down = 11; - - // Rules controlling whether and when a layer is included in the network, - // based on the current NetState. 
You may specify a non-zero number of rules - // to include OR exclude, but not both. If no include or exclude rules are - // specified, the layer is always included. If the current NetState meets - // ANY (i.e., one or more) of the specified rules, the layer is - // included/excluded. - repeated NetStateRule include = 8; - repeated NetStateRule exclude = 9; - - // Parameters for data pre-processing. - optional TransformationParameter transform_param = 100; - - // Parameters shared by loss layers. - optional LossParameter loss_param = 101; - - // Layer type-specific parameters. - // - // Note: certain layers may have more than one computational engine - // for their implementation. These layers include an Engine type and - // engine parameter for selecting the implementation. - // The default for the engine is set by the ENGINE switch at compile-time. - optional AccuracyParameter accuracy_param = 102; - optional ArgMaxParameter argmax_param = 103; - optional BatchNormParameter batch_norm_param = 139; - optional BiasParameter bias_param = 141; - optional ConcatParameter concat_param = 104; - optional ContrastiveLossParameter contrastive_loss_param = 105; - optional ConvolutionParameter convolution_param = 106; - optional CropParameter crop_param = 144; - optional DataParameter data_param = 107; - optional DropoutParameter dropout_param = 108; - optional DummyDataParameter dummy_data_param = 109; - optional EltwiseParameter eltwise_param = 110; - optional ELUParameter elu_param = 140; - optional EmbedParameter embed_param = 137; - optional ExpParameter exp_param = 111; - optional FlattenParameter flatten_param = 135; - optional HDF5DataParameter hdf5_data_param = 112; - optional HDF5OutputParameter hdf5_output_param = 113; - optional HingeLossParameter hinge_loss_param = 114; - optional ImageDataParameter image_data_param = 115; - optional InfogainLossParameter infogain_loss_param = 116; - optional InnerProductParameter inner_product_param = 117; - optional InputParameter input_param = 143; - optional LogParameter log_param = 134; - optional LRNParameter lrn_param = 118; - optional MemoryDataParameter memory_data_param = 119; - optional MVNParameter mvn_param = 120; - optional ParameterParameter parameter_param = 145; - optional PoolingParameter pooling_param = 121; - optional PowerParameter power_param = 122; - optional PReLUParameter prelu_param = 131; - optional PythonParameter python_param = 130; - optional RecurrentParameter recurrent_param = 146; - optional ReductionParameter reduction_param = 136; - optional ReLUParameter relu_param = 123; - optional ReshapeParameter reshape_param = 133; - optional ScaleParameter scale_param = 142; - optional SigmoidParameter sigmoid_param = 124; - optional SoftmaxParameter softmax_param = 125; - optional SPPParameter spp_param = 132; - optional SliceParameter slice_param = 126; - optional TanHParameter tanh_param = 127; - optional ThresholdParameter threshold_param = 128; - optional TileParameter tile_param = 138; - optional WindowDataParameter window_data_param = 129; -} - -// Message that stores parameters used to apply transformation -// to the data layer's data -message TransformationParameter { - // For data pre-processing, we can do simple scaling and subtracting the - // data mean, if provided. Note that the mean subtraction is always carried - // out before scaling. - optional float scale = 1 [default = 1]; - // Specify if we want to randomly mirror data. 
- optional bool mirror = 2 [default = false]; - // Specify if we would like to randomly crop an image. - optional uint32 crop_size = 3 [default = 0]; - // mean_file and mean_value cannot be specified at the same time - optional string mean_file = 4; - // if specified can be repeated once (would subtract it from all the channels) - // or can be repeated the same number of times as channels - // (would subtract them from the corresponding channel) - repeated float mean_value = 5; - // Force the decoded image to have 3 color channels. - optional bool force_color = 6 [default = false]; - // Force the decoded image to have 1 color channels. - optional bool force_gray = 7 [default = false]; -} - -// Message that stores parameters shared by loss layers -message LossParameter { - // If specified, ignore instances with the given label. - optional int32 ignore_label = 1; - // How to normalize the loss for loss layers that aggregate across batches, - // spatial dimensions, or other dimensions. Currently only implemented in - // SoftmaxWithLoss layer. - enum NormalizationMode { - // Divide by the number of examples in the batch times spatial dimensions. - // Outputs that receive the ignore label will NOT be ignored in computing - // the normalization factor. - FULL = 0; - // Divide by the total number of output locations that do not take the - // ignore_label. If ignore_label is not set, this behaves like FULL. - VALID = 1; - // Divide by the batch size. - BATCH_SIZE = 2; - // Do not normalize the loss. - NONE = 3; - } - optional NormalizationMode normalization = 3 [default = VALID]; - // Deprecated. Ignored if normalization is specified. If normalization - // is not specified, then setting this to false will be equivalent to - // normalization = BATCH_SIZE to be consistent with previous behavior. - optional bool normalize = 2; -} - -// Messages that store parameters used by individual layer types follow, in -// alphabetical order. - -message AccuracyParameter { - // When computing accuracy, count as correct by comparing the true label to - // the top k scoring classes. By default, only compare to the top scoring - // class (i.e. argmax). - optional uint32 top_k = 1 [default = 1]; - - // The "label" axis of the prediction blob, whose argmax corresponds to the - // predicted label -- may be negative to index from the end (e.g., -1 for the - // last axis). For example, if axis == 1 and the predictions are - // (N x C x H x W), the label blob is expected to contain N*H*W ground truth - // labels with integer values in {0, 1, ..., C-1}. - optional int32 axis = 2 [default = 1]; - - // If specified, ignore instances with the given label. - optional int32 ignore_label = 3; -} - -message ArgMaxParameter { - // If true produce pairs (argmax, maxval) - optional bool out_max_val = 1 [default = false]; - optional uint32 top_k = 2 [default = 1]; - // The axis along which to maximise -- may be negative to index from the - // end (e.g., -1 for the last axis). - // By default ArgMaxLayer maximizes over the flattened trailing dimensions - // for each index of the first / num dimension. - optional int32 axis = 3; -} - -message ConcatParameter { - // The axis along which to concatenate -- may be negative to index from the - // end (e.g., -1 for the last axis). Other axes must have the - // same dimension for all the bottom blobs. - // By default, ConcatLayer concatenates blobs along the "channels" axis (1). 
- optional int32 axis = 2 [default = 1]; - - // DEPRECATED: alias for "axis" -- does not support negative indexing. - optional uint32 concat_dim = 1 [default = 1]; -} - -message BatchNormParameter { - // If false, accumulate global mean/variance values via a moving average. If - // true, use those accumulated values instead of computing mean/variance - // across the batch. - optional bool use_global_stats = 1; - // How much does the moving average decay each iteration? - optional float moving_average_fraction = 2 [default = .999]; - // Small value to add to the variance estimate so that we don't divide by - // zero. - optional float eps = 3 [default = 1e-5]; -} - -message BiasParameter { - // The first axis of bottom[0] (the first input Blob) along which to apply - // bottom[1] (the second input Blob). May be negative to index from the end - // (e.g., -1 for the last axis). - // - // For example, if bottom[0] is 4D with shape 100x3x40x60, the output - // top[0] will have the same shape, and bottom[1] may have any of the - // following shapes (for the given value of axis): - // (axis == 0 == -4) 100; 100x3; 100x3x40; 100x3x40x60 - // (axis == 1 == -3) 3; 3x40; 3x40x60 - // (axis == 2 == -2) 40; 40x60 - // (axis == 3 == -1) 60 - // Furthermore, bottom[1] may have the empty shape (regardless of the value of - // "axis") -- a scalar bias. - optional int32 axis = 1 [default = 1]; - - // (num_axes is ignored unless just one bottom is given and the bias is - // a learned parameter of the layer. Otherwise, num_axes is determined by the - // number of axes by the second bottom.) - // The number of axes of the input (bottom[0]) covered by the bias - // parameter, or -1 to cover all axes of bottom[0] starting from `axis`. - // Set num_axes := 0, to add a zero-axis Blob: a scalar. - optional int32 num_axes = 2 [default = 1]; - - // (filler is ignored unless just one bottom is given and the bias is - // a learned parameter of the layer.) - // The initialization for the learned bias parameter. - // Default is the zero (0) initialization, resulting in the BiasLayer - // initially performing the identity operation. - optional FillerParameter filler = 3; -} - -message ContrastiveLossParameter { - // margin for dissimilar pair - optional float margin = 1 [default = 1.0]; - // The first implementation of this cost did not exactly match the cost of - // Hadsell et al 2006 -- using (margin - d^2) instead of (margin - d)^2. - // legacy_version = false (the default) uses (margin - d)^2 as proposed in the - // Hadsell paper. New models should probably use this version. - // legacy_version = true uses (margin - d^2). This is kept to support / - // reproduce existing models and results - optional bool legacy_version = 2 [default = false]; -} - -message ConvolutionParameter { - optional uint32 num_output = 1; // The number of outputs for the layer - optional bool bias_term = 2 [default = true]; // whether to have bias terms - - // Pad, kernel size, and stride are all given as a single value for equal - // dimensions in all spatial dimensions, or once per spatial dimension. - repeated uint32 pad = 3; // The padding size; defaults to 0 - repeated uint32 kernel_size = 4; // The kernel size - repeated uint32 stride = 6; // The stride; defaults to 1 - // Factor used to dilate the kernel, (implicitly) zero-filling the resulting - // holes. (Kernel dilation is sometimes referred to by its use in the - // algorithme à trous from Holschneider et al. 1987.) 
- repeated uint32 dilation = 18; // The dilation; defaults to 1 - - // For 2D convolution only, the *_h and *_w versions may also be used to - // specify both spatial dimensions. - optional uint32 pad_h = 9 [default = 0]; // The padding height (2D only) - optional uint32 pad_w = 10 [default = 0]; // The padding width (2D only) - optional uint32 kernel_h = 11; // The kernel height (2D only) - optional uint32 kernel_w = 12; // The kernel width (2D only) - optional uint32 stride_h = 13; // The stride height (2D only) - optional uint32 stride_w = 14; // The stride width (2D only) - - optional uint32 group = 5 [default = 1]; // The group size for group conv - - optional FillerParameter weight_filler = 7; // The filler for the weight - optional FillerParameter bias_filler = 8; // The filler for the bias - enum Engine { - DEFAULT = 0; - CAFFE = 1; - CUDNN = 2; - } - optional Engine engine = 15 [default = DEFAULT]; - - // The axis to interpret as "channels" when performing convolution. - // Preceding dimensions are treated as independent inputs; - // succeeding dimensions are treated as "spatial". - // With (N, C, H, W) inputs, and axis == 1 (the default), we perform - // N independent 2D convolutions, sliding C-channel (or (C/g)-channels, for - // groups g>1) filters across the spatial axes (H, W) of the input. - // With (N, C, D, H, W) inputs, and axis == 1, we perform - // N independent 3D convolutions, sliding (C/g)-channels - // filters across the spatial axes (D, H, W) of the input. - optional int32 axis = 16 [default = 1]; - - // Whether to force use of the general ND convolution, even if a specific - // implementation for blobs of the appropriate number of spatial dimensions - // is available. (Currently, there is only a 2D-specific convolution - // implementation; for input blobs with num_axes != 2, this option is - // ignored and the ND implementation will be used.) - optional bool force_nd_im2col = 17 [default = false]; -} - -message CropParameter { - // To crop, elements of the first bottom are selected to fit the dimensions - // of the second, reference bottom. The crop is configured by - // - the crop `axis` to pick the dimensions for cropping - // - the crop `offset` to set the shift for all/each dimension - // to align the cropped bottom with the reference bottom. - // All dimensions up to but excluding `axis` are preserved, while - // the dimensions including and trailing `axis` are cropped. - // If only one `offset` is set, then all dimensions are offset by this amount. - // Otherwise, the number of offsets must equal the number of cropped axes to - // shift the crop in each dimension accordingly. - // Note: standard dimensions are N,C,H,W so the default is a spatial crop, - // and `axis` may be negative to index from the end (e.g., -1 for the last - // axis). - optional int32 axis = 1 [default = 2]; - repeated uint32 offset = 2; -} - -message DataParameter { - enum DB { - LEVELDB = 0; - LMDB = 1; - } - // Specify the data source. - optional string source = 1; - // Specify the batch size. - optional uint32 batch_size = 4; - // The rand_skip variable is for the data layer to skip a few data points - // to avoid all asynchronous sgd clients to start at the same point. The skip - // point would be set as rand_skip * rand(0,1). Note that rand_skip should not - // be larger than the number of keys in the database. - // DEPRECATED. Each solver accesses a different subset of the database. 
- optional uint32 rand_skip = 7 [default = 0]; - optional DB backend = 8 [default = LEVELDB]; - // DEPRECATED. See TransformationParameter. For data pre-processing, we can do - // simple scaling and subtracting the data mean, if provided. Note that the - // mean subtraction is always carried out before scaling. - optional float scale = 2 [default = 1]; - optional string mean_file = 3; - // DEPRECATED. See TransformationParameter. Specify if we would like to randomly - // crop an image. - optional uint32 crop_size = 5 [default = 0]; - // DEPRECATED. See TransformationParameter. Specify if we want to randomly mirror - // data. - optional bool mirror = 6 [default = false]; - // Force the encoded image to have 3 color channels - optional bool force_encoded_color = 9 [default = false]; - // Prefetch queue (Number of batches to prefetch to host memory, increase if - // data access bandwidth varies). - optional uint32 prefetch = 10 [default = 4]; -} - -message DropoutParameter { - optional float dropout_ratio = 1 [default = 0.5]; // dropout ratio -} - -// DummyDataLayer fills any number of arbitrarily shaped blobs with random -// (or constant) data generated by "Fillers" (see "message FillerParameter"). -message DummyDataParameter { - // This layer produces N >= 1 top blobs. DummyDataParameter must specify 1 or N - // shape fields, and 0, 1 or N data_fillers. - // - // If 0 data_fillers are specified, ConstantFiller with a value of 0 is used. - // If 1 data_filler is specified, it is applied to all top blobs. If N are - // specified, the ith is applied to the ith top blob. - repeated FillerParameter data_filler = 1; - repeated BlobShape shape = 6; - - // 4D dimensions -- deprecated. Use "shape" instead. - repeated uint32 num = 2; - repeated uint32 channels = 3; - repeated uint32 height = 4; - repeated uint32 width = 5; -} - -message EltwiseParameter { - enum EltwiseOp { - PROD = 0; - SUM = 1; - MAX = 2; - } - optional EltwiseOp operation = 1 [default = SUM]; // element-wise operation - repeated float coeff = 2; // blob-wise coefficient for SUM operation - - // Whether to use an asymptotically slower (for >2 inputs) but stabler method - // of computing the gradient for the PROD operation. (No effect for SUM op.) - optional bool stable_prod_grad = 3 [default = true]; -} - -// Message that stores parameters used by ELULayer -message ELUParameter { - // Described in: - // Clevert, D.-A., Unterthiner, T., & Hochreiter, S. (2015). Fast and Accurate - // Deep Network Learning by Exponential Linear Units (ELUs). arXiv - optional float alpha = 1 [default = 1]; -} - -// Message that stores parameters used by EmbedLayer -message EmbedParameter { - optional uint32 num_output = 1; // The number of outputs for the layer - // The input is given as integers to be interpreted as one-hot - // vector indices with dimension num_input. Hence num_input should be - // 1 greater than the maximum possible input value. - optional uint32 input_dim = 2; - - optional bool bias_term = 3 [default = true]; // Whether to use a bias term - optional FillerParameter weight_filler = 4; // The filler for the weight - optional FillerParameter bias_filler = 5; // The filler for the bias - -} - -// Message that stores parameters used by ExpLayer -message ExpParameter { - // ExpLayer computes outputs y = base ^ (shift + scale * x), for base > 0. - // Or if base is set to the default (-1), base is set to e, - // so y = exp(shift + scale * x). 
- optional float base = 1 [default = -1.0]; - optional float scale = 2 [default = 1.0]; - optional float shift = 3 [default = 0.0]; -} - -/// Message that stores parameters used by FlattenLayer -message FlattenParameter { - // The first axis to flatten: all preceding axes are retained in the output. - // May be negative to index from the end (e.g., -1 for the last axis). - optional int32 axis = 1 [default = 1]; - - // The last axis to flatten: all following axes are retained in the output. - // May be negative to index from the end (e.g., the default -1 for the last - // axis). - optional int32 end_axis = 2 [default = -1]; -} - -// Message that stores parameters used by HDF5DataLayer -message HDF5DataParameter { - // Specify the data source. - optional string source = 1; - // Specify the batch size. - optional uint32 batch_size = 2; - - // Specify whether to shuffle the data. - // If shuffle == true, the ordering of the HDF5 files is shuffled, - // and the ordering of data within any given HDF5 file is shuffled, - // but data between different files are not interleaved; all of a file's - // data are output (in a random order) before moving onto another file. - optional bool shuffle = 3 [default = false]; -} - -message HDF5OutputParameter { - optional string file_name = 1; -} - -message HingeLossParameter { - enum Norm { - L1 = 1; - L2 = 2; - } - // Specify the Norm to use L1 or L2 - optional Norm norm = 1 [default = L1]; -} - -message ImageDataParameter { - // Specify the data source. - optional string source = 1; - // Specify the batch size. - optional uint32 batch_size = 4 [default = 1]; - // The rand_skip variable is for the data layer to skip a few data points - // to avoid all asynchronous sgd clients to start at the same point. The skip - // point would be set as rand_skip * rand(0,1). Note that rand_skip should not - // be larger than the number of keys in the database. - optional uint32 rand_skip = 7 [default = 0]; - // Whether or not ImageLayer should shuffle the list of files at every epoch. - optional bool shuffle = 8 [default = false]; - // It will also resize images if new_height or new_width are not zero. - optional uint32 new_height = 9 [default = 0]; - optional uint32 new_width = 10 [default = 0]; - // Specify if the images are color or gray - optional bool is_color = 11 [default = true]; - // DEPRECATED. See TransformationParameter. For data pre-processing, we can do - // simple scaling and subtracting the data mean, if provided. Note that the - // mean subtraction is always carried out before scaling. - optional float scale = 2 [default = 1]; - optional string mean_file = 3; - // DEPRECATED. See TransformationParameter. Specify if we would like to randomly - // crop an image. - optional uint32 crop_size = 5 [default = 0]; - // DEPRECATED. See TransformationParameter. Specify if we want to randomly mirror - // data. - optional bool mirror = 6 [default = false]; - optional string root_folder = 12 [default = ""]; -} - -message InfogainLossParameter { - // Specify the infogain matrix source. - optional string source = 1; -} - -message InnerProductParameter { - optional uint32 num_output = 1; // The number of outputs for the layer - optional bool bias_term = 2 [default = true]; // whether to have bias terms - optional FillerParameter weight_filler = 3; // The filler for the weight - optional FillerParameter bias_filler = 4; // The filler for the bias - - // The first axis to be lumped into a single inner product computation; - // all preceding axes are retained in the output. 
- // May be negative to index from the end (e.g., -1 for the last axis). - optional int32 axis = 5 [default = 1]; - // Specify whether to transpose the weight matrix or not. - // If transpose == true, any operations will be performed on the transpose - // of the weight matrix. The weight matrix itself is not going to be transposed - // but rather the transfer flag of operations will be toggled accordingly. - optional bool transpose = 6 [default = false]; -} - -message InputParameter { - // This layer produces N >= 1 top blob(s) to be assigned manually. - // Define N shapes to set a shape for each top. - // Define 1 shape to set the same shape for every top. - // Define no shape to defer to reshaping manually. - repeated BlobShape shape = 1; -} - -// Message that stores parameters used by LogLayer -message LogParameter { - // LogLayer computes outputs y = log_base(shift + scale * x), for base > 0. - // Or if base is set to the default (-1), base is set to e, - // so y = ln(shift + scale * x) = log_e(shift + scale * x) - optional float base = 1 [default = -1.0]; - optional float scale = 2 [default = 1.0]; - optional float shift = 3 [default = 0.0]; -} - -// Message that stores parameters used by LRNLayer -message LRNParameter { - optional uint32 local_size = 1 [default = 5]; - optional float alpha = 2 [default = 1.]; - optional float beta = 3 [default = 0.75]; - enum NormRegion { - ACROSS_CHANNELS = 0; - WITHIN_CHANNEL = 1; - } - optional NormRegion norm_region = 4 [default = ACROSS_CHANNELS]; - optional float k = 5 [default = 1.]; - enum Engine { - DEFAULT = 0; - CAFFE = 1; - CUDNN = 2; - } - optional Engine engine = 6 [default = DEFAULT]; -} - -message MemoryDataParameter { - optional uint32 batch_size = 1; - optional uint32 channels = 2; - optional uint32 height = 3; - optional uint32 width = 4; -} - -message MVNParameter { - // This parameter can be set to false to normalize mean only - optional bool normalize_variance = 1 [default = true]; - - // This parameter can be set to true to perform DNN-like MVN - optional bool across_channels = 2 [default = false]; - - // Epsilon for not dividing by zero while normalizing variance - optional float eps = 3 [default = 1e-9]; -} - -message ParameterParameter { - optional BlobShape shape = 1; -} - -message PoolingParameter { - enum PoolMethod { - MAX = 0; - AVE = 1; - STOCHASTIC = 2; - } - optional PoolMethod pool = 1 [default = MAX]; // The pooling method - // Pad, kernel size, and stride are all given as a single value for equal - // dimensions in height and width or as Y, X pairs. - optional uint32 pad = 4 [default = 0]; // The padding size (equal in Y, X) - optional uint32 pad_h = 9 [default = 0]; // The padding height - optional uint32 pad_w = 10 [default = 0]; // The padding width - optional uint32 kernel_size = 2; // The kernel size (square) - optional uint32 kernel_h = 5; // The kernel height - optional uint32 kernel_w = 6; // The kernel width - optional uint32 stride = 3 [default = 1]; // The stride (equal in Y, X) - optional uint32 stride_h = 7; // The stride height - optional uint32 stride_w = 8; // The stride width - enum Engine { - DEFAULT = 0; - CAFFE = 1; - CUDNN = 2; - } - optional Engine engine = 11 [default = DEFAULT]; - // If global_pooling then it will pool over the size of the bottom by doing - // kernel_h = bottom->height and kernel_w = bottom->width - optional bool global_pooling = 12 [default = false]; -} - -message PowerParameter { - // PowerLayer computes outputs y = (shift + scale * x) ^ power. 
- optional float power = 1 [default = 1.0]; - optional float scale = 2 [default = 1.0]; - optional float shift = 3 [default = 0.0]; -} - -message PythonParameter { - optional string module = 1; - optional string layer = 2; - // This value is set to the attribute `param_str` of the `PythonLayer` object - // in Python before calling the `setup()` method. This could be a number, - // string, dictionary in Python dict format, JSON, etc. You may parse this - // string in `setup` method and use it in `forward` and `backward`. - optional string param_str = 3 [default = '']; - // Whether this PythonLayer is shared among worker solvers during data parallelism. - // If true, each worker solver sequentially run forward from this layer. - // This value should be set true if you are using it as a data layer. - optional bool share_in_parallel = 4 [default = false]; -} - -// Message that stores parameters used by RecurrentLayer -message RecurrentParameter { - // The dimension of the output (and usually hidden state) representation -- - // must be explicitly set to non-zero. - optional uint32 num_output = 1 [default = 0]; - - optional FillerParameter weight_filler = 2; // The filler for the weight - optional FillerParameter bias_filler = 3; // The filler for the bias - - // Whether to enable displaying debug_info in the unrolled recurrent net. - optional bool debug_info = 4 [default = false]; - - // Whether to add as additional inputs (bottoms) the initial hidden state - // blobs, and add as additional outputs (tops) the final timestep hidden state - // blobs. The number of additional bottom/top blobs required depends on the - // recurrent architecture -- e.g., 1 for RNNs, 2 for LSTMs. - optional bool expose_hidden = 5 [default = false]; -} - -// Message that stores parameters used by ReductionLayer -message ReductionParameter { - enum ReductionOp { - SUM = 1; - ASUM = 2; - SUMSQ = 3; - MEAN = 4; - } - - optional ReductionOp operation = 1 [default = SUM]; // reduction operation - - // The first axis to reduce to a scalar -- may be negative to index from the - // end (e.g., -1 for the last axis). - // (Currently, only reduction along ALL "tail" axes is supported; reduction - // of axis M through N, where N < num_axes - 1, is unsupported.) - // Suppose we have an n-axis bottom Blob with shape: - // (d0, d1, d2, ..., d(m-1), dm, d(m+1), ..., d(n-1)). - // If axis == m, the output Blob will have shape - // (d0, d1, d2, ..., d(m-1)), - // and the ReductionOp operation is performed (d0 * d1 * d2 * ... * d(m-1)) - // times, each including (dm * d(m+1) * ... * d(n-1)) individual data. - // If axis == 0 (the default), the output Blob always has the empty shape - // (count 1), performing reduction across the entire input -- - // often useful for creating new loss functions. - optional int32 axis = 2 [default = 0]; - - optional float coeff = 3 [default = 1.0]; // coefficient for output -} - -// Message that stores parameters used by ReLULayer -message ReLUParameter { - // Allow non-zero slope for negative inputs to speed up optimization - // Described in: - // Maas, A. L., Hannun, A. Y., & Ng, A. Y. (2013). Rectifier nonlinearities - // improve neural network acoustic models. In ICML Workshop on Deep Learning - // for Audio, Speech, and Language Processing. - optional float negative_slope = 1 [default = 0]; - enum Engine { - DEFAULT = 0; - CAFFE = 1; - CUDNN = 2; - } - optional Engine engine = 2 [default = DEFAULT]; -} - -message ReshapeParameter { - // Specify the output dimensions. 
If some of the dimensions are set to 0, - // the corresponding dimension from the bottom layer is used (unchanged). - // Exactly one dimension may be set to -1, in which case its value is - // inferred from the count of the bottom blob and the remaining dimensions. - // For example, suppose we want to reshape a 2D blob "input" with shape 2 x 8: - // - // layer { - // type: "Reshape" bottom: "input" top: "output" - // reshape_param { ... } - // } - // - // If "input" is 2D with shape 2 x 8, then the following reshape_param - // specifications are all equivalent, producing a 3D blob "output" with shape - // 2 x 2 x 4: - // - // reshape_param { shape { dim: 2 dim: 2 dim: 4 } } - // reshape_param { shape { dim: 0 dim: 2 dim: 4 } } - // reshape_param { shape { dim: 0 dim: 2 dim: -1 } } - // reshape_param { shape { dim: 0 dim:-1 dim: 4 } } - // - optional BlobShape shape = 1; - - // axis and num_axes control the portion of the bottom blob's shape that are - // replaced by (included in) the reshape. By default (axis == 0 and - // num_axes == -1), the entire bottom blob shape is included in the reshape, - // and hence the shape field must specify the entire output shape. - // - // axis may be non-zero to retain some portion of the beginning of the input - // shape (and may be negative to index from the end; e.g., -1 to begin the - // reshape after the last axis, including nothing in the reshape, - // -2 to include only the last axis, etc.). - // - // For example, suppose "input" is a 2D blob with shape 2 x 8. - // Then the following ReshapeLayer specifications are all equivalent, - // producing a blob "output" with shape 2 x 2 x 4: - // - // reshape_param { shape { dim: 2 dim: 2 dim: 4 } } - // reshape_param { shape { dim: 2 dim: 4 } axis: 1 } - // reshape_param { shape { dim: 2 dim: 4 } axis: -3 } - // - // num_axes specifies the extent of the reshape. - // If num_axes >= 0 (and axis >= 0), the reshape will be performed only on - // input axes in the range [axis, axis+num_axes]. - // num_axes may also be -1, the default, to include all remaining axes - // (starting from axis). - // - // For example, suppose "input" is a 2D blob with shape 2 x 8. - // Then the following ReshapeLayer specifications are equivalent, - // producing a blob "output" with shape 1 x 2 x 8. - // - // reshape_param { shape { dim: 1 dim: 2 dim: 8 } } - // reshape_param { shape { dim: 1 dim: 2 } num_axes: 1 } - // reshape_param { shape { dim: 1 } num_axes: 0 } - // - // On the other hand, these would produce output blob shape 2 x 1 x 8: - // - // reshape_param { shape { dim: 2 dim: 1 dim: 8 } } - // reshape_param { shape { dim: 1 } axis: 1 num_axes: 0 } - // - optional int32 axis = 2 [default = 0]; - optional int32 num_axes = 3 [default = -1]; -} - -message ScaleParameter { - // The first axis of bottom[0] (the first input Blob) along which to apply - // bottom[1] (the second input Blob). May be negative to index from the end - // (e.g., -1 for the last axis). - // - // For example, if bottom[0] is 4D with shape 100x3x40x60, the output - // top[0] will have the same shape, and bottom[1] may have any of the - // following shapes (for the given value of axis): - // (axis == 0 == -4) 100; 100x3; 100x3x40; 100x3x40x60 - // (axis == 1 == -3) 3; 3x40; 3x40x60 - // (axis == 2 == -2) 40; 40x60 - // (axis == 3 == -1) 60 - // Furthermore, bottom[1] may have the empty shape (regardless of the value of - // "axis") -- a scalar multiplier. 
- optional int32 axis = 1 [default = 1]; - - // (num_axes is ignored unless just one bottom is given and the scale is - // a learned parameter of the layer. Otherwise, num_axes is determined by the - // number of axes by the second bottom.) - // The number of axes of the input (bottom[0]) covered by the scale - // parameter, or -1 to cover all axes of bottom[0] starting from `axis`. - // Set num_axes := 0, to multiply with a zero-axis Blob: a scalar. - optional int32 num_axes = 2 [default = 1]; - - // (filler is ignored unless just one bottom is given and the scale is - // a learned parameter of the layer.) - // The initialization for the learned scale parameter. - // Default is the unit (1) initialization, resulting in the ScaleLayer - // initially performing the identity operation. - optional FillerParameter filler = 3; - - // Whether to also learn a bias (equivalent to a ScaleLayer+BiasLayer, but - // may be more efficient). Initialized with bias_filler (defaults to 0). - optional bool bias_term = 4 [default = false]; - optional FillerParameter bias_filler = 5; -} - -message SigmoidParameter { - enum Engine { - DEFAULT = 0; - CAFFE = 1; - CUDNN = 2; - } - optional Engine engine = 1 [default = DEFAULT]; -} - -message SliceParameter { - // The axis along which to slice -- may be negative to index from the end - // (e.g., -1 for the last axis). - // By default, SliceLayer concatenates blobs along the "channels" axis (1). - optional int32 axis = 3 [default = 1]; - repeated uint32 slice_point = 2; - - // DEPRECATED: alias for "axis" -- does not support negative indexing. - optional uint32 slice_dim = 1 [default = 1]; -} - -// Message that stores parameters used by SoftmaxLayer, SoftmaxWithLossLayer -message SoftmaxParameter { - enum Engine { - DEFAULT = 0; - CAFFE = 1; - CUDNN = 2; - } - optional Engine engine = 1 [default = DEFAULT]; - - // The axis along which to perform the softmax -- may be negative to index - // from the end (e.g., -1 for the last axis). - // Any other axes will be evaluated as independent softmaxes. - optional int32 axis = 2 [default = 1]; -} - -message TanHParameter { - enum Engine { - DEFAULT = 0; - CAFFE = 1; - CUDNN = 2; - } - optional Engine engine = 1 [default = DEFAULT]; -} - -// Message that stores parameters used by TileLayer -message TileParameter { - // The index of the axis to tile. - optional int32 axis = 1 [default = 1]; - - // The number of copies (tiles) of the blob to output. - optional int32 tiles = 2; -} - -// Message that stores parameters used by ThresholdLayer -message ThresholdParameter { - optional float threshold = 1 [default = 0]; // Strictly positive values -} - -message WindowDataParameter { - // Specify the data source. - optional string source = 1; - // For data pre-processing, we can do simple scaling and subtracting the - // data mean, if provided. Note that the mean subtraction is always carried - // out before scaling. - optional float scale = 2 [default = 1]; - optional string mean_file = 3; - // Specify the batch size. - optional uint32 batch_size = 4; - // Specify if we would like to randomly crop an image. - optional uint32 crop_size = 5 [default = 0]; - // Specify if we want to randomly mirror data. 
- optional bool mirror = 6 [default = false]; - // Foreground (object) overlap threshold - optional float fg_threshold = 7 [default = 0.5]; - // Background (non-object) overlap threshold - optional float bg_threshold = 8 [default = 0.5]; - // Fraction of batch that should be foreground objects - optional float fg_fraction = 9 [default = 0.25]; - // Amount of contextual padding to add around a window - // (used only by the window_data_layer) - optional uint32 context_pad = 10 [default = 0]; - // Mode for cropping out a detection window - // warp: cropped window is warped to a fixed size and aspect ratio - // square: the tightest square around the window is cropped - optional string crop_mode = 11 [default = "warp"]; - // cache_images: will load all images in memory for faster access - optional bool cache_images = 12 [default = false]; - // append root_folder to locate images - optional string root_folder = 13 [default = ""]; -} - -message SPPParameter { - enum PoolMethod { - MAX = 0; - AVE = 1; - STOCHASTIC = 2; - } - optional uint32 pyramid_height = 1; - optional PoolMethod pool = 2 [default = MAX]; // The pooling method - enum Engine { - DEFAULT = 0; - CAFFE = 1; - CUDNN = 2; - } - optional Engine engine = 6 [default = DEFAULT]; -} - -// DEPRECATED: use LayerParameter. -message V1LayerParameter { - repeated string bottom = 2; - repeated string top = 3; - optional string name = 4; - repeated NetStateRule include = 32; - repeated NetStateRule exclude = 33; - enum LayerType { - NONE = 0; - ABSVAL = 35; - ACCURACY = 1; - ARGMAX = 30; - BNLL = 2; - CONCAT = 3; - CONTRASTIVE_LOSS = 37; - CONVOLUTION = 4; - DATA = 5; - DECONVOLUTION = 39; - DROPOUT = 6; - DUMMY_DATA = 32; - EUCLIDEAN_LOSS = 7; - ELTWISE = 25; - EXP = 38; - FLATTEN = 8; - HDF5_DATA = 9; - HDF5_OUTPUT = 10; - HINGE_LOSS = 28; - IM2COL = 11; - IMAGE_DATA = 12; - INFOGAIN_LOSS = 13; - INNER_PRODUCT = 14; - LRN = 15; - MEMORY_DATA = 29; - MULTINOMIAL_LOGISTIC_LOSS = 16; - MVN = 34; - POOLING = 17; - POWER = 26; - RELU = 18; - SIGMOID = 19; - SIGMOID_CROSS_ENTROPY_LOSS = 27; - SILENCE = 36; - SOFTMAX = 20; - SOFTMAX_LOSS = 21; - SPLIT = 22; - SLICE = 33; - TANH = 23; - WINDOW_DATA = 24; - THRESHOLD = 31; - } - optional LayerType type = 5; - repeated BlobProto blobs = 6; - repeated string param = 1001; - repeated DimCheckMode blob_share_mode = 1002; - enum DimCheckMode { - STRICT = 0; - PERMISSIVE = 1; - } - repeated float blobs_lr = 7; - repeated float weight_decay = 8; - repeated float loss_weight = 35; - optional AccuracyParameter accuracy_param = 27; - optional ArgMaxParameter argmax_param = 23; - optional ConcatParameter concat_param = 9; - optional ContrastiveLossParameter contrastive_loss_param = 40; - optional ConvolutionParameter convolution_param = 10; - optional DataParameter data_param = 11; - optional DropoutParameter dropout_param = 12; - optional DummyDataParameter dummy_data_param = 26; - optional EltwiseParameter eltwise_param = 24; - optional ExpParameter exp_param = 41; - optional HDF5DataParameter hdf5_data_param = 13; - optional HDF5OutputParameter hdf5_output_param = 14; - optional HingeLossParameter hinge_loss_param = 29; - optional ImageDataParameter image_data_param = 15; - optional InfogainLossParameter infogain_loss_param = 16; - optional InnerProductParameter inner_product_param = 17; - optional LRNParameter lrn_param = 18; - optional MemoryDataParameter memory_data_param = 22; - optional MVNParameter mvn_param = 34; - optional PoolingParameter pooling_param = 19; - optional PowerParameter power_param = 21; 
- optional ReLUParameter relu_param = 30; - optional SigmoidParameter sigmoid_param = 38; - optional SoftmaxParameter softmax_param = 39; - optional SliceParameter slice_param = 31; - optional TanHParameter tanh_param = 37; - optional ThresholdParameter threshold_param = 25; - optional WindowDataParameter window_data_param = 20; - optional TransformationParameter transform_param = 36; - optional LossParameter loss_param = 42; - optional V0LayerParameter layer = 1; -} - -// DEPRECATED: V0LayerParameter is the old way of specifying layer parameters -// in Caffe. We keep this message type around for legacy support. -message V0LayerParameter { - optional string name = 1; // the layer name - optional string type = 2; // the string to specify the layer type - - // Parameters to specify layers with inner products. - optional uint32 num_output = 3; // The number of outputs for the layer - optional bool biasterm = 4 [default = true]; // whether to have bias terms - optional FillerParameter weight_filler = 5; // The filler for the weight - optional FillerParameter bias_filler = 6; // The filler for the bias - - optional uint32 pad = 7 [default = 0]; // The padding size - optional uint32 kernelsize = 8; // The kernel size - optional uint32 group = 9 [default = 1]; // The group size for group conv - optional uint32 stride = 10 [default = 1]; // The stride - enum PoolMethod { - MAX = 0; - AVE = 1; - STOCHASTIC = 2; - } - optional PoolMethod pool = 11 [default = MAX]; // The pooling method - optional float dropout_ratio = 12 [default = 0.5]; // dropout ratio - - optional uint32 local_size = 13 [default = 5]; // for local response norm - optional float alpha = 14 [default = 1.]; // for local response norm - optional float beta = 15 [default = 0.75]; // for local response norm - optional float k = 22 [default = 1.]; - - // For data layers, specify the data source - optional string source = 16; - // For data pre-processing, we can do simple scaling and subtracting the - // data mean, if provided. Note that the mean subtraction is always carried - // out before scaling. - optional float scale = 17 [default = 1]; - optional string meanfile = 18; - // For data layers, specify the batch size. - optional uint32 batchsize = 19; - // For data layers, specify if we would like to randomly crop an image. - optional uint32 cropsize = 20 [default = 0]; - // For data layers, specify if we want to randomly mirror data. - optional bool mirror = 21 [default = false]; - - // The blobs containing the numeric parameters of the layer - repeated BlobProto blobs = 50; - // The ratio that is multiplied on the global learning rate. If you want to - // set the learning ratio for one blob, you need to set it for all blobs. - repeated float blobs_lr = 51; - // The weight decay that is multiplied on the global weight decay. - repeated float weight_decay = 52; - - // The rand_skip variable is for the data layer to skip a few data points - // to avoid all asynchronous sgd clients to start at the same point. The skip - // point would be set as rand_skip * rand(0,1). Note that rand_skip should not - // be larger than the number of keys in the database. 
-  optional uint32 rand_skip = 53 [default = 0];
-
-  // Fields related to detection (det_*)
-  // foreground (object) overlap threshold
-  optional float det_fg_threshold = 54 [default = 0.5];
-  // background (non-object) overlap threshold
-  optional float det_bg_threshold = 55 [default = 0.5];
-  // Fraction of batch that should be foreground objects
-  optional float det_fg_fraction = 56 [default = 0.25];
-
-  // optional bool OBSOLETE_can_clobber = 57 [default = true];
-
-  // Amount of contextual padding to add around a window
-  // (used only by the window_data_layer)
-  optional uint32 det_context_pad = 58 [default = 0];
-
-  // Mode for cropping out a detection window
-  // warp: cropped window is warped to a fixed size and aspect ratio
-  // square: the tightest square around the window is cropped
-  optional string det_crop_mode = 59 [default = "warp"];
-
-  // For ReshapeLayer, one needs to specify the new dimensions.
-  optional int32 new_num = 60 [default = 0];
-  optional int32 new_channels = 61 [default = 0];
-  optional int32 new_height = 62 [default = 0];
-  optional int32 new_width = 63 [default = 0];
-
-  // Whether or not ImageLayer should shuffle the list of files at every epoch.
-  // It will also resize images if new_height or new_width are not zero.
-  optional bool shuffle_images = 64 [default = false];
-
-  // For ConcatLayer, one needs to specify the dimension for concatenation, and
-  // the other dimensions must be the same for all the bottom blobs.
-  // By default it will concatenate blobs along the channels dimension.
-  optional uint32 concat_dim = 65 [default = 1];
-
-  optional HDF5OutputParameter hdf5_output_param = 1001;
-}
-
-message PReLUParameter {
-  // Parametric ReLU described in K. He et al, Delving Deep into Rectifiers:
-  // Surpassing Human-Level Performance on ImageNet Classification, 2015.
-
-  // Initial value of a_i. Default is a_i=0.25 for all i.
-  optional FillerParameter filler = 1;
-  // Whether or not slope parameters are shared across channels.
-  optional bool channel_shared = 2 [default = false];
-}
diff --git a/tools/caffe_converter/caffe_parser.py b/tools/caffe_converter/caffe_parser.py
deleted file mode 100644
index a52ff9acc3de..000000000000
--- a/tools/caffe_converter/caffe_parser.py
+++ /dev/null
@@ -1,81 +0,0 @@
-# Licensed to the Apache Software Foundation (ASF) under one
-# or more contributor license agreements. See the NOTICE file
-# distributed with this work for additional information
-# regarding copyright ownership. The ASF licenses this file
-# to you under the Apache License, Version 2.0 (the
-# "License"); you may not use this file except in compliance
-# with the License. You may obtain a copy of the License at
-#
-# http://www.apache.org/licenses/LICENSE-2.0
-#
-# Unless required by applicable law or agreed to in writing,
-# software distributed under the License is distributed on an
-# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
-# KIND, either express or implied. See the License for the
-# specific language governing permissions and limitations
-# under the License.
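For orientation, here is a minimal sketch of how the `caffe_pb2` bindings generated from the protobuf messages above are consumed, assuming `caffe.proto` has been compiled with `protoc --python_out=./ ./caffe.proto`; the `deploy.prototxt` file name is hypothetical. This is the same pattern that `read_prototxt` in the deleted `caffe_parser.py` below follows.

```python
# Minimal sketch: parse a deploy prototxt into a NetParameter message using
# the caffe_pb2 module generated by protoc. 'deploy.prototxt' is hypothetical.
import caffe_pb2
from google.protobuf import text_format

net = caffe_pb2.NetParameter()
with open('deploy.prototxt', 'r') as f:
    text_format.Merge(f.read(), net)
print('parsed %d layers' % len(net.layer))
```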
-
-"""Parse caffe's protobuf
-"""
-import re
-try:
-    import caffe
-    from caffe.proto import caffe_pb2
-    use_caffe = True
-except ImportError:
-    try:
-        import caffe_pb2
-    except ImportError:
-        raise ImportError('Cannot find caffe_pb2; compile caffe.proto first: '
                          'protoc --python_out=./ ./caffe.proto')
-    use_caffe = False
-
-from google.protobuf import text_format # pylint: disable=relative-import
-
-def read_prototxt(fname):
-    """Return the caffe_pb2.NetParameter object parsed from a prototxt file
-    """
-    proto = caffe_pb2.NetParameter()
-    with open(fname, 'r') as f:
-        text_format.Merge(str(f.read()), proto)
-    return proto
-
-def get_layers(proto):
-    """Returns layers in a caffe_pb2.NetParameter object
-    """
-    if len(proto.layer):
-        return proto.layer
-    elif len(proto.layers):
-        return proto.layers
-    else:
-        raise ValueError('Invalid proto file.')
-
-def read_caffemodel(prototxt_fname, caffemodel_fname):
-    """Return the layers defined in a binary caffemodel file, as a
-    (layers, layer_names) tuple; layer_names is None when the model is
-    parsed via protobuf instead of the caffe python package
-    """
-    if use_caffe:
-        caffe.set_mode_cpu()
-        net = caffe.Net(prototxt_fname, caffemodel_fname, caffe.TEST)
-        layer_names = net._layer_names
-        layers = net.layers
-        return (layers, layer_names)
-    else:
-        proto = caffe_pb2.NetParameter()
-        with open(caffemodel_fname, 'rb') as f:
-            proto.ParseFromString(f.read())
-        return (get_layers(proto), None)
-
-def layer_iter(layers, layer_names):
-    """Iterate over all layers"""
-    if use_caffe:
-        for layer_idx, layer in enumerate(layers):
-            layer_name = re.sub('[-/]', '_', layer_names[layer_idx])
-            layer_type = layer.type
-            layer_blobs = layer.blobs
-            yield (layer_name, layer_type, layer_blobs)
-    else:
-        for layer in layers:
-            layer_name = re.sub('[-/]', '_', layer.name)
-            layer_type = layer.type
-            layer_blobs = layer.blobs
-            yield (layer_name, layer_type, layer_blobs)
diff --git a/tools/caffe_converter/caffe_proto_utils.py b/tools/caffe_converter/caffe_proto_utils.py
deleted file mode 100644
index 8d6183457637..000000000000
--- a/tools/caffe_converter/caffe_proto_utils.py
+++ /dev/null
@@ -1,204 +0,0 @@
-# Licensed to the Apache Software Foundation (ASF) under one
-# or more contributor license agreements. See the NOTICE file
-# distributed with this work for additional information
-# regarding copyright ownership. The ASF licenses this file
-# to you under the Apache License, Version 2.0 (the
-# "License"); you may not use this file except in compliance
-# with the License. You may obtain a copy of the License at
-#
-# http://www.apache.org/licenses/LICENSE-2.0
-#
-# Unless required by applicable law or agreed to in writing,
-# software distributed under the License is distributed on an
-# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
-# KIND, either express or implied. See the License for the
-# specific language governing permissions and limitations
-# under the License.
-
-"""Helper functions for parsing caffe prototxt into a workable DAG
-"""
-
-
-def process_network_proto(caffe_root, deploy_proto):
-    """
-    Runs the caffe upgrade tool on the prototxt to create a prototxt in the latest format.
-    This enables us to work just with the latest structures, instead of supporting all the variants
-
-    :param caffe_root: path to the caffe root folder, where the upgrade tool is located
-    :param deploy_proto: name of the original prototxt file
-    :return: name of new processed prototxt file
-    """
-    processed_deploy_proto = deploy_proto + ".processed"
-
-    from shutil import copyfile
-    copyfile(deploy_proto, processed_deploy_proto)
-
-    # run upgrade tool on new file name (same output file)
-    import os
-    upgrade_tool_command_line = caffe_root + '/build/tools/upgrade_net_proto_text.bin ' \
-                                + processed_deploy_proto + ' ' + processed_deploy_proto
-    os.system(upgrade_tool_command_line)
-
-    return processed_deploy_proto
-
-
-class LayerRecord(object):
-    """
-    A record which describes basic layer parameters
-    """
-
-    def __init__(self, layer_def):
-
-        self.layer_def = layer_def
-        self.name = layer_def.name
-        self.type = layer_def.type
-
-        # keep filter, stride and pad
-        if layer_def.type == 'Convolution':
-            if LayerRecord._is_iterable(layer_def.convolution_param.kernel_size):
-                self.filter = list(layer_def.convolution_param.kernel_size)
-            else:
-                self.filter = list([layer_def.convolution_param.kernel_size])
-            if len(self.filter) == 1:
-                self.filter *= 2
-            if LayerRecord._is_iterable(layer_def.convolution_param.pad):
-                self.pad = list(layer_def.convolution_param.pad)
-            else:
-                self.pad = list([layer_def.convolution_param.pad])
-            if len(self.pad) == 0:
-                self.pad = [0, 0]
-            elif len(self.pad) == 1:
-                self.pad *= 2
-            if LayerRecord._is_iterable(layer_def.convolution_param.stride):
-                self.stride = list(layer_def.convolution_param.stride)
-            else:
-                self.stride = list([layer_def.convolution_param.stride])
-            if len(self.stride) == 0:
-                self.stride = [1, 1]
-            elif len(self.stride) == 1:
-                self.stride *= 2
-
-        elif layer_def.type == 'Pooling':
-            self.filter = [layer_def.pooling_param.kernel_size]
-            if len(self.filter) == 1:
-                self.filter *= 2
-            self.pad = [layer_def.pooling_param.pad]
-            if len(self.pad) == 0:
-                self.pad = [0, 0]
-            elif len(self.pad) == 1:
-                self.pad *= 2
-            self.stride = [layer_def.pooling_param.stride]
-            if len(self.stride) == 0:
-                self.stride = [1, 1]
-            elif len(self.stride) == 1:
-                self.stride *= 2
-
-        else:
-            self.filter = [0, 0]
-            self.pad = [0, 0]
-            self.stride = [1, 1]
-
-        # keep tops
-        self.tops = list(layer_def.top)
-
-        # keep bottoms
-        self.bottoms = list(layer_def.bottom)
-
-        # list of parent layers
-        self.parents = []
-
-        # list of child layers
-        self.children = []
-
-    @staticmethod
-    def _is_iterable(obj):
-        return hasattr(obj, '__iter__')
-
-def read_network_dag(processed_deploy_prototxt):
-    """
-    Reads the network structure from the caffe prototxt
-    :param processed_deploy_prototxt: name of prototxt to load, preferably the prototxt should
-    first be processed with a call to process_network_proto()
-    :return: network_def, layer_name_to_record, top_to_layers
-    network_def: caffe network structure, gives access to *all* the network information
-    layer_name_to_record: *ordered* dictionary which maps between layer name and a structure which
-    describes in a simple form the layer parameters
-    top_to_layers: dictionary which maps a blob name to an ordered list of layers which output it;
-    when a top is used several times, like in in-place layers, the list will contain all the layers
-    by order of appearance
-    """
-
-    from caffe.proto import caffe_pb2
-    from google.protobuf import text_format # pylint: disable=relative-import
-    from collections import OrderedDict
-
-    # load prototxt file
-
network_def = caffe_pb2.NetParameter()
-    with open(processed_deploy_prototxt, 'r') as proto_file:
-        text_format.Merge(str(proto_file.read()), network_def)
-
-    # map layer name to layer record
-    layer_name_to_record = OrderedDict()
-    for layer_def in network_def.layer:
-        if (len(layer_def.include) == 0) or \
-           (caffe_pb2.TEST in [item.phase for item in layer_def.include]):
-
-            layer_name_to_record[layer_def.name] = LayerRecord(layer_def)
-
-    top_to_layers = dict()
-    for layer in network_def.layer:
-        # no specific phase, or TEST phase is specifically asked for
-        if (len(layer.include) == 0) or (caffe_pb2.TEST in [item.phase for item in layer.include]):
-            for top in layer.top:
-                if top not in top_to_layers:
-                    top_to_layers[top] = list()
-                top_to_layers[top].append(layer.name)
-
-    # find parents and children of all layers
-    for child_layer_name in layer_name_to_record.keys(): # pylint: disable=too-many-nested-blocks
-        child_layer_def = layer_name_to_record[child_layer_name]
-        for bottom in child_layer_def.bottoms:
-            if bottom in top_to_layers:
-                for parent_layer_name in top_to_layers[bottom]:
-                    if parent_layer_name in layer_name_to_record:
-                        parent_layer_def = layer_name_to_record[parent_layer_name]
-                        if parent_layer_def not in child_layer_def.parents:
-                            child_layer_def.parents.append(parent_layer_def)
-                        if child_layer_def not in parent_layer_def.children:
-                            parent_layer_def.children.append(child_layer_def)
-
-    # update filter, stride, pad for maxout "structures"
-    for layer_name in layer_name_to_record.keys():
-        layer_def = layer_name_to_record[layer_name]
-        if layer_def.type == 'Eltwise' and \
-           len(layer_def.parents) == 1 and \
-           layer_def.parents[0].type == 'Slice' and \
-           len(layer_def.parents[0].parents) == 1 and \
-           layer_def.parents[0].parents[0].type in ['Convolution', 'InnerProduct']:
-            layer_def.filter = layer_def.parents[0].parents[0].filter
-            layer_def.stride = layer_def.parents[0].parents[0].stride
-            layer_def.pad = layer_def.parents[0].parents[0].pad
-
-    return network_def, layer_name_to_record, top_to_layers
-
-
-def read_caffe_mean(caffe_mean_file):
-    """
-    Reads caffe formatted mean file
-    :param caffe_mean_file: path to caffe mean file, presumably with 'binaryproto' suffix
-    :return: mean image, converted from BGR to RGB format
-    """
-
-    import caffe_parser
-    import numpy as np
-    mean_blob = caffe_parser.caffe_pb2.BlobProto()
-    with open(caffe_mean_file, 'rb') as f:
-        mean_blob.ParseFromString(f.read())
-
-    img_mean_np = np.array(mean_blob.data)
-    img_mean_np = img_mean_np.reshape(mean_blob.channels, mean_blob.height, mean_blob.width)
-
-    # swap channels from Caffe BGR to RGB
-    img_mean_np[[0, 2], :, :] = img_mean_np[[2, 0], :, :]
-
-    return img_mean_np
diff --git a/tools/caffe_converter/compare_layers.py b/tools/caffe_converter/compare_layers.py
deleted file mode 100644
index 8d6598c8b781..000000000000
--- a/tools/caffe_converter/compare_layers.py
+++ /dev/null
@@ -1,362 +0,0 @@
-# Licensed to the Apache Software Foundation (ASF) under one
-# or more contributor license agreements. See the NOTICE file
-# distributed with this work for additional information
-# regarding copyright ownership. The ASF licenses this file
-# to you under the Apache License, Version 2.0 (the
-# "License"); you may not use this file except in compliance
-# with the License.
You may obtain a copy of the License at -# -# http://www.apache.org/licenses/LICENSE-2.0 -# -# Unless required by applicable law or agreed to in writing, -# software distributed under the License is distributed on an -# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY -# KIND, either express or implied. See the License for the -# specific language governing permissions and limitations -# under the License. - -"""Test converted models layer by layer -""" -import argparse -import logging -import os -import warnings - -import numpy as np -import cv2 -import mxnet as mx - -logging.basicConfig(level=logging.INFO) - - -def read_image(img_path, image_dims=None, mean=None): - """ - Reads an image from file path or URL, optionally resizing to given image dimensions and - subtracting mean. - :param img_path: path to file, or url to download - :param image_dims: image dimensions to resize to, or None - :param mean: mean file to subtract, or None - :return: loaded image, in RGB format - """ - - import urllib - - filename = img_path.split("/")[-1] - if img_path.startswith('http'): - urllib.urlretrieve(img_path, filename) - img = cv2.imread(filename) - else: - img = cv2.imread(img_path) - - img = cv2.cvtColor(img, cv2.COLOR_BGR2RGB) - - if image_dims is not None: - img = cv2.resize(img, image_dims) # resize to image_dims to fit model - img = np.rollaxis(img, 2) # change to (c, h, w) order - img = img[np.newaxis, :] # extend to (n, c, h, w) - if mean is not None: - mean = np.array(mean) - if mean.shape == (3,): - mean = mean[np.newaxis, :, np.newaxis, np.newaxis] # extend to (n, c, 1, 1) - img = img.astype(np.float32) - mean # subtract mean - - return img - - -def _ch_dev(arg_params, aux_params, ctx): - """ - Changes device of given mxnet arguments - :param arg_params: arguments - :param aux_params: auxiliary parameters - :param ctx: new device context - :return: arguments and auxiliary parameters on new device - """ - new_args = dict() - new_auxs = dict() - for k, v in arg_params.items(): - new_args[k] = v.as_in_context(ctx) - for k, v in aux_params.items(): - new_auxs[k] = v.as_in_context(ctx) - return new_args, new_auxs - - -def convert_and_compare_caffe_to_mxnet(image_url, gpu, caffe_prototxt_path, caffe_model_path, - caffe_mean, mean_diff_allowed, max_diff_allowed): - """ - Run the layer comparison on a caffe model, given its prototxt, weights and mean. 
- The comparison is done by inferring on a given image using both caffe and mxnet model - :param image_url: image file or url to run inference on - :param gpu: gpu to use, -1 for cpu - :param caffe_prototxt_path: path to caffe prototxt - :param caffe_model_path: path to caffe weights - :param caffe_mean: path to caffe mean file - """ - - import caffe - from caffe_proto_utils import read_network_dag, process_network_proto, read_caffe_mean - from convert_model import convert_model - - if isinstance(caffe_mean, str): - caffe_mean = read_caffe_mean(caffe_mean) - elif caffe_mean is None: - pass - elif len(caffe_mean) == 3: - # swap channels from Caffe BGR to RGB - caffe_mean = caffe_mean[::-1] - - # get caffe root location, this is needed to run the upgrade network utility, so we only need - # to support parsing of latest caffe - caffe_root = os.path.dirname(os.path.dirname(caffe.__path__[0])) - caffe_prototxt_path = process_network_proto(caffe_root, caffe_prototxt_path) - - _, layer_name_to_record, top_to_layers = read_network_dag(caffe_prototxt_path) - - caffe.set_mode_cpu() - caffe_net = caffe.Net(caffe_prototxt_path, caffe_model_path, caffe.TEST) - - image_dims = tuple(caffe_net.blobs['data'].shape)[2:4] - - logging.info('getting image %s', image_url) - img_rgb = read_image(image_url, image_dims, caffe_mean) - img_bgr = img_rgb[:, ::-1, :, :] - - caffe_net.blobs['data'].reshape(*img_bgr.shape) - caffe_net.blobs['data'].data[...] = img_bgr - _ = caffe_net.forward() - - # read sym and add all outputs - sym, arg_params, aux_params, _ = convert_model(caffe_prototxt_path, caffe_model_path) - sym = sym.get_internals() - - # now mxnet - if gpu < 0: - ctx = mx.cpu(0) - else: - ctx = mx.gpu(gpu) - - arg_params, aux_params = _ch_dev(arg_params, aux_params, ctx) - arg_params["data"] = mx.nd.array(img_rgb, ctx) - arg_params["prob_label"] = mx.nd.empty((1,), ctx) - exe = sym.bind(ctx, arg_params, args_grad=None, grad_req="null", aux_states=aux_params) - exe.forward(is_train=False) - - compare_layers_from_nets(caffe_net, arg_params, aux_params, exe, layer_name_to_record, - top_to_layers, mean_diff_allowed, max_diff_allowed) - - -def _bfs(root_node, process_node): - """ - Implementation of Breadth-first search (BFS) on caffe network DAG - :param root_node: root node of caffe network DAG - :param process_node: function to run on each node - """ - - from collections import deque - - seen_nodes = set() - next_nodes = deque() - - seen_nodes.add(root_node) - next_nodes.append(root_node) - - while next_nodes: - current_node = next_nodes.popleft() - - # process current node - process_node(current_node) - - for child_node in current_node.children: - if child_node not in seen_nodes: - seen_nodes.add(child_node) - next_nodes.append(child_node) - - -def compare_layers_from_nets(caffe_net, arg_params, aux_params, exe, layer_name_to_record, - top_to_layers, mean_diff_allowed, max_diff_allowed): - """ - Compare layer by layer of a caffe network with mxnet network - :param caffe_net: loaded caffe network - :param arg_params: arguments - :param aux_params: auxiliary parameters - :param exe: mxnet model - :param layer_name_to_record: map between caffe layer and information record - :param top_to_layers: map between caffe blob name to layers which outputs it (including inplace) - :param mean_diff_allowed: mean difference allowed between caffe blob and mxnet blob - :param max_diff_allowed: max difference allowed between caffe blob and mxnet blob - """ - - import re - - log_format = ' {0:<40} {1:<40} {2:<8} {3:>10} {4:>10} 
{5:<1}'
-
-    compare_layers_from_nets.is_first_convolution = True
-
-    def _compare_blob(caf_blob, mx_blob, caf_name, mx_name, blob_type, note):
-        diff = np.abs(mx_blob - caf_blob)
-        diff_mean = diff.mean()
-        diff_max = diff.max()
-        logging.info(log_format.format(caf_name, mx_name, blob_type, '%4.5f' % diff_mean,
-                                       '%4.5f' % diff_max, note))
-        assert diff_mean < mean_diff_allowed
-        assert diff_max < max_diff_allowed
-
-    def _process_layer_parameters(layer):
-
-        logging.debug('processing layer %s of type %s', layer.name, layer.type)
-
-        normalized_layer_name = re.sub('[-/]', '_', layer.name)
-
-        # handle weight and bias of convolution and fully-connected layers
-        if layer.name in caffe_net.params and layer.type in ['Convolution', 'InnerProduct',
-                                                             'Deconvolution']:
-
-            has_bias = len(caffe_net.params[layer.name]) > 1
-
-            mx_name_weight = '{}_weight'.format(normalized_layer_name)
-            mx_beta = arg_params[mx_name_weight].asnumpy()
-
-            # first convolution should change from BGR to RGB
-            if layer.type == 'Convolution' and compare_layers_from_nets.is_first_convolution:
-                compare_layers_from_nets.is_first_convolution = False
-
-                # if RGB or RGBA
-                if mx_beta.shape[1] == 3 or mx_beta.shape[1] == 4:
-                    # Swapping BGR of caffe into RGB in mxnet
-                    mx_beta[:, [0, 2], :, :] = mx_beta[:, [2, 0], :, :]
-
-            caf_beta = caffe_net.params[layer.name][0].data
-            _compare_blob(caf_beta, mx_beta, layer.name, mx_name_weight, 'weight', '')
-
-            if has_bias:
-                mx_name_bias = '{}_bias'.format(normalized_layer_name)
-                mx_gamma = arg_params[mx_name_bias].asnumpy()
-                caf_gamma = caffe_net.params[layer.name][1].data
-                _compare_blob(caf_gamma, mx_gamma, layer.name, mx_name_bias, 'bias', '')
-
-        elif layer.name in caffe_net.params and layer.type == 'Scale':
-
-            if 'scale' in normalized_layer_name:
-                bn_name = normalized_layer_name.replace('scale', 'bn')
-            elif 'sc' in normalized_layer_name:
-                bn_name = normalized_layer_name.replace('sc', 'bn')
-            else:
-                assert False, 'Unknown name convention for bn/scale'
-
-            beta_name = '{}_beta'.format(bn_name)
-            gamma_name = '{}_gamma'.format(bn_name)
-
-            mx_beta = arg_params[beta_name].asnumpy()
-            caf_beta = caffe_net.params[layer.name][1].data
-            _compare_blob(caf_beta, mx_beta, layer.name, beta_name, 'beta', '')
-
-            mx_gamma = arg_params[gamma_name].asnumpy()
-            caf_gamma = caffe_net.params[layer.name][0].data
-            _compare_blob(caf_gamma, mx_gamma, layer.name, gamma_name, 'gamma', '')
-
-        elif layer.name in caffe_net.params and layer.type == 'BatchNorm':
-
-            mean_name = '{}_moving_mean'.format(normalized_layer_name)
-            var_name = '{}_moving_var'.format(normalized_layer_name)
-
-            caf_rescale_factor = caffe_net.params[layer.name][2].data
-
-            mx_mean = aux_params[mean_name].asnumpy()
-            caf_mean = caffe_net.params[layer.name][0].data / caf_rescale_factor
-            _compare_blob(caf_mean, mx_mean, layer.name, mean_name, 'mean', '')
-
-            mx_var = aux_params[var_name].asnumpy()
-            caf_var = caffe_net.params[layer.name][1].data / caf_rescale_factor
-            _compare_blob(caf_var, mx_var, layer.name, var_name, 'var',
-                          'expect 1e-04 change due to cudnn eps')
-
-        elif layer.type in ['Input', 'Pooling', 'ReLU', 'Eltwise', 'Softmax', 'LRN', 'Concat',
-                            'Dropout', 'Crop']:
-            # no parameters to check for these layers
-            pass
-
-        else:
-            warnings.warn('No handling for layer %s of type %s, should we ignore it?' %
-                          (layer.name, layer.type))
-
-
-    def _process_layer_output(caffe_blob_name):
-
-        logging.debug('processing blob %s', caffe_blob_name)
-
-        # skip blobs not originating from actual layers, e.g.
artificial split layers added by caffe
-        if caffe_blob_name not in top_to_layers:
-            return
-
-        caf_blob = caffe_net.blobs[caffe_blob_name].data
-
-        # data should change from BGR to RGB
-        if caffe_blob_name == 'data':
-
-            # if RGB or RGBA
-            if caf_blob.shape[1] == 3 or caf_blob.shape[1] == 4:
-                # Swapping BGR of caffe into RGB in mxnet
-                caf_blob[:, [0, 2], :, :] = caf_blob[:, [2, 0], :, :]
-            mx_name = 'data'
-
-        else:
-            # get last layer name which outputs this blob name
-            last_layer_name = top_to_layers[caffe_blob_name][-1]
-            normalized_last_layer_name = re.sub('[-/]', '_', last_layer_name)
-            mx_name = '{}_output'.format(normalized_last_layer_name)
-            if 'scale' in mx_name:
-                mx_name = mx_name.replace('scale', 'bn')
-            elif 'sc' in mx_name:
-                mx_name = mx_name.replace('sc', 'bn')
-
-        if mx_name not in exe.output_dict:
-            logging.error('mxnet blob %s is missing, time to extend the compare tool..', mx_name)
-            return
-
-        mx_blob = exe.output_dict[mx_name].asnumpy()
-        _compare_blob(caf_blob, mx_blob, caffe_blob_name, mx_name, 'output', '')
-
-        return
-
-    # check layer parameters
-    logging.info('\n***** Network Parameters '.ljust(140, '*'))
-    logging.info(log_format.format('CAFFE', 'MXNET', 'Type', 'Mean(diff)', 'Max(diff)', 'Note'))
-    first_layer_name = list(layer_name_to_record.keys())[0]
-    _bfs(layer_name_to_record[first_layer_name], _process_layer_parameters)
-
-    # check layer output
-    logging.info('\n***** Network Outputs '.ljust(140, '*'))
-    logging.info(log_format.format('CAFFE', 'MXNET', 'Type', 'Mean(diff)', 'Max(diff)', 'Note'))
-    for caffe_blob_name in caffe_net.blobs.keys():
-        _process_layer_output(caffe_blob_name)
-
-
-def main():
-    """Entrypoint for compare_layers"""
-
-    parser = argparse.ArgumentParser(
-        description='Tool for testing caffe to mxnet conversion layer by layer')
-    parser.add_argument('--image_url', type=str,
-                        default='https://github.com/dmlc/web-data/raw/master/mxnet/doc/'\
-                                'tutorials/python/predict_image/cat.jpg',
-                        help='input image to test inference, can be either file path or url')
-    parser.add_argument('--caffe_prototxt_path', type=str,
-                        default='./model.prototxt',
-                        help='path to caffe prototxt')
-    parser.add_argument('--caffe_model_path', type=str,
-                        default='./model.caffemodel',
-                        help='path to caffe weights')
-    parser.add_argument('--caffe_mean', type=str,
-                        default='./model_mean.binaryproto',
-                        help='path to caffe mean file')
-    parser.add_argument('--mean_diff_allowed', type=float, default=1e-03,
-                        help='mean difference allowed between caffe blob and mxnet blob')
-    parser.add_argument('--max_diff_allowed', type=float, default=1e-01,
-                        help='max difference allowed between caffe blob and mxnet blob')
-    parser.add_argument('--gpu', type=int, default=-1, help='the gpu id used for predict')
-    args = parser.parse_args()
-    convert_and_compare_caffe_to_mxnet(args.image_url, args.gpu, args.caffe_prototxt_path,
-                                       args.caffe_model_path, args.caffe_mean,
-                                       args.mean_diff_allowed, args.max_diff_allowed)
-
-if __name__ == '__main__':
-    main()
diff --git a/tools/caffe_converter/convert_caffe_modelzoo.py b/tools/caffe_converter/convert_caffe_modelzoo.py
deleted file mode 100644
index 68aeb614bbe9..000000000000
--- a/tools/caffe_converter/convert_caffe_modelzoo.py
+++ /dev/null
@@ -1,162 +0,0 @@
-# Licensed to the Apache Software Foundation (ASF) under one
-# or more contributor license agreements. See the NOTICE file
-# distributed with this work for additional information
-# regarding copyright ownership.
The ASF licenses this file
-# to you under the Apache License, Version 2.0 (the
-# "License"); you may not use this file except in compliance
-# with the License. You may obtain a copy of the License at
-#
-# http://www.apache.org/licenses/LICENSE-2.0
-#
-# Unless required by applicable law or agreed to in writing,
-# software distributed under the License is distributed on an
-# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
-# KIND, either express or implied. See the License for the
-# specific language governing permissions and limitations
-# under the License.
-
-"""Convert Caffe's modelzoo
-"""
-import os
-import argparse
-from convert_model import convert_model
-from convert_mean import convert_mean
-import mxnet as mx
-
-apache_repo_url = 'https://apache-mxnet.s3-accelerate.dualstack.amazonaws.com/'
-repo_url = os.environ.get('MXNET_GLUON_REPO', apache_repo_url)
-_mx_caffe_model_root = '{repo}caffe/models/'.format(repo=repo_url)
-
-"""Dictionary for model meta information
-
-For each model, it requires three attributes:
-
-  - prototxt: URL for the deploy prototxt file
-  - caffemodel: URL for the binary caffemodel
-  - mean : URL for the data mean or a tuple of float
-
-Optionally it takes
-
-  - top-1-acc : top 1 accuracy for testing
-  - top-5-acc : top 5 accuracy for testing
-"""
-model_meta_info = {
-    # pylint: disable=line-too-long
-    'bvlc_alexnet' : {
-        'prototxt' : (_mx_caffe_model_root + 'bvlc_alexnet/deploy.prototxt',
-                      'cb77655eb4db32c9c47699c6050926f9e0fc476a'),
-        'caffemodel' : (_mx_caffe_model_root + 'bvlc_alexnet/bvlc_alexnet.caffemodel',
-                        '9116a64c0fbe4459d18f4bb6b56d647b63920377'),
-        'mean' : (_mx_caffe_model_root + 'bvlc_alexnet/imagenet_mean.binaryproto',
-                  '63e4652e656abc1e87b7a8339a7e02fca63a2c0c'),
-        'top-1-acc' : 0.571,
-        'top-5-acc' : 0.802
-    },
-    'bvlc_googlenet' : {
-        'prototxt' : (_mx_caffe_model_root + 'bvlc_googlenet/deploy.prototxt',
-                      '7060345c8012294baa60eeb5901d2d3fd89d75fc'),
-        'caffemodel' : (_mx_caffe_model_root + 'bvlc_googlenet/bvlc_googlenet.caffemodel',
-                        '405fc5acd08a3bb12de8ee5e23a96bec22f08204'),
-        'mean' : (123, 117, 104),
-        'top-1-acc' : 0.687,
-        'top-5-acc' : 0.889
-    },
-    'vgg-16' : {
-        'prototxt' : (_mx_caffe_model_root + 'vgg/VGG_ILSVRC_16_layers_deploy.prototxt',
-                      '2734e5500f1445bd7c9fee540c99f522485247bd'),
-        'caffemodel' : (_mx_caffe_model_root + 'vgg/VGG_ILSVRC_16_layers.caffemodel',
-                        '9363e1f6d65f7dba68c4f27a1e62105cdf6c4e24'),
-        'mean': (123.68, 116.779, 103.939),
-        'top-1-acc' : 0.734,
-        'top-5-acc' : 0.914
-    },
-    'vgg-19' : {
-        'prototxt' : (_mx_caffe_model_root + 'vgg/VGG_ILSVRC_19_layers_deploy.prototxt',
-                      '132d2f60b3d3b1c2bb9d3fdb0c8931a44f89e2ae'),
-        'caffemodel' : (_mx_caffe_model_root + 'vgg/VGG_ILSVRC_19_layers.caffemodel',
-                        '239785e7862442717d831f682bb824055e51e9ba'),
-        'mean' : (123.68, 116.779, 103.939),
-        'top-1-acc' : 0.731,
-        'top-5-acc' : 0.913
-    },
-    'resnet-50' : {
-        'prototxt' : (_mx_caffe_model_root + 'resnet/ResNet-50-deploy.prototxt',
-                      '5d6fd5aeadd8d4684843c5028b4e5672b9e51638'),
-        'caffemodel' : (_mx_caffe_model_root + 'resnet/ResNet-50-model.caffemodel',
-                        'b7c79ccc21ad0479cddc0dd78b1d20c4d722908d'),
-        'mean' : (_mx_caffe_model_root + 'resnet/ResNet_mean.binaryproto',
-                  '0b056fd4444f0ae1537af646ba736edf0d4cefaf'),
-        'top-1-acc' : 0.753,
-        'top-5-acc' : 0.922
-    },
-    'resnet-101' : {
-        'prototxt' : (_mx_caffe_model_root + 'resnet/ResNet-101-deploy.prototxt',
-                      'c165d6b6ccef7cc39ee16a66f00f927f93de198b'),
-        'caffemodel' : (_mx_caffe_model_root + 'resnet/ResNet-101-model.caffemodel',
'1dbf5f493926bb9b6b3363b12d5133c0f8b78904'),
-        'mean' : (_mx_caffe_model_root + 'resnet/ResNet_mean.binaryproto',
-                  '0b056fd4444f0ae1537af646ba736edf0d4cefaf'),
-        'top-1-acc' : 0.764,
-        'top-5-acc' : 0.929
-    },
-    'resnet-152' : {
-        'prototxt' : (_mx_caffe_model_root + 'resnet/ResNet-152-deploy.prototxt',
-                      'ae15aade2304af8a774c5bfb1d32457f119214ef'),
-        'caffemodel' : (_mx_caffe_model_root + 'resnet/ResNet-152-model.caffemodel',
-                        '251edb93604ac8268c7fd2227a0f15144310e1aa'),
-        'mean' : (_mx_caffe_model_root + 'resnet/ResNet_mean.binaryproto',
-                  '0b056fd4444f0ae1537af646ba736edf0d4cefaf'),
-        'top-1-acc' : 0.77,
-        'top-5-acc' : 0.933
-    },
-}
-
-def get_model_meta_info(model_name):
-    """returns a dict with model information"""
-    return model_meta_info[model_name].copy()
-
-def download_caffe_model(model_name, meta_info, dst_dir='./model'):
-    """Download a caffe model to disk using the given meta info"""
-    if not os.path.isdir(dst_dir):
-        os.mkdir(dst_dir)
-    model_name = os.path.join(dst_dir, model_name)
-
-    assert 'prototxt' in meta_info, "missing prototxt url"
-    proto_url, proto_sha1 = meta_info['prototxt']
-    prototxt = mx.gluon.utils.download(proto_url,
-                                       model_name+'_deploy.prototxt',
-                                       sha1_hash=proto_sha1)
-
-    assert 'caffemodel' in meta_info, "missing caffemodel url"
-    caffemodel_url, caffemodel_sha1 = meta_info['caffemodel']
-    caffemodel = mx.gluon.utils.download(caffemodel_url,
-                                         model_name+'.caffemodel',
-                                         sha1_hash=caffemodel_sha1)
-    assert 'mean' in meta_info, 'no mean info'
-    mean = meta_info['mean']
-    if isinstance(mean[0], str):
-        mean_url, mean_sha1 = mean
-        mean = mx.gluon.utils.download(mean_url,
-                                       model_name+'_mean.binaryproto',
-                                       sha1_hash=mean_sha1)
-    return (prototxt, caffemodel, mean)
-
-def convert_caffe_model(model_name, meta_info, dst_dir='./model'):
-    """Download, convert and save a caffe model"""
-
-    (prototxt, caffemodel, mean) = download_caffe_model(model_name, meta_info, dst_dir)
-    model_name = os.path.join(dst_dir, model_name)
-    convert_model(prototxt, caffemodel, model_name)
-    if isinstance(mean, str):
-        mx_mean = model_name + '-mean.nd'
-        convert_mean(mean, mx_mean)
-        mean = mx_mean
-    return (model_name, mean)
-
-if __name__ == '__main__':
-    parser = argparse.ArgumentParser(description='Convert Caffe model zoo')
-    parser.add_argument('model_name', help='can be '+', '.join(model_meta_info.keys()))
-    args = parser.parse_args()
-    assert args.model_name in model_meta_info, 'Unknown model ' + args.model_name
-    fname, _ = convert_caffe_model(args.model_name, model_meta_info[args.model_name])
-    print('Model is saved into ' + fname)
diff --git a/tools/caffe_converter/convert_mean.py b/tools/caffe_converter/convert_mean.py
deleted file mode 100644
index 1a3df71d3e54..000000000000
--- a/tools/caffe_converter/convert_mean.py
+++ /dev/null
@@ -1,63 +0,0 @@
-# Licensed to the Apache Software Foundation (ASF) under one
-# or more contributor license agreements. See the NOTICE file
-# distributed with this work for additional information
-# regarding copyright ownership. The ASF licenses this file
-# to you under the Apache License, Version 2.0 (the
-# "License"); you may not use this file except in compliance
-# with the License. You may obtain a copy of the License at
-#
-# http://www.apache.org/licenses/LICENSE-2.0
-#
-# Unless required by applicable law or agreed to in writing,
-# software distributed under the License is distributed on an
-# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
-# KIND, either express or implied.
See the License for the -# specific language governing permissions and limitations -# under the License. - -"""Convert caffe mean -""" -import argparse -import numpy as np -import mxnet as mx -import caffe_parser - -def convert_mean(binaryproto_fname, output=None): - """Convert caffe mean - - Parameters - ---------- - binaryproto_fname : str - Filename of the mean - output : str, optional - Save the mean into mxnet's format - - Returns - ------- - NDArray - Mean in ndarray - """ - mean_blob = caffe_parser.caffe_pb2.BlobProto() - with open(binaryproto_fname, 'rb') as f: - mean_blob.ParseFromString(f.read()) - - img_mean_np = np.array(mean_blob.data) - img_mean_np = img_mean_np.reshape( - mean_blob.channels, mean_blob.height, mean_blob.width - ) - # swap channels from Caffe BGR to RGB - img_mean_np[[0, 2], :, :] = img_mean_np[[2, 0], :, :] - nd = mx.nd.array(img_mean_np) - if output is not None: - mx.nd.save(output, {"mean_image": nd}) - return nd - -def main(): - parser = argparse.ArgumentParser(description='Convert caffe mean') - parser.add_argument('binaryproto_fname', help='Filename of the mean') - parser.add_argument('output', help='The name of the output file') - args = parser.parse_args() - convert_mean(args.binaryproto_fname, args.output) - -if __name__ == '__main__': - main() diff --git a/tools/caffe_converter/convert_model.py b/tools/caffe_converter/convert_model.py deleted file mode 100644 index 5c2a11e4b88b..000000000000 --- a/tools/caffe_converter/convert_model.py +++ /dev/null @@ -1,229 +0,0 @@ -# Licensed to the Apache Software Foundation (ASF) under one -# or more contributor license agreements. See the NOTICE file -# distributed with this work for additional information -# regarding copyright ownership. The ASF licenses this file -# to you under the Apache License, Version 2.0 (the -# "License"); you may not use this file except in compliance -# with the License. You may obtain a copy of the License at -# -# http://www.apache.org/licenses/LICENSE-2.0 -# -# Unless required by applicable law or agreed to in writing, -# software distributed under the License is distributed on an -# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY -# KIND, either express or implied. See the License for the -# specific language governing permissions and limitations -# under the License. 
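For reference, a minimal usage sketch of the `convert_mean` helper above, assuming `tools/caffe_converter` is on the Python path; the file names here are hypothetical:

```python
# Hypothetical usage of convert_mean: turn a Caffe binaryproto mean file
# into an MXNet NDArray (channels swapped from BGR to RGB) and save it.
from convert_mean import convert_mean

mean_nd = convert_mean('imagenet_mean.binaryproto', 'mean.nd')
print(mean_nd.shape)  # (channels, height, width)
```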
-
-"""Convert caffe model
-"""
-from __future__ import print_function
-import argparse
-import sys
-import re
-import numpy as np
-import caffe_parser
-import mxnet as mx
-from convert_symbol import convert_symbol

-def prob_label(arg_names):
-    candidates = [arg for arg in arg_names if
-                  not arg.endswith('data') and
-                  not arg.endswith('_weight') and
-                  not arg.endswith('_bias') and
-                  not arg.endswith('_gamma') and
-                  not arg.endswith('_beta')]
-    if len(candidates) == 0:
-        return 'prob_label'
-    return candidates[-1]
-
-def convert_model(prototxt_fname, caffemodel_fname, output_prefix=None):
-    """Convert caffe model
-
-    Parameters
-    ----------
-
-    prototxt_fname : str
-        Filename of the prototxt model definition
-    caffemodel_fname : str
-        Filename of the binary caffe model
-    output_prefix : str, optional
-        If given, then save the converted MXNet model into output_prefix+'.json' and
-        output_prefix+'.params'
-
-    Returns
-    -------
-    sym : Symbol
-        Symbol converted from prototxt
-    arg_params : dict of str to NDArray
-        Argument parameters
-    aux_params : dict of str to NDArray
-        Aux parameters
-    input_dim : tuple
-        Input dimension
-    """
-    sym, input_dim = convert_symbol(prototxt_fname)
-    arg_shapes, _, aux_shapes = sym.infer_shape(data=tuple(input_dim))
-    arg_names = sym.list_arguments()
-    aux_names = sym.list_auxiliary_states()
-    arg_shape_dic = dict(zip(arg_names, arg_shapes))
-    aux_shape_dic = dict(zip(aux_names, aux_shapes))
-    arg_params = {}
-    aux_params = {}
-    first_conv = True
-
-    layers, names = caffe_parser.read_caffemodel(prototxt_fname, caffemodel_fname)
-    layer_iter = caffe_parser.layer_iter(layers, names)
-    layers_proto = caffe_parser.get_layers(caffe_parser.read_prototxt(prototxt_fname))
-
-    for layer_name, layer_type, layer_blobs in layer_iter:
-        if layer_type in ('Convolution', 'InnerProduct', 4, 14, 'PReLU', 'Deconvolution',
-                          39):
-            if layer_type == 'PReLU':
-                assert (len(layer_blobs) == 1)
-                weight_name = layer_name + '_gamma'
-                wmat = np.array(layer_blobs[0].data).reshape(arg_shape_dic[weight_name])
-                arg_params[weight_name] = mx.nd.zeros(wmat.shape)
-                arg_params[weight_name][:] = wmat
-                continue
-            wmat_dim = []
-            if getattr(layer_blobs[0].shape, 'dim', None) is not None:
-                if len(layer_blobs[0].shape.dim) > 0:
-                    wmat_dim = layer_blobs[0].shape.dim
-                else:
-                    wmat_dim = [layer_blobs[0].num, layer_blobs[0].channels,
-                                layer_blobs[0].height, layer_blobs[0].width]
-            else:
-                wmat_dim = list(layer_blobs[0].shape)
-            wmat = np.array(layer_blobs[0].data).reshape(wmat_dim)
-
-            channels = wmat_dim[1]
-            if channels in (3, 4): # RGB or RGBA
-                if first_conv:
-                    # Swapping BGR of caffe into RGB in mxnet
-                    wmat[:, [0, 2], :, :] = wmat[:, [2, 0], :, :]
-
-            assert(wmat.flags['C_CONTIGUOUS'] is True)
-            sys.stdout.write('converting layer {0}, wmat shape = {1}'.format(
-                layer_name, wmat.shape))
-            if len(layer_blobs) == 2:
-                bias = np.array(layer_blobs[1].data)
-                bias = bias.reshape((bias.shape[0], 1))
-                assert(bias.flags['C_CONTIGUOUS'] is True)
-                bias_name = layer_name + "_bias"
-
-                if bias_name not in arg_shape_dic:
-                    print(bias_name + ' not found in arg_shape_dic.')
-                    continue
-                bias = bias.reshape(arg_shape_dic[bias_name])
-                arg_params[bias_name] = mx.nd.zeros(bias.shape)
-                arg_params[bias_name][:] = bias
-                sys.stdout.write(', bias shape = {}'.format(bias.shape))
-
-            sys.stdout.write('\n')
-            sys.stdout.flush()
-            wmat = wmat.reshape((wmat.shape[0], -1))
-            weight_name = layer_name + "_weight"
-
-            if weight_name not in arg_shape_dic:
-                print(weight_name + ' not found in arg_shape_dic.')
-                continue
-            wmat =
wmat.reshape(arg_shape_dic[weight_name]) - arg_params[weight_name] = mx.nd.zeros(wmat.shape) - arg_params[weight_name][:] = wmat - - if first_conv and layer_type in ('Convolution', 4): - first_conv = False - - elif layer_type == 'Scale': - if 'scale' in layer_name: - bn_name = layer_name.replace('scale', 'bn') - elif 'sc' in layer_name: - bn_name = layer_name.replace('sc', 'bn') - else: - assert False, 'Unknown name convention for bn/scale' - - gamma = np.array(layer_blobs[0].data) - beta = np.array(layer_blobs[1].data) - # beta = np.expand_dims(beta, 1) - beta_name = '{}_beta'.format(bn_name) - gamma_name = '{}_gamma'.format(bn_name) - - beta = beta.reshape(arg_shape_dic[beta_name]) - gamma = gamma.reshape(arg_shape_dic[gamma_name]) - arg_params[beta_name] = mx.nd.zeros(beta.shape) - arg_params[gamma_name] = mx.nd.zeros(gamma.shape) - arg_params[beta_name][:] = beta - arg_params[gamma_name][:] = gamma - - assert gamma.flags['C_CONTIGUOUS'] is True - assert beta.flags['C_CONTIGUOUS'] is True - print('converting scale layer, beta shape = {}, gamma shape = {}'.format( - beta.shape, gamma.shape)) - elif layer_type == 'BatchNorm': - bn_name = layer_name - mean = np.array(layer_blobs[0].data) - var = np.array(layer_blobs[1].data) - rescale_factor = layer_blobs[2].data[0] - if rescale_factor != 0: - rescale_factor = 1 / rescale_factor - mean_name = '{}_moving_mean'.format(bn_name) - var_name = '{}_moving_var'.format(bn_name) - mean = mean.reshape(aux_shape_dic[mean_name]) - var = var.reshape(aux_shape_dic[var_name]) - aux_params[mean_name] = mx.nd.zeros(mean.shape) - aux_params[var_name] = mx.nd.zeros(var.shape) - # Get the original epsilon - for idx, layer in enumerate(layers_proto): - if layer.name == bn_name or re.sub('[-/]', '_', layer.name) == bn_name: - bn_index = idx - eps_caffe = layers_proto[bn_index].batch_norm_param.eps - # Compensate for the epsilon shift performed in convert_symbol - eps_symbol = float(sym.attr_dict()[bn_name + '_moving_mean']['eps']) - eps_correction = eps_caffe - eps_symbol - # Fill parameters - aux_params[mean_name][:] = mean * rescale_factor - aux_params[var_name][:] = var * rescale_factor + eps_correction - assert var.flags['C_CONTIGUOUS'] is True - assert mean.flags['C_CONTIGUOUS'] is True - print('converting batchnorm layer, mean shape = {}, var shape = {}'.format( - mean.shape, var.shape)) - - fix_gamma = layers_proto[bn_index+1].type != 'Scale' - if fix_gamma: - gamma_name = '{}_gamma'.format(bn_name) - gamma = np.array(np.ones(arg_shape_dic[gamma_name])) - beta_name = '{}_beta'.format(bn_name) - beta = np.array(np.zeros(arg_shape_dic[beta_name])) - arg_params[beta_name] = mx.nd.zeros(beta.shape) - arg_params[gamma_name] = mx.nd.zeros(gamma.shape) - arg_params[beta_name][:] = beta - arg_params[gamma_name][:] = gamma - assert gamma.flags['C_CONTIGUOUS'] is True - assert beta.flags['C_CONTIGUOUS'] is True - - else: - print('\tskipping layer {} of type {}'.format(layer_name, layer_type)) - assert len(layer_blobs) == 0 - - if output_prefix is not None: - model = mx.mod.Module(symbol=sym, label_names=[prob_label(arg_names), ]) - model.bind(data_shapes=[('data', tuple(input_dim))]) - model.init_params(arg_params=arg_params, aux_params=aux_params) - model.save_checkpoint(output_prefix, 0) - - return sym, arg_params, aux_params, input_dim - -def main(): - parser = argparse.ArgumentParser( - description='Caffe prototxt to mxnet model parameter converter.') - parser.add_argument('prototxt', help='The prototxt filename') - parser.add_argument('caffemodel', help='The 
binary caffemodel filename') - parser.add_argument('save_model_name', help='The name of the output model prefix') - args = parser.parse_args() - - convert_model(args.prototxt, args.caffemodel, args.save_model_name) - print('Saved model successfully to {}'.format(args.save_model_name)) - -if __name__ == '__main__': - main() diff --git a/tools/caffe_converter/convert_symbol.py b/tools/caffe_converter/convert_symbol.py deleted file mode 100644 index 8faef04fe215..000000000000 --- a/tools/caffe_converter/convert_symbol.py +++ /dev/null @@ -1,333 +0,0 @@ -# Licensed to the Apache Software Foundation (ASF) under one -# or more contributor license agreements. See the NOTICE file -# distributed with this work for additional information -# regarding copyright ownership. The ASF licenses this file -# to you under the Apache License, Version 2.0 (the -# "License"); you may not use this file except in compliance -# with the License. You may obtain a copy of the License at -# -# http://www.apache.org/licenses/LICENSE-2.0 -# -# Unless required by applicable law or agreed to in writing, -# software distributed under the License is distributed on an -# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY -# KIND, either express or implied. See the License for the -# specific language governing permissions and limitations -# under the License. - -"""Convert caffe prototxt to symbol -""" -from __future__ import print_function -import argparse -import re -import mxnet as mx -import caffe_parser - - -def _get_input(proto): - """Get input size - """ - layer = caffe_parser.get_layers(proto) - if len(proto.input_dim) > 0: - input_dim = proto.input_dim - elif len(proto.input_shape) > 0: - input_dim = proto.input_shape[0].dim - elif layer[0].type == "Input": - input_dim = layer[0].input_param.shape[0].dim - layer.pop(0) - else: - raise ValueError('Cannot find input size') - - assert layer[0].type != "Input", 'only support single input' - # We assume the first bottom blob of first layer is the output from data layer - input_name = layer[0].bottom[0] - return input_name, input_dim, layer - -def _convert_conv_param(param): - """ - Convert convolution layer parameter from Caffe to MXNet - """ - param_string = "num_filter=%d" % param.num_output - - pad_w = 0 - pad_h = 0 - if isinstance(param.pad, int): - pad = param.pad - param_string += ", pad=(%d, %d)" % (pad, pad) - else: - if len(param.pad) > 0: - pad = param.pad[0] - param_string += ", pad=(%d, %d)" % (pad, pad) - else: - if isinstance(param.pad_w, int): - pad_w = param.pad_w - if isinstance(param.pad_h, int): - pad_h = param.pad_h - param_string += ", pad=(%d, %d)" % (pad_h, pad_w) - - if isinstance(param.kernel_size, int): - kernel_size = param.kernel_size - param_string += ", kernel=(%d,%d)" % (kernel_size, kernel_size) - else: - if len(param.kernel_size) > 0: - kernel_size = param.kernel_size[0] - param_string += ", kernel=(%d,%d)" % (kernel_size, kernel_size) - else: - assert isinstance(param.kernel_w, int) - kernel_w = param.kernel_w - assert isinstance(param.kernel_h, int) - kernel_h = param.kernel_h - param_string += ", kernel=(%d,%d)" % (kernel_h, kernel_w) - - stride = 1 - if isinstance(param.stride, int): - stride = param.stride - else: - stride = 1 if len(param.stride) == 0 else param.stride[0] - - param_string += ", stride=(%d,%d)" % (stride, stride) - - dilate = 1 - if hasattr(param, 'dilation'): - if isinstance(param.dilation, int): - dilate = param.dilation - else: - dilate = 1 if len(param.dilation) == 0 else param.dilation[0] - - param_string 
+= ", no_bias=%s" % (not param.bias_term) - - # deal with dilation. Won't be in deconvolution - if dilate > 1: - param_string += ", dilate=(%d, %d)" % (dilate, dilate) - - if isinstance(param.group, int): - if param.group != 1: - param_string += ", num_group=%d" % param.group - - return param_string - -def _convert_pooling_param(param): - """Convert the pooling layer parameter - """ - param_string = "pooling_convention='full', " - if param.global_pooling: - param_string += "global_pool=True, kernel=(1,1)" - else: - param_string += "pad=(%d,%d), kernel=(%d,%d), stride=(%d,%d)" % ( - param.pad, param.pad, param.kernel_size, param.kernel_size, - param.stride, param.stride) - if param.pool == 0: - param_string += ", pool_type='max'" - elif param.pool == 1: - param_string += ", pool_type='avg'" - else: - raise ValueError("Unknown Pooling Method!") - return param_string - -def _parse_proto(prototxt_fname): - """Parse Caffe prototxt into symbol string - """ - proto = caffe_parser.read_prototxt(prototxt_fname) - - # process data layer - input_name, input_dim, layers = _get_input(proto) - # only support single input, so always use `data` as the input data - mapping = {input_name: 'data'} - need_flatten = {input_name: False} - symbol_string = "import mxnet as mx\ndata = mx.symbol.Variable(name='data')\n" - - flatten_count = 0 - output_name = "" - prev_name = None - _output_name = {} - - # convert reset layers one by one - for i, layer in enumerate(layers): - type_string = '' - param_string = '' - skip_layer = False - name = re.sub('[-/]', '_', layer.name) - for k in range(len(layer.bottom)): - if layer.bottom[k] in _output_name: - _output_name[layer.bottom[k]]['count'] = _output_name[layer.bottom[k]]['count']+1 - else: - _output_name[layer.bottom[k]] = {'count':0} - for k in range(len(layer.top)): - if layer.top[k] in _output_name: - _output_name[layer.top[k]]['count'] = _output_name[layer.top[k]]['count']+1 - else: - _output_name[layer.top[k]] = {'count':0, 'name':name} - if layer.type == 'Convolution' or layer.type == 4: - type_string = 'mx.symbol.Convolution' - param_string = _convert_conv_param(layer.convolution_param) - need_flatten[name] = True - if layer.type == 'Deconvolution' or layer.type == 39: - type_string = 'mx.symbol.Deconvolution' - param_string = _convert_conv_param(layer.convolution_param) - need_flatten[name] = True - if layer.type == 'Pooling' or layer.type == 17: - type_string = 'mx.symbol.Pooling' - param_string = _convert_pooling_param(layer.pooling_param) - need_flatten[name] = True - if layer.type == 'ReLU' or layer.type == 18: - type_string = 'mx.symbol.Activation' - param_string = "act_type='relu'" - param = layer.relu_param - if hasattr(param, 'negative_slope'): - if param.negative_slope > 0: - type_string = 'mx.symbol.LeakyReLU' - param_string = "act_type='leaky', slope=%f" % param.negative_slope - need_flatten[name] = need_flatten[mapping[layer.bottom[0]]] - if layer.type == 'TanH' or layer.type == 23: - type_string = 'mx.symbol.Activation' - param_string = "act_type='tanh'" - need_flatten[name] = need_flatten[mapping[layer.bottom[0]]] - if layer.type == 'Sigmoid' or layer.type == 19: - type_string = 'mx.symbol.Activation' - param_string = "act_type='sigmoid'" - need_flatten[name] = need_flatten[mapping[layer.bottom[0]]] - if layer.type == 'LRN' or layer.type == 15: - type_string = 'mx.symbol.LRN' - param = layer.lrn_param - param_string = "alpha=%f, beta=%f, knorm=%f, nsize=%d" % ( - param.alpha, param.beta, param.k, param.local_size) - need_flatten[name] = True - if 
layer.type == 'InnerProduct' or layer.type == 14:
-            type_string = 'mx.symbol.FullyConnected'
-            param = layer.inner_product_param
-            param_string = "num_hidden=%d, no_bias=%s" % (
-                param.num_output, not param.bias_term)
-            need_flatten[name] = False
-        if layer.type == 'Dropout' or layer.type == 6:
-            type_string = 'mx.symbol.Dropout'
-            param = layer.dropout_param
-            param_string = "p=%f" % param.dropout_ratio
-            need_flatten[name] = need_flatten[mapping[layer.bottom[0]]]
-        if layer.type == 'Softmax' or layer.type == 20:
-            type_string = 'mx.symbol.SoftmaxOutput'
-        if layer.type == 'Flatten' or layer.type == 8:
-            type_string = 'mx.symbol.Flatten'
-            need_flatten[name] = False
-        if layer.type == 'Split' or layer.type == 22:
-            type_string = 'split'  # will process later
-        if layer.type == 'Concat' or layer.type == 3:
-            type_string = 'mx.symbol.Concat'
-            need_flatten[name] = True
-        if layer.type == 'Crop':
-            type_string = 'mx.symbol.Crop'
-            need_flatten[name] = True
-            param_string = 'center_crop=True'
-        if layer.type == 'BatchNorm':
-            type_string = 'mx.symbol.BatchNorm'
-            param = layer.batch_norm_param
-            # CuDNN requires eps to be greater than 1e-05
-            # We compensate for this change in convert_model
-            epsilon = param.eps
-            if (epsilon <= 1e-05):
-                epsilon = 1e-04
-            # if next layer is scale, don't fix gamma
-            fix_gamma = layers[i+1].type != 'Scale'
-            param_string = 'use_global_stats=%s, fix_gamma=%s, eps=%f' % (
-                param.use_global_stats, fix_gamma, epsilon)
-            need_flatten[name] = need_flatten[mapping[layer.bottom[0]]]
-        if layer.type == 'Scale':
-            assert layers[i-1].type == 'BatchNorm'
-            need_flatten[name] = need_flatten[mapping[layer.bottom[0]]]
-            skip_layer = True
-            prev_name = re.sub('[-/]', '_', layers[i-1].name)
-        if layer.type == 'PReLU':
-            type_string = 'mx.symbol.LeakyReLU'
-            param = layer.prelu_param
-            param_string = "act_type='prelu', slope=%f" % param.filler.value
-            need_flatten[name] = need_flatten[mapping[layer.bottom[0]]]
-        if layer.type == 'Eltwise':
-            type_string = 'mx.symbol.broadcast_add'
-            param = layer.eltwise_param
-            param_string = ""
-            need_flatten[name] = False
-        if layer.type == 'Reshape':
-            type_string = 'mx.symbol.Reshape'
-            need_flatten[name] = False
-            param = layer.reshape_param
-            # shape dims are ints, so stringify them before joining
-            param_string = "shape=(%s)" % (','.join(str(d) for d in param.shape.dim),)
-        if layer.type == 'AbsVal':
-            type_string = 'mx.symbol.abs'
-            need_flatten[name] = need_flatten[mapping[layer.bottom[0]]]
-
-        if skip_layer:
-            assert len(layer.bottom) == 1
-            symbol_string += "%s = %s\n" % (name, prev_name)
-        elif type_string == '':
-            raise ValueError('Unknown layer %s!'
% layer.type) - elif type_string != 'split': - bottom = layer.bottom - if param_string != "": - param_string = ", " + param_string - if len(bottom) == 1: - if need_flatten[mapping[bottom[0]]] and type_string == 'mx.symbol.FullyConnected': - flatten_name = "flatten_%d" % flatten_count - symbol_string += "%s=mx.symbol.Flatten(name='%s', data=%s)\n" % ( - flatten_name, flatten_name, mapping[bottom[0]]) - flatten_count += 1 - need_flatten[flatten_name] = False - bottom[0] = flatten_name - mapping[bottom[0]] = bottom[0] - symbol_string += "%s = %s(name='%s', data=%s %s)\n" % ( - name, type_string, name, mapping[bottom[0]], param_string) - else: - if layer.type == 'Eltwise' and param.operation == 1 and len(param.coeff) > 0: - symbol_string += "%s = " % name - symbol_string += " + ".join(["%s * %s" % ( - mapping[bottom[i]], param.coeff[i]) for i in range(len(param.coeff))]) - symbol_string += "\n" - else: - symbol_string += "%s = %s(name='%s', *[%s] %s)\n" % ( - name, type_string, name, ','.join( - [mapping[x] for x in bottom]), param_string) - for j in range(len(layer.top)): - mapping[layer.top[j]] = name - output_name = name - output_name = [] - for i in _output_name: - if 'name' in _output_name[i] and _output_name[i]['count'] == 0: - output_name.append(_output_name[i]['name']) - - return symbol_string, output_name, input_dim - -def convert_symbol(prototxt_fname): - """Convert caffe model definition into Symbol - - Parameters - ---------- - prototxt_fname : str - Filename of the prototxt file - - Returns - ------- - Symbol - Converted Symbol - tuple - Input shape - """ - sym, output_name, input_dim = _parse_proto(prototxt_fname) - exec(sym) # pylint: disable=exec-used - _locals = locals() - ret = [] - for i in output_name: - exec("ret = " + i, globals(), _locals) # pylint: disable=exec-used - ret.append(_locals['ret']) - ret = mx.sym.Group(ret) - return ret, input_dim - -def main(): - parser = argparse.ArgumentParser( - description='Convert caffe prototxt into Symbol') - parser.add_argument('prototxt', help='The prototxt filename') - parser.add_argument('output', help='filename for the output json file') - args = parser.parse_args() - - sym, _ = convert_symbol(args.prototxt) - sym.save(args.output) - -if __name__ == '__main__': - main() diff --git a/tools/caffe_converter/make_win32.bat b/tools/caffe_converter/make_win32.bat deleted file mode 100644 index e5bc9143e05c..000000000000 --- a/tools/caffe_converter/make_win32.bat +++ /dev/null @@ -1,20 +0,0 @@ -rem Licensed to the Apache Software Foundation (ASF) under one -rem or more contributor license agreements. See the NOTICE file -rem distributed with this work for additional information -rem regarding copyright ownership. The ASF licenses this file -rem to you under the Apache License, Version 2.0 (the -rem "License"); you may not use this file except in compliance -rem with the License. You may obtain a copy of the License at -rem -rem http://www.apache.org/licenses/LICENSE-2.0 -rem -rem Unless required by applicable law or agreed to in writing, -rem software distributed under the License is distributed on an -rem "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY -rem KIND, either express or implied. See the License for the -rem specific language governing permissions and limitations -rem under the License. - -@protoc --python_out=./ ./caffe.proto -@echo done. 
-@pause diff --git a/tools/caffe_converter/run.sh b/tools/caffe_converter/run.sh deleted file mode 100755 index bdf5481624d7..000000000000 --- a/tools/caffe_converter/run.sh +++ /dev/null @@ -1,50 +0,0 @@ -#!/bin/bash - -# Licensed to the Apache Software Foundation (ASF) under one -# or more contributor license agreements. See the NOTICE file -# distributed with this work for additional information -# regarding copyright ownership. The ASF licenses this file -# to you under the Apache License, Version 2.0 (the -# "License"); you may not use this file except in compliance -# with the License. You may obtain a copy of the License at -# -# http://www.apache.org/licenses/LICENSE-2.0 -# -# Unless required by applicable law or agreed to in writing, -# software distributed under the License is distributed on an -# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY -# KIND, either express or implied. See the License for the -# specific language governing permissions and limitations -# under the License. - -if [[ $# -ne 1 ]]; then - echo "usage: $0 model_name" - echo " model_name: [vgg16|vgg19], ..." - exit -1 -fi - -if [[ $1 == "vgg19" ]]; then - if [[ ! -f VGG_ILSVRC_19_layers_deploy.prototxt ]]; then - wget -c https://gist.githubusercontent.com/ksimonyan/3785162f95cd2d5fee77/raw/bb2b4fe0a9bb0669211cf3d0bc949dfdda173e9e/VGG_ILSVRC_19_layers_deploy.prototxt - fi - - if [[ ! -f VGG_ILSVRC_19_layers.caffemodel ]]; then - wget -c http://www.robots.ox.ac.uk/~vgg/software/very_deep/caffe/VGG_ILSVRC_19_layers.caffemodel - fi - - echo "converting" - python `dirname $0`/convert_model.py VGG_ILSVRC_19_layers_deploy.prototxt VGG_ILSVRC_19_layers.caffemodel vgg19 -elif [[ $1 == "vgg16" ]]; then - if [[ ! -f VGG_ILSVRC_16_layers_deploy.prototxt ]]; then - wget -c https://gist.githubusercontent.com/ksimonyan/211839e770f7b538e2d8/raw/c3ba00e272d9f48594acef1f67e5fd12aff7a806/VGG_ILSVRC_16_layers_deploy.prototxt - fi - - if [[ ! -f VGG_ILSVRC_16_layers.caffemodel ]]; then - wget -c http://www.robots.ox.ac.uk/~vgg/software/very_deep/caffe/VGG_ILSVRC_16_layers.caffemodel - fi - - echo "converting" - python `dirname $0`/convert_model.py VGG_ILSVRC_16_layers_deploy.prototxt VGG_ILSVRC_16_layers.caffemodel vgg16 -else - echo "unsupported model: $1" -fi diff --git a/tools/caffe_converter/test_converter.py b/tools/caffe_converter/test_converter.py deleted file mode 100644 index 880de1be449f..000000000000 --- a/tools/caffe_converter/test_converter.py +++ /dev/null @@ -1,110 +0,0 @@ -#!/usr/bin/env python3 -# -*- coding: utf-8 -*- -# -# Licensed to the Apache Software Foundation (ASF) under one -# or more contributor license agreements. See the NOTICE file -# distributed with this work for additional information -# regarding copyright ownership. The ASF licenses this file -# to you under the Apache License, Version 2.0 (the -# "License"); you may not use this file except in compliance -# with the License. You may obtain a copy of the License at -# -# http://www.apache.org/licenses/LICENSE-2.0 -# -# Unless required by applicable law or agreed to in writing, -# software distributed under the License is distributed on an -# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY -# KIND, either express or implied. See the License for the -# specific language governing permissions and limitations -# under the License. 
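-# A typical invocation of this test script; a sketch only, the flags are
-# defined in main() below:
-#
-#   python test_converter.py --cpu
-#
-# Without --cpu the test runs on all available GPUs, and --image_url can point
-# the layer-comparison test at a different image file or URL.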
- -"""Test converted models -""" -import os -import argparse -import sys -import logging -import mxnet as mx -from convert_caffe_modelzoo import convert_caffe_model, get_model_meta_info, download_caffe_model -from compare_layers import convert_and_compare_caffe_to_mxnet - -curr_path = os.path.abspath(os.path.dirname(__file__)) -sys.path.append(os.path.join(curr_path, "../../example/image-classification")) -from test_score import download_data # pylint: disable=wrong-import-position -from score import score # pylint: disable=wrong-import-position -logging.basicConfig(level=logging.DEBUG) - -def test_imagenet_model_performance(model_name, val_data, gpus, batch_size): - """test model performance on imagenet """ - logging.info('test performance of model: %s', model_name) - meta_info = get_model_meta_info(model_name) - [model_name, mean] = convert_caffe_model(model_name, meta_info) - sym, arg_params, aux_params = mx.model.load_checkpoint(model_name, 0) - acc = [mx.gluon.metric.create('acc'), mx.gluon.metric.create('top_k_accuracy', top_k=5)] - if isinstance(mean, str): - mean_args = {'mean_img':mean} - else: - mean_args = {'rgb_mean':','.join([str(i) for i in mean])} - - print(val_data) - gpus_string = '' if gpus[0] == -1 else ','.join([str(i) for i in gpus]) - (speed,) = score(model=(sym, arg_params, aux_params), - data_val=val_data, - label_name='prob_label', - metrics=acc, - gpus=gpus_string, - batch_size=batch_size, - max_num_examples=500, - **mean_args) - logging.info('speed : %f image/sec', speed) - for a in acc: - logging.info(a.get()) - max_performance_diff_allowed = 0.03 - assert acc[0].get()[1] > meta_info['top-1-acc'] - max_performance_diff_allowed - assert acc[1].get()[1] > meta_info['top-5-acc'] - max_performance_diff_allowed - - -def test_model_weights_and_outputs(model_name, image_url, gpu): - """ - Run the layer comparison on one of the known caffe models. 
:param model_name: available models are listed in convert_caffe_modelzoo.py
-    :param image_url: image file or url to run inference on
-    :param gpu: gpu to use, -1 for cpu
-    """
-
-    logging.info('test weights and outputs of model: %s', model_name)
-    meta_info = get_model_meta_info(model_name)
-
-    (prototxt, caffemodel, mean) = download_caffe_model(model_name, meta_info, dst_dir='./model')
-    convert_and_compare_caffe_to_mxnet(image_url, gpu, prototxt, caffemodel, mean,
-                                       mean_diff_allowed=1e-03, max_diff_allowed=1e-01)
-
-
-def main():
-    """Entrypoint for test_converter"""
-    parser = argparse.ArgumentParser(description='Test Caffe converter')
-    parser.add_argument('--cpu', action='store_true', help='use cpu?')
-    parser.add_argument('--image_url', type=str,
-                        default='https://github.com/dmlc/web-data/raw/master/mxnet/doc/'\
-                                'tutorials/python/predict_image/cat.jpg',
-                        help='input image to test inference, can be either file path or url')
-    args = parser.parse_args()
-    if args.cpu:
-        gpus = [-1]
-        default_batch_size = 32
-    else:
-        num_gpus = mx.context.num_gpus()
-        assert num_gpus, 'At least one GPU is needed to run test_converter in GPU mode'
-        gpus = list(range(num_gpus))  # run on all available GPUs
-        default_batch_size = 32 * num_gpus
-
-    models = ['bvlc_googlenet', 'vgg-16', 'resnet-50']
-
-    val = download_data()
-    for m in models:
-        test_model_weights_and_outputs(m, args.image_url, gpus[0])
-        # Build/testing machines tend to be short on GPU memory;
-        # integer division keeps the batch size an int
-        this_batch_size = default_batch_size // 4 if m == 'vgg-16' else default_batch_size
-        test_imagenet_model_performance(m, val, gpus, this_batch_size)
-
-if __name__ == '__main__':
-    main()
diff --git a/tools/caffe_translator/README.md b/tools/caffe_translator/README.md
deleted file mode 100644
index c21ec50d2d7a..000000000000
--- a/tools/caffe_translator/README.md
+++ /dev/null
@@ -1,96 +0,0 @@
- - - - - - - - - - - - - - - - -
-# Caffe Translator
-Caffe Translator is a migration tool that helps developers migrate their existing Caffe code to MXNet and continue further development using MXNet. Note that this is different from the Caffe to MXNet model converter, which is available [here](https://github.com/apache/incubator-mxnet/tree/master/tools/caffe_converter).
-
-Caffe Translator takes the training/validation prototxt ([example](https://github.com/BVLC/caffe/blob/master/examples/mnist/lenet_train_test.prototxt)) and solver prototxt ([example](https://github.com/BVLC/caffe/blob/master/examples/mnist/lenet_solver.prototxt)) as input and produces MXNet Python code ([example](https://www.caffetranslator.org/examples/lenet/lenet_translated.py)) as output. The translated Python code uses the MXNet Symbol and Module API to build the network, reads data from LMDB files ([example](https://www.caffetranslator.org/datasets/mnist.tar.gz)), runs training and saves the trained model using the MXNet Module API ([example](https://www.caffetranslator.org/examples/lenet/lenet_saved_model.tar.gz)).
-
-### How to use
-
-#### Get the translator:
-Download the Caffe Translator from the Maven [repository](https://mvnrepository.com/artifact/org.caffetranslator/caffe-translator) or [build](build_from_source.md) it from source. A Java Runtime Environment (JRE) is required to run the translator.
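-For instance, the jar for the version used in the example below can be fetched directly from Maven Central (a sketch; the URL follows Maven's standard layout for the `org.caffetranslator` group and should be checked against the repository page):
-
-```
-wget https://repo1.maven.org/maven2/org/caffetranslator/caffe-translator/0.9.1/caffe-translator-0.9.1.jar
-```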
-
-#### Translate code:
-To translate `train_val.prototxt` and `solver.prototxt` to MXNet Python code, run the following command:
-```
-java -jar caffe-translator-<version>.jar --training-prototxt <train_prototxt_path> \
-    --solver <solver_prototxt_path> \
-    --output-file <output_file_path>
-```
-Example:
-```
-java -jar caffe-translator-0.9.1.jar --training-prototxt lenet_train_test.prototxt \
-    --solver lenet_solver.prototxt \
-    --output-file translated_code.py
-```
-
-Here is the list of command line parameters accepted by the Caffe Translator:
-- *training-prototxt*: specifies the path to the training/validation prototxt to be translated.
-- *solver-prototxt*: specifies the path to the solver prototxt to be translated.
-- *output-file*: specifies the file to write the translated output into.
-- *params-file* (optional): specifies the .caffemodel file to initialize parameters from.
-- *custom-data-layers* (optional): specifies a comma-separated list of types of the custom data layers used in the prototxt. The translator will use [`CaffeDataIter`](https://mxnet.apache.org/faq/caffe.html#use-io-caffedataiter) to translate these layers to MXNet.
-
-**Note:** The translated code uses [`CaffeDataIter`](https://mxnet.apache.org/faq/caffe.html#use-io-caffedataiter) to read from LMDB files. `CaffeDataIter` requires the number of examples in the LMDB file to be specified as a parameter. You can provide this information before translation using a `#CaffeToMXNet` directive, as shown below:
-
-```
-    data_param {
-        source: "data/mnist/mnist_train_lmdb"
-        #CaffeToMXNet num_examples: 60000
-        batch_size: 64
-        backend: LMDB
-    }
-```
-
-#### Run the translated code:
-
-The following prerequisites are required to run the translated code:
-1. Caffe with MXNet interface ([Why?](faq.md#why_caffe) [How to build?](https://github.com/apache/incubator-mxnet/tree/master/plugin/caffe#install-caffe-with-mxnet-interface))
-2. MXNet with Caffe plugin ([How to build?](https://github.com/apache/incubator-mxnet/tree/master/plugin/caffe#compile-with-caffe))
-3. The dataset in LMDB format.
-
-Once the prerequisites are installed, the translated Python code can be run like any other Python code:
-
-Example:
-```
-python translated_code.py
-```
-
-### What layers are supported?
-
-Caffe Translator can currently translate the following layers:
-
-- Accuracy and Top-k
-- Batch Normalization
-- Concat
-- Convolution
-- Data*
-- Deconvolution
-- Eltwise
-- Inner Product (Fully Connected layer)
-- Flatten
-- Permute
-- Pooling
-- Power
-- Relu
-- Scale*
-- SoftmaxOutput
-
-* Uses [CaffePlugin](https://github.com/apache/incubator-mxnet/tree/master/plugin/caffe)
-
-If you want Caffe Translator to translate a layer that is not in the above list, please create an [issue](https://github.com/apache/incubator-mxnet/issues/new).
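-To also initialize the translated network from pretrained weights, add the optional *params-file* flag described above to the same invocation (a sketch; the `.caffemodel` file name is hypothetical):
-
-```
-java -jar caffe-translator-0.9.1.jar --training-prototxt lenet_train_test.prototxt \
-    --solver lenet_solver.prototxt \
-    --params-file lenet_pretrained.caffemodel \
-    --output-file translated_code.py
-```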
diff --git a/tools/caffe_translator/build.gradle b/tools/caffe_translator/build.gradle deleted file mode 100644 index da5e9003a103..000000000000 --- a/tools/caffe_translator/build.gradle +++ /dev/null @@ -1,152 +0,0 @@ -import org.gradle.api.artifacts.maven.MavenDeployment - -apply plugin: 'com.github.johnrengelman.shadow' -apply plugin: 'java' - -apply plugin: 'antlr' -apply plugin: 'application' - -apply plugin: 'maven' -apply plugin: 'signing' - -group 'org.caffetranslator' -version '0.9.2' - -def isReleaseBuild -def repositoryUrl - -if(hasProperty("release")) { - isReleaseBuild = true - repositoryUrl = stagingRepositoryUrl -} else if(hasProperty("CI")) { - repositoryUrl = snapshotRepositoryUrl - version += "-SNAPSHOT" -} - -buildscript { - repositories { - jcenter() - } - dependencies { - classpath 'com.github.jengelman.gradle.plugins:shadow:2.0.1' - } -} - -sourceCompatibility = 1.8 - -repositories { - mavenCentral() -} - -dependencies { - antlr "org.antlr:antlr4:$antlrVersion" - compile group: 'commons-cli', name: 'commons-cli', version: '1.4' - compileOnly 'org.projectlombok:lombok:1.16.18' - testCompile group: 'junit', name: 'junit', version: '4.12' -} - -generateGrammarSource { - arguments += ['-visitor'] -} - -jar { - baseName = 'caffe-translator' - appendix = 'slim' - version = version - manifest { - attributes 'Main-Class': 'io.mxnet.caffetranslator.Launcher' - } -} - -task javadocJar(type: Jar) { - classifier = 'javadoc' - from javadoc -} - -task sourcesJar(type: Jar) { - classifier = 'sources' - from sourceSets.main.allSource -} - -shadowJar { - baseName = 'caffe-translator' - classifier = '' - version = version -} - -configurations { - releaseJars - ascSignatures -} - -artifacts { - releaseJars shadowJar - releaseJars sourcesJar - releaseJars javadocJar -} - -if(isReleaseBuild) { - signing { - sign configurations.releaseJars - } -} else { - task signReleaseJars { - //no-op - } -} - -uploadShadow { - repositories { - mavenDeployer { - beforeDeployment { MavenDeployment deployment -> - if(isReleaseBuild) { - signing.signPom(deployment) - } - configurations.releaseJars.artifacts.each { ra -> - def ascfile = file(ra.file.path + '.asc') - def ascArtifact = project.artifacts.add('ascSignatures', ascfile) { - classifier = ra.classifier - extension = ra.extension + '.asc' - type = ra.type + '.asc' - } - deployment.addArtifact(ra) - deployment.addArtifact(ascArtifact) - } - } - - repository(url: repositoryUrl) { - authentication(userName: ossrhUsername, password: ossrhPassword) - } - - pom.project { - name 'Caffe Translator' - packaging 'jar' - description 'Translate Caffe code to MXNet Python code' - url 'http://caffetranslator.org' - - licenses { - license { - name 'The Apache Software License, Version 2.0' - url 'http://www.apache.org/licenses/LICENSE-2.0.txt' - distribution 'repo' - } - } - - developers { - developer { - name 'Indu Bharathi' - email 'indhub@apache.org' - } - } - - scm { - connection 'scm:git:git://github.com:apache/incubator-mxnet.git' - developerConnection 'scm:git:git@github.com:apache/incubator-mxnet.git' - url 'https://github.com/apache/incubator-mxnet.git' - } - } - } - } -} - -mainClassName = "io.mxnet.caffetranslator.Launcher" diff --git a/tools/caffe_translator/build_from_source.md b/tools/caffe_translator/build_from_source.md deleted file mode 100644 index c08a423a44e7..000000000000 --- a/tools/caffe_translator/build_from_source.md +++ /dev/null @@ -1,38 +0,0 @@ - - - - - - - - - - - - - - - - - -### Build Caffe Translator from source - -#### 
Prerequisites:
-- JDK
-
-#### Instructions to build
-
-Step 1: Clone the code:
-```
-git clone https://github.com/apache/incubator-mxnet.git mxnet
-```
-Step 2: Change to the Caffe Translator directory:
-```
-cd mxnet/tools/caffe_translator/
-```
-Step 3: Build:
-```
-gradle build
-```
-
-Caffe Translator will be built at `build/libs/caffe-translator-<version>.jar`
diff --git a/tools/caffe_translator/faq.md b/tools/caffe_translator/faq.md
deleted file mode 100644
index 5fcdb2fa9145..000000000000
--- a/tools/caffe_translator/faq.md
+++ /dev/null
@@ -1,34 +0,0 @@
- - - - - - - - - - - - - - - - -
-### Frequently asked questions
-
-[**Why is Caffe required to run the translated code?**](#why_caffe)
-
-There are a couple of reasons why Caffe is required to run the translated code:
-
-1. The translator does not convert Caffe data layers to native MXNet code because MXNet cannot read from LMDB files. The translator instead generates code that uses [`CaffeDataIter`](https://mxnet.apache.org/faq/caffe.html#use-io-caffedataiter), which can read LMDB files. `CaffeDataIter` needs Caffe to run.
-
-2. If the Caffe code to be translated uses custom layers, or layers that don't have equivalent MXNet layers, the translator will generate code that uses [CaffeOp](https://mxnet.apache.org/faq/caffe.html#use-sym-caffeop). CaffeOp needs Caffe to run.
-
-[**What version of Caffe prototxt can the translator translate?**](#what_version_of_prototxt)
-
-Caffe Translator supports the `proto2` syntax.
-
-[**Can the translator translate Caffe 2 code?**](#caffe_2_support)
-
-No. At the moment, only Caffe is supported.
diff --git a/tools/caffe_translator/gradle.properties b/tools/caffe_translator/gradle.properties
deleted file mode 100644
index 7e5363bb8577..000000000000
--- a/tools/caffe_translator/gradle.properties
+++ /dev/null
@@ -1,29 +0,0 @@
-# Licensed to the Apache Software Foundation (ASF) under one
-# or more contributor license agreements. See the NOTICE file
-# distributed with this work for additional information
-# regarding copyright ownership. The ASF licenses this file
-# to you under the Apache License, Version 2.0 (the
-# "License"); you may not use this file except in compliance
-# with the License. You may obtain a copy of the License at
-#
-# http://www.apache.org/licenses/LICENSE-2.0
-#
-# Unless required by applicable law or agreed to in writing,
-# software distributed under the License is distributed on an
-# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
-# KIND, either express or implied. See the License for the
-# specific language governing permissions and limitations
-# under the License.
-
-antlrVersion=4.7
-
-signing.keyId=
-signing.password=
-signing.secretKeyRingFile=
-
-snapshotRepositoryUrl=https://oss.sonatype.org/content/repositories/snapshots
-stagingRepositoryUrl=https://oss.sonatype.org/service/local/staging/deploy/maven2
-
-ossrhUsername=
-ossrhPassword=
-
diff --git a/tools/caffe_translator/gradle/wrapper/gradle-wrapper.properties b/tools/caffe_translator/gradle/wrapper/gradle-wrapper.properties
deleted file mode 100644
index a4daf1d7f29e..000000000000
--- a/tools/caffe_translator/gradle/wrapper/gradle-wrapper.properties
+++ /dev/null
@@ -1,22 +0,0 @@
-# Licensed to the Apache Software Foundation (ASF) under one
-# or more contributor license agreements. See the NOTICE file
-# distributed with this work for additional information
-# regarding copyright ownership.
The ASF licenses this file -# to you under the Apache License, Version 2.0 (the -# "License"); you may not use this file except in compliance -# with the License. You may obtain a copy of the License at -# -# http://www.apache.org/licenses/LICENSE-2.0 -# -# Unless required by applicable law or agreed to in writing, -# software distributed under the License is distributed on an -# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY -# KIND, either express or implied. See the License for the -# specific language governing permissions and limitations -# under the License. - -distributionBase=GRADLE_USER_HOME -distributionPath=wrapper/dists -zipStoreBase=GRADLE_USER_HOME -zipStorePath=wrapper/dists -distributionUrl=https\://services.gradle.org/distributions/gradle-4.3.1-bin.zip diff --git a/tools/caffe_translator/gradlew b/tools/caffe_translator/gradlew deleted file mode 100755 index 07cc91546635..000000000000 --- a/tools/caffe_translator/gradlew +++ /dev/null @@ -1,189 +0,0 @@ -#!/usr/bin/env sh - -# Licensed to the Apache Software Foundation (ASF) under one -# or more contributor license agreements. See the NOTICE file -# distributed with this work for additional information -# regarding copyright ownership. The ASF licenses this file -# to you under the Apache License, Version 2.0 (the -# "License"); you may not use this file except in compliance -# with the License. You may obtain a copy of the License at -# -# http://www.apache.org/licenses/LICENSE-2.0 -# -# Unless required by applicable law or agreed to in writing, -# software distributed under the License is distributed on an -# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY -# KIND, either express or implied. See the License for the -# specific language governing permissions and limitations -# under the License. - -############################################################################## -## -## Gradle start up script for UN*X -## -############################################################################## - -# Attempt to set APP_HOME -# Resolve links: $0 may be a link -PRG="$0" -# Need this for relative symlinks. -while [ -h "$PRG" ] ; do - ls=`ls -ld "$PRG"` - link=`expr "$ls" : '.*-> \(.*\)$'` - if expr "$link" : '/.*' > /dev/null; then - PRG="$link" - else - PRG=`dirname "$PRG"`"/$link" - fi -done -SAVED="`pwd`" -cd "`dirname \"$PRG\"`/" >/dev/null -APP_HOME="`pwd -P`" -cd "$SAVED" >/dev/null - -APP_NAME="Gradle" -APP_BASE_NAME=`basename "$0"` - -# Add default JVM options here. You can also use JAVA_OPTS and GRADLE_OPTS to pass JVM options to this script. -DEFAULT_JVM_OPTS="" - -# Use the maximum available, or set MAX_FD != -1 to use that value. -MAX_FD="maximum" - -warn () { - echo "$*" -} - -die () { - echo - echo "$*" - echo - exit 1 -} - -# OS specific support (must be 'true' or 'false'). -cygwin=false -msys=false -darwin=false -nonstop=false -case "`uname`" in - CYGWIN* ) - cygwin=true - ;; - Darwin* ) - darwin=true - ;; - MINGW* ) - msys=true - ;; - NONSTOP* ) - nonstop=true - ;; -esac - -CLASSPATH=$APP_HOME/gradle/wrapper/gradle-wrapper.jar - -# Determine the Java command to use to start the JVM. -if [ -n "$JAVA_HOME" ] ; then - if [ -x "$JAVA_HOME/jre/sh/java" ] ; then - # IBM's JDK on AIX uses strange locations for the executables - JAVACMD="$JAVA_HOME/jre/sh/java" - else - JAVACMD="$JAVA_HOME/bin/java" - fi - if [ ! 
-x "$JAVACMD" ] ; then - die "ERROR: JAVA_HOME is set to an invalid directory: $JAVA_HOME - -Please set the JAVA_HOME variable in your environment to match the -location of your Java installation." - fi -else - JAVACMD="java" - which java >/dev/null 2>&1 || die "ERROR: JAVA_HOME is not set and no 'java' command could be found in your PATH. - -Please set the JAVA_HOME variable in your environment to match the -location of your Java installation." -fi - -# Increase the maximum file descriptors if we can. -if [ "$cygwin" = "false" -a "$darwin" = "false" -a "$nonstop" = "false" ] ; then - MAX_FD_LIMIT=`ulimit -H -n` - if [ $? -eq 0 ] ; then - if [ "$MAX_FD" = "maximum" -o "$MAX_FD" = "max" ] ; then - MAX_FD="$MAX_FD_LIMIT" - fi - ulimit -n $MAX_FD - if [ $? -ne 0 ] ; then - warn "Could not set maximum file descriptor limit: $MAX_FD" - fi - else - warn "Could not query maximum file descriptor limit: $MAX_FD_LIMIT" - fi -fi - -# For Darwin, add options to specify how the application appears in the dock -if $darwin; then - GRADLE_OPTS="$GRADLE_OPTS \"-Xdock:name=$APP_NAME\" \"-Xdock:icon=$APP_HOME/media/gradle.icns\"" -fi - -# For Cygwin, switch paths to Windows format before running java -if $cygwin ; then - APP_HOME=`cygpath --path --mixed "$APP_HOME"` - CLASSPATH=`cygpath --path --mixed "$CLASSPATH"` - JAVACMD=`cygpath --unix "$JAVACMD"` - - # We build the pattern for arguments to be converted via cygpath - ROOTDIRSRAW=`find -L / -maxdepth 1 -mindepth 1 -type d 2>/dev/null` - SEP="" - for dir in $ROOTDIRSRAW ; do - ROOTDIRS="$ROOTDIRS$SEP$dir" - SEP="|" - done - OURCYGPATTERN="(^($ROOTDIRS))" - # Add a user-defined pattern to the cygpath arguments - if [ "$GRADLE_CYGPATTERN" != "" ] ; then - OURCYGPATTERN="$OURCYGPATTERN|($GRADLE_CYGPATTERN)" - fi - # Now convert the arguments - kludge to limit ourselves to /bin/sh - i=0 - for arg in "$@" ; do - CHECK=`echo "$arg"|egrep -c "$OURCYGPATTERN" -` - CHECK2=`echo "$arg"|egrep -c "^-"` ### Determine if an option - - if [ $CHECK -ne 0 ] && [ $CHECK2 -eq 0 ] ; then ### Added a condition - eval `echo args$i`=`cygpath --path --ignore --mixed "$arg"` - else - eval `echo args$i`="\"$arg\"" - fi - i=$((i+1)) - done - case $i in - (0) set -- ;; - (1) set -- "$args0" ;; - (2) set -- "$args0" "$args1" ;; - (3) set -- "$args0" "$args1" "$args2" ;; - (4) set -- "$args0" "$args1" "$args2" "$args3" ;; - (5) set -- "$args0" "$args1" "$args2" "$args3" "$args4" ;; - (6) set -- "$args0" "$args1" "$args2" "$args3" "$args4" "$args5" ;; - (7) set -- "$args0" "$args1" "$args2" "$args3" "$args4" "$args5" "$args6" ;; - (8) set -- "$args0" "$args1" "$args2" "$args3" "$args4" "$args5" "$args6" "$args7" ;; - (9) set -- "$args0" "$args1" "$args2" "$args3" "$args4" "$args5" "$args6" "$args7" "$args8" ;; - esac -fi - -# Escape application args -save () { - for i do printf %s\\n "$i" | sed "s/'/'\\\\''/g;1s/^/'/;\$s/\$/' \\\\/" ; done - echo " " -} -APP_ARGS=$(save "$@") - -# Collect all arguments for the java command, following the shell quoting and substitution rules -eval set -- $DEFAULT_JVM_OPTS $JAVA_OPTS $GRADLE_OPTS "\"-Dorg.gradle.appname=$APP_BASE_NAME\"" -classpath "\"$CLASSPATH\"" org.gradle.wrapper.GradleWrapperMain "$APP_ARGS" - -# by default we should be in the correct project dir, but when run from Finder on Mac, the cwd is wrong -if [ "$(uname)" = "Darwin" ] && [ "$HOME" = "$PWD" ]; then - cd "$(dirname "$0")" -fi - -exec "$JAVACMD" "$@" diff --git a/tools/caffe_translator/gradlew.bat b/tools/caffe_translator/gradlew.bat deleted file mode 100644 index 
a1c49a365a0e..000000000000 --- a/tools/caffe_translator/gradlew.bat +++ /dev/null @@ -1,101 +0,0 @@ -rem Licensed to the Apache Software Foundation (ASF) under one -rem or more contributor license agreements. See the NOTICE file -rem distributed with this work for additional information -rem regarding copyright ownership. The ASF licenses this file -rem to you under the Apache License, Version 2.0 (the -rem "License"); you may not use this file except in compliance -rem with the License. You may obtain a copy of the License at -rem -rem http://www.apache.org/licenses/LICENSE-2.0 -rem -rem Unless required by applicable law or agreed to in writing, -rem software distributed under the License is distributed on an -rem "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY -rem KIND, either express or implied. See the License for the -rem specific language governing permissions and limitations -rem under the License. - -@if "%DEBUG%" == "" @echo off -@rem ########################################################################## -@rem -@rem Gradle startup script for Windows -@rem -@rem ########################################################################## - -@rem Set local scope for the variables with windows NT shell -if "%OS%"=="Windows_NT" setlocal - -set DIRNAME=%~dp0 -if "%DIRNAME%" == "" set DIRNAME=. -set APP_BASE_NAME=%~n0 -set APP_HOME=%DIRNAME% - -@rem Add default JVM options here. You can also use JAVA_OPTS and GRADLE_OPTS to pass JVM options to this script. -set DEFAULT_JVM_OPTS= - -@rem Find java.exe -if defined JAVA_HOME goto findJavaFromJavaHome - -set JAVA_EXE=java.exe -%JAVA_EXE% -version >NUL 2>&1 -if "%ERRORLEVEL%" == "0" goto init - -echo. -echo ERROR: JAVA_HOME is not set and no 'java' command could be found in your PATH. -echo. -echo Please set the JAVA_HOME variable in your environment to match the -echo location of your Java installation. - -goto fail - -:findJavaFromJavaHome -set JAVA_HOME=%JAVA_HOME:"=% -set JAVA_EXE=%JAVA_HOME%/bin/java.exe - -if exist "%JAVA_EXE%" goto init - -echo. -echo ERROR: JAVA_HOME is set to an invalid directory: %JAVA_HOME% -echo. -echo Please set the JAVA_HOME variable in your environment to match the -echo location of your Java installation. - -goto fail - -:init -@rem Get command-line arguments, handling Windows variants - -if not "%OS%" == "Windows_NT" goto win9xME_args - -:win9xME_args -@rem Slurp the command line arguments. -set CMD_LINE_ARGS= -set _SKIP=2 - -:win9xME_args_slurp -if "x%~1" == "x" goto execute - -set CMD_LINE_ARGS=%* - -:execute -@rem Setup the command line - -set CLASSPATH=%APP_HOME%\gradle\wrapper\gradle-wrapper.jar - -@rem Execute Gradle -"%JAVA_EXE%" %DEFAULT_JVM_OPTS% %JAVA_OPTS% %GRADLE_OPTS% "-Dorg.gradle.appname=%APP_BASE_NAME%" -classpath "%CLASSPATH%" org.gradle.wrapper.GradleWrapperMain %CMD_LINE_ARGS% - -:end -@rem End local scope for the variables with windows NT shell -if "%ERRORLEVEL%"=="0" goto mainEnd - -:fail -rem Set variable GRADLE_EXIT_CONSOLE if you need the _script_ return code instead of -rem the _cmd.exe /c_ return code! 
-if not "" == "%GRADLE_EXIT_CONSOLE%" exit 1 -exit /b 1 - -:mainEnd -if "%OS%"=="Windows_NT" endlocal - -:omega diff --git a/tools/caffe_translator/scripts/convert_caffe_model.py b/tools/caffe_translator/scripts/convert_caffe_model.py deleted file mode 100644 index d7f13c4677de..000000000000 --- a/tools/caffe_translator/scripts/convert_caffe_model.py +++ /dev/null @@ -1,121 +0,0 @@ -# Licensed to the Apache Software Foundation (ASF) under one -# or more contributor license agreements. See the NOTICE file -# distributed with this work for additional information -# regarding copyright ownership. The ASF licenses this file -# to you under the Apache License, Version 2.0 (the -# "License"); you may not use this file except in compliance -# with the License. You may obtain a copy of the License at -# -# http://www.apache.org/licenses/LICENSE-2.0 -# -# Unless required by applicable law or agreed to in writing, -# software distributed under the License is distributed on an -# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY -# KIND, either express or implied. See the License for the -# specific language governing permissions and limitations -# under the License. - -# coding: utf-8 -"""Script to convert Caffe .modelfile to MXNet .params file""" -from __future__ import print_function -import argparse -import mxnet as mx - -import caffe -from caffe.proto import caffe_pb2 - -class CaffeModelConverter(object): - """Converts Caffe .modelfile to MXNet .params file""" - def __init__(self): - self.dict_param = {} - self.layers = None - - def add_param(self, param_name, layer_index, blob_index): - """Add a param to the .params file""" - blobs = self.layers[layer_index].blobs - self.dict_param[param_name] = mx.nd.array(caffe.io.blobproto_to_array(blobs[blob_index])) - - def add_arg_param(self, param_name, layer_index, blob_index): - """Add an arg param to .params file. Example: weights of a fully connected layer.""" - self.add_param('arg:%s' % param_name, layer_index, blob_index) - - def add_aux_param(self, param_name, layer_index, blob_index): - """Add an aux param to .params file. Example: moving_mean in BatchNorm layer """ - self.add_param('aux:%s' % param_name, layer_index, blob_index) - - def add_optional_arg_param(self, param_name, layer_index, blob_index): - """Add an arg param. 
If there is no such param in the .caffemodel file, silently ignore it."""
-        blobs = self.layers[layer_index].blobs
-        if blob_index < len(blobs):
-            self.add_arg_param(param_name, layer_index, blob_index)
-
-    def convert(self, caffemodel_path, outmodel_path):
-        """Convert a Caffe .caffemodel file to MXNet .params file"""
-        net_param = caffe_pb2.NetParameter()
-        with open(caffemodel_path, 'rb') as caffe_model_file:
-            net_param.ParseFromString(caffe_model_file.read())
-
-        layers = net_param.layer
-        self.layers = layers
-
-        for idx, layer in enumerate(layers):
-            layer_name = str(layer.name)
-
-            if layer.blobs:
-
-                # If this is a layer that has only weight and bias as parameter
-                if layer.type == 'Convolution' or layer.type == 'InnerProduct' \
-                        or layer.type == 'Deconvolution':
-
-                    # Add weight and bias to the dictionary
-                    self.add_arg_param('%s_weight' % layer_name, layer_index=idx, blob_index=0)
-                    self.add_optional_arg_param('%s_bias' % layer_name, layer_index=idx,
-                                                blob_index=1)
-
-                elif layer.type == 'BatchNorm':
-
-                    gamma_param_name = '%s_gamma' % layer_name
-                    beta_param_name = '%s_beta' % layer_name
-
-                    next_layer = layers[idx + 1]
-
-                    if next_layer.type == 'Scale':
-                        # If next layer is scale layer, get gamma and beta from there
-                        self.add_arg_param(gamma_param_name, layer_index=idx+1, blob_index=0)
-                        self.add_arg_param(beta_param_name, layer_index=idx+1, blob_index=1)
-
-                    mean_param_name = '%s_moving_mean' % layer_name
-                    var_param_name = '%s_moving_var' % layer_name
-
-                    self.add_aux_param(mean_param_name, layer_index=idx, blob_index=0)
-                    self.add_aux_param(var_param_name, layer_index=idx, blob_index=1)
-
-                elif layer.type == 'Scale':
-
-                    prev_layer = layers[idx - 1]
-
-                    if prev_layer.type == 'BatchNorm':
-                        continue
-                    else:
-                        # Use the naming convention used by CaffeOp
-                        self.add_arg_param('%s_0_weight' % layer_name, layer_index=idx,
-                                           blob_index=0)
-                        self.add_optional_arg_param('%s_1_bias' % layer_name,
-                                                    layer_index=idx, blob_index=1)
-
-        mx.nd.save(outmodel_path, self.dict_param)
-
-def main():
-    """Read .caffemodel path and .params path as input from command line
-    and use CaffeModelConverter to do the conversion"""
-    parser = argparse.ArgumentParser(description='.caffemodel to MXNet .params converter.')
-    parser.add_argument('caffemodel', help='Path to the .caffemodel file to convert.')
-    parser.add_argument('output_file_name', help='Name of the output .params file.')
-
-    args = parser.parse_args()
-
-    converter = CaffeModelConverter()
-    converter.convert(args.caffemodel, args.output_file_name)
-
-if __name__ == '__main__':
-    main()
diff --git a/tools/caffe_translator/settings.gradle b/tools/caffe_translator/settings.gradle
deleted file mode 100644
index fb333ba9274c..000000000000
--- a/tools/caffe_translator/settings.gradle
+++ /dev/null
@@ -1,19 +0,0 @@
-// Licensed to the Apache Software Foundation (ASF) under one
-// or more contributor license agreements. See the NOTICE file
-// distributed with this work for additional information
-// regarding copyright ownership. The ASF licenses this file
-// to you under the Apache License, Version 2.0 (the
-// "License"); you may not use this file except in compliance
-// with the License. You may obtain a copy of the License at
-//
-// http://www.apache.org/licenses/LICENSE-2.0
-//
-// Unless required by applicable law or agreed to in writing,
-// software distributed under the License is distributed on an
-// "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
-// KIND, either express or implied.
See the License for the -// specific language governing permissions and limitations -// under the License. - -rootProject.name = 'caffetranslator' - diff --git a/tools/caffe_translator/src/main/antlr/io/mxnet/caffetranslator/CaffePrototxt.g4 b/tools/caffe_translator/src/main/antlr/io/mxnet/caffetranslator/CaffePrototxt.g4 deleted file mode 100644 index 10093825e4d0..000000000000 --- a/tools/caffe_translator/src/main/antlr/io/mxnet/caffetranslator/CaffePrototxt.g4 +++ /dev/null @@ -1,74 +0,0 @@ -/* - * Licensed to the Apache Software Foundation (ASF) under one - * or more contributor license agreements. See the NOTICE file - * distributed with this work for additional information - * regarding copyright ownership. The ASF licenses this file - * to you under the Apache License, Version 2.0 (the - * "License"); you may not use this file except in compliance - * with the License. You may obtain a copy of the License at - * - * http://www.apache.org/licenses/LICENSE-2.0 - * - * Unless required by applicable law or agreed to in writing, - * software distributed under the License is distributed on an - * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY - * KIND, either express or implied. See the License for the - * specific language governing permissions and limitations - * under the License. - */ - -/*! - * \file CaffePrototxt.g4 - * \brief Grammar to parse Caffe prototxt - */ - -grammar CaffePrototxt; - -@header { -package io.mxnet.caffetranslator; -} - - -prototxt: name layer+; - -solver: pair+; - -name: ID COLON STRING; - -layer: ID object; - -pair: ID COLON? value; - -value: object #valueObject - | (STRING | NUMBER | ID) #valueLeaf - ; - -object: LPAREN pair+ RPAREN; - -LPAREN: '{'; - -RPAREN: '}'; - -COLON: ':'; - -NUMBER : '-'? ('.' DIGIT+ | DIGIT+ ('.' DIGIT*)? ) Exponent?; -fragment -DIGIT : [0-9] ; -fragment -Exponent : ('e'|'E') ('+'|'-')? ('0'..'9')+ ; - -ID: LETTER (LETTER|DIGIT)*; - -fragment -LETTER : [a-zA-Z\u0080-\u00FF_] ; - -STRING : '"' ('\\"'|.)*? '"' - | '\'' ('\\\''|.)*? '\'' ; - -WS : [ \t]+ -> channel(HIDDEN) ; - -NL : [\n\r]+ -> channel(HIDDEN) ; - -COMMENT : '#' ~( '\r' | '\n' )* {!getText().startsWith("#CaffeToMXNet")}? -> skip; - -CAFFE2MXNET: '#CaffeToMXNet' -> skip; diff --git a/tools/caffe_translator/src/main/java/io/mxnet/caffetranslator/Config.java b/tools/caffe_translator/src/main/java/io/mxnet/caffetranslator/Config.java deleted file mode 100644 index 006e133de2fc..000000000000 --- a/tools/caffe_translator/src/main/java/io/mxnet/caffetranslator/Config.java +++ /dev/null @@ -1,55 +0,0 @@ -/* - * Licensed to the Apache Software Foundation (ASF) under one - * or more contributor license agreements. See the NOTICE file - * distributed with this work for additional information - * regarding copyright ownership. The ASF licenses this file - * to you under the Apache License, Version 2.0 (the - * "License"); you may not use this file except in compliance - * with the License. You may obtain a copy of the License at - * - * http://www.apache.org/licenses/LICENSE-2.0 - * - * Unless required by applicable law or agreed to in writing, - * software distributed under the License is distributed on an - * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY - * KIND, either express or implied. See the License for the - * specific language governing permissions and limitations - * under the License. - */ - -/*! 
- * \file Config.java
- * \brief Helper class to store config
- */
-
-package io.mxnet.caffetranslator;
-
-import java.util.List;
-import java.util.Vector;
-
-public class Config {
-
-    private static final Config instance = new Config();
-
-    public static Config getInstance() {
-        return instance;
-    }
-
-    private Config() {
-        if (instance != null) {
-            throw new IllegalStateException("Already instantiated");
-        }
-
-        customDataLayers = new Vector<String>();
-    }
-
-    public List<String> getCustomDataLayers() {
-        return customDataLayers;
-    }
-
-    public void addCustomDataLayer(String name) {
-        customDataLayers.add(name);
-    }
-
-    private Vector<String> customDataLayers;
-}
diff --git a/tools/caffe_translator/src/main/java/io/mxnet/caffetranslator/Converter.java b/tools/caffe_translator/src/main/java/io/mxnet/caffetranslator/Converter.java
deleted file mode 100644
index 96d6fec9ebdd..000000000000
--- a/tools/caffe_translator/src/main/java/io/mxnet/caffetranslator/Converter.java
+++ /dev/null
@@ -1,429 +0,0 @@
-/*
- * Licensed to the Apache Software Foundation (ASF) under one
- * or more contributor license agreements. See the NOTICE file
- * distributed with this work for additional information
- * regarding copyright ownership. The ASF licenses this file
- * to you under the Apache License, Version 2.0 (the
- * "License"); you may not use this file except in compliance
- * with the License. You may obtain a copy of the License at
- *
- * http://www.apache.org/licenses/LICENSE-2.0
- *
- * Unless required by applicable law or agreed to in writing,
- * software distributed under the License is distributed on an
- * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
- * KIND, either express or implied. See the License for the
- * specific language governing permissions and limitations
- * under the License.
- */
-
-/*!
- * \file Converter.java - * \brief Convert Caffe prototxt to MXNet Python code - */ - -package io.mxnet.caffetranslator; - -import io.mxnet.caffetranslator.generators.*; -import lombok.Setter; -import org.antlr.v4.runtime.CharStream; -import org.antlr.v4.runtime.CharStreams; -import org.antlr.v4.runtime.CommonTokenStream; -import org.stringtemplate.v4.ST; -import org.stringtemplate.v4.STGroup; -import org.stringtemplate.v4.STRawGroupDir; - -import java.io.File; -import java.io.FileInputStream; -import java.io.IOException; -import java.nio.charset.StandardCharsets; -import java.util.ArrayList; -import java.util.HashSet; -import java.util.List; -import java.util.Set; - -public class Converter { - - private final String trainPrototxt, solverPrototxt; - private final MLModel mlModel; - private final STGroup stGroup; - private final SymbolGeneratorFactory generators; - private final String NL; - private final GenerationHelper gh; - @Setter - - private String paramsFilePath; - private Solver solver; - - Converter(String trainPrototxt, String solverPrototxt) { - this.trainPrototxt = trainPrototxt; - this.solverPrototxt = solverPrototxt; - this.mlModel = new MLModel(); - this.stGroup = new STRawGroupDir("templates"); - this.generators = SymbolGeneratorFactory.getInstance(); - NL = System.getProperty("line.separator"); - gh = new GenerationHelper(); - addGenerators(); - } - - private void addGenerators() { - generators.addGenerator("Convolution", new ConvolutionGenerator()); - generators.addGenerator("Deconvolution", new DeconvolutionGenerator()); - generators.addGenerator("Pooling", new PoolingGenerator()); - generators.addGenerator("InnerProduct", new FCGenerator()); - generators.addGenerator("ReLU", new ReluGenerator()); - generators.addGenerator("SoftmaxWithLoss", new SoftmaxOutputGenerator()); - generators.addGenerator("PluginIntLayerGenerator", new PluginIntLayerGenerator()); - generators.addGenerator("CaffePluginLossLayer", new PluginLossGenerator()); - generators.addGenerator("Permute", new PermuteGenerator()); - generators.addGenerator("Concat", new ConcatGenerator()); - generators.addGenerator("BatchNorm", new BatchNormGenerator()); - generators.addGenerator("Power", new PowerGenerator()); - generators.addGenerator("Eltwise", new EltwiseGenerator()); - generators.addGenerator("Flatten", new FlattenGenerator()); - generators.addGenerator("Dropout", new DropoutGenerator()); - generators.addGenerator("Scale", new ScaleGenerator()); - } - - public boolean parseTrainingPrototxt() { - - CharStream cs = null; - try { - FileInputStream fis = new FileInputStream(new File(trainPrototxt)); - cs = CharStreams.fromStream(fis, StandardCharsets.UTF_8); - } catch (IOException e) { - System.err.println("Unable to read prototxt: " + trainPrototxt); - return false; - } - - CaffePrototxtLexer lexer = new CaffePrototxtLexer(cs); - - CommonTokenStream tokens = new CommonTokenStream(lexer); - CaffePrototxtParser parser = new CaffePrototxtParser(tokens); - - CreateModelListener modelCreator = new CreateModelListener(parser, mlModel); - parser.addParseListener(modelCreator); - parser.prototxt(); - - return true; - } - - public boolean parseSolverPrototxt() { - solver = new Solver(solverPrototxt); - return solver.parsePrototxt(); - } - - public String generateMXNetCode() { - if (!parseTrainingPrototxt()) { - return ""; - } - - if (!parseSolverPrototxt()) { - return ""; - } - - StringBuilder code = new StringBuilder(); - - code.append(generateImports()); - code.append(System.lineSeparator()); - - 
-        code.append(generateLogger());
-        code.append(System.lineSeparator());
-
-        code.append(generateParamInitializer());
-        code.append(System.lineSeparator());
-
-        code.append(generateMetricsClasses());
-        code.append(System.lineSeparator());
-
-        if (paramsFilePath != null) {
-            code.append(generateParamsLoader());
-            code.append(System.lineSeparator());
-        }
-
-        // Convert data layers
-        code.append(generateIterators());
-
-        // Generate variables for data and label
-        code.append(generateInputVars());
-
-        // Convert non data layers
-        List<Layer> layers = mlModel.getNonDataLayers();
-
-        for (int layerIndex = 0; layerIndex < layers.size(); ) {
-            Layer layer = layers.get(layerIndex);
-            SymbolGenerator generator = generators.getGenerator(layer.getType());
-
-            // Handle layers for which there is no Generator
-            if (generator == null) {
-                if (layer.getType().equalsIgnoreCase("Accuracy")) {
-                    // We handle accuracy layers at a later stage. Do nothing for now.
-                } else if (layer.getType().toLowerCase().endsWith("loss")) {
-                    // This is a loss layer we don't have a generator for. Wrap it in CaffeLoss.
-                    generator = generators.getGenerator("CaffePluginLossLayer");
-                } else {
-                    // This is a layer we don't have a generator for. Wrap it in CaffeOp.
-                    generator = generators.getGenerator("PluginIntLayerGenerator");
-                }
-            }
-
-            if (generator != null) { // If we have a generator
-                // Generate code
-                GeneratorOutput out = generator.generate(layer, mlModel);
-                String segment = out.code;
-                code.append(segment);
-                code.append(NL);
-
-                // Update layerIndex depending on how many layers we ended up translating
-                layerIndex += out.numLayersTranslated;
-            } else { // If we don't have a generator
-                // We've decided to skip this layer. Generate no code. Just increment layerIndex
-                // by 1 and move on to the next layer.
- layerIndex++; - } - } - - String loss = getLoss(mlModel, code); - - String evalMetric = generateValidationMetrics(mlModel); - code.append(evalMetric); - - String runner = generateRunner(loss); - code.append(runner); - - return code.toString(); - } - - private String generateLogger() { - ST st = gh.getTemplate("logging"); - st.add("name", mlModel.getName()); - return st.render(); - } - - private String generateRunner(String loss) { - ST st = gh.getTemplate("runner"); - st.add("max_iter", solver.getProperty("max_iter")); - st.add("stepsize", solver.getProperty("stepsize")); - st.add("snapshot", solver.getProperty("snapshot")); - st.add("test_interval", solver.getProperty("test_interval")); - st.add("test_iter", solver.getProperty("test_iter")); - st.add("snapshot_prefix", solver.getProperty("snapshot_prefix")); - - st.add("train_data_itr", getIteratorName("TRAIN")); - st.add("test_data_itr", getIteratorName("TEST")); - - String context = solver.getProperty("solver_mode", "cpu").toLowerCase(); - context = String.format("mx.%s()", context); - st.add("ctx", context); - - st.add("loss", loss); - - st.add("data_names", getDataNames()); - st.add("label_names", getLabelNames()); - - st.add("init_params", generateInitializer()); - - st.add("init_optimizer", generateOptimizer()); - st.add("gamma", solver.getProperty("gamma")); - st.add("power", solver.getProperty("power")); - st.add("lr_update", generateLRUpdate()); - - return st.render(); - } - - private String generateParamInitializer() { - return gh.getTemplate("param_initializer").render(); - } - - private String generateMetricsClasses() { - ST st = gh.getTemplate("metrics_classes"); - - String display = solver.getProperty("display"); - String average_loss = solver.getProperty("average_loss"); - - if (display != null) { - st.add("display", display); - } - - if (average_loss != null) { - st.add("average_loss", average_loss); - } - - return st.render(); - } - - private String generateParamsLoader() { - return gh.getTemplate("params_loader").render(); - } - - private String getLoss(MLModel model, StringBuilder out) { - List losses = new ArrayList<>(); - for (Layer layer : model.getLayerList()) { - if (layer.getType().toLowerCase().endsWith("loss")) { - losses.add(gh.getVarname(layer.getTop())); - } - } - - if (losses.size() == 1) { - return losses.get(0); - } else if (losses.size() > 1) { - String loss_var = "combined_loss"; - ST st = gh.getTemplate("group"); - st.add("var", loss_var); - st.add("symbols", losses); - out.append(st.render()); - return loss_var; - } else { - System.err.println("No loss found"); - return "unknown_loss"; - } - } - - private String generateLRUpdate() { - String code; - String lrPolicy = solver.getProperty("lr_policy", "fixed").toLowerCase(); - ST st; - switch (lrPolicy) { - case "fixed": - // lr stays fixed. 
-                // No update needed
-                code = "";
-                break;
-            case "multistep":
-                st = gh.getTemplate("lrpolicy_multistep");
-                st.add("steps", solver.getProperties("stepvalue"));
-                code = st.render();
-                break;
-            case "step":
-            case "exp":
-            case "inv":
-            case "poly":
-            case "sigmoid":
-                st = gh.getTemplate("lrpolicy_" + lrPolicy);
-                code = st.render();
-                break;
-            default:
-                String message = "Unknown lr_policy: " + lrPolicy;
-                System.err.println(message);
-                code = "# " + message + System.lineSeparator();
-                break;
-        }
-        return Utils.indent(code, 2, true, 4);
-    }
-
-    private String generateValidationMetrics(MLModel mlModel) {
-        return new AccuracyMetricsGenerator().generate(mlModel);
-    }
-
-    private String generateOptimizer() {
-        Optimizer optimizer = new Optimizer(solver);
-        return optimizer.generateInitCode();
-    }
-
-    private String generateInitializer() {
-        ST st = gh.getTemplate("init_params");
-        st.add("params_file", paramsFilePath);
-        return st.render();
-    }
-
-    private String generateImports() {
-        return gh.getTemplate("imports").render();
-    }
-
-    private StringBuilder generateIterators() {
-        StringBuilder code = new StringBuilder();
-
-        for (Layer layer : mlModel.getDataLayers()) {
-            String iterator = generateIterator(layer);
-            code.append(iterator);
-        }
-
-        return code;
-    }
-
-    private String getIteratorName(String phase) {
-        for (Layer layer : mlModel.getDataLayers()) {
-            String layerPhase = layer.getAttr("include.phase", phase);
-            if (phase.equalsIgnoreCase(layerPhase)) {
-                return layerPhase.toLowerCase() + "_" + layer.getName() + "_" + "itr";
-            }
-        }
-        return null;
-    }
-
-    private List<String> getDataNames() {
-        return getDataNames(0);
-    }
-
-    private List<String> getLabelNames() {
-        return getDataNames(1);
-    }
-
-    private List<String> getDataNames(int topIndex) {
-        List<String> dataList = new ArrayList<>();
-        for (Layer layer : mlModel.getDataLayers()) {
-            if (layer.getAttr("include.phase").equalsIgnoreCase("train")) {
-                String dataName = layer.getTops().get(topIndex);
-                if (dataName != null) {
-                    dataList.add(String.format("'%s'", dataName));
-                }
-            }
-        }
-        return dataList;
-    }
-
-    private StringBuilder generateInputVars() {
-        StringBuilder code = new StringBuilder();
-
-        Set<String> tops = new HashSet<>();
-
-        for (Layer layer : mlModel.getDataLayers())
-            for (String top : layer.getTops())
-                tops.add(top);
-
-        for (String top : tops)
-            code.append(gh.generateVar(gh.getVarname(top), top, null, null, null, null));
-
-        code.append(System.lineSeparator());
-        return code;
-    }
-
-    private String generateIterator(Layer layer) {
-        String iteratorName = layer.getAttr("include.phase");
-        iteratorName = iteratorName.toLowerCase();
-        iteratorName = iteratorName + "_" + layer.getName() + "_" + "itr";
-
-        ST st = stGroup.getInstanceOf("iterator");
-
-        String prototxt = layer.getPrototxt();
-        prototxt = prototxt.replace("\r", "");
-        prototxt = prototxt.replace("\n", " \\\n");
-        prototxt = "'" + prototxt + "'";
-        prototxt = Utils.indent(prototxt, 1, true, 4);
-
-        st.add("iter_name", iteratorName);
-        st.add("prototxt", prototxt);
-
-        String dataName = "???";
-        if (layer.getTops().size() >= 1) {
-            dataName = layer.getTops().get(0);
-        } else {
-            System.err.println(String.format("Data layer %s doesn't have data", layer.getName()));
-        }
-        st.add("data_name", dataName);
-
-        String labelName = "???";
-        // The label is the second top, so it exists only if the layer has at least two tops
-        if (layer.getTops().size() >= 2) {
-            labelName = layer.getTops().get(1);
-        } else {
-            System.err.println(String.format("Data layer %s doesn't have label", layer.getName()));
-        }
-        st.add("label_name", labelName);
-
-        if (layer.hasAttr("data_param.num_examples")) {
st.add("num_examples", layer.getAttr("data_param.num_examples")); - } - - return st.render(); - } - -} diff --git a/tools/caffe_translator/src/main/java/io/mxnet/caffetranslator/CreateModelListener.java b/tools/caffe_translator/src/main/java/io/mxnet/caffetranslator/CreateModelListener.java deleted file mode 100644 index 75800a18eadd..000000000000 --- a/tools/caffe_translator/src/main/java/io/mxnet/caffetranslator/CreateModelListener.java +++ /dev/null @@ -1,144 +0,0 @@ -/* - * Licensed to the Apache Software Foundation (ASF) under one - * or more contributor license agreements. See the NOTICE file - * distributed with this work for additional information - * regarding copyright ownership. The ASF licenses this file - * to you under the Apache License, Version 2.0 (the - * "License"); you may not use this file except in compliance - * with the License. You may obtain a copy of the License at - * - * http://www.apache.org/licenses/LICENSE-2.0 - * - * Unless required by applicable law or agreed to in writing, - * software distributed under the License is distributed on an - * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY - * KIND, either express or implied. See the License for the - * specific language governing permissions and limitations - * under the License. - */ - -/*! - * \file CreateModelListener.java - * \brief ANTLR listener that builds MLModel as the parser parses the Caffe prototxt - */ - -package io.mxnet.caffetranslator; - -import lombok.Getter; -import org.antlr.v4.runtime.Token; -import org.antlr.v4.runtime.TokenStream; - -import java.util.HashMap; -import java.util.Map; -import java.util.Stack; - -public class CreateModelListener extends CaffePrototxtBaseListener { - - private final CaffePrototxtParser parser; - @Getter - private final MLModel mlModel; - private final Stack keys; - private final ParserHelper parserHelper; - - private Layer currentLayer; - private Map currentParams; - - public CreateModelListener(CaffePrototxtParser parser, MLModel mlModel) { - this.parser = parser; - this.mlModel = mlModel; - this.keys = new Stack<>(); - this.currentParams = new HashMap<>(); - this.parserHelper = new ParserHelper(); - } - - @Override - public void exitName(CaffePrototxtParser.NameContext ctx) { - String name = ctx.STRING().toString(); - mlModel.setName(parserHelper.removeQuotes(name)); - } - - @Override - public void enterLayer(CaffePrototxtParser.LayerContext ctx) { - keys.clear(); - currentLayer = new Layer(); - } - - @Override - public void exitLayer(CaffePrototxtParser.LayerContext ctx) { - TokenStream tokens = parser.getTokenStream(); - String prototxt = getPrototxt(tokens, ctx.getStart().getTokenIndex(), ctx.getStop().getTokenIndex()); - - if (currentLayer.getTops().size() == 1) { - currentLayer.addAttr("top", currentLayer.getTops().get(0)); - } - - if (currentLayer.getBottoms().size() == 1) { - currentLayer.addAttr("bottom", currentLayer.getBottoms().get(0)); - } - - currentLayer.setPrototxt(prototxt); - mlModel.addLayer(currentLayer); - } - - private String getPrototxt(TokenStream stream, int start, int end) { - StringBuilder prototxt = new StringBuilder(); - for (int i = start; i <= end; i++) { - Token token = stream.get(i); - prototxt.append(token.getText()); - } - String strPrototxt = prototxt.toString(); - return strPrototxt.replaceAll(" +num_examples:.*\\s", ""); - } - - @Override - public void enterPair(CaffePrototxtParser.PairContext ctx) { - String key = ctx.getStart().getText(); - keys.push(key); - } - - @Override - public void 
exitPair(CaffePrototxtParser.PairContext ctx) { - - if (getCurrentKey().equals("param")) { - currentLayer.getParams().add(currentParams); - currentParams = new HashMap<>(); - } - - keys.pop(); - } - - @Override - public void exitValueLeaf(CaffePrototxtParser.ValueLeafContext ctx) { - String value = ctx.getText(); - value = parserHelper.removeQuotes(value); - processKeyValue(getCurrentKey(), value); - } - - protected void processKeyValue(String key, String value) { - switch (key) { - case "name": - currentLayer.setName(value); - break; - case "top": - currentLayer.addTop(value); - return; - case "bottom": - currentLayer.addBottom(value); - return; - } - - if (key.toLowerCase().startsWith("param.")) { - currentParams.put(key, value); - } - - currentLayer.addAttr(key, value); - } - - private String getCurrentKey() { - StringBuilder sb = new StringBuilder(); - for (String s : keys) { - sb.append(s + "."); - } - return sb.substring(0, sb.length() - 1).toString(); - } -} diff --git a/tools/caffe_translator/src/main/java/io/mxnet/caffetranslator/GenerationHelper.java b/tools/caffe_translator/src/main/java/io/mxnet/caffetranslator/GenerationHelper.java deleted file mode 100644 index 1cac5468127b..000000000000 --- a/tools/caffe_translator/src/main/java/io/mxnet/caffetranslator/GenerationHelper.java +++ /dev/null @@ -1,195 +0,0 @@ -/* - * Licensed to the Apache Software Foundation (ASF) under one - * or more contributor license agreements. See the NOTICE file - * distributed with this work for additional information - * regarding copyright ownership. The ASF licenses this file - * to you under the Apache License, Version 2.0 (the - * "License"); you may not use this file except in compliance - * with the License. You may obtain a copy of the License at - * - * http://www.apache.org/licenses/LICENSE-2.0 - * - * Unless required by applicable law or agreed to in writing, - * software distributed under the License is distributed on an - * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY - * KIND, either express or implied. See the License for the - * specific language governing permissions and limitations - * under the License. - */ - -/*! 
- * \file GenerationHelper.java - * \brief Helper class used by generators - */ - -package io.mxnet.caffetranslator; - -import org.stringtemplate.v4.ST; -import org.stringtemplate.v4.STErrorListener; -import org.stringtemplate.v4.STGroup; -import org.stringtemplate.v4.STGroupFile; -import org.stringtemplate.v4.STRawGroupDir; -import org.stringtemplate.v4.misc.STMessage; - -import java.util.ArrayList; -import java.util.List; - -public class GenerationHelper { - - protected final STGroup stGroupDir; - - protected final STGroup stGroupFile; - - private class SuppressSTErrorsListener implements STErrorListener { - - @Override - public void compileTimeError(STMessage msg) { - // Do nothing - } - - @Override - public void runTimeError(STMessage msg) { - // Do nothing - } - - @Override - public void IOError(STMessage msg) { - throw new RuntimeException(msg.toString()); - } - - @Override - public void internalError(STMessage msg) { - throw new RuntimeException(msg.toString()); - } - } - - public GenerationHelper() { - this.stGroupDir = new STRawGroupDir("templates"); - this.stGroupFile = new STGroupFile("templates/symbols.stg"); - - SuppressSTErrorsListener errListener = new SuppressSTErrorsListener(); - stGroupDir.setListener(errListener); - stGroupFile.setListener(errListener); - } - - public ST getTemplate(String name) { - ST st = stGroupDir.getInstanceOf(name); - if (st != null) { - return st; - } - return stGroupFile.getInstanceOf(name); - } - - public String generateVar(String varName, String symName, String lr_mult, String wd_mult, String init, List shape) { - ST st = getTemplate("var"); - st.add("var", varName); - st.add("name", symName); - - st.add("lr_mult", lr_mult); - st.add("wd_mult", wd_mult); - st.add("init", init); - st.add("shape", shape); - - return st.render(); - } - - public String getInit(String fillerType, String fillerValue) { - if (fillerType == null && fillerValue == null) { - return null; - } - - if (fillerType == null) { - fillerType = "constant"; - } - - if (fillerValue == null) { - fillerValue = "0"; - } - - String initializer; - switch (fillerType) { - case "xavier": - initializer = "mx.initializer.Xavier()"; - break; - case "gaussian": - initializer = "mx.initializer.Normal()"; - break; - case "constant": - initializer = String.format("mx.initializer.Constant(%s)", fillerValue); - break; - case "bilinear": - initializer = "mx.initializer.Bilinear()"; - break; - default: - initializer = "UnknownInitializer"; - System.err.println("Initializer " + fillerType + " not supported"); - break; - } - - return initializer; - } - - public String getVarname(String name) { - StringBuilder sb = new StringBuilder(name); - for (int i = 0; i < sb.length(); i++) { - char ch = sb.charAt(i); - if (Character.isLetter(ch) || Character.isDigit(ch) || ch == '_') { - // do nothing - } else { - sb.replace(i, i + 1, "_"); - } - } - return sb.toString(); - } - - public List getVarNames(List names) { - List list = new ArrayList<>(); - for (String name : names) { - list.add(getVarname(name)); - } - return list; - } - - public void fillNameDataAndVar(ST st, Layer layer) { - st.add("name", layer.getName()); - st.add("data", getVarname(layer.getBottom())); - st.add("var", getVarname(layer.getTop())); - } - - public void simpleFillTemplate(ST st, String name, Layer layer, String key, String defaultValue, String... 
altKeys) { - String value = layer.getAttr(key); - - if (value == null) { - for (String altKey : altKeys) { - value = layer.getAttr(altKey); - if (value != null) { - break; - } - } - } - - if (value == null && defaultValue != null) { - value = defaultValue; - } - - if (value == null) { - System.err.println(String.format("Layer %s does not contain attribute %s or alternates", - layer.getName(), key)); - value = "???"; - } - - st.add(name, value); - } - - public GeneratorOutput makeGeneratorOutput(String code, int numLayersTranslated) { - return new GeneratorOutput(code, numLayersTranslated); - } - - public String initializeParam(String varname, int childIndex, String initializer) { - StringBuilder out = new StringBuilder(); - out.append(String.format("param_initializer.add_param(%s.get_children()[%d].name, %s)", - varname, childIndex, initializer)); - out.append(System.lineSeparator()); - return out.toString(); - } -} diff --git a/tools/caffe_translator/src/main/java/io/mxnet/caffetranslator/GeneratorOutput.java b/tools/caffe_translator/src/main/java/io/mxnet/caffetranslator/GeneratorOutput.java deleted file mode 100644 index fd27bb354665..000000000000 --- a/tools/caffe_translator/src/main/java/io/mxnet/caffetranslator/GeneratorOutput.java +++ /dev/null @@ -1,35 +0,0 @@ -/* - * Licensed to the Apache Software Foundation (ASF) under one - * or more contributor license agreements. See the NOTICE file - * distributed with this work for additional information - * regarding copyright ownership. The ASF licenses this file - * to you under the Apache License, Version 2.0 (the - * "License"); you may not use this file except in compliance - * with the License. You may obtain a copy of the License at - * - * http://www.apache.org/licenses/LICENSE-2.0 - * - * Unless required by applicable law or agreed to in writing, - * software distributed under the License is distributed on an - * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY - * KIND, either express or implied. See the License for the - * specific language governing permissions and limitations - * under the License. - */ - -/*! - * \file GeneratorOutput.java - * \brief Output of each generator - */ - -package io.mxnet.caffetranslator; - -public class GeneratorOutput { - public final String code; - public final int numLayersTranslated; - - public GeneratorOutput(String code, int n) { - this.code = code; - this.numLayersTranslated = n; - } -} diff --git a/tools/caffe_translator/src/main/java/io/mxnet/caffetranslator/Launcher.java b/tools/caffe_translator/src/main/java/io/mxnet/caffetranslator/Launcher.java deleted file mode 100644 index 9fd3cbd6199a..000000000000 --- a/tools/caffe_translator/src/main/java/io/mxnet/caffetranslator/Launcher.java +++ /dev/null @@ -1,178 +0,0 @@ -/* - * Licensed to the Apache Software Foundation (ASF) under one - * or more contributor license agreements. See the NOTICE file - * distributed with this work for additional information - * regarding copyright ownership. The ASF licenses this file - * to you under the Apache License, Version 2.0 (the - * "License"); you may not use this file except in compliance - * with the License. You may obtain a copy of the License at - * - * http://www.apache.org/licenses/LICENSE-2.0 - * - * Unless required by applicable law or agreed to in writing, - * software distributed under the License is distributed on an - * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY - * KIND, either express or implied. 
See the License for the - * specific language governing permissions and limitations - * under the License. - */ - -/*! - * \file Launcher.java - * \brief Parses command line and invokes Converter - */ - -package io.mxnet.caffetranslator; - - -import org.apache.commons.cli.CommandLine; -import org.apache.commons.cli.CommandLineParser; -import org.apache.commons.cli.DefaultParser; -import org.apache.commons.cli.Option; -import org.apache.commons.cli.Options; -import org.apache.commons.cli.ParseException; - -import java.io.File; -import java.io.FileNotFoundException; -import java.io.PrintWriter; - -public class Launcher { - - private String trainingPrototextPath, solverPrototextPath; - private String paramsFilePath; - private File outFile; - - protected final String TRAINING_PROTOTXT = "training-prototxt"; - protected final String SOLVER_PROTOTXT = "solver"; - protected final String CUSTOM_DATA_LAYERS = "custom-data-layers"; - protected final String OUTPUT_FILE = "output-file"; - protected final String PARAMS_FILE = "params-file"; - protected final String GRAPH_FILE = "graph-file"; - - - public static void main(String[] args) { - Launcher launcher = new Launcher(); - launcher.run(args); - } - - public void run(String[] args) { - parseCommandLine(args); - - Converter converter = new Converter(trainingPrototextPath, solverPrototextPath); - if (paramsFilePath != null) { - converter.setParamsFilePath(paramsFilePath); - } - String code = converter.generateMXNetCode(); - - writeToOutFile(code); - System.out.println("Translated code saved in " + outFile.getAbsolutePath()); - } - - private void writeToOutFile(String code) { - PrintWriter out; - try { - out = new PrintWriter(outFile); - } catch (FileNotFoundException e) { - System.err.println(String.format("Unable to open %s for writing", outFile.getAbsoluteFile())); - return; - } - - out.print(code); - out.flush(); - } - - public void parseCommandLine(String[] args) { - CommandLineParser clParser = new DefaultParser(); - - Options options = new Options(); - - Option prototxtOption = Option.builder("t") - .longOpt(TRAINING_PROTOTXT) - .hasArg() - .desc("training/validation prototxt") - .build(); - options.addOption(prototxtOption); - - Option solverOption = Option.builder("s") - .longOpt(SOLVER_PROTOTXT) - .hasArg() - .desc("solver prototxt") - .build(); - options.addOption(solverOption); - - Option dataLayerOpt = Option.builder("c") - .longOpt(CUSTOM_DATA_LAYERS) - .hasArg() - .desc("Comma separated custom data layers") - .build(); - options.addOption(dataLayerOpt); - - Option outfileOpt = Option.builder("o") - .longOpt(OUTPUT_FILE) - .hasArg() - .desc("Output file") - .build(); - options.addOption(outfileOpt); - - Option paramsFileOpt = Option.builder("p") - .longOpt(PARAMS_FILE) - .hasArg() - .desc("Params file") - .build(); - options.addOption(paramsFileOpt); - - Option graphFileOpt = Option.builder("g") - .longOpt(GRAPH_FILE) - .hasArg() - .desc("Image file to visualize computation graph") - .build(); - options.addOption(graphFileOpt); - - CommandLine line = null; - try { - line = clParser.parse(options, args); - } catch (ParseException e) { - System.out.println("Exception parsing commandline:" + e.getMessage()); - System.exit(1); - } - - if ((trainingPrototextPath = getOption(line, TRAINING_PROTOTXT)) == null) { - bail("Command line argument " + TRAINING_PROTOTXT + " missing"); - } - - if ((solverPrototextPath = getOption(line, SOLVER_PROTOTXT)) == null) { - bail("Command line argument " + SOLVER_PROTOTXT + " missing"); - } - - String strOutFile 
= getOption(line, OUTPUT_FILE);
-        if (strOutFile == null) {
-            bail("Command line argument " + OUTPUT_FILE + " missing");
-        }
-        outFile = new File(strOutFile);
-
-        paramsFilePath = getOption(line, PARAMS_FILE);
-
-        String dataLayers;
-        Config config = Config.getInstance();
-        if ((dataLayers = getOption(line, CUSTOM_DATA_LAYERS)) != null) {
-            for (String name : dataLayers.split(",")) {
-                name = name.trim();
-                config.addCustomDataLayer(name);
-            }
-        }
-
-    }
-
-    private String getOption(CommandLine line, String argName) {
-        if (line.hasOption(argName)) {
-            return line.getOptionValue(argName);
-        } else {
-            return null;
-        }
-    }
-
-    private void bail(String reason) {
-        System.err.println(reason);
-        System.exit(1);
-    }
-}
diff --git a/tools/caffe_translator/src/main/java/io/mxnet/caffetranslator/Layer.java b/tools/caffe_translator/src/main/java/io/mxnet/caffetranslator/Layer.java
deleted file mode 100644
index dac8ff4ffb6d..000000000000
--- a/tools/caffe_translator/src/main/java/io/mxnet/caffetranslator/Layer.java
+++ /dev/null
@@ -1,141 +0,0 @@
-/*
- * Licensed to the Apache Software Foundation (ASF) under one
- * or more contributor license agreements. See the NOTICE file
- * distributed with this work for additional information
- * regarding copyright ownership. The ASF licenses this file
- * to you under the Apache License, Version 2.0 (the
- * "License"); you may not use this file except in compliance
- * with the License. You may obtain a copy of the License at
- *
- * http://www.apache.org/licenses/LICENSE-2.0
- *
- * Unless required by applicable law or agreed to in writing,
- * software distributed under the License is distributed on an
- * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
- * KIND, either express or implied. See the License for the
- * specific language governing permissions and limitations
- * under the License.
- */
-
-/*!
- * \file Layer.java
- * \brief Model for a layer
- */
-
-package io.mxnet.caffetranslator;
-
-import lombok.Getter;
-import lombok.Setter;
-
-import java.util.ArrayList;
-import java.util.HashMap;
-import java.util.List;
-import java.util.Map;
-
-public class Layer {
-
-    @Getter
-    @Setter
-    private String name;
-
-    @Getter
-    @Setter
-    private int layerIndex;
-
-    @Getter
-    @Setter
-    private Kind kind;
-
-    @Getter
-    @Setter
-    private String prototxt;
-
-    @Getter
-    private final List<String> bottoms;
-
-    @Getter
-    private final List<String> tops;
-
-    @Setter
-    @Getter
-    private List<Map<String, String>> params;
-
-    @Setter
-    private Map<String, List<String>> attr;
-
-    public Layer() {
-        tops = new ArrayList<>();
-        bottoms = new ArrayList<>();
-        attr = new HashMap<>();
-        params = new ArrayList<>();
-    }
-
-    public Layer(int layerIndex) {
-        this();
-        this.layerIndex = layerIndex;
-    }
-
-    public void addAttr(String key, String value) {
-        List<String> list = attr.get(key);
-        if (list == null) {
-            list = new ArrayList<>();
-            list.add(value);
-            attr.put(key, list);
-        } else {
-            list.add(value);
-        }
-    }
-
-    public String getAttr(String key) {
-        List<String> list = attr.get(key);
-        if (list == null) {
-            return null;
-        }
-
-        return list.get(0);
-    }
-
-    public String getAttr(String key, String defaultValue) {
-        String attr = getAttr(key);
-        return attr != null ? attr : defaultValue;
-    }
-
-    public boolean hasAttr(String key) {
-        return attr.containsKey(key);
-    }
-
-    public boolean attrEquals(String key, String value) {
-        if (!attr.containsKey(key)) {
-            return false;
-        }
-        return getAttr(key).equals(value);
-    }
-
-    public List<String> getAttrList(String key) {
-        return attr.get(key);
-    }
-
-    public void addTop(String top) {
-        tops.add(top);
-    }
-
-    public void addBottom(String bottom) {
-        bottoms.add(bottom);
-    }
-
-    public String getBottom() {
-        return bottoms.size() > 0 ? bottoms.get(0) : null;
-    }
-
-    public String getType() {
-        return attr.get("type").get(0);
-    }
-
-    public String getTop() {
-        return tops.size() > 0 ? tops.get(0) : null;
-    }
-
-    public enum Kind {
-        DATA, INTERMEDIATE, LOSS;
-    }
-}
diff --git a/tools/caffe_translator/src/main/java/io/mxnet/caffetranslator/MLModel.java b/tools/caffe_translator/src/main/java/io/mxnet/caffetranslator/MLModel.java
deleted file mode 100644
index 08f0fe7f1183..000000000000
--- a/tools/caffe_translator/src/main/java/io/mxnet/caffetranslator/MLModel.java
+++ /dev/null
@@ -1,105 +0,0 @@
-/*
- * Licensed to the Apache Software Foundation (ASF) under one
- * or more contributor license agreements. See the NOTICE file
- * distributed with this work for additional information
- * regarding copyright ownership. The ASF licenses this file
- * to you under the Apache License, Version 2.0 (the
- * "License"); you may not use this file except in compliance
- * with the License. You may obtain a copy of the License at
- *
- * http://www.apache.org/licenses/LICENSE-2.0
- *
- * Unless required by applicable law or agreed to in writing,
- * software distributed under the License is distributed on an
- * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
- * KIND, either express or implied. See the License for the
- * specific language governing permissions and limitations
- * under the License.
- */
-
-/*!
- * \file MLModel.java
- * \brief Models a ML model
- */
-
-package io.mxnet.caffetranslator;
-
-import lombok.Getter;
-import lombok.Setter;
-
-import java.util.ArrayList;
-import java.util.HashMap;
-import java.util.List;
-import java.util.Map;
-
-public class MLModel {
-
-    public MLModel() {
-        layerList = new ArrayList<>();
-        layerLookup = new HashMap<>();
-        layerIndex = 0;
-    }
-
-    @Getter
-    @Setter
-    private String name;
-
-    @Getter
-    @Setter
-    private List<Layer> layerList;
-
-    private final Map<String, Map<String, Layer>> layerLookup;
-
-    private int layerIndex;
-
-    public void addLayer(Layer layer) {
-
-        layer.setLayerIndex(layerIndex++);
-        layerList.add(layer);
-
-        String name = layer.getName();
-        String includePhase = layer.getAttr("include.phase");
-        includePhase = (includePhase == null) ? "" : includePhase;
"" : includePhase; - - if (layerLookup.containsKey(name)) { - layerLookup.get(name).put(includePhase, layer); - } else { - HashMap map = new HashMap(); - map.put(includePhase, layer); - layerLookup.put(name, map); - } - - String type = layer.getAttr("type"); - Config config = Config.getInstance(); - if (type.equals("Data") || config.getCustomDataLayers().contains(type)) { - layer.setKind(Layer.Kind.DATA); - } else if (type.toLowerCase().endsWith("loss")) { - layer.setKind(Layer.Kind.LOSS); - } else { - layer.setKind(Layer.Kind.INTERMEDIATE); - } - } - - public List getDataLayers() { - List ret = new ArrayList<>(); - - for (Layer layer : layerList) { - if (layer.getKind() == Layer.Kind.DATA) { - ret.add(layer); - } - } - return ret; - } - - public List getNonDataLayers() { - List ret = new ArrayList<>(); - - for (Layer layer : layerList) { - if (layer.getKind() != Layer.Kind.DATA) { - ret.add(layer); - } - } - return ret; - } - -} diff --git a/tools/caffe_translator/src/main/java/io/mxnet/caffetranslator/Optimizer.java b/tools/caffe_translator/src/main/java/io/mxnet/caffetranslator/Optimizer.java deleted file mode 100644 index da2494249540..000000000000 --- a/tools/caffe_translator/src/main/java/io/mxnet/caffetranslator/Optimizer.java +++ /dev/null @@ -1,48 +0,0 @@ -/* - * Licensed to the Apache Software Foundation (ASF) under one - * or more contributor license agreements. See the NOTICE file - * distributed with this work for additional information - * regarding copyright ownership. The ASF licenses this file - * to you under the Apache License, Version 2.0 (the - * "License"); you may not use this file except in compliance - * with the License. You may obtain a copy of the License at - * - * http://www.apache.org/licenses/LICENSE-2.0 - * - * Unless required by applicable law or agreed to in writing, - * software distributed under the License is distributed on an - * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY - * KIND, either express or implied. See the License for the - * specific language governing permissions and limitations - * under the License. - */ - -/*! - * \file Optimizer.java - * \brief Generates optimizer from solver prototxt - */ - -package io.mxnet.caffetranslator; - -import org.stringtemplate.v4.ST; - -public class Optimizer { - private final GenerationHelper gh; - private final Solver solver; - - public Optimizer(Solver solver) { - this.gh = new GenerationHelper(); - this.solver = solver; - } - - public String generateInitCode() { - ST st = gh.getTemplate("opt_" + solver.getType().toLowerCase()); - if (st == null) { - System.err.println(String.format("Unknown optimizer type (%s). Using SGD instead.", solver.getType())); - st = gh.getTemplate("opt_sgd"); - } - - st.add("solver", solver); - return st.render(); - } -} diff --git a/tools/caffe_translator/src/main/java/io/mxnet/caffetranslator/ParserHelper.java b/tools/caffe_translator/src/main/java/io/mxnet/caffetranslator/ParserHelper.java deleted file mode 100644 index bf8a5814406c..000000000000 --- a/tools/caffe_translator/src/main/java/io/mxnet/caffetranslator/ParserHelper.java +++ /dev/null @@ -1,36 +0,0 @@ -/* - * Licensed to the Apache Software Foundation (ASF) under one - * or more contributor license agreements. See the NOTICE file - * distributed with this work for additional information - * regarding copyright ownership. The ASF licenses this file - * to you under the Apache License, Version 2.0 (the - * "License"); you may not use this file except in compliance - * with the License. 
- *
- * http://www.apache.org/licenses/LICENSE-2.0
- *
- * Unless required by applicable law or agreed to in writing,
- * software distributed under the License is distributed on an
- * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
- * KIND, either express or implied. See the License for the
- * specific language governing permissions and limitations
- * under the License.
- */
-
-/*!
- * \file ParserHelper.java
- * \brief Helpers required by the command line parser
- */
-
-package io.mxnet.caffetranslator;
-
-public class ParserHelper {
-    public String removeQuotes(String arg) {
-        boolean doubleQuoteStr = (arg.startsWith("\"") && arg.endsWith("\""));
-        boolean singleQuoteStr = (arg.startsWith("'") && arg.endsWith("'"));
-        if ((singleQuoteStr || doubleQuoteStr) && arg.length() > 2) {
-            arg = arg.substring(1, arg.length() - 1);
-        }
-        return arg;
-    }
-}
diff --git a/tools/caffe_translator/src/main/java/io/mxnet/caffetranslator/Solver.java b/tools/caffe_translator/src/main/java/io/mxnet/caffetranslator/Solver.java
deleted file mode 100644
index 969377112cd8..000000000000
--- a/tools/caffe_translator/src/main/java/io/mxnet/caffetranslator/Solver.java
+++ /dev/null
@@ -1,148 +0,0 @@
-/*
- * Licensed to the Apache Software Foundation (ASF) under one
- * or more contributor license agreements. See the NOTICE file
- * distributed with this work for additional information
- * regarding copyright ownership. The ASF licenses this file
- * to you under the Apache License, Version 2.0 (the
- * "License"); you may not use this file except in compliance
- * with the License. You may obtain a copy of the License at
- *
- * http://www.apache.org/licenses/LICENSE-2.0
- *
- * Unless required by applicable law or agreed to in writing,
- * software distributed under the License is distributed on an
- * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
- * KIND, either express or implied. See the License for the
- * specific language governing permissions and limitations
- * under the License.
- */
-
-/*!
- * \file Solver.java
- * \brief Model for the Caffe solver prototxt
- */
-
-package io.mxnet.caffetranslator;
-
-import lombok.Getter;
-import org.antlr.v4.runtime.CharStream;
-import org.antlr.v4.runtime.CharStreams;
-import org.antlr.v4.runtime.CommonTokenStream;
-
-import java.io.File;
-import java.io.FileInputStream;
-import java.io.IOException;
-import java.lang.reflect.Field;
-import java.nio.charset.StandardCharsets;
-import java.util.HashMap;
-import java.util.List;
-import java.util.Map;
-
-public class Solver {
-
-    private final String solverPath;
-    private boolean parseDone;
-    private Map<String, List<String>> properties;
-    /**
-     * Fields corresponding to keys that can be present in the solver prototxt. 'setFields' sets these
-     * using reflection after parsing the solver prototxt. A solver object is passed to string templates
-     * and the templates read these fields.
-     */
-    @Getter
-    private String base_lr, momentum, weight_decay, lr_policy, gamma, stepsize, stepvalue, max_iter,
-            solver_mode, snapshot, snapshot_prefix, test_iter, test_interval, display, type, delta,
-            momentum2, rms_decay, solver_type;
-
-    public Solver(String solverPath) {
-        this.solverPath = solverPath;
-        properties = new HashMap<>();
-    }
-
-    public boolean parsePrototxt() {
-        CharStream cs = null;
-        try {
-            FileInputStream fis = new FileInputStream(new File(solverPath));
-            cs = CharStreams.fromStream(fis, StandardCharsets.UTF_8);
-        } catch (IOException e) {
-            System.err.println("Unable to read prototxt " + solverPath);
-            return false;
-        }
-
-        CaffePrototxtLexer lexer = new CaffePrototxtLexer(cs);
-        CommonTokenStream tokens = new CommonTokenStream(lexer);
-        CaffePrototxtParser parser = new CaffePrototxtParser(tokens);
-
-        SolverListener solverListener = new SolverListener();
-        parser.addParseListener(solverListener);
-        parser.solver();
-
-        properties = solverListener.getProperties();
-
-        setFields(properties);
-
-        parseDone = true;
-        return true;
-    }
-
-    private void setFields(Map<String, List<String>> properties) {
-        Class<?> cls = getClass();
-
-        for (Map.Entry<String, List<String>> entry : properties.entrySet()) {
-            String key = entry.getKey();
-            try {
-                Field field = cls.getDeclaredField(key);
-                field.set(this, entry.getValue().get(0));
-            } catch (NoSuchFieldException e) {
-                // Just ignore
-            } catch (IllegalAccessException e) {
-                /**
-                 * This shouldn't happen. If it does happen because we overlooked something, print
-                 * it in the console so we can investigate it.
-                 */
-                e.printStackTrace();
-            }
-        }
-
-        setDefaults();
-    }
-
-    private void setDefaults() {
-        if (type == null) {
-            type = "SGD";
-        }
-        if (delta == null) {
-            delta = "1e-8";
-        }
-        if (momentum2 == null) {
-            momentum2 = "0.999";
-        }
-        if (rms_decay == null) {
-            rms_decay = "0.99";
-        }
-    }
-
-    public String getProperty(String key) {
-        List<String> list = getProperties(key);
-        if (list == null) {
-            return null;
-        }
-        return getProperties(key).get(0);
-    }
-
-    public List<String> getProperties(String key) {
-        if (!parseDone) {
-            parsePrototxt();
-        }
-
-        return properties.get(key);
-    }
-
-    public String getProperty(String key, String defaultValue) {
-        String value = getProperty(key);
-        if (value == null) {
-            return defaultValue;
-        } else {
-            return value;
-        }
-    }
-}
diff --git a/tools/caffe_translator/src/main/java/io/mxnet/caffetranslator/SolverListener.java b/tools/caffe_translator/src/main/java/io/mxnet/caffetranslator/SolverListener.java
deleted file mode 100644
index 18b7fe1c76b0..000000000000
--- a/tools/caffe_translator/src/main/java/io/mxnet/caffetranslator/SolverListener.java
+++ /dev/null
@@ -1,58 +0,0 @@
-/*
- * Licensed to the Apache Software Foundation (ASF) under one
- * or more contributor license agreements. See the NOTICE file
- * distributed with this work for additional information
- * regarding copyright ownership. The ASF licenses this file
- * to you under the Apache License, Version 2.0 (the
- * "License"); you may not use this file except in compliance
- * with the License. You may obtain a copy of the License at
- *
- * http://www.apache.org/licenses/LICENSE-2.0
- *
- * Unless required by applicable law or agreed to in writing,
- * software distributed under the License is distributed on an
- * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
- * KIND, either express or implied. See the License for the
- * specific language governing permissions and limitations
- * under the License.
- */
-
-/*!
- * \file SolverListener.java
- * \brief ANTLR listener that builds the Solver instance as the solver prototxt is parsed
- */
-
-package io.mxnet.caffetranslator;
-
-import java.util.ArrayList;
-import java.util.HashMap;
-import java.util.List;
-import java.util.Map;
-
-public class SolverListener extends CaffePrototxtBaseListener {
-
-    private final Map<String, List<String>> properties;
-    private final ParserHelper parserHelper;
-
-    public SolverListener() {
-        properties = new HashMap<>();
-        parserHelper = new ParserHelper();
-    }
-
-    public Map<String, List<String>> getProperties() {
-        return properties;
-    }
-
-    @Override
-    public void exitPair(CaffePrototxtParser.PairContext ctx) {
-        String key = ctx.ID().getText();
-        String value = ctx.value().getText();
-        value = parserHelper.removeQuotes(value);
-
-        if (properties.get(key) == null) {
-            properties.put(key, new ArrayList<>());
-        }
-
-        properties.get(key).add(value);
-    }
-}
diff --git a/tools/caffe_translator/src/main/java/io/mxnet/caffetranslator/SymbolGenerator.java b/tools/caffe_translator/src/main/java/io/mxnet/caffetranslator/SymbolGenerator.java
deleted file mode 100644
index 7a21aedb7361..000000000000
--- a/tools/caffe_translator/src/main/java/io/mxnet/caffetranslator/SymbolGenerator.java
+++ /dev/null
@@ -1,29 +0,0 @@
-/*
- * Licensed to the Apache Software Foundation (ASF) under one
- * or more contributor license agreements. See the NOTICE file
- * distributed with this work for additional information
- * regarding copyright ownership. The ASF licenses this file
- * to you under the Apache License, Version 2.0 (the
- * "License"); you may not use this file except in compliance
- * with the License. You may obtain a copy of the License at
- *
- * http://www.apache.org/licenses/LICENSE-2.0
- *
- * Unless required by applicable law or agreed to in writing,
- * software distributed under the License is distributed on an
- * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
- * KIND, either express or implied. See the License for the
- * specific language governing permissions and limitations
- * under the License.
- */
-
-/*!
- * \file SymbolGenerator.java
- * \brief Interface that every layer generator implements
- */
-
-package io.mxnet.caffetranslator;
-
-public interface SymbolGenerator {
-    public GeneratorOutput generate(Layer layer, MLModel model);
-}
diff --git a/tools/caffe_translator/src/main/java/io/mxnet/caffetranslator/SymbolGeneratorFactory.java b/tools/caffe_translator/src/main/java/io/mxnet/caffetranslator/SymbolGeneratorFactory.java
deleted file mode 100644
index 5dea77e45e9f..000000000000
--- a/tools/caffe_translator/src/main/java/io/mxnet/caffetranslator/SymbolGeneratorFactory.java
+++ /dev/null
@@ -1,53 +0,0 @@
-/*
- * Licensed to the Apache Software Foundation (ASF) under one
- * or more contributor license agreements. See the NOTICE file
- * distributed with this work for additional information
- * regarding copyright ownership. The ASF licenses this file
- * to you under the Apache License, Version 2.0 (the
- * "License"); you may not use this file except in compliance
- * with the License. You may obtain a copy of the License at
- *
- * http://www.apache.org/licenses/LICENSE-2.0
- *
- * Unless required by applicable law or agreed to in writing,
- * software distributed under the License is distributed on an
- * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
- * KIND, either express or implied. See the License for the
- * specific language governing permissions and limitations
- * under the License.
- */
-
-/*!
- * \file SymbolGeneratorFactory.java - * \brief A factory used to create a generator for a given layer type - */ - -package io.mxnet.caffetranslator; - -import java.util.HashMap; -import java.util.Map; - -public class SymbolGeneratorFactory { - - private static SymbolGeneratorFactory instance = new SymbolGeneratorFactory(); - Map generators; - - public static SymbolGeneratorFactory getInstance() { - return instance; - } - - private SymbolGeneratorFactory() { - if (instance != null) { - throw new IllegalStateException("SymbolGeneratorFactory already instantiated"); - } - generators = new HashMap<>(); - } - - public SymbolGenerator getGenerator(String symbolType) { - return generators.get(symbolType); - } - - public void addGenerator(String symbolType, SymbolGenerator generator) { - generators.put(symbolType, generator); - } -} diff --git a/tools/caffe_translator/src/main/java/io/mxnet/caffetranslator/Utils.java b/tools/caffe_translator/src/main/java/io/mxnet/caffetranslator/Utils.java deleted file mode 100644 index 0b006b1ad148..000000000000 --- a/tools/caffe_translator/src/main/java/io/mxnet/caffetranslator/Utils.java +++ /dev/null @@ -1,42 +0,0 @@ -/* - * Licensed to the Apache Software Foundation (ASF) under one - * or more contributor license agreements. See the NOTICE file - * distributed with this work for additional information - * regarding copyright ownership. The ASF licenses this file - * to you under the Apache License, Version 2.0 (the - * "License"); you may not use this file except in compliance - * with the License. You may obtain a copy of the License at - * - * http://www.apache.org/licenses/LICENSE-2.0 - * - * Unless required by applicable law or agreed to in writing, - * software distributed under the License is distributed on an - * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY - * KIND, either express or implied. See the License for the - * specific language governing permissions and limitations - * under the License. - */ - -/*! - * \file Utils.java - * \brief General util functions - */ - -package io.mxnet.caffetranslator; - -import java.util.Collections; - -public class Utils { - public static String indent(String str, int level, boolean useSpaces, int numSpaces) { - String prefix; - if (!useSpaces) { - prefix = String.join("", Collections.nCopies(level, "\t")); - } else { - String spaces = String.join("", Collections.nCopies(numSpaces, " ")); - prefix = String.join("", Collections.nCopies(level, spaces)); - } - - String indented = str.replaceAll("(?m)^", prefix); - return indented; - } -} diff --git a/tools/caffe_translator/src/main/java/io/mxnet/caffetranslator/generators/AccuracyMetricsGenerator.java b/tools/caffe_translator/src/main/java/io/mxnet/caffetranslator/generators/AccuracyMetricsGenerator.java deleted file mode 100644 index d1f185febdc9..000000000000 --- a/tools/caffe_translator/src/main/java/io/mxnet/caffetranslator/generators/AccuracyMetricsGenerator.java +++ /dev/null @@ -1,83 +0,0 @@ -/* - * Licensed to the Apache Software Foundation (ASF) under one - * or more contributor license agreements. See the NOTICE file - * distributed with this work for additional information - * regarding copyright ownership. The ASF licenses this file - * to you under the Apache License, Version 2.0 (the - * "License"); you may not use this file except in compliance - * with the License. 
You may obtain a copy of the License at - * - * http://www.apache.org/licenses/LICENSE-2.0 - * - * Unless required by applicable law or agreed to in writing, - * software distributed under the License is distributed on an - * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY - * KIND, either express or implied. See the License for the - * specific language governing permissions and limitations - * under the License. - */ - -/*! - * \file AccuracyMetricsGenerator.java - * \brief Generate Accuracy metric - */ - -package io.mxnet.caffetranslator.generators; - -import io.mxnet.caffetranslator.GenerationHelper; -import io.mxnet.caffetranslator.Layer; -import io.mxnet.caffetranslator.MLModel; -import org.stringtemplate.v4.ST; - -import java.util.HashMap; -import java.util.Map; - -public class AccuracyMetricsGenerator { - - private final Map map; - private final GenerationHelper gh; - - public AccuracyMetricsGenerator() { - map = new HashMap<>(); - gh = new GenerationHelper(); - } - - public String generate(MLModel model) { - StringBuilder out = new StringBuilder(); - generateMap(model); - - for (Layer layer : model.getLayerList()) { - if (layer.getType().equals("Accuracy")) { - ST st; - if (layer.getAttr("accuracy_param.top_k", "1").equals("1")) { - st = gh.getTemplate("accuracy"); - } else { - st = gh.getTemplate("top_k_accuracy"); - st.add("k", layer.getAttr("accuracy_param.top_k")); - } - - st.add("var", gh.getVarname(layer.getTop())); - String outputName = map.get(layer.getBottoms().get(0)) + "_output"; - st.add("output_name", outputName); - st.add("label_name", layer.getBottoms().get(1)); - st.add("name", layer.getName()); - - out.append(st.render()); - out.append(System.lineSeparator()); - } - } - - return out.toString(); - } - - private void generateMap(MLModel model) { - for (Layer layer : model.getLayerList()) { - // If this is not SoftmaxWithLoss, move on - if (!layer.getType().equals("SoftmaxWithLoss")) { - continue; - } - - map.put(layer.getBottoms().get(0), layer.getName()); - } - } -} diff --git a/tools/caffe_translator/src/main/java/io/mxnet/caffetranslator/generators/BaseGenerator.java b/tools/caffe_translator/src/main/java/io/mxnet/caffetranslator/generators/BaseGenerator.java deleted file mode 100644 index 0d7fc059d148..000000000000 --- a/tools/caffe_translator/src/main/java/io/mxnet/caffetranslator/generators/BaseGenerator.java +++ /dev/null @@ -1,60 +0,0 @@ -/* - * Licensed to the Apache Software Foundation (ASF) under one - * or more contributor license agreements. See the NOTICE file - * distributed with this work for additional information - * regarding copyright ownership. The ASF licenses this file - * to you under the Apache License, Version 2.0 (the - * "License"); you may not use this file except in compliance - * with the License. You may obtain a copy of the License at - * - * http://www.apache.org/licenses/LICENSE-2.0 - * - * Unless required by applicable law or agreed to in writing, - * software distributed under the License is distributed on an - * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY - * KIND, either express or implied. See the License for the - * specific language governing permissions and limitations - * under the License. - */ - -/*! 
- * \file BaseGenerator.java - * \brief Base class for all source generators - */ - -package io.mxnet.caffetranslator.generators; - -import io.mxnet.caffetranslator.GenerationHelper; -import io.mxnet.caffetranslator.SymbolGenerator; -import org.stringtemplate.v4.ST; - -import java.util.List; - -public abstract class BaseGenerator implements SymbolGenerator { - - protected final GenerationHelper gh; - - public BaseGenerator() { - gh = new GenerationHelper(); - } - - protected ST getTemplate(String name) { - return gh.getTemplate(name); - } - - protected String generateVar(String varName, String symName, String lr_mult, String wd_mult, String init, List shape) { - ST st = getTemplate("var"); - st.add("var", varName); - st.add("name", symName); - - st.add("lr_mult", (lr_mult == null) ? "None" : lr_mult); - st.add("wd_mult", (wd_mult == null) ? "None" : wd_mult); - st.add("init", (init == null) ? "None" : init); - if (shape != null) { - st.add("shape", shape); - } - - return st.render(); - } - -} diff --git a/tools/caffe_translator/src/main/java/io/mxnet/caffetranslator/generators/BatchNormGenerator.java b/tools/caffe_translator/src/main/java/io/mxnet/caffetranslator/generators/BatchNormGenerator.java deleted file mode 100644 index 503bd3eeb001..000000000000 --- a/tools/caffe_translator/src/main/java/io/mxnet/caffetranslator/generators/BatchNormGenerator.java +++ /dev/null @@ -1,65 +0,0 @@ -/* - * Licensed to the Apache Software Foundation (ASF) under one - * or more contributor license agreements. See the NOTICE file - * distributed with this work for additional information - * regarding copyright ownership. The ASF licenses this file - * to you under the Apache License, Version 2.0 (the - * "License"); you may not use this file except in compliance - * with the License. You may obtain a copy of the License at - * - * http://www.apache.org/licenses/LICENSE-2.0 - * - * Unless required by applicable law or agreed to in writing, - * software distributed under the License is distributed on an - * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY - * KIND, either express or implied. See the License for the - * specific language governing permissions and limitations - * under the License. - */ - -/*! - * \file BatchNormGenerator.java - * \brief Generate BatchNorm layer - */ - -package io.mxnet.caffetranslator.generators; - -import io.mxnet.caffetranslator.GeneratorOutput; -import io.mxnet.caffetranslator.Layer; -import io.mxnet.caffetranslator.MLModel; -import org.stringtemplate.v4.ST; - -public class BatchNormGenerator extends BaseGenerator { - @Override - public GeneratorOutput generate(Layer layer, MLModel model) { - ST st = getTemplate("batchnorm"); - - gh.fillNameDataAndVar(st, layer); - - if (layer.attrEquals("batch_norm_param.use_global_stats", "true")) { - st.add("use_global_stats", true); - } - - int layerIndex = layer.getLayerIndex(); - Layer nextLayer = model.getLayerList().get(layerIndex + 1); - - boolean nextLayerIsScale = false; - if (nextLayer.getType().toLowerCase().equals("scale")) { - String axis = nextLayer.getAttr("ScaleParameter.axis", "1"); - String numAxis = nextLayer.getAttr("ScaleParameter.num_axes", "1"); - if (axis.equals("1") && numAxis.equals("1")) { - String biasTerm = nextLayer.getAttr("ScaleParameter.bias_term", "false"); - if (biasTerm.toLowerCase().equals("false")) { - nextLayerIsScale = true; - } - } - } - - if (!nextLayerIsScale) { - st.add("fix_beta", true); - st.add("fix_gamma", true); - } - - return new GeneratorOutput(st.render(), nextLayerIsScale ? 
2 : 1); - } -} diff --git a/tools/caffe_translator/src/main/java/io/mxnet/caffetranslator/generators/ConcatGenerator.java b/tools/caffe_translator/src/main/java/io/mxnet/caffetranslator/generators/ConcatGenerator.java deleted file mode 100644 index c9a57944f86b..000000000000 --- a/tools/caffe_translator/src/main/java/io/mxnet/caffetranslator/generators/ConcatGenerator.java +++ /dev/null @@ -1,49 +0,0 @@ -/* - * Licensed to the Apache Software Foundation (ASF) under one - * or more contributor license agreements. See the NOTICE file - * distributed with this work for additional information - * regarding copyright ownership. The ASF licenses this file - * to you under the Apache License, Version 2.0 (the - * "License"); you may not use this file except in compliance - * with the License. You may obtain a copy of the License at - * - * http://www.apache.org/licenses/LICENSE-2.0 - * - * Unless required by applicable law or agreed to in writing, - * software distributed under the License is distributed on an - * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY - * KIND, either express or implied. See the License for the - * specific language governing permissions and limitations - * under the License. - */ - -/*! - * \file ConcatGenerator.java - * \brief Generate Concat layer - */ - -package io.mxnet.caffetranslator.generators; - -import io.mxnet.caffetranslator.GeneratorOutput; -import io.mxnet.caffetranslator.Layer; -import io.mxnet.caffetranslator.MLModel; -import org.stringtemplate.v4.ST; - -public class ConcatGenerator extends BaseGenerator { - - @Override - public GeneratorOutput generate(Layer layer, MLModel model) { - ST st = getTemplate("concat"); - - st.add("name", layer.getName()); - st.add("var", gh.getVarname(layer.getTop())); - st.add("data", gh.getVarNames(layer.getBottoms())); - - String dim = layer.getAttr("concat_param.axis"); - if (dim != null) { - st.add("dim", dim); - } - - return new GeneratorOutput(st.render(), 1); - } -} diff --git a/tools/caffe_translator/src/main/java/io/mxnet/caffetranslator/generators/ConvolutionGenerator.java b/tools/caffe_translator/src/main/java/io/mxnet/caffetranslator/generators/ConvolutionGenerator.java deleted file mode 100644 index eda59e550a84..000000000000 --- a/tools/caffe_translator/src/main/java/io/mxnet/caffetranslator/generators/ConvolutionGenerator.java +++ /dev/null @@ -1,101 +0,0 @@ -/* - * Licensed to the Apache Software Foundation (ASF) under one - * or more contributor license agreements. See the NOTICE file - * distributed with this work for additional information - * regarding copyright ownership. The ASF licenses this file - * to you under the Apache License, Version 2.0 (the - * "License"); you may not use this file except in compliance - * with the License. You may obtain a copy of the License at - * - * http://www.apache.org/licenses/LICENSE-2.0 - * - * Unless required by applicable law or agreed to in writing, - * software distributed under the License is distributed on an - * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY - * KIND, either express or implied. See the License for the - * specific language governing permissions and limitations - * under the License. - */ - -/*! 
- * \file ConvolutionGenerator.java
- * \brief Generate Convolution layer
- */
-
-package io.mxnet.caffetranslator.generators;
-
-import io.mxnet.caffetranslator.GeneratorOutput;
-import io.mxnet.caffetranslator.Layer;
-import io.mxnet.caffetranslator.MLModel;
-import org.stringtemplate.v4.ST;
-
-import java.util.Map;
-
-public class ConvolutionGenerator extends BaseGenerator {
-
-    @Override
-    public GeneratorOutput generate(Layer layer, MLModel model) {
-        StringBuilder out = new StringBuilder();
-
-        ST st = getTemplate("convolution");
-        gh.fillNameDataAndVar(st, layer);
-
-        // Set kernel size
-        gh.simpleFillTemplate(st, "kernel_h", layer, "convolution_param.kernel_h", null,
-                "convolution_param.kernel_size");
-        gh.simpleFillTemplate(st, "kernel_w", layer, "convolution_param.kernel_w", null,
-                "convolution_param.kernel_size");
-
-        // Set stride
-        gh.simpleFillTemplate(st, "stride_h", layer, "convolution_param.stride_h", "1",
-                "convolution_param.stride");
-        gh.simpleFillTemplate(st, "stride_w", layer, "convolution_param.stride_w", "1",
-                "convolution_param.stride");
-
-        // Set padding
-        gh.simpleFillTemplate(st, "pad_h", layer, "convolution_param.pad_h", "0",
-                "convolution_param.pad");
-        gh.simpleFillTemplate(st, "pad_w", layer, "convolution_param.pad_w", "0",
-                "convolution_param.pad");
-
-        // Use bias?
-        if (layer.attrEquals("convolution_param.bias_term", "false")) {
-            st.add("no_bias", "NoBiasPlease"); //value doesn't matter
-        }
-
-        // Number of channels in output
-        gh.simpleFillTemplate(st, "num_filter", layer, "convolution_param.num_output", null);
-
-        String weightInit = gh.getInit(
-                layer.getAttr("convolution_param.weight_filler.type"),
-                layer.getAttr("convolution_param.weight_filler.value"));
-
-        String biasInit = gh.getInit(
-                layer.getAttr("convolution_param.bias_filler.type"),
-                layer.getAttr("convolution_param.bias_filler.value"));
-
-        if (weightInit != null || layer.getParams().size() >= 1) {
-            Map<String, String> param = layer.getParams().get(0);
-            out.append(
-                    generateVar("weight", layer.getName() + "_weight",
-                            param.get("param.lr_mult"), param.get("param.decay_mult"),
-                            weightInit, null)
-            );
-            st.add("weight", "weight");
-        }
-
-        if (biasInit != null || layer.getParams().size() >= 2) {
-            Map<String, String> param = layer.getParams().get(1);
-            out.append(
-                    generateVar("bias", layer.getName() + "_bias",
-                            param.get("param.lr_mult"), param.get("param.decay_mult"),
-                            biasInit, null)
-            );
-            st.add("bias", "bias");
-        }
-
-        out.append(st.render());
-        return new GeneratorOutput(out.toString(), 1);
-    }
-}
-
diff --git a/tools/caffe_translator/src/main/java/io/mxnet/caffetranslator/generators/DeconvolutionGenerator.java b/tools/caffe_translator/src/main/java/io/mxnet/caffetranslator/generators/DeconvolutionGenerator.java
deleted file mode 100644
index 5e79fed70f4b..000000000000
--- a/tools/caffe_translator/src/main/java/io/mxnet/caffetranslator/generators/DeconvolutionGenerator.java
+++ /dev/null
@@ -1,103 +0,0 @@
-/*
- * Licensed to the Apache Software Foundation (ASF) under one
- * or more contributor license agreements. See the NOTICE file
- * distributed with this work for additional information
- * regarding copyright ownership. The ASF licenses this file
- * to you under the Apache License, Version 2.0 (the
- * "License"); you may not use this file except in compliance
- * with the License. You may obtain a copy of the License at
- *
- * http://www.apache.org/licenses/LICENSE-2.0
- *
- * Unless required by applicable law or agreed to in writing,
- * software distributed under the License is distributed on an
- * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
- * KIND, either express or implied. See the License for the
- * specific language governing permissions and limitations
- * under the License.
- */
-
-/*!
- * \file DeconvolutionGenerator.java
- * \brief Generate Deconvolution layer
- */
-
-package io.mxnet.caffetranslator.generators;
-
-import io.mxnet.caffetranslator.GeneratorOutput;
-import io.mxnet.caffetranslator.Layer;
-import io.mxnet.caffetranslator.MLModel;
-import org.stringtemplate.v4.ST;
-
-import java.util.Map;
-
-public class DeconvolutionGenerator extends BaseGenerator {
-
-    @Override
-    public GeneratorOutput generate(Layer layer, MLModel model) {
-        StringBuilder out = new StringBuilder();
-        ST st = getTemplate("deconvolution");
-        gh.fillNameDataAndVar(st, layer);
-
-        // Set kernel size
-        gh.simpleFillTemplate(st, "kernel_h", layer, "convolution_param.kernel_h", null,
-                "convolution_param.kernel_size");
-        gh.simpleFillTemplate(st, "kernel_w", layer, "convolution_param.kernel_w", null,
-                "convolution_param.kernel_size");
-
-        // Set stride
-        gh.simpleFillTemplate(st, "stride_h", layer, "convolution_param.stride_h", "1",
-                "convolution_param.stride");
-        gh.simpleFillTemplate(st, "stride_w", layer, "convolution_param.stride_w", "1",
-                "convolution_param.stride");
-
-        // Set padding
-        gh.simpleFillTemplate(st, "pad_h", layer, "convolution_param.pad_h", "0",
-                "convolution_param.pad");
-        gh.simpleFillTemplate(st, "pad_w", layer, "convolution_param.pad_w", "0",
-                "convolution_param.pad");
-
-        // Use bias?
-        if (layer.attrEquals("convolution_param.bias_term", "false")) {
-            st.add("no_bias", "NoBiasPlease");
-        }
-
-        // Number of channels in output
-        gh.simpleFillTemplate(st, "num_filter", layer, "convolution_param.num_output", null);
-
-        // Group
-        gh.simpleFillTemplate(st, "group", layer, "convolution_param.group", "PP_REMOVE");
-
-
-        // Custom weight and bias if needed
-        String weightInit = gh.getInit(
-                layer.getAttr("convolution_param.weight_filler.type"),
-                layer.getAttr("convolution_param.weight_filler.value"));
-
-        String biasInit = gh.getInit(
-                layer.getAttr("convolution_param.bias_filler.type"),
-                layer.getAttr("convolution_param.bias_filler.value"));
-
-        if (weightInit != null || layer.getParams().size() >= 1) {
-            Map<String, String> param = layer.getParams().get(0);
-            out.append(
-                    generateVar("weight", layer.getName() + "_weight",
-                            param.get("param.lr_mult"), param.get("param.decay_mult"),
-                            weightInit, null)
-            );
-            st.add("weight", "weight");
-        }
-
-        if (biasInit != null || layer.getParams().size() >= 2) {
-            Map<String, String> param = layer.getParams().get(1);
-            out.append(
-                    generateVar("bias", layer.getName() + "_bias",
-                            param.get("param.lr_mult"), param.get("param.decay_mult"),
-                            biasInit, null)
-            );
-        }
-
-        out.append(st.render());
-        return new GeneratorOutput(out.toString(), 1);
-    }
-}
diff --git a/tools/caffe_translator/src/main/java/io/mxnet/caffetranslator/generators/DropoutGenerator.java b/tools/caffe_translator/src/main/java/io/mxnet/caffetranslator/generators/DropoutGenerator.java
deleted file mode 100644
index 198f3b0cc09a..000000000000
--- a/tools/caffe_translator/src/main/java/io/mxnet/caffetranslator/generators/DropoutGenerator.java
+++ /dev/null
@@ -1,43 +0,0 @@
-/*
- * Licensed to the Apache Software Foundation (ASF) under one
- * or more contributor license agreements. See the NOTICE file
- * distributed with this work for additional information
- * regarding copyright ownership. The ASF licenses this file
- * to you under the Apache License, Version 2.0 (the
- * "License"); you may not use this file except in compliance
- * with the License. You may obtain a copy of the License at
- *
- * http://www.apache.org/licenses/LICENSE-2.0
- *
- * Unless required by applicable law or agreed to in writing,
- * software distributed under the License is distributed on an
- * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
- * KIND, either express or implied. See the License for the
- * specific language governing permissions and limitations
- * under the License.
- */
-
-/*!
- * \file DropoutGenerator.java
- * \brief Generate Dropout layer
- */
-
-package io.mxnet.caffetranslator.generators;
-
-import io.mxnet.caffetranslator.GeneratorOutput;
-import io.mxnet.caffetranslator.Layer;
-import io.mxnet.caffetranslator.MLModel;
-import org.stringtemplate.v4.ST;
-
-public class DropoutGenerator extends BaseGenerator {
-
-    @Override
-    public GeneratorOutput generate(Layer layer, MLModel model) {
-        ST st = getTemplate("dropout");
-        gh.fillNameDataAndVar(st, layer);
-
-        gh.simpleFillTemplate(st, "prob", layer, "dropout_param.dropout_ratio", "0.5");
-
-        return new GeneratorOutput(st.render(), 1);
-    }
-}
diff --git a/tools/caffe_translator/src/main/java/io/mxnet/caffetranslator/generators/EltwiseGenerator.java b/tools/caffe_translator/src/main/java/io/mxnet/caffetranslator/generators/EltwiseGenerator.java
deleted file mode 100644
index edd5765ebeca..000000000000
--- a/tools/caffe_translator/src/main/java/io/mxnet/caffetranslator/generators/EltwiseGenerator.java
+++ /dev/null
@@ -1,69 +0,0 @@
-/*
- * Licensed to the Apache Software Foundation (ASF) under one
- * or more contributor license agreements. See the NOTICE file
- * distributed with this work for additional information
- * regarding copyright ownership. The ASF licenses this file
- * to you under the Apache License, Version 2.0 (the
- * "License"); you may not use this file except in compliance
- * with the License. You may obtain a copy of the License at
- *
- * http://www.apache.org/licenses/LICENSE-2.0
- *
- * Unless required by applicable law or agreed to in writing,
- * software distributed under the License is distributed on an
- * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
- * KIND, either express or implied. See the License for the
- * specific language governing permissions and limitations
- * under the License.
- */
-
-/*!
- * \file EltwiseGenerator.java
- * \brief Generate Eltwise layer
- */
-
-package io.mxnet.caffetranslator.generators;
-
-import io.mxnet.caffetranslator.GeneratorOutput;
-import io.mxnet.caffetranslator.Layer;
-import io.mxnet.caffetranslator.MLModel;
-import org.stringtemplate.v4.ST;
-
-import java.util.List;
-
-public class EltwiseGenerator extends BaseGenerator {
-
-    @Override
-    public GeneratorOutput generate(Layer layer, MLModel model) {
-        String operation = layer.getAttr("eltwise_param.operation");
-        if (operation == null) {
-            operation = "SUM";
-        }
-
-        ST st;
-        switch (operation) {
-            case "SUM":
-                st = getTemplate("add");
-                break;
-            case "PROD":
-                st = getTemplate("mul");
-                break;
-            case "MAX":
-                st = getTemplate("maximum");
-                break;
-            default:
-                String error = "Unrecognized operation " + operation + " in Eltwise" + System.lineSeparator();
-                System.err.print(error);
-                return new GeneratorOutput(error, 1);
-        }
-
-        st.add("name", layer.getName());
-        st.add("var", gh.getVarname(layer.getTop()));
-
-        List<String> data = gh.getVarNames(layer.getBottoms());
-        st.add("data1", data.get(0));
-        st.add("data2", data.get(1));
-
-        return new GeneratorOutput(st.render(), 1);
-    }
-}
diff --git a/tools/caffe_translator/src/main/java/io/mxnet/caffetranslator/generators/FCGenerator.java b/tools/caffe_translator/src/main/java/io/mxnet/caffetranslator/generators/FCGenerator.java
deleted file mode 100644
index 753b8743b45c..000000000000
--- a/tools/caffe_translator/src/main/java/io/mxnet/caffetranslator/generators/FCGenerator.java
+++ /dev/null
@@ -1,82 +0,0 @@
-/*
- * Licensed to the Apache Software Foundation (ASF) under one
- * or more contributor license agreements. See the NOTICE file
- * distributed with this work for additional information
- * regarding copyright ownership. The ASF licenses this file
- * to you under the Apache License, Version 2.0 (the
- * "License"); you may not use this file except in compliance
- * with the License. You may obtain a copy of the License at
- *
- * http://www.apache.org/licenses/LICENSE-2.0
- *
- * Unless required by applicable law or agreed to in writing,
- * software distributed under the License is distributed on an
- * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
- * KIND, either express or implied. See the License for the
- * specific language governing permissions and limitations
- * under the License.
- */
-
-/*!
- * \file FCGenerator.java
- * \brief Generate fully connected layer
- */
-
-package io.mxnet.caffetranslator.generators;
-
-import io.mxnet.caffetranslator.GeneratorOutput;
-import io.mxnet.caffetranslator.Layer;
-import io.mxnet.caffetranslator.MLModel;
-import org.stringtemplate.v4.ST;
-
-import java.util.Map;
-
-public class FCGenerator extends BaseGenerator {
-
-    @Override
-    public GeneratorOutput generate(Layer layer, MLModel model) {
-        StringBuilder out = new StringBuilder();
-
-        ST st = getTemplate("fc");
-
-        gh.fillNameDataAndVar(st, layer);
-
-        gh.simpleFillTemplate(st, "num", layer, "inner_product_param.num_output", null);
-
-        if (layer.attrEquals("inner_product_param.bias_term", "false")) {
-            st.add("no_bias", "NoBiasPlease"); //value doesn't matter
-        }
-
-        String weightInit = gh.getInit(
-                layer.getAttr("inner_product_param.weight_filler.type"),
-                layer.getAttr("inner_product_param.weight_filler.value"));
-
-        String biasInit = gh.getInit(
-                layer.getAttr("inner_product_param.bias_filler.type"),
-                layer.getAttr("inner_product_param.bias_filler.value"));
-
-        if (weightInit != null || layer.getParams().size() >= 1) {
-            Map<String, String> param = layer.getParams().get(0);
-            out.append(
-                    generateVar("weight", layer.getName() + "_weight",
-                            param.get("param.lr_mult"), param.get("param.decay_mult"),
-                            weightInit, null)
-            );
-            st.add("weight", "weight");
-        }
-
-        if (biasInit != null || layer.getParams().size() >= 2) {
-            Map<String, String> param = layer.getParams().get(1);
-            out.append(
-                    generateVar("bias", layer.getName() + "_bias",
-                            param.get("param.lr_mult"), param.get("param.decay_mult"),
-                            biasInit, null)
-            );
-            st.add("bias", "bias");
-        }
-
-        out.append(st.render());
-
-        return gh.makeGeneratorOutput(out.toString(), 1);
-    }
-}
diff --git a/tools/caffe_translator/src/main/java/io/mxnet/caffetranslator/generators/FlattenGenerator.java b/tools/caffe_translator/src/main/java/io/mxnet/caffetranslator/generators/FlattenGenerator.java
deleted file mode 100644
index 5eb8a1336819..000000000000
--- a/tools/caffe_translator/src/main/java/io/mxnet/caffetranslator/generators/FlattenGenerator.java
+++ /dev/null
@@ -1,49 +0,0 @@
-/*
- * Licensed to the Apache Software Foundation (ASF) under one
- * or more contributor license agreements. See the NOTICE file
- * distributed with this work for additional information
- * regarding copyright ownership. The ASF licenses this file
- * to you under the Apache License, Version 2.0 (the
- * "License"); you may not use this file except in compliance
- * with the License. You may obtain a copy of the License at
- *
- * http://www.apache.org/licenses/LICENSE-2.0
- *
- * Unless required by applicable law or agreed to in writing,
- * software distributed under the License is distributed on an
- * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
- * KIND, either express or implied. See the License for the
- * specific language governing permissions and limitations
- * under the License.
- */
-
-/*!
- * \file FlattenGenerator.java
- * \brief Generate flatten layer
- */
-
-package io.mxnet.caffetranslator.generators;
-
-import io.mxnet.caffetranslator.GeneratorOutput;
-import io.mxnet.caffetranslator.Layer;
-import io.mxnet.caffetranslator.MLModel;
-import org.stringtemplate.v4.ST;
-
-public class FlattenGenerator extends BaseGenerator {
-
-    @Override
-    public GeneratorOutput generate(Layer layer, MLModel model) {
-
-        ST st = getTemplate("flatten");
-        gh.fillNameDataAndVar(st, layer);
-
-        String axis = layer.getAttr("flatten_param.axis");
-        if (axis != null && Integer.valueOf(axis) != 1) {
-            String error = "Axis other than 1 is not supported for flatten" + System.lineSeparator();
-            System.err.println(error);
-            return new GeneratorOutput(error, 1);
-        }
-
-        return new GeneratorOutput(st.render(), 1);
-    }
-}
diff --git a/tools/caffe_translator/src/main/java/io/mxnet/caffetranslator/generators/PermuteGenerator.java b/tools/caffe_translator/src/main/java/io/mxnet/caffetranslator/generators/PermuteGenerator.java
deleted file mode 100644
index f7383bc3518d..000000000000
--- a/tools/caffe_translator/src/main/java/io/mxnet/caffetranslator/generators/PermuteGenerator.java
+++ /dev/null
@@ -1,48 +0,0 @@
-/*
- * Licensed to the Apache Software Foundation (ASF) under one
- * or more contributor license agreements. See the NOTICE file
- * distributed with this work for additional information
- * regarding copyright ownership. The ASF licenses this file
- * to you under the Apache License, Version 2.0 (the
- * "License"); you may not use this file except in compliance
- * with the License. You may obtain a copy of the License at
- *
- * http://www.apache.org/licenses/LICENSE-2.0
- *
- * Unless required by applicable law or agreed to in writing,
- * software distributed under the License is distributed on an
- * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
- * KIND, either express or implied. See the License for the
- * specific language governing permissions and limitations
- * under the License.
- */
-
-/*!
- * \file PermuteGenerator.java
- * \brief Generate Permute layer
- */
-
-package io.mxnet.caffetranslator.generators;
-
-import io.mxnet.caffetranslator.GeneratorOutput;
-import io.mxnet.caffetranslator.Layer;
-import io.mxnet.caffetranslator.MLModel;
-import org.stringtemplate.v4.ST;
-
-import java.util.List;
-
-public class PermuteGenerator extends BaseGenerator {
-
-    @Override
-    public GeneratorOutput generate(Layer layer, MLModel model) {
-        ST st = getTemplate("permute");
-        gh.fillNameDataAndVar(st, layer);
-
-        List<String> axes = layer.getAttrList("permute_param.order");
-        if (axes != null) {
-            st.add("axes", axes);
-        }
-
-        return new GeneratorOutput(st.render(), 1);
-    }
-}
diff --git a/tools/caffe_translator/src/main/java/io/mxnet/caffetranslator/generators/PluginIntLayerGenerator.java b/tools/caffe_translator/src/main/java/io/mxnet/caffetranslator/generators/PluginIntLayerGenerator.java
deleted file mode 100644
index 048b5373ae9d..000000000000
--- a/tools/caffe_translator/src/main/java/io/mxnet/caffetranslator/generators/PluginIntLayerGenerator.java
+++ /dev/null
@@ -1,80 +0,0 @@
-/*
- * Licensed to the Apache Software Foundation (ASF) under one
- * or more contributor license agreements. See the NOTICE file
- * distributed with this work for additional information
- * regarding copyright ownership. The ASF licenses this file
- * to you under the Apache License, Version 2.0 (the
- * "License"); you may not use this file except in compliance
- * with the License. You may obtain a copy of the License at
- *
- * http://www.apache.org/licenses/LICENSE-2.0
- *
- * Unless required by applicable law or agreed to in writing,
- * software distributed under the License is distributed on an
- * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
- * KIND, either express or implied. See the License for the
- * specific language governing permissions and limitations
- * under the License.
- */
-
-/*!
- * \file PluginIntLayerGenerator.java
- * \brief Generate a layer using Caffe Plugin
- */
-
-package io.mxnet.caffetranslator.generators;
-
-import io.mxnet.caffetranslator.GeneratorOutput;
-import io.mxnet.caffetranslator.Layer;
-import io.mxnet.caffetranslator.MLModel;
-import org.stringtemplate.v4.ST;
-
-public class PluginIntLayerGenerator extends BaseGenerator {
-
-    private PluginLayerHelper helper;
-
-
-    public PluginIntLayerGenerator() {
-        super();
-        helper = new PluginLayerHelper();
-    }
-
-    @Override
-    public GeneratorOutput generate(Layer layer, MLModel model) {
-        return generate(layer, model, 0);
-    }
-
-    public GeneratorOutput generate(Layer layer, MLModel model, int num_weight) {
-        ST st = getTemplate("CaffePluginIntLayer");
-
-        st.add("name", layer.getName());
-
-        if (layer.getBottoms().size() != 1) {
-            st.add("num_data", layer.getBottoms().size());
-        }
-        if (layer.getTops().size() != 1) {
-            st.add("num_out", layer.getTops().size());
-        }
-        if (num_weight != 0) {
-            st.add("num_weight", num_weight);
-        }
-
-        String dataList = helper.getDataList(layer);
-        st.add("data", dataList);
-
-        // Set prototxt
-        String prototxt = helper.makeOneLine(layer.getPrototxt());
-        st.add("prototxt", prototxt);
-
-        // Handle multiple outputs
-        if (layer.getTops().size() > 1) {
-            st.add("tops", layer.getTops());
-            st.add("var", "out");
-        } else if (layer.getTops().size() == 1) {
-            st.add("var", gh.getVarname(layer.getTop()));
-        }
-
-        return new GeneratorOutput(st.render(), 1);
-    }
-
-}
diff --git a/tools/caffe_translator/src/main/java/io/mxnet/caffetranslator/generators/PluginLayerHelper.java b/tools/caffe_translator/src/main/java/io/mxnet/caffetranslator/generators/PluginLayerHelper.java
deleted file mode 100644
index 3b8506c55b0f..000000000000
--- a/tools/caffe_translator/src/main/java/io/mxnet/caffetranslator/generators/PluginLayerHelper.java
+++ /dev/null
@@ -1,63 +0,0 @@
-/*
- * Licensed to the Apache Software Foundation (ASF) under one
- * or more contributor license agreements. See the NOTICE file
- * distributed with this work for additional information
- * regarding copyright ownership. The ASF licenses this file
- * to you under the Apache License, Version 2.0 (the
- * "License"); you may not use this file except in compliance
- * with the License. You may obtain a copy of the License at
- *
- * http://www.apache.org/licenses/LICENSE-2.0
- *
- * Unless required by applicable law or agreed to in writing,
- * software distributed under the License is distributed on an
- * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
- * KIND, either express or implied. See the License for the
- * specific language governing permissions and limitations
- * under the License.
- */
-
-/*!
- * \file PluginLayerHelper.java
- * \brief Helper class to generate layers using Caffe Plugin
- */
-
-package io.mxnet.caffetranslator.generators;
-
-import io.mxnet.caffetranslator.GenerationHelper;
-import io.mxnet.caffetranslator.Layer;
-
-public class PluginLayerHelper {
-
-    private final GenerationHelper gh;
-
-    public PluginLayerHelper() {
-        gh = new GenerationHelper();
-    }
-
-    public String getDataList(Layer layer) {
-        StringBuilder sb = new StringBuilder();
-        int index = 0;
-
-        if (layer.getBottoms().size() == 0) {
-            return null;
-        }
-
-        for (String bottom : layer.getBottoms()) {
-            sb.append("data_" + index + "=" + gh.getVarname(bottom) + ", ");
-            index++;
-        }
-        if (sb.length() > 0) {
-            sb.setLength(sb.length() - 2);
-        }
-        return sb.toString();
-    }
-
-    public String makeOneLine(String prototxt) {
-        prototxt = prototxt.replaceAll("\n", "").replaceAll("\r", "");
-        prototxt = prototxt.replaceAll("'", "\'");
-        prototxt = prototxt.replaceAll("\\s{2,}", " ").trim();
-        return prototxt;
-    }
-
-}
diff --git a/tools/caffe_translator/src/main/java/io/mxnet/caffetranslator/generators/PluginLossGenerator.java b/tools/caffe_translator/src/main/java/io/mxnet/caffetranslator/generators/PluginLossGenerator.java
deleted file mode 100644
index 5e98151977f9..000000000000
--- a/tools/caffe_translator/src/main/java/io/mxnet/caffetranslator/generators/PluginLossGenerator.java
+++ /dev/null
@@ -1,69 +0,0 @@
-/*
- * Licensed to the Apache Software Foundation (ASF) under one
- * or more contributor license agreements. See the NOTICE file
- * distributed with this work for additional information
- * regarding copyright ownership. The ASF licenses this file
- * to you under the Apache License, Version 2.0 (the
- * "License"); you may not use this file except in compliance
- * with the License. You may obtain a copy of the License at
- *
- * http://www.apache.org/licenses/LICENSE-2.0
- *
- * Unless required by applicable law or agreed to in writing,
- * software distributed under the License is distributed on an
- * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
- * KIND, either express or implied. See the License for the
- * specific language governing permissions and limitations
- * under the License.
- */
-
-/*!
- * \file PluginLossGenerator.java
- * \brief Generate loss layer using Caffe Plugin
- */
-
-package io.mxnet.caffetranslator.generators;
-
-import io.mxnet.caffetranslator.GeneratorOutput;
-import io.mxnet.caffetranslator.Layer;
-import io.mxnet.caffetranslator.MLModel;
-import org.stringtemplate.v4.ST;
-
-public class PluginLossGenerator extends BaseGenerator {
-
-    private final PluginLayerHelper helper;
-
-    public PluginLossGenerator() {
-        super();
-        helper = new PluginLayerHelper();
-    }
-
-    @Override
-    public GeneratorOutput generate(Layer layer, MLModel model) {
-        ST st = getTemplate("CaffePluginLossLayer");
-
-        st.add("name", layer.getName());
-
-        // Handle data
-        if (layer.getBottoms().size() != 1) {
-            st.add("num_data", layer.getBottoms().size());
-        }
-        String dataList = helper.getDataList(layer);
-        st.add("data", dataList);
-
-        // Set prototxt
-        String prototxt = helper.makeOneLine(layer.getPrototxt());
-        st.add("prototxt", prototxt);
-
-        // Handle multiple outputs
-        if (layer.getTops().size() > 1) {
-            st.add("tops", layer.getTops());
-            st.add("var", "out");
-        } else if (layer.getTops().size() == 1) {
-            st.add("var", layer.getTop());
-        }
-
-        return new GeneratorOutput(st.render(), 1);
-    }
-
-}
diff --git a/tools/caffe_translator/src/main/java/io/mxnet/caffetranslator/generators/PoolingGenerator.java b/tools/caffe_translator/src/main/java/io/mxnet/caffetranslator/generators/PoolingGenerator.java
deleted file mode 100644
index ad91f58c8de5..000000000000
--- a/tools/caffe_translator/src/main/java/io/mxnet/caffetranslator/generators/PoolingGenerator.java
+++ /dev/null
@@ -1,86 +0,0 @@
-/*
- * Licensed to the Apache Software Foundation (ASF) under one
- * or more contributor license agreements. See the NOTICE file
- * distributed with this work for additional information
- * regarding copyright ownership. The ASF licenses this file
- * to you under the Apache License, Version 2.0 (the
- * "License"); you may not use this file except in compliance
- * with the License. You may obtain a copy of the License at
- *
- * http://www.apache.org/licenses/LICENSE-2.0
- *
- * Unless required by applicable law or agreed to in writing,
- * software distributed under the License is distributed on an
- * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
- * KIND, either express or implied. See the License for the
- * specific language governing permissions and limitations
- * under the License.
- */
-
-/*!
- * \file PoolingGenerator.java
- * \brief Generate Pooling layer
- */
-
-package io.mxnet.caffetranslator.generators;
-
-import io.mxnet.caffetranslator.GeneratorOutput;
-import io.mxnet.caffetranslator.Layer;
-import io.mxnet.caffetranslator.MLModel;
-import org.stringtemplate.v4.ST;
-
-public class PoolingGenerator extends BaseGenerator {
-
-    @Override
-    public GeneratorOutput generate(Layer layer, MLModel model) {
-        ST st = getTemplate("pooling");
-
-        gh.fillNameDataAndVar(st, layer);
-
-        boolean globalPooling = layer.getAttr("pooling_param.global_pooling", "false")
-                .toLowerCase().equals("true");
-
-        if (globalPooling) {
-            st.add("global_pool", "True");
-            st.add("kernel_h", "1");
-            st.add("kernel_w", "1");
-        } else {
-            // Set kernel size
-            gh.simpleFillTemplate(st, "kernel_h", layer, "pooling_param.kernel_h", null,
-                    "pooling_param.kernel_size");
-            gh.simpleFillTemplate(st, "kernel_w", layer, "pooling_param.kernel_w", null,
-                    "pooling_param.kernel_size");
-        }
-
-        // Set stride
-        gh.simpleFillTemplate(st, "stride_h", layer, "pooling_param.stride_h", "1",
-                "pooling_param.stride");
-        gh.simpleFillTemplate(st, "stride_w", layer, "pooling_param.stride_w", "1",
-                "pooling_param.stride");
-
-        // Set padding
-        gh.simpleFillTemplate(st, "pad_h", layer, "pooling_param.pad_h", "0",
-                "pooling_param.pad");
-        gh.simpleFillTemplate(st, "pad_w", layer, "pooling_param.pad_w", "0",
-                "pooling_param.pad");
-
-        // Set type
-        String poolType = layer.getAttr("pooling_param.pool");
-        switch (poolType) {
-            case "MAX":
-                st.add("type", "max");
-                break;
-            case "AVE":
-                st.remove("type");
-                st.add("type", "avg");
-                break;
-            case "STOCHASTIC":
-                System.err.println("Stochastic pooling type not supported.");
-                st.add("type", "???");
-                break;
-        }
-
-        return new GeneratorOutput(st.render(), 1);
-    }
-
-}
diff --git a/tools/caffe_translator/src/main/java/io/mxnet/caffetranslator/generators/PowerGenerator.java b/tools/caffe_translator/src/main/java/io/mxnet/caffetranslator/generators/PowerGenerator.java
deleted file mode 100644
index d4676503ae82..000000000000
--- a/tools/caffe_translator/src/main/java/io/mxnet/caffetranslator/generators/PowerGenerator.java
+++ /dev/null
@@ -1,51 +0,0 @@
-/*
- * Licensed to the Apache Software Foundation (ASF) under one
- * or more contributor license agreements. See the NOTICE file
- * distributed with this work for additional information
- * regarding copyright ownership. The ASF licenses this file
- * to you under the Apache License, Version 2.0 (the
- * "License"); you may not use this file except in compliance
- * with the License. You may obtain a copy of the License at
- *
- * http://www.apache.org/licenses/LICENSE-2.0
- *
- * Unless required by applicable law or agreed to in writing,
- * software distributed under the License is distributed on an
- * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
- * KIND, either express or implied. See the License for the
- * specific language governing permissions and limitations
- * under the License.
- */
-
-/*!
- * \file PowerGenerator.java
- * \brief Generate Power layer
- */
-
-package io.mxnet.caffetranslator.generators;
-
-import io.mxnet.caffetranslator.GeneratorOutput;
-import io.mxnet.caffetranslator.Layer;
-import io.mxnet.caffetranslator.MLModel;
-import org.stringtemplate.v4.ST;
-
-public class PowerGenerator extends BaseGenerator {
-
-    @Override
-    public GeneratorOutput generate(Layer layer, MLModel model) {
-        ST st = getTemplate("power");
-
-        String power = layer.getAttr("power_param.power", "1");
-        String scale = layer.getAttr("power_param.scale", "1");
-        String shift = layer.getAttr("power_param.shift", "0");
-
-        st.add("var", gh.getVarname(layer.getTop()));
-        st.add("data", gh.getVarname(layer.getBottom()));
-
-        st.add("power", power);
-        st.add("scale", scale);
-        st.add("shift", shift);
-
-        return new GeneratorOutput(st.render(), 1);
-    }
-}
diff --git a/tools/caffe_translator/src/main/java/io/mxnet/caffetranslator/generators/ReluGenerator.java b/tools/caffe_translator/src/main/java/io/mxnet/caffetranslator/generators/ReluGenerator.java
deleted file mode 100644
index 37ac9a8cf4c9..000000000000
--- a/tools/caffe_translator/src/main/java/io/mxnet/caffetranslator/generators/ReluGenerator.java
+++ /dev/null
@@ -1,44 +0,0 @@
-/*
- * Licensed to the Apache Software Foundation (ASF) under one
- * or more contributor license agreements. See the NOTICE file
- * distributed with this work for additional information
- * regarding copyright ownership. The ASF licenses this file
- * to you under the Apache License, Version 2.0 (the
- * "License"); you may not use this file except in compliance
- * with the License. You may obtain a copy of the License at
- *
- * http://www.apache.org/licenses/LICENSE-2.0
- *
- * Unless required by applicable law or agreed to in writing,
- * software distributed under the License is distributed on an
- * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
- * KIND, either express or implied. See the License for the
- * specific language governing permissions and limitations
- * under the License.
- */
-
-/*!
- * \file ReluGenerator.java
- * \brief Generate Relu layer
- */
-
-package io.mxnet.caffetranslator.generators;
-
-import io.mxnet.caffetranslator.GeneratorOutput;
-import io.mxnet.caffetranslator.Layer;
-import io.mxnet.caffetranslator.MLModel;
-import org.stringtemplate.v4.ST;
-
-public class ReluGenerator extends BaseGenerator {
-
-    @Override
-    public GeneratorOutput generate(Layer layer, MLModel model) {
-        ST st = getTemplate("activation");
-
-        gh.fillNameDataAndVar(st, layer);
-        st.add("type", "relu");
-
-        return new GeneratorOutput(st.render(), 1);
-    }
-
-}
diff --git a/tools/caffe_translator/src/main/java/io/mxnet/caffetranslator/generators/ScaleGenerator.java b/tools/caffe_translator/src/main/java/io/mxnet/caffetranslator/generators/ScaleGenerator.java
deleted file mode 100644
index fc919e3306c7..000000000000
--- a/tools/caffe_translator/src/main/java/io/mxnet/caffetranslator/generators/ScaleGenerator.java
+++ /dev/null
@@ -1,66 +0,0 @@
-/*
- * Licensed to the Apache Software Foundation (ASF) under one
- * or more contributor license agreements. See the NOTICE file
- * distributed with this work for additional information
- * regarding copyright ownership. The ASF licenses this file
- * to you under the Apache License, Version 2.0 (the
- * "License"); you may not use this file except in compliance
- * with the License. You may obtain a copy of the License at
- *
- * http://www.apache.org/licenses/LICENSE-2.0
- *
- * Unless required by applicable law or agreed to in writing,
- * software distributed under the License is distributed on an
- * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
- * KIND, either express or implied. See the License for the
- * specific language governing permissions and limitations
- * under the License.
- */
-
-/*!
- * \file ScaleGenerator.java
- * \brief Generate Scale layer
- */
-
-
-package io.mxnet.caffetranslator.generators;
-
-import io.mxnet.caffetranslator.GeneratorOutput;
-import io.mxnet.caffetranslator.Layer;
-import io.mxnet.caffetranslator.MLModel;
-
-public class ScaleGenerator extends BaseGenerator {
-
-    @Override
-    public GeneratorOutput generate(Layer layer, MLModel model) {
-        PluginIntLayerGenerator generator = new PluginIntLayerGenerator();
-
-        boolean use_bias = layer.getAttr("scale_param.bias_term", "false").toLowerCase().equals("true");
-
-        StringBuilder out = new StringBuilder();
-
-        if (use_bias) {
-            out.append(generator.generate(layer, model, 2).code);
-        } else {
-            out.append(generator.generate(layer, model, 1).code);
-        }
-
-        String fillerType = layer.getAttr("filler.type");
-        String fillerValue = layer.getAttr("filler.value");
-        if (fillerType == null && fillerValue == null) {
-            fillerValue = "1";
-        }
-        out.append(gh.initializeParam(gh.getVarname(layer.getTop()), 1, gh.getInit(fillerType, fillerValue)));
-
-        if (use_bias) {
-            fillerType = layer.getAttr("bias_filler.type");
-            fillerValue = layer.getAttr("bias_filler.value");
-            if (fillerType == null && fillerValue == null) {
-                fillerValue = "0";
-            }
-            out.append(gh.initializeParam(gh.getVarname(layer.getTop()), 2, gh.getInit(fillerType, fillerValue)));
-        }
-
-        return gh.makeGeneratorOutput(out.toString(), 1);
-    }
-}
diff --git a/tools/caffe_translator/src/main/java/io/mxnet/caffetranslator/generators/SoftmaxOutputGenerator.java b/tools/caffe_translator/src/main/java/io/mxnet/caffetranslator/generators/SoftmaxOutputGenerator.java
deleted file mode 100644
index a017e4f6c64e..000000000000
--- a/tools/caffe_translator/src/main/java/io/mxnet/caffetranslator/generators/SoftmaxOutputGenerator.java
+++ /dev/null
@@ -1,43 +0,0 @@
-/*
- * Licensed to the Apache Software Foundation (ASF) under one
- * or more contributor license agreements. See the NOTICE file
- * distributed with this work for additional information
- * regarding copyright ownership. The ASF licenses this file
- * to you under the Apache License, Version 2.0 (the
- * "License"); you may not use this file except in compliance
- * with the License. You may obtain a copy of the License at
- *
- * http://www.apache.org/licenses/LICENSE-2.0
- *
- * Unless required by applicable law or agreed to in writing,
- * software distributed under the License is distributed on an
- * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
- * KIND, either express or implied. See the License for the
- * specific language governing permissions and limitations
- * under the License.
- */
-
-/*!
- * \file SoftmaxOutputGenerator.java
- * \brief Generate SoftmaxOutput layer
- */
-
-package io.mxnet.caffetranslator.generators;
-
-import io.mxnet.caffetranslator.GeneratorOutput;
-import io.mxnet.caffetranslator.Layer;
-import io.mxnet.caffetranslator.MLModel;
-import org.stringtemplate.v4.ST;
-
-public class SoftmaxOutputGenerator extends BaseGenerator {
-    @Override
-    public GeneratorOutput generate(Layer layer, MLModel model) {
-        ST st = getTemplate("softmaxoutput");
-        gh.fillNameDataAndVar(st, layer);
-
-        st.add("label", gh.getVarname(layer.getBottoms().get(1)));
-        st.add("label_name", layer.getBottoms().get(1));
-
-        return new GeneratorOutput(st.render(), 1);
-    }
-}
diff --git a/tools/caffe_translator/src/main/java/io/mxnet/caffetranslator/misc/CollectStats.java b/tools/caffe_translator/src/main/java/io/mxnet/caffetranslator/misc/CollectStats.java
deleted file mode 100644
index e38c2d06d9a8..000000000000
--- a/tools/caffe_translator/src/main/java/io/mxnet/caffetranslator/misc/CollectStats.java
+++ /dev/null
@@ -1,73 +0,0 @@
-/*
- * Licensed to the Apache Software Foundation (ASF) under one
- * or more contributor license agreements. See the NOTICE file
- * distributed with this work for additional information
- * regarding copyright ownership. The ASF licenses this file
- * to you under the Apache License, Version 2.0 (the
- * "License"); you may not use this file except in compliance
- * with the License. You may obtain a copy of the License at
- *
- * http://www.apache.org/licenses/LICENSE-2.0
- *
- * Unless required by applicable law or agreed to in writing,
- * software distributed under the License is distributed on an
- * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
- * KIND, either express or implied. See the License for the
- * specific language governing permissions and limitations
- * under the License.
- */
-
-/*!
- * \file CollectStats.java
- * \brief Print all unique layers used in a prototxt along with the parameters used in each layer type.
- */
-
-package io.mxnet.caffetranslator.misc;
-
-import io.mxnet.caffetranslator.CaffePrototxtLexer;
-import io.mxnet.caffetranslator.CaffePrototxtParser;
-import org.antlr.v4.runtime.CharStream;
-import org.antlr.v4.runtime.CharStreams;
-import org.antlr.v4.runtime.CommonTokenStream;
-
-import java.io.File;
-import java.io.FileInputStream;
-import java.nio.charset.StandardCharsets;
-import java.util.Iterator;
-import java.util.Map;
-import java.util.Set;
-
-public class CollectStats {
-
-    public static void main(String args[]) {
-        String filePath = "path";
-
-        CharStream cs = null;
-        try {
-            FileInputStream fis = new FileInputStream(new File(filePath));
-            cs = CharStreams.fromStream(fis, StandardCharsets.UTF_8);
-        } catch (Exception e) {
-            e.printStackTrace();
-        }
-
-        CaffePrototxtLexer lexer = new CaffePrototxtLexer(cs);
-        CommonTokenStream tokens = new CommonTokenStream(lexer);
-        CaffePrototxtParser parser = new CaffePrototxtParser(tokens);
-
-        StatsListener statsListener = new StatsListener();
-        parser.addParseListener(statsListener);
-        parser.prototxt();
-
-        Map<String, Set<String>> attrMap = statsListener.getAttrMap();
-
-        Iterator it = attrMap.entrySet().iterator();
-        while (it.hasNext()) {
-            Map.Entry<String, Set<String>> pair = (Map.Entry) it.next();
-            System.out.println(pair.getKey() + ":");
-            for (String value : pair.getValue()) {
-                System.out.println("    " + value);
-            }
-        }
-    }
-
-}
diff --git a/tools/caffe_translator/src/main/java/io/mxnet/caffetranslator/misc/StatsListener.java b/tools/caffe_translator/src/main/java/io/mxnet/caffetranslator/misc/StatsListener.java
deleted file mode 100644
index 71f966228184..000000000000
--- a/tools/caffe_translator/src/main/java/io/mxnet/caffetranslator/misc/StatsListener.java
+++ /dev/null
@@ -1,103 +0,0 @@
-/*
- * Licensed to the Apache Software Foundation (ASF) under one
- * or more contributor license agreements. See the NOTICE file
- * distributed with this work for additional information
- * regarding copyright ownership. The ASF licenses this file
- * to you under the Apache License, Version 2.0 (the
- * "License"); you may not use this file except in compliance
- * with the License. You may obtain a copy of the License at
- *
- * http://www.apache.org/licenses/LICENSE-2.0
- *
- * Unless required by applicable law or agreed to in writing,
- * software distributed under the License is distributed on an
- * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
- * KIND, either express or implied. See the License for the
- * specific language governing permissions and limitations
- * under the License.
- */
-
-/*!
- * \file StatsListener.java
- * \brief ANTLR listener to collect stats used by CollectStats.java
- */
-
-package io.mxnet.caffetranslator.misc;
-
-import io.mxnet.caffetranslator.CaffePrototxtBaseListener;
-import io.mxnet.caffetranslator.CaffePrototxtParser;
-import io.mxnet.caffetranslator.ParserHelper;
-import lombok.Getter;
-
-import java.util.Map;
-import java.util.Set;
-import java.util.Stack;
-import java.util.TreeMap;
-import java.util.TreeSet;
-
-public class StatsListener extends CaffePrototxtBaseListener {
-
-    private final Stack<String> keys;
-    @Getter
-    private final Map<String, Set<String>> attrMap;
-    private final ParserHelper parserHelper;
-
-    private String layerType;
-    private Set<String> curAttr;
-
-    public StatsListener() {
-        attrMap = new TreeMap<>();
-        keys = new Stack<>();
-        parserHelper = new ParserHelper();
-    }
-
-    @Override
-    public void enterLayer(CaffePrototxtParser.LayerContext ctx) {
-        keys.clear();
-        curAttr = new TreeSet<>();
-    }
-
-    @Override
-    public void exitLayer(CaffePrototxtParser.LayerContext ctx) {
-        if (!attrMap.containsKey(layerType)) {
-            attrMap.put(layerType, new TreeSet<>());
-        }
-        Set<String> set = attrMap.get(layerType);
-        set.addAll(curAttr);
-    }
-
-    @Override
-    public void exitValueLeaf(CaffePrototxtParser.ValueLeafContext ctx) {
-        String value = ctx.getText();
-        value = parserHelper.removeQuotes(value);
-        processKeyValue(getCurrentKey(), value);
-    }
-
-    private void processKeyValue(String key, String value) {
-        if (key.equals("type")) {
-            layerType = value;
-        } else {
-            curAttr.add(key);
-        }
-    }
-
-    @Override
-    public void enterPair(CaffePrototxtParser.PairContext ctx) {
-        String key = ctx.getStart().getText();
-        keys.push(key);
-    }
-
-    @Override
-    public void exitPair(CaffePrototxtParser.PairContext ctx) {
-        keys.pop();
-    }
-
-    private String getCurrentKey() {
-        StringBuilder sb = new StringBuilder();
-        for (String s : keys) {
-            sb.append(s + ".");
-        }
-        return sb.substring(0, sb.length() - 1).toString();
-    }
-
-}
diff --git a/tools/caffe_translator/src/main/resources/templates/accuracy.st b/tools/caffe_translator/src/main/resources/templates/accuracy.st
deleted file mode 100644
index cbe15f631715..000000000000
--- a/tools/caffe_translator/src/main/resources/templates/accuracy.st
+++ /dev/null
@@ -1,20 +0,0 @@
-
- = mx.metric.Accuracy(output_names=[''], label_names=[''], name='')
-test_metrics.add()
diff --git a/tools/caffe_translator/src/main/resources/templates/activation.st b/tools/caffe_translator/src/main/resources/templates/activation.st
deleted file mode 100644
index 042c2e31754e..000000000000
--- a/tools/caffe_translator/src/main/resources/templates/activation.st
+++ /dev/null
@@ -1,19 +0,0 @@
-
- = mx.symbol.Activation(data=, act_type='', name='')
diff --git a/tools/caffe_translator/src/main/resources/templates/add.st b/tools/caffe_translator/src/main/resources/templates/add.st
deleted file mode 100644
index 738ac3e562c3..000000000000
--- a/tools/caffe_translator/src/main/resources/templates/add.st
+++ /dev/null
@@ -1,19 +0,0 @@
-
- = + 
diff --git a/tools/caffe_translator/src/main/resources/templates/batchnorm.st b/tools/caffe_translator/src/main/resources/templates/batchnorm.st
deleted file mode 100644
index 7f2326d914b3..000000000000
--- a/tools/caffe_translator/src/main/resources/templates/batchnorm.st
+++ /dev/null
@@ -1,32 +0,0 @@
-
-
-_beta = mx.sym.BlockGrad(mx.sym.Variable("_beta", init=mx.init.Constant(0)))
-
- = mx.symbol.BatchNorm(data=,
-
-                      beta=_beta,
-
-
-                      fix_gamma=True,
-
-
-                      use_global_stats=True,
-
-                      name='')
diff --git a/tools/caffe_translator/src/main/resources/templates/concat.st b/tools/caffe_translator/src/main/resources/templates/concat.st
deleted file mode 100644
index 3f332751b8d9..000000000000
--- a/tools/caffe_translator/src/main/resources/templates/concat.st
+++ /dev/null
@@ -1,19 +0,0 @@
-
- = mx.sym.concat(, dim=, name='');
diff --git a/tools/caffe_translator/src/main/resources/templates/convolution.st b/tools/caffe_translator/src/main/resources/templates/convolution.st
deleted file mode 100644
index c167217ad951..000000000000
--- a/tools/caffe_translator/src/main/resources/templates/convolution.st
+++ /dev/null
@@ -1,27 +0,0 @@
-
- = mx.sym.Convolution(data=,
-                      weight=,
-                      bias=,
-                      kernel=(,),
-                      stride=(,),
-                      pad=(,),
-                      num_filter=,
-                      no_bias=True,
-                      name='')
diff --git a/tools/caffe_translator/src/main/resources/templates/deconvolution.st b/tools/caffe_translator/src/main/resources/templates/deconvolution.st
deleted file mode 100644
index 67483b91ff1a..000000000000
--- a/tools/caffe_translator/src/main/resources/templates/deconvolution.st
+++ /dev/null
@@ -1,28 +0,0 @@
-
- = mx.sym.Deconvolution(data=,
-                        weight=weight,
-                        bias=bias,
-                        kernel=(,),
-                        stride=(,),
-                        pad=(,),
-                        num_filter=,
-                        num_group=,
-                        no_bias=True,
-                        name='')
diff --git a/tools/caffe_translator/src/main/resources/templates/dropout.st b/tools/caffe_translator/src/main/resources/templates/dropout.st
deleted file mode 100644
index ed28dc781a24..000000000000
--- a/tools/caffe_translator/src/main/resources/templates/dropout.st
+++ /dev/null
@@ -1,19 +0,0 @@
-
- = mx.sym.Dropout(data=, p=, name='')
diff --git a/tools/caffe_translator/src/main/resources/templates/fc.st b/tools/caffe_translator/src/main/resources/templates/fc.st
deleted file mode 100644
index 353b4245ce56..000000000000
--- a/tools/caffe_translator/src/main/resources/templates/fc.st
+++ /dev/null
@@ -1,19 +0,0 @@
-
- = mx.symbol.FullyConnected(data=, weight=, bias=, num_hidden=, no_bias=True, name='')
diff --git a/tools/caffe_translator/src/main/resources/templates/flatten.st b/tools/caffe_translator/src/main/resources/templates/flatten.st
deleted file mode 100644
index 2ee6ffae7b68..000000000000
--- a/tools/caffe_translator/src/main/resources/templates/flatten.st
+++ /dev/null
@@ -1,19 +0,0 @@
-
- = mx.sym.flatten(data=, name='')
diff --git a/tools/caffe_translator/src/main/resources/templates/group.st b/tools/caffe_translator/src/main/resources/templates/group.st
deleted file mode 100644
index 9cadf656699f..000000000000
--- a/tools/caffe_translator/src/main/resources/templates/group.st
+++ /dev/null
@@ -1,19 +0,0 @@
-
- = mx.sym.Group([]);
diff --git a/tools/caffe_translator/src/main/resources/templates/imports.st b/tools/caffe_translator/src/main/resources/templates/imports.st
deleted file mode 100644
index da03a64ed7e6..000000000000
--- a/tools/caffe_translator/src/main/resources/templates/imports.st
+++ /dev/null
@@ -1,25 +0,0 @@
-
-from __future__ import division
-import copy
-import logging
-import math
-import sys
-
-import mxnet as mx
diff --git a/tools/caffe_translator/src/main/resources/templates/init_params.st b/tools/caffe_translator/src/main/resources/templates/init_params.st
deleted file mode 100644
index 7c8d7b0677d4..000000000000
--- a/tools/caffe_translator/src/main/resources/templates/init_params.st
+++ /dev/null
@@ -1,25 +0,0 @@
-
-
-arg_params, aux_params = load_params('')
-module.init_params(initializer=mx.init.Xavier(), arg_params=arg_params, aux_params=aux_params,
-                   allow_missing=True)
-
-module.init_params(initializer=mx.init.Xavier())
-
diff --git a/tools/caffe_translator/src/main/resources/templates/iterator.st b/tools/caffe_translator/src/main/resources/templates/iterator.st
deleted file mode 100644
index d60897914419..000000000000
--- a/tools/caffe_translator/src/main/resources/templates/iterator.st
+++ /dev/null
@@ -1,28 +0,0 @@
-
- = mx.io.CaffeDataIter(
-    prototxt =
-,
-    data_name='',
-    label_name='',
-
-    num_examples=,
-
-    flat = False
-)
diff --git a/tools/caffe_translator/src/main/resources/templates/logging.st b/tools/caffe_translator/src/main/resources/templates/logging.st
deleted file mode 100644
index cc94872726f5..000000000000
--- a/tools/caffe_translator/src/main/resources/templates/logging.st
+++ /dev/null
@@ -1,29 +0,0 @@
-
-def get_logger(name):
-    formatter = logging.Formatter(fmt='%(asctime)s %(levelname)-8s %(message)s',
-                                  datefmt='%Y-%m-%d %H:%M:%S')
-    stdout_handler = logging.StreamHandler(stream=sys.stdout)
-    stdout_handler.setFormatter(formatter)
-    logger = logging.getLogger(name)
-    logger.setLevel(logging.DEBUG)
-    logger.addHandler(stdout_handler)
-    return logger
-
-logger = get_logger("")
diff --git a/tools/caffe_translator/src/main/resources/templates/lrn.st b/tools/caffe_translator/src/main/resources/templates/lrn.st
deleted file mode 100644
index b67989884dae..000000000000
--- a/tools/caffe_translator/src/main/resources/templates/lrn.st
+++ /dev/null
@@ -1,19 +0,0 @@
-
- = mx.sym.LRN(data=, alpha=, beta=, knorm=, nsize=, name=)
diff --git a/tools/caffe_translator/src/main/resources/templates/lrpolicy_exp.st b/tools/caffe_translator/src/main/resources/templates/lrpolicy_exp.st
deleted file mode 100644
index 03daae3564bd..000000000000
--- a/tools/caffe_translator/src/main/resources/templates/lrpolicy_exp.st
+++ /dev/null
@@ -1,21 +0,0 @@
-
-lr = optimizer_params['learning_rate']
-lr *= gamma
-optimizer_params['learning_rate'] = lr
diff --git a/tools/caffe_translator/src/main/resources/templates/lrpolicy_inv.st b/tools/caffe_translator/src/main/resources/templates/lrpolicy_inv.st
deleted file mode 100644
index e62c2d3a2251..000000000000
--- a/tools/caffe_translator/src/main/resources/templates/lrpolicy_inv.st
+++ /dev/null
@@ -1,21 +0,0 @@
-
-lr = optimizer_params['learning_rate']
-lr = base_lr * math.pow((1 + gamma * batch_num), -power)
-optimizer_params['learning_rate'] = lr
diff --git a/tools/caffe_translator/src/main/resources/templates/lrpolicy_multistep.st b/tools/caffe_translator/src/main/resources/templates/lrpolicy_multistep.st
deleted file mode 100644
index 07619087f4a7..000000000000
--- a/tools/caffe_translator/src/main/resources/templates/lrpolicy_multistep.st
+++ /dev/null
@@ -1,23 +0,0 @@
-
-lr_update_steps = []
-if(batch_num in lr_update_steps):
-    lr = optimizer_params['learning_rate']
-    lr *= gamma
-    optimizer_params['learning_rate'] = lr
diff --git a/tools/caffe_translator/src/main/resources/templates/lrpolicy_poly.st b/tools/caffe_translator/src/main/resources/templates/lrpolicy_poly.st
deleted file mode 100644
index d62c64bf971d..000000000000
--- a/tools/caffe_translator/src/main/resources/templates/lrpolicy_poly.st
+++ /dev/null
@@ -1,21 +0,0 @@
-
-lr = optimizer_params['learning_rate']
-lr = math.pow(base_lr * (1 - batch_num/max_iter), power)
-optimizer_params['learning_rate'] = lr
diff --git a/tools/caffe_translator/src/main/resources/templates/lrpolicy_sigmoid.st b/tools/caffe_translator/src/main/resources/templates/lrpolicy_sigmoid.st
deleted file mode 100644
index f44ab5a9b457..000000000000
--- a/tools/caffe_translator/src/main/resources/templates/lrpolicy_sigmoid.st
+++ /dev/null
@@ -1,21 +0,0 @@
-
-lr = optimizer_params['learning_rate']
-lr = base_lr * ( 1/(1 + math.exp(-gamma * (batch_num - stepsize))))
-optimizer_params['learning_rate'] = lr
diff --git a/tools/caffe_translator/src/main/resources/templates/lrpolicy_step.st b/tools/caffe_translator/src/main/resources/templates/lrpolicy_step.st
deleted file mode 100644
index 1f3d975d77fe..000000000000
--- a/tools/caffe_translator/src/main/resources/templates/lrpolicy_step.st
+++ /dev/null
@@ -1,22 +0,0 @@
-
-if(batch_num % stepsize == 0):
-    lr = optimizer_params['learning_rate']
-    lr *= gamma
-    optimizer_params['learning_rate'] = lr
diff --git a/tools/caffe_translator/src/main/resources/templates/maxium.st b/tools/caffe_translator/src/main/resources/templates/maxium.st
deleted file mode 100644
index 9b18246c6ba0..000000000000
--- a/tools/caffe_translator/src/main/resources/templates/maxium.st
+++ /dev/null
@@ -1,19 +0,0 @@
-
- = mx.sym.maximum(, )
diff --git a/tools/caffe_translator/src/main/resources/templates/metrics_classes.st b/tools/caffe_translator/src/main/resources/templates/metrics_classes.st
deleted file mode 100644
index e586616c5f7e..000000000000
--- a/tools/caffe_translator/src/main/resources/templates/metrics_classes.st
+++ /dev/null
@@ -1,104 +0,0 @@
-
-class TrainMetrics():
-
-    metric_map = {}
-
-    def __init__(self, display=None, average_loss=1):
-        self.average_loss = average_loss
-        self.display = display
-
-
-    def process(self, batch_num, module, label):
-        if self.display == None:
-            return
-
-        if self.average_loss == 1:
-            if batch_num % self.display == 0:
-                self.update_metrics(module, label, reset=True)
-                self.print_metrics(batch_num)
-        else:
-            # Metrics must be printed 'average_loss' iterations from now.
-            # Append a metric which will get updated starting now.
-            if((batch_num + self.average_loss) % self.display == 0):
-                self.append_one()
-
-            # Less than 'average_loss' iterations away from a display step. Update metrics.
-            if((batch_num + self.average_loss) % self.display \< self.average_loss):
-                self.update_metrics(module, label)
-
-            # At display step. Print metrics.
-            if(batch_num % self.display == 0):
-                self.print_metrics(batch_num, remove_heads=True)
-
-    def add(self, metric):
-        self.metric_map[metric.name] = [metric]
-
-    def append_one(self):
-        for key, lst in self.metric_map.iteritems():
-            last_element = lst[-1]
-            new_element = copy.deepcopy(last_element)
-            new_element.reset()
-            lst.append(new_element)
-
-    def update_metrics(self, module, label, reset=False):
-        for key, lst in self.metric_map.iteritems():
-            for metric in lst:
-                if reset:
-                    metric.reset()
-                module.update_metric(metric, label)
-
-    def print_metrics(self, batch_num, remove_heads=False):
-
-        total_loss = 0
-        for key, lst in self.metric_map.iteritems():
-            total_loss += lst[0].get()[1]
-
-        logger.info("Iteration %d, loss = %f" % (batch_num, total_loss))
-
-        for key, lst in self.metric_map.iteritems():
-            if remove_heads:
-                metric = lst.pop(0)
-            else:
-                metric = lst[0]
-
-            logger.info("    %s" % metric)
-
-
-class TestMetrics():
-
-    metrics = []
-
-    def add(self, metric):
-        self.metrics.append(metric)
-
-    def score_and_print(self, module, itr, num_batch):
-        for metric in self.metrics:
-            metric.reset()
-            module.score(itr, metric, num_batch=num_batch)
-            logger.info("    %s" % metric)
-
-
-display = 
-
-
-average_loss = 
-
-train_metrics = TrainMetrics(display=display, average_loss=average_loss)
-test_metrics = TestMetrics()
diff --git a/tools/caffe_translator/src/main/resources/templates/mul.st b/tools/caffe_translator/src/main/resources/templates/mul.st
deleted file mode 100644
index 59c4837c8371..000000000000
--- a/tools/caffe_translator/src/main/resources/templates/mul.st
+++ /dev/null
@@ -1,19 +0,0 @@
-
- = * ()
diff --git a/tools/caffe_translator/src/main/resources/templates/opt_adadelta.st b/tools/caffe_translator/src/main/resources/templates/opt_adadelta.st
deleted file mode 100644
index cfd465b5f430..000000000000
--- a/tools/caffe_translator/src/main/resources/templates/opt_adadelta.st
+++ /dev/null
@@ -1,32 +0,0 @@
-
-
-rho = 
-
-
-epsilon = 
-
-
-optimizer_params={'learning_rate':base_lr<\\>
-, 'wd':wd<\\>
-, 'rho':rho<\\>
-, 'epsilon':epsilon}<\\>
-
-module.init_optimizer(optimizer='AdaDelta', optimizer_params=optimizer_params)
diff --git a/tools/caffe_translator/src/main/resources/templates/opt_adagrad.st b/tools/caffe_translator/src/main/resources/templates/opt_adagrad.st
deleted file mode 100644
index 527cedf6f87c..000000000000
--- a/tools/caffe_translator/src/main/resources/templates/opt_adagrad.st
+++ /dev/null
@@ -1,28 +0,0 @@
-
-
-epsilon = 
-
-
-optimizer_params={'learning_rate':base_lr<\\>
-, 'wd':wd<\\>
-, 'epsilon':epsilon}<\\>
-
-module.init_optimizer(optimizer='AdaGrad', optimizer_params=optimizer_params)
diff --git a/tools/caffe_translator/src/main/resources/templates/opt_adam.st b/tools/caffe_translator/src/main/resources/templates/opt_adam.st
deleted file mode 100644
index b0a8ca368716..000000000000
--- a/tools/caffe_translator/src/main/resources/templates/opt_adam.st
+++ /dev/null
@@ -1,36 +0,0 @@
-
-
-beta1 = 
-
-
-beta2 = 
-
-
-epsilon = 
-
-
-optimizer_params={'learning_rate':base_lr<\\>
-, 'wd':wd<\\>
-, 'beta1':beta1<\\>
-, 'beta2':beta2<\\>
-, 'epsilon':epsilon}<\\>
-
-module.init_optimizer(optimizer='Adam', optimizer_params=optimizer_params)
diff --git a/tools/caffe_translator/src/main/resources/templates/opt_nesterov.st b/tools/caffe_translator/src/main/resources/templates/opt_nesterov.st
deleted file mode 100644
index 6262d4867593..000000000000
--- a/tools/caffe_translator/src/main/resources/templates/opt_nesterov.st
+++ /dev/null
@@ -1,28 +0,0 @@
-
-
-momentum = 
-
-
-optimizer_params={'learning_rate':base_lr<\\>
-, 'wd':wd<\\>
-, 'momentum':momentum}<\\>
-
-module.init_optimizer(optimizer='NAG', optimizer_params=optimizer_params)
diff --git a/tools/caffe_translator/src/main/resources/templates/opt_rmsprop.st b/tools/caffe_translator/src/main/resources/templates/opt_rmsprop.st
deleted file mode 100644
index 1ace86d36f61..000000000000
--- a/tools/caffe_translator/src/main/resources/templates/opt_rmsprop.st
+++ /dev/null
@@ -1,32 +0,0 @@
-
-
-rho = 
-
-
-epsilon = 
-
-
-optimizer_params={'learning_rate':base_lr<\\>
-, 'wd':wd<\\>
-, 'rho':rho<\\>
-, 'epsilon':epsilon}<\\>
-
-module.init_optimizer(optimizer='RMSProp', optimizer_params=optimizer_params)
diff --git a/tools/caffe_translator/src/main/resources/templates/opt_sgd.st b/tools/caffe_translator/src/main/resources/templates/opt_sgd.st
deleted file mode 100644
index aa547a6141b0..000000000000
--- a/tools/caffe_translator/src/main/resources/templates/opt_sgd.st
+++ /dev/null
@@ -1,28 +0,0 @@
-
-
-momentum = 
-
-
-optimizer_params={'learning_rate':base_lr<\\>
-, 'wd':wd<\\>
-, 'momentum':momentum}<\\>
-
-module.init_optimizer(optimizer='SGD', optimizer_params=optimizer_params)
diff --git a/tools/caffe_translator/src/main/resources/templates/opt_vars.st b/tools/caffe_translator/src/main/resources/templates/opt_vars.st
deleted file mode 100644
index 19b2f4cc6c45..000000000000
--- a/tools/caffe_translator/src/main/resources/templates/opt_vars.st
+++ /dev/null
@@ -1,24 +0,0 @@
-
-
-base_lr = 
-
-
-wd = 
-
\ No newline at end of file
diff --git a/tools/caffe_translator/src/main/resources/templates/param_initializer.st b/tools/caffe_translator/src/main/resources/templates/param_initializer.st
deleted file mode 100644
index abad5daeb1ba..000000000000
--- a/tools/caffe_translator/src/main/resources/templates/param_initializer.st
+++ /dev/null
@@ -1,30 +0,0 @@
-
-class ParamInitializer():
-    lst_patterns = []
-    lst_initializers = []
-
-    def add_param(self, pattern, initializer):
-        self.lst_patterns.append(pattern)
-        self.lst_initializers.append(initializer)
-
-    def get_initializer(self, default_initializer):
-        self.lst_patterns.append(".*")
-        self.lst_initializers.append(default_initializer)
-        return mx.initializer.Mixed(self.lst_patterns, self.lst_initializers)
diff --git a/tools/caffe_translator/src/main/resources/templates/params_loader.st b/tools/caffe_translator/src/main/resources/templates/params_loader.st
deleted file mode 100644
index c124c986d6f9..000000000000
--- a/tools/caffe_translator/src/main/resources/templates/params_loader.st
+++ /dev/null
@@ -1,31 +0,0 @@
-
-def load_params(params_file):
-    save_dict = mx.nd.load(params_file)
-    arg_params = {}
-    aux_params = {}
-    for k, value in save_dict.items():
-        arg_type, name = k.split(':', 1)
-        if arg_type == 'arg':
-            arg_params[name] = value
-        elif arg_type == 'aux':
-            aux_params[name] = value
-        else:
-            raise ValueError("Invalid param file " + params_file)
-    return arg_params, aux_params
diff --git a/tools/caffe_translator/src/main/resources/templates/permute.st b/tools/caffe_translator/src/main/resources/templates/permute.st
deleted file mode 100644
index 9f94bdbf6c80..000000000000
--- a/tools/caffe_translator/src/main/resources/templates/permute.st
+++ /dev/null
@@ -1,19 +0,0 @@
-
- = mx.sym.transpose(data=, axes=(), name='')
diff --git a/tools/caffe_translator/src/main/resources/templates/pooling.st b/tools/caffe_translator/src/main/resources/templates/pooling.st
deleted file mode 100644
index 7aceffdf0e4f..000000000000
--- a/tools/caffe_translator/src/main/resources/templates/pooling.st
+++ /dev/null
@@ -1,34 +0,0 @@
-
- = mx.symbol.Pooling(data=,
-                     pool_type='',
-
-                     global_pool=,
-
-
-                     kernel=(,),
-
-
-                     stride=(,),
-
-
-                     pad=(,),
-
-                     pooling_convention='full',
-                     name='')
diff --git a/tools/caffe_translator/src/main/resources/templates/power.st b/tools/caffe_translator/src/main/resources/templates/power.st
deleted file mode 100644
index 7fe3ee8eff47..000000000000
--- a/tools/caffe_translator/src/main/resources/templates/power.st
+++ /dev/null
@@ -1,19 +0,0 @@
-
- = ( + ( * )) ** 
diff --git a/tools/caffe_translator/src/main/resources/templates/runner.st b/tools/caffe_translator/src/main/resources/templates/runner.st
deleted file mode 100644
index 8346ffe22b1c..000000000000
--- a/tools/caffe_translator/src/main/resources/templates/runner.st
+++ /dev/null
@@ -1,75 +0,0 @@
-
-ctx = 
-
-module = mx.mod.Module(symbol=, context=ctx, data_names=[], label_names=[])
-module.bind(data_shapes=.provide_data,
-            label_shapes=.provide_label)
-
-
-
-
-
-epoch = 1
-batch_num = 1
-
-max_iter = 
-snapshot = 
-test_interval = 
-test_iter = 
-
-while batch_num \<= max_iter:
-    .reset()
-
-    for batch in :
-        module.forward(data_batch=batch, is_train=True)
-        module.backward()
-        module.update()
-
-        train_metrics.process(batch_num, module, batch.label)
-
-        if(batch_num % test_interval == 0):
-            logger.info("Iteration %d, Testing net" % batch_num)
-            test_metrics.score_and_print(module, , num_batch=test_iter)
-
-        if(batch_num % snapshot == 0):
-            # write snapshot
-            module.save_checkpoint(prefix="", epoch=batch_num, save_optimizer_states=True)
-
-        batch_num += 1
-
-        if batch_num > max_iter:
-            break
-
-
-        stepsize = 
-
-
-        gamma = 
-
-
-        power = 
-
-
-
-    epoch += 1
-
-logger.info("Training done. Saving model to ")
-module.save_checkpoint(prefix="", epoch=batch_num, save_optimizer_states=True)
diff --git a/tools/caffe_translator/src/main/resources/templates/softmaxoutput.st b/tools/caffe_translator/src/main/resources/templates/softmaxoutput.st
deleted file mode 100644
index 57a8e719397b..000000000000
--- a/tools/caffe_translator/src/main/resources/templates/softmaxoutput.st
+++ /dev/null
@@ -1,21 +0,0 @@
-
- = mx.sym.SoftmaxOutput(data=, label=