Dolphin - Deep Neural Network

dolphin-dnn is a deep learning framework built on Apache REEF. It supports both BSP-style synchronous deep learning and parameter-server-backed asynchronous deep learning. dolphin-dnn is designed for training large neural network models on big data and, inspired by Google's DistBelief, supports data partitioning as well as model partitioning, although the current codebase implements only data partitioning; model partitioning is ongoing work.

  • Data partitioning: Input data are distributed across evaluators, each of which holds a replica of the whole neural network model. Every replica independently trains its model on its own data, and the updated models are shared between replicas periodically. Model sharing can be done either synchronously or asynchronously, depending on the implementation.

(Figure: Data partitioning)

  • Model partitioning: Each partition works on a certain portion of the neural network model. Partitions of a model must process the same training data at a given time, whereas in data partitioning, model replicas make progress independently of one another.

(Figure: Model partitioning)

dolphin-dnn currently supports the layer types described in the 'Layers' section below, such as fully connected, pooling, and convolutional layers; additional layer types will be supported in the future.

Architecture

(Figure: Dolphin DNN architecture)

A typical REEF evaluator in dolphin-dnn is made up of two components: a neural network model and a parameter provider. A neural network model consists of layers that are defined in a protocol buffer definition file. A parameter provider is an instance that receives parameter gradients from the neural network model and sends them to a parameter server, which in turn generates new parameters from those gradients.

The training procedure of a neural network model is as follows. First, each REEF evaluator builds its neural network model from a configuration provided by the REEF driver. The neural network model replica then computes activation values for each layer from the given training input data. Using these activation values, the model computes parameter gradients for each layer via backpropagation and hands these gradients to the parameter provider. The parameter provider acts as a communication medium between the model and the server: it interacts with the server to update model parameters and supplies new parameters to the local model. The implementation of the parameter provider and server may differ per design (see the 'Parameter Provider' section below). After receiving the new parameters, each model replica repeats the above steps with the updated parameters for many epochs.
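
A minimal sketch of this loop, assuming hypothetical NeuralNetwork and ParameterProvider interfaces (the names and signatures below are illustrative, not dolphin-dnn's actual API):

import java.util.List;

/** Illustrative sketch of the per-replica training procedure described above. */
public final class TrainingLoopSketch {

  interface NeuralNetwork {
    // forward pass: activation values for each layer
    float[][] feedForward(float[] input);
    // backpropagation: parameter gradients for each layer
    float[][] backPropagate(float[][] activations, float[] expectedOutput);
    // replace current parameters with updated ones
    void setParameters(float[][] parameters);
  }

  interface ParameterProvider {
    void push(float[][] gradients); // hand gradients to the provider / server
    float[][] pull();               // receive new parameters for the local model
  }

  static void train(NeuralNetwork network, ParameterProvider provider,
                    List<float[]> inputs, List<float[]> expectedOutputs, int maxIter) {
    for (int iter = 0; iter < maxIter; iter++) {
      for (int i = 0; i < inputs.size(); i++) {
        float[][] activations = network.feedForward(inputs.get(i));
        float[][] gradients = network.backPropagate(activations, expectedOutputs.get(i));
        provider.push(gradients);               // send gradients out
        network.setParameters(provider.pull()); // continue with updated parameters
      }
    }
  }
}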

Input file format

dolphin-dnn can process NumPy-compatible plain-text input files stored in the following format.

(Figure: Input file format)

Each line represents a vector whose elements are separated by a delimiter, specified via the command line parameter delim. Each vector consists of serialized input data and other metadata, as listed below; an example line follows the list. We assume that every element can be parsed as a floating-point number (float).

  • Serialized input data: an input data object that is serialized as a vector. The shape of input data can be specified in a separate protocol buffer definition file.
  • Output: the expected output for a given input data object.
  • Validation flag: a flag that indicates whether a data item is used for validation: 1.0 for validation, and 0.0 for training.
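
For instance, assuming the fields appear in the order listed above, a hypothetical 2 x 2 input (input_shape with dim: 2, dim: 2), an expected output of 1.0, and a training flag of 0.0 would be stored as one line with the default comma delimiter:

0.5,0.3,0.9,0.1,1.0,0.0

The first four elements are the serialized input data, the fifth is the expected output, and the last is the validation flag.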

Configuration

To create a neural network model, you must define the architecture of your neural network model in a protocol buffer definition file. The available layer types are described in the 'Layers' section below.

Common Fields

  • batch_size: the number of training inputs used per parameter update.
  • stepsize: step size (learning rate) for stochastic gradient descent.
  • input_shape: the shape of input data.
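
Together, these common fields appear at the top of the definition file, as in the MNIST example later in this document:

batch_size: 10
stepsize: 1e-3
input_shape {
  dim: 28
  dim: 28
}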

Parameter Provider

The parameter provider is an instance that receives parameter gradients for each training input from a neural network model and provides the model with updated parameters (weights and biases). You must select the type of parameter provider you want to use by specifying the parameter_provider field.

Local Parameter Provider

A local parameter provider does not communicate with a separate parameter server. Instead, it updates parameters locally using the gradients received from its neural network model. This provider is mainly used for testing the correctness of a network.

parameter_provider {
  type: "local"
}
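
Conceptually, a local provider can apply a plain stochastic gradient descent update in-process. A minimal sketch with illustrative names (not dolphin-dnn's actual implementation):

/** Illustrative in-process SGD update that a local provider could perform. */
public final class LocalUpdateSketch {
  static float[] applyGradient(float[] parameters, float[] gradients, float stepsize) {
    float[] updated = new float[parameters.length];
    for (int i = 0; i < parameters.length; i++) {
      updated[i] = parameters[i] - stepsize * gradients[i]; // w <- w - stepsize * gradient
    }
    return updated;
  }
}
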
Group Communication Parameter Provider

Group communication providers are used for BSP-style network training. A group communication provider communicates with a group communication parameter server using Apache REEF's Group Communication Service. The server aggregates the parameter gradients received from providers using the MPI Reduce operation. After updating the parameters, the server broadcasts the updated parameters back to all providers. All operations are done synchronously, hence the name group communication.

parameter_provider {
  type: "groupcomm"
}
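
One synchronous round at the server can be pictured as a reduce over all replicas' gradients, a single parameter update, and a broadcast of the result. A schematic sketch under that reading (illustrative, not the actual server code):

import java.util.List;

/** Schematic BSP round: reduce gradients, update once, broadcast the result. */
public final class BspRoundSketch {
  static float[] bspRound(List<float[]> replicaGradients, float[] parameters, float stepsize) {
    float[] reduced = new float[parameters.length];
    for (float[] gradient : replicaGradients) { // Reduce: aggregate gradients from all providers
      for (int i = 0; i < gradient.length; i++) {
        reduced[i] += gradient[i];
      }
    }
    for (int i = 0; i < parameters.length; i++) {
      parameters[i] -= stepsize * reduced[i];   // one synchronous parameter update
    }
    return parameters;                          // broadcast back to all providers
  }
}
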
Parameter Server Parameter Provider

Parameter server parameter providers are used for asynchronous training. This provider is used together with Dolphin's parameter server module, dolphin-ps. A parameter server provider can send parameter push or pull requests to the server. The server updates the parameters of a model when gradients are pushed, and provides a model replica with the latest parameters when it receives a pull request. After replacing its parameters with the updated ones, each model replica proceeds to its next input data without waiting for other replicas to finish their updates, in contrast to the group communication parameter provider, where all providers start each round with the same model weights. Thus there is some inconsistency between replicas: the parameters used for training can differ from replica to replica.

parameter_provider {
  type: "paramserver"
}

Layers

Fully Connected Layer
  • Layer type: FullyConnected
  • Parameters (FullyConnectedLayerConfiguration fully_connected_param)
    • init_weight: the standard deviation that is used to initialize the weights in this layer from a Gaussian distribution with mean 0.
    • init_bias: constant value with which the biases of this layer are initialized.
    • random_seed: the seed for generating random initial parameters.
    • num_output: the number of outputs for this layer.
Pooling Layer
  • Layer type: Pooling
  • Parameters (PoolingLayerConfiguration pooling_param)
    • pooling_type[default="MAX"]: the type of pooling for this layer. Available types are MAX and AVERAGE.
    • padding_height[default=0]: the vertical padding added to the border of the input volume.
    • padding_width[default=0]: the horizontal padding added to the border of the input volume.
    • stride_height[default=1]: the vertical interval at which the pooling kernel is applied to the input.
    • stride_width[default=1]: the horizontal interval at which the pooling kernel is applied to the input.
    • kernel_height: the height of the pooling kernel for this layer.
    • kernel_width: the width of the pooling kernel for this layer.
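
A hypothetical pooling layer block, assuming the same layer { ... } syntax as the MNIST example below (the values are illustrative):

layer {
  type: "Pooling"
  pooling_param {
    pooling_type: "MAX"
    kernel_height: 2
    kernel_width: 2
    stride_height: 2
    stride_width: 2
  }
}
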
Convolutional Layer
  • Layer type: Convolutional
  • Parameters (ConvolutionalLayerConfiguration convolutional_param)
    • kernel_height: the height of the convolution kernel for this layer.
    • kernel_width: the width of the convolution kernel for this layer.
    • padding_height[default=0]: the vertical padding added to the border of the input volume.
    • padding_width[default=0]: the horizontal padding added to the border of the input volume.
    • stride_height[default=1]: the vertical interval at which the convolution kernel is applied to the input.
    • stride_width[default=1]: the horizontal interval at which the convolution kernel is applied to the input.
    • init_weight: the standard deviation that is used to initialize the weights in this layer from a Gaussian distribution with mean 0.
    • init_bias: constant value with which the biases of this layer are initialized.
    • random_seed: the seed for generating random initial parameters.
    • num_output: the number of outputs for this layer.
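
Likewise, a hypothetical convolutional layer block (the values are illustrative):

layer {
  type: "Convolutional"
  convolutional_param {
    kernel_height: 5
    kernel_width: 5
    num_output: 20
    init_weight: 1e-2
    init_bias: 0.0
  }
}
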
Activation Layer
  • Layer type: Activation
  • Parameters (ActivationLayerConfiguration activation_param)
    • activation_function: the activation function to produce output values for this layer.
Activation with Loss Layer
  • Layer type: ActivationWithLoss
  • Parameters (ActivationWithLossLayerConfiguration activation_with_loss_param)
    • activation_function: the activation function to produce output values for this layer.
    • loss_function: the loss function that is used to compute loss and calculate the loss gradient for backpropagation.
Activation Functions

The following activation functions are supported.

  • Sigmoid: sigmoid
  • ReLU: relu
  • Tanh: tanh
  • Power: pow (squared value)
  • Absolute: abs
  • Softmax: softmax
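
For reference, these functions compute the following (elementwise, except softmax, which normalizes over the whole output vector):

sigmoid(x)   = 1 / (1 + exp(-x))
relu(x)      = max(0, x)
tanh(x)      = (exp(x) - exp(-x)) / (exp(x) + exp(-x))
pow(x)       = x^2
abs(x)       = |x|
softmax(x)_i = exp(x_i) / sum_j exp(x_j)
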
Loss Functions

The following loss functions are supported.

  • CrossEntropy: crossEntropy
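
For reference, given a predicted distribution p and a one-hot expected output y, cross entropy is computed as loss = -sum_i y_i * log(p_i).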

How to run

A script for training a neural network model is included with the source code, in bin/run_neuralnet.sh. test/resources/data/neuralnet is a sample subset of the MNIST dataset, composed of 1,000 training images and 100 test images. test/resources/configuration/neuralnet is an example protocol buffer definition file; it defines a neural network model that uses two fully connected layers and a local parameter provider.

You can run the example network on the REEF local runtime environment with:

cd $DOLPHIN_HOME
bin/run_neuralnet.sh -local true -maxIter 100 -conf dolphin-dnn/src/test/resources/configuration/neuralnet -input dolphin-dnn/src/test/resources/data/neuralnet -timeout 800000

Command line parameters

  • Required
    • input: path of the input data file to use.
    • conf: path of the protocol buffer definition file to use.
  • Optional
    • local[default=false]: a boolean value that indicates whether to use the REEF local runtime environment. If false, the neural network runs on the YARN environment.
    • maxIter[default=20]: the maximum number of iterations before neural network training stops.
    • delim[default=,]: the delimiter used to separate the elements of the input data.
    • timeout[default=100000]: the maximum time allowed for neural network training, in milliseconds.
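
For example, omitting local (so it defaults to false) targets the YARN environment. A hypothetical run with a whitespace delimiter might look like the following, where myconf and myinput are placeholder paths:

bin/run_neuralnet.sh -conf myconf -input myinput -maxIter 50 -delim " "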

Example

An example protocol buffer definition file for MNIST

The following is an example protocol buffer definition file for the MNIST dataset. It can be found at dolphin-dnn/src/test/resources/configuration/neuralnet.

batch_size: 10
stepsize: 1e-3
input_shape {
  dim: 28
  dim: 28
}
parameter_provider {
  type: "local"
}
layer {
  type: "FullyConnected"
  fully_connected_param {
    init_weight: 1e-4
    init_bias: 2e-4
    num_output: 50
  }
}
layer {
  type: "Activation"
  activation_param {
    activation_function: "relu"
  }
}
layer {
  type: "FullyConnected"
  fully_connected_param {
    init_weight: 1e-2
    init_bias: 2e-2
    num_output: 10
  }
}
layer {
  type: "ActivationWithLoss"
  activation_with_loss_param {
    activation_function: "softmax"
    loss_function: "crossEntropy"
  }
}

This model comprises two fully connected layers with 50 and 10 outputs, respectively, and a local parameter provider. input_shape specifies the shape of the input data. For the MNIST dataset, each data object is a 28 * 28 image, so input_shape is configured as follows.

input_shape {
  dim: 28
  dim: 28
}

Parameters are updated after every 10 inputs are processed, since batch_size is specified as 10, and 1e-3 is used as the learning rate (stepsize) for the stochastic gradient descent algorithm.
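
With this input_shape, each line of the MNIST input file holds 28 * 28 = 784 serialized pixel values, followed by the expected output and the validation flag described in the 'Input file format' section above.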