diff --git a/README.md b/README.md
index 2c28bc58f..ea7b8577d 100644
--- a/README.md
+++ b/README.md
@@ -20,7 +20,7 @@ mlp = @mx.chain mx.Variable(:data) =>
   mx.FullyConnected(name=:fc2, num_hidden=64) =>
   mx.Activation(name=:relu2, act_type=:relu) =>
   mx.FullyConnected(name=:fc3, num_hidden=10) =>
-  mx.Softmax(name=:softmax)
+  mx.SoftmaxOutput(name=:softmax)

 # data provider
 batch_size = 100
diff --git a/docs/api/ndarray.rst b/docs/api/ndarray.rst
index 8ac5e9bda..05a3dccba 100644
--- a/docs/api/ndarray.rst
+++ b/docs/api/ndarray.rst
@@ -364,9 +364,9 @@ object (:class:`NDArray`) is returned. Otherwise, a tuple containing all the out
 Public APIs
 ^^^^^^^^^^^

-.. function:: choose_element(...)
+.. function:: choose_element_0index(...)

-   Choose one element from each line(row for python, column for R/Julia) in lhs according to index indicated by rhs
+   Choose one element from each line(row for python, column for R/Julia) in lhs according to index indicated by rhs. This function assumes rhs uses a 0-based index.

    :param lhs: Left operand to the function.
    :type lhs: NDArray
@@ -413,9 +413,42 @@ Public APIs



+.. function:: exp(...)
+
+   Take exp of the src
+
+   :param src: Source input to the function
+   :type src: NDArray
+
+
+
+
+
+.. function:: log(...)
+
+   Take log of the src
+
+   :param src: Source input to the function
+   :type src: NDArray
+
+
+
+
+
+.. function:: norm(...)
+
+   Take L2 norm of the src. The result will be an ndarray of shape (1,) on the same device.
+
+   :param src: Source input to the function
+   :type src: NDArray
+
+
+
+
+
 .. function:: sqrt(...)

-   Take square root of the src
+   Take sqrt of the src

    :param src: Source input to the function
    :type src: NDArray
diff --git a/docs/api/symbol.rst b/docs/api/symbol.rst
index 2c7df712c..e01ecb359 100644
--- a/docs/api/symbol.rst
+++ b/docs/api/symbol.rst
@@ -143,7 +143,7 @@ Public APIs
    :type num_filter: int (non-negative), required


-   :param num_group: number of groups partition
+   :param num_group: Number of group partitions. This option is not supported by CuDNN; you can use SliceChannel to split the data into num_group groups, apply convolution to each, and concat the results to achieve the same effect.
    :type num_group: int (non-negative), optional, default=1


@@ -162,6 +162,57 @@ Public APIs



+.. function:: Deconvolution(...)
+
+   Apply deconvolution to input then add a bias.
+
+   :param data: Input data to the DeconvolutionOp.
+   :type data: Symbol
+
+
+   :param weight: Weight matrix.
+   :type weight: Symbol
+
+
+   :param bias: Bias parameter.
+   :type bias: Symbol
+
+
+   :param kernel: deconvolution kernel size: (y, x)
+   :type kernel: Shape(tuple), required
+
+
+   :param stride: deconvolution stride: (y, x)
+   :type stride: Shape(tuple), optional, default=(1, 1)
+
+
+   :param pad: pad for deconvolution: (y, x)
+   :type pad: Shape(tuple), optional, default=(0, 0)
+
+
+   :param num_filter: deconvolution filter(channel) number
+   :type num_filter: int (non-negative), required
+
+
+   :param num_group: number of groups partition
+   :type num_group: int (non-negative), optional, default=1
+
+
+   :param workspace: Tmp workspace for deconvolution (MB)
+   :type workspace: long (non-negative), optional, default=512
+
+
+   :param no_bias: Whether to disable bias parameter.
+   :type no_bias: boolean, optional, default=True
+
+   :param Base.Symbol name: The name of the symbol. (e.g. `:my_symbol`), optional.
+
+   :return: the constructed :class:`Symbol`.
+
+
+
+
+
 .. function:: Dropout(...)

    Apply dropout to input
@@ -412,7 +463,7 @@ Public APIs

 .. function:: Softmax(...)

-   Perform a softmax transformation on input.
+   DEPRECATED: Perform a softmax transformation on input. Please use SoftmaxOutput.

    :param data: Input data to softmax.
    :type data: Symbol
@@ -433,9 +484,62 @@ Public APIs



+.. function:: SoftmaxOutput(...)
+
+   Perform a softmax transformation on input, backprop with logloss.
+
+   :param data: Input data to softmax.
+   :type data: Symbol
+
+
+   :param grad_scale: Scale the gradient by a float factor
+   :type grad_scale: float, optional, default=1
+
+
+   :param multi_output: If set to true, for a (n,k,x_1,..,x_n) dimensional input tensor, softmax will generate n*x_1*...*x_n outputs, each with k classes
+   :type multi_output: boolean, optional, default=False
+
+   :param Base.Symbol name: The name of the symbol. (e.g. `:my_symbol`), optional.
+
+   :return: the constructed :class:`Symbol`.
+
+
+
+
+
+.. function:: exp(...)
+
+   Take exp of the src
+
+   :param src: Source symbolic input to the function
+   :type src: Symbol
+
+   :param Base.Symbol name: The name of the symbol. (e.g. `:my_symbol`), optional.
+
+   :return: the constructed :class:`Symbol`.
+
+
+
+
+
+.. function:: log(...)
+
+   Take log of the src
+
+   :param src: Source symbolic input to the function
+   :type src: Symbol
+
+   :param Base.Symbol name: The name of the symbol. (e.g. `:my_symbol`), optional.
+
+   :return: the constructed :class:`Symbol`.
+
+
+
+
+
 .. function:: sqrt(...)

-   Take square root of the src
+   Take sqrt of the src

    :param src: Source symbolic input to the function
    :type src: Symbol
@@ -505,6 +609,25 @@ Internal APIs



+.. function:: _Native(...)
+
+   Stub for implementing an operator implemented in a native frontend language.
+
+   :param info:
+   :type info: , required
+
+
+   :param need_top_grad: Whether this layer needs out grad for backward. Should be false for loss layers.
+   :type need_top_grad: boolean, optional, default=True
+
+   :param Base.Symbol name: The name of the symbol. (e.g. `:my_symbol`), optional.
+
+   :return: the constructed :class:`Symbol`.
+
+
+
+
+
 .. function:: _Plus(...)

    Perform an elementwise plus.
diff --git a/docs/tutorial/mnist.rst b/docs/tutorial/mnist.rst
index 5fe21cde1..fc2e548dd 100644
--- a/docs/tutorial/mnist.rst
+++ b/docs/tutorial/mnist.rst
@@ -41,11 +41,11 @@ Note each composition we take the previous symbol as the `data` argument, formin
    Input --> 128 units (ReLU) --> 64 units (ReLU) --> 10 units

 where the last 10 units correspond to the 10 output classes (digits 0,...,9). We
-then add a final ``Softmax`` operation to turn the 10-dimensional prediction to proper probability values for the 10 classes:
+then add a final :class:`SoftmaxOutput` operation to turn the 10-dimensional prediction to proper probability values for the 10 classes:

 .. code-block:: julia

-   mlp = mx.Softmax(data = fc3, name=:softmax)
+   mlp = mx.SoftmaxOutput(data = fc3, name=:softmax)

 As we can see, the MLP is just a chain of layers. For this case, we can also use
 the ``mx.chain`` macro. The same architecture above can be defined as
@@ -58,7 +58,7 @@ the ``mx.chain`` macro. The same architecture above can be defined as
      mx.FullyConnected(name=:fc2, num_hidden=64) =>
      mx.Activation(name=:relu2, act_type=:relu) =>
      mx.FullyConnected(name=:fc3, num_hidden=10) =>
-     mx.Softmax(name=:softmax)
+     mx.SoftmaxOutput(name=:softmax)

 After defining the architecture, we are ready to load the MNIST data.
 MXNet.jl provide built-in data providers for the MNIST dataset, which could automatically
diff --git a/examples/mnist/mlp.jl b/examples/mnist/mlp.jl
index b0703c56e..05d008d52 100644
--- a/examples/mnist/mlp.jl
+++ b/examples/mnist/mlp.jl
@@ -11,7 +11,7 @@ using MXNet
 # fc2 = mx.FullyConnected(data = act1, name=:fc2, num_hidden=64)
 # act2 = mx.Activation(data = fc2, name=:relu2, act_type=:relu)
 # fc3 = mx.FullyConnected(data = act2, name=:fc3, num_hidden=10)
-# mlp = mx.Softmax(data = fc3, name=:softmax)
+# mlp = mx.SoftmaxOutput(data = fc3, name=:softmax)

 #-- Option 2: using the mx.chain macro
 mlp = @mx.chain mx.Variable(:data) =>
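
As a quick reference for the renamed operator, here is a minimal sketch of the MLP from the README and tutorial above, written against the new name. It only restates code that already appears in this diff; the first hidden layer's name (`:fc1`, `:relu1`) and its 128-unit size are inferred from the tutorial text and the commented-out `mlp.jl` code rather than spelled out in the hunks themselves, and the data provider and training calls are omitted.

```julia
using MXNet

# Multi-layer perceptron for MNIST, terminated with the renamed operator.
# mx.Softmax is now deprecated in favor of mx.SoftmaxOutput.
mlp = @mx.chain mx.Variable(:data)               =>
      mx.FullyConnected(name=:fc1, num_hidden=128) =>
      mx.Activation(name=:relu1, act_type=:relu)   =>
      mx.FullyConnected(name=:fc2, num_hidden=64)  =>
      mx.Activation(name=:relu2, act_type=:relu)   =>
      mx.FullyConnected(name=:fc3, num_hidden=10)  =>
      mx.SoftmaxOutput(name=:softmax)
```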