
Commit

add two tutorials
hetong007 committed Oct 17, 2015
1 parent 158b602 commit b2ab9e8
Showing 6 changed files with 848 additions and 70 deletions.
113 changes: 113 additions & 0 deletions R-package/vignettes/mnistCompetition.Rmd
@@ -0,0 +1,113 @@
---
title: "Handwritten Digits Classification Competition"
author: "Tong He"
date: "October 17, 2015"
output: html_document
---

[MNIST](http://yann.lecun.com/exdb/mnist/) is a data set of handwritten digit images created by Yann LeCun. Each digit is represented by a 28x28 greyscale image. It has become a standard data set for testing classifiers on simple image input. Neural networks are without doubt strong models for image classification tasks. There is a [long-running competition](https://www.kaggle.com/c/digit-recognizer) hosted on Kaggle that uses this data set. We will present the basic usage of `mxnet` to compete in this challenge.

## Data Loading

First, let us download the data from [here](https://www.kaggle.com/c/digit-recognizer/data), and put them under the `data/` folder in your working directory.
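
If you are not sure whether the files ended up in the right place, a quick check from R can save a confusing error later (a small optional step, using only base R):

```{r, eval=FALSE}
file.exists('data/train.csv')  # should be TRUE
file.exists('data/test.csv')   # should be TRUE
```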

Then we can read them into R and convert them to matrices.

```{r, eval=FALSE}
train <- read.csv('data/train.csv', header=TRUE)
test <- read.csv('data/test.csv', header=TRUE)
train <- data.matrix(train)
test <- data.matrix(test)
train.x <- train[,-1]
train.y <- train[,1]
```

Here every image is represented as a single row in both train and test. The greyscale values fall in the range [0, 255], so we can linearly transform them into [0, 1] by

```{r, eval = FALSE}
train.x <- train.x/255
test <- test/255
```
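
A quick sanity check can confirm the layout and the rescaling. The row counts below are what the Kaggle files are expected to contain, so treat them as a guideline:

```{r, eval=FALSE}
dim(train.x)    # expected: 42000 images, each a row of 28 x 28 = 784 pixels
dim(test)       # expected: 28000 rows and 784 columns
range(train.x)  # should now lie within [0, 1]
```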

Looking at the labels, we can see that the counts of the ten digits are fairly even:

```{r, eval=FALSE}
table(train.y)
```
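
If you prefer proportions over raw counts, the same check can be written as:

```{r, eval=FALSE}
round(prop.table(table(train.y)), 3)  # every digit should be roughly 10% of the data
```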

## Network Configuration

Now we have the data. The next step is to configure the structure of our network.

```{r}
require(mxnet)
data <- mx.symbol.Variable("data")
fc1 <- mx.symbol.FullyConnected(data, name="fc1", num_hidden=128)
act1 <- mx.symbol.Activation(fc1, name="relu1", act_type="relu")
fc2 <- mx.symbol.FullyConnected(act1, name = "fc2", num_hidden = 64)
act2 <- mx.symbol.Activation(fc2, name="relu2", act_type="relu")
fc3 <- mx.symbol.FullyConnected(act2, name="fc3", num_hidden=10)
softmax <- mx.symbol.Softmax(fc3, name = "sm")
```

1. In `mxnet`, we use its own data type `symbol` to configure the network. `data <- mx.symbol.Variable("data")` uses `data` to represent the input data, i.e. the input layer.
2. Then we set the first hidden layer with `fc1 <- mx.symbol.FullyConnected(data, name="fc1", num_hidden=128)`. This layer takes `data` as its input and is configured with a name and the number of hidden neurons.
3. The activation is set by `act1 <- mx.symbol.Activation(fc1, name="relu1", act_type="relu")`. The activation function takes the output from the first hidden layer `fc1`.
4. The second hidden layer takes the result from `act1` as its input, with name "fc2" and 64 hidden neurons.
5. The second activation is almost the same as `act1`, except that the input source and the name are different.
6. Here comes the output layer. Since there are only 10 digits, we set the number of neurons to 10.
7. Finally we apply a softmax activation to obtain a probabilistic prediction (a quick way to double-check the resulting network is sketched after this list).
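
To double-check the configuration, we can list the arguments the network expects, i.e. the input placeholder together with the weights and biases of each layer. This is only a quick sketch, reusing the `arguments` helper from the companion NDArray and Symbol tutorial:

```{r, eval=FALSE}
arguments(softmax)
```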

## Training

We are almost ready for the training process. Before we start the computation, let's decide which device to use.

```{r}
devices <- lapply(1:2, function(i) {
mx.cpu(i)
})
```

Here we assign two CPU threads to `mxnet`. After all this preparation, you can run the following command to train the neural network!

```{r}
set.seed(0)
model <- mx.model.FeedForward.create(softmax, X=train.x, y=train.y,
                                     ctx=devices, num.round=10, array.batch.size=100,
                                     learning.rate=0.07, momentum=0.9,
                                     initializer=mx.init.uniform(0.07),
                                     epoch.end.callback=mx.callback.log.train.metric(100))
```

## Prediction and Submission

To make predictions, we can simply write

```{r}
preds <- predict(model, test)
dim(preds)
```

It is a matrix with 28000 rows and 10 columns, containing the predicted class probabilities from the output layer. To extract the label with the highest probability in each row, we can use `max.col` in R:

```{r}
pred.label <- max.col(preds) - 1
table(pred.label)
```
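
Since the labels of the Kaggle test set are not available, one way to estimate accuracy before submitting is to hold out part of the training data, retrain on the remainder, and score the held-out images with the same calls as above. A minimal sketch (the 90/10 split and the helper names below are arbitrary choices, not part of the competition workflow):

```{r, eval=FALSE}
set.seed(1)
holdout <- sample(nrow(train.x), round(0.1 * nrow(train.x)))  # hold out 10% for validation
model.val <- mx.model.FeedForward.create(softmax,
    X=train.x[-holdout,], y=train.y[-holdout],
    ctx=devices, num.round=10, array.batch.size=100,
    learning.rate=0.07, momentum=0.9,
    initializer=mx.init.uniform(0.07),
    epoch.end.callback=mx.callback.log.train.metric(100))
val.preds <- predict(model.val, train.x[holdout,])
val.label <- max.col(val.preds) - 1
mean(val.label == train.y[holdout])  # rough estimate of the accuracy to expect
```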

With a little extra effort to format the result as a CSV file, we have our submission ready for the competition!

```{r}
submission <- data.frame(ImageId=1:nrow(test), Label=pred.label)
write.csv(submission, file='submission.csv', row.names=FALSE, quote=FALSE)
```
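
Before uploading, it does not hurt to confirm that the file has one row per test image and the two expected columns (again a small optional check):

```{r, eval=FALSE}
check <- read.csv('submission.csv')
dim(check)   # should be 28000 rows and 2 columns (ImageId, Label)
head(check)
```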

@@ -1,4 +1,4 @@
MXNet R Overview Tutorial
MXNet R Tutorial on NDArray and Symbol
============================

This vignette gives a general overview of MXNet's R package. MXNet contains a
@@ -27,27 +27,27 @@ CPU and GPU

Let's create `NDArray` on either GPU or CPU

```r
```{r}
require(mxnet)
a = mx.nd.zeros(c(2, 3)) # create a 2-by-3 matrix on cpu
b = mx.nd.zeros(c(2, 3), mx.gpu()) # create a 2-by-3 matrix on gpu 0
c = mx.nd.zeros(c(2, 3), mx.gpu(2)) # create a 2-by-3 matrix on gpu 2
a <- mx.nd.zeros(c(2, 3)) # create a 2-by-3 matrix on cpu
b <- mx.nd.zeros(c(2, 3), mx.gpu()) # create a 2-by-3 matrix on gpu 0
c <- mx.nd.zeros(c(2, 3), mx.gpu(2)) # create a 2-by-3 matrix on gpu 2
c$dim()
```

We can also initialize an `NDArray` object in various ways:

```r
a = mx.nd.ones(c(4, 4))
b = mx.rnorm(c(4, 5))
c = mx.nd.array(1:5)
```{r}
a <- mx.nd.ones(c(4, 4))
b <- mx.rnorm(c(4, 5))
c <- mx.nd.array(1:5)
```

To check the numbers in an `NDArray`, we can simply run

```r
a = mx.nd.ones(c(2, 3))
b = as.array(a)
```{r}
a <- mx.nd.ones(c(2, 3))
b <- as.array(a)
class(b)
b
```
@@ -58,47 +58,47 @@ b

You can perform element-wise operations on `NDArray` objects:

```r
a = mx.nd.ones(c(2, 3)) * 2
b = mx.nd.ones(c(2, 4)) / 8
```{r}
a <- mx.nd.ones(c(2, 3)) * 2
b <- mx.nd.ones(c(2, 4)) / 8
as.array(a)
as.array(b)
c = a + b
c <- a + b
as.array(c)
d = c / a - 5
d <- c / a - 5
as.array(d)
```

If two `NDArray`s sit on different devices, we need to explicitly move them
into the same one. For instance:

```r
a = mx.nd.ones(c(2, 3)) * 2
b = mx.nd.ones(c(2, 3), mx.gpu()) / 8
c = mx.nd.copyto(a, mx.gpu()) * b
```{r}
a <- mx.nd.ones(c(2, 3)) * 2
b <- mx.nd.ones(c(2, 3), mx.gpu()) / 8
c <- mx.nd.copyto(a, mx.gpu()) * b
as.array(c)
```

#### Load and Save

You can save an `NDArray` object to your disk with `mx.nd.save`:

```r
a = mx.nd.ones(c(2, 3))
```{r}
a <- mx.nd.ones(c(2, 3))
mx.nd.save(a, 'temp.ndarray')
```

You can also load it back easily:

```r
a = mx.nd.load('temp.ndarray')
```{r}
a <- mx.nd.load('temp.ndarray')
as.array(a[[1]])
```

In case you want to save data to a distributed file system such as S3 or HDFS,
you can directly save to and load from it. For example:

```r
```{r,eval=FALSE}
mx.nd.save(a, 's3://mybucket/mydata.bin')
mx.nd.save(a, 'hdfs///users/myname/mydata.bin')
```
@@ -108,22 +108,22 @@ mx.nd.save(a, 'hdfs///users/myname/mydata.bin')
`NDArray` can automatically execute operations in parallel. This is desirable when we
use multiple resources such as CPUs, GPU cards, and CPU-to-GPU memory bandwidth.

For example, if we write `a = a + 1` followed by `b = b + 1`, and `a` is on CPU while
For example, if we write `a <- a + 1` followed by `b <- b + 1`, and `a` is on CPU while
`b` is on GPU, then we want to execute them in parallel to improve the
efficiency. Furthermore, data copies between CPU and GPU are also expensive, so we
hope to run them in parallel with other computations as well.

However, finding by eye which code can be executed in parallel is hard. In the
following example, `a = a + 1` and `c = c * 3` can be executed in parallel, but `a = a + 1` and
`b = b * 3` should be in sequential.

```r
a = mx.nd.ones(c(2,3))
b = a
c = mx.nd.copyto(a, mx.cpu())
a = a + 1
b = b * 3
c = c * 3
following example, `a <- a + 1` and `c <- c * 3` can be executed in parallel, but `a <- a + 1` and
`b <- b * 3` should run sequentially.

```{r}
a <- mx.nd.ones(c(2,3))
b <- a
c <- mx.nd.copyto(a, mx.cpu())
a <- a + 1
b <- b * 3
c <- c * 3
```

Luckily, MXNet can automatically resolve the dependencies and
Expand All @@ -133,7 +133,7 @@ automatically dispatch it into multi-devices, such as multi GPU cards or multi
machines.

It is achieved by lazy evaluation. Any operation we write down is issued into an
internal engine, and then returned. For example, if we run `a = a + 1`, it
internal engine, and then returned. For example, if we run `a <- a + 1`, it
returns immediately after pushing the plus operator to the engine. This
asynchrony allows us to push more operators to the engine, so it can determine
the read and write dependencies and find the best way to execute them in
@@ -152,13 +152,13 @@ With the computational unit `NDArray`, we need a way to construct neural network

The following code creates a two-layer perceptron network:

```r
```{r}
require(mxnet)
net = mx.symbol.Variable('data')
net = mx.symbol.FullyConnected(data=net, name='fc1', num_hidden=128)
net = mx.symbol.Activation(data=net, name='relu1', act_type="relu")
net = mx.symbol.FullyConnected(data=net, name='fc2', num_hidden=64)
net = mx.symbol.Softmax(data=net, name='out')
net <- mx.symbol.Variable('data')
net <- mx.symbol.FullyConnected(data=net, name='fc1', num_hidden=128)
net <- mx.symbol.Activation(data=net, name='relu1', act_type="relu")
net <- mx.symbol.FullyConnected(data=net, name='fc2', num_hidden=64)
net <- mx.symbol.Softmax(data=net, name='out')
class(net)
```

@@ -170,7 +170,7 @@ or the activation type (*act_type*).
The symbol can be simply viewed as a function taking several arguments, whose
names are automatically generated and can be retrieved by

```r
```{r}
arguments(net)
```

@@ -183,10 +183,10 @@ As can be seen, these arguments are the parameters needed by each symbol:

We can also specify the automatically generated names explicitly:

```r
net = mx.symbol.Variable('data')
w = mx.symbol.Variable('myweight')
net = sym.FullyConnected(data=data, weight=w, name='fc1', num_hidden=128)
```{r}
net <- mx.symbol.Variable('data')
w <- mx.symbol.Variable('myweight')
net <- mx.symbol.FullyConnected(data=net, weight=w, name='fc1', num_hidden=128)
arguments(net)
```

@@ -198,22 +198,22 @@ commonly used layers in deep learning. We can also easily define new operators
in Python. The following example first performs an element-wise add between two
symbols, then feeds the result to the fully connected operator.

```r
lhs = mx.symbol.Variable('data1')
rhs = mx.symbol.Variable('data2')
net = mx.symbol.FullyConnected(data=lhs + rhs, name='fc1', num_hidden=128)
```{r}
lhs <- mx.symbol.Variable('data1')
rhs <- mx.symbol.Variable('data2')
net <- mx.symbol.FullyConnected(data=lhs + rhs, name='fc1', num_hidden=128)
arguments(net)
```

We can also construct a symbol in a more flexible way than the single
forward composition we used before.

```r
net = mx.symbol.Variable('data')
net = mx.symbol.FullyConnected(data=net, name='fc1', num_hidden=128)
net2 = mx.symbol.Variable('data2')
net2 = mx.symbol.FullyConnected(data=net2, name='net2', num_hidden=128)
composed_net = net(data=net2, name='compose')
```{r}
net <- mx.symbol.Variable('data')
net <- mx.symbol.FullyConnected(data=net, name='fc1', num_hidden=128)
net2 <- mx.symbol.Variable('data2')
net2 <- mx.symbol.FullyConnected(data=net2, name='net2', num_hidden=128)
composed_net <- net(data=net2, name='compose')
arguments(composed_net)
```

@@ -226,9 +226,9 @@ In the above example, *net* is used as a function to apply to an existing symbol
Now we know how to define a symbol. Next we can infer the shapes of
all the arguments it needs, given the input data shape.

```r
net = mx.symbol.Variable('data')
net = mx.symbol.FullyConnected(data=ent, name='fc1', num_hidden=10)
```{r}
net <- mx.symbol.Variable('data')
net <- mx.symbol.FullyConnected(data=net, name='fc1', num_hidden=10)
```

The shape inference can be used as an earlier debugging mechanism to detect
Expand All @@ -243,19 +243,17 @@ For neural nets, a more commonly used pattern is ```simple_bind```, which will c
all the argument arrays for you. Then you can call forward, and backward (if the gradient is needed),
to get the gradient.

```r
# Todo: refine code
# define computation graphs
A = mx.symbol.Variable('A')
B = mx.symbol.Variable('B')
C = A * B
```{r, eval=FALSE}
A <- mx.symbol.Variable('A')
B <- mx.symbol.Variable('B')
C <- A * B
texec = mx.simple.bind(C)
texec <- mx.simple.bind(C)
texec.forward()
texec.backward()
```

The [model API](../../python/mxnet/model.py) is a thin wrapper around the symbolic executors to support neural net training.
The [model API](../../R-package/R/model.R) is a thin wrapper around the symbolic executors to support neural net training.

You are also highly encouraged to read [Symbolic Configuration and Execution in Pictures](symbol_in_pictures.md),
which provides a detailed explanation of concepts in pictures.
2 changes: 2 additions & 0 deletions doc/R-package/Makefile
@@ -3,6 +3,8 @@ PKGROOT=../../R-package

# ADD The Markdown to be built here
classifyRealImageWithPretrainedModel.md:
mnistCompetition.Rmd:
ndarrayAndSymbolTutorial.Rmd:

# General Rules for build rmarkdowns, need knitr
%.md: $(PKGROOT)/vignettes/%.Rmd
2 changes: 2 additions & 0 deletions doc/R-package/index.md
@@ -10,6 +10,8 @@ The MXNet R package brings flexible and efficient GPU computing and deep learning
Tutorials
---------
* [Classify Realworld Images with Pretrained Model](classifyRealImageWithPretrainedModel.md)
* [Handwritten Digits Classification Competition](mnistCompetition.md)
* [Tutorial on NDArray and Symbol](ndarrayAndSymbolTutorial.md)

Installation
------------