This repository has been archived by the owner on Nov 17, 2023. It is now read-only.

[MXNET-1358]Fit api tutorial #15353

Merged
merged 28 commits into from
Aug 1, 2019
28 commits
10d8bfa
Added tutorial for FIT API
piyushghai Mar 18, 2019
39d19e0
Added tests for Fit API tutorial
piyushghai Mar 18, 2019
83b0a9f
Updated index.md for the new tutorial to show up
piyushghai Mar 18, 2019
76e15a3
Addressed PR feedback
piyushghai Mar 19, 2019
1f26c24
Addressed PR feedback
piyushghai Mar 20, 2019
8ff448e
Removed spurious comment for Py2 and Py3 compatibility
piyushghai Mar 20, 2019
5770507
Address PR feedback
piyushghai Apr 5, 2019
c223af1
Addressed PR feedback
piyushghai Apr 5, 2019
d1f662f
Fixed typo
piyushghai Apr 5, 2019
97de8c4
Added example to showcase custom event handler
piyushghai Apr 5, 2019
44a560a
Fixed imports as estimator moved to contrib package
piyushghai Apr 5, 2019
f7b1356
Added a side note to inform about estimator reference being updated b…
piyushghai Apr 5, 2019
dd7f94f
Corrected typo
piyushghai Apr 10, 2019
3f21351
update tutorial
roywei Jun 19, 2019
c808ac0
address comments
roywei Jul 16, 2019
eb63fcc
new line
roywei Jul 16, 2019
1ce7e51
fix import
roywei Jul 19, 2019
6907df0
fix cached graph
roywei Jul 25, 2019
19d85f3
fix import
roywei Jul 29, 2019
0f99e89
address comments
roywei Jul 29, 2019
75ec743
fix doc gen
roywei Jul 30, 2019
3b1b185
add softmax
roywei Jul 30, 2019
0b8ebb0
add to website index
roywei Jul 30, 2019
55c54e5
fix doc string
roywei Jul 30, 2019
a69b406
Fix doc gen (#12)
roywei Jul 31, 2019
2b2f85d
fix test (#13)
roywei Jul 31, 2019
5436b62
fix warning (#14)
roywei Jul 31, 2019
81be5a0
fix href (#15)
roywei Jul 31, 2019
30 changes: 30 additions & 0 deletions docs/api/python/gluon/contrib.md
@@ -114,6 +114,33 @@ In the rest of this document, we list routines provided by the `gluon.contrib` p
WikiText103
```

### Estimator

```eval_rst
.. currentmodule:: mxnet.gluon.contrib.estimator
.. autosummary::
:nosignatures:
Estimator
```

#### EventHandler

```eval_rst
.. currentmodule:: mxnet.gluon.contrib.estimator
.. autosummary::
:nosignatures:
StoppingHandler
MetricHandler
ValidationHandler
LoggingHandler
CheckpointHandler
EarlyStoppingHandler
```

## API Reference

<script type="text/javascript" src='../../../_static/js/auto_module_index.js'></script>
@@ -144,6 +171,9 @@ In the rest of this document, we list routines provided by the `gluon.contrib` p
:members:
:imported-members:
.. automodule:: mxnet.gluon.contrib.estimator
:members:
:imported-members:
```

<script>auto_index("api-reference");</script>
271 changes: 271 additions & 0 deletions docs/tutorials/gluon/fit_api_tutorial.md
@@ -0,0 +1,271 @@
<!--- Licensed to the Apache Software Foundation (ASF) under one -->
<!--- or more contributor license agreements. See the NOTICE file -->
<!--- distributed with this work for additional information -->
<!--- regarding copyright ownership. The ASF licenses this file -->
<!--- to you under the Apache License, Version 2.0 (the -->
<!--- "License"); you may not use this file except in compliance -->
<!--- with the License. You may obtain a copy of the License at -->

<!--- http://www.apache.org/licenses/LICENSE-2.0 -->

<!--- Unless required by applicable law or agreed to in writing, -->
<!--- software distributed under the License is distributed on an -->
<!--- "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY -->
<!--- KIND, either express or implied. See the License for the -->
<!--- specific language governing permissions and limitations -->
<!--- under the License. -->


# MXNet Gluon Fit API

In this tutorial, you will learn how to use the [Gluon Fit API](https://cwiki.apache.org/confluence/display/MXNET/Gluon+Fit+API+-+Tech+Design), which is the easiest way to train deep learning models using the [Gluon API](http://mxnet.incubator.apache.org/versions/master/gluon/index.html) in Apache MXNet.

With the Fit API, you can train a deep learning model with a minimal amount of code. Just specify the network, the loss function and the data you want to train on. You don't need to worry about the boilerplate code that loops through the dataset in batches (often called the 'training loop'). Advanced users can still write bespoke training loops, but many of their use cases are covered by the Fit API.
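
For context, the kind of loop the Fit API hides can be sketched in plain Python. This is a toy illustration only: it fits `y = w * x` with mini-batch SGD, uses no MXNet calls, and every name in it (`batches`, `train`, `data`) is hypothetical.

```python
# Toy stand-in for "training loop" boilerplate: plain-Python mini-batch SGD.
# No MXNet is involved; all names here are hypothetical.

def batches(data, batch_size):
    """Yield consecutive mini-batches from a list of (x, y) pairs."""
    for start in range(0, len(data), batch_size):
        yield data[start:start + batch_size]


def train(data, epochs, batch_size, learning_rate):
    """Fit y = w * x by minimizing the mean squared error with SGD."""
    w = 0.0
    for _ in range(epochs):                      # loop over the dataset
        for batch in batches(data, batch_size):  # loop over mini-batches
            # Gradient of mean((w * x - y) ** 2) with respect to w
            grad = sum(2 * (w * x - y) * x for x, y in batch) / len(batch)
            w -= learning_rate * grad            # SGD update
    return w


# Synthetic data generated from y = 3x
data = [(0.1 * i, 3.0 * 0.1 * i) for i in range(1, 11)]
w = train(data, epochs=200, batch_size=4, learning_rate=0.1)
print("learned w: {:.4f}".format(w))
```

A real Gluon loop would also handle device placement, gradient recording, metrics and logging, which is the bookkeeping the Fit API is meant to take over.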

To demonstrate the Fit API, you will train an image classification model using the [ResNet-18](https://arxiv.org/abs/1512.03385) neural network architecture. The model will be trained using the [Fashion-MNIST dataset](https://research.zalando.com/welcome/mission/research-projects/fashion-mnist/).

## Prerequisites

To complete this tutorial, you will need:

- [MXNet](https://mxnet.incubator.apache.org/install/#overview) (version 1.5.0 or later is required; you can use `pip install mxnet` to get the 1.5.0 release pip package, or build from source from master. Refer to [MXNet installation](http://mxnet.incubator.apache.org/versions/master/install/index.html?platform=Linux&language=Python&processor=CPU).)
- [Jupyter Notebook](https://jupyter.org/index.html) (For interactively running the provided .ipynb file)




```python
import mxnet as mx
from mxnet import gluon
from mxnet.gluon.model_zoo import vision
from mxnet.gluon.contrib.estimator import estimator
from mxnet.gluon.contrib.estimator.event_handler import TrainBegin, TrainEnd, EpochEnd, CheckpointHandler

gpu_count = mx.context.num_gpus()
ctx = [mx.gpu(i) for i in range(gpu_count)] if gpu_count > 0 else mx.cpu()
```

## Dataset

[Fashion-MNIST](https://research.zalando.com/welcome/mission/research-projects/fashion-mnist/) dataset consists of fashion items divided into ten categories: t-shirt/top, trouser, pullover, dress, coat, sandal, shirt, sneaker, bag and ankle boot.

- It has 60,000 grayscale images of size 28 * 28 for training.
- It has 10,000 grayscale images of size 28 * 28 for testing/validation.

We will use the `gluon.data.vision` package to directly import the Fashion-MNIST dataset and perform pre-processing on it.


```python
# Get the training data
fashion_mnist_train = gluon.data.vision.FashionMNIST(train=True)

# Get the validation data
fashion_mnist_val = gluon.data.vision.FashionMNIST(train=False)
```


```python
transforms = [gluon.data.vision.transforms.Resize(224), # We pick 224 as the model we use takes an input of size 224.
              gluon.data.vision.transforms.ToTensor()]

# Now we will stack all these together.
transforms = gluon.data.vision.transforms.Compose(transforms)
```


```python
# Apply the transformations
fashion_mnist_train = fashion_mnist_train.transform_first(transforms)
fashion_mnist_val = fashion_mnist_val.transform_first(transforms)
```
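
The idea behind `transform_first` is that the transform touches only the first element of each sample (the image) and leaves the label alone. The following plain-Python sketch illustrates that behavior under simplified assumptions: images are flat lists of pixel values, the transform runs eagerly, and the helpers `compose` and `apply_to_first` are hypothetical stand-ins rather than MXNet functions.

```python
# Plain-Python sketch of "transform only the first element of each sample".
# `compose` and `apply_to_first` are hypothetical helpers, not MXNet APIs.

def compose(*fns):
    """Chain functions left to right, like a stacked list of transforms."""
    def run(value):
        for fn in fns:
            value = fn(value)
        return value
    return run


def apply_to_first(dataset, fn):
    """Apply fn to the first element of each sample; keep the rest unchanged."""
    return [(fn(sample[0]),) + tuple(sample[1:]) for sample in dataset]


def to_unit_range(pixels):
    """Rescale 8-bit intensities from [0, 255] to [0, 1]."""
    return [p / 255.0 for p in pixels]


def center(pixels):
    """Shift values so that they are centered around zero."""
    return [p - 0.5 for p in pixels]


# Toy samples: (flat list of pixel intensities, integer label)
toy_dataset = [([0, 128, 255], 3), ([64, 64, 64], 7)]
transformed = apply_to_first(toy_dataset, compose(to_unit_range, center))

for image, label in transformed:
    print(label, image)
```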


```python
batch_size = 256 # Batch size of the images
num_workers = 4 # The number of parallel workers for loading the data using Data Loaders.

train_data_loader = gluon.data.DataLoader(fashion_mnist_train, batch_size=batch_size,
                                          shuffle=True, num_workers=num_workers)
val_data_loader = gluon.data.DataLoader(fashion_mnist_val, batch_size=batch_size,
                                        shuffle=False, num_workers=num_workers)
```
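
With a batch size of 256, the dataset sizes given above do not divide evenly into batches. The short calculation below shows how many batches each epoch would contain, assuming the incomplete final batch is kept rather than dropped; `batch_layout` is a hypothetical helper written for this illustration.

```python
import math


def batch_layout(num_samples, batch_size):
    """Return (number of batches, size of the final batch).

    Assumes the last, possibly smaller, batch is kept.
    """
    num_batches = math.ceil(num_samples / batch_size)
    last_batch_size = num_samples - (num_batches - 1) * batch_size
    return num_batches, last_batch_size


train_batches, train_last = batch_layout(60000, 256)
val_batches, val_last = batch_layout(10000, 256)

print("train: {} batches, last batch has {} samples".format(train_batches, train_last))
print("val:   {} batches, last batch has {} samples".format(val_batches, val_last))
```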

## Model and Optimizers

Let's load the ResNet-18 model architecture from the [Gluon Model Zoo](http://mxnet.apache.org/api/python/gluon/model_zoo.html) and initialize its parameters. The Gluon Model Zoo contains a repository of pre-trained models as well as the model architecture definitions. We use only the architecture from the model zoo so that we can train it from scratch.


```python
resnet_18_v1 = vision.resnet18_v1(pretrained=False, classes=10)
resnet_18_v1.initialize(init=mx.init.Xavier(), ctx=ctx)
```

We will be using `SoftmaxCrossEntropyLoss` as the loss function since this is a multi-class classification problem. We will be using `sgd` (Stochastic Gradient Descent) as the optimizer.
You can experiment with a [different loss](http://mxnet.incubator.apache.org/versions/master/api/python/gluon/loss.html) or [optimizer](http://mxnet.incubator.apache.org/versions/master/api/python/optimization/optimization.html) as well.


```python
loss_fn = gluon.loss.SoftmaxCrossEntropyLoss()
```
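
To make the loss concrete, here is a plain-Python sketch of softmax cross-entropy for a single sample, assuming raw (unnormalized) class scores and an integer class label. It illustrates the formula only and is not the Gluon implementation; the function names are hypothetical.

```python
import math


def softmax(scores):
    """Convert raw scores to probabilities (shifted by the max for stability)."""
    shift = max(scores)
    exps = [math.exp(s - shift) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def softmax_cross_entropy(scores, label):
    """Negative log-probability that the softmax assigns to the true class."""
    return -math.log(softmax(scores)[label])


# Ten classes with identical scores: the model is maximally unsure,
# so the loss equals ln(10).
uniform_loss = softmax_cross_entropy([0.0] * 10, label=4)

# One class scored far above the rest, and it is the correct one:
# the loss is close to zero.
confident_loss = softmax_cross_entropy([0.0] * 3 + [10.0] + [0.0] * 6, label=3)

print("uniform: {:.4f}, confident: {:.6f}".format(uniform_loss, confident_loss))
```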

Let's define the trainer object for training the model.


```python
learning_rate = 0.04 # You can experiment with your own learning rate here
num_epochs = 2 # You can run training for more epochs
trainer = gluon.Trainer(resnet_18_v1.collect_params(),
                        'sgd', {'learning_rate': learning_rate})
```

## Train using Fit API

As stated earlier, the Fit API greatly reduces the boilerplate code and complexity of training with MXNet Gluon.

In the basic usage example, we set up and train our model with just two lines of code.

### Basic Usage


```python
train_acc = mx.metric.Accuracy() # Metric to monitor

# Define the estimator, by passing to it the model, loss function, metrics, trainer object and context
est = estimator.Estimator(net=resnet_18_v1,
                          loss=loss_fn,
                          metrics=train_acc,
                          trainer=trainer,
                          context=ctx)

# ignore warnings for nightly test on CI only
import warnings
with warnings.catch_warnings():
    warnings.simplefilter("ignore")
    # Magic line
    est.fit(train_data=train_data_loader,
            epochs=num_epochs)
```

Training begin: using optimizer SGD with current learning rate 0.0400 <!--notebook-skip-line-->
Train for 2 epochs. <!--notebook-skip-line-->

[Epoch 0] finished in 25.110s: train_accuracy : 0.7877 train_softmaxcrossentropyloss0 : 0.5905 <!--notebook-skip-line-->

[Epoch 1] finished in 23.595s: train_accuracy : 0.8823 train_softmaxcrossentropyloss0 : 0.3197 <!--notebook-skip-line-->
Train finished using total 48s at epoch 1. train_accuracy : 0.8823 train_softmaxcrossentropyloss0 : 0.3197 <!--notebook-skip-line-->


### Advanced Usage

The Fit API is also customizable through several event handlers, which give fine-grained control over the steps in training by exposing callback methods for each stage. The available callback methods are: `train_begin`, `train_end`, `batch_begin`, `batch_end`, `epoch_begin` and `epoch_end`.

You can use built-in event handlers such as `LoggingHandler`, `CheckpointHandler` or `EarlyStoppingHandler` to log training progress, save the model at certain points during training, or stop training when the model's performance plateaus.
Some utility handlers are added to your estimator by default. For example, `StoppingHandler` controls when training ends, based on the number of epochs or the number of batches trained.
`MetricHandler` calculates training metrics at the end of each batch and epoch.
`ValidationHandler` validates your model on test data at the end of each epoch and then calculates validation metrics.
You can create these utility handlers with different configurations and pass them to the estimator, which overrides the default handler configuration.
You can create a custom handler by inheriting from one or more
[base event handlers](https://github.com/apache/incubator-mxnet/blob/master/python/mxnet/gluon/contrib/estimator/event_handler.py#L32),
including: `TrainBegin`, `TrainEnd`, `EpochBegin`, `EpochEnd`, `BatchBegin`, `BatchEnd`.
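
Stopping when performance plateaus is usually a patience rule: stop once the monitored metric has failed to improve for a given number of consecutive epochs. The plain-Python sketch below illustrates that rule in general terms. It is not the `EarlyStoppingHandler` implementation, and the function name and its parameters are chosen for this illustration only.

```python
def stop_epoch(history, patience, min_delta=0.0):
    """Return the epoch index at which training would stop, or None.

    history   : metric value per epoch, where higher is better
    patience  : consecutive non-improving epochs tolerated before stopping
    min_delta : minimum increase that counts as an improvement
    """
    best = float("-inf")
    stale_epochs = 0
    for epoch, value in enumerate(history):
        if value > best + min_delta:
            best = value
            stale_epochs = 0          # improvement: reset the counter
        else:
            stale_epochs += 1         # no improvement this epoch
            if stale_epochs >= patience:
                return epoch
    return None


# Hypothetical validation accuracies, one per epoch
val_accuracy = [0.66, 0.85, 0.86, 0.86, 0.85, 0.87]

print(stop_epoch(val_accuracy, patience=2))  # stops after two stale epochs
print(stop_epoch(val_accuracy, patience=3))  # never stops on this history
```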


### Custom Event Handler

Here we showcase an example custom event handler that inherits features from a few base handler classes.
Our custom event handler is a simple one: it records the loss values at the end of every epoch of training.

Note: The `Estimator` object is passed to each of these methods, so you can access the training metrics.

```python
class LossRecordHandler(TrainBegin, TrainEnd, EpochEnd):
    def __init__(self):
        super(LossRecordHandler, self).__init__()
        self.loss_history = {}

    def train_begin(self, estimator, *args, **kwargs):
        print("Training begin")

    def train_end(self, estimator, *args, **kwargs):
        # Print all the losses at the end of training
        print("Training ended")
        for loss_name in self.loss_history:
            for i, loss_val in enumerate(self.loss_history[loss_name]):
                print("Epoch: {}, Loss name: {}, Loss value: {}".format(i, loss_name, loss_val))

    def epoch_end(self, estimator, *args, **kwargs):
        for metric in estimator.train_metrics:
            # look for train Loss in training metrics
            # we wrapped loss value as a metric to record it
            if isinstance(metric, mx.metric.Loss):
                loss_name, loss_val = metric.get()
                # append loss value for this epoch
                self.loss_history.setdefault(loss_name, []).append(loss_val)
```
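
How an estimator might route events to a handler like the one above can be sketched in plain Python. This is a simplified illustration, not MXNet's implementation: the base classes are redefined locally as stand-ins, and `ToyEstimator`, `LossRecorder` and `current_loss` are hypothetical names.

```python
# Local stand-ins for the base handler classes (not the MXNet ones).
class TrainBegin:
    def train_begin(self, estimator):
        pass


class TrainEnd:
    def train_end(self, estimator):
        pass


class EpochEnd:
    def epoch_end(self, estimator):
        pass


class ToyEstimator:
    """Hypothetical estimator that replays a fixed list of per-epoch losses."""

    def __init__(self, losses_per_epoch):
        self.losses_per_epoch = losses_per_epoch
        self.current_loss = None

    def fit(self, event_handlers):
        # A handler receives an event only if it inherits the matching base.
        for handler in event_handlers:
            if isinstance(handler, TrainBegin):
                handler.train_begin(self)
        for loss in self.losses_per_epoch:
            self.current_loss = loss  # stand-in for one epoch of training
            for handler in event_handlers:
                if isinstance(handler, EpochEnd):
                    handler.epoch_end(self)
        for handler in event_handlers:
            if isinstance(handler, TrainEnd):
                handler.train_end(self)


class LossRecorder(TrainBegin, EpochEnd):
    """Records which callbacks fired and what loss the estimator exposed."""

    def __init__(self):
        self.events = []

    def train_begin(self, estimator):
        self.events.append("train_begin")

    def epoch_end(self, estimator):
        self.events.append(("epoch_end", estimator.current_loss))


recorder = LossRecorder()
ToyEstimator([0.57, 0.32]).fit([recorder])
print(recorder.events)
```

Because `LossRecorder` does not inherit from `TrainEnd`, the toy estimator never calls `train_end` on it; a handler opts in to each event by choosing its base classes.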


```python
# Let's reset the model, trainer and accuracy objects from above

resnet_18_v1.initialize(force_reinit=True, init=mx.init.Xavier(), ctx=ctx)
trainer = gluon.Trainer(resnet_18_v1.collect_params(),
                        'sgd', {'learning_rate': learning_rate})
train_acc = mx.metric.Accuracy()
```


```python
# Define the estimator, by passing to it the model, loss function, metrics, trainer object and context
est = estimator.Estimator(net=resnet_18_v1,
                          loss=loss_fn,
                          metrics=train_acc,
                          trainer=trainer,
                          context=ctx)

# Define the handlers, for example the built-in CheckpointHandler
checkpoint_handler = CheckpointHandler(model_dir='./',
                                       model_prefix='my_model',
                                       monitor=train_acc,  # Monitors a metric
                                       save_best=True)  # Save the best model according to the monitored metric
# Let's instantiate another handler which we defined above
loss_record_handler = LossRecordHandler()
# ignore warnings for nightly test on CI only
import warnings
with warnings.catch_warnings():
    warnings.simplefilter("ignore")
    # Magic line
    est.fit(train_data=train_data_loader,
            val_data=val_data_loader,
            epochs=num_epochs,
            event_handlers=[checkpoint_handler, loss_record_handler]) # Add the event handlers
```

Training begin: using optimizer SGD with current learning rate 0.0400 <!--notebook-skip-line-->
Train for 2 epochs. <!--notebook-skip-line-->

[Epoch 0] finished in 25.236s: train_accuracy : 0.7917 train_softmaxcrossentropyloss0 : 0.5741 val_accuracy : 0.6612 val_softmaxcrossentropyloss0 : 0.8627 <!--notebook-skip-line-->

[Epoch 1] finished in 24.892s: train_accuracy : 0.8826 train_softmaxcrossentropyloss0 : 0.3229 val_accuracy : 0.8474 val_softmaxcrossentropyloss0 : 0.4262 <!--notebook-skip-line-->

Train finished using total 50s at epoch 1. train_accuracy : 0.8826 train_softmaxcrossentropyloss0 : 0.3229 val_accuracy : 0.8474 val_softmaxcrossentropyloss0 : 0.4262 <!--notebook-skip-line-->

Training begin <!--notebook-skip-line-->
Epoch 1, loss 0.5741 <!--notebook-skip-line-->
Epoch 2, loss 0.3229 <!--notebook-skip-line-->

You can load the saved model using the `load_parameters` API in Gluon. For more details, refer to the [Loading model parameters from file tutorial](save_load_params.html#saving-model-parameters-to-file).


```python
resnet_18_v1 = vision.resnet18_v1(pretrained=False, classes=10)
resnet_18_v1.load_parameters('./my_model-best.params', ctx=ctx)
```
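
A loaded classifier produces ten scores per image, one per category. The plain-Python sketch below maps the highest score to a category name, assuming the class indices follow the order in which the categories are listed in the Dataset section; `CLASS_NAMES`, `predicted_label` and the example scores are illustrative and are not part of the tutorial's code.

```python
# Category names in the order listed in the Dataset section
# (assumed here to match class indices 0-9).
CLASS_NAMES = ["t-shirt/top", "trouser", "pullover", "dress", "coat",
               "sandal", "shirt", "sneaker", "bag", "ankle boot"]


def predicted_label(scores):
    """Return the category name for the highest of ten class scores."""
    if len(scores) != len(CLASS_NAMES):
        raise ValueError("expected {} scores, got {}".format(len(CLASS_NAMES), len(scores)))
    best_index = max(range(len(scores)), key=lambda i: scores[i])
    return CLASS_NAMES[best_index]


# Made-up scores for one image; index 7 is the largest
example_scores = [0.1, 0.0, 0.2, 0.1, 0.3, 0.0, 0.2, 2.5, 0.1, 0.4]
print(predicted_label(example_scores))  # prints "sneaker"
```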

## Summary

- To learn more about deep learning with MXNet, see [Dive Into Deep Learning](http://gluon.io)

## Next Steps

- For more hands-on learning about deep learning, check out [Dive into Deep Learning](https://d2l.ai)

<!-- INSERT SOURCE DOWNLOAD BUTTONS -->
2 changes: 2 additions & 0 deletions docs/tutorials/index.md
@@ -137,6 +137,8 @@ Select API:&nbsp;
* [Data Transforms](/tutorials/gluon/transforms.html)
* [Applying Data Augmentation](/tutorials/gluon/data_augmentation.html)
* [Data Augmentation with Masks (for Object Segmentation)](https://mxnet.incubator.apache.org/tutorials/python/data_augmentation_with_masks.html)
* Fit API
* [Using Fit API](/tutorials/gluon/fit_api_tutorial.html)
</div> <!--end of gluon-->

<div class="module">
2 changes: 2 additions & 0 deletions python/mxnet/gluon/contrib/__init__.py
@@ -25,3 +25,5 @@
from . import cnn

from . import data

from . import estimator
2 changes: 2 additions & 0 deletions python/mxnet/gluon/contrib/estimator/__init__.py
@@ -17,5 +17,7 @@

# pylint: disable=wildcard-import
"""Gluon Estimator Module"""
from . import estimator
from . import event_handler
from .estimator import *
from .event_handler import *