From 4b87b586eb89332bec0a4dbb18e9e4936002c67e Mon Sep 17 00:00:00 2001
From: Mu Li
Date: Mon, 21 Dec 2015 13:59:36 -0500
Subject: [PATCH] Update README.md

---
 example/image-classification/README.md | 18 +++++++++---------
 1 file changed, 9 insertions(+), 9 deletions(-)

diff --git a/example/image-classification/README.md b/example/image-classification/README.md
index 468b04b6b8e2..b55ee6fa1152 100644
--- a/example/image-classification/README.md
+++ b/example/image-classification/README.md
@@ -147,31 +147,31 @@ model.fit(X=train_data, y=train_label)
 
 The following factors may significant affect the performance:
 
-- Use a fast backend. A fast BLAS library, e.g. openblas, altas,
+1. Use a fast backend. A fast BLAS library, e.g. openblas, altas,
 and mkl, is necessary if only using CPU. While for Nvidia GPUs, we strongly
 recommend to use CUDNN.
-- Three important things for the input data:
-  - data format. If you are using the `rec` format, then everything should be
+2. Three important things for the input data:
+  1. data format. If you are using the `rec` format, then everything should be
   fine.
-  - decoding. In default MXNet uses 4 CPU threads for decoding the images, which
+  2. decoding. In default MXNet uses 4 CPU threads for decoding the images, which
   are often able to decode over 1k images per second. You may increase the number
   of threads if either you are using a low-end CPU or you GPUs are very powerful.
-  - place to store the data. Any local or distributed filesystem (HDFS, Amazon
+  3. place to store the data. Any local or distributed filesystem (HDFS, Amazon
   S3) should be fine. There may be a problem if multiple machines read the data
   from the network shared filesystem (NFS) at the same time.
-- Use a large batch size. We often choose the largest one which can fit into
+3. Use a large batch size. We often choose the largest one which can fit into
   the GPU memory. But a too large value may slow down the convergence. For
   example, the safe batch size for CIFAR 10 is around 200, while for ImageNet
   1K, the batch size can go beyond 1K.
-- Choose the proper `kvstore` if using more than one GPU. (See
+4. Choose the proper `kvstore` if using more than one GPU. (See
   [doc/developer-guide/multi_node.md](../../doc/developer-guide/multi_node.md) for more information)
-  - For a single machine, often the default `local` is good enough. But you may want
+  1. For a single machine, often the default `local` is good enough. But you may want
   to use `local_allreduce_device` for models with size >> 100MB such as AlexNet
   and VGG. But also note that `local_allreduce_device` takes more GPU memory than
   others.
-  - For multiple machines, we recommend to try `dist_sync` first. But if the
+  2. For multiple machines, we recommend to try `dist_sync` first. But if the
   model size is quite large or you use a large number of machines, you may want
   to use `dist_async`.
 
 ## Results
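
The performance notes touched by this hunk (decoding threads, batch size, and the `kvstore` choice) map onto the old `mx.model.FeedForward` API that the surrounding README uses. The sketch below is illustrative only and is not part of the patch: the record-file path, network symbol, GPU count, and hyperparameters are placeholders, and it simply shows where `preprocess_threads`, `batch_size`, and `kvstore` plug in.

```python
import mxnet as mx

# Placeholder network symbol; the README's own examples define the real one.
data = mx.symbol.Variable('data')
fc = mx.symbol.FullyConnected(data=data, num_hidden=1000)
net = mx.symbol.SoftmaxOutput(data=fc, name='softmax')

# `rec`-format input with explicit decoding threads (point 2 above).
train_iter = mx.io.ImageRecordIter(
    path_imgrec='data/train.rec',       # placeholder path
    data_shape=(3, 224, 224),
    batch_size=256,                     # as large as fits in GPU memory (point 3)
    preprocess_threads=8,               # raise from the default 4 if decoding is the bottleneck
    rand_crop=True,
    rand_mirror=True)

model = mx.model.FeedForward(
    symbol=net,
    ctx=[mx.gpu(i) for i in range(4)],  # multi-GPU training on one machine
    num_epoch=10,
    learning_rate=0.05)

# Pick the kvstore to match the setup (point 4): `local` or
# `local_allreduce_device` on a single machine, `dist_sync` / `dist_async`
# when launching across machines.
model.fit(X=train_iter, kvstore='local_allreduce_device')
```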