Commit

Merge remote-tracking branch 'upstream/master' into terry
terrytangyuan committed Dec 22, 2015
2 parents edaeeec + 4b87b58 commit ef46e26
Showing 5 changed files with 27 additions and 13 deletions.
18 changes: 9 additions & 9 deletions example/image-classification/README.md
@@ -147,31 +147,31 @@ model.fit(X=train_data, y=train_label)
 The following factors may significant affect the performance:
-- Use a fast backend. A fast BLAS library, e.g. openblas, altas,
+1. Use a fast backend. A fast BLAS library, e.g. openblas, altas,
   and mkl, is necessary if only using CPU. While for Nvidia GPUs, we strongly
   recommend to use CUDNN.
-- Three important things for the input data:
-  - data format. If you are using the `rec` format, then everything should be
+2. Three important things for the input data:
+  1. data format. If you are using the `rec` format, then everything should be
   fine.
-  - decoding. In default MXNet uses 4 CPU threads for decoding the images, which
+  2. decoding. In default MXNet uses 4 CPU threads for decoding the images, which
   are often able to decode over 1k images per second. You
   may increase the number of threads if either you are using a low-end CPU or
   you GPUs are very powerful.
-  - place to store the data. Any local or distributed filesystem (HDFS, Amazon
+  3. place to store the data. Any local or distributed filesystem (HDFS, Amazon
   S3) should be fine. There may be a problem if multiple machines read the
   data from the network shared filesystem (NFS) at the same time.
-- Use a large batch size. We often choose the largest one which can fit into
+3. Use a large batch size. We often choose the largest one which can fit into
   the GPU memory. But a too large value may slow down the convergence. For
   example, the safe batch size for CIFAR 10 is around 200, while for ImageNet
   1K, the batch size can go beyond 1K.
-- Choose the proper `kvstore` if using more than one GPU. (See
+4. Choose the proper `kvstore` if using more than one GPU. (See
   [doc/developer-guide/multi_node.md](../../doc/developer-guide/multi_node.md)
   for more information)
-  - For a single machine, often the default `local` is good enough. But you may want
+  1. For a single machine, often the default `local` is good enough. But you may want
   to use `local_allreduce_device` for models with size >> 100MB such as AlexNet
   and VGG. But also note that `local_allreduce_device` takes more GPU memory than
   others.
-  - For multiple machines, we recommend to try `dist_sync` first. But if the
+  2. For multiple machines, we recommend to try `dist_sync` first. But if the
   model size is quite large or you use a large number of machines, you may want to use `dist_async`.

 ## Results
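Not part of the commit, but for concreteness: a minimal sketch of how the renumbered advice above maps onto MXNet's Python API of the time, assuming the 2015-era `mx.model.FeedForward` and `mx.io.ImageRecordIter` interfaces. The record file path and the toy network are placeholders.

```python
# Hedged sketch of the tuning knobs discussed in the README section above;
# assumes the 2015-era MXNet Python API. Paths and the toy network are
# placeholders, not taken from the commit.
import mxnet as mx

# Input pipeline: raise preprocess_threads beyond the default 4 if image
# decoding on the CPU cannot keep the GPUs fed.
train = mx.io.ImageRecordIter(
    path_imgrec="data/train.rec",   # placeholder .rec file
    data_shape=(3, 224, 224),
    batch_size=256,                 # largest size that fits in GPU memory
    preprocess_threads=8)

# Toy network, standing in for a real model such as AlexNet or VGG.
data = mx.symbol.Variable("data")
fc = mx.symbol.FullyConnected(data=data, num_hidden=1000)
net = mx.symbol.SoftmaxOutput(data=fc, name="softmax")

model = mx.model.FeedForward(
    symbol=net,
    ctx=[mx.gpu(i) for i in range(4)],  # multiple GPUs on one machine
    num_epoch=10)

# kvstore choice per the list above: 'local' as the single-machine default,
# 'local_allreduce_device' for models well over 100MB, and 'dist_sync' or
# 'dist_async' across machines.
model.fit(X=train, kvstore="local_allreduce_device")
```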
13 changes: 13 additions & 0 deletions example/rnn/get_ptb_data.sh
@@ -0,0 +1,13 @@
+#!/usr/bin/env bash
+
+RNN_DIR=$(cd `dirname $0`; pwd)
+DATA_DIR="${RNN_DIR}/data/"
+
+if [[ ! -d "${DATA_DIR}" ]]; then
+  echo "${DATA_DIR} doesn't exist, will create one";
+  mkdir -p "${DATA_DIR}"
+fi
+
+wget -P "${DATA_DIR}" https://raw.githubusercontent.com/dmlc/web-data/master/mxnet/ptb/ptb.train.txt;
+wget -P "${DATA_DIR}" https://raw.githubusercontent.com/dmlc/web-data/master/mxnet/ptb/ptb.valid.txt;
+wget -P "${DATA_DIR}" https://raw.githubusercontent.com/dmlc/web-data/master/mxnet/ptb/ptb.test.txt;
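The script is self-contained: running `bash get_ptb_data.sh` from any directory (it resolves its own location via `dirname $0`) creates `example/rnn/data/` if missing and downloads the PTB train, validation, and test files into it.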
4 changes: 2 additions & 2 deletions example/rnn/lstm.py
@@ -187,8 +187,8 @@ def calc_nll(seq_label_probs, X, begin):
 def train_lstm(model, X_train_batch, X_val_batch,
                num_round, update_period,
                optimizer='rmsprop', half_life=2,max_grad_norm = 5.0, **kwargs):
-    print("Training swith train.shape=%s" % str(X_train_batch.shape))
-    print("Training swith val.shape=%s" % str(X_val_batch.shape))
+    print("Training with train.shape=%s" % str(X_train_batch.shape))
+    print("Training with val.shape=%s" % str(X_val_batch.shape))
     m = model
     seq_len = len(m.seq_data)
     batch_size = m.seq_data[0].shape[0]
2 changes: 1 addition & 1 deletion src/symbol/graph_executor.cc
@@ -300,7 +300,7 @@ void GraphExecutor::InitGraph(const Symbol &symbol,
   }
   std::sort(head_nodes.begin(), head_nodes.end());
   head_nodes.resize(std::unique(head_nodes.begin(), head_nodes.end()) - head_nodes.begin());
-  std::vector<uint32_t> fwd_nodes = graph_.PostDFSOrder(head_nodes, {});
+  std::vector<uint32_t> fwd_nodes = graph_.PostDFSOrder(head_nodes, std::unordered_set<uint32_t>());
  num_forward_nodes_ = fwd_nodes.size();
 
  std::unordered_set<uint32_t> fwd_set(fwd_nodes.begin(), fwd_nodes.end());
3 changes: 2 additions & 1 deletion src/symbol/static_graph.h
@@ -183,7 +183,8 @@ class StaticGraph {
    * \return a post DFS visit order of nodes that can reach heads.
    */
   std::vector<uint32_t> PostDFSOrder(const std::vector<uint32_t>& head_nodes,
-                                     const std::unordered_set<uint32_t>& banned = {}) const;
+                                     const std::unordered_set<uint32_t>& banned
+                                         = std::unordered_set<uint32_t>()) const;
   /*!
    * \brief infer the node shapes in the computation graph.
    *
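Both C++ hunks trade the brace-initialized default argument (`= {}`) for an explicitly constructed empty `std::unordered_set<uint32_t>()`. The commit message gives no reason; a plausible one is portability, since compilers with incomplete C++11 support (older MSVC in particular) can reject `{}` as a default argument for a class-type parameter, whereas the spelled-out constructor call is unambiguous.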
