Commit

Merge remote-tracking branch 'upstream/master' into terry
terrytangyuan committed Dec 22, 2015
2 parents edaeeec + 4b87b58 commit ef46e26
Showing 5 changed files with 27 additions and 13 deletions.
18 changes: 9 additions & 9 deletions example/image-classification/README.md
@@ -147,31 +147,31 @@ model.fit(X=train_data, y=train_label)
 The following factors may significant affect the performance:
-- Use a fast backend. A fast BLAS library, e.g. openblas, altas,
+1. Use a fast backend. A fast BLAS library, e.g. openblas, altas,
   and mkl, is necessary if only using CPU. While for Nvidia GPUs, we strongly
   recommend to use CUDNN.
-- Three important things for the input data:
-  - data format. If you are using the `rec` format, then everything should be
+2. Three important things for the input data:
+  1. data format. If you are using the `rec` format, then everything should be
   fine.
-  - decoding. In default MXNet uses 4 CPU threads for decoding the images, which
+  2. decoding. In default MXNet uses 4 CPU threads for decoding the images, which
   are often able to decode over 1k images per second. You
   may increase the number of threads if either you are using a low-end CPU or
   you GPUs are very powerful.
-  - place to store the data. Any local or distributed filesystem (HDFS, Amazon
+  3. place to store the data. Any local or distributed filesystem (HDFS, Amazon
   S3) should be fine. There may be a problem if multiple machines read the
   data from the network shared filesystem (NFS) at the same time.
-- Use a large batch size. We often choose the largest one which can fit into
+3. Use a large batch size. We often choose the largest one which can fit into
   the GPU memory. But a too large value may slow down the convergence. For
   example, the safe batch size for CIFAR 10 is around 200, while for ImageNet
   1K, the batch size can go beyond 1K.
-- Choose the proper `kvstore` if using more than one GPU. (See
+4. Choose the proper `kvstore` if using more than one GPU. (See
   [doc/developer-guide/multi_node.md](../../doc/developer-guide/multi_node.md)
   for more information)
-  - For a single machine, often the default `local` is good enough. But you may want
+  1. For a single machine, often the default `local` is good enough. But you may want
   to use `local_allreduce_device` for models with size >> 100MB such as AlexNet
   and VGG. But also note that `local_allreduce_device` takes more GPU memory than
   others.
-  - For multiple machines, we recommend to try `dist_sync` first. But if the
+  2. For multiple machines, we recommend to try `dist_sync` first. But if the
   model size is quite large or you use a large number of machines, you may want to use `dist_async`.

 ## Results
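Not part of the commit, but for concreteness: a minimal sketch of how the renumbered advice above maps onto MXNet's Python API of the time, assuming the 2015-era `mx.model.FeedForward` and `mx.io.ImageRecordIter` interfaces. The record file path and the toy network are placeholders.

```python
# Hedged sketch of the tuning knobs discussed in the README section above;
# assumes the 2015-era MXNet Python API. Paths and the toy network are
# placeholders, not taken from the commit.
import mxnet as mx

# Input pipeline: raise preprocess_threads beyond the default 4 if image
# decoding on the CPU cannot keep the GPUs fed.
train = mx.io.ImageRecordIter(
    path_imgrec="data/train.rec",   # placeholder .rec file
    data_shape=(3, 224, 224),
    batch_size=256,                 # largest size that fits in GPU memory
    preprocess_threads=8)

# Toy network, standing in for a real model such as AlexNet or VGG.
data = mx.symbol.Variable("data")
fc = mx.symbol.FullyConnected(data=data, num_hidden=1000)
net = mx.symbol.SoftmaxOutput(data=fc, name="softmax")

model = mx.model.FeedForward(
    symbol=net,
    ctx=[mx.gpu(i) for i in range(4)],  # multiple GPUs on one machine
    num_epoch=10)

# kvstore choice per the list above: 'local' as the single-machine default,
# 'local_allreduce_device' for models well over 100MB, and 'dist_sync' or
# 'dist_async' across machines.
model.fit(X=train, kvstore="local_allreduce_device")
```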
13 changes: 13 additions & 0 deletions example/rnn/get_ptb_data.sh
@@ -0,0 +1,13 @@
+#!/usr/bin/env bash
+
+RNN_DIR=$(cd `dirname $0`; pwd)
+DATA_DIR="${RNN_DIR}/data/"
+
+if [[ ! -d "${DATA_DIR}" ]]; then
+  echo "${DATA_DIR} doesn't exist, will create one";
+  mkdir -p "${DATA_DIR}"
+fi
+
+wget -P "${DATA_DIR}" https://raw.githubusercontent.com/dmlc/web-data/master/mxnet/ptb/ptb.train.txt;
+wget -P "${DATA_DIR}" https://raw.githubusercontent.com/dmlc/web-data/master/mxnet/ptb/ptb.valid.txt;
+wget -P "${DATA_DIR}" https://raw.githubusercontent.com/dmlc/web-data/master/mxnet/ptb/ptb.test.txt;
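The script is self-contained: running `bash get_ptb_data.sh` from any directory (it resolves its own location via `dirname $0`) creates `example/rnn/data/` if missing and downloads the PTB train, validation, and test files into it.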
4 changes: 2 additions & 2 deletions example/rnn/lstm.py
@@ -187,8 +187,8 @@ def calc_nll(seq_label_probs, X, begin):
 def train_lstm(model, X_train_batch, X_val_batch,
                num_round, update_period,
                optimizer='rmsprop', half_life=2,max_grad_norm = 5.0, **kwargs):
-    print("Training swith train.shape=%s" % str(X_train_batch.shape))
-    print("Training swith val.shape=%s" % str(X_val_batch.shape))
+    print("Training with train.shape=%s" % str(X_train_batch.shape))
+    print("Training with val.shape=%s" % str(X_val_batch.shape))
     m = model
     seq_len = len(m.seq_data)
     batch_size = m.seq_data[0].shape[0]
2 changes: 1 addition & 1 deletion src/symbol/graph_executor.cc
@@ -300,7 +300,7 @@ void GraphExecutor::InitGraph(const Symbol &symbol,
   }
   std::sort(head_nodes.begin(), head_nodes.end());
   head_nodes.resize(std::unique(head_nodes.begin(), head_nodes.end()) - head_nodes.begin());
-  std::vector<uint32_t> fwd_nodes = graph_.PostDFSOrder(head_nodes, {});
+  std::vector<uint32_t> fwd_nodes = graph_.PostDFSOrder(head_nodes, std::unordered_set<uint32_t>());
  num_forward_nodes_ = fwd_nodes.size();
 
  std::unordered_set<uint32_t> fwd_set(fwd_nodes.begin(), fwd_nodes.end());
3 changes: 2 additions & 1 deletion src/symbol/static_graph.h
@@ -183,7 +183,8 @@ class StaticGraph {
    * \return a post DFS visit order of nodes that can reach heads.
    */
   std::vector<uint32_t> PostDFSOrder(const std::vector<uint32_t>& head_nodes,
-                                     const std::unordered_set<uint32_t>& banned = {}) const;
+                                     const std::unordered_set<uint32_t>& banned
+                                         = std::unordered_set<uint32_t>()) const;
   /*!
    * \brief infer the node shapes in the computation graph.
    *
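Both C++ hunks trade the brace-initialized default argument (`= {}`) for an explicitly constructed empty `std::unordered_set<uint32_t>()`. The commit message gives no reason; a plausible one is portability, since compilers with incomplete C++11 support (older MSVC in particular) can reject `{}` as a default argument for a class-type parameter, whereas the spelled-out constructor call is unambiguous.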
