diff --git a/NEWS.md b/NEWS.md
index 62458fabf..356b17636 100644
--- a/NEWS.md
+++ b/NEWS.md
@@ -4,7 +4,8 @@ Knet v0.8.6 Release Notes
 Compatibility
 -------------
 * Windows GPU support implemented.
-* MacOS GPU support improved: libnvidia-ml only used when available.
+* MacOS GPU support improved: nvml only used when available.
+* CUDA up to v"9.1" and cuDNN up to v"7.0.5" are tested.
 * Pre-0.6 Julia versions no longer supported.
 
 General
diff --git a/README.md b/README.md
index 7006deac5..7a27cdfcb 100644
--- a/README.md
+++ b/README.md
@@ -29,6 +29,9 @@
     - [Convolutional neural network](#Convolutional-neural-network-1)
     - [Recurrent neural network](#Recurrent-neural-network-1)
 - [Benchmarks](#Benchmarks-1)
+    - [Knet Benchmarks (Sep 30, 2016)](#Knet-Benchmarks-(Sep-30,-2016)-1)
+    - [DyNet Benchmarks (Dec 15, 2017)](#DyNet-Benchmarks-(Dec-15,-2017)-1)
+    - [DeepLearningFrameworks (Nov 24, 2017)](#DeepLearningFrameworks-(Nov-24,-2017)-1)
 - [Under the hood](#Under-the-hood-1)
     - [KnetArrays](#KnetArrays-1)
     - [AutoGrad](#AutoGrad-1)
@@ -525,10 +528,12 @@ Here is a random sample of 1000 characters from the model.  Note that the model
 ## Benchmarks
 
 
-Each of the examples above was used as a benchmark to compare Knet with other frameworks. The table below shows the number of seconds it takes to train a given model for a particular dataset, number of epochs and minibatch size for Knet, Theano, Torch, Caffe and TensorFlow. Knet had comparable performance to other commonly used frameworks.
+<a id='Knet-Benchmarks-(Sep-30,-2016)-1'></a>
+
+### Knet Benchmarks (Sep 30, 2016)
 
 
-Knet Benchmarks (Sep 30, 2016):
+Each of the examples above was used as a benchmark to compare Knet with other frameworks. The table below shows the number of seconds it takes to train a given model for a particular dataset, number of epochs and minibatch size for Knet, Theano, Torch, Caffe and TensorFlow. Knet had comparable performance to other commonly used frameworks.
 
 
 | model   | dataset  | epochs | batch | Knet | Theano | Torch | Caffe | TFlow |
@@ -543,7 +548,37 @@ Knet Benchmarks (Sep 30, 2016):
 The benchmarking was done on g2.2xlarge GPU instances on Amazon AWS. The code is available at [github](https://github.com/ozanarkancan/Knet8-Benchmarks) and as machine image `deep_AMI_v6` at AWS N.California. See the section on [Using Amazon AWS](http://denizyuret.github.io/Knet.jl/latest/install.html#Using-Amazon-AWS-1) for more information. The datasets are available online using the following links: [Housing](https://archive.ics.uci.edu/ml/datasets/Housing), [MNIST](http://yann.lecun.com/exdb/mnist), [Hiawatha](http://www.gutenberg.org/files/19/19.txt). The MLP uses a single hidden layer of 64 units. CharLM uses a single layer LSTM language model with embedding and hidden layer sizes set to 256 and trained using BPTT with a sequence length of 100. Each dataset was minibatched and transferred to GPU prior to benchmarking when possible.
 
 
-More recently (Nov 24, 2017), @ilkarman has published CNN and RNN [benchmarks](https://github.com/ilkarman/DeepLearningFrameworks) on Nvidia K80 GPUs, using the Microsoft Azure Data Science Virtual Machine for Linux (Ubuntu). The results are copied below.  You can find versions of the Knet notebooks used for these benchmarks in the Knet/examples/DeepLearningFrameworks directory.
+<a id='DyNet-Benchmarks-(Dec-15,-2017)-1'></a>
+
+### DyNet Benchmarks (Dec 15, 2017)
+
+
+We implemented dynamic neural network examples from the [dynet-benchmark](https://github.com/neulab/dynet-benchmark) repo to compare Knet with DyNet and Chainer. See the [DyNet technical report](https://arxiv.org/abs/1701.03980) for the architectural details of the implemented examples and the [github repo](https://github.com/neulab/dynet-benchmark) for the source code.
+
+
+  * [rnnlm-batch](https://github.com/denizyuret/Knet.jl/blob/master/examples/dynet-benchmark/rnnlm-batch.jl): A recurrent neural network language model on the [PTB](https://catalog.ldc.upenn.edu/ldc99t42) corpus.
+  * [bilstm-tagger](https://github.com/denizyuret/Knet.jl/blob/master/examples/dynet-benchmark/bilstm-tagger.jl): A bidirectional LSTM network that predicts a tag for each word. It is trained on the [WikiNER](https://github.com/neulab/dynet-benchmark/tree/master/data/tags) dataset.
+  * [bilstm-tagger-withchar](https://github.com/denizyuret/Knet.jl/blob/master/examples/dynet-benchmark/bilstm-tagger-withchar.jl): Similar to bilstm-tagger, but uses character-based embeddings for unknown words.
+  * [treenn](https://github.com/denizyuret/Knet.jl/blob/master/examples/dynet-benchmark/treenn.jl): A tree-structured LSTM sentiment classifier trained on the [Stanford Sentiment Treebank](https://nlp.stanford.edu/sentiment/index.html) dataset.
+
+
+Benchmarks were run on a server with an Intel(R) Xeon(R) CPU E5-2695 v4 @ 2.10GHz and a Tesla K80 GPU.
+
+
+|                                                                                                                          Model |    Metric |  Knet | DyNet | Chainer |
+| ------------------------------------------------------------------------------------------------------------------------------:| ---------:| -----:| -----:| -------:|
+|                       [rnnlm-batch](https://github.com/denizyuret/Knet.jl/blob/master/examples/dynet-benchmark/rnnlm-batch.jl) | words/sec | 28.5k | 18.7k |     16k |
+|                   [bilstm-tagger](https://github.com/denizyuret/Knet.jl/blob/master/examples/dynet-benchmark/bilstm-tagger.jl) | words/sec |  6800 |  1200 |     157 |
+| [bilstm-tagger-withchar](https://github.com/denizyuret/Knet.jl/blob/master/examples/dynet-benchmark/bilstm-tagger-withchar.jl) | words/sec |  1300 |   900 |     128 |
+|                                 [treenn](https://github.com/denizyuret/Knet.jl/blob/master/examples/dynet-benchmark/treenn.jl) | sents/sec |    43 |    68 |      10 |
+
+
+<a id='DeepLearningFrameworks-(Nov-24,-2017)-1'></a>
+
+### DeepLearningFrameworks (Nov 24, 2017)
+
+
+@ilkarman has published CNN and RNN [benchmarks](https://github.com/ilkarman/DeepLearningFrameworks) on Nvidia K80 GPUs, using the Microsoft Azure Data Science Virtual Machine for Linux (Ubuntu). The results are copied below.  You can find versions of the Knet notebooks used for these benchmarks in the Knet/examples/DeepLearningFrameworks directory.
 
 
 Training CNN (VGG-style) on CIFAR-10 - Image Recognition
diff --git a/docs/src/tutorial.md b/docs/src/tutorial.md
index f3324e725..3fb48304b 100644
--- a/docs/src/tutorial.md
+++ b/docs/src/tutorial.md
@@ -594,6 +594,8 @@ notebook.
 
 ## Benchmarks
 
+### Knet Benchmarks (Sep 30, 2016)
+
 Each of the examples above was used as a benchmark to compare Knet
 with other frameworks. The table below shows the number of seconds it
 takes to train a given model for a particular dataset, number of
@@ -601,8 +603,6 @@ epochs and minibatch size for Knet, Theano, Torch, Caffe and
 TensorFlow. Knet had comparable performance to other commonly used
 frameworks.
 
-Knet Benchmarks (Sep 30, 2016):
-
 |model|dataset|epochs|batch|Knet|Theano|Torch|Caffe|TFlow|
 |:----|:------|:-----|:----|:---|:-----|:----|:----|:----|
 |LinReg|Housing|10K|506|2.84|1.88|2.66|2.35|5.92|
@@ -626,7 +626,35 @@ language model with embedding and hidden layer sizes set to 256 and
 trained using BPTT with a sequence length of 100. Each dataset was
 minibatched and transferred to GPU prior to benchmarking when possible.
 
-More recently (Nov 24, 2017), @ilkarman has published CNN and RNN
+
+### DyNet Benchmarks (Dec 15, 2017)
+
+We implemented dynamic neural network examples from the
+[dynet-benchmark](https://github.com/neulab/dynet-benchmark) repo to
+compare Knet with DyNet and Chainer. See the [DyNet technical
+report](https://arxiv.org/abs/1701.03980) for the architectural
+details of the implemented examples and the [github
+repo](https://github.com/neulab/dynet-benchmark) for the source code.
+
+- [rnnlm-batch](https://github.com/denizyuret/Knet.jl/blob/master/examples/dynet-benchmark/rnnlm-batch.jl): A recurrent neural network language model on the [PTB](https://catalog.ldc.upenn.edu/ldc99t42) corpus.
+- [bilstm-tagger](https://github.com/denizyuret/Knet.jl/blob/master/examples/dynet-benchmark/bilstm-tagger.jl): A bidirectional LSTM network that predicts a tag for each word. It is trained on the [WikiNER](https://github.com/neulab/dynet-benchmark/tree/master/data/tags) dataset.
+- [bilstm-tagger-withchar](https://github.com/denizyuret/Knet.jl/blob/master/examples/dynet-benchmark/bilstm-tagger-withchar.jl): Similar to bilstm-tagger, but uses character-based embeddings for unknown words.
+- [treenn](https://github.com/denizyuret/Knet.jl/blob/master/examples/dynet-benchmark/treenn.jl): A tree-structured LSTM sentiment classifier trained on the [Stanford Sentiment Treebank](https://nlp.stanford.edu/sentiment/index.html) dataset.
+
+Benchmarks were run on a server with an Intel(R) Xeon(R) CPU E5-2695 v4 @
+2.10GHz and a Tesla K80 GPU.
+
+| Model                                               | Metric    |  Knet    | DyNet     | Chainer     |
+| ----------------------------------------------------| --------- | -------- | --------- |------------ |
+| [rnnlm-batch](https://github.com/denizyuret/Knet.jl/blob/master/examples/dynet-benchmark/rnnlm-batch.jl)                       | words/sec | 28.5k    | 18.7k     | 16k         |
+| [bilstm-tagger](https://github.com/denizyuret/Knet.jl/blob/master/examples/dynet-benchmark/bilstm-tagger.jl)                   | words/sec | 6800     | 1200      | 157         |
+| [bilstm-tagger-withchar](https://github.com/denizyuret/Knet.jl/blob/master/examples/dynet-benchmark/bilstm-tagger-withchar.jl) | words/sec | 1300     | 900       | 128         |
+| [treenn](https://github.com/denizyuret/Knet.jl/blob/master/examples/dynet-benchmark/treenn.jl)                                 | sents/sec | 43       | 68        | 10          |
+
+
+### DeepLearningFrameworks (Nov 24, 2017)
+
+@ilkarman has published CNN and RNN
 [benchmarks](https://github.com/ilkarman/DeepLearningFrameworks) on
 Nvidia K80 GPUs, using the Microsoft Azure Data Science Virtual
 Machine for Linux (Ubuntu). The results are copied below.  You can
diff --git a/examples/README.md b/examples/README.md
index 9975bd280..61f488223 100644
--- a/examples/README.md
+++ b/examples/README.md
@@ -1,23 +1,39 @@
 # Examples
 
+## Tutorials
 - [julia-tutorial](julia-tutorial): Julia examples demonstrating arrays, tuples, dictionaries, indexing etc.
 - [knet-tutorial](knet-tutorial): Notebook with Knet models for linreg, softmax, mlp, cnn, rnn.
+- [optimizers](optimizers): Try various optimizers (SGD, Momentum, Adam...) on [LeNet].
+- [overfitting](overfitting): Notebook on underfitting, overfitting, regularization, dropout.
+- [rnn-tutorial](rnn-tutorial): RNN tutorial notebook with BPTT, LSTM, S2S.
+
+## Benchmarks
+- [DeepLearningFrameworks](DeepLearningFrameworks): Notebooks comparing CNTK, Caffe2, Chainer, Gluon, Keras, Knet, Lasagne, MXNet, PyTorch, TensorFlow on CNN and RNN examples.
+- [dynet-benchmark](dynet-benchmark): Four dynamic neural network examples comparing Knet with DyNet and Chainer from [dynet-benchmark](https://github.com/neulab/dynet-benchmark).
+
+## Models
+
+### Linear
 - [synthetic-linreg](synthetic-linreg): Simple linear regression example using artificial data.
 - [housing-linreg](housing-linreg): Linear regression on the [Boston Housing] dataset.
+
+### MLP
 - [mnist-mlp](mnist-mlp): Multi-layer perceptron trained on [MNIST].
 - [fashion-mnist](fashion-mnist): Multi-layer perceptron trained on [Fashion-MNIST].
+
+### CNN
 - [lenet](lenet): The [LeNet] model trained on [MNIST].
 - [cifar10-cnn](cifar10-cnn): CNN model for [CIFAR-10] with batchnorm.
-- [rnn-tutorial](rnn-tutorial): RNN tutorial notebook with BPTT, LSTM, S2S.
-- [charlm](charlm): Character-level RNN language model from [Karpathy].
-- [rnnlm](rnnlm): Word-level RNN language model trained on the [Mikolov-PTB] corpus.
-- [optimizers](optimizers): Try various optimizers (SGD, Momentum, Adam...) on [LeNet].
-- [overfitting](overfitting): Notebook on underfitting, overfitting, regularization, dropout.
 - [resnet](resnet): Knet implementation of [ResNet] 50, 101, and 152 models.
 - [vgg](vgg): Knet implementation of [VGG] D and E models.
+
+### RNN
+- [charlm](charlm): Character-level RNN language model from [Karpathy].
+- [rnnlm](rnnlm): Word-level RNN language model trained on the [Mikolov-PTB] corpus.
+
+### Other
 - [variational-autoencoder](variational-autoencoder): Train a Variational Autoencoder on [MNIST].
-- [DeepLearningFrameworks](DeepLearningFrameworks): Notebooks comparing CNTK, Caffe2, Chainer, Gluon, Keras, Knet, Lasagne, MXNet, PyTorch, TensorFlow on CNN and RNN examples.
-- [dynet-benchmark](dynet-benchmark): 4 different dynamic neural network examples in Knet implemented for [dynet-benchmark](https://github.com/neulab/dynet-benchmark).
+
 
 [MNIST]: http://yann.lecun.com/exdb/mnist
 [LeNet]: http://yann.lecun.com/exdb/lenet
diff --git a/examples/dynet-benchmark/README.md b/examples/dynet-benchmark/README.md
index 7919352fa..90aa6595a 100644
--- a/examples/dynet-benchmark/README.md
+++ b/examples/dynet-benchmark/README.md
@@ -1,11 +1,25 @@
 # dynet-benchmark (last updated Dec 14, 2017)
 
+This directory contains examples implemented for the [dynet-benchmark](https://github.com/neulab/dynet-benchmark) repo. See the [DyNet technical report](https://arxiv.org/abs/1701.03980) for the architectural details of the implemented examples.
+
 - [rnnlm-batch](rnnlm-batch.jl): A recurrent neural network language model on [PTB](https://catalog.ldc.upenn.edu/ldc99t42) corpus.
 - [bilstm-tagger](bilstm-tagger.jl): A bidirectional LSTM network that predicts a tag for each word. It is trained on [WikiNER](https://github.com/neulab/dynet-benchmark/tree/master/data/tags) dataset.
 - [bilstm-tagger-withchar](bilstm-tagger-withchar.jl): Similar to bilstm-tagger, but uses characer-based embeddings for unknown words.
 - [treenn](treenn.jl): A tree-structured LSTM sentiment classifier trained on [Stanford Sentiment Treebank](https://nlp.stanford.edu/sentiment/index.html) dataset.
 
-This directory contains examples implemented for [dynet-benchmark](https://github.com/neulab/dynet-benchmark) repo. See [DyNet technical report](https://arxiv.org/abs/1701.03980) for the architectural details of the implemented examples.
+
+## Results on Intel(R) Xeon(R) CPU E5-2695 v4 @ 2.10GHz, Tesla K80
+
+See the [dynet-benchmark](https://github.com/neulab/dynet-benchmark)
+repo for the source code for each model and each framework.
+
+| Model                                               | Metric    |  Knet    | DyNet     | Chainer     |
+| ----------------------------------------------------| --------- | -------- | --------- |------------ |
+| [rnnlm-batch](rnnlm-batch.jl)                       | words/sec | 28.5k    | 18.7k     | 16k         |
+| [bilstm-tagger](bilstm-tagger.jl)                   | words/sec | 6800     | 1200      | 157         |
+| [bilstm-tagger-withchar](bilstm-tagger-withchar.jl) | words/sec | 1300     | 900       | 128         |
+| [treenn](treenn.jl)                                 | sents/sec | 43       | 68        | 10          |
+
 
 ## Example Usage
 
@@ -52,11 +66,3 @@ To run examples without time limit, pass a non-positive integer to `--timeout` o
 $ julia bilstm-tagger.jl --usegpu --timeout 0
 ```
 
-## Results on Intel(R) Xeon(R) CPU E5-2695 v4 @ 2.10GHz, Tesla K80
-
-| Model                                               | Metric    |  Knet    | DyNet     | Chainer     |
-| ----------------------------------------------------| --------- | -------- | --------- |------------ |
-| [rnnlm-batch](rnnlm-batch.jl)                       | words/sec | 28.5k    | 18.7k     | 16k         |
-| [bilstm-tagger](bilstm-tagger.jl)                   | words/sec | 6800     | 1200      | 157         |
-| [bilstm-tagger-withchar](bilstm-tagger-withchar.jl) | words/sec | 1300     | 900       | 128         |
-| [treenn](treenn.jl)                                 | sents/sec | 43       | 68        | 10          |