[numpy] Misc fixes (#364)
* Fix

* Fix

* Fix gd.md

* Remove FIXME

* Update mxnet pip install to latest
reminisce authored and astonzhang committed Sep 19, 2019
1 parent c6abe50 commit fc7a5ce
Showing 15 changed files with 82 additions and 86 deletions.
1 change: 1 addition & 0 deletions .gitignore
@@ -28,3 +28,4 @@ build/
/chapter_deep-learning-computation/mydict
/chapter_deep-learning-computation/x-file
/chapter_deep-learning-computation/x-files
.idea
2 changes: 1 addition & 1 deletion Jenkinsfile
@@ -11,7 +11,7 @@ stage("Build and Publish") {
rm -rf ~/miniconda3/envs/${ENV_NAME}
conda create -n ${ENV_NAME} pip -y
conda activate ${ENV_NAME}
pip install https://apache-mxnet.s3-us-west-2.amazonaws.com/dist/python/numpy/20190619/mxnet_cu101mkl-1.5.0b20190619-py2.py3-none-manylinux1_x86_64.whl
pip install https://apache-mxnet.s3-us-west-2.amazonaws.com/dist/python/numpy/latest/mxnet_cu101mkl-1.5.0-py2.py3-none-manylinux1_x86_64.whl
pip install git+https://github.com/d2l-ai/d2l-book
python setup.py develop
pip list
5 changes: 3 additions & 2 deletions chapter_computational-performance/hybridize.md
@@ -132,14 +132,15 @@ As is observed in the above results, after a HybridSequential instance calls the
We can save the symbolic program and model parameters to disk with the `export` function after the `net` model has finished computing the output for an input, as in the case of `net(x)` in the `benchmark` function.

```{.python .input}
#net.export('my_mlp') # FIXME
net.export('my_mlp')
```

The .json and .params files generated during this process are the symbolic program and the model parameters, respectively. They can be read by other front-end languages supported by MXNet, such as C++, R, Scala, and Perl. This allows us to deploy trained models to other devices and to easily use other front-end programming languages. At the same time, because symbolic programming is used during deployment, the computing performance is often superior to that of imperative programming.
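
As a quick, hedged sketch (not part of this diff) of that deployment step: the exported files can be loaded back into Gluon with `SymbolBlock.imports`. The file names below follow the `net.export('my_mlp')` call above, while the input name `'data'` and the input shape `(1, 512)` are assumptions borrowed from the surrounding chapter.

```{.python .input}
from mxnet import gluon, np, npx
npx.set_np()

# Load the serialized symbolic program and its parameters back into Gluon.
# 'data' is Gluon's default input name for exported models (assumed here).
deployed_net = gluon.SymbolBlock.imports(
    'my_mlp-symbol.json', ['data'], 'my_mlp-0000.params')
# Run the deployed model on a fresh input; the shape mirrors the input
# used earlier in this section and is likewise an assumption
deployed_net(np.random.normal(size=(1, 512)))
```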

In MXNet, a symbolic program refers to a program that makes use of the Symbol type. We know that, when the NDArray input `x` is provided to `net`, `net(x)` will directly calculate the model output and return a result based on `x`. For models that have called the `hybridize` function, we can also provide a Symbol-type input variable, and `net(x)` will then return a Symbol-type result.

```{.python .input}
# Delete this since symbol is not supposed to be exposed directly to users?
#x = sym.np.var('data') # FIXME
#net(x)
```
@@ -160,7 +161,7 @@ class HybridNet(nn.HybridBlock):
def hybrid_forward(self, F, x):
print('F: ', F)
print('x: ', x)
# x = F.np.relu(self.hidden(x)) # FIMXE
x = F.npx.relu(self.hidden(x))
print('hidden: ', x)
return self.output(x)
```
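
A hedged usage sketch, not part of the diff: instantiating `HybridNet` and calling it once before and once after `hybridize` shows `F` switch from the imperative ndarray module to the symbol module. The input shape `(1, 4)` is an assumption borrowed from the surrounding chapter.

```{.python .input}
net = HybridNet()
net.initialize()
x = np.random.normal(size=(1, 4))
net(x)           # F is the imperative module; the prints run on every call
net.hybridize()
net(x)           # F is now the symbol module; the prints run only on this first traced call
```
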
18 changes: 11 additions & 7 deletions chapter_crashcourse/naive-bayes.md
@@ -137,7 +137,7 @@ Now on to slightly more difficult things $P_{xy}$. Since we picked black and whi
```{.python .input n=66}
n_x = np.zeros((10, 28, 28))
for y in range(10):
n_x[y] = np.array(X.asnumpy()[Y==y].sum(axis=0))
n_x[y] = np.array(X.asnumpy()[Y.asnumpy()==y].sum(axis=0))
P_xy = (n_x+1) / (n_y+1).reshape((10,1,1))
show_images(P_xy, 2, 5);
@@ -155,7 +155,7 @@ np.expand_dims?
def bayes_pred(x):
x = np.expand_dims(x, axis=0) # (28, 28) -> (1, 28, 28)
p_xy = P_xy * x + (1-P_xy)*(1-x)
p_xy = p_xy.reshape((10,-1)).asnumpy().prod(axis=1) # p(x|y) # FIXME
p_xy = p_xy.reshape((10,-1)).prod(axis=1) # p(x|y)
return np.array(p_xy) * P_y
image, label = mnist_test[0]
@@ -197,25 +197,29 @@ py
Check if the prediction is correct.

```{.python .input}
py.argmax(axis=0) == label
# convert label which is a scalar tensor of int32 dtype
# to a Python scalar integer for comparison
py.argmax(axis=0) == int(label)
```

Now let's predict a few validation examples; we can see that the Bayes classifier works pretty well except for the 9th and 16th digits.

```{.python .input}
def predict(X):
return [str(bayes_pred_stable(x).argmax(axis=0)) for x in X]
return [bayes_pred_stable(x).argmax(axis=0).astype(np.int32) for x in X]
X, y = mnist_test[:18]
show_images(X, 2, 9, titles=predict(X));
preds = predict(X)
show_images(X, 2, 9, titles=[str(d) for d in preds]);
```

Finally, let's compute the overall accuracy of the classifier.

```{.python .input}
X, y = mnist_test[:] # FIXME, y is a numpy not mx.np
'Validation accuracy', (np.array(predict(X)) == np.array(y)).sum() / len(y)
X, y = mnist_test[:]
preds = np.array(predict(X), dtype=np.int32)
'Validation accuracy', float((preds == y).sum()) / len(y)
```

Modern deep networks achieve error rates of less than 0.01. While Naive Bayes classifiers used to be popular in the 80s and 90s, e.g. for spam filtering, their heyday is over. The poor performance is due to the incorrect statistical assumptions that we made in our model: we assumed that each and every pixel is generated *independently*, depending only on the label. This is clearly not how humans write digits, and this wrong assumption led to the downfall of our overly naive (Bayes) classifier. Time to start building deep networks.
2 changes: 0 additions & 2 deletions chapter_crashcourse/ndarray.md
@@ -202,8 +202,6 @@ id(x) == before
Converting MXNet NDArrays to and from NumPy is easy. The converted arrays do *not* share memory. This minor inconvenience is actually quite important: when you perform operations on the CPU or one of the GPUs, you do not want MXNet to have to wait while NumPy might be doing something else with the same chunk of memory. The `array` and `asnumpy` functions do the trick.

```{.python .input n=22}
import numpy as onp
a = x.asnumpy()
print(type(a))
b = np.array(a)
60 changes: 29 additions & 31 deletions chapter_crashcourse/probability.md
@@ -40,12 +40,10 @@ large numbers tell us that as the number of tosses grows this estimate will draw

To start, let's import the necessary packages:

```{.python .input}
```{.python .input n=13}
%matplotlib inline
from IPython import display
import numpy as onp
from mxnet import np, npx
import math
from matplotlib import pyplot as plt
import random
npx.set_np()
@@ -64,8 +62,8 @@ can be called in many ways, but we'll focus on the simplest.
To draw a single
sample, we simply pass in a vector of probabilities.

```{.python .input}
onp.random.multinomial(1, [1.0/6]*6)
```{.python .input n=14}
np.random.multinomial(1, [1.0/6]*6)
```

If you run the sampler a bunch of times, you'll find that you get out random
@@ -75,24 +73,24 @@ do this with a Python `for` loop, so `random.multinomial` supports drawing
multiple samples at once, returning an array of independent samples in any shape
we might desire.

```{.python .input}
onp.random.multinomial(10, [1.0/6]*6)
```{.python .input n=15}
np.random.multinomial(10, [1.0/6]*6)
```

Now that we know how to sample rolls of a die, we can simulate 1000 rolls. We
can then go through and count, after each of the 1000 rolls, how many times each
number was rolled.

```{.python .input}
counts = onp.random.multinomial(1000, [1.0/6]*6)
```{.python .input n=16}
counts = np.random.multinomial(1000, [1.0/6]*6).astype(np.float32)
counts / 1000
```

As you can see, the lowest estimated probability for any of the numbers is about $0.15$ and the highest is $0.188$. Because we generated the data from a fair die, we know that each number actually has a probability of $1/6$, roughly $0.167$, so these estimates are pretty good. We can also visualize how these probabilities converge over time towards reasonable estimates.

First we define a function that tells `matplotlib` to output SVG figures for sharper images, and another one to specify the figure size.

```{.python .input}
```{.python .input n=17}
# Save to the d2l package.
def use_svg_display():
"""Use the svg format to display plot in jupyter."""
@@ -107,13 +105,13 @@ def set_figsize(figsize=(3.5, 2.5)):

Now visualize the data.

```{.python .input}
estimates = onp.random.multinomial(100, [1.0/6]*6, size=100).cumsum(axis=0)
```{.python .input n=18}
estimates = np.random.multinomial(100, [1.0/6]*6, size=100).astype(np.float32).cumsum(axis=0)
estimates = estimates / estimates.sum(axis=1, keepdims=True)
set_figsize((6, 4))
for i in range(6):
plt.plot(estimates[:,i], label=("P(die=" + str(i) +")"))
plt.plot(estimates[:,i].asnumpy(), label=("P(die=" + str(i) +")"))
plt.axhline(y=0.16666, color='black', linestyle='dashed')
plt.legend();
```
@@ -205,23 +203,23 @@ That is, the second test allowed us to gain much higher confidence that not all

Often, when working with probabilistic models, we'll want not just to estimate distributions from data, but also to generate data by sampling from distributions. One of the simplest ways to sample random numbers is to invoke the `random` method from Python's `random` package.

```{.python .input}
```{.python .input n=7}
[random.random() for _ in range(10)]
```

### Uniform Distribution

These numbers likely *appear* random. Note that their range is between 0 and 1 and they are evenly distributed. Because they are generated by default from the uniform distribution, for any two sub-intervals of $[0,1]$ of equal size, numbers are no more likely to lie in one than in the other. In other words, the chance that any of these numbers falls into the interval $[0.2, 0.3)$ is the same as for the interval $[0.593264, 0.693264)$. In fact, these numbers are pseudo-random: the computer generates them by first producing a random integer and then dividing it by its maximum range. To sample random integers directly, we can run the following snippet, which generates integers in the range from 1 to 100.

```{.python .input}
```{.python .input n=8}
[random.randint(1, 100) for _ in range(10)]
```

How might we check that ``randint`` is really uniform? Intuitively, the best
strategy would be to run the sampler many times, say 1 million, and then count the
number of times it generates each value to ensure that the results are approximately uniform.

```{.python .input}
```{.python .input n=9}
counts = np.zeros(100)
fig, axes = plt.subplots(2, 2, sharex=True)
axes = axes.flatten()
@@ -230,7 +228,7 @@ axes = axes.flatten()
for i in range(1, 100001):
counts[random.randint(0, 99)] += 1
if i in [100, 1000, 10000, 100000]:
axes[int(math.log10(i))-2].bar(np.arange(1, 101).asnumpy(), counts)
axes[int(np.log10(i))-2].bar(np.arange(1, 101).asnumpy(), counts)
```

We can see from these figures that the initial number of counts looks *strikingly* uneven. If we sample fewer than 100 draws from a distribution over
@@ -242,17 +240,17 @@ situation where the probability of drawing a number $x$ is given by $p(x)$.

Drawing from a uniform distribution over a set of 100 outcomes is simple. But what if we have nonuniform probabilities? Let's start with a simple case: a biased coin that comes up heads with probability 0.35 and tails with probability 0.65. A simple way to sample from it is to generate a uniform random variable over $[0,1]$ and, if the number is less than $0.35$, output heads; otherwise, output tails. Let's try this out.

```{.python .input}
```{.python .input n=12}
# Number of samples
n = 1000000
y = onp.random.uniform(0, 1, n)
x = onp.arange(1, n+1)
y = np.random.uniform(0, 1, n)
x = np.arange(1, n+1)
# Count number of occurrences and divide by the number of total draws
p0 = onp.cumsum(y < 0.35) / x
p1 = onp.cumsum(y >= 0.35) / x
p0 = np.cumsum(y < 0.35) / x
p1 = np.cumsum(y >= 0.35) / x
plt.semilogx(x, p0)
plt.semilogx(x, p1)
plt.semilogx(x.asnumpy(), p0.asnumpy())
plt.semilogx(x.asnumpy(), p1.asnumpy())
plt.axhline(y=0.35, color='black', linestyle='dashed')
plt.axhline(y=0.65, color='black', linestyle='dashed');
```
@@ -271,7 +269,7 @@ The standard Normal distribution (aka the standard Gaussian distribution) is given by

```{.python .input}
x = np.arange(-10, 10, 0.01)
p = (1/math.sqrt(2 * math.pi)) * np.exp(-0.5 * x**2)
p = (1/np.sqrt(2 * np.pi)) * np.exp(-0.5 * x**2)
plt.plot(x.asnumpy(), p.asnumpy());
```

@@ -299,21 +297,21 @@ Now we are ready to state one of the most fundamental theorems in statistics, the

```{.python .input}
# Generate 10 random sequences of 10,000 uniformly distributed random variables
tmp = onp.random.uniform(size=(10000,10))
tmp = np.random.uniform(size=(10000,10))
x = 1.0 * (tmp > 0.3) + 1.0 * (tmp > 0.8)
mean = 1 * 0.5 + 2 * 0.2
variance = 1 * 0.5 + 4 * 0.2 - mean**2
print('mean {}, variance {}'.format(mean, variance))
# Cumulative sum and normalization
y = onp.arange(1,10001).reshape(10000,1)
z = onp.cumsum(x,axis=0) / y
y = np.arange(1,10001).reshape(10000,1)
z = np.cumsum(x,axis=0) / y
for i in range(10):
plt.semilogx(y, z[:,i])
plt.semilogx(y.asnumpy(), z[:,i].asnumpy())
plt.semilogx(y, (variance**0.5) * onp.power(y,-0.5) + mean,'r')
plt.semilogx(y,-(variance**0.5) * onp.power(y,-0.5) + mean,'r');
plt.semilogx(y.asnumpy(), ((variance**0.5) * np.power(y,-0.5) + mean).asnumpy(),'r')
plt.semilogx(y.asnumpy(), (-(variance**0.5) * np.power(y,-0.5) + mean).asnumpy(),'r');
```

This looks very similar to the initial example, at least in the limit of averages of large numbers of variables. This is confirmed by theory. Denote by
14 changes: 7 additions & 7 deletions chapter_deep-learning-computation/parameters.md
@@ -173,7 +173,7 @@ class MyInit(init.Initializer):
def _init_weight(self, name, data):
print('Init', name, data.shape)
data[:] = np.random.uniform(-10, 10, data.shape)
data *= data.abs() >= 5
data *= np.abs(data) >= 5
net.initialize(MyInit(), force_reinit=True)
net[0].weight.data()[0]
@@ -196,21 +196,21 @@ net = nn.Sequential()
# We need to give the shared layer a name such that we can reference its
# parameters
shared = nn.Dense(8, activation='relu')
net.add(nn.Dense(8, activation='relu'), # FIXME
#shared, # FIXME
#nn.Dense(8, activation='relu', params=shared.params),
net.add(nn.Dense(8, activation='relu'),
shared,
nn.Dense(8, activation='relu', params=shared.params),
nn.Dense(10))
net.initialize()
x = np.random.uniform(size=(2, 20))
net(x)
# Check whether the parameters are the same
#print(net[1].weight.data()[0] == net[2].weight.data()[0])
#net[1].weight.data()[0,0] = 100
print(net[1].weight.data()[0] == net[2].weight.data()[0])
net[1].weight.data()[0,0] = 100
# Make sure that they're actually the same object rather than just having the
# same value
#print(net[1].weight.data()[0] == net[2].weight.data()[0])
print(net[1].weight.data()[0] == net[2].weight.data()[0])
```

The above example shows that the parameters of the second and third layer are tied. They are identical rather than just being equal; that is, by changing one of the parameters, the other one changes, too. What happens to the gradients is quite ingenious: since the model parameters contain gradients, the gradients of the second hidden layer and the third hidden layer are accumulated in `shared.params.grad()` during backpropagation.
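
A minimal sketch, not part of this commit, of how one might check that sharing: assuming the `net` and `x` defined above, a single backward pass shows that the tied layers expose the same parameter object and therefore the same gradient buffer.

```{.python .input}
from mxnet import autograd

# One forward/backward pass through the net defined above
with autograd.record():
    y = net(x)
y.backward()
# The tied layers hold the very same Parameter object ...
print(net[1].weight is net[2].weight)
# ... so their gradient buffers are one and the same array as well
print(net[1].weight.grad()[0, :3], net[2].weight.grad()[0, :3])
```
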
12 changes: 6 additions & 6 deletions chapter_deep-learning-computation/read-write.md
@@ -12,31 +12,31 @@ from mxnet.gluon import nn
npx.set_np()
x = np.arange(4)
np.save('x-file', x)
npx.save('x-file', x)
```

Then, we read the data from the stored file back into memory.

```{.python .input}
x2 = np.load('x-file')
x2 = npx.load('x-file')
x2
```

We can also store a list of NDArrays and read them back into memory.

```{.python .input n=2}
y = np.zeros(4)
np.save('x-files', [x, y])
x2, y2 = np.load('x-files')
npx.save('x-files', [x, y])
x2, y2 = npx.load('x-files')
(x2, y2)
```

We can even write and read a dictionary that maps from a string to an NDArray. This is convenient, for instance when we want to read or write all the weights in a model.

```{.python .input n=4}
mydict = {'x': x, 'y': y}
np.save('mydict', mydict)
mydict2 = np.load('mydict')
npx.save('mydict', mydict)
mydict2 = npx.load('mydict')
mydict2
```

3 changes: 1 addition & 2 deletions chapter_linear-networks/linear-regression.md
@@ -153,7 +153,6 @@ In model training or prediction, we often use vector calculations and process mu
import d2l
import math
from mxnet import np
import numpy as onp
import time
n = 10000
@@ -190,7 +189,7 @@ class Timer(object):
def cumsum(self):
"""Return the accumuated times"""
return onp.array(self.times).cumsum().tolist()
return np.array(self.times).cumsum().tolist()
```

Now we can benchmark the workloads. One way to add vectors is to add them one coordinate at a time using a for loop.
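
A hedged sketch of that benchmark, not part of this diff: it assumes two length-`n` vectors of ones and that the `Timer` class above provides `start` and `stop` methods, with `stop` returning the elapsed time in seconds.

```{.python .input}
# The two all-ones vectors to be added; their definition is assumed here
a = np.ones(n)
b = np.ones(n)

# Add them one coordinate at a time with an explicit Python loop
timer = Timer()
timer.start()
c = np.zeros(n)
for i in range(n):
    c[i] = a[i] + b[i]
print('loop: %.5f sec' % timer.stop())

# Vectorized addition handles all n coordinates in a single call
timer.start()
d = a + b
print('vectorized: %.5f sec' % timer.stop())
```
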
1 change: 0 additions & 1 deletion chapter_multilayer-perceptrons/kaggle-house-price.md
@@ -84,7 +84,6 @@ we can install pandas without even leaving the notebook.
import d2l
from mxnet import autograd, gluon, init, np, npx
from mxnet.gluon import nn
import numpy as onp
import pandas as pd
npx.set_np()
```