[numpy] Misc fixes (#364)
* Fix

* Fix

* Fix gd.md

* Remove FIXME

* Update mxnet pip install to latest
reminisce authored and astonzhang committed Sep 19, 2019
1 parent c6abe50 commit fc7a5ce
Showing 15 changed files with 82 additions and 86 deletions.
1 change: 1 addition & 0 deletions .gitignore
@@ -28,3 +28,4 @@ build/
/chapter_deep-learning-computation/mydict
/chapter_deep-learning-computation/x-file
/chapter_deep-learning-computation/x-files
.idea
2 changes: 1 addition & 1 deletion Jenkinsfile
@@ -11,7 +11,7 @@ stage("Build and Publish") {
rm -rf ~/miniconda3/envs/${ENV_NAME}
conda create -n ${ENV_NAME} pip -y
conda activate ${ENV_NAME}
pip install https://apache-mxnet.s3-us-west-2.amazonaws.com/dist/python/numpy/20190619/mxnet_cu101mkl-1.5.0b20190619-py2.py3-none-manylinux1_x86_64.whl
pip install https://apache-mxnet.s3-us-west-2.amazonaws.com/dist/python/numpy/latest/mxnet_cu101mkl-1.5.0-py2.py3-none-manylinux1_x86_64.whl
pip install git+https://github.com/d2l-ai/d2l-book
python setup.py develop
pip list
5 changes: 3 additions & 2 deletions chapter_computational-performance/hybridize.md
@@ -132,14 +132,15 @@ As is observed in the above results, after a HybridSequential instance calls the
We can save the symbolic program and model parameters to disk with the `export` function after the `net` model has finished computing the output for an input, as in the case of `net(x)` in the `benchmark` function.

```{.python .input}
#net.export('my_mlp') # FIXME
net.export('my_mlp')
```

The .json and .params files generated during this process are the symbolic program and the model parameters, respectively. They can be read by other front-end languages supported by MXNet, such as C++, R, Scala, and Perl. This allows us to deploy trained models to other devices and to easily use other front-end programming languages. At the same time, because symbolic programming is used during deployment, the computing performance is often superior to that of imperative programming.
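
As a quick, hedged sketch (not part of this diff) of that deployment step: the exported files can be loaded back into Gluon with `SymbolBlock.imports`. The file names below follow the `net.export('my_mlp')` call above, while the input name `'data'` and the input shape `(1, 512)` are assumptions borrowed from the surrounding chapter.

```{.python .input}
from mxnet import gluon, np, npx
npx.set_np()

# Load the serialized symbolic program and its parameters back into Gluon.
# 'data' is Gluon's default input name for exported models (assumed here).
deployed_net = gluon.SymbolBlock.imports(
    'my_mlp-symbol.json', ['data'], 'my_mlp-0000.params')
# Run the deployed model on a fresh input; the shape mirrors the input
# used earlier in this section and is likewise an assumption
deployed_net(np.random.normal(size=(1, 512)))
```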

In MXNet, a symbolic program refers to a program that makes use of the Symbol type. We know that, when the NDArray input `x` is provided to `net`, `net(x)` will directly calculate the model output and return a result based on `x`. For models that have called the `hybridize` function, we can also provide a Symbol-type input variable, and `net(x)` will then return a Symbol-type result.

```{.python .input}
# Delete this since symbol is not supposed to be exposed directly to users?
#x = sym.np.var('data') # FIXME
#net(x)
```
@@ -160,7 +161,7 @@ class HybridNet(nn.HybridBlock):
def hybrid_forward(self, F, x):
print('F: ', F)
print('x: ', x)
# x = F.np.relu(self.hidden(x)) # FIMXE
x = F.npx.relu(self.hidden(x))
print('hidden: ', x)
return self.output(x)
```
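
A hedged usage sketch, not part of the diff: instantiating `HybridNet` and calling it once before and once after `hybridize` shows `F` switch from the imperative ndarray module to the symbol module. The input shape `(1, 4)` is an assumption borrowed from the surrounding chapter.

```{.python .input}
net = HybridNet()
net.initialize()
x = np.random.normal(size=(1, 4))
net(x)           # F is the imperative module; the prints run on every call
net.hybridize()
net(x)           # F is now the symbol module; the prints run only on this first traced call
```
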
18 changes: 11 additions & 7 deletions chapter_crashcourse/naive-bayes.md
@@ -137,7 +137,7 @@ Now on to slightly more difficult things $P_{xy}$. Since we picked black and whi
```{.python .input n=66}
n_x = np.zeros((10, 28, 28))
for y in range(10):
n_x[y] = np.array(X.asnumpy()[Y==y].sum(axis=0))
n_x[y] = np.array(X.asnumpy()[Y.asnumpy()==y].sum(axis=0))
P_xy = (n_x+1) / (n_y+1).reshape((10,1,1))
show_images(P_xy, 2, 5);
@@ -155,7 +155,7 @@ np.expand_dims?
def bayes_pred(x):
x = np.expand_dims(x, axis=0) # (28, 28) -> (1, 28, 28)
p_xy = P_xy * x + (1-P_xy)*(1-x)
p_xy = p_xy.reshape((10,-1)).asnumpy().prod(axis=1) # p(x|y) # FIXME
p_xy = p_xy.reshape((10,-1)).prod(axis=1) # p(x|y)
return np.array(p_xy) * P_y
image, label = mnist_test[0]
@@ -197,25 +197,29 @@ py
Check if the prediction is correct.

```{.python .input}
py.argmax(axis=0) == label
# convert label which is a scalar tensor of int32 dtype
# to a Python scalar integer for comparison
py.argmax(axis=0) == int(label)
```

Now let's predict a few validation examples; we can see that the Bayes classifier works pretty well except for the 9th and 16th digits.

```{.python .input}
def predict(X):
return [str(bayes_pred_stable(x).argmax(axis=0)) for x in X]
return [bayes_pred_stable(x).argmax(axis=0).astype(np.int32) for x in X]
X, y = mnist_test[:18]
show_images(X, 2, 9, titles=predict(X));
preds = predict(X)
show_images(X, 2, 9, titles=[str(d) for d in preds]);
```

Finally, let's compute the overall accuracy of the classifier.

```{.python .input}
X, y = mnist_test[:] # FIXME, y is a numpy not mx.np
'Validation accuracy', (np.array(predict(X)) == np.array(y)).sum() / len(y)
X, y = mnist_test[:]
preds = np.array(predict(X), dtype=np.int32)
'Validation accuracy', float((preds == y).sum()) / len(y)
```

Modern deep networks achieve error rates of less than 0.01. While Naive Bayes classifiers used to be popular in the 80s and 90s, e.g. for spam filtering, their heyday is over. The poor performance is due to the incorrect statistical assumptions that we made in our model: we assumed that each and every pixel is generated *independently*, depending only on the label. This is clearly not how humans write digits, and this wrong assumption led to the downfall of our overly naive (Bayes) classifier. Time to start building deep networks.
2 changes: 0 additions & 2 deletions chapter_crashcourse/ndarray.md
@@ -202,8 +202,6 @@ id(x) == before
Converting MXNet NDArrays to and from NumPy is easy. The converted arrays do *not* share memory. This minor inconvenience is actually quite important: when you perform operations on the CPU or one of the GPUs, you do not want MXNet to have to wait while NumPy might be doing something else with the same chunk of memory. The `array` and `asnumpy` functions do the trick.

```{.python .input n=22}
import numpy as onp
a = x.asnumpy()
print(type(a))
b = np.array(a)
60 changes: 29 additions & 31 deletions chapter_crashcourse/probability.md
@@ -40,12 +40,10 @@ large numbers tell us that as the number of tosses grows this estimate will draw

To start, let's import the necessary packages:

```{.python .input}
```{.python .input n=13}
%matplotlib inline
from IPython import display
import numpy as onp
from mxnet import np, npx
import math
from matplotlib import pyplot as plt
import random
npx.set_np()
@@ -64,8 +62,8 @@ can be called in many ways, but we'll focus on the simplest.
To draw a single
sample, we simply pass in a vector of probabilities.

```{.python .input}
onp.random.multinomial(1, [1.0/6]*6)
```{.python .input n=14}
np.random.multinomial(1, [1.0/6]*6)
```

If you run the sampler a bunch of times, you'll find that you get out random
@@ -75,24 +73,24 @@ do this with a Python `for` loop, so `random.multinomial` supports drawing
multiple samples at once, returning an array of independent samples in any shape
we might desire.

```{.python .input}
onp.random.multinomial(10, [1.0/6]*6)
```{.python .input n=15}
np.random.multinomial(10, [1.0/6]*6)
```

Now that we know how to sample rolls of a die, we can simulate 1000 rolls. We
can then go through and count, after each of the 1000 rolls, how many times each
number was rolled.

```{.python .input}
counts = onp.random.multinomial(1000, [1.0/6]*6)
```{.python .input n=16}
counts = np.random.multinomial(1000, [1.0/6]*6).astype(np.float32)
counts / 1000
```

As you can see, the lowest estimated probability for any of the numbers is about $0.15$ and the highest is $0.188$. Because we generated the data from a fair die, we know that each number actually has a probability of $1/6$, roughly $0.167$, so these estimates are pretty good. We can also visualize how these probabilities converge over time towards reasonable estimates.

First we define a function that tells `matplotlib` to output SVG figures for sharper images, and another one to specify the figure size.

```{.python .input}
```{.python .input n=17}
# Save to the d2l package.
def use_svg_display():
"""Use the svg format to display plot in jupyter."""
@@ -107,13 +105,13 @@ def set_figsize(figsize=(3.5, 2.5)):

Now visualize the data.

```{.python .input}
estimates = onp.random.multinomial(100, [1.0/6]*6, size=100).cumsum(axis=0)
```{.python .input n=18}
estimates = np.random.multinomial(100, [1.0/6]*6, size=100).astype(np.float32).cumsum(axis=0)
estimates = estimates / estimates.sum(axis=1, keepdims=True)
set_figsize((6, 4))
for i in range(6):
plt.plot(estimates[:,i], label=("P(die=" + str(i) +")"))
plt.plot(estimates[:,i].asnumpy(), label=("P(die=" + str(i) +")"))
plt.axhline(y=0.16666, color='black', linestyle='dashed')
plt.legend();
```
@@ -205,23 +203,23 @@ That is, the second test allowed us to gain much higher confidence that not all

Often, when working with probabilistic models, we'll want not just to estimate distributions from data, but also to generate data by sampling from distributions. One of the simplest ways to sample random numbers is to invoke the `random` method from Python's `random` package.

```{.python .input}
```{.python .input n=7}
[random.random() for _ in range(10)]
```

### Uniform Distribution

These numbers likely *appear* random. Note that their range is between 0 and 1 and they are evenly distributed. Because they are generated by default from the uniform distribution, for any two sub-intervals of $[0,1]$ of equal size, numbers are no more likely to lie in one than in the other. In other words, the chance that any of these numbers falls into the interval $[0.2, 0.3)$ is the same as for the interval $[0.593264, 0.693264)$. In fact, these numbers are pseudo-random: the computer generates them by first producing a random integer and then dividing it by its maximum range. To sample random integers directly, we can run the following snippet, which generates integers in the range from 1 to 100.

```{.python .input}
```{.python .input n=8}
[random.randint(1, 100) for _ in range(10)]
```

How might we check that ``randint`` is really uniform? Intuitively, the best
strategy would be to run the sampler many times, say 1 million, and then count the
number of times it generates each value to ensure that the results are approximately uniform.

```{.python .input}
```{.python .input n=9}
counts = np.zeros(100)
fig, axes = plt.subplots(2, 2, sharex=True)
axes = axes.flatten()
@@ -230,7 +228,7 @@ axes = axes.flatten()
for i in range(1, 100001):
counts[random.randint(0, 99)] += 1
if i in [100, 1000, 10000, 100000]:
axes[int(math.log10(i))-2].bar(np.arange(1, 101).asnumpy(), counts)
axes[int(np.log10(i))-2].bar(np.arange(1, 101).asnumpy(), counts)
```

We can see from these figures that the initial number of counts looks *strikingly* uneven. If we sample fewer than 100 draws from a distribution over
@@ -242,17 +240,17 @@ situation where the probability of drawing a number $x$ is given by $p(x)$.

Drawing from a uniform distribution over a set of 100 outcomes is simple. But what if we have nonuniform probabilities? Let's start with a simple case: a biased coin that comes up heads with probability 0.35 and tails with probability 0.65. A simple way to sample from it is to generate a uniform random variable over $[0,1]$ and, if the number is less than $0.35$, output heads; otherwise, output tails. Let's try this out.

```{.python .input}
```{.python .input n=12}
# Number of samples
n = 1000000
y = onp.random.uniform(0, 1, n)
x = onp.arange(1, n+1)
y = np.random.uniform(0, 1, n)
x = np.arange(1, n+1)
# Count number of occurrences and divide by the number of total draws
p0 = onp.cumsum(y < 0.35) / x
p1 = onp.cumsum(y >= 0.35) / x
p0 = np.cumsum(y < 0.35) / x
p1 = np.cumsum(y >= 0.35) / x
plt.semilogx(x, p0)
plt.semilogx(x, p1)
plt.semilogx(x.asnumpy(), p0.asnumpy())
plt.semilogx(x.asnumpy(), p1.asnumpy())
plt.axhline(y=0.35, color='black', linestyle='dashed')
plt.axhline(y=0.65, color='black', linestyle='dashed');
```
@@ -271,7 +269,7 @@ The standard Normal distribution (aka the standard Gaussian distribution) is given by

```{.python .input}
x = np.arange(-10, 10, 0.01)
p = (1/math.sqrt(2 * math.pi)) * np.exp(-0.5 * x**2)
p = (1/np.sqrt(2 * np.pi)) * np.exp(-0.5 * x**2)
plt.plot(x.asnumpy(), p.asnumpy());
```

@@ -299,21 +297,21 @@ Now we are ready to state one of the most fundamental theorems in statistics, the

```{.python .input}
# Generate 10 random sequences of 10,000 uniformly distributed random variables
tmp = onp.random.uniform(size=(10000,10))
tmp = np.random.uniform(size=(10000,10))
x = 1.0 * (tmp > 0.3) + 1.0 * (tmp > 0.8)
mean = 1 * 0.5 + 2 * 0.2
variance = 1 * 0.5 + 4 * 0.2 - mean**2
print('mean {}, variance {}'.format(mean, variance))
# Cumulative sum and normalization
y = onp.arange(1,10001).reshape(10000,1)
z = onp.cumsum(x,axis=0) / y
y = np.arange(1,10001).reshape(10000,1)
z = np.cumsum(x,axis=0) / y
for i in range(10):
plt.semilogx(y, z[:,i])
plt.semilogx(y.asnumpy(), z[:,i].asnumpy())
plt.semilogx(y, (variance**0.5) * onp.power(y,-0.5) + mean,'r')
plt.semilogx(y,-(variance**0.5) * onp.power(y,-0.5) + mean,'r');
plt.semilogx(y.asnumpy(), ((variance**0.5) * np.power(y,-0.5) + mean).asnumpy(),'r')
plt.semilogx(y.asnumpy(), (-(variance**0.5) * np.power(y,-0.5) + mean).asnumpy(),'r');
```

This looks very similar to the initial example, at least in the limit of averages of large numbers of variables. This is confirmed by theory. Denote by
14 changes: 7 additions & 7 deletions chapter_deep-learning-computation/parameters.md
@@ -173,7 +173,7 @@ class MyInit(init.Initializer):
def _init_weight(self, name, data):
print('Init', name, data.shape)
data[:] = np.random.uniform(-10, 10, data.shape)
data *= data.abs() >= 5
data *= np.abs(data) >= 5
net.initialize(MyInit(), force_reinit=True)
net[0].weight.data()[0]
@@ -196,21 +196,21 @@ net = nn.Sequential()
# We need to give the shared layer a name such that we can reference its
# parameters
shared = nn.Dense(8, activation='relu')
net.add(nn.Dense(8, activation='relu'), # FIXME
#shared, # FIXME
#nn.Dense(8, activation='relu', params=shared.params),
net.add(nn.Dense(8, activation='relu'),
shared,
nn.Dense(8, activation='relu', params=shared.params),
nn.Dense(10))
net.initialize()
x = np.random.uniform(size=(2, 20))
net(x)
# Check whether the parameters are the same
#print(net[1].weight.data()[0] == net[2].weight.data()[0])
#net[1].weight.data()[0,0] = 100
print(net[1].weight.data()[0] == net[2].weight.data()[0])
net[1].weight.data()[0,0] = 100
# Make sure that they're actually the same object rather than just having the
# same value
#print(net[1].weight.data()[0] == net[2].weight.data()[0])
print(net[1].weight.data()[0] == net[2].weight.data()[0])
```

The above example shows that the parameters of the second and third layer are tied. They are identical rather than just being equal; that is, by changing one of the parameters, the other one changes, too. What happens to the gradients is quite ingenious: since the model parameters contain gradients, the gradients of the second hidden layer and the third hidden layer are accumulated in `shared.params.grad()` during backpropagation.
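
A minimal sketch, not part of this commit, of how one might check that sharing: assuming the `net` and `x` defined above, a single backward pass shows that the tied layers expose the same parameter object and therefore the same gradient buffer.

```{.python .input}
from mxnet import autograd

# One forward/backward pass through the net defined above
with autograd.record():
    y = net(x)
y.backward()
# The tied layers hold the very same Parameter object ...
print(net[1].weight is net[2].weight)
# ... so their gradient buffers are one and the same array as well
print(net[1].weight.grad()[0, :3], net[2].weight.grad()[0, :3])
```
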
12 changes: 6 additions & 6 deletions chapter_deep-learning-computation/read-write.md
@@ -12,31 +12,31 @@ from mxnet.gluon import nn
npx.set_np()
x = np.arange(4)
np.save('x-file', x)
npx.save('x-file', x)
```

Then, we read the data from the stored file back into memory.

```{.python .input}
x2 = np.load('x-file')
x2 = npx.load('x-file')
x2
```

We can also store a list of NDArrays and read them back into memory.

```{.python .input n=2}
y = np.zeros(4)
np.save('x-files', [x, y])
x2, y2 = np.load('x-files')
npx.save('x-files', [x, y])
x2, y2 = npx.load('x-files')
(x2, y2)
```

We can even write and read a dictionary that maps from a string to an NDArray. This is convenient, for instance when we want to read or write all the weights in a model.

```{.python .input n=4}
mydict = {'x': x, 'y': y}
np.save('mydict', mydict)
mydict2 = np.load('mydict')
npx.save('mydict', mydict)
mydict2 = npx.load('mydict')
mydict2
```

3 changes: 1 addition & 2 deletions chapter_linear-networks/linear-regression.md
@@ -153,7 +153,6 @@ In model training or prediction, we often use vector calculations and process mu
import d2l
import math
from mxnet import np
import numpy as onp
import time
n = 10000
@@ -190,7 +189,7 @@ class Timer(object):
def cumsum(self):
"""Return the accumuated times"""
return onp.array(self.times).cumsum().tolist()
return np.array(self.times).cumsum().tolist()
```

Now we can benchmark the workloads. One way to add vectors is to add them one coordinate at a time using a for loop.
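
A hedged sketch of that benchmark, not part of this diff: it assumes two length-`n` vectors of ones and that the `Timer` class above provides `start` and `stop` methods, with `stop` returning the elapsed time in seconds.

```{.python .input}
# The two all-ones vectors to be added; their definition is assumed here
a = np.ones(n)
b = np.ones(n)

# Add them one coordinate at a time with an explicit Python loop
timer = Timer()
timer.start()
c = np.zeros(n)
for i in range(n):
    c[i] = a[i] + b[i]
print('loop: %.5f sec' % timer.stop())

# Vectorized addition handles all n coordinates in a single call
timer.start()
d = a + b
print('vectorized: %.5f sec' % timer.stop())
```
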
1 change: 0 additions & 1 deletion chapter_multilayer-perceptrons/kaggle-house-price.md
@@ -84,7 +84,6 @@ we can install pandas without even leaving the notebook.
import d2l
from mxnet import autograd, gluon, init, np, npx
from mxnet.gluon import nn
import numpy as onp
import pandas as pd
npx.set_np()
```