% Learning MNIST with Neural Networks with the backprop library
% Justin Le
The *backprop* library performs back-propagation over a *heterogeneous*
system of relationships. Back-propagation is done automatically (as
reverse-mode automatic differentiation), and you work with your values as if
you were writing normal functions with them, with the help of [lens][].
[ad]: http://hackage.haskell.org/package/ad
[lens]: http://hackage.haskell.org/package/lens
Repository source is [on github][repo], and docs are [on hackage][hackage].
[repo]: https://github.com/mstksg/backprop
[hackage]: http://hackage.haskell.org/package/backprop
If you're reading this as a literate Haskell file, you should know that a
[rendered PDF version is available on github][rendered]. If you are reading
this as a PDF file, you should know that a [literate Haskell version that
you can run][lhs] is also available on github!
[rendered]: https://github.com/mstksg/backprop/blob/master/renders/backprop-mnist.pdf
[lhs]: https://github.com/mstksg/backprop/blob/master/samples/backprop-mnist.lhs
The (extra) packages involved are:
* hmatrix
* lens
* mnist-idx
* mwc-random
* one-liner-instances
* split
> {-# LANGUAGE BangPatterns #-}
> {-# LANGUAGE DataKinds #-}
> {-# LANGUAGE DeriveGeneric #-}
> {-# LANGUAGE FlexibleContexts #-}
> {-# LANGUAGE GADTs #-}
> {-# LANGUAGE LambdaCase #-}
> {-# LANGUAGE ScopedTypeVariables #-}
> {-# LANGUAGE TemplateHaskell #-}
> {-# LANGUAGE TupleSections #-}
> {-# LANGUAGE TypeApplications #-}
> {-# LANGUAGE ViewPatterns #-}
> {-# OPTIONS_GHC -Wno-incomplete-patterns #-}
> {-# OPTIONS_GHC -Wno-orphans #-}
> {-# OPTIONS_GHC -Wno-unused-top-binds #-}
>
> import Control.DeepSeq
> import Control.Exception
> import Control.Monad
> import Control.Monad.IO.Class
> import Control.Monad.Trans.Maybe
> import Control.Monad.Trans.State
> import Data.Bitraversable
> import Data.Foldable
> import Data.IDX
> import Data.List.Split
> import Data.Time.Clock
> import Data.Traversable
> import Data.Tuple
> import GHC.Generics (Generic)
> import GHC.TypeLits
> import Lens.Micro
> import Lens.Micro.TH
> import Numeric.Backprop
> import Numeric.Backprop.Class
> import Numeric.LinearAlgebra.Static
> import Numeric.OneLiner
> import Text.Printf
> import qualified Data.Vector as V
> import qualified Data.Vector.Generic as VG
> import qualified Data.Vector.Unboxed as VU
> import qualified Numeric.LinearAlgebra as HM
> import qualified System.Random.MWC as MWC
> import qualified System.Random.MWC.Distributions as MWC
Introduction
============
In this walkthrough, we'll be building a classifier for the *[MNIST][]* data
set. This is meant to mirror the [Tensorflow Tutorial][tf-intro] for
beginners.
[tf-intro]: https://www.tensorflow.org/versions/r1.2/get_started/mnist/beginners
Essentially, we use a feed-forward artificial neural network with two hidden
layers -- a series of matrix multiplications, differentiable function
applications, and vector additions. We feed our input image to the ANN and
try to get a label from it. Training an ANN is a matter of finding the right
matrices to multiply by, and the right vectors to add.
To do that, we treat our network's error as a function `Network -> Error`.
If we can find the gradient of that error with respect to the network, we
can perform [gradient descent][], and slowly make our network better and
better.
[gradient descent]: https://en.wikipedia.org/wiki/Gradient_descent
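The same idea in one dimension, with a hand-written gradient: minimize
`f x = x * x` by repeatedly stepping against its derivative `2 * x`. (This
is a standalone sketch; the names here are illustrative, not from the
library.)

```haskell
-- One-dimensional gradient descent on f x = x * x, whose gradient is 2 * x.
-- `descend` takes a learning rate, a gradient function, and a starting
-- point, and produces the (infinite) list of iterates.
descend :: Double -> (Double -> Double) -> Double -> [Double]
descend rate grad = iterate (\x -> x - rate * grad x)

-- After fifty steps from x = 10, the iterate is close to the minimum at 0:
-- descend 0.1 (\x -> 2 * x) 10 !! 50   is roughly 1.4e-4
```

Training a network is the same thing, just with a `Network` in place of the
`Double`, and with the gradient computed for us.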
Finding the gradient is usually complicated, but *backprop* makes it simpler:
1. Write a function to compute the error from the network
2. That's it!
Hooray! Once you do that, the library finds the gradient function
*automatically*, without any further intervention!
Types
=====
For the most part, we're going to be using the great *[hmatrix][]* library
and its vector and matrix types. It offers a type `L m n` for $m \times n$
matrices, and a type `R n` for an $n$-vector.
[hmatrix]: http://hackage.haskell.org/package/hmatrix
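For a quick sketch of what these types look like in use (assuming the
`vector` and `matrix` constructors from `Numeric.LinearAlgebra.Static`,
which build statically-sized values from lists):

```haskell
import Numeric.LinearAlgebra.Static

v :: R 3                 -- a 3-vector
v = vector [1, 2, 3]

m :: L 2 3               -- a 2x3 matrix, filled row-major
m = matrix [1, 2, 3, 4, 5, 6]

mv :: R 2                -- sizes are checked by the compiler
mv = m #> v
```

Mismatched dimensions (say, `m #> (v :: R 4)`) are a type error, not a
runtime crash.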
First things first: let's define our neural networks as simple containers
of parameters (weight matrices and bias vectors).
First, a type for layers:
> data Layer i o =
> Layer { _lWeights :: !(L o i)
> , _lBiases :: !(R o)
> }
> deriving (Show, Generic)
>
> instance NFData (Layer i o)
> makeLenses ''Layer
And a type for a simple feed-forward network with two hidden layers:
> data Network i h1 h2 o =
> Net { _nLayer1 :: !(Layer i h1)
> , _nLayer2 :: !(Layer h1 h2)
> , _nLayer3 :: !(Layer h2 o)
> }
> deriving (Show, Generic)
>
> instance NFData (Network i h1 h2 o)
> makeLenses ''Network
These are pretty straightforward container types -- pretty much exactly the
types you'd make to represent these networks! Note that, following true
Haskell form, we separate logic from data. This should be all we need.
Instances
---------
Things are much simpler if we have `Num` and `Fractional` instances for
everything, so let's go ahead and define those now, as well. It's just a
little bit of boilerplate, made easier using *[one-liner-instances][]* to
auto-derive instances using GHC Generics.
[one-liner-instances]: http://hackage.haskell.org/package/one-liner-instances
> instance (KnownNat i, KnownNat o) => Num (Layer i o) where
> (+) = gPlus
> (-) = gMinus
> (*) = gTimes
> negate = gNegate
> abs = gAbs
> signum = gSignum
> fromInteger = gFromInteger
>
> instance ( KnownNat i
> , KnownNat h1
> , KnownNat h2
> , KnownNat o
> ) => Num (Network i h1 h2 o) where
> (+) = gPlus
> (-) = gMinus
> (*) = gTimes
> negate = gNegate
> abs = gAbs
> signum = gSignum
> fromInteger = gFromInteger
>
> instance (KnownNat i, KnownNat o) => Fractional (Layer i o) where
> (/) = gDivide
> recip = gRecip
> fromRational = gFromRational
>
> instance ( KnownNat i
> , KnownNat h1
> , KnownNat h2
> , KnownNat o
> ) => Fractional (Network i h1 h2 o) where
> (/) = gDivide
> recip = gRecip
> fromRational = gFromRational
`KnownNat` comes from *base*; it's the typeclass that *hmatrix* uses to
refer to the type-level numbers in its types, so it can go about its normal
hmatrix-y business.
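Concretely, a `KnownNat` constraint just means the type-level number can be
reflected down to a runtime value, via `natVal` from `GHC.TypeLits` (a
standalone sketch using only *base*):

```haskell
{-# LANGUAGE DataKinds #-}
{-# LANGUAGE TypeApplications #-}

import Data.Proxy (Proxy (..))
import GHC.TypeLits (natVal)

-- Reflect the type-level 784 down to an ordinary Integer
inputSize :: Integer
inputSize = natVal (Proxy @784)   -- 784
```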
Now we need instances of `Backprop` for our types in order to use them for
automatic differentiation. Luckily, these can be generated automatically
using GHC Generics:
> instance (KnownNat i, KnownNat o) => Backprop (Layer i o)
> instance (KnownNat i, KnownNat h1, KnownNat h2, KnownNat o) => Backprop (Network i h1 h2 o)
Ops
===
Now, *backprop* does require *primitive* differentiable operations on our
relevant types to be defined. *backprop* uses these primitive operations to
tie everything together. Ideally we'd import these from a library that
implements them, so end-users never have to define these primitives
themselves.
But in this case, I'm going to put the definitions here to show that there
isn't any magic going on. If you're curious, refer to [documentation for
`Op`][opdoc] for more details on how `Op` is implemented and how this works.
[opdoc]: http://hackage.haskell.org/package/backprop/docs/Numeric-Backprop-Op.html
First, a matrix-vector multiplication primitive, with an explicit gradient
function:
> infixr 8 #>!
> (#>!)
> :: (KnownNat m, KnownNat n, Reifies s W)
> => BVar s (L m n)
> -> BVar s (R n)
> -> BVar s (R m)
> (#>!) = liftOp2 . op2 $ \m v ->
> ( m #> v, \g -> (g `outer` v, tr m #> g) )
Dot products would be nice too.
> infixr 8 <.>!
> (<.>!)
> :: (KnownNat n, Reifies s W)
> => BVar s (R n)
> -> BVar s (R n)
> -> BVar s Double
> (<.>!) = liftOp2 . op2 $ \x y ->
> ( x <.> y, \g -> (konst g * y, x * konst g)
> )
Also a function to fill a vector with the same element:
> konst'
> :: (KnownNat n, Reifies s W)
> => BVar s Double
> -> BVar s (R n)
> konst' = liftOp1 . op1 $ \c -> (konst c, HM.sumElements . extract)
Finally, an operation to sum all of the items in the vector.
> sumElements'
> :: (KnownNat n, Reifies s W)
> => BVar s (R n)
> -> BVar s Double
> sumElements' = liftOp1 . op1 $ \x -> (HM.sumElements (extract x), konst)
Again, these are not intended to be written by end-users of *backprop*;
rather, they're meant to be provided by libraries as primitive operations
for users of those libraries.
Running our Network
===================
Now that we have our primitives in place, let's actually write a function
to run our network! And, once we do this, we automatically also have
functions to back-propagate our network!
Normally, to write this function, we'd write:
> runLayerNormal
> :: (KnownNat i, KnownNat o)
> => Layer i o
> -> R i
> -> R o
> runLayerNormal l x = (l ^. lWeights) #> x + (l ^. lBiases)
> {-# INLINE runLayerNormal #-}
This uses the `lWeights` and `lBiases` lenses to access the weights and
biases of our layer. We can translate it to *backprop* by operating on
`BVar`s instead of the values directly, using our backprop-aware `#>!`:
> runLayer
> :: (KnownNat i, KnownNat o, Reifies s W)
> => BVar s (Layer i o)
> -> BVar s (R i)
> -> BVar s (R o)
> runLayer l x = (l ^^. lWeights) #>! x + (l ^^. lBiases)
> {-# INLINE runLayer #-}
`^.` lets you access data within a value using a lens, and `^^.` lets you
access data within a `BVar` using a lens:
```haskell
(^.) :: a -> Lens' a b -> b
(^^.) :: BVar s a -> Lens' a b -> BVar s b
```
(There is also `^^?`, which can use a `Prism` or `Traversal` to extract a
target that might not exist, `^^..`, which uses a `Traversal` to extract all
targets, and `.~~`, which uses a `Lens` to update a value inside a `BVar`.)
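Their shapes (simplified, with the `Backprop` and `Reifies s W` constraints
omitted, as in the block above) mirror their lens counterparts:

```haskell
(^^?)  :: BVar s a -> Traversal' a b -> Maybe (BVar s b)
(^^..) :: BVar s a -> Traversal' a b -> [BVar s b]
(.~~)  :: Lens' a b -> BVar s b -> BVar s a -> BVar s a
```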
Now `runLayer` is a function on two inputs that can be backpropagated,
automatically! We can find its gradient given any input, and also run it to
get our expected output as well.
Before writing our final network runner, we need a function to compute the
"softmax" of our output vector. Writing it normally would look like:
> softMaxNormal :: KnownNat n => R n -> R n
> softMaxNormal x = konst (1 / HM.sumElements (extract expx)) * expx
> where
> expx = exp x
> {-# INLINE softMaxNormal #-}
But we can make the mechanical shift to the backpropagatable version:
> softMax :: (KnownNat n, Reifies s W) => BVar s (R n) -> BVar s (R n)
> softMax x = konst' (1 / sumElements' expx) * expx
> where
> expx = exp x
> {-# INLINE softMax #-}
We also need the [logistic function][], which is our activation function
between layer outputs. Because `BVar`s have a `Floating` instance, we can just
write it using typeclass functions.
[logistic function]: https://en.wikipedia.org/wiki/Logistic_function
> logistic :: Floating a => a -> a
> logistic x = 1 / (1 + exp (-x))
> {-# INLINE logistic #-}
With those in hand, let's compare how we would normally write a function to run
our network:
> runNetNormal
> :: (KnownNat i, KnownNat h1, KnownNat h2, KnownNat o)
> => Network i h1 h2 o
> -> R i
> -> R o
> runNetNormal n = softMaxNormal
> . runLayerNormal (n ^. nLayer3)
> . logistic
> . runLayerNormal (n ^. nLayer2)
> . logistic
> . runLayerNormal (n ^. nLayer1)
> {-# INLINE runNetNormal #-}
Basic function composition, neat. We use our lenses `nLayer1`, `nLayer2`, and
`nLayer3` to extract the first, second, and third layers from our network.
Writing it in a way that backprop can use is also very similar:
> runNetwork
> :: (KnownNat i, KnownNat h1, KnownNat h2, KnownNat o, Reifies s W)
> => BVar s (Network i h1 h2 o)
> -> R i
> -> BVar s (R o)
> runNetwork n = softMax
> . runLayer (n ^^. nLayer3)
> . logistic
> . runLayer (n ^^. nLayer2)
> . logistic
> . runLayer (n ^^. nLayer1)
> . constVar
> {-# INLINE runNetwork #-}
We use `constVar` on the input vector, because we don't care about its
gradient and so treat it as a constant.
And now here again we use `^^.` (instead of `^.`) to extract a value from our
`BVar` of a `Network`, using a lens.
Computing Errors
----------------
Now, training a neural network is about calculating its gradient with respect
to some error function. The library calculates the gradient for us -- we
just need to tell it how to compute the error function.
For classification problems, we usually use a [cross entropy][] error. Given
a target vector, how does our neural network's output differ from what is
expected? Lower numbers are better!
[cross entropy]: https://en.wikipedia.org/wiki/Cross_entropy
Again, let's look at a "normal" implementation, regular variables and no
backprop:
> crossEntropyNormal :: KnownNat n => R n -> R n -> Double
> crossEntropyNormal targ res = -(log res <.> targ)
> {-# INLINE crossEntropyNormal #-}
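Numerically, with a one-hot target this dot product just picks out the
negative log of the probability assigned to the correct class. A throwaway
list version, purely for illustration:

```haskell
-- Cross entropy of a predicted distribution `res` against a target
-- distribution `targ`: -(targ . log res), as a dot product over lists.
crossEntropyList :: [Double] -> [Double] -> Double
crossEntropyList targ res = negate (sum (zipWith (*) targ (map log res)))

-- With one-hot target [0,1,0] and prediction [0.2,0.7,0.1], this is
-- -log 0.7, about 0.357.  A confident correct prediction gives a
-- near-zero error; a confident wrong one, a large error.
```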
And we can see that the backpropagatable version is pretty similar. We use
`constVar targ` to introduce a `BVar` holding a constant value (one whose
gradient we don't care about):
> crossEntropy
> :: (KnownNat n, Reifies s W)
> => R n
> -> BVar s (R n)
> -> BVar s Double
> crossEntropy targ res = -(log res <.>! constVar targ)
> {-# INLINE crossEntropy #-}
Our final "error function", then, is:
> netErr
> :: (KnownNat i, KnownNat h1, KnownNat h2, KnownNat o, Reifies s W)
> => R i
> -> R o
> -> BVar s (Network i h1 h2 o)
> -> BVar s Double
> netErr x targ n = crossEntropy targ (runNetwork n x)
> {-# INLINE netErr #-}
The Magic
=========
The actual "magic" of the library happens with the functions that "run" the
backpropagatable functions we defined earlier:
```haskell
evalBP :: (forall s. Reifies s W => BVar s a -> BVar s b) -> a -> b
gradBP :: (forall s. Reifies s W => BVar s a -> BVar s b) -> a -> a
backprop :: (forall s. Reifies s W => BVar s a -> BVar s b) -> a -> (b, a)
```
`evalBP` "runs" the function like normal, `gradBP` computes the gradient of
the function, and `backprop` computes both the result and the gradient.
So, if we have a network `net0`, an input vector `x`, and a target vector `t`,
we could compute its error using:
```haskell
evalBP (netErr x targ) net0 :: Double
```
And we can calculate its *gradient* using:
```haskell
gradBP (netErr x targ) net0 :: Network i h1 h2 o
```
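Since `BVar s Double` has a `Num` instance, you can try the whole pipeline
on a plain scalar function, too (a sketch assuming the imports at the top of
this file are in scope; `scalarGrad` is an illustrative name):

```haskell
-- Differentiate x^2 + 3x; its derivative at x = 2 is 2*2 + 3 = 7.
scalarGrad :: Double
scalarGrad = gradBP (\x -> x * x + 3 * x) 2
-- scalarGrad == 7.0
```

The exact same `gradBP` call works on our `Network` type, because we gave it
`Num` and `Backprop` instances above.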
Pulling it all together
=======================
Let's write a simple function to step our network in the direction opposite of
the gradient to train our model:
> trainStep
> :: forall i h1 h2 o. (KnownNat i, KnownNat h1, KnownNat h2, KnownNat o)
> => Double -- ^ learning rate
> -> R i -- ^ input
> -> R o -- ^ target
> -> Network i h1 h2 o -- ^ initial network
> -> Network i h1 h2 o
> trainStep r !x !targ !n = n - realToFrac r * gradBP (netErr x targ) n
> {-# INLINE trainStep #-}
Here's a convenient wrapper for training over all of the observations in a
list:
> trainList
> :: (KnownNat i, KnownNat h1, KnownNat h2, KnownNat o)
> => Double -- ^ learning rate
> -> [(R i, R o)] -- ^ input and target pairs
> -> Network i h1 h2 o -- ^ initial network
> -> Network i h1 h2 o
> trainList r = flip $ foldl' (\n (x,y) -> trainStep r x y n)
> {-# INLINE trainList #-}
`testNet` will be a quick way to test our net by computing the percentage
of correct guesses (mostly using *hmatrix* operations, so don't mind the
details too much):
> testNet
> :: forall i h1 h2 o. (KnownNat i, KnownNat h1, KnownNat h2, KnownNat o)
> => [(R i, R o)]
> -> Network i h1 h2 o
> -> Double
> testNet xs n = sum (map (uncurry test) xs) / fromIntegral (length xs)
> where
> test :: R i -> R o -> Double -- test if the max index is correct
> test x (extract->t)
> | HM.maxIndex t == HM.maxIndex (extract r) = 1
> | otherwise = 0
> where
> r :: R o
> r = evalBP (`runNetwork` x) n
And now, a main loop!
If you are following along at home, download the [mnist data set files][MNIST]
and uncompress them into the folder `data`, and everything should work fine.
[MNIST]: http://yann.lecun.com/exdb/mnist/
> main :: IO ()
> main = MWC.withSystemRandom $ \g -> do
> Just train <- loadMNIST "data/train-images-idx3-ubyte" "data/train-labels-idx1-ubyte"
> Just test <- loadMNIST "data/t10k-images-idx3-ubyte" "data/t10k-labels-idx1-ubyte"
> putStrLn "Loaded data."
> net0 <- MWC.uniformR @(Network 784 300 100 10) (-0.5, 0.5) g
> flip evalStateT net0 . forM_ [1..] $ \e -> do
> train' <- liftIO . fmap V.toList $ MWC.uniformShuffle (V.fromList train) g
> liftIO $ printf "[Epoch %d]\n" (e :: Int)
>
> forM_ ([1..] `zip` chunksOf batch train') $ \(b, chnk) -> StateT $ \n0 -> do
> printf "(Batch %d)\n" (b :: Int)
>
> t0 <- getCurrentTime
> n' <- evaluate . force $ trainList rate chnk n0
> t1 <- getCurrentTime
> printf "Trained on %d points in %s.\n" batch (show (t1 `diffUTCTime` t0))
>
> let trainScore = testNet chnk n'
> testScore = testNet test n'
> printf "Training error: %.2f%%\n" ((1 - trainScore) * 100)
> printf "Validation error: %.2f%%\n" ((1 - testScore ) * 100)
>
> return ((), n')
> where
> rate = 0.02
> batch = 5000
Each iteration of the loop:
1. Shuffles the training set
2. Splits it into chunks of `batch` size
3. Uses `trainList` to train over the batch
4. Computes the score using `testNet` on both the training set and the
   test set
5. Prints out the results
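Steps 2 and 3 can be sketched in pure code (a hand-rolled `chunksOf'` stands
in for `chunksOf` from the *split* package, and `trainBatches` is
illustrative, not from the actual program):

```haskell
import Data.List (foldl')

-- Split a list into batches of size k (the last batch may be shorter).
chunksOf' :: Int -> [a] -> [[a]]
chunksOf' _ [] = []
chunksOf' k xs = take k xs : chunksOf' k (drop k xs)

-- Fold a training step over every sample of every batch, threading the
-- network through; any `net` type works, so `Double` would do for testing.
trainBatches :: (net -> a -> net) -> Int -> [a] -> net -> net
trainBatches step k samples n0 =
  foldl' (foldl' step) n0 (chunksOf' k samples)
```

The real loop does the same fold, but pauses between batches to print
timing and scores, which is why it lives in `StateT` over `IO`.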
And, that's really it!
Performance
-----------
Currently, benchmarks show that *running* the network has virtually zero
overhead (~ 4%) over writing the running function directly. The actual
gradient descent process (compute gradient, then descend) carries about 60%
overhead over writing the gradients manually, but it is unclear how much of
this is because of the library, and how much of it is just because of
automatic differentiation giving slightly less efficient matrix/vector
multiplication operations.
The [README][repo] has some more detailed benchmarks and statistics, if you
want more information.
Main takeaways
==============
Most of the heavy lifting and logic actually came from the *hmatrix*
library itself. We just created simple types to wrap up our bare matrices.
Basically, all that *backprop* did was give you an API to define *how to
run* a neural net --- how to *run* a net based on a `Network` and `R i` input
you were given. The goal of the library is to let you write down how to
run things in as natural a way as possible.
And then, after things are run, we can just get the gradient and roll from
there!
Because the heavy lifting is done by the data types themselves, we can
presumably plug in *any* type and any tensor/numerical backend, and reap
the benefits of those libraries' optimizations and parallelizations. *Any*
type can be backpropagated! :D
What now?
---------
Check out the docs for the [Numeric.Backprop][] module for a more detailed
picture of what's going on, or find more examples at the [github repo][repo]!
[Numeric.Backprop]: http://hackage.haskell.org/package/backprop/docs/Numeric-Backprop.html
Also, check out the follow-up writeup to this tutorial, which expands on
using the library with more advanced extensible neural network types, like
the ones described in [this blog post][blog]. Check out the [literate
Haskell here][neural-lhs], and the [rendered PDF here][neural-pdf].
[blog]: https://blog.jle.im/entries/series/+practical-dependent-types-in-haskell.html
[neural-lhs]: https://github.com/mstksg/backprop/blob/master/samples/extensible-neural.lhs
[neural-pdf]: https://github.com/mstksg/backprop/blob/master/renders/extensible-neural.pdf
Boring stuff
============
Here is a small wrapper function over the [mnist-idx][] library that loads
the contents of the IDX files into *hmatrix* vectors:
[mnist-idx]: http://hackage.haskell.org/package/mnist-idx
> loadMNIST
> :: FilePath
> -> FilePath
> -> IO (Maybe [(R 784, R 10)])
> loadMNIST fpI fpL = runMaybeT $ do
> i <- MaybeT $ decodeIDXFile fpI
> l <- MaybeT $ decodeIDXLabelsFile fpL
> d <- MaybeT . return $ labeledIntData l i
> r <- MaybeT . return $ for d (bitraverse mkImage mkLabel . swap)
> liftIO . evaluate $ force r
> where
> mkImage :: VU.Vector Int -> Maybe (R 784)
> mkImage = create . VG.convert . VG.map (\i -> fromIntegral i / 255)
> mkLabel :: Int -> Maybe (R 10)
> mkLabel n = create $ HM.build 10 (\i -> if round i == n then 1 else 0)
And here are instances for generating random
vectors/matrices/layers/networks, used for the initialization step.
> instance KnownNat n => MWC.Variate (R n) where
> uniform g = randomVector <$> MWC.uniform g <*> pure Uniform
> uniformR (l, h) g = (\x -> x * (h - l) + l) <$> MWC.uniform g
>
> instance (KnownNat m, KnownNat n) => MWC.Variate (L m n) where
> uniform g = uniformSample <$> MWC.uniform g <*> pure 0 <*> pure 1
> uniformR (l, h) g = (\x -> x * (h - l) + l) <$> MWC.uniform g
>
> instance (KnownNat i, KnownNat o) => MWC.Variate (Layer i o) where
> uniform g = Layer <$> MWC.uniform g <*> MWC.uniform g
> uniformR (l, h) g = (\x -> x * (h - l) + l) <$> MWC.uniform g
>
> instance ( KnownNat i
> , KnownNat h1
> , KnownNat h2
> , KnownNat o
> )
> => MWC.Variate (Network i h1 h2 o) where
> uniform g = Net <$> MWC.uniform g <*> MWC.uniform g <*> MWC.uniform g
> uniformR (l, h) g = (\x -> x * (h - l) + l) <$> MWC.uniform g
Also, some orphan instances of `Backprop` for vector and matrix types. These
are provided by the [hmatrix-backprop][] library normally:
> instance KnownNat n => Backprop (R n) where
> zero = zeroNum
> add = addNum
> one = oneNum
>
> instance (KnownNat n, KnownNat m) => Backprop (L m n) where
> zero = zeroNum
> add = addNum
> one = oneNum
[hmatrix-backprop]: http://hackage.haskell.org/package/hmatrix-backprop