-
Notifications
You must be signed in to change notification settings - Fork 2
/
Copy pathWriting a Neural Net from Scratch.txt
103 lines (85 loc) · 4.32 KB
/
Writing a Neural Net from Scratch.txt
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
Writing a Neural Net from Scratch - Joe Albahari
https://www.youtube.com/watch?v=z8DY5DndmxI
What is Neuron?
One or more inputs, usually at least two
All inputs have weights
Total input = bias + all inputs
[Input(1), ... Input(n)] and bias -> total input -> activation function -> output
Total input goes through activation function and return an output
In this case, all values are numbers
The simplest neuron is perceptron
[Total Input] -> [Comparator: total input > 0 ?] -> [Output]
Comparator ia the activation function here
If total input > 0, output is 1; else it's 0
Assume this comparator has 3 inputs and their weights are:
Input(1) has weight(1) = 0.5
Input(2) has weight(2) = -1.0
Input(3) has weight(3) = 1.5
Also, Bias = -1.5
For a set of inputs:
Input(1) = 8
Input(2) = 5
Input(3) = 2
Total input = (8 * 0.5) + (5 * -1.0) + (2 * 1.5) + (-1.5) = 0.5
Total input > 0; Therefore the output is 1
The state of of neuron has two aspects:
Long-term state made of bias and weights, determines neuron's behavior
Transient state:
When neuron is firing, there is output
And intermediate variables in determining output like total input
Application: NAND gate
Assuming OFF = 0 and ON = 1
Truth table
Input A Input B Output
0 0 1
1 0 1
0 1 1
1 1 0
Build NAND gate w/ Perceptron
Weight(1) = Weight(2) = -2
Bias = 3
For Input A = 1, Input B = 0, Total input = (1 * -2) + (0 * -2) + 3 = 1
Since total input > 0, the output is 1 (ON)
How to get perceptron to learn about correct values for weights and bias
Can these values be learned from examples?
1. Initialize random (relatively) small values for weights and bias
For example, Weight(1) = 0.2, Weight(2) = -0.3, Bias = -0.1
2. Pass in training data: Input(1) = 0, Input(2) = 1
Total input = (0 * 0.2) + (1 * -0.3) + -0.1 = -0.4
Since total input < 0, the output is 0 (OFF)
However, the desired output is 1 (ON)
3. If little changes are made to weights and bias, would it make output that is less wrong?
Unlikely - since comparative function (comparator) is either ON or OFF (like square wave function)
- it is not conducive to learning
Instead, comparative function can be turned into logistic sigmoid (more smooth continuous curve)
with uncertain area in between
This also has the advantage for building networks of neurons and connect them together.
- because the output is fractional number, it become more expressive
4. The formula for logistic sigmoid: y = 1 / (1 + e^-x)
This is rather expensive to calculate quickly - especially with large sample size and neuron network
5. A good substitution is Rectified Linear Unit (ReLU)
double Relu (double x) => x >= 0 ? x : 0;
6. In most cases, improvements can be made with Leaky Rectified Linear Unit (Leaky ReLU) that
by making result leaky when x < 0
double Relu (double x) => x >= 0 ? x : x / 100;
so we have a quick neuron that returns fractional numbers instead oof whole numbers
7. To interpret fractional numbers as ON or OFF:
For example, when output = 0.2, and 0.2 is closer to 0 (OFF) than 1 (ON)
Therefore result is interpreted as 0 (OFF)
Error = actual output - desired output = 0.2 - 0 = 0.2
(If desired output is 1, error would be -0.8)
8. Discourage individual high error over general low error
- individual high error may result in wrong outut while multiple low errors can produce correct output w/ overall higher error
For example:
Case 1: total error = .2 + .2 + .2 + .2 = .8
Case 2: total error = 0 + 0 + 0 + .6 = .6
while case 2 has lower total error, there is one wrong output,
but all outputs from case 1 are correct - and that's preferred
9. Instead of minimize total error, Loss is the veriable to minimize
Loss function can be defined in many ways, but for categorization - sum of (error)^2 is usually used
Loss = sum of (error)^2
= sum of (actual output - desired output)^2
To produce mathmatical symmetry, Loss = 1/2 * SUM(error)^2
Alternatively, instead of sum, average / mean can be used
Loss is also known as Cost
10. Loss is not usually calculated: