<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
<html xmlns="http://www.w3.org/1999/xhtml">
<head>
<meta http-equiv="Content-Type" content="text/xhtml;charset=UTF-8"/>
<meta http-equiv="X-UA-Compatible" content="IE=9"/>
<meta name="generator" content="Doxygen 1.8.4"/>
<title>OpenANN: Applying Neural Networks</title>
<link href="tabs.css" rel="stylesheet" type="text/css"/>
<script type="text/javascript" src="jquery.js"></script>
<script type="text/javascript" src="dynsections.js"></script>
<link href="search/search.css" rel="stylesheet" type="text/css"/>
<script type="text/javascript" src="search/search.js"></script>
<script type="text/javascript">
$(document).ready(function() { searchBox.OnSelectItem(0); });
</script>
<link href="doxygen.css" rel="stylesheet" type="text/css" />
</head>
<body>
<div id="top"><!-- do not remove this div, it is closed by doxygen! -->
<div id="titlearea">
<table cellspacing="0" cellpadding="0">
<tbody>
<tr style="height: 56px;">
<td id="projectlogo"><img alt="Logo" src="openann-logo-small.png"/></td>
<td style="padding-left: 0.5em;">
<div id="projectname">OpenANN
 <span id="projectnumber">1.1.0</span>
</div>
<div id="projectbrief">An open source library for artificial neural networks.</div>
</td>
</tr>
</tbody>
</table>
</div>
<!-- end header part -->
<!-- Generated by Doxygen 1.8.4 -->
<script type="text/javascript">
var searchBox = new SearchBox("searchBox", "search",false,'Search');
</script>
<div id="navrow1" class="tabs">
<ul class="tablist">
<li><a href="index.html"><span>Main Page</span></a></li>
<li><a href="annotated.html"><span>Classes</span></a></li>
<li><a href="files.html"><span>Files</span></a></li>
<li>
<div id="MSearchBox" class="MSearchBoxInactive">
<span class="left">
<img id="MSearchSelect" src="search/mag_sel.png"
onmouseover="return searchBox.OnSearchSelectShow()"
onmouseout="return searchBox.OnSearchSelectHide()"
alt=""/>
<input type="text" id="MSearchField" value="Search" accesskey="S"
onfocus="searchBox.OnSearchFieldFocus(true)"
onblur="searchBox.OnSearchFieldFocus(false)"
onkeyup="searchBox.OnSearchFieldChange(event)"/>
</span><span class="right">
<a id="MSearchClose" href="javascript:searchBox.CloseResultsWindow()"><img id="MSearchCloseImg" border="0" src="search/close.png" alt=""/></a>
</span>
</div>
</li>
</ul>
</div>
<!-- window showing the filter options -->
<div id="MSearchSelectWindow"
onmouseover="return searchBox.OnSearchSelectShow()"
onmouseout="return searchBox.OnSearchSelectHide()"
onkeydown="return searchBox.OnSearchSelectKey(event)">
<a class="SelectItem" href="javascript:void(0)" onclick="searchBox.OnSelectItem(0)"><span class="SelectionMark"> </span>All</a><a class="SelectItem" href="javascript:void(0)" onclick="searchBox.OnSelectItem(1)"><span class="SelectionMark"> </span>Classes</a><a class="SelectItem" href="javascript:void(0)" onclick="searchBox.OnSelectItem(2)"><span class="SelectionMark"> </span>Namespaces</a><a class="SelectItem" href="javascript:void(0)" onclick="searchBox.OnSelectItem(3)"><span class="SelectionMark"> </span>Files</a><a class="SelectItem" href="javascript:void(0)" onclick="searchBox.OnSelectItem(4)"><span class="SelectionMark"> </span>Functions</a><a class="SelectItem" href="javascript:void(0)" onclick="searchBox.OnSelectItem(5)"><span class="SelectionMark"> </span>Variables</a><a class="SelectItem" href="javascript:void(0)" onclick="searchBox.OnSelectItem(6)"><span class="SelectionMark"> </span>Typedefs</a><a class="SelectItem" href="javascript:void(0)" onclick="searchBox.OnSelectItem(7)"><span class="SelectionMark"> </span>Enumerations</a><a class="SelectItem" href="javascript:void(0)" onclick="searchBox.OnSelectItem(8)"><span class="SelectionMark"> </span>Enumerator</a><a class="SelectItem" href="javascript:void(0)" onclick="searchBox.OnSelectItem(9)"><span class="SelectionMark"> </span>Friends</a><a class="SelectItem" href="javascript:void(0)" onclick="searchBox.OnSelectItem(10)"><span class="SelectionMark"> </span>Macros</a><a class="SelectItem" href="javascript:void(0)" onclick="searchBox.OnSelectItem(11)"><span class="SelectionMark"> </span>Pages</a></div>
<!-- iframe showing the search results (closed by default) -->
<div id="MSearchResultsWindow">
<iframe src="javascript:void(0)" frameborder="0"
name="MSearchResults" id="MSearchResults">
</iframe>
</div>
</div><!-- top -->
<div class="header">
<div class="headertitle">
<div class="title">Applying Neural Networks </div> </div>
</div><!--header-->
<div class="contents">
<div class="textblock"><p>This is a short summary of best practices for applying multilayer neural networks to arbitrary supervised learning problems and the capabilities of OpenANN.</p>
<h1><a class="anchor" id="NetworkArchitecture"></a>
Network Architecture</h1>
<p>The neural network should be as simple as possible to avoid overfitting. Start with a linear network without hidden layers and only add hidden layers or nodes if it improves the performance of the network. In principle, a neural network with one hidden layer, a nonlinear activation function in the hidden layer and a "sufficient" number of hidden units is able to approximate arbitrary functions with arbitrary precision. In practice, adding more layers can improve the performance of the neural network in terms of time. A few number of hidden nodes is usually not sufficient to fit the training set good enough. However, if the number of hidden nodes is to high, the generalization is not good enough, i.e. the neural net overfits the training data. Tuning the network architecture is not simple.</p>
<h1><a class="anchor" id="Layers"></a>
Types of Layers</h1>
<p>A neural network can contain many types of layers. In OpenANN, the multilayer neural network class is called <a class="el" href="classOpenANN_1_1Net.html" title="Feedforward multilayer neural network. ">Net</a>. To initialize a <a class="el" href="classOpenANN_1_1Net.html" title="Feedforward multilayer neural network. ">Net</a> you have to define its layers which is done by calling member functions of <a class="el" href="classOpenANN_1_1Net.html" title="Feedforward multilayer neural network. ">Net</a>. The most important layers are the input layer and the output layer. These are required to specify the input and output dimensions of the network. If there are no hidden layers we are only able to approximate linear functions. To represent more complex functions we can add various types of layers. Here is an incomplete list of available types of hidden layers.</p>
<ul>
<li><a class="el" href="classOpenANN_1_1FullyConnected.html" title="Fully connected layer. ">FullyConnected</a>: each neuron is connected to each neuron of the previous layer.</li>
<li><a class="el" href="classOpenANN_1_1RBM.html" title="Restricted Boltzmann Machine. ">RBM</a>: a restricted boltzmann machine that can be pretrained with unlabeled data.</li>
<li><a class="el" href="classOpenANN_1_1Compressed.html" title="Fully connected layer with compressed weights. ">Compressed</a>: fully connected layer. The I incoming weights of a neuron are represented by M (usually M < I) parameters.</li>
<li><a class="el" href="classOpenANN_1_1Extreme.html" title="Fully connected layer with fixed random weights. ">Extreme</a>: fully connected layer with fixed random weights.</li>
<li><a class="el" href="classOpenANN_1_1Convolutional.html" title="Applies a learnable filter on a 2D or 3D input. ">Convolutional</a>: consists of a number of 2-dimensional feature maps. Each feature map is connected to each feature map of the previous layer. The activations are computed by applying a parametrizable convolution, i. e. this kind of layer uses weight sharing and sparse connections to reduce the number of weights in comparison to fully connected layers.</li>
<li><a class="el" href="classOpenANN_1_1Subsampling.html" title="Performs average pooling on 2D input feature maps. ">Subsampling</a>: these will be used to quickly reduce the number of nodes after a convolution and obtain little translation invarianc. A non-overlapping group of nodes is summed up, multiplied with a weight and added to a learnable bias to obtain the activation of a neuron. This is sometimes called average pooling.</li>
<li><a class="el" href="classOpenANN_1_1MaxPooling.html" title="Performs max-pooling on 2D input feature maps. ">MaxPooling</a>: this is an alternative to subsampling layers and works usually better. Instead of the sum it computes the maximum of a group and has no learnable weights or biases.</li>
<li><a class="el" href="classOpenANN_1_1LocalResponseNormalization.html" title="Local response normalization. ">LocalResponseNormalization</a>: lateral inhibition of neurons at the same positions in adjacent feature maps.</li>
<li><a class="el" href="classOpenANN_1_1AlphaBetaFilter.html" title="A recurrent layer that can be used to smooth the input and estimate its derivative. ">AlphaBetaFilter</a>: this is a recurrent layer that estimates the position and velocity of the inputs from the noisy observation of the positions. Usually we need this layer for partially observable markov decision processes in reinforcement learning.</li>
<li><a class="el" href="classOpenANN_1_1Dropout.html" title="Dropout mask. ">Dropout</a> layer: a technique to increase the generalization of a neural network. Neurons are randomly dropped out during training so that they do not rely on each other.</li>
</ul>
<h1><a class="anchor" id="Functions"></a>
Activation Functions and Error Functions</h1>
<p>For regression problems, the error function that should be optimized is the mean sum of squared errors (MSE) and in the output layer the activation function should be linear (LINEAR). For multiclass classification problems, the error function usually should be cross entropy (CE) and the activation function softmax (SOFTMAX, internally SOFTMAX has the same value as LINEAR, the actual activation function depends on the error function, i.e. it is not possible to use softmax activation function in combination with MSE). Thus, the labels have to be represented through 1-of-c encoding, that is to represent C classes C outputs are required. Each output is binary and only one output should be 1, all other outputs have to be 0. The index of the 1 indicates the actual class c. The predictions of the network might not always be 0 or 1. Since the softmax activation function assures that all outputs sum up to 1, we can even interpret the outputs as class probabilities. To obtain the most likely predicted class, we compute the index of the maximum value. However, for two classes, MSE and TANH activation function sometimes work well enough, i.e. we only need one output and devide its range into two regions of (usually) equal size and each region corresponds to one of the two class.</p>
<p>In the hidden layers, nonlinear activation function are required. Available options are:</p>
<ul>
<li>LOGISTIC</li>
<li>TANH or TANH_SCALED</li>
<li>RECTIFIER</li>
</ul>
<p>We can distinguish saturating activation functions (sigmoid: LOGISTIC, TANH, TANH_SCALED) and non-saturating activation functions (RECTIFIER). The advantage of sigmoid activation function is that they generate more smooth functions. Their disadvantage is that they do not work very well for deep architectures because they make the error gradient of the first layers very small.</p>
<h1><a class="anchor" id="Optimization"></a>
Optimization Algorithm</h1>
<p>We can choose between stochastic gradient descent (<a class="el" href="classOpenANN_1_1MBSGD.html" title="Mini-batch stochastic gradient descent. ">MBSGD</a>), conjugate gradient (<a class="el" href="classOpenANN_1_1CG.html" title="Conjugate Gradient. ">CG</a>), limited storage Broyden-Fletcher-Goldfarb-Shanno (<a class="el" href="classOpenANN_1_1LBFGS.html" title="Limited storage Broyden-Fletcher-Goldfarb-Shanno. ">LBFGS</a>) and Levenberg-Marquardt (<a class="el" href="classOpenANN_1_1LMA.html" title="Levenberg-Marquardt Algorithm. ">LMA</a>). <a class="el" href="classOpenANN_1_1LMA.html" title="Levenberg-Marquardt Algorithm. ">LMA</a> is usally the best algorithm because it uses second-order information of the error function, i.e. it approximates the second derivative. But it has some drawbacks:</p>
<ul>
<li>It works only for MSE.</li>
<li>It has time complexity <img class="formulaInl" alt="$ O(L^3) $" src="form_44.png"/>, where L is the number of weights.</li>
<li>It has space complexity <img class="formulaInl" alt="$ O(LN) $" src="form_74.png"/>, where N is the number of examples.</li>
</ul>
<p>Thus, it is neither applicable for large nets, nor for large datasets. In this case, we often use <a class="el" href="classOpenANN_1_1MBSGD.html" title="Mini-batch stochastic gradient descent. ">MBSGD</a> because it has only <img class="formulaInl" alt="$ O(L) $" src="form_75.png"/> time and space complexity. It usually works very well for large redundant datasets for classification. It might also be useful to take a look at conjugate gradient for datasets that are not redundant, e.g. regression problems. For networks like auto-encoders, L-BFGS is usually the standard optimization algorithm.</p>
<h1><a class="anchor" id="References"></a>
References</h1>
<p>More tips can be found in the following documents. They are freely available.</p>
<p>[1] Sarle, W. S.: Neural Network FAQ, postings to the Usenet newsgroup comp.ai.neural-nets, 1997, <a href="ftp://ftp.sas.com/pub/neural/FAQ.html">ftp://ftp.sas.com/pub/neural/FAQ.html</a></p>
<p>[2] LeCun, Y.; Bottou, L.; Orr, G. B.; Müller, K.-R.: Efficient backprop, Neural Networks: Tricks of the Trade. Springer, pp. 9-50. </p>
</div></div><!-- contents -->
<!-- start footer part -->
<hr class="footer"/><address class="footer"><small>
Generated on Wed Jul 9 2014 08:57:52 for OpenANN by  <a href="http://www.doxygen.org/index.html">
<img class="footer" src="doxygen.png" alt="doxygen"/>
</a> 1.8.4
</small></address>
</body>
</html>