Commit afdc435

Merge pull request #7 from dvgodoy/boundaries
Boundaries
2 parents 528877a + 1f9f937 commit afdc435

8 files changed, with 7385 additions and 16654 deletions

README.md

+39 -9
@@ -16,14 +16,15 @@ It contains:
 - a class ***Replay***, which leverages the collected data to build several kinds of visualizations.

 The available visualizations are:
-- ***Feature Space***: plot of a 2-D grid representing the twisted and turned feature space, corresponding to the output of a hidden layer (only 2-unit hidden layers supported for now);
-- ***Probabilities***: histograms of the resulting class probabilities for the inputs, corresponding to the output of the final layer (only binary classification supported for now);
+- ***Feature Space***: plot representing the twisted and turned feature space, corresponding to the output of a hidden layer (only 2-unit hidden layers supported for now), including grid lines if the input is 2-dimensional;
+- ***Decision Boundary***: plot of a 2-D grid representing the original feature space, together with the decision boundary (only 2-dimensional inputs supported for now);
+- ***Probabilities***: two histograms of the resulting classification probabilities for the inputs, corresponding to the output of the final layer (only binary classification supported for now);
 - ***Loss and Metric***: line plot for the loss and a chosen metric, computed over all the inputs;
 - ***Losses***: histogram of the losses computed over all the inputs (only binary cross-entropy loss supported for now).

-Feature Space | Class Probability | Loss/Metric | Losses
-:-:|:-:|:-:|:-:
-![Feature Space](/images/feature_space.png) | ![Probability Histogram](/images/prob_histogram.png) | ![Loss and Metric](/images/loss_and_metric.png) | ![Loss Histogram](/images/loss_histogram.png)
+Feature Space | Decision Boundary | Class Probability | Loss/Metric | Losses
+:-:|:-:|:-:|:-:|:-:
+![Feature Space](/images/feature_space.png) | ![Decision Boundary](/images/decision_boundary.png) | ![Probability Histogram](/images/prob_histogram.png) | ![Loss and Metric](/images/loss_and_metric.png) | ![Loss Histogram](/images/loss_histogram.png)

 ### Google Colab

@@ -129,17 +130,43 @@ fs.animate().save('feature_space_animation.mp4', dpi=120, writer=writer)

 ## FAQ

-### Grid lines are missing!
+### 1. Grid lines are missing!

 Does your input have more than 2 dimensions? If so, this is expected, as grid lines are only plotted for 2-dimensional inputs.

 If your input is 2-dimensional and grid lines are missing nonetheless, please open an [issue](https://github.com/dvgodoy/deepreplay/issues).

-### My hidden layer has more than 2 units! How can I plot it anyway?
+### 2. My hidden layer has more than 2 units! How can I plot it anyway?

 Apart from toy datasets, it is likely the (last) hidden layer has more than 2 units. But ***DeepReplay*** only supports ***FeatureSpace*** plots based on 2-unit hidden layers. So, what can you do?

-Well, you can add an extra hidden layer with ***2 units*** and a ***LINEAR*** activation function and tell ****DeepReplay*** to use this layer for plotting the ***FeatureSpace***!
+There are two different ways of handling this: if your inputs are 2-dimensional, you can plot them directly, together with the decision boundary. Otherwise, you can (train and) plot a 2-dimensional latent space.
+
+#### 2.1 Using Raw Inputs
+
+Instead of using ***FeatureSpace***, you can use ***DecisionBoundary*** and plot the inputs in their original feature space, with the decision boundary as of any given epoch.
+
+In this case, there is no need to specify any layer, as it will use the raw inputs.
+
+```python
+## Input layer has 2 units
+## Hidden layer has 10 units
+model = Sequential()
+model.add(Dense(input_dim=2, units=10, kernel_initializer='he_normal', activation='tanh'))
+
+## Typical output layer for binary classification
+model.add(Dense(units=1, kernel_initializer='normal', activation='sigmoid', name='output'))
+
+...
+
+fs = replay.build_decision_boundary(ax_fs)
+```
+
+For an example, check the [Circles Dataset](https://github.com/dvgodoy/deepreplay/blob/master/notebooks/circles_dataset.ipynb).
+
+#### 2.2 Using a Latent Space
+
+You can add an extra hidden layer with ***2 units*** and a ***LINEAR*** activation function and tell ***DeepReplay*** to use this layer for plotting the ***FeatureSpace***!

 ```python
 ## Input layer has 57 units
@@ -160,7 +187,10 @@ fs = replay.build_feature_space(ax_fs, layer_name='hidden')

 By doing so, you will be including a transformation from a highly dimensional space to a 2-dimensional space, which is also going to be learned by the network.

-For examples, check either the [Circles Dataset](https://github.com/dvgodoy/deepreplay/blob/master/notebooks/circles_dataset.ipynb) or [UCI Spambase Dataset](https://github.com/dvgodoy/deepreplay/blob/master/notebooks/UCI_spambase_dataset.ipynb) notebooks.
+In fact, the model will be learning a 2-dimensional latent space, which will then feed the last layer. You can think of this as a logistic regression with 2 inputs, in this case, the latent factors.
+
+For examples, check either the [Moons Dataset](https://github.com/dvgodoy/deepreplay/blob/master/notebooks/moons_dataset.ipynb) or [UCI Spambase Dataset](https://github.com/dvgodoy/deepreplay/blob/master/notebooks/UCI_spambase_dataset.ipynb) notebooks.
+

 ## Comments, questions, suggestions, bugs
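
The README's remark that the 2-unit latent layer feeds "a logistic regression with 2 inputs" can be illustrated with a minimal sketch. It assumes a sigmoid output unit with made-up weights and bias; `predict_from_latent` and the constants are hypothetical and not part of DeepReplay or of any trained model.

```python
import math


def sigmoid(value):
    """Logistic function, the activation of the output layer."""
    return 1.0 / (1.0 + math.exp(-value))


def predict_from_latent(z1, z2, w1, w2, bias):
    """Probability of the positive class given two latent factors.

    This is a plain logistic regression on (z1, z2). The weights and
    bias are illustrative numbers, not values learned by any model.
    """
    return sigmoid(w1 * z1 + w2 * z2 + bias)


# Hypothetical output-layer parameters, chosen only for illustration
W1, W2, BIAS = 2.0, -1.0, 0.5

# A point on the line w1*z1 + w2*z2 + bias = 0 sits exactly on the boundary
on_boundary = predict_from_latent(0.0, 0.5, W1, W2, BIAS)
# Points on either side of that line fall into different classes
positive_side = predict_from_latent(1.0, 0.0, W1, W2, BIAS)
negative_side = predict_from_latent(-1.0, 1.0, W1, W2, BIAS)
```

Under these assumptions the decision boundary in the latent space is a straight line, which is why a 2-unit linear layer is enough to make the final classification step easy to visualize.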

deepreplay/plot.py

+22 -2
@@ -150,14 +150,15 @@ def compose_plots(objects, epoch, title=''):
         title += ' - '
     fig.suptitle('{}Epoch {}'.format(title, epoch), fontsize=14)
     fig.tight_layout()
-    fig.subplots_adjust(top=0.9)
+    fig.subplots_adjust(top=0.85)
     return fig

 class Basic(object):
     """Basic plot class, NOT to be instantiated directly.
     """
     def __init__(self, ax):
         self._title = ''
+        self._custom_title = ''
         self.n_epochs = 0

         self.ax = ax
@@ -166,7 +167,11 @@ def __init__(self, ax):

     @property
     def title(self):
-        return self._title if isinstance(self._title, tuple) else (self._title,)
+        title = self._title
+        if not isinstance(title, tuple):
+            title = (self._title,)
+        title = tuple([' '.join([self._custom_title, t]) for t in title])
+        return title

     @property
     def axes(self):
@@ -183,6 +188,20 @@ def _prepare_plot(self):
     def _update(i, object, epoch_start=0):
         pass

+    def set_title(self, title):
+        """Prepends a custom title to the plot.
+
+        Parameters
+        ----------
+        title: String
+            Custom title to prepend.
+
+        Returns
+        -------
+        None
+        """
+        self._custom_title = title
+
     def plot(self, epoch):
         """Plots data at a given epoch.

@@ -353,6 +372,7 @@ class ProbabilityHistogram(Basic):
     """
     def __init__(self, ax1, ax2):
         self._title = ('Negative Cases', 'Positive Cases')
+        self._custom_title = ''
         self.ax1 = ax1
         self.ax2 = ax2
         self.ax1.clear()
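
To see how the new `title` property and `set_title` interact, here is a small sketch that copies only the title logic from the `Basic` class in this diff. `TitledPlot` is a hypothetical stand-in written for illustration; it has no axes or plotting behavior.

```python
class TitledPlot(object):
    """Hypothetical stand-in that mirrors only the title logic of `Basic`."""

    def __init__(self, title):
        self._title = title
        self._custom_title = ''

    @property
    def title(self):
        title = self._title
        if not isinstance(title, tuple):
            title = (self._title,)
        # Same join as in the diff: custom title, a space, then each title
        return tuple([' '.join([self._custom_title, t]) for t in title])

    def set_title(self, title):
        self._custom_title = title


# A plot with a single title, as in most plot classes
single = TitledPlot('Feature Space')
single.set_title('tanh')

# A plot with two titles, as in ProbabilityHistogram
double = TitledPlot(('Negative Cases', 'Positive Cases'))
double.set_title('relu')

# A plot whose custom title was never set
untouched = TitledPlot('Feature Space')
```

One detail this makes visible: when no custom title is set, the join still inserts the separator, so each title comes back with a leading space. A caller that compares titles as strings may want to strip them first.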

deepreplay/replay.py

+98
@@ -97,11 +97,13 @@ def __init__(self, replay_filename, group_name, model_filename=''):
         self._loss_hist_data = None
         self._loss_and_metric_data = None
         self._prob_hist_data = None
+        self._decision_boundary_data = None
         # Attributes for the visualizations - Plot objects
         self._feature_space_plot = None
         self._loss_hist_plot = None
         self._loss_and_metric_plot = None
         self._prob_hist_plot = None
+        self._decision_boundary_plot = None

     def _retrieve_weights(self):
         # Generates ranges for the number of different weight arrays in each layer
@@ -124,6 +126,10 @@ def _make_function(self, inputs, layer):
     def _predict_proba(self, inputs, weights):
         return self._get_output([self.learning_phase, inputs] + weights)

+    @property
+    def decision_boundary(self):
+        return self._decision_boundary_plot, self._decision_boundary_data
+
     @property
     def feature_space(self):
         return self._feature_space_plot, self._feature_space_data
@@ -328,6 +334,98 @@ def build_probability_histogram(self, ax_negative, ax_positive, epoch_start=0, e
         self._prob_hist_plot = ProbabilityHistogram(ax_negative, ax_positive).load_data(self._prob_hist_data)
         return self._prob_hist_plot

+    def build_decision_boundary(self, ax, contour_points=1000, xlim=(-1, 1), ylim=(-1, 1), display_grid=True,
+                                epoch_start=0, epoch_end=-1):
+        """Builds a FeatureSpace object to be used for plotting and
+        animating the raw inputs and the decision boundary.
+        The underlying data, that is, grid lines, inputs and contour
+        lines, as well as the corresponding predictions for the
+        contour lines, can be later accessed as the second element of
+        the `decision_boundary` property.
+
+        Only inputs with 2 dimensions are supported!
+
+        Parameters
+        ----------
+        ax: AxesSubplot
+            Subplot of a Matplotlib figure.
+        contour_points: int, optional
+            Number of points in each axis of the contour.
+            Default is 1,000.
+        xlim: tuple of ints, optional
+            Boundaries for the X axis of the grid.
+        ylim: tuple of ints, optional
+            Boundaries for the Y axis of the grid.
+        display_grid: boolean, optional
+            If True, display grid lines (for 2-dimensional inputs).
+            Default is True.
+        epoch_start: int, optional
+            First epoch to consider.
+        epoch_end: int, optional
+            Last epoch to consider.
+
+        Returns
+        -------
+        decision_boundary_plot: FeatureSpace
+            An instance of a FeatureSpace object to make plots and
+            animations.
+        """
+        input_dims = self.model.input_shape[-1]
+        assert input_dims == 2, 'Only layers with 2-dimensional inputs are supported!'
+
+        if epoch_end == -1:
+            epoch_end = self.n_epochs
+        epoch_end = min(epoch_end, self.n_epochs)
+
+        X = self.inputs
+        y = self.targets
+
+        y_ind = y.squeeze().argsort()
+        X = X.squeeze()[y_ind].reshape(X.shape)
+        y = y.squeeze()[y_ind]
+
+        n_classes = len(np.unique(y))
+
+        # Builds a 2D grid and the corresponding contour coordinates
+        grid_lines = np.array([])
+        if display_grid:
+            grid_lines = build_2d_grid(xlim, ylim)
+
+        contour_lines = build_2d_grid(xlim, ylim, contour_points, contour_points)
+        get_predictions = self._make_function(self.model.inputs, self.model.layers[-1])
+
+        bent_lines = []
+        bent_inputs = []
+        bent_contour_lines = []
+        bent_preds = []
+        # For each epoch, uses the corresponding weights
+        for epoch in range(epoch_start, epoch_end + 1):
+            weights = self.weights[epoch]
+
+            bent_lines.append(grid_lines)
+            bent_inputs.append(X)
+            bent_contour_lines.append(contour_lines)
+
+            inputs = [TEST_MODE, contour_lines.reshape(-1, 2)] + weights
+            output_shape = (contour_lines.shape[:2]) + (-1,)
+            # Makes predictions for each point in the contour surface
+            bent_preds.append((get_predictions(inputs=inputs)[0].reshape(output_shape) > .5).astype(np.int))
+
+        # Makes lists into ndarrays and wrap them as namedtuples
+        bent_inputs = np.array(bent_inputs)
+        bent_lines = np.array(bent_lines)
+        bent_contour_lines = np.array(bent_contour_lines)
+        bent_preds = np.array(bent_preds)
+
+        line_data = FeatureSpaceLines(grid=grid_lines, input=X, contour=contour_lines)
+        bent_line_data = FeatureSpaceLines(grid=bent_lines, input=bent_inputs, contour=bent_contour_lines)
+        self._decision_boundary_data = FeatureSpaceData(line=line_data, bent_line=bent_line_data,
+                                                        prediction=bent_preds, target=y)
+
+        # Creates a FeatureSpace plot object and load data into it
+        self._decision_boundary_plot = FeatureSpace(ax, True).load_data(self._decision_boundary_data)
+        return self._decision_boundary_plot
+
     def build_feature_space(self, ax, layer_name, contour_points=1000, xlim=(-1, 1), ylim=(-1, 1), scale_fixed=True,
                             display_grid=True, epoch_start=0, epoch_end=-1):
         """Builds a FeatureSpace object to be used for plotting and
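
The core of `build_decision_boundary` is a flatten, predict, reshape and threshold step over a 2-D grid of contour points. The sketch below reproduces that step in isolation with NumPy. `build_grid` and `toy_predict_proba` are hypothetical stand-ins: the first is not DeepReplay's `build_2d_grid`, and the second replaces the Keras function with a fixed logistic model so the example runs on its own.

```python
import numpy as np


def build_grid(xlim, ylim, n_x, n_y):
    """Returns an array of shape (n_y, n_x, 2) holding (x, y) coordinates.

    Hypothetical helper; it is NOT deepreplay's `build_2d_grid`.
    """
    xs = np.linspace(xlim[0], xlim[1], n_x)
    ys = np.linspace(ylim[0], ylim[1], n_y)
    return np.stack(np.meshgrid(xs, ys), axis=-1)


def toy_predict_proba(points):
    """Stand-in for the network: the probability grows with x + y."""
    return 1.0 / (1.0 + np.exp(-(points[:, 0] + points[:, 1])))


contour = build_grid((-1, 1), (-1, 1), 5, 5)

# Flatten to (n_points, 2) for prediction, as reshape(-1, 2) does in the diff
flat_points = contour.reshape(-1, 2)
probabilities = toy_predict_proba(flat_points)

# Restore the grid layout and threshold at 0.5 to get class labels
output_shape = contour.shape[:2] + (-1,)
predicted_classes = (probabilities.reshape(output_shape) > .5).astype(int)
```

The sketch uses the builtin `int` as the dtype where the diff uses `np.int`, because recent NumPy releases no longer provide that alias. The strict `> .5` comparison also means points with a probability of exactly 0.5 are labeled as class 0.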

examples/circles_dataset.py

+4 -9
@@ -15,7 +15,7 @@

 X, y = make_circles(n_samples=2000, random_state=27, noise=0.03)

-sgd = SGD(lr=0.01)
+sgd = SGD(lr=0.02)

 he_initializer = he_normal(seed=42)
 normal_initializer = normal(seed=42)
@@ -30,10 +30,6 @@
 model.add(Dense(units=3,
                 kernel_initializer=he_initializer))
 model.add(Activation('relu'))
-model.add(Dense(units=2,
-                kernel_initializer=normal_initializer,
-                activation='linear',
-                name='hidden'))
 model.add(Dense(units=1,
                 kernel_initializer=normal_initializer,
                 activation='sigmoid',
@@ -43,7 +39,7 @@
               optimizer=sgd,
               metrics=['acc'])

-model.fit(X, y, epochs=300, batch_size=16, callbacks=[replaydata])
+model.fit(X, y, epochs=200, batch_size=16, callbacks=[replaydata])

 replay = Replay(replay_filename='circles_dataset.h5', group_name=group_name)

@@ -54,13 +50,12 @@
 ax_lm = plt.subplot2grid((2, 4), (0, 3))
 ax_lh = plt.subplot2grid((2, 4), (1, 3))

-fs = replay.build_feature_space(ax_fs, layer_name='hidden',
-                                display_grid=False, scale_fixed=False)
+fs = replay.build_decision_boundary(ax_fs, xlim=(-1.5, 1.5), ylim=(-1.5, 1.5))
 ph = replay.build_probability_histogram(ax_ph_neg, ax_ph_pos)
 lh = replay.build_loss_histogram(ax_lh)
 lm = replay.build_loss_and_metric(ax_lm, 'acc')

-sample_figure = compose_plots([fs, ph, lm, lh], 280)
+sample_figure = compose_plots([fs, ph, lm, lh], 150)
 sample_figure.savefig('circles.png', dpi=120, format='png')

 sample_anim = compose_animations([fs, ph, lm, lh])
@@ -0,0 +1,58 @@
+from keras.layers import Dense
+from keras.models import Sequential
+from keras.optimizers import SGD
+from keras.initializers import glorot_normal, normal
+
+from deepreplay.datasets.parabola import load_data
+from deepreplay.callbacks import ReplayData
+from deepreplay.replay import Replay
+from deepreplay.plot import compose_animations, compose_plots
+
+import matplotlib.pyplot as plt
+
+X, y = load_data()
+
+sgd = SGD(lr=0.05)
+
+for activation in ['sigmoid', 'tanh', 'relu']:
+    glorot_initializer = glorot_normal(seed=42)
+    normal_initializer = normal(seed=42)
+
+    replaydata = ReplayData(X, y, filename='comparison_activation_functions.h5', group_name=activation)
+
+    model = Sequential()
+    model.add(Dense(input_dim=2,
+                    units=2,
+                    kernel_initializer=glorot_initializer,
+                    activation=activation,
+                    name='hidden'))
+
+    model.add(Dense(units=1,
+                    kernel_initializer=normal_initializer,
+                    activation='sigmoid',
+                    name='output'))
+
+    model.compile(loss='binary_crossentropy',
+                  optimizer=sgd,
+                  metrics=['acc'])
+
+    model.fit(X, y, epochs=150, batch_size=16, callbacks=[replaydata])
+
+fig, axs = plt.subplots(1, 3, figsize=(12, 4))
+
+replays = []
+for activation in ['sigmoid', 'tanh', 'relu']:
+    replays.append(Replay(replay_filename='comparison_activation_functions.h5', group_name=activation))
+
+spaces = []
+for ax, replay, activation in zip(axs, replays, ['sigmoid', 'tanh', 'relu']):
+    space = replay.build_feature_space(ax, layer_name='hidden')
+    space.set_title(activation)
+    spaces.append(space)
+
+sample_figure = compose_plots(spaces, 80)
+sample_figure.savefig('comparison.png', dpi=120, format='png')
+
+#sample_anim = compose_animations(spaces)
+#sample_anim.save(filename='comparison.mp4', dpi=120, fps=5)
+

images/decision_boundary.png

14.2 KB