Improve plotting in notebook gpflux_with_keras_layers #41

Merged
merged 15 commits on Aug 24, 2021
5 changes: 3 additions & 2 deletions docs/notebooks/deep_cde.ipynb
@@ -234,7 +234,8 @@
"source": [
"## Deep Gaussian process with latent variables\n",
"\n",
"We suggest a Deep Gaussian process with a latent variable in the first layer to improve the error bars on the given dataset. The latent variable allows to model the heteroscedasticity, while an extra layer makes the model more expressive to catch sharp transitions.\n",
"To tackle the problem we suggest a Deep Gaussian process with a latent variable in the first layer. The latent variable will be able to capture the \n",
"heteroscedasticity, while the two layered deep GP is able to model the sharp transitions. \n",
Collaborator
Grammatically should be "two-layered" with hyphen.

"\n",
"Note that a GPflux Deep Gaussian process by itself (i.e. without the latent variable layer) is not able to capture the heteroscedasticity of this dataset. This is a consequence of the noise-less hidden layers and the doubly-stochastic variational inference training procedure, as forumated in <cite data-cite=\"salimbeni2017doubly\">. On the contrary, the original deep GP suggested by Damianou and Lawrence <cite data-cite=\"damianou2013deep\">, using a different variational approximation for training, can model this dataset without a latent variable, as shown in [this blogpost](https://inverseprobability.com/talks/notes/deep-gps.html). "
],
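As a hedged orientation for readers skimming the diff, here is a sketch of the kind of model this cell describes: a latent variable layer followed by two GP layers. The sizes, kernels, inducing-point counts and zero mean functions below are illustrative placeholders, not the notebook's actual choices.

import numpy as np
import tensorflow_probability as tfp
import gpflow
import gpflux

num_data, w_dim = 1000, 1  # illustrative sizes, not the notebook's values

# First layer: one latent variable w per datapoint, with a standard-normal prior
# and a directly parameterized (per-datapoint) variational posterior.
prior = tfp.distributions.MultivariateNormalDiag(loc=np.zeros(w_dim), scale_diag=np.ones(w_dim))
encoder = gpflux.encoders.DirectlyParameterizedNormalDiag(num_data, w_dim)
lv_layer = gpflux.layers.LatentVariableLayer(prior, encoder)

# Two GP layers: the first acts on the concatenated [x, w] input, the second on
# the first layer's output, giving the two-layer deep GP the text refers to.
def make_gp_layer(input_dim):
    kernel = gpflux.helpers.construct_basic_kernel(
        gpflow.kernels.SquaredExponential(), output_dim=1, share_hyperparams=True
    )
    inducing = gpflux.helpers.construct_basic_inducing_variables(
        num_inducing=20, input_dim=input_dim, output_dim=1, share_variables=True
    )
    return gpflux.layers.GPLayer(
        kernel, inducing, num_data=num_data, mean_function=gpflow.mean_functions.Zero()
    )

gp_layer_1 = make_gp_layer(input_dim=1 + w_dim)  # x concatenated with w
gp_layer_2 = make_gp_layer(input_dim=1)

# Together with a Gaussian likelihood layer, these layers would then be composed
# into a single model (e.g. gpflux.models.DeepGP) and trained with Keras.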
@@ -380,7 +381,7 @@
"source": [
"### Fit\n",
"\n",
"We can now fit the model. Because of the `DirectlyParameterizedEncoder`, which stores a sorted array of means and std. dev. for each point in the dataset, it is important to set the `batch_size` to the number of datapoints and set `shuffle` to `False`."
"We can now fit the model. Because of the `DirectlyParameterizedEncoder` it is important to set the batch size to the number of datapoints and turn off shuffle. This is so that we use the associated latent variable for each datapoint. If we would use an Amortized Encoder network this would not be necessary."
Collaborator
Minor but probably "Amortized Encoder" shouldn't have caps.

],
"metadata": {}
},
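A hedged sketch of the fit call the cell above describes, assuming the Keras training model produced by deep_gp.as_training_model(), which accepts a dict of inputs and targets. Here model, X, Y and num_data stand for objects built earlier in the notebook, and the optimizer settings and epoch count are placeholders:

import tensorflow as tf

model.compile(optimizer=tf.keras.optimizers.Adam(0.01))
model.fit(
    {"inputs": X, "targets": Y},
    batch_size=num_data,  # one batch = the whole dataset, in its original order
    shuffle=False,        # keep datapoint i paired with its own latent variable
    epochs=1000,
    verbose=0,
)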
9 changes: 6 additions & 3 deletions docs/notebooks/gpflux_with_keras_layers.py
@@ -73,7 +73,7 @@
"""

# %%
likelihood = gpflow.likelihoods.Gaussian(0.001)
likelihood = gpflow.likelihoods.Gaussian(0.1)

# So that Keras can track the likelihood variance, we need to provide the likelihood as part of a "dummy" layer:
likelihood_container = gpflux.layers.TrackableLayer()
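For context, a hedged sketch of where this container pattern usually goes next in GPflux; the exact wiring appears further down in the notebook:

# Attach the likelihood as an attribute so Keras tracks its variance; the
# container's forward pass leaves the data untouched.
likelihood_container.likelihood = likelihood

# The same likelihood object then drives the training objective.
loss = gpflux.losses.LikelihoodLoss(likelihood)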
@@ -111,13 +111,13 @@ def plot(model, X, Y, ax=None):
if ax is None:
fig, ax = plt.subplots()

x_margin = 1.0
x_margin = 2.0
N_test = 100
X_test = np.linspace(X.min() - x_margin, X.max() + x_margin, N_test).reshape(-1, 1)
f_distribution = model(X_test)

mean = f_distribution.mean().numpy().squeeze()
var = f_distribution.variance().numpy().squeeze()
var = f_distribution.variance().numpy().squeeze() + model.layers[-1].likelihood.variance.numpy()
X_test = X_test.squeeze()
lower = mean - 2 * np.sqrt(var)
upper = mean + 2 * np.sqrt(var)
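For clarity, the edit to var above adds the learned observation-noise variance on top of the latent-function variance, so the plotted band covers observations y rather than the noise-free f. Restated with the diff's own names (not new notebook code):

f_var = f_distribution.variance().numpy().squeeze()       # variance of the latent function f
noise_var = model.layers[-1].likelihood.variance.numpy()  # learned Gaussian noise variance
y_var = f_var + noise_var                                 # predictive variance for observations y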
@@ -130,3 +130,6 @@


plot(model, X, Y)

# %%
gpflow.utilities.print_summary(model, fmt="notebook")