
Conditional Density Estimation notebook #40

Merged: 11 commits into develop, Aug 3, 2021

Conversation

@vdutor (Member) commented Aug 2, 2021

Notebook building and fitting a deep (two-layer) latent variable model using VI. No changes to the core of GPflux are required, but careful setting of the fitting options is necessary. For example, it is important to set `shuffle` to `False` and `batch_size` to the number of datapoints so that the latent variables are optimised correctly.
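The pairing issue described above can be sketched in plain NumPy (this is an illustrative sketch, not the GPflux API): when each datapoint owns its own directly parameterised latent variable, shuffling the inputs independently of the latents silently breaks the row-wise pairing, which is why full-batch, unshuffled training is needed here.

```python
import numpy as np

# Hypothetical setup: N datapoints, each row i of W is the latent
# variable belonging to row i of X (a direct parameterisation).
N = 5
X = np.arange(N, dtype=float).reshape(-1, 1)
rng = np.random.default_rng(0)
W = rng.normal(size=(N, 1))

# Full batch, no shuffling: row i of X stays paired with row i of W.
full_batch = np.concatenate([X, W], axis=1)
assert np.array_equal(full_batch[:, :1], X)

# Shuffling X without applying the same permutation to W breaks
# the association between datapoints and their latents.
perm = np.array([1, 0, 2, 3, 4])
shuffled = np.concatenate([X[perm], W], axis=1)
assert not np.array_equal(shuffled[:, :1], X)
```

With an amortised (recognition-network) encoder the latents are a function of the inputs, so they travel with each minibatch and this constraint disappears.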

@@ -43,6 +43,7 @@ def motorcycle_data():
df = pd.read_csv("./data/motor.csv", index_col=0)
X, Y = df["times"].values.reshape(-1, 1), df["accel"].values.reshape(-1, 1)
Y = (Y - Y.mean()) / Y.std()
X /= X.max()
Collaborator:

How is this relevant to the problem?

Member Author (@vdutor):

Rescales the inputs to [0, 1] instead of [0, 60], which makes initialising the length-scales to 1 more sensible.
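A small sketch of the effect of the added `X /= X.max()` line (synthetic inputs standing in for the motorcycle `times` column, which spans roughly [0, 60]):

```python
import numpy as np

# Stand-in for the raw `times` inputs, spanning [0, 60].
X = np.linspace(0.0, 60.0, 7).reshape(-1, 1)

# The rescaling line from the diff: inputs now lie in [0, 1].
X_scaled = X / X.max()
assert X_scaled.min() == 0.0 and X_scaled.max() == 1.0

# A kernel length-scale initialised to 1.0 now spans the whole input
# range, rather than a 1/60th sliver of it.
```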

@awav (Collaborator) commented Aug 2, 2021

Can you summarise what you added to fix the issue?

@st-- (Member) commented Aug 3, 2021

@vdutor was it intentional that you saved it in .ipynb format instead of as a jupytext .py file?

@vdutor (Member Author) commented Aug 3, 2021

Hi @st--, yes, this notebook takes a little too long to generate on the fly. I was thinking it makes sense to have some precompiled notebooks in our documentation as well, to explain more complicated models rather than keeping it to toy examples. What do you think?

@vdutor vdutor requested a review from awav August 3, 2021 09:28
@awav (Collaborator) left a review comment:

The result is still different compared to Neil's post, but I guess this is okay, as a perfect match is not possible because of differences in the modelling.

Overall, LGTM. (Check the comments before merging.)

"source": [
"## Deep Gaussian process with latent variables\n",
"\n",
"To tackle the problem we suggest a Deep Gaussian process with a latent variable in the first layer. The latent variable will be able to capture the \n",
@awav (Collaborator) commented Aug 3, 2021:

A bit of word shuffling: "We suggest a Deep Gaussian process with a latent variable in the first layer to improve the error bars on the given dataset. The latent variable makes it possible to model the heteroscedasticity, while the extra layer makes the model expressive enough to capture sharp transitions."

"source": [
"### Latent Variable Layer\n",
"\n",
"This layer concatenates the inputs with a latent variable. See Dutordoir, Salimbeni et al., Gaussian Process Conditional Density Estimation (2018) <cite data-cite=\"dutordoir2018cde\"/> for full details. We choose a one-dimensional input and a full parameterisation for the latent variables. This means that we do not need to train a recognition network, which is useful for fitting but is only feasible for small datasets, such as this one."
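The concatenation that this cell describes can be sketched in plain NumPy (an illustrative sketch, not the GPflux layer; all names here are made up): with a full, direct parameterisation each of the N datapoints owns its own latent mean and standard deviation, and the layer passes the concatenation [x_i, w_i] to the next layer.

```python
import numpy as np

# Hypothetical dimensions: N datapoints, 1-D inputs, 1-D latents.
rng = np.random.default_rng(42)
N, x_dim, w_dim = 4, 1, 1
X = rng.uniform(size=(N, x_dim))

# Directly parameterised latents: one trainable mean/std per datapoint
# (in practice the std is constrained to stay positive).
latent_means = np.zeros((N, w_dim))
latent_stds = np.full((N, w_dim), 0.1)

# Reparameterised sample of the latents, then concatenation with X.
W = latent_means + latent_stds * rng.standard_normal((N, w_dim))
XW = np.concatenate([X, W], axis=1)  # shape (N, x_dim + w_dim)
assert XW.shape == (N, x_dim + w_dim)
```

The trainable arrays grow with N, which is why this parameterisation only suits small datasets; a recognition network would instead map each x_i to its latent's mean and std.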
Collaborator:

Make a ref [...] instead of the long text?

"source": [
"### Fit\n",
"\n",
"We can now fit the model. Because of the `DirectlyParameterizedEncoder`, it is important to set the batch size to the number of datapoints and to turn off shuffling, so that each datapoint is paired with its associated latent variable. With an amortised encoder network this would not be necessary."
Collaborator:

"It is important to set the batch size to the number of datapoints and turn off shuffle; check out DirectlyParameterizedEncoder for details"? This needs a bit of explanation of what DirectlyParameterizedEncoder is.

@vdutor vdutor merged commit 88fe2b9 into develop Aug 3, 2021
@vdutor vdutor deleted the vincent/notebook/cde branch August 3, 2021 14:41