Add save_trace and load_trace by ColCarroll · Pull Request #2975 · pymc-devs/pymc

ColCarroll · 2018-05-15T23:05:53Z

This provides functions to save and load traces, avoiding pickle. My main use would be saving traces while running a large notebook, or distributing the traces with code containing the models used to produce them.

Pros:

it should be compatible between Python versions (so long as these functions retain compatibility),
it avoids security concerns (all files are json or .npy)
appears to be smaller (though missing some stuff): the test model was 400 KB, compared to 900 KB pickled
answers a question that comes up reasonably often in issues about saving traces
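A minimal sketch of the storage idea in the pros above, assuming a layout of one `.npy` file per variable plus a JSON metadata file. The helper names `save_chain`/`load_chain` and the `meta.json` filename are hypothetical and are not necessarily the layout `pm.save_trace` uses. Loading with `allow_pickle=False` is what keeps this safe: NumPy only needs pickle for object arrays, and with that flag it refuses to load them. The sketch is Python 3 only (`exist_ok`).

```python
import json
import os
import tempfile

import numpy as np


def save_chain(directory, samples, metadata):
    """Hypothetical helper: one .npy file per variable plus a JSON metadata file."""
    os.makedirs(directory, exist_ok=True)
    for name, values in samples.items():
        # Plain numeric arrays: .npy stores dtype, shape and raw data, no pickle.
        np.save(os.path.join(directory, name + ".npy"), np.asarray(values))
    with open(os.path.join(directory, "meta.json"), "w") as f:
        json.dump(metadata, f)


def load_chain(directory):
    """Hypothetical helper: read everything back without invoking pickle."""
    with open(os.path.join(directory, "meta.json")) as f:
        metadata = json.load(f)
    samples = {}
    for filename in os.listdir(directory):
        if filename.endswith(".npy"):
            name = filename[: -len(".npy")]
            # allow_pickle=False makes np.load reject object arrays,
            # the only .npy payloads that are stored via pickle.
            samples[name] = np.load(os.path.join(directory, filename), allow_pickle=False)
    return samples, metadata


rng = np.random.RandomState(0)
with tempfile.TemporaryDirectory() as tmp:
    chain_dir = os.path.join(tmp, "0")
    draws = {
        "weights": rng.normal(size=(500, 2)),
        "sigma": np.abs(rng.normal(size=500)),
    }
    save_chain(chain_dir, draws, {"chain": 0, "draws": 500})
    loaded, meta = load_chain(chain_dir)
```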

Cons:

Requires model context to reload (in particular, the model is stored in the pickle, but not in these files)
Does not contain any part of the trace.report yet (though that could be added without breaking compatibility)
Requires maintenance

ColCarroll · 2018-05-16T00:23:34Z

Also, here is an example of it in use. This creates a local directory called .pymc.trace by default to save the trace to.

junpenglao · 2018-05-16T04:45:54Z

THANK YOU! This will solve so many pickling issues!

springcoil · 2018-05-18T05:38:11Z

Just so I understand, what are the pickle issues? Incompatibility between Python versions and security concerns?

springcoil · 2018-05-18T09:33:22Z

This looks good to me and ready to merge. Any objections, @ColCarroll?

springcoil

LGTM

twiecki · 2018-05-18T10:02:38Z

Great stuff!

twiecki · 2018-05-18T10:03:00Z

Oh, I forgot, we should add this to the release-notes, and also add some example docs somewhere.

springcoil · 2018-05-18T11:13:27Z

A separate PR for that would work. I'll have a stab at adding some docs on this this afternoon.


ColCarroll · 2018-05-18T11:44:23Z

I will add release notes. I also realized that there's an edge case I didn't cover that requires deleting the directory before writing to it: if you save a model with variables x, y, and then save a new model with z, w, loading will give you a model with x, y, z, w.
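The edge case can be reproduced with a small sketch. The helpers `save_naive`, `save_fresh` and `variables_on_disk` are hypothetical stand-ins, not PyMC's code; they assume one `.npy` file per variable and show why clearing the directory first fixes the x, y / z, w mix-up.

```python
import os
import shutil
import tempfile

import numpy as np


def write_arrays(directory, samples):
    os.makedirs(directory, exist_ok=True)
    for name, values in samples.items():
        np.save(os.path.join(directory, name + ".npy"), np.asarray(values))


def save_naive(directory, samples):
    # Writes into the directory as-is, so files from an earlier save survive.
    write_arrays(directory, samples)


def save_fresh(directory, samples):
    # Delete the whole directory first so only the current variables remain.
    if os.path.isdir(directory):
        shutil.rmtree(directory)
    write_arrays(directory, samples)


def variables_on_disk(directory):
    return sorted(f[: -len(".npy")] for f in os.listdir(directory) if f.endswith(".npy"))


with tempfile.TemporaryDirectory() as tmp:
    naive_dir = os.path.join(tmp, "naive.trace")
    save_naive(naive_dir, {"x": np.zeros(3), "y": np.ones(3)})
    save_naive(naive_dir, {"z": np.zeros(3), "w": np.ones(3)})
    stale = variables_on_disk(naive_dir)  # ['w', 'x', 'y', 'z']

    fresh_dir = os.path.join(tmp, "fresh.trace")
    save_fresh(fresh_dir, {"x": np.zeros(3), "y": np.ones(3)})
    save_fresh(fresh_dir, {"z": np.zeros(3), "w": np.ones(3)})
    current = variables_on_disk(fresh_dir)  # ['w', 'z']
```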

sudiptamazumda · 2018-08-25T15:45:30Z

Does anyone have sample code for predicting with this trace-loading functionality? I have a Gaussian model:

y = f(x) + e
f(x) ~ Gaussian(a, b)
e ~ N(0, sigma^2)

The trace saves the posteriors of a, b and sigma.

My objective is to predict f(x) for a new x in a new Python session without rerunning the model-training step.

ColCarroll · 2018-08-25T17:11:48Z

I might need more detail on what you're trying to do. Here's an example, though:

First, generate some random data:

```python
import os

import numpy as np
import pymc3 as pm
import theano
import theano.tensor as tt

dims = 2
N = 100

# Simulate a linear-regression data set with known weights
true_weights = np.random.normal(size=(dims,))

data = np.random.normal(size=(N, dims))
noise = np.random.normal(0, 0.5, size=N)

y = np.dot(data, true_weights) + noise
print(true_weights)
```

Now do a cached prediction. Running this multiple times will work, even when changing predict_data.

```python
# save_trace writes a directory, so cache_file is a directory path
cache_file = 'my_trace.trace'

# Wrap the predictors in a shared variable so they can be swapped out later
s_data = theano.shared(data)

with pm.Model() as model:
    weights = pm.Normal('weights', mu=0, sd=1, shape=dims)
    y_obs = pm.Normal('y_obs', mu=tt.dot(s_data, weights), sd=0.5, observed=y, shape=s_data.shape[0].eval())

if not os.path.exists(cache_file):
    # First run: sample, then cache the trace on disk
    with model:
        trace = pm.sample()

    pm.save_trace(trace, directory=cache_file)
else:
    # Later runs: reload the cached trace; the model must be supplied
    trace = pm.load_trace(cache_file, model=model)

predict_data = np.array([
    [0, 1],
    [1, 0],
    [1, 1],
    [2, 2],
])

# Swap the training predictors for the new points
s_data.set_value(predict_data)

with model:
    ppc = pm.sample_ppc(trace)

print(trace['weights'].mean(axis=0))  # pretty close to true weights
print(ppc['y_obs'].mean(axis=0))  # should be reasonable
```
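For a model this simple, the posterior predictive can also be computed by hand from the saved weight draws, which illustrates that only the samples (not a pickled model) are needed at prediction time. This is a NumPy-only sketch of the same idea as `sample_ppc` for this particular model, not its implementation: `weight_draws` is a simulated stand-in for `trace['weights']`, the true weights are made up, and the noise `sd=0.5` matches the likelihood above.

```python
import numpy as np

rng = np.random.RandomState(42)

# Stand-in for trace['weights'] after load_trace: (n_draws, dims) posterior draws,
# simulated here around made-up "true" weights purely for illustration.
true_weights = np.array([1.5, -0.7])
weight_draws = true_weights + 0.05 * rng.normal(size=(1000, 2))

predict_data = np.array([
    [0, 1],
    [1, 0],
    [1, 1],
    [2, 2],
])

# One linear predictor per posterior draw: shape (n_draws, n_points).
mu_draws = weight_draws @ predict_data.T
# Add observation noise with the same sd=0.5 as the model's likelihood.
y_draws = mu_draws + rng.normal(0, 0.5, size=mu_draws.shape)

predictive_mean = y_draws.mean(axis=0)
print(predictive_mean)
```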

sudiptamazumda · 2018-08-25T18:33:00Z

Thanks very much. Here is what I'm trying to do.

Defining priors:

```python
with pm.Model() as gp_fit:
    ρ = pm.Gamma('ρ', 1, 2)
    η = pm.Gamma('η', 1, 2)
    K = η * pm.gp.cov.ExpQuad(2, ρ)

with gp_fit:
    M = pm.gp.mean.Zero()
    σ = pm.HalfCauchy('σ', 0.5)
```

Initial pseudo-points:

```python
k_m_point = 20
Xu_init = pm.gp.util.kmeans_inducing_points(k_m_point, Xs)
```

Sparse Gaussian process optimization:

```python
with gp_fit:
    gp = pm.gp.MarginalSparse(cov_func=K, approx="VFE")
    # Xu=Xu_init
    Xu = pm.Flat("Xu", shape=(20, 2), testval=Xu_init)
    y_ = gp.marginal_likelihood("y_", X=Xs, Xu=Xu, y=y, noise=σ)
    mp = pm.find_MAP()
    trace = pm.sample(500, n_init=1000)
```

I'd like to save the fitted model at this stage.

Prediction:

```python
mu_dev, var_dev = gp.predict(X_new, point=mp, diag=True)
```

I would like to do this prediction without the preceding steps, by loading the model/trace. What exactly do I need to save? Do I also need to load the pseudo-points at inference time?
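One possible answer, sketched under assumptions: since `Xu` is declared as a model variable (`pm.Flat`), it should be part of the point returned by `pm.find_MAP()`, so saving that point dict saves the optimized pseudo-points along with the hyperparameters. The sketch assumes `mp` is a plain dict of NumPy arrays; the key names below are hypothetical ASCII stand-ins (the real keys come from the model, and log-transformed variables in PyMC3 typically carry a `_log__` suffix), and `save_point`/`load_point` are hypothetical helpers. In a new session the model would presumably still have to be rebuilt with the same code and the same training data `Xs`, `y`, because `gp.predict` conditions on them; the restored dict would then be passed as `point=`.

```python
import os
import tempfile

import numpy as np


def save_point(path, point):
    """Hypothetical helper: store a dict of name -> array as one .npz file."""
    np.savez(path, **{name: np.asarray(value) for name, value in point.items()})


def load_point(path):
    """Hypothetical helper: load the arrays back into a plain dict, no pickle."""
    with np.load(path, allow_pickle=False) as data:
        return {name: data[name] for name in data.files}


rng = np.random.RandomState(0)
# Hypothetical stand-in for `mp = pm.find_MAP()`; values are made up.
mp = {
    "rho_log__": np.array(0.3),
    "eta_log__": np.array(-0.1),
    "sigma_log__": np.array(-1.2),
    "Xu": rng.normal(size=(20, 2)),  # optimized inducing (pseudo-)points
}

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "gp_map_point.npz")
    save_point(path, mp)
    restored = load_point(path)
```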



ColCarroll added 2 commits May 15, 2018 18:54

Add save_trace and load_trace (7f25bda)
Make 2.7 compatible (51a21ff)

springcoil approved these changes May 18, 2018


twiecki merged commit 850a2a7 into pymc-devs:master May 18, 2018


twiecki merged 2 commits into pymc-devs:master from ColCarroll:save_ndarray


5 participants
