From 8f458d1c0db49005caf1ef1eeabde529db679801 Mon Sep 17 00:00:00 2001 From: Adrian Seyboldt Date: Thu, 6 Jul 2017 15:36:20 +0200 Subject: [PATCH 1/3] Add draft of theano introduction --- docs/source/theano.rst | 157 +++++++++++++++++++++++++++++++++++++++++ 1 file changed, 157 insertions(+) create mode 100644 docs/source/theano.rst diff --git a/docs/source/theano.rst b/docs/source/theano.rst new file mode 100644 index 0000000000..a05e9e8bb5 --- /dev/null +++ b/docs/source/theano.rst @@ -0,0 +1,157 @@ +# pymc3 and theano + +## What is theano + +Theano is a package that allows us to define functions involving array +operations and linear algebra. When we define a pymc3 model, we implicitly +build up a theano function from the space of our parameters to +their posterior probability density up to a constant factor. We then use +symbolic manipulations of this function to also get access to its gradient. + +For a thorough introduction into theano see the thenao docs (link), +but for the most part you don't need many details about it as long +as you are not trying to define new distributions or other extensions +of pymc3. But let's look at a simple example to give you a rough +idea about how it works. Say, we'd like to define the (completely +arbitrarily chosen) function +$$ + f\colon \mathbb{R} \times \mathbb{R}^n \mathbb{N}^n \to \mathbb{R}\\ + (a, x, y) \mapsto \sum_{i=0}{n} \exp(ax_i^3 + y_i^2). +$$ + +First, we need to define symbolic variables for our inputs (this +is similar to eg sympy's `Symbol`):: + + import theano + import theano.tensor as tt + # We don't specify the dtype of our input variables, so it + # defaults to using float64 without any special config. + a = tt.scalar('a') + x = tt.vector('x') + y = tt.vector('y') + +Next, we use those variables to build up a symbolic representation +of the output of our function. Note, that no computation is actually +being done at this point. 
We only record what operations we need to +do to compute the output:: + + inner = a * x**3 + y**2 + out = tt.exp(inner).sum() + +.. note:: + + In this example we use `tt.exp` to create a symbolic representation + of the exponential of `inner`. Somewhat surprisingly, it + would also have worked if we used `np.exp`. This is because numpy + gives objects it operates on a chance to define the results of + operations themselves. Theano variables do this for a large number + of operations. We usually still prefer it if people use the theano + function instead of the numpy versions, as that makes it clear that + we are working with symbolic input instead of plain arrays. + +Now we can tell theano to build a function that does this computation. +With a typical configuration theano generates C code, compiles it, +and creates a python function which wrapps the C function:: + + func = theano.function([a, x, y], [out]) + +We can call this function with actual arrays as many times as we want:: + + a_val = 1.2 + x_vals = np.random.randn(10) + y_vals = np.random.randn(10) + + out = func(a_val, x_vals, y_vals) + +For the most part the symbolic theano variables can be operated on +like numpy arrays. Most numpy functions are available in `theano.tensor` +(which is typically imported as `tt`). A lot of linear algebra operations +can be found in `tt.nlinalg` and `tt.slinalg` (the numpy and scipy +operations respectively). Some support for sparse matrices is available +in `theano.sparse`. + +A notable exception where theano variables *don't* behave like +numpy arrays are operations involving conditional execution: + +Code like this won't work as expected if used on theano variables:: + + a = tt.vector('a') + if (a > 0).all(): + b = tt.sqrt(a) + else: + b = -a + +`(a > 0).all()` isn't actually a boolean as it would be in numpy, but +still a symbolic variable. 
Python will convert this object to a boolean +and according to the rules for this conversions, things that aren't empty +containers or zero are converted to True. So the code is equivalent +to this:: + + a = tt.vector('a') + b = tt.sqrt(a) + +To get the desired behaviour, we can use `tt.switch`:: + + a = tt.vector('a') + b = tt.switch((a > 0).all(), tt.sqrt(a), -a) + +Indexing also works similar to numpy:: + + a = tt.vector('a') + # Access the 10th element. This will fail when a function build + # from this expression is executed with an array that is too short. + b = a[10] + + # Extract a subvector + b = a[[1, 2, 10]] + +Changing element of an array is possible using `tt.set_subtensor`:: + + a = tt.vector('a') + b = tt.set_subtensor(a[:10], 1) + + # is roughly equivalent this (although theano is usually able + # to avoid the copy) + a = np.random.randn(10) + b = a.copy() + b[:10] = 1 + +## How pymc3 uses theano + +Now that we have a basic understanding of theano we can look at what +happens if we define a pymc3 model. Let's look at a simple example:: + + true_mu = 0.1 + data = true_mu + np.random.randn(50) + + with pm.Model() as model: + mu = pm.Normal('mu', mu=0, sd=1) + y = pm.Normal('y', mu=mu, sd=1, observed=data) + +In this model we define two variables: `mu` and `y`. The first is +a free variable that we want to infer, the second is an observed +variable. To sample from the posterior we need to build the function +$$ + \log P(μ|y) + C = \log P(y|μ) + \log P(μ) =: f(μ)\\ +$$ +where with the normal likelihood $N(x|μ,σ^2)$ +$$ + f\colon \mathbb{R} \to \mathbb{R}\\ + f(μ) = \log N(μ|0, 1) + \log N(y|0, 1), +$$ + +To build that function we need to keep track of two things: The parameter +space (the *free variables*) and the logp function. For each free variable +we generate a theano variable. And for each variable (observed or otherwise) +we add a term to the global logp. 
In the background something similar to +this is happening:: + + # For illustration only, those functions don't actually exist + # in exactly this way! + model = pm.Model() + + mu = tt.scalar('mu') + model.add_free_variable(mu) + model.add_logp_term(normal_logp(μ| 0, 1)) + + model.add_logp_term(normal_logp(data| mu, 1)) From b85e348d4c385342e13ac89455d1811a5312a169 Mon Sep 17 00:00:00 2001 From: Adrian Seyboldt Date: Thu, 6 Jul 2017 16:17:08 +0200 Subject: [PATCH 2/3] Fix formatting in theano intro --- docs/source/getting_started.rst | 1 + docs/source/theano.rst | 84 +++++++++++++++++++-------------- 2 files changed, 49 insertions(+), 36 deletions(-) diff --git a/docs/source/getting_started.rst b/docs/source/getting_started.rst index 04d0836bb8..096a80f898 100644 --- a/docs/source/getting_started.rst +++ b/docs/source/getting_started.rst @@ -8,3 +8,4 @@ Getting started notebooks/getting_started.ipynb notebooks/api_quickstart.ipynb notebooks/variational_api_quickstart.ipynb + theano diff --git a/docs/source/theano.rst b/docs/source/theano.rst index a05e9e8bb5..07e6e49a53 100644 --- a/docs/source/theano.rst +++ b/docs/source/theano.rst @@ -1,26 +1,32 @@ -# pymc3 and theano +================ +PyMC3 and Theano +================ -## What is theano +What is Theano +============== Theano is a package that allows us to define functions involving array -operations and linear algebra. When we define a pymc3 model, we implicitly -build up a theano function from the space of our parameters to +operations and linear algebra. When we define a PyMC3 model, we implicitly +build up a Theano function from the space of our parameters to their posterior probability density up to a constant factor. We then use symbolic manipulations of this function to also get access to its gradient. 
-For a thorough introduction into theano see the thenao docs (link), +For a thorough introduction to Theano see the +`theano docs `_, but for the most part you don't need many details about it as long as you are not trying to define new distributions or other extensions -of pymc3. But let's look at a simple example to give you a rough +of PyMC3. But let's look at a simple example to give you a rough idea about how it works. Say, we'd like to define the (completely arbitrarily chosen) function -$$ - f\colon \mathbb{R} \times \mathbb{R}^n \mathbb{N}^n \to \mathbb{R}\\ - (a, x, y) \mapsto \sum_{i=0}{n} \exp(ax_i^3 + y_i^2). -$$ + +.. math:: + + f\colon \mathbb{R} \times \mathbb{R}^n \times \mathbb{N}^n \to \mathbb{R}\\ + (a, x, y) \mapsto \sum_{i=0}^{n} \exp(ax_i^3 + y_i^2). + First, we need to define symbolic variables for our inputs (this -is similar to eg sympy's `Symbol`):: +is similar to eg SymPy's `Symbol`):: import theano import theano.tensor as tt @@ -28,7 +34,8 @@ is similar to eg sympy's `Symbol`):: # defaults to using float64 without any special config. a = tt.scalar('a') x = tt.vector('x') - y = tt.vector('y') + # `tt.ivector` creates a symbolic vector of integers. + y = tt.ivector('y') Next, we use those variables to build up a symbolic representation of the output of our function. Note, that no computation is actually @@ -49,9 +56,9 @@ do to compute the output:: function instead of the numpy versions, as that makes it clear that we are working with symbolic input instead of plain arrays. -Now we can tell theano to build a function that does this computation. -With a typical configuration theano generates C code, compiles it, -and creates a python function which wrapps the C function:: +Now we can tell Theano to build a function that does this computation. 
+With a typical configuration Theano generates C code, compiles it, +and creates a python function which wraps the C function:: func = theano.function([a, x, y], [out]) @@ -63,17 +70,18 @@ We can call this function with actual arrays as many times as we want:: out = func(a_val, x_vals, y_vals) -For the most part the symbolic theano variables can be operated on -like numpy arrays. Most numpy functions are available in `theano.tensor` +For the most part the symbolic Theano variables can be operated on +like NumPy arrays. Most NumPy functions are available in `theano.tensor` (which is typically imported as `tt`). A lot of linear algebra operations -can be found in `tt.nlinalg` and `tt.slinalg` (the numpy and scipy +can be found in `tt.nlinalg` and `tt.slinalg` (the NumPy and SciPy operations respectively). Some support for sparse matrices is available -in `theano.sparse`. +in `theano.sparse`. For a detailed overview of available operations, +see `the theano api docs `_. A notable exception where theano variables *don't* behave like -numpy arrays are operations involving conditional execution: +NumPy arrays are operations involving conditional execution: -Code like this won't work as expected if used on theano variables:: +Code like this won't work as expected if used on Theano variables:: a = tt.vector('a') if (a > 0).all(): @@ -81,10 +89,10 @@ Code like this won't work as expected if used on theano variables:: else: b = -a -`(a > 0).all()` isn't actually a boolean as it would be in numpy, but +`(a > 0).all()` isn't actually a boolean as it would be in NumPy, but still a symbolic variable. Python will convert this object to a boolean and according to the rules for this conversions, things that aren't empty -containers or zero are converted to True. So the code is equivalent +containers or zero are converted to `True`. 
So the code is equivalent to this:: a = tt.vector('a') @@ -95,7 +103,7 @@ To get the desired behaviour, we can use `tt.switch`:: a = tt.vector('a') b = tt.switch((a > 0).all(), tt.sqrt(a), -a) -Indexing also works similar to numpy:: +Indexing also works similar to NumPy:: a = tt.vector('a') # Access the 10th element. This will fail when a function build @@ -110,16 +118,17 @@ Changing element of an array is possible using `tt.set_subtensor`:: a = tt.vector('a') b = tt.set_subtensor(a[:10], 1) - # is roughly equivalent this (although theano is usually able - # to avoid the copy) + # is roughly equivalent to this (although theano is usually + # able to avoid the copy) a = np.random.randn(10) b = a.copy() b[:10] = 1 -## How pymc3 uses theano +How PyMC3 uses Theano +===================== -Now that we have a basic understanding of theano we can look at what -happens if we define a pymc3 model. Let's look at a simple example:: +Now that we have a basic understanding of Theano we can look at what +happens if we define a PyMC3 model. Let's look at a simple example:: true_mu = 0.1 data = true_mu + np.random.randn(50) @@ -131,18 +140,21 @@ happens if we define a pymc3 model. Let's look at a simple example:: In this model we define two variables: `mu` and `y`. The first is a free variable that we want to infer, the second is an observed variable. To sample from the posterior we need to build the function -$$ - \log P(μ|y) + C = \log P(y|μ) + \log P(μ) =: f(μ)\\ -$$ -where with the normal likelihood $N(x|μ,σ^2)$ -$$ + +.. math:: + + \log P(μ|y) + C = \log P(y|μ) + \log P(μ) =: f(μ)\\ + +where with the normal likelihood :math:`N(x|μ,σ^2)` + +.. math:: + f\colon \mathbb{R} \to \mathbb{R}\\ f(μ) = \log N(μ|0, 1) + \log N(y|0, 1), -$$ To build that function we need to keep track of two things: The parameter space (the *free variables*) and the logp function. For each free variable -we generate a theano variable. 
And for each variable (observed or otherwise) +we generate a Theano variable. And for each variable (observed or otherwise) we add a term to the global logp. In the background something similar to this is happening:: From 70433417aafa7617d513bc1878ce12dc85fde6c4 Mon Sep 17 00:00:00 2001 From: Adrian Seyboldt Date: Thu, 6 Jul 2017 20:08:46 +0200 Subject: [PATCH 3/3] Update doc on Theano --- docs/source/advanced_theano.rst | 101 ++++++++++++++++++++++++++++++++ docs/source/getting_started.rst | 3 +- docs/source/theano.rst | 79 ++++++++++++++++++++----- 3 files changed, 166 insertions(+), 17 deletions(-) create mode 100644 docs/source/advanced_theano.rst diff --git a/docs/source/advanced_theano.rst b/docs/source/advanced_theano.rst new file mode 100644 index 0000000000..36f6b97600 --- /dev/null +++ b/docs/source/advanced_theano.rst @@ -0,0 +1,101 @@ +================================= +Advanced usage of Theano in PyMC3 +================================= + +Using shared variables +====================== + +Shared variables allow us to use values in theano functions that are +not considered an input to the function, but can still be changed +later. They are very similar to global variables in may ways.:: + + a = tt.scalar('a') + # Create a new shared variable with initial value of 0.1 + b = theano.shared(0.1) + func = theano.function([a], a * b) + assert func(2.) == 0.2 + + b.set_value(10.) + assert func(2.) == 20. + +Shared variables can also contain arrays, and are allowed to change +their shape as long as the number of dimensions stays the same. 
+
+We can use shared variables in PyMC3 to fit the same model to several
+datasets without the need to recreate the model each time (which can
+be time consuming if the number of datasets is large)::
+
+    # We generate 10 datasets
+    true_mu = [np.random.randn() for _ in range(10)]
+    observed_data = [mu + np.random.randn(20) for mu in true_mu]
+
+    data = theano.shared(observed_data[0])
+    with pm.Model() as model:
+        mu = pm.Normal('mu', 0, 10)
+        pm.Normal('y', mu=mu, sd=1, observed=data)
+
+    # Generate one trace for each dataset
+    traces = []
+    for data_vals in observed_data:
+        # Switch out the observed dataset
+        data.set_value(data_vals)
+        with model:
+            traces.append(pm.sample())
+
+We can also sometimes use shared variables to work around limitations
+in the current PyMC3 API. A common task in machine learning is to predict
+values for unseen data, and one way to achieve this is to use a shared
+variable for our observations::
+
+    x = np.random.randn(100)
+    y = x > 0
+
+    x_shared = theano.shared(x)
+
+    with pm.Model() as model:
+        coeff = pm.Normal('x', mu=0, sd=1)
+        logistic = pm.math.sigmoid(coeff * x_shared)
+        pm.Bernoulli('obs', p=logistic, observed=y)
+
+        # fit the model
+        trace = pm.sample()
+
+    # Switch out the observations and use `sample_ppc` to predict
+    x_shared.set_value([-1, 0, 1.])
+    post_pred = pm.sample_ppc(trace, samples=500)
+
+However, due to the way we handle shapes at the moment, it is
+not possible to change the shape of a shared variable if that would
+also change the shape of one of the random variables in the model.
+
+
+Writing custom Theano Ops
+=========================
+
+While Theano includes a wide range of operations, there are cases where
+it makes sense to write your own. But before doing this it is a good
+idea to think hard about whether it is actually necessary. 
Especially if you want
+to use algorithms that need gradient information (this includes NUTS and
+all variational methods, and you probably *should* want to use those),
+this is often quite a bit of work and also requires some math and
+debugging skills for the gradients.
+
+Good reasons for defining a custom Op might be the following:
+
+- You require an operation that is not available in Theano and can't
+  be built up out of existing Theano operations. This could for example
+  include models where you need to solve differential equations or
+  integrals, or find a root or minimum of a function that depends
+  on your parameters.
+- You want to connect your PyMC3 model to some existing external code.
+- After carefully considering different parametrizations and a lot
+  of profiling, your model is still too slow, but you know of a faster
+  way to compute the gradient than what theano is doing. This faster
+  way might be anything from clever maths to using more hardware.
+  There is nothing stopping anyone from using a cluster via MPI in
+  a custom Op, if a part of the gradient computation is slow enough
+  and sufficiently parallelizable to make the cost worth it.
+  We would definitely like to hear about any such examples.
+
+Theano has extensive `documentation `_
+about how to write new Ops.
diff --git a/docs/source/getting_started.rst b/docs/source/getting_started.rst
index 096a80f898..28bfc29169 100644
--- a/docs/source/getting_started.rst
+++ b/docs/source/getting_started.rst
@@ -8,4 +8,5 @@ Getting started
    notebooks/getting_started.ipynb
    notebooks/api_quickstart.ipynb
    notebooks/variational_api_quickstart.ipynb
-   theano
+   theano.rst
+   advanced_theano.rst
diff --git a/docs/source/theano.rst b/docs/source/theano.rst
index 07e6e49a53..096e2d229e 100644
--- a/docs/source/theano.rst
+++ b/docs/source/theano.rst
@@ -13,9 +13,9 @@
 symbolic manipulations of this function to also get access to its gradient. 
For a thorough introduction to Theano see the `theano docs `_, -but for the most part you don't need many details about it as long +but for the most part you don't need detailed knowledge about it as long as you are not trying to define new distributions or other extensions -of PyMC3. But let's look at a simple example to give you a rough +of PyMC3. But let's look at a simple example to get a rough idea about how it works. Say, we'd like to define the (completely arbitrarily chosen) function @@ -52,8 +52,8 @@ do to compute the output:: would also have worked if we used `np.exp`. This is because numpy gives objects it operates on a chance to define the results of operations themselves. Theano variables do this for a large number - of operations. We usually still prefer it if people use the theano - function instead of the numpy versions, as that makes it clear that + of operations. We usually still prefer the theano + functions instead of the numpy versions, as that makes it clear that we are working with symbolic input instead of plain arrays. Now we can tell Theano to build a function that does this computation. @@ -78,10 +78,10 @@ operations respectively). Some support for sparse matrices is available in `theano.sparse`. For a detailed overview of available operations, see `the theano api docs `_. -A notable exception where theano variables *don't* behave like +A notable exception where theano variables do *not* behave like NumPy arrays are operations involving conditional execution: -Code like this won't work as expected if used on Theano variables:: +Code like this won't work as expected:: a = tt.vector('a') if (a > 0).all(): @@ -91,7 +91,7 @@ Code like this won't work as expected if used on Theano variables:: `(a > 0).all()` isn't actually a boolean as it would be in NumPy, but still a symbolic variable. 
Python will convert this object to a boolean
-and according to the rules for this conversions, things that aren't empty
+and according to the rules for this conversion, things that aren't empty
 containers or zero are converted to `True`. So the code is equivalent
 to this::
 
     a = tt.vector('a')
@@ -103,7 +103,7 @@ To get the desired behaviour, we can use `tt.switch`::
     a = tt.vector('a')
     b = tt.switch((a > 0).all(), tt.sqrt(a), -a)
 
-Indexing also works similar to NumPy::
+Indexing also works similarly to NumPy::
 
     a = tt.vector('a')
     # Access the 10th element. This will fail when a function build
@@ -113,13 +113,13 @@ Indexing also works similar to NumPy::
     # Extract a subvector
     b = a[[1, 2, 10]]
 
-Changing element of an array is possible using `tt.set_subtensor`::
+Changing elements of an array is possible using `tt.set_subtensor`::
 
     a = tt.vector('a')
     b = tt.set_subtensor(a[:10], 1)
 
-    # is roughly equivalent to this (although theano is usually
-    # able to avoid the copy)
+    # is roughly equivalent to this (although theano avoids
+    # the copy if `a` isn't used anymore)
     a = np.random.randn(10)
     b = a.copy()
     b[:10] = 1
@@ -143,14 +143,14 @@ variable. To sample from the posterior we need to build the function
 
 .. math::
 
-    \log P(μ|y) + C = \log P(y|μ) + \log P(μ) =: \text{logp}(μ)\\
+    \log P(μ|y) + C = \log P(y|μ) + \log P(μ) =: \text{logp}(μ)\\
 
 where with the normal likelihood :math:`N(x|μ,σ^2)`
 
 .. math::
 
-    f\colon \mathbb{R} \to \mathbb{R}\\
-    f(μ) = \log N(μ|0, 1) + \log N(y|0, 1),
+    \text{logp}\colon \mathbb{R} \to \mathbb{R}\\
+    μ \mapsto \log N(μ|0, 1) + \log N(y|μ, 1),
 
To build that function we need to keep track of two things: The parameter
space (the *free variables*) and the logp function. 
For each free variable
-we generate a theano variable. And for each variable (observed or otherwise)
+we generate a Theano variable. And for each variable (observed or otherwise)
 we add a term to the global logp. In the background something similar to
 this is happening::
 
@@ -164,6 +164,53 @@ this is happening::
 
     mu = tt.scalar('mu')
     model.add_free_variable(mu)
-    model.add_logp_term(normal_logp(μ| 0, 1))
+    model.add_logp_term(pm.Normal.dist(0, 1).logp(mu))
 
-    model.add_logp_term(normal_logp(data| mu, 1))
+    model.add_logp_term(pm.Normal.dist(mu, 1).logp(data))
+
+So calling `pm.Normal()` modifies the model: It changes the logp function
+of the model. If the `observed` keyword isn't set it also creates a new
+free variable. In contrast, `pm.Normal.dist()` doesn't care about the model,
+it just creates an object that represents the normal distribution. Calling
+`logp` on this object creates a theano variable for the log probability
+or log probability density of the distribution, but again without changing
+the model in any way.
+
+Continuous variables with support only on a subset of the real numbers
+are treated a bit differently. We create a transformed variable
+that has support on all the reals, and recover the original variable
+by applying the inverse of the transformation. For
+example::
+
+    with pm.Model() as model:
+        mu = pm.Normal('mu', 0, 1)
+        sd = pm.HalfNormal('sd', 1)
+        y = pm.Normal('y', mu=mu, sd=sd, observed=data)
+
+is roughly equivalent to this::
+
+    # For illustration only, not real code! 
+    model = pm.Model()
+    mu = tt.scalar('mu')
+    model.add_free_variable(mu)
+    model.add_logp_term(pm.Normal.dist(0, 1).logp(mu))
+
+    sd_log__ = tt.scalar('sd_log__')
+    model.add_free_variable(sd_log__)
+    model.add_logp_term(corrected_logp_half_normal(sd_log__))
+
+    sd = tt.exp(sd_log__)
+    model.add_deterministic_variable(sd)
+
+    model.add_logp_term(pm.Normal.dist(mu, sd).logp(data))
+
+The return values of the variable constructors are instances of
+subclasses of theano variables, so when we define a variable we can
+use any theano operation on them::
+
+    design_matrix = np.array([[...]])
+    with pm.Model() as model:
+        # beta is a tt.dvector
+        beta = pm.Normal('beta', 0, 1, shape=design_matrix.shape[1])
+        predict = tt.dot(design_matrix, beta)
+        sd = pm.HalfCauchy('sd', beta=2.5)
+        pm.Normal('y', mu=predict, sd=sd, observed=data)
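Numerically, the graph built for `predict` above is just a matrix-vector product. A plain NumPy sketch with made-up values (illustration only; the design matrix and coefficient draw are hypothetical):

```python
import numpy as np

# A hypothetical 3x2 design matrix: an intercept column and one predictor.
design_matrix = np.array([[1.0, 0.0],
                          [1.0, 1.0],
                          [1.0, 2.0]])

# One hypothetical draw of the coefficient vector `beta`.
beta = np.array([0.5, -0.2])

# What `tt.dot(design_matrix, beta)` computes: one predicted mean per row.
predict = design_matrix.dot(beta)

assert predict.shape == (3,)
assert np.allclose(predict, [0.5, 0.3, 0.1])
```

During sampling, `beta` is a symbolic vector rather than fixed numbers, so the same product is evaluated once per logp (and gradient) evaluation.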