Add draft of theano introduction #2387
Merged
=================================
Advanced usage of Theano in PyMC3
=================================

Using shared variables
======================

Shared variables allow us to use values in theano functions that are
not considered an input to the function, but can still be changed
later. They are very similar to global variables in many ways::

    a = tt.scalar('a')
    # Create a new shared variable with an initial value of 0.1
    b = theano.shared(0.1)
    func = theano.function([a], a * b)
    assert func(2.) == 0.2

    b.set_value(10.)
    assert func(2.) == 20.

Shared variables can also contain arrays, and are allowed to change
their shape as long as the number of dimensions stays the same.

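The semantics can be pictured with a small plain-Python analogue (`SharedValue` and `ndim` are our own illustrative names, not theano API):

```python
def ndim(value):
    """Number of dimensions of a nested-list 'array'."""
    n = 0
    while isinstance(value, list):
        n += 1
        value = value[0]
    return n

class SharedValue:
    """Toy stand-in for theano.shared: a mutable container whose value
    can be swapped out later, as long as the number of dimensions
    stays the same."""
    def __init__(self, value):
        self._value = value
    def get_value(self):
        return self._value
    def set_value(self, value):
        if ndim(value) != ndim(self._value):
            raise ValueError("number of dimensions must stay the same")
        self._value = value

s = SharedValue([0.0, 0.0, 0.0])
s.set_value([1.0, 2.0, 3.0, 4.0])   # OK: still 1-dimensional, new shape
assert len(s.get_value()) == 4
```

The real `theano.shared` additionally fixes the dtype of the container; the sketch only captures the "same ndim, new shape" rule described above.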
We can use shared variables in PyMC3 to fit the same model to several
datasets without the need to recreate the model each time (which can
be time consuming if the number of datasets is large)::

    # We generate 10 datasets
    true_mu = [np.random.randn() for _ in range(10)]
    observed_data = [mu + np.random.randn(20) for mu in true_mu]

    data = theano.shared(observed_data[0])
    with pm.Model() as model:
        mu = pm.Normal('mu', 0, 10)
        pm.Normal('y', mu=mu, sd=1, observed=data)

    # Generate one trace for each dataset
    traces = []
    for data_vals in observed_data:
        # Switch out the observed dataset
        data.set_value(data_vals)
        with model:
            traces.append(pm.sample())

We can also sometimes use shared variables to work around limitations
in the current PyMC3 api. A common task in machine learning is to predict
values for unseen data, and one way to achieve this is to use a shared
variable for our observations::

    x = np.random.randn(100)
    y = x > 0

    x_shared = theano.shared(x)

    with pm.Model() as model:
        coeff = pm.Normal('x', mu=0, sd=1)
        logistic = pm.math.sigmoid(coeff * x_shared)
        pm.Bernoulli('obs', p=logistic, observed=y)

        # fit the model
        trace = pm.sample()

        # Switch out the observations and use `sample_ppc` to predict
        x_shared.set_value(np.array([-1., 0., 1.]))
        post_pred = pm.sample_ppc(trace, samples=500)

However, due to the way we handle shapes at the moment, it is
not possible to change the shape of a shared variable if that would
also change the shape of one of the variables in the model.

Writing custom Theano Ops
=========================

While Theano includes a wide range of operations, there are cases where
it makes sense to write your own. But before doing this it is a good
idea to think hard about whether it is actually necessary. Especially if you
want to use algorithms that need gradient information (this includes NUTS
and all variational methods, and you probably *should* want to use those),
writing a custom Op is often quite a bit of work, and it requires some
math and debugging skills to get the gradients right.

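A large part of that debugging work is checking the gradient of a custom Op numerically. Theano ships a helper for this (`theano.gradient.verify_grad`); the idea behind it can be sketched in plain Python (`check_grad` is our own hypothetical helper, not theano code):

```python
def check_grad(f, grad_f, x, eps=1e-6, tol=1e-4):
    """Compare an analytic gradient against central finite differences."""
    for i in range(len(x)):
        xp, xm = list(x), list(x)
        xp[i] += eps
        xm[i] -= eps
        # central difference approximation of df/dx_i
        numeric = (f(xp) - f(xm)) / (2 * eps)
        if abs(numeric - grad_f(x)[i]) > tol:
            return False
    return True

# Example: f(x) = sum_i x_i**2 has gradient 2 * x
f = lambda x: sum(v * v for v in x)
grad_f = lambda x: [2 * v for v in x]
assert check_grad(f, grad_f, [1.0, -2.0, 0.5])
```

A wrong gradient implementation will typically disagree with the finite-difference estimate well beyond the tolerance, which makes this an effective first sanity check before wiring an Op into a model.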
Good reasons for defining a custom Op might be the following:

- You require an operation that is not available in Theano and can't
  be built up out of existing Theano operations. This could for example
  include models where you need to solve differential equations or
  integrals, or find the root or minimum of a function that depends
  on your parameters.
- You want to connect your PyMC3 model to some existing external code.
- After carefully considering different parametrizations and a lot
  of profiling, your model is still too slow, but you know of a faster
  way to compute the gradient than what theano is doing. This faster
  way might be anything from clever maths to using more hardware.
  There is nothing stopping anyone from using a cluster via MPI in
  a custom node, if a part of the gradient computation is slow enough
  and sufficiently parallelizable to make the cost worth it.
  We would definitely like to hear about any such examples.

Theano has extensive `documentation <http://deeplearning.net/software/theano/extending/index.html>`_
about how to write new Ops.
================
PyMC3 and Theano
================

What is Theano
==============

Theano is a package that allows us to define functions involving array
operations and linear algebra. When we define a PyMC3 model, we implicitly
build up a Theano function from the space of our parameters to
their posterior probability density up to a constant factor. We then use
symbolic manipulations of this function to also get access to its gradient.

For a thorough introduction to Theano see the
`theano docs <http://deeplearning.net/software/theano/introduction.html>`_,
but for the most part you don't need detailed knowledge about it as long
as you are not trying to define new distributions or other extensions
of PyMC3. But let's look at a simple example to get a rough
idea about how it works. Say we'd like to define the (completely
arbitrarily chosen) function

.. math::

   f\colon \mathbb{R} \times \mathbb{R}^n \times \mathbb{N}^n \to \mathbb{R}\\
   (a, x, y) \mapsto \sum_{i=1}^{n} \exp(ax_i^3 + y_i^2).

First, we need to define symbolic variables for our inputs (this
is similar to e.g. SymPy's `Symbol`)::

    import numpy as np
    import theano
    import theano.tensor as tt
    # We don't specify the dtype of our input variables, so it
    # defaults to using float64 without any special config.
    a = tt.scalar('a')
    x = tt.vector('x')
    # `tt.ivector` creates a symbolic vector of integers.
    y = tt.ivector('y')

Next, we use those variables to build up a symbolic representation
of the output of our function. Note that no computation is actually
being done at this point. We only record what operations we need to
do to compute the output::

    inner = a * x**3 + y**2
    out = tt.exp(inner).sum()

.. note::

   In this example we use `tt.exp` to create a symbolic representation
   of the exponential of `inner`. Somewhat surprisingly, it
   would also have worked if we used `np.exp`. This is because numpy
   gives objects it operates on a chance to define the results of
   operations themselves. Theano variables do this for a large number
   of operations. We usually still prefer the theano
   functions instead of the numpy versions, as that makes it clear that
   we are working with symbolic input instead of plain arrays.

Now we can tell Theano to build a function that does this computation.
With a typical configuration Theano generates C code, compiles it,
and creates a python function which wraps the C function::

    func = theano.function([a, x, y], [out])

We can call this function with actual arrays as many times as we want::

    a_val = 1.2
    x_vals = np.random.randn(10)
    # `y` is an ivector, so we need to pass integer values
    y_vals = np.random.randint(0, 10, size=10).astype('int32')

    out = func(a_val, x_vals, y_vals)

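As a sanity check, the compiled function should agree with a direct (non-symbolic) implementation of the same formula, here written in plain Python (`f_reference` is our own name):

```python
import math

def f_reference(a, x, y):
    """Direct implementation of sum_i exp(a * x_i**3 + y_i**2)."""
    return sum(math.exp(a * xi**3 + yi**2) for xi, yi in zip(x, y))

# With a = 0 every term is exp(y_i**2); for y = [0, 0] that is 1 + 1.
assert f_reference(0.0, [1.0, 2.0], [0, 0]) == 2.0
```

Comparing such a reference against the theano version on a few random inputs is a quick way to catch mistakes in the symbolic expression.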
For the most part the symbolic Theano variables can be operated on
like NumPy arrays. Most NumPy functions are available in `theano.tensor`
(which is typically imported as `tt`). A lot of linear algebra operations
can be found in `tt.nlinalg` and `tt.slinalg` (the NumPy and SciPy
operations respectively). Some support for sparse matrices is available
in `theano.sparse`. For a detailed overview of available operations,
see `the theano api docs <http://deeplearning.net/software/theano/library/tensor/index.html>`_.

A notable exception, where theano variables do *not* behave like
NumPy arrays, is operations involving conditional execution.

Code like this won't work as expected::

    a = tt.vector('a')
    if (a > 0).all():
        b = tt.sqrt(a)
    else:
        b = -a

`(a > 0).all()` isn't actually a boolean as it would be in NumPy, but
still a symbolic variable. Python will convert this object to a boolean,
and according to the rules for this conversion, things that aren't empty
containers or zero are converted to `True`. So the code is equivalent
to this::

    a = tt.vector('a')
    b = tt.sqrt(a)

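The conversion rule can be seen with a plain Python class (a toy `Symbolic` class of our own, not a theano type): any object that defines neither `__bool__` nor `__len__` is truthy by default.

```python
class Symbolic:
    """Toy stand-in for a theano expression: comparisons return a new
    symbolic object instead of evaluating to True or False."""
    def __gt__(self, other):
        return Symbolic()

a = Symbolic()
cond = a > 0          # not a boolean, still "symbolic"
# Python's default truth-value conversion makes any such object truthy,
# so `if cond:` would always take the first branch.
assert bool(cond) is True
```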
To get the desired behaviour, we can use `tt.switch`::

    a = tt.vector('a')
    b = tt.switch((a > 0).all(), tt.sqrt(a), -a)

Indexing also works similarly to NumPy::

    a = tt.vector('a')
    # Access the element at index 10. This will fail when a function
    # built from this expression is executed with an array that is too short.
    b = a[10]

    # Extract a subvector
    b = a[[1, 2, 10]]

Changing elements of an array is possible using `tt.set_subtensor`::

    a = tt.vector('a')
    b = tt.set_subtensor(a[:10], 1)

    # is roughly equivalent to this (although theano avoids
    # the copy if `a` isn't used anymore)
    a = np.random.randn(10)
    b = a.copy()
    b[:10] = 1

How PyMC3 uses Theano
=====================

Now that we have a basic understanding of Theano we can look at what
happens if we define a PyMC3 model. Let's look at a simple example::

    true_mu = 0.1
    data = true_mu + np.random.randn(50)

    with pm.Model() as model:
        mu = pm.Normal('mu', mu=0, sd=1)
        y = pm.Normal('y', mu=mu, sd=1, observed=data)

In this model we define two variables: `mu` and `y`. The first is
a free variable that we want to infer, the second is an observed
variable. To sample from the posterior we need to build the function

.. math::

   \log P(μ|y) + C = \log P(y|μ) + \log P(μ) =: \text{logp}(μ),

where, with the normal density :math:`N(x|μ, σ^2)`,

.. math::

   \text{logp}\colon \mathbb{R} \to \mathbb{R}\\
   μ \mapsto \log N(μ|0, 1) + \log N(y|μ, 1).

To build that function we need to keep track of two things: the parameter
space (the *free variables*) and the logp function. For each free variable
we generate a Theano variable. And for each variable (observed or otherwise)
we add a term to the global logp. In the background something similar to
this is happening::

    # For illustration only, those functions don't actually exist
    # in exactly this way!
    model = pm.Model()

    mu = tt.scalar('mu')
    model.add_free_variable(mu)
    model.add_logp_term(pm.Normal.dist(0, 1).logp(mu))

    model.add_logp_term(pm.Normal.dist(mu, 1).logp(data))

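The bookkeeping can be mimicked in a few lines of plain Python (a toy sketch; `add_free_variable` and `add_logp_term` are the illustrative names from the pseudo-code above, not the real PyMC3 API):

```python
import math

def normal_logp(value, mu, sd):
    # log density of N(value | mu, sd**2)
    return -0.5 * math.log(2 * math.pi * sd**2) - (value - mu)**2 / (2 * sd**2)

class ToyModel:
    def __init__(self):
        self.free_variables = []
        self.logp_terms = []   # callables taking a dict of parameter values

    def add_free_variable(self, name):
        self.free_variables.append(name)

    def add_logp_term(self, term):
        self.logp_terms.append(term)

    def logp(self, **params):
        # the model's logp is the sum of all accumulated terms
        return sum(term(params) for term in self.logp_terms)

data = [0.2, -0.1, 0.4]
model = ToyModel()
model.add_free_variable('mu')
model.add_logp_term(lambda p: normal_logp(p['mu'], 0., 1.))                      # prior
model.add_logp_term(lambda p: sum(normal_logp(y, p['mu'], 1.) for y in data))    # likelihood

total = model.logp(mu=0.1)
```

The real machinery builds one symbolic theano expression instead of summing Python callables, which is what makes compilation and automatic differentiation possible.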
So calling `pm.Normal()` modifies the model: it changes the logp function
of the model. If the `observed` keyword isn't set, it also creates a new
free variable. In contrast, `pm.Normal.dist()` doesn't care about the model;
it just creates an object that represents the normal distribution. Calling
`logp` on this object creates a theano variable for the log probability
or log probability density of the distribution, but again without changing
the model in any way.

Continuous variables with support only on a subset of the real numbers
are treated a bit differently. We create a transformed variable
that has support on the reals and then modify this variable. For
example::

    with pm.Model() as model:
        mu = pm.Normal('mu', 0, 1)
        sd = pm.HalfNormal('sd', 1)
        y = pm.Normal('y', mu=mu, sd=sd, observed=data)

is roughly equivalent to this::

    # For illustration only, not real code!
    model = pm.Model()
    mu = tt.scalar('mu')
    model.add_free_variable(mu)
    model.add_logp_term(pm.Normal.dist(0, 1).logp(mu))

    sd_log__ = tt.scalar('sd_log__')
    model.add_free_variable(sd_log__)
    model.add_logp_term(corrected_logp_half_normal(sd_log__))

    sd = tt.exp(sd_log__)
    model.add_deterministic_variable(sd)

    model.add_logp_term(pm.Normal.dist(mu, sd).logp(data))

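The `corrected_logp_half_normal` placeholder hides a change-of-variables step: evaluating the half-normal density at `exp(sd_log__)` and adding the log of the Jacobian of the transform. Assuming the log transform `sd = exp(sd_log__)` and unit scale, a sketch (function names ours, not PyMC3 internals):

```python
import math

def halfnormal_logp(x, sd=1.0):
    # log density of a half-normal with scale sd, defined for x > 0
    return 0.5 * math.log(2. / (math.pi * sd**2)) - x**2 / (2. * sd**2)

def corrected_logp_half_normal(sd_log):
    # Density of sd = exp(sd_log) on the unconstrained space:
    # logp at exp(sd_log), plus the log-Jacobian
    # log|d sd / d sd_log| = log(exp(sd_log)) = sd_log.
    return halfnormal_logp(math.exp(sd_log)) + sd_log
```

Without the `+ sd_log` term the transformed density would not integrate to one, and the sampler would silently target the wrong posterior.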
The return values of the variable constructors are subclasses
of theano variables, so when we define a variable we can use any
theano operation on them::

    design_matrix = np.array([[...]])
    with pm.Model() as model:
        # beta is a tt.dvector with one entry per column of the design matrix
        beta = pm.Normal('beta', 0, 1, shape=design_matrix.shape[1])
        predict = tt.dot(design_matrix, beta)
        sd = pm.HalfCauchy('sd', beta=2.5)
        pm.Normal('y', mu=predict, sd=sd, observed=data)
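The deterministic part of that model, `tt.dot(design_matrix, beta)`, is ordinary matrix-vector multiplication; written out in plain Python (with a made-up 3×2 design matrix for illustration):

```python
def matvec(m, v):
    """predict = design_matrix @ beta, written out element by element."""
    return [sum(mij * vj for mij, vj in zip(row, v)) for row in m]

design_matrix = [[1., 0.], [1., 1.], [1., 2.]]  # intercept + slope columns
beta = [0.5, 2.0]
predict = matvec(design_matrix, beta)
assert predict == [0.5, 2.5, 4.5]
```

Because `beta` is a theano variable, the same expression in the model stays symbolic: `predict` becomes part of the graph rather than a concrete array.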
This might be a typo, it should be

    μ \mapsto \log N(μ|0, 1) + \log N(y|μ, 1),

right?
Yes, you are right, thank you.
Fix is coming up.
Glad I can help. Great introduction by the way :)
Thanks :-)
Fixed in #2497.