Refactoring to support Structured VI #2416
Conversation
pymc3/variational/opvi.py
Outdated
```python
class GroupApproximation(object):
    """
    Grouped Approximation that is used for modelling mutual dependencies
    fro a specified group of variables.
```
typo: fro->for
API change proposal №1: local variables are held in a separate group, so that you do not pass them in with everything else. On a low level (the high-level API can be less verbose) it will roughly look like:

```python
approx = Approximation([
    MeanField([local_rv],  # group of local variables
              shapes=dict([(local_rv, (s, 12))]),  # where `s` is symbolic
              params=dict(mu=mu, rho=rho)),  # where mu, rho have valid shape: (flexible, latent_ndim);
                                             # in this case the flexible shape is `s` and latent ndim is 12
    MeanField(-1)  # delayed init with `-1` instead of a group; will capture all the rest of the variables
])
```

As a bonus, the following construction will be possible:

```python
approx = Approximation([
    FullRank([local_rv],  # group of local variables
             shapes=dict([(local_rv, (s, 12))]),  # where `s` is symbolic
             params=dict(mu=mu, L_tril=L_tril)),  # where mu, L_tril have valid shape: (flexible, param_standard_shape);
                                                  # in this case the flexible shape is `s` and latent ndim is 12, so
                                                  # mu.shape = (s, 12)
                                                  # L_tril.shape = (s, 12 * (12 + 1) / 2)
    MeanField(-1)  # delayed init with `-1` instead of a group; will capture all the rest of the variables
])
```

Holding backward compatibility for the first iterations will make future work super hard. The API change is still under discussion; feel free to suggest ideas and criticize solutions. The main idea behind it is handling local_rv as a separate group with its own flexible shape that depends on a dynamic shape. Right now I try to do it with 3d tensors where the 2nd dim holds the flexible shape. I hope to implement broadcastable Approximations so that one can parametrize any distribution for local variables. Even in this setup it's all hard enough.
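To make the 3d layout concrete, here is a small numpy sketch of how the description above reads (axis roles and the numbers are taken from the example; this is an illustration, not PR code):

```python
import numpy as np

n_samples, s, latent_ndim = 5, 100, 12     # MC draws, flexible (minibatch) dim, latent dim
z = np.zeros((n_samples, s, latent_ndim))  # posterior draws for the local group
mu = np.zeros((s, latent_ndim))            # params with shape (flexible, latent_ndim)
# params broadcast over the leading sample axis, which is what makes
# "broadcastable Approximations" possible
assert (z + mu).shape == (n_samples, s, latent_ndim)
```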
Personally I feel free to change the API for 3.2 as we have released 3.1, but I'm still a bit worried about regular users. They might not like such rapid changes in the API. For a top-level API like
I also invite @fonnesbeck for discussion.
How does it compare to the current API? Are the global and local vars passed together to the Approximation?
I think a reasonable way to go is to restrict so-called

So we'll have no way to capture interactions between local and global variables, nor local with local. This is exactly what we have now, but much more flexible. Here one can parametrize local variables with flows or FullRank, or with separate groups that have more complex distributions than MeanField. BTW I find it too hard to implement a convenient API that allows more than one variable in a local group.
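A minimal sketch of that restriction in terms of the grouping API proposed above (all names reuse the earlier example; the exact spelling is still under discussion):

```python
approx = Approximation([
    MeanField([local_rv],  # local group: exactly one variable
              shapes=dict([(local_rv, (s, 12))]),
              params=dict(mu=mu, rho=rho)),
    FullRank(-1),  # all remaining (global) variables; dependencies are captured
])                 # only inside this group, never across groups
```

Since each group carries its own approximating family, local-global and local-local interactions across groups are simply not representable, which matches the restriction described.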
pymc3/variational/opvi.py
Outdated
```python
self._mid_size = size
self.vmap = dict()
self.ndim = 0
self._global_ = True
```
What’s the trailing underscore for?
I use this to indicate that it should not be changed, even internally, after init.
I see. Another option would be to use all caps (e.g. `self.GLOBAL`). This is common for fixed variables, such as constants in Python. Not sure what PEP8 has to say about that, but the Google style guide uses this convention.
Maybe
I think all caps is more about class attributes, not instance-specific ones.
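A toy illustration of the two conventions being debated (not code from the PR):

```python
class Group:
    GLOBAL = True              # class-level constant: all-caps, shared by all instances

    def __init__(self):
        self._global_ = True   # instance attribute; the trailing underscore is used in
                               # this PR to signal "do not mutate after __init__"
```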
I still need to properly read these papers. Could there not be default groupings based on, say, hierarchical structure in the model?

Whether a drastic API change is worth it depends on what structured VI gives us beyond what can be achieved with normalizing flows. The copula VI paper shows a 4x slowdown compared to mean field; is this comparable to what we are getting with NFVI? I haven't seen these implemented anywhere, which seems odd. Would have thought Edward would have them. Any idea why not?
NFVI is not scalable in terms of model size. Of course there is a slowdown, but there is a tradeoff: we can vary flow length and thus computational cost. I want to make this tradeoff more flexible and require flows or FullRank only when needed.
Remark: NFVI is the most scalable approach we currently have for complex models, but we can make it better.
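For instance, flow length can be chosen per problem. A sketch using pymc3's flow-formula spelling (the exact formula grammar and `model` are assumptions here, not taken from this thread):

```python
import pymc3 as pm

with model:  # `model` is a placeholder pm.Model
    cheap = pm.fit(method='nfvi=planar*2')   # short flow: cheaper, less expressive
    rich = pm.fit(method='nfvi=planar*16')   # longer flow: more compute, more expressive
```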
We can ask @dustinvtran for some opinions.
I find the implementation not easy; I need a careful architecture choice first.
Interested to hear from @dustinvtran too.
I agree. If two local variables should interact with each other, these variables can be merged when input to the network for posterior parameter estimation. In my opinion, you can first implement the "OK" case. Then, if required, the other two cases can be considered.
In practice, the tradeoff for any algorithm is its perceived benefit vs its complexity of implementation. I often use mean-field and structured approaches (i.e., low-rank Gaussians) as simple baselines. They're fast and work well on many problems. Copula VI is difficult to implement if the library doesn't already have extensive support for copula distributions like R does, but it's also robust. I haven't seen normalizing flows, operator VI, and many others used for problems besides training deep generative models. This is because they're quite difficult to train in practice, which means they're generally restricted to experts who are knowledgeable enough to get them to work.
Edward supports structured approximations. E.g., you just specify a Cholesky-parameterized normal distribution as your approximating family. If TensorFlow had support for copula distributions, that would also just work.™
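For comparison, the closest PyMC3 analogue of that Cholesky-parameterized normal family is full-rank ADVI; a minimal sketch on a toy model (illustrative only, not code from this PR):

```python
import numpy as np
import pymc3 as pm

with pm.Model():
    mu = pm.Normal('mu', 0., 1., shape=3)
    pm.Normal('obs', mu, 1., observed=np.random.randn(100, 3))
    # full-rank Gaussian approximation, i.e. Cholesky-parameterized covariance
    approx = pm.fit(method='fullrank_advi', n=5000)
```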
What is the connection between low-rank (structured) approximations and Householder flows?
I'm still a bit stuck on the architecture choice. I want to make more assumptions about local vars. They are the following:

How do you feel about it?
I am still going through the paper...
@ferrine Does the second choice mean that the first dimension is the minibatch size? I don't understand the first choice.
That's right. The first one means the user should have only one subsampling dimension there.
I can't come up with any ideas about when "one flexible dim" is required. I think the 1st dim should always be the minibatch size for latent variables.
I feel like we're talking about the same thing. The 1st dim is for minibatches and nothing else.
It is flexible because it depends on the batch size.
@ferrine I don't think it is a good idea to rename it as
@aseyboldt Could you please review my last commit? I've decided to use dynamic class dispatching so a single implementation can serve for
@junpenglao I see your point and had the same concerns. I'm thinking about a better name for Group. Maybe {Group|Local|Global}Approx?
I feel like it is a good direction, but I still have a bad design. After calling `__new__` I'll have a Local or Global group, but not an approximation instance. Thus, calling `__init__` will give invalid results.
I have 2 solutions:

```python
# metaclass' call
def __call__(cls, *args, **kwargs):
    is_local = kwargs.get('local', False)
    if is_local:
        _cls = Local
    else:
        _cls = Global
    instance = object.__new__(_cls)
    cls.__init__(instance, *args, **kwargs)
    return instance
```

Which is better?
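For reference, a runnable, self-contained version of that dispatch could look as follows (`Group`, `Local`, and `Global` are stand-ins for the actual group classes; `pop` is used instead of `get` so `local` is not forwarded to `__init__`):

```python
class GroupMeta(type):
    # dispatch at instantiation time: Group(..., local=True) builds a Local,
    # anything else builds a Global
    def __call__(cls, *args, **kwargs):
        is_local = kwargs.pop('local', False)
        _cls = Local if is_local else Global
        instance = object.__new__(_cls)
        _cls.__init__(instance, *args, **kwargs)
        return instance


class Group(metaclass=GroupMeta):
    def __init__(self, group=None):
        self.group = group


class Local(Group):
    pass


class Global(Group):
    pass


assert isinstance(Group(local=True), Local)
assert isinstance(Group(), Global)
```

Note that `Local()` and `Global()` themselves also go through the same `__call__`, so direct instantiation still dispatches on the `local` flag.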
I think the first one will be better.
pymc3/math.py
Outdated
```python
def make_node(self, *matrices):
    if not matrices:
        raise ValueError('Got no matrices to allocate')
```
Just “No matrices to allocate” is more appropriate.
```python
def get_transformed(z):
    if hasattr(z, 'transformed'):
        z = z.transformed
    return z
```
Need newline
Done
Do we need an example or a notebook on how approximation groups are used? It won't be entirely clear to users.
I do not yet have a good example for that. It can be done after special features like Gumbel-Softmax.
@taku-y do you have any further comments? Otherwise, this is ready to merge!
@ferrine @junpenglao I'm sorry for not responding. The code and documentation look good to me. Though I didn't run the notebooks, I believe it's fine to merge.
Thanks @ferrine, this is a massive piece of work and adds amazing new functionality. Thanks also to all the reviewers!
```python
try:
    approx = Approximation([cls([three_var_model.one], batched=True, **kw),
                            Group(None, vfam='mf')])
    inference = pm.KLqp(approx)
    approx = inference.fit(3, obj_n_mc=2)
```
I see a typo here: `batched` -> `rowwise`
PR?
Some readings:
Discussions: