Skip to content

Add random kwarg to DensityDist #2106#2805

Merged
twiecki merged 4 commits intopymc-devs:masterfrom
Vaibhavdixit02:editbranch1
Feb 5, 2018
Merged

Add random kwarg to DensityDist #2106#2805
twiecki merged 4 commits intopymc-devs:masterfrom
Vaibhavdixit02:editbranch1

Conversation

@Vaibhavdixit02
Copy link
Contributor

As per the discussion with @twiecki in #2106 I have implemented the random method for DensityDist, this is my first PR in PyMC3 so any feedbacks are very welcome. 😄

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would move this to before *args to not mess with the order.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think this works, you need to call self.rand(*args, **kwargs) and add those as well to the method call signature.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah that makes more sense. 👍

@twiecki
Copy link
Member

twiecki commented Jan 18, 2018

That's a good start, definitely needs a test.

@Vaibhavdixit02
Copy link
Contributor Author

Which file would be appropriate for its test? I am confused between test_distributions.py and test_distributions_random.py

@twiecki
Copy link
Member

twiecki commented Jan 18, 2018

I would add a test to https://github.com/pymc-devs/pymc3/blob/718dab9ded5bf99d2fcfcd4ec8e2cf9769bb216a/pymc3/tests/test_distributions_random.py where you create a DensityDist for the Normal distribution (so the random function is trivial) and test it like we do the Normal random method.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

PEP8 requires spaces after arguments

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

same

@Vaibhavdixit02
Copy link
Contributor Author

Vaibhavdixit02 commented Jan 19, 2018

@twiecki Could you please provide the link to the gist or line number of the test for Normal random method you mentioned, I couldn't find it?

@junpenglao junpenglao added this to the 3.4 milestone Jan 22, 2018
@Vaibhavdixit02
Copy link
Contributor Author

Vaibhavdixit02 commented Jan 22, 2018

def test_densitydist(self):
        def ref_rand(size, mu, sd):
            return st.norm.rvs(size=size, loc=mu, scale=sd)
        normal_dist = pm.Normal.dist()
        pymc3_random(pm.DensityDist, {'logp':normal_dist.logp, 'random':normal_dist.random}, ref_rand=ref_rand)

This is the test I came up with, but it is throwing error when I run it locally. The error I got is,

distfam = <class 'pymc3.distributions.distribution.DensityDist'>, valuedomain = <pymc3.tests.test_distributions.Domain object at 0x7f4781f738d0>
vardomains = {'logp': <bound method Normal.logp of <pymc3.distributions.continuous.Normal object at 0x7f47125d3790>>, 'random': <bound method Normal.random of <pymc3.distributions.continuous.Normal object at 0x7f47125d3790>>}
extra_args = {}

    def build_model(distfam, valuedomain, vardomains, extra_args=None):
        if extra_args is None:
            extra_args = {}
        with Model() as m:
            vals = {}
            for v, dom in vardomains.items():
               vals[v] = Flat(v, dtype=dom.dtype, shape=dom.shape,
                               testval=dom.vals[0])
               AttributeError: 'function' object has no attribute 'dtype'

test_distributions.py:149: AttributeError

The problem is in the passing of functions (for logp and random) as parameters in the dictionary, I can't figure out any workaround for this can you help me out on this @twiecki?

@twiecki
Copy link
Member

twiecki commented Jan 23, 2018

This is a bit more tricky than I thought. This is how far I got:

    def custom_random(self, point=None, size=None, repeat=None):
        mu, tau, _ = draw_values([self.mu, self.tau, self.sd],
                                 point=point)
        return generate_samples(stats.norm.rvs, loc=mu, scale=tau**-0.5,
                                dist_shape=self.shape,
                                size=size)

    def test_density_dist(self):
        def ref_rand(size, mu, sd):
            return st.norm.rvs(size=size, loc=mu, scale=sd)
        def create_custom_dens(name, mu=0, sd=1, shape=None, transform=None):
            return pm.DensityDist(name, lambda value: np.log(st.norm(loc=mu, scale=sd).pdf(value)), #logp
                random=custom_random)
        pymc3_random(create_custom_dens, {'mu': R, 'sd': Rplus}, ref_rand=ref_rand)

A couple of notes:

Hopefully that gets us a step closer.

@Vaibhavdixit02
Copy link
Contributor Author

Thank you for such a detailed reply, I will try what you have suggested and get back to you once I have some working version ready.

@Vaibhavdixit02
Copy link
Contributor Author

@twiecki are you familiar with any other similar test that could serve as a reference? My efforts have not lead to anything, I think looking at some example would be helpful.

@junpenglao
Copy link
Member

@Vaibhavdixit02 implementing test could be tricky, you might find this PR useful: #2443
(As you can see from the discussion, I got stuck implementing the test at some point as well. You can have a look at the commit history, which might help as well)

@Vaibhavdixit02
Copy link
Contributor Author

Along the lines of the test for LKJCorr I tried.

    def test_density_dist(self):
        def ref_rand(size, mu, sd):
            return st.norm.rvs(size=size, loc=mu, scale=sd)
        
        class TestDensityDist(pm.DensityDist):

            def __init__(self, **kwargs):
                norm_dist = pm.Normal.dist()
                super(TestDensityDist, self).__init__('normal', logp=norm_dist.logp, random=norm_dist.random)

        pymc3_random(TestDensityDist, {},ref_rand=ref_rand)

But I am getting

    def __init__(self, **kwargs):
        norm_dist = pm.Normal.dist()
>       super(TestDensityDist, self).__init__('normal', logp=norm_dist.logp, random=norm_dist.random)
E       TypeError: __init__() got multiple values for keyword argument 'logp'

test_distributions_random.py:684: TypeError

Can't figure out why it says multiple arguments for logp

@twiecki
Copy link
Member

twiecki commented Jan 30, 2018

Because the first argument is already the logp (https://github.com/pymc-devs/pymc3/pull/2805/files#diff-eb25725602c01ec0a384e4b6d3946331R181). You can just drop the 'normal' arg.

@Vaibhavdixit02
Copy link
Contributor Author

This test was passing locally for me, please tell me if there is some conceptual error. 🙂

@twiecki
Copy link
Member

twiecki commented Jan 30, 2018

That looks great! Would it be easy from here to do the same put calling pm.DensityDist() directly?

@Vaibhavdixit02
Copy link
Contributor Author

Vaibhavdixit02 commented Jan 30, 2018

The problem with pm.DensityDist() is that without a model instance calling it is not possible (like all other distributions) to instantiate it and my attempts of passing it without instantiating it failed as we need to pass the logp and random from the pm.Normal.dist() object.

@twiecki
Copy link
Member

twiecki commented Jan 30, 2018

Oh it needs a model object on the stack? I suppose we could write a separate test that doesn't use the pymc3_random helper function and only tests that sampling works at all (i.e. not check the resulting distribution which is covered by this test). Sorry to be dense on this but I think we really should have a test in here that tests how a user would use it.

@Vaibhavdixit02
Copy link
Contributor Author

I think that would be the simplest and most straightforward approach, to create a separate model instance for DensityDist I am not sure how checking if the sampling is working fine would be implemented but over all this is the simpler approach.

@twiecki
Copy link
Member

twiecki commented Feb 1, 2018

@Vaibhavdixit02 The test I'm imagining would not test that sampling is valid. It would literarily just be a function that creates a model context, creates a DensityDist with random, and then does ppc and makes sure there is no exception and the len(ppc_samples) > 0. Correctness is already established by the test you have now.

@Vaibhavdixit02
Copy link
Contributor Author

Okay sure. I think we could use the normal logp and random method in this test too?

@twiecki
Copy link
Member

twiecki commented Feb 2, 2018 via email

@Vaibhavdixit02
Copy link
Contributor Author

@twiecki have added that test. 🙂

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Instead of returning we want to something like here: https://github.com/Vaibhavdixit02/pymc3/blob/216da86a6c8f9ae6516d133bc198ac45d678e716/pymc3/tests/test_distributions_random.py#L73 npt.assert_true(len(ppc) > 0, 'length of ppc sample is zero'). Check the np.testing submodule and other parts of the code for various usage patterns.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks great. Does it work with a scipy distribution inside a lambda as well?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I didn't quite get that, can you elaborate on that a little?
My interpretation is something like

def mynormal_logp():
	norm_dist_logp = st.norm.logpdf
	norm_dist_random = np.random.normal
	density_dist = pm.DensityDist('density_dist', norm_dist_logp, random=norm_dist_random)

is that what you meant?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Exactly.

Copy link
Contributor Author

@Vaibhavdixit02 Vaibhavdixit02 Feb 2, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'll test this locally and check! BTW does PyMC3 handle scipy distributions in general?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

looks like a bad merge.

@twiecki
Copy link
Member

twiecki commented Feb 2, 2018

This is shaping up nicely. See minor comments to resolve as well as the bad merge. Also, please add this (with credit) to the release-notes.

@Vaibhavdixit02
Copy link
Contributor Author

@twiecki added the test with scipy's normal distribution, the test passed locally but I am not very sure if it is correct, please provide feedback if it needs to be changed.

@twiecki
Copy link
Member

twiecki commented Feb 4, 2018

This looks great, thanks for figuring this out!

@twiecki
Copy link
Member

twiecki commented Feb 4, 2018

Only need to add this to the release notes.


try:
ppc = pm.sample_ppc(trace, samples=500, model=model, size=100)
if len(ppc) > 0:
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just need to check if len(ppc) == 0

normal_dist = pm.Normal.dist()
density_dist = pm.DensityDist('density_dist', normal_dist.logp, random=normal_dist.random)
step = pm.Metropolis()
trace = pm.sample(5000, step)
Copy link
Member

@twiecki twiecki Feb 4, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can do fewer steps here, e.g. 100 with tuning=0.

@Vaibhavdixit02
Copy link
Contributor Author

Should I add it to the release notes or will the maintainers do that?

@twiecki
Copy link
Member

twiecki commented Feb 4, 2018 via email

@twiecki twiecki merged commit 760fba4 into pymc-devs:master Feb 5, 2018
@twiecki
Copy link
Member

twiecki commented Feb 5, 2018

Congrats @Vaibhavdixit02!

@Vaibhavdixit02
Copy link
Contributor Author

Yay! 😄

"""Distribution based on a given log density function."""

def __init__(self, logp, shape=(), dtype=None, testval=0, *args, **kwargs):
def __init__(self, logp, shape=(), dtype=None, testval=0, random=None, *args, **kwargs):
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should add this to the doc-string.

with model:
norm_dist_logp = st.norm.logpdf
norm_dist_random = np.random.normal
density_dist = pm.DensityDist('density_dist', normal_dist_logp, random=normal_dist_random)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I just realized, I think we need to turn this into an observed to actually get sampling from this.


try:
ppc = pm.sample_ppc(trace, samples=500, model=model, size=100)
if len(ppc) == 0:
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can remove this line.

@twiecki
Copy link
Member

twiecki commented Feb 5, 2018

@Vaibhavdixit02 Sorry, I realized two more things here (see new comments). Could you also address those in a separate PR?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants