
Always add values from drawn dict to givens #3792

Closed
wants to merge 2 commits into master from iss3789

Conversation

@lucianopaz (Contributor)

This closes #3789

The problem was related to an old patch (b9f960a) that didn't actually add all the drawn values from nested _DrawValuesContext instances into the givens dictionary in draw_values. This PR makes sure to always add the necessary values from drawn into givens.

@rpgoldman (Contributor)

Thanks @lucianopaz, I'll have a look at it as soon as I can grab a minute. I'll see what's going on with the checks and will also move my bug issue into the tests.

@codecov (bot)

codecov bot commented Jan 29, 2020

Codecov Report

Merging #3792 (edbafaa) into master (68d5201) will decrease coverage by 0.77%.
The diff coverage is 42.94%.


```diff
@@            Coverage Diff             @@
##           master    #3792      +/-   ##
==========================================
- Coverage   88.19%   87.42%   -0.78%
==========================================
  Files          89       88       -1
  Lines       14358    14342      -16
==========================================
- Hits        12663    12538     -125
- Misses       1695     1804     +109
```
| Impacted Files | Coverage Δ |
|---|---|
| pymc3/distributions/bart.py | 80.80% <0.00%> (ø) |
| pymc3/sampling_jax.py | 0.00% <0.00%> (ø) |
| pymc3/model.py | 88.88% <81.81%> (-0.48%) ⬇️ |
| pymc3/distributions/multivariate.py | 81.17% <85.71%> (+0.04%) ⬆️ |
| pymc3/distributions/distribution.py | 94.31% <87.50%> (-0.12%) ⬇️ |
| pymc3/backends/base.py | 86.92% <100.00%> (ø) |
| pymc3/distributions/bound.py | 92.36% <100.00%> (ø) |
| pymc3/distributions/simulator.py | 82.19% <100.00%> (ø) |
| pymc3/model_graph.py | 88.57% <100.00%> (+0.16%) ⬆️ |
| pymc3/sampling.py | 85.50% <100.00%> (-0.99%) ⬇️ |
... and 8 more

@rpgoldman (Contributor)

Rebased onto master to remove the merge conflict.

@rpgoldman (Contributor)

I added a test for this issue to test_sampling.py. I now get an error that is different from the original one -- it's ValueError: probabilities do not sum to 1, but this still seems to be broken.

@rpgoldman (Contributor)

I have checked and the probabilities sum to 0.999944722004737. It looks like numpy.random.choice() is insufficiently tolerant of floating point error for our purposes.

This is in some sense a new issue in pymc3.distributions.dist_math.random_choice(). On line 352 of this file, we need to renormalize the p_ vectors to remove floating point errors. But I hate to quash errors that are real.
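The strictness can be reproduced with plain numpy (the values below are illustrative, chosen to match the sum reported above; numpy's internal tolerance is on the order of the square root of machine epsilon, far tighter than this deviation):

```python
import numpy as np

# Probabilities that are valid up to floating point error.
p = np.array([0.3, 0.3, 0.399944722004737])  # sums to ~0.999944722

try:
    np.random.choice(3, p=p)  # deviation of ~5.5e-5 exceeds numpy's tolerance
except ValueError as err:
    print(err)  # probabilities do not sum to 1

# Renormalizing removes the spurious error:
draw = np.random.choice(3, p=p / p.sum())
```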

Possible approaches:

  1. Check to make sure that each p_ vector is "close enough" to 1. If not, raise the ValueError. Picking an epsilon for this makes me quite uncomfortable. I have no idea how to pick a reasonable one. 10E-4 would make this problem go away. Is that good enough?
  2. Renormalize p_ to avoid floating point error. Again, I'm not good at numerical algorithms, so I don't have an intuition about how this is to be done. Could we turn these numbers into rationals?

Doing this for all the p_ vectors would be quite expensive. So would it be reasonable to only do this if we get a ValueError, otherwise keep things as they are?
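A sketch combining both options — check the deviation against a tolerance, then renormalize — might look like this (`safe_random_choice` and the `1e-4` tolerance are illustrative choices, not pymc3 API):

```python
import numpy as np

def safe_random_choice(k, p, atol=1e-4, rng=None):
    """Draw from np.arange(k) with probabilities p, renormalizing p to
    absorb floating point error, but raising if the deviation from 1 is
    larger than plausible rounding noise."""
    rng = np.random.default_rng() if rng is None else rng
    p = np.asarray(p, dtype=np.float64)
    total = p.sum()
    if abs(total - 1.0) > atol:
        # The probabilities are genuinely wrong, not just rounded.
        raise ValueError(f"probabilities sum to {total}, not 1")
    return rng.choice(k, p=p / total)
```

This only pays the extra sum and division per call, rather than renormalizing every p_ vector up front, while still surfacing genuinely malformed probability vectors.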

@lucianopaz (Contributor, Author)

Hi @rpgoldman, I wasn't able to write down earlier what I investigated regarding the failing test.

  • The ValueError: probabilities do not sum to 1 is a floating point error in the definition of the stick_breaking function in cell 19 of the notebook. You can fix it by changing it to:
```python
def stick_breaking(beta):
    portion_remaining = tt.concatenate([[1], tt.extra_ops.cumprod(1 - beta)[:-1]])
    result = beta * portion_remaining
    return result / tt.sum(result, axis=-1, keepdims=True)
```
  • The real problem is the other failing test: test_mixture.py::TestMixture::test_sample_prior_and_posterior

It's quite nasty and will take some time to fix:

Background

All of our troubles with the Mixture distribution come from the fact that our distributions can't infer their shape from their parameters. This means that they cannot know a priori what the shape of a call to random(point, size) will be in a hierarchical model. Mixtures are worse than regular distributions because they have to wrap the random method of their component distributions and somehow reconcile the provided size, the component distribution's shape, and the mixture distribution's shape.

The problem

Mixtures can have observations, and the component distributions aren't aware of that. To work around this, we try to compute the total number of calls that we should make to the comp_dist.random method, and then call comp_dist.random with this number as size. This means that within the forward pass through the model that draw_values makes, we change the size parameter.

The problem is that the broadcasting functions in shape_utils rely on the size parameter to broadcast samples drawn from many distributions in a hierarchical model without failing because of the size-prepended axes. By changing size midway through draw_values, the broadcasting functions start to fail, and everything falls apart.
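The reliance on a consistent size prefix can be illustrated with plain numpy (an illustrative sketch, not shape_utils itself):

```python
import numpy as np

size = (4,)                # the requested number of draws
a = np.ones(size + (3,))   # draws from a vector-valued variable: (4, 3)
b = np.ones(size)          # draws from a scalar parent: (4,)

# Broadcasting succeeds because both arrays share the same size prefix:
out = a * b[..., None]
assert out.shape == (4, 3)

# If one random() call silently used a different size, say (8,),
# the prefixes would no longer line up and broadcasting would fail.
```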

How can we fix this?

We will have to restructure the Mixture.random method. The only safe thing to do is to never change size, and to write a for loop that makes the extra draws we were trying to vectorize. This will hurt our runtimes a bit, but it is the only safe course of action.
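The for-loop restructuring described above could be sketched as follows (a standalone illustration, not the actual pymc3 Mixture.random implementation; `draw_mixture` and `comp_draw_fns` are hypothetical names):

```python
import numpy as np

def draw_mixture(comp_draw_fns, weights, size, rng=None):
    """Draw `size` samples from a mixture without ever changing the
    size argument seen by the component samplers: pick a component
    index per draw, then loop to collect the draws one at a time."""
    rng = np.random.default_rng() if rng is None else rng
    weights = np.asarray(weights, dtype=float)
    weights = weights / weights.sum()
    # One component index per requested draw.
    which = rng.choice(len(comp_draw_fns), size=size, p=weights)
    # A plain for loop instead of one vectorized call with a modified
    # size: slower, but each component sampler is called as-is.
    return np.stack([comp_draw_fns[k]() for k in which])
```

The loop trades the vectorized speed of a single resized call for predictable shapes: size never changes, so downstream broadcasting sees exactly what it expects.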

@rpgoldman (Contributor)

Fixed that test. Thanks for providing the fix to the stick_breaking function.

@twiecki (Member)

twiecki commented Jul 27, 2020

What's the status of this @rpgoldman, can we merge?

@fonnesbeck (Member)

Looks like this needs a rebase, and test_sample_prior_and_posterior appears to be failing. @lucianopaz

@michaelosthege michaelosthege added this to the 3.10 milestone Nov 14, 2020
@Spaak (Member)

Spaak commented Nov 23, 2020

@rpgoldman @lucianopaz What's the status of this? The PR is marked to go into release 3.10, but it seems like a rebase is needed before this can be merged.

@rpgoldman (Contributor)

rpgoldman commented Nov 23, 2020

OK, I have rebased this. I'm not at all sure what the status is, since the patch was by @lucianopaz, not me -- I just did a little tweaking. Let's see what happens on the tests after the rebase.

RELEASE-NOTES.md — review comment (outdated, resolved)
@Spaak Spaak force-pushed the iss3789 branch 2 times, most recently from 45c316d to 0b65a2c Compare November 27, 2020 13:47
@Spaak (Member)

Spaak commented Nov 27, 2020

@lucianopaz earlier there were some trivially failing tests due to missing imports (I think related to the rebase by @rpgoldman), but I fixed those. Now the tests appear to be failing in a more relevant manner. It would be great if you could have a look so that we can get this into 3.10 :)

@lucianopaz (Contributor, Author)

Thanks, @Spaak, but I won't be able to look into this before next year. From what I remember, there was a deeper problem with the mixture distribution that had to be fixed to get this to work, and I won't be able to fix it with sporadic work.

@twiecki (Member)

twiecki commented Nov 27, 2020

Let's hold off on this then.

@twiecki twiecki removed this from the 3.10 milestone Nov 27, 2020
@Spaak (Member)

Spaak commented Nov 27, 2020

Thanks for the quick reply, I'll remove this from 3.10 then.

@michaelosthege michaelosthege marked this pull request as draft March 27, 2021 12:29
@michaelosthege (Member)

@lucianopaz @rpgoldman since the entire draw_values function has been replaced, this PR remains relevant only for v3.
Should we close it, or do you want to retarget/rebase it for v3?

@michaelosthege (Member)

@lucianopaz @rpgoldman I squashed and rebased this branch. If this bugfix is still relevant, this might be the chance to get it into the 3.11.3 release?

@michaelosthege michaelosthege added this to the vNext (3.11.3) milestone Jun 21, 2021
@rpgoldman (Contributor)

I'm afraid I really don't know: the whole posterior predictive code base has changed. Probably this PR cannot be saved, but @lucianopaz should have the final word.

@michaelosthege (Member)

Excluding the tests that failed because of changes in scipy's chisquare API, there are just two failures related to the changes made by this PR:

  1. numpy.core._exceptions._ArrayMemoryError: Unable to allocate 15.2 GiB for an array with shape (136000, 500, 30) and data type float64 from pymc3/tests/test_sampling.py::test_prior_sampling_mixture
  2. assert (20, 3) == (20, 100, 3) from pymc3/tests/test_mixture.py::TestMixture::test_sample_prior_and_posterior

The second one is probably easy to fix, but the first one may be a systematic problem. @lucianopaz @rpgoldman fix and merge or close?

lucianopaz and others added 2 commits August 15, 2021 12:32
Co-authored-by: Eelke Spaak <[email protected]>
Co-authored-by: Robert P. Goldman <[email protected]>
Co-authored-by: Michael Osthege <[email protected]>
@michaelosthege (Member)

Rebased, so the remaining CI failures are directly related to these changes. @lucianopaz can you make a decision to close or fix&merge this PR?

@michaelosthege (Member)

I'm closing this. If anyone has an interest in it please fix the tests and reopen.


Successfully merging this pull request may close these issues.

Prior predictive sampling does not work with Dirichlet Process example