[WIP] Add Dirichlet-multinomial distribution. #3639

bsmith89 · 2019-10-01T23:19:14Z

Dirichlet multinomial distribution.

The self.random implementation is non-standard, not well tested, and may be broken; the documentation is also lacking. The log-likelihood, however, has been working for me for a few years without obvious issues.
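
For readers who want the log-likelihood spelled out, here is a minimal sketch of a Dirichlet-multinomial log-pmf in NumPy/SciPy. It is not the code from this PR: the name `dm_logp` and its signature are assumptions made for illustration, and it does not check that `x` sums to `n`.

```python
import numpy as np
from scipy.special import gammaln


def dm_logp(x, n, alpha):
    """Dirichlet-multinomial log-pmf over the last axis (illustrative helper).

    Assumes x is non-negative, sums to n along the last axis, and alpha > 0.
    """
    x = np.asarray(x, dtype=float)
    n = np.asarray(n, dtype=float)
    alpha = np.asarray(alpha, dtype=float)
    sum_alpha = alpha.sum(axis=-1)
    # Terms that depend only on n and the total concentration.
    const = gammaln(n + 1) + gammaln(sum_alpha) - gammaln(n + sum_alpha)
    # Per-category terms, summed over the category axis.
    terms = gammaln(x + alpha) - gammaln(alpha) - gammaln(x + 1)
    return const + terms.sum(axis=-1)


# With n = 1 the only outcomes are the one-hot vectors, so their
# probabilities should sum to one.
alpha = np.array([0.5, 2.0, 3.5])
total = np.exp(dm_logp(np.eye(3), 1, alpha)).sum()
```

With `n = 1` each one-hot outcome has probability `alpha[k] / alpha.sum()`, which gives a quick sanity check on the normalization.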

twiecki · 2019-10-02T04:08:30Z

How can this be tested?

junpenglao · 2019-10-02T10:47:59Z

Need test +1. I think we can follow the test for BetaBinomial here.

junpenglao · 2019-10-02T10:50:42Z

See:
https://github.com/pymc-devs/pymc3/blob/e81df2d19ddc4066648b4b2dfc72431c6824f96f/pymc3/tests/test_distributions_random.py#L390-L392
https://github.com/pymc-devs/pymc3/blob/e81df2d19ddc4066648b4b2dfc72431c6824f96f/pymc3/tests/test_distributions_random.py#L604-L606
https://github.com/pymc-devs/pymc3/blob/e81df2d19ddc4066648b4b2dfc72431c6824f96f/pymc3/tests/test_distributions.py#L717-L718
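
One way to lean on BetaBinomial is that a Dirichlet-multinomial with two categories reduces to a beta-binomial. The sketch below checks that identity outside of PyMC3, using `scipy.stats.betabinom` as the reference; `dm_logp` is a hypothetical helper written for this example, not the PR's `logp`.

```python
import numpy as np
from scipy.special import gammaln
from scipy.stats import betabinom


def dm_logp(x, n, alpha):
    """Dirichlet-multinomial log-pmf over the last axis (illustrative helper)."""
    x = np.asarray(x, dtype=float)
    alpha = np.asarray(alpha, dtype=float)
    sum_alpha = alpha.sum(axis=-1)
    const = gammaln(n + 1) + gammaln(sum_alpha) - gammaln(n + sum_alpha)
    terms = gammaln(x + alpha) - gammaln(alpha) - gammaln(x + 1)
    return const + terms.sum(axis=-1)


n, a, b = 10, 1.5, 4.0
k = np.arange(n + 1)                # successes: 0..n
x = np.stack([k, n - k], axis=-1)   # two-category count vectors, shape (n + 1, 2)

dm = dm_logp(x, n, np.array([a, b]))
bb = betabinom.logpmf(k, n, a, b)
max_abs_diff = np.max(np.abs(dm - bb))
```

Comparing with a tolerance rather than exact equality keeps the check robust to floating-point differences between the two formulas.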

ColCarroll · 2019-11-29T15:22:59Z

We should probably fix this, but you'll also have to add a line in pymc3/distributions/__init__.py to import this...

rpgoldman · 2019-11-29T16:35:21Z

pymc3/distributions/multivariate.py

+
+    Parameters
+    ----------
+    alpha : one- or two-dimensional array


As far as I can tell, Sphinx won't format these correctly with the colon set apart by spaces. I think you need to change these to, for example, `alpha: one- or two-dimensional array`, or the online docs will come out wrong.

I'm matching the style found in all of the other docstrings for distribution classes. Is this wrong?

I agree that all the other docs in this file do this. I think it is OK to leave it as is, and we can file an issue to fix them all in one go.

(@rpgoldman if that's ok with you)
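
For reference, the numpydoc style guide writes each parameter as `name : type`, with a space on both sides of the colon, which is the layout the existing docstrings follow. A small sketch, assuming the docs are rendered with a numpydoc-style extension; the class name and parameter descriptions are placeholders, not the PR's docstring:

```python
class DirichletMultinomialSketch:
    """Hypothetical stand-in class, used only to show the docstring layout.

    Parameters
    ----------
    n : int or array
        Total counts in each replicate.
    alpha : one- or two-dimensional array
        Dirichlet concentration parameters; the last axis indexes categories.
    """


# Collect the "name : type" lines to show what a docs parser would key on.
param_lines = [
    line.strip()
    for line in DirichletMultinomialSketch.__doc__.splitlines()
    if " : " in line
]
```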

rpgoldman · 2019-11-29T16:37:19Z

We should probably fix this, but you'll also have to add a line in pymc3/distributions/__init__.py to import this...

Why do we have the assignment to __all__ in this file, if we aren't going to simply do from .multivariate import * in distributions/__init__.py?

ColCarroll · 2019-12-06T22:50:38Z

@rpgoldman -- that's exactly what we should do, but it is not in the scope of this PR, I think.
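
The exchange above turns on how a module-level `__all__` list interacts with a star import. A self-contained sketch of that behavior, using a throwaway in-memory module (`fake_multivariate` and its classes are invented for the example and stand in for the real file):

```python
import sys
import types

# Build a throwaway module that plays the role of multivariate.py.
fake = types.ModuleType("fake_multivariate")
exec(
    "__all__ = ['DirichletMultinomial']\n"
    "class DirichletMultinomial: pass\n"
    "class NotExported: pass\n",
    fake.__dict__,
)
sys.modules["fake_multivariate"] = fake

# A star import copies only the names listed in __all__.
namespace = {}
exec("from fake_multivariate import *", namespace)
exported = sorted(name for name in namespace if not name.startswith("__"))
```

Here `exported` contains only `DirichletMultinomial`; `NotExported` stays reachable as `fake_multivariate.NotExported` but is not pulled in by the star import.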

bsmith89 · 2019-12-07T00:35:26Z

Should I rebase this branch with master as I develop the tests, and, if so, how do I then update the PR?

ColCarroll · 2019-12-07T01:11:08Z

You don't have to, but it reduces the chance of merge conflicts.

Yeah, if you rebase off master and push to your branch, it will update the PR, and run the test suite on CI.

Please do ping when it is ready for another review!


codecov · 2019-12-07T03:14:42Z

Codecov Report

Merging #3639 into master will decrease coverage by 0.08%.
The diff coverage is 21.42%.

@@            Coverage Diff             @@
##           master    #3639      +/-   ##
==========================================
- Coverage   89.93%   89.84%   -0.09%     
==========================================
  Files         134      134              
  Lines       20429    20458      +29     
==========================================
+ Hits        18373    18381       +8     
- Misses       2056     2077      +21

| Impacted Files | Coverage Δ | |
|---|---|---|
| pymc3/distributions/__init__.py | `100% <100%> (ø)` | ⬆️ |
| pymc3/distributions/multivariate.py | `76.78% <18.51%> (-2.16%)` | ⬇️ |
| pymc3/tuning/starting.py | `77.86% <0%> (-1.3%)` | ⬇️ |
| pymc3/variational/inference.py | `79.54% <0%> (-0.11%)` | ⬇️ |
| pymc3/tests/test_step.py | `100% <0%> (ø)` | ⬆️ |
| pymc3/tests/test_tuning.py | `100% <0%> (ø)` | ⬆️ |
| pymc3/smc/smc.py | `92.6% <0%> (ø)` | ⬆️ |
| pymc3/sampling.py | `83.09% <0%> (+0.08%)` | ⬆️ |
| pymc3/parallel_sampling.py | `86.71% <0%> (+0.09%)` | ⬆️ |
| pymc3/step_methods/arraystep.py | `94.32% <0%> (+0.7%)` | ⬆️ |

... and 1 more
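
The headline percentages in the report can be recomputed from the Hits and Lines rows. A small sketch of that arithmetic; the helper name is made up for the example, and how Codecov rounds the displayed values is not something this snippet tries to reproduce:

```python
def coverage_pct(hits, lines):
    """Coverage as a percentage of executable lines (illustrative helper)."""
    return 100.0 * hits / lines


# Numbers copied from the Coverage Diff table in the report.
before = coverage_pct(18373, 20429)   # master
after = coverage_pct(18381, 20458)    # this PR
delta = after - before

# Hits and Misses should add up to Lines on each side.
rows_consistent = (18373 + 2056 == 20429) and (18381 + 2077 == 20458)
```

The unrounded change lies between -0.09 and -0.08 percentage points, which is consistent with both figures quoted in the report.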


bsmith89 · 2019-12-16T23:30:46Z

Just pushed some (relatively messy) commits that include at least basic tests.

(@ColCarroll and others)

Right now I can see a few reasons this PR is still a WIP, but I'd welcome comments.

* This implementation of the DM distribution only handles a 1d vector for n and a 2d array for alpha. Some users may want to specify a scalar n or a higher number of dimensions. I'm having trouble figuring out how best to make it polymorphic over the possible dimensions of these parameters.
* This implementation seems to require that the shape be passed explicitly to __init__, even if it could be inferred from n.shape and alpha.shape. It's not clear to me what the best practices are for passing the inferred shape to super().__init__.
* Tests are not as extensive as for other distributions.
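
One possible way to approach the shape questions in the list above is to treat the last axis of alpha as the category axis and broadcast n against the remaining batch axes. This is only a sketch under that assumption: `broadcast_dm_params` is a hypothetical helper, it is not how this PR or Multinomial handles shapes, and `np.broadcast_shapes` requires NumPy 1.20 or later.

```python
import numpy as np


def broadcast_dm_params(n, alpha):
    """Broadcast n against the batch axes of alpha (hypothetical helper).

    Returns (n_b, alpha_b), where alpha_b.shape is the shape a draw would
    have and n_b.shape is that shape without the trailing category axis.
    """
    alpha = np.asarray(alpha, dtype=float)
    n = np.asarray(n)
    # Every axis of alpha except the last is a batch axis.
    batch_shape = np.broadcast_shapes(n.shape, alpha.shape[:-1])
    n_b = np.broadcast_to(n, batch_shape)
    alpha_b = np.broadcast_to(alpha, batch_shape + alpha.shape[-1:])
    return n_b, alpha_b


scalar_case = broadcast_dm_params(5, [1.0, 2.0, 3.0])                # scalar n, 1d alpha
batch_case = broadcast_dm_params([5, 10, 20, 40], np.ones((4, 3)))   # 1d n, 2d alpha
mixed_case = broadcast_dm_params(5, np.ones((4, 3)))                 # scalar n, 2d alpha
```

Under this convention the inferred shape is simply `alpha_b.shape`, so a default could be derived from the parameters when the caller does not pass one.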

Nonetheless, this implementation seems to do what it says on the label. I'd welcome feedback, but may not have time in the next month or two to implement the fancy, shape-handling logic that I see for e.g. the Multinomial distribution.

Happy to squash and clean up some commits if that's wanted.

twiecki · 2020-07-27T09:38:21Z

This looks pretty good to merge, what do you think @ColCarroll?

AlexAndorra · 2020-09-19T17:44:14Z

This would still be a nice addition IMO! What should we do to push it over the finish line?
Just rebase on master and let CI run, or are there still some important features to add?


twiecki · 2020-12-23T15:36:55Z

Closing in favor of #4373.


@AlexAndorra

* Add implementation of DM distribution.
* Fix class name mistake.
* Add DM dist to exported multivariate distributions.
* Export DirichletMultinomial in pymc3.distributions
  As suggested in #3639 (comment) Also see: #3639 (comment) but this seems to be part of a broader discussion.
* Attempt at matching Multinomial initialization.
* Add some simple tests for DM.
* Correctly deal with 1d n and 2d alpha.
* Fix typo in DM random.
* Fix faulty tests for DM.
* Drop redundant initialization test for DM.
* Add test that DM is normalized for n=1 case.
* Add DM test case based on BetaBinomial.
* Update pymc3/distributions/multivariate.py
* - Infer shape by default (copied code from Dirichlet Distribution)
  - Add default shape in `test_distributions_random.py`
* - Use size information in random method
  - Change random unittests
* - Restore merge accidental deletions
* - Underscore missing
* - More merge cleaning
* Bring DirichletMultinomial initialization into alignment with Multinomial.
* Align all DM tests with Multinomial.
* Align DirichletMultinomial random implementation with Multinomial.
* Match DM random method to Multinomial implementation.
* Change alpha -> a
  Remove _repr_latex_
* Run pre-commit
* Keep standard order of methods random and logp
* Update docstrings for valid input types.
  Progress on batch test.
* Add new test to ensure DM matches BetaBinom
* Change DM alpha -> a in docstrings.
* Test two additional parameterization shapes in `test_dirichlet_multinomial_random`.
* Revert debugging comments.
* Revert unrelated changes.
* Fix minor Black inconsistency.
* Drop no-longer-functional reshaping code.
* Assert shape of random samples is as expected.
* Explicitly test random sample shapes, including batch dimensions.
* Sort imports.
* Simplify _random
  It should be okay to not explicitly change the input dtype as in the multinomial, because the input to the np.random.dirichlet should be safe (it's fine to have float32 to float64 overflow from 1.00 to 1.01..., underflow from 0.01, to 0.0 would still be problematic, but we don't know if this is an issue yet...).
  The output of the numpy.random.dirichlet to numpy.random.multinomial should be safe since it is already in float64 by then.
  We still need to convert to the previous dtype, since numpy changes it by default.
  size_ argument was no longer being used.
* Reorder tests more logically
* Refactor tests
  Merged mode tests since shape must be given explicitly anyway
  Moved test_dirichlet_multinomial_random to test_distributions_random.py and renamed it to test_dirichlet_multinomial_shapes
* Require shape argument
  Also allow more forgiveness if user passes lists instead of arrays (WIP/suggestion only)
* Remove unused import `to_tuple`
* Simplify logic to handle list as input for `a`
* Raise ShapeError in random()
* Finish batch and repr unittests
* Add note about mode
* Tiny rewording
* Change mode to _defaultval
* Revert comment for Multinomial mode
* Update shape check logic
* Add DM to release notes.
* Minor docstring revisions as suggested by @AlexAndorra.
* Revise the revision.
* Add comment clarifying bounds checking in logp()
* Address review suggestions
* Update `matches_beta_binomial` to take into consideration float precision
* Add DM to multivariate distributions docs.

Co-authored-by: Byron Smith <[email protected]>
Co-authored-by: Colin <[email protected]>
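
The "Simplify _random" note in the commit message describes drawing from `np.random.dirichlet` and feeding the result to `np.random.multinomial`. A minimal sketch of that two-stage draw for the simplest case (scalar n, 1d concentration vector), using NumPy's `Generator` API; `dm_random` and its signature are assumptions for illustration and do not reproduce the merged `_random` method or its dtype handling:

```python
import numpy as np


def dm_random(n, a, size, rng):
    """Draw `size` Dirichlet-multinomial count vectors (illustrative helper).

    Assumes scalar n, a 1d concentration vector a with positive entries,
    and an integer size.
    """
    a = np.asarray(a)
    # Stage 1: category probabilities, drawn in float64.
    p = rng.dirichlet(a.astype(np.float64), size=size)   # shape (size, K)
    # Stage 2: counts given each probability vector.
    return np.array([rng.multinomial(n, p_i) for p_i in p])


rng = np.random.default_rng(0)
samples = dm_random(10, np.array([0.5, 2.0, 3.5], dtype=np.float32), size=5, rng=rng)
```

Every row of `samples` sums to `n`, and the array has shape `(size, K)`, which is the kind of property the shape tests listed above assert.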

twiecki closed this Oct 2, 2019

twiecki reopened this Oct 2, 2019

twiecki added the WIP label Oct 2, 2019

rpgoldman reviewed Nov 29, 2019


bsmith89 added a commit to bsmith89/pymc3 that referenced this pull request Dec 7, 2019

Export DirichletMultinomial in pymc3.distributions

e932ae6

As suggested in pymc-devs#3639 (comment) Also see: pymc-devs#3639 (comment) but this seems to be part of a broader discussion.

bsmith89 force-pushed the dirichlet-multinomial branch from 59f03e0 to e932ae6 Compare December 7, 2019 03:14

bsmith89 added 12 commits December 16, 2019 15:20

Add implementation of DM distribution.

c24214f

Fix class name mistake.

95f7d71

Add DM dist to exported multivariate distributions.

43b601f

Export DirichletMultinomial in pymc3.distributions

ac1983c

As suggested in pymc-devs#3639 (comment) Also see: pymc-devs#3639 (comment) but this seems to be part of a broader discussion.

Attempt at matching Multinomial initialization.

ed0c68f

Add some simple tests for DM.

08cdf09

Correctly deal with 1d n and 2d alpha.

c73580c

Fix typo in DM random.

26a4202

Fix faulty tests for DM.

b82be63

Drop redundant initialization test for DM.

d16677c

Add test that DM is normalized for n=1 case.

f41450d

Add DM test case based on BetaBinomial.

46ceefb

bsmith89 force-pushed the dirichlet-multinomial branch from e932ae6 to 46ceefb Compare December 16, 2019 23:20

Merge branch 'master' into dirichlet-multinomial

00baf4f

ColCarroll reviewed Sep 19, 2020

pymc3/distributions/multivariate.py (outdated, resolved)

Update pymc3/distributions/multivariate.py

6e92bf2

ricardoV94 mentioned this pull request Nov 16, 2020

Adding new notebook for conjugate sampling #4199

Merged

AlexAndorra added the enhancements and help wanted labels Nov 19, 2020

ricardoV94 pushed a commit to ricardoV94/pymc that referenced this pull request Dec 22, 2020

Export DirichletMultinomial in pymc3.distributions

7a642ae

As suggested in pymc-devs#3639 (comment) Also see: pymc-devs#3639 (comment) but this seems to be part of a broader discussion.

ricardoV94 mentioned this pull request Dec 22, 2020

Dirichlet multinomial (continued) #4373

Merged

15 tasks

twiecki closed this Dec 23, 2020

bsmith89 added a commit to bsmith89/pymc3 that referenced this pull request Dec 29, 2020

Export DirichletMultinomial in pymc3.distributions

2a63530

As suggested in pymc-devs#3639 (comment) Also see: pymc-devs#3639 (comment) but this seems to be part of a broader discussion.

ricardoV94 pushed a commit to ricardoV94/pymc that referenced this pull request Jan 4, 2021

Export DirichletMultinomial in pymc3.distributions

24d7ec8

As suggested in pymc-devs#3639 (comment) Also see: pymc-devs#3639 (comment) but this seems to be part of a broader discussion.
