Use distributions in ML backends for better Poisson approximation #268

Merged

Conversation

@matthewfeickert (Member) commented Sep 17, 2018

Description

Resolves #267

This essentially removes the use of a Normal approximation to the Poisson distribution (poisson_from_normal=True) and instead uses either a tensor library distribution or the approximation n! = Gamma(n+1), which puts the Poisson distribution's p.m.f. into a numerically stable continuous form.
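As a minimal sketch of the continuous approximation described above (this is illustrative, not the pyhf implementation; the function name is chosen here for clarity):

```python
import numpy as np
from scipy.special import gammaln
from scipy.stats import poisson

def poisson_logpmf(n, lam):
    # Continuous in n: the factorial is replaced via n! = Gamma(n + 1),
    # so the log p.m.f. is smooth and numerically stable in n
    return n * np.log(lam) - lam - gammaln(n + 1.0)

# Agrees with the exact Poisson log p.m.f. at integer counts
n, lam = np.arange(10.0), 4.2
assert np.allclose(poisson_logpmf(n, lam), poisson.logpmf(n, lam))
```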

This PR also eliminates some warnings that arise from files in the tests not getting properly closed. This came up when I was trying to debug memory issues in Travis.

autopep8 was applied in various places while I was working.

Checklist Before Requesting Approval

  • Tests are passing
  • "WIP" removed from the title of the pull request

@matthewfeickert added the feat/enhancement (New feature or request) label Sep 17, 2018
@matthewfeickert self-assigned this Sep 17, 2018
@lukasheinrich (Contributor)

@matthewfeickert something similar for pytorch?

@matthewfeickert (Member, Author)

> something similar for pytorch?

I'm doing them all. :) Just after dinner.

@lukasheinrich (Contributor)

awesome! ✨

@matthewfeickert (Member, Author) commented Sep 17, 2018

@lukasheinrich @kratsg So at the moment the backend tests are failing because of the level of agreement between answers:

Printing the values and the STD for test_pdf_eval and test_pdf_eval_2, we get

#test_pdf_eval

# NumPy, TensorFlow, PyTorch, MXNet
values: [-17.648827643136507, -17.64873504638672, -17.64873504638672, -17.64873504638672]
STD of values: 4.0095568812047764e-05
Relative to NumPy: [1.0, 0.9999947533767307, 0.9999947533767307, 0.9999947533767307]
Offset from NumPy: [0.0, 9.259674978778776e-05, 9.259674978778776e-05, 9.259674978778776e-05]
Relative to TensorFlow: [1.0000052466507965, 1.0, 1.0, 1.0]
Offset from TensorFlow: [-9.259674978778776e-05, 0.0, 0.0, 0.0]

and

#test_pdf_eval_2

# NumPy, TensorFlow, PyTorch, MXNet
values: [-23.579605171119738, -23.57958984375, -23.57958984375, -23.57958984375]
STD of values: 6.6369457830921696e-06
Relative to NumPy: [1.0, 0.9999993499734356, 0.9999993499734356, 0.9999993499734356]
Offset from NumPy: [0.0, 1.5327369737860863e-05, 1.5327369737860863e-05, 1.5327369737860863e-05]
Relative to TensorFlow: [1.000000650026987, 1.0, 1.0, 1.0]
Offset from TensorFlow: [-1.5327369737860863e-05, 0.0, 0.0, 0.0]

So I guess the question is: should we allow an STD of greater than 1e-5? Or is that too much?

The ML backends all agree with each other very well.
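For concreteness, a minimal sketch (not the actual pyhf test) of the kind of cross-backend tolerance check under discussion, using the 5e-5 tolerance floated in the reply below:

```python
import numpy as np
import pytest

# Log-pdf values reported above for test_pdf_eval
# (NumPy, TensorFlow, PyTorch, MXNet)
values = [-17.648827643136507, -17.64873504638672,
          -17.64873504638672, -17.64873504638672]

TOLERANCE = 5e-5

# Each ML backend must match NumPy within a relative tolerance
for value in values[1:]:
    assert value == pytest.approx(values[0], rel=TOLERANCE)

# Equivalently, bound the spread across all backends at once
assert np.std(values) < TOLERANCE
```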

@lukasheinrich (Contributor)

I think I'd be fine with that. Let's cut it as close to the vest as possible with a tol of 5e-5? @kratsg and I are wrestling with a similar issue in the interp code. At some point we might need to figure out how to really compare this properly (by e.g. forcing single precision in NumPy?)
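A rough sketch of that single-precision idea (illustrative inputs, not pyhf code): the ML backends default to 32-bit floats while NumPy defaults to 64-bit, so casting the NumPy computation down would make the comparison like-for-like.

```python
import numpy as np
from scipy.special import gammaln

# Illustrative inputs cast down to the ML backends' default precision
n = np.asarray([2.0, 9.0], dtype=np.float32)
lam = np.asarray([2.5, 8.0], dtype=np.float32)

# The whole computation now stays in float32, matching the
# TensorFlow, PyTorch, and MXNet defaults
log_pmf = (np.log(lam) * n) - lam - gammaln(n + 1.0)
print(log_pmf.dtype)  # float32
```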

@matthewfeickert force-pushed the feature/use-distributions-in-ML-backends-for-better-poisson branch from 67083d6 to 14122ca on September 17, 2018 21:12
@matthewfeickert (Member, Author)

@lukasheinrich @kratsg Given that this PR removes the need for using the Normal approximation of the Poisson distribution, is there any reason to keep the

self.pois_from_norm = kwargs.get('poisson_from_normal', False)

in the NumPy backend anymore? Or can I remove that everywhere?

@kratsg (Contributor) commented Sep 17, 2018

> Or can I remove that everywhere?

Remove. The only reason it was there was to ameliorate differences and make it easier to compare cross-backend.
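A before/after sketch of what that removal might look like (the class body here is illustrative, not the actual NumPy backend):

```python
# Before (sketch): a kwarg toggled the Normal approximation
class numpy_backend:
    def __init__(self, **kwargs):
        self.pois_from_norm = kwargs.get('poisson_from_normal', False)

# After (sketch): the toggle and the branch it guarded are gone,
# since the Gamma-based continuous form is used unconditionally
class numpy_backend:
    def __init__(self, **kwargs):
        pass
```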

@coveralls commented Sep 17, 2018

Coverage Status

Coverage increased (+0.3%) to 98.411% when pulling 96b161e on feature/use-distributions-in-ML-backends-for-better-poisson into cbb899f on master.

@matthewfeickert force-pushed the feature/use-distributions-in-ML-backends-for-better-poisson branch from 14122ca to d9493da on September 17, 2018 21:44
@matthewfeickert (Member, Author) commented Sep 17, 2018

I seem to have broken something in the CI. The test suite all passes and then it core dumps. 😢

Things all pass fine on my local machine with Python 3.6.6, so I'll need to follow up on this. I'm guessing it has something to do with the huge number of "divide by zero encountered in log" warnings in the NumPy backend, which (strangely, given what is printed) is the error you get when you try to evaluate log(0) in NumPy.

This is actually showing a deeper problem though, as looking at the way we approximate the Poisson p.m.f.

np.exp((np.log(lam) * n) - lam - gammaln(n + 1.))

this should never have lam=0. And as lam is wrapped in a log, it would have to get insanely small before roundoff error triggers a 0. Scratch that: this is a normal thing to happen if the NumPy minimizer method drives the parameter through 0, which will happen if it doesn't have information on the gradients or some extra constraint.
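A self-contained sketch (not the pyhf code) of that failure mode and one possible guard; the clipping value here is purely illustrative:

```python
import numpy as np
from scipy.special import gammaln

def poisson_logpmf(n, lam):
    # Continuous Poisson approximation via n! = Gamma(n + 1)
    return (np.log(lam) * n) - lam - gammaln(n + 1.0)

n = np.asarray([2.0, 3.0])

# A gradient-free minimizer can drive lam through 0, which emits
# "divide by zero encountered in log" and yields -inf
print(poisson_logpmf(n, np.asarray([0.0, 2.5])))

# One illustrative guard: clip lam away from 0 before taking the log
lam = np.clip(np.asarray([0.0, 2.5]), 1e-10, None)
print(poisson_logpmf(n, lam))
```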

@lukasheinrich (Contributor)

yeah +1 on removing the normal approx

@matthewfeickert force-pushed the feature/use-distributions-in-ML-backends-for-better-poisson branch 3 times, most recently from 420e1cd to dac3303 on September 18, 2018 02:20
@matthewfeickert (Member, Author)

@kratsg @lukasheinrich If you have any thoughts on this "corrupted size vs. prev_size" core dump issue that Travis is having, but that I can't reproduce locally, I'd be very interested in hearing them.

@kratsg (Contributor) commented Sep 18, 2018

> @kratsg @lukasheinrich If you have any thoughts on this "corrupted size vs. prev_size" core dump issue that Travis is having, but that I can't reproduce locally, I'd be very interested in hearing them.

Running out of memory. You're allocated some amount of memory to run Python and you've exceeded it.

@matthewfeickert (Member, Author)

> Running out of memory

Yeah, that part I understood. Sorry, I should have been more explicit. Do you have any ideas why changing this approximation seems to have caused (what I'm assuming is) NumPy to push memory usage past the allowed limit?

@lukasheinrich (Contributor)

@matthewfeickert can you try adding each backend one by one? It might be only a single one that is misbehaving.

@matthewfeickert force-pushed the feature/use-distributions-in-ML-backends-for-better-poisson branch 3 times, most recently from 3aba184 to 691ea56 on September 18, 2018 20:25
@matthewfeickert (Member, Author)

For reasons not fully understood, using TensorFlow Probability causes Travis CI to use too much memory, and a core dump occurs.

So until this is understood, all use of tfp should be reverted to a combination of tf.distributions and tf.contrib.distributions (for the Poisson).

This is really annoying, but it is at least a temporary fix that doesn't break functionality or lose the gains. However, even with this, enough memory is still being eaten up that I needed to break the notebook tests off into their own parallel tests in the test stage of the CI.

For additional reference on this issue c.f. pytest-dev/pytest#3527 and travis-ci/travis-ci#9153.
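A sketch of that interim combination using the TF 1.x-era APIs named above (the parameter values are illustrative):

```python
import tensorflow as tf

# Interim approach: graph-era distributions instead of
# tensorflow_probability (tfp)
normal = tf.distributions.Normal(loc=0.0, scale=1.0)
poisson = tf.contrib.distributions.Poisson(rate=5.0)

log_prob = normal.log_prob(0.5) + poisson.log_prob(4.0)
```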

@matthewfeickert changed the title from "[WIP] Use distributions in ML backends for better Poisson" to "Use distributions in ML backends for better Poisson approximation" Sep 18, 2018
@matthewfeickert (Member, Author)

@lukasheinrich @kratsg With the exception of the coverage, everything is passing now.

This is a large and strange PR, sorry about that, and it is not perfect in that there will need to be additional fixes in the future to actually get tfp in and working. It also strongly motivates me to make time for more research on Issue #79 and how that would play with Issue #238. However, it does give us the ability to get rid of the Normal approximation for the NumPy backend.

Let me know your thoughts and if you want additional changes.

@matthewfeickert (Member, Author) commented Sep 18, 2018

This may want to wait until PR #251 goes in, to make the rebase easier than if this one goes first. That being said, if there are no changes it would be nice to have this in for tomorrow's talk.

Commit messages from the pushes above:

  • Resolves issue of ResourceWarning: unclosed file <_io.TextIOWrapper name='<your path here>/pyhf/pyhf/data/spec.json' mode='r' encoding='UTF-8'>. Additionally apply autopep8.

  • Resolves issue of ResourceWarning: unclosed file.

  • If pytest runs all tests at once then it exceeds the memory allotted to it by Travis CI. To deal with this, tests/test_notebooks is run by itself in a second run of pytest. This is a bit strange, but seems to work. As a result of this and the order in which pytest runs the tests, the docstring tests finish just before the import tests run. This means that the last backend tested is still in memory, and so the NumPy backend needs to get set, as that is what is needed for test_import.py.

  • For reasons not fully understood, using TensorFlow Probability causes Travis CI to use too much memory and a core dump occurs. So until this is understood, all use of tfp should be reverted to a combination of tf.distributions and tf.contrib.distributions (for the Poisson). This is really annoying, but at least a temporary fix that doesn't break all functionality and gains. c.f. pytest-dev/pytest#3527 and travis-ci/travis-ci#9153.

  • Calling the set default backend beforehand ensures that the backend gets reset even if it just came off of a docstring test outside of the tests dir. As a result, remove the explicit calls to set the NumPy backend in test_import.py.

  • Mention that the continuous approximation is done using n! = Gamma(n+1). Additionally use :code: and :math: qualifiers on different terms to give specific highlighting in the rendered docs.
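As an illustration of that last docs change, a hypothetical docstring using those qualifiers (a sketch, not the actual pyhf docstring):

```python
def poisson(self, n, lam):
    r"""
    The continuous approximation to the Poisson distribution p.m.f.,
    using :math:`n! = \Gamma(n + 1)` for numerical stability, evaluated
    at :code:`n` given the rate parameter :code:`lam`.
    """
```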
@matthewfeickert force-pushed the feature/use-distributions-in-ML-backends-for-better-poisson branch from 975805b to 94b6e8e on September 20, 2018 15:34
@matthewfeickert (Member, Author)

Rebased against master to prepare to merge with PR #280

@kratsg mentioned this pull request Sep 20, 2018
@matthewfeickert changed the title from "[WIP] Use distributions in ML backends for better Poisson approximation" to "Use distributions in ML backends for better Poisson approximation" Sep 20, 2018
@kratsg merged commit 49a9e87 into master Sep 20, 2018
@kratsg deleted the feature/use-distributions-in-ML-backends-for-better-poisson branch September 20, 2018 17:05
@matthewfeickert (Member, Author)

Thanks for the support on getting this in @kratsg. 👍

Labels
feat/enhancement New feature or request
Successfully merging this pull request may close these issues.

Find better Poisson approximation in ML backends