You signed in with another tab or window. Reload to refresh your session.You signed out in another tab or window. Reload to refresh your session.You switched accounts on another tab or window. Reload to refresh your session.Dismiss alert
Reading #4108 and #4249, it seems that there was some back and forth to try to match the behavior of scipy.stats.hypergeom with that of HyperGeometric, specially to pass the default unit tests when given invalid parameters. Unfortunately the current solution misbehaves often with invalid parameters.
(n=4 is impossible since it must be smaller or equal than N)
The initial PR draft had a bound to deal with invalid parameters, but it was removed so that logp would return the same values as the scipy counterpart (otherwise we would sometimes get pymc=-inf vs scipy=nan). Unfortunately, as shown in the example above, we cannot trust the pymc3 implementation to reliably return nan for invalid parameters. Scipy seems to have some parameter validation method, but it is not clear to me if and how it is used by the logpmf method:
Besides not being reliable, returning nan also seems to deviate from the usual pymc behavior of getting -inf for invalid configurations. In addition, the logcdf method for the HyperGeometric (see #4331) will also be incompatible with results of the scipy counterpart, which actually gives wrong results for some configurations of impossible parameters:
Reading #4108 and #4249, it seems that there was some back and forth to try to match the behavior of
scipy.stats.hypergeom
with that ofHyperGeometric
, specially to pass the default unit tests when given invalid parameters. Unfortunately the current solution misbehaves often with invalid parameters.Here is an example:
(
n=4
is impossible since it must be smaller or equal thanN
)The initial PR draft had a
bound
to deal with invalid parameters, but it was removed so thatlogp
would return the same values as the scipy counterpart (otherwise we would sometimes get pymc=-inf
vs scipy=nan
). Unfortunately, as shown in the example above, we cannot trust the pymc3 implementation to reliably returnnan
for invalid parameters. Scipy seems to have some parameter validation method, but it is not clear to me if and how it is used by thelogpmf
method:https://github.com/scipy/scipy/blob/624fd766e2886c458012d98e3c6ec85551ab2916/scipy/stats/_discrete_distns.py#L455-L458
Besides not being reliable, returning
nan
also seems to deviate from the usual pymc behavior of getting-inf
for invalid configurations. In addition, thelogcdf
method for the HyperGeometric (see #4331) will also be incompatible with results of the scipy counterpart, which actually gives wrong results for some configurations of impossible parameters:Given these issues, I suggest we depart from the scipy implementation and return to using
bound
to replace impossible configurations with-inf
. I am willing to do a PR (or give it to @Harivallabha if he is interested). To overcome the original issue with the unit tests, we can change thecheck_logp
method inpymc3_matches_scipy
to replace scipynan
with-inf
(perhaps with an optional argument, in case this conflicts with other distributions tests):https://github.com/pymc-devs/pymc3/blob/9dbf9eabf3b18bf2529a8855c086e90249be0b5e/pymc3/tests/test_distributions.py#L543-L551
What do you think?
The text was updated successfully, but these errors were encountered: