Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix cardinality and integer cast #481

Merged
merged 7 commits into from
Nov 27, 2020

Conversation

bouthilx
Copy link
Member

Fix cardinality

There was two errors.

  1. The integer bounds are inclusive, but the cardinality was computed as if
    the upper bound was exclusive.

  2. The shape affects cardinality exponentially, it would be
    (cardinality of one item) ** prod(shape), not (cardinality of one item) * prod(shape).

Fix casting of Integer

Why:

There is a mismatch between the inclusive bound of Integer and
the underlying distribution (which is real) where high is in theory exclusive
but in practice inclusive due to limited precision and rounding.
We cannot use numpy.round() to fix this because otherwise the probability
is changed for lowest and highest possible numbers. For instance a
uniform distribution (0, 10) would have a probability of 0.5 * (1/10)
for 0 instead of (1/10).

How:

In the cast method, rescale the point from (low, high) to
(low, high + 1) before computing the numpy.floor().astype(int).
Note that some distributions have infinite intervals, in which case we
do no rescaling.

Deprecate randint distribution

uniform(a, b, discrete=True) and randint(a, b) handles differently the intervals are
seems incompatible. We were already recommending uniform over randint
anyway.

@bouthilx bouthilx added the bug Indicates an unexpected problem or unintended behavior label Nov 27, 2020
@bouthilx bouthilx added this to the v0.1.10 milestone Nov 27, 2020
There was two errors.

1. The integer bounds are inclusive, but the cardinality was computed as if
the upper bound was exclusive.

2. The shape affects cardinality exponentially, it would be
(cardinality of one item) ** prod(shape), not (cardinality of one item) * prod(shape).
Why:

There is a mismatch between the inclusive bound of Integer and
the underlying distribution (which is real) where high is in theory exclusive
but in practice inclusive due to limited precision and rounding.
We cannot use numpy.round() to fix this because otherwise the probability
is changed for lowest and highest possible numbers. For instance a
uniform distribution (0, 10) would have a probability of 0.5 * (1/10)
for 0 instead of (1/10).

How:

In the cast method, rescale the point from (low, high) to
(low, high + 1) before computing the numpy.floor().astype(int).
Note that some distributions have infinite intervals, in which case we
do no rescaling.
uniform(a, b, discrete=True) and randint(a, b) handles differently the intervals are
seems incompatible. We were already recommending uniform over randint
anyway.
The Enumerate transformation changes the Categorical dimension
to Integer dimension. The interval() should change accordingly.
If the transformer does not handle interval and the original dimension
is a category, then we should return the categories. We cannot transform
them.
@bouthilx bouthilx force-pushed the hotfix/integer_cardinality branch from efd8abf to aa8eecb Compare November 27, 2020 17:05
@bouthilx bouthilx merged commit a49f548 into Epistimio:develop Nov 27, 2020
@bouthilx bouthilx deleted the hotfix/integer_cardinality branch November 27, 2020 17:48
@bouthilx bouthilx mentioned this pull request Nov 30, 2020
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Indicates an unexpected problem or unintended behavior
Projects
None yet
Development

Successfully merging this pull request may close these issues.

1 participant