
UniformFloat: allow inclusion of high in all cases #1462

Merged
dhardy merged 8 commits into rust-random:master on Jul 16, 2024

Conversation

@dhardy (Member) commented Jun 24, 2024

  • Added a CHANGELOG.md entry

Summary

Fix #1299 by removing the logic specific to ensuring that high is excluded from the result (i.e. that we do not emulate a closed range).

Motivation

Fix #1299.

Additional motivation: floating-point types are approximations of continuous (real) numbers. Although one might expect that rng.gen_range(0f32..1f32) is strictly less than 1f32, it can also be seen as an approximation of the real range [0, 1) which includes values whose nearest approximation under f32 is 1f32.

In general, one should always expect that floating-point operations may round up or down to the nearest representable value; hence this change is not expected to affect many users.

sample_single was changed (see below) as a simplification, and to match the new behaviour of new.

Details

Specific to floats (i.e. UniformFloat):

  • new sets scale = high - low (unchanged) then ensures that low + scale * max_rand <= high (previously: ensure < high)
  • new_inclusive sets scale = (high - low) / (1 - ε) then ensures that low + scale * max_rand <= high (unchanged)
  • sample_single is now equivalent to sample_single_inclusive (previously: use rejection sampling to ensure sample < high)
  • sample_single_inclusive is unchanged: it yields low + (high - low) * Standard.sample(rng) (see the sketch below)

Note: new_inclusive attempts to ensure that high may be yielded; sample_single_inclusive does not. This has not changed, but is a little surprising.
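For concreteness, here is a minimal sketch (not the crate's actual code) of the sample_single_inclusive formula listed above, assuming rand 0.8's Rng trait and Standard distribution (which yields f32 values in [0, 1)):

```rust
use rand::distributions::Standard;
use rand::Rng;

/// Sketch of `sample_single_inclusive` for `f32`:
/// `low + (high - low) * Standard.sample(rng)`.
/// The multiply-add may round up, so the result can equal `high`.
fn sample_single_inclusive_sketch(low: f32, high: f32, rng: &mut impl Rng) -> f32 {
    let x: f32 = rng.sample(Standard); // uniform in [0, 1)
    low + (high - low) * x
}

fn main() {
    let mut rng = rand::thread_rng();
    let v = sample_single_inclusive_sketch(0.0, 1.0, &mut rng);
    assert!((0.0..=1.0).contains(&v));
    println!("{v}");
}
```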

@dhardy (Member, Author) commented Jun 24, 2024

I should add that there is still a loop in new and new_inclusive, but I don't think it can be a problem any more. This would require that low + (high - low) / (1 - ε) * (1 - ε) rounds to a value greater than high, but additionally that there are many representable values between scale = (high - low) / (1 - ε) and the next smallest value which will round down to high. Specifically, we know that high is representable.
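For illustration, a rough sketch of that loop (under the assumption of a hypothetical next_down helper; the crate's actual code may differ in detail):

```rust
/// Sketch of the `new_inclusive` scale computation discussed above: start from
/// `(high - low) / (1 - ε)` and step `scale` down while the largest possible
/// sample, `low + scale * max_rand`, would still exceed `high`.
fn new_inclusive_scale_sketch(low: f32, high: f32) -> f32 {
    let max_rand = 1.0 - f32::EPSILON; // largest value yielded by `Standard`
    let mut scale = (high - low) / max_rand;
    while low + scale * max_rand > high {
        // Per the argument above, at most one step is expected here.
        scale = next_down(scale);
    }
    scale
}

/// Hypothetical helper: the next representable f32 below `x`
/// (valid for positive, finite, non-zero `x`).
fn next_down(x: f32) -> f32 {
    f32::from_bits(x.to_bits() - 1)
}

fn main() {
    let (low, high) = (0.0f32, 1.0f32);
    let scale = new_inclusive_scale_sketch(low, high);
    assert!(low + scale * (1.0 - f32::EPSILON) <= high);
    println!("scale = {scale}");
}
```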

@sicking @WarrenWeckesser

@vks (Collaborator) commented Jun 27, 2024

Additional motivation: floating-point types are approximations of continuous (real) numbers. Although one might expect that rng.gen_range(0f32..1f32) is strictly less than 1f32, it can also be seen as an approximation of the real range [0, 1) which includes values whose nearest approximation under f32 is 1f32.

I don't like this argument, because when calculating probabilities with real numbers, there is no difference between [0, 1) and [0, 1] when integrating the PDF. Therefore, the distinction only makes sense for floats. Also, when using something advertised as [0, 1), I would want to be able to rely on it, because otherwise I have to take care of the corner cases with rejection sampling when e.g. taking logs. If we cannot guarantee it, we might as well deprecate [0, 1) and only offer (0, 1).

I agree we should fix #1462.

new sets scale = (high - low) / (1 - ε) then ensures that low + scale * max_rand <= high (unchanged)

I think you meant new_inclusive?

@dhardy (Member, Author) commented Jun 27, 2024

I don't like this argument ...

With real numbers, both [0, 1] and [0, 1) have a range of 1, but the latter does not contain 1. With floats, it's pretty easy to lose precision — e.g. (1 - ε/2) + 1 → 2 — so does it make sense for us to take great care here? Will many users actually care that we do?

new and new_inclusive are part of the trait UniformSampler and are important for integers.

Meanwhile, we still provide Standard, OpenClosed01 and Open01 (but not Closed01). None of those are affected here.
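For reference, a quick illustration of those three distributions (a sketch assuming rand 0.8's distributions module; the intervals are as documented for f32):

```rust
use rand::distributions::{Open01, OpenClosed01, Standard};
use rand::Rng;

fn main() {
    let mut rng = rand::thread_rng();
    let a: f32 = rng.sample(Standard);     // [0, 1): may yield 0, never 1
    let b: f32 = rng.sample(OpenClosed01); // (0, 1]: may yield 1, never 0
    let c: f32 = rng.sample(Open01);       // (0, 1): yields neither 0 nor 1
    println!("Standard: {a}, OpenClosed01: {b}, Open01: {c}");
}
```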

I agree we should fix #1462.

I think you meant #1299?

@dhardy requested a review from TheIronBorn on July 8, 2024 08:46
@dhardy added the D-review (Do: needs review) and X-discussion labels on Jul 8, 2024
@TheIronBorn (Contributor) left a comment

Looks good, thanks

@dhardy (Member, Author) commented Jul 16, 2024

I am now reasonably convinced that there are no (low, high) pairs of floating-point values such that more than one iteration of the loop in UniformFloat::new_bounded is required. Proof follows (and focusses on new_inclusive, which is the worst case).

Definition: ε is the smallest value such that 1 + ε is a representable value greater than 1.

Assumption: low, high are finite floating-point values.
Assumption: low ≤ high.
Assumption: (high - low) / (1 - ε) is finite.
(These assumptions are tested in new_inclusive.)

new_inclusive calculates scale = (high - low) / (1 - ε), then decreases scale to the next smaller representable value while low + scale × (1 - ε) > high.

Case 1: high - low rounds up. This is possible when low and high have opposite signs such that the result is larger than the absolute value of either low or high. An example is low = -1, high = 3ε/2 resulting in high - low → 1 + 2ε. Here, the exponent of the result is one larger than the maximum exponent of low and high. Since this is a rounding error, it is at worst one value too large.

Case 2: x / (1 - ε) rounds up such that x / (1 - ε) * (1 - ε) > x (where x = high - low). Note that floating-point numbers have the form (-1)^sign * (1 + significand) * 2^exponent where 0 ≤ significand < 1; division by 1 - ε effectively increments the significand by one or two units in the last place (or zero in the case of sub-normals). In some cases this rounds up; I believe the worst (relative) case is (2 - ε) / (1 - ε) → 2 + 2ε, such that the result, when multiplied by 1 - ε, is 2 (one representable value larger than 2 - ε).

In fact, I believe case 2 can only happen where the division causes a loss of precision (i.e. increases the exponent since x is very nearly the largest representable value with its exponent), and thus that the subtraction step (case 1) cannot also round up.

(We could thus potentially replace the loop with a single conditional step, and I think we could remove it entirely for new (the exclusive case). These are only minor optimisations and are thus excluded here.)
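A quick numeric check of the two worked examples above (a small standalone program; assumes standard IEEE-754 f32 rounding):

```rust
fn main() {
    let eps = f32::EPSILON; // 2^-23

    // Case 1: high - low rounds up. low = -1, high = 3ε/2 gives 1 + 2ε.
    let (low, high) = (-1.0f32, 1.5 * eps);
    let diff = high - low;
    assert_eq!(diff, 1.0 + 2.0 * eps);

    // Case 2: (2 - ε) / (1 - ε) rounds up to 2 + 2ε, and multiplying back
    // by (1 - ε) yields 2, one representable value above 2 - ε.
    let x = 2.0f32 - eps;
    let scale = x / (1.0 - eps);
    assert_eq!(scale, 2.0 + 2.0 * eps);
    assert_eq!(scale * (1.0 - eps), 2.0);

    println!("both examples behave as described");
}
```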

@dhardy merged commit 1e381d1 into rust-random:master on Jul 16, 2024
14 checks passed
Labels
D-review Do: needs review
Development

Successfully merging this pull request may close these issues.

Uniform Generator hangs for certain limits.
3 participants