-
-
Notifications
You must be signed in to change notification settings - Fork 434
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Implement Bernoulli distribution #411
Conversation
I don't see how this achieves much, especially the implementations for non-boolean types. (I know I said I wanted a Bernoulli distribution, but |
I think this is more consistent. Similar to how If inlining and optimization work, the assert should be only evaluated once, so it could be slightly more efficient in tight loops. |
It seems the performance difference can be significant:
|
Okay, that is indeed a big difference. I'm wondering if we even need the assert. Unfortunately we do with this implementation of |
src/distributions/bernoulli.rs
Outdated
impl Distribution<bool> for Bernoulli { | ||
#[inline] | ||
fn sample<R: Rng + ?Sized>(&self, rng: &mut R) -> bool { | ||
rng.gen_bool(self.p) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The most expensive piece here is multiplying p
and converting it to an integer, and a little the bounds test. Can you instead move the initialization to new
? Than we have a real reason to use a distribution here.
Also I reopened the issue because we wanted something with more precision than gen_bool
. If you multiply by u64::max
and use gen::<u64>
you get exactly 64 bits of precision, which seems pretty close to using HighPrecision01
which @dhardy and I originally wanted to use. And this should be faster, at least when the set-up happens in new
.
Also, does it make sense to make this version exact?
With the multiply method and using u64
s, there are exactly 2^64 + 1
steps between 0.0
and 1.0
. If we want to sample without any bias (not that that is worth much), always return true for the 1.0
case (possibly without using the RNG), multiply by u64::MAX + 1
, and compare the results with <
. I am curious what the performance will be. Interested in testing?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
On second thought, the are EPSILON / 4 * 2^64 + 1
values that round up to 1.0 and can't be represented by f64
. Given that we are already in the rounding error territory, I see no reason to special-case 1.0
. Multiplying with u64::MAX
and comparing with <
should just be good enough.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Um, no, f64
has 53 bits of precision IIRC and u64::MAX
isn't exactly representable in f64
; I seem to remember you saying we should use this method with u64
@pitdicker.
I suppose we could multiply by 2.0.powi(64)
but there may still be a problem converting to u64
afterwards — does it clamp to its max value if out of bounds?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
u64::MAX
isn't exactly representable inf64
Oops, yes. I thought too much about how all representable values for p
in the range 0.0..1.0
are also representable when multiplied in 0..2^64
, that I forgot that multiplying by 2^64 - 1 rounds to multiplying by 2^64. So this method does not work without special-casing 1.0, per my first comment.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I suppose we could multiply by 2.0.powi(64) but there may still be a problem converting to u64 afterwards — does it clamp to its max value if out of bounds?
No, it's much worse. If we want more than 32 bits of precision, we should probably use the traditional rng.gen::<f64>() <= p
method.
Going to 64 bit using the same method requires special casing large probabilties, because of a fun bug in the compiler: #![feature(test)]
extern crate test;
extern crate core;
use test::black_box;
fn main() {
let mut i = ::core::u64::MAX;
let mut p;
loop {
p = (black_box(1.0) * (i as f64)) as u64;
if p != 0 {
break;
}
i -= 1;
}
println!("maximal u64 i:\ni={:x}", ::core::u64::MAX);
println!("maximal u64 i where 1.0*(i as f64) as u64 == p != 0:\ni={:x}\np={:x}", i, p);
} prints
but without
|
I think you are just hitting the problem that 2^64 - 1 is not representable with |
How would that help? Note that |
I am pretty sure it does not work that way. |
It gives exactly the same result as |
This discussion is going a bit difficult as I should just write some code to test things... But |
Ok, if understood it correctly, I implemented what was suggested and added a test to make sure that it's enough to have a special case for 1.0. |
src/distributions/bernoulli.rs
Outdated
assert!(p >= 0.0, "Bernoulli::new called with p < 0"); | ||
assert!(p <= 1.0, "Bernoulli::new called with p > 1"); | ||
let p_int = if p != 1.0 { | ||
(p * (::core::u64::MAX as f64)) as u64 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I guess what happens here is that u64::MAX
gets cast to 2.0.powi(64)
, in which case the multiplication is probably only an adjustment to the exponent of p
. So if we stick with this method, IMO it makes more sense to test r < self.p_int
and document that Bernoulli::new(1.0)
may sample incorrectly (instead of 0.0), because then it's only 1.0 and small values not a multiple of 2^-64
which aren't sampled exactly.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not sure I understand this correctly. What exactly is the advantage of using <
in place of <=
? That Bernoulli::new(1.0)
is not quite correct as opposed to Bernoulli::new(0.0)
? Why is that preferable?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Consider 0.5: this should get mapped exactly to 2^63
; exactly half of all possible u64
values are less than this.
This is only the case because u64::MAX as f64
isn't representable and essentially becomes 2^64
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think it is fine to have an error smaller than 2-64. If someone wants more precision, they should probably just use Uniform
with only integers.
I decided against it, because the extra branch for every case did not seem worth it for getting rid of the tiny bias for a special case. I did not check how big the performance impact would be though. |
Benchmarks with the branch:
Benchmarks without:
I mainly suggested all this because @dhardy had concerns about the accuracy of |
That difference seems to be negligible, I could definitely live with that. (It might be more painful on platforms without branch prediction though, but I'm not sure we should worry about those.) |
I can already live with the 32-bit accuracy and tiny bias, even more so with this with 64 bit. But as it costs us little, having no easy to point out bias seems like a win to me (no need to defend the choice later...) @dhardy Interested in making a decisive vote? |
The suggestion I just made eliminates bias, except unfortunately with regards to 1.0. @pitdicker your suggestion makes the distribution's memory larger. Why not map 1.0 to |
That is a good idea. Just use a value known to be impossible instead of |
I added the special case for |
Then there was one point of discussion left from the issue: should The idea is that Because benchmarks with Xorshift can be a bit messy (I think it has some problems with using too many registers or something), I benchmarked here with
Before:
|
src/distributions/bernoulli.rs
Outdated
/// | ||
/// This `Bernoulli` distribution uses 64 bits from the RNG (a `u64`), making | ||
/// its bias less than 1 in 2<sup>64</sup>. In practice the floating point | ||
/// accuracy of `p` will usually be the limiting factor. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not true: (a) I don't think this method has any bias, other than when rounding due to limited precision, and (b) this method cannot represent values smaller than 2^-64 or not a multiple of that; f64
can.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Very good point, I'll update it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I corrected the remarks and made it possible to alternatively construct Bernoulli from integers for increased precision. Should I implement From<f64>
and From<u64>
?
/// Construct a new `Binomial` with the given shape parameters `n` (number | ||
/// of trials) and `p` (probability of success). | ||
/// | ||
/// Panics if `p <= 0` or `p >= 1`. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The p==0 and p==1 cases are trivial — should we support them? This might require extra branching however.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe, but this is orthogonal to this PR.
src/distributions/bernoulli.rs
Outdated
pub fn new(p: f64) -> Bernoulli { | ||
assert!(p >= 0.0, "Bernoulli::new called with p < 0"); | ||
assert!(p <= 1.0, "Bernoulli::new called with p > 1"); | ||
let p_int = if p < 1.0 { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Here we could allow p > 1
very easily — should we? My fear is that accumulated floating point errors could push a number slightly above 1 or below 0 when it shouldn't be. Unfortunately we cannot support <0 here without a jump.
Alternatively we could use strict bounds here but suggest use of p > rng.sample(HighPrecision01)
or similar where strict bounds may be a problem.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, we could. I thought about this. It would reduce the number of branches, but might mask errors. Accumulated FP errors will happen, but usually the user has to fix those anyway. Adding a branch for p < 0
is not a problem if we allow p < 0
, because then we can get rid of the asserts. Another advantage would be that we can make it panic free. What do you think?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this makes sense, except for one thing: if p
is NaN, the method should panic, since NaNs usually indicate serious problems which should be fixed. The same could be said for values not close to the range [0, 1]. Because of this maybe you are right, that users should fix accumulated FP errors themselves.
BTW wouldn't it reduce the number of branches if we used a single assert!(p >= 0.0 && p <= 1.0, ...)
instead?
I think it should. It is a convenience method (that I would actually prefer to remove), having a different precision would be surprising. If we want a lower precision method, I think it should be implemented as an additional distribution. Then we could decide which one |
src/distributions/bernoulli.rs
Outdated
/// | ||
/// This is more precise than using `Bernoulli::new`. | ||
#[inline] | ||
pub fn from_int(p_int: u64) -> Bernoulli { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think this makes much sense to expose — if users have a probability expressed as a u64
it's quite easy to implement an appropriate test anyway, and users can deal with any bias as appropriate themselves. Our mapping of 1.0 to u64::MAX
is essentially a hack, taking advantage of the fact that this number could not be generated otherwise.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fair enough, I'll remove it and suggest to use Uniform
instead.
src/distributions/bernoulli.rs
Outdated
@@ -48,6 +48,10 @@ impl Bernoulli { | |||
/// | |||
/// For `p = 1.0`, the resulting distribution will always generate true. | |||
/// For `p = 0.0`, the resulting distribution will always generate false. | |||
/// Due to the limitations of floating point numbers, not all internally | |||
/// supported probabilities (multiples of 2<sup>-64</sup>) can be specified | |||
/// using this constructor. If you need more precision, use |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think it would be better to say:
This method is accurate for any input
p
in the range[0, 1]
which is a mutliple of2^-64
. Values outside this range are treated as if they were0
or1
(whichever is nearest).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done.
src/distributions/bernoulli.rs
Outdated
pub fn new(p: f64) -> Bernoulli { | ||
assert!(p >= 0.0, "Bernoulli::new called with p < 0"); | ||
assert!(p <= 1.0, "Bernoulli::new called with p > 1"); | ||
let p_int = if p < 1.0 { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this makes sense, except for one thing: if p
is NaN, the method should panic, since NaNs usually indicate serious problems which should be fixed. The same could be said for values not close to the range [0, 1]. Because of this maybe you are right, that users should fix accumulated FP errors themselves.
BTW wouldn't it reduce the number of branches if we used a single assert!(p >= 0.0 && p <= 1.0, ...)
instead?
src/distributions/bernoulli.rs
Outdated
/// | ||
/// This method is accurate for any input `p` in the range `[0, 1]` which is | ||
/// a multiple of 2<sup>-64</sup>. If you need more precision, use `Uniform` | ||
/// and a comparison instead. (Note that not all multiples of |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sorry, why use Uniform
, and what did you have in mind (given that this can generate ints and floats)? I don't think we'll include high-precision sampling code in Uniform
anyway. Ultimately I'd say here to use HighPrecisionFP
or whatever we get, but it's not available yet.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I was having something like rng.sample(Uniform::from(0..N)) < M
in mind, replacing the usecase of Bernoulli::from_int
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah, for fractional probabilities. Okay, but not everything's a fraction (try π/4). Personally I think just drop this bit?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Well, some fractional probabilities cannot be represented by f64
, but we can still sample them via u64
. I was trying to adress your comment:
if users have a probability expressed as a u64 it's quite easy to implement an appropriate test anyway, and users can deal with any bias as appropriate themselves
But I'm fine with just dropping it.
src/distributions/bernoulli.rs
Outdated
/// 2<sup>-64</sup> in `[0, 1]` can be represented as a `f64`.) | ||
#[inline] | ||
pub fn new(p: f64) -> Bernoulli { | ||
assert!(p >= 0.0 & p <= 1.0, "Bernoulli::new not called with 0 <= p <= 0"); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You really should get into the habit of running a few tests before pushing your commits!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sorry about that.
I ran the benchmarks. Is the slowdown to
I don't get why @pitdicker would you like to review before this gets merged? I think it's nearly ready. |
src/distributions/bernoulli.rs
Outdated
|
||
#[test] | ||
fn test_trivial() { | ||
let mut r = SmallRng::from_entropy(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We've been using let mut rng = ::test::rng(123);
(but with a unique number) in most tests in order to make them reproducible (and not dependent on EntropyRng
). Not a big issue but might as well follow convention.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
For me the reason to do so was not to make things reproducible, but to make the tests run without the std
feature. Now the CI is unhappy 😄
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Will do. We should document this somewhere.
fn test_average() { | ||
const P: f64 = 0.3; | ||
let d = Bernoulli::new(P); | ||
const N: u32 = 10_000_000; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
10 mil' makes this a litte slow. I guess leave it as-is for now though; maybe as part of #357 we can divide tests into two sets (fast and slow) or something.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Unfortunately you need a lot of samples to get decent statistics. (The relative error probably scales with 1/sqrt(n_samples)
.)
Also support construction from integers.
* Remove `Bernoulli::from_int`. * Improve precision comments. * Only use one branch for asserts.
* Don't use libc. * Add assert message. * Add example for undefined behavior.
Done.
I think they are need, because the methods are likely to be used in hot loops, so the compiler should have the option to inline them across crates.
In this case I want to avoid a conditional jump, so I feel
What do you mean? Use
I don't want to do that before the end of review.
I'm not sure what you mean. |
The compiler always has the option, but do you want to force it to? I'd rather not see it mixed in with this PR, but I'll leave that up to @dhardy.
I am not sure why go the non-standard route here, and if it matters at all.
Yes,
👍
You want to multiply by 2^64, not 2^64 - 1. It is harder to write, but kind of an important detail when trying to understand the code in my opinion. Something like |
No, the compiler does not have that option across crates (without LTO). Note that
It probably generates the same code, but I think
I thought
Why? I would want |
Some interesting notes on Using I don't know about the benches. @pitdicker is correct about the constant: you compare the sample |
I tried that (when this issue was just a few days old), but |
Yes, and it is also not a |
@vks I think you still have some tweaks to make? I.e. remove some inline annotations (at least those on generic functions) and add a comment. Then I think we can merge. |
Yes, I need to do that and possibly improve the benchmarks. I'm a bit busy, but I'll try to find time tonight. |
They are redundant, because generic functions can be inlined anyway.
I did the remaining changes except for the benchmarks.
I don't think it can for |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks @vks! Looks good to me. @pitdicker are you happy?
The benchmarks still need fixing. The results are unexpectedly fast for a reason, and I have already shown why. But I'm okay with merging now, and changing those later. |
I opened #448 for improving the benchmarks. |
gen_bool
, not a different implementation.bool
and let the user do the conversion? HavingDistribution<T>
implemented for more than oneT
can make type annotations annoying. If we ever addimpl Distribution<f32> for Normal
, it will probably break a lot of code...