The random selection doesn't appear to be very random #119

sgrif · 2016-01-15T17:19:08Z

I have a type which wraps an i32, and wrote an Arbitrary impl like so:

use quickcheck::{Arbitrary, Gen};

#[derive(Clone, Debug)]
struct PgDate(i32);

impl Arbitrary for PgDate {
    fn arbitrary<G: Gen>(g: &mut G) -> Self {
        PgDate(i32::arbitrary(g))
    }
}

When I print the values it runs with, they only range from -100 to 100, rather than being 100 random numbers from across the whole i32 range as I'd expect. Shrinking also appears to be flawed for these types. My understanding was that quickcheck would test known problematic values for a given type; for i32 I'd expect those to include at least 1, 0, -1, i32::MAX and i32::MIN. But when I add a shrink impl like so:

fn shrink(&self) -> Box<dyn Iterator<Item = Self>> {
    Box::new(self.0.shrink().map(PgDate))
}

I still see only the same -100 to 100 range. As a result, a critical bug that occurs for any value less than -2451545 went unnoticed.


BurntSushi · 2016-01-27T21:58:59Z

@sgrif I think the original idea here is that the size method on quickcheck::Gen would control the magnitude of the integers generated, but perhaps that is a misuse. With #116 merged, I think there is now good precedent for biasing towards known problematic values.
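A minimal, dependency-free sketch of why a size-bounded range hides a bug like the one reported above. It assumes, for illustration only, that the old integer generator drew uniformly from [-size, size] with the default size of 100; `XorShift`, `bounded`, `full_range`, `date_ok` and `count_failures` are hypothetical helpers, not quickcheck APIs, and `date_ok` merely stands in for a property that breaks below -2451545.

```rust
/// Tiny xorshift64 PRNG (hypothetical helper) so the sketch has no dependencies.
struct XorShift(u64);

impl XorShift {
    fn next_u64(&mut self) -> u64 {
        self.0 ^= self.0 << 13;
        self.0 ^= self.0 >> 7;
        self.0 ^= self.0 << 17;
        self.0
    }
    /// Value in 0..n; the slight modulo bias is irrelevant here.
    fn below(&mut self, n: u64) -> u64 {
        self.next_u64() % n
    }
}

/// Assumed old behaviour: uniform in [-size, size].
fn bounded(rng: &mut XorShift, size: u64) -> i32 {
    (rng.below(2 * size + 1) as i64 - size as i64) as i32
}

/// Full range: reinterpret 32 random bits as an i32.
fn full_range(rng: &mut XorShift) -> i32 {
    (rng.next_u64() >> 32) as u32 as i32
}

/// Hypothetical property that only fails below -2451545.
fn date_ok(days: i32) -> bool {
    days >= -2_451_545
}

/// Number of generated inputs for which the property fails.
fn count_failures(mut draw: impl FnMut() -> i32, trials: usize) -> usize {
    (0..trials).filter(|_| !date_ok(draw())).count()
}

fn main() {
    let seed: u64 = 0x2545_F491_4F6C_DD1D;
    let mut rng = XorShift(seed);
    let bounded_failures = count_failures(|| bounded(&mut rng, 100), 1000);
    let mut rng = XorShift(seed);
    let full_failures = count_failures(|| full_range(&mut rng), 1000);
    println!("size-bounded: {bounded_failures} failures in 1000 tests");
    println!("full range:   {full_failures} failures in 1000 tests");
}
```

Roughly half of all i32 values lie below -2451545, so the full-range generator trips the property almost immediately, while the bounded one can never reach it.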

vi · 2016-01-31T18:19:22Z

This is somewhat the reverse of #99: here numbers are biased too heavily towards -100..100.

I'm going to prepare a reworked number generator, analogous to the one for char (but simpler), which should cover the entire range while still preferring some particular numbers.

bluss · 2016-03-15T17:25:11Z

There needs to be a new API for requesting a quickcheck-controlled size value, something appropriate for selecting the size of a data structure to generate. At the moment I mostly use usize's Arbitrary impl for that.

mulkieran · 2017-01-31T14:11:22Z

I experienced a problem with some tests in stratis-storage/stratisd@0f7838d: all generated u64 values were less than the required minimum, so all tests were discarded.

bluss · 2017-01-31T19:57:56Z

@mulkieran With the current behaviour that is expected; you simply need a wrapper type that defines the distribution of values you need.

mulkieran · 2017-01-31T20:29:33Z

@bluss Can you give me an example?

mulkieran · 2017-02-02T17:18:55Z

If anyone who understands this problem can give me some help, I'd appreciate it.

bluss · 2017-02-02T19:01:29Z

Something like this: a wrapper type that fills in its value using the regular Rand impl (not Arbitrary):

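extern crate quickcheck;
use quickcheck::Arbitrary;
use quickcheck::Gen;

extern crate rand;
use rand::{Rand, Rng};

/// Wrapper whose value comes from T's `Rand` impl rather than its `Arbitrary` impl.
#[derive(Copy, Clone, Debug)]
pub struct Random<T>(pub T);

impl<T> Arbitrary for Random<T>
    where T: Rand + Clone + Send + 'static
{
    fn arbitrary<G: Gen>(g: &mut G) -> Self {
        // In this quickcheck version `Gen` is also an `Rng`, so this draws
        // from T's `Rand` distribution and ignores g.size().
        Random(g.gen())
    }
}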
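For @mulkieran's case (a required minimum), the same wrapper idea can encode the constraint directly. The sketch below isolates the mapping step so it compiles without quickcheck or rand; `AtLeast` and `from_raw` are hypothetical names, and inside a real `Arbitrary` impl the raw `u64` would come from the generator, e.g. via `g.gen()` as in the snippet above.

```rust
/// Hypothetical wrapper guaranteeing the inner value is >= MIN.
#[derive(Copy, Clone, Debug, PartialEq)]
pub struct AtLeast<const MIN: u64>(pub u64);

impl<const MIN: u64> AtLeast<MIN> {
    /// Map arbitrary raw bits into [MIN, u64::MAX] instead of discarding them.
    pub fn from_raw(raw: u64) -> Self {
        // Size of [MIN, u64::MAX]; it overflows only when MIN == 0,
        // in which case every raw value is already acceptable.
        match (u64::MAX - MIN).checked_add(1) {
            None => AtLeast(raw),
            // The modulo makes some values marginally more likely; fine for tests.
            Some(span) => AtLeast(MIN + raw % span),
        }
    }
}

fn main() {
    let raws = [0u64, 1, 42, u64::MAX / 2, u64::MAX];
    for raw in raws {
        let v = AtLeast::<1_000_000>::from_raw(raw);
        assert!(v.0 >= 1_000_000);
        println!("{raw} -> {}", v.0);
    }
}
```

Mapping every raw value into the allowed range means no test case has to be discarded, which is what made all the original tests fail the precondition.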

vi · 2017-02-03T13:35:59Z

Maybe it's time to make a change like #99, but for numbers?

1. With 1/4 probability, pick a number from [-100, 100].
2. With 3/4 - 1/25 probability, pick a random bit pattern interpreted as a number.
3. With 3/100 probability, pick a power of two + [-3, 3].
4. With 1/100 probability, pick a number from a pre-selected set of tricky numbers that have been behind more than one bug and are not already covered by 1. or 3. Typical CRC constants or prime numbers may appear there.

That could make a sane default for many varied uses. Obviously one could still override the distribution for task-specific needs.
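A rough sketch of the proposed mixture for i32, assuming a tiny hand-rolled `XorShift` PRNG in place of quickcheck's generator. The `TRICKY` list is a hypothetical stand-in for the "pre-selected set" (the CRC-32 polynomial in normal and reflected form, plus two primes), and taking powers of two as 2^0..2^31 with the result wrapped into i32 is also an assumption.

```rust
/// Tiny xorshift64 PRNG (hypothetical helper) so the sketch has no dependencies.
struct XorShift(u64);

impl XorShift {
    fn next_u64(&mut self) -> u64 {
        self.0 ^= self.0 << 13;
        self.0 ^= self.0 >> 7;
        self.0 ^= self.0 << 17;
        self.0
    }
    /// Value in 0..n; the slight modulo bias is irrelevant here.
    fn below(&mut self, n: u64) -> u64 {
        self.next_u64() % n
    }
}

/// Hypothetical "tricky" constants: CRC-32 polynomial (normal, reflected) and two primes.
const TRICKY: [i32; 4] = [0x04C1_1DB7, 0xEDB8_8320_u32 as i32, 65_521, 1_000_000_007];

/// One draw from the proposed four-way mixture.
fn biased_i32(rng: &mut XorShift) -> i32 {
    match rng.below(100) {
        // 1. 25/100: small numbers in [-100, 100].
        0..=24 => rng.below(201) as i32 - 100,
        // 2. 71/100 (= 3/4 - 1/25): a random bit pattern.
        25..=95 => (rng.next_u64() >> 32) as u32 as i32,
        // 3. 3/100: 2^k + [-3, 3] for k in 0..32; `as i32` wraps for k = 31,
        //    so values next to both i32::MAX and i32::MIN are reachable.
        96..=98 => {
            let k = rng.below(32);
            let delta = rng.below(7) as i64 - 3;
            ((1i64 << k) + delta) as i32
        }
        // 4. 1/100: a hand-picked tricky constant.
        _ => TRICKY[rng.below(TRICKY.len() as u64) as usize],
    }
}

fn main() {
    let mut rng = XorShift(0x2545_F491_4F6C_DD1D);
    let values: Vec<i32> = (0..10_000).map(|_| biased_i32(&mut rng)).collect();
    let small = values.iter().filter(|&&v| (-100..=100).contains(&v)).count();
    println!("{small} of 10000 values fall in [-100, 100]");
    println!("first ten: {:?}", &values[..10]);
}
```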

mulkieran · 2017-02-03T13:41:06Z

I would always ask myself what Hypothesis (https://github.com/HypothesisWorks/hypothesis-python) does. But it might not be such a useful question for this problem in this situation.

quark-zju · 2018-06-27T00:43:16Z

The Haskell implementation seems to first decide the maximum number of bits the generated number has, then generate a number within the range (-2^max_bits, 2^max_bits). Doing something similar is possible, but doing it without regressing performance would be tricky.
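A minimal sketch of the bits-first approach as described in this comment (not a port of Haskell QuickCheck's actual code). Choosing `bits` uniformly from 0..=31 is an assumption, as are the hand-rolled `XorShift` PRNG and the `bits_first_i32` name. Because every bit width is equally likely, a sizeable fraction of draws (around a quarter) are small, yet every i32 except i32::MIN stays reachable.

```rust
/// Tiny xorshift64 PRNG (hypothetical helper) so the sketch has no dependencies.
struct XorShift(u64);

impl XorShift {
    fn next_u64(&mut self) -> u64 {
        self.0 ^= self.0 << 13;
        self.0 ^= self.0 >> 7;
        self.0 ^= self.0 << 17;
        self.0
    }
    /// Value in 0..n; the slight modulo bias is irrelevant here.
    fn below(&mut self, n: u64) -> u64 {
        self.next_u64() % n
    }
}

/// Bits-first generation: pick a width, then a value inside that width.
fn bits_first_i32(rng: &mut XorShift) -> i32 {
    // Step 1: choose how many bits the result may use (0..=31).
    let bits = rng.below(32);
    // Step 2: uniform in the open interval (-2^bits, 2^bits),
    // i.e. [-(2^bits - 1), 2^bits - 1]; i32::MIN is never produced.
    let bound = (1i64 << bits) - 1;
    let span = (2 * bound + 1) as u64;
    (rng.below(span) as i64 - bound) as i32
}

fn main() {
    let mut rng = XorShift(0x2545_F491_4F6C_DD1D);
    let values: Vec<i32> = (0..10_000).map(|_| bits_first_i32(&mut rng)).collect();
    let small = values.iter().filter(|&&v| (-100..=100).contains(&v)).count();
    println!("{small} of 10000 values fall in [-100, 100]");
    println!("min = {:?}, max = {:?}", values.iter().min(), values.iter().max());
}
```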

The current workaround is to set the environment variable QUICKCHECK_GENERATOR_SIZE=18446744073709551615 (usize::MAX) on 64-bit platforms. I'm not sure how feasible it is to change the default gen.size() from 100 to usize::MAX, but it does not seem to hurt performance, at least.

maxbla · 2019-08-12T07:18:26Z

@BurntSushi I'd like to hear your thoughts on the following proposals addressing this issue:

- Ensure that every valid value* of each integer type has some positive probability of being returned by Arbitrary.
- Completely ignore the size of Gen (for integer types).
- Give problematic values a greater chance of being picked.
  - The most problematic values for any type are probably 0, MAX_VALUE and MIN_VALUE, but I could see -1 and 1 being included in this list, along with all numbers at "bit transitions", i.e. 3, 4, 7, 8, 15, 16.
- Use a uniform distribution (apart from the problematic values).
  - Another candidate distribution is the geometric/exponential distribution.

*Valid values for an integer type T are all values in T::min_value()..=T::max_value().
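One way the proposed problem values could be enumerated and mixed with a uniform draw, sketched for i32. The 1-in-10 bias, the restriction to non-negative bit transitions (2^k - 1 and 2^k for k in 2..=30), the hand-rolled `XorShift` PRNG, and the names `problem_values_i32` and `arbitrary_i32` are all assumptions made for illustration.

```rust
/// Tiny xorshift64 PRNG (hypothetical helper) so the sketch has no dependencies.
struct XorShift(u64);

impl XorShift {
    fn next_u64(&mut self) -> u64 {
        self.0 ^= self.0 << 13;
        self.0 ^= self.0 >> 7;
        self.0 ^= self.0 << 17;
        self.0
    }
    /// Value in 0..n; the slight modulo bias is irrelevant here.
    fn below(&mut self, n: u64) -> u64 {
        self.next_u64() % n
    }
}

/// 0, 1, -1, MIN, MAX, plus the "bit transitions" 2^k - 1 and 2^k
/// for k in 2..=30 (3, 4, 7, 8, 15, 16, ...).
fn problem_values_i32() -> Vec<i32> {
    let mut v = vec![0, 1, -1, i32::MIN, i32::MAX];
    for k in 2u32..=30 {
        let p = 1i32 << k;
        v.push(p - 1);
        v.push(p);
    }
    v.sort_unstable();
    v.dedup();
    v
}

/// Full range with a (hypothetical) 1-in-10 bias toward problem values.
fn arbitrary_i32(rng: &mut XorShift, problems: &[i32]) -> i32 {
    if rng.below(10) == 0 {
        problems[rng.below(problems.len() as u64) as usize]
    } else {
        // Every i32 bit pattern is equally likely here.
        (rng.next_u64() >> 32) as u32 as i32
    }
}

fn main() {
    let problems = problem_values_i32();
    let mut rng = XorShift(0x2545_F491_4F6C_DD1D);
    let sample: Vec<i32> = (0..10).map(|_| arbitrary_i32(&mut rng, &problems)).collect();
    println!("{} problem values; sample: {:?}", problems.len(), sample);
}
```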

BurntSushi · 2019-08-12T10:50:38Z

@maxbla Thanks for writing that out! I think that all seems pretty reasonable to me.

romanb · 2019-08-12T10:58:43Z

@maxbla Have you seen #221 ? I closed it after ~6 months because there was apparently no interest in merging it, but maybe there is still something useful in there for you, if you want to revive the topic with a new PR, since it implements something very similar to what you propose (the size parameter is not ignored but acts as a weighted bias towards a preferred range).

BurntSushi · 2019-08-12T11:22:05Z

> I closed it after ~6 months because there was apparently no interest in merging it

There is interest. But I maintain dozens of projects, so it can take me a very long time to merge PRs and/or it's very easy for me to lose track of PRs. Editing comments to include status updates probably doesn't help, as was done in #221. I don't get emails when edits are made.

romanb · 2019-08-12T13:18:36Z

> There is interest. But I maintain dozens of projects, so it can take me a very long time to merge PRs and/or it's very easy for me to lose track of PRs. Editing comments to include status updates probably doesn't help, as was done in #221. I don't get emails when edits are made.

No worries and I did not mean to criticize, of course. It's just a matter of my personal PR hygiene that I close unmerged PRs on my own after a while when I've also moved on to other things and no longer intend to keep them updated / mergeable.

maxbla · 2019-08-19T09:56:02Z

@romanb yep, I saw #221, and thanks, it was helpful in creating #240.

This commit tweaks the Arbitrary impls of number types (integers, floats) to use the full range with a small bias toward "problem" values. This is a change from prior behavior that would use the `size` parameter to control the range of integers. In retrospect, using the `size` parameter this way was probably misguided. It should only be used to control the sizes of data structures, not also to constrain numeric ranges. By constraining numeric ranges, we leave out a huge space of values that are never tested. Fixes #27, Fixes #119, Fixes #190, Fixes #233, Closes #240
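Under this scheme `size` only bounds container sizes. A minimal sketch of that split for a `Vec<i32>`, assuming a hand-rolled `XorShift` PRNG and a hypothetical `arbitrary_vec` helper rather than quickcheck's real `Vec<T>` impl: the length is drawn from 0..=size, while each element uses the full i32 range.

```rust
/// Tiny xorshift64 PRNG (hypothetical helper) so the sketch has no dependencies.
struct XorShift(u64);

impl XorShift {
    fn next_u64(&mut self) -> u64 {
        self.0 ^= self.0 << 13;
        self.0 ^= self.0 >> 7;
        self.0 ^= self.0 << 17;
        self.0
    }
    /// Value in 0..n; the slight modulo bias is irrelevant here.
    fn below(&mut self, n: u64) -> u64 {
        self.next_u64() % n
    }
}

/// `size` bounds only the length; elements use the full i32 range.
fn arbitrary_vec(rng: &mut XorShift, size: usize) -> Vec<i32> {
    let len = rng.below(size as u64 + 1) as usize;
    (0..len).map(|_| (rng.next_u64() >> 32) as u32 as i32).collect()
}

fn main() {
    let mut rng = XorShift(0x2545_F491_4F6C_DD1D);
    for _ in 0..3 {
        let v = arbitrary_vec(&mut rng, 10);
        println!("len {}: {:?}", v.len(), v);
    }
}
```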

BurntSushi mentioned this issue Jan 31, 2016: Bias generated chars (#99) #116 (Merged)

This was referenced Aug 13, 2016:
Various changes rust-lang/compiler-builtins#30 (Merged)
Quickcheck and test coverage rust-lang/compiler-builtins#31 (Closed)

DurhamG mentioned this issue Jun 30, 2018: Selecting values for usize #190 (Closed)

romanb mentioned this issue Nov 30, 2018: Expand the ranges of arbitrary integers. #221 (Closed)

maxbla mentioned this issue Aug 19, 2019: make arbitrary generate full range of integers #240 (Closed)

BurntSushi closed this as completed in 71d743a Dec 27, 2020
