Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Improve shuffle function documentation #903

Closed
nliberg opened this issue Oct 28, 2019 · 6 comments
Closed

Improve shuffle function documentation #903

nliberg opened this issue Oct 28, 2019 · 6 comments

Comments

@nliberg
Copy link

nliberg commented Oct 28, 2019

In order for all shufflings to be possible "the number of states of the PRNG should exceed the number of permutations by at least several orders of magnitude" (Wikipedia source). I.e. for a slice consisting of n elements the PRNG state bits should ideally exceed log2(n!). Perhaps it could be helpful to people to mention this in the documentation of the shuffle functions.

@vks
Copy link
Collaborator

vks commented Oct 28, 2019

for a slice consisting of n elements the PRNG state bits should ideally exceed log2(n!).

I'm not sure how relevant it is to make all permutations possible. In practice, it would be very hard to show that some permutations are not possible, it would basically require iterating all possible states of the RNG, which is not really feasible for 128 bits of state or more.

@nliberg
Copy link
Author

nliberg commented Oct 28, 2019

I'm not sure how relevant it is to make all permutations possible. In practice, it would be very hard to show that some permutations are not possible, it would basically require iterating all possible states of the RNG, which is not really feasible for 128 bits of state or more.

I was thinking more of RNGs with few enough bits to make the states possible to enumerate.
Feel free to close this if this seems like a non-issue in practice.

@vks
Copy link
Collaborator

vks commented Oct 28, 2019

I was thinking more of RNGs with few enough bits to make the states possible to enumerate.

The default RNG within Rand has 1088 bits of state. Even SmallRng (which is behind a feature flag) has at least 64 bits of state. For an RNG with 32 bits of state, enumerating all states is easy, but I don't think this is a concern for the RNGs provided by Rand. RNGs with only 32 bits of state should not be used unless you know what you are doing, in which case I'm not sure how useful a warning would be.

(When massive parallelization over many CPUs is involved, a state size of at least 256 bits has a more comfortable margin than 64 bits.)

@dhardy
Copy link
Member

dhardy commented Oct 29, 2019

I was thinking more of RNGs with few enough bits to make the states possible to enumerate.

This is the real concern: if your RNG has a short enough cycle that you can loop through all values, then it is possible to observe non-random behaviour (i.e. correlations between one part of a sequence and another). Note that n bits of state guarantees no more than 2^n steps in that cycle, the reverse is not true (some RNGs are chaotic and have sequences with very short cycles; some other RNGs iterate through all or nearly all possible state permutations).

So how long a cycle is enough? Attempting to generate all 52! permutations of your stack of cards will take a modern computer a long time; too long to be feasible. Modern CPUs run at a few GHz, say approx 3e7 × 3e9 ~= 1e17 clock cycles per year. 52! ~= 8e67, so it would take over 10^50 years at one permutation per clock cycle. On the other hand, an RNG with 64 bits of state has approx 2e19 states; still too long to exhaust, but short enough that if you pick many seed for many short pieces of output, there's a chance of overlap more on that here. Which brings us to...

Even SmallRng (which is behind a feature flag) has at least 64 bits of state.

Not actually true if you check the rand_pcg code; both implementations use 128 bits of state. Check here for our requirements for small RNGs.

As for that wikipedia article, it makes the assumption that the RNG state must be large enough to describe all permutations to avoid bias. This is simply wrong; generating one permutation requires enough unbiased random data to describe one permutation (i.e. less than n^2 numbers), not all of them.

@dhardy dhardy closed this as completed Oct 29, 2019
@vks
Copy link
Collaborator

vks commented Oct 29, 2019

Not actually true if you check the rand_pcg code; both implementations use 128 bits of state. Check here for our requirements for small RNGs.

I was looking at the Rand book, where it is claimed that SmallRng has a period of at least 2^64. Is that inaccurate?

@dhardy
Copy link
Member

dhardy commented Oct 29, 2019

That is accurate. Nevertheless, the state is 128 bits (more accurately, 127 bits since 1 is discarded). This influences the sequence, reducing the chance of two randomly seeded instances producing the same sequence.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants