Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

rand::rngs::SmallRng returns the same output for different seeds for at least ARM hosts. #1032

Closed
cr1901 opened this issue Aug 29, 2020 · 5 comments

Comments

@cr1901
Copy link

cr1901 commented Aug 29, 2020

Problem: As part of testing rust cross-compilation on a variety of hosts, I've found that rand::rngs::SmallRng returns the same output for different seeds on a armv7-unknown-linux-gnueabihf host. I'm not sure if this applies to other ARM hosts, even of the same architecture version (I'm using an ASUS TinkerBoard in this example). Nor am I sure this applies to other architectures; x86_64 Windows and Linux seem fine.

Quick solution: Compile on hosts besides armv7-unknown-linux-gnueabihf :).

Details: Consider the following program, with the small_rng feature enabled. I've created a small repo for convenience:

use std::{
    sync::atomic::{AtomicUsize, Ordering},
    time::{SystemTime, UNIX_EPOCH},
};

use rand::{Rng, SeedableRng};

fn random_ident() -> (u64, u64, [u8; 16], String) {
    static CALL_COUNT: AtomicUsize = AtomicUsize::new(0);

    let secs = SystemTime::now()
        .duration_since(UNIX_EPOCH)
        .unwrap()
        .as_secs();

    let count: u64 = CALL_COUNT.fetch_add(1, Ordering::SeqCst) as u64;
    let mut seed: [u8; 16] = [0; 16];

    for (i, v) in seed.iter_mut().take(8).enumerate() {
        *v = ((secs >> (i * 8)) & 0xFF) as u8
    }

    for (i, v) in seed.iter_mut().skip(8).enumerate() {
        *v = ((count >> (i * 8)) & 0xFF) as u8
    }

    let mut rng = rand::rngs::SmallRng::from_seed(seed);
    (secs, count, seed, (0..16)
        .map(|i| {
            if i == 0 || rng.gen() {
                ('a' as u8 + rng.gen::<u8>() % 25) as char
            } else {
                ('0' as u8 + rng.gen::<u8>() % 10) as char
            }
        })
        .collect::<String>())
}

fn main() {
    println!("{:?}", random_ident());
    println!("{:?}", random_ident());
    println!("{:?}", random_ident());
}

On x86_64-unknown-linux-gnu (and Windows for that matter), this small program returns two different random identifiers for different seeds, as expected:

william@xubuntu-dtrain:~/Projects/debug/rand-bug$ cargo run
    Finished dev [unoptimized + debuginfo] target(s) in 0.01s
     Running `target/debug/rand-bug`
(1598719239, 0, [7, 133, 74, 95, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0], "puetn2yx9hil9580") 
(1598719239, 1, [7, 133, 74, 95, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0], "wdhp4fvs75a53km8") 
(1598719239, 2, [7, 133, 74, 95, 0, 0, 0, 0, 2, 0, 0, 0, 0, 0, 0, 0], "l1uu7f619wu97iu6")
william@xubuntu-dtrain:~/Projects/debug/rand-bug$ rustc -Vv
rustc 1.47.0-nightly (09f4c9f50 2020-08-07)
binary: rustc
commit-hash: 09f4c9f5082f78b0adcee83d3ab4337e000cd28e
commit-date: 2020-08-07
host: x86_64-unknown-linux-gnu
release: 1.47.0-nightly
LLVM version: 10.0

On the other hand, on an armv7-unknown-linux-gnueabihf host, the same random identifier is emitted twice for different seeds, before emitting a new identifier when a third seed is used:

wjones@DietPi:~/src/debug/rand-bug$ cargo run
    Finished dev [unoptimized + debuginfo] target(s) in 0.06s
     Running `target/debug/rand-bug`
(1598719307, 0, [75, 133, 74, 95, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0], "v0go31yx18533ls1")
(1598719307, 1, [75, 133, 74, 95, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0], "v0go31yx18533ls1")
(1598719307, 2, [75, 133, 74, 95, 0, 0, 0, 0, 2, 0, 0, 0, 0, 0, 0, 0], "jmu731eti715bgc2")
wjones@DietPi:~/src/debug/rand-bug$ rustc -Vv
rustc 1.46.0-nightly (feb3536eb 2020-06-09)
binary: rustc
commit-hash: feb3536eba10c2e4585d066629598f03d5ddc7c6
commit-date: 2020-06-09
host: armv7-unknown-linux-gnueabihf
release: 1.46.0-nightly
LLVM version: 10.0
wjones@DietPi:~/src/debug/rand-bug$

While my use case will eventually be replaced with a more robust solution, this currently prevents me from compiling my safe Rust embedded code on ARM hosts; safe interrupts rely on being unable to call interrupt functions in application code, and random identifiers is one way to do this.

@dhardy
Copy link
Member

dhardy commented Aug 29, 2020

If you want a simple, robust solution, I'd suggest using ChaCha8 which is a weak cryptographic RNG.

What we're seeing here is that two very similar seeds result in very similar streams: #907. We're likely to replace SmallRng soon (or maybe just remove it).

But why are you using such a poor seed? Many simple RNGs struggle with seeds containing many zeros. This won't be an issue with ChaCha and in general with cryptographic RNGs, but is with many non-crypto RNGs. If you can't use getrandom / from_entropy or rand_jitter I'd suggest using some type of hash function on the input: SeedableRng::seed_from_u64 is a very simple hash function; rand_seeder is a better one.

@cr1901
Copy link
Author

cr1901 commented Aug 29, 2020

But why are you using such a poor seed?

I ported the code from another library; I wouldn't know a good seed from a poor one, nor how to use any of the Rust RNGs, really. I'll look into your suggestions.

@dhardy
Copy link
Member

dhardy commented Aug 30, 2020

There's a thing called the Hamming weight: the number of bits which are one. When this count is close to zero or close to the total number of bits, many basic PRNGs perform poorly. (Of course there may be more reasons.)

I might as well close this then. We already have an issue about replacing the algorithm.

@dhardy dhardy closed this as completed Aug 30, 2020
@cr1901
Copy link
Author

cr1901 commented Aug 30, 2020

@dhardy

I might as well close this then. We already have an issue about replacing the algorithm.

This also isn't a bug anyway. I didn't read the docs:

The current algorithm is Pcg64Mcg on 64-bit platforms and Pcg32 on 32-bit platforms.

Going to the Pcg64Mcg page and opening the SeedableRng trait docs:

We use a single 126-bit seed to initialise the state and select a stream. Two seed bits (lowest order of last byte) are ignored.

Fair enough. Is the "last byte" seed[0] or seed[15] (just curious- endianness)?

Contrast to the Pcg32 docs for the same:

We use a single 127-bit seed to initialise the state and select a stream. One seed bit (lowest bit of seed[8]) is ignored.

On 32-bit systems, that one particular bit in seed[8] is the least-significant bit of count :). So I think a stop-gap is to shift count by 1 to the left.

@dhardy
Copy link
Member

dhardy commented Aug 30, 2020

You can check here and here.

IIRC there is actually a discrepancy between the C and C++ implementations of PCG with regards to these 1/2 last bit(s), so we just picked one. Most tests don't catch the discrepancy.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants