Conversation

@vertexclique
Contributor

@vertexclique vertexclique commented Nov 11, 2020

This PR fixes all unreproducible benchmarks and tests.

  1. how is the current code broken?
    Currently, the code reseeds after 32 MB of data, and every thread has different randomness. So no dataset is the same as another, and results are totally different in different use cases.
  2. how is this pr solving it?
    By introducing code that uses a fixed seed for all threads at all times, so all threads are at the same point in the random sequence at any given moment of a single benchmark.
  3. why do we need to take another dependency (i.e. why doesn't random work)?
    rand uses thread_rng, which does not work for the reason mentioned in 1, and thread_rng also XORs with a seed taken from the thread itself. So it is mostly random at any given time, but not consistent across concurrent workloads like benchmarks. The dependency is the Bastion runtime's utility crate, so the runtime is already using it.

Seeing that we are not using rand much there, we could actually remove rand entirely with this PR too.

Member

@jorgecarleitao jorgecarleitao left a comment

@vertexclique thanks a lot for the PR.

Could you please explain in the PR description:

  1. how is the current code broken?
  2. how is this PR solving it
  3. why do we need to take another dependency (i.e. why doesn't random work)?

It is really difficult for me to even understand what the issue is in the first place.

@vertexclique
Contributor Author

Answered

@alamb
Contributor

alamb commented Nov 11, 2020

I am checking / testing this PR out locally.

@alamb
Contributor

alamb commented Nov 11, 2020

In my measurements, the sum compute kernels do appear to have significant variability from run to run on my machine (details below). The variability still exists with this PR though it appears to be less.

I think one issue, as @vertexclique has explained, is that different random numbers are being used between runs (and between threads), and thus the actual computations performed from run to run are changing.

Seeding the random number generator so it always produces the same sequence of random numbers is definitely a classic way to reduce such variability and seems like a good idea to me.

Seeding the random number generators can be done using the existing rand crate, in the following way (this took me longer to figure out from the rand crate's documentation than I would like to admit):

Instead of

let mut rng = rand::thread_rng();

Use

use rand::{
    Rng, SeedableRng,
    rngs::StdRng
};
let mut rng = StdRng::seed_from_u64(42);
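
To illustrate the property we are after, here is a std-only sketch. It uses a splitmix64-style generator purely as a stand-in (rand's StdRng uses a different algorithm internally): two generators started from the same seed produce the identical sequence on every run, which is exactly what the benchmarks need.

```rust
// Stand-in PRNG: splitmix64. This is NOT the algorithm rand's StdRng
// uses; it only demonstrates that a fixed seed fixes the whole sequence.
fn splitmix64(state: &mut u64) -> u64 {
    *state = state.wrapping_add(0x9E37_79B9_7F4A_7C15);
    let mut z = *state;
    z = (z ^ (z >> 30)).wrapping_mul(0xBF58_476D_1CE4_E5B9);
    z = (z ^ (z >> 27)).wrapping_mul(0x94D0_49BB_1331_11EB);
    z ^ (z >> 31)
}

fn main() {
    // Two independent generators, same seed: identical output every time.
    let (mut a, mut b) = (42u64, 42u64);
    let xs: Vec<u64> = (0..3).map(|_| splitmix64(&mut a)).collect();
    let ys: Vec<u64> = (0..3).map(|_| splitmix64(&mut b)).collect();
    assert_eq!(xs, ys);
    println!("{:?}", xs);
}
```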

Here are some measurements I ran on my machine:

Master @ f7027b4

sum 512                 time:   [452.64 ns 456.16 ns 459.94 ns]
sum 512                 time:   [459.08 ns 462.78 ns 466.55 ns]
sum 512                 time:   [457.80 ns 461.87 ns 466.06 ns]
sum nulls 512           time:   [246.69 ns 248.65 ns 250.87 ns]
sum nulls 512           time:   [269.41 ns 271.79 ns 274.21 ns]
sum nulls 512           time:   [247.98 ns 250.42 ns 252.92 ns]

ARROW-10551-fix-unreproducible-benches

sum 512                 time:   [476.98 ns 482.19 ns 488.17 ns]
sum 512                 time:   [470.75 ns 474.96 ns 479.42 ns]
sum 512                 time:   [506.47 ns 508.00 ns 509.59 ns]
sum nulls 512           time:   [268.39 ns 270.32 ns 272.56 ns]
sum nulls 512           time:   [272.20 ns 274.46 ns 276.81 ns]
sum nulls 512           time:   [266.60 ns 269.12 ns 272.28 ns]

Master @ f7027b4 w/ StdRng::seed_from_u64:

sum 512                 time:   [463.24 ns 467.51 ns 472.28 ns]
sum 512                 time:   [457.42 ns 460.00 ns 462.66 ns]
sum 512                 time:   [466.96 ns 471.60 ns 476.52 ns]
sum nulls 512           time:   [236.96 ns 238.34 ns 239.76 ns]
sum nulls 512           time:   [241.24 ns 243.24 ns 245.50 ns]
sum nulls 512           time:   [246.68 ns 248.60 ns 250.61 ns]

@jorgecarleitao
Member

jorgecarleitao commented Nov 11, 2020

Currently, the code reseeds after 32 MB of data, and every thread has different randomness. So no dataset is the same as another, and results are totally different in different use cases.

What I am trying to understand is why this is a concern: in the benchmarks, the data is generated only once, during the setup and not on every iteration of the benchmark. Thus, any randomness will be frozen for every iteration of the benchmark.

If the argument is that two runs of the same benchmark do not agree due to randomness, then I do not understand how having the same seed per thread or different seeds per thread changes this behavior: the seed will still be set when the process starts and will be different per run.

EDIT: @alamb beat me by seconds (and with evidence)

@alamb
Contributor

alamb commented Nov 11, 2020

So my personal suggestion is to change all the benches to use seed_from_u64 or equivalent. I don't think there is any need for an additional dependency -- https://crates.io/crates/rand seems much more popular than https://crates.io/crates/bastion-utils.

@vertexclique
Contributor Author

Can you use https://github.com/vertexclique/zor/blob/master/zor if you are on Linux?

@vertexclique
Contributor Author

vertexclique commented Nov 11, 2020

If the argument is that two runs of the same benchmark do not agree due to randomness, then I do not understand how having the same seed per thread or different seeds per thread changes this behavior: the seed will still be set when the process starts and will be different per run.

I don't think it should be different per run per benchmark; that's why it is varying.
https://github.com/bheisler/criterion.rs/blob/4499ee5e32f59c0c15eb046dd7a2dfd0c15a96f7/src/stats/univariate/sample.rs#L210

Seeding the random number generators can be done using the existing rand crate, in the following way (this took me longer to figure out from the rand crate's documentation than I would like to admit):

Yeah, I didn't want to hard-code a compile-time const seed and have people forget to use the same seed every time. My implementation also does some extra, faster things compared to rand. Here:
https://docs.rs/bastion-utils/0.3.2/src/bastion_utils/math.rs.html#4-28

But I am ok with that too if we won't forget it in benches 😄

@alamb
Contributor

alamb commented Nov 11, 2020

Can you use https://github.com/vertexclique/zor/blob/master/zor if you are on Linux?

I am running on macOS. I took a look at zor, and it seems like a good tool for making results reproducible by clearing kernel caches, etc. I wonder if you can try the seeded random approach with zor to see if that is as reproducible.

Yeah, I didn't want to hard-code a compile-time const seed and have people forget to use the same seed every time.

That is a good point -- maybe we can make a utility function or something for the tests to use. I don't think the actual value of the seed (e.g. 42) is important to keep consistent in all the .rs sources -- just that it is a constant number across invocations of cargo bench.
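
For example, such a helper could look like the following std-only sketch. BENCH_SEED and BenchRng are hypothetical names, and the plain LCG is only a stand-in for rand's StdRng, to show the shape of the utility rather than the real implementation:

```rust
/// Hypothetical shared constant so every bench pulls the same seed.
pub const BENCH_SEED: u64 = 42;

/// Minimal seeded generator (a plain LCG, not rand's StdRng), used here
/// only to sketch the "one utility function for all benches" pattern.
pub struct BenchRng(u64);

impl BenchRng {
    /// What the benches would call instead of rand::thread_rng().
    pub fn new() -> Self {
        BenchRng(BENCH_SEED)
    }

    pub fn next_u64(&mut self) -> u64 {
        // Multiplier/increment from Knuth's MMIX LCG.
        self.0 = self
            .0
            .wrapping_mul(6364136223846793005)
            .wrapping_add(1442695040888963407);
        self.0
    }
}

fn main() {
    // Every bench that calls BenchRng::new() sees the same data.
    let mut rng = BenchRng::new();
    let data: Vec<u64> = (0..4).map(|_| rng.next_u64()).collect();
    println!("{:?}", data);
}
```

Because the seed lives in one place, nobody has to remember the magic number in each .rs file.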

My implementation does some extra faster stuff compared to rand too. Here:
https://docs.rs/bastion-utils/0.3.2/src/bastion_utils/math.rs.html#4-28

That looks cool, but I am not enough of a numerical algorithms expert to evaluate the randomness properties of that algorithm.

@vertexclique
Contributor Author

I wonder if you can try using the seeded random approach with zor to see if that is as reproducible.

will do.

That looks cool, but I am not enough of a numerical algorithms expert to evaluate the randomness properties of that algorithm.

Don't worry. I will check whether it behaves as you reported; then I will change to a constant seed with a seedable RNG.

@vertexclique
Contributor Author

@alamb I am unsure about the seedable_rng behavior right now, but let's go with this one. You can check the results here:
https://gist.github.com/vertexclique/44158100c6c019d3d6f02ef94606d91a

@vertexclique force-pushed the ARROW-10551-fix-unreproducible-benches branch 2 times, most recently from 6b72627 to 494bd8d on November 11, 2020 23:20
@vertexclique force-pushed the ARROW-10551-fix-unreproducible-benches branch from 494bd8d to 12ae5dc on November 11, 2020 23:21
@vertexclique
Contributor Author

Is anything blocking this one with the current status quo?

Contributor

@alamb alamb left a comment


Looks good to me!

@alamb alamb changed the title ARROW-10551: [Rust] Fix unreproducible benches ARROW-10551: [Rust] Fix unreproducible benches by seeding random number generator Nov 12, 2020
@alamb alamb closed this in 7a770d7 Nov 12, 2020
@alamb
Contributor

alamb commented Nov 12, 2020

Thanks again for this @vertexclique -- it is much appreciated
