Conversation

@vertexclique
Contributor

@vertexclique vertexclique commented Nov 11, 2020

This PR fixes all unreproducible benchmarks and tests.

  1. how is the current code broken?
    Currently, the code reseeds after 32 MB of data, and every thread has different randomness. So no dataset is the same as another, and results are totally different in different use cases.
  2. how is this pr solving it?
    By introducing code that uses a fixed seed for all threads at all times, so all threads are at the same point in the random sequence at any given moment of a single benchmark.
  3. why do we need to take another dependency (i.e. why doesn't random work)?
    rand uses thread_rng, which does not work for the reason mentioned in 1, and thread_rng also XORs with a seed taken from the thread itself. So it is mostly random at any given time, but not consistent across concurrent workloads like benchmarks. The dependency is the Bastion runtime's utility crate, so the runtime is already using it.

Seeing that we are not using rand much there, we could actually remove rand entirely with this PR too.

Member

@jorgecarleitao jorgecarleitao left a comment

@vertexclique thanks a lot for the PR.

Could you please explain in the PR description:

  1. how is the current code broken?
  2. how is this PR solving it
  3. why do we need to take another dependency (i.e. why doesn't random work)?

It is really difficult for me to even understand what the issue is in the first place.

@vertexclique
Contributor Author

Answered

@alamb
Contributor

alamb commented Nov 11, 2020

I am checking / testing this PR out locally.

@alamb
Contributor

alamb commented Nov 11, 2020

In my measurements, the sum compute kernels do appear to have significant variability from run to run on my machine (details below). The variability still exists with this PR though it appears to be less.

I think one issue, as @vertexclique has explained, is that different random numbers are being used between runs (and between threads), and thus the actual computations performed from run to run are changing.

Seeding the random number generator so it always produces the same sequence of random numbers is definitely a classic way to reduce such variability and seems like a good idea to me.

Seeding the random number generators can be done using the existing rand crate, in the following way (this took me longer to figure out from the rand crate's documentation than I would like to admit):

Instead of

let mut rng = rand::thread_rng();

Use

use rand::{
    Rng, SeedableRng,
    rngs::StdRng
};
let mut rng = StdRng::seed_from_u64(42);
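
To illustrate the property we are after, here is a std-only sketch. It uses a splitmix64-style generator purely as a stand-in (rand's StdRng uses a different algorithm internally): two generators started from the same seed produce the identical sequence on every run, which is exactly what the benchmarks need.

```rust
// Stand-in PRNG: splitmix64. This is NOT the algorithm rand's StdRng
// uses; it only demonstrates that a fixed seed fixes the whole sequence.
fn splitmix64(state: &mut u64) -> u64 {
    *state = state.wrapping_add(0x9E37_79B9_7F4A_7C15);
    let mut z = *state;
    z = (z ^ (z >> 30)).wrapping_mul(0xBF58_476D_1CE4_E5B9);
    z = (z ^ (z >> 27)).wrapping_mul(0x94D0_49BB_1331_11EB);
    z ^ (z >> 31)
}

fn main() {
    // Two independent generators, same seed: identical output every time.
    let (mut a, mut b) = (42u64, 42u64);
    let xs: Vec<u64> = (0..3).map(|_| splitmix64(&mut a)).collect();
    let ys: Vec<u64> = (0..3).map(|_| splitmix64(&mut b)).collect();
    assert_eq!(xs, ys);
    println!("{:?}", xs);
}
```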

Here are some measurements I ran on my machine:

Master @ f7027b4

sum 512                 time:   [452.64 ns 456.16 ns 459.94 ns]
sum 512                 time:   [459.08 ns 462.78 ns 466.55 ns]
sum 512                 time:   [457.80 ns 461.87 ns 466.06 ns]
sum nulls 512           time:   [246.69 ns 248.65 ns 250.87 ns]
sum nulls 512           time:   [269.41 ns 271.79 ns 274.21 ns]
sum nulls 512           time:   [247.98 ns 250.42 ns 252.92 ns]

ARROW-10551-fix-unreproducible-benches

sum 512                 time:   [476.98 ns 482.19 ns 488.17 ns]
sum 512                 time:   [470.75 ns 474.96 ns 479.42 ns]
sum 512                 time:   [506.47 ns 508.00 ns 509.59 ns]
sum nulls 512           time:   [268.39 ns 270.32 ns 272.56 ns]
sum nulls 512           time:   [272.20 ns 274.46 ns 276.81 ns]
sum nulls 512           time:   [266.60 ns 269.12 ns 272.28 ns]

Master @ f7027b4 w/ StdRng::seed_from_u64:

sum 512                 time:   [463.24 ns 467.51 ns 472.28 ns]
sum 512                 time:   [457.42 ns 460.00 ns 462.66 ns]
sum 512                 time:   [466.96 ns 471.60 ns 476.52 ns]
sum nulls 512           time:   [236.96 ns 238.34 ns 239.76 ns]
sum nulls 512           time:   [241.24 ns 243.24 ns 245.50 ns]
sum nulls 512           time:   [246.68 ns 248.60 ns 250.61 ns]

@jorgecarleitao
Member

jorgecarleitao commented Nov 11, 2020

Currently, the code reseeds after 32 MB of data, and every thread has different randomness. So no dataset is the same as another, and results are totally different in different use cases.

What I am trying to understand is why this is a concern: in the benchmarks, the data is generated only once, during the setup and not on every iteration of the benchmark. Thus, any randomness will be frozen for every iteration of the benchmark.

If the argument is that two runs of the same benchmark do not agree due to randomness, then I do not understand how having the same seed per thread or different seeds per thread changes this behavior: the seed will still be set when the process starts and will be different per run.

EDIT: @alamb beat me by seconds (and with evidence)

@alamb
Contributor

alamb commented Nov 11, 2020

So my personal suggestion is to change all the benches to use seed_from_u64 or equivalent. I don't think there is any need for an additional dependency -- https://crates.io/crates/rand seems much more popular than https://crates.io/crates/bastion-utils.

@vertexclique
Contributor Author

Can you use https://github.com/vertexclique/zor/blob/master/zor if you are on Linux?

@vertexclique
Contributor Author

vertexclique commented Nov 11, 2020

If the argument is that two runs of the same benchmark do not agree due to randomness, then I do not understand how having the same seed per thread or different seeds per thread changes this behavior: the seed will still be set when the process starts and will be different per run.

I don't think it should be different per run per benchmark; that's why it is varying.
https://github.com/bheisler/criterion.rs/blob/4499ee5e32f59c0c15eb046dd7a2dfd0c15a96f7/src/stats/univariate/sample.rs#L210

Seeding the random number generators can be done using the existing rand crate, in the following way (this took me longer to figure out from the rand crate's documentation than I would like to admit):

Yeah, I didn't want to hard-code a compile-time const seed and have people forget to use the same seed every time. My implementation also does some extra, faster things compared to rand. Here:
https://docs.rs/bastion-utils/0.3.2/src/bastion_utils/math.rs.html#4-28

But I am ok with that too if we won't forget it in benches 😄

@alamb
Contributor

alamb commented Nov 11, 2020

Can you use https://github.com/vertexclique/zor/blob/master/zor if you are on Linux?

I am running on macOS. I took a look at zor, and it seems like a good tool for making results reproducible by clearing kernel caches, etc. I wonder if you can try the seeded random approach with zor to see if that is as reproducible.

Yeah, I didn't want to hard-code a compile-time const seed and have people forget to use the same seed every time.

That is a good point -- maybe we can make a utility function or something for the tests to use. I don't think the actual value of the seed (e.g. 42) is important to keep consistent in all the .rs sources -- just that it is a constant number across invocations of cargo bench.
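
For example, such a helper could look like the following std-only sketch. BENCH_SEED and BenchRng are hypothetical names, and the plain LCG is only a stand-in for rand's StdRng, to show the shape of the utility rather than the real implementation:

```rust
/// Hypothetical shared constant so every bench pulls the same seed.
pub const BENCH_SEED: u64 = 42;

/// Minimal seeded generator (a plain LCG, not rand's StdRng), used here
/// only to sketch the "one utility function for all benches" pattern.
pub struct BenchRng(u64);

impl BenchRng {
    /// What the benches would call instead of rand::thread_rng().
    pub fn new() -> Self {
        BenchRng(BENCH_SEED)
    }

    pub fn next_u64(&mut self) -> u64 {
        // Multiplier/increment from Knuth's MMIX LCG.
        self.0 = self
            .0
            .wrapping_mul(6364136223846793005)
            .wrapping_add(1442695040888963407);
        self.0
    }
}

fn main() {
    // Every bench that calls BenchRng::new() sees the same data.
    let mut rng = BenchRng::new();
    let data: Vec<u64> = (0..4).map(|_| rng.next_u64()).collect();
    println!("{:?}", data);
}
```

Because the seed lives in one place, nobody has to remember the magic number in each .rs file.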

My implementation does some extra faster stuff compared to rand too. Here:
https://docs.rs/bastion-utils/0.3.2/src/bastion_utils/math.rs.html#4-28

That looks cool, but I am not enough of a numerical algorithms expert to evaluate the randomness properties of that algorithm.

@vertexclique
Contributor Author

I wonder if you can try using the seeded random approach with zor to see if that is as reproducible.

will do.

That looks cool, but I am not enough of a numerical algorithms expert to evaluate the randomness properties of that algorithm.

Don't worry. I will check whether it behaves as you reported; then I will change to a constant seed with a seedable RNG.

@vertexclique
Contributor Author

@alamb I am unsure about the seedable_rng behavior right now, but let's go with this one. You can check the results here:
https://gist.github.com/vertexclique/44158100c6c019d3d6f02ef94606d91a

@vertexclique force-pushed the ARROW-10551-fix-unreproducible-benches branch 2 times, most recently from 6b72627 to 494bd8d on November 11, 2020 23:20
@vertexclique force-pushed the ARROW-10551-fix-unreproducible-benches branch from 494bd8d to 12ae5dc on November 11, 2020 23:21
@vertexclique
Contributor Author

Is anything blocking this one with the current status quo?

Contributor

@alamb alamb left a comment


Looks good to me!

@alamb alamb changed the title ARROW-10551: [Rust] Fix unreproducible benches ARROW-10551: [Rust] Fix unreproducible benches by seeding random number generator Nov 12, 2020
@alamb alamb closed this in 7a770d7 Nov 12, 2020
@alamb
Contributor

alamb commented Nov 12, 2020

Thanks again for this @vertexclique -- it is much appreciated
