reset GLOBAL_RNG state in/out at-testset for reproducibility #24445

rfourquet · 2017-11-02T13:47:11Z

This is a follow-up on the recently introduced guardsrand functionality,
first suggested at
#16940 (comment).
Each testset block is now implicitly wrapped in a guardsrand block, which
is more user-friendly and more systematic (only few users know about
guardsrand).

These are essentially two new features:

"in": in base, tests are run with the global RNG randomly seeded,
but the seed is printed in case of failure to allow reproducing the
failure; but even if the failure occurs after many tests, when they
alter the global RNG state, one has to re-run the whole file, which
can be time-consuming; with this change, at the beginning of each
testset, the global RNG is re-seeded with its own seed: this
allows to re-run only the failing testset, which can be done
easily in the REPL (the seeding occurs for each loop in a testset for;
this also allows to re-arrange testsets in arbitrary order
w.r.t. the global RNG;
"out": a testset leaves no tracks of its use of srand or rand
(this "feature" should be less and less needed/useful with the
generalization of the use of testset blocks).

Example:

@testset begin
   srand(123)
   rand()
   @testset for T in (Int, Float64)
       # here is an implicit srand(123), by 1)
       rand()
   end
   rand() # this value will not be affected if the sub-`testset` block
          # above is removed, or if another rand() call is added therein, by 2)
end

By wrapping @testset bodies in a guardsrand do block, the re-use of a local variable as a for-loop variable started to generate a warning (in "test/sparse/sparse.jl"). It's easy to silence the warning, but I don't know why there was no warning before, and I wonder if this wrapping can have other unwanted consequences. If needed, it would be easy to copy/paste code to avoid the wrapping. EDIT: guardsrand can't be used directly, as then the testset's body is wrapped in a function, which causes problem with overwriting local variable in for loop (outer), using using, defining new methods, etc.

Apologies for the big diffs, github is not always good at spotting indentation-only changes.

fredrikekre · 2017-11-02T13:54:21Z

Apologies for the big diffs, github is not always good at spotting indentation-only changes.

Append ?w=1 to the url: https://github.com/JuliaLang/julia/pull/24445/files?w=1

rfourquet · 2017-11-02T17:57:43Z

I still have a failing test, at "test/linalg/arnoldi.jl:103". I had to set a fresh random seed, which works well on a 64 bits machine, but fails on 32 bits ones. Would someone with easy access to both architectures help me? Run this snippet with changin the srand seed (123 here) until it passes on both (I could use CI, but this is a waste of ressources which I'm relunctant to commit).

@testset "Symmetric generalized with singular B" begin
            srand(123)
            n = 10
            k = 3
            A = randn(n,n); A = A'A
            B = randn(n,k); B = B*B'
            @test sort(eigs(A, B, nev = k, sigma = 1.0)[1]) ≈ sort(eigvals(A, B)[1:k])
        end

yuyichao · 2017-11-02T18:08:18Z

AFAICT 124 works on x86, x64, armv7-a and aarch64

rfourquet · 2017-11-02T18:16:18Z

AFAICT 124 works on x86, x64, armv7-a and aarch64

thanks a lot!

rfourquet · 2017-11-11T11:55:19Z

Any opinion on this one? I find it pretty neat to have this easier reproducibility at the @testset level, but maybe I'm missing a flaw in this approach?

rfourquet · 2017-11-20T12:27:32Z

Bump.

StefanKarpinski · 2017-11-20T14:01:57Z

I'm not really sure what needs to be reviewed here. This just seems to be a broader application in test files of the guardsrand thing that you previously introduced? Tests all passed. What's the risk here? If none and you find this to be a better way to write these tests, then why not?

StefanKarpinski · 2017-11-20T14:04:43Z

Oh I see. This automatically does the equivalent of guardsrand in testsets which is a significant change of behavior to the Test infrastructure. This seems like a potentially good idea, but it's definitely something that needs to be documented and probably discussed a bit more.

rfourquet · 2017-11-20T15:06:04Z

Yes my bump is mainly to generate feed-back and opinions on the idea, as the implementation is rather straightforward. I think it's a good change but I'm not confident enough to take the decision to merge without some support, I may miss some drawback. Essentially this implements what you suggested there, but a bit differently: when a base test fails, the RNG seed is printed, and with this change, to reproduce the failure you only have to do srand(SEED), and then copy-paste the failing testset in your REPL without bothering for the preceding tests (except if they setup some other global state).

rfourquet · 2017-11-25T09:14:41Z

@andreasnoack @kshyatt do you have opinions on this?

rfourquet · 2017-12-14T15:12:53Z

I put the triage label for getting a decision (it's slightly breaking), but I would really like to get this in, as it makes debugging much easier when you get a test error (I had such a use case today). Small recap for busy triage people: the test system prints the RNG seed when there is an error; with this PR, you can call srand with this seed, and then copy paste in your REPL only the failing @testset to reproduce the error, without having to re-run all previous tests in the file (which can use rand themselves and would modify the global RNG state and affect all remaining tests, without this PR).

StefanKarpinski · 2017-12-14T16:06:00Z

I defer to your judgement here, @rfourquet, go ahead and just rebase this, run CI and we'll merge when it passes.

@testset

[ci skip] [av skip] [bsd skip] This is a follow-up on the recently introduced `guardsrand` functionality, first suggested at #16940 (comment). Each `testset` block is now implicitly wrapped in a `guardsrand` block, which is more user-friendly and more systematic (only few users know about `guardsrand`). These are essentially two new features: 1) "in": in base, tests are run with the global RNG randomly seeded, but the seed is printed in case of failure to allow reproducing the failure; but even if the failure occurs after many tests, when they alter the global RNG state, one has to re-run the whole file, which can be time-consuming; with this change, at the beginning of each `testset`, the global RNG is re-seeded with its own seed: this allows to re-run only the failing `testset`, which can be done easily in the REPL (the seeding occurs for each loop in a `testset for`; this also allows to re-arrange `testset`s in arbitrary order w.r.t. the global RNG; 2) "out": a `testset` leaves no tracks of its use of `srand` or `rand` (this "feature" should be less and less needed/useful with the generalization of the use of `testset` blocks). Example: ``` @testset begin srand(123) rand() @testset for T in (Int, Float64) # here is an implicit srand(123), by 1) rand() end rand() # this value will not be affected if the sub-`testset` block # above is removed, or if another rand() call is added therein, by 2) end ``` Note that guardsrand can't be used directly, as then the testset's body is wrapped in a function, which causes problem with overwriting loop variable ("outer"), using `using`, defining new methods, etc. So we need to duplicate guardsrand's logic.

rfourquet · 2017-12-17T17:35:03Z

CI all green, but unfortunately one conflict: it's trivial to resolve (rename a couple copy! to copyto!), so I will re-push the updated branch with [ci skip] and merge right away.
Here are the links to the builds for the record:

timholy · 2017-12-20T13:10:27Z

One negative is that it breaks tests that look like this (observed in Revise.jl):

julia> using Test

julia> @testset begin
           workdir = joinpath(tempdir(), randstring(10))
           @show workdir
           mkdir(workdir)
       end

       @testset begin
           workdir = joinpath(tempdir(), randstring(10))
           @show workdir
           mkdir(workdir)
       end
workdir = "/tmp/xrOdLWavRh"
Test Summary: |
test set      | No tests
workdir = "/tmp/xrOdLWavRh"
test set: Error During Test at REPL[2]:7
  Got an exception of type SystemError outside of a @test
  SystemError (with /tmp/xrOdLWavRh): mkdir: File exists
  Stacktrace:
   [1] #systemerror#44(::String, ::Function, ::Symbol, ::Bool) at ./error.jl:106
   [2] #systemerror at ./<missing>:0 [inlined]
   [3] mkdir(::String, ::UInt16) at ./file.jl:99
   [4] mkdir(::String) at ./file.jl:94
   [5] macro expansion at ./REPL[2]:10 [inlined]
   [6] macro expansion at /home/tim/src/julia-1.0/usr/share/julia/site/v0.7/Test/src/Test.jl:978 [inlined]
   [7] top-level scope at ./REPL[2]:8
   [8] eval at ./boot.jl:291 [inlined]
   [9] eval(::Module, ::Expr) at ./repl/REPL.jl:3
   [10] eval_user_input(::Any, ::Base.REPL.REPLBackend) at ./repl/REPL.jl:70
   [11] macro expansion at /home/tim/.julia/v0.7/Revise/src/Revise.jl:482 [inlined]
   [12] (::getfield(Revise, Symbol("##15#16")){Base.REPL.REPLBackend})() at ./event.jl:92
Test Summary: | Error  Total
test set      |     1      1
ERROR: Some tests did not pass: 0 passed, 0 failed, 1 errored, 0 broken.

rfourquet · 2017-12-20T16:44:28Z

I see. A sort of mental model about the way of this PR is that each time you enter a @testset, you get a reset of the global RNG set, like if you would call twice in a row a C program without calling srand in it. Would it be a problem in your case to either call srand() at the begining of the @testset, or, if you'd prefer to keep reproducibility except for the randstring call, to call randstring(RandomDevice()) ? Yet another possibility would be to share an RNG object M between the 2 @testsets and to call randstring(M, 10). Hope this helps.

StefanKarpinski · 2017-12-20T17:18:11Z

Would it make sense to make the different test sets independent but not have the same seed? I.e. generating more or less random numbers from one test set should not affect another one, but they they would not all start with the same seed. I guess the trouble then is how to pick the seed in a stable way. One option would be to base it on the testset name. Of course, in this case that wouldn't help.

KristofferC · 2017-12-20T17:39:39Z

Something with the location info of the testset? Of course it will change if you move the testset but if you aren't calling srand then you shouldn't rely on getting specific numbers anyway.

rfourquet · 2017-12-20T17:44:34Z

I have no idea yet for a solution. I think it would help to see the Revise problem or a specific instance where the current (new) behavior is a problem (which can't be solved easily). The testset name could inded work in some cases. But I also thought about the location, but this is exactly the problem this PR was trying to solve: to be able to re-arrange your testsets in any order without having your test depending on the global RNG affected, and even to paste it the REPL.

StefanKarpinski · 2017-12-20T17:45:43Z

A hash of the test-set name with a per-name counter might be pretty good since test sets with different names would be isolated from each other and test sets with the same name would not all have the same seed, but their RNG streams would only be determined by their order.

StefanKarpinski · 2017-12-20T17:46:53Z

Location info seems bad since adding or removing a line elsewhere in the file would change it. In that case, you may as well just always use different random seeds but choose a new random seed per testset so that seeding in one doesn't have the effect of inadvertently seeding the following ones.

rfourquet · 2017-12-20T17:52:52Z

But I would like to understand what use case we are trying to solve exactly, which is not addressed by the work-around I mentioned to Tim.

StefanKarpinski · 2017-12-20T18:09:45Z

To do the right thing more often by default – or at least to do something that's not surprising. If someone writes two testset blocks with no random seeds, I think they'd find it rather surprising that they both get the same sequence of random values.

timholy · 2017-12-20T18:10:52Z

It's not hard to work around, see timholy/Revise.jl@109cbd9. I could have alternatively used srand---I actually forgot you can call it without an argument, so based on that mistaken impression I decided to make sure the sequence would be different if I ran my tests twice (which it would not be if I used srand(i) with hard-coded i).

I think the only question is, what's more commonly the desired behavior? For example, should we view the fact that every @testset will generate the same random sequence as a strength or a weakness? (I don't think there's a clear answer here.) On one hand, preventing spurious failures is nice. On the other hand, I used to be able to do for i = 1:100; include("runtests.jl"); end and have extra confidence that there weren't corner cases that would cause test failures. Now that strategy will likely fail to reveal those corner cases, because each one of those iterates will use the exact same random numbers.

StefanKarpinski · 2017-12-20T18:17:57Z

What is the benefit of each testset starting with the same seed? Serious question.

rfourquet · 2017-12-20T18:22:10Z

I must say that I already started to enjoy the new behavior. For example at (55852ee), I feel super confident with this fuzzy tests (not sure it's the right name): it will be run by many different machines, with differenct seeds. And if a user finds a failure, she will give me the printed seed, and I will just be able to reproduce the failure by copy-pasting this single testset, without bothering with all what comes before in the file

For your example with the loop, it's very simple to work-around:

for i = 1:100
    srand(i)
    include("runtests.jl")
end

The big win is if that fails at iteration 90, you can easily start to debug directly at iteration 90.

StefanKarpinski · 2017-12-20T18:23:59Z

@rfourquet, can you spell out what the benefit is? I think putting it in general words (not examples or use cases) will help me at least to figure this situation out.

timholy · 2017-12-20T18:33:51Z

Worth separating two points:

what's the benefit of printing the seed upon failure? (seems quite large!)
what's the benefit of resetting the seed with each test set? (less sure about this one)

If these are coupled, I don't understand why.

rfourquet · 2017-12-20T19:07:58Z

can you spell out what the benefit is? (not examples or use cases)

It's tough question as the main motivation was to improve some use cases of the debugging experience. I will use Tim's separate questions as a starting point (at the risk of repeating myself)

what's the benefit of printing the seed upon failure?

Printing the seed upon failure on base tests, or seeding manually the global RNG before running some tests, allows (except in corner cases, e.g using RandomDevice), in case of failure of the test, to reproduce the failure which helps investigate the reason of it. Basically the alternative till now (seen in some places of base tests) was to run the tests with srand(fixed_seed) at the top of the file to be sure to be able to reproduce a failure. This is extremly limiting. By choosing randomly a seed each time the testsuite is run, and doing implicitly srand(seed) at the top of the test file, we can test algorithms with a wide variety of different inputs, without loosing the safety to track down a failure.

what's the benefit of resetting the seed with each test set?

There can be a benefit only if the tester has a way to know the seed. So it's coupled to your first item. The idea is to insulate each testset from each other. If a testset needs a particular seed to work correctly (like it happens a lot in linalg tests for example), it should state it explicitly. If it depends implicitly on the fact that a testset 20 lines above had a srand(123) in it, then they are coupled and it's more difficult to reproduce the failure.
Let's take a step back: if the global RNG was seeded only once at the beginning of running the whole base test suite (let's assume all files are run sequencially), and there is a failure in testfile 20, it's a nightmare to reproduce the failure, as to re-create the state of the RNG which caused the failure, there is no way around going through all the steps again, which can take many minutes. So my view was that testsets are to a testfile what a testfile is to the whole testsuite. You know the seed, so you know the state of the global RNG when the testset starts.

StefanKarpinski · 2017-12-20T20:14:45Z

What I don't understand is why all test sets need the same seed. Why not just give each testset a different seed but make them independent of each other? If a testset fails, print the seed it failed with so that we know how to reproduce that particular testset failure. Printing this could be conditional on whether the testset has used any random values or not, to avoid printing irrelevant information for non-random testset failures.

rfourquet · 2017-12-20T22:11:04Z

What I don't understand is why all test sets need the same seed

Oh ok, this is just the easiest implementation to keep track of the seed, but I agree this wouldn't need to be the same seed for each testset, as long as the user gets it in the end.

rfourquet added the testsystem The unit testing framework and Test stdlib label Nov 2, 2017

rfourquet force-pushed the rf/testset-seed branch from 27d1ef5 to dffee11 Compare November 2, 2017 16:18

rfourquet added the triage This should be discussed on a triage call label Dec 13, 2017

StefanKarpinski removed the triage This should be discussed on a triage call label Dec 14, 2017

StefanKarpinski added this to the 1.0 milestone Dec 14, 2017

StefanKarpinski assigned rfourquet Dec 14, 2017

rfourquet force-pushed the rf/testset-seed branch 4 times, most recently from 81b143e to 47a21f5 Compare December 17, 2017 09:00

rfourquet force-pushed the rf/testset-seed branch from 47a21f5 to 3eac481 Compare December 17, 2017 17:35

rfourquet merged commit a2c97c8 into master Dec 17, 2017

rfourquet deleted the rf/testset-seed branch December 17, 2017 17:35

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

reset GLOBAL_RNG state in/out at-testset for reproducibility #24445

reset GLOBAL_RNG state in/out at-testset for reproducibility #24445

rfourquet commented Nov 2, 2017 •

edited

Loading

fredrikekre commented Nov 2, 2017

rfourquet commented Nov 2, 2017

yuyichao commented Nov 2, 2017

rfourquet commented Nov 2, 2017

rfourquet commented Nov 11, 2017

rfourquet commented Nov 20, 2017

StefanKarpinski commented Nov 20, 2017

StefanKarpinski commented Nov 20, 2017

rfourquet commented Nov 20, 2017

rfourquet commented Nov 25, 2017

rfourquet commented Dec 14, 2017

StefanKarpinski commented Dec 14, 2017

rfourquet commented Dec 17, 2017

timholy commented Dec 20, 2017

rfourquet commented Dec 20, 2017

StefanKarpinski commented Dec 20, 2017

KristofferC commented Dec 20, 2017

rfourquet commented Dec 20, 2017

StefanKarpinski commented Dec 20, 2017

StefanKarpinski commented Dec 20, 2017

rfourquet commented Dec 20, 2017

StefanKarpinski commented Dec 20, 2017 •

edited

Loading

timholy commented Dec 20, 2017

StefanKarpinski commented Dec 20, 2017

rfourquet commented Dec 20, 2017

StefanKarpinski commented Dec 20, 2017

timholy commented Dec 20, 2017

rfourquet commented Dec 20, 2017

StefanKarpinski commented Dec 20, 2017

rfourquet commented Dec 20, 2017

reset GLOBAL_RNG state in/out at-testset for reproducibility #24445

reset GLOBAL_RNG state in/out at-testset for reproducibility #24445

Conversation

rfourquet commented Nov 2, 2017 • edited Loading

fredrikekre commented Nov 2, 2017

rfourquet commented Nov 2, 2017

yuyichao commented Nov 2, 2017

rfourquet commented Nov 2, 2017

rfourquet commented Nov 11, 2017

rfourquet commented Nov 20, 2017

StefanKarpinski commented Nov 20, 2017

StefanKarpinski commented Nov 20, 2017

rfourquet commented Nov 20, 2017

rfourquet commented Nov 25, 2017

rfourquet commented Dec 14, 2017

StefanKarpinski commented Dec 14, 2017

rfourquet commented Dec 17, 2017

timholy commented Dec 20, 2017

rfourquet commented Dec 20, 2017

StefanKarpinski commented Dec 20, 2017

KristofferC commented Dec 20, 2017

rfourquet commented Dec 20, 2017

StefanKarpinski commented Dec 20, 2017

StefanKarpinski commented Dec 20, 2017

rfourquet commented Dec 20, 2017

StefanKarpinski commented Dec 20, 2017 • edited Loading

timholy commented Dec 20, 2017

StefanKarpinski commented Dec 20, 2017

rfourquet commented Dec 20, 2017

StefanKarpinski commented Dec 20, 2017

timholy commented Dec 20, 2017

rfourquet commented Dec 20, 2017

StefanKarpinski commented Dec 20, 2017

rfourquet commented Dec 20, 2017

rfourquet commented Nov 2, 2017 •

edited

Loading

StefanKarpinski commented Dec 20, 2017 •

edited

Loading