-
-
Notifications
You must be signed in to change notification settings - Fork 5.5k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
reset GLOBAL_RNG state in/out at-testset for reproducibility #24445
Conversation
Append |
27d1ef5
to
dffee11
Compare
I still have a failing test, at "test/linalg/arnoldi.jl:103". I had to set a fresh random seed, which works well on a 64 bits machine, but fails on 32 bits ones. Would someone with easy access to both architectures help me? Run this snippet with changin the
|
AFAICT 124 works on x86, x64, armv7-a and aarch64 |
thanks a lot! |
Any opinion on this one? I find it pretty neat to have this easier reproducibility at the |
Bump. |
I'm not really sure what needs to be reviewed here. This just seems to be a broader application in test files of the |
Oh I see. This automatically does the equivalent of |
Yes my bump is mainly to generate feed-back and opinions on the idea, as the implementation is rather straightforward. I think it's a good change but I'm not confident enough to take the decision to merge without some support, I may miss some drawback. Essentially this implements what you suggested there, but a bit differently: when a base test fails, the RNG seed is printed, and with this change, to reproduce the failure you only have to do |
@andreasnoack @kshyatt do you have opinions on this? |
I put the triage label for getting a decision (it's slightly breaking), but I would really like to get this in, as it makes debugging much easier when you get a test error (I had such a use case today). Small recap for busy triage people: the test system prints the RNG seed when there is an error; with this PR, you can call |
I defer to your judgement here, @rfourquet, go ahead and just rebase this, run CI and we'll merge when it passes. |
81b143e
to
47a21f5
Compare
[ci skip] [av skip] [bsd skip] This is a follow-up on the recently introduced `guardsrand` functionality, first suggested at #16940 (comment). Each `testset` block is now implicitly wrapped in a `guardsrand` block, which is more user-friendly and more systematic (only few users know about `guardsrand`). These are essentially two new features: 1) "in": in base, tests are run with the global RNG randomly seeded, but the seed is printed in case of failure to allow reproducing the failure; but even if the failure occurs after many tests, when they alter the global RNG state, one has to re-run the whole file, which can be time-consuming; with this change, at the beginning of each `testset`, the global RNG is re-seeded with its own seed: this allows to re-run only the failing `testset`, which can be done easily in the REPL (the seeding occurs for each loop in a `testset for`; this also allows to re-arrange `testset`s in arbitrary order w.r.t. the global RNG; 2) "out": a `testset` leaves no tracks of its use of `srand` or `rand` (this "feature" should be less and less needed/useful with the generalization of the use of `testset` blocks). Example: ``` @testset begin srand(123) rand() @testset for T in (Int, Float64) # here is an implicit srand(123), by 1) rand() end rand() # this value will not be affected if the sub-`testset` block # above is removed, or if another rand() call is added therein, by 2) end ``` Note that guardsrand can't be used directly, as then the testset's body is wrapped in a function, which causes problem with overwriting loop variable ("outer"), using `using`, defining new methods, etc. So we need to duplicate guardsrand's logic.
CI all green, but unfortunately one conflict: it's trivial to resolve (rename a couple |
47a21f5
to
3eac481
Compare
One negative is that it breaks tests that look like this (observed in Revise.jl): julia> using Test
julia> @testset begin
workdir = joinpath(tempdir(), randstring(10))
@show workdir
mkdir(workdir)
end
@testset begin
workdir = joinpath(tempdir(), randstring(10))
@show workdir
mkdir(workdir)
end
workdir = "/tmp/xrOdLWavRh"
Test Summary: |
test set | No tests
workdir = "/tmp/xrOdLWavRh"
test set: Error During Test at REPL[2]:7
Got an exception of type SystemError outside of a @test
SystemError (with /tmp/xrOdLWavRh): mkdir: File exists
Stacktrace:
[1] #systemerror#44(::String, ::Function, ::Symbol, ::Bool) at ./error.jl:106
[2] #systemerror at ./<missing>:0 [inlined]
[3] mkdir(::String, ::UInt16) at ./file.jl:99
[4] mkdir(::String) at ./file.jl:94
[5] macro expansion at ./REPL[2]:10 [inlined]
[6] macro expansion at /home/tim/src/julia-1.0/usr/share/julia/site/v0.7/Test/src/Test.jl:978 [inlined]
[7] top-level scope at ./REPL[2]:8
[8] eval at ./boot.jl:291 [inlined]
[9] eval(::Module, ::Expr) at ./repl/REPL.jl:3
[10] eval_user_input(::Any, ::Base.REPL.REPLBackend) at ./repl/REPL.jl:70
[11] macro expansion at /home/tim/.julia/v0.7/Revise/src/Revise.jl:482 [inlined]
[12] (::getfield(Revise, Symbol("##15#16")){Base.REPL.REPLBackend})() at ./event.jl:92
Test Summary: | Error Total
test set | 1 1
ERROR: Some tests did not pass: 0 passed, 0 failed, 1 errored, 0 broken. |
I see. A sort of mental model about the way of this PR is that each time you enter a |
Would it make sense to make the different test sets independent but not have the same seed? I.e. generating more or less random numbers from one test set should not affect another one, but they they would not all start with the same seed. I guess the trouble then is how to pick the seed in a stable way. One option would be to base it on the testset name. Of course, in this case that wouldn't help. |
Something with the location info of the testset? Of course it will change if you move the testset but if you aren't calling srand then you shouldn't rely on getting specific numbers anyway. |
I have no idea yet for a solution. I think it would help to see the Revise problem or a specific instance where the current (new) behavior is a problem (which can't be solved easily). The testset name could inded work in some cases. But I also thought about the location, but this is exactly the problem this PR was trying to solve: to be able to re-arrange your testsets in any order without having your test depending on the global RNG affected, and even to paste it the REPL. |
A hash of the test-set name with a per-name counter might be pretty good since test sets with different names would be isolated from each other and test sets with the same name would not all have the same seed, but their RNG streams would only be determined by their order. |
Location info seems bad since adding or removing a line elsewhere in the file would change it. In that case, you may as well just always use different random seeds but choose a new random seed per testset so that seeding in one doesn't have the effect of inadvertently seeding the following ones. |
But I would like to understand what use case we are trying to solve exactly, which is not addressed by the work-around I mentioned to Tim. |
To do the right thing more often by default – or at least to do something that's not surprising. If someone writes two testset blocks with no random seeds, I think they'd find it rather surprising that they both get the same sequence of random values. |
It's not hard to work around, see timholy/Revise.jl@109cbd9. I could have alternatively used I think the only question is, what's more commonly the desired behavior? For example, should we view the fact that every |
What is the benefit of each testset starting with the same seed? Serious question. |
I must say that I already started to enjoy the new behavior. For example at (55852ee), I feel super confident with this fuzzy tests (not sure it's the right name): it will be run by many different machines, with differenct seeds. And if a user finds a failure, she will give me the printed seed, and I will just be able to reproduce the failure by copy-pasting this single testset, without bothering with all what comes before in the file For your example with the loop, it's very simple to work-around:
The big win is if that fails at iteration 90, you can easily start to debug directly at iteration 90. |
@rfourquet, can you spell out what the benefit is? I think putting it in general words (not examples or use cases) will help me at least to figure this situation out. |
Worth separating two points:
If these are coupled, I don't understand why. |
It's tough question as the main motivation was to improve some use cases of the debugging experience. I will use Tim's separate questions as a starting point (at the risk of repeating myself)
Printing the seed upon failure on base tests, or seeding manually the global RNG before running some tests, allows (except in corner cases, e.g using
There can be a benefit only if the tester has a way to know the seed. So it's coupled to your first item. The idea is to insulate each testset from each other. If a testset needs a particular seed to work correctly (like it happens a lot in linalg tests for example), it should state it explicitly. If it depends implicitly on the fact that a testset 20 lines above had a |
What I don't understand is why all test sets need the same seed. Why not just give each testset a different seed but make them independent of each other? If a testset fails, print the seed it failed with so that we know how to reproduce that particular testset failure. Printing this could be conditional on whether the testset has used any random values or not, to avoid printing irrelevant information for non-random testset failures. |
Oh ok, this is just the easiest implementation to keep track of the seed, but I agree this wouldn't need to be the same seed for each testset, as long as the user gets it in the end. |
This is a follow-up on the recently introduced
guardsrand
functionality,first suggested at
#16940 (comment).
Each
testset
block is now implicitly wrapped in aguardsrand
block, whichis more user-friendly and more systematic (only few users know about
guardsrand
).These are essentially two new features:
"in": in base, tests are run with the global RNG randomly seeded,
but the seed is printed in case of failure to allow reproducing the
failure; but even if the failure occurs after many tests, when they
alter the global RNG state, one has to re-run the whole file, which
can be time-consuming; with this change, at the beginning of each
testset
, the global RNG is re-seeded with its own seed: thisallows to re-run only the failing
testset
, which can be doneeasily in the REPL (the seeding occurs for each loop in a
testset for
;this also allows to re-arrange
testset
s in arbitrary orderw.r.t. the global RNG;
"out": a
testset
leaves no tracks of its use ofsrand
orrand
(this "feature" should be less and less needed/useful with the
generalization of the use of
testset
blocks).Example:
By wrappingEDIT: guardsrand can't be used directly, as then the testset's body is wrapped in a function, which causes problem with overwriting local variable in@testset
bodies in aguardsrand
do
block, the re-use of a local variable as a for-loop variable started to generate a warning (in "test/sparse/sparse.jl"). It's easy to silence the warning, but I don't know why there was no warning before, and I wonder if this wrapping can have other unwanted consequences. If needed, it would be easy to copy/paste code to avoid the wrapping.for
loop (outer
), usingusing
, defining new methods, etc.Apologies for the big diffs, github is not always good at spotting indentation-only changes.