testsystem: original error lost if a worker response results in a deserialization error on node 1 #20230

Closed
amitmurthy opened this issue Jan 25, 2017 · 6 comments · Fixed by #24847
Labels
error handling (Handling of exceptions by Julia or the user), testsystem (The unit testing framework and Test stdlib)

Comments

@amitmurthy

If a testset executed on a worker captures an exception that in turn embeds a type not available on node 1, the original error and stacktrace are lost. The same happens if the remote worker throws a custom exception that is not available in Base. This is because deserializing the testset result throws an UndefVarError on node 1, which aborts deserialization of the testset result.

It may be better to print any errors in the testset on the worker itself. CI logs will then at least have a record of the original error.
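
One way to picture that suggestion is a minimal sketch, not the testsystem's actual implementation. It assumes current Julia syntax (`struct` rather than the 0.6-era `type`) and a hypothetical helper named `describe_error`. The worker renders the error and its backtrace to a plain String, so nothing worker-only has to be deserialized on node 1:

```julia
# Stand-in for a type that is defined on the worker but not on node 1.
struct DontExistOn1
    x
end

# Hypothetical helper (not part of Base or Test): run `f` and return `nothing`
# on success, or the rendered error message plus backtrace as a String on failure.
function describe_error(f)
    try
        f()
        return nothing
    catch err
        # showerror(io, err, bt) writes the message followed by the backtrace.
        return sprint(showerror, err, catch_backtrace())
    end
end

msg = describe_error(() -> throw(BoundsError(DontExistOn1(1), 1)))
println(msg)
```

Because `msg` is an ordinary String, it can be printed to the worker's log or sent back to node 1 regardless of which types are defined there.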

Ref: #20027 (comment)

@amitmurthy amitmurthy added the testsystem The unit testing framework and Test stdlib label Jan 25, 2017
@vchuravy vchuravy added this to the 0.6.0 milestone Jan 26, 2017
@tkelman

tkelman commented Jan 26, 2017

This isn't a new problem, so while it sucks, it's not release-blocking.

@tkelman tkelman removed this from the 0.6.0 milestone Jan 26, 2017
@amitmurthy

Closing this as #20276 handled the primary concern. Please reopen if anyone feels we should still print errors on the workers in any case.

@tkelman

tkelman commented Jan 28, 2017

What does this look like now? Is it still an UndefVarError, just without the deserialization issues? I do think it would be worth showing better error info here than an UndefVarError.

@tkelman

tkelman commented Jan 28, 2017

When a test fails, we should be able to see exactly what failed and exactly where it failed. Until both of those are true there's still work to do here.

@tkelman tkelman reopened this Jan 28, 2017
@amitmurthy

It now shows the original exception type and call stack from the remote node, but not the exception object itself, since that could not be deserialized.

julia> expr = quote
           type DontExistOn1
               x
           end
           throw(BoundsError(DontExistOn1(1), 1))
       end
quote  # REPL[2], line 2:
    type DontExistOn1 # REPL[2], line 3:
        x
    end # REPL[2], line 5:
    throw(BoundsError(DontExistOn1(1),1))
end

julia> remotecall_fetch(()->eval(expr), 2)
ERROR: On worker 2:
Error deserializing a CapturedException. Original exception of type : BoundsError
 in eval at ./boot.jl:236
 in jlcall_eval_18136 at /Users/amitm/Julia/julia/usr/lib/julia/sys.dylib:?
 in deserialize_global_from_main at ./clusterserialize.jl:141
 in foreach at ./abstractarray.jl:1685
 in deserialize at ./clusterserialize.jl:39
 in handle_deserialize at ./serialize.jl:590
 in deserialize at ./serialize.jl:550
 in deserialize_datatype at ./serialize.jl:839
 in handle_deserialize at ./serialize.jl:580
 in deserialize_msg at ./multi.jl:120
 in deserialize_msg at ./multi.jl:130
 in message_handler_loop at ./multi.jl:1427
 in process_tcp_streams at ./multi.jl:1384
 in #508 at ./event.jl:73
Stacktrace:
 [1] #remotecall_fetch#496(::Array{Any,1}, ::Function, ::Function, ::Base.Worker) at ./multi.jl:1149
 [2] remotecall_fetch(::Function, ::Base.Worker) at ./multi.jl:1141
 [3] #remotecall_fetch#499(::Array{Any,1}, ::Function, ::Function, ::Int64) at ./multi.jl:1162
 [4] remotecall_fetch(::Function, ::Int64) at ./multi.jl:1162

The resulting UndefVarError on the calling node is available too. If you wrap the above call in a try block, the error can be retrieved from ex.captured.ex.exceptions[2].ex. That is not convenient, hence #20277.

@tkelman

tkelman commented Jan 28, 2017

What does the equivalent remotecall stacktrace look like when it comes from a test failure? #20277 is more general; this issue is about the testsystem consequences.
