Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix runtests for RemoteException capturing a UndefVarError case. #20210

Merged
merged 1 commit into from
Jan 25, 2017

Conversation

amitmurthy
Copy link
Contributor

Closes #20027

Deserialization errors due to a remote UndefVarError were not being recorded.

@tkelman tkelman added bugfix This change fixes an existing bug testsystem The unit testing framework and Test stdlib labels Jan 24, 2017
@tkelman
Copy link
Contributor

tkelman commented Jan 24, 2017

Great, applying this patch on top of e9380c0 does fix the worst part of the issue, and parallel tests fail consistently as they would if you run the tests in serial. But the UndefVarError and the deserialization backtrace is useless for identifying what actually caused the failure...

@tkelman
Copy link
Contributor

tkelman commented Jan 24, 2017

Looks like that's a really old issue actually - if a test throws something like a BoundsError that has a custom array type in one of its fields, and that array type isn't defined on node 1, then you get an UndefVarError when node 1 tries to display anything useful about the error. 561db3b made the backtrace when that happens much messier, with a few dozen lines in deserialize.

@amitmurthy
Copy link
Contributor Author

then you get an UndefVarError when node 1 tries to display anything useful about the error

I think this would probably happen at deserialize time on node 1 itself. And it would end up hiding the original error.

@tkelman
Copy link
Contributor

tkelman commented Jan 24, 2017

Right, I think that's what's been happening here. This patch fixes the incorrect exit status and tests showing green when they shouldn't be, but it doesn't fix the actual test failure error getting hidden by a bunch of confusing deserialization backtraces. Will be hard to debug if intermittent problems do this, but it's also not a recent regression (vs the exit status mistake which was).

@amitmurthy
Copy link
Contributor Author

I think that is a more involved fix. Offhand I have no idea how to tackle it. Should track it in a separate issue.

FWIW:

julia> remotecall_fetch(()->eval(quote
         type Foo <: Exception
           x
         end
         throw(Foo(1))
       end), 2)
ERROR: UndefVarError: Foo not defined
 in deserialize_datatype(::Base.ClusterSerializer{TCPSocket}) at ./serialize.jl:841
 in handle_deserialize(::Base.ClusterSerializer{TCPSocket}, ::Int32) at ./serialize.jl:580
 in deserialize(::Base.ClusterSerializer{TCPSocket}, ::DataType) at ./serialize.jl:909
 in deserialize_datatype(::Base.ClusterSerializer{TCPSocket}) at ./serialize.jl:856
 in handle_deserialize(::Base.ClusterSerializer{TCPSocket}, ::Int32) at ./serialize.jl:580
 in deserialize(::Base.ClusterSerializer{TCPSocket}, ::DataType) at ./serialize.jl:909
 in deserialize_datatype(::Base.ClusterSerializer{TCPSocket}) at ./serialize.jl:856
 in handle_deserialize(::Base.ClusterSerializer{TCPSocket}, ::Int32) at ./serialize.jl:580
 in deserialize_msg(::Base.ClusterSerializer{TCPSocket}, ::Type{Base.ResultMsg}) at ./multi.jl:120
 in deserialize_msg(::Base.ClusterSerializer{TCPSocket}) at ./multi.jl:130
 in message_handler_loop(::TCPSocket, ::TCPSocket, ::Bool) at ./multi.jl:1371
 in process_tcp_streams(::TCPSocket, ::TCPSocket, ::Bool) at ./multi.jl:1328
 in (::Base.##505#506{TCPSocket,TCPSocket,Bool})() at ./event.jl:73
Stacktrace:
 [1] #remotecall_fetch#493(::Array{Any,1}, ::Function, ::Function, ::Base.Worker) at ./multi.jl:1093
 [2] remotecall_fetch(::Function, ::Base.Worker) at ./multi.jl:1085
 [3] #remotecall_fetch#496(::Array{Any,1}, ::Function, ::Function, ::Int64) at ./multi.jl:1106
 [4] remotecall_fetch(::Function, ::Int64) at ./multi.jl:1106

@amitmurthy
Copy link
Contributor Author

amitmurthy commented Jan 24, 2017

Should probably treat deserialization of Exception objects differently. If we get any any errors during deserialization of Exceptions, it should be captured along with the type of the container and rethrown.

@KristofferC
Copy link
Member

Could you point out where that stacktrace is being printed. It should probably be "updated" to the new version (similar to the last being printed).

@amitmurthy
Copy link
Contributor Author

As a workaround for the testsystem, we can print any exceptions captured in a testset to STDERR on the workers (and not on 1) so that relevant information is not lost.

@amitmurthy
Copy link
Contributor Author

Could you point out where that stacktrace is being printed. It should probably be "updated" to the new version (similar to the last being printed).

I assume the question was in the context of the example above.

Captured as a RemoteException here -

remote_err = RemoteException(myid(), CapturedException(e, catch_backtrace()))

And thrown here -

return isa(v, RemoteException) ? throw(v) : v

Don't quite follow what you mean by "updated to the new version"

@KristofferC
Copy link
Member

The stacktraces on 0.6 are a bit changed (you can see it in the very end of the code you quoted a few posts up):

Stacktrace:
 [1] #remotecall_fetch#493(::Array{Any,1}, ::Function, ::Function, ::Base.Worker) at ./multi.jl:1093
 [2] remotecall_fetch(::Function, ::Base.Worker) at ./multi.jl:1085
...

However, the stacktrace above that one uses the "old" way of printing backtraces:

 in deserialize_datatype(::Base.ClusterSerializer{TCPSocket}) at ./serialize.jl:841
 in handle_deserialize(::Base.ClusterSerializer{TCPSocket}, ::Int32) at ./serialize.jl:580
....

so there is apparently something about the new stacktraces that are not properly hooked in for remote exceptions.

@amitmurthy
Copy link
Contributor Author

julia/base/multi.jl

Lines 1016 to 1019 in 7119419

function showerror(io::IO, re::RemoteException)
(re.pid != myid()) && print(io, "On worker ", re.pid, ":\n")
showerror(io, re.captured)
end

and

showerror(io::IO, ce::CapturedException) = showerror(io, ce.ex, ce.processed_bt, backtrace=true)

@amitmurthy amitmurthy merged commit fdddee1 into master Jan 25, 2017
@amitmurthy amitmurthy deleted the amitm/fixtests branch January 25, 2017 02:53
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bugfix This change fixes an existing bug testsystem The unit testing framework and Test stdlib
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants