add patches made by Jonne Haß from crystal-lang/crystal#9508 (#9401 #9422 #9430) to make it build on aarch64 again
It looks like shutting down the server with the client socket already closed fails at writing some TLS session shutdown messages. The spec would fail on some OpenSSL versions or environments with:

    Failed to raise an exception: 556175744
    [0xaaaae58ecf54] *Exception::CallStack::print_backtrace:Int32 +100
    [0xaaaae58bd004] __crystal_raise +80
    [0xaaaae5a24cb4] *Socket+ +208
    [0xaaaae5a24904] *Socket+ +196
    [0xaaaae58cfbe8] ~procProc(Pointer(LibCrypto::Bio), Pointer(UInt8), UInt64, Pointer(UInt64), Int32) +932
    [0xffffb3ac986c] ???

    Tried to raise:: Error writing to socket: Broken pipe (IO::Error)
      from src/io/evented.cr:82:13 in 'unbuffered_write'
      from src/io/buffered.cr:136:14 in 'write'
      from src/openssl/bio.cr:31:7 in '->'
      from ???

fixup! WebSocket HTTPS spec: Wait before shutting down the server
@jhass any way I can reproduce the issue with the websocket?
It's this one: https://github.com/jhass/crystal/runs/785495781?check_suite_focus=true#step:6:1106

It happened pretty consistently in this build environment, that is an AArch64 Ubuntu docker container, even when running just the relevant spec file alone. But I didn't really try to reproduce it outside. It could also be related to the OpenSSL version 🤷

It seems to be a race condition between the server socket reading the close message and the http server being closed, so actually maybe quite related to #9563. The additional buffering by the OpenSSL socket possibly triggers it more consistently here. My initial fix was sleeping a bit before closing the http server, but that only helped sometimes.
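To illustrate why a fixed sleep only helps sometimes while waiting on an explicit signal does not depend on timing, here is a minimal Crystal sketch. It is not the change made in this PR: `wait_for` and `handler_done` are hypothetical names, and the spawned fiber merely stands in for the server-side handler that reads the client's close message.

```crystal
# Hypothetical helper (not from the PR): blocks until `done` receives a
# value or `limit` elapses. Returns true if the signal arrived in time.
def wait_for(done : Channel(Nil), limit : Time::Span) : Bool
  select
  when done.receive
    return true
  when timeout(limit)
    return false
  end
end

handler_done = Channel(Nil).new

spawn do
  # Stand-in for the server-side handler processing the close message.
  sleep 10.milliseconds
  handler_done.send(nil)
end

# Instead of sleeping for a guessed duration, wait for the handler itself
# to report that it is finished before shutting anything down.
if wait_for(handler_done, 1.second)
  puts "handler finished; the server can be closed now"
else
  puts "timed out waiting for the handler"
end
```

The timeout keeps a spec from hanging forever if the signal never arrives, which a bare `receive` would not.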
Running the spec on my machine I can also see there are exceptions being raised/rescued. So the real problem here might be that there is an exception that cannot be raised. Is anything related to exception handling expected to be incomplete on this architecture?
Maybe, I cannot be sure. However the whole specs are passing, so it doesn't feel like a fundamental issue. But yeah, that random value as the failed to raise reason is a little weird. And then why would it affect |
That's what I'd like to investigate. I know you feel like dealing with that error is unrelated to this PR, but those are errors that are hard to reproduce, under conditions probably not covered by any other spec. Just forgetting about the issue with a workaround doesn't make me feel fully comfortable. In this case the error is being raised within the
FWIW I feel like it's a compound issue. A race condition triggered and then failing again in properly reporting the error condition. So the solution doesn't feel like a workaround to me, because it seems to fix the actual race, just not the secondary issue in error reporting :)
Here's a snippet to reproduce the issue:

    require "socket"
    require "openssl"

    TCPSocket.open("crystal-lang.org", 443) do |s|
      OpenSSL::SSL::Socket::Client.open(s, hostname: "crystal-lang.org") do |ss|
        # Close the underlying TCP socket while the TLS session is still open.
        s.close
        # Writing through the TLS socket now has to fail inside the BIO callback.
        ss.puts
      end
    end

There is something wrong with the stack unwinding from within BIO. It is not just the exception that's not working: just obtaining a backtrace stops at the OpenSSL binaries. During debugging with lldb I can see a well-formed backtrace, so it seems there might be something wrong in our unwind code for ARM.

I can accept this PR now to unblock all the progress you've been doing on this architecture, but we need to keep an eye on that issue.
This is great progress, thank you for the work @jhass and everyone else involved :)!
Depends on, and thus includes:
- Disable LLVM Global Isel #9401
- Fix VaList and disable va_arg for AArch64 #9422
- Fix C ABI for AArch64 #9430
- Make specs pass in non-IPv6 environments #9438
- Prevent socket specs from hanging the main fiber #9437 (Probably not really after Make specs pass in non-IPv6 environments #9438, I didn't bother yet to rerun without this, but can try to exclude it here if preferred.) Yeah it passes without now, I still consider it an improvement that should be merged though.

Additional work included here:
- A new spec feature, pending!, to soft-fail a spec in the middle of it. (Extracted to Make specs pending instead of failing in no multicast environments #9566.) The commit after motivates the addition of this feature: sometimes deciding whether a spec needs to be pending requires significant setup that the spec needs to do anyway. Rather than duplicating potentially complex setup code, this allows aborting the spec without failing the build. Technically this is part of Make specs pass in non-IPv6 environments #9438, but I felt this provides the better context, as this introduces an execution environment where it becomes necessary, namely IPv6-less Docker. Of course I can send this separately or include it in Make specs pass in non-IPv6 environments #9438 if preferred, or both send the feature separately and include the spec fix in Make specs pass in non-IPv6 environments #9438.

The runner setup scripts are at https://github.com/jhass/crystal-infrastructure for now. Of course that should be moved to the crystal-lang org, I just wanted your acknowledgement for it first.
I think we can keep the docker images used here on my account for now until we make AArch64 part of our official releases and push appropriate images under crystallang/crystal.
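For the pending! feature described above, a minimal sketch of how such a spec might look. It assumes a Crystal version whose Spec module provides `pending!` and a `Socket::Error` exception hierarchy; the example itself (binding to `::1`) is an illustration and is not taken from the PR's specs.

```crystal
require "spec"
require "socket"

describe "TCPServer" do
  it "binds to the IPv6 loopback address" do
    # The setup the spec needs anyway doubles as the availability check:
    # if binding fails, mark the example pending instead of failing it.
    server = begin
      TCPServer.new("::1", 0)
    rescue ex : Socket::Error
      pending!("IPv6 is unavailable here: #{ex.message}")
    end

    begin
      server.local_address.family.should eq(Socket::Family::INET6)
    ensure
      server.close
    end
  end
end
```

In an IPv6-less container the example is reported as pending and the build stays green; where IPv6 works, the assertion runs as usual.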
I want to mention some offspring work, which the work included here enabled and will sustain in the future:
- Alpine edge ships an AArch64 Crystal package again. In preparation, a fully static Crystal build for AArch64 already appeared at https://dev.alpinelinux.org/archive/crystal/

I would like to thank the generous Packet and ARM, through their WorksOnArm project, for providing us with the necessary server for this.