Skip to content

AArch64 CI#9508

Merged
waj merged 2 commits intocrystal-lang:masterfrom
jhass:gha_aarch64_ci
Jul 6, 2020
Merged

AArch64 CI#9508
waj merged 2 commits intocrystal-lang:masterfrom
jhass:gha_aarch64_ci

Conversation

@jhass
Copy link
Member

@jhass jhass commented Jun 18, 2020

Depends on, and thus includes:

Additional work included here:

The runner setup scripts are at https://github.com/jhass/crystal-infrastructure for now. Of course that should be moved to the crystal-lang org, I just wanted your acknowledgement for it first.

I think we can keep the docker images used here on my account for now until we make AArch64 part of our official releases and push appropriate images under crystallang/crystal.

I want to mention some offspring work, which the work included here enabled and will sustain in the future:

I would like to thank the generous Packet and ARM through their WorksOnArm project for providing us with the necessary server for this.

@jhass jhass marked this pull request as draft June 18, 2020 21:57
@jhass jhass mentioned this pull request Jun 20, 2020
algitbot pushed a commit to alpinelinux/aports that referenced this pull request Jun 21, 2020
add patches made by Jonne Haß from
crystal-lang/crystal#9508 (#9401 #9422 #9430)
to make it build on aarch64 again
@jhass jhass force-pushed the gha_aarch64_ci branch 2 times, most recently from 78be274 to 64ab32f Compare June 22, 2020 21:00
It looks like shutting down the server with the  client socket already
closed, fails at writing some TLS session shutdown messages. The spec
would fail  on some OpenSSL versions or environments with:

Failed to raise an exception: 556175744
[0xaaaae58ecf54] *Exception::CallStack::print_backtrace:Int32 +100
[0xaaaae58bd004] __crystal_raise +80
[0xaaaae5a24cb4] *Socket+ +208
[0xaaaae5a24904] *Socket+ +196
[0xaaaae58cfbe8] ~procProc(Pointer(LibCrypto::Bio), Pointer(UInt8), UInt64, Pointer(UInt64), Int32) +932
[0xffffb3ac986c] ???

Tried to raise:: Error writing to socket: Broken pipe (IO::Error)
  from src/io/evented.cr:82:13 in 'unbuffered_write'
  from src/io/buffered.cr:136:14 in 'write'
  from src/openssl/bio.cr:31:7 in '->'
  from ???

fixup! WebSocket HTTPS spec: Wait before  shutting down the server
@jhass jhass marked this pull request as ready for review June 29, 2020 15:05
@jhass jhass requested a review from a team July 2, 2020 14:55
@waj
Copy link
Member

waj commented Jul 3, 2020

@jhass any way I can reproduce the issue with the websocket?

@jhass
Copy link
Member Author

jhass commented Jul 3, 2020

It's this one: https://github.com/jhass/crystal/runs/785495781?check_suite_focus=true#step:6:1106

It happened pretty consistently in this build environment, that is an AArch64 Ubuntu docker container. Even for running just the relevant spec file alone. But I didn't really try to reproduce it outside. It could also be related to the OpenSSL version 🤷

It seems to be a race condition between the server socket reading the close message and the http server being closed, so actually maybe quite related to #9563. The additional buffering by the OpenSSL socket possibly triggers it more consistently here. My initial fix was sleeping a bit before closing the http server, but that only helped sometimes.

@waj
Copy link
Member

waj commented Jul 3, 2020

Running the spec in my machine I can also see there are exceptions being raised/rescued. So the real problem here might be that there is an exception that cannot be raised. Is anything related to exception handling that is expected to be incomplete in this architecture?

@jhass
Copy link
Member Author

jhass commented Jul 3, 2020

Maybe, I cannot be sure. However the whole specs are passing, so it doesn't feel like a fundamental issue. But yeah, that random value as the failed to raise reason is a little weird. And then why would it affect -gnu but not -musl? 🤷

@waj
Copy link
Member

waj commented Jul 3, 2020

And then why would it affect -gnu but not -musl?

That's what I'd like to investigate. I know you feel like dealing with that error is unrelated with this PR, but those are errors hard to reproduce and with conditions probably not covered by any other spec. Just forgetting about the issue with a workaround doesn't make me feel fully comfortable.

In this case the error is being raised within the BIO instance and the composition of that object depends on the OpenSSL version. Maybe something related with that? I was just able to reproduce the issue in a RPi. I'm suspecting we can't just raise exceptions from within the BIO implementation, but don't know why yet.

@jhass
Copy link
Member Author

jhass commented Jul 3, 2020

FWIW I feel like it's a compound issue. A race condition triggered and then failing again in properly reporting the error condition. So the solution doesn't feel like a workaround to me, because it seems to fix the actual race, just not the secondary issue in error reporting :)

@waj
Copy link
Member

waj commented Jul 3, 2020

Here's a snippet to reproduce the issue:

require "socket"
require "openssl"

TCPSocket.open("crystal-lang.org", 443) do |s|
  OpenSSL::SSL::Socket::Client.open(s, hostname: "crystal-lang.org") do |ss|
    s.close
    ss.puts
  end
end

There is something wrong with the stack unwinding from within BIO. Is not just the exception that's not working. Just obtaining a backtrace stops at the OpenSSL binaries. During debugging with lldb I can see a well formed backtrace, so it seems there might be something wrong in our unwind code for ARM.

I can accept this PR now to unblock all the progress you've been doing on this architecture, but we need to keep an eye on that issue.

@jhass jhass added this to the 1.0.0 milestone Jul 3, 2020
@waj waj merged commit 13c7934 into crystal-lang:master Jul 6, 2020
@jhass jhass deleted the gha_aarch64_ci branch July 6, 2020 13:50
@z64
Copy link
Contributor

z64 commented Jul 6, 2020

This is great progress, thank you for the work @jhass and everyone else involved :)!

asterite pushed a commit to asterite/crystal-1 that referenced this pull request Jul 18, 2020
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants