Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Swift implementations to use macOS 14 runners + testcontainers #345

Merged
merged 16 commits into from
Jul 19, 2024

Conversation

jackgene
Copy link
Collaborator

@jackgene jackgene commented Jun 9, 2024

(Use this or #363, but not both, this one uses testcontainers, but scenario 3 fails)

Main purpose of this PR is to get the Swift implementations on the latest GitHub Actions macOS infrastructure before I start working on scenario 10.

Changes include:

  • swift-async and swift-combine

    • Upgrade to macos-latest (=macos-14) which is Apple Silicon and does not support nested virtualization
    • Use Testcontainers Cloud
  • swift-dispatch:

    • Use ubuntu-latest, as Grand Central Dispatch is the "old" way of doing Swift concurrency, and I want to make sure it compiles and runs on systems that aren't necessarily on the latest and greatest Apple hardware/libraries

@jamesward
Copy link
Owner

Thanks Jack! For the Pkl source, it's at:
https://github.com/jamesward/easyracer/blob/main/.github/workflows/clients.pkl

Instead of not using Testcontainers, a while back I remember trying to use some other docker daemon thing on Apple Silicon. I can investigate that more as it'd be nice to stick with Testcontainers for these.

@jackgene
Copy link
Collaborator Author

Thanks Jack! For the Pkl source, it's at: https://github.com/jamesward/easyracer/blob/main/.github/workflows/clients.pkl

Instead of not using Testcontainers, a while back I remember trying to use some other docker daemon thing on Apple Silicon. I can investigate that more as it'd be nice to stick with Testcontainers for these.

Hey James,
If you could point me to the containerization option for Apple Silicon you were investigating, I could look into it, but as I understand it, this is a fundamental limitation due to the M1 chip not supporting nested virtualization. AIUI, GitHub Actions is running in a virtualized environment on M1, which makes it the first virtualized layer, and anything we try on top of it (Docker, Podman, Colima) would have to be nested within it.

The history of macOS runners in GitHub Action, from my investigation is:

  1. GitHub Actions never included Docker because of licensing concerns.
  2. As a result, someone made setup-docker-macos-action to get around it, but it feels like a licensing gray area. Also, the implementation is a bit hacky, and is the reason Swift workflows are constantly failing (at least the version I use, it does seem like it's been actively worked on, and latest versions may be better).
  3. At some point, Colima was introduced as a Docker alternative, and even shipped on macos-12 runners. I was just able to get this working, as long as I stick to Intel macOS runners (up to macos-13, I really just tested on macos-12, but don't see why it wouldn't work on macos-13). It doesn't work on macos-14 because it's Apple Silicon, and this is documented.

So with that, I only saw two options:

  1. Use macos-13 and be able to use Testcontainers style testing.
  2. Use macos-14 but without virtualization

GitHub is phasing out support for macos-12 runners, and I figure macos-13 would follow, and the future will be all Apple Silicon, so I just went with option 2. I figure we could revisit if future Apple Silicon chips (M2, M3 etc) introduces support for nested virtualization.

@jackgene
Copy link
Collaborator Author

Thanks Jack! For the Pkl source, it's at: https://github.com/jamesward/easyracer/blob/main/.github/workflows/clients.pkl

Oh, about PKL, I did see that file, but it appears to reference components not defined within (unless I'm missing it somehow)?

Specifically, I see references to GitHubAction.Swift.testSteps(), but couldn't find the definition of it.

@jamesward
Copy link
Owner

Thanks Jack for investigating. Oh it was Colima and I'd thought I'd seen someone getting it working with Testcontainers on macos-14 but maybe not. Will investigate further.

For GitHubAction.Swift.testSteps() they come from:
https://github.com/jamesward/pklgha

@jamesward
Copy link
Owner

Yeah, so it does seem that no one has gotten Colima to work on macos-14 AFAICT. So I'm going to try using Testcontainers Cloud instead so we can stick with Testcontainers.

@jamesward
Copy link
Owner

Ok, I got Testcontainers Cloud setup but I think the DockerClient in the tests will need some tweaks and can hopefully work with Testcontainers Cloud. Want to give that a try? Or maybe this isn't possible with Swift? Not sure.

@jackgene
Copy link
Collaborator Author

Ok, I got Testcontainers Cloud setup but I think the DockerClient in the tests will need some tweaks and can hopefully work with Testcontainers Cloud. Want to give that a try? Or maybe this isn't possible with Swift? Not sure.

Is the idea that we'd have the scenarios running in Testcontainers Cloud, and just controlling it from GitHub Actions?

I can have a look at it depending on how this is set up. But yeah, Swift never had Testcontainers support, so I just implemented the minimal amount of equivalent features I needed using DockerClient (IIRC, just run a container, forward a port to a random available port, and run the scenarios).

@jamesward
Copy link
Owner

Yeah, that is the idea but I need to see if we can configure the DockerClient to connect to Testcontainers Cloud. I'm doing some research on that and will let you know what I find.

@jamesward
Copy link
Owner

Here is what the Testcontainer folks say:

~/.testcontainers.properties will contain tc.host that is a standard Docker host that you can point the client to (assuming that your client supports TCP docker hosts)

You should be able to sign up for a free Testcontainers account and give this a try (with the agent running locally). Or I can look more into it later.

@jackgene
Copy link
Collaborator Author

jackgene commented Jun 10, 2024 via email

@jamesward
Copy link
Owner

Looks like maybe there is a way to expose the TCP socket as a file socket:
https://stackoverflow.com/a/2150188/1826422

@jackgene
Copy link
Collaborator Author

Looks like maybe there is a way to expose the TCP socket as a file socket: https://stackoverflow.com/a/2150188/1826422

Ah, good find.

So I confirmed that docker-client-swift indeed does not support TCP. I tried using a fork of it that's supposedly more feature complete, including TCP support, but that's having problem just with regular Docker (AFAIK, it's having trouble parsing Docker JSONs, so maybe API incompatibility).

My next plan was to just use the Intel-based macos-13 with Colima (it's mostly working, but scenario 3 keeps failing), with the hope that since M2 and M3 chips supposedly have hardware support for nested virtualization, GitHub Actions will eventually upgrade to those, and provide nested virtualization support.

I'll give socat a try with Testcontainers Cloud though.

@jackgene
Copy link
Collaborator Author

jackgene commented Jun 12, 2024

As of the current version of this branch, GitHub Actions is:

  • Starting up the Testcontainers Cloud agent successfully
  • Using socat to proxy Docker tcp to a unix domain socket file
  • Run the scenarios as a test

However, scenario 3 (the 10k connection one) keeps failing. I'm not sure if this is due to a limitation of Testcontainers Cloud (I think it tunnels the exposed ports over SSH?) or just some tuning we need to do. I'll probably try and figure what's failing exactly this weekend.

@jamesward
Copy link
Owner

Wow! Amazing work Jack! If we need to loop in the Testcontainers folks, let me know :)

@jackgene
Copy link
Collaborator Author

jackgene commented Jun 15, 2024

Wow! Amazing work Jack! If we need to loop in the Testcontainers folks, let me know :)

Hey James,
I think we'll need input from the Testcontainers people. Scenario 3 is consistently failing when I use Testcontainers Cloud (even for the non-Swift implementations). I think it simply isn't designed to handle so many open connections.

I'm getting this error with swift-async:

Error Domain=NSURLErrorDomain Code=-1005 "The network connection was lost." UserInfo={_kCFStreamErrorCodeKey=-4, NSUnderlyingError=0x600004ec13b0 {Error Domain=kCFErrorDomainCFNetwork Code=-1005 "(null)" UserInfo={NSErrorPeerAddressKey=<CFData 0x60000e3ebbb0 [0x1f4f748c0]>{length = 16, capacity = 16, bytes = 0x100280057f0000010000000000000000}, _kCFStreamErrorCodeKey=-4, _kCFStreamErrorDomainKey=4}}, _NSURLErrorFailingURLSessionTaskErrorKey=LocalDataTask <4391C030-DB4A-4E40-BE5E-2CA3E4F7F693>.<1>, _NSURLErrorRelatedURLSessionTaskErrorKey=(
    "LocalDataTask <4391C030-DB4A-4E40-BE5E-2CA3E4F7F693>.<1>"
), NSLocalizedDescription=The network connection was lost., NSErrorFailingURLStringKey=http://localhost:32773/3, NSErrorFailingURLKey=http://localhost:32773/3, _kCFStreamErrorDomainKey=4}

With go-stdlib:

panic: Get "http://127.0.0.1:32771/3": read tcp 127.0.0.1:44138->127.0.0.1:32771: read: connection reset by peer

With scala-ox:

[info]   Cause: java.net.SocketException: Connection reset
[info]   at java.base/sun.nio.ch.SocketChannelImpl.throwConnectionReset(SocketChannelImpl.java:401)
[info]   at java.base/sun.nio.ch.SocketChannelImpl.read(SocketChannelImpl.java:434)
[info]   at java.net.http/jdk.internal.net.http.SocketTube.readAvailable(SocketTube.java:1178)
[info]   at java.net.http/jdk.internal.net.http.SocketTube$InternalReadPublisher$InternalReadSubscription.read(SocketTube.java:841)
[info]   at java.net.http/jdk.internal.net.http.SocketTube$SocketFlowTask.run(SocketTube.java:181)
[info]   at java.net.http/jdk.internal.net.http.common.SequentialScheduler$SchedulableTask.run(SequentialScheduler.java:207)
[info]   at java.net.http/jdk.internal.net.http.common.SequentialScheduler.runOrSchedule(SequentialScheduler.java:280)
[info]   at java.net.http/jdk.internal.net.http.common.SequentialScheduler.runOrSchedule(SequentialScheduler.java:233)
[info]   at java.net.http/jdk.internal.net.http.SocketTube$InternalReadPublisher$InternalReadSubscription.signalReadable(SocketTube.java:782)
[info]   at java.net.http/jdk.internal.net.http.SocketTube$InternalReadPublisher$ReadEvent.signalEvent(SocketTube.java:[96](https://github.com/jackgene/easyracer/actions/runs/9532170872/job/26274039643#step:5:97)5)

@jackgene jackgene changed the title Upgrade Swift implementations to use macOS 14 GitHub Actions runners Swift implementations to use macOS 14 runners + testcontainers Jun 22, 2024
@jackgene jackgene force-pushed the macos-runner-use-colima branch from 8ec9c75 to ff181ad Compare June 22, 2024 01:01
@jackgene
Copy link
Collaborator Author

jackgene commented Jun 29, 2024

So I've created a bunch of PRs for the Swift implementations. Longer term, I'm hoping with support for nested virtualization in the M2+ processors (assuming Apple provides the software support), GitHub Actions will at some point in the future support containerization again.

In the meantime, we have two options:

  1. Use Testcontainers Cloud (this PR). But it will not handle Scenario 3 (10k connections). I've found this to be the case even with the Go and Scala-Ox.
  2. Don't use containerization at all for now PR. This is also currently much faster than the containerized versions (which takes 10+ minutes just to install the virtualization components).

Independently from this, I've also implemented Scenario 10:

The first two will fail checks as of now, but succeeds in GitHub Actions without containerization (swift-async, swift-combine). The third is set up to use ubuntu-latest, and will run fine as is.

@jamesward
Copy link
Owner

Thanks Jack! I've opened an issue with Testcontainers Cloud to see if they can increase the number of concurrent connections. I'll let you know what I find out.

@jackgene jackgene force-pushed the macos-runner-use-colima branch from 6c75342 to 4a248dd Compare July 1, 2024 19:02
@jackgene
Copy link
Collaborator Author

jackgene commented Jul 9, 2024

Thanks Jack! I've opened an issue with Testcontainers Cloud to see if they can increase the number of concurrent connections. I'll let you know what I find out.

Let me know if they've responded. In the meantime, it appears Apple will start supporting nested virtualization on macOS 15.0, but only on M3 chips and higher:

Nested virtualization is available for Mac with the M3 chip, and later.

I can revisit when GitHub Actions get that hardware/OS.

@jamesward
Copy link
Owner

The Testcontainers Cloud folks are investigating. In the meantime, I noticed this:
abiosoft/colima#970 (comment)

@jamesward
Copy link
Owner

Testcontainers Cloud has rolled out a fix so we should be good to give that approach a try again.

@jackgene
Copy link
Collaborator Author

The Testcontainers Cloud folks are investigating. In the meantime, I noticed this: abiosoft/colima#970 (comment)

I was going to respond to this, so two things:

  1. I was evaluating Colima for personal use (mainly out of concern for Docker Desktop license possibly affecting me in the future), and found that either with qemu of vz, it was struggling with 10k connections.
  2. I tried this on GitHub Actions last week, and saw the same problem - https://github.com/jackgene/easyracer/actions/runs/9922717212/job/27412084879

Unrelated, but just for the heck of it, I tried using Docker Desktop to see if it would work, and it failed due to nested virtualization issues, not surprisingly.

@jackgene
Copy link
Collaborator Author

Testcontainers Cloud has rolled out a fix so we should be good to give that approach a try again.

Sweet, time to give it another go.

@jackgene
Copy link
Collaborator Author

Testcontainers Cloud has rolled out a fix so we should be good to give that approach a try again.

I just re-ran this without any code changes, and we are still getting the same error. Do you know if we'll need to update anything (e.g., re-generate token)?

@jamesward
Copy link
Owner

It is possible you need to be using a non-free Testcontainers Cloud account. The account I'm using for this repo is non-free, so we might have to test on this repo.

@jackgene
Copy link
Collaborator Author

It is possible you need to be using a non-free Testcontainers Cloud account. The account I'm using for this repo is non-free, so we might have to test on this repo.

Ah OK. Let me push some random change. I believe that should cause the checks in this PR to re-run.

@jackgene
Copy link
Collaborator Author

Looks like you'll have to add the Testcontainers Cloud token to this repository first (using TC_CLOUD_TOKEN):

ERROR: cannot proceed without a valid Testcontainers Cloud API token. Please verify that "TC_CLOUD_TOKEN" environment variable or command line argument "--token" has a valid token, or login in a desktop environment: permanent authorization error: failed to parse token: incorrect token

https://github.com/jamesward/easyracer/actions/runs/10013553868/job/27681408480?pr=345

Alternatively, if you've already defined the secret, but used a different name, let me know what name you used, and I'll update the GHA YAML to reflect that.

@jamesward
Copy link
Owner

I have defined that, but it looks like by default PRs don't get the secrets (which makes sense from a security perspective). I might just have to test this in prod :) Should I just merge this and see if it works?
btw, we might be able to change the GHA config to allow secrets access: https://securitylab.github.com/research/github-actions-preventing-pwn-requests/

@jamesward
Copy link
Owner

Also, I invited you as a collaborator so you don't need to go through PRs to work through this.

@jackgene
Copy link
Collaborator Author

I have defined that, but it looks like by default PRs don't get the secrets (which makes sense from a security perspective). I might just have to test this in prod :) Should I just merge this and see if it works? btw, we might be able to change the GHA config to allow secrets access: https://securitylab.github.com/research/github-actions-preventing-pwn-requests/

Sure. I don't think it'll be any more broken than it is now. :D

@jamesward jamesward merged commit 0b9860c into jamesward:main Jul 19, 2024
0 of 2 checks passed
@jackgene jackgene deleted the macos-runner-use-colima branch July 25, 2024 02:25
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants