Infrastructure for Orka (2024 and beyond) #3686
Current Orka state (updated on April 19, 2024)
|
Next Orka state (updated on April 22, 2024)
Intel Nodes
ARM Nodes
We assume that ARM nodes can handle only 2 VMs each, rather than the 4+ that Intel nodes handled in the past, due to license limitations. AFAIK this still needs to be confirmed with support.
How are the Nearform machines "relocated"?
|
I don't think it's necessary to have two identical release machines. |
Are these typos? |
Great feedback @targos! I updated the tables
We have space for redundancy, but let's remove them for now.
I made a better reference for the "relocated" machines |
Actually, I think we should have one x64 and two arm64 machines, because there are two jobs that run on macos-arm64 during a release (osx11-release-pkg and osx11-arm64-release-tar). |
Some questions/thoughts/suggestions:
And the table lists And that table may be outdated as it seems as though MacOS 11 was EOL as of November 2023 ?
https://orkadocs.macstadium.com/docs/apple-arm-based-support confirms this:
#3592
My suggestion to avoid Jenkins worker decay is to lean into an ephemeral node strategy so that each build has a fresh Orka instance to run on.
We can do that with the following Jenkins plugin for Orka:
We would first need to set up a packer build process to create our VM images so that Orka would have a baseline image to create:
The packer process can leverage our existing ansible playbooks:
This strategy would require that we have an Orka3.0 cluster.
Rather than trying to do an upgrade of the existing cluster, I propose that we ask macstadium to allow us to provision a new cluster with the resources we need in it (enough arm/intel backing nodes for our macos11/13 testing and release), get it built/provisioned and working, and then decommission/return all the existing macstadium/orka machines.
I believe this would end up with us using roughly the same amount of resources, so should be palatable for macstadium to support this transition. |
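To make the ephemeral node idea above concrete, here is a minimal sketch of the lifecycle it implies: every build gets a VM freshly created from a baseline image, and that VM is destroyed afterwards even if the build fails. This is only a model under stated assumptions. `FakeCluster`, `ephemeral_vm`, `run_build`, and the image name `macos13-arm64` are hypothetical and are not the Orka API or the Jenkins plugin; in the real setup the plugin would do this work.

```python
from contextlib import contextmanager
from dataclasses import dataclass, field
from itertools import count


@dataclass
class FakeCluster:
    """In-memory stand-in for a cluster (hypothetical, NOT the Orka API)."""

    running: set = field(default_factory=set)
    _ids: count = field(default_factory=count)

    def deploy(self, image: str) -> str:
        # Every deploy yields a brand-new VM derived from the baseline image.
        vm_id = f"{image}-{next(self._ids)}"
        self.running.add(vm_id)
        return vm_id

    def delete(self, vm_id: str) -> None:
        self.running.discard(vm_id)


@contextmanager
def ephemeral_vm(cluster: FakeCluster, image: str):
    """Create a fresh VM for one build and always tear it down afterwards."""
    vm_id = cluster.deploy(image)
    try:
        yield vm_id
    finally:
        # Runs on success and on failure, so no state survives between builds.
        cluster.delete(vm_id)


def run_build(cluster: FakeCluster, image: str, job: str) -> str:
    """Run one (simulated) job on its own short-lived VM."""
    with ephemeral_vm(cluster, image) as vm_id:
        return f"{job} ran on {vm_id}"
```

Because each VM is discarded in the `finally` branch, two consecutive builds never share a machine, which is the property that avoids worker decay.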
+1 from me if Macstadium will support that |
Quick update from our last call with MacStadium: Next week we will have a new Orka cluster (v3) that includes 2 nodes (Intel and ARM):
Pending:
Dependencies
✅ Setup Jenkins <-> Orka
Current status: Completed.
✅ Create Image templates
Current status: Completed.
✅ Trigger Ephemeral VMs from Jenkins
Current status: Completed.
Jobs and Agents Migration
Current status: @UlisesGascon working on the setup.
Clean up
Other
Deadline
Important
We don't expect any downtime while doing the migration, as we will have a new cluster working in isolation while the current system stays in place until we are ready to transfer operations to the new cluster and then decommission the HW.
Challenges
|
Started in on this.
The PATH variable is set on the existing macOS machines via the script that launches the Jenkins agent:
I've added ARCH, DESTCPU, and PATH to the environment variables in the Orka Cluster Cloud Template configurations on the ci-release machine.
The osx13-x64-release-tar job worked and signed the tarball, but failed to push the release to node-www, so I need to adjust that next. |
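Since the ephemeral agents no longer get their environment from the agent launch script, a quick pre-flight check can catch a template that is missing one of the variables mentioned above. The following is a minimal sketch, not something taken from the actual job configuration: the function name `missing_release_vars` is hypothetical, and treating an empty value as missing is an assumption.

```python
import os

# Variables the comment above says were added to the cloud template.
REQUIRED_VARS = ("ARCH", "DESTCPU", "PATH")


def missing_release_vars(env=None):
    """Return the required variables that are unset or empty in `env`."""
    if env is None:
        env = os.environ
    return [name for name in REQUIRED_VARS if not env.get(name)]
```

A job could call `missing_release_vars()` as its first step and abort with a clear message if the returned list is not empty.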
We need this config in the image: |
node-www also has a ufw2 firewall and will not allow connections from ip addresses not on the allowlist. |
I've added the main Orka address (199.7.167.98) to the ufw2 firewall on node-www. I've confirmed that this is the address that all ephemeral nodes will appear as to node-www. |
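The reason a single allowlist entry is enough can be sketched in a few lines: if every ephemeral VM reaches node-www from the same source address, a check against that one address covers all of them. This only models the allowlist logic with Python's standard `ipaddress` module and is not how the firewall itself is configured. The function name `is_allowed` is hypothetical, and `10.0.0.5` in the usage note is an arbitrary stand-in for a host that is not on the list.

```python
import ipaddress


def is_allowed(source_ip: str, allowlist: list[str]) -> bool:
    """Return True if source_ip matches any address or CIDR block in allowlist."""
    addr = ipaddress.ip_address(source_ip)
    # A bare address such as "199.7.167.98" is parsed as a single-host network.
    return any(addr in ipaddress.ip_network(entry) for entry in allowlist)


# The one address that the ephemeral nodes appear as, per the comment above.
NODE_WWW_ALLOWLIST = ["199.7.167.98"]
```

With this model, `is_allowed("199.7.167.98", NODE_WWW_ALLOWLIST)` is true for every ephemeral node, while `is_allowed("10.0.0.5", NODE_WWW_ALLOWLIST)` is false.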
I've requested the new nodes from MacStadium to fill out the rest of our capacity, and got a response today that they are aiming to have the nodes installed by Wed, Oct 30th. |
Great to see the details and progress on this front. One thought is that once everything has landed, it would be great to do a deep dive session for other build team members who are interested in learning a bit more about how it works. |
It's been a month now. Did we get the new nodes? |
I'm sorry to insist, but I don't know what else to do to move this forward :( |
@ryanaslett, @UlisesGascon any update on this? |
Hi, yes, we did get the nodes, but haven't fully transitioned testing over to using them, as there was still an open question about whether or not we had the right Xcode and OS versions. Apologies that I didn't see your question earlier. I've been wrapping up some other OpenJS project stuff for the end of the year, but I can refocus on this once more to make sure it's in a stable situation. |
Existing macOS machines (due to outdated clang/gcc versions) are blocking 4 pull-requests:
Should we start talking about lowering the support tier of macOS? The oldest PRs are from September 17 (almost 3 months ago). |
We are in the process of updating macOS to version 13 in the Jenkins CI, but unfortunately this is taking longer than expected. Add it to the GitHub actions test matrix so that we have some coverage. Refs: nodejs/build#3686
We are in the process of updating macOS to version 13 in the Jenkins CI, but unfortunately this is taking longer than expected. Add it to the GitHub actions test matrix so that we have some coverage.
Refs: nodejs/build#3686
PR-URL: #56307
Reviewed-By: Yagiz Nizipli
Reviewed-By: Richard Lau
Reviewed-By: Chengzhong Wu
Reviewed-By: Joyee Cheung
Reviewed-By: Luigi Pinca
Some interesting news, coming from nodejs/node-v8#295 and a Slack chat with @joyeecheung:
That said, I suggest:
|
Note that officially (according to https://developer.apple.com/download/applications/), Xcode 16.1 requires at least macOS 14.5 to run, and according to Wikipedia, Xcode 16.0 did too. So I don't know how the |
I'm away from my machine that has macOS 13 + Apple Clang 14 now, so I can't provide more details until after the holidays, but FWIW: when I tried to install the latest system update for 13, the only available update was an upgrade to Sequoia, and nothing else showed up when I looked for the last compatible update of Xcode or the command line tools with the App Store or Software Update/softwareupdate --list. If it is somehow possible to run macOS 13 with Xcode 16, we would likely need to document how to install it, or contributors on macOS 13 may have a hard time getting it to build (or if it just doesn't work, then we need to tell contributors to upgrade to Sequoia). |
This is how we manually install Xcode on the build machines: https://github.com/nodejs/build/blob/main/ansible/MANUAL_STEPS.md#full-xcode |
Also my 2 cents: V8 uses (almost) tip-of-tree clang, so that's currently clang 20, and they have been doing a lot of C++ modernization that lower versions of clang aren't very good at parsing. I did quite a bit of patching to make V8 build on macOS 13 and Clang 14 in https://github.com/joyeecheung/node/tree/fix-macos-13, and many of the fixes don't look very acceptable upstream because they basically just revert the modernization. If we are upgrading the build system, the least-friction route would probably be to just require Sequoia and Xcode 16 to build, though we can keep targeting 11. The lower the macOS version we need to support, the harder it is to install higher versions of Apple Clang on it, and the C++ feature gap will keep widening as V8 uses ToT Clang. |
For the new Orka machines, we are using Packer, and the instructions include some manual steps on how to install it that are replicable for local machines as well: We probably want to update the commands and ensure that we are using the correct version 👍 |
Is there any update/progress on this issue? |
Let me ping @ryanaslett! AFAIK we were testing the new ephemeral instances and waiting for a HW upgrade in the new cluster so we can decommission the old VMs and move all the workloads for both CI environments, but not sure if this was completed or not. |
I plan to work on it during the weekend, so I can provide a good overview on the next build meeting on Tuesday.
Current tasks on MacOS infra
Blocked until ARM nodes are provided