Orka node (macpro-6) replacement #3415

Closed
UlisesGascon opened this issue Jul 7, 2023 · 5 comments

@UlisesGascon
Member

While redeploying macos10.15-x64-2 and checking the Orka VMs, I found some issues:

Xnapper-2023-07-07-12 21 29

  • release-macos11-x64-1 is allocated on the wrong node (macpro-5) but keeps the correct IP from the macpro-4 node, where it should be located.
  • It seems that all the machines were redeployed yesterday in some kind of batch process (judging by the timestamps).

Note that I redeployed macos10.15-x64-2.

I'm not sure whether this was also the root cause of #3413. Should I open a support ticket about this? The trickiest problem is that if the VM allocation changes, the inventory won't match the expected VMs when we connect over SSH and so on, or Jenkins jobs could run on the wrong macOS version or machine type (release / test).
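
To make the inventory concern concrete, here is a minimal sketch of a placement check. It assumes we can get the current VM-to-node mapping from Orka in some form; the `EXPECTED_NODE` table and the `find_misplaced` helper are hypothetical names, and only the release-macos11-x64-1 / macpro-4 / macpro-5 case comes from what I observed. The other entry is illustrative.

```python
from typing import Dict, List, Tuple

# Hypothetical expected placement: VM name -> Orka node.
# Only the release-macos11-x64-1 -> macpro-4 entry reflects this report;
# the second entry is made up for illustration.
EXPECTED_NODE: Dict[str, str] = {
    "release-macos11-x64-1": "macpro-4",
    "test-macos11-x64-1": "macpro-5",
}


def find_misplaced(
    expected: Dict[str, str], observed: Dict[str, str]
) -> List[Tuple[str, str, str]]:
    """Return (vm, expected_node, observed_node) for each VM on the wrong node.

    observed_node is the string "missing" when the VM is not deployed at all.
    """
    problems = []
    for vm, node in sorted(expected.items()):
        actual = observed.get(vm, "missing")
        if actual != node:
            problems.append((vm, node, actual))
    return problems


if __name__ == "__main__":
    # Observed placement as it might be copied by hand from the Orka UI.
    observed = {
        "release-macos11-x64-1": "macpro-5",
        "test-macos11-x64-1": "macpro-5",
    }
    for vm, want, got in find_misplaced(EXPECTED_NODE, observed):
        print(f"{vm}: expected {want}, found {got}")
```

A check like this could run before any Jenkins job or SSH session relies on the inventory, so a silent reallocation is caught early.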

@UlisesGascon
Member Author

Xnapper-2023-07-07-19 09 00

It seems the system has redeployed the VMs again, and two of them are pending and currently offline.

I assume that the release machines are down again, but this still needs confirmation.

@richardlau and I were restoring the VMs a few hours ago.

@richardlau
Member

@richardlau and I were restoring the VMs few hours ago:

Just to set expectations, I'm not currently looking at this -- I just opened the issue to report the offline VMs. I do not have access to an Apple developer account (either my own or build's), so someone else will need to help out with the full Xcode install if the release VM needs to be rebuilt.

@UlisesGascon changed the title from "Orka Nodes IPs missconfiguration" to "Important: Orka instability" on Jul 10, 2023
@UlisesGascon
Member Author

UlisesGascon commented Jul 10, 2023

TL;DR

The Orka nodes are having issues that have taken the VMs down. They will stay down until one node is physically replaced and we can redeploy/restore the VMs. This affects both the release and test environments, as all the macos-11-x64-* machines are down (ARM machines are not affected).

Our current system impact

We started noticing odd behavior on the machines on Friday. We tried to recover them, but after a few hours they were down again.

All the VMs hosted on Orka are currently down. These are the affected machines:

  • release-macos10.15-x64-1
  • release-macos11-x64-1
  • test-macos10.15-x64-1
  • test-macos10.15-x64-2
  • test-macos11-x64-1
  • test-macos11-x64-2
  • test-macos12-x64-1

As a result, any Jenkins pipeline that depends on the osx11-x64 label stays pending or fails because no nodes are available, for example node-test-commit-osx.
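
To see at a glance which platforms have lost every machine, here is a rough sketch that groups the affected VM names by macOS version and architecture. The `role-macosVERSION-arch-index` pattern is inferred from the names in the list above, and `VM_NAME` and `group_by_platform` are hypothetical helpers. How these groups map to Jenkins labels such as osx11-x64 is not encoded here and would be an assumption.

```python
import re
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple

# Naming pattern inferred from the affected machines listed above.
VM_NAME = re.compile(
    r"^(?P<role>release|test)-macos(?P<os>[\d.]+)-(?P<arch>\w+)-(?P<index>\d+)$"
)

AFFECTED = [
    "release-macos10.15-x64-1",
    "release-macos11-x64-1",
    "test-macos10.15-x64-1",
    "test-macos10.15-x64-2",
    "test-macos11-x64-1",
    "test-macos11-x64-2",
    "test-macos12-x64-1",
]


def group_by_platform(
    names: Iterable[str],
) -> Dict[Tuple[str, str], List[Tuple[str, int]]]:
    """Group VM names by (macOS version, arch); values are (role, index) pairs."""
    groups: Dict[Tuple[str, str], List[Tuple[str, int]]] = defaultdict(list)
    for name in names:
        match = VM_NAME.match(name)
        if match is None:
            raise ValueError(f"unexpected VM name: {name}")
        key = (match["os"], match["arch"])
        groups[key].append((match["role"], int(match["index"])))
    return dict(groups)


if __name__ == "__main__":
    for (os_version, arch), machines in sorted(group_by_platform(AFFECTED).items()):
        print(f"macOS {os_version} {arch}: {len(machines)} machine(s) down")
```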

Orka Support feedback

The support team has sent us several messages in ticket SERVICE-164961:

Hi,

This is due to an issue regarding Docker images being removed from some of our source control code bases. We can fix this by upgrading your environment. We have moved our internal routes to a different repository.

Would you like me to conduct an upgrade today?


We have noticed the decline of the cluster's health and are looking at ways to fix this. Also, your macpro has crashed, which our DC is looking at.


Hi,

We have upgraded your environment to a more stable version. This will avoid all docker issues. We are currently looking to replace .16, one of your worker nodes. This action will be carried out Monday. Let me know if you have any questions.


Checking Orka, all the VMs are waiting to be deployed (expected after the upgrade).

Captura de pantalla 2023-07-10 a las 8 23 05

Also, I can confirm that the macpro-6 node is not present in the cluster.
Captura de pantalla 2023-07-10 a las 8 23 22

Next steps

  • Wait for the confirmation from the support team that the node macpro-6 is available.
  • Check the macpro-6 specs
  • Redeploy the testing machines.
  • Redeploy the release machines (I will need help).

@UlisesGascon changed the title from "Important: Orka instability" to "Orka node (macpro-6) replacement" on Jul 10, 2023
@UlisesGascon
Member Author

UlisesGascon commented Jul 11, 2023

I am redeploying and preparing the VMs on macpro-6.

Captura de pantalla 2023-07-11 a las 9 23 43

FYI: the node name has changed to x64-macpro-6

@UlisesGascon
Member Author

The test machines are back in the x64-macpro-6 node 🎉

Captura de pantalla 2023-07-11 a las 9 14 44

I will close the issue, as the incident has been resolved. The missing VM test-macos10.15-x64-2 is discussed in #3218.
