Timeout getting IP address from DHCP leases with macOS GitHub actions runner #1331

Closed
astefanutti opened this issue Jun 22, 2020 · 9 comments
Labels

  • kind/bug: Something isn't working
  • status/stale: Issue went stale; did not receive attention or no reply from the OP

Comments

@astefanutti

Running CRC within the macOS GitHub actions runner leads to inconsistent results. While it works from time to time, it often fails with the error:

(crc) DBG | Searching for e:f0:e3:c7:4f:be in /var/db/dhcpd_leases ...
(crc) DBG | error: Temporary Error: open /var/db/dhcpd_leases: no such file or directory - sleeping 2s
level=error msg="Error creating host: Error creating the VM: Error creating machine: Error in driver during machine creation: IP address never found in dhcp leases file Temporary Error: open /var/db/dhcpd_leases: no such file or directory

It seems the following retry loop is not long enough:

https://github.com/code-ready/machine-driver-hyperkit/blob/c592b26d73624e85e1bdb7d962c6f0cea99740e1/pkg/hyperkit/driver.go#L259

This may be due to the 8GB memory constraint of the GitHub Actions environment, which could make HyperKit slow to start.

General information

  • OS: macOS
  • Hypervisor: hyperkit
  • Did you run crc setup before starting it (Yes/No)? Yes
  • Running CRC on: VM, GitHub Actions macOS hosted runner

CRC version

1.11.0

Host Operating System

$ sw_vers
ProductName:	Mac OS X
ProductVersion:	10.15.5
BuildVersion:	19F101

Steps to reproduce

Run the following workflow:

name: CRC

on:
  push:
    branches: [ master ]
  pull_request:
    branches: [ master ]

jobs:
  crc:
    runs-on: macos-latest

    steps:
      - name: Info
        run: |
          df -h
          sysctl hw.memsize

      - name: Checkout code
        uses: actions/checkout@v2

      - name: Setup CRC
        env:
          PULL_SECRET: ${{ secrets.PULL_SECRET }}
        run: |
          curl -L https://mirror.openshift.com/pub/openshift-v4/clients/crc/1.11.0/crc-macos-amd64.tar.xz -o crc-macos-amd64.tar.xz
          tar xf crc-macos-amd64.tar.xz
          rm crc-macos-amd64.tar.xz
          cd crc-macos-1.11.0-amd64/

          ./crc setup
          ./crc config view

          echo $PULL_SECRET > pull.secret
          ./crc start -p pull.secret --log-level debug
          eval $(./crc oc-env)

          ./crc stop
          ./crc delete

Expected

Creating machine...
(crc) DBG | exe=/Users/runner/.crc/bin/crc-driver-hyperkit uid=0
(crc) Downloading /Users/runner/.crc/cache/crc.disk from file:///Users/runner/.crc/cache/crc_hyperkit_4.4.5/crc.qcow2...
(crc) DBG | exe=/Users/runner/.crc/bin/crc-driver-hyperkit uid=0
(crc) DBG | clean start, hyperkit pid file doesn't exist: /Users/runner/.crc/machines/crc/hyperkit.pid
(crc) DBG | Using hyperkit binary from /Users/runner/.crc/bin/hyperkit
(crc) DBG | Using UUID c3d68012-0208-11ea-9fd7-f2189899ab08
(crc) DBG | Generated MAC e:f0:e3:c7:4f:be
(crc) DBG | Starting with cmdline: BOOT_IMAGE=(hd0,gpt1)/ostree/rhcos-cf41684b1dbabaaed5a639c1f7f0de24e446ac749963eded81a9fe5373d52831/vmlinuz-4.18.0-147.8.1.el8_1.x86_64 rhcos.root=crypt_rootfs console=tty0 console=ttyS0,115200n8 ignition.platform.id=qemu rd.luks.options=discard ostree=/ostree/boot.1/rhcos/cf41684b1dbabaaed5a639c1f7f0de24e446ac749963eded81a9fe5373d52831/0
(crc) DBG | Trying to execute /Users/runner/.crc/bin/hyperkit -A -u -F /Users/runner/.crc/machines/crc/hyperkit.pid -c 4 -m 9216M -s 0:0,hostbridge -s 31,lpc -s 1:0,virtio-net -U c3d68012-0208-11ea-9fd7-f2189899ab08 -s 2:0,virtio-blk,file:///Users/runner/.crc/machines/crc/crc.disk,format=qcow -s 3,virtio-rnd -l com1,autopty=/Users/runner/.crc/machines/crc/tty,log=/Users/runner/.crc/machines/crc/console-ring -f kexec,/Users/runner/.crc/cache/crc_hyperkit_4.4.5/vmlinuz-4.18.0-147.8.1.el8_1.x86_64,/Users/runner/.crc/cache/crc_hyperkit_4.4.5/initramfs-4.18.0-147.8.1.el8_1.x86_64.img,earlyprintk=serial BOOT_IMAGE=(hd0,gpt1)/ostree/rhcos-cf41684b1dbabaaed5a639c1f7f0de24e446ac749963eded81a9fe5373d52831/vmlinuz-4.18.0-147.8.1.el8_1.x86_64 rhcos.root=crypt_rootfs console=tty0 console=ttyS0,115200n8 ignition.platform.id=qemu rd.luks.options=discard ostree=/ostree/boot.1/rhcos/cf41684b1dbabaaed5a639c1f7f0de24e446ac749963eded81a9fe5373d52831/0
(crc) DBG | exe=/Users/runner/.crc/bin/crc-driver-hyperkit uid=0
(crc) DBG | hyperkit pid from json: 1068
(crc) DBG | Searching for e:f0:e3:c7:4f:be in /var/db/dhcpd_leases ...
(crc) DBG | error: Temporary Error: open /var/db/dhcpd_leases: no such file or directory - sleeping 2s
(crc) DBG | retry loop 1
...
(crc) DBG | error: Temporary Error: open /var/db/dhcpd_leases: no such file or directory - sleeping 2s
(crc) DBG | retry loop 16
(crc) DBG | exe=/Users/runner/.crc/bin/crc-driver-hyperkit uid=0
(crc) DBG | hyperkit pid from json: 1068
(crc) DBG | Searching for e:f0:e3:c7:4f:be in /var/db/dhcpd_leases ...
(crc) DBG | Found 1 entries in /var/db/dhcpd_leases!
(crc) DBG | dhcp entry: {Name:crc-m27h4-master-0 IPAddress:192.168.64.2 HWAddress:e:f0:e3:c7:4f:be ID:1,e:f0:e3:c7:4f:be Lease:0x5ef1cf09}
(crc) DBG | Found match: e:f0:e3:c7:4f:be
(crc) DBG | IP: 192.168.64.2
(crc) Calling .GetConfigRaw
(crc) Calling .DriverName
(crc) Calling .DriverName
(crc) Calling .GetState
Waiting for machine to be running, this may take a few minutes...
(crc) DBG | exe=/Users/runner/.crc/bin/crc-driver-hyperkit uid=0
(crc) DBG | hyperkit pid from json: 1068
Machine successfully created
Machine is up and running!
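
For readers unfamiliar with the lookup in the log above, the following Go sketch shows one way to search lease data for a MAC address. It assumes the leases file consists of brace-delimited blocks of `key=value` lines and that `hw_address` carries a hardware-type prefix such as `1,`. That layout is an assumption here, not something the log confirms. The `leaseEntry` type and the `parseLeases` and `findIP` helpers are hypothetical names, not the driver's API. The sample values are taken from the log.

```go
package main

import (
	"bufio"
	"fmt"
	"strings"
)

// leaseEntry holds the fields of one lease block that matter for the lookup.
type leaseEntry struct {
	Name      string
	IPAddress string
	HWAddress string
}

// parseLeases reads brace-delimited key=value blocks (assumed format).
func parseLeases(content string) []leaseEntry {
	var entries []leaseEntry
	var cur *leaseEntry
	scanner := bufio.NewScanner(strings.NewReader(content))
	for scanner.Scan() {
		line := strings.TrimSpace(scanner.Text())
		switch {
		case line == "{":
			cur = &leaseEntry{}
		case line == "}":
			if cur != nil {
				entries = append(entries, *cur)
				cur = nil
			}
		case cur != nil:
			key, value, ok := strings.Cut(line, "=")
			if !ok {
				continue
			}
			switch key {
			case "name":
				cur.Name = value
			case "ip_address":
				cur.IPAddress = value
			case "hw_address":
				// Drop the assumed hardware-type prefix ("1,").
				if _, mac, found := strings.Cut(value, ","); found {
					value = mac
				}
				cur.HWAddress = value
			}
		}
	}
	return entries
}

// findIP returns the IP address leased to mac, compared case-insensitively.
func findIP(entries []leaseEntry, mac string) (string, bool) {
	for _, e := range entries {
		if strings.EqualFold(e.HWAddress, mac) {
			return e.IPAddress, true
		}
	}
	return "", false
}

func main() {
	sample := "{\n\tname=crc-m27h4-master-0\n\tip_address=192.168.64.2\n" +
		"\thw_address=1,e:f0:e3:c7:4f:be\n\tidentifier=1,e:f0:e3:c7:4f:be\n" +
		"\tlease=0x5ef1cf09\n}\n"
	ip, ok := findIP(parseLeases(sample), "e:f0:e3:c7:4f:be")
	fmt.Println(ip, ok)
}
```

Note that a lookup like this can only succeed once the file exists, which is exactly what never happens in the failing run below.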

Actual

Creating machine...
(crc) DBG | exe=/Users/runner/.crc/bin/crc-driver-hyperkit uid=0
(crc) Downloading /Users/runner/.crc/cache/crc.disk from file:///Users/runner/.crc/cache/crc_hyperkit_4.4.5/crc.qcow2...
(crc) DBG | exe=/Users/runner/.crc/bin/crc-driver-hyperkit uid=0
(crc) DBG | clean start, hyperkit pid file doesn't exist: /Users/runner/.crc/machines/crc/hyperkit.pid
(crc) DBG | Using hyperkit binary from /Users/runner/.crc/bin/hyperkit
(crc) DBG | Using UUID c3d68012-0208-11ea-9fd7-f2189899ab08
(crc) DBG | Generated MAC e:f0:e3:c7:4f:be
(crc) DBG | Starting with cmdline: BOOT_IMAGE=(hd0,gpt1)/ostree/rhcos-cf41684b1dbabaaed5a639c1f7f0de24e446ac749963eded81a9fe5373d52831/vmlinuz-4.18.0-147.8.1.el8_1.x86_64 rhcos.root=crypt_rootfs console=tty0 console=ttyS0,115200n8 ignition.platform.id=qemu rd.luks.options=discard ostree=/ostree/boot.1/rhcos/cf41684b1dbabaaed5a639c1f7f0de24e446ac749963eded81a9fe5373d52831/0
(crc) DBG | Trying to execute /Users/runner/.crc/bin/hyperkit -A -u -F /Users/runner/.crc/machines/crc/hyperkit.pid -c 4 -m 9216M -s 0:0,hostbridge -s 31,lpc -s 1:0,virtio-net -U c3d68012-0208-11ea-9fd7-f2189899ab08 -s 2:0,virtio-blk,file:///Users/runner/.crc/machines/crc/crc.disk,format=qcow -s 3,virtio-rnd -l com1,autopty=/Users/runner/.crc/machines/crc/tty,log=/Users/runner/.crc/machines/crc/console-ring -f kexec,/Users/runner/.crc/cache/crc_hyperkit_4.4.5/vmlinuz-4.18.0-147.8.1.el8_1.x86_64,/Users/runner/.crc/cache/crc_hyperkit_4.4.5/initramfs-4.18.0-147.8.1.el8_1.x86_64.img,earlyprintk=serial BOOT_IMAGE=(hd0,gpt1)/ostree/rhcos-cf41684b1dbabaaed5a639c1f7f0de24e446ac749963eded81a9fe5373d52831/vmlinuz-4.18.0-147.8.1.el8_1.x86_64 rhcos.root=crypt_rootfs console=tty0 console=ttyS0,115200n8 ignition.platform.id=qemu rd.luks.options=discard ostree=/ostree/boot.1/rhcos/cf41684b1dbabaaed5a639c1f7f0de24e446ac749963eded81a9fe5373d52831/0
(crc) DBG | exe=/Users/runner/.crc/bin/crc-driver-hyperkit uid=0
(crc) DBG | hyperkit pid from json: 1100
(crc) DBG | Searching for e:f0:e3:c7:4f:be in /var/db/dhcpd_leases ...
(crc) DBG | error: Temporary Error: open /var/db/dhcpd_leases: no such file or directory - sleeping 2s
(crc) DBG | retry loop 1
...
(crc) DBG | retry loop 59
(crc) DBG | exe=/Users/runner/.crc/bin/crc-driver-hyperkit uid=0
(crc) DBG | hyperkit pid from json: 1100
(crc) DBG | Searching for e:f0:e3:c7:4f:be in /var/db/dhcpd_leases ...
(crc) DBG | error: Temporary Error: open /var/db/dhcpd_leases: no such file or directory - sleeping 2s
level=error msg="Error creating host: Error creating the VM: Error creating machine: Error in driver during machine creation: IP address never found in dhcp leases file Temporary Error: open /var/db/dhcpd_leases: no such file or directory
@gbraad
Contributor

gbraad commented Jun 23, 2020

I'm not completely sure what is happening, but note that the macOS version has not been tested to run headless. If the GitHub Actions runner is using a different, less permissive account, I could understand why this fails:

level=error msg="Error creating host: Error creating the VM: Error creating machine: Error in driver during machine creation: IP address never found in dhcp leases file Temporary Error: open /var/db/dhcpd_leases: no such file or directory

This sounds like a VM bring-up issue: the driver waits, but the VM is never started?

Can you start the VM in the regular way, without the GitHub runner? If so, please check the permissions granted. To test in a CI-like environment, consider using the Linux version instead, as it has been tested to run headless.

@astefanutti
Author

astefanutti commented Jun 23, 2020

Unfortunately, nested virtualisation is not supported by the GitHub actions Linux hosted runner, so the pre-flight check for virtualisation enabled fails.

It is surprising that it works with the macOS runner around 20% of the time. I haven't identified any differences in the environment, so I wonder whether the number of attempts in the retry loop is simply not enough given the limited resources (2 vCPUs, 8GB):

https://github.com/code-ready/machine-driver-hyperkit/blob/c592b26d73624e85e1bdb7d962c6f0cea99740e1/pkg/hyperkit/driver.go#L259

@gbraad
Contributor

gbraad commented Jun 23, 2020

pre-flight check for virtualisation enabled fails.

Preflight checks can be disabled, but eventually this would still fail. But why would it need nested virtualisation? I would expect it to run on the host directly.

Resources could surely be an issue, as the default is now 9GB and it is even suggested to assign more.

@astefanutti
Author

GitHub actions hosted runners are virtual machines: https://help.github.com/en/actions/reference/virtual-environments-for-github-hosted-runners#cloud-hosts-for-github-hosted-runners. If I understand it correctly, the Standard_DS2_v2 machines lack the instructions that are required for nested virtualisation.

I agree resources will ultimately be an issue. I haven't found CI hosted environments that provide 16GB of memory :(

The fact that it sometimes works was giving me a little hope. But I'll trust you if you tell me there is no chance this is ever going to work reliably in that kind of environment.

@astefanutti
Author

astefanutti commented Jun 23, 2020

For information, sysctl hw.memsize returns 12884901888, that is 12GB, on the macOS GitHub actions runner VM.
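
As a quick check of that conversion, the Go sketch below turns the `hw.memsize` byte count into GiB and compares it with the `-m 9216M` value passed to hyperkit in the log. It assumes that both figures use binary units (1 GiB = 2^30 bytes, 1 MiB = 2^20 bytes). The `bytesToGiB` helper is a hypothetical name.

```go
package main

import "fmt"

const (
	mib = 1 << 20 // bytes per MiB
	gib = 1 << 30 // bytes per GiB
)

// bytesToGiB converts a byte count to GiB.
func bytesToGiB(b uint64) float64 {
	return float64(b) / float64(gib)
}

func main() {
	hostBytes := uint64(12884901888) // value reported by sysctl hw.memsize
	vmBytes := uint64(9216) * mib    // -m 9216M from the hyperkit command line

	fmt.Printf("host: %.1f GiB, VM: %.1f GiB, headroom: %.1f GiB\n",
		bytesToGiB(hostBytes), bytesToGiB(vmBytes), bytesToGiB(hostBytes-vmBytes))
}
```

Under these assumptions the VM alone claims 9 of the host's 12 GiB, leaving about 3 GiB for macOS and the runner itself.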

@gbraad
Contributor

gbraad commented Jun 23, 2020

Hmmm... I think I know. IIRC we initially ran into the same issues when spiking on using GitHub Actions. We ultimately decided that this would not work for us due to the lack of nested virtualization to run tests... and yes, the resource usage (and storage) were of similar concern. The use of GitLab Runners seemed more manageable but meant maintaining our own infra, which would in that case not provide any benefits for a public CI. So, yes... this all of a sudden sounds very familiar.

This was all part of our search for a replacement CI for public use, to replace CentOS CI. Eventually we moved to OpenShift CI as that would also make more sense... but that won't help you. So... there is no real public option.

@gbraad
Contributor

gbraad commented Jun 23, 2020

@praveenkumar You might have an idea?

@praveenkumar
Member

As per https://help.github.com/en/actions/reference/virtual-environments-for-github-hosted-runners#supported-runners-and-hardware-resources, this really looks like a resource issue, but the missing /var/db/dhcpd_leases file is also mentioned in #882 (comment).

@stale

stale bot commented Aug 23, 2020

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.

@stale stale bot added the status/stale Issue went stale; did not receive attention or no reply from the OP label Aug 23, 2020

3 participants