Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[hostagent] Driver stopped due to error: "usernet unable to resolve IP for SSH forwarding" #1620

Closed
fyuan1316 opened this issue Jun 12, 2023 · 11 comments · Fixed by #1625
Closed

Comments

@fyuan1316
Copy link
Contributor

Description

Description:

❯ limactl start lima
INFO[0000] Using the existing instance "lima"
INFO[0000] Attempting to download the nerdctl archive    arch=x86_64 digest="sha256:3bb5f5358ee2c3bd9097e0bf37649c1775e2449094e75acf629f129ec2c7915f" location="https://github.com/containerd/nerdctl/releases/download/v1.4.0/nerdctl-full-1.4.0-linux-amd64.tar.gz"
INFO[0000] Using cache "/Users/yuan/Library/Caches/lima/download/by-url-sha256/7f7d5c10ef2e2b350e26ceca17d2e3a7ce9993c867ce53f0a1114aa0264bfb4b/data"
INFO[0000] [hostagent] Replacing "http_proxy" value "http://127.0.0.1:7890" with "http://192.168.5.2:7890"
INFO[0000] [hostagent] Replacing "https_proxy" value "http://127.0.0.1:7890" with "http://192.168.5.2:7890"
INFO[0006] [hostagent] Starting VZ (hint: to watch the boot progress, see "/Users/yuan/.lima/lima/serial.log")
INFO[0006] [hostagent] new connection from  to
INFO[0006] SSH Local Port: 60001
INFO[0006] [hostagent] [VZ] - vm state change: running
INFO[0006] [hostagent] Waiting for the essential requirement 1 of 3: "ssh"
INFO[0016] [hostagent] Waiting for the essential requirement 1 of 3: "ssh"
INFO[0026] [hostagent] Waiting for the essential requirement 1 of 3: "ssh"
INFO[0037] [hostagent] Waiting for the essential requirement 1 of 3: "ssh"
INFO[0047] [hostagent] Waiting for the essential requirement 1 of 3: "ssh"
INFO[0057] [hostagent] Waiting for the essential requirement 1 of 3: "ssh"
INFO[0066] [hostagent] Driver stopped due to error: "usernet unable to resolve IP for SSH forwarding"
INFO[0066] [hostagent] Shutting down the host agent
WARN[0066] [hostagent] failed to exit SSH master
INFO[0066] [hostagent] Shutting down VZ
INFO[0067] [hostagent] Waiting for the essential requirement 1 of 3: "ssh"
FATA[0071] exiting, status={Running:false Degraded:false Exiting:true Errors:[] SSHLocalPort:0} (hint: see "/Users/yuan/.lima/lima/ha.stderr.log")

Link to tracking Issue:
#1200

@balajiv113
Copy link
Member

@fyuan1316
Could you share the yaml configuration and system information

@fyuan1316
Copy link
Contributor Author

Thanks for your quick reply 😄

vmType: "vz"
# This example requires Lima v0.8.0 or later
images:
# Try to use release-yyyyMMdd image if available. Note that release-yyyyMMdd will be removed after several months.
- location: "https://cloud-images.ubuntu.com/releases/22.04/release-20230518/ubuntu-22.04-server-cloudimg-amd64.img"
  arch: "x86_64"
  digest: "sha256:afb820a9260217fd4c5c5aacfbca74aa7cd2418e830dc64ca2e0642b94aab161"
- location: "https://cloud-images.ubuntu.com/releases/22.04/release-20230518/ubuntu-22.04-server-cloudimg-arm64.img"
  arch: "aarch64"
  digest: "sha256:b47f8be40b5f91c37874817c3324a72cea1982a5fdad031d9b648c9623c3b4e2"
# Fallback to the latest release image.
# Hint: run `limactl prune` to invalidate the cache
- location: "https://cloud-images.ubuntu.com/releases/22.04/release/ubuntu-22.04-server-cloudimg-amd64.img"
  arch: "x86_64"
- location: "https://cloud-images.ubuntu.com/releases/22.04/release/ubuntu-22.04-server-cloudimg-arm64.img"
  arch: "aarch64"

cpus: 4
memory: 4GiB
disk: 100GiB
ssh:
  localPort: 60001
  loadDotSSHPubKeys: true
mounts:
- location: "~"
  writable: true
- location: "/tmp/lima"
  writable: true
# containerd is managed by Docker, not by Lima, so the values are set to false here.
containerd:
  system: false
  user: true
provision:
- mode: system
  # This script defines the host.docker.internal hostname when hostResolver is disabled.
  # It is also needed for lima 0.8.2 and earlier, which does not support hostResolver.hosts.
  # Names defined in /etc/hosts inside the VM are not resolved inside containers when
  # using the hostResolver; use hostResolver.hosts instead (requires lima 0.8.3 or later).
  script: |
    #!/bin/sh
    sed -i 's/host.lima.internal.*/host.lima.internal host.docker.internal/' /etc/hosts
- mode: system
  script: |
    #!/bin/bash
    set -eux -o pipefail
    command -v docker >/dev/null 2>&1 && exit 0
    if [ ! -e /etc/systemd/system/docker.socket.d/override.conf ]; then
      mkdir -p /etc/systemd/system/docker.socket.d
      # Alternatively we could just add the user to the "docker" group, but that requires restarting the user session
      cat <<-EOF >/etc/systemd/system/docker.socket.d/override.conf
      [Socket]
      SocketUser=${LIMA_CIDATA_USER}
    EOF
    fi
    export DEBIAN_FRONTEND=noninteractive
    curl -fsSL https://get.docker.com | sh
probes:
- script: |
    #!/bin/bash
    set -eux -o pipefail
    if ! timeout 30s bash -c "until command -v docker >/dev/null 2>&1; do sleep 3; done"; then
      echo >&2 "docker is not installed yet"
      exit 1
    fi
    if ! timeout 30s bash -c "until pgrep dockerd; do sleep 3; done"; then
      echo >&2 "dockerd is not running"
      exit 1
    fi
  hint: See "/var/log/cloud-init-output.log". in the guest
hostResolver:
  # hostResolver.hosts requires lima 0.8.3 or later. Names defined here will also
  # resolve inside containers, and not just inside the VM itself.
  hosts:
    host.docker.internal: host.lima.internal
portForwards:
- guestSocket: "/var/run/docker.sock"
  hostSocket: "{{.Dir}}/sock/docker.sock"
message: |
  To run `docker` on the host (assumes docker-cli is installed), run the following commands:
  ------
  docker context create lima-{{.Name}} --docker "host=unix://{{.Dir}}/sock/docker.sock"
  docker context use lima-{{.Name}}
  docker run hello-world
  ------

mountType: "virtiofs"

networks:
- vzNAT: true

@fyuan1316
Copy link
Contributor Author

I compared the difference between the ha.stdout.log file after normal and abnormal startup. I then looked at the hostagent related code and wondered if the problem might be in the ResolveAndForwardSSH method?

FYI

# success
{"time":"2023-06-12T18:02:07.957353+08:00","status":{"sshLocalPort":60001}}
# fail
{"time":"2023-06-12T18:03:13.18735+08:00","status":{"exiting":true}}

@balajiv113
Copy link
Member

@fyuan1316 If you are using lima from source (current master), could you try adding

video:
  display: "vz"

This will help to debug this issue a lot easier.

problem might be in the ResolveAndForwardSSH method

Hopefully not, this error can also be thrown if the vm itself is not started.

@fyuan1316
Copy link
Contributor Author

Sorry, just to add the scene behind the question.
The current error is not reported when the vm is created for the first time, but occurs when limactl start <vm> occurs again after limactl stop <vm>.

@fyuan1316
Copy link
Contributor Author

vmType: "vz"
video:
  display: "vz"
# This example requires Lima v0.8.0 or later
images:
# Try to use release-yyyyMMdd image if available. Note that release-yyyyMMdd will be removed after several months.
- location: "https://cloud-images.ubuntu.com/releases/22.04/release-20230518/ubuntu-22.04-server-cloudimg-amd64.img"
  arch: "x86_64"
...

and command line output

❯ limactl start lima
INFO[0000] Using the existing instance "lima"
WARN[0000] Ignoring: vmType vz: [Video]
INFO[0000] Attempting to download the nerdctl archive    arch=x86_64 digest="sha256:3bb5f5358ee2c3bd9097e0bf37649c1775e2449094e75acf629f129ec2c7915f" location="https://github.com/containerd/nerdctl/releases/download/v1.4.0/nerdctl-full-1.4.0-linux-amd64.tar.gz"
INFO[0000] Using cache "/Users/yuan/Library/Caches/lima/download/by-url-sha256/7f7d5c10ef2e2b350e26ceca17d2e3a7ce9993c867ce53f0a1114aa0264bfb4b/data"
INFO[0004] [hostagent] Starting VZ (hint: to watch the boot progress, see "/Users/yuan/.lima/lima/serial.log")
INFO[0004] [hostagent] new connection from  to
INFO[0004] SSH Local Port: 60001
INFO[0004] [hostagent] Waiting for the essential requirement 1 of 3: "ssh"
INFO[0004] [hostagent] [VZ] - vm state change: running
INFO[0015] [hostagent] Waiting for the essential requirement 1 of 3: "ssh"
INFO[0025] [hostagent] Waiting for the essential requirement 1 of 3: "ssh"
INFO[0035] [hostagent] Waiting for the essential requirement 1 of 3: "ssh"
INFO[0045] [hostagent] Waiting for the essential requirement 1 of 3: "ssh"
INFO[0055] [hostagent] Waiting for the essential requirement 1 of 3: "ssh"
INFO[0065] [hostagent] Driver stopped due to error: "usernet unable to resolve IP for SSH forwarding"
INFO[0065] [hostagent] Shutting down the host agent
WARN[0065] [hostagent] failed to exit SSH master
INFO[0065] [hostagent] Shutting down VZ
INFO[0065] [hostagent] Waiting for the essential requirement 1 of 3: "ssh"
FATA[0070] exiting, status={Running:false Degraded:false Exiting:true Errors:[] SSHLocalPort:0} (hint: see "/Users/yuan/.lima/lima/ha.stderr.log")

@fyuan1316
Copy link
Contributor Author

This problem can be reproduced 100% in my environment at present.

After digging further into the ResolveAndForwardSSH code, I made a small modification and found that this problem no longer occurs.

I guess the timeout here is too short, we can add another try to the existing timeout.

timeout := time.After(1 * time.Minute) //  too short ? 
ticker := time.NewTicker(500 * time.Millisecond)

pr #1625

@fyuan1316
Copy link
Contributor Author

The following is the command line output when using the installed limactl and my locally compiled limactl to execute the startup respectively

❯ limactl start lima
INFO[0000] Using the existing instance "lima"
INFO[0000] Attempting to download the nerdctl archive    arch=x86_64 digest="sha256:3bb5f5358ee2c3bd9097e0bf37649c1775e2449094e75acf629f129ec2c7915f" location="https://github.com/containerd/nerdctl/releases/download/v1.4.0/nerdctl-full-1.4.0-linux-amd64.tar.gz"
INFO[0000] Using cache "/Users/yuan/Library/Caches/lima/download/by-url-sha256/7f7d5c10ef2e2b350e26ceca17d2e3a7ce9993c867ce53f0a1114aa0264bfb4b/data"
INFO[0007] [hostagent] Starting VZ (hint: to watch the boot progress, see "/Users/yuan/.lima/lima/serial.log")
INFO[0007] [hostagent] new connection from  to
INFO[0008] SSH Local Port: 60001
INFO[0008] [hostagent] [VZ] - vm state change: running
INFO[0008] [hostagent] Waiting for the essential requirement 1 of 3: "ssh"
INFO[0018] [hostagent] Waiting for the essential requirement 1 of 3: "ssh"
INFO[0028] [hostagent] Waiting for the essential requirement 1 of 3: "ssh"
INFO[0038] [hostagent] Waiting for the essential requirement 1 of 3: "ssh"
INFO[0048] [hostagent] Waiting for the essential requirement 1 of 3: "ssh"
INFO[0058] [hostagent] Waiting for the essential requirement 1 of 3: "ssh"
INFO[0068] [hostagent] Driver stopped due to error: "usernet unable to resolve IP for SSH forwarding"
INFO[0068] [hostagent] Shutting down the host agent
WARN[0068] [hostagent] failed to exit SSH master
INFO[0068] [hostagent] Shutting down VZ
INFO[0068] [hostagent] Waiting for the essential requirement 1 of 3: "ssh"
FATA[0073] exiting, status={Running:false Degraded:false Exiting:true Errors:[] SSHLocalPort:0} (hint: see "/Users/yuan/.lima/lima/ha.stderr.log")
❯ /Users/yuan/Dev/GolandProjects/lima/_output/bin/limactl start lima
INFO[0000] Using the existing instance "lima"
INFO[0000] Attempting to download the nerdctl archive    arch=x86_64 digest="sha256:3bb5f5358ee2c3bd9097e0bf37649c1775e2449094e75acf629f129ec2c7915f" location="https://github.com/containerd/nerdctl/releases/download/v1.4.0/nerdctl-full-1.4.0-linux-amd64.tar.gz"
INFO[0000] Using cache "/Users/yuan/Library/Caches/lima/download/by-url-sha256/7f7d5c10ef2e2b350e26ceca17d2e3a7ce9993c867ce53f0a1114aa0264bfb4b/data"
INFO[0003] [hostagent] Starting VZ (hint: to watch the boot progress, see "/Users/yuan/.lima/lima/serial.log")
INFO[0003] [hostagent] new connection from  to
INFO[0004] SSH Local Port: 60001
INFO[0004] [hostagent] Waiting for the essential requirement 1 of 3: "ssh"
INFO[0004] [hostagent] [VZ] - vm state change: running
INFO[0014] [hostagent] Waiting for the essential requirement 1 of 3: "ssh"
INFO[0017] [hostagent] 2023/06/13 11:19:35 tcpproxy: for incoming conn 127.0.0.1:60969, error dialing "192.168.5.15:22": connect tcp 192.168.5.15:22: no route to host
INFO[0027] [hostagent] Waiting for the essential requirement 1 of 3: "ssh"
INFO[0030] [hostagent] 2023/06/13 11:19:49 tcpproxy: for incoming conn 127.0.0.1:60986, error dialing "192.168.5.15:22": connect tcp 192.168.5.15:22: no route to host
INFO[0040] [hostagent] Waiting for the essential requirement 1 of 3: "ssh"
INFO[0043] [hostagent] 2023/06/13 11:20:02 tcpproxy: for incoming conn 127.0.0.1:61007, error dialing "192.168.5.15:22": connect tcp 192.168.5.15:22: no route to host
INFO[0053] [hostagent] Waiting for the essential requirement 1 of 3: "ssh"
INFO[0056] [hostagent] 2023/06/13 11:20:15 tcpproxy: for incoming conn 127.0.0.1:61046, error dialing "192.168.5.15:22": connect tcp 192.168.5.15:22: no route to host
INFO[0066] [hostagent] Waiting for the essential requirement 1 of 3: "ssh"
INFO[0069] [hostagent] 2023/06/13 11:20:28 tcpproxy: for incoming conn 127.0.0.1:61059, error dialing "192.168.5.15:22": connect tcp 192.168.5.15:22: no route to host
INFO[0079] [hostagent] Waiting for the essential requirement 1 of 3: "ssh"
INFO[0082] [hostagent] 2023/06/13 11:20:41 tcpproxy: for incoming conn 127.0.0.1:61079, error dialing "192.168.5.15:22": connect tcp 192.168.5.15:22: no route to host
INFO[0092] [hostagent] Waiting for the essential requirement 1 of 3: "ssh"
INFO[0095] [hostagent] 2023/06/13 11:20:54 tcpproxy: for incoming conn 127.0.0.1:61094, error dialing "192.168.5.15:22": connect tcp 192.168.5.15:22: no route to host
INFO[0105] [hostagent] Waiting for the essential requirement 1 of 3: "ssh"
INFO[0106] [hostagent] The essential requirement 1 of 3 is satisfied
INFO[0106] [hostagent] Waiting for the essential requirement 2 of 3: "user session is ready for ssh"
INFO[0119] [hostagent] Waiting for the essential requirement 2 of 3: "user session is ready for ssh"
INFO[0119] [hostagent] The essential requirement 2 of 3 is satisfied
INFO[0119] [hostagent] Waiting for the essential requirement 3 of 3: "the guest agent to be running"
INFO[0120] [hostagent] The essential requirement 3 of 3 is satisfied
INFO[0120] [hostagent] Waiting for the optional requirement 1 of 3: "systemd must be available"
INFO[0120] [hostagent] Forwarding "/var/run/docker.sock" (guest) to "/Users/yuan/.lima/lima/sock/docker.sock" (host)
INFO[0120] [hostagent] Forwarding "/run/lima-guestagent.sock" (guest) to "/Users/yuan/.lima/lima/ga.sock" (host)
INFO[0120] [hostagent] The optional requirement 1 of 3 is satisfied
INFO[0120] [hostagent] Waiting for the optional requirement 2 of 3: "containerd binaries to be installed"
INFO[0120] [hostagent] Not forwarding TCP 127.0.0.53:53
INFO[0120] [hostagent] Not forwarding TCP 0.0.0.0:22
INFO[0120] [hostagent] The optional requirement 2 of 3 is satisfied
INFO[0120] [hostagent] Not forwarding TCP [::]:22
INFO[0120] [hostagent] Waiting for the optional requirement 3 of 3: "user probe 1/1"
INFO[0120] [hostagent] The optional requirement 3 of 3 is satisfied
INFO[0120] [hostagent] Waiting for the final requirement 1 of 1: "boot scripts must have finished"
INFO[0120] [hostagent] The final requirement 1 of 1 is satisfied
INFO[0120] READY. Run `limactl shell lima` to open the shell.
INFO[0120] Message from the instance "lima":
To run `docker` on the host (assumes docker-cli is installed), run the following commands:
------
docker context create lima-lima --docker "host=unix:///Users/yuan/.lima/lima/sock/docker.sock"
docker context use lima-lima
docker run hello-world
------
❯ limactl ls
NAME    STATUS     SSH                VMTYPE    ARCH      CPUS    MEMORY    DISK      DIR
lima    Running    127.0.0.1:60001    vz        x86_64    4       4GiB      100GiB    ~/.lima/lima

@fyuan1316
Copy link
Contributor Author

If there is a better way, I would very much like to know.
WDYT? @balajiv113

@balajiv113
Copy link
Member

@fyuan1316 Increasing timeout looks good. Looks like we have set a best case time to fail fast.

Could you share you system info ?? Is it a CI machine or a normal macOS ?? Am just curious as it works fine in mine and i suspect the slowness is mostly because of the machine maybe

@fyuan1316
Copy link
Contributor Author

fyuan1316 commented Jun 13, 2023

macbook pro 2019

Intel(R) Core(TM) i7-9750H CPU @ 2.60GHz
Radeon Pro 555X 4 GB
Intel UHD Graphics 630 1536 MB
16 GB 2400 MHz DDR4

macos: 13.3.1 (a) (22E772610a)

Possibly hardware related

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging a pull request may close this issue.

3 participants