Pod sometimes stuck with first init container #5

Open
schu opened this issue Jul 18, 2018 · 4 comments

schu commented Jul 18, 2018

Pods by default get a check-dns init container. Sometimes a new pod doesn't get past that first init container and hangs:

NAME                                                   READY     STATUS     RESTARTS   AGE
example-sensu-cluster-gzwbrntcd6                       0/2       Init:0/3   0          4m

Example description (kubectl describe pod ...):

Init Containers:
  check-dns:
    Container ID:  docker://5a1629435ad0067359b80ff5c82c8f058aa90f8a03cd36bd835751eb6da34340
    Image:         busybox:1.28.0-glibc
    Image ID:      docker-pullable://busybox@sha256:0b55a30394294ab23b9afd58fab94e61a923f5834fba7ddbae7f8e0c11ba85e6
    Port:          <none>
    Host Port:     <none>
    Command:
      /bin/sh
      -c
      
          TIMEOUT_READY=0
          while ( ! nslookup example-sensu-cluster-gzwbrntcd6.example-sensu-cluster.default.svc )
          do
            # If TIMEOUT_READY is 0 we should never time out and exit
            TIMEOUT_READY=$(( TIMEOUT_READY-1 ))
            if [ $TIMEOUT_READY -eq 0 ];
            then
              echo "Timed out waiting for DNS entry"
              exit 1
            fi
            sleep 1
          done
    State:          Running
      Started:      Wed, 18 Jul 2018 14:06:05 +0200
    Ready:          False
    Restart Count:  0
    Environment:    <none>
    Mounts:         <none>
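
For reference, here is a minimal sketch of the same retry loop as a standalone function, to make the timeout behaviour easier to see. The function name wait_for and the RETRY_INTERVAL variable are hypothetical and not part of the operator. With a timeout of 0 the counter goes straight to -1 and never reaches 0 again, so the loop retries forever. That matches the pod staying in Init:0/3 instead of failing.

```shell
# Hypothetical helper, not taken from the operator:
# wait_for TIMEOUT CMD [ARGS...]
# Retries CMD until it succeeds. A TIMEOUT of 0 means "never time out",
# because the counter is decremented before it is compared with 0.
wait_for() {
  timeout_ready=$1
  shift
  while ! "$@"; do
    timeout_ready=$(( timeout_ready - 1 ))
    if [ "$timeout_ready" -eq 0 ]; then
      echo "Timed out waiting for DNS entry"
      return 1
    fi
    # RETRY_INTERVAL is an assumed knob; the init container sleeps 1 second.
    sleep "${RETRY_INTERVAL:-1}"
  done
}

# Possible usage inside a container (the hostname is a placeholder):
#   wait_for 30 nslookup my-pod.my-service.default.svc
```

Passing a non-zero timeout would make a stuck lookup fail the init container after that many attempts rather than hang, which might make the problem easier to spot.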

The response from kube-dns is NXDOMAIN, but that is also the case for a successful setup, where nslookup still exits with status 0:

Server:		10.96.0.10
Address:	10.96.0.10:53

** server can't find example-sensu-cluster-gzwbrntcd6.example-sensu-cluster.default.svc: NXDOMAIN

*** Can't find example-sensu-cluster-gzwbrntcd6.example-sensu-cluster.default.svc: No answer

I see the issue, for example, after creating a new cluster and doing a restore:

kubectl apply -f example/example-sensu-cluster.yaml
./example/restore-operator/restore-backup --cluster-name=example-sensu-cluster --aws-bucket-name=sensu-backup-test --backup-name=sensu-cluster-backup-1531893564

If the pod gets stuck, I redo the restore operation, and it usually works then:

kubectl delete sensurestore example-sensu-cluster
./example/restore-operator/restore-backup --cluster-name=example-sensu-cluster --aws-bucket-name=sensu-backup-test --backup-name=sensu-cluster-backup-1531893564

schu commented Aug 8, 2018

So the NXDOMAIN above is actually an unrelated problem caused by docker-library/busybox#48. The operator, however, uses an older image, busybox:1.28.0-glibc, so this issue is caused by something else.

kubectl run --rm -ti --restart=Never --image=busybox:1.28.0-glibc busybox can be used to test.
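
When testing in such a pod, it can help to look at the exit status and the printed output separately, since the init container loop only reacts to the exit status. A small sketch follows; the helper name report is hypothetical.

```shell
# Hypothetical helper: run a command, print its exit status first and its
# combined stdout/stderr after that, then return the same exit status.
report() {
  out=$("$@" 2>&1)
  rc=$?
  printf 'exit status: %s\n' "$rc"
  printf '%s\n' "$out"
  return "$rc"
}

# Possible usage inside the test pod:
#   report nslookup example-sensu-cluster-gzwbrntcd6.example-sensu-cluster.default.svc
```

An exit status of 0 alongside NXDOMAIN in the output would correspond to the case described above, where the loop ends even though the lookup output looks like a failure.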

iaguis commented Aug 8, 2018

Have you tried using -type=a, as suggested in https://bugs.busybox.net/show_bug.cgi?id=11161#c4? Maybe it's worth a try.

schu commented Aug 8, 2018

No, I haven't. I meant to say that docker-library/busybox#48 is not an issue for us, since we use an older image, so I don't think we need the -type=a workaround.

schu commented Aug 8, 2018

I haven't managed to reproduce the bug today, despite a few attempts.
