
node-driver-registrar pod reports "Connecting to /csi/csi.sock timed out" #36

Closed
darcyllingyan opened this issue Mar 21, 2019 · 14 comments
Labels
lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale.

Comments

@darcyllingyan

darcyllingyan commented Mar 21, 2019

Hi,
When I deployed the CSI driver, the node-driver-registrar pod failed to start and kept reporting TRANSIENT_FAILURE errors. Are there any known issues or restrictions on the cluster environment when using the CSI plugin?
The external provisioner and external attacher are both running; only the node-driver-registrar crashes.
Based on the logs below, the CSI driver socket is fine, but the node-driver-registrar cannot connect to it.

# kubectl get pod -n kube-system
NAME                                          READY   STATUS             RESTARTS   AGE
csi-attacher-cinderplugin-0                   2/2     Running            0          5h53m
csi-nodeplugin-cinderplugin-dj8w4             1/2     CrashLoopBackOff   6          17m
csi-nodeplugin-cinderplugin-sfrqf             1/2     CrashLoopBackOff   6          17m
csi-provisioner-cinderplugin-0                2/2     Running            0          5h53m
csi-snapshotter-cinder-0                      2/2     Running            0          5h53m

The node-driver-registrar log is below:

# kubectl logs csi-nodeplugin-cinderplugin-gm7sl -n kube-system node-driver-registrar
I0321 08:59:37.981108       1 main.go:111] Version: 
I0321 08:59:37.981219       1 main.go:118] Attempting to open a gRPC connection with: "/csi/csi.sock"
I0321 08:59:37.981238       1 connection.go:69] Connecting to /csi/csi.sock
I0321 08:59:37.981533       1 connection.go:96] Still trying, connection is CONNECTING
I0321 08:59:37.982349       1 connection.go:96] Still trying, connection is TRANSIENT_FAILURE
[... ~55 similar CONNECTING / TRANSIENT_FAILURE retry pairs over the next 60 seconds elided ...]
I0321 08:59:37.612463       1 connection.go:96] Still trying, connection is CONNECTING
I0321 08:59:37.612718       1 connection.go:96] Still trying, connection is TRANSIENT_FAILURE
I0321 09:00:37.981411       1 connection.go:89] Connection timed out
I0321 09:00:37.981436       1 main.go:126] Calling CSI driver to discover driver name.
I0321 09:00:37.981453       1 connection.go:137] GRPC call: /csi.v1.Identity/GetPluginInfo
I0321 09:00:37.981458       1 connection.go:138] GRPC request: {}
I0321 09:00:37.982497       1 connection.go:140] GRPC response: {}
I0321 09:00:37.982918       1 connection.go:141] GRPC error: rpc error: code = Unavailable desc = all SubConns are in TransientFailure
E0321 09:00:37.982934       1 main.go:131] rpc error: code = Unavailable desc = all SubConns are in TransientFailure

The CSI driver log is below:

# kubectl logs -f csi-nodeplugin-cinderplugin-228sh -n kube-system cinder
I0321 03:19:18.516206       1 driver.go:56] Driver: cinder.csi.openstack.org version: 1.0.0
I0321 03:19:18.516722       1 driver.go:88] Enabling controller service capability: LIST_VOLUMES
I0321 03:19:18.516744       1 driver.go:88] Enabling controller service capability: CREATE_DELETE_VOLUME
I0321 03:19:18.516756       1 driver.go:88] Enabling controller service capability: PUBLISH_UNPUBLISH_VOLUME
I0321 03:19:18.516802       1 driver.go:88] Enabling controller service capability: CREATE_DELETE_SNAPSHOT
I0321 03:19:18.516827       1 driver.go:88] Enabling controller service capability: LIST_SNAPSHOTS
I0321 03:19:18.516840       1 driver.go:100] Enabling volume access mode: SINGLE_NODE_WRITER
I0321 03:19:18.516885       1 driver.go:110] Enabling node service capability: STAGE_UNSTAGE_VOLUME
I0321 03:19:18.518095       1 server.go:108] Listening for connections on address: &net.UnixAddr{Name:"/csi/csi.sock", Net:"unix"}

Thanks
Darcy

@darcyllingyan darcyllingyan changed the title node-driver-registrar pod reports "all SubConns are in TransientFailure" and couldn't be connected. node-driver-registrar pod reports "Connecting to /csi/csi.sock timed out" Mar 21, 2019
@msau42
Collaborator

msau42 commented Mar 21, 2019

What version of the node driver registrar are you using?

@darcyllingyan
Author

Hi @msau42 ,
I'm using version v1.0.1.

Thanks
Darcy

@darcyllingyan
Author

darcyllingyan commented Mar 21, 2019

Hi @msau42 ,
Using the same CSI driver, I deployed successfully in one cluster, but after switching to another cluster the node-driver-registrar reports this error.
The differences between the two clusters are the node count, IP addresses, etc., so maybe there is a cluster configuration restriction when using CSI?
Do you have any suggestions? Thanks!

@msau42
Collaborator

msau42 commented Mar 21, 2019

We recently made a fix to improve connection/reconnect handling: #29

As a temporary test, can you try using the "canary" image to see if that helps?

@darcyllingyan
Author

OK, I'll give it a try now.

@darcyllingyan
Author

Hi @msau42
It still doesn't work and reports the same error, which is confusing. I'm blocked by this issue; are there any suggestions on how to debug it? Thanks for any suggestions.

# kubectl logs -f csi-nodeplugin-cinderplugin-6bj7x -nkube-system node-driver-registrar
I0321 16:32:14.803533       1 main.go:108] Version: v1.0.2-4-g27141bf7
I0321 16:32:14.803650       1 main.go:115] Attempting to open a gRPC connection with: "/csi/csi.sock"
I0321 16:32:14.803676       1 connection.go:69] Connecting to /csi/csi.sock
I0321 16:32:14.803980       1 connection.go:96] Still trying, connection is CONNECTING
I0321 16:32:14.804346       1 connection.go:96] Still trying, connection is TRANSIENT_FAILURE
[... similar CONNECTING / TRANSIENT_FAILURE retry pairs over the next 60 seconds elided ...]
I0321 16:33:14.803928       1 connection.go:89] Connection timed out
I0321 16:33:14.803955       1 main.go:123] Calling CSI driver to discover driver name.
I0321 16:33:14.803995       1 connection.go:137] GRPC call: /csi.v1.Identity/GetPluginInfo
I0321 16:33:14.804004       1 connection.go:138] GRPC request: {}
I0321 16:33:14.805435       1 connection.go:140] GRPC response: {}
I0321 16:33:14.806067       1 connection.go:141] GRPC error: rpc error: code = Unavailable desc = all SubConns are in TransientFailure
E0321 16:33:14.806080       1 main.go:128] rpc error: code = Unavailable desc = all SubConns are in TransientFailure

Thanks

@msau42
Collaborator

msau42 commented Mar 21, 2019

Is your cinder driver container also restarting?

Can you post your pod spec?

@darcyllingyan
Author

Hi @msau42 ,
No, the cinder driver works well and isn't restarting.
The pod spec is:

# This YAML file contains driver-registrar & csi driver nodeplugin API objects,
# which are necessary to run csi nodeplugin for cinder.

kind: DaemonSet
apiVersion: apps/v1beta2
metadata:
  name: csi-nodeplugin-cinderplugin
  namespace: kube-system
spec:
  selector:
    matchLabels:
      app: csi-nodeplugin-cinderplugin
  template:
    metadata:
      labels:
        app: csi-nodeplugin-cinderplugin
    spec:
      serviceAccount: csi-nodeplugin
      hostNetwork: true
      containers:
        - name: node-driver-registrar
          image: bcmt-registry:5000/node-driver-registrar:v1.0-canary
          args:
            - "--v=5"
            - "--csi-address=$(ADDRESS)"
            - "--kubelet-registration-path=$(DRIVER_REG_SOCK_PATH)"
          lifecycle:
            preStop:
              exec:
                command: ["/bin/sh", "-c", "rm -rf /registration/cinder.csi.openstack.org /registration/cinder.csi.openstack.org-reg.sock"]
          env:
            - name: ADDRESS
              value: /csi/csi.sock
            - name: DRIVER_REG_SOCK_PATH
              value: /var/lib/kubelet/plugins/cinder.csi.openstack.org/csi.sock
            - name: KUBE_NODE_NAME
              valueFrom:
                fieldRef:
                  fieldPath: spec.nodeName
          imagePullPolicy: "Always"
          volumeMounts:
            - name: socket-dir
              mountPath: /csi
            - name: registration-dir
              mountPath: /registration
        - name: cinder
          securityContext:
            privileged: true
            capabilities:
              add: ["SYS_ADMIN"]
            allowPrivilegeEscalation: true
          image: bcmt-registry:5000/docker.io/k8scloudprovider/cinder-csi-plugin:latest
          args :
            - /bin/cinder-csi-plugin
            - "--nodeid=$(NODE_ID)"
            - "--endpoint=$(CSI_ENDPOINT)"
            - "--cloud-config=$(CLOUD_CONFIG)"
          env:
            - name: NODE_ID
              valueFrom:
                fieldRef:
                  fieldPath: spec.nodeName
            - name: CSI_ENDPOINT
              value: unix://csi/csi.sock
            - name: CLOUD_CONFIG
              value: /etc/config/cloud.conf
          imagePullPolicy: "IfNotPresent"
          volumeMounts:
            - name: socket-dir
              mountPath: /csi
            - name: pods-mount-dir
              mountPath: /var/lib/kubelet/pods
              mountPropagation: "Bidirectional"
            - name: pods-cloud-data
              mountPath: /var/lib/cloud/data
              readOnly: true
            - name: pods-probe-dir
              mountPath: /dev
              mountPropagation: "HostToContainer"
            - name: secret-cinderplugin
              mountPath: /etc/config
              readOnly: true
            - name: ca-cinderplugin
              mountPath: /etc/kubernetes
              readOnly: true
      volumes:
        - name: socket-dir
          hostPath:
            path: /var/lib/kubelet/plugins/cinder.csi.openstack.org
            type: DirectoryOrCreate
        - name: registration-dir
          hostPath:
            path: /var/lib/kubelet/plugins_registry/
            type: DirectoryOrCreate
        - name: pods-mount-dir
          hostPath:
            path: /var/lib/kubelet/pods
            type: Directory
        - name: pods-cloud-data
          hostPath:
            path: /var/lib/cloud/data
            type: Directory
        - name: pods-probe-dir
          hostPath:
            path: /dev
            type: Directory
        - name: secret-cinderplugin
          secret:
            secretName: cloud-config
        - name: ca-cinderplugin
          secret:
            secretName: csi-ca-cinderplugin

Thanks

@msau42
Collaborator

msau42 commented Mar 22, 2019

Maybe running the sidecar and driver at log level 10 will produce more detailed logs. Can you also check the version of gRPC used in the cinder driver and in your sidecar? node-driver-registrar is using 1.10.0.
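For what it's worth, verbosity on these sidecars is controlled by the klog -v flag, so raising it is a DaemonSet edit rather than a code change or rebuild. A hypothetical fragment of the node-driver-registrar container, with names taken from the pod spec posted above:

```yaml
# Raise klog verbosity from 5 to 10 to log every gRPC connection state change.
- name: node-driver-registrar
  image: bcmt-registry:5000/node-driver-registrar:v1.0-canary
  args:
    - "--v=10"                                          # was --v=5
    - "--csi-address=$(ADDRESS)"
    - "--kubelet-registration-path=$(DRIVER_REG_SOCK_PATH)"
```

The gRPC version a Go-based driver vendors is usually visible in its go.mod or Gopkg.lock in the driver's source repository, rather than at runtime.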

@msau42
Collaborator

msau42 commented Mar 22, 2019

cc @jsafrane

@darcyllingyan
Author

darcyllingyan commented Mar 22, 2019

Hi @msau42 ,
Thanks for the information.
Sorry, I'm not clear on how to change the log level to 10. Do I need to change the code and rebuild the Docker image? And how do I check the gRPC version?

Thanks.

@jsafrane
Contributor

I can't find anything obviously wrong in the pod spec. What is the bcmt-registry:5000/node-driver-registrar:v1.0-canary image? Can you test with quay.io/k8scsi/csi-node-driver-registrar? Note that quay.io/k8scsi/driver-node-registrar and quay.io/k8scsi/csi-node-driver-registrar are different images.

Another thing to check would be good old strace. On a node try strace -p $( pidof node-driver-registrar), it might reveal why it can't connect to the driver. If it was RHEL, I'd blame SELinux.

@darcyllingyan
Author

darcyllingyan commented Mar 23, 2019

Hi @jsafrane , @msau42
Thanks very much for the response.
I used the quay.io/k8scsi/csi-node-driver-registrar image and it still reports the same issue.

After checking with strace, the log is:

# strace -p $( pidof node-driver-registrar)
strace: Process 19904 attached
futex(0x1df7fa0, FUTEX_WAIT_PRIVATE, 0, NULL) = 0
futex(0x1dfbba0, FUTEX_WAKE_PRIVATE, 1) = 1
write(2, "I0323 06:44:37.831648       1 co"..., 87) = 87
futex(0x1df7fa0, FUTEX_WAIT_PRIVATE, 0, NULL) = -1 EAGAIN (Resource temporarily unavailable)
futex(0xc000187640, FUTEX_WAKE_PRIVATE, 1) = 1
futex(0x1dfbba0, FUTEX_WAIT_PRIVATE, 0, {19, 998670599}) = 0
futex(0xc00005ef40, FUTEX_WAKE_PRIVATE, 1) = 1
futex(0x1dfbba0, FUTEX_WAIT_PRIVATE, 0, {0, 862743664}) = -1 ETIMEDOUT (Connection timed out)
futex(0x1df72d0, FUTEX_WAKE_PRIVATE, 1) = 1
futex(0xc00005ef40, FUTEX_WAKE_PRIVATE, 1) = 1
futex(0xc00005e840, FUTEX_WAKE_PRIVATE, 1) = 1
futex(0x1dfbba0, FUTEX_WAIT_PRIVATE, 0, {26, 823065886}) = 0
futex(0x1dfbba0, FUTEX_WAIT_PRIVATE, 0, {1, 42299504}) = -1 ETIMEDOUT (Connection timed out)
futex(0x1df72d0, FUTEX_WAKE_PRIVATE, 1) = 1
futex(0xc00005ef40, FUTEX_WAKE_PRIVATE, 1) = 1
futex(0x1dfbc20, FUTEX_WAKE_PRIVATE, 1) = 1
socket(AF_LOCAL, SOCK_STREAM|SOCK_CLOEXEC|SOCK_NONBLOCK, 0) = 3
setsockopt(3, SOL_SOCKET, SO_BROADCAST, [1], 4) = 0
connect(3, {sa_family=AF_LOCAL, sun_path="/csi/csi.sock"}, 16) = -1 EACCES (Permission denied)
close(3)                                = 0

It seems the root cause is SELinux: in an SELinux-enabled environment the node-driver-registrar couldn't connect to /csi/csi.sock, and after I disabled SELinux it connected to the CSI driver successfully.
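The connect(...) = -1 EACCES line in the strace output is the key: the registrar is denied at the Unix-socket level, which is exactly how an SELinux denial looks from inside the process. As a minimal illustration (not the registrar's code, and using a throwaway socket path rather than the real /csi/csi.sock), connecting to a socket the caller cannot write to yields the same errno:

```python
import errno
import os
import socket
import tempfile

def probe_unix_socket(path):
    """Try to connect to a Unix socket; return the OSError errno, or None on success."""
    client = socket.socket(socket.AF_UNIX, socket.SOCK_STREAM)
    try:
        client.connect(path)
        return None
    except OSError as e:
        return e.errno
    finally:
        client.close()

if __name__ == "__main__":
    # Stand-in for /csi/csi.sock: a listening socket the caller is not
    # allowed to write to (connect() requires write permission on the inode).
    path = os.path.join(tempfile.mkdtemp(), "csi.sock")
    server = socket.socket(socket.AF_UNIX, socket.SOCK_STREAM)
    server.bind(path)
    server.listen(1)
    os.chmod(path, 0o000)  # simulate a denial; SELinux denies even with open file modes
    print(probe_unix_socket(path))  # errno.EACCES (13) when not running as root
```

The same probe against a path with no listener would instead return ECONNREFUSED or ENOENT, which is why the EACCES here points at a permission layer rather than a missing driver.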

So in order to use the node-driver-registrar, must SELinux be disabled? Since disabling SELinux may lead to security issues, is there a way to use the node-driver-registrar with SELinux enabled?
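(Not from this thread, just a sketch of the usual SELinux-preserving options on RHEL/CentOS-style hosts; the type name and paths are assumptions that depend on distro, policy, and container runtime.) Instead of disabling SELinux outright, the denial can typically be inspected and the plugin directory relabeled:

```shell
# Show recent SELinux AVC denials to confirm what is being blocked.
ausearch -m avc -ts recent

# Relabel the kubelet plugin directory so container processes may use the socket.
# container_file_t is the modern type; older policies use svirt_sandbox_file_t.
chcon -R -t container_file_t /var/lib/kubelet/plugins/cinder.csi.openstack.org

# Or build a targeted local policy module from the observed denials.
ausearch -m avc -ts recent | audit2allow -M csi-registrar
semodule -i csi-registrar.pp
```

A chcon relabel does not survive a full filesystem relabel; a persistent rule would use semanage fcontext instead.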

Thanks very much!

@fejta-bot

Issues go stale after 90d of inactivity.
Mark the issue as fresh with /remove-lifecycle stale.
Stale issues rot after an additional 30d of inactivity and eventually close.

If this issue is safe to close now please do so with /close.

Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
/lifecycle stale

@k8s-ci-robot k8s-ci-robot added the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Jun 23, 2019
jsafrane pushed a commit to jsafrane/node-driver-registrar that referenced this issue Oct 15, 2019
pohly added a commit to pohly/node-driver-registrar that referenced this issue Oct 12, 2020
5e773d2 update CI to use Go 1.13.3
f419d74 Merge pull request kubernetes-csi#40 from msau42/add-1.16
e0fde8c Add new variables for 1.16 and remove 1.13
adf00fe Merge pull request kubernetes-csi#36 from msau42/full-clone
f1697d2 Do full git clones in travis. Shallow clones are causing test-subtree errors when the depth is exactly 50.
2c81919 Merge pull request kubernetes-csi#34 from pohly/go-mod-tidy
518d6af Merge pull request kubernetes-csi#35 from ddebroy/winbld2
2d6b3ce Build Windows only for amd64
c1078a6 go-get-kubernetes.sh: automate Kubernetes dependency handling
194289a update Go mod support
0affdf9 Merge pull request kubernetes-csi#33 from gnufied/enable-hostpath-expansion
6208f6a Enable hostpath expansion
6ecaa76 Merge pull request kubernetes-csi#30 from msau42/fix-windows
ea2f1b5 build windows binaries with .exe suffix
2d33550 Merge pull request kubernetes-csi#29 from mucahitkurt/create-2-node-kind-cluster
a8ea8bc create 2-node kind cluster since topology support is added to hostpath driver
df8530d Merge pull request kubernetes-csi#27 from pohly/dep-vendor-check
35ceaed prow.sh: install dep if needed
f85ab5a Merge pull request kubernetes-csi#26 from ddebroy/windows1
9fba09b Add rule for building Windows binaries
0400867 Merge pull request kubernetes-csi#25 from msau42/fix-master-jobs
dc0a5d8 Update kind to v0.5.0
aa85b82 Merge pull request kubernetes-csi#23 from msau42/fix-master-jobs
f46191d Kubernetes master changed the way that releases are tagged, which needed changes to kind. There are 3 changes made to prow.sh:
1cac3af Merge pull request kubernetes-csi#22 from msau42/add-1.15-jobs
0c0dc30 prow.sh: tag master images with a large version number
f4f73ce Merge pull request kubernetes-csi#21 from msau42/add-1.15-jobs
4e31f07 Change default hostpath driver name to hostpath.csi.k8s.io
4b6fa4a Update hostpath version for sidecar testing to v1.2.0-rc2
ecc7918 Update kind to v0.4.0. This requires overriding Kubernetes versions with specific patch versions that kind 0.4.0 supports. Also, feature gate setting is only supported on 1.15+ due to kind.sigs.k8s.io/v1alpha3 and kubeadm.k8s.io/v1beta2 dependencies.
a6f21d4 Add variables for 1.15
db8abb6 Merge pull request kubernetes-csi#20 from pohly/test-driver-config
b2f4e05 prow.sh: flexible test driver config
0399988 Merge pull request kubernetes-csi#19 from pohly/go-mod-vendor
066143d build.make: allow repos to use 'go mod' for vendoring
0bee749 Merge pull request kubernetes-csi#18 from pohly/go-version
e157b6b update to Go 1.12.4
88dc9a4 Merge pull request kubernetes-csi#17 from pohly/prow
0fafc66 prow.sh: skip sanity testing if component doesn't support it
bcac1c1 Merge pull request kubernetes-csi#16 from pohly/prow
0b10f6a prow.sh: update csi-driver-host-path
0c2677e Merge pull request kubernetes-csi#15 from pengzhisun/master
ff9bce4 Replace 'return' to 'exit' to fix shellcheck error
c60f382 Merge pull request kubernetes-csi#14 from pohly/prow
7aaac22 prow.sh: remove AllAlpha=all, part II
6617773 Merge pull request kubernetes-csi#13 from pohly/prow
cda2fc5 prow.sh: avoid AllAlpha=true
546d550 prow.sh: debug failing KinD cluster creation
9b0d9cd build.make: skip shellcheck if Docker is not available
aa45a1c prow.sh: more efficient execution of individual tests
f3d1d2d prow.sh: fix hostpath driver version check
31dfaf3 prow.sh: fix running of just "alpha" tests
f501443 prow.sh: AllAlpha=true for unknown Kubernetes versions
95ae9de Merge pull request kubernetes-csi#9 from pohly/prow
d87eccb prow.sh: switch back to upstream csi-driver-host-path
6602d38 prow.sh: different E2E suite depending on Kubernetes version
741319b prow.sh: improve building Kubernetes from source
29545bb prow.sh: take Go version from Kubernetes source
429581c prow.sh: pull Go version from travis.yml
0a0fd49 prow.sh: comment clarification
2069a0a Merge pull request kubernetes-csi#11 from pohly/verify-shellcheck
55212ff initial Prow test job
6c7ba1b build.make: integrate shellcheck into "make test"
b2d25d4 verify-shellcheck.sh: make it usable in csi-release-tools
3b6af7b Merge pull request kubernetes-csi#12 from pohly/local-e2e-suite
104a1ac build.make: avoid unit-testing E2E test suite
34010e7 Merge pull request kubernetes-csi#10 from pohly/vendor-check
e6db50d check vendor directory
fb13c51 verify-shellcheck.sh: import from Kubernetes
94fc1e3 build.make: avoid unit-testing E2E test suite
849db0a Merge pull request kubernetes-csi#8 from pohly/subtree-check-relax
cc564f9 verify-subtree.sh: relax check and ignore old content
33d58fd Merge pull request kubernetes-csi#5 from pohly/test-enhancements
be8a440 Merge pull request kubernetes-csi#4 from pohly/canary-fix
b0336b5 build.make: more readable "make test" output
09436b9 build.make: fix pushing of "canary" image from master branch
147892c build.make: support suppressing checks
154e33d build.make: clarify usage of "make V=1"

git-subtree-dir: release-tools
git-subtree-split: a0f195cc2ddc2a1f07d4d3e46fc08187db358f94
jsafrane pushed a commit to jsafrane/node-driver-registrar that referenced this issue Sep 29, 2022
Bug 2097286: Rebase to v2.5.1 for OCP 4.12