Use Connection() from util package #123

Merged 3 commits on Feb 22, 2019


jsafrane
Contributor

@jsafrane jsafrane commented Feb 8, 2019

This has several effects:

  • The --connection-timeout option no longer has any effect: Connect() waits indefinitely for the driver.
  • Attacher exits when driver socket is closed (i.e. driver container either dies or restarts). This makes sure that attacher can cache driver capabilities safely, as we don't expect that driver can change its capabilities while it's running.

@pohly, @msau42, PTAL

@k8s-ci-robot
Contributor

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: jsafrane


@k8s-ci-robot k8s-ci-robot added approved Indicates a PR has been approved by an approver from all required OWNERS files. cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. size/L Denotes a PR that changes 100-499 lines, ignoring generated files. labels Feb 8, 2019
Contributor

@pohly pohly left a comment

I've tried this with the example hostpath deployment. It works, but I noticed one drawback: when the hostpath driver gets removed, the csi-attacher container immediately exits and then also immediately gets restarted.

That's all as intended, but it also has the effect that the "Lost connection to CSI driver, exiting" message is lost. kubectl logs only shows the logs of the restarted csi-attacher.

Should we change that by logging the exit reason to /dev/termination-log (https://kubernetes.io/docs/tasks/debug-application-cluster/determine-reason-pod-failure/#writing-and-reading-a-termination-message)?

@pohly
Contributor

pohly commented Feb 8, 2019

Another solution is to change the deployment:

diff --git a/deploy/hostpath/csi-hostpath-attacher.yaml b/deploy/hostpath/csi-hostpath-attacher.yaml
index df7f0d63..534ff632 100644
--- a/deploy/hostpath/csi-hostpath-attacher.yaml
+++ b/deploy/hostpath/csi-hostpath-attacher.yaml
@@ -32,6 +32,7 @@ spec:
         - name: csi-attacher
           image: quay.io/k8scsi/csi-attacher:v1.0.1
           imagePullPolicy: Always
+          terminationMessagePolicy: FallbackToLogsOnError
           args:
             - --v=5
             - --csi-address=$(ADDRESS)

But the result isn't as nice:

    Last State:  Terminated
      Reason:    Error
      Message:   ion.go:157] GRPC request: {}
I0208 11:59:10.391492       1 connection.go:159] GRPC response: {"capabilities":[{"Type":{"Service":{"type":1}}}]}
I0208 11:59:10.393149       1 connection.go:160] GRPC error: <nil>
I0208 11:59:10.393162       1 connection.go:156] GRPC call: /csi.v1.Controller/ControllerGetCapabilities
I0208 11:59:10.393169       1 connection.go:157] GRPC request: {}
I0208 11:59:10.395096       1 connection.go:159] GRPC response: {"capabilities":[{"Type":{"Rpc":{"type":1}}},{"Type":{"Rpc":{"type":5}}},{"Type":{"Rpc":{"type":6}}}]}
I0208 11:59:10.397978       1 connection.go:160] GRPC error: <nil>
I0208 11:59:10.397993       1 main.go:154] CSI driver does not support ControllerPublishUnpublish, using trivial handler
I0208 11:59:10.398189       1 controller.go:113] Starting CSI attacher
I0208 11:59:10.398580       1 reflector.go:131] Starting reflector *v1.PersistentVolume (10m0s) from k8s.io/client-go/informers/factory.go:132
I0208 11:59:10.398634       1 reflector.go:169] Listing and watching *v1.PersistentVolume from k8s.io/client-go/informers/factory.go:132
I0208 11:59:10.398588       1 reflector.go:131] Starting reflector *v1beta1.VolumeAttachment (10m0s) from k8s.io/client-go/informers/factory.go:132
I0208 11:59:10.398652       1 reflector.go:169] Listing and watching *v1beta1.VolumeAttachment from k8s.io/client-go/informers/factory.go:132
I0208 11:59:10.498517       1 shared_informer.go:123] caches populated
I0208 11:59:10.498679       1 controller.go:175] Started VA processing "csi-86113377f1338d05e24b17447d9a07fdcb35f8cf23bef352d63c43e00793fef5"
I0208 11:59:10.498713       1 controller.go:190] Skipping VolumeAttachment csi-86113377f1338d05e24b17447d9a07fdcb35f8cf23bef352d63c43e00793fef5 for attacher pmem-csi
I0208 11:59:10.498746       1 controller.go:205] Started PV processing "pvc-cde28d4c-2b8e-11e9-82fb-deadbeef0100"
E0208 11:59:24.983978       1 connection.go:105] Lost connection to unix:///csi/csi.sock.
F0208 11:59:24.983998       1 connection.go:87] Lost connection to CSI driver, exiting

      Exit Code:    255

@pohly
Contributor

pohly commented Feb 8, 2019

Here's a patch for the code:

diff --git a/pkg/connection/connection.go b/pkg/connection/connection.go
index a2abf84..1d1d86d 100644
--- a/pkg/connection/connection.go
+++ b/pkg/connection/connection.go
@@ -19,6 +19,7 @@ package connection
 import (
        "context"
        "fmt"
+       "io/ioutil"
 
        "github.com/container-storage-interface/spec/lib/go/csi"
        "github.com/kubernetes-csi/csi-lib-utils/connection"
@@ -84,7 +85,12 @@ func New(address string) (CSIConnection, error) {
 }
 
 func onReconnect() bool {
-       klog.Fatalf("Lost connection to CSI driver, exiting")
+       terminationMsg := "Lost connection to CSI driver, exiting"
+       terminationLog := "/dev/termination-log"
+       if err := ioutil.WriteFile(terminationLog, []byte(terminationMsg), 0644); err != nil {
+               klog.Errorf("%s: %s", terminationLog, err)
+       }
+       klog.Fatal(terminationMsg)
        return false
 }

This leads to this exit status:

    Last State:     Terminated
      Reason:       Error
      Message:      Lost connection to CSI driver, exiting
      Exit Code:    255
      Started:      Fri, 08 Feb 2019 12:56:19 +0100
      Finished:     Fri, 08 Feb 2019 12:56:30 +0100

Collaborator

@msau42 msau42 left a comment

lgtm, just a nit. And I like @pohly's idea for the termination message.

Gopkg.toml Outdated
@@ -44,7 +44,7 @@

 [[constraint]]
   name = "github.com/kubernetes-csi/csi-lib-utils"
-  version = "0.1.0"
+  version = "0.3.0"
Collaborator

do we want to constrain the version here or just use the latest?

Contributor

I'm for "use the latest" and the constraint does allow that because it means ">= 0.3.0". So this looks right to me.

@jsafrane
Contributor Author

I like the terminationLog usage; however, it belongs in csi-lib-utils, because it is going to be used in the provisioner too.

@jsafrane
Contributor Author

Created kubernetes-csi/csi-lib-utils#16 to move GetDriverName to csi-lib-utils.

@jsafrane
Contributor Author

/hold
Until csi-test is released with kubernetes-csi/csi-test#165. Right now, Connect() does not support the TCP addresses that csi-test uses.

@k8s-ci-robot k8s-ci-robot added the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label Feb 11, 2019
@jsafrane
Contributor Author

Filed kubernetes-csi/csi-lib-utils#18 for writing the termination log.

@msau42
Collaborator

msau42 commented Feb 14, 2019

@k8s-ci-robot k8s-ci-robot added size/XXL Denotes a PR that changes 1000+ lines, ignoring generated files. and removed size/L Denotes a PR that changes 100-499 lines, ignoring generated files. labels Feb 15, 2019
@jsafrane
Contributor Author

I rewrote the PR on top of the newest csi-test (which pulls container-storage-interface/spec@master; the first commit) plus my current PRs in csi-lib-utils (the second commit) to show how the code will look once everything is merged.

And I tested everything with a mock driver.

@msau42
Collaborator

msau42 commented Feb 16, 2019

@msau42
Collaborator

msau42 commented Feb 20, 2019

@k8s-ci-robot k8s-ci-robot removed the size/XXL Denotes a PR that changes 1000+ lines, ignoring generated files. label Feb 21, 2019
@k8s-ci-robot k8s-ci-robot added the size/XL Denotes a PR that changes 500-999 lines, ignoring generated files. label Feb 21, 2019
@jsafrane
Contributor Author

Rebased to csi-lib-utils=0.3.1-rc1 for preview. Waiting for official release.

@jsafrane
Contributor Author

Rebased to csi-test 0.3.1 and csi-lib-utils 1.0.3.

@jsafrane
Contributor Author

/hold cancel

@k8s-ci-robot k8s-ci-robot removed the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label Feb 22, 2019
@msau42
Collaborator

msau42 commented Feb 22, 2019

/lgtm

@k8s-ci-robot k8s-ci-robot added the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Feb 22, 2019
@k8s-ci-robot k8s-ci-robot merged commit 0e6f5f2 into kubernetes-csi:master Feb 22, 2019