Persistent name and Exponential backoff #66

Merged: 1 commit, Apr 6, 2018
cmd/csi-provisioner/csi-provisioner.go (2 changes: 1 addition & 1 deletion)

@@ -42,7 +42,7 @@ var (
 	kubeconfig = flag.String("kubeconfig", "", "Absolute path to the kubeconfig file. Either this or master needs to be set if the provisioner is being run out of cluster.")
 	csiEndpoint = flag.String("csi-address", "/run/csi/socket", "The gRPC endpoint for Target CSI Volume")
 	connectionTimeout = flag.Duration("connection-timeout", 10*time.Second, "Timeout for waiting for CSI driver socket.")
-	volumeNamePrefix = flag.String("volume-name-prefix", "kubernetes-dynamic-pv", "Prefix to apply to the name of a created volume")
+	volumeNamePrefix = flag.String("volume-name-prefix", "pvc", "Prefix to apply to the name of a created volume")
 	volumeNameUUIDLength = flag.Int("volume-name-uuid-length", 16, "Length in characters for the generated uuid of a created volume")

 	provisionController *controller.ProvisionController
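Note on the effect of this default change: provisioned volumes are now named pvc-<PVC UID> (via the makeVolumeName helper added in controller.go below) rather than kubernetes-dynamic-pv-<random suffix>, which matches the pvc-<uuid> naming convention used by in-tree dynamic provisioning.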
pkg/controller/controller.go (59 changes: 52 additions & 7 deletions)

@@ -29,19 +29,27 @@ import (

"k8s.io/api/core/v1"
metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
"k8s.io/apimachinery/pkg/util/uuid"
"k8s.io/apimachinery/pkg/util/wait"
"k8s.io/client-go/kubernetes"
"k8s.io/client-go/rest"

"google.golang.org/grpc"
"google.golang.org/grpc/codes"
"google.golang.org/grpc/connectivity"
"google.golang.org/grpc/status"

"github.com/container-storage-interface/spec/lib/go/csi/v0"
)

const (
secretNameKey = "csiProvisionerSecretName"
secretNamespaceKey = "csiProvisionerSecretNamespace"
// Defines parameters for ExponentialBackoff used for executing
// CSI CreateVolume API call, it gives approx 4 minutes for the CSI
// driver to complete a volume creation.
backoffDuration = time.Second * 5
backoffFactor = 1.2
backoffSteps = 10
)

// CSIProvisioner struct
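As a back-of-the-envelope check of the "approximately 4 minutes" claim in the comment above, the sketch below (not part of the PR) sums the sleeps that wait.ExponentialBackoff performs between steps for these parameters, plus a worst case of one timeout-bounded CreateVolume attempt per step, assuming p.timeout is the default 10-second connection timeout; the exact accounting varies slightly with the client-go version.

package main

import "fmt"

func main() {
	duration := 5.0    // backoffDuration, in seconds
	factor := 1.2      // backoffFactor
	steps := 10        // backoffSteps
	perAttempt := 10.0 // assumed p.timeout (default -connection-timeout)

	total := 0.0
	sleep := duration
	for i := 0; i < steps; i++ {
		total += perAttempt // worst case: the attempt runs into its deadline
		if i < steps-1 {
			total += sleep // ExponentialBackoff sleeps between attempts
			sleep *= factor
		}
	}
	fmt.Printf("worst case: about %.0f seconds\n", total) // about 204 seconds
}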
@@ -238,6 +246,18 @@ func checkDriverState(grpcClient *grpc.ClientConn, timeout time.Duration) (string, error) {
 	return driverName, nil
}

+func makeVolumeName(prefix, pvcUID string) (string, error) {
+	// create a persistent volume name based on the volumeNamePrefix and the PVC's UID
+	if len(prefix) == 0 {
+		return "", fmt.Errorf("volume name prefix cannot be of length 0")
+	}
+	if len(pvcUID) == 0 {
+		return "", fmt.Errorf("corrupted PVC object, it is missing UID")
+	}
+	return fmt.Sprintf("%s-%s", prefix, pvcUID), nil
+}
+
 func (p *csiProvisioner) Provision(options controller.VolumeOptions) (*v1.PersistentVolume, error) {
 	if options.PVC.Spec.Selector != nil {
 		return nil, fmt.Errorf("claim Selector is not supported")
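The point of deriving the name from the PVC UID rather than from a random UUID becomes clear in combination with the retry loop added below: every retry of Provision for the same claim now issues CreateVolume with the same name, and CSI requires CreateVolume to be idempotent for a given name, so a retried call cannot leak a second volume. A standalone sketch of that property (the UID is a made-up example):

package main

import "fmt"

// makeVolumeName mirrors the helper introduced above.
func makeVolumeName(prefix, pvcUID string) (string, error) {
	if len(prefix) == 0 {
		return "", fmt.Errorf("volume name prefix cannot be of length 0")
	}
	if len(pvcUID) == 0 {
		return "", fmt.Errorf("corrupted PVC object, it is missing UID")
	}
	return fmt.Sprintf("%s-%s", prefix, pvcUID), nil
}

func main() {
	uid := "8f7c5e2d-1a2b-4c3d-9e8f-0a1b2c3d4e5f"
	first, _ := makeVolumeName("pvc", uid)
	second, _ := makeVolumeName("pvc", uid) // a retry yields the identical name
	fmt.Println(first == second)            // true: safe to re-issue CreateVolume
}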
@@ -247,12 +267,16 @@ func (p *csiProvisioner) Provision(options controller.VolumeOptions) (*v1.PersistentVolume, error) {
 	if err != nil {
 		return nil, err
 	}
-	// create random share name
-	share := fmt.Sprintf("%s-%s", p.volumeNamePrefix, strings.Replace(string(uuid.NewUUID()), "-", "", -1)[0:p.volumeNameUUIDLength])
+
+	share, err := makeVolumeName(p.volumeNamePrefix, fmt.Sprintf("%s", options.PVC.ObjectMeta.UID))
+	if err != nil {
+		return nil, err
+	}

 	capacity := options.PVC.Spec.Resources.Requests[v1.ResourceName(v1.ResourceStorage)]
 	volSizeBytes := capacity.Value()

-	// Create a CSI CreateVolumeRequest
+	// Create a CSI CreateVolumeRequest and Response
 	req := csi.CreateVolumeRequest{

 		Name: share,
@@ -268,6 +292,8 @@ func (p *csiProvisioner) Provision(options controller.VolumeOptions) (*v1.PersistentVolume, error) {
 			LimitBytes: int64(volSizeBytes),
 		},
 	}
+	rep := &csi.CreateVolumeResponse{}

 	secret := v1.SecretReference{}
 	if options.Parameters != nil {
 		credentials, err := getCredentialsFromParameters(p.client, options.Parameters)
@@ -277,13 +303,32 @@ func (p *csiProvisioner) Provision(options controller.VolumeOptions) (*v1.PersistentVolume, error) {
 		req.ControllerCreateSecrets = credentials
 		secret.Name, secret.Namespace, _ = getSecretAndNamespaceFromParameters(options.Parameters)
 	}
-	ctx, cancel := context.WithTimeout(context.Background(), p.timeout)
-	defer cancel()
-
-	rep, err := p.csiClient.CreateVolume(ctx, &req)
+	opts := wait.Backoff{Duration: backoffDuration, Factor: backoffFactor, Steps: backoffSteps}
+	err = wait.ExponentialBackoff(opts, func() (bool, error) {
+		ctx, cancel := context.WithTimeout(context.Background(), p.timeout)
+		defer cancel()
+		rep, err = p.csiClient.CreateVolume(ctx, &req)
Review thread on the CreateVolume call:

Member commented:

This is OK, but it still depends on the csiClient being set up. That is issue #64, where this client should always be set up and should never stop retrying. It looks like this change does not handle #64. If that issue is not handled and the client is never set up correctly, then CreateVolume will never work and will eventually time out.

It looks like Provision() and maybe other calls need to create the connection to the client every time they are called.

Member commented:

Good point, up to @sbezverk if he wants to address it in this PR or in a separate PR.

Contributor (author) commented:

@lpabon I am sorry but I disagree with the CSI client constantly trying to connect. Here is why: we do not carry state about the relation between the provisioner and the driver. So every time the provisioner needs to communicate, it builds a connection to the driver, checks the driver's actual state, and runs the operation. I have done lots of negative testing with this approach and it worked really well. If you can reproduce a failure scenario that is not covered here, please let me know; I would be really interested to dissect it ;)

Member commented:

OK, no problem. Let's go with this change and we can adjust the Connect() call if we need to.

+		if err == nil {
+			// CreateVolume has finished successfully
+			return true, nil
+		}
+
+		if status, ok := status.FromError(err); ok {
+			if status.Code() == codes.DeadlineExceeded {
+				// CreateVolume timed out; give it another chance to complete
+				glog.Warningf("CreateVolume timeout: %s has expired, operation will be retried", p.timeout.String())
+				return false, nil
+			}
+		}
+		// CreateVolume failed for some other reason; no reason to retry, bail out of the ExponentialBackoff
+		return false, err
+	})

 	if err != nil {
 		return nil, err
 	}

 	if rep.Volume != nil {
 		glog.V(3).Infof("create volume rep: %+v", *rep.Volume)
 	}
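The decision logic inside the condition function above follows a common gRPC pattern: retry only when the error carries a DeadlineExceeded status, and treat every other failure as fatal. Extracted into a standalone sketch (the isRetryable helper is illustrative, not part of the PR):

package main

import (
	"fmt"

	"google.golang.org/grpc/codes"
	"google.golang.org/grpc/status"
)

// isRetryable reports whether a CreateVolume error is worth another
// backoff step: only a DeadlineExceeded status qualifies, mirroring
// the condition function passed to wait.ExponentialBackoff above.
func isRetryable(err error) bool {
	if s, ok := status.FromError(err); ok {
		return s.Code() == codes.DeadlineExceeded
	}
	return false // not a gRPC status error: do not retry
}

func main() {
	timeout := status.Error(codes.DeadlineExceeded, "context deadline exceeded")
	denied := status.Error(codes.PermissionDenied, "bad credentials")
	fmt.Println(isRetryable(timeout)) // true
	fmt.Println(isRetryable(denied))  // false
}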
Expand Down