Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Advance csi-node-driver-registrar version to 1.1.0 #248

Merged
merged 1 commit into from
Apr 30, 2019

Conversation

okartau
Copy link
Contributor

@okartau okartau commented Apr 25, 2019

One deployment trial case shows that driver deployment fails with version 1.0.2 but succeeds with 1.1.0.
In failing case, node-driver-registrar fails in getting connection to local csi socket, times out in 60s, and causes pod to exit and CrashLoop. For some reason (still not explained), this scenario repeats multiple times.
There is explanation why connection can take longer time 1st time, because node driver is in turn waiting to register with controller, and controller is still in starting stage.
But such wait should not happen on 2nd start of node pod.
In 1.1.0 the timeout and exit was removed and driver-registrar keeps trying. That seems to help
in the current deployment case.

@okartau
Copy link
Contributor Author

okartau commented Apr 25, 2019

Note that this commit also has side-artifact in form of /sys mount change by kustomize pass in a file which is not relevant to this PR.
This seems to be caused by sequence:

  • I had made /sys mount add change in a branch
  • branch by @pohly was merged adding deployment files (testing variant) which did not had /sys mount initially
  • my branch was merged, leaving some files without /sys mount addition

So the lack of some parts was side effect of generating files and adding files in overlapping branches.
But, what is interesting: I tried to amend the state of devel separately from current PR
by running make customize in hope that it would fix the state.
Surprisingly, it does not generate the missing part (it does not make any changes).
I get missing parts generated only after I make a next (release number) change.

@okartau
Copy link
Contributor Author

okartau commented Apr 25, 2019

Joy too early: first impression "it works now" was based on observation that pod did not enter CrashLoop state. But instead, driver-registrar container remains retrying without timeout, and does not reach functional state.
In a sense the new semantics is not so good because it hides the "can not connect to socket" problem.

@okartau okartau added deployment affects deployment, not driver code question Further information is requested labels Apr 25, 2019
@okartau
Copy link
Contributor Author

okartau commented Apr 25, 2019

The problem we see in this deployment trial is similar to what is reported here:
kubernetes-csi/node-driver-registrar#36

The SELinux=enabled has been pointed to trigger connection failure

@pohly
Copy link
Contributor

pohly commented Apr 25, 2019 via email

@okartau
Copy link
Contributor Author

okartau commented Apr 25, 2019

The SELinux=enabled has been pointed to trigger connection failure

But how come only driver-registrar has problem with that?
Other sidecars also mount same host directory as /csi.
Is it so the other sidecars do not try to connect to same socket?

@okartau okartau force-pushed the advance-node-driver-registrar branch from 0206d95 to f2b0d82 Compare April 29, 2019 14:35
@okartau
Copy link
Contributor Author

okartau commented Apr 29, 2019

Although the reason for deployment issue which caused this trial, turned out to be other misconfiguration, we can still consider this PR as independent contribution

@okartau okartau removed the question Further information is requested label Apr 29, 2019
Copy link
Contributor

@pohly pohly left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good. I also verified with the Kubernetes-CSI WG that csi-node-driver-registrar is indeed compatible with 1.13 and that the resulting merge leaves deployment files in a consistent state (merge manually, run make test-kustomize).

@pohly pohly merged commit ad2b018 into intel:devel Apr 30, 2019
@okartau okartau deleted the advance-node-driver-registrar branch April 30, 2019 07:25
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
deployment affects deployment, not driver code
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants