Skip to content

fix(helm): Wait for CRDs to reach established state for crd_install hook#5112

Merged
bacongobbler merged 1 commit intohelm:masterfrom
mortent:WaitCRDEstablished
Apr 15, 2019
Merged

fix(helm): Wait for CRDs to reach established state for crd_install hook#5112
bacongobbler merged 1 commit intohelm:masterfrom
mortent:WaitCRDEstablished

Conversation

@mortent
Copy link

@mortent mortent commented Dec 29, 2018

What this PR does / why we need it:
There is a race condition in the crd_install hook implementation, where there is a chance a CRD is not yet ready by the time CRs are being created. This is reported in issue #4925. This change makes sure CRDs installed through the crd_install hook reaches the established state before the hook is considered complete.

Fixes #4925

Special notes for your reviewer:
Unit-testing code in the kubernetes client is difficult, as the builder/infos is tightly coupled with the API server. I will look into how to improve testing for this part of the codebase, but I would like to separate it from this PR.

If applicable:

  • this PR contains documentation
  • this PR contains unit tests
  • this PR has been tested for backwards compatibility

@helm-bot helm-bot added the size/M Denotes a PR that changes 30-99 lines, ignoring generated files. label Dec 29, 2018
@jhohertz
Copy link

jhohertz commented Jan 4, 2019

This is fairly important for those charts installing a CRD as well as instances of those new resources. (IE: prometheus-operator).

@bacongobbler bacongobbler added this to the 2.13.0 milestone Jan 4, 2019
@helm-bot helm-bot added size/L Denotes a PR that changes 100-499 lines, ignoring generated files. and removed size/M Denotes a PR that changes 30-99 lines, ignoring generated files. labels Jan 20, 2019
@mortent
Copy link
Author

mortent commented Jan 20, 2019

Updated this PR to include unit tests.

@aviau
Copy link

aviau commented Jan 23, 2019

I am curious if this also fixes race conditions in chart purges? (see #10743)

Copy link
Member

@bacongobbler bacongobbler left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

the code in here looks great to me!

@vsliouniaev
Copy link

Two more issues opened that should get fixed by this PR
prometheus-operator/prometheus-operator#2432
helm/charts#11702

@micw
Copy link

micw commented Feb 28, 2019

I tested this with the test chart+script I wrote to reproduce #5375 - the issue still occurs.

@jhohertz
Copy link

CRDs with helm, and in particular trying to deploy prometheus-operator chart is completely broken until something like this gets merged down.

@tim-beatson
Copy link

@bacongobbler, I'm pretty surprised this didn't make it into Helm 2.13.0. Are you able to offer a timescale on when we can see this in a release of Helm? I have a number of charts I want to use that are completely broken until this can be resolved....

@Starefossen
Copy link

What's the holdup of this? Anything we can do to help moving this forward?

@bacongobbler
Copy link
Member

bacongobbler commented Mar 24, 2019

Any PR size/L or higher needs two maintainers to sign off. I'll see if I can bug one of the other maintainers to take a look at this before the 2.14 release. Sorry for the delay on this one!

@tim-beatson
Copy link

@bacongobbler, that would be massively appreciated.

@helgi
Copy link
Contributor

helgi commented Mar 29, 2019

Getting the conflict resolved and this landed would be great :-)

@micw
Copy link

micw commented Mar 29, 2019

I'd say, the conflict is not the main problem. Please have a look at my comment above. I build and deployed tiller with this PR applied and it did not solved the issue (see #5375).

Makes sure CRDs installed through the crd_install hook reaches the `established` state before the hook is considered complete.

Signed-off-by: Morten Torkildsen <mortent@google.com>
@mortent
Copy link
Author

mortent commented Mar 30, 2019

I have rebased the PR so the conflict should be resolved.

@micw Your issue is actually a separate one from what is being addressed in this PR. This fixes the issue where the installed CRD is not ready at the point where CRs are created.

Your issue is that the deletion of the CRD through the hook-delete-policy hook has not completed when the crd-install hook tries to install the same CRD. Based on the error message, it sounds like the deletion timestamp has been set, but the resource is not yet removed from etcd. From just a quick look at the code that does the deletion, I think it needs to poll for the resource to be fully removed from etcd before the delete hook should be considered complete. I will try to find time to make a separate fix for this issue.

@Morriz
Copy link

Morriz commented Apr 4, 2019

Any updates? Can this be merged?

@bacongobbler
Copy link
Member

@Morriz see my comment above

@thomastaylor312
Copy link
Contributor

I'll be reviewing this soon

@thomastaylor312 thomastaylor312 self-requested a review April 8, 2019 16:13
@squillace
Copy link
Contributor

@thomastaylor312 I know you're busy, but I, too, am hoping this can get reviewed sooner rather than later. MEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEE.

@jdn-za
Copy link

jdn-za commented Apr 12, 2019

+1 to that, anything we can help with?

Copy link
Contributor

@thomastaylor312 thomastaylor312 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is good to go! Thanks for all the patience everyone and @mortent for the fix.

@bacongobbler Merge away!

@bacongobbler bacongobbler merged commit 0d81d92 into helm:master Apr 15, 2019
@guilledipa
Copy link

Is there a time estimate to port this change to Helm v3?

Thanks for this!

@thomastaylor312 thomastaylor312 added the Needs v3 backport Label PRs for v4/main, which are still applicable to v3 so need a separate backport PR label Jul 8, 2019
@thomastaylor312
Copy link
Contributor

@guilledipa thanks for the reminder! We missed labeling this with the needs v3 label so we can make sure to come back around and port it over

@thomastaylor312
Copy link
Contributor

This was implemented in Helm 3 by #6332

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

size/L Denotes a PR that changes 100-499 lines, ignoring generated files. v3 port complete For completed v2->v3 ports

Projects

None yet

Development

Successfully merging this pull request may close these issues.

crd-install hook possible race condition