Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Replace Deployment with StatefulSet #33

Conversation

dylanpiergies
Copy link

Signed-off-by: Dylan Piergies [email protected]

@dylanpiergies
Copy link
Author

dylanpiergies commented Sep 4, 2020

Migrated from helm/charts#23126.

@dylanpiergies dylanpiergies force-pushed the replace_deployment_with_statefulset branch 3 times, most recently from 28bb3c4 to d031188 Compare September 5, 2020 00:06
@torstenwalter
Copy link
Member

I would rather not do this in a minor update, but we could include it for #41

@dylanpiergies dylanpiergies changed the base branch from main to jenkins-3-0-0 September 10, 2020 17:10
@dylanpiergies dylanpiergies force-pushed the replace_deployment_with_statefulset branch from 994a9f3 to 07adc5a Compare September 10, 2020 17:11
@dylanpiergies dylanpiergies force-pushed the replace_deployment_with_statefulset branch from 07adc5a to 01059bb Compare September 10, 2020 17:32
@UkonnRa
Copy link

UkonnRa commented Sep 15, 2020

+1 any update?

Copy link
Member

@torstenwalter torstenwalter left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For me this looks good. I just need to do some testing and would love to get feedback from other maintainers as well.

As 3.0.0 is a breaking change we could also consider use volumeClaimTemplates.

Any idea on how a migration path could look like. Could we support users with that?

@wmcdona89
Copy link
Contributor

wmcdona89 commented Sep 16, 2020

While I get moving to a StatefulSet might be the right thing to do in principle, I need some convincing it's the right thing to do in practice for Jenkins.

Going beyond horizontal scalability considerations (since Jenkins doesn't scale horizontally), the arguments for StatefulSet over Deployment talk about how it eliminates the risk of accidentally deleting a PVC and how it addresses the problem of a rolling update getting stuck due to a ReadWriteOnce PVC.

Both of these issues can be addressed via chart configuration when using Deployment. Accidental PVC deletion can be avoided by setting persistence.annotations to "helm.sh/resource-policy": keep (which the chart could start doing by default). And the rolling update problem can be avoided by setting the update strategy to Recreate (which the chart already does by default).

Other considerations...

  • Deployment gives users the flexibility of deleting the PVC on purpose such as when testing Jenkins.
  • Moving to StatefulSet and volumeClaimTemplates will mean an existing claim could only be used if it follows the volumeClaimTemplates naming convention of [name + pod-name + ordinal number].

Anything else I should be considering?

@dylanpiergies
Copy link
Author

dylanpiergies commented Sep 18, 2020

@torstenwalter As this PR currently stands, there's no migration needed... the PVC provisioned in the chart can be used as-is. If we were to switch to using a volumeClaimTemplate by default, the migration will be more involved. This PR was originally targeted at a minor release, so I was deliberately avoiding breaking changes. Now that it's targeted at a major release, we might want to discuss further.

@wmcdona89

Both of these issues can be addressed via chart configuration when using Deployment. Accidental PVC deletion can be avoided by setting persistence.annotations to "helm.sh/resource-policy": keep (which the chart could start doing by default). And the rolling update problem can be avoided by setting the update strategy to Recreate (which the chart already does by default).

Whilst use of the Recreate update strategy does mitigate the contention problem surrounding ReadWriteOnce volumes, it does not solve it completely. In the case where a helm upgrade modifies the Deployment, it yields the correct behaviour: all Pods in the previous version are terminated, and no Pods of the new revision are created until the old ones have been removed. However, this is not the case if the Pod is deleted by the user or if it's evicted from its current Node or otherwise rescheduled for some reason; in these cases, the ReplicaSet managing the Pods will not await removal of the terminating Pod before creating a new one, and you still encounter errors when the replacement Pod tries to start. In practice, the removed Pod will eventually terminate, releasing its volume mounts, but the new Pod has to rely on retries to mount its volumes successfully. To put it bluntly: it's sloppy. As a rule of thumb, if you're running anything that uses ReadWriteOnce volumes, you should almost certainly be using a StatefulSet. The lifecycle of Pods managed by a StatefulSet is slightly different: the StatefulSet will await the removal of old Pods before creating replacements and the contention for the ReadWriteOnce PVC cannot occur.

  • Deployment gives users the flexibility of deleting the PVC on purpose such as when testing Jenkins.

Moving to a StatefulSet does not prevent users from doing this. The change from Deployment to StatefulSet alone has no impact whatsoever. If we use volumeClaimTemplate as well, it would mean users will have to delete the volume explicitly (it won't get removed when the chart is uninstalled).

  • Moving to StatefulSet and volumeClaimTemplates will mean an existing claim could only be used if it follows the volumeClaimTemplates naming convention of [name + pod-name + ordinal number].

As things currently stand, we are still using the PVC defined in home-pvc.yaml; since Jenkins doesn't scale horizontally, it's not imperative that we do use a volumeClaimTemplate. The real difference in doing so would mean that the created PersistentVolume is no longer managed by Helm, or even by the StatefulSet. It's simply created if it does not exist, and is then an independent resource that's left to the user to manage. As you rightly point out, though, switching to support only volumeClaimTemplate (or making it the default) would be a breaking change that would require attention during upgrade.

@torstenwalter
Copy link
Member

Thanks for the detailed explanation. I am in favor of using a StatefulSet and keeping the existing PVC.

I've seen quite often that Jenkins could not be started as the PVC was still mounted at a different node

@wmcdona89 What's your opinion?

@wmcdona89
Copy link
Contributor

@dylanpiergies thanks for the detailed response!

if the Pod is deleted by the user or if it's evicted from its current Node or otherwise rescheduled for some reason; in these cases, the ReplicaSet managing the Pods will not await removal of the terminating Pod before creating a new one, and you still encounter errors when the replacement Pod tries to start.

I see...I guess I've been fortunate to have never been bitten by this (or at least not in recent memory)

Moving to a StatefulSet does not prevent users from doing this. The change from Deployment to StatefulSet alone has no impact whatsoever. If we use volumeClaimTemplate as well, it would mean users will have to delete the volume explicitly (it won't get removed when the chart is uninstalled).

Ah...that's right. Since we're not using volumeClaimTemplates, the chart will still own the PVC and thus can delete it.

My remaining reservations are only with volumeClaimTemplates and since StatefulSet will resolve the Pod deletion/rescheduling issue then I'm good switching to StatefulSet without volumeClaimTemplates.

@dylanpiergies
Copy link
Author

@torstenwalter @wmcdona89 Once burned... the first time I encountered the deadlock situation (it was a database of some sort, IIRC), I took some time to properly research and understand the various controller resources and their use cases. I've never personally used the Recreate deployment strategy. I guess the primary use case is for when you can tolerate some downtime as the rollout happens and can't supply the extra compute resource necessary to do a rolling update, or if your application cannot tolerate the situation where both old and new revisions are running simultaneously.

Anyway, I digress...

I don't really have any strong opinions on using volumeClaimTemplate. For new installations, I'd probably prefer to use a volumeClaimTemplate for the safety reasons others have mentioned, but for existing installations I probably wouldn't want to do the work to migrate, as it would be fiddly and error prone.

In any case, if we were to support volumeClaimTemplate at all, I do agree that it should be as an option and I'd prefer to do this as a separate change, since the complexity will be much higher and the implications for testing much greater.

@torstenwalter torstenwalter merged commit bace68d into jenkinsci:jenkins-3-0-0 Sep 27, 2020
@torstenwalter
Copy link
Member

@dylanpiergies Thank you for this! It's a big step forward to improve this chart.

@torstenwalter torstenwalter added this to the v3.0.0 release milestone Sep 28, 2020
@torstenwalter torstenwalter mentioned this pull request Nov 21, 2020
4 tasks
torstenwalter added a commit that referenced this pull request Nov 21, 2020
* Chart uses StatefulSet instead of Deployment
* XML configuration was dropped
* offending terms have been removed
* values have been renamed and re-ordered to make it easier to use
* already deprecated items got removed
* chart migrated to helm 3.0.0

List of contributions:

- Dylan Piergies: Replace Deployment with StatefulSet #33
- Aaron McDonald: move kubernetes plugin configuration under agent #106
- holmesb: Adding prometheus \ metrics guidance #78

Closes #78

Co-authored-by: Aaron McDonald <[email protected]>
Co-authored-by: Dylan Piergies [email protected]
Co-authored-by: holmesb <[email protected]>
Signed-off-by: Torsten Walter <[email protected]>
@torstenwalter torstenwalter mentioned this pull request Nov 21, 2020
4 tasks
torstenwalter added a commit that referenced this pull request Nov 26, 2020
* Chart uses StatefulSet instead of Deployment
* XML configuration was dropped
* offending terms have been removed
* values have been renamed and re-ordered to make it easier to use
* already deprecated items got removed
* chart migrated to helm 3.0.0
* rename containers (#155)
* Configure admin user via configuration as code (#158)
  * admin username and password are no longer exposed as environment
    variables
  * admin password is not modified on helm updates instead the values from
    the existing secret is re-used
  * default length of admin password was increased
  * renamed useSecurity to adminSecret as disabling
    security feels odd and it's not what this flag does.
  * Check usage of deprecated imageTag (#161)
  * Remove whitespace when not rendering helm.sh/chart label (#160)
    Before:

    ```
    kind: PersistentVolumeClaim
    apiVersion: v1
    metadata:
      name: RELEASE-NAME-jenkins
      namespace: cockpit
      labels:
        "app.kubernetes.io/name": 'jenkins'

        "app.kubernetes.io/managed-by": "Helm"
        "app.kubernetes.io/instance": "RELEASE-NAME"
        "app.kubernetes.io/component": "jenkins-controller"
    ```

    After:

    ```
    kind: PersistentVolumeClaim
    apiVersion: v1
    metadata:
      name: RELEASE-NAME-jenkins
      namespace: cockpit
      labels:
        "app.kubernetes.io/name": 'jenkins'
        "app.kubernetes.io/managed-by": "Helm"
        "app.kubernetes.io/instance": "RELEASE-NAME"
        "app.kubernetes.io/component": "jenkins-controller"
    ```

List of contributions:

- Dylan Piergies: Replace Deployment with StatefulSet #33
- Aaron McDonald: move kubernetes plugin configuration under agent #106
- holmesb: Adding prometheus \ metrics guidance #78
- Florian Buchmeier: Updated UI tests to use official BATS image since
  dduportal/bats:0.4.0 is still using docker manifest v1 and thus is deprecated. (#157)

Closes #78

Co-authored-by: Aaron McDonald <[email protected]>
Co-authored-by: Dylan Piergies [email protected]
Co-authored-by: holmesb <[email protected]>
Signed-off-by: Torsten Walter <[email protected]>
torstenwalter added a commit that referenced this pull request Nov 26, 2020
* Chart uses StatefulSet instead of Deployment
* XML configuration was dropped
* offending terms have been removed
* values have been renamed and re-ordered to make it easier to use
* already deprecated items got removed
* chart migrated to helm 3.0.0
* rename containers (#155)
* Configure admin user via configuration as code (#158)
  * admin username and password are no longer exposed as environment
    variables
  * admin password is not modified on helm updates instead the values from
    the existing secret is re-used
  * default length of admin password was increased
  * renamed useSecurity to adminSecret as disabling
    security feels odd and it's not what this flag does.
  * Check usage of deprecated imageTag (#161)
  * Remove whitespace when not rendering helm.sh/chart label (#160)
    Before:

    ```
    kind: PersistentVolumeClaim
    apiVersion: v1
    metadata:
      name: RELEASE-NAME-jenkins
      namespace: cockpit
      labels:
        "app.kubernetes.io/name": 'jenkins'

        "app.kubernetes.io/managed-by": "Helm"
        "app.kubernetes.io/instance": "RELEASE-NAME"
        "app.kubernetes.io/component": "jenkins-controller"
    ```

    After:

    ```
    kind: PersistentVolumeClaim
    apiVersion: v1
    metadata:
      name: RELEASE-NAME-jenkins
      namespace: cockpit
      labels:
        "app.kubernetes.io/name": 'jenkins'
        "app.kubernetes.io/managed-by": "Helm"
        "app.kubernetes.io/instance": "RELEASE-NAME"
        "app.kubernetes.io/component": "jenkins-controller"
    ```

List of contributions:

- Dylan Piergies: Replace Deployment with StatefulSet #33
- Aaron McDonald: move kubernetes plugin configuration under agent #106
- holmesb: Adding prometheus \ metrics guidance #78
- Florian Buchmeier: Updated UI tests to use official BATS image since
  dduportal/bats:0.4.0 is still using docker manifest v1 and thus is deprecated. (#157)

Closes #78

Co-authored-by: Aaron McDonald <[email protected]>
Co-authored-by: Dylan Piergies [email protected]
Co-authored-by: holmesb <[email protected]>
Signed-off-by: Torsten Walter <[email protected]>
torstenwalter added a commit that referenced this pull request Nov 26, 2020
* Chart uses StatefulSet instead of Deployment
* XML configuration was dropped
* offending terms have been removed
* values have been renamed and re-ordered to make it easier to use
* already deprecated items got removed
* chart migrated to helm 3.0.0
* rename containers (#155)
* Configure admin user via configuration as code (#158)
  * admin username and password are no longer exposed as environment
    variables
  * admin password is not modified on helm updates instead the values from
    the existing secret is re-used
  * default length of admin password was increased
  * renamed useSecurity to adminSecret as disabling
    security feels odd and it's not what this flag does.
  * Check usage of deprecated imageTag (#161)
  * Remove whitespace when not rendering helm.sh/chart label (#160)
    Before:

    ```
    kind: PersistentVolumeClaim
    apiVersion: v1
    metadata:
      name: RELEASE-NAME-jenkins
      namespace: cockpit
      labels:
        "app.kubernetes.io/name": 'jenkins'

        "app.kubernetes.io/managed-by": "Helm"
        "app.kubernetes.io/instance": "RELEASE-NAME"
        "app.kubernetes.io/component": "jenkins-controller"
    ```

    After:

    ```
    kind: PersistentVolumeClaim
    apiVersion: v1
    metadata:
      name: RELEASE-NAME-jenkins
      namespace: cockpit
      labels:
        "app.kubernetes.io/name": 'jenkins'
        "app.kubernetes.io/managed-by": "Helm"
        "app.kubernetes.io/instance": "RELEASE-NAME"
        "app.kubernetes.io/component": "jenkins-controller"
    ```

List of contributions:

- Dylan Piergies: Replace Deployment with StatefulSet #33
- Aaron McDonald: move kubernetes plugin configuration under agent #106
- holmesb: Adding prometheus \ metrics guidance #78
- Florian Buchmeier: Updated UI tests to use official BATS image since
  dduportal/bats:0.4.0 is still using docker manifest v1 and thus is deprecated. (#157)

Closes #78

Co-authored-by: Aaron McDonald <[email protected]>
Co-authored-by: Dylan Piergies [email protected]
Co-authored-by: holmesb <[email protected]>
Signed-off-by: Torsten Walter <[email protected]>

Co-authored-by: Aaron McDonald <[email protected]>
Co-authored-by: holmesb <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants