#1875 - Implementing resiliency measures - PodDisruptionBudget #2237

Merged
merged 8 commits into from
Sep 5, 2023
Changes from 5 commits
23 changes: 15 additions & 8 deletions devops/openshift/api-deploy.yml
@@ -12,9 +12,6 @@ objects:
name: ${NAME}
spec:
replicas: "${{REPLICAS}}"
disruptionBudget:
maxUnavailable: 0
minAvailable: "${{REPLICAS}}"
revisionHistoryLimit: 10
selector:
deploymentconfig: ${NAME}
@@ -240,6 +237,16 @@ objects:
minReplicas: "${{REPLICAS}}"
maxReplicas: 10
targetCPUUtilizationPercentage: 80
- apiVersion: policy/v1
kind: PodDisruptionBudget
metadata:
name: ${NAME}-pdb
spec:
selector:
matchLabels:
app: ${NAME}
maxUnavailable: 1
disruptionsAllowed: 1
Collaborator

Do you mind elaborating on how the disruptionsAllowed configuration affects the PDB?

Contributor Author

https://kubernetes.io/docs/tasks/run-application/configure-pdb/

disruptionsAllowed is a status field; I thought I could configure it. Removing it now.

Collaborator

I was not sure what it was after reading some docs; that's why I asked 😉
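To illustrate the point settled in this thread: `disruptionsAllowed` is reported by the cluster under `status` and is not something a manifest sets. A minimal sketch of the PDB template object with only `spec` fields, assuming the target pods carry the `app: ${NAME}` label used in this diff:

```yaml
- apiVersion: policy/v1
  kind: PodDisruptionBudget
  metadata:
    name: ${NAME}-pdb
  spec:
    # Pods protected by this budget; assumes they are labeled app=${NAME}.
    selector:
      matchLabels:
        app: ${NAME}
    # At most one selected pod may be down due to a voluntary disruption.
    maxUnavailable: 1
  # No disruptionsAllowed here: the cluster computes it and reports it
  # under status.
```

Once the object exists, the computed value can be read back from `status.disruptionsAllowed`.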

parameters:
- name: NAME
required: true
@@ -254,15 +261,15 @@ parameters:
- name: SERVICE_NAME
value: api
- name: CPU_LIMIT
value: "1.0"
value: "0.2"
- name: MEMORY_LIMIT
value: "2000M"
value: "256M"
- name: CPU_REQUEST
value: "0.5"
value: "0.1"
- name: MEMORY_REQUEST
value: "1000M"
value: "128M"
- name: REPLICAS
value: "1"
value: "2"
- name: PORT
required: true
- name: DB_SERVICE
10 changes: 10 additions & 0 deletions devops/openshift/database/mongo-ha.yml
@@ -232,3 +232,13 @@ objects:
resources:
requests:
storage: "${VOLUME_CAPACITY}"
- apiVersion: policy/v1
kind: PodDisruptionBudget
metadata:
name: ${NAME}-pdb
spec:
selector:
matchLabels:
app: ${NAME}
maxUnavailable: 1
disruptionsAllowed: 1
11 changes: 11 additions & 0 deletions devops/openshift/database/patroni-deploy.yml
@@ -182,6 +182,17 @@ objects:
resources:
requests:
storage: ${PVC_SIZE}
- apiVersion: policy/v1
Collaborator

Do we have a source of recommendation for Patroni PDB configuration?

Contributor Author

Collaborator

kind: PodDisruptionBudget
metadata:
Collaborator

@andrewsignori-aot Sep 1, 2023

How would we apply these changes to PROD? Is there a plan for it?

Contributor Author

The idea is to deploy the PDB configuration to our namespace manually, rather than by running the YAML.
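For a manual rollout, the template object would need its parameters resolved first. A sketch of what a standalone manifest might look like, assuming `NAME` resolves to `patroni` (the template default shown in this diff) and that the Patroni pods are labeled `app: patroni`. The `maxUnavailable` value is illustrative only, not a recommendation:

```yaml
# Standalone PodDisruptionBudget with template parameters filled in.
# Assumption: the protected pods carry the label app=patroni.
apiVersion: policy/v1
kind: PodDisruptionBudget
metadata:
  name: patroni-pdb
spec:
  selector:
    matchLabels:
      app: patroni
  # Illustrative value; choose it from the actual replica count.
  maxUnavailable: 1
```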

name: ${NAME}-pdb
spec:
selector:
matchLabels:
app: ${NAME}
minAvailable: 3
maxUnavailable: 1
disruptionsAllowed: 1
parameters:
- name: NAME
value: patroni
11 changes: 11 additions & 0 deletions devops/openshift/database/redis-ha-deploy.yml
@@ -148,6 +148,17 @@ objects:
requests:
storage: ${PVC_SIZE}
storageClassName: ${STORAGE_CLASS}
- apiVersion: policy/v1
kind: PodDisruptionBudget
metadata:
name: ${NAME}-pdb
spec:
selector:
matchLabels:
app: ${NAME}
minAvailable: 6
Collaborator

Do we have a source of recommendation for Redis PDB configuration?

Contributor Author

All the databases we use are StatefulSets, so I followed the documentation below:
https://kubernetes.io/docs/tasks/run-application/configure-pdb/#identify-an-application-to-protect

The numbers I put in there are the values I chose, such as maxUnavailable, for each set of pods when a disruption happens.

Collaborator

Just wondering if there is a recommendation from Gov or if we can get something from RocketChat.
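One point worth noting alongside this thread: a single PodDisruptionBudget accepts either `minAvailable` or `maxUnavailable`, not both. A sketch that keeps only `minAvailable`, with the value 6 taken from this diff; whether 6 is appropriate depends on the Redis replica count, which this page does not show:

```yaml
- apiVersion: policy/v1
  kind: PodDisruptionBudget
  metadata:
    name: ${NAME}-pdb
  spec:
    selector:
      matchLabels:
        app: ${NAME}
    # Only one of minAvailable / maxUnavailable may be set per PDB.
    minAvailable: 6
```

If `minAvailable` equals the number of replicas, no voluntary eviction is allowed, which can block node drains.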

maxUnavailable: 2
disruptionsAllowed: 1
parameters:
- name: NAME
description: The name of the application for labelling all artifacts.
20 changes: 13 additions & 7 deletions devops/openshift/forms-deploy.yml
@@ -26,9 +26,6 @@ objects:
name: ${NAME}
spec:
replicas: "${{REPLICAS}}"
disruptionBudget:
maxUnavailable: 1
minAvailable: "${{REPLICAS}}"
selector:
app: ${NAME}
strategy:
@@ -172,7 +169,16 @@ objects:
minReplicas: "${{REPLICAS}}"
maxReplicas: 10
targetCPUUtilizationPercentage: 80

- apiVersion: policy/v1
kind: PodDisruptionBudget
metadata:
name: ${NAME}-pdb
spec:
selector:
matchLabels:
app: ${NAME}
maxUnavailable: 1
disruptionsAllowed: 1
parameters:
- name: NAME
displayName: Name
@@ -217,12 +223,12 @@ parameters:
displayName: Resources CPU Request
description: The resources CPU request (in cores) for this build.
required: true
value: 250m
value: "0.1"
- name: CPU_LIMIT
displayName: Resources CPU Limit
description: The resources CPU limit (in cores) for this build.
required: true
value: "2"
value: "0.5"
- name: MEMORY_REQUEST
displayName: Resources Memory Request
description: The resources Memory request (in Mi, Gi, etc) for this build.
@@ -232,4 +238,4 @@ parameters:
displayName: Resources Memory Limit
description: The resources Memory limit (in Mi, Gi, etc) for this build.
required: true
value: 2Gi
value: 512Mi
23 changes: 15 additions & 8 deletions devops/openshift/queue-consumers-deploy.yml
@@ -12,9 +12,6 @@ objects:
name: ${NAME}
spec:
replicas: "${{REPLICAS}}"
disruptionBudget:
maxUnavailable: 0
minAvailable: "${{REPLICAS}}"
revisionHistoryLimit: 10
selector:
deploymentconfig: ${NAME}
@@ -251,6 +248,16 @@ objects:
minReplicas: "${{REPLICAS}}"
maxReplicas: 10
targetCPUUtilizationPercentage: 80
- apiVersion: policy/v1
kind: PodDisruptionBudget
metadata:
name: ${NAME}-pdb
spec:
selector:
matchLabels:
app: ${NAME}
maxUnavailable: 1
disruptionsAllowed: 1
parameters:
- name: NAME
required: true
@@ -259,15 +266,15 @@ parameters:
- name: SERVICE_NAME
value: queue-consumers
- name: CPU_LIMIT
value: "1.0"
value: "0.2"
- name: MEMORY_LIMIT
value: "2000M"
value: "256M"
- name: CPU_REQUEST
value: "0.5"
value: "0.1"
- name: MEMORY_REQUEST
value: "1000M"
value: "128M"
- name: REPLICAS
value: "1"
value: "2"
- name: DB_SERVICE
value: patroni-master
- name: DB_SECRET_NAME
17 changes: 12 additions & 5 deletions devops/openshift/web-deploy.yml
@@ -12,9 +12,6 @@ objects:
name: ${NAME}
spec:
replicas: "${{REPLICAS}}"
disruptionBudget:
maxUnavailable: 1
minAvailable: "${{REPLICAS}}"
revisionHistoryLimit: 10
selector:
deploymentconfig: ${NAME}
@@ -97,6 +94,16 @@ objects:
minReplicas: "${{REPLICAS}}"
maxReplicas: 10
targetCPUUtilizationPercentage: 80
- apiVersion: policy/v1
kind: PodDisruptionBudget
metadata:
name: ${NAME}-pdb
spec:
selector:
matchLabels:
app: ${NAME}
maxUnavailable: 1
disruptionsAllowed: 1
parameters:
- name: NAME
required: true
@@ -105,9 +112,9 @@ parameters:
- name: SERVICE_NAME
value: web
- name: CPU_LIMIT
value: "0.5"
value: "0.25"
- name: MEMORY_LIMIT
value: "256M"
value: "512M"
Collaborator

SIMS-Api will do far more processing than the Web pod. How would we justify allocating more memory to Web than to API?

Contributor Author

Initially we had only one API pod with more memory, but because the replication controller was changed to allow a maximum of 10 replicas, we reduced the values. In the case of web, even though the API does more processing, the web has to render the application faster, so to be on the safe side I bumped up the per-pod (vertical) values.

Collaborator

The pod will not render anything; it will just allow the download of static files (HTML, JS, CSS, etc.). I did not follow the explanation.

- name: CPU_REQUEST
value: "0.1"
- name: MEMORY_REQUEST
23 changes: 15 additions & 8 deletions devops/openshift/workers-deploy.yml
@@ -12,9 +12,6 @@ objects:
name: ${NAME}
spec:
replicas: "${{REPLICAS}}"
disruptionBudget:
maxUnavailable: 1
minAvailable: "${{REPLICAS}}"
revisionHistoryLimit: 10
selector:
deploymentconfig: ${NAME}
@@ -99,6 +96,16 @@ objects:
minReplicas: "${{REPLICAS}}"
maxReplicas: 10
targetCPUUtilizationPercentage: 80
- apiVersion: policy/v1
kind: PodDisruptionBudget
metadata:
name: ${NAME}-pdb
spec:
selector:
matchLabels:
app: ${NAME}
maxUnavailable: 1
disruptionsAllowed: 1
parameters:
- name: NAME
required: true
@@ -107,15 +114,15 @@ parameters:
- name: SERVICE_NAME
value: workers
- name: CPU_LIMIT
value: "1.0"
value: "0.2"
- name: MEMORY_LIMIT
value: "2000M"
value: "256M"
- name: CPU_REQUEST
value: "0.5"
value: "0.1"
- name: MEMORY_REQUEST
value: "1000M"
value: "128M"
- name: REPLICAS
value: "1"
value: "2"
- name: DB_SERVICE
value: patroni-master
- name: DB_SECRET_NAME