AAW Dev: Re-size workloads scheduled on system nodepools #1992

Closed
Jose-Matsuda opened this issue Nov 27, 2024 · 6 comments

@Jose-Matsuda
Contributor

Jose-Matsuda commented Nov 27, 2024

Take the pod workloads that you can see on the system nodepools and, after consulting Grafana over an extended period of time, suggest and apply new workload sizes.

It would be nice to have a table kind of like this

| Resource            | Current CPU | Suggested CPU | Current Mem | Suggested Mem |
|---------------------|-------------|---------------|-------------|---------------|
| toleration-injector | 500m        | 1m            | 128Mi       | 20Mi          |

Follow-up issue for the general nodepool: #1997
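
To fill in the "Current" columns, something like the sketch below could dump each system-node pod's requests using the Kubernetes Python client. The `agentpool=system` label selector is an assumption about how the system nodepool is labelled in aaw-dev; adjust it as needed.

```python
# Sketch: print current CPU/memory requests for every pod scheduled on
# the system nodepool, to populate the "Current" columns of the table.
# Assumes kubectl access to the cluster and an agentpool=system node label.
from kubernetes import client, config

config.load_kube_config()  # or load_incluster_config() inside the cluster
v1 = client.CoreV1Api()

# Find the nodes that belong to the system nodepool (label is an assumption).
system_nodes = {
    n.metadata.name
    for n in v1.list_node(label_selector="agentpool=system").items
}

print(f"{'namespace/pod':60} {'cpu req':>10} {'mem req':>10}")
for pod in v1.list_pod_for_all_namespaces().items:
    if pod.spec.node_name not in system_nodes:
        continue
    for c in pod.spec.containers:
        req = (c.resources.requests or {}) if c.resources else {}
        print(f"{pod.metadata.namespace + '/' + pod.metadata.name:60} "
              f"{req.get('cpu', '-'):>10} {req.get('memory', '-'):>10}")
```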

@jacek-dudek

I created Excel tables of the CPU and memory utilization averages for all pods on the system nodes in aaw-dev.
I used the dashboard named General / Kubernetes / Compute Resources / Node (Pods).
The averages are evaluated over the last 7 days.
I'm resolving some formatting issues and will post the tables shortly.
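
For reference, 7-day averages like the ones behind that dashboard can also be pulled straight from the Prometheus HTTP API. The sketch below is one way to do it; the Prometheus URL and the node-name pattern are placeholders, and metric label names can differ depending on the scrape configuration.

```python
# Sketch: query Prometheus for 7-day average CPU and memory usage per pod
# on the system nodes. PROM and NODE_RE are assumptions; adjust for aaw-dev.
import requests

PROM = "http://localhost:9090"   # e.g. reached via `kubectl port-forward`
NODE_RE = "aks-system-.*"        # assumed system node name pattern

QUERIES = {
    "cpu (cores, 7d avg)":
        f'sum by (namespace, pod) '
        f'(rate(container_cpu_usage_seconds_total{{node=~"{NODE_RE}", container!=""}}[7d]))',
    "memory (bytes, 7d avg)":
        f'sum by (namespace, pod) '
        f'(avg_over_time(container_memory_working_set_bytes{{node=~"{NODE_RE}", container!=""}}[7d]))',
}

for label, query in QUERIES.items():
    resp = requests.get(f"{PROM}/api/v1/query", params={"query": query})
    resp.raise_for_status()
    for sample in resp.json()["data"]["result"]:
        m = sample["metric"]
        print(f'{label}: {m.get("namespace")}/{m.get("pod")} = {sample["value"][1]}')
```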

@jacek-dudek

Here are the tables for CPU and memory usage of pods running on system nodes, along with suggested requests:
resource-utilization-on-aaw-dev-system-nodes.xlsx

@jacek-dudek

Currently tracking down the parent objects and manifests corresponding to all the pods listed in the tables.

@Jose-Matsuda
Contributor Author

Jose-Matsuda commented Dec 3, 2024

CPU-wise, the sum of all the CPU requests in the column comes to 3.48 vCPU, which would fit on two system nodes sized as D2s (2 vCPUs each). We still want to better size the requests, of course, but I will note an inaccuracy with the toleration-injector: it only has limits and no requests, which means its pods will schedule wherever and won't reserve the stated 0.5 CPU. After removing that, the actual vCPU requested comes to 2.98, and the same may be true of some others.

We also need to be careful to make sure the memory requests are honoured as well, since if we move to D2s we only get 8 GiB of memory per node.
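
A rough way to reproduce these totals from the live cluster is to sum the requests of every pod scheduled on the system nodes and flag any container whose pod spec carries limits without requests. A minimal sketch, again assuming an `agentpool=system` node label, follows.

```python
# Sketch: sum CPU/memory requests reserved on the system nodes and list
# containers that set limits but no requests in their pod spec.
from kubernetes import client, config
from kubernetes.utils import parse_quantity

config.load_kube_config()
v1 = client.CoreV1Api()

system_nodes = {
    n.metadata.name
    for n in v1.list_node(label_selector="agentpool=system").items
}

cpu_total, mem_total = 0, 0
for pod in v1.list_pod_for_all_namespaces().items:
    if pod.spec.node_name not in system_nodes:
        continue
    for c in pod.spec.containers:
        res = c.resources
        reqs = (res.requests or {}) if res else {}
        lims = (res.limits or {}) if res else {}
        if lims and not reqs:
            print(f"limits but no requests: {pod.metadata.namespace}/{pod.metadata.name}/{c.name}")
        cpu_total += parse_quantity(reqs.get("cpu", "0"))
        mem_total += parse_quantity(reqs.get("memory", "0"))

print(f"total CPU requested on system nodes: {cpu_total} cores")
print(f"total memory requested on system nodes: {mem_total / 2**30:.2f} GiB")
```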

@Jose-Matsuda
Contributor Author

Took a bit of time to go over the pods listed in Jacek's Excel file and made a few notes:

Daemonsets you can easily influence

statcan-system/sysctl, azure-blob-csi-system/csi-blob-node: both of these are already well-sized, though (a patch sketch for adjusting requests follows after this list)

Daemonsets deployed via helm

I'm not too sure about these, as some may just be deployed by CNS:
kube-prometheus-stack-prometheus-node-exporter, fluentd-operator-fluentd-operator, aad-pod-identity-nmi

Daemonsets with no traceable owner (might just be CNS / AKS)

azure-ip-masq-agent, azure-npm, cloud-node-manager, csi-azuredisk-node, csi-azurefile-node, istio-cni-node, kube-proxy
These daemonsets don't need to be touched or modified


Deployments that you don't need to change (no requests)

cert-manager-anything, kube-prometheus-stack-kube-state-metrics, statcan-system/toleration-injector (only has limits)

Likely don't have the ability to change

coredns(100m,70Mi), coredns-autoscaler(20m,10Mi), konnectivity-agent(20,20), metrics-server(5m,30Mi),

Resources deployed by helm

gatekeeper-audit(100m, 1546Mi), gatekeeper-controller-manager(100m, 1546Mi),

Argo resources

statcan-system/sidecar-terminator(10m,200M)
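
For the daemonsets and deployments we can directly influence, the resize itself is just a change to `resources.requests` on the owning object. Below is a minimal sketch of a live patch via the Kubernetes Python client; the target name, container name, and values are hypothetical placeholders, and in practice the change would go through the source manifests rather than a live patch.

```python
# Hypothetical example: bump the requests on a daemonset we control.
# All names and values below are placeholders, not taken from aaw-dev.
from kubernetes import client, config

config.load_kube_config()
apps = client.AppsV1Api()

patch = {
    "spec": {
        "template": {
            "spec": {
                "containers": [{
                    "name": "example-container",   # matched by name (strategic merge)
                    "resources": {
                        "requests": {"cpu": "10m", "memory": "32Mi"},
                    },
                }]
            }
        }
    }
}

apps.patch_namespaced_daemon_set(
    name="example-daemonset",
    namespace="statcan-system",
    body=patch,
)
```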

@jacek-dudek

My discovery work so far: identified four ArgoCD-managed workloads and a bunch of AKS-managed workloads. Not sure about the Helm-managed ones yet.
object-hierarchy-and-labels-and-annotations.xlsx
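
One way to automate part of this discovery is to walk each pod's ownerReferences up to its top-level controller and check common manager hints (Helm's `app.kubernetes.io/managed-by` label, Argo CD's `argocd.argoproj.io/instance` label). The sketch below is only a starting point; label conventions vary across charts and apps, so the output still needs manual review.

```python
# Sketch: report each pod's top-level owner and a best-guess "managed by".
from kubernetes import client, config

config.load_kube_config()
v1 = client.CoreV1Api()
apps = client.AppsV1Api()

def top_owner(namespace, kind, name):
    """Follow ownerReferences from a ReplicaSet up to its Deployment."""
    if kind == "ReplicaSet":
        rs = apps.read_namespaced_replica_set(name, namespace)
        for ref in rs.metadata.owner_references or []:
            return ref.kind, ref.name
    return kind, name

for pod in v1.list_pod_for_all_namespaces().items:
    refs = pod.metadata.owner_references or []
    if not refs:
        print(f"{pod.metadata.namespace}/{pod.metadata.name}: no owner (static/mirror pod?)")
        continue
    kind, name = top_owner(pod.metadata.namespace, refs[0].kind, refs[0].name)
    labels = pod.metadata.labels or {}
    manager = labels.get("app.kubernetes.io/managed-by") \
        or ("Argo CD" if "argocd.argoproj.io/instance" in labels else "unknown")
    print(f"{pod.metadata.namespace}/{pod.metadata.name}: owner={kind}/{name}, managed-by={manager}")
```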
