Spring 2023: Decrease AWS costs #3502

Update: #2846 is now closed with the following results: the garbage collection is now tracking and removing these 2 resources on each build of jenkins-infra/packer-images.

Update: as per jenkins-infra/packer-images#596, we should no longer build EC2 templates with packer-images for Linux x86 and Windows (*), as we do not use them. Not much cost gain to expect: it's currently ~$1 daily, so it might not even be visible. But it's of interest for us (a less complex pipeline).

Resource tuning for ci.jenkins.io is tracked in #3521.

Please note that @basil's PR jenkinsci/bom#2031 on the […]

Update: the tuning on the bom was pretty effective and we now see sustainable usage in Ohio. The main focus has to be on us-east-1 for now. The outbound bandwidth of the update center is next in line: #2649.

Created a daily monitor for cost anomalies; let's see how it behaves.
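
For context, such a daily anomaly monitor can be set up through the AWS Cost Explorer API. Below is a minimal boto3 sketch; the monitor name, subscription name, email address, and threshold are hypothetical placeholders, not the actual values used by the Jenkins infrastructure:

```python
import boto3

# The Cost Explorer API is only served from us-east-1.
ce = boto3.client("ce", region_name="us-east-1")

# Watch per-service spend for anomalies.
monitor = ce.create_anomaly_monitor(
    AnomalyMonitor={
        "MonitorName": "jenkins-infra-cost-anomalies",  # hypothetical name
        "MonitorType": "DIMENSIONAL",
        "MonitorDimension": "SERVICE",
    }
)

# Send a daily digest email for anomalies above the threshold.
ce.create_anomaly_subscription(
    AnomalySubscription={
        "SubscriptionName": "daily-cost-anomaly-digest",  # hypothetical name
        "MonitorArnList": [monitor["MonitorArn"]],
        "Subscribers": [{"Type": "EMAIL", "Address": "infra-team@example.org"}],  # placeholder address
        "Frequency": "DAILY",
        "Threshold": 100.0,  # USD, placeholder value
    }
)
```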

Update: we're closing this issue as it was scoped to analysing and controlling unexpected costs, and only during Spring 2023. Next steps for summer 2023 are tracked in #3662, to keep working toward the $5,000 monthly goal.

More than a year ago, #2646 was closed after we got the AWS spending under control. But we're back to an unsustainable AWS bill: in March 2023, we spent ~$18,000 in AWS.

As a reminder, the following Jenkins Infrastructure elements are present in AWS:

- us-east-1:
  - `pkg.origin.jenkins.io` (packaging and serving Jenkins packages) and the Update Center index (`updates.jenkins.io` / `updates.jenkins-ci.org`)
  - `trusted.ci.jenkins.io` (the SSH bastion `bounce`, the controller, and the permanent agent with the jenkins-infra/update_center2 cache)
  - `usage.jenkins.io`
  - `census.jenkins.io`
- us-east-2:
  - `cik8s`, used for the Linux container agents of `ci.jenkins.io` (and associated resources: node pools, networks, etc.)
  - `eks-public`, used for hosting the Artifact Caching Registry for AWS (and associated resources: node pools, networks, load balancers, NAT gateway, etc.)
- `ci.jenkins.io`: `arm64` machines (Linux x86 and Windows-Server-* agents are now in Azure)

As discussed with @lemeurherve and @smerle33 during today's team mob-programming about cloud budgets:
- We limited the `cik8s` cluster to a maximum of 10 nodes (== 30 agents), as per "EC2s are not available" (#3421), with no VM agents (except `arm64`). It allowed us to stay around ~$300 daily (which is already a lot...). You can see the result of setting capacity back to 50 nodes (== 150 agents) in the cost increase.
- A large part of the bill is the bandwidth in us-east-1, associated to the VM hosting `pkg.origin.jenkins.io`. Moving this VM to another cloud where the bandwidth is cheaper would clearly allow us to avoid spending 3500-4000 bucks monthly! Tracked in "[INFRA-3100] Migrate updates.jenkins.io to another Cloud" (#2649).
- A look at us-east-2 shows we have the following elements to take care of (see the sketch after this list for one way to break these costs down):
  - `cik8s` has different direct costs:
    - `SpotUsage:m5*` is the cost of the spot VMs used as cluster nodes. Decreasing this cost means decreasing the build rate or build times, and optimizing agent packing (checking pod limits, packing more pods by using bigger instances, avoiding BOM builds when not needed, etc.).
    - `USE2-NAT-Gateway` needs to be analysed in more detail, but it could be the "symmetric" of "[ci.jenkins.io] Azure billing shows huge cloud cost due to outbound bandwidth" (#3485), involving the `stash` and `archiveArtifacts` steps sending data to the ci.jenkins.io controller in Azure, and the Artifact Caching Proxy downloads from JFrog (when uncaching artifacts).
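
To break the us-east-2 spending down by usage type (for example `SpotUsage:m5*` versus `USE2-NAT-Gateway`), the Cost Explorer API can group costs by the `USAGE_TYPE` dimension. Here is a minimal boto3 sketch, with illustrative dates and an assumed region filter; it is not the tooling actually used by the team:

```python
import boto3

# The Cost Explorer API is only served from us-east-1.
ce = boto3.client("ce", region_name="us-east-1")

# Daily unblended cost for us-east-2, grouped by usage type, over an example window.
# (Real usage would paginate with NextPageToken for longer periods.)
resp = ce.get_cost_and_usage(
    TimePeriod={"Start": "2023-04-01", "End": "2023-04-30"},  # illustrative dates
    Granularity="DAILY",
    Metrics=["UnblendedCost"],
    GroupBy=[{"Type": "DIMENSION", "Key": "USAGE_TYPE"}],
    Filter={"Dimensions": {"Key": "REGION", "Values": ["us-east-2"]}},
)

# Aggregate per usage type to surface the big line items
# (e.g. SpotUsage:m5* for cik8s spot nodes, USE2-NAT-Gateway for NAT traffic).
totals = {}
for day in resp["ResultsByTime"]:
    for group in day["Groups"]:
        usage_type = group["Keys"][0]
        amount = float(group["Metrics"]["UnblendedCost"]["Amount"])
        totals[usage_type] = totals.get(usage_type, 0.0) + amount

for usage_type, amount in sorted(totals.items(), key=lambda kv: -kv[1])[:10]:
    print(f"{usage_type}: ${amount:,.2f}")
```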