Switching to new machine types on aaw-dev cluster #2022

Open
jacek-dudek opened this issue Feb 19, 2025 · 6 comments

@jacek-dudek

Reach out to the Cloud Native team and/or FinOps to discuss switching to more cost-effective machines for the system, cloudmainsys, and general nodepools on the aaw-dev cluster. Follow-up to these issues: #1965, #1993.

Proposed machine models for each nodepool are:
cloudmainsys, system: Standard_D2ds_v5
general: Standard_E4ds_v5

@jacek-dudek jacek-dudek self-assigned this Feb 19, 2025
@Souheil-Yazji
Contributor

The contact for FinOps is Lyly Vuu.

@jacek-dudek
Author

Reached out to Lyly to confirm choice of new virtual machine models and whether there is any special pricing arrangement for other VM models.

@jacek-dudek
Author

Sent more info to FinOps on when and how many machines are expected to be running on the affected nodepools. Waiting for feedback.

@jacek-dudek
Author

@Souheil-Yazji
Contributor

@jacek-dudek Please update this issue with the content of the emails with FinOps. Let's follow up next fiscal year.

@jacek-dudek
Author

Attaching email communication with FinOps regarding this, and putting issue into backlog until FinOps gets back to us with recommendations:
Hi Jacek,

Thank you for that information. I’ve noted that the following will be used 24/7 for AAW:
• Standard_D2ds_v5 (3 instances)
• Standard_E4ds_v5 (5 instances)

Lyly

From: Dudek, Jacek (StatCan)
Sent: Monday, March 3, 2025 2:03 PM
To: Vuu, Lyly (SSC/SPC)
Cc: Yazji, Souheil (StatCan); Vuu, Lyly (she | elle) (StatCan); Al-Zaher, Sarah (she | elle) (StatCan); Verma, Ravi (StatCan)
Subject: RE: Inquiry about cloud costs of different VM models

Hello again Lyly,

Here is the expected number of nodes per nodepool:
cloudmainsys: 1
system: 2
general: 5

I expect those numbers to stay the same after the change of machine types. And these machines are always up and running whenever the cluster is up. So generally they'll be running 24 hours per day. (We have other nodepools for intermittent workloads started by users.)

Here's a link to the GitHub issue I opened to document this work, if you want to comment directly in the issue:
#2022

Also, here are links to the GitHub issues where we discussed the reasoning behind the suggested changes:
#1965
#1993

Regards,
Jacek

From: Vuu, Lyly (SSC/SPC)
Sent: Monday, March 3, 2025 6:37 AM
To: Dudek, Jacek (StatCan)
Cc: Yazji, Souheil (StatCan); Vuu, Lyly (she | elle) (StatCan); Al-Zaher, Sarah (she | elle) (StatCan); Verma, Ravi (StatCan)
Subject: RE: Inquiry about cloud costs of different VM models

Unclassified | Non classifié

Morning Jacek,

First of all, FinOps appreciates your proactiveness in rightsizing your VMs – this is great!

Your proposed VM SKU changes to Standard_Dds_v5 and Standard_Eds_v5 are in line with commonly used SKUs in the tenant so these choices are acceptable. We have Reserved Instances (RIs) for these SKUs although we cannot control where these RIs get applied (Azure automatically applies the RI discount where it is most beneficial).

Can you please tell me how many instances you’re expecting of each, and how many hours they are expected to run per day?

Cc-ing my StatCan FinOps counterparts (Sarah and Ravi), as we’re currently in the midst of doing an RI review and the information you’re providing will be relevant.

Thanks,
Lyly

From: Dudek, Jacek (StatCan)
Sent: February 28, 2025 10:50 AM
To: Vuu, Lyly (SSC/SPC)
Cc: Yazji, Souheil (StatCan)
Subject: Inquiry about cloud costs of different VM models

Hello Lyly,

I help maintain a Kubernetes cluster for the Advanced Analytics Workspace platform (AAW) at Statistics Canada. Recently we reviewed the virtual machine types used for the clusters hosting that platform and examined their resource utilization. We noticed that most of these machines are underutilized, so we may be able to switch to lower-performance models.

Currently I'm proposing VM model changes across three nodepools in the development cluster as follows:
cloudmainsys: Standard_D16s_v3 -> Standard_D2ds_v5
system: Standard_D8s_v3 -> Standard_D2ds_v5
general: Standard_D8s_v3 -> Standard_E4ds_v5

I expect the number of machines in each nodepool to remain the same, and so would expect a lower monthly cost based on calculations using Microsoft Azure's pricing calculator.
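
For reference, here is a rough sketch of the capacity side of this change. It assumes the node counts given elsewhere in this thread (cloudmainsys: 1, system: 2, general: 5) and the vCPU and memory figures from Azure's published size tables as I understand them, which should be checked against the current Azure documentation. The names `Sku`, `NODEPOOLS`, `totals`, and `monthly_cost` are made up for illustration. No prices are hard-coded: `monthly_cost` expects hourly rates that the caller looks up in the pricing calculator, and 730 hours is only a common approximation of one month of 24/7 use.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Sku:
    vcpus: int
    memory_gib: int


# Assumed specs (vCPUs, memory in GiB); verify against Azure's size tables.
SKUS = {
    "Standard_D16s_v3": Sku(16, 64),
    "Standard_D8s_v3": Sku(8, 32),
    "Standard_D2ds_v5": Sku(2, 8),
    "Standard_E4ds_v5": Sku(4, 32),
}

# nodepool -> (current SKU, proposed SKU, node count)
NODEPOOLS = {
    "cloudmainsys": ("Standard_D16s_v3", "Standard_D2ds_v5", 1),
    "system": ("Standard_D8s_v3", "Standard_D2ds_v5", 2),
    "general": ("Standard_D8s_v3", "Standard_E4ds_v5", 5),
}


def totals(use_proposed: bool) -> Sku:
    """Sum vCPUs and memory across the three nodepools."""
    vcpus = memory = 0
    for current, proposed, count in NODEPOOLS.values():
        sku = SKUS[proposed if use_proposed else current]
        vcpus += sku.vcpus * count
        memory += sku.memory_gib * count
    return Sku(vcpus, memory)


def monthly_cost(hourly_prices: dict, use_proposed: bool, hours: float = 730) -> float:
    """Estimate monthly cost from caller-supplied hourly prices per SKU."""
    total = 0.0
    for current, proposed, count in NODEPOOLS.values():
        name = proposed if use_proposed else current
        total += hourly_prices[name] * count * hours
    return total


if __name__ == "__main__":
    print("current: ", totals(use_proposed=False))
    print("proposed:", totals(use_proposed=True))
```

Under these assumptions the three nodepools go from 72 vCPUs and 288 GiB of memory in total to 26 vCPUs and 184 GiB. Memory on the general nodepool stays the same, since both Standard_D8s_v3 and Standard_E4ds_v5 provide 32 GiB per node.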

I wanted to confirm with you if we have any sort of special pricing arrangements on other VM models that might come into play in my cost estimations and would be a better choice. Please let me know if you require more information to make an assessment.

Regards,
Jacek Dudek
