Proposition to refactor cloud instance models and data #252
Comments
Example of a
A tricky example (`d3.2xlarge`):
Any opinions @demeringo @github-benjamin-davy @JacobValdemar? 🤗
Thank you for submitting this proposal. Here are my initial thoughts, dumped in random order 😄

- I agree, the CSV file can be confusing to interact with.
- The proposed solution makes it more explicit that a cloud instance is a fraction of a platform. I like that.
- The proposed solution de-duplicates data. I like that.
- I don't know if it is intentional, but it seems you have removed some of the fields from the platform which are currently in
- Do you propose creating a separate file (e.g.
- Regarding tricky example
- Something else is that I think it feels "bad" to work inside such a large CSV file.
Thank you, @samuelrince, for detailing our discussion so well, and thank you, @JacobValdemar, for your feedback, which confirms the importance of this reflection.
In my opinion, we should use the server archetype CSV, which already has the necessary columns. This would allow contributors to add instances by identifying a nearby generic platform already in the file, without having to add it. Regarding the tricky example `d3.2xlarge`, could the problem be that we assume one platform hosts only one type of instance? I think this issue will also occur for RAM and GPU. Could the problem be solved by allocating the impacts component by component?
Platform:
If we find out that this solution is too complicated or not relevant, I would also prefer the virtual platform solution.
Thank you, both @JacobValdemar @da-ekchajzer, for your quick feedback! 😄
I agree with @da-ekchajzer about the CSV for platforms: we should use the already existing one with server archetypes. If the file gets so big that it becomes an issue, we can still split it by cloud provider in the future.
On the subject of allocation by components, I like the idea, but I think we will struggle to make it happen in v1, given the architecture (cf. our previous discussion). Also, for this approach to work, I think we need to specify the "purpose" of an instance to decide on which component we are going to make the allocation (compute, storage, general purpose, etc.). For instance, if we take g5 instances, there are SSDs, but the impact is clearly due to the compute part (GPU, CPU, RAM), so we should say that it is a compute instance. But then, for a compute instance with a GPU, should we allocate by GPU and/or CPU and/or RAM? I think it adds complexity, and we need to think this through thoroughly.

Plus, I really don't know if it makes sense to have servers hosting different types of instances. Does that really exist? If I look at the CPU of the d3 instance, I see that m5, r5, vt1, g4 also share the same CPU, so it could be possible... And given that, I think the virtual platform makes sense in that use case as well.

I am not entirely convinced by the virtual platform, but I find it easier to deal with, even though it is something we will probably have a hard time fully automating (in terms of platform creation in the CSV file). And later (in v2?), we can maybe address the issue of component-wise allocation strategies. What do you think?
On that subject, I feel that it can also be an obstacle to new contributions. On my side, for the research part, I open the CSV on GitHub and filter the rows, but with data appended at the end of the file, I can see myself struggling with that as well. Maybe we can look for an open-source project that can expose a CSV file with a nice UI in the browser? I am thinking of projects like instances.vantage.sh, for example.
I was thinking more of the same "type" of instance but with different levels of resources. I think that in some cases the different resources (RAM, vCPU, SSD, GPU) don't scale linearly. Is that the problem for d3.8xlarge?

I think it would be easy to implement, but it may be more complicated to explain/document. We would need to apply a ratio to each component during the impacts aggregation. The ratio would be computed for each component from the platform and instance data. I just cannot figure out if that will solve our problem.
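The ratio-per-component computation described here could look something like the following sketch. The per-component impact split and the field names are assumptions made for illustration; the instance figures match a d3.4xlarge-like share (16 vcpu, 128 GB RAM, 12 disks) of the platform discussed later in the thread.

```python
# Sketch of the "ratio per component" idea, assuming the platform's embodied
# impacts are available broken down by component. All numbers are illustrative
# and the field names are hypothetical, not the project's actual schema.
def allocate_instance_impacts(platform_impacts, instance, platform):
    """Scale each component's platform impact by the instance's share of it."""
    return sum(
        impact * instance[component] / platform[component]
        for component, impact in platform_impacts.items()
    )

platform_impacts = {"cpu": 400, "ram": 300, "disk": 300}  # kgCO2eq, assumed split
platform = {"cpu": 48, "ram": 384, "disk": 36}            # vcpu, GB, disk units
instance = {"cpu": 16, "ram": 128, "disk": 12}            # d3.4xlarge-like share

print(allocate_instance_impacts(platform_impacts, instance, platform))
```

Here every component ratio happens to be 1/3, so the instance gets a third of the assumed 1000 kgCO2eq; with non-proportional instances the per-component result would diverge from a vcpu-only ratio.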
Well, it's not only about the allocation, but also about how to choose the platform instance in that case. The premise here is to guess the total number of vcpus of the platform. One CPU (Intel Xeon Platinum 8259CL) has 48 vcpus, so it's enough to fit 1x d3.8x + 1x d3.4x. But it could be possible (and highly probable in my opinion) that we have 2 CPUs. If that's the case, how do we guess the scaling of the other components (RAM and HDDs here) for the platform? That is what I proposed in Alternative 2 of the previous comment: use the most probable config in terms of vcpus for the platform, then infer the rest by trying to fit N times the same instance (ideally the biggest one, checking that it works with the other variants).

I remember from our discussion that you indeed mentioned that it was not that difficult to add allocation based on components. I think the best way to answer this is to test.

**Scenario 1**

The platform can fit 1x d3.8x + 1x d3.4x, meaning we can deduce the following minimal configuration:

Platform:
Minified platform archetype:
If we compute the embodied impacts of the platform, we have:
Input JSON:

```json
{
  "model": {
    "type": "rack"
  },
  "configuration": {
    "cpu": {
      "units": 1,
      "name": "Intel Xeon Platinum 8259CL"
    },
    "ram": [
      {
        "units": 6,
        "capacity": 64
      }
    ],
    "disk": [
      {
        "units": 36,
        "type": "hdd",
        "capacity": 2000
      }
    ]
  }
}
```

Meaning that we can now compute the impacts of the d3.8x and d3.4x instances, by vcpu only or by all components.

**By vcpu only**

For d3.8x: the instance has 32 vcpu, so 32/48 of the total embodied impacts.
For d3.4x: the instance has 16 vcpu, so 16/48 of the total embodied impacts.
**By all components**

For d3.8x: the instance has 32 vcpu, 256 GB of RAM and 24 disks.
We are very close to the impacts from the previous calculation. Detailed calculation:
For d3.4x: not doing this one, sorry.

**Scenario 2**

The platform can fit 3x d3.8x, meaning we can deduce the following minimal configuration:

Platform:
Minified platform archetype:
If we compute the embodied impacts of the platform, we have:
Input JSON:

```json
{
  "model": {
    "type": "rack"
  },
  "configuration": {
    "cpu": {
      "units": 2,
      "name": "Intel Xeon Platinum 8259CL"
    },
    "ram": [
      {
        "units": 12,
        "capacity": 64
      }
    ],
    "disk": [
      {
        "units": 72,
        "type": "hdd",
        "capacity": 2000
      }
    ]
  }
}
```

Meaning that we can now compute the impacts of the d3.8x and d3.4x instances, by vcpu only or by all components.

**By vcpu only**

For d3.8x: the instance has 32 vcpu, so 32/96 of the total embodied impacts.
For d3.4x: the instance has 16 vcpu, so 16/96 of the total embodied impacts.
**By all components**

For d3.8x:
Detailed calculation:
For d3.4x: not doing this one, again.

TL;DR: I think we are overengineering this. 😅 Of course, scenario one is kind of scaled based on the vcpu again, so I am not surprised by that result. If you want to test another configuration, feel free to try. But given the margins, I think it's overkill.
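Why the two strategies land so close in both scenarios can be checked with a short sketch. The total platform impact is made up and the impact is assumed to split evenly across the three components; only the ratios matter for the comparison.

```python
# Check why vcpu-only and per-component allocation coincide in both scenarios
# above: the d3 instances scale every component by the same factor.
platform_impact = 1000.0  # kgCO2eq, made-up total

scenarios = {
    1: {"vcpu": 48, "ram_gb": 384, "disks": 36},  # 1 CPU, 6x64 GB, 36 HDDs
    2: {"vcpu": 96, "ram_gb": 768, "disks": 72},  # 2 CPUs, 12x64 GB, 72 HDDs
}
d3_8x = {"vcpu": 32, "ram_gb": 256, "disks": 24}

for n, platform in scenarios.items():
    vcpu_only = platform_impact * d3_8x["vcpu"] / platform["vcpu"]
    # Per-component allocation, assuming an even impact split across the
    # three components (an assumption for illustration only).
    per_component = sum(
        (platform_impact / 3) * d3_8x[k] / platform[k] for k in platform
    )
    print(n, round(vcpu_only, 1), round(per_component, 1))
    # prints: 1 666.7 666.7  then  2 333.3 333.3
```

In both scenarios the instance takes the same fraction of every component (2/3 in scenario 1, 1/3 in scenario 2), so the strategies only differ when an instance's resources do not scale proportionally to the platform.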
Thank you very much for doing this exercise. So, from what you say, allocating by vcpu or per component is not an important question as long as the platforms are built accordingly? I'm sorry, but I've just managed to identify what's bothering me. The problem with doing so is that the following data would never be used in the impacts calculation (though it would be used by contributors to construct the platform).
This puts the complexity in the platform's construction, and it reduces the importance of the instance data, which is the most important to consider. I would have liked contributors to be able to associate an instance with a generic platform when they do not know how to build platforms. If the allocation is made per component, the API would only allocate the RAM/Storage/CPU/GPU impacts to the instance based on its own information. By doing so, we ensure that all reserved resources are accounted for, even if a generic platform is used. If we put great effort into building platforms based on instance information, it doesn't change anything (as you have shown); if a generic platform is used, it avoids totally incoherent evaluations.

TL;DR: Our families miss us
Well, yes, but only if you make an "educated" guess based on vcpu. In other scenarios, that's not the case. I agree with you that we don't use the instance's specs, and in that case we shouldn't even bother asking the user to input them! 😅 So I have made a notebook to quickly test different platforms and instances. Here is an example:
The additional "equivalent server" is there to compare against the impact of a probable server that has the same characteristics as the instance. I invite you to test the notebook (it is on Google Colab and editable by anyone; I have a local copy). This made me change my mind: I think we need to do the allocation by components. It makes more sense, and usually it's closer to "my expected reality" (whatever that means). Also, I think we will probably need to add more archetypes based on what exists in the wild, with smaller min/max ranges so that it makes sense. I think we can make the following archetypes:
With the following variants:
For instance, a

TL;DR: You were right from the beginning. 🙌 😇 🙏
Perfect! I will work on the implementation over the next few days. Do you think you could handle the addition of AWS platforms and existing instances in the right format? @JacobValdemar, since you made the file in the first place, you might be able to help with that as well.
Sure, just reach out if there is anything
I have started referencing all the instances in the new format and linking them to platforms (and "virtual platforms" when we don't know). https://docs.google.com/spreadsheets/d/1EmXYTUx0Nmmubj96_-fTThu7UK16Og-gcSqSOl7qB3c/edit?usp=sharing I still need to run some checks on this file and then create the virtual platforms. Note to myself:
Problem
We want to make the process of adding cloud instances as simple as possible, while:
This results in describing both instance characteristics and platform (or bare-metal) characteristics.
As of today, both are stored in the same CSV file (cloud archetypes), which can be confusing to interact with. First, it leads to duplicated data about some components (especially CPU, with cpu_specs.csv). Second, the contributor needs to understand complex concepts to be able to make a new submission (e.g. the difference between vcpu, platform_vcpu, CPU.core_units * CPU.units, and USAGE.instance_per_server based on vcpu counts).
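As a concrete illustration of the arithmetic a contributor currently has to reconcile between these fields (the hardware numbers below are assumptions, not real archetype data; the field names are taken from the text above):

```python
# Illustration of how the current CSV fields relate to each other.
# Hardware numbers are assumed for the example; field names are from the issue.
cpu_units = 2           # CPU.units: physical CPUs on the platform
core_units = 24         # CPU.core_units: physical cores per CPU
threads_per_core = 2    # hyper-threading assumed

platform_vcpu = cpu_units * core_units * threads_per_core  # 96
instance_vcpu = 8       # vcpu of a hypothetical instance

# USAGE.instance_per_server is then derived from the vcpu counts.
instances_per_server = platform_vcpu // instance_vcpu
print(platform_vcpu, instances_per_server)  # prints: 96 12
```

Keeping all of these consistent by hand in one CSV row is exactly the contributor burden the proposal tries to remove.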
Solution
We propose a new way to add cloud instances that should clarify this process. We will separate the concepts of a cloud instance and a platform (or bare-metal server).
A cloud instance will be described with very few fields that are close to the description provided by cloud providers.
Example of a `c5.2xlarge` (in new `aws.csv`):

The platform defined here is `c5.18xlarge`, which is another cloud instance AND a server archetype, defined as follows:

Cloud instance (also in new `aws.csv`):

Platform (server archetype):
In this description, the embodied impacts of the cloud instance can be derived from this operation:
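A plausible form of that operation, sketched with the AWS-published vcpu counts for the two instances named above and an invented impact figure:

```python
# Sketch of the vcpu-based derivation described above.
# The vcpu counts match AWS specs; the impact figure is an assumption.
platform_vcpu = 72             # c5.18xlarge, used as the platform
instance_vcpu = 8              # c5.2xlarge
platform_embodied_gwp = 900.0  # kgCO2eq, made-up total for the platform

# The instance inherits the platform's impacts scaled by its vcpu share.
instance_embodied_gwp = platform_embodied_gwp * instance_vcpu / platform_vcpu
print(instance_embodied_gwp)  # prints: 100.0
```

Note this is the vcpu-only allocation; the later discussion in the thread moves toward scaling each component (CPU, RAM, disk, GPU) separately.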
We will need to introduce the notion of "vcpu" in `CPU` modeling so that we can take into account the number of "threads" or "virtual cores" in hyper-threading scenarios. Like the following:

OR:
Happy to hear about your feedback @da-ekchajzer. I will detail other examples below.