
compute: support using node instead/above of instance label #262

Open
jjo opened this issue Jul 22, 2024 · 5 comments

Comments

@jjo
Contributor

jjo commented Jul 22, 2024

What

Either change the compute modules' `instance` label to `node`, or support e.g. `--compute.vm-instance-label=LABEL_NAME`.

Why

A couple of main reasons:

  1. When using cloudcost-exporter metrics in Kubernetes, joining against `kube_.*` metrics (like `kube_node_labels`, `kube_node_annotations`, `kube_node_status_capacity`, `kube_node_status_allocatable`, ... see the full list below) forces you to relabel `instance` to `node` with a construct like
  label_join(<CSP>_instance_cpu_usd_per_core_hour{}, "node", "", "exported_instance")

The above needs to be done for every <CSP> you may run (especially if you have a centralized TSDB), and note that every matching time series has to be visited to perform this relabelling, adding TSDB pressure (see the sketch after this list).

  2. cloudcost-exporter already (wisely) chose `cluster_name` instead of `cluster` (a popular label name choice, thereby avoiding the scrape-time rename to `exported_cluster`), so why keep `instance`, which scraping agents will by design always rename to `exported_instance` (hence the example relabelling query above)?
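
For illustration, a minimal sketch of the kind of query this forces today, assuming the placeholder metric name above and standard kube-state-metrics series; the join computes an approximate CPU cost per node-hour:

```promql
# Workaround today: copy exported_instance into a "node" label so the cost
# metric can be matched against kube-state-metrics, e.g. node CPU capacity.
label_join(<CSP>_instance_cpu_usd_per_core_hour{}, "node", "", "exported_instance")
  * on (node) group_left()
    kube_node_status_capacity{resource="cpu", unit="core"}
```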

Appendix

Pasting the full list of Kubernetes metrics using `node`, from the query `count by (__name__)({__name__=~"kube_.+", node!=""})`:

kube_node_annotations
kube_node_created
kube_node_info
kube_node_labels
kube_node_spec_taint
kube_node_spec_unschedulable
kube_node_status_allocatable
kube_node_status_allocatable_cpu_cores
kube_node_status_allocatable_memory_bytes
kube_node_status_capacity
kube_node_status_capacity_cpu_cores
kube_node_status_capacity_memory_bytes
kube_node_status_condition
kube_pod_container_resource_limits
kube_pod_container_resource_requests
kube_pod_info
kube_node_status_addresses
kube_pod_init_container_resource_limits
kube_pod_init_container_resource_requests
jjo added a commit that referenced this issue Jul 22, 2024
Ref: #262.

See #262 for the rationale; this PR implements a straightforward
`s/instance/node/`. Alternatively we could consider putting this behind an
optional flag, although that would increase the complexity of the main-to-CSP
module coupling and of the test cases.

Signed-off-by: JuanJo Ciarlante <[email protected]>
@logyball
Contributor

I'm hesitant about replacing the `instance` label with `node`, as this couples our definition of "machine" with "machine provisioned by Kubernetes". Since we want to make this more agnostic of how the machine was provisioned, I think that `node` wouldn't be the right choice here.

That being said, this does make sense as an optional config argument, as it can then be used for convenience for a big (and possibly majority) use case: managed Kubernetes.

I think this brings up an interesting point, since the `instance` label will often be replaced by Prometheus scrapers and rewritten as `exported_instance` (the instance of the job running cloudcost-exporter is already using the `instance` label; see the illustration after this list). Thoughts on:

  • a config value for the label meaning "name of machine"?
  • renaming the `instance` label to something agnostic of how it was provisioned, like `machine_name`?
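
(For illustration: assuming a default scrape config with `honor_labels: false`, the collision described above ends up stored like this; the target address and machine name below are hypothetical.)

```promql
# The target's own "instance" label clashes with the one Prometheus attaches
# to the scrape target, so the scraped label is renamed to "exported_instance".
<CSP>_instance_cpu_usd_per_core_hour{instance="cloudcost-exporter:8080", exported_instance="gke-prod-pool-abc123"}
```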

@jjo
Contributor Author

jjo commented Jul 22, 2024

> I'm hesitant about replacing the `instance` label with `node`, as this couples our definition of "machine" with "machine provisioned by Kubernetes". Since we want to make this more agnostic of how the machine was provisioned, I think that `node` wouldn't be the right choice here.
>
> That being said, this does make sense as an optional config argument, as it can then be used for convenience for a big (and possibly majority) use case: managed Kubernetes.
>
> I think this brings up an interesting point, since the `instance` label will often be replaced by Prometheus scrapers and rewritten as `exported_instance` (the instance of the job running cloudcost-exporter is already using the `instance` label). Thoughts on:
>
>   • a config value for the label meaning "name of machine"?
>   • renaming the `instance` label to something agnostic of how it was provisioned, like `machine_name`?

+1 on moving away from `instance`. I like `machine_name` as the default value for such a flag, e.g. `--compute.vm-instance-label=LABEL_NAME` (or alike; naming-is-hard keeps haunting us).
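
For illustration, a minimal sketch of the join this would allow, assuming the flag is set so the compute metrics carry a `node` label; no `label_join` step is needed any more:

```promql
# Compute cost metric and kube-state-metrics now share the "node" label,
# so they can be matched directly.
<CSP>_instance_cpu_usd_per_core_hour{}
  * on (node) group_left()
    kube_node_status_capacity{resource="cpu", unit="core"}
```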

jjo added a commit that referenced this issue Jul 23, 2024
Fixes #262.

## What

Allow the `instance` label name to be set via the new `-compute.instance-label` flag.

## Why

See #262 for the rationale and discussion.

## How

* for each compute `Collector`, create a new `collectorMetrics` type
  containing the compute metrics (moved from a _global var_), so that each
  `Collector` owns an allocated instance obtained via
  `newCollectorMetrics(instanceLabel)`

* changing the metrics from a _global var_ to a dynamically allocated object is
  needed for:
  * setting their label at runtime
  * allowing tests to be self-contained; otherwise the 1st test to run would "own"
    setting `instanceLabel`

* there's a new `config.CommonConfig` type, currently holding only a
  single field, `ComputeInstanceLabel`, which is passed down from `cmd/`
  to each instantiated _Provider_, then to each compute _Collector_

## Testing

Adapted the tests with a couple of `ComputeInstanceLabel="node"` changes,
although this should be improved to exercise actually changing the label name.

--
Signed-off-by: JuanJo Ciarlante <[email protected]>
@Pokom
Contributor

Pokom commented Jul 23, 2024

I am opposed to having labels defined via command-line arguments for the following reasons:

  1. It implies we got the names of labels incorrect and we're masking that to ease PromQL creation
  2. It opens the potential of running many collectors with different labels at the same time, which means we can't guarantee that a metric can be used with a specific query
  3. It exposes us to having a configuration for every label and a massive config struct to run the application

I'd really like for us to get alignment on what the root problem is that we're trying to address (in this case, `instance` doesn't appear to be the appropriate label; it should be `node`), and then agree on a solution. Whichever path we take is going to require us to rewrite our internal PromQL queries and do a migration of the metrics.

@jjo
Contributor Author

jjo commented Jul 23, 2024

> I am opposed to having labels defined via command-line arguments for the following reasons:
>
>   1. It implies we got the names of labels incorrect and we're masking that to ease PromQL creation

It's not only about easing PromQL, it's actually about decreasing the load on the TSDB, by avoiding an arbitrary renaming that will otherwise almost always be needed (should we keep `instance` as its name).

>   2. It opens the potential of running many collectors with different labels at the same time, which means we can't guarantee that a metric can be used with a specific query

Sure, as an operator you can always shoot yourself in the foot; nevertheless, the top-down label setting currently allows only one entry point for it.

>   3. It exposes us to having a configuration for every label and a massive config struct to run the application

I never proposed that. It's this particular label I'm concerned about, as we know [*] we have disjoint sets of metrics (ec2, kube) that inevitably use either `instance` (sorry for that one, as it'll always end up as `exported_instance` unless changed at scrape time) xor `node`.

> I'd really like for us to get alignment on what the root problem is that we're trying to address (in this case, `instance` doesn't appear to be the appropriate label; it should be `node`), and then agree on a solution. Whichever path we take is going to require us to rewrite our internal PromQL queries and do a migration of the metrics.

Already replied in the previous paragraph [*].

Indeed, one aspect where I'd really like cloudcost-exporter to shine compared with other existing solutions is being as "friendly" as possible to the TSDB, requiring a smaller amount of recording rules and scraping artifacts (relabelling and building new derived metrics), for obvious reasons.
