
compute: support using node instead/above of instance label #262

Open
jjo opened this issue Jul 22, 2024 · 5 comments

Comments

@jjo
Contributor

jjo commented Jul 22, 2024

What

Either change the compute modules' `instance` label to `node`, or support e.g. `--compute.vm-instance-label=LABEL_NAME`.

Why

A couple of main reasons:

  1. When using cloudcost-exporter metrics in Kubernetes, joining against `kube_.*` metrics (like `kube_node_labels`, `kube_node_annotations`, `kube_node_status_capacity`, `kube_node_status_allocatable`, ... see the full list below) forces you to relabel `instance` to `node` with a construct like
  label_join(<CSP>_instance_cpu_usd_per_core_hour{}, "node", "", "exported_instance")

The above needs to be done for every <CSP> you may run (especially if you have a centralized TSDB), and note that every matching time series has to be visited to perform this relabelling, adding TSDB pressure (see the sketch after this list).

  2. cloudcost-exporter already (wisely) chose `cluster_name` instead of `cluster` (a popular label name choice, thereby avoiding the scrape-time rename to `exported_cluster`), so why keep `instance`, which scraping agents will by design always rename to `exported_instance` (hence the example relabelling query above)?
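
For illustration, a minimal sketch of the kind of query this forces today, assuming the placeholder metric name above and standard kube-state-metrics series; the join computes an approximate CPU cost per node-hour:

```promql
# Workaround today: copy exported_instance into a "node" label so the cost
# metric can be matched against kube-state-metrics, e.g. node CPU capacity.
label_join(<CSP>_instance_cpu_usd_per_core_hour{}, "node", "", "exported_instance")
  * on (node) group_left()
    kube_node_status_capacity{resource="cpu", unit="core"}
```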

Appendix

Pasting the full list of Kubernetes metrics using `node`, from the query `count by (__name__)({__name__=~"kube_.+", node!=""})`:

kube_node_annotations
kube_node_created
kube_node_info
kube_node_labels
kube_node_spec_taint
kube_node_spec_unschedulable
kube_node_status_allocatable
kube_node_status_allocatable_cpu_cores
kube_node_status_allocatable_memory_bytes
kube_node_status_capacity
kube_node_status_capacity_cpu_cores
kube_node_status_capacity_memory_bytes
kube_node_status_condition
kube_pod_container_resource_limits
kube_pod_container_resource_requests
kube_pod_info
kube_node_status_addresses
kube_pod_init_container_resource_limits
kube_pod_init_container_resource_requests
jjo added a commit that referenced this issue Jul 22, 2024
Ref: #262.

See #262 for the rationale; this PR implements a straightforward
`s/instance/node/`. Alternatively we could consider putting this behind an
optional flag, although that would increase the complexity of the main-to-CSP
module coupling and of the test cases.

Signed-off-by: JuanJo Ciarlante <[email protected]>
@logyball
Contributor

I'm hesitant about replacing the `instance` label with `node`, as this couples our definition of "machine" with "machine provisioned by Kubernetes". Since we want to make this more agnostic of how the machine was provisioned, I think that `node` wouldn't be the right choice here.

That being said, this does make sense as an optional config argument, as it can then be used for convenience for a big (and possibly majority) use case: managed Kubernetes.

I think this brings up an interesting point, since the `instance` label will often be replaced by Prometheus scrapers and rewritten as `exported_instance` (the instance of the job running cloudcost-exporter is already using the `instance` label; see the illustration after this list). Thoughts on:

  • a config value for the label meaning "name of machine"?
  • renaming the `instance` label to something agnostic of how it was provisioned, like `machine_name`?
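
(For illustration: assuming a default scrape config with `honor_labels: false`, the collision described above ends up stored like this; the target address and machine name below are hypothetical.)

```promql
# The target's own "instance" label clashes with the one Prometheus attaches
# to the scrape target, so the scraped label is renamed to "exported_instance".
<CSP>_instance_cpu_usd_per_core_hour{instance="cloudcost-exporter:8080", exported_instance="gke-prod-pool-abc123"}
```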

@jjo
Contributor Author

jjo commented Jul 22, 2024

> I'm hesitant about replacing the `instance` label with `node`, as this couples our definition of "machine" with "machine provisioned by Kubernetes". Since we want to make this more agnostic of how the machine was provisioned, I think that `node` wouldn't be the right choice here.
>
> That being said, this does make sense as an optional config argument, as it can then be used for convenience for a big (and possibly majority) use case: managed Kubernetes.
>
> I think this brings up an interesting point, since the `instance` label will often be replaced by Prometheus scrapers and rewritten as `exported_instance` (the instance of the job running cloudcost-exporter is already using the `instance` label). Thoughts on:
>
>   • a config value for the label meaning "name of machine"?
>   • renaming the `instance` label to something agnostic of how it was provisioned, like `machine_name`?

+1 on moving away from `instance`. I like `machine_name` as the default value for such a flag, e.g. `--compute.vm-instance-label=LABEL_NAME` (or alike; naming-is-hard keeps haunting us).
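
For illustration, a minimal sketch of the join this would allow, assuming the flag is set so the compute metrics carry a `node` label; no `label_join` step is needed any more:

```promql
# Compute cost metric and kube-state-metrics now share the "node" label,
# so they can be matched directly.
<CSP>_instance_cpu_usd_per_core_hour{}
  * on (node) group_left()
    kube_node_status_capacity{resource="cpu", unit="core"}
```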

jjo added a commit that referenced this issue Jul 23, 2024
Fixes #262.

## What

Allow the `instance` label name to be set via the new `-compute.instance-label` flag.

## Why

See #262 for the rationale and discussion.

## How

* for each compute `Collector`, create a new `collectorMetrics` type
  containing the compute metrics (moved from a _global var_), so that each
  `Collector` owns an allocated instance obtained via
  `newCollectorMetrics(instanceLabel)`

* changing the metrics from a _global var_ to a dynamically allocated object is
  needed for:
  * setting their label at runtime
  * allowing tests to be self-contained; otherwise the 1st test to run would "own"
    setting `instanceLabel`

* there's a new `config.CommonConfig` type, currently holding only a
  single field, `ComputeInstanceLabel`, which is passed down from `cmd/`
  to each instantiated _Provider_, then to each compute _Collector_

## Testing

Adapted the tests with a couple of `ComputeInstanceLabel="node"` changes,
although this should be improved to exercise actually changing the label name.

--
Signed-off-by: JuanJo Ciarlante <[email protected]>
@Pokom
Contributor

Pokom commented Jul 23, 2024

I am opposed to having labels defined via command-line arguments for the following reasons:

  1. It implies we got the names of labels incorrect and we're masking that to ease PromQL creation
  2. It opens the potential of running many collectors with different labels at the same time, which means we can't guarantee that a metric can be used with a specific query
  3. It exposes us to having a configuration for every label and a massive config struct to run the application

I'd really like for us to get alignment on what the root problem is that we're trying to address (in this case, `instance` doesn't appear to be the appropriate label; it should be `node`), and then agree on a solution. Whichever path we take is going to require us to rewrite our internal PromQL queries and do a migration of the metrics.

@jjo
Contributor Author

jjo commented Jul 23, 2024

> I am opposed to having labels defined via command-line arguments for the following reasons:
>
>   1. It implies we got the names of labels incorrect and we're masking that to ease PromQL creation

It's not only about easing PromQL, it's actually about decreasing the load on the TSDB, by avoiding an arbitrary renaming that will otherwise almost always be needed (should we keep `instance` as its name).

>   2. It opens the potential of running many collectors with different labels at the same time, which means we can't guarantee that a metric can be used with a specific query

Sure, as an operator you can always shoot yourself in the foot; nevertheless, the top-down label setting currently allows only one entry point for it.

>   3. It exposes us to having a configuration for every label and a massive config struct to run the application

I never proposed that. It's this particular label I'm concerned about, as we know [*] we have disjoint sets of metrics (ec2, kube) that inevitably use either `instance` (sorry for that one, as it'll always end up as `exported_instance` unless changed at scrape time) xor `node`.

> I'd really like for us to get alignment on what the root problem is that we're trying to address (in this case, `instance` doesn't appear to be the appropriate label; it should be `node`), and then agree on a solution. Whichever path we take is going to require us to rewrite our internal PromQL queries and do a migration of the metrics.

Already replied in the previous paragraph [*].

Indeed, one aspect where I'd really like cloudcost-exporter to shine compared with other existing solutions is being as "friendly" as possible to the TSDB, requiring a smaller amount of recording rules and scraping artifacts (relabelling and building new derived metrics), for obvious reasons.
