Exporting Low/High Cardinality Metrics #5308
Hi @hannahchan, there are a couple of things here, and unfortunately the answer depends a bit on the backend you will send the data to. I'm not sure whether Prometheus was an example or the real destination; the only thing that matters is whether the backend supports "delta" counters (Prometheus only supports "cumulative" counters). Let's start with the input data:
I know I asked a lot of questions, but the solutions differ based on the answers. As an example, if your backend does not support "delta" counters, the solution is very complex, because (most likely) you cannot handle all the traffic with only one collector instance. You then need "stateful" sharding: every data point of a given time series (the combination of metric name plus each dimension and its value) must go to the same collector instance so that it can correctly calculate the cumulative (always increasing) value.
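To make the "stateful" sharding idea concrete, here is a minimal Python sketch. It is not how the collector implements anything; the names `series_key`, `shard_for`, and `CumulativeState`, the metric name, and the labels are all made up for illustration. It assumes a fixed number of collector instances and incoming delta points.

```python
import hashlib
from collections import defaultdict


def series_key(name, labels):
    """Canonical identity of a time series: metric name plus sorted label pairs."""
    parts = [name] + [f"{k}={v}" for k, v in sorted(labels.items())]
    return "\x1f".join(parts)


def shard_for(name, labels, num_instances):
    """Deterministically map a series identity to one collector instance."""
    digest = hashlib.sha256(series_key(name, labels).encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_instances


class CumulativeState:
    """Running totals held by one instance; turns deltas into cumulative values."""

    def __init__(self):
        self.totals = defaultdict(float)

    def add_delta(self, name, labels, delta):
        key = series_key(name, labels)
        self.totals[key] += delta
        return self.totals[key]


# Hypothetical setup: three instances and a few delta points.
instances = [CumulativeState() for _ in range(3)]
points = [
    ("requests_total", {"tenant": "a", "route": "/x"}, 5.0),
    ("requests_total", {"route": "/x", "tenant": "a"}, 2.0),  # same series, labels reordered
    ("requests_total", {"tenant": "b", "route": "/x"}, 1.0),
]

results = []
for name, labels, delta in points:
    idx = shard_for(name, labels, len(instances))
    results.append((idx, instances[idx].add_delta(name, labels, delta)))
```

The first two points belong to the same series, so they land on the same instance and accumulate to 7.0. Note that plain modulo hashing reshuffles most series whenever the number of instances changes, which is part of why this gets complex in practice.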
First, let's talk about the routing part. You have a couple of options:
For calculating what you need, you also have some options:
Hi,
I'm wondering whether it is possible for the OpenTelemetry Collector to do this and, if it is, how I would configure the collector to do it.
My goal is to send a low-cardinality version of a metric to a monitoring tool like Prometheus and the high-cardinality version of the same metric to a data lake for offline analysis in a tool like PySpark, with little or no additional instrumentation.
For example, I want to take metrics like this:
And drop the high-cardinality label and aggregate to get this:
This aggregated metric would then be sent to a monitoring tool while the raw unchanged metrics would go to our data lake.
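As a rough illustration of the transformation being asked for (not collector configuration), the Python sketch below drops one label and sums the values of the series that become identical. The metric name `http_requests`, the labels `tenant_id` and `status`, and the function `drop_label_and_sum` are hypothetical, and it assumes a sum-type metric whose values can simply be added.

```python
from collections import defaultdict


def drop_label_and_sum(points, label_to_drop):
    """Remove one label and sum the values of series that collapse together."""
    totals = defaultdict(float)
    for name, labels, value in points:
        # The remaining labels, in a canonical order, identify the aggregated series.
        kept = tuple(sorted((k, v) for k, v in labels.items() if k != label_to_drop))
        totals[(name, kept)] += value
    return [(name, dict(kept), total) for (name, kept), total in totals.items()]


# Hypothetical raw points: (metric name, labels, value).
raw = [
    ("http_requests", {"tenant_id": "t1", "status": "200"}, 3.0),
    ("http_requests", {"tenant_id": "t2", "status": "200"}, 4.0),
    ("http_requests", {"tenant_id": "t1", "status": "500"}, 1.0),
]

aggregated = drop_label_and_sum(raw, "tenant_id")  # low-cardinality copy
# `raw` is left untouched and could be forwarded as-is to the data lake.
```

Here the two `status="200"` series collapse into one with value 7.0, while the raw points keep their per-tenant detail.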
Context
We have a large multi-tenant application where tenant sizes range from a single-digit number of users to 20,000+ users. Unsurprisingly, larger tenants have more data in our application. We want to understand why our larger tenants experience performance degradation and what it is about their data that causes it. To do this, we want to collect metrics at the tenant level so that our data analyst can identify long-term trends.