Lets make low(er) cardinality metrics #1064
breedx-splk wants to merge 15 commits into open-telemetry:main from
I don't think modelling jank as a metric is a good idea. While dropped frames and jank are numbers that resemble metrics, the way they are consumed requires one element that OTel metrics do not provide: time. Specifically, you want to pin the jank occurrence to a specific point in time so that you can see what happened before and after. No matter what dimensions we keep, adding them up and consuming them as metrics will miss the crucial piece of data that allows you to relate them to other things happening on the device. 100 dropped frames mean different things depending on when they happen, and munging them all together in a count, while doable in terms of the data, renders the result pretty much useless.
fun createDefault(application: Application): Resource {
    val appName = readAppName(application)
    val appVersion = readAppVersion(application)
    val resourceBuilder =
        Resource.getDefault().toBuilder().put(ServiceAttributes.SERVICE_NAME, appName)
    if (appVersion != null) {
        resourceBuilder.put(ServiceAttributes.SERVICE_VERSION, appVersion)
    }

    return resourceBuilder
        .put(RumConstants.RUM_SDK_VERSION, BuildConfig.OTEL_ANDROID_VERSION)
        .put(DeviceIncubatingAttributes.DEVICE_MODEL_NAME, Build.MODEL)
        .put(DeviceIncubatingAttributes.DEVICE_MODEL_IDENTIFIER, Build.MODEL)
        .put(DeviceIncubatingAttributes.DEVICE_MANUFACTURER, Build.MANUFACTURER)
        .put(OsIncubatingAttributes.OS_NAME, "Android")
        .put(OsIncubatingAttributes.OS_TYPE, "linux")
        .put(OsIncubatingAttributes.OS_VERSION, Build.VERSION.RELEASE)
        .put(OsIncubatingAttributes.OS_DESCRIPTION, oSDescription)
        .build()
What other frameworks do is have multiple resource detectors, usually one per resource as defined in semconv, so that developers can opt in. In fact, as per the spec, Android should be producing a resource with the API version.
Please link to that part of the spec.
https://opentelemetry.io/docs/specs/semconv/resource/android/
At the same time, if we follow other frameworks and have multiple resource detectors, you can add just one to the metrics while traces can have many more.
Yeah, don't use metrics on mobile.
There is some telemetry data (such as jank statistics) that some folks think makes sense to send as metrics. I get it. Flinging numeric data into events or spans and then hoping a backend can make sense of it or aggregate it or whatever...that's a little misaligned.
The challenge is that client applications run on a very diverse set of devices or platforms, and this quickly leads to high-cardinality metrics, which makes most timeseries databases unhappy. Thousands or millions of devices potentially generate many more millions of launches, on different OS versions on different manufacturer devices, on different application versions, and the permutation space becomes huge.
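To make the size of that permutation space concrete, here is a rough Kotlin sketch that multiplies the number of distinct values per dimension to get an upper bound on time series for a single metric. The counts are made-up illustrative numbers, not measurements, and `seriesUpperBound` is a hypothetical helper. Real-world cardinality will be lower because some dimensions are correlated (a device model implies its manufacturer).

```kotlin
// Upper bound on distinct time series: the product of distinct values per dimension.
// Math.multiplyExact throws on overflow instead of silently wrapping.
fun seriesUpperBound(distinctValues: Map<String, Long>): Long =
    distinctValues.values.fold(1L) { acc, n -> Math.multiplyExact(acc, n) }

fun main() {
    // Illustrative counts only; none of these numbers come from real data.
    val allDimensions = mapOf(
        "device.model.name" to 2_000L,
        "device.manufacturer" to 50L,
        "os.version" to 15L,
        "service.version" to 20L,
    )
    // Keep only two dimensions to see how far the bound drops.
    val reduced = allDimensions.filterKeys { it == "os.version" || it == "service.version" }

    println(seriesUpperBound(allDimensions)) // 30000000
    println(seriesUpperBound(reduced))       // 300
}
```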
This is the main reason so far why we have avoided doing much with metrics. High-cardinality metrics are usually more harmful than helpful. Furthermore, when you start dropping certain dimensions (attributes, resource attributes) to lower the cardinality, you lose granularity and are essentially aggregating across that dimension. For instance, if a metric were measuring start time and we drop the application version string to lower cardinality, then users who look at a dashboard of start time might be seeing data for many different versions in the wild. This makes this kind of data largely unactionable.
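The loss of granularity can be shown with a small Kotlin sketch. The samples and the `StartSample` and `meanBy` names are invented for illustration. Grouping by application version exposes a slow release, while dropping that dimension folds everything into one average.

```kotlin
// Hypothetical start-time sample; field names are illustrative only.
data class StartSample(val appVersion: String, val osVersion: String, val startMillis: Double)

// Mean start time per group, where `key` picks the dimension that is kept.
fun meanBy(samples: List<StartSample>, key: (StartSample) -> String): Map<String, Double> =
    samples.groupBy(key).mapValues { (_, group) -> group.map { it.startMillis }.average() }

fun main() {
    val samples = listOf(
        StartSample("1.0", "14", 400.0),
        StartSample("1.0", "14", 420.0),
        StartSample("2.0", "14", 900.0),
        StartSample("2.0", "14", 880.0),
    )
    // With the version dimension kept, the regression in 2.0 is visible.
    println(meanBy(samples) { it.appVersion }) // {1.0=410.0, 2.0=890.0}
    // With the version dimension dropped, both releases blend into one number.
    println(meanBy(samples) { it.osVersion })  // {14=650.0}
}
```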
But maybe we can find a middle ground and start working toward a set of dimensions which are useful to most users without blowing up the permutation space. And maybe this PR is a start.
This adds a new MetricsConfig for use with the OpenTelemetryRumInitializer in android-agent. This config has two sets of keys to include in metrics -- one for Attributes on data points, and one for Resource Attributes. These are user configurable and have what I guessed to be a sane/reasonable default.
By default, the Android resource looks something like this:
and with the defaults here, it reduces to:
I'm a little hesitant that this creates a foot-gun, and I'm a little hesitant to send metric data points whose resource doesn't match the resource on traces and logs...but maybe this is a start.
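As a rough illustration of the allow-list idea, here is a minimal Kotlin sketch using plain maps rather than the OpenTelemetry SDK types. `MetricsKeyFilter`, `keepOnly`, and the chosen key sets are hypothetical stand-ins; the actual MetricsConfig in this PR and its defaults may look quite different.

```kotlin
// Hypothetical stand-in for the PR's MetricsConfig: two allow-lists of keys,
// one for resource attributes and one for data point attributes.
data class MetricsKeyFilter(
    val resourceKeys: Set<String>,
    val attributeKeys: Set<String>,
)

// Keep only the entries whose key is on the allow-list.
fun keepOnly(all: Map<String, String>, allowed: Set<String>): Map<String, String> =
    all.filterKeys { it in allowed }

fun main() {
    // Assumed key sets, chosen for illustration; not the defaults from the PR.
    val filter = MetricsKeyFilter(
        resourceKeys = setOf("service.name", "service.version", "os.name"),
        attributeKeys = setOf("screen.name"),
    )
    // Example resource values are invented.
    val fullResource = mapOf(
        "service.name" to "demo-app",
        "service.version" to "2.0",
        "device.model.name" to "Pixel 8",
        "device.manufacturer" to "Google",
        "os.name" to "Android",
        "os.version" to "14",
    )
    // The metrics resource is a strict subset of the full resource.
    println(keepOnly(fullResource, filter.resourceKeys))
    // {service.name=demo-app, service.version=2.0, os.name=Android}
}
```

Note that a metrics resource reduced this way no longer equals the resource attached to traces and logs, which is the mismatch mentioned above.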