Lets make low(er) cardinality metrics by breedx-splk · Pull Request #1064 · open-telemetry/opentelemetry-android

breedx-splk · 2025-07-11T21:39:27Z

There is some telemetry data (such as jank statistics) that some folks think makes sense to send as metrics. I get it. Flinging numeric data into events or spans and then hoping a backend can make sense of it or aggregate it or whatever... that's a little misaligned.

The challenge is that client applications run on a very diverse set of devices or platforms, and this quickly leads to high-cardinality metrics, which makes most timeseries databases unhappy. Thousands or millions of devices potentially generate many more millions of launches, on different OS versions on different manufacturer devices, on different application versions, and the permutation space becomes huge.
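To put rough numbers on that permutation space, here is a back-of-the-envelope sketch. The distinct-value counts are made-up assumptions for illustration, not measurements from any real app, and `worstCaseSeries` is a hypothetical helper name.

```kotlin
// Hypothetical distinct-value counts per dimension; illustrative only.
val distinctValuesPerDimension: Map<String, Long> = linkedMapOf(
    "device.manufacturer" to 50L,
    "device.model.identifier" to 2_000L,
    "os.version" to 15L,
    "service.version" to 30L,
)

// Upper bound on the number of distinct time series for one metric:
// the product of the distinct-value counts of every attribute kept on it.
fun worstCaseSeries(dimensions: Map<String, Long>): Long =
    dimensions.values.fold(1L) { acc, n -> acc * n }

fun main() {
    val all = worstCaseSeries(distinctValuesPerDimension)
    val reduced = worstCaseSeries(
        distinctValuesPerDimension.filterKeys { it == "os.version" || it == "service.version" }
    )
    println("all dimensions: $all series")                       // 45000000
    println("os.version + service.version only: $reduced series") // 450
}
```

This is an upper bound: a device model implies its manufacturer, so not every combination occurs in practice. The order-of-magnitude gap between the two results is the point.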

This is the main reason so far why we have avoided doing much with metrics. High-cardinality metrics are usually more harmful than helpful. Furthermore, when you start dropping certain dimensions (attributes, resource attributes) to lower the cardinality, you lose granularity and are essentially aggregating across that dimension. For instance, if a metric were measuring start time and we drop the application version string to lower cardinality, then users who look at a dashboard of start time might be seeing data for many different versions in the wild. This makes this kind of data largely unactionable.
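As a small sketch of what "aggregating across that dimension" does to the start-time example, the snippet below compares a mean with and without the version attribute. The sample values and the `StartSample` type are invented for illustration.

```kotlin
// One hypothetical data point: app start time tagged with the app version.
data class StartSample(val appVersion: String, val startMillis: Double)

// Made-up samples: version 2.1.0 has regressed, 2.0.0 has not.
val samples = listOf(
    StartSample("2.0.0", 400.0),
    StartSample("2.0.0", 420.0),
    StartSample("2.0.0", 380.0),
    StartSample("2.1.0", 900.0),
)

// With the version attribute dropped, all samples land in a single series.
fun meanWithoutVersion(data: List<StartSample>): Double =
    data.map { it.startMillis }.average()

// With the version attribute kept, each version gets its own series.
fun meanPerVersion(data: List<StartSample>): Map<String, Double> =
    data.groupBy { it.appVersion }
        .mapValues { (_, group) -> group.map { it.startMillis }.average() }

fun main() {
    println(meanWithoutVersion(samples)) // one blended number across versions
    println(meanPerVersion(samples))     // the 2.1.0 regression is visible here
}
```

The blended mean sits between the two versions and matches neither, which is why a dashboard built on it is hard to act on.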

But maybe we can find a middle ground and start working toward a set of dimensions which are useful to most users without blowing up the permutation space. And maybe this PR is a start.

This adds a new MetricsConfig for use with the OpenTelemetryRumInitializer in android-agent. This config has two sets of keys to include in metrics: one for Attributes on data points, and one for Resource Attributes. These are user-configurable and have what I guessed to be sane, reasonable defaults.

By default, the Android resource looks something like this:

Resource attributes:
     -> device.manufacturer: Str(Google)
     -> device.model.identifier: Str(sdk_gphone64_arm64)
     -> device.model.name: Str(sdk_gphone64_arm64)
     -> os.description: Str(Android Version 16 (Build BP22.250325.006 API level 36))
     -> os.name: Str(Android)
     -> os.type: Str(linux)
     -> os.version: Str(16)
     -> rum.sdk.version: Str(0.13.0-alpha-SNAPSHOT)
     -> service.name: Str(OpenTelemetryDemoApp)
     -> telemetry.sdk.language: Str(java)
     -> telemetry.sdk.name: Str(opentelemetry)
     -> telemetry.sdk.version: Str(1.51.0)

and with the defaults here, it reduces to:

Resource attributes:
     -> os.name: Str(Android)
     -> os.type: Str(linux)
     -> os.version: Str(16)
     -> service.name: Str(OpenTelemetryDemoApp)
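
The include-style filtering that produces this reduction can be sketched roughly as follows. This is not the PR's FilteredResource code: it uses a plain Map in place of the OpenTelemetry Resource type to stay dependency-free, and `metricResourceKeys` and `filterForMetrics` are hypothetical names.

```kotlin
// Stand-in for the full resource shown above, as plain key/value pairs.
val fullResource: Map<String, String> = mapOf(
    "device.manufacturer" to "Google",
    "device.model.identifier" to "sdk_gphone64_arm64",
    "device.model.name" to "sdk_gphone64_arm64",
    "os.description" to "Android Version 16 (Build BP22.250325.006 API level 36)",
    "os.name" to "Android",
    "os.type" to "linux",
    "os.version" to "16",
    "rum.sdk.version" to "0.13.0-alpha-SNAPSHOT",
    "service.name" to "OpenTelemetryDemoApp",
    "telemetry.sdk.language" to "java",
    "telemetry.sdk.name" to "opentelemetry",
    "telemetry.sdk.version" to "1.51.0",
)

// Hypothetical include list mirroring the reduced resource above.
val metricResourceKeys: Set<String> =
    setOf("os.name", "os.type", "os.version", "service.name")

// Keep only the attributes whose keys were explicitly opted in.
fun filterForMetrics(
    resource: Map<String, String>,
    includeKeys: Set<String>,
): Map<String, String> = resource.filterKeys { it in includeKeys }

fun main() {
    // Prints the four opted-in attributes, one per line.
    filterForMetrics(fullResource, metricResourceKeys)
        .forEach { (key, value) -> println("$key: $value") }
}
```

An include list fails closed: an attribute added to the resource later stays out of metrics until someone opts it in, whereas an omit list would let it through by default.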

I'm a little worried that this creates a foot-gun, and I'm a little hesitant to send metric data points whose resource doesn't match the resource on traces and logs... but maybe this is a start.

bidetofevil · 2025-07-15T22:04:52Z

I don't think modelling jank as a metric is a good idea. While dropped frames and jank are numbers that resemble metrics, the way they are consumed requires one element that OTel metrics do not provide: time. Specifically, you want to pin the jank occurrence to a specific point in time so that you can see what happened before and after.

No matter what dimensions we keep, adding them up and consuming them as metrics will miss the crucial piece of data that allows you to relate it to other things happening on the device. 100 dropped frames mean different things depending on when they happen, and munging them all together in a count, while doable in terms of the data, renders it pretty much useless.
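
A rough sketch of that argument, with invented data: two sessions drop the same total number of frames, so a summed count cannot tell them apart, while timestamped events can. `JankEvent` and the helper functions are hypothetical and not taken from any SDK.

```kotlin
// Hypothetical jank occurrence: when it happened and how many frames were dropped.
data class JankEvent(val epochMillis: Long, val droppedFrames: Int)

// Two made-up sessions with the same total but very different shapes:
// one burst of 100 frames, versus 1 frame per minute for 100 minutes.
val singleBurst = listOf(JankEvent(1_000L, 100))
val spreadOut = (0 until 100).map { JankEvent(1_000L + it * 60_000L, 1) }

// What a count-style aggregation keeps: only the sum.
fun totalDropped(events: List<JankEvent>): Int =
    events.sumOf { it.droppedFrames }

// What timestamped events additionally allow: looking at a specific window
// so the jank can be related to whatever else happened around that time.
fun eventsBetween(events: List<JankEvent>, fromMillis: Long, toMillis: Long): List<JankEvent> =
    events.filter { it.epochMillis in fromMillis..toMillis }

fun main() {
    println(totalDropped(singleBurst)) // 100
    println(totalDropped(spreadOut))   // 100, indistinguishable from the burst
    println(totalDropped(eventsBetween(singleBurst, 0L, 5_000L))) // 100
    println(totalDropped(eventsBetween(spreadOut, 0L, 5_000L)))   // 1
}
```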

thompson-tomo · 2025-07-23T14:20:26Z

+    fun createDefault(application: Application): Resource {
+        val appName = readAppName(application)
+        val appVersion = readAppVersion(application)
+        val resourceBuilder =
+            Resource.getDefault().toBuilder().put(ServiceAttributes.SERVICE_NAME, appName)
+        if (appVersion != null) {
+            resourceBuilder.put(ServiceAttributes.SERVICE_VERSION, appVersion)
+        }
+
+        return resourceBuilder
+            .put(RumConstants.RUM_SDK_VERSION, BuildConfig.OTEL_ANDROID_VERSION)
+            .put(DeviceIncubatingAttributes.DEVICE_MODEL_NAME, Build.MODEL)
+            .put(DeviceIncubatingAttributes.DEVICE_MODEL_IDENTIFIER, Build.MODEL)
+            .put(DeviceIncubatingAttributes.DEVICE_MANUFACTURER, Build.MANUFACTURER)
+            .put(OsIncubatingAttributes.OS_NAME, "Android")
+            .put(OsIncubatingAttributes.OS_TYPE, "linux")
+            .put(OsIncubatingAttributes.OS_VERSION, Build.VERSION.RELEASE)
+            .put(OsIncubatingAttributes.OS_DESCRIPTION, oSDescription)
+            .build()


Other frameworks typically have multiple resource detectors, usually one per resource type defined in semconv, so that developers can opt in to each. In fact, per the spec, Android should be producing a resource with the API version.

breedx-splk · Jul 23, 2025

Please link to that part of the spec.

thompson-tomo · Jul 24, 2025 (edited)

https://opentelemetry.io/docs/specs/semconv/resource/android/

At the same time, by following other frameworks and having multiple resource detectors, you can add just one to metrics while traces can have many more.
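
One way to picture the detector suggestion is sketched below. The `ResourceDetector` interface here is a hypothetical stand-in defined only for this example, not an API from this PR or from the OpenTelemetry SDK, and plain maps stand in for resources. The attribute values are copied from the resource dump earlier in the thread.

```kotlin
// Hypothetical detector interface for this sketch only.
fun interface ResourceDetector {
    fun detect(): Map<String, String>
}

// One detector per resource type, so each can be opted into separately.
val osDetector = ResourceDetector {
    mapOf("os.name" to "Android", "os.type" to "linux", "os.version" to "16")
}
val deviceDetector = ResourceDetector {
    mapOf("device.manufacturer" to "Google", "device.model.name" to "sdk_gphone64_arm64")
}
val serviceDetector = ResourceDetector {
    mapOf("service.name" to "OpenTelemetryDemoApp")
}

// Merge the output of the selected detectors; later detectors win on key clashes.
fun buildResource(detectors: List<ResourceDetector>): Map<String, String> =
    detectors.fold(emptyMap<String, String>()) { acc, detector -> acc + detector.detect() }

fun main() {
    // Traces and logs could opt in to everything...
    val traceResource = buildResource(listOf(osDetector, deviceDetector, serviceDetector))
    // ...while metrics opt in to a smaller, lower-cardinality subset.
    val metricResource = buildResource(listOf(osDetector, serviceDetector))
    println(traceResource.keys)
    println(metricResource.keys)
}
```

Compared with filtering one large resource by key, this moves the opt-in decision to the level of whole resource types.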

breedx-splk · 2025-08-21T20:33:46Z

Yeah, don't use metrics on mobile.

breedx-splk added 14 commits July 11, 2025 11:45

fff2616  add FilteredResource
5f2f6ed  wire up the filtered resource via config keys
61495ad  move config to agent
c9fd861  Rename .java to .kt
5978312  move AndroidResource to kotlin and make public
a8df8c1  move FilteredResource and make include/opt-in instead of omit
1601948  include not omit
c3ac86a  rollback builder changes, because we filter via initializer
92e85d9  filter metric resource attributes and build view for
b51c93e  add comment and spotless
c8b4931  create basic default resource attribute filter
6fe1f48  demo app collector can receive metrics for debug
2d2e79d  match all instruments.
60a05da  create very basic counter to demonstrate metrics

breedx-splk requested a review from a team as a code owner July 11, 2025 21:39

1f7e538  add service version to the resource

breedx-splk marked this pull request as draft July 15, 2025 15:38

bidetofevil mentioned this pull request Jul 15, 2025

Add jank events for slow app rendering open-telemetry/semantic-conventions#2157

Closed

thompson-tomo reviewed Jul 23, 2025

scheler mentioned this pull request Jul 25, 2025

Add guidance against using metrics API in Web/Mobile client side instrumentation open-telemetry/opentelemetry-specification#4604

Open

breedx-splk closed this Aug 21, 2025

breedx-splk wants to merge 15 commits into open-telemetry:main from breedx-splk:lets_make_low_cardinality_metrics
