Cosmos DB: Operation Level Metrics #1438

sourabh1007 · 2024-09-30T06:51:31Z

Introducing operation-level metrics in the .NET Cosmos DB SDK.
This PR includes the registration of semantic conventions in the OpenTelemetry community, ensuring full alignment with OpenTelemetry's established guidelines.

Dimensions

Tag/dimension name	Sample value
db.system	cosmodb
db.collection.name	myCollectionName
db.namespace	myDatabaseName
server.address	myaccountname.documents.azure.com
server.port	443
db.operation.name	query_items
~~db.cosmosdb.operation_type~~	~~`batch`; `create`; `delete`~~
db.response.status_code	200 or 429 etc.
db.cosmos.sub_status_code	1002 etc.
~~db.cosmos.client_id~~	~~Guid~~
db.cosmosdb.consistency_level	Eventual, ConsistentPrefix, BoundedStaleness, Strong or Session
~~db.cosmosdb.machine_id~~	~~Unique Id for a machine~~ (Dependent of otel generated machine details)

Metrics

Name	Unit	Type	Description
db.cosmosdb.operation.request_charge	`{request_units}`	histogram	Total request units per operation (sum of RUs for all requested needed when processing an operation) This metric SHOULD be specified with `[ExplicitBucketBoundaries] of [ 1, 5, 10, 25, 50, 100, 250, 500, 1000]`
db.client.operation.duration	`s`	histogram	Total end-to-end duration of the operation This metric SHOULD be specified with `[ExplicitBucketBoundaries] of [0.001, 0.005, 0.010, 0.050, 0.100, 0.200, 0.500, 1.000, 2.000, 5.000]`
~~db.query.limit~~	`{count}`	histogram	For feed operations (query, readAll, readMany, change feed) and batch operations, this meter capture the requested maxItemCount per page/request. This metric SHOULD be specified with `[ExplicitBucketBoundaries] of [50, 100, 250, 500, 1000, 2000, 5000]`
db.client.response.row_count	`{count}`	histogram	For feed operations (query, readAll, readMany, change feed) batch operations this meter capture the actual item count in responses from the service. This metric SHOULD be specified with `[ExplicitBucketBoundaries] of [10, 50, 100, 250, 500, 1000, 2000, 5000, 10000]`.
~~db.cosmosdb.operation.regions_contacted~~	`{count}`	histogram	~~Number of regions contacted when executing an operation. This metric SHOULD be specified with `[ExplicitBucketBoundaries] of [1, 2, 3, 5, 10, 20, 50]`.~~
db.cosmosdb.client.active_instances	`{count}`	histogram	Number of active SDK client instances. This metric SHOULD be specified with `[ExplicitBucketBoundaries] of [1, 2, 3, 5, 10, 50, 100]`.

Added db.cosmosdb.consistency_level in attribute list also, we are going to collect it is traces also
Removed db.cosmosdb.operation_type
~~Added db.cosmosdb.machine_id in attribute list also, we are going to collect it is traces also~~

Merge requirement checklist

CONTRIBUTING.md guidelines followed.
Change log entry added, according to the guidelines in When to add a changelog entry.
- If your PR does not need a change log, start the PR title with [chore]
schema-next.yaml updated with changes to existing conventions.

model/database/common.yaml

model/database/metrics.yaml

model/database/common.yaml

model/database/metrics.yaml

lmolkova

Please add markdown for the metrics.

You need to add codegen placeholders like

and metrics will be generated between them.

Please take a look at the existing database metric definitions in https://github.com/open-telemetry/semantic-conventions/blob/v1.27.0/docs/database/database-metrics.md?plain=1 for the examples.

Note that a couple of properties are not yet defined in yaml, but need to be specified in markdown:

Please also document whether cosmos reports any of the standard DB metrics (I assume it does report db.client.operation.duration). If so, please document any difference from the generic metric. E.g. like it's done here - https://github.com/open-telemetry/semantic-conventions/blob/main/docs/dotnet/dotnet-http-metrics.md#metric-httpserverrequestduration

model/database/registry.yaml

model/database/metrics.yaml

model/database/common.yaml

jcocchi

Left a few nits, but looks good overall

model/database/cosmosdb-metrics.yaml

model/database/registry.yaml

model/database/metrics.yaml

model/database/cosmosdb-metrics.yaml

docs/database/database-metrics.md

model/database/cosmosdb-metrics.yaml

lmolkova

Left a few minor final comments, but it looks good otherwise!

model/database/cosmosdb-metrics.yaml

model/database/metrics.yaml

model/database/registry.yaml

docs/attributes-registry/db.md

model/database/metrics.yaml

model/database/registry.yaml

Co-authored-by: Liudmila Molkova <[email protected]>

lmolkova · 2024-10-15T00:14:25Z

model/database/cosmosdb-metrics.yaml

+    type: metric
+    metric_name: db.cosmosdb.client.active_instances
+    brief: "Number of active client instances"
+    instrument: histogram


this one should probably be updowncounter - you probably don't expect people to measure P95 for it. Also it does not change much in well-behaved application.

We have plenty of examples of current/active metrics in semconv - http.server.active_requests, db.client.connection.count, dotnet.thread_pool.thread.count, etc all of them are updowncounters.

Histogram will be more expensive but also harder to use and would require you to keep a current value around. With up-down-counter, you'd record +1 when client is created and -1 when it's disposed and won't need to maintain any state in the clients.

model/database/registry.yaml

lmolkova · 2024-10-15T00:18:28Z

docs/database/database-metrics.md

+
+This metric SHOULD be specified with
+[`ExplicitBucketBoundaries`](https://github.com/open-telemetry/opentelemetry-specification/tree/v1.35.0/specification/metrics/api.md#instrument-advisory-parameters)
+of `[ 0.001, 0.005, 0.010,  0.050, 0.100, 0.200, 0.500, 1.000, 2.000, 5.000 ]`.


any reason to have boundaries different than the generic ones ([ 0.001, 0.005, 0.01, 0.05, 0.1, 0.5, 1, 5, 10 ])?

docs/database/database-metrics.md

lmolkova · 2024-10-15T00:20:09Z

docs/database/database-metrics.md

@@ -23,6 +23,11 @@ linkTitle: Metrics
    - [Metric: `db.client.connection.create_time`](#metric-dbclientconnectioncreate_time)
    - [Metric: `db.client.connection.wait_time`](#metric-dbclientconnectionwait_time)
    - [Metric: `db.client.connection.use_time`](#metric-dbclientconnectionuse_time)
+- [Azure Cosmos DB](#azure-cosmos-db)


could you please move Cosmos DB metrics to cosmosdb.md ?

lmolkova · 2024-10-15T00:20:27Z

docs/database/database-metrics.md

@@ -23,6 +23,11 @@ linkTitle: Metrics
    - [Metric: `db.client.connection.create_time`](#metric-dbclientconnectioncreate_time)
    - [Metric: `db.client.connection.wait_time`](#metric-dbclientconnectionwait_time)
    - [Metric: `db.client.connection.use_time`](#metric-dbclientconnectionuse_time)
+- [Azure Cosmos DB](#azure-cosmos-db)
+  - [Metric: `db.client.operation.duration`](#metric-dbclientoperationduration-1)
+  - [Metric: `db.query.response.row_count`](#metric-dbqueryresponserow_count)


this metric should be added to the list of generic metrics above

lmolkova · 2024-10-15T00:25:56Z

docs/database/database-metrics.md

+2. 250, 500, 1000: These values represent typical workloads where moderate amounts of data are returned in each query.
+3. 2000, 5000: These boundaries capture scenarios where the query returns large datasets. These larger page sizes can potentially increase memory or CPU usage and may lead to longer query execution times, making it important to track performance in these ranges.
+4. 10000: This boundary is included to capture rare, very large response sizes. Such queries can put significant strain on system resources, including memory, CPU, and network bandwidth, and can often lead to performance issues such as high latency or even network drops.
+


please generate this metric in this file

lmolkova · 2024-10-15T00:26:37Z

docs/database/database-metrics.md

+
+This metric SHOULD be specified with
+[`ExplicitBucketBoundaries`](https://github.com/open-telemetry/opentelemetry-specification/tree/v1.35.0/specification/metrics/api.md#instrument-advisory-parameters)
+of `[10, 50, 100, 250, 500, 1000, 2000, 5000, 10000]`.


any reason not to start from 1? I assume many queries always return 1 item.

lmolkova · 2024-10-15T00:30:27Z

model/database/cosmosdb-metrics.yaml

+    type: attribute_group
+    brief: 'Azure Cosmos DB attributes'
+    attributes:
+      - ref: db.system


We don't have a good way to show it properly using tooling yet. Currently when it's rendered it shows 50+ db.system values table and is confusing.

We're doing something like

# TODO: add db.system once https://github.com/open-telemetry/build-tools/issues/192 is possible # - ref: db.system

in other places and adding some text in markdown that system is populated to improve readability. Suggesting to do the same here.

lmolkova · 2024-10-15T00:32:23Z

docs/database/database-metrics.md

+
+### Metric: `db.client.operation.duration`
+
+It captures the total time taken by an Azure Cosmos DB operation. This metric follows the common [db.client.operation.duration] (#metric-dbclientoperationduration)b definition.


each metric should have a requirement level - take a look at the examples of existing metrics in this file. I assume all of them should be required or recommended (when available) for cosmos?

lmolkova · 2024-10-15T00:39:51Z

model/database/cosmosdb-metrics.yaml

+    stability: experimental
+    extends: attributes.db.cosmosdb.minimal
+
+  - id: attributes.db.cosmosdb.minimal


you're losing the clarifications made for cosmos DB span with this approach - check out briefs/notes on the attributes - they are no longer specific to cosmos.

suggestion:

take the clarifications (levels, briefs, notes) of applicable attributes from db.cosmosdb group defined here

semantic-conventions/model/database/spans.yaml

Line 516 in 76112dd

- id: db.cosmosdb

put the resulting attributes.db.cosmosdb.minimal group to database/common.yaml

extend attributes.db.cosmosdb.minimal group in cosmos db spans. You might need to play a bit with extension to make spans look exactly the same, but it should be doable.

Co-authored-by: Liudmila Molkova <[email protected]>

sourabh1007 requested review from a team as code owners September 30, 2024 06:51

github-actions bot assigned jsuereth Sep 30, 2024

sourabh1007 force-pushed the users/sourabhjain/otelcosmosmetrics branch from 8106332 to 0e22c02 Compare September 30, 2024 06:54

FabianMeiswinkel reviewed Sep 30, 2024

View reviewed changes

model/database/common.yaml Outdated Show resolved Hide resolved

FabianMeiswinkel reviewed Sep 30, 2024

View reviewed changes

model/database/common.yaml Outdated Show resolved Hide resolved

FabianMeiswinkel reviewed Sep 30, 2024

View reviewed changes

model/database/common.yaml Outdated Show resolved Hide resolved

FabianMeiswinkel reviewed Sep 30, 2024

View reviewed changes

model/database/metrics.yaml Outdated Show resolved Hide resolved

jcocchi reviewed Sep 30, 2024

View reviewed changes

sourabh1007 force-pushed the users/sourabhjain/otelcosmosmetrics branch from 9fd4001 to 229b496 Compare September 30, 2024 16:07

lmolkova reviewed Oct 1, 2024

View reviewed changes

sourabh1007 mentioned this pull request Oct 1, 2024

OpenTelemetry Metrics: Adds support to collect request level metrics Azure/azure-cosmos-dotnet-v3#4682

Draft

sourabh1007 force-pushed the users/sourabhjain/otelcosmosmetrics branch 4 times, most recently from c0bd5ed to 0e65012 Compare October 2, 2024 10:39

jcocchi approved these changes Oct 2, 2024

View reviewed changes

model/database/cosmosdb-metrics.yaml Outdated Show resolved Hide resolved

model/database/cosmosdb-metrics.yaml Outdated Show resolved Hide resolved

model/database/cosmosdb-metrics.yaml Outdated Show resolved Hide resolved

model/database/registry.yaml Outdated Show resolved Hide resolved

sourabh1007 force-pushed the users/sourabhjain/otelcosmosmetrics branch 3 times, most recently from 5db7514 to 01dc811 Compare October 3, 2024 02:15

lmolkova reviewed Oct 3, 2024

View reviewed changes

sourabh1007 force-pushed the users/sourabhjain/otelcosmosmetrics branch 3 times, most recently from 1a632ac to ee1d400 Compare October 10, 2024 06:29

lmolkova reviewed Oct 10, 2024

View reviewed changes

sourabh1007 force-pushed the users/sourabhjain/otelcosmosmetrics branch from e311a60 to bff773c Compare October 11, 2024 01:57

lmolkova reviewed Oct 11, 2024

View reviewed changes

model/database/metrics.yaml Outdated Show resolved Hide resolved

lmolkova reviewed Oct 11, 2024

View reviewed changes

model/database/metrics.yaml Outdated Show resolved Hide resolved

lmolkova reviewed Oct 11, 2024

View reviewed changes

model/database/registry.yaml Outdated Show resolved Hide resolved

lmolkova reviewed Oct 11, 2024

View reviewed changes

model/database/registry.yaml Outdated Show resolved Hide resolved

sourabh1007 and others added 12 commits October 14, 2024 23:58

Update model/database/registry.yaml

e97ad68

Co-authored-by: Liudmila Molkova <[email protected]>

Update .chloggen/users_sourabhjain_otelcosmosmetrics.yaml

d9f17f8

Co-authored-by: Liudmila Molkova <[email protected]>

Update .chloggen/users_sourabhjain_otelcosmosmetrics.yaml

59658fe

Co-authored-by: Liudmila Molkova <[email protected]>

Update model/database/cosmosdb-metrics.yaml

cdd3416

Co-authored-by: Liudmila Molkova <[email protected]>

Update model/database/cosmosdb-metrics.yaml

07df5d0

Co-authored-by: Liudmila Molkova <[email protected]>

Update model/database/registry.yaml

666f4de

Co-authored-by: Liudmila Molkova <[email protected]>

Update model/database/metrics.yaml

08c10cd

Co-authored-by: Liudmila Molkova <[email protected]>

fix ebvrything

001ad79

updated docuemnattaion

9bf8091

Update model/database/registry.yaml

e0b9e4d

Co-authored-by: Liudmila Molkova <[email protected]>

Update model/database/registry.yaml

82ebc43

Co-authored-by: Liudmila Molkova <[email protected]>

Updated docs

c5f7ce0

sourabh1007 force-pushed the users/sourabhjain/otelcosmosmetrics branch from 6ca8c5c to c5f7ce0 Compare October 14, 2024 18:29

fix lint review

ee00e1b

lmolkova reviewed Oct 15, 2024

View reviewed changes

model/database/registry.yaml Outdated Show resolved Hide resolved

lmolkova reviewed Oct 15, 2024

View reviewed changes

model/database/registry.yaml Outdated Show resolved Hide resolved

lmolkova reviewed Oct 15, 2024

View reviewed changes

docs/database/database-metrics.md Outdated Show resolved Hide resolved

lmolkova reviewed Oct 15, 2024

View reviewed changes

sourabh1007 and others added 4 commits October 15, 2024 06:58

Update model/database/registry.yaml

6926455

Co-authored-by: Liudmila Molkova <[email protected]>

Update model/database/registry.yaml

df4684f

Co-authored-by: Liudmila Molkova <[email protected]>

Update docs/database/database-metrics.md

b7f7731

Co-authored-by: Liudmila Molkova <[email protected]>

wip

e3b5cb3

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Cosmos DB: Operation Level Metrics #1438

Cosmos DB: Operation Level Metrics #1438

sourabh1007 commented Sep 30, 2024 •

edited

Loading

lmolkova left a comment

jcocchi left a comment

lmolkova left a comment

lmolkova Oct 15, 2024

lmolkova Oct 15, 2024

lmolkova Oct 15, 2024

lmolkova Oct 15, 2024

lmolkova Oct 15, 2024

lmolkova Oct 15, 2024

lmolkova Oct 15, 2024

lmolkova Oct 15, 2024

lmolkova Oct 15, 2024 •

edited

Loading


		### Metric: `db.client.operation.duration`

		It captures the total time taken by an Azure Cosmos DB operation. This metric follows the common [db.client.operation.duration] (#metric-dbclientoperationduration)b definition.

Cosmos DB: Operation Level Metrics #1438

Are you sure you want to change the base?

Cosmos DB: Operation Level Metrics #1438

Conversation

sourabh1007 commented Sep 30, 2024 • edited Loading

Merge requirement checklist

lmolkova left a comment

Choose a reason for hiding this comment

jcocchi left a comment

Choose a reason for hiding this comment

lmolkova left a comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

lmolkova Oct 15, 2024 • edited Loading

Choose a reason for hiding this comment

sourabh1007 commented Sep 30, 2024 •

edited

Loading

lmolkova Oct 15, 2024 •

edited

Loading