
@mabdinur (Contributor) commented Oct 29, 2025

What does this PR do?

Adds full OpenTelemetry Metrics support to dd-trace-js through a custom Meter Provider implementation. Set DD_METRICS_OTEL_ENABLED=true to export metrics via OTLP.

Key Features:

  • Full OpenTelemetry Metrics API compliance with all standard instrument types (Counter, UpDownCounter, Histogram, Gauge, and Observable variants)
  • OTLP export via http/protobuf (default) or http/json protocols
  • Configurable endpoint, headers, timeout, and export intervals via standard OTEL_EXPORTER_OTLP_METRICS_* environment variables
  • Periodic metric collection and aggregation with support for DELTA, CUMULATIVE, and LOWMEMORY temporality modes
  • Comprehensive test coverage (91.69%) with 39 integration tests
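To make the temporality modes concrete, the sketch below assumes the instrument-to-temporality mapping from the OpenTelemetry OTLP exporter specification (`OTEL_EXPORTER_OTLP_METRICS_TEMPORALITY_PREFERENCE`) and shows how a cumulative sum turns into DELTA points between exports. The function names and instrument-type strings are hypothetical; this is not the PR's `#getTemporality` logic, which may differ.

```javascript
// Instrument-to-temporality mapping following the OpenTelemetry OTLP exporter
// spec. Instrument-type strings are illustrative, not dd-trace-js constants.
// Gauges are omitted because OTLP gauges carry no aggregation temporality.
const DELTA_INSTRUMENTS = {
  CUMULATIVE: [],
  DELTA: ['counter', 'observableCounter', 'histogram'],
  LOWMEMORY: ['counter', 'histogram']
}

function temporalityFor (preference, instrumentType) {
  const deltaTypes = DELTA_INSTRUMENTS[preference] ?? []
  // Anything not listed as DELTA for this preference is reported as CUMULATIVE.
  return deltaTypes.includes(instrumentType) ? 'DELTA' : 'CUMULATIVE'
}

// Convert cumulative sums (keyed by attribute key) into DELTA points.
// `lastExported` remembers the cumulative value sent at the previous export.
function toDeltaPoints (cumulative, lastExported) {
  const deltas = new Map()
  for (const [attrKey, value] of cumulative) {
    const previous = lastExported.get(attrKey) ?? 0
    deltas.set(attrKey, value - previous)
    lastExported.set(attrKey, value)
  }
  return deltas
}

const lastExported = new Map()
const firstExport = toDeltaPoints(new Map([['route=/', 5]]), lastExported)
const secondExport = toDeltaPoints(new Map([['route=/', 8]]), lastExported)
// firstExport.get('route=/') is 5, secondExport.get('route=/') is 3
```

Under LOWMEMORY, asynchronous counters stay cumulative, which avoids keeping per-series "last exported" state for them.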

Configuration:

  • DD_METRICS_OTEL_ENABLED - Enable OpenTelemetry metrics (default: false)
  • OTEL_EXPORTER_OTLP_METRICS_ENDPOINT - Endpoint URL (default: http://localhost:4318/v1/metrics)
  • OTEL_EXPORTER_OTLP_METRICS_PROTOCOL - Protocol: http/protobuf or http/json
  • OTEL_METRIC_EXPORT_INTERVAL - Export interval in ms (default: 60000)
  • Additional timeout and header configuration options
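A minimal sketch of how the variables above could be resolved into settings, assuming an `env` object shaped like `process.env`. The function and constant names are hypothetical, and the accepted boolean spellings (`true`/`1`) are an assumption rather than dd-trace-js's actual parsing rules.

```javascript
// Hypothetical sketch, not the dd-trace-js config code: resolve the metrics
// settings listed above from an environment-like object.
const DEFAULT_METRICS_ENDPOINT = 'http://localhost:4318/v1/metrics'
const DEFAULT_EXPORT_INTERVAL_MS = 60000

// Assumption: only "true" and "1" (case-insensitive) enable the feature.
function parseBooleanSketch (value) {
  return typeof value === 'string' && ['true', '1'].includes(value.trim().toLowerCase())
}

function resolveOtelMetricsConfig (env) {
  const interval = Number.parseInt(env.OTEL_METRIC_EXPORT_INTERVAL, 10)
  return {
    enabled: parseBooleanSketch(env.DD_METRICS_OTEL_ENABLED),
    endpoint: env.OTEL_EXPORTER_OTLP_METRICS_ENDPOINT || DEFAULT_METRICS_ENDPOINT,
    // Fall back to the documented 60000 ms default when unset or not numeric.
    exportIntervalMs: Number.isNaN(interval) ? DEFAULT_EXPORT_INTERVAL_MS : interval
  }
}
```

In practice this would be called as `resolveOtelMetricsConfig(process.env)`.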

Motivation

This lets customers use the OpenTelemetry Metrics API with dd-trace-js without adding the OpenTelemetry SDK as a dependency. A custom implementation integrates better with dd-trace-js configuration, avoids vendoring gRPC libraries, and maintains flexibility.

Additional Notes

mabdinur and others added 30 commits October 14, 2025 12:27
- Add metrics.proto and metrics_service.proto (OTLP v1 spec)
- Update protobuf_loader to support metrics protos
- Rename protos/ -> otlp/ directory for better organization
- Create OtlpHttpExporterBase for shared HTTP export logic
- Create OtlpTransformerBase for shared transformation logic
- Refactor logs exporter/transformer to extend base classes
- Update test mocking paths
- Eliminates ~400 lines of duplication
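The shared-base-class refactor in these commits can be pictured roughly as follows. This is a hypothetical sketch: the class names, the injected `send` function, and the simplified payload shapes are illustrative and do not mirror the real `OtlpHttpExporterBase` or `OtlpTransformerBase` APIs.

```javascript
// Hypothetical sketch of the "shared base class" pattern described above.
// Class and method names are illustrative, not the actual dd-trace-js API.
class OtlpExporterSketchBase {
  constructor (send) {
    this.send = send // injected transport; real code would POST over HTTP
  }

  // Shared logic: skip empty batches, transform, then hand off to transport.
  export (items) {
    if (items.length === 0) return
    this.send(this.path, this.transform(items))
  }

  transform () {
    throw new Error('subclasses must implement transform()')
  }
}

// Signal-specific subclasses only supply the path and payload shape.
// The payloads are heavily simplified compared to real OTLP messages.
class LogsExporterSketch extends OtlpExporterSketchBase {
  get path () { return '/v1/logs' }
  transform (records) { return { resourceLogs: records } }
}

class MetricsExporterSketch extends OtlpExporterSketchBase {
  get path () { return '/v1/metrics' }
  transform (metrics) { return { resourceMetrics: metrics } }
}
```

The duplication savings come from keeping batching, transport, and error handling in one place while each signal only defines what differs.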
dataPoint.timeUnixNano = timestamp
}

#aggregateHistogram (metric, value, attributes, attrKey, timestamp, stateKey, cumulativeState) {
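The diff context above sits in the histogram aggregation path. As an illustration only, here is a minimal sketch of explicit-bucket histogram aggregation following the OTLP data model: `bucketCounts` has one more entry than the boundaries, and each bucket's upper bound is inclusive. The function names are hypothetical, and the PR's `#aggregateHistogram` may be structured differently.

```javascript
// Minimal sketch of OTLP-style explicit-bucket histogram aggregation.
// bucketCounts[i] counts values v with boundaries[i - 1] < v <= boundaries[i];
// the last bucket collects everything above the final boundary.
function createHistogramPoint (boundaries) {
  return {
    boundaries,
    bucketCounts: new Array(boundaries.length + 1).fill(0),
    count: 0,
    sum: 0,
    min: Infinity,
    max: -Infinity
  }
}

function recordHistogramValue (point, value) {
  // First boundary that is >= value; -1 means the overflow bucket.
  let index = point.boundaries.findIndex((bound) => value <= bound)
  if (index === -1) index = point.boundaries.length
  point.bucketCounts[index]++
  point.count++
  point.sum += value
  point.min = Math.min(point.min, value)
  point.max = Math.max(point.max, value)
}

const point = createHistogramPoint([0, 5, 10])
for (const value of [0, 3, 5, 7, 42]) recordHistogramValue(point, value)
// point.bucketCounts is [1, 2, 1, 1], point.sum is 57
```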
Contributor Author

@mabdinur changed the title from "opentelemetry metrics [all changes]" to "feat(otel): add support for otel metrics api via protobuf and json" on Oct 30, 2025
pr-commenter bot commented Oct 30, 2025

Benchmarks

Benchmark execution time: 2025-11-03 20:03:42

Comparing candidate commit 822e4dc in PR branch munir/add-otel-metrics-configs with baseline commit 18f8a78 in branch master.

Found 0 performance improvements and 0 performance regressions! Performance is the same for 1604 metrics, 66 unstable metrics.

@mabdinur mabdinur marked this pull request as ready for review November 3, 2025 14:52
@mabdinur mabdinur requested review from a team as code owners November 3, 2025 14:52
@mabdinur mabdinur requested review from BridgeAR and removed request for a team November 3, 2025 14:52
github-actions bot commented Nov 3, 2025

Overall package size

Self size: 13.26 MB
Deduped: 117.32 MB
No deduping: 119.53 MB

Dependency sizes

| name | version | self size | total size |
|------|---------|-----------|------------|
| @datadog/libdatadog | 0.7.0 | 35.02 MB | 35.02 MB |
| @datadog/native-appsec | 10.3.0 | 20.73 MB | 20.74 MB |
| @datadog/native-iast-taint-tracking | 4.0.0 | 11.72 MB | 11.73 MB |
| @datadog/pprof | 5.12.0 | 11.19 MB | 11.57 MB |
| @opentelemetry/core | 1.30.1 | 908.66 kB | 7.16 MB |
| protobufjs | 7.5.4 | 2.95 MB | 5.82 MB |
| @datadog/wasm-js-rewriter | 4.0.1 | 2.85 MB | 3.58 MB |
| @opentelemetry/resources | 1.9.1 | 306.54 kB | 1.74 MB |
| @datadog/native-metrics | 3.1.1 | 1.02 MB | 1.43 MB |
| @opentelemetry/api-logs | 0.207.0 | 201.39 kB | 1.42 MB |
| @opentelemetry/api | 1.9.0 | 1.22 MB | 1.22 MB |
| jsonpath-plus | 10.3.0 | 617.18 kB | 1.08 MB |
| import-in-the-middle | 1.15.0 | 127.66 kB | 856.24 kB |
| lru-cache | 10.4.3 | 804.3 kB | 804.3 kB |
| @datadog/openfeature-node-server | 0.1.0-preview.13 | 106.46 kB | 424.36 kB |
| opentracing | 0.14.7 | 194.81 kB | 194.81 kB |
| source-map | 0.7.6 | 185.63 kB | 185.63 kB |
| pprof-format | 2.2.1 | 163.06 kB | 163.06 kB |
| @datadog/sketches-js | 2.1.1 | 109.9 kB | 109.9 kB |
| lodash.sortby | 4.7.0 | 75.76 kB | 75.76 kB |
| ignore | 7.0.5 | 63.38 kB | 63.38 kB |
| istanbul-lib-coverage | 3.2.2 | 34.37 kB | 34.37 kB |
| rfdc | 1.4.1 | 27.15 kB | 27.15 kB |
| dc-polyfill | 0.1.10 | 26.73 kB | 26.73 kB |
| @isaacs/ttlcache | 1.4.1 | 25.2 kB | 25.2 kB |
| tlhunter-sorted-set | 0.1.0 | 24.94 kB | 24.94 kB |
| shell-quote | 1.8.3 | 23.74 kB | 23.74 kB |
| limiter | 1.1.5 | 23.17 kB | 23.17 kB |
| retry | 0.13.1 | 18.85 kB | 18.85 kB |
| semifies | 1.0.0 | 15.84 kB | 15.84 kB |
| jest-docblock | 29.7.0 | 8.99 kB | 12.76 kB |
| crypto-randomuuid | 1.0.0 | 11.18 kB | 11.18 kB |
| ttl-set | 1.0.0 | 4.61 kB | 9.69 kB |
| mutexify | 1.4.0 | 5.71 kB | 8.74 kB |
| path-to-regexp | 0.1.12 | 6.6 kB | 6.6 kB |
| module-details-from-path | 1.0.4 | 3.96 kB | 3.96 kB |
| escape-string-regexp | 5.0.0 | 3.66 kB | 3.66 kB |

🤖 This report was automatically generated by heaviest-objects-in-the-universe


codecov bot commented Nov 3, 2025

Codecov Report

❌ Patch coverage is 99.43020% with 2 lines in your changes missing coverage. Please review.
✅ Project coverage is 83.93%. Comparing base (18f8a78) to head (822e4dc).

| Files with missing lines | Patch % | Lines |
|--------------------------|---------|-------|
| ...opentelemetry/metrics/otlp_http_metric_exporter.js | 89.47% | 2 Missing ⚠️ |
Additional details and impacted files
@@            Coverage Diff             @@
##           master    #6783      +/-   ##
==========================================
+ Coverage   83.62%   83.93%   +0.31%     
==========================================
  Files         506      514       +8     
  Lines       21373    21709     +336     
==========================================
+ Hits        17873    18222     +349     
+ Misses       3500     3487      -13     


Comment on lines +476 to +477
Metrics are collected periodically and exported via OTLP over HTTP. The protocol can be configured using `OTEL_EXPORTER_OTLP_METRICS_PROTOCOL` or `OTEL_EXPORTER_OTLP_PROTOCOL` environment variables. Supported protocols are `http/protobuf` (default) and `http/json`. All metrics use delta aggregation temporality to match Datadog's data model. For complete OTLP exporter configuration options, see the [OpenTelemetry OTLP Exporter documentation](https://opentelemetry.io/docs/languages/sdk-configuration/otlp-exporter/).

Collaborator

The text is fine. I think it would just be more straightforward if each individual configuration option listed all of its details right away, instead of having a separate section with longer text that describes them.
I would therefore inline this content into the variables above, except for the parts that apply across multiple environment variables.
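The protocol precedence and default described in the quoted text could be sketched like this. The function name and the fallback for unsupported values such as `grpc` are assumptions; the Content-Type values are the ones OTLP/HTTP uses for binary protobuf and JSON payloads.

```javascript
// Hypothetical sketch: the metrics-specific variable wins over the generic
// one, and the chosen protocol determines the HTTP Content-Type header.
const CONTENT_TYPES = {
  'http/protobuf': 'application/x-protobuf',
  'http/json': 'application/json'
}

function resolveMetricsProtocol (env) {
  const requested = env.OTEL_EXPORTER_OTLP_METRICS_PROTOCOL || env.OTEL_EXPORTER_OTLP_PROTOCOL
  // Assumption: unset or unsupported values (e.g. 'grpc') use the default.
  const protocol = Object.hasOwn(CONTENT_TYPES, requested) ? requested : 'http/protobuf'
  return { protocol, contentType: CONTENT_TYPES[protocol] }
}
```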

target.otelLogsBatchTimeout = maybeInt(OTEL_BSP_SCHEDULE_DELAY)
target.otelLogsMaxExportBatchSize = maybeInt(OTEL_BSP_MAX_EXPORT_BATCH_SIZE)

const otelMetricsExporter = String(OTEL_METRICS_EXPORTER).toLowerCase() !== 'none'
Collaborator

Nit

Suggested change
const otelMetricsExporter = String(OTEL_METRICS_EXPORTER).toLowerCase() !== 'none'
const otelMetricsExporter = !OTEL_METRICS_EXPORTER || OTEL_METRICS_EXPORTER.toLowerCase() !== 'none'
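A quick check that the suggestion preserves behavior for the values an environment variable can actually hold (a string or undefined). The arrow-function names are illustrative.

```javascript
// Original vs suggested expression. Both treat an unset or empty variable as
// "exporter enabled"; the suggestion just avoids relying on
// String(undefined) evaluating to the string 'undefined'.
const original = (value) => String(value).toLowerCase() !== 'none'
const suggested = (value) => !value || value.toLowerCase() !== 'none'

const inputs = [undefined, '', 'none', 'NONE', 'otlp']
const results = inputs.map((value) => [original(value), suggested(value)])
// Each pair matches: only 'none' and 'NONE' disable the exporter.
```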

target.otelLogsMaxExportBatchSize = maybeInt(OTEL_BSP_MAX_EXPORT_BATCH_SIZE)

const otelMetricsExporter = String(OTEL_METRICS_EXPORTER).toLowerCase() !== 'none'
this.#setBoolean(target, 'otelMetricsEnabled', DD_METRICS_OTEL_ENABLED && otelMetricsExporter)
Collaborator

Do we have any documentation stating that the `none` exporter disables metrics?

Comment on lines +627 to +629
target.otelMetricsTimeout = maybeInt(OTEL_EXPORTER_OTLP_METRICS_TIMEOUT) || target.otelTimeout
target.otelMetricsExportTimeout = maybeInt(OTEL_METRIC_EXPORT_TIMEOUT)
target.otelMetricsExportInterval = maybeInt(OTEL_METRIC_EXPORT_INTERVAL)
Collaborator

Is zero allowed for any of these values?
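The question matters because `maybeInt(...) || target.otelTimeout` silently replaces a configured `0` with the fallback. A minimal sketch of the difference, using a hypothetical stand-in for `maybeInt` (the real helper's parsing rules are not shown in the diff):

```javascript
// Hypothetical stand-in for maybeInt: parse an integer, returning undefined
// when the value is missing or not numeric.
function maybeIntSketch (value) {
  const parsed = Number.parseInt(value, 10)
  return Number.isNaN(parsed) ? undefined : parsed
}

const fallbackTimeout = 10000
// `||` treats 0 as "unset" and falls back; `??` only falls back on
// undefined or null, so an explicit 0 survives.
const withOr = maybeIntSketch('0') || fallbackTimeout
const withNullish = maybeIntSketch('0') ?? fallbackTimeout
const unsetNullish = maybeIntSketch(undefined) ?? fallbackTimeout
// withOr is 10000, withNullish is 0, unsetNullish is 10000
```

Whether `??` is the right fix depends on whether zero should be a valid value for these settings.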

if (OTEL_EXPORTER_OTLP_ENDPOINT || OTEL_EXPORTER_OTLP_METRICS_ENDPOINT) {
this.#setString(target, 'otelMetricsUrl', OTEL_EXPORTER_OTLP_METRICS_ENDPOINT || target.otelUrl)
}
this.#setString(target, 'otelMetricsHeaders', OTEL_EXPORTER_OTLP_METRICS_HEADERS || target.otelHeaders)
Collaborator

Note: these will skew the telemetry values for now. That is an issue in many places, and it will be resolved in a separate PR that fixes telemetry. The problem is that each property is derived either from another property or from the environment variable, and with this style of definition telemetry cannot tell the two apart.

Comment on lines +126 to +130
#startTime

constructor (temporalityPreference = TEMPORALITY.DELTA) {
this.#temporalityPreference = temporalityPreference
this.#startTime = Number(process.hrtime.bigint())
Collaborator

Suggested change
#startTime
constructor (temporalityPreference = TEMPORALITY.DELTA) {
this.#temporalityPreference = temporalityPreference
this.#startTime = Number(process.hrtime.bigint())
#startTime = Number(process.hrtime.bigint())
constructor (temporalityPreference = TEMPORALITY.DELTA) {
this.#temporalityPreference = temporalityPreference

Comment on lines +177 to +190
if (!metricsMap.has(metricKey)) {
metricsMap.set(metricKey, {
name,
description,
unit,
type,
instrumentationScope,
temporality: this.#getTemporality(type),
data: [],
dataPointMap: new Map()
})
}

const metric = metricsMap.get(metricKey)
Collaborator

Suggested change
if (!metricsMap.has(metricKey)) {
metricsMap.set(metricKey, {
name,
description,
unit,
type,
instrumentationScope,
temporality: this.#getTemporality(type),
data: [],
dataPointMap: new Map()
})
}
const metric = metricsMap.get(metricKey)
let metric = metricsMap.get(metricKey)
if (!metric) {
metric = {
name,
description,
unit,
type,
instrumentationScope,
temporality: this.#getTemporality(type),
data: [],
dataPointMap: new Map()
}
metricsMap.set(metricKey, metric)
}
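The same "look up once, create if missing" pattern can be factored into a small helper if it recurs. This is a hypothetical helper, not part of dd-trace-js; it assumes stored values are never `undefined`.

```javascript
// Hypothetical helper: one Map lookup on the hot path, and the factory runs
// only when the key is absent. Assumes stored values are never undefined.
function getOrCreate (map, key, create) {
  let value = map.get(key)
  if (value === undefined) {
    value = create()
    map.set(key, value)
  }
  return value
}

const metricsMap = new Map()
const metric = getOrCreate(metricsMap, 'scope:requests:counter', () => ({
  name: 'requests',
  data: [],
  dataPointMap: new Map()
}))
```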


this.#applyDeltaTemporality(metrics, lastExportedState)

return metrics
Collaborator

Is this a public return type or is it only used internally?


const scopeKey = this.#getScopeKey(instrumentationScope)
const metricKey = `${scopeKey}:${name}:${type}`
const attrKey = JSON.stringify(attributes)
Collaborator

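Using the result of JSON.stringify on an object as a key is not stable, because the output follows the object's property insertion order.

E.g., JSON.stringify({ a: 1, b: 2 }) is not equal to JSON.stringify({ b: 2, a: 1 }), even though both objects hold the same attributes.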

You can use a library such as https://www.npmjs.com/package/safe-stable-stringify

This should also be checked in other parts of the code where we do similar things.

Please also add test cases for that.
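Besides the suggested safe-stable-stringify, a hand-rolled option for flat attribute objects is to sort the entries by key before serializing. This is a hypothetical sketch: it does not normalize nested objects, which is typically fine because OpenTelemetry attribute values are primitives or arrays of primitives.

```javascript
// Hypothetical helper: order-independent key for flat attribute objects.
// Entries are sorted by key, so insertion order no longer affects the result.
// Nested objects are NOT normalized by this sketch.
function stableAttributesKey (attributes = {}) {
  const entries = Object.entries(attributes)
    .sort(([a], [b]) => (a < b ? -1 : a > b ? 1 : 0))
  return JSON.stringify(entries)
}

const keyAB = stableAttributesKey({ a: 1, b: 2 })
const keyBA = stableAttributesKey({ b: 2, a: 1 })
// keyAB === keyBA, unlike JSON.stringify on the raw objects
```

The checks below double as the kind of test cases requested in the review.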

}
}

delete metric.dataPointMap
Collaborator

I am not sure why that property is removed. Could you add a comment explaining it? I would also probably just set it to undefined instead.
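The observable difference between the two options is small. Setting the property to `undefined` keeps the object's shape stable, which V8 generally optimizes better than `delete`. When the object is serialized with `JSON.stringify`, both produce the same output. A minimal sketch with illustrative objects:

```javascript
// `delete` removes the property; assigning undefined keeps the key on the
// object. JSON.stringify omits undefined-valued properties, so the
// serialized form is identical either way.
const deleted = { name: 'requests', dataPointMap: new Map() }
delete deleted.dataPointMap

const cleared = { name: 'requests', dataPointMap: new Map() }
cleared.dataPointMap = undefined
// 'dataPointMap' in deleted is false, 'dataPointMap' in cleared is true
```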


3 participants