OTel receiver for CockroachDB performance monitoring

**Is your feature request related to a problem? Please describe.**

Currently, there is no official OpenTelemetry receiver for CockroachDB that provides query-level performance observability. While CockroachDB exposes rich metrics through `crdb_internal.*` tables, users must build custom solutions to collect and export these metrics to modern observability platforms. This creates friction for teams wanting to monitor query performance, index usage, contention, and transaction statistics using OpenTelemetry-based tooling.

The problem becomes particularly acute when trying to:
- Track actual SQL query text alongside performance metrics (not just fingerprint IDs)
- Monitor statement-level latencies (parse, plan, run) for query optimization
- Identify contentious indexes and tables affecting application performance
- Integrate CockroachDB metrics into existing OpenTelemetry pipelines

**Describe the solution you'd like**

I've developed a production-ready OpenTelemetry receiver for CockroachDB that addresses these gaps: 
https://github.com/npcomplete777/cockroachdbreceiver

**Key capabilities:**
- Collects 40+ metrics from `crdb_internal.*` tables covering statements, transactions, indexes, contention, sessions, and cluster health
- Preserves actual SQL query text in metric attributes for human-readable observability
- Implements cardinality control via configurable query limits (default: top 20 queries by execution count)
- Distinguishes between production-safe and expensive metrics with clear documentation
- Supports both CockroachDB Serverless and self-hosted deployments
- Includes comprehensive configuration examples for production and non-production environments
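
The cardinality control mentioned above can be pictured as a top-N filter over aggregated statement rows. The following is a minimal sketch in Python for illustration only, not the receiver's actual code; `StatementStat`, its fields, and `top_n_by_execution_count` are hypothetical names.

```python
from dataclasses import dataclass
import heapq


@dataclass(frozen=True)
class StatementStat:
    """Hypothetical row shape: one aggregated statement fingerprint."""
    query: str
    execution_count: int


def top_n_by_execution_count(stats, limit=20):
    """Keep only the `limit` most-executed statements, highest count first.

    Bounding the number of distinct `query` attribute values bounds the
    number of time series emitted per collection interval.
    """
    if limit <= 0:
        return []
    return heapq.nlargest(limit, stats, key=lambda s: s.execution_count)
```

With the default limit of 20, a cluster that has thousands of distinct statement fingerprints would still produce at most 20 values for the `query` attribute per collection.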

**The request:**
Would Cockroach Labs consider one of the following:
1. **Taking ownership** of this receiver as an official community contribution
2. **Providing a code review** and guidance on best practices for querying `crdb_internal.*` tables
3. **Endorsing this as a community solution** if it meets CockroachDB's standards for observability tooling

This would help the CockroachDB community adopt OpenTelemetry more easily and provide a reference implementation for database observability patterns.

**Describe alternatives you've considered**

**Alternative 1: Prometheus Exporter**
- Exists but doesn't integrate with OpenTelemetry pipelines
- Requires separate infrastructure and configuration
- Doesn't provide the same level of query-level granularity

**Alternative 2: Manual Queries**
- Users write custom scripts to query `crdb_internal.*` tables
- Lacks standardization across organizations
- Requires maintaining custom code and handling schema changes

**Alternative 3: Datadog/New Relic Native Integrations**
- Vendor-locked solutions
- Don't work for teams that use vendor-agnostic OpenTelemetry tooling
- Offer limited customization of collected metrics

**Alternative 4: Using Built-in Metrics Endpoint**
- CockroachDB's `/_status/vars` endpoint provides Prometheus-format metrics
- Doesn't include query-level observability (actual SQL text, per-query latencies)
- Misses contention, index usage, and transaction-level insights

**Additional context**

**Code maturity:**
- Apache 2.0 licensed
- Includes unit tests, validation, and error handling
- Production configuration examples with security best practices
- Clear documentation distinguishing safe vs. expensive metrics

**Technical implementation:**
- Queries aggregated statistics tables (not raw data) to minimize overhead
- Implements connection pooling and configurable timeouts
- Uses OpenTelemetry Collector framework v0.136.0
- Compatible with CockroachDB v22.1+
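
As an illustration of the compatibility floor above, a collector-side check could compare the server's reported version against v22.1 before querying. This is a hedged sketch, not part of the receiver: `parse_major_minor` and `is_supported` are hypothetical helpers, and the code assumes the version text contains a `vMAJOR.MINOR` token such as `v23.1.11`.

```python
import re

# Minimum supported (major, minor), taken from the "v22.1+" note above.
MIN_SUPPORTED = (22, 1)


def parse_major_minor(version_text):
    """Extract (major, minor) from text containing a token like 'v23.1.11'."""
    match = re.search(r"v(\d+)\.(\d+)", version_text)
    if match is None:
        raise ValueError(f"no version token found in {version_text!r}")
    return int(match.group(1)), int(match.group(2))


def is_supported(version_text):
    """Return True when the reported version is at least v22.1."""
    return parse_major_minor(version_text) >= MIN_SUPPORTED
```

Tuple comparison keeps the check correct across major versions, so `v23.0` passes even though its minor number is lower than 1.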

**Community value:**
This receiver would benefit teams running CockroachDB who want to:
- Monitor database performance in Grafana/Prometheus/Dynatrace/Datadog using OpenTelemetry
- Correlate database metrics with application traces
- Implement query-level SLOs based on actual statement performance
- Troubleshoot contention and lock issues proactively

**Questions for the Cockroach Labs team:**
1. Are there any concerns about query patterns used against `crdb_internal.*` tables?
2. Would you recommend any changes to make this more robust or performant?
3. Are there upcoming schema changes to `crdb_internal.*` that should be accounted for?
4. Would Cockroach Labs be interested in maintaining this as an official receiver or community project?

**Example metric output:**
```
cockroachdb.statement.execution.count{query="SELECT * FROM users WHERE id = $1", app_name="api-server", database="production"} 15420
cockroachdb.statement.latency.service.mean{query="SELECT * FROM users WHERE id = $1"} 0.0023
cockroachdb.index.contention.events{database="production", table="orders", index="idx_user_id"} 47
```
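
To make the shape of the output above concrete, here is a small Python sketch that renders a metric name, its attributes, and a value into the same text layout. It is illustrative only: `format_metric` is a hypothetical helper, and the escaping of quotes and backslashes in attribute values is an assumption, since the example output does not show how the receiver handles them.

```python
def format_metric(name, attributes, value):
    """Render one data point as: name{key="value", ...} value"""

    def escape(text):
        # Assumed escaping so SQL text containing quotes stays parseable.
        return str(text).replace("\\", "\\\\").replace('"', '\\"')

    labels = ", ".join(
        f'{key}="{escape(val)}"' for key, val in attributes.items()
    )
    return f"{name}{{{labels}}} {value}"


line = format_metric(
    "cockroachdb.index.contention.events",
    {"database": "production", "table": "orders", "index": "idx_user_id"},
    47,
)
print(line)
```

Attributes are emitted in insertion order, so the dictionary order determines the label order in the rendered line.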

I'm happy to collaborate on this and contribute it to the CockroachDB ecosystem if there is interest.

Jira issue: CRDB-55312

OTel receiver for CockroachDB performance monitoring #155197

Metadata

Assignees

Labels

Type

Projects

Milestone

Relationships

Development

OTel receiver for CockroachDB performance monitoring #155197

Description

Metadata

Metadata

Assignees

Labels

Type

Projects

Milestone

Relationships

Development

Issue actions