Description
Is your feature request related to a problem? Please describe.
Currently, there is no official OpenTelemetry receiver for CockroachDB that provides query-level performance observability. While CockroachDB exposes rich metrics through crdb_internal.* tables, users must build custom solutions to collect and export these metrics to modern observability platforms. This creates friction for teams wanting to monitor query performance, index usage, contention, and transaction statistics using OpenTelemetry-based tooling.
The problem becomes particularly acute when trying to:
- Track actual SQL query text alongside performance metrics (not just fingerprint IDs)
- Monitor statement-level latencies (parse, plan, run) for query optimization
- Identify contentious indexes and tables affecting application performance
- Integrate CockroachDB metrics into existing OpenTelemetry pipelines
Describe the solution you'd like
I've developed a production-ready OpenTelemetry receiver for CockroachDB that addresses these gaps:
https://github.com/npcomplete777/cockroachdbreceiver
Key capabilities:
- Collects 40+ metrics from crdb_internal.* tables covering statements, transactions, indexes, contention, sessions, and cluster health
- Preserves actual SQL query text in metric attributes for human-readable observability
- Implements cardinality control via configurable query limits (default: top 20 queries by execution count)
- Distinguishes between production-safe and expensive metrics with clear documentation
- Supports both CockroachDB Serverless and self-hosted deployments
- Includes comprehensive configuration examples for production and non-production environments
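To illustrate the cardinality control described above, here is a minimal Go sketch of a "top N by execution count" cap. It is not taken from the receiver's source: the `StatementStat` type and `TopByExecCount` function are hypothetical names chosen for this example, and the real receiver may apply the limit in SQL rather than in memory.

```go
package main

import "sort"

// StatementStat is a hypothetical, simplified per-statement record.
// The field names are illustrative and do not mirror the receiver's types.
type StatementStat struct {
	Query     string
	AppName   string
	ExecCount int64
}

// TopByExecCount returns at most limit statements ordered by descending
// execution count. Ties are broken by query text so the output is stable.
// The input slice is copied and left unmodified.
func TopByExecCount(stats []StatementStat, limit int) []StatementStat {
	if limit <= 0 {
		return nil
	}
	sorted := make([]StatementStat, len(stats))
	copy(sorted, stats)
	sort.SliceStable(sorted, func(i, j int) bool {
		if sorted[i].ExecCount != sorted[j].ExecCount {
			return sorted[i].ExecCount > sorted[j].ExecCount
		}
		return sorted[i].Query < sorted[j].Query
	})
	if len(sorted) > limit {
		sorted = sorted[:limit]
	}
	return sorted
}
```

With a limit of 20, each collection cycle would emit at most 20 distinct query attribute values, which bounds the number of time series created per scrape.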
The request:
Would Cockroach Labs consider either:
- Taking ownership of this receiver as an official community contribution, or
- Providing a code review and guidance on best practices for querying crdb_internal.* tables, or
- Endorsing this as a community solution if it meets CockroachDB's standards for observability tooling
This would help the CockroachDB community adopt OpenTelemetry more easily and provide a reference implementation for database observability patterns.
Describe alternatives you've considered
Alternative 1: Prometheus Exporter
- Exists but doesn't integrate with OpenTelemetry pipelines
- Requires separate infrastructure and configuration
- Doesn't provide the same level of query-level granularity
Alternative 2: Manual Queries
- Users write custom scripts to query crdb_internal.* tables
- Lacks standardization across organizations
- Requires maintaining custom code and handling schema changes
Alternative 3: Datadog/New Relic Native Integrations
- Vendor-locked solutions
- Doesn't work for teams using vendor-agnostic OpenTelemetry
- Limited customization of collected metrics
Alternative 4: Using Built-in Metrics Endpoint
- CockroachDB's /_status/vars endpoint provides Prometheus-format metrics
- But doesn't include query-level observability (actual SQL text, per-query latencies)
- Misses contention, index usage, and transaction-level insights
Additional context
Code maturity:
- Apache 2.0 licensed
- Includes unit tests, validation, and error handling
- Production configuration examples with security best practices
- Clear documentation distinguishing safe vs. expensive metrics
Technical implementation:
- Queries aggregated statistics tables (not raw data) to minimize overhead
- Implements connection pooling and configurable timeouts
- Uses OpenTelemetry Collector framework v0.136.0
- Compatible with CockroachDB v22.1+
Community value:
This receiver would benefit teams running CockroachDB who want to:
- Monitor database performance in Grafana/Prometheus/Dynatrace/Datadog using OpenTelemetry
- Correlate database metrics with application traces
- Implement query-level SLOs based on actual statement performance
- Troubleshoot contention and lock issues proactively
Questions for the Cockroach Labs team:
- Are there any concerns about query patterns used against crdb_internal.* tables?
- Would you recommend any changes to make this more robust or performant?
- Are there upcoming schema changes to crdb_internal.* that should be accounted for?
- Would Cockroach Labs be interested in maintaining this as an official receiver or community project?
Example metric output:
cockroachdb.statement.execution.count{query="SELECT * FROM users WHERE id = $1", app_name="api-server", database="production"} 15420
cockroachdb.statement.latency.service.mean{query="SELECT * FROM users WHERE id = $1"} 0.0023
cockroachdb.index.contention.events{database="production", table="orders", index="idx_user_id"} 47
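For readers who want to reproduce lines in the shape shown above, the following Go sketch renders a metric name, its attributes, and a value as text. It is only an illustration: `FormatPoint` is a hypothetical helper, not part of the receiver or of the OpenTelemetry Collector API, and it sorts attribute keys alphabetically, so the order may differ from the example output.

```go
package main

import (
	"fmt"
	"sort"
	"strconv"
	"strings"
)

// FormatPoint renders one data point as: name{k="v", ...} value.
// Attribute keys are sorted so the output is deterministic.
func FormatPoint(name string, attrs map[string]string, value float64) string {
	keys := make([]string, 0, len(attrs))
	for k := range attrs {
		keys = append(keys, k)
	}
	sort.Strings(keys)

	pairs := make([]string, 0, len(keys))
	for _, k := range keys {
		// %q quotes the value and escapes embedded quotes in SQL text.
		pairs = append(pairs, fmt.Sprintf("%s=%q", k, attrs[k]))
	}
	// 'f' with precision -1 prints the shortest exact decimal form.
	v := strconv.FormatFloat(value, 'f', -1, 64)
	return fmt.Sprintf("%s{%s} %s", name, strings.Join(pairs, ", "), v)
}
```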
I'm happy to collaborate on this and contribute it to the CockroachDB ecosystem if there is interest.
Jira issue: CRDB-55312