diff --git a/cockroachdb_dedicated/CHANGELOG.md b/cockroachdb_dedicated/CHANGELOG.md index 24727be649..67959562fd 100644 --- a/cockroachdb_dedicated/CHANGELOG.md +++ b/cockroachdb_dedicated/CHANGELOG.md @@ -1,2 +1,13 @@ # CHANGELOG - CockroachDB Dedicated +## 1.0.0 + +***Added***: + +* Versioning existing release. + +## 2.0.0 + +***Added***: + +* Limited preview dashboard for serverless. \ No newline at end of file diff --git a/cockroachdb_dedicated/README.md b/cockroachdb_dedicated/README.md index 8d0ce97748..4a09726459 100644 --- a/cockroachdb_dedicated/README.md +++ b/cockroachdb_dedicated/README.md @@ -1,16 +1,16 @@ # Cockroach Cloud ## Overview - -The Cockroach Cloud integration for Datadog enables data collection and alerting on a subset of CockroachDB metrics available at the [Prometheus endpoint][1], using the Datadog platform. - + +The CockroachDB Cloud integration for Datadog enables data collection and alerting on a subset of CockroachDB metrics using the Datadog platform. + ## Setup ### Installation To enable Datadog monitoring for a Cockroach Cloud cluster: -1. On the cluster's **Monitoring** page, click **Setup** in the **Datadog** panel. +1. On the cluster's **Monitoring** > [**Tools** page][14]. 2. Fill in the **API key** and **Datadog Site** fields with the corresponding values. - The **API key** is associated with your Datadog organization. If you don't have an API key to use with your Cockroach Cloud cluster, you need to create one. For instructions, see the [Datadog documentation][2]. @@ -22,16 +22,18 @@ To enable Datadog monitoring for a Cockroach Cloud cluster: ### Configuration -Open your Datadog [Dashboard List][5] and click `CockroachDB Dedicated Overview`. This out of the box dashboard presents metrics on CockroachDB Dedicated Overview. +Open your Datadog [Dashboard List][5]. 
There are two out-of-the-box dashboards that present CockroachDB metrics: +- CockroachDB Cloud Serverless (Limited Preview) +- CockroachDB Cloud Dedicated -To create your own Cockroach Cloud dashboard, you can either [clone][6] the default `CockroachDB Dedicated Overview` dashboard and edit the widgets, or [create a new dashboard][7]. +To create your own Cockroach Cloud dashboard, you can either [clone][6] the default `CockroachDB Cloud Dedicated` dashboard and edit the widgets, or [create a new dashboard][7]. -The [available metrics][8] are drawn directly from the CockroachDB [Prometheus endpoint][1] and are intended for use as building blocks for your own charts. +The [available metrics][8] are intended for use as building blocks for your own charts. To preview the metrics being collected, you can: - Click on your cluster's entry in the [Infrastructure List][4] to display time-series graphs for each available metric. -- Use the [Metrics Explorer][9] to search for and view `crdb_dedicated` metrics. +- Use the [Metrics Explorer][9] to search for and view `crdb_cloud` or `crdb_dedicated` metrics. ### Validation @@ -48,7 +50,7 @@ Metrics export from CockroachDB can be interrupted in the event of: - A stale API key. In this case, the integration status will be `Unhealthy`. To resolve the issue, [update your integration](#update-integration) with a new API key. - Transient CockroachDB unavailbility. In this case, the integration status will continue to be `Active`. To resolve the issue, try [deactivating](#deactivate-integration) and reactivating the integration from the **Datadog** panel. If this does not resolve the issue, [contact our support team][10]. -To monitor the health of metrics export, you can [create a custom Monitor](#monitor-health-of-metrics-export) in Datadog. +To monitor the health of metrics export, you can create a custom Monitor in Datadog. 
### Update integration @@ -72,31 +74,32 @@ After deactivating an integration, the metrics data will remain in Datadog for a ### Metrics -See [metadata.csv][13] for a list of metrics provided by this integration. +- `crdb_cloud` & `crdb_dedicated` [Metrics][13] ### Service Checks -The CockroachDB Dedicated integration does not include any service checks. +The Cockroach Cloud integration does not include any service checks. ### Events -The CockroachDB Dedicated integration does not include any events. +The Cockroach Cloud integration does not include any events. ## Support Need help? Contact [Datadog support][12]. -[1]: https://www.cockroachlabs.com/docs/stable/monitoring-and-alerting.html#prometheus-endpoint +[1]: https://www.cockroachlabs.com/docs/cockroachcloud/essential-metrics [2]: https://docs.datadoghq.com/account_management/api-app-keys/#add-an-api-key-or-client-token [3]: https://docs.datadoghq.com/getting_started/site/ [4]: https://docs.datadoghq.com/infrastructure/list/ -[5]: https://docs.datadoghq.com/dashboards/#dashboard-list -[6]: https://docs.datadoghq.com/dashboards/#clone-dashboard +[5]: https://app.datadoghq.com/dashboard/lists +[6]: https://docs.datadoghq.com/dashboards/configure/#configuration-actions [7]: https://docs.datadoghq.com/dashboards/#new-dashboard -[8]: https://docs.datadoghq.com/integrations/cockroachdb_dedicated +[8]: https://docs.datadoghq.com/integrations/cockroachdb_dedicated/#data-collected [9]: https://docs.datadoghq.com/metrics/explorer/ [10]: https://support.cockroachlabs.com/ [11]: https://docs.datadoghq.com/developers/guide/data-collection-resolution-retention/ [12]: https://docs.datadoghq.com/help/ [13]: https://github.com/DataDog/integrations-extras/blob/master/cockroachdb_dedicated/metadata.csv +[14]: https://www.cockroachlabs.com/docs/cockroachcloud/tools-page diff --git a/cockroachdb_dedicated/assets/dashboards/cockroach_cloud_overview.json 
b/cockroachdb_dedicated/assets/dashboards/cockroach_cloud_dedicated_overview.json similarity index 98% rename from cockroachdb_dedicated/assets/dashboards/cockroach_cloud_overview.json rename to cockroachdb_dedicated/assets/dashboards/cockroach_cloud_dedicated_overview.json index bfbd02bbb9..1fe63e2a9c 100644 --- a/cockroachdb_dedicated/assets/dashboards/cockroach_cloud_overview.json +++ b/cockroachdb_dedicated/assets/dashboards/cockroach_cloud_dedicated_overview.json @@ -1,6 +1,6 @@ { - "title": "CockroachDB Dedicated Overview", - "description": "## CockroachDB Dedicated Overview\n\nThis dashboard provides a high-level view of your CockroachDB Dedicated cluster, including:\nA high-level view of SQL performance & latency.\n- Information about resource consumption to help aid in capacity planning.\n- Ability to drill down to specific nodes (identified by a (node, region) tag pair) within your cluster.", + "title": "CockroachDB Cloud Dedicated", + "description": "## CockroachDB Cloud Dedicated Overview\n\nThis dashboard provides a high-level view of your CockroachDB Dedicated cluster, including:\nA high-level view of SQL performance & latency.\n- Information about resource consumption to help aid in capacity planning.\n- Ability to drill down to specific nodes (identified by a (node, region) tag pair) within your cluster.", "widgets": [ { "id": 8635810201263258, diff --git a/cockroachdb_dedicated/assets/dashboards/cockroach_cloud_serverless_overview.json b/cockroachdb_dedicated/assets/dashboards/cockroach_cloud_serverless_overview.json new file mode 100644 index 0000000000..4de2dc38af --- /dev/null +++ b/cockroachdb_dedicated/assets/dashboards/cockroach_cloud_serverless_overview.json @@ -0,0 +1,2552 @@ +{ + "title": "CockroachDB Cloud Serverless (Limited Preview)", + "description": "## CockroachDB Cloud Serverless (Limited Preview)\n\nThis dashboard provides a high-level view of your CockroachDB Serverless cluster, including:\nA high-level view of SQL performance & 
latency.\n- Information about resource consumption to help aid in capacity planning.", + "widgets": [ + { + "id": 8635810201263258, + "definition": { + "title": "New group", + "banner_img": "https://res.infoq.com/news/2021/10/cockroachdb-serverless/en/headerimage/cockroach_db_header-1634549116493.jpg", + "show_title": false, + "type": "group", + "layout_type": "ordered", + "widgets": [ + { + "id": 7027263531960678, + "definition": { + "type": "note", + "content": "This dashboard provides a high-level view of your CockroachDB cluster, including:\n- A high-level view of SQL performance & latency.\n- Information about resource consumption to help aid in capacity planning.\n- Ability to drill down to specific regions within your cluster.", + "background_color": "transparent", + "font_size": "16", + "text_align": "left", + "vertical_align": "center", + "show_tick": false, + "tick_pos": "50%", + "tick_edge": "left", + "has_padding": false + }, + "layout": { + "x": 0, + "y": 0, + "width": 3, + "height": 4 + } + }, + { + "id": 8131082657770196, + "definition": { + "type": "note", + "content": "#### Further reading on the Cockroach Cloud integration:\n- [Datadog's docs on the Cockroach Cloud integration](https://docs.datadoghq.com/integrations/cockroachdb_dedicated)\n- [Cockroach Labs' Datadog integration docs](https://www.cockroachlabs.com/docs/cockroachcloud/monitoring-page.html#monitor-with-datadog)\n", + "background_color": "transparent", + "font_size": "16", + "text_align": "left", + "vertical_align": "center", + "show_tick": false, + "tick_pos": "50%", + "tick_edge": "left", + "has_padding": true + }, + "layout": { + "x": 3, + "y": 0, + "width": 3, + "height": 4 + } + } + ] + }, + "layout": { + "x": 0, + "y": 0, + "width": 6, + "height": 7 + } + }, + { + "id": 782550878758906, + "definition": { + "title": "Activity Summary", + "background_color": "blue", + "show_title": true, + "type": "group", + "layout_type": "ordered", + "widgets": [ + { + "id": 7838830465398464, + 
"definition": { + "title": "SQL Statements, P99", + "title_size": "16", + "title_align": "left", + "type": "query_value", + "requests": [ + { + "formulas": [ + { + "formula": "query1 / 1000000" + } + ], + "response_format": "scalar", + "queries": [ + { + "query": "avg:crdb_cloud.sql.service.latency{$Cluster}", + "data_source": "metrics", + "name": "query1", + "aggregator": "avg" + } + ] + } + ], + "autoscale": false, + "custom_unit": "ms", + "precision": 2 + }, + "layout": { + "x": 0, + "y": 0, + "width": 6, + "height": 3 + } + }, + { + "id": 2526915054656628, + "definition": { + "title": "Service Latency: 99th Percentile", + "title_size": "16", + "title_align": "left", + "type": "query_value", + "requests": [ + { + "formulas": [ + { + "formula": "query1 / 1000000" + } + ], + "response_format": "scalar", + "queries": [ + { + "query": "avg:crdb_cloud.sql.txn.latency{$Cluster}", + "data_source": "metrics", + "name": "query1", + "aggregator": "avg" + } + ] + } + ], + "autoscale": false, + "custom_unit": "ms", + "precision": 2 + }, + "layout": { + "x": 0, + "y": 3, + "width": 6, + "height": 3 + } + } + ] + }, + "layout": { + "x": 6, + "y": 0, + "width": 6, + "height": 7 + } + }, + { + "id": 8533260048455964, + "definition": { + "title": "Overview", + "background_color": "blue", + "show_title": true, + "type": "group", + "layout_type": "ordered", + "widgets": [ + { + "id": 4951811637444188, + "definition": { + "title": "SQL Connections", + "title_size": "16", + "title_align": "left", + "show_legend": true, + "legend_layout": "auto", + "legend_columns": [ + "avg", + "min", + "max", + "value", + "sum" + ], + "type": "timeseries", + "requests": [ + { + "formulas": [ + { + "alias": "Connections Per Second", + "formula": "default_zero(per_second(query1))" + } + ], + "queries": [ + { + "query": "avg:crdb_cloud.sql.conns{$Cluster}.rollup(sum, 30)", + "data_source": "metrics", + "name": "query1" + } + ], + "response_format": "timeseries", + "style": { + "palette": 
"dog_classic", + "line_type": "solid", + "line_width": "normal" + }, + "display_type": "line" + } + ], + "yaxis": { + "include_zero": true, + "scale": "linear" + }, + "markers": [] + }, + "layout": { + "x": 0, + "y": 0, + "width": 4, + "height": 3 + } + }, + { + "id": 806253703222822, + "definition": { + "type": "note", + "content": "Rate of SQL connection attempts.", + "background_color": "yellow", + "font_size": "14", + "text_align": "left", + "vertical_align": "center", + "show_tick": true, + "tick_pos": "50%", + "tick_edge": "left", + "has_padding": true + }, + "layout": { + "x": 4, + "y": 0, + "width": 2, + "height": 3 + } + }, + { + "id": 6743199474847626, + "definition": { + "title": "SQL Statements", + "title_size": "16", + "title_align": "left", + "show_legend": true, + "legend_layout": "auto", + "legend_columns": [ + "min", + "max", + "sum" + ], + "type": "timeseries", + "requests": [ + { + "formulas": [ + { + "alias": "Insert", + "formula": "default_zero(query1)" + } + ], + "queries": [ + { + "query": "avg:crdb_cloud.sql.insert.count{$Cluster}.as_rate().rollup(sum, 30)", + "data_source": "metrics", + "name": "query1" + } + ], + "response_format": "timeseries", + "style": { + "palette": "dog_classic", + "line_type": "solid", + "line_width": "normal" + }, + "display_type": "line" + }, + { + "formulas": [ + { + "alias": "Select", + "formula": "default_zero(query0)" + } + ], + "queries": [ + { + "query": "max:crdb_cloud.sql.select.count{$Cluster}.as_rate().rollup(sum, 30)", + "data_source": "metrics", + "name": "query0" + } + ], + "response_format": "timeseries", + "style": { + "palette": "dog_classic", + "line_type": "solid", + "line_width": "normal" + }, + "display_type": "line" + }, + { + "formulas": [ + { + "alias": "Update", + "formula": "default_zero(query0)" + } + ], + "queries": [ + { + "query": "avg:crdb_cloud.sql.update.count{$Cluster}.as_rate().rollup(sum, 30)", + "data_source": "metrics", + "name": "query0" + } + ], + "response_format": 
"timeseries", + "style": { + "palette": "dog_classic", + "line_type": "solid", + "line_width": "normal" + }, + "display_type": "line" + }, + { + "formulas": [ + { + "alias": "Delete", + "formula": "default_zero(query0)" + } + ], + "queries": [ + { + "query": "avg:crdb_cloud.sql.delete.count{$Cluster}.as_rate().rollup(sum, 30)", + "data_source": "metrics", + "name": "query0" + } + ], + "response_format": "timeseries", + "style": { + "palette": "dog_classic", + "line_type": "solid", + "line_width": "normal" + }, + "display_type": "line" + } + ] + }, + "layout": { + "x": 6, + "y": 0, + "width": 4, + "height": 3 + } + }, + { + "id": 7369727806832230, + "definition": { + "type": "note", + "content": "A moving average of the number of SELECT, INSERT, UPDATE, and DELETE statements successfully executed per second.", + "background_color": "yellow", + "font_size": "14", + "text_align": "left", + "vertical_align": "center", + "show_tick": true, + "tick_pos": "50%", + "tick_edge": "left", + "has_padding": true + }, + "layout": { + "x": 10, + "y": 0, + "width": 2, + "height": 3 + } + }, + { + "id": 6827499129241152, + "definition": { + "title": "Service Latency: 99th Percentile", + "title_size": "16", + "title_align": "left", + "show_legend": true, + "legend_layout": "auto", + "legend_columns": [ + "avg", + "min", + "max", + "value", + "sum" + ], + "type": "timeseries", + "requests": [ + { + "formulas": [ + { + "alias": "p99", + "formula": "default_zero(query1) / 1000000", + "number_format": { + "unit": { + "type": "canonical_unit", + "unit_name": "millisecond" + } + } + } + ], + "queries": [ + { + "query": "max:crdb_cloud.sql.service.latency{$Cluster}", + "data_source": "metrics", + "name": "query1" + } + ], + "response_format": "timeseries", + "style": { + "palette": "dog_classic", + "line_type": "solid", + "line_width": "normal" + }, + "display_type": "line" + } + ] + }, + "layout": { + "x": 0, + "y": 3, + "width": 4, + "height": 3 + } + }, + { + "id": 1249303266221180, + 
"definition": { + "type": "note", + "content": "Over the last minute, this cluster executed 99% of SQL statements within this time. This time only includes SELECT, INSERT, UPDATE and DELETE statements and does not include network latency between the cluster and client.", + "background_color": "yellow", + "font_size": "14", + "text_align": "left", + "vertical_align": "center", + "show_tick": true, + "tick_pos": "50%", + "tick_edge": "left", + "has_padding": true + }, + "layout": { + "x": 4, + "y": 3, + "width": 2, + "height": 3 + } + }, + { + "id": 7656853063317864, + "definition": { + "title": "Request Units", + "title_size": "16", + "title_align": "left", + "show_legend": true, + "legend_layout": "auto", + "legend_columns": [ + "avg", + "min", + "max", + "value", + "sum" + ], + "type": "timeseries", + "requests": [ + { + "formulas": [ + { + "alias": "RU", + "formula": "default_zero(query1)" + } + ], + "queries": [ + { + "query": "sum:crdb_cloud.tenant.sql_usage.request_units{$Cluster}.as_rate().rollup(sum, 30)", + "data_source": "metrics", + "name": "query1" + } + ], + "response_format": "timeseries", + "style": { + "palette": "dog_classic", + "line_type": "solid", + "line_width": "normal" + }, + "display_type": "line" + } + ] + }, + "layout": { + "x": 6, + "y": 3, + "width": 4, + "height": 3 + } + }, + { + "id": 1737414654900128, + "definition": { + "type": "note", + "content": "The CPU and I/O resources being used by queries in the cluster. 
Simple queries consume few RUs, while complicated queries with many reads and writes consume more RUs.", + "background_color": "yellow", + "font_size": "14", + "text_align": "left", + "vertical_align": "center", + "show_tick": true, + "tick_pos": "50%", + "tick_edge": "left", + "has_padding": true + }, + "layout": { + "x": 10, + "y": 3, + "width": 2, + "height": 3 + } + }, + { + "id": 2159470479294726, + "definition": { + "title": "Writes", + "title_size": "16", + "title_align": "left", + "show_legend": true, + "legend_layout": "auto", + "legend_columns": [ + "avg", + "min", + "max", + "value", + "sum" + ], + "type": "timeseries", + "requests": [ + { + "formulas": [ + { + "alias": "Bytes", + "number_format": { + "unit": { + "type": "canonical_unit", + "unit_name": "kilobyte" + } + }, + "formula": "query3 / 1024" + } + ], + "queries": [ + { + "query": "sum:crdb_cloud.storage_bytes{$Cluster}", + "data_source": "metrics", + "name": "query3" + } + ], + "response_format": "timeseries", + "style": { + "palette": "dog_classic", + "line_type": "solid", + "line_width": "normal" + }, + "display_type": "line" + } + ] + }, + "layout": { + "x": 0, + "y": 6, + "width": 4, + "height": 3 + } + }, + { + "id": 4627115609898984, + "definition": { + "type": "note", + "content": "The amount of data being stored in the cluster. 
This is the logical number of live bytes and does not account for compression or replication.", + "background_color": "yellow", + "font_size": "14", + "text_align": "left", + "vertical_align": "center", + "show_tick": true, + "tick_pos": "50%", + "tick_edge": "left", + "has_padding": true + }, + "layout": { + "x": 4, + "y": 6, + "width": 2, + "height": 3 + } + } + ] + }, + "layout": { + "x": 0, + "y": 7, + "width": 12, + "height": 10 + } + }, + { + "id": 8003220898607202, + "definition": { + "title": "Request Units", + "background_color": "blue", + "show_title": true, + "type": "group", + "layout_type": "ordered", + "widgets": [ + { + "id": 4049159710202292, + "definition": { + "title": "Request Units", + "title_size": "16", + "title_align": "left", + "show_legend": true, + "legend_layout": "auto", + "legend_columns": [ + "avg", + "min", + "max", + "value", + "sum" + ], + "type": "timeseries", + "requests": [ + { + "formulas": [ + { + "alias": "RU", + "formula": "default_zero(query1)" + } + ], + "queries": [ + { + "query": "avg:crdb_cloud.tenant.sql_usage.request_units{$Cluster}.as_rate().rollup(sum, 30)", + "data_source": "metrics", + "name": "query1" + } + ], + "response_format": "timeseries", + "style": { + "palette": "dog_classic", + "line_type": "solid", + "line_width": "normal" + }, + "display_type": "line" + } + ] + }, + "layout": { + "x": 0, + "y": 0, + "width": 4, + "height": 3 + } + }, + { + "id": 7954069862193936, + "definition": { + "type": "note", + "content": "The CPU and I/O resources being used by queries in the cluster. 
Simple queries consume few RUs, while complicated queries with many reads and writes consume more RUs.", + "background_color": "yellow", + "font_size": "14", + "text_align": "left", + "vertical_align": "center", + "show_tick": true, + "tick_pos": "50%", + "tick_edge": "left", + "has_padding": true + }, + "layout": { + "x": 4, + "y": 0, + "width": 2, + "height": 3 + } + }, + { + "id": 1416042771744362, + "definition": { + "title": "CPU", + "title_size": "16", + "title_align": "left", + "show_legend": true, + "legend_layout": "auto", + "legend_columns": [ + "avg", + "min", + "max", + "value", + "sum" + ], + "type": "timeseries", + "requests": [ + { + "formulas": [ + { + "alias": "cpu", + "formula": "default_zero(query1) * 333.3333" + } + ], + "queries": [ + { + "query": "sum:crdb_cloud.tenant.sql_usage.sql_pods_cpu_seconds{$Cluster}.as_rate().rollup(sum, 30)", + "data_source": "metrics", + "name": "query1" + } + ], + "response_format": "timeseries", + "style": { + "palette": "dog_classic", + "line_type": "solid", + "line_width": "normal" + }, + "display_type": "line" + } + ] + }, + "layout": { + "x": 6, + "y": 0, + "width": 4, + "height": 3 + } + }, + { + "id": 1971364057759994, + "definition": { + "type": "note", + "content": "The number of RUs that were consumed because of SQL CPU usage. 
Correlate this metric with Request Units (RUs) and determine if your workload is CPU bound.", + "background_color": "yellow", + "font_size": "14", + "text_align": "left", + "vertical_align": "center", + "show_tick": true, + "tick_pos": "50%", + "tick_edge": "left", + "has_padding": true + }, + "layout": { + "x": 10, + "y": 0, + "width": 2, + "height": 3 + } + }, + { + "id": 7529810084675084, + "definition": { + "title": "Egress", + "title_size": "16", + "title_align": "left", + "show_legend": true, + "legend_layout": "auto", + "legend_columns": [ + "avg", + "min", + "max", + "value", + "sum" + ], + "type": "timeseries", + "requests": [ + { + "formulas": [ + { + "alias": "Client traffic", + "formula": "default_zero(query3) / 1024" + }, + { + "alias": "Bulk I/O operations", + "formula": "default_zero(query1) / 1024" + } + ], + "queries": [ + { + "query": "sum:crdb_cloud.tenant.sql_usage.pgwire_egress_bytes{$Cluster}.as_rate().rollup(sum, 30)", + "data_source": "metrics", + "name": "query3" + }, + { + "query": "sum:crdb_cloud.tenant.sql_usage.external_io_egress_bytes{$Cluster}.as_rate().rollup(sum, 30)", + "data_source": "metrics", + "name": "query1" + } + ], + "response_format": "timeseries", + "style": { + "palette": "dog_classic", + "line_type": "solid", + "line_width": "normal" + }, + "display_type": "line" + } + ] + }, + "layout": { + "x": 0, + "y": 3, + "width": 4, + "height": 3 + } + }, + { + "id": 5002177819522632, + "definition": { + "type": "note", + "content": "The number of RUs that were consumed because of byte traffic to the client and cluster bulk I/O operations (e.g., CDC). 
Correlate this metric with Request Units (RUs).", + "background_color": "yellow", + "font_size": "14", + "text_align": "left", + "vertical_align": "center", + "show_tick": true, + "tick_pos": "50%", + "tick_edge": "left", + "has_padding": true + }, + "layout": { + "x": 4, + "y": 3, + "width": 2, + "height": 3 + } + }, + { + "id": 2008636780629738, + "definition": { + "title": "Reads", + "title_size": "16", + "title_align": "left", + "show_legend": true, + "legend_layout": "auto", + "legend_columns": [ + "avg", + "min", + "max", + "value", + "sum" + ], + "type": "timeseries", + "requests": [ + { + "formulas": [ + { + "alias": "Requests", + "formula": "default_zero(query1) / 8" + }, + { + "alias": "Batches", + "formula": "default_zero(query2) / 2" + }, + { + "alias": "Bytes", + "formula": "default_zero(query3) / 1024 / 64" + } + ], + "queries": [ + { + "query": "sum:crdb_cloud.tenant.sql_usage.read_requests{$Cluster}.as_rate().rollup(sum, 30)", + "data_source": "metrics", + "name": "query1" + }, + { + "query": "sum:crdb_cloud.tenant.sql_usage.read_batches{$Cluster}.as_rate().rollup(sum, 30)", + "data_source": "metrics", + "name": "query2" + }, + { + "query": "sum:crdb_cloud.tenant.sql_usage.read_bytes{$Cluster}.as_rate().rollup(sum, 30)", + "data_source": "metrics", + "name": "query3" + } + ], + "response_format": "timeseries", + "style": { + "palette": "dog_classic", + "line_type": "solid", + "line_width": "normal" + }, + "display_type": "line" + } + ] + }, + "layout": { + "x": 6, + "y": 3, + "width": 4, + "height": 3 + } + }, + { + "id": 239677142875976, + "definition": { + "type": "note", + "content": "The number of RUs that were consumed due to KV reads, broken down by requests, batches, and bytes. SQL statements are translated into lower-level KV read requests that are sent in batches. 
Correlate these metrics with Request Units (RUs).", + "background_color": "yellow", + "font_size": "14", + "text_align": "left", + "vertical_align": "center", + "show_tick": true, + "tick_pos": "50%", + "tick_edge": "left", + "has_padding": true + }, + "layout": { + "x": 10, + "y": 3, + "width": 2, + "height": 3 + } + }, + { + "id": 3964940250968876, + "definition": { + "title": "Writes", + "title_size": "16", + "title_align": "left", + "show_legend": true, + "legend_layout": "auto", + "legend_columns": [ + "avg", + "min", + "max", + "value", + "sum" + ], + "type": "timeseries", + "requests": [ + { + "formulas": [ + { + "alias": "Requests", + "formula": "default_zero(query1)" + }, + { + "alias": "Batches", + "formula": "default_zero(query2)" + }, + { + "alias": "Bytes", + "formula": "default_zero(query3) / 1024" + } + ], + "queries": [ + { + "query": "sum:crdb_cloud.tenant.sql_usage.write_requests{$Cluster}.as_rate().rollup(sum, 30)", + "data_source": "metrics", + "name": "query1" + }, + { + "query": "sum:crdb_cloud.tenant.sql_usage.write_batches{$Cluster}.as_rate().rollup(sum, 30)", + "data_source": "metrics", + "name": "query2" + }, + { + "query": "sum:crdb_cloud.tenant.sql_usage.write_bytes{$Cluster}.as_rate().rollup(sum, 30)", + "data_source": "metrics", + "name": "query3" + } + ], + "response_format": "timeseries", + "style": { + "palette": "dog_classic", + "line_type": "solid", + "line_width": "normal" + }, + "display_type": "line" + } + ] + }, + "layout": { + "x": 0, + "y": 6, + "width": 4, + "height": 3 + } + }, + { + "id": 8795041787820418, + "definition": { + "type": "note", + "content": "The number of RUs that were consumed due to KV writes, broken down by requests, batches, and bytes. SQL statements are translated into lower-level KV write requests that are sent in batches. 
Correlate these metrics with Request Units (RUs).", + "background_color": "yellow", + "font_size": "14", + "text_align": "left", + "vertical_align": "center", + "show_tick": true, + "tick_pos": "50%", + "tick_edge": "left", + "has_padding": true + }, + "layout": { + "x": 4, + "y": 6, + "width": 2, + "height": 3 + } + }, + { + "id": 3138716994097406, + "definition": { + "title": "Cross-region Networking", + "title_size": "16", + "title_align": "left", + "show_legend": true, + "legend_layout": "auto", + "legend_columns": [ + "avg", + "min", + "max", + "value", + "sum" + ], + "type": "timeseries", + "requests": [ + { + "formulas": [ + { + "alias": "Network traffic", + "formula": "default_zero(query1)" + } + ], + "queries": [ + { + "query": "sum:crdb_cloud.tenant.sql_usage.cross_region_network_ru{$Cluster}.as_rate()", + "data_source": "metrics", + "name": "query1" + } + ], + "response_format": "timeseries", + "style": { + "palette": "dog_classic", + "line_type": "solid", + "line_width": "normal" + }, + "display_type": "line" + } + ] + }, + "layout": { + "x": 6, + "y": 6, + "width": 4, + "height": 3 + } + }, + { + "id": 3764683764771466, + "definition": { + "type": "note", + "content": "The number of RUs that were consumed due to cross-region networking. 
Correlate these metrics with Request Units (RUs).", + "background_color": "yellow", + "font_size": "14", + "text_align": "left", + "vertical_align": "center", + "show_tick": true, + "tick_pos": "50%", + "tick_edge": "left", + "has_padding": true + }, + "layout": { + "x": 10, + "y": 6, + "width": 2, + "height": 3 + } + } + ] + }, + "layout": { + "x": 0, + "y": 17, + "width": 12, + "height": 10, + "is_column_break": true + } + }, + { + "id": 6847880046439812, + "definition": { + "title": "SQL", + "background_color": "blue", + "show_title": true, + "type": "group", + "layout_type": "ordered", + "widgets": [ + { + "id": 7977558114644004, + "definition": { + "title": "SQL Connection Rate", + "title_size": "16", + "title_align": "left", + "show_legend": true, + "legend_layout": "auto", + "legend_columns": [ + "avg", + "min", + "max", + "value", + "sum" + ], + "type": "timeseries", + "requests": [ + { + "formulas": [ + { + "alias": "Connections Per Second", + "formula": "default_zero(query1)" + } + ], + "queries": [ + { + "query": "sum:crdb_cloud.sql.new_conns.count{$Cluster}.as_rate().rollup(sum, 30)", + "data_source": "metrics", + "name": "query1" + } + ], + "response_format": "timeseries", + "style": { + "palette": "dog_classic", + "line_type": "solid", + "line_width": "normal" + }, + "display_type": "line" + } + ] + }, + "layout": { + "x": 0, + "y": 0, + "width": 4, + "height": 3 + } + }, + { + "id": 1928283031242920, + "definition": { + "type": "note", + "content": "Rate of SQL connection attempts.", + "background_color": "yellow", + "font_size": "14", + "text_align": "left", + "vertical_align": "center", + "show_tick": true, + "tick_pos": "50%", + "tick_edge": "left", + "has_padding": true + }, + "layout": { + "x": 4, + "y": 0, + "width": 2, + "height": 3 + } + }, + { + "id": 4612192271915838, + "definition": { + "title": "Connection Latency: 99th Percentile", + "title_size": "16", + "title_align": "left", + "show_legend": true, + "legend_layout": "auto", + 
"legend_columns": [ + "avg", + "min", + "max", + "value", + "sum" + ], + "type": "timeseries", + "requests": [ + { + "formulas": [ + { + "alias": "p99", + "number_format": { + "unit": { + "type": "canonical_unit", + "unit_name": "millisecond" + } + }, + "formula": "default_zero(query1) / 1000000" + } + ], + "queries": [ + { + "query": "max:crdb_cloud.sql.conn.latency{$Cluster}", + "data_source": "metrics", + "name": "query1" + } + ], + "response_format": "timeseries", + "style": { + "palette": "dog_classic", + "line_type": "solid", + "line_width": "normal" + }, + "display_type": "line" + } + ] + }, + "layout": { + "x": 6, + "y": 0, + "width": 4, + "height": 3 + } + }, + { + "id": 8077656645051682, + "definition": { + "type": "note", + "content": "Latency to establish and authenticate a SQL connection.", + "background_color": "yellow", + "font_size": "14", + "text_align": "left", + "vertical_align": "center", + "show_tick": true, + "tick_pos": "50%", + "tick_edge": "left", + "has_padding": true + }, + "layout": { + "x": 10, + "y": 0, + "width": 2, + "height": 3 + } + }, + { + "id": 8387088883894436, + "definition": { + "title": "Open SQL Sessions", + "title_size": "16", + "title_align": "left", + "show_legend": true, + "legend_layout": "horizontal", + "legend_columns": [ + "avg", + "min", + "max", + "value", + "sum" + ], + "type": "timeseries", + "requests": [ + { + "formulas": [ + { + "alias": "Connections", + "formula": "default_zero(query1)" + } + ], + "queries": [ + { + "query": "sum:crdb_cloud.sql.conns{$Cluster}", + "data_source": "metrics", + "name": "query1" + } + ], + "response_format": "timeseries", + "style": { + "palette": "dog_classic", + "line_type": "solid", + "line_width": "normal" + }, + "display_type": "line" + } + ] + }, + "layout": { + "x": 0, + "y": 3, + "width": 4, + "height": 3 + } + }, + { + "id": 4218362510234354, + "definition": { + "type": "note", + "content": "The total number of open SQL Sessions.", + "background_color": "yellow", + 
"font_size": "14", + "text_align": "left", + "vertical_align": "center", + "show_tick": true, + "tick_pos": "50%", + "tick_edge": "left", + "has_padding": true + }, + "layout": { + "x": 4, + "y": 3, + "width": 2, + "height": 3 + } + }, + { + "id": 5923134192625778, + "definition": { + "title": "Open SQL Transactions", + "title_size": "16", + "title_align": "left", + "show_legend": true, + "legend_layout": "horizontal", + "legend_columns": [ + "avg", + "min", + "max", + "value", + "sum" + ], + "type": "timeseries", + "requests": [ + { + "formulas": [ + { + "alias": "Open Transactions", + "formula": "default_zero(query1)" + } + ], + "queries": [ + { + "query": "sum:crdb_cloud.sql.txns.open{$Cluster}.rollup(sum, 30)", + "data_source": "metrics", + "name": "query1" + } + ], + "response_format": "timeseries", + "style": { + "palette": "dog_classic", + "line_type": "solid", + "line_width": "normal" + }, + "display_type": "line" + } + ] + }, + "layout": { + "x": 6, + "y": 3, + "width": 4, + "height": 3 + } + }, + { + "id": 1245930991391368, + "definition": { + "type": "note", + "content": "The total number of open SQL transactions.", + "background_color": "yellow", + "font_size": "14", + "text_align": "left", + "vertical_align": "center", + "show_tick": true, + "tick_pos": "50%", + "tick_edge": "left", + "has_padding": true + }, + "layout": { + "x": 10, + "y": 3, + "width": 2, + "height": 3 + } + }, + { + "id": 6674462475472702, + "definition": { + "title": "Transactions", + "title_size": "16", + "title_align": "left", + "show_legend": true, + "legend_layout": "horizontal", + "legend_columns": [ + "avg", + "min", + "max", + "value", + "sum" + ], + "type": "timeseries", + "requests": [ + { + "formulas": [ + { + "alias": "Begin", + "formula": "default_zero(query1)" + } + ], + "queries": [ + { + "query": "sum:crdb_cloud.sql.txn.begin.count{$Cluster}.as_rate().rollup(sum, 30)", + "data_source": "metrics", + "name": "query1" + } + ], + "response_format": "timeseries", + 
"style": { + "palette": "dog_classic", + "line_type": "solid", + "line_width": "normal" + }, + "display_type": "line" + }, + { + "formulas": [ + { + "alias": "Commit", + "formula": "default_zero(query0)" + } + ], + "queries": [ + { + "query": "sum:crdb_cloud.sql.txn.commit.count{$Cluster}.as_rate().rollup(sum, 30)", + "data_source": "metrics", + "name": "query0" + } + ], + "response_format": "timeseries", + "style": { + "palette": "dog_classic", + "line_type": "solid", + "line_width": "normal" + }, + "display_type": "line" + }, + { + "formulas": [ + { + "alias": "Rollback", + "formula": "default_zero(query0)" + } + ], + "queries": [ + { + "query": "sum:crdb_cloud.sql.txn.rollback.count{$Cluster}.as_rate().rollup(sum, 30)", + "data_source": "metrics", + "name": "query0" + } + ], + "response_format": "timeseries", + "style": { + "palette": "dog_classic", + "line_type": "solid", + "line_width": "normal" + }, + "display_type": "line" + }, + { + "formulas": [ + { + "alias": "Abort", + "formula": "default_zero(query0)" + } + ], + "queries": [ + { + "query": "sum:crdb_cloud.sql.txn.abort.count{$Cluster}.as_rate().rollup(sum, 30)", + "data_source": "metrics", + "name": "query0" + } + ], + "response_format": "timeseries", + "style": { + "palette": "dog_classic", + "line_type": "solid", + "line_width": "normal" + }, + "display_type": "line" + } + ] + }, + "layout": { + "x": 0, + "y": 6, + "width": 4, + "height": 3 + } + }, + { + "id": 1843598916208238, + "definition": { + "type": "note", + "content": "The total number of transactions initiated, committed, rolled back, or aborted per second.", + "background_color": "yellow", + "font_size": "14", + "text_align": "left", + "vertical_align": "center", + "show_tick": true, + "tick_pos": "50%", + "tick_edge": "left", + "has_padding": true + }, + "layout": { + "x": 4, + "y": 6, + "width": 2, + "height": 3 + } + }, + { + "id": 7157959503980258, + "definition": { + "title": "Transaction Restarts", + "title_size": "16", + 
"title_align": "left", + "show_legend": true, + "legend_layout": "horizontal", + "legend_columns": [ + "avg", + "min", + "max", + "value", + "sum" + ], + "type": "timeseries", + "requests": [ + { + "formulas": [ + { + "alias": "Write Too Old", + "formula": "default_zero(query1)" + } + ], + "queries": [ + { + "query": "sum:crdb_cloud.txn.restarts.writetooold{$Cluster}.as_rate()", + "data_source": "metrics", + "name": "query1" + } + ], + "response_format": "timeseries", + "style": { + "palette": "dog_classic", + "line_type": "solid", + "line_width": "normal" + }, + "display_type": "line" + }, + { + "formulas": [ + { + "alias": "Forwarded Timestamp", + "formula": "default_zero(query0)" + } + ], + "queries": [ + { + "query": "sum:crdb_cloud.txn.restarts.serializable{$Cluster}.as_rate()", + "data_source": "metrics", + "name": "query0" + } + ], + "response_format": "timeseries", + "style": { + "palette": "dog_classic", + "line_type": "solid", + "line_width": "normal" + }, + "display_type": "line" + } + ] + }, + "layout": { + "x": 6, + "y": 6, + "width": 4, + "height": 3 + } + }, + { + "id": 4030524918892762, + "definition": { + "type": "note", + "content": "The number of transactions restarted broken down by errors. 
Refer to the transaction retry error reference.", + "background_color": "yellow", + "font_size": "14", + "text_align": "left", + "vertical_align": "center", + "show_tick": true, + "tick_pos": "50%", + "tick_edge": "left", + "has_padding": true + }, + "layout": { + "x": 10, + "y": 6, + "width": 2, + "height": 3 + } + }, + { + "id": 6375983941304400, + "definition": { + "title": "Transaction Latency: 99th Percentile", + "title_size": "16", + "title_align": "left", + "show_legend": true, + "legend_layout": "auto", + "legend_columns": [ + "avg", + "min", + "max", + "value", + "sum" + ], + "type": "timeseries", + "requests": [ + { + "formulas": [ + { + "alias": "p99", + "number_format": { + "unit": { + "type": "canonical_unit", + "unit_name": "millisecond" + } + }, + "formula": "default_zero(query1) / 1000000" + } + ], + "queries": [ + { + "query": "max:crdb_cloud.sql.txn.latency{$Cluster}.rollup(sum, 30)", + "data_source": "metrics", + "name": "query1" + } + ], + "response_format": "timeseries", + "style": { + "palette": "dog_classic", + "line_type": "solid", + "line_width": "normal" + }, + "display_type": "line" + } + ] + }, + "layout": { + "x": 0, + "y": 9, + "width": 4, + "height": 3 + } + }, + { + "id": 1827683661232408, + "definition": { + "type": "note", + "content": "Over the last minute, this cluster executed 99% of transactions within this time. 
This time does not include network latency between the cluster and client.", + "background_color": "yellow", + "font_size": "14", + "text_align": "left", + "vertical_align": "center", + "show_tick": true, + "tick_pos": "50%", + "tick_edge": "left", + "has_padding": true + }, + "layout": { + "x": 4, + "y": 9, + "width": 2, + "height": 3 + } + }, + { + "id": 2559283592141386, + "definition": { + "title": "Active SQL Statements", + "title_size": "16", + "title_align": "left", + "show_legend": true, + "legend_layout": "horizontal", + "legend_columns": [ + "avg", + "min", + "max", + "value", + "sum" + ], + "type": "timeseries", + "requests": [ + { + "formulas": [ + { + "alias": "Active Statements", + "formula": "query1" + } + ], + "queries": [ + { + "query": "sum:crdb_cloud.sql.statements.active{$Cluster}.rollup(sum, 30)", + "data_source": "metrics", + "name": "query1" + } + ], + "response_format": "timeseries", + "style": { + "palette": "dog_classic", + "line_type": "solid", + "line_width": "normal" + }, + "display_type": "line" + } + ] + }, + "layout": { + "x": 6, + "y": 9, + "width": 4, + "height": 3 + } + }, + { + "id": 905663155580656, + "definition": { + "type": "note", + "content": "The total number of running SQL statements.", + "background_color": "yellow", + "font_size": "14", + "text_align": "left", + "vertical_align": "center", + "show_tick": true, + "tick_pos": "50%", + "tick_edge": "left", + "has_padding": true + }, + "layout": { + "x": 10, + "y": 9, + "width": 2, + "height": 3 + } + }, + { + "id": 7191057770262672, + "definition": { + "title": "Service Latency: 99th Percentile", + "title_size": "16", + "title_align": "left", + "show_legend": true, + "legend_layout": "auto", + "legend_columns": [ + "avg", + "min", + "max", + "value", + "sum" + ], + "type": "timeseries", + "requests": [ + { + "formulas": [ + { + "alias": "p99", + "number_format": { + "unit": { + "type": "canonical_unit", + "unit_name": "millisecond" + } + }, + "formula": 
"default_zero(query1) / 1000000" + } + ], + "queries": [ + { + "query": "max:crdb_cloud.sql.service.latency{$Cluster}.rollup(sum, 30)", + "data_source": "metrics", + "name": "query1" + } + ], + "response_format": "timeseries", + "style": { + "palette": "dog_classic", + "line_type": "solid", + "line_width": "normal" + }, + "display_type": "line" + } + ] + }, + "layout": { + "x": 0, + "y": 12, + "width": 4, + "height": 3 + } + }, + { + "id": 4220588608277214, + "definition": { + "type": "note", + "content": "Over the last minute, this cluster executed 99% of SQL statements within this time. This time only includes SELECT, INSERT, UPDATE and DELETE statements and does not include network latency between the cluster and client.", + "background_color": "yellow", + "font_size": "14", + "text_align": "left", + "vertical_align": "center", + "show_tick": true, + "tick_pos": "50%", + "tick_edge": "left", + "has_padding": true + }, + "layout": { + "x": 4, + "y": 12, + "width": 2, + "height": 3 + } + }, + { + "id": 5812011552785118, + "definition": { + "title": "SQL Statements", + "title_size": "16", + "title_align": "left", + "show_legend": true, + "legend_layout": "auto", + "legend_columns": [ + "min", + "max", + "sum" + ], + "type": "timeseries", + "requests": [ + { + "formulas": [ + { + "alias": "Insert", + "formula": "default_zero(query1)" + } + ], + "queries": [ + { + "query": "avg:crdb_cloud.sql.insert.count{$Cluster}.as_rate().rollup(sum, 30)", + "data_source": "metrics", + "name": "query1" + } + ], + "response_format": "timeseries", + "style": { + "palette": "dog_classic", + "line_type": "solid", + "line_width": "normal" + }, + "display_type": "line" + }, + { + "formulas": [ + { + "alias": "Select", + "formula": "default_zero(query0)" + } + ], + "queries": [ + { + "query": "max:crdb_cloud.sql.select.count{$Cluster}.as_rate().rollup(sum, 30)", + "data_source": "metrics", + "name": "query0" + } + ], + "response_format": "timeseries", + "style": { + "palette": 
"dog_classic", + "line_type": "solid", + "line_width": "normal" + }, + "display_type": "line" + }, + { + "formulas": [ + { + "alias": "Update", + "formula": "default_zero(query0)" + } + ], + "queries": [ + { + "query": "avg:crdb_cloud.sql.update.count{$Cluster}.as_rate().rollup(sum, 30)", + "data_source": "metrics", + "name": "query0" + } + ], + "response_format": "timeseries", + "style": { + "palette": "dog_classic", + "line_type": "solid", + "line_width": "normal" + }, + "display_type": "line" + }, + { + "formulas": [ + { + "alias": "Delete", + "formula": "default_zero(query0)" + } + ], + "queries": [ + { + "query": "avg:crdb_cloud.sql.delete.count{$Cluster}.as_rate().rollup(sum, 30)", + "data_source": "metrics", + "name": "query0" + } + ], + "response_format": "timeseries", + "style": { + "palette": "dog_classic", + "line_type": "solid", + "line_width": "normal" + }, + "display_type": "line" + } + ] + }, + "layout": { + "x": 6, + "y": 12, + "width": 4, + "height": 3 + } + }, + { + "id": 231133543355080, + "definition": { + "type": "note", + "content": "A moving average of the number of SELECT, INSERT, UPDATE, and DELETE statements successfully executed per second.", + "background_color": "yellow", + "font_size": "14", + "text_align": "left", + "vertical_align": "center", + "show_tick": true, + "tick_pos": "50%", + "tick_edge": "left", + "has_padding": true + }, + "layout": { + "x": 10, + "y": 12, + "width": 2, + "height": 3 + } + }, + { + "id": 8153798343825926, + "definition": { + "title": "SQL Statement Errors", + "title_size": "16", + "title_align": "left", + "show_legend": true, + "legend_layout": "horizontal", + "legend_columns": [ + "avg", + "min", + "max", + "value", + "sum" + ], + "type": "timeseries", + "requests": [ + { + "formulas": [ + { + "alias": "Errors", + "formula": "query1" + } + ], + "queries": [ + { + "query": "sum:crdb_cloud.sql.failure.count{$Cluster}.as_rate().rollup(sum, 30)", + "data_source": "metrics", + "name": "query1" + } + ], + 
"response_format": "timeseries", + "style": { + "palette": "dog_classic", + "line_type": "solid", + "line_width": "normal" + }, + "display_type": "line" + } + ] + }, + "layout": { + "x": 0, + "y": 15, + "width": 4, + "height": 3 + } + }, + { + "id": 6960575227640992, + "definition": { + "type": "note", + "content": "The number of statements which returned a planning, runtime, or client-side retry error.", + "background_color": "yellow", + "font_size": "14", + "text_align": "left", + "vertical_align": "center", + "show_tick": true, + "tick_pos": "50%", + "tick_edge": "left", + "has_padding": true + }, + "layout": { + "x": 4, + "y": 15, + "width": 2, + "height": 3 + } + }, + { + "id": 4131244288804984, + "definition": { + "title": "SQL Statement Contention", + "title_size": "16", + "title_align": "left", + "show_legend": true, + "legend_layout": "horizontal", + "legend_columns": [ + "avg", + "min", + "max", + "value", + "sum" + ], + "type": "timeseries", + "requests": [ + { + "formulas": [ + { + "alias": "Contention", + "formula": "default_zero(query1)" + } + ], + "queries": [ + { + "query": "avg:crdb_cloud.sql.distsql.contended.queries.count{$Cluster}.as_rate().rollup(sum, 30)", + "data_source": "metrics", + "name": "query1" + } + ], + "response_format": "timeseries", + "style": { + "palette": "dog_classic", + "line_type": "solid", + "line_width": "normal" + }, + "display_type": "line" + } + ] + }, + "layout": { + "x": 6, + "y": 15, + "width": 4, + "height": 3 + } + }, + { + "id": 4161385216355412, + "definition": { + "type": "note", + "content": "The total number of SQL statements that experienced contention.", + "background_color": "yellow", + "font_size": "14", + "text_align": "left", + "vertical_align": "center", + "show_tick": true, + "tick_pos": "50%", + "tick_edge": "left", + "has_padding": true + }, + "layout": { + "x": 10, + "y": 15, + "width": 2, + "height": 3 + } + }, + { + "id": 4360482651358524, + "definition": { + "title": "Full Scans", + "title_size": 
"16", + "title_align": "left", + "show_legend": true, + "legend_layout": "horizontal", + "legend_columns": [ + "avg", + "min", + "max", + "value", + "sum" + ], + "type": "timeseries", + "requests": [ + { + "formulas": [ + { + "alias": "Full Scans", + "formula": "default_zero(query1)" + } + ], + "queries": [ + { + "query": "sum:crdb_cloud.sql.full.scan.count{$Cluster}.as_rate().rollup(sum, 30)", + "data_source": "metrics", + "name": "query1" + } + ], + "response_format": "timeseries", + "style": { + "palette": "dog_classic", + "line_type": "solid", + "line_width": "normal" + }, + "display_type": "line" + } + ] + }, + "layout": { + "x": 0, + "y": 18, + "width": 4, + "height": 3 + } + }, + { + "id": 3884820359029682, + "definition": { + "type": "note", + "content": "The total number of full table/index scans.", + "background_color": "yellow", + "font_size": "14", + "text_align": "left", + "vertical_align": "center", + "show_tick": true, + "tick_pos": "50%", + "tick_edge": "left", + "has_padding": true + }, + "layout": { + "x": 4, + "y": 18, + "width": 2, + "height": 3 + } + }, + { + "id": 5555927670927358, + "definition": { + "title": "Schema Changes", + "title_size": "16", + "title_align": "left", + "show_legend": true, + "legend_layout": "horizontal", + "legend_columns": [ + "avg", + "min", + "max", + "value", + "sum" + ], + "type": "timeseries", + "requests": [ + { + "formulas": [ + { + "alias": "DDL Statements", + "formula": "query1" + } + ], + "queries": [ + { + "query": "sum:crdb_cloud.sql.ddl.count{$Cluster}.as_rate().rollup(sum, 30)", + "data_source": "metrics", + "name": "query1" + } + ], + "response_format": "timeseries", + "style": { + "palette": "dog_classic", + "line_type": "solid", + "line_width": "normal" + }, + "display_type": "line" + } + ] + }, + "layout": { + "x": 6, + "y": 18, + "width": 4, + "height": 3 + } + }, + { + "id": 1715322795282292, + "definition": { + "type": "note", + "content": "The total number of DDL statements per second.", + 
"background_color": "yellow", + "font_size": "14", + "text_align": "left", + "vertical_align": "center", + "show_tick": true, + "tick_pos": "50%", + "tick_edge": "left", + "has_padding": true + }, + "layout": { + "x": 10, + "y": 18, + "width": 2, + "height": 3 + } + } + ] + }, + "layout": { + "x": 0, + "y": 27, + "width": 12, + "height": 22 + } + }, + { + "id": 678470534448650, + "definition": { + "title": "Changefeeds", + "background_color": "blue", + "show_title": true, + "type": "group", + "layout_type": "ordered", + "widgets": [ + { + "id": 5203120089548500, + "definition": { + "title": "Changefeed Status", + "title_size": "16", + "title_align": "left", + "show_legend": true, + "legend_layout": "auto", + "legend_columns": [ + "avg", + "min", + "max", + "value", + "sum" + ], + "type": "timeseries", + "requests": [ + { + "formulas": [ + { + "alias": "Running", + "formula": "default_zero(query1)" + }, + { + "alias": "Failures", + "formula": "default_zero(query2)" + } + ], + "queries": [ + { + "query": "sum:crdb_cloud.changefeed.running{$Cluster}.as_count()", + "data_source": "metrics", + "name": "query1" + }, + { + "query": "sum:crdb_cloud.changefeed.failures{$Cluster}.as_count()", + "data_source": "metrics", + "name": "query2" + } + ], + "response_format": "timeseries", + "style": { + "palette": "dog_classic", + "line_type": "solid", + "line_width": "normal" + }, + "display_type": "line" + } + ] + }, + "layout": { + "x": 0, + "y": 0, + "width": 4, + "height": 3 + } + }, + { + "id": 6766355742123040, + "definition": { + "type": "note", + "content": "The number of all changefeeds by status - Running, Paused, and Failed. The currently running changefeeds includes sinkless.\n\nMonitor and alert on the Paused changefeeds metric to safeguard against an inadvertent operational error of leaving a changefeed job in a paused state for an extended period of time. 
Changefeed jobs should not be paused for a long time because the protected timestamp prevents garbage collection.\n\nThe Failed changefeeds metric tracks the permanent changefeed job failures that the jobs system will not try to restart. Any increase in this counter should be investigated. An alert on this metric is recommended.", + "background_color": "yellow", + "font_size": "14", + "text_align": "left", + "vertical_align": "center", + "show_tick": true, + "tick_pos": "50%", + "tick_edge": "left", + "has_padding": true + }, + "layout": { + "x": 4, + "y": 0, + "width": 2, + "height": 3 + } + }, + { + "id": 5591536992130460, + "definition": { + "title": "Emitted Messages", + "title_size": "16", + "title_align": "left", + "show_legend": true, + "legend_layout": "auto", + "legend_columns": [ + "avg", + "min", + "max", + "value", + "sum" + ], + "type": "timeseries", + "requests": [ + { + "formulas": [ + { + "alias": "Emitted Messages", + "formula": "query1" + } + ], + "queries": [ + { + "query": "sum:crdb_cloud.changefeed.emitted.messages{$Cluster}.as_count()", + "data_source": "metrics", + "name": "query1" + } + ], + "response_format": "timeseries", + "style": { + "palette": "dog_classic", + "line_type": "solid", + "line_width": "normal" + }, + "display_type": "line" + } + ] + }, + "layout": { + "x": 6, + "y": 0, + "width": 4, + "height": 3 + } + }, + { + "id": 1286682169563362, + "definition": { + "type": "note", + "content": "The total number of messages emitted by all changefeeds.\n\nThis metric characterizes the rate of changes being streamed from the CockroachDB cluster.", + "background_color": "yellow", + "font_size": "14", + "text_align": "left", + "vertical_align": "center", + "show_tick": true, + "tick_pos": "50%", + "tick_edge": "left", + "has_padding": true + }, + "layout": { + "x": 10, + "y": 0, + "width": 2, + "height": 3 + } + }, + { + "id": 1200226066770774, + "definition": { + "title": "Commit Latency", + "title_size": "16", + "title_align": "left", 
+ "show_legend": true, + "legend_layout": "auto", + "legend_columns": [ + "avg", + "min", + "max", + "value", + "sum" + ], + "type": "timeseries", + "requests": [ + { + "formulas": [ + { + "alias": "p99", + "number_format": { + "unit": { + "type": "canonical_unit", + "unit_name": "millisecond" + } + }, + "formula": "default_zero(query1) / 1000000" + } + ], + "queries": [ + { + "query": "max:crdb_cloud.changefeed.commit.latency{$Cluster}.rollup(sum, 30)", + "data_source": "metrics", + "name": "query1" + } + ], + "response_format": "timeseries", + "style": { + "palette": "dog_classic", + "line_type": "solid", + "line_width": "normal" + }, + "display_type": "line" + } + ] + }, + "layout": { + "x": 0, + "y": 3, + "width": 4, + "height": 3 + } + }, + { + "id": 166111290403902, + "definition": { + "type": "note", + "content": "The difference between the event MVCC timestamp and the time it was acknowledged by the downstream sink. If the sink batches events, then the difference between the oldest event in the batch and acknowledgement is recorded. 
Latency during backfill is excluded.\n\nThis metric characterizes the end-to-end lag between a committed change and that change applied at the destination.", + "background_color": "yellow", + "font_size": "14", + "text_align": "left", + "vertical_align": "center", + "show_tick": true, + "tick_pos": "50%", + "tick_edge": "left", + "has_padding": true + }, + "layout": { + "x": 4, + "y": 3, + "width": 2, + "height": 3 + } + }, + { + "id": 6109186366820708, + "definition": { + "title": "Retryable Errors", + "title_size": "16", + "title_align": "left", + "show_legend": true, + "legend_layout": "auto", + "legend_columns": [ + "avg", + "min", + "max", + "value", + "sum" + ], + "type": "timeseries", + "requests": [ + { + "formulas": [ + { + "alias": "Retryable Errors", + "formula": "query3" + } + ], + "queries": [ + { + "query": "sum:crdb_cloud.changefeed.error.retries{$Cluster}.as_count()", + "data_source": "metrics", + "name": "query3" + } + ], + "response_format": "timeseries", + "style": { + "palette": "dog_classic", + "line_type": "solid", + "line_width": "normal" + }, + "display_type": "line" + } + ] + }, + "layout": { + "x": 6, + "y": 3, + "width": 4, + "height": 3 + } + }, + { + "id": 2795017174881922, + "definition": { + "type": "note", + "content": "The total number of retryable errors encountered by all changefeeds.\n\nThis metric tracks transient changefeed errors. Alert on \"too many\" errors, such as 50 retries in 15 minutes. For example, during a rolling upgrade this counter will increase because the changefeed jobs will restart following node restarts. There is an exponential backoff, up to 10 minutes. 
But if there is no rolling upgrade in process or other cluster maintenance, and the error rate is high, investigate the changefeed job.", + "background_color": "yellow", + "font_size": "14", + "text_align": "left", + "vertical_align": "center", + "show_tick": true, + "tick_pos": "50%", + "tick_edge": "left", + "has_padding": true + }, + "layout": { + "x": 10, + "y": 3, + "width": 2, + "height": 3 + } + } + ] + }, + "layout": { + "x": 0, + "y": 49, + "width": 12, + "height": 7 + } + } + ], + "template_variables": [ + { + "name": "Cluster", + "prefix": "cluster", + "available_values": [], + "default": "*" + } + ], + "layout_type": "ordered", + "notify_list": [], + "reflow_type": "fixed" +} \ No newline at end of file diff --git a/cockroachdb_dedicated/manifest.json b/cockroachdb_dedicated/manifest.json index 5a4b601858..f15637f07c 100644 --- a/cockroachdb_dedicated/manifest.json +++ b/cockroachdb_dedicated/manifest.json @@ -9,7 +9,7 @@ "support": "README.md#Support", "changelog": "CHANGELOG.md", "description": "Send your Cockroach Cloud metrics to DataDog.", - "title": "CockroachDB Dedicated", + "title": "Cockroach Cloud", "media": [], "classifier_tags": [ "Category::Data Stores", @@ -32,8 +32,11 @@ "creates_events": false }, "metrics": { - "prefix": "crdb_dedicated.", - "check": "crdb_dedicated.sys.rss", + "prefix": "crdb_", + "check": [ + "crdb_dedicated.sys.uptime", + "crdb_cloud.sys.uptime" + ], "metadata_path": "metadata.csv" }, "service_checks": { @@ -43,7 +46,8 @@ "auto_install": true }, "dashboards": { - "cockroach_cloud_overview": "assets/dashboards/cockroach_cloud_overview.json" + "cockroach_cloud_dedicated_overview": "assets/dashboards/cockroach_cloud_dedicated_overview.json", + "cockroach_cloud_serverless_overview": "assets/dashboards/cockroach_cloud_serverless_overview.json" }, "logs": { "source": "cockroach-cloud" diff --git a/cockroachdb_dedicated/metadata.csv b/cockroachdb_dedicated/metadata.csv index c8bc7c7138..94821db2d1 100644 --- 
a/cockroachdb_dedicated/metadata.csv +++ b/cockroachdb_dedicated/metadata.csv @@ -237,3 +237,82 @@ crdb_dedicated.txn.restarts.serializable,count,,unit,,cockroachdb_dedicated,Numb crdb_dedicated.txn.restarts.writetooold,count,,unit,,cockroachdb_dedicated,Number of restarts due to a concurrent writer committing first,0,, crdb_dedicated.valbytes,gauge,,byte,,cockroachdb_dedicated,Number of bytes taken up by values. Shown as byte,0,, crdb_dedicated.valcount,gauge,,unit,,cockroachdb_dedicated,Count of all values,0,, +crdb_cloud.changefeed.backfill.count,gauge,,unit,,cockroachdb_dedicated,Number of changefeeds currently executing backfill. Shown as count.,0,, +crdb_cloud.changefeed.backfill.pending.ranges,gauge,,unit,,cockroachdb_dedicated,Number of ranges in an ongoing backfill that are yet to be fully emitted. Shown as count,0,, +crdb_cloud.changefeed.commit.latency,gauge,,unit,,cockroachdb_dedicated,"Event commit latency: a difference between event MVCC timestamp and the time it was acknowledged by the downstream sink. If the sink batches events, then the difference between the oldest event in the batch and acknowledgement is recorded. Excludes latency during backfill. Shown as nanoseconds.",0,, +crdb_cloud.changefeed.emitted.messages,count,,unit,,cockroachdb_dedicated,Messages emitted by all feeds. Shown as count.,0,, +crdb_cloud.changefeed.error.retries,count,,unit,,cockroachdb_dedicated,Total retryable errors encountered by all changefeeds. Shown as count.,0,, +crdb_cloud.changefeed.failures,count,,unit,,cockroachdb_dedicated,Total number of changefeed jobs which have failed. Shown as count.,0,, +crdb_cloud.changefeed.max.behind.nanos,gauge,,nanosecond,,cockroachdb_dedicated,Largest commit-to-emit duration of any running feed. Shown as nanoseconds.,0,, +crdb_cloud.changefeed.message.size.hist,gauge,,byte,,cockroachdb_dedicated,Histogram of message sizes for changefeeds. 
Shown as bytes.,0,, +crdb_cloud.changefeed.running,gauge,,unit,,cockroachdb_dedicated,"Number of currently running changefeeds, including sinkless. Shown as count.",0,, +crdb_cloud.clock.offset.meannanos,gauge,,nanosecond,,cockroachdb_dedicated,Mean clock offset with other nodes in nanoseconds. Shown as nanosecond,0,, +crdb_cloud.clock.offset.stddevnanos,gauge,,nanosecond,,cockroachdb_dedicated,Stddev clock offset with other nodes in nanoseconds. Shown as nanosecond,0,, +crdb_cloud.distsender.batches,count,,,,cockroachdb_dedicated,Number of batches processed,0,, +crdb_cloud.distsender.batches.partial,count,,,,cockroachdb_dedicated,Number of partial batches processed,0,, +crdb_cloud.distsender.errors.notleaseholder,count,,error,,cockroachdb_dedicated,Number of NotLeaseHolderErrors encountered. Shown as error,0,, +crdb_cloud.distsender.rpc.sent,count,,request,,cockroachdb_dedicated,Number of RPCs sent,0,, +crdb_cloud.distsender.rpc.sent.local,count,,request,,cockroachdb_dedicated,Number of local RPCs sent,0,, +crdb_cloud.distsender.rpc.sent.nextreplicaerror,count,,request,,cockroachdb_dedicated,Number of RPCs sent due to per-replica errors. Shown as error,0,, +crdb_cloud.jobs.changefeed.resume.retry.error,count,,unit,,cockroachdb_dedicated,Number of changefeed jobs which failed with a retriable error. Shown as count.,0,, +crdb_cloud.requests.slow.distsender,gauge,,request,,cockroachdb_dedicated,Number of requests that have been stuck for a long time in the dist sender. Shown as request,0,, +crdb_cloud.round_trip.latency,count,,nanosecond,,cockroachdb_dedicated,Distribution of round-trip latencies with other nodes in nanoseconds. Shown as nanosecond,0,, +crdb_cloud.sql.bytesin,count,,byte,,cockroachdb_dedicated,Number of sql bytes received. Shown as byte,0,, +crdb_cloud.sql.bytesout,count,,byte,,cockroachdb_dedicated,Number of sql bytes sent. 
Shown as byte,0,, +crdb_cloud.sql.conn.latency,count,,nanosecond,,cockroachdb_dedicated,Latency to establish and authenticate a SQL connection. Shown as nanoseconds.,0,, +crdb_cloud.sql.conns,gauge,,connection,,cockroachdb_dedicated,Number of active sql connections. Shown as connection,0,, +crdb_cloud.sql.ddl.count,count,,query,,cockroachdb_dedicated,Number of SQL DDL statements,0,, +crdb_cloud.sql.delete.count,count,,query,,cockroachdb_dedicated,Number of SQL DELETE statements,0,, +crdb_cloud.sql.distsql.contended.queries.count,count,,query,,cockroachdb_dedicated,Number of SQL queries that experienced contention. Shown as count.,0,, +crdb_cloud.sql.distsql.exec.latency,count,,nanosecond,,cockroachdb_dedicated,Latency in nanoseconds of DistSQL statement execution. Shown as nanosecond,0,, +crdb_cloud.sql.distsql.flows.active,gauge,,query,,cockroachdb_dedicated,Number of distributed SQL flows currently active,0,, +crdb_cloud.sql.distsql.flows.total,count,,query,,cockroachdb_dedicated,Number of distributed SQL flows executed,0,, +crdb_cloud.sql.distsql.queries.active,gauge,,query,,cockroachdb_dedicated,Number of distributed SQL queries currently active,0,, +crdb_cloud.sql.distsql.queries.total,count,,query,,cockroachdb_dedicated,Number of distributed SQL queries executed,0,, +crdb_cloud.sql.distsql.select.count,count,,unit,,cockroachdb_dedicated,Number of DistSQL SELECT statements,0,, +crdb_cloud.sql.distsql.service.latency,count,,nanosecond,,cockroachdb_dedicated,Latency in nanoseconds of DistSQL request execution. Shown as nanosecond,0,, +crdb_cloud.sql.exec.latency,count,,nanosecond,,cockroachdb_dedicated,Latency in nanoseconds of SQL statement execution. Shown as nanosecond,0,, +crdb_cloud.sql.failure.count,count,,unit,,cockroachdb_dedicated,Number of statements resulting in a planning or runtime error. Shown as count.,0,, +crdb_cloud.sql.full.scan.count,count,,unit,,cockroachdb_dedicated,Number of full table or index scans. 
Shown as count.,0,, +crdb_cloud.sql.insert.count,count,,unit,,cockroachdb_dedicated,Number of SQL INSERT statements,0,, +crdb_cloud.sql.mem.distsql.current,gauge,,unit,,cockroachdb_dedicated,Current sql statement memory usage for distsql,0,, +crdb_cloud.sql.mem.distsql.max,count,,unit,,cockroachdb_dedicated,Memory usage per sql statement for distsql,0,, +crdb_cloud.sql.mem.internal.session.current,gauge,,unit,,cockroachdb_dedicated,Current sql session memory usage for internal,0,, +crdb_cloud.sql.mem.internal.session.max,count,,unit,,cockroachdb_dedicated,Memory usage per sql session for internal,0,, +crdb_cloud.sql.mem.internal.txn.current,gauge,,unit,,cockroachdb_dedicated,Current sql transaction memory usage for internal,0,, +crdb_cloud.sql.mem.internal.txn.max,count,,unit,,cockroachdb_dedicated,Memory usage per sql transaction for internal,0,, +crdb_cloud.sql.misc.count,count,,query,,cockroachdb_dedicated,Number of other SQL statements,0,, +crdb_cloud.sql.new_conns.count,count,,connection,,cockroachdb_dedicated,Number of SQL connections created,0,, +crdb_cloud.sql.query.count,count,,query,,cockroachdb_dedicated,Number of SQL queries,0,, +crdb_cloud.sql.select.count,count,,query,,cockroachdb_dedicated,Number of SQL SELECT statements,0,, +crdb_cloud.sql.service.latency,count,,nanosecond,,cockroachdb_dedicated,Latency in nanoseconds of SQL request execution. Shown as nanosecond,0,, +crdb_cloud.sql.statements.active,gauge,,unit,,cockroachdb_dedicated,Number of currently active user SQL statements. Shown as count.,0,, +crdb_cloud.sql.txn.abort.count,count,,unit,,cockroachdb_dedicated,Number of SQL transaction ABORT statements,0,, +crdb_cloud.sql.txn.begin.count,count,,unit,,cockroachdb_dedicated,Number of SQL transaction BEGIN statements,0,, +crdb_cloud.sql.txn.commit.count,count,,unit,,cockroachdb_dedicated,Number of SQL transaction COMMIT statements,0,, +crdb_cloud.sql.txn.latency,count,,unit,,cockroachdb_dedicated,Latency of SQL transactions. 
Shown as nanoseconds.,0,, +crdb_cloud.sql.txn.rollback.count,count,,unit,,cockroachdb_dedicated,Number of SQL transaction ROLLBACK statements,0,, +crdb_cloud.sql.txns.open,gauge,,unit,,cockroachdb_dedicated,Number of currently open SQL transactions. Shown as count.,0,, +crdb_cloud.sql.update.count,count,,unit,,cockroachdb_dedicated,Number of SQL UPDATE statements,0,, +crdb_cloud.sys.uptime,gauge,,second,,cockroachdb_dedicated,Process uptime in seconds. Shown as second,0,, +crdb_cloud.txn.aborts,count,,unit,,cockroachdb_dedicated,Number of aborted KV transactions,0,, +crdb_cloud.txn.commits,count,,commit,,cockroachdb_dedicated,Number of committed KV transactions including 1PC,0,, +crdb_cloud.txn.commits1PC,count,,commit,,cockroachdb_dedicated,Number of committed one-phase KV transactions,0,, +crdb_cloud.txn.durations,count,,nanosecond,,cockroachdb_dedicated,KV transaction durations in nanoseconds,0,, +crdb_cloud.txn.restarts,count,,unit,,cockroachdb_dedicated,Number of restarted KV transactions,0,, +crdb_cloud.txn.restarts.serializable,count,,unit,,cockroachdb_dedicated,Number of restarts due to a forwarded commit timestamp and isolation=SERIALIZABLE,0,, +crdb_cloud.txn.restarts.writetooold,count,,unit,,cockroachdb_dedicated,Number of restarts due to a concurrent writer committing first,0,, +crdb_cloud.tenant.sql_usage.request_units,count,,unit,,cockroachdb_dedicated,Total RU consumption,0,, +crdb_cloud.tenant.sql_usage.kv_request_units,count,,unit,,cockroachdb_dedicated,RU consumption attributable to KV,0,, +crdb_cloud.tenant.sql_usage.read_batches,count,,unit,,cockroachdb_dedicated,Total number of KV read batches,0,, +crdb_cloud.tenant.sql_usage.read_requests,count,,unit,,cockroachdb_dedicated,Total number of KV read requests,0,, +crdb_cloud.tenant.sql_usage.read_bytes,count,,byte,,cockroachdb_dedicated,Total number of bytes read from KV,0,, +crdb_cloud.tenant.sql_usage.write_batches,count,,unit,,cockroachdb_dedicated,Total number of KV write batches,0,, 
+crdb_cloud.tenant.sql_usage.write_requests,count,,unit,,cockroachdb_dedicated,Total number of KV write requests,0,, +crdb_cloud.tenant.sql_usage.write_bytes,count,,byte,,cockroachdb_dedicated,Total number of bytes written to KV,0,, +crdb_cloud.tenant.sql_usage.sql_pods_cpu_seconds,count,,second,,cockroachdb_dedicated,Total amount of CPU used by SQL pods,0,, +crdb_cloud.tenant.sql_usage.pgwire_egress_bytes,count,,unit,,cockroachdb_dedicated,Total number of bytes transferred from a SQL pod to the client,0,, +crdb_cloud.tenant.sql_usage.external_io_ingress_bytes,count,,byte,,cockroachdb_dedicated,Total number of bytes read from external services such as cloud storage providers,0,, +crdb_cloud.tenant.sql_usage.external_io_egress_bytes,count,,byte,,cockroachdb_dedicated,Total number of bytes written to external services such as cloud storage providers,0,, +crdb_cloud.tenant.sql_usage.cross_region_network_ru,count,,unit,,cockroachdb_dedicated,Total number of RUs charged for cross-region network traffic,0,, +crdb_cloud.storage_bytes,count,,byte,,cockroachdb_dedicated,The amount of data being stored in the cluster. This is the logical number of live bytes and does not account for compression or replication.,0,, \ No newline at end of file