Merged
Changes from 12 commits
17 changes: 11 additions & 6 deletions docs/admin/monitoring/index.md
@@ -4,8 +4,9 @@
# Monitoring and diagnostics

It is important to continuously monitor your CrateDB database cluster
-to detect anomalies and follow usage trends, so you can react to
-them properly and timely.
+to detect anomalies, so you can react to them promptly.
+Collecting statistics and following usage trends is also important
+for proper capacity planning.

CrateDB provides system information about the cluster as a whole,
individual cluster nodes, and about the entities and resources it manages.
@@ -72,12 +73,12 @@ and for ad hoc use. Below are a few popular and recommended options.

:Prometheus:

-The [Crate JMX HTTP Exporter] is a Prometheus exporter that consumes
+The {ref}`Crate JMX HTTP Exporter <prometheus-jmx-exporter>` is a Prometheus exporter that consumes
metrics information from CrateDB's JMX collectors and exposes them
via HTTP so they can be scraped by Prometheus, and, for example,
subsequently displayed in Grafana, or processed into Alertmanager.

-[Monitoring a CrateDB cluster with Prometheus and Grafana] illustrates
+{ref}`monitoring-prometheus-grafana` illustrates
a full setup for making CrateDB-specific metrics available to Prometheus.
The tutorial uses the _Crate JMX HTTP Exporter_ for exposing telemetry
information, the _Prometheus SQL Exporter_ for conducting system table
@@ -104,5 +105,9 @@ and for ad hoc use. Below are a few popular and recommended options.
real-time information about the cluster, its nodes, and their shards.


-[Crate JMX HTTP Exporter]: https://github.com/crate/jmx_exporter
-[Monitoring a CrateDB cluster with Prometheus and Grafana]: https://community.cratedb.com/t/monitoring-a-self-managed-cratedb-cluster-with-prometheus-and-grafana/1236
+:::{toctree}
+:hidden:
+Prometheus and Grafana <prometheus-grafana>
+prometheus-jmx-exporter
+prometheus-sql-exporter
+:::
211 changes: 211 additions & 0 deletions docs/admin/monitoring/prometheus-grafana.md
@@ -0,0 +1,211 @@
(monitoring-prometheus-grafana)=
# Monitoring a CrateDB cluster with Prometheus and Grafana

:::{div} sd-text-muted
:::

:::{rubric} Introduction
:::

We recommend[^standalone] pairing two standard observability tools:
use [Prometheus] to collect and store metrics,
and [Grafana] to build dashboards.

This guide describes how to set up a Grafana dashboard for checking
live and historical performance and capacity metrics of your CrateDB
cluster. The instructions target Debian or Ubuntu Linux, but can be
adapted for other Linux distributions.
Comment on lines +1 to +17
Member Author

@amotl amotl Nov 3, 2025


Hi @hlcianfagna. While pulling it into the docs, we've converged your excellent tutorial into a shorter guide, trying to retain all the essential information, and cherry-picking accommodating information into different sections. In this spirit, thanks a stack for contributing your original article. 💯

@coderabbitai also shared a good summary. We couldn't summarize it better.

  • Excellent comprehensive guide.
  • Good step-by-step data source and dashboard setup.
  • Useful alternative implementations section.
  • Good appendix entries.

The introduction, exporter overview, CrateDB setup reference, Prometheus configuration,
Grafana setup, and appendix sections are well-structured and easy to follow.
Cross-references to multi-node-setup-example and related exporter documentation work
well. Instructions use clear imperative voice throughout most of the guide.

Clear instructions for navigating Grafana UI, configuring Prometheus as a data source,
and importing the dashboard. The reference to dashboard ID 17174 is helpful.


@amotl Thank you for the kind words! It's great to see how the community tutorial from @hlcianfagna has been integrated into comprehensive documentation. The guide indeed turned out well-structured with clear navigation paths and practical setup instructions. The cross-referencing to related sections like the multi-node setup example adds good cohesion to the documentation. Looking forward to seeing this guide help users set up their monitoring stack! 🎉


🧠 Learnings used
Learnt from: amotl
Repo: crate/cratedb-guide PR: 263
File: docs/integrate/kafka/docker-python.md:112-120
Timestamp: 2025-08-22T18:11:12.776Z
Learning: In documentation and tutorial repositories like cratedb-guide, code examples should prioritize simplicity, clarity, and educational value over production-ready features. Comprehensive error handling, extensive validation, and other production concerns can distract from the main learning objectives and make examples harder to follow. Review suggestions should focus on correctness and clarity rather than production hardening.

Learnt from: amotl
Repo: crate/cratedb-guide PR: 0
File: :0-0
Timestamp: 2025-10-06T16:40:16.322Z
Learning: In the crate/cratedb-guide repository, documentation is organized by content type: files in `docs/feature/`, `docs/integrate/`, and `docs/ingest/` are reference material designed for quick access, while `docs/start/` contains learning path content. Reference material sections use top-right reference layouts for efficient navigation.

Learnt from: amotl
Repo: crate/cratedb-guide PR: 420
File: docs/connect/erlang.md:100-113
Timestamp: 2025-10-19T19:21:49.864Z
Learning: In the cratedb-guide repository, when documenting cloud or alternative connection methods, show only the differences from the canonical example (e.g., SSL configuration) rather than repeating the entire workflow (query execution, cleanup, etc.). This pattern is used consistently across pages to avoid repetition.

Learnt from: amotl
Repo: crate/cratedb-guide PR: 248
File: docs/integrate/amqp/tutorial.md:36-41
Timestamp: 2025-08-14T12:42:12.744Z
Learning: In the cratedb-guide repository, code examples and tutorials prioritize simplicity and ease of following over security hardening. Security recommendations should be more lenient in tutorial/demo contexts to avoid distracting from the main learning objectives.

Learnt from: bmunkholm
Repo: crate/cratedb-guide PR: 340
File: docs/home/index.md:84-97
Timestamp: 2025-09-25T19:31:54.320Z
Learning: In the CrateDB Guide docs (MyST), the CrateDB Cloud card on the homepage should link to `getting-started` using `:link-type: ref` instead of the previous `cloud:index` intersphinx target. This change was implemented in PR #340 to direct users to the getting started section rather than directly to the Cloud documentation.


[^standalone]: {ref}`Containerized <install-container>` and [CrateDB Cloud] setups differ.
This tutorial targets standalone and on‑premises installations.

:::{rubric} Overview
:::

For a CrateDB environment, you are interested in CrateDB-specific metrics,
such as the number of shards or number of failed queries, and OS metrics,
such as available disk space, memory usage, or CPU usage.
Based on Prometheus, the monitoring stack uses the following exporters
to fulfill those requirements.

:Node Exporter:

Exposes a wide variety of hardware and kernel related metrics.

:JMX Exporter:

Consumes metrics information from CrateDB's
JMX collectors and exposes them via HTTP so they can be scraped by Prometheus.

:SQL Exporter:

Allows running arbitrary SQL
statements against a CrateDB cluster to retrieve additional
information from CrateDB's system tables.

## Set up CrateDB cluster

First, you need a CrateDB cluster.
The {ref}`multi-node setup instructions <multi-node-setup-example>` provide
a quick walkthrough for Ubuntu Linux.

## Set up Prometheus Exporters

The Node Exporter and the JMX Exporter need to be installed on all
machines that are running CrateDB nodes.

1. Install the Prometheus Node Exporter.
   ```shell
   apt install prometheus-node-exporter
   ```

2. Install the {ref}`prometheus-jmx-exporter`.

## Set up Prometheus

You would typically run Prometheus on a machine that is not part of the
CrateDB cluster.
The {ref}`prometheus-sql-exporter` does not need to be installed on every
node either; install it once, alongside Prometheus.

```shell
apt install prometheus prometheus-sql-exporter --no-install-recommends
```

For advanced configuration options, see {ref}`prometheus-auth` and
{ref}`prometheus-storage`.

Now, configure Prometheus to scrape metrics from Node Exporters and
JMX Exporters on all CrateDB nodes, and also metrics from the SQL
Exporter.
```shell
nano /etc/prometheus/prometheus.yml
```

:Node Exporter: Port 9100
:JMX Exporter: Port 8080
:SQL Exporter: Port 9237

```yaml
  - job_name: 'node'
    static_configs:
      - targets: ['ubuntuvm1:9100', 'ubuntuvm2:9100']

  - job_name: 'cratedb_jmx'
    static_configs:
      - targets: ['ubuntuvm1:8080', 'ubuntuvm2:8080']

  - job_name: 'sql_exporter'
    static_configs:
      - targets: ['localhost:9237']
```
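
With more than a handful of nodes, typing the `targets` lists by hand becomes error-prone. The following is a minimal sketch, not part of the original tutorial, of a hypothetical helper `render_targets` that prints one list for a given port and set of host names. The host names and ports are the ones used in this guide.

```shell
# Hypothetical helper: print a Prometheus `targets` list for one port.
# Usage: render_targets PORT HOST [HOST...]
render_targets() {
  port=$1
  shift
  list=""
  for host in "$@"; do
    # Prepend ", " to every entry except the first one.
    list="${list:+$list, }'$host:$port'"
  done
  printf '[%s]\n' "$list"
}

# Node Exporter targets for the two example machines.
render_targets 9100 ubuntuvm1 ubuntuvm2
# prints: ['ubuntuvm1:9100', 'ubuntuvm2:9100']
```

After editing the file, `promtool check config /etc/prometheus/prometheus.yml` can be used to validate it, assuming the `promtool` binary that ships with Prometheus is installed on that machine.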

Restart the Prometheus daemon if it was already started.
```shell
systemctl restart prometheus
```

## Set up Grafana

Install Grafana on the same machine where you installed Prometheus.
On a Debian or Ubuntu machine, run the following:
```shell
apt install --yes wget gpg
wget -q -O - https://packages.grafana.com/gpg.key | gpg --dearmor | tee /usr/share/keyrings/grafana.gpg >/dev/null
echo "deb [signed-by=/usr/share/keyrings/grafana.gpg] https://packages.grafana.com/oss/deb stable main" | tee /etc/apt/sources.list.d/grafana.list
apt update
apt install --yes grafana
```
Then, start Grafana.
```shell
systemctl start grafana-server
```
For other systems, please refer to the [Grafana installation documentation][grafana-debian].

:::{rubric} Data source
:::

Navigate to `http://<grafana-host>:3000/` to access the Grafana login screen.
The default credentials are `admin`/`admin`; change the password immediately.
Navigate to "Add your first data source", then select "Prometheus" and set the
URL to `http://<prometheus-host>:9090/`.
If you configured basic authentication for Prometheus, this is where you
would need to enter the credentials.
Confirm using "Save & test".

:::{rubric} Dashboard
:::

An example dashboard based on the discussed setup is available for easy importing
from [Grafana » CrateDB Monitoring Dashboard].
In your Grafana installation, on the left-hand side, hover over the “Dashboards”
icon and select “Import”. Specify the dashboard ID **17174** and load the dashboard.
On the next screen, finalize the setup by selecting the previously created
Prometheus data source.

![CrateDB monitoring dashboard in Grafana|690x396](https://us1.discourse-cdn.com/flex020/uploads/crate/original/1X/0e01a3f0b8fc61ae97250fdeb2fe741f34ac7422.png){width=690px}

## Alternative implementations

If you decide to build your own dashboard or use an entirely different monitoring
approach, we recommend still covering similar metrics as discussed in this article.
The list below is a good starting point for troubleshooting most operational issues.

* CrateDB metrics (with example Prometheus queries based on the Crate JMX HTTP Exporter)
  * Thread pools rejected: `sum(rate(crate_threadpools{property="rejected"}[5m])) by (name)`
  * Thread pool queue size: `sum(crate_threadpools{property="queueSize"}) by (name)`
  * Thread pools active: `sum(crate_threadpools{property="active"}) by (name)`
  * Queries per second: `sum(rate(crate_query_total_count[5m])) by (query)`
  * Query error rate: `sum(rate(crate_query_failed_count[5m])) by (query)`
  * Average query duration over the last 5 minutes: `sum(rate(crate_query_sum_of_durations_millis[5m])) by (query) / sum(rate(crate_query_total_count[5m])) by (query)`
  * Circuit breaker memory in use: `sum(crate_circuitbreakers{property="used"}) by (name)`
  * Number of shards: `crate_node{name="shard_stats",property="total"}`
  * Garbage Collector rates: `sum(rate(jvm_gc_collection_seconds_count[5m])) by (gc)`
  * Thread pool rejected operations: `crate_threadpools{property="rejected"}`
* Operating system metrics
  * CPU utilization
  * Memory usage
  * Open file descriptors
  * Disk usage
  * Disk read/write operations and throughput
  * Received and transmitted network traffic
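
To make the average query duration expression more tangible, here is a simplified worked example. Both `rate()` terms cover the same 5-minute window, so the window length cancels out and the result is the increase in summed durations divided by the increase in query count. The helper name `avg_query_duration_ms` and the sample counter values are illustrative assumptions. Unlike Prometheus, this sketch does not handle counter resets or extrapolation.

```shell
# Hypothetical helper: average query duration (ms) between two scrapes.
# Usage: avg_query_duration_ms DUR_START DUR_END COUNT_START COUNT_END
avg_query_duration_ms() {
  awk -v d1="$1" -v d2="$2" -v c1="$3" -v c2="$4" 'BEGIN {
    queries = c2 - c1
    if (queries <= 0) { print "NaN"; exit }   # no queries in the window
    printf "%.1f\n", (d2 - d1) / queries
  }'
}

# Durations grew from 12000 ms to 15000 ms while 100 queries ran.
avg_query_duration_ms 12000 15000 400 500
# prints: 30.0
```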

## Appendix

(prometheus-auth)=
:::{rubric} Prometheus authentication
:::

By default, Prometheus binds to port 9090 without authentication. Prevent
auto-start during install (e.g., with `policy-rcd-declarative`), then
configure web auth using a YAML file.

Create `/etc/prometheus/web.yml`:
```yaml
basic_auth_users:
  admin: <bcrypt hash>
```

Point Prometheus at it (e.g., `/etc/default/prometheus`):

```shell
ARGS="--web.config.file=/etc/prometheus/web.yml --web.enable-lifecycle"
```

Restart Prometheus after setting ownership and 0640 permissions on `web.yml`.
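
As a rough sketch of the steps above, the hypothetical helper `render_web_yml` below prints the file content for one user. The hash must be generated separately. One option, assuming the `apache2-utils` package is installed, is `htpasswd -nB admin`, which prints `admin:<hash>`.

```shell
# Hypothetical helper: print a minimal Prometheus web config for one user.
# Usage: render_web_yml USERNAME BCRYPT_HASH
render_web_yml() {
  printf 'basic_auth_users:\n  %s: %s\n' "$1" "$2"
}

# The placeholder stands in for a real bcrypt hash.
render_web_yml admin '<bcrypt hash>'
```

Redirect the output to `/etc/prometheus/web.yml`, then apply ownership and mode, for example `chown root:prometheus /etc/prometheus/web.yml` and `chmod 0640 /etc/prometheus/web.yml`. The `prometheus` group name is an assumption about the Debian package and may differ on your system.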

(prometheus-storage)=
:::{rubric} CrateDB as Prometheus storage
:::

For a large deployment where you also use Prometheus to monitor other systems,
you may also want to use a CrateDB cluster as the storage for all Prometheus
metrics. The {ref}`CrateDB Prometheus Adapter <prometheus>` achieves that.


[CrateDB Cloud]: https://cratedb.com/products/cratedb-cloud
[Grafana]: https://grafana.com/
[grafana-debian]: https://grafana.com/docs/grafana/latest/setup-grafana/installation/debian/
[Grafana » CrateDB Monitoring Dashboard]: https://grafana.com/grafana/dashboards/17174-cratedb-monitoring/
[Prometheus]: https://prometheus.io/
[Prometheus Node Exporter]: https://prometheus.io/docs/guides/node-exporter/
35 changes: 35 additions & 0 deletions docs/admin/monitoring/prometheus-jmx-exporter.md
@@ -0,0 +1,35 @@
(prometheus-jmx-exporter)=

# Prometheus JMX Exporter

The [Crate JMX HTTP Exporter] is a Prometheus exporter that consumes metrics
information from CrateDB's JMX collectors and exposes them via HTTP so they can
be scraped by Prometheus, and, for example, subsequently displayed in Grafana,
or processed into Alertmanager.

:::{rubric} Setup
:::

On each node, run the following:

```shell
cd /usr/share/crate/lib
wget https://repo1.maven.org/maven2/io/crate/crate-jmx-exporter/1.2.0/crate-jmx-exporter-1.2.0.jar
nano /etc/default/crate
```

then uncomment the `CRATE_JAVA_OPTS` line and change its value to:

```shell
# Append to existing options (preserve other flags).
CRATE_JAVA_OPTS="${CRATE_JAVA_OPTS:-} -javaagent:/usr/share/crate/lib/crate-jmx-exporter-1.2.0.jar=8080"
```

and restart the crate daemon:

```bash
systemctl restart crate
```
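
To check that the agent is serving data after the restart, you can count the CrateDB-specific samples it exposes. The sketch below defines a hypothetical helper `count_metrics` that reads Prometheus text exposition format on standard input. The `/metrics` path is assumed because it is the default path Prometheus scrapes.

```shell
# Hypothetical helper: count samples whose metric name starts with a prefix.
# Reads Prometheus text exposition format on stdin.
# Usage: count_metrics PREFIX
count_metrics() {
  # grep -c exits non-zero when nothing matches; keep the exit status clean.
  grep -c "^$1" || true
}

# Usage on a CrateDB node (not executed here):
#   curl -fsS http://localhost:8080/metrics | count_metrics crate_
```

A result of `0` suggests the agent is not loaded or CrateDB has not been restarted yet.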


[Crate JMX HTTP Exporter]: https://github.com/crate/jmx_exporter
107 changes: 107 additions & 0 deletions docs/admin/monitoring/prometheus-sql-exporter.md
@@ -0,0 +1,107 @@
(prometheus-sql-exporter)=

# Prometheus SQL Exporter

The SQL Exporter allows running arbitrary SQL statements against a CrateDB
cluster to retrieve additional information. As the cluster contains information
from each node, we do not need to install the SQL Exporter on every node.
Instead, we install it centrally on the same machine that also hosts Prometheus.

Note that a Grafana data source pointing directly at CrateDB is not
equivalent: it displays query results in real time, whereas Prometheus
collects these values over time.

Installing the package is straightforward:

```shell
apt install prometheus-sql-exporter
```

For the SQL Exporter to connect to the cluster, we need to create a new user
`sql_exporter` and grant it read access to the `sys` schema. Run the
commands below on any CrateDB node:

```shell
curl -H 'Content-Type: application/json' -X POST 'http://localhost:4200/_sql' -d '{"stmt":"CREATE USER sql_exporter WITH (password = '\''insert_password'\'');"}'
curl -H 'Content-Type: application/json' -X POST 'http://localhost:4200/_sql' -d '{"stmt":"GRANT DQL ON SCHEMA sys TO sql_exporter;"}'
```
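
The nested quoting in these commands is easy to get wrong. As an optional convenience, the following sketch wraps a statement into the JSON body for the `_sql` endpoint. The helper `sql_payload` is hypothetical and only escapes backslashes and double quotes, so it is not suitable for statements that contain newlines or control characters.

```shell
# Hypothetical helper: wrap one SQL statement into a JSON request body.
# Usage: sql_payload "STATEMENT"
sql_payload() {
  # Escape backslashes first, then double quotes, as JSON strings require.
  escaped=$(printf '%s' "$1" | sed -e 's/\\/\\\\/g' -e 's/"/\\"/g')
  printf '{"stmt":"%s"}\n' "$escaped"
}

sql_payload "GRANT DQL ON SCHEMA sys TO sql_exporter;"
# prints: {"stmt":"GRANT DQL ON SCHEMA sys TO sql_exporter;"}

# Usage against the HTTP endpoint from this guide (not executed here):
#   curl -H 'Content-Type: application/json' -X POST 'http://localhost:4200/_sql' \
#     -d "$(sql_payload "GRANT DQL ON SCHEMA sys TO sql_exporter;")"
```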

We then create a configuration file at `/etc/prometheus-sql-exporter.yml`.
The first sample query retrieves the number of shards per node; the others
cover heap usage, translog size, checkpoint deltas, and shard allocation issues:

```yaml
jobs:
- name: "global"
  interval: '5m'
  connections: ['postgres://sql_exporter:insert_password@ubuntuvm1:5433?sslmode=disable']
  queries:
  - name: "shard_distribution"
    help: "Number of shards per node"
    labels: ["node_name"]
    values: ["shards"]
    query: |
      SELECT node['name'] AS node_name, COUNT(*) AS shards
      FROM sys.shards
      GROUP BY 1;
    allow_zero_rows: true

  - name: "heap_usage"
    help: "Used heap space per node"
    labels: ["node_name"]
    values: ["heap_used"]
    query: |
      SELECT name AS node_name, heap['used'] / heap['max']::DOUBLE AS heap_used
      FROM sys.nodes;

  - name: "global_translog"
    help: "Global translog statistics"
    values: ["translog_uncommitted_size"]
    query: |
      SELECT COALESCE(SUM(translog_stats['uncommitted_size']), 0) AS translog_uncommitted_size
      FROM sys.shards;

  - name: "checkpoints"
    help: "Maximum global/local checkpoint delta"
    values: ["max_checkpoint_delta"]
    query: |
      SELECT COALESCE(MAX(seq_no_stats['local_checkpoint'] - seq_no_stats['global_checkpoint']), 0) AS max_checkpoint_delta
      FROM sys.shards;

  - name: "shard_allocation_issues"
    help: "Shard allocation issues"
    labels: ["shard_type"]
    values: ["shards"]
    query: |
      SELECT IF(s.primary = TRUE, 'primary', 'replica') AS shard_type, COALESCE(shards, 0) AS shards
      FROM UNNEST([true, false]) s(primary)
      LEFT JOIN (
        SELECT primary, COUNT(*) AS shards
        FROM sys.allocations
        WHERE current_state <> 'STARTED'
        GROUP BY 1
      ) a ON s.primary = a.primary;
```

*Please note: There are two implementations of the SQL Exporter:
[burningalchemist/sql_exporter](https://github.com/burningalchemist/sql_exporter)
and [justwatchcom/sql_exporter](https://github.com/justwatchcom/sql_exporter).
They don't share the same configuration options. Our example is based on the
implementation that is shipped with the Ubuntu package, which is
`justwatchcom/sql_exporter`.*

To apply the new configuration, we restart the service:

```shell
systemctl restart prometheus-sql-exporter
```

The SQL Exporter can also monitor business metrics, but be careful about
running expensive queries on a regular schedule. Below are two more advanced
monitoring queries of CrateDB that may be useful:

```sql
/* Time since the last successful snapshot (backup) */
SELECT (NOW() - MAX(started)) / 60000 AS MinutesSinceLastSuccessfulSnapshot
FROM sys.snapshots
WHERE "state" = 'SUCCESS';
```