This project is a proof of concept (PoC) of a Prometheus metrics exporter for Apache Cassandra. Since Cassandra started exposing metrics through virtual tables that are available via CQL, a Prometheus exporter built on them seemed reasonable. Implemented as a sidecar service, cql-metrics-exporter connects to an Apache Cassandra node via localhost, queries its metrics via regular CQL, and exports them in Prometheus' text format at the http://localhost:9500/metrics endpoint.
Unpack any of the bundles or install the Debian package provided in the release.
Installing the Debian package also creates a system user for the service and installs a systemd service unit, which can be used to launch and control the application instance.
The project uses Typesafe config for configuration. The main configuration file is placed at /etc/application.conf and contains a commented basic configuration.
If the Cassandra node requires user authentication, create a dedicated tool user in Cassandra. When using Cassandra's PasswordAuthenticator and CassandraAuthorizer, follow the example below to set up the tool user:
cassandra@cqlsh> CREATE ROLE monitor WITH PASSWORD = 'secret' AND LOGIN = true AND SUPERUSER = false;
cassandra@cqlsh> GRANT SELECT PERMISSION ON KEYSPACE system_virtual_schema TO monitor;
cassandra@cqlsh> GRANT SELECT PERMISSION ON KEYSPACE system_views TO monitor;
Then follow the example in /etc/application.conf to set up the application authentication:
datastax-java-driver.advanced.auth-provider {
class = PlainTextAuthProvider
username = monitor
password = secret
}
For more detailed configuration parameters, refer to reference.conf.
The service can be started in an unprivileged user context with:
bin/cql-metrics-collector
On Debian deployments, start the cql-metrics-collector.service unit with systemctl instead.
Metrics can be collected from the HTTP /metrics endpoint, which is available on port 9500 by default.
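To check the endpoint by hand, the text format can be fetched and parsed with a few lines of Python. The following is a minimal sketch that uses only the standard library, not part of the project: the helper names parse_metrics and scrape are hypothetical, the sample text is made up for illustration, and the parser covers only simple sample lines (it does not unescape label values or handle timestamps specially).

```python
import re
from urllib.request import urlopen

# Matches sample lines such as: metric_name{label="value",...} 1.0
SAMPLE_RE = re.compile(r'^([a-zA-Z_:][a-zA-Z0-9_:]*)(?:\{(.*)\})?\s+(\S+)')
# Matches a single label="value" pair; escaped characters are kept as-is.
LABEL_RE = re.compile(r'([a-zA-Z_][a-zA-Z0-9_]*)="((?:[^"\\]|\\.)*)"')


def parse_metrics(text):
    """Parse Prometheus text format into (name, labels, value) tuples."""
    samples = []
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):  # skip blank, HELP and TYPE lines
            continue
        match = SAMPLE_RE.match(line)
        if match is None:
            continue
        name, raw_labels, value = match.groups()
        labels = dict(LABEL_RE.findall(raw_labels or ""))
        samples.append((name, labels, float(value)))
    return samples


def scrape(url="http://localhost:9500/metrics", timeout=5):
    """Fetch the exporter endpoint and return the parsed samples."""
    with urlopen(url, timeout=timeout) as response:
        return parse_metrics(response.read().decode("utf-8"))


# Hypothetical exporter output, for illustration only.
SAMPLE = '''# TYPE cassandra_thread_pools gauge
cassandra_thread_pools{cluster="test",name="ReadStage",metric="pending_tasks"} 3.0
'''

parsed = parse_metrics(SAMPLE)
```

Calling scrape() against a running exporter returns the same tuple structure as parse_metrics(SAMPLE), which makes it easy to filter samples by metric name or label in a quick smoke test.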
Metrics collected by a TSDB such as VictoriaMetrics can be visualized with a tool such as Grafana. While you are free to create your own metrics-based visualizations, a few pre-defined dashboards are available in the dashboards folder. These are part of the release, packaged in dashboards.tar.gz, and can be imported into any Grafana instance.
Every exported metric carries a set of labels, merged from a common set of labels and a set of individual labels.
The common set of labels contains:
- cluster - cluster name as configured for Cassandra
- dc - datacenter of the current node as reported by Cassandra's snitch
- rack - rack of the current node as reported by Cassandra's snitch
- node - resolved host name, IP address and port the Cassandra node is listening on
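The merge of common and individual labels can be pictured with a short Python sketch. It is an illustration only, not the exporter's code: the helper names merge_labels and render_sample are hypothetical, all label values are made up, and letting an individual label override a common label of the same name is an assumption.

```python
def merge_labels(common, individual):
    """Combine the common labels with the labels of one metric.

    Assumption: on a name collision the individual label wins.
    """
    merged = dict(common)
    merged.update(individual)
    return merged


def escape(value):
    """Escape a label value for the Prometheus text format."""
    return str(value).replace("\\", "\\\\").replace('"', '\\"').replace("\n", "\\n")


def render_sample(name, labels, value):
    """Render one sample line in the Prometheus text format."""
    body = ",".join(f'{key}="{escape(val)}"' for key, val in labels.items())
    return f"{name}{{{body}}} {value}"


# Hypothetical label values, for illustration only.
common = {"cluster": "test", "dc": "dc1", "rack": "rack1", "node": "host-1"}
individual = {"keyspace": "ks", "table": "t"}

line = render_sample("cassandra_disk_usage", merge_labels(common, individual), 1024)
# line == 'cassandra_disk_usage{cluster="test",dc="dc1",rack="rack1",node="host-1",keyspace="ks",table="t"} 1024'
```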
Currently, only a few metrics are supported. The following virtual tables are accessed and exported as listed:
- disk_usage, max_partition_size, max_sstable_size
  - cassandra_<basename> (gauge) - labeled with keyspace and table
- thread_pools
  - cassandra_thread_pools (gauge) - labeled with name of the thread pool and metric referring to one of active_tasks, active_tasks_limit, blocked_tasks, blocked_tasks_all_time or pending_tasks
  - cassandra_completed_tasks_counter - labeled with name as above and metric completed_tasks
- caches
  - cassandra_system_caches (gauge) - labeled with name of the system cache and metric referring to one of capacity_bytes, hit_ratio, recent_hit_rate_per_second, recent_request_rate_per_second or size_bytes
  - cassandra_system_cache_counter - labeled with name as above and metric referring to one of entry_count, hit_count or request_count
- coordinator_read_latency, coordinator_scan_latency, coordinator_write_latency, local_read_latency, local_scan_latency, local_write_latency
  - all of the above latency tables are exported using four metric names:
    - cassandra_<basename>_count - exporting the count field of the base table
    - cassandra_<basename>_max - exporting the max latency in milliseconds
    - cassandra_<basename>_buckets - exporting the p50th and p99th buckets in milliseconds
    - cassandra_<basename>_rate - exporting the request rate per second
  - all metrics are labeled with keyspace and table referring to the subject of the metrics
  - the buckets are additionally labeled with quantile
- rows_per_read, tombstones_per_read
  - cassandra_<basename> - labeled with keyspace, table and metric referring to one of max, p50th and p99th
  - cassandra_<basename>_count - labeled with keyspace, table and metric referring to reads
- batch_metrics
  - cassandra_batch_metrics - labeled with statement and metric="max"
  - cassandra_batch_metrics_summary - labeled with statement and quantile
- cql_metrics
  - cassandra_cql_metrics - labeled with metric for the actual metric name
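As an illustration of the latency tables listed above, the following Python sketch maps one row of such a table to the four metric names. It is not the exporter's implementation: the function name latency_samples is hypothetical, the column names (keyspace_name, table_name, count, max_ms, p50th_ms, p99th_ms, per_second) are assumed from Cassandra's system_views latency tables, the quantile label values are an assumption, and the row values are made up.

```python
def latency_samples(basename, row):
    """Map one latency table row to (metric name, labels, value) tuples.

    Assumptions: the column names and the quantile label values are
    illustrative and may differ from what the exporter actually uses.
    """
    labels = {"keyspace": row["keyspace_name"], "table": row["table_name"]}
    prefix = f"cassandra_{basename}"
    return [
        (f"{prefix}_count", labels, row["count"]),
        (f"{prefix}_max", labels, row["max_ms"]),
        # The two buckets share one metric name and differ by quantile label.
        (f"{prefix}_buckets", {**labels, "quantile": "p50th"}, row["p50th_ms"]),
        (f"{prefix}_buckets", {**labels, "quantile": "p99th"}, row["p99th_ms"]),
        (f"{prefix}_rate", labels, row["per_second"]),
    ]


# Hypothetical row, for illustration only.
row = {
    "keyspace_name": "ks",
    "table_name": "t",
    "count": 120,
    "max_ms": 8.5,
    "p50th_ms": 0.4,
    "p99th_ms": 3.2,
    "per_second": 1.5,
}

samples = latency_samples("local_read_latency", row)
```

One row therefore yields five samples under four metric names, since the buckets metric is emitted once per quantile.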