Merge pull request #2139 from harrykimpel/main
Adding MariaDB dashboard and alert conditions
Showing 22 changed files with 4,163 additions and 548 deletions.
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -15,4 +15,6 @@ snapshots/ | |
|
||
# yarn | ||
yarn.lock | ||
.yarn-integrity | ||
.yarn-integrity | ||
yarn-error.log | ||
utils/yarn-error.log |
27 changes: 27 additions & 0 deletions
alert-policies/mariadb/innodb-pending-reads-and-writes.yml
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,27 @@ | ||
name: InnoDB Pending Reads and Writes | ||
|
||
description: |+ | ||
This alert is triggered when the combined number of pending InnoDB reads and writes is greater than 2 for 5 minutes, which indicates that the database engine is backlogged and waiting on resources. | ||
type: STATIC | ||
nrql: | ||
query: "FROM MysqlSample SELECT max(db.innodb.dataPendingReads) + max(db.innodb.dataPendingWrites) FACET displayName" | ||
|
||
# Function used to aggregate the NRQL query value(s) for comparison to the terms.threshold (Default: SINGLE_VALUE) | ||
valueFunction: SINGLE_VALUE | ||
|
||
# List of Critical and Warning thresholds for the condition | ||
terms: | ||
- priority: CRITICAL | ||
# Operator used to compare against the threshold. | ||
operator: ABOVE | ||
# Value that triggers a violation | ||
threshold: 2 | ||
# Time in seconds; 120 - 3600 | ||
thresholdDuration: 300 | ||
# How many data points must be in violation for the duration | ||
thresholdOccurrences: ALL | ||
|
||
# Duration after which a violation automatically closes | ||
# Time in seconds; 300 - 2592000 (Default: 86400 [1 day]) | ||
violationTimeLimitSeconds: 86400 |
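
The interaction between `operator`, `threshold`, and `thresholdOccurrences` in the condition above can be sketched in a few lines of Python. This is a simplified illustration, not New Relic's evaluation engine: the `violates` helper and the sample data points are hypothetical, and it assumes that `ABOVE` is a strict greater-than comparison applied to each aggregated data point in the `thresholdDuration` window.

```python
from typing import Sequence


def violates(values: Sequence[float], threshold: float, occurrences: str) -> bool:
    """Return True if a window of data points breaches an ABOVE threshold.

    `values` holds one aggregated data point per evaluation interval inside
    the thresholdDuration window (hypothetical sample data, not real metrics).
    """
    if not values:
        return False  # no signal in the window, so nothing to compare
    breaches = [v > threshold for v in values]  # assumed: ABOVE is strict
    if occurrences == "ALL":
        return all(breaches)
    if occurrences == "AT_LEAST_ONCE":
        return any(breaches)
    raise ValueError(f"unsupported thresholdOccurrences: {occurrences}")


# Five one-minute points of max(pending reads) + max(pending writes).
backlog = [3, 4, 3, 5, 3]
print(violates(backlog, 2, "ALL"))          # every point is above 2
print(violates([3, 4, 2, 5, 3], 2, "ALL"))  # one point equals 2, so ALL fails
```

With `thresholdOccurrences: ALL`, a single data point at or below the threshold resets the condition, which is why this setting suits sustained backlogs rather than brief spikes.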
29 changes: 29 additions & 0 deletions
alert-policies/mariadb/max-connection-errors-per-second.yml
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,29 @@ | ||
name: Max Connection Errors per Second | ||
|
||
description: |+ | ||
This alert is triggered when the rate of connection errors caused by the max_connections limit rises above 1 per second at least once in a 5 minute window, which indicates you have requests to your MariaDB instance that are failing to connect. | ||
This setting's default is 151, but can vary based on the underlying resources available to your instance. You can review your current max_connections limit with this query: | ||
SHOW VARIABLES LIKE 'max_connections'; | ||
type: STATIC | ||
nrql: | ||
query: "FROM MysqlSample SELECT max(net.connectionErrorsMaxConnectionsPerSecond) FACET displayName" | ||
|
||
# Function used to aggregate the NRQL query value(s) for comparison to the terms.threshold (Default: SINGLE_VALUE) | ||
valueFunction: SINGLE_VALUE | ||
|
||
# List of Critical and Warning thresholds for the condition | ||
terms: | ||
- priority: CRITICAL | ||
# Operator used to compare against the threshold. | ||
operator: ABOVE | ||
# Value that triggers a violation | ||
threshold: 1 | ||
# Time in seconds; 120 - 3600 | ||
thresholdDuration: 300 | ||
# How many data points must be in violation for the duration | ||
thresholdOccurrences: AT_LEAST_ONCE | ||
|
||
# Duration after which a violation automatically closes | ||
# Time in seconds; 300 - 2592000 (Default: 86400 [1 day]) | ||
violationTimeLimitSeconds: 86400 |
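
Because the query selects a per-second rate and the term uses `operator: ABOVE` with `threshold: 1`, an isolated connection error does not necessarily trip this condition. The sketch below illustrates the arithmetic under two assumptions that are not stated in the file: that the metric is derived from a cumulative counter, and that the counter is sampled every 30 seconds. The `per_second_rate` helper and the readings are hypothetical.

```python
def per_second_rate(previous_count: int, current_count: int, interval_seconds: float) -> float:
    """Convert two readings of a cumulative counter into a per-second rate."""
    if interval_seconds <= 0:
        raise ValueError("interval_seconds must be positive")
    return (current_count - previous_count) / interval_seconds


# Hypothetical readings taken 30 seconds apart (assumed sample interval).
single_error = per_second_rate(100, 101, 30)  # one refused connection
burst = per_second_rate(100, 160, 30)         # sixty refused connections

print(single_error > 1)  # False: a lone error stays far below 1 per second
print(burst > 1)         # True: 2 errors per second exceeds the threshold
```

If the intent is to alert on any refused connection, a threshold of 0 would be the closer match under these assumptions.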
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,59 @@ | ||
name: Questions per Second | ||
|
||
description: |+ | ||
This alert is triggered when the current rate of Questions stays more than 2 standard deviations above the baseline for 120 seconds, which could be an early indicator of a saturation problem for your instance. | ||
It is important to note that this alert is disabled by default and you need to edit the configuration in New Relic One to add a targeted MySQL instance: | ||
"WHERE displayName = 'MySql Instance Name'" | ||
This allows the baseline to be calculated against a single instance instead of all running MySQL instances being monitored. | ||
type: BASELINE | ||
nrql: | ||
# Cannot use FACET in Baseline alerts | ||
query: "FROM MysqlSample SELECT average(query.questionsPerSecond)" | ||
|
||
# Direction in which baseline is set (Default: LOWER_ONLY) | ||
baselineDirection: UPPER_ONLY | ||
|
||
# List of Critical and Warning thresholds for the condition | ||
terms: | ||
- priority: CRITICAL | ||
# Operator used to compare against the threshold. | ||
operator: ABOVE | ||
# Value that triggers a violation | ||
threshold: 2 | ||
# Time in seconds; 120 - 3600, must be a multiple of 60 for Baseline conditions | ||
thresholdDuration: 120 | ||
# How many data points must be in violation for the duration | ||
thresholdOccurrences: ALL | ||
|
||
# Adding a Warning threshold is optional | ||
- priority: WARNING | ||
operator: ABOVE | ||
threshold: 1 | ||
thresholdDuration: 300 | ||
thresholdOccurrences: ALL | ||
|
||
# Loss of Signal Settings | ||
expiration: | ||
# Close open violations if signal is lost (Default: false) | ||
closeViolationsOnExpiration: false | ||
# Open "Loss of Signal" violation if signal is lost (Default: false) | ||
openViolationOnExpiration: false | ||
# Time in seconds; Max value: 172800 (48hrs), null if closeViolationsOnExpiration and openViolationOnExpiration are both 'false' | ||
expirationDuration: | ||
|
||
# Advanced Signal Settings | ||
signal: | ||
# Max Value for Baseline conditions = 20 | ||
evaluationOffset: 3 | ||
# Type of value that should be used to fill gaps | ||
fillOption: NONE | ||
# Integer; Used in conjunction with STATIC fillOption, otherwise null | ||
fillValue: | ||
|
||
# OPTIONAL: URL of runbook to be sent with notification | ||
runbookUrl: | ||
|
||
# Duration after which a violation automatically closes | ||
# Time in seconds; 300 - 2592000 (Default: 86400 [1 day]) | ||
violationTimeLimitSeconds: 86400 |
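
For a baseline condition, `threshold` is expressed in standard deviations rather than in the metric's own units. The following sketch shows one way to read the CRITICAL (2) and WARNING (1) terms with `baselineDirection: UPPER_ONLY`. It is a deliberate simplification: it uses the mean and population standard deviation of a short hypothetical history as the baseline, whereas the real baseline is a prediction computed by New Relic, and it ignores the `thresholdDuration` windows. The function names and data are assumptions made for illustration.

```python
from statistics import mean, pstdev
from typing import Optional, Sequence


def deviations_above_baseline(history: Sequence[float], current: float) -> float:
    """Return how many standard deviations `current` sits above the mean of `history`."""
    sigma = pstdev(history)
    if sigma == 0:
        raise ValueError("history has no variation, so deviations are undefined")
    return (current - mean(history)) / sigma


def severity(deviations: float) -> Optional[str]:
    """Map a deviation count onto the two terms (UPPER_ONLY, so only positive values count)."""
    if deviations > 2:
        return "CRITICAL"
    if deviations > 1:
        return "WARNING"
    return None


# Hypothetical questions-per-second history with mean 100.
history = [100, 110, 90, 105, 95]
print(severity(deviations_above_baseline(history, 120)))  # CRITICAL
print(severity(deviations_above_baseline(history, 110)))  # WARNING
print(severity(deviations_above_baseline(history, 80)))   # None
```

A drop to 80 produces no violation here because only upward deviations are considered with `UPPER_ONLY`.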
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,29 @@ | ||
name: Slow Queries per Second | ||
|
||
description: |+ | ||
This alert is triggered when the number of slow queries per second is greater than 5 for 5 minutes, which could indicate capacity issues or a query that has been changed and is experiencing performance issues. | ||
The Slow_queries counter increments based on your settings applied to MySQL's long_query_time parameter (default 10s), which you can review with this query: | ||
SHOW VARIABLES LIKE 'long_query_time'; | ||
type: STATIC | ||
nrql: | ||
query: "FROM MysqlSample SELECT average(query.slowQueriesPerSecond) FACET displayName" | ||
|
||
# Function used to aggregate the NRQL query value(s) for comparison to the terms.threshold (Default: SINGLE_VALUE) | ||
valueFunction: SINGLE_VALUE | ||
|
||
# List of Critical and Warning thresholds for the condition | ||
terms: | ||
- priority: CRITICAL | ||
# Operator used to compare against the threshold. | ||
operator: ABOVE | ||
# Value that triggers a violation | ||
threshold: 5 | ||
# Time in seconds; 120 - 3600 | ||
thresholdDuration: 300 | ||
# How many data points must be in violation for the duration | ||
thresholdOccurrences: ALL | ||
|
||
# Duration after which a violation automatically closes | ||
# Time in seconds; 300 - 2592000 (Default: 86400 [1 day]) | ||
violationTimeLimitSeconds: 86400 |
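
The comments in these files state numeric limits for `thresholdDuration` (120 to 3600 seconds, and a multiple of 60 for baseline conditions) and `violationTimeLimitSeconds` (300 to 2592000 seconds). A small pre-commit check along the following lines could catch out-of-range values before a condition is submitted. This is a sketch only: the `check_condition` helper is hypothetical, it takes an already-parsed condition as a plain dictionary, and it encodes only the limits written in the comments above.

```python
from typing import Any, Dict, List


def check_condition(condition: Dict[str, Any]) -> List[str]:
    """Return a list of problems found in a parsed alert condition."""
    problems: List[str] = []
    is_baseline = condition.get("type") == "BASELINE"

    for term in condition.get("terms", []):
        duration = term["thresholdDuration"]
        label = term.get("priority", "term")
        if not 120 <= duration <= 3600:
            problems.append(f"{label}: thresholdDuration {duration} is outside 120-3600")
        elif is_baseline and duration % 60 != 0:
            problems.append(f"{label}: thresholdDuration {duration} is not a multiple of 60")

    # 86400 is the default named in the file comments.
    limit = condition.get("violationTimeLimitSeconds", 86400)
    if not 300 <= limit <= 2592000:
        problems.append(f"violationTimeLimitSeconds {limit} is outside 300-2592000")
    return problems


slow_queries = {
    "type": "STATIC",
    "terms": [{"priority": "CRITICAL", "threshold": 5, "thresholdDuration": 300}],
    "violationTimeLimitSeconds": 86400,
}
too_short = {
    "type": "BASELINE",
    "terms": [{"priority": "CRITICAL", "threshold": 2, "thresholdDuration": 60}],
}

print(check_condition(slow_queries))  # []
print(check_condition(too_short))
```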
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,27 @@ | ||
name: Blocked clients alert | ||
|
||
description: |+ | ||
This alert is triggered when at least one blocked client is reported continuously for 5 minutes. | ||
type: STATIC | ||
nrql: | ||
query: "SELECT sum(`net.blockedClients`) FROM RedisSample facet entityName" | ||
|
||
# Function used to aggregate the NRQL query value(s) for comparison to the terms.threshold (Default: SINGLE_VALUE) | ||
valueFunction: SINGLE_VALUE | ||
|
||
# List of Critical and Warning thresholds for the condition | ||
terms: | ||
- priority: CRITICAL | ||
# Operator used to compare against the threshold. | ||
operator: ABOVE | ||
# Value that triggers a violation | ||
threshold: 0 | ||
# Time in seconds; 120 - 3600 | ||
thresholdDuration: 300 | ||
# How many data points must be in violation for the duration | ||
thresholdOccurrences: ALL | ||
|
||
# Duration after which a violation automatically closes | ||
# Time in seconds; 300 - 2592000 (Default: 86400 [1 day]) | ||
violationTimeLimitSeconds: 86400 |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,62 @@ | ||
name: Anomalies in current connections | ||
|
||
# Description and details | ||
description: | | ||
This alert is triggered when the number of current connections deviates from the norm either up or down. | ||
# Type of alert: BASELINE | STATIC | ||
type: BASELINE | ||
|
||
# Function used to aggregate the NRQL query value(s) for comparison to the terms.threshold (Default: SINGLE_VALUE) | ||
valueFunction: SINGLE_VALUE | ||
|
||
# NRQL query | ||
nrql: | ||
query: "SELECT max(`net.connectedClients`) FROM RedisSample facet entityName" | ||
|
||
# Direction in which baseline is set (Default: LOWER_ONLY) | ||
baselineDirection: UPPER_AND_LOWER | ||
|
||
# List of Critical and Warning thresholds for the condition | ||
terms: | ||
- priority: CRITICAL | ||
# Operator used to compare against the threshold. | ||
operator: ABOVE | ||
# Value that triggers a violation | ||
threshold: 30 | ||
# Time in seconds; 120 - 3600, must be a multiple of 60 for Baseline conditions | ||
thresholdDuration: 3600 | ||
# How many data points must be in violation for the duration | ||
thresholdOccurrences: AT_LEAST_ONCE | ||
|
||
# Adding a Warning threshold is optional | ||
- priority: WARNING | ||
operator: ABOVE | ||
threshold: 5 | ||
thresholdDuration: 300 | ||
thresholdOccurrences: AT_LEAST_ONCE | ||
|
||
# Loss of Signal Settings | ||
expiration: | ||
# Close open violations if signal is lost (Default: false) | ||
closeViolationsOnExpiration: false | ||
# Open "Loss of Signal" violation if signal is lost (Default: false) | ||
openViolationOnExpiration: false | ||
# Time in seconds; Max value: 172800 (48hrs), null if closeViolationsOnExpiration and openViolationOnExpiration are both 'false' | ||
expirationDuration: | ||
|
||
# Advanced Signal Settings | ||
signal: | ||
# Max Value for Baseline conditions = 20 | ||
evaluationOffset: 3 | ||
# Type of value that should be used to fill gaps | ||
fillOption: NONE | ||
# Integer; Used in conjunction with STATIC fillOption, otherwise null | ||
fillValue: | ||
|
||
# OPTIONAL: URL of runbook to be sent with notification | ||
runbookUrl: | ||
|
||
# Duration after which a violation automatically closes | ||
# Time in seconds; 300 - 2592000 (Default: 86400 [1 day]) | ||
violationTimeLimitSeconds: 86400 |
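
The loss-of-signal comments above tie `expirationDuration` to the two boolean flags: it is left null when both are `false`, and it has a maximum of 172800 seconds. The sketch below encodes that relationship as a consistency check. The `expiration_problem` helper is hypothetical, and two of its rules are assumptions rather than statements from the file: that a duration is required once either flag is `true`, and that the duration must be positive.

```python
from typing import Optional


def expiration_problem(
    close_on_expiration: bool,
    open_on_expiration: bool,
    duration: Optional[int],
) -> Optional[str]:
    """Return a description of the inconsistency, or None if the settings agree."""
    uses_expiration = close_on_expiration or open_on_expiration
    if not uses_expiration:
        if duration is not None:
            return "expirationDuration should be null when both flags are false"
        return None
    if duration is None:
        # Assumption: a duration is needed once loss-of-signal handling is enabled.
        return "expirationDuration is required when a flag is true"
    if not 0 < duration <= 172800:
        # Upper bound comes from the file comment; the lower bound is assumed.
        return "expirationDuration must be between 1 and 172800 seconds"
    return None


# The settings used in the conditions above: both flags false, no duration.
print(expiration_problem(False, False, None))  # None
print(expiration_problem(True, False, None))
```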