Merge pull request #2139 from harrykimpel/main
Adding MariaDB dashboard and alert conditions
aswanson-nr authored Jan 10, 2024
2 parents e3975b8 + f12ba9d commit 503d5c4
Showing 22 changed files with 4,163 additions and 548 deletions.
4 changes: 3 additions & 1 deletion .gitignore
@@ -15,4 +15,6 @@ snapshots/

 # yarn
 yarn.lock
-.yarn-integrity
+.yarn-integrity
+yarn-error.log
+utils/yarn-error.log
27 changes: 27 additions & 0 deletions alert-policies/mariadb/innodb-pending-reads-and-writes.yml
@@ -0,0 +1,27 @@
name: InnoDB Pending Reads and Writes

description: |+
  This alert is triggered when the aggregate number of pending reads and writes in the MySQL buffer pool is greater than 2 for 5 minutes, which indicates the database engine is backlogged and waiting on resources.
type: STATIC
nrql:
  query: "FROM MysqlSample SELECT max(db.innodb.dataPendingReads) + max(db.innodb.dataPendingWrites) FACET displayName"

# Function used to aggregate the NRQL query value(s) for comparison to the terms.threshold (Default: SINGLE_VALUE)
valueFunction: SINGLE_VALUE

# List of Critical and Warning thresholds for the condition
terms:
  - priority: CRITICAL
    # Operator used to compare against the threshold.
    operator: ABOVE
    # Value that triggers a violation
    threshold: 2
    # Time in seconds; 120 - 3600
    thresholdDuration: 300
    # How many data points must be in violation for the duration
    thresholdOccurrences: ALL

# Duration after which a violation automatically closes
# Time in seconds; 300 - 2592000 (Default: 86400 [1 day])
violationTimeLimitSeconds: 86400
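To make the query and threshold above concrete, here is a minimal Python sketch of how this condition could be evaluated. It is a simplified model, not New Relic's implementation. It assumes the default 60-second aggregation window, so `thresholdDuration: 300` yields five data points, and it treats a facet with missing windows as not violating. The helper names (`pending_io_by_facet`, `facets_in_violation`, `sample`) and all sample values are hypothetical.

```python
from collections import defaultdict

THRESHOLD = 2                        # terms.threshold (operator: ABOVE)
WINDOW_SECONDS = 60                  # assumed aggregation window
DATA_POINTS = 300 // WINDOW_SECONDS  # thresholdDuration / window -> 5 data points


def pending_io_by_facet(samples):
    """One window: max(dataPendingReads) + max(dataPendingWrites), FACET displayName."""
    max_reads = defaultdict(int)
    max_writes = defaultdict(int)
    for s in samples:
        name = s["displayName"]
        max_reads[name] = max(max_reads[name], s["db.innodb.dataPendingReads"])
        max_writes[name] = max(max_writes[name], s["db.innodb.dataPendingWrites"])
    return {name: max_reads[name] + max_writes[name] for name in max_reads}


def facets_in_violation(windows):
    """thresholdOccurrences: ALL -> every one of the last DATA_POINTS values must exceed THRESHOLD."""
    series = defaultdict(list)
    for samples in windows[-DATA_POINTS:]:
        for name, value in pending_io_by_facet(samples).items():
            series[name].append(value)
    return sorted(
        name
        for name, values in series.items()
        # Assumption: a facet with a gap (fewer data points) does not violate.
        if len(values) == DATA_POINTS and all(v > THRESHOLD for v in values)
    )


def sample(name, reads, writes):
    """Build a hypothetical MysqlSample row with only the fields the query uses."""
    return {
        "displayName": name,
        "db.innodb.dataPendingReads": reads,
        "db.innodb.dataPendingWrites": writes,
    }


# The two maxima can come from different samples: 3 + 1 = 4.
print(pending_io_by_facet([sample("db-1", 3, 0), sample("db-1", 0, 1)]))  # {'db-1': 4}

# db-1 stays backlogged in all five windows; db-2 drops to exactly 2 in the last one.
windows = [[sample("db-1", 2, 1), sample("db-2", 3, 0)] for _ in range(4)]
windows.append([sample("db-1", 1, 2), sample("db-2", 1, 1)])
print(facets_in_violation(windows))  # ['db-1']
```

Because `max()` is applied to reads and writes separately, the summed value for a window can be higher than the total any single sample reported.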
29 changes: 29 additions & 0 deletions alert-policies/mariadb/max-connection-errors-per-second.yml
@@ -0,0 +1,29 @@
name: Max Connection Errors per Second

description: |+
  This alert is triggered when the rate of connection errors caused by the max_connections limit rises above 1 per second at least once in a 5-minute window, which indicates that requests to your MariaDB instance are failing to connect.
  The default max_connections value is 151, but it can vary based on the underlying resources available to your instance. You can review your current max_connections limit with this query:
  SHOW VARIABLES LIKE 'max_connections';
type: STATIC
nrql:
  query: "FROM MysqlSample SELECT max(net.connectionErrorsMaxConnectionsPerSecond) FACET displayName"

# Function used to aggregate the NRQL query value(s) for comparison to the terms.threshold (Default: SINGLE_VALUE)
valueFunction: SINGLE_VALUE

# List of Critical and Warning thresholds for the condition
terms:
  - priority: CRITICAL
    # Operator used to compare against the threshold.
    operator: ABOVE
    # Value that triggers a violation
    threshold: 1
    # Time in seconds; 120 - 3600
    thresholdDuration: 300
    # How many data points must be in violation for the duration
    thresholdOccurrences: AT_LEAST_ONCE

# Duration after which a violation automatically closes
# Time in seconds; 300 - 2592000 (Default: 86400 [1 day])
violationTimeLimitSeconds: 86400
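This condition uses `thresholdOccurrences: AT_LEAST_ONCE`, whereas the other static conditions in this commit use `ALL`. The Python sketch below contrasts the two under simplifying assumptions. It assumes one data point per 60-second window (five points for `thresholdDuration: 300`) and treats `operator: ABOVE` as a strict greater-than comparison (New Relic also offers `ABOVE_OR_EQUALS`). The `evaluate` helper and the values are hypothetical and not part of any New Relic API.

```python
def evaluate(values, threshold, occurrences):
    """Return True if the data points in one thresholdDuration violate an ABOVE threshold."""
    above = [v > threshold for v in values]  # operator: ABOVE, modeled as strict ">"
    if occurrences == "ALL":
        return bool(above) and all(above)
    if occurrences == "AT_LEAST_ONCE":
        return any(above)
    raise ValueError(f"unsupported thresholdOccurrences: {occurrences}")


# Hypothetical per-minute values of max(net.connectionErrorsMaxConnectionsPerSecond).
spike = [0, 0, 2.5, 0, 0]
flat_at_one = [1, 1, 1, 1, 1]

print(evaluate(spike, 1, "AT_LEAST_ONCE"))        # True
print(evaluate(spike, 1, "ALL"))                  # False
print(evaluate(flat_at_one, 1, "AT_LEAST_ONCE"))  # False: 1 is not ABOVE 1
```

A single short burst of connection errors is enough to open this condition, which suits a signal where any failed connection matters.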
59 changes: 59 additions & 0 deletions alert-policies/mariadb/questions-per-second.yml
@@ -0,0 +1,59 @@
name: Questions per Second

description: |+
  This alert is triggered when the current rate of Questions is more than 2 standard deviations above the baseline for 2 minutes, which could be an early indicator of a saturation problem for your instance.
  Note that this alert is disabled by default. To use it, edit the condition in New Relic One so that the query targets a single MySQL instance, for example:
  "WHERE displayName = 'MySql Instance Name'"
  This calculates the baseline against a single instance instead of against all monitored MySQL instances.
type: BASELINE
nrql:
  # Cannot use FACET in Baseline alerts
  query: "FROM MysqlSample SELECT average(query.questionsPerSecond)"

# Direction in which baseline is set (Default: LOWER_ONLY)
baselineDirection: UPPER_ONLY

# List of Critical and Warning thresholds for the condition
terms:
  - priority: CRITICAL
    # Operator used to compare against the threshold.
    operator: ABOVE
    # Value that triggers a violation
    threshold: 2
    # Time in seconds; 120 - 3600, must be a multiple of 60 for Baseline conditions
    thresholdDuration: 120
    # How many data points must be in violation for the duration
    thresholdOccurrences: ALL

  # Adding a Warning threshold is optional
  - priority: WARNING
    operator: ABOVE
    threshold: 1
    thresholdDuration: 300
    thresholdOccurrences: ALL

# Loss of Signal Settings
expiration:
  # Close open violations if signal is lost (Default: false)
  closeViolationsOnExpiration: false
  # Open "Loss of Signal" violation if signal is lost (Default: false)
  openViolationOnExpiration: false
  # Time in seconds; Max value: 172800 (48hrs), null if closeViolationsOnExpiration and openViolationOnExpiration are both 'false'
  expirationDuration:

# Advanced Signal Settings
signal:
  # Max Value for Baseline conditions = 20
  evaluationOffset: 3
  # Type of value that should be used to fill gaps
  fillOption: NONE
  # Integer; Used in conjunction with STATIC fillOption, otherwise null
  fillValue:

# OPTIONAL: URL of runbook to be sent with notification
runbookUrl:

# Duration after which a violation automatically closes
# Time in seconds; 300 - 2592000 (Default: 86400 [1 day])
violationTimeLimitSeconds: 86400
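The baseline terms in this condition express thresholds in standard deviations from an expected value. New Relic's baseline model is its own predictive algorithm, so the Python sketch below substitutes a plain mean and population standard deviation of recent history. It only illustrates how `baselineDirection: UPPER_ONLY`, `threshold: 2` and `thresholdOccurrences: ALL` interact. It assumes one data point per 60-second window, giving two points for the 120-second CRITICAL term. The history values and helper names are hypothetical.

```python
from statistics import mean, pstdev


def breaches_upper(history, value, k):
    """UPPER_ONLY: True if value is more than k standard deviations above the stand-in baseline."""
    return value > mean(history) + k * pstdev(history)


def term_violated(history, recent, k):
    """thresholdOccurrences: ALL -> every recent data point must breach."""
    return bool(recent) and all(breaches_upper(history, v, k) for v in recent)


# Hypothetical questions-per-second history: mean 100, population std dev ~1.29.
history = [100, 102, 98, 101, 99, 100]

print(term_violated(history, [105, 104], k=2))  # True: both points exceed ~102.58
print(term_violated(history, [105, 101], k=2))  # False: 101 is within 2 std devs
print(term_violated(history, [90, 91], k=2))    # False: drops are ignored with UPPER_ONLY
```

Because the baseline is learned from whatever the query returns, mixing several instances into one series would blur it, which is why the description recommends targeting a single instance.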
29 changes: 29 additions & 0 deletions alert-policies/mariadb/slow-queries-per-second.yml
@@ -0,0 +1,29 @@
name: Slow Queries per Second

description: |+
  This alert is triggered when the number of slow queries per second is greater than 5 for 5 minutes, which could indicate capacity issues or a changed query that is now performing poorly.
  The Slow_queries counter increments for each query that takes longer than the long_query_time setting (default 10 seconds), which you can review with this query:
  SHOW VARIABLES LIKE 'long_query_time';
type: STATIC
nrql:
  query: "FROM MysqlSample SELECT average(query.slowQueriesPerSecond) FACET displayName"

# Function used to aggregate the NRQL query value(s) for comparison to the terms.threshold (Default: SINGLE_VALUE)
valueFunction: SINGLE_VALUE

# List of Critical and Warning thresholds for the condition
terms:
  - priority: CRITICAL
    # Operator used to compare against the threshold.
    operator: ABOVE
    # Value that triggers a violation
    threshold: 5
    # Time in seconds; 120 - 3600
    thresholdDuration: 300
    # How many data points must be in violation for the duration
    thresholdOccurrences: ALL

# Duration after which a violation automatically closes
# Time in seconds; 300 - 2592000 (Default: 86400 [1 day])
violationTimeLimitSeconds: 86400
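`Slow_queries` is a cumulative status counter, while this condition compares an `average()` of a per-second rate. The Python sketch below shows one way such a rate can be derived from successive counter readings and averaged over one window. It assumes the integration computes the rate from counter deltas, which this commit does not confirm. The polling interval, counter values and helper name `per_second_rates` are hypothetical.

```python
def per_second_rates(readings):
    """Turn (timestamp_seconds, Slow_queries) readings into per-second rates between polls."""
    rates = []
    for (t0, c0), (t1, c1) in zip(readings, readings[1:]):
        if t1 <= t0:
            raise ValueError("timestamps must increase")
        if c1 < c0:
            continue  # counter reset (e.g. server restart): skip this interval
        rates.append((c1 - c0) / (t1 - t0))
    return rates


# Hypothetical readings taken every 15 seconds during one 60-second window.
readings = [(0, 1000), (15, 1090), (30, 1180), (45, 1240), (60, 1330)]
rates = per_second_rates(readings)
window_average = sum(rates) / len(rates)

print(rates)               # [6.0, 6.0, 4.0, 6.0]
print(window_average)      # 5.5
print(window_average > 5)  # True: this data point is ABOVE the threshold of 5
```

With `thresholdOccurrences: ALL`, every one-minute data point in the 5-minute duration would need an average above 5 before the condition opens.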
27 changes: 27 additions & 0 deletions alert-policies/redis/blocked-clients.yml
@@ -0,0 +1,27 @@
name: Blocked clients alert

description: |+
  This alert is triggered when Redis reports at least one blocked client continuously for 5 minutes.
type: STATIC
nrql:
  query: "SELECT sum(`net.blockedClients`) FROM RedisSample facet entityName"

# Function used to aggregate the NRQL query value(s) for comparison to the terms.threshold (Default: SINGLE_VALUE)
valueFunction: SINGLE_VALUE

# List of Critical and Warning thresholds for the condition
terms:
  - priority: CRITICAL
    # Operator used to compare against the threshold.
    operator: ABOVE
    # Value that triggers a violation
    threshold: 0
    # Time in seconds; 120 - 3600
    thresholdDuration: 300
    # How many data points must be in violation for the duration
    thresholdOccurrences: ALL

# Duration after which a violation automatically closes
# Time in seconds; 300 - 2592000 (Default: 86400 [1 day])
violationTimeLimitSeconds: 86400
62 changes: 62 additions & 0 deletions alert-policies/redis/current-connections-anomaly.yml
@@ -0,0 +1,62 @@
name: Anomalies in current connections

# Description and details
description: |
  This alert is triggered when the number of current connections deviates from the norm either up or down.
# Type of alert: BASELINE | STATIC
type: BASELINE

# Function used to aggregate the NRQL query value(s) for comparison to the terms.threshold (Default: SINGLE_VALUE)
valueFunction: SINGLE_VALUE

# NRQL query
nrql:
  query: "SELECT max(`net.connectedClients`) FROM RedisSample facet entityName"

# Direction in which baseline is set (Default: LOWER_ONLY)
baselineDirection: UPPER_AND_LOWER

# List of Critical and Warning thresholds for the condition
terms:
  - priority: CRITICAL
    # Operator used to compare against the threshold.
    operator: ABOVE
    # Value that triggers a violation
    threshold: 30
    # Time in seconds; 120 - 3600, must be a multiple of 60 for Baseline conditions
    thresholdDuration: 3600
    # How many data points must be in violation for the duration
    thresholdOccurrences: AT_LEAST_ONCE

  # Adding a Warning threshold is optional
  - priority: WARNING
    operator: ABOVE
    threshold: 5
    thresholdDuration: 300
    thresholdOccurrences: AT_LEAST_ONCE

# Loss of Signal Settings
expiration:
  # Close open violations if signal is lost (Default: false)
  closeViolationsOnExpiration: false
  # Open "Loss of Signal" violation if signal is lost (Default: false)
  openViolationOnExpiration: false
  # Time in seconds; Max value: 172800 (48hrs), null if closeViolationsOnExpiration and openViolationOnExpiration are both 'false'
  expirationDuration:

# Advanced Signal Settings
signal:
  # Max Value for Baseline conditions = 20
  evaluationOffset: 3
  # Type of value that should be used to fill gaps
  fillOption: NONE
  # Integer; Used in conjunction with STATIC fillOption, otherwise null
  fillValue:

# OPTIONAL: URL of runbook to be sent with notification
runbookUrl:

# Duration after which a violation automatically closes
# Time in seconds; 300 - 2592000 (Default: 86400 [1 day])
violationTimeLimitSeconds: 86400
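This condition combines `baselineDirection: UPPER_AND_LOWER` with `thresholdOccurrences: AT_LEAST_ONCE`, so a single data point that strays far enough from the baseline in either direction is enough to breach a term. The Python sketch below models the WARNING term (`threshold: 5`). As a stand-in for New Relic's predictive baseline, it uses a plain mean and population standard deviation of recent history. The history values and helper names are hypothetical.

```python
from statistics import mean, pstdev


def breaches_either_side(history, value, k):
    """UPPER_AND_LOWER: True if value is more than k standard deviations from the stand-in baseline."""
    return abs(value - mean(history)) > k * pstdev(history)


def term_violated(history, recent, k):
    """thresholdOccurrences: AT_LEAST_ONCE -> one breaching data point is enough."""
    return any(breaches_either_side(history, v, k) for v in recent)


# Hypothetical connected-client history for one entityName: mean 40, std dev ~1.29.
history = [40, 42, 38, 41, 39, 40]

print(term_violated(history, [40, 41, 30], k=5))  # True: a drop of 10 exceeds 5 std devs (~6.45)
print(term_violated(history, [40, 50, 41], k=5))  # True: a spike counts as well
print(term_violated(history, [40, 44, 37], k=5))  # False: every point stays within ~6.45
```

Watching both directions catches a sudden loss of clients (for example, an application tier disconnecting) as well as a connection surge.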