This Terraform module creates OpenSearch CloudWatch alarms for use on the Cloud Platform.
```hcl
module "opensearch_cloudwatch_alarm" {
  source = "github.com/ministryofjustice/cloud-platform-terraform-opensearch-cloudwatch-alarm?ref=version" # use the latest release

  alarm_name_prefix   = local.<os_domain_name>
  domain_name         = local.<os_domain_name>
  sns_topic           = module.baselines.slack_sns_topic
  min_available_nodes = aws_opensearch_domain.<os_domain_name>.cluster_config[0].instance_count
  tags                = local.logs_tags
}
```
Metric name | Statistic | Period (seconds) | ComparisonOperator | Threshold | EvaluationPeriods |
---|---|---|---|---|---|
ClusterStatus.red | Maximum | 60 | GreaterThanOrEqualToThreshold | 1 | 1 |
ClusterStatus.yellow | Maximum | 60 | GreaterThanOrEqualToThreshold | 1 | 1 |
FreeStorageSpace | Minimum | 60 | LessThanOrEqualToThreshold | 20480 | 1 |
ClusterIndexWritesBlocked | Maximum | 300 | GreaterThanOrEqualToThreshold | 1 | 1 |
Nodes | Minimum | 86400 | LessThanThreshold | 1 | 1 |
AutomatedSnapshotFailure | Maximum | 60 | GreaterThanOrEqualToThreshold | 1 | 1 |
CPUUtilization | Maximum | 900 | GreaterThanOrEqualToThreshold | 80 | 3 |
JVMMemoryPressure | Maximum | 60 | GreaterThanOrEqualToThreshold | 95 | 3 |
MasterCPUUtilization | Maximum | 900 | GreaterThanOrEqualToThreshold | 50 | 3 |
MasterJVMMemoryPressure | Maximum | 60 | GreaterThanOrEqualToThreshold | 95 | 3 |
KMSKeyError | Maximum | 60 | GreaterThanOrEqualToThreshold | 1 | 1 |
KMSKeyInaccessible | Maximum | 60 | GreaterThanOrEqualToThreshold | 1 | 1 |
Shards.active | Maximum | 60 | GreaterThanOrEqualToThreshold | 30000 | 1 |
MasterReachableFromNode | Maximum | 86400 | LessThanThreshold | 1 | 1 |
ThreadpoolWriteQueue | Average | 60 | GreaterThanOrEqualToThreshold | 100 | 3 |
ThreadpoolSearchQueue | Average | 60 | GreaterThanOrEqualToThreshold | 500 | 1 |
ThreadpoolSearchQueue | Maximum | 60 | GreaterThanOrEqualToThreshold | 5000 | 1 |
ThreadpoolWriteRejected | Maximum | 60 | GreaterThanOrEqualToThreshold | 1 | 1 |
ThreadpoolSearchRejected | Maximum | 60 | GreaterThanOrEqualToThreshold | 1 | 1 |
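
To make one row of the table above concrete, here is a minimal sketch of how the `ClusterStatus.red` row could be expressed as a CloudWatch alarm. It is an illustration only and not necessarily how this module defines the alarm: the resource name, the two variables, and the alarm name format are hypothetical. It assumes the metric is published in the `AWS/ES` namespace with `DomainName` and `ClientId` dimensions, which is how Amazon OpenSearch Service reports domain metrics.

```hcl
# Hypothetical inputs for this sketch; the module's real variables are listed in its inputs table.
variable "domain_name" {
  type        = string
  description = "Name of the domain to monitor"
}

variable "sns_topic_arn" {
  type        = string
  description = "ARN of the SNS topic that receives alarm notifications"
}

# The ClientId dimension is the AWS account ID that owns the domain.
data "aws_caller_identity" "current" {}

# Sketch of the ClusterStatus.red row: Maximum over 60 seconds, >= 1, for 1 evaluation period.
resource "aws_cloudwatch_metric_alarm" "cluster_status_is_red" {
  alarm_name          = "${var.domain_name}-ClusterStatusIsRed" # hypothetical naming
  namespace           = "AWS/ES"
  metric_name         = "ClusterStatus.red"
  statistic           = "Maximum"
  period              = 60
  comparison_operator = "GreaterThanOrEqualToThreshold"
  threshold           = 1
  evaluation_periods  = 1
  alarm_actions       = [var.sns_topic_arn]

  dimensions = {
    DomainName = var.domain_name
    ClientId   = data.aws_caller_identity.current.account_id
  }
}
```

The other rows follow the same shape, with the statistic, period, comparison operator, threshold, and evaluation periods swapped for the values in the table.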

Requirements:

Name | Version |
---|---|
terraform | >= 1.2.5 |

Providers:

Name | Version |
---|---|
aws | n/a |

No modules.
Name | Description | Type | Default | Required |
---|---|---|---|---|
alarm_automated_snapshot_failure_period | The period, in seconds, over which the automated snapshot failure statistic is applied | number | 60 | no |
alarm_automated_snapshot_failure_periods | The number of periods required to alert that automated snapshots failed. Default: 1. Raise this to reduce noise, as this can often occur for only 1 period | number | 1 | no |
alarm_cluster_index_writes_blocked_period | The period, in seconds, over which the cluster index writes blocked statistic is applied | number | 300 | no |
alarm_cluster_index_writes_blocked_periods | The number of periods required to alert that cluster index writes are blocked. Default: 1. Raise this to reduce noise, as this can often occur for only 1 period | number | 1 | no |
alarm_cluster_status_is_red_period | The period, in seconds, over which the cluster status red statistic is applied | number | 60 | no |
alarm_cluster_status_is_red_periods | The number of periods required to alert that cluster status is red. Default: 1. Raise this to reduce noise, as this can often occur for only 1 period | number | 1 | no |
alarm_cluster_status_is_yellow_period | The period, in seconds, over which the cluster status yellow statistic is applied | number | 60 | no |
alarm_cluster_status_is_yellow_periods | The number of periods required to alert that cluster status is yellow. Default: 1. Raise this to reduce noise, as this can often occur for only 1 period | number | 1 | no |
alarm_cpu_utilization_too_high_period | The period, in seconds, over which the CPU utilization statistic is applied | number | 900 | no |
alarm_cpu_utilization_too_high_periods | The number of periods required to alert that CPU usage is too high. Default: 3. Raise this to reduce noise, as this can often occur for only 1 period | number | 3 | no |
alarm_free_storage_space_too_low_period | The period, in seconds, over which the per-node free storage statistic is applied | number | 60 | no |
alarm_free_storage_space_too_low_periods | The number of periods required to alert that the per-node free storage space is too low. Default: 1. Raise this to reduce noise, as this can often occur for only 1 period | number | 1 | no |
alarm_free_storage_space_total_too_low_period | The period, in seconds, over which the total cluster free storage statistic is applied | number | 60 | no |
alarm_free_storage_space_total_too_low_periods | The number of periods required to alert that total cluster free storage space is too low. Default: 1. Raise this to reduce noise, as this can often occur for only 1 period | number | 1 | no |
alarm_jvm_memory_pressure_too_high_period | The period, in seconds, over which the JVM memory pressure statistic is applied | number | 900 | no |
alarm_jvm_memory_pressure_too_high_periods | The number of periods for which JVM memory pressure must be in the alarmed state before alerting | number | 3 | no |
alarm_kms_period | The period, in seconds, over which the KMS-related statistics are applied | number | 60 | no |
alarm_kms_periods | The number of periods required to alert that KMS has failed. Default: 1. Raise this to reduce noise, as this can often occur for only 1 period | number | 1 | no |
alarm_master_cpu_utilization_too_high_period | The period, in seconds, over which the master node CPU utilization statistic is applied | number | 900 | no |
alarm_master_cpu_utilization_too_high_periods | The number of periods required to alert that master node CPU usage is too high. Default: 3. Raise this to reduce noise, as this can often occur for only 1 period | number | 3 | no |
alarm_master_jvm_memory_pressure_too_high_period | The period, in seconds, over which the master node JVM memory pressure statistic is applied | number | 900 | no |
alarm_master_jvm_memory_pressure_too_high_periods | The number of periods for which master node JVM memory pressure must be in the alarmed state before alerting | number | 3 | no |
alarm_min_available_nodes_period | The period, in seconds, over which the minimum available nodes statistic is applied | number | 86400 | no |
alarm_min_available_nodes_periods | The number of periods required to alert that the number of available nodes dropped below the threshold. Default: 1. Raise this to reduce noise, as this can often occur for only 1 period | number | 1 | no |
alarm_name_postfix | Alarm name suffix, used in the naming of alarms created | string | "" | no |
alarm_name_prefix | Alarm name prefix, used in the naming of alarms created | string | "" | no |
alarm_shard_active_number_too_high_period | The period, in seconds, over which the active shard number statistic is applied | number | 60 | no |
alarm_shard_active_number_too_high_periods | The number of periods required to alert that the active shard number is too high. Default: 1. Raise this to reduce noise, as this can often occur for only 1 period | number | 1 | no |
alarm_threadpool_search_queue_too_high_period | The period, in seconds, over which the threadpool search queue statistic is applied | number | 60 | no |
alarm_threadpool_search_queue_too_high_periods | The number of periods required to alert that the threadpool search queue is too high. Default: 1. Raise this to reduce noise, as this can often occur for only 1 period | number | 1 | no |
alarm_threadpool_search_rejected_period | The period, in seconds, over which the threadpool search rejected statistic is applied | number | 60 | no |
alarm_threadpool_search_rejected_periods | The number of periods required to alert that the threadpool search rejected count is increasing. Default: 1. Raise this to reduce noise, as this can often occur for only 1 period | number | 1 | no |
alarm_threadpool_write_queue_too_high_period | The period, in seconds, over which the threadpool write queue statistic is applied | number | 60 | no |
alarm_threadpool_write_queue_too_high_periods | The number of periods required to alert that the threadpool write queue is too high. Default: 3. Raise this to reduce noise, as this can often occur for only 1 period | number | 3 | no |
alarm_threadpool_write_rejected_period | The period, in seconds, over which the threadpool write rejected statistic is applied | number | 60 | no |
alarm_threadpool_write_rejected_periods | The number of periods required to alert that the threadpool write rejected count is increasing. Default: 1. Raise this to reduce noise, as this can often occur for only 1 period | number | 1 | no |
alarm_unreachable_master_node_period | The period, in seconds, over which the master node reachability statistic is applied | number | 86400 | no |
alarm_unreachable_master_node_periods | The number of periods required to alert that the master node is unreachable. Default: 1. Raise this to reduce noise, as this can often occur for only 1 period | number | 1 | no |
cpu_utilization_threshold | The maximum percentage of CPU utilization | number | 80 | no |
domain_name | The name of the Elasticsearch domain you want to monitor | string | n/a | yes |
free_storage_space_threshold | The minimum amount of available storage space in megabytes. This is per-node. | number | 20480 | no |
free_storage_space_total_threshold | The minimum amount of available storage space in megabytes, aggregated across your cluster (for multi-node). Typically set to (free_storage_space_threshold * min_available_nodes) | number | 20480 | no |
jvm_memory_pressure_threshold | The maximum percentage of the Java heap used for all data nodes in the cluster | number | 80 | no |
master_cpu_utilization_threshold | The maximum percentage of CPU utilization of master nodes | number | 80 | no |
master_jvm_memory_pressure_threshold | The maximum percentage of the Java heap used for master nodes in the cluster | number | 80 | no |
min_available_nodes | The minimum number of available (reachable) nodes. Set to non-zero to enable | number | 0 | no |
monitor_automated_snapshot_failure | Enable monitoring of automated snapshot failure | bool | true | no |
monitor_cluster_index_writes_blocked | Enable monitoring of cluster index writes being blocked | bool | true | no |
monitor_cluster_status_is_red | Enable monitoring of the cluster status being red | bool | true | no |
monitor_cluster_status_is_yellow | Enable monitoring of the cluster status being yellow | bool | true | no |
monitor_cpu_utilization_too_high | Enable monitoring of CPU utilization being too high | bool | true | no |
monitor_free_storage_space_too_low | Enable monitoring of per-node free storage being too low | bool | true | no |
monitor_free_storage_space_total_too_low | Enable monitoring of total cluster free storage being too low. Disabled by default. If you enable this, you must also set free_storage_space_total_threshold | bool | false | no |
monitor_jvm_memory_pressure_too_high | Enable monitoring of JVM memory pressure being too high | bool | true | no |
monitor_kms | Enable monitoring of KMS-related metrics. Only enable this when using KMS with Elasticsearch | bool | true | no |
monitor_master_cpu_utilization_too_high | Enable monitoring of CPU utilization of master nodes being too high. Only enable this when dedicated master is enabled | bool | true | no |
monitor_master_jvm_memory_pressure_too_high | Enable monitoring of JVM memory pressure of master nodes being too high. Only enable this when dedicated master is enabled | bool | true | no |
monitor_min_available_nodes | Enable monitoring of minimum available nodes | bool | true | no |
monitor_shard | Enable monitoring of the active shard number being too high | bool | true | no |
monitor_threadpool_search_queue | Enable monitoring of the threadpool search queue being too high | bool | true | no |
monitor_threadpool_search_rejected | Enable monitoring of the threadpool search rejected count increasing | bool | true | no |
monitor_threadpool_write_queue | Enable monitoring of the threadpool write queue being too high | bool | true | no |
monitor_threadpool_write_rejected | Enable monitoring of the threadpool write rejected count increasing | bool | true | no |
monitor_unreachable_master_node | Enable monitoring of whether master nodes are running and reachable. Only enable this when dedicated master is enabled | bool | true | no |
shard_active_number_threshold | The maximum number of active primary and replica shards | number | 30000 | no |
sns_topic | The SNS topic to use. If left empty, a name made of a prefix with a timestamp appended is used | string | "" | no |
tags | A map of tags to add to all resources | map(string) | {} | no |
threadpool_search_queue_average_threshold | The average number of queued search requests across the cluster | number | 500 | no |
threadpool_search_queue_max_threshold | The maximum number of queued search requests across the cluster | number | 5000 | no |
threadpool_search_rejected_threshold | The threshold for the cluster threadpool search rejected count. A value of 1 means the count is increasing | number | 1 | no |
threadpool_write_queue_threshold | The maximum number of queued indexing requests across the cluster | number | 100 | no |
threadpool_write_rejected_threshold | The threshold for the cluster threadpool write rejected count. A value of 1 means the count is increasing | number | 1 | no |
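
Several inputs above only make sense in combination, so the following sketch shows one way they might be set together. It assumes a domain with three data nodes, no dedicated master nodes, and no KMS encryption. The domain name, node count, and chosen values are hypothetical, and `sns_topic` is left at its default.

```hcl
locals {
  # Hypothetical values for illustration only.
  data_node_count          = 3
  free_storage_per_node_mb = 20480
}

module "opensearch_cloudwatch_alarm" {
  source = "github.com/ministryofjustice/cloud-platform-terraform-opensearch-cloudwatch-alarm?ref=version" # use the latest release

  domain_name         = "example-domain" # hypothetical domain name
  alarm_name_prefix   = "example-domain"
  min_available_nodes = local.data_node_count

  # The master node alarms only apply when dedicated master is enabled.
  monitor_master_cpu_utilization_too_high     = false
  monitor_master_jvm_memory_pressure_too_high = false
  monitor_unreachable_master_node             = false

  # The KMS alarms only apply when the domain uses KMS.
  monitor_kms = false

  # The total free storage alarm is off by default; enabling it requires a total threshold.
  # The inputs table suggests free_storage_space_threshold * min_available_nodes.
  monitor_free_storage_space_total_too_low = true
  free_storage_space_threshold             = local.free_storage_per_node_mb
  free_storage_space_total_threshold       = local.free_storage_per_node_mb * local.data_node_count

  # Require more consecutive periods before alerting on yellow status, to reduce noise.
  alarm_cluster_status_is_yellow_periods = 5
}
```

With these values the total free storage threshold works out to 61440 MB (20480 MB across three nodes).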
No outputs.
Some of the inputs for this module are tags. All infrastructure resources must be tagged to meet the MOJ Technical Guidance on Documenting owners of infrastructure.
You should use your namespace variables to populate these. See the Usage section for more information.
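
As a rough sketch of what populating the `tags` input from namespace variables could look like, the block below builds the `local.logs_tags` map referenced in the usage example. The variable names and tag keys are assumptions for illustration; check the MOJ guidance and your namespace's existing variables for the exact set that is required.

```hcl
# Hypothetical namespace variables; your namespace may already define equivalents.
variable "business_unit" {
  type = string
}

variable "application" {
  type = string
}

variable "is_production" {
  type = string
}

variable "team_name" {
  type = string
}

variable "environment" {
  type = string
}

variable "infrastructure_support" {
  type = string
}

variable "namespace" {
  type = string
}

locals {
  # Assumed tag keys; the tags input is a map(string), so every value is a string.
  logs_tags = {
    "business-unit"          = var.business_unit
    "application"            = var.application
    "is-production"          = var.is_production
    "owner"                  = var.team_name
    "environment-name"       = var.environment
    "infrastructure-support" = var.infrastructure_support
    "namespace"              = var.namespace
  }
}
```

The resulting map can then be passed to the module as `tags = local.logs_tags`.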