ministryofjustice/cloud-platform-terraform-opensearch-cloudwatch-alarm

cloud-platform-terraform-opensearch-cloudwatch-alarm

This Terraform module creates CloudWatch alarms for an OpenSearch domain, for use on the Cloud Platform.

Usage

module "opensearch_cloudwatch_alarm" {
  source              = "github.com/ministryofjustice/cloud-platform-terraform-opensearch-cloudwatch-alarm?ref=version" # use the latest release

  alarm_name_prefix   = local.<os_domain_name>
  domain_name         = local.<os_domain_name>
  sns_topic           = module.baselines.slack_sns_topic
  min_available_nodes = aws_opensearch_domain.<os_domain_name>.cluster_config[0].instance_count
  tags                = local.logs_tags
}

| Metric name | Statistic | Period (seconds) | ComparisonOperator | Threshold | EvaluationPeriods |
|---|---|---|---|---|---|
| ClusterStatus.red | Maximum | 60 | GreaterThanOrEqualToThreshold | 1 | 1 |
| ClusterStatus.yellow | Maximum | 60 | GreaterThanOrEqualToThreshold | 1 | 1 |
| FreeStorageSpace | Minimum | 60 | LessThanOrEqualToThreshold | 20480 | 1 |
| ClusterIndexWritesBlocked | Maximum | 300 | GreaterThanOrEqualToThreshold | 1 | 1 |
| Nodes | Minimum | 86400 | LessThanThreshold | 1 | 1 |
| AutomatedSnapshotFailure | Maximum | 60 | GreaterThanOrEqualToThreshold | 1 | 1 |
| CPUUtilization | Maximum | 900 | GreaterThanOrEqualToThreshold | 80 | 3 |
| JVMMemoryPressure | Maximum | 60 | GreaterThanOrEqualToThreshold | 95 | 3 |
| MasterCPUUtilization | Maximum | 900 | GreaterThanOrEqualToThreshold | 50 | 3 |
| MasterJVMMemoryPressure | Maximum | 60 | GreaterThanOrEqualToThreshold | 95 | 3 |
| KMSKeyError | Maximum | 60 | GreaterThanOrEqualToThreshold | 1 | 1 |
| KMSKeyInaccessible | Maximum | 60 | GreaterThanOrEqualToThreshold | 1 | 1 |
| Shards.active | Maximum | 60 | GreaterThanOrEqualToThreshold | 30000 | 1 |
| MasterReachableFromNode | Maximum | 86400 | LessThanThreshold | 1 | 1 |
| ThreadpoolWriteQueue | Average | 60 | GreaterThanOrEqualToThreshold | 100 | 3 |
| ThreadpoolSearchQueue | Average | 60 | GreaterThanOrEqualToThreshold | 500 | 1 |
| ThreadpoolSearchQueue | Maximum | 60 | GreaterThanOrEqualToThreshold | 5000 | 1 |
| ThreadpoolWriteRejected | Maximum | 60 | GreaterThanOrEqualToThreshold | 1 | 1 |
| ThreadpoolSearchRejected | Maximum | 60 | GreaterThanOrEqualToThreshold | 1 | 1 |
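
The thresholds, periods and evaluation periods listed above are exposed as module inputs, so they can be tuned per domain. The following is a minimal sketch rather than a recommended configuration: the domain name `example-domain` and the chosen values are hypothetical, while `module.baselines.slack_sns_topic` and `local.logs_tags` are carried over from the Usage example and are assumed to exist in the calling configuration.

```hcl
module "opensearch_cloudwatch_alarm" {
  source = "github.com/ministryofjustice/cloud-platform-terraform-opensearch-cloudwatch-alarm?ref=version" # use the latest release

  alarm_name_prefix = "example-domain" # hypothetical domain name
  domain_name       = "example-domain" # hypothetical domain name
  sns_topic         = module.baselines.slack_sns_topic
  tags              = local.logs_tags

  # Alert on data-node CPU at 90% instead of the default 80%,
  # evaluated over 2 periods of 300 seconds instead of 3 periods of 900 seconds.
  cpu_utilization_threshold              = 90
  alarm_cpu_utilization_too_high_period  = 300
  alarm_cpu_utilization_too_high_periods = 2

  # Raise the per-node free storage alarm threshold (value is in megabytes).
  free_storage_space_threshold = 51200
}
```

Any input left unset keeps the default shown in the Inputs section.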

Requirements

| Name | Version |
|---|---|
| terraform | >= 1.2.5 |

Providers

| Name | Version |
|---|---|
| aws | n/a |

Modules

No modules.

Resources

| Name | Type |
|---|---|
| aws_cloudwatch_metric_alarm.automated_snapshot_failure | resource |
| aws_cloudwatch_metric_alarm.cluster_index_writes_blocked | resource |
| aws_cloudwatch_metric_alarm.cluster_status_is_red | resource |
| aws_cloudwatch_metric_alarm.cluster_status_is_yellow | resource |
| aws_cloudwatch_metric_alarm.cpu_utilization_too_high | resource |
| aws_cloudwatch_metric_alarm.free_storage_space_too_low | resource |
| aws_cloudwatch_metric_alarm.free_storage_space_total_too_low | resource |
| aws_cloudwatch_metric_alarm.insufficient_available_nodes | resource |
| aws_cloudwatch_metric_alarm.jvm_memory_pressure_too_high | resource |
| aws_cloudwatch_metric_alarm.kms_key_error | resource |
| aws_cloudwatch_metric_alarm.kms_key_inaccessible | resource |
| aws_cloudwatch_metric_alarm.master_cpu_utilization_too_high | resource |
| aws_cloudwatch_metric_alarm.master_jvm_memory_pressure_too_high | resource |
| aws_cloudwatch_metric_alarm.shards_active_too_high | resource |
| aws_cloudwatch_metric_alarm.threadpool_search_queue_average | resource |
| aws_cloudwatch_metric_alarm.threadpool_search_queue_max | resource |
| aws_cloudwatch_metric_alarm.threadpool_search_rejected | resource |
| aws_cloudwatch_metric_alarm.threadpool_write_queue_too_high | resource |
| aws_cloudwatch_metric_alarm.threadpool_write_rejected | resource |
| aws_cloudwatch_metric_alarm.unreachable_master_node | resource |
| aws_caller_identity.default | data source |

Inputs

| Name | Description | Type | Default | Required |
|---|---|---|---|---|
| alarm_automated_snapshot_failure_period | The period, in seconds, over which the statistic is applied for the automated snapshot failure alarm | `number` | `60` | no |
| alarm_automated_snapshot_failure_periods | The number of periods to alert that automatic snapshots failed. Default: 1. Raise this to be less noisy, as this can occur often for only 1 period | `number` | `1` | no |
| alarm_cluster_index_writes_blocked_period | The period, in seconds, over which the statistic is applied for the cluster index writes blocked alarm | `number` | `300` | no |
| alarm_cluster_index_writes_blocked_periods | The number of periods to alert that cluster index writes are blocked. Default: 1. Raise this to be less noisy, as this can occur often for only 1 period | `number` | `1` | no |
| alarm_cluster_status_is_red_period | The period, in seconds, over which the statistic is applied for the cluster status red alarm | `number` | `60` | no |
| alarm_cluster_status_is_red_periods | The number of periods to alert that cluster status is red. Default: 1. Raise this to be less noisy, as this can occur often for only 1 period | `number` | `1` | no |
| alarm_cluster_status_is_yellow_period | The period, in seconds, over which the statistic is applied for the cluster status yellow alarm | `number` | `60` | no |
| alarm_cluster_status_is_yellow_periods | The number of periods to alert that cluster status is yellow. Default: 1. Raise this to be less noisy, as this can occur often for only 1 period | `number` | `1` | no |
| alarm_cpu_utilization_too_high_period | The period, in seconds, over which the statistic is applied for the CPU utilization too high alarm | `number` | `900` | no |
| alarm_cpu_utilization_too_high_periods | The number of periods to alert that CPU usage is too high. Default: 3. Raise this to be less noisy, as this can occur often for only 1 period | `number` | `3` | no |
| alarm_free_storage_space_too_low_period | The period, in seconds, over which the statistic is applied for the per-node free storage too low alarm | `number` | `60` | no |
| alarm_free_storage_space_too_low_periods | The number of periods to alert that the per-node free storage space is too low. Default: 1. Raise this to be less noisy, as this can occur often for only 1 period | `number` | `1` | no |
| alarm_free_storage_space_total_too_low_period | The period, in seconds, over which the statistic is applied for the total cluster free storage too low alarm | `number` | `60` | no |
| alarm_free_storage_space_total_too_low_periods | The number of periods to alert that total cluster free storage space is too low. Default: 1. Raise this to be less noisy, as this can occur often for only 1 period | `number` | `1` | no |
| alarm_jvm_memory_pressure_too_high_period | The period, in seconds, over which the statistic is applied for the JVM memory pressure too high alarm | `number` | `900` | no |
| alarm_jvm_memory_pressure_too_high_periods | The number of periods for which JVM memory pressure must be in the alarmed state to alert | `number` | `3` | no |
| alarm_kms_period | The period, in seconds, over which the statistic is applied for the KMS-related alarms | `number` | `60` | no |
| alarm_kms_periods | The number of periods to alert that KMS has failed. Default: 1. Raise this to be less noisy, as this can occur often for only 1 period | `number` | `1` | no |
| alarm_master_cpu_utilization_too_high_period | The period, in seconds, over which the statistic is applied for the master node CPU utilization too high alarm | `number` | `900` | no |
| alarm_master_cpu_utilization_too_high_periods | The number of periods to alert that master node CPU usage is too high. Default: 3. Raise this to be less noisy, as this can occur often for only 1 period | `number` | `3` | no |
| alarm_master_jvm_memory_pressure_too_high_period | The period, in seconds, over which the statistic is applied for the master node JVM memory pressure too high alarm | `number` | `900` | no |
| alarm_master_jvm_memory_pressure_too_high_periods | The number of periods for which master node JVM memory pressure must be in the alarmed state to alert | `number` | `3` | no |
| alarm_min_available_nodes_period | The period, in seconds, over which the statistic is applied for the minimum available nodes alarm | `number` | `86400` | no |
| alarm_min_available_nodes_periods | The number of periods to alert that the number of available nodes dropped below the minimum. Default: 1. Raise this to be less noisy, as this can occur often for only 1 period | `number` | `1` | no |
| alarm_name_postfix | Alarm name suffix, used in the naming of alarms created | `string` | `""` | no |
| alarm_name_prefix | Alarm name prefix, used in the naming of alarms created | `string` | `""` | no |
| alarm_shard_active_number_too_high_period | The period, in seconds, over which the statistic is applied for the active shard number too high alarm | `number` | `60` | no |
| alarm_shard_active_number_too_high_periods | The number of periods to alert that the active shard number is too high. Default: 1. Raise this to be less noisy, as this can occur often for only 1 period | `number` | `1` | no |
| alarm_threadpool_search_queue_too_high_period | The period, in seconds, over which the statistic is applied for the threadpool search queue too high alarm | `number` | `60` | no |
| alarm_threadpool_search_queue_too_high_periods | The number of periods to alert that the threadpool search queue is too high. Default: 1. Raise this to be less noisy, as this can occur often for only 1 period | `number` | `1` | no |
| alarm_threadpool_search_rejected_period | The period, in seconds, over which the statistic is applied for the threadpool search rejected alarm | `number` | `60` | no |
| alarm_threadpool_search_rejected_periods | The number of periods to alert that the threadpool search rejected count is increasing. Default: 1. Raise this to be less noisy, as this can occur often for only 1 period | `number` | `1` | no |
| alarm_threadpool_write_queue_too_high_period | The period, in seconds, over which the statistic is applied for the threadpool write queue too high alarm | `number` | `60` | no |
| alarm_threadpool_write_queue_too_high_periods | The number of periods to alert that the threadpool write queue is too high. Default: 3. Raise this to be less noisy, as this can occur often for only 1 period | `number` | `3` | no |
| alarm_threadpool_write_rejected_period | The period, in seconds, over which the statistic is applied for the threadpool write rejected alarm | `number` | `60` | no |
| alarm_threadpool_write_rejected_periods | The number of periods to alert that the threadpool write rejected count is increasing. Default: 1. Raise this to be less noisy, as this can occur often for only 1 period | `number` | `1` | no |
| alarm_unreachable_master_node_period | The period, in seconds, over which the statistic is applied for the unreachable master node alarm | `number` | `86400` | no |
| alarm_unreachable_master_node_periods | The number of periods to alert that the master node is unreachable. Default: 1. Raise this to be less noisy, as this can occur often for only 1 period | `number` | `1` | no |
| cpu_utilization_threshold | The maximum percentage of CPU utilization | `number` | `80` | no |
| domain_name | The OpenSearch domain name you want to monitor | `string` | n/a | yes |
| free_storage_space_threshold | The minimum amount of available storage space in megabytes. This is per-node. | `number` | `20480` | no |
| free_storage_space_total_threshold | The minimum amount of available storage space in megabytes aggregated across your cluster (for multi-node). This is an aggregate, typically use (free_storage_space_threshold * min_available_nodes) | `number` | `20480` | no |
| jvm_memory_pressure_threshold | The maximum percentage of the Java heap used for all data nodes in the cluster | `number` | `80` | no |
| master_cpu_utilization_threshold | The maximum percentage of CPU utilization of master nodes | `number` | `80` | no |
| master_jvm_memory_pressure_threshold | The maximum percentage of the Java heap used for master nodes in the cluster | `number` | `80` | no |
| min_available_nodes | The minimum available (reachable) nodes to have, set to non-zero to enable | `number` | `0` | no |
| monitor_automated_snapshot_failure | Enable monitoring of automated snapshot failure | `bool` | `true` | no |
| monitor_cluster_index_writes_blocked | Enable monitoring of cluster index writes being blocked | `bool` | `true` | no |
| monitor_cluster_status_is_red | Enable monitoring of the cluster status being red | `bool` | `true` | no |
| monitor_cluster_status_is_yellow | Enable monitoring of the cluster status being yellow | `bool` | `true` | no |
| monitor_cpu_utilization_too_high | Enable monitoring of CPU utilization being too high | `bool` | `true` | no |
| monitor_free_storage_space_too_low | Enable monitoring of per-node free storage being too low | `bool` | `true` | no |
| monitor_free_storage_space_total_too_low | Enable monitoring of total cluster free storage being too low. Disabled by default. If you enable this, you must also set free_storage_space_total_threshold | `bool` | `false` | no |
| monitor_jvm_memory_pressure_too_high | Enable monitoring of JVM memory pressure being too high | `bool` | `true` | no |
| monitor_kms | Enable monitoring of KMS-related metrics. Only enable this when using KMS with OpenSearch | `bool` | `true` | no |
| monitor_master_cpu_utilization_too_high | Enable monitoring of CPU utilization of master nodes being too high. Only enable this when dedicated master is enabled | `bool` | `true` | no |
| monitor_master_jvm_memory_pressure_too_high | Enable monitoring of JVM memory pressure of master nodes being too high. Only enable this when dedicated master is enabled | `bool` | `true` | no |
| monitor_min_available_nodes | Enable monitoring of minimum available nodes | `bool` | `true` | no |
| monitor_shard | Enable monitoring of the number of active shards being too high | `bool` | `true` | no |
| monitor_threadpool_search_queue | Enable monitoring of the threadpool search queue number being too high | `bool` | `true` | no |
| monitor_threadpool_search_rejected | Enable monitoring of the threadpool search rejected number increasing | `bool` | `true` | no |
| monitor_threadpool_write_queue | Enable monitoring of the threadpool write queue number being too high | `bool` | `true` | no |
| monitor_threadpool_write_rejected | Enable monitoring of the threadpool write rejected number increasing | `bool` | `true` | no |
| monitor_unreachable_master_node | Enable monitoring of master nodes being up and reachable. Only enable this when dedicated master is enabled | `bool` | `true` | no |
| shard_active_number_threshold | The maximum number of active primary and replica shards | `number` | `30000` | no |
| sns_topic | The SNS topic to use. If left empty, a prefix with a timestamp appended is used | `string` | `""` | no |
| tags | A map of tags to add to all resources | `map(string)` | `{}` | no |
| threadpool_search_queue_average_threshold | The average number of cluster searching concurrency | `number` | `500` | no |
| threadpool_search_queue_max_threshold | The maximum number of cluster searching concurrency | `number` | `5000` | no |
| threadpool_search_rejected_threshold | The number of cluster threadpool search rejected threshold. Value 1 means it is increasing | `number` | `1` | no |
| threadpool_write_queue_threshold | The maximum number of cluster indexing concurrency | `number` | `100` | no |
| threadpool_write_rejected_threshold | The number of cluster threadpool write rejected threshold. Value 1 means it is increasing | `number` | `1` | no |
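
Several of the inputs above are meant to be set together. As a sketch, assuming a hypothetical three-node domain named `example-domain` with no dedicated master nodes and no KMS encryption, the call below enables the aggregate free storage alarm using the suggested `free_storage_space_threshold * min_available_nodes` calculation and switches off the monitors that the descriptions say should only be enabled with dedicated masters or KMS. The local names `data_node_count` and `free_storage_per_node_mb` are illustrative only; `module.baselines.slack_sns_topic` and `local.logs_tags` are taken from the Usage example.

```hcl
locals {
  data_node_count          = 3     # hypothetical number of data nodes
  free_storage_per_node_mb = 20480 # same as the module default, in megabytes
}

module "opensearch_cloudwatch_alarm" {
  source = "github.com/ministryofjustice/cloud-platform-terraform-opensearch-cloudwatch-alarm?ref=version" # use the latest release

  alarm_name_prefix   = "example-domain" # hypothetical domain name
  domain_name         = "example-domain" # hypothetical domain name
  sns_topic           = module.baselines.slack_sns_topic
  min_available_nodes = local.data_node_count
  tags                = local.logs_tags

  # Per-node free storage alarm (enabled by default).
  free_storage_space_threshold = local.free_storage_per_node_mb

  # Aggregate free storage alarm: disabled by default, and needs its own threshold.
  # 20480 MB * 3 nodes = 61440 MB.
  monitor_free_storage_space_total_too_low = true
  free_storage_space_total_threshold       = local.free_storage_per_node_mb * local.data_node_count

  # No dedicated master nodes in this hypothetical domain.
  monitor_master_cpu_utilization_too_high     = false
  monitor_master_jvm_memory_pressure_too_high = false
  monitor_unreachable_master_node             = false

  # No KMS encryption in this hypothetical domain.
  monitor_kms = false
}
```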

Outputs

No outputs.

Tags

Some of the inputs for this module are tags. All infrastructure resources must be tagged to meet the MOJ Technical Guidance on Documenting owners of infrastructure.

You should use your namespace variables to populate these. See the Usage section for more information.
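
One way to build the map passed to `tags` is sketched below. The variable names and tag keys here are assumptions chosen for illustration, not values defined by this module; use whichever variables your namespace already declares and the keys required by the MOJ guidance.

```hcl
# Hypothetical namespace variables; replace with the ones your namespace defines.
variable "namespace" {
  type = string
}

variable "team_name" {
  type = string
}

variable "environment" {
  type = string
}

locals {
  # Map passed to the module's `tags` input (referenced as local.logs_tags in the Usage section).
  # The tag keys below are illustrative assumptions.
  logs_tags = {
    namespace          = var.namespace
    owner              = var.team_name
    "environment-name" = var.environment
  }
}
```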

Reading Material
