Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[Product Partnerships] Added Amazon Backup #2180

Merged
merged 7 commits into from
Jan 16, 2024
Merged
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
34 changes: 34 additions & 0 deletions alert-policies/amazon-backup/HighBackupJobFailure.yml
Original file line number Diff line number Diff line change
@@ -0,0 +1,34 @@
name: High Backup Job Failure

description: |+
This alert is triggered if the number of Backup Job Failure exceeds 10 for 10 minutes.

type: STATIC
nrql:
query: "SELECT sum(`aws.backup.NumberOfBackupJobsFailed`) as 'Query' FROM Metric"

# Function used to aggregate the NRQL query value(s) for comparison to the terms.threshold (Default: SINGLE_VALUE)
valueFunction: SINGLE_VALUE

# List of Critical and Warning thresholds for the condition
terms:
- priority: CRITICAL
# Operator used to compare against the threshold.
operator: ABOVE
# Value that triggers a violation
threshold: 10
# Time in seconds; 120 - 3600
thresholdDuration: 600
# How many data points must be in violation for the duration
thresholdOccurrences: ALL

# Adding a Warning threshold is optional
- priority: WARNING
operator: ABOVE
threshold: 5
thresholdDuration: 600
thresholdOccurrences: ALL

# Duration after which a violation automatically closes
# Time in seconds; 300 - 2592000 (Default: 86400 [1 day])
violationTimeLimitSeconds: 86400
34 changes: 34 additions & 0 deletions alert-policies/amazon-backup/HighCopyJobsFailure.yml
Original file line number Diff line number Diff line change
@@ -0,0 +1,34 @@
name: High Copy Job Failure

description: |+
This alert is triggered if the number of Copy Job Failure exceeds 10 for 10 minutes.

type: STATIC
nrql:
query: "SELECT sum(`aws.backup.NumberOfCopyJobsFailed`) as 'Query' FROM Metric"

# Function used to aggregate the NRQL query value(s) for comparison to the terms.threshold (Default: SINGLE_VALUE)
valueFunction: SINGLE_VALUE

# List of Critical and Warning thresholds for the condition
terms:
- priority: CRITICAL
# Operator used to compare against the threshold.
operator: ABOVE
# Value that triggers a violation
threshold: 10
# Time in seconds; 120 - 3600
thresholdDuration: 600
# How many data points must be in violation for the duration
thresholdOccurrences: ALL

# Adding a Warning threshold is optional
- priority: WARNING
operator: ABOVE
threshold: 5
thresholdDuration: 600
thresholdOccurrences: ALL

# Duration after which a violation automatically closes
# Time in seconds; 300 - 2592000 (Default: 86400 [1 day])
violationTimeLimitSeconds: 86400
34 changes: 34 additions & 0 deletions alert-policies/amazon-backup/HighRestoreJobsFailure.yml.yml
Original file line number Diff line number Diff line change
@@ -0,0 +1,34 @@
name: High Restore Job Failure

description: |+
This alert is triggered if the number of Restore Job Failure exceeds 10 for 10 minutes.

type: STATIC
nrql:
query: "SELECT sum(`aws.backup.NumberOfRestoreJobsFailed`) as 'Query' FROM Metric"

# Function used to aggregate the NRQL query value(s) for comparison to the terms.threshold (Default: SINGLE_VALUE)
valueFunction: SINGLE_VALUE

# List of Critical and Warning thresholds for the condition
terms:
- priority: CRITICAL
# Operator used to compare against the threshold.
operator: ABOVE
# Value that triggers a violation
threshold: 10
# Time in seconds; 120 - 3600
thresholdDuration: 600
# How many data points must be in violation for the duration
thresholdOccurrences: ALL

# Adding a Warning threshold is optional
- priority: WARNING
operator: ABOVE
threshold: 5
thresholdDuration: 600
thresholdOccurrences: ALL

# Duration after which a violation automatically closes
# Time in seconds; 300 - 2592000 (Default: 86400 [1 day])
violationTimeLimitSeconds: 86400
Loading