Pinecone(Prometheus) quickstart. #2195

Merged
merged 8 commits into from
Jan 17, 2024
28 changes: 28 additions & 0 deletions alert-policies/pinecone-prometheus/PineconeIndexFullness.yml
@@ -0,0 +1,28 @@
name: Pinecone Index Fullness

description: |+
  - This metric indicates the index's fullness on a scale from 0 to 1
  - An alert is triggered if the value stays above 0.8 (80% full)
  - Resolution: If fullness surpasses 80%, add another replica or increase the pod size
type: STATIC
nrql:
  query: "FROM Metric SELECT average(pinecone_index_fullness) AS 'index fullness (0-1)' WHERE instrumentation.name = 'remote-write' AND instrumentation.provider = 'prometheus' LIMIT MAX"

# Function used to aggregate the NRQL query value(s) for comparison to the terms.threshold (Default: SINGLE_VALUE)
valueFunction: SINGLE_VALUE

# List of Critical and Warning thresholds for the condition
terms:
  - priority: CRITICAL
    # Operator used to compare against the threshold.
    operator: ABOVE
    # Value that triggers a violation
    threshold: 0.8
    # Time in seconds; 120 - 3600
    thresholdDuration: 300
    # How many data points must be in violation for the duration
    thresholdOccurrences: ALL

# Duration after which a violation automatically closes
# Time in seconds; 300 - 2592000 (Default: 86400 [1 day])
violationTimeLimitSeconds: 86400
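
The interaction of `operator: ABOVE`, `threshold`, `thresholdDuration` and `thresholdOccurrences: ALL` can be easier to reason about with a small model. The following Python is only a sketch of that rule, not New Relic's evaluation engine. The function name `is_violating` and the 60-second aggregation window are assumptions made for illustration, since this file does not set a window.

```python
from typing import Sequence


def is_violating(values: Sequence[float], threshold: float,
                 duration_s: int, window_s: int = 60) -> bool:
    """Model of operator ABOVE with thresholdOccurrences ALL.

    values     -- one aggregated data point per window, oldest first
    window_s   -- assumed aggregation window (hypothetical, not set in the YAML)
    Returns True only if every point in the trailing duration is above threshold.
    """
    needed = duration_s // window_s  # number of points that must all breach
    if needed <= 0 or len(values) < needed:
        return False  # not enough data to cover the whole duration
    recent = values[-needed:]
    return all(v > threshold for v in recent)


# Fullness stays above 0.8 for the last five one-minute points.
sustained = [0.78, 0.81, 0.83, 0.85, 0.86, 0.88]
# One point dips to 0.79 inside the last five minutes.
dipped = [0.78, 0.81, 0.79, 0.85, 0.86, 0.88]

print(is_violating(sustained, threshold=0.8, duration_s=300))
print(is_violating(dipped, threshold=0.8, duration_s=300))
```

Under these assumptions the first series would open a violation and the second would not, because `ALL` requires every point in the 300-second duration to breach the threshold.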
27 changes: 27 additions & 0 deletions alert-policies/pinecone-prometheus/PineconeRequestsErrors.yml
@@ -0,0 +1,27 @@
name: Pinecone Request Errors

description: |+
  - This metric reports the total count of data plane calls made by clients that resulted in errors
  - An alert is triggered if the value stays above 1
type: STATIC
nrql:
  query: "FROM Metric SELECT latest(pinecone_request_error_count_total) AS 'request errors' WHERE instrumentation.name = 'remote-write' AND instrumentation.provider = 'prometheus' LIMIT MAX"

# Function used to aggregate the NRQL query value(s) for comparison to the terms.threshold (Default: SINGLE_VALUE)
valueFunction: SINGLE_VALUE

# List of Critical and Warning thresholds for the condition
terms:
  - priority: CRITICAL
    # Operator used to compare against the threshold.
    operator: ABOVE
    # Value that triggers a violation
    threshold: 1
    # Time in seconds; 120 - 3600
    thresholdDuration: 300
    # How many data points must be in violation for the duration
    thresholdOccurrences: ALL

# Duration after which a violation automatically closes
# Time in seconds; 300 - 2592000 (Default: 86400 [1 day])
violationTimeLimitSeconds: 86400
29 changes: 29 additions & 0 deletions alert-policies/pinecone-prometheus/PineconeRequestsLatency.yml
@@ -0,0 +1,29 @@
name: Pinecone Request Latency

description: |+
  - This metric shows the server-side processing latency distribution for
    Pinecone data plane calls
  - An alert is triggered if the 50th percentile exceeds 100 ms
  - Resolution: If it surpasses 100 ms, add another replica
type: STATIC
nrql:
  query: "FROM Metric SELECT percentile(pinecone_request_latency_seconds, 50) * 1000 AS 'request latency (ms)' WHERE instrumentation.name = 'remote-write' AND instrumentation.provider = 'prometheus' LIMIT MAX"

# Function used to aggregate the NRQL query value(s) for comparison to the terms.threshold (Default: SINGLE_VALUE)
valueFunction: SINGLE_VALUE

# List of Critical and Warning thresholds for the condition
terms:
  - priority: CRITICAL
    # Operator used to compare against the threshold.
    operator: ABOVE
    # Value that triggers a violation
    threshold: 100
    # Time in seconds; 120 - 3600
    thresholdDuration: 300
    # How many data points must be in violation for the duration
    thresholdOccurrences: ALL

# Duration after which a violation automatically closes
# Time in seconds; 300 - 2592000 (Default: 86400 [1 day])
violationTimeLimitSeconds: 86400
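
The latency query multiplies a value in seconds by 1000 so that it can be compared against a threshold expressed in milliseconds. A minimal Python sketch of that arithmetic follows. The helper `p50_latency_ms` and the sample values are hypothetical, and `statistics.median` stands in for NRQL's `percentile(..., 50)`, which may compute its result differently.

```python
import statistics
from typing import Sequence

THRESHOLD_MS = 100  # mirrors `threshold: 100` in the condition above


def p50_latency_ms(latencies_s: Sequence[float]) -> float:
    """Median of latencies given in seconds, converted to milliseconds."""
    return statistics.median(latencies_s) * 1000


# Hypothetical samples: mostly fast calls plus one slow outlier.
mostly_fast_s = [0.040, 0.055, 0.090, 0.120, 0.300]
# Hypothetical samples: a broad slowdown affecting most calls.
broad_slowdown_s = [0.090, 0.110, 0.130, 0.150, 0.300]

for label, samples in (("mostly fast", mostly_fast_s),
                       ("broad slowdown", broad_slowdown_s)):
    p50 = p50_latency_ms(samples)
    print(f"{label}: p50 = {p50:.1f} ms, above threshold: {p50 > THRESHOLD_MS}")
```

In this sketch the single 300 ms outlier does not push the median over 100 ms, while the broad slowdown does. A median-based condition therefore reacts to a general slowdown rather than to a few slow calls.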