feat: Release 2024-04-25 #2383

Merged Apr 25, 2024 (79 commits)

Commits
df1a653
feat(Speedscale): Use framework config install
JuliaNocera Mar 22, 2024
da70883
Ray new Quickstart.
pkudikyala Mar 26, 2024
26ec4e5
updated dashboard and alerts.
pkudikyala Mar 28, 2024
2dc2e6a
Merge branch 'release' into NR-249390
pkudikyala Mar 28, 2024
265aa86
feat: New singlestore otel quickstart
jcountsNR Apr 9, 2024
bfae19c
chore: Sanitized ss dashbaord
jcountsNR Apr 9, 2024
fc19359
feat(sidekiq-prometheus): Added a quickstart for Sidekiq Prometheus m…
Apr 11, 2024
e5ec011
fix(sidekiq-prometheus): Added icon in config
Apr 11, 2024
3f05a5e
Merge branch 'release' into jcounts/speedscale-data-source
aswanson-nr Apr 11, 2024
dbc508b
Merge branch 'release' into jcounts/speedscale-data-source
aswanson-nr Apr 12, 2024
bbc2b97
chore: Update ss logo
jcountsNR Apr 12, 2024
a7b7c46
Merge branch 'release' into NR-249390
pkudikyala Apr 15, 2024
ba18101
Merge branch 'release' into release
sjyothi54 Apr 16, 2024
ad8e9f0
updated the review comments.
pkudikyala Apr 17, 2024
7373ff7
Merge branch 'release' into NR-249390
pkudikyala Apr 17, 2024
8878620
Merge branch 'release' into NR-249390
pkudikyala Apr 18, 2024
44e6835
Update apm-signals.json
DarrenDoyle Apr 19, 2024
6ceadff
Updated doc URL's for AI quickstarts
sjyothi54 Apr 19, 2024
738d1b0
Updated the syntax
sjyothi54 Apr 19, 2024
5c537d7
Updated url
sjyothi54 Apr 23, 2024
9083d2e
feat(triton): Added new integration NVIDIA Triton.
pkudikyala Apr 23, 2024
1864676
Updated NR1_addData and NR1_sys keywords
sjyothi54 Apr 23, 2024
06f8def
Updated NR1_addData and NR1_sys keywords
sjyothi54 Apr 23, 2024
f14706b
Updated NR1_addData and NR1_sys keywords
RamanaReddy8801 Apr 23, 2024
2def42c
Added NR1_addData & NR1_sys keywords.
pkudikyala Apr 23, 2024
8dcdc60
added NR1_sys keyword
pkudikyala Apr 23, 2024
7d6ccfd
chore: update datasource logo
jcountsNR Apr 23, 2024
f0ffc0b
Merge branch 'release' into joseph/singlestore
jcountsNR Apr 23, 2024
64e55f1
updated config descriptions.
pkudikyala Apr 24, 2024
a01a52d
fix: Updated the NR1_addData and NR1_sys keywords to quickstarts.
pkudikyala Apr 24, 2024
5ea9a80
Merge pull request #2376 from sjyothi54/NR-261503
RamanaReddy8801 Apr 24, 2024
079fae8
Merge branch 'release' into NR-261501
sjyothi54 Apr 24, 2024
5e06041
feat: Add IBM MQ to Add Data
Apr 24, 2024
6c4cb8d
Merge pull request #2381 from newrelic/andrew/ibm-mb-add-data
Apr 24, 2024
99f1c5c
Merge branch 'newrelic:release' into release
sjyothi54 Apr 25, 2024
5632804
Updated dashboard and alerts
sjyothi54 Apr 25, 2024
5e5f49c
Merge branch 'release' into NR-249390
pkudikyala Apr 25, 2024
ffe5439
Merge branch 'release' into patch-2
pkudikyala Apr 25, 2024
ac9adff
Merge branch 'release' into release
pkudikyala Apr 25, 2024
f665334
Merge branch 'release' into joseph/singlestore
pkudikyala Apr 25, 2024
0fae0d7
Merge pull request #2373 from DarrenDoyle/patch-2
RamanaReddy8801 Apr 25, 2024
889594c
Merge branch 'release' into joseph/singlestore
RamanaReddy8801 Apr 25, 2024
a736ccc
Merge branch 'release' into NR-249390
sjyothi54 Apr 25, 2024
bff5e02
Merge pull request #2341 from pkudikyala/NR-249390
sjyothi54 Apr 25, 2024
2eb0ee3
chore: generate UUID(s) [skip ci]
nr-opensource-bot Apr 25, 2024
5ca69cd
Merge branch 'release' into NR-262340
RamanaReddy8801 Apr 25, 2024
4e13724
Merge pull request #2382 from sjyothi54/NR-262340
RamanaReddy8801 Apr 25, 2024
1ba8451
Merge branch 'release' into NR-255366
RamanaReddy8801 Apr 25, 2024
25c8a5c
Merge branch 'release' into joseph/singlestore
RamanaReddy8801 Apr 25, 2024
940a818
updated review changes
pkudikyala Apr 25, 2024
600ff04
Merge branch 'release' into NR-261873
RamanaReddy8801 Apr 25, 2024
92af6c6
Merge pull request #2362 from newrelic/joseph/singlestore
RamanaReddy8801 Apr 25, 2024
c79a427
Merge branch 'release' into NR-257908-triton
pkudikyala Apr 25, 2024
8956be2
chore: generate UUID(s) [skip ci]
nr-opensource-bot Apr 25, 2024
bc40b72
Merge branch 'release' into NR-257908-triton
pkudikyala Apr 25, 2024
30e4a75
Merge branch 'release' into NR-255366
sjyothi54 Apr 25, 2024
5c970a4
Merge pull request #2375 from pkudikyala/NR-257908-triton
sjyothi54 Apr 25, 2024
c4b94bd
Merge branch 'release' into NR-261873
sjyothi54 Apr 25, 2024
734299a
chore: generate UUID(s) [skip ci]
nr-opensource-bot Apr 25, 2024
b11c616
Merge branch 'release' into release
sjyothi54 Apr 25, 2024
b85300c
Merge branch 'release' into NR-261501
sjyothi54 Apr 25, 2024
88d9222
Merge branch 'release' into NR-261873
sjyothi54 Apr 25, 2024
900a3d0
Merge pull request #2377 from RamanaReddy8801/NR-261501
sjyothi54 Apr 25, 2024
9168da9
Merge branch 'release' into NR-255366
RamanaReddy8801 Apr 25, 2024
d89a761
Merge branch 'release' into NR-261873
sjyothi54 Apr 25, 2024
17f8d01
Merge pull request #2380 from pkudikyala/NR-261873
RamanaReddy8801 Apr 25, 2024
6fdf72c
Merge branch 'release' into NR-255366
RamanaReddy8801 Apr 25, 2024
de59887
Merge pull request #2374 from sjyothi54/NR-255366
RamanaReddy8801 Apr 25, 2024
0f416c2
Merge branch 'release' into NR-261502
Apr 25, 2024
768d4e2
Merge branch 'release' into release
Apr 25, 2024
7a1670d
feat: Fix capitalization
d3caf Apr 25, 2024
8adda53
Merge branch 'release' into jcounts/speedscale-data-source
Apr 25, 2024
5fee177
Merge pull request #2378 from pkudikyala/NR-261502
Apr 25, 2024
e7064dd
Merge branch 'release' into release
Apr 25, 2024
62d426c
Merge pull request #2366 from jospdeleon/release
Apr 25, 2024
05b824c
Merge branch 'release' into jcounts/speedscale-data-source
Apr 25, 2024
6347177
chore: generate UUID(s) [skip ci]
nr-opensource-bot Apr 25, 2024
1327f06
Merge branch 'release' into jcounts/speedscale-data-source
Apr 25, 2024
54819b2
Merge pull request #2336 from newrelic/jcounts/speedscale-data-source
Apr 25, 2024
37 changes: 37 additions & 0 deletions alert-policies/nvidia-triton/CpuUsedPercentage.yml
@@ -0,0 +1,37 @@
name: CPU Utilization (%)

description: |+
  This alert is triggered when the CPU utilization exceeds 85% for 5 minutes.

type: STATIC
nrql:
  query: "SELECT average(cpuPercent) AS `CPU used %` FROM SystemSample"

# Function used to aggregate the NRQL query value(s) for comparison to the terms.threshold (Default: SINGLE_VALUE)
valueFunction: SINGLE_VALUE

# List of Critical and Warning thresholds for the condition
terms:
  - priority: CRITICAL
    # Operator used to compare against the threshold.
    operator: ABOVE
    # Value that triggers a violation
    threshold: 90
    # Time in seconds; 120 - 3600
    thresholdDuration: 300
    # How many data points must be in violation for the duration
    thresholdOccurrences: ALL

  - priority: WARNING
    # Operator used to compare against the threshold.
    operator: ABOVE
    # Value that triggers a violation
    threshold: 85
    # Time in seconds; 120 - 3600
    thresholdDuration: 300
    # How many data points must be in violation for the duration
    thresholdOccurrences: ALL

# Duration after which a violation automatically closes
# Time in seconds; 300 - 2592000 (Default: 86400 [1 day])
violationTimeLimitSeconds: 86400
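As a rough illustration of how the two terms in this file are meant to behave, the Python sketch below models a STATIC condition with operator ABOVE and thresholdOccurrences ALL: a priority fires only when every value in the 300-second window is above its threshold. This is a simplified model and not New Relic's evaluation engine; the function name `evaluate_above` and the idea of passing one aggregated value per minute are assumptions made for the example.

```python
from typing import Optional, Sequence

# Thresholds copied from the terms in CpuUsedPercentage.yml.
CRITICAL_THRESHOLD = 90.0
WARNING_THRESHOLD = 85.0


def evaluate_above(window: Sequence[float]) -> Optional[str]:
    """Return the highest priority whose threshold every value exceeds.

    `window` is assumed (hypothetically) to hold the aggregated query
    results covering one full thresholdDuration, e.g. five one-minute
    values for the 300 seconds configured in the YAML.
    """
    if not window:
        # No data in the window: this sketch treats it as "no violation".
        return None
    if all(value > CRITICAL_THRESHOLD for value in window):
        return "CRITICAL"
    if all(value > WARNING_THRESHOLD for value in window):
        return "WARNING"
    return None


if __name__ == "__main__":
    print(evaluate_above([91, 95, 92, 93, 99]))
    print(evaluate_above([86, 95, 88, 87, 91]))
    print(evaluate_above([86, 84, 99, 99, 99]))
```

Under this model, the 85% mentioned in the description corresponds to the WARNING term; the CRITICAL term needs the whole window above 90%.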
27 changes: 27 additions & 0 deletions alert-policies/nvidia-triton/RequestFailures.yml
@@ -0,0 +1,27 @@
name: HTTP Request Failures

description: |+
  This alert is triggered when HTTP Request Failures exceed 1 every 5 minutes.

type: STATIC
nrql:
  query: "SELECT latest(nv_inference_request_failure) FROM Metric"

# Function used to aggregate the NRQL query value(s) for comparison to the terms.threshold (Default: SINGLE_VALUE)
valueFunction: SINGLE_VALUE

# List of Critical and Warning thresholds for the condition
terms:
  - priority: CRITICAL
    # Operator used to compare against the threshold.
    operator: ABOVE
    # Value that triggers a violation
    threshold: 1
    # Time in seconds; 120 - 3600
    thresholdDuration: 300
    # How many data points must be in violation for the duration
    thresholdOccurrences: ALL

# Duration after which a violation automatically closes
# Time in seconds; 300 - 2592000 (Default: 86400 [1 day])
violationTimeLimitSeconds: 86400
37 changes: 37 additions & 0 deletions alert-policies/nvidia-triton/StorageUsagePercentage.yml
@@ -0,0 +1,37 @@
name: Storage Utilization (%)

description: |+
  This alert is triggered when the storage utilization exceeds 85% for 5 minutes.

type: STATIC
nrql:
  query: "SELECT average(diskUsedPercent) AS `Storage used %` FROM StorageSample"

# Function used to aggregate the NRQL query value(s) for comparison to the terms.threshold (Default: SINGLE_VALUE)
valueFunction: SINGLE_VALUE

# List of Critical and Warning thresholds for the condition
terms:
  - priority: CRITICAL
    # Operator used to compare against the threshold.
    operator: ABOVE
    # Value that triggers a violation
    threshold: 90
    # Time in seconds; 120 - 3600
    thresholdDuration: 300
    # How many data points must be in violation for the duration
    thresholdOccurrences: ALL

  - priority: WARNING
    # Operator used to compare against the threshold.
    operator: ABOVE
    # Value that triggers a violation
    threshold: 85
    # Time in seconds; 120 - 3600
    thresholdDuration: 300
    # How many data points must be in violation for the duration
    thresholdOccurrences: ALL

# Duration after which a violation automatically closes
# Time in seconds; 300 - 2592000 (Default: 86400 [1 day])
violationTimeLimitSeconds: 86400
31 changes: 31 additions & 0 deletions alert-policies/ray/ActiveNodes.yml
@@ -0,0 +1,31 @@
# Name of the alert
name: Ray Active Nodes

# Description and details
description: |+
  This alert triggers when there are no active nodes for 5 minutes.
# Type of alert
type: STATIC

# NRQL query
nrql:
  query: "SELECT latest(ray_cluster_active_nodes) AS 'active nodes' FROM Metric"

# Function used to aggregate the NRQL query value(s) for comparison to the terms.threshold (Default: SINGLE_VALUE)
valueFunction: SINGLE_VALUE

# List of Critical and Warning thresholds for the condition
terms:
  - priority: CRITICAL
    # Operator used to compare against the threshold.
    operator: BELOW
    # Value that triggers a violation
    threshold: 1
    # Time in seconds; 120 - 3600
    thresholdDuration: 300
    # How many data points must be in violation for the duration
    thresholdOccurrences: ALL

# Duration after which a violation automatically closes
# Time in seconds; 300 - 2592000 (Default: 86400 [1 day])
violationTimeLimitSeconds: 86400
41 changes: 41 additions & 0 deletions alert-policies/ray/FreeDiskPercentage.yml
@@ -0,0 +1,41 @@
# Name of the alert
name: Ray Free Disk Percentage

# Description and details
description: |+
  This alert is triggered if there is less than 10% of free disk space for 5 minutes.
# Type of alert
type: STATIC

# NRQL query
nrql:
  query: "SELECT (latest(ray_node_disk_free) / 1e+9) / (latest(ray_node_disk_usage) / 1e+9 + latest(ray_node_disk_free) / 1e+9) * 100 AS 'free disk %' FROM Metric"

# Function used to aggregate the NRQL query value(s) for comparison to the terms.threshold (Default: SINGLE_VALUE)
valueFunction: SINGLE_VALUE

# List of Critical and Warning thresholds for the condition
terms:
  - priority: CRITICAL
    # Operator used to compare against the threshold.
    operator: BELOW
    # Value that triggers a violation
    threshold: 10
    # Time in seconds; 120 - 3600
    thresholdDuration: 300
    # How many data points must be in violation for the duration
    thresholdOccurrences: ALL

  - priority: WARNING
    # Operator used to compare against the threshold.
    operator: BELOW
    # Value that triggers a violation
    threshold: 15
    # Time in seconds; 120 - 3600
    thresholdDuration: 300
    # How many data points must be in violation for the duration
    thresholdOccurrences: ALL

# Duration after which a violation automatically closes
# Time in seconds; 300 - 2592000 (Default: 86400 [1 day])
violationTimeLimitSeconds: 86400
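To make the arithmetic in the NRQL expression of this file easier to check, here is a small Python sketch of the same formula. It assumes, as the query itself appears to, that `ray_node_disk_free` and `ray_node_disk_usage` are byte counts for free and used space; the function name `free_disk_percent` and the sample values are hypothetical.

```python
GIGABYTE = 1e9  # Same scaling factor as the 1e+9 in the NRQL query.


def free_disk_percent(free_bytes: float, used_bytes: float) -> float:
    """Mirror the query: (free / 1e9) / (used / 1e9 + free / 1e9) * 100."""
    free_gb = free_bytes / GIGABYTE
    used_gb = used_bytes / GIGABYTE
    total_gb = used_gb + free_gb
    if total_gb == 0:
        # The NRQL expression has no such guard; it is added here only
        # so the sketch fails with a clear message.
        raise ValueError("free and used bytes are both zero")
    return free_gb / total_gb * 100


if __name__ == "__main__":
    # 20 GB free out of 200 GB total sits on the CRITICAL boundary (10%).
    print(free_disk_percent(20e9, 180e9))
    # 30 GB free out of 200 GB total sits on the WARNING boundary (15%).
    print(free_disk_percent(30e9, 170e9))
```

Because the same 1e9 divisor is applied to the numerator and to both terms of the denominator, it cancels out: the result matches free / (used + free) * 100 computed directly on bytes.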
41 changes: 41 additions & 0 deletions alert-policies/ray/RPCHealthCheckLatency.yml
@@ -0,0 +1,41 @@
# Name of the alert
name: Ray RPC Health Check Latency

# Description and details
description: |+
  This alert is triggered if the RPC health check latency exceeds 2 seconds for 5 minutes.
# Type of alert
type: STATIC

# NRQL query
nrql:
  query: "SELECT latest(ray_health_check_rpc_latency_ms_bucket) / 1000 as 'rpc latency' FROM Metric"

# Function used to aggregate the NRQL query value(s) for comparison to the terms.threshold (Default: SINGLE_VALUE)
valueFunction: SINGLE_VALUE

# List of Critical and Warning thresholds for the condition
terms:
  - priority: CRITICAL
    # Operator used to compare against the threshold.
    operator: ABOVE
    # Value that triggers a violation
    threshold: 2
    # Time in seconds; 120 - 3600
    thresholdDuration: 300
    # How many data points must be in violation for the duration
    thresholdOccurrences: ALL

  - priority: WARNING
    # Operator used to compare against the threshold.
    operator: ABOVE
    # Value that triggers a violation
    threshold: 1.5
    # Time in seconds; 120 - 3600
    thresholdDuration: 300
    # How many data points must be in violation for the duration
    thresholdOccurrences: ALL

# Duration after which a violation automatically closes
# Time in seconds; 300 - 2592000 (Default: 86400 [1 day])
violationTimeLimitSeconds: 86400
2 changes: 1 addition & 1 deletion alert-policies/snowflake/FailedQueries.yml
@@ -5,7 +5,7 @@ description: |+

 type: STATIC
 nrql:
-  query: "FROM SnowflakeVirtualWarehouse SELECT uniqueCount(QUERY_ID) AS 'Queries' WHERE EXECUTION_STATUS = 'FAIL'"
+  query: "FROM snowflakeLongestQueriesSample SELECT uniqueCount(QUERY_ID) AS 'Queries' WHERE EXECUTION_STATUS = 'FAIL'"

 # Function used to aggregate the NRQL query value(s) for comparison to the terms.threshold (Default: SINGLE_VALUE)
 valueFunction: SINGLE_VALUE
2 changes: 1 addition & 1 deletion alert-policies/snowflake/QueuedQueries.yml
@@ -5,7 +5,7 @@ description: |+

 type: STATIC
 nrql:
-  query: "FROM SnowflakeVirtualWarehouse SELECT latest(QUEUED_LOAD_AVERAGE) as 'Queued Queries'"
+  query: "FROM snowflakeWarehouseLoadHistorySample SELECT latest(QUEUED_LOAD_AVERAGE) as 'Queued Queries'"

 # Function used to aggregate the NRQL query value(s) for comparison to the terms.threshold (Default: SINGLE_VALUE)
 valueFunction: SINGLE_VALUE
2 changes: 1 addition & 1 deletion alert-policies/snowflake/SpilledLocalStorage.yml
@@ -5,7 +5,7 @@ description: |+

 type: STATIC
 nrql:
-  query: "SELECT latest(BYTES_SPILLED_TO_LOCAL_STORAGE_AVERAGE) as 'Bytes Spilled to Local Storage' FROM SnowflakeVirtualWarehouse"
+  query: "SELECT latest(BYTES_SPILLED_TO_LOCAL_STORAGE_AVERAGE) as 'Bytes Spilled to Local Storage' FROM snowflakeQueryHistorySample"

 # Function used to aggregate the NRQL query value(s) for comparison to the terms.threshold (Default: SINGLE_VALUE)
 valueFunction: SINGLE_VALUE
2 changes: 1 addition & 1 deletion alert-policies/snowflake/SpilledRemoteStorage.yml
@@ -5,7 +5,7 @@ description: |+

 type: STATIC
 nrql:
-  query: "SELECT latest(BYTES_SPILLED_TO_REMOTE_STORAGE_AVERAGE) as 'Bytes Spilled to Remote Storage' FROM SnowflakeVirtualWarehouse"
+  query: "SELECT latest(BYTES_SPILLED_TO_REMOTE_STORAGE_AVERAGE) as 'Bytes Spilled to Remote Storage' FROM snowflakeQueryHistorySample"

 # Function used to aggregate the NRQL query value(s) for comparison to the terms.threshold (Default: SINGLE_VALUE)
 valueFunction: SINGLE_VALUE
4 changes: 2 additions & 2 deletions dashboards/apm-signals/apm-signals.json
@@ -459,7 +459,7 @@
 "nrqlQueries": [
   {
     "accountIds": [],
-    "query": "SELECT apdex(apm.service.apdex) as 'Apdex' FROM Metric WHERE appName LIKE '%' WHERE appName = 'Proxy-East' SINCE 10 minutes ago COMPARE WITH 1 day ago"
+    "query": "SELECT apdex(apm.service.apdex) as 'Apdex' FROM Metric WHERE appName LIKE '%' SINCE 10 minutes ago COMPARE WITH 1 day ago"
   }
 ],
 "platformOptions": {
@@ -1069,4 +1069,4 @@
}
],
"variables": []
}
}
Binary file added dashboards/nvidia-triton/nvidia-triton-01.png
Binary file added dashboards/nvidia-triton/nvidia-triton-02.png
Binary file added dashboards/nvidia-triton/nvidia-triton-03.png