
Commit

Updated review comments
RamanaReddy8801 committed Nov 2, 2023
1 parent 15ecf0b commit fee57b9
Showing 5 changed files with 15 additions and 8 deletions.
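As a quick consistency check, the summary totals can be reproduced from the per-file counts reported for the three text files in this commit. The two logo.png files are binary and contribute no line changes. The following is a minimal Python sketch, with the counts copied by hand from this page:

```python
# Per-file (additions, deletions) as reported in this commit view.
# The binary logo.png files are not counted as line changes.
changes = {
    "alert-policies/nvidia-dcgm/HighTemperature.yml": (1, 1),
    "dashboards/nvidia-dcgm/nvidia-dcgm.json": (3, 3),
    "quickstarts/nvidia-dcgm/config.yml": (11, 4),
}

# Sum each column to recover the commit-wide totals.
additions = sum(added for added, _ in changes.values())
deletions = sum(deleted for _, deleted in changes.values())

print(additions, deletions)  # 15 8
```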
2 changes: 1 addition & 1 deletion alert-policies/nvidia-dcgm/HighTemperature.yml
@@ -1,7 +1,7 @@
name: High GPU Temperature

description: |+
- This alert is triggered when the Nvidia GPU Temperature is above 90%.
+ TThis alert is triggered when the NVIDIA GPU Temperature is above 90%.
type: STATIC
nrql:
6 changes: 3 additions & 3 deletions dashboards/nvidia-dcgm/nvidia-dcgm.json
@@ -19,7 +19,7 @@
"id": "viz.markdown"
},
"rawConfiguration": {
- "text": "![NVIDIA DCGM](https://assets.nvidiagrid.net/ngc/logos/DCGM.png)"
+ "text": "![NVIDIA DCGM](https://github-production-user-asset-6210df.s3.amazonaws.com/104448291/279630087-461421da-3f8b-4d71-bac7-2e20d58b4180.png)"
}
},
{
@@ -83,7 +83,7 @@
}
},
{
- "title": "Total nvlink bandwidth",
+ "title": "Total NVLink bandwidth",
"layout": {
"column": 10,
"row": 1,
@@ -125,7 +125,7 @@
"id": "viz.markdown"
},
"rawConfiguration": {
- "text": "**About**\n\nInstrument your application with New Relic - [Add Data](https://one.newrelic.com).\n\nInstrument NVIDIA DCGM with New Relic using the [documentation](https://docs.newrelic.com/).\n\n[Please rate this dashboard](https://docs.google.com/forms/d/e/1FAIpQLSclR38J8WbbB2J1tHnllKUkzWZkJhf4SrJGyavpMd4t82NjnQ/viewform?usp=pp_url&entry.1615922415=nvidia-dcgm) here and let us know how we can improve it for you."
+ "text": "**About**\n\nInstrument your application with New Relic - [Add Data](https://one.newrelic.com).\n\nInstrument NVIDIA DCGM with New Relic using the [documentation](https://docs.newrelic.com/docs/infrastructure/host-integrations/host-integrations-list/nvidia-dcgm-integration/).\n\n[Please rate this dashboard](https://docs.google.com/forms/d/e/1FAIpQLSclR38J8WbbB2J1tHnllKUkzWZkJhf4SrJGyavpMd4t82NjnQ/viewform?usp=pp_url&entry.1615922415=nvidia-dcgm) here and let us know how we can improve it for you."
}
},
{
Binary file modified data-sources/nvidia-dcgm/logo.png
15 changes: 11 additions & 4 deletions quickstarts/nvidia-dcgm/config.yml
@@ -8,8 +8,8 @@ description: |
## What’s included in this quickstart?
New Relic NVIDIA DCGM monitoring quickstart provides quality out-of-the-box reporting:
- - Dashboards (power usage, gpu utilisation, clocks, etc)
- - Alerts for ZooKeeper (gpu temperature, xid error)
+ - Dashboards (power usage, GPU utilisation, clocks, etc)
+ - Alerts for NVIDIA DCGM (GPU temperature, Xid error)
summary: |
@@ -27,8 +27,15 @@ documentation:
url: https://docs.newrelic.com/docs/infrastructure/host-integrations/host-integrations-list/nvidia-dcgm-integration/
keywords:
- NVIDIA DCGM
- - dcgm
- - gpu
+ - AI Acceleration
+ - Machine Learning Acceleration
+ - GPU Management
+ - AI Management
+ - Machine Learning Management
+ - Deep Learning Performance
+ - AI Performance
+ - GPU Optimization
+ - AI Optimization
dataSourceIds:
- nvidia-dcgm
dashboards:
Binary file modified quickstarts/nvidia-dcgm/logo.png

0 comments on commit fee57b9
