Add the alert triage agent example #193
dagardner-nv
left a comment
Overall looks good, added some comments.
Resolved review comments on:
- examples/alert_triage_agent/src/aiq_alert_triage_agent/configs/config.yml (outdated)
- examples/alert_triage_agent/src/aiq_alert_triage_agent/configs/config.yml (outdated)
- examples/alert_triage_agent/src/aiq_alert_triage_agent/hardware_check_tool.py
- examples/alert_triage_agent/src/aiq_alert_triage_agent/host_performance_check_tool.py (outdated)
- examples/alert_triage_agent/src/aiq_alert_triage_agent/network_connectivity_check_tool.py
/ok to test 34a5cbb
Signed-off-by: Hsin Chen <[email protected]>
Signed-off-by: Hsin Chen <[email protected]>
34a5cbb to d1030f1
Co-authored-by: David Gardner <[email protected]> Signed-off-by: hsin-c <[email protected]>
Co-authored-by: David Gardner <[email protected]> Signed-off-by: hsin-c <[email protected]>
Signed-off-by: Hsin Chen <[email protected]>
…sinc-alert-triage-agent
Signed-off-by: Hsin Chen <[email protected]>
/ok to test bd2679b
dagardner-nv
left a comment
Looks good, merge once CI is passing
Co-authored-by: David Gardner <[email protected]> Signed-off-by: hsin-c <[email protected]>
/ok to test e47204f
Signed-off-by: Hsin Chen <[email protected]>
Pull Request Overview
This PR adds a new multi-agent example for alert triaging by introducing an HTTP server, several diagnostic tools, and workflow configuration files. Key changes include:
- Implementation of a Flask-based HTTP server (run.py) for receiving alerts.
- Addition of multiple diagnostic tools (e.g., network connectivity, maintenance check, hardware check) and a triage workflow (register.py, prompts.py, playbooks.py).
- New configuration files and documentation updates to support the alert triage example.
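As a rough illustration of the "receive an alert over HTTP" step summarized above, the following is a minimal sketch and not the PR's `run.py`. It uses the standard library's `http.server` in place of Flask so that it runs without extra dependencies. The `/alerts` path, the `handle_alert` helper, and the in-memory `RECEIVED_ALERTS` list are all hypothetical names chosen for this sketch.

```python
import json
from http.server import BaseHTTPRequestHandler, HTTPServer

# Hypothetical in-memory store; a real server would pass alerts to the triage workflow.
RECEIVED_ALERTS = []


def handle_alert(path, body):
    """Validate one POSTed alert and return (status_code, response_payload)."""
    if path != "/alerts":
        return 404, {"error": "not found"}
    try:
        alert = json.loads(body)
    except ValueError:  # covers invalid JSON and undecodable bytes
        return 400, {"error": "body must be valid JSON"}
    RECEIVED_ALERTS.append(alert)
    return 200, {"status": "received"}


class AlertHandler(BaseHTTPRequestHandler):
    """Thin HTTP wrapper around handle_alert."""

    def do_POST(self):
        length = int(self.headers.get("Content-Length", 0))
        status, payload = handle_alert(self.path, self.rfile.read(length))
        data = json.dumps(payload).encode("utf-8")
        self.send_response(status)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(data)))
        self.end_headers()
        self.wfile.write(data)


def make_server(host="127.0.0.1", port=0):
    """Build the server; port 0 asks the OS for any free port."""
    return HTTPServer((host, port), AlertHandler)
```

Calling `make_server(port=5000).serve_forever()` would start listening. Keeping the validation in a plain function makes it testable without opening a socket.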
Reviewed Changes
Copilot reviewed 18 out of 24 changed files in this pull request and generated 1 comment.
| File | Description |
|---|---|
| src/aiq_alert_triage_agent/run.py | Adds the HTTP server and alert processing logic. |
| src/aiq_alert_triage_agent/register.py | Defines the triage workflow and tool orchestration. |
| (others) | Introduces diagnostic tools, prompts, playbooks, configuration, and documentation for the alert triage agent. |
Files not reviewed (6)
- .gitattributes: Language not supported
- examples/alert_triage_agent/.env_example: Language not supported
- examples/alert_triage_agent/src/aiq_alert_triage_agent/data/ata_diagram.png: Language not supported
- examples/alert_triage_agent/src/aiq_alert_triage_agent/data/benign_fallback_test_data.json: Language not supported
- examples/alert_triage_agent/src/aiq_alert_triage_agent/data/maintenance_static_dataset.csv: Language not supported
- examples/alert_triage_agent/src/aiq_alert_triage_agent/data/test_data.csv: Language not supported
Comments suppressed due to low confidence (1)
examples/alert_triage_agent/src/aiq_alert_triage_agent/register.py:156
- [nitpick] The variable name 'input' shadows the built-in function. Renaming it (e.g., 'alert_data') would improve code clarity and avoid confusion.
input = row["alert"]
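The nitpick above can be illustrated with a small sketch. The function names and the `rows` structure here are hypothetical and are not taken from `register.py`; only the `input = row["alert"]` assignment and the suggested `alert_data` name come from the review comment.

```python
import builtins


def last_alert_shadowing(rows):
    """Mirrors the flagged pattern: assigning to `input` makes it a local name."""
    for row in rows:
        input = row["alert"]  # shadows the built-in within this function
    # `input` now refers to the alert value, so the built-in is unreachable by name here.
    # (Assumes `rows` is non-empty; otherwise `input` would be unbound.)
    return callable(input)


def collect_alerts(rows):
    """Same loop with the suggested rename; the built-in stays untouched."""
    alerts = []
    for row in rows:
        alert_data = row["alert"]
        alerts.append(alert_data)
    return alerts, input is builtins.input
```

With string alerts, the first function returns `False` because the local `input` is a string rather than the built-in function, while the second leaves `input` resolving to `builtins.input`.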
Signed-off-by: Hsin Chen <[email protected]>
Signed-off-by: Hsin Chen <[email protected]>
…it into hsinc-alert-triage-agent
Signed-off-by: Hsin Chen <[email protected]>
/ok to test b19c20c
Signed-off-by: Hsin Chen <[email protected]>
/ok to test a5e2a23
/ok to test 5e2a23517e834b33f685afacfcac40ef8ad09ca
@dagardner-nv, there was an error processing your request. See the following link for more information: https://docs.gha-runners.nvidia.com/cpr/e/2/
/ok to test a5e2a23
For easy quick start Signed-off-by: Anuradha Karuppiah <[email protected]>
Signed-off-by: Anuradha Karuppiah <[email protected]>
Signed-off-by: Hsin Chen <[email protected]>
This is to allow the example to be installed as part of "uv sync --all-groups --all-extras" Signed-off-by: Anuradha Karuppiah <[email protected]>
Signed-off-by: Anuradha Karuppiah <[email protected]>
Signed-off-by: Anuradha Karuppiah <[email protected]>
Signed-off-by: Hsin Chen <[email protected]>
/ok to test 68b4f86
/merge
This PR adds a new multi-agent example for alert triaging. Tested offline and verified the output to match expected labels on the test dataset (in the `data/` folder).
Authors:
- https://github.com/hsin-c
- Anuradha Karuppiah (https://github.com/AnuradhaKaruppiah)
Approvers:
- David Gardner (https://github.com/dagardner-nv)
URL: NVIDIA#193
Signed-off-by: Yuchen Zhang <[email protected]>
This PR adds a new multi-agent example for alert triaging.
Tested offline and verified that the output matches the expected labels on the test dataset (in the `data/` folder).