@hsin-c (Contributor) commented May 2, 2025

This PR adds a new multi-agent example for alert triaging.

Tested offline and verified that the output matches the expected labels on the test dataset (in the `data/` folder).

copy-pr-bot bot commented May 2, 2025

This pull request requires additional validation before any workflows can run on NVIDIA's runners.

Pull request vetters can view their responsibilities here.

Contributors can view more details about this message here.

@dagardner-nv dagardner-nv added improvement Improvement to existing functionality non-breaking Non-breaking change labels May 2, 2025
@dagardner-nv (Contributor) left a comment

Overall, looks good; I added some comments.

@dagardner-nv (Contributor)

/ok to test 34a5cbb

@hsin-c hsin-c force-pushed the hsinc-alert-triage-agent branch from 34a5cbb to d1030f1 Compare May 2, 2025 17:41
@dagardner-nv (Contributor)

/ok to test bd2679b

@dagardner-nv (Contributor) left a comment

Looks good; merge once CI is passing.

Co-authored-by: David Gardner <[email protected]>
Signed-off-by: hsin-c <[email protected]>
@dagardner-nv (Contributor)

/ok to test e47204f

Signed-off-by: Hsin Chen <[email protected]>
@AnuradhaKaruppiah AnuradhaKaruppiah requested a review from Copilot May 2, 2025 20:26
Copilot AI (Contributor) left a comment

Pull Request Overview

This PR adds a new multi-agent example for alert triaging by introducing an HTTP server, several diagnostic tools, and workflow configuration files. Key changes include:

  • Implementation of a Flask-based HTTP server (run.py) for receiving alerts.
  • Addition of multiple diagnostic tools (e.g., network connectivity, maintenance check, hardware check) and a triage workflow (register.py, prompts.py, playbooks.py).
  • New configuration files and documentation updates to support the alert triage example.

Reviewed Changes

Copilot reviewed 18 out of 24 changed files in this pull request and generated 1 comment.

File | Description
--- | ---
src/aiq_alert_triage_agent/run.py | Adds the HTTP server and alert processing logic.
src/aiq_alert_triage_agent/register.py | Defines the triage workflow and tool orchestration.
(others) | Introduces diagnostic tools, prompts, playbooks, configuration, and documentation for the alert triage agent.
Files not reviewed (6)
  • .gitattributes: Language not supported
  • examples/alert_triage_agent/.env_example: Language not supported
  • examples/alert_triage_agent/src/aiq_alert_triage_agent/data/ata_diagram.png: Language not supported
  • examples/alert_triage_agent/src/aiq_alert_triage_agent/data/benign_fallback_test_data.json: Language not supported
  • examples/alert_triage_agent/src/aiq_alert_triage_agent/data/maintenance_static_dataset.csv: Language not supported
  • examples/alert_triage_agent/src/aiq_alert_triage_agent/data/test_data.csv: Language not supported
Comments suppressed due to low confidence (1)

examples/alert_triage_agent/src/aiq_alert_triage_agent/register.py:156

  • [nitpick] The variable name 'input' shadows Python's built-in input() function. Renaming it (e.g., 'alert_data') would improve code clarity and avoid confusion.
input = row["alert"]
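The nitpick can be illustrated with a minimal, self-contained sketch. It assumes only what the quoted line shows, namely that each `row` is a mapping with an `"alert"` key; the sample rows and the `process_alert` helper are hypothetical and are not the actual code in `register.py`.

```python
import builtins


def shadowed(row: dict) -> bool:
    # Pattern flagged by the review: binding `input` inside this function
    # makes it a local str, so the built-in input() is unreachable here.
    input = row["alert"]
    return callable(input)  # False: `input` now refers to a string


def process_alert(alert_data: str) -> str:
    # Hypothetical stand-in for the real per-alert triage logic.
    return alert_data.strip().lower()


# Hypothetical rows shaped like the quoted line expects (an "alert" key).
rows = [{"alert": "  HostDown on node-1 "}, {"alert": "DiskFull on node-2"}]

results = []
for row in rows:
    alert_data = row["alert"]  # suggested name; hides no built-in
    results.append(process_alert(alert_data))

print(results)                  # ['hostdown on node-1', 'diskfull on node-2']
print(shadowed(rows[0]))        # False
print(input is builtins.input)  # True: module-level `input` was never rebound
```

The rename is purely local, so behavior does not change; the benefit is that `input()` stays usable in that scope and readers are not misled about what the name refers to.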

@dagardner-nv (Contributor)

/ok to test b19c20c

@hsin-c (Contributor, Author) commented May 2, 2025

/ok to test a5e2a23

@dagardner-nv (Contributor)

/ok to test 5e2a23517e834b33f685afacfcac40ef8ad09ca

copy-pr-bot bot commented May 2, 2025

/ok to test 5e2a23517e834b33f685afacfcac40ef8ad09ca

@dagardner-nv, there was an error processing your request: E2

See the following link for more information: https://docs.gha-runners.nvidia.com/cpr/e/2/

@dagardner-nv (Contributor)

/ok to test a5e2a23

AnuradhaKaruppiah and others added 9 commits May 5, 2025 10:35
For easy quick start

Signed-off-by: Anuradha Karuppiah <[email protected]>
Signed-off-by: Anuradha Karuppiah <[email protected]>
Signed-off-by: Hsin Chen <[email protected]>
This is to allow the example to be installed as part of
"uv sync --all-groups --all-extras"

Signed-off-by: Anuradha Karuppiah <[email protected]>
Signed-off-by: Anuradha Karuppiah <[email protected]>
Signed-off-by: Anuradha Karuppiah <[email protected]>
@AnuradhaKaruppiah (Contributor)

/ok to test 68b4f86

@AnuradhaKaruppiah (Contributor)

@hsin-c Thanks for the contribution!
Please see #212 and #213 for the required follow-up.

@AnuradhaKaruppiah (Contributor)

/merge

@rapids-bot rapids-bot bot merged commit 606e4b7 into NVIDIA:develop May 5, 2025
10 checks passed
yczhang-nv pushed a commit to yczhang-nv/NeMo-Agent-Toolkit that referenced this pull request May 8, 2025
This PR adds a new multi-agent example for alert triaging.

Tested offline and verified that the output matches the expected labels on the test dataset (in the `data/` folder).

Authors:
  - https://github.com/hsin-c
  - Anuradha Karuppiah (https://github.com/AnuradhaKaruppiah)

Approvers:
  - David Gardner (https://github.com/dagardner-nv)

URL: NVIDIA#193
Signed-off-by: Yuchen Zhang <[email protected]>
yczhang-nv pushed a commit to yczhang-nv/NeMo-Agent-Toolkit that referenced this pull request May 9, 2025
URL: NVIDIA#193
Signed-off-by: Yuchen Zhang <[email protected]>
ericevans-nv pushed a commit to ericevans-nv/agent-iq that referenced this pull request Jun 3, 2025
URL: NVIDIA#193
Signed-off-by: Eric Evans <[email protected]>
ericevans-nv pushed a commit to ericevans-nv/agent-iq that referenced this pull request Jun 3, 2025
URL: NVIDIA#193
Signed-off-by: Eric Evans <[email protected]>
AnuradhaKaruppiah pushed a commit to AnuradhaKaruppiah/oss-agentiq that referenced this pull request Aug 4, 2025
URL: NVIDIA#193
scheckerNV pushed a commit to scheckerNV/aiq-factory-reset that referenced this pull request Aug 22, 2025
URL: NVIDIA#193
3 participants