NVIDIA · rapids-bot · Jun 6, 2025 · Jun 5, 2025 · Jun 6, 2025 · Jun 6, 2025
@@ -41,7 +41,7 @@ This example demonstrates how to build an intelligent alert triage system using
     - [Running in a live environment](#running-in-a-live-environment)
       - [Note on credentials and access](#note-on-credentials-and-access)
     - [Running live with a HTTP server listening for alerts](#running-live-with-a-http-server-listening-for-alerts)
-    - [Running in test mode](#running-in-test-mode)
+    - [Running in offline mode](#running-in-offline-mode)
 
 
 ## Use case description
@@ -149,20 +149,20 @@ The triage agent may call one or more of the following tools based on the alert
 
 #### Functions
 
-Each entry in the `functions` section defines a tool or sub-agent that can be invoked by the main workflow agent. Tools can operate in test mode, using mocked data for simulation.
+Each entry in the `functions` section defines a tool or sub-agent that can be invoked by the main workflow agent. Tools can operate in offline mode, using mocked data for simulation.
 
 Example:
 
 ```yaml
 hardware_check:
   _type: hardware_check
   llm_name: tool_reasoning_llm
-  test_mode: true
+  offline_mode: true
 ```
 
 * `_type`: Identifies the name of the tool (matching the names in the tools' python files.)
 * `llm_name`: LLM used to support the tool’s reasoning of the raw fetched data.
-* `test_mode`: If `true`, the tool uses predefined mock results for offline testing.
+* `offline_mode`: If `true`, the tool uses predefined mock results for offline testing.
 
 Some entries, like `telemetry_metrics_analysis_agent`, are sub-agents that coordinate multiple tools:
 
@@ -185,19 +185,19 @@ workflow:
     - hardware_check
     - ...
   llm_name: ata_agent_llm
-  test_mode: true
-  test_data_path: ...
+  offline_mode: true
+  offline_data_path: ...
   benign_fallback_data_path: ...
-  test_output_path: ...
+  offline_output_path: ...
 ```
 
 * `_type`: The name of the agent (matching the agent's name in `register.py`).
 * `tool_names`: List of tools (from the `functions` section) used in the triage process.
 * `llm_name`: Main LLM used by the agent for reasoning, tool-calling, and report generation.
-* `test_mode`: Enables test execution using predefined input/output instead of real systems.
-* `test_data_path`: CSV file containing test alerts and their corresponding mocked tool responses.
+* `offline_mode`: Enables offline execution using predefined input/output instead of real systems.
+* `offline_data_path`: CSV file containing offline test alerts and their corresponding mocked tool responses.
 * `benign_fallback_data_path`: JSON file with baseline healthy system responses for tools not explicitly mocked.
-* `test_output_path`: Output CSV file path where the agent writes triage results. Each processed alert adds a new `output` column with the generated report.
+* `offline_output_path`: Output CSV file path where the agent writes triage results. Each processed alert adds a new `output` column with the generated report.
 
 #### LLMs
 
@@ -240,7 +240,7 @@ export $(grep -v '^#' .env | xargs)
 ```
 
 ## Example Usage
-You can run the agent in [test mode](#running-in-test-mode) or [live mode](#running-live-with-a-http-server-listening-for-alerts). Test mode allows you to evaluate the agent in a controlled, offline environment using synthetic data. Live mode allows you to run the agent in a real environment.
+You can run the agent in [offline mode](#running-in-offline-mode) or [live mode](#running-live-with-a-http-server-listening-for-alerts). Offline mode allows you to evaluate the agent in a controlled, offline environment using synthetic data. Live mode allows you to run the agent in a real environment.
 
 ### Running in a live environment
 In live mode, each tool used by the triage agent connects to real systems to collect data. These systems can include:
@@ -262,11 +262,11 @@ To run the agent live, follow these steps:
 
    If your environment includes unique systems or data sources, you can define new tools or modify existing ones. This allows your triage agent to pull in the most relevant data for your alerts and infrastructure.
 
-3. **Disable test mode**
+3. **Disable offline mode**
 
-   Set `test_mode: false` in the workflow section and for each tool in the functions section of your config file to ensure the agent uses real data instead of synthetic test datasets.
+   Set `offline_mode: false` in the workflow section and for each tool in the functions section of your config file to ensure the agent uses real data instead of offline datasets.
 
-   You can also selectively keep some tools in test mode by leaving their `test_mode: true` for more granular testing.
+   You can also selectively keep some tools in offline mode by leaving their `offline_mode: true` for more granular testing.
 
 4. **Run the agent with a real alert**
 
@@ -371,31 +371,31 @@ To use this mode, first ensure you have configured your live environment as desc
 
    You can monitor the progress of the triage process through these logs and the generated reports.
 
-### Running in test mode
-Test mode lets you evaluate the triage agent in a controlled, offline environment using synthetic data. Instead of calling real systems, the agent uses predefined inputs to simulate alerts and tool outputs, ideal for development, debugging, and tuning.
+### Running in offline mode
+Offline mode lets you evaluate the triage agent in a controlled, offline environment using synthetic data. Instead of calling real systems, the agent uses predefined inputs to simulate alerts and tool outputs, ideal for development, debugging, and tuning.
 
-To run in test mode:
+To run in offline mode:
 1. **Set required environment variables**
 
-   Make sure `test_mode: true` is set in both the `workflow` section and individual tool sections of your config file (see [Understanding the config](#understanding-the-config) section).
+   Make sure `offline_mode: true` is set in both the `workflow` section and individual tool sections of your config file (see [Understanding the config](#understanding-the-config) section).
 
 1. **How it works**
-- The **main test CSV** provides both alert details and a mock environment. For each alert, expected tool return values are included. These simulate how the environment would behave if the alert occurred on a real system.
-- The **benign fallback dataset** fills in tool responses when the agent calls a tool not explicitly defined in the alert's test data. These fallback responses mimic healthy system behavior and help provide the "background scenery" without obscuring the true root cause.
+- The **main offline CSV dataset** provides both alert details and a mock environment. For each alert, expected tool return values are included. These simulate how the environment would behave if the alert occurred on a real system.
+- The **benign fallback dataset** fills in tool responses when the agent calls a tool not explicitly defined in the alert's offline data. These fallback responses mimic healthy system behavior and help provide the "background scenery" without obscuring the true root cause.
 
-3. **Run the agent in test mode**
+3. **Run the agent in offline mode**
 
    Run the agent with:
    ```bash
-   aiq run --config_file=examples/alert_triage_agent/configs/config_test_mode.yml --input "test_mode"
+   aiq run --config_file=examples/alert_triage_agent/configs/config_offline_mode.yml --input "offline_mode"
    ```
-    Note: The `--input` value is ignored in test mode.
+    Note: The `--input` value is ignored in offline mode.
 
     The agent will:
-   - Load alerts from the test dataset specified in `test_data_path` in the workflow config
+   - Load alerts from the offline dataset specified in `offline_data_path` in the workflow config
    - Simulate an investigation using predefined tool results
    - Iterate through all the alerts in the dataset
-   - Save reports as a new column in a copy of the test CSV file to the path specified in `test_output_path` in the workflow config
+   - Save reports as a new column in a copy of the offline CSV file to the path specified in `offline_output_path` in the workflow config
 
 2. **Understanding the output**
 

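The offline flow described in the README hunk above (alert-specific mocks from the main CSV, with benign fallback responses for everything else) can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the example's actual implementation: the function names, column names, and inline data are hypothetical, and the LLM reasoning step is replaced by simple string joining.

```python
import csv
import io
import json

# Hypothetical stand-ins for the files referenced by `offline_data_path`
# and `benign_fallback_data_path`; column and tool names are made up.
OFFLINE_CSV = """alert_id,host_id,hardware_check
a1,host-01,fan speed critical
a2,host-02,
"""
BENIGN_FALLBACK_JSON = (
    '{"hardware_check": "all sensors nominal",'
    ' "network_connectivity_check": "ping ok"}'
)


def mocked_tool_response(row, tool_name, fallback):
    """Return the alert-specific mock if present, else the benign fallback."""
    value = (row.get(tool_name) or "").strip()
    return value if value else fallback[tool_name]


def run_offline(csv_text, fallback_text, tool_names):
    """Process every alert in the CSV and attach an `output` column."""
    fallback = json.loads(fallback_text)
    results = []
    for row in csv.DictReader(io.StringIO(csv_text)):
        evidence = {t: mocked_tool_response(row, t, fallback) for t in tool_names}
        # A real agent would pass `evidence` to an LLM; here it is only joined.
        row["output"] = "; ".join(f"{t}: {v}" for t, v in evidence.items())
        results.append(row)
    return results


if __name__ == "__main__":
    tools = ["hardware_check", "network_connectivity_check"]
    for result in run_offline(OFFLINE_CSV, BENIGN_FALLBACK_JSON, tools):
        print(result["alert_id"], "->", result["output"])
```

In this sketch, alert `a1` uses its own mocked `hardware_check` value, while `a2` and the unmocked `network_connectivity_check` tool fall back to the benign responses, which mirrors the "background scenery" behavior the README describes.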
@@ -20,27 +20,27 @@ functions:
   hardware_check:
     _type: hardware_check
     llm_name: tool_reasoning_llm
-    test_mode: false
+    offline_mode: false
   host_performance_check:
     _type: host_performance_check
     llm_name: tool_reasoning_llm
-    test_mode: false
+    offline_mode: false
   monitoring_process_check:
     _type: monitoring_process_check
     llm_name: tool_reasoning_llm
-    test_mode: false
+    offline_mode: false
   network_connectivity_check:
     _type: network_connectivity_check
     llm_name: tool_reasoning_llm
-    test_mode: false
+    offline_mode: false
   telemetry_metrics_host_heartbeat_check:
     _type: telemetry_metrics_host_heartbeat_check
     llm_name: tool_reasoning_llm
-    test_mode: false
+    offline_mode: false
   telemetry_metrics_host_performance_check:
     _type: telemetry_metrics_host_performance_check
     llm_name: tool_reasoning_llm
-    test_mode: false
+    offline_mode: false
   telemetry_metrics_analysis_agent:
     _type: telemetry_metrics_analysis_agent
     tool_names:
@@ -64,11 +64,11 @@ workflow:
     - network_connectivity_check
     - telemetry_metrics_analysis_agent
   llm_name: ata_agent_llm
-  test_mode: false
-  # The below paths are only used if test_mode is true
-  test_data_path: null
+  offline_mode: false
+  # The below paths are only used if offline_mode is true
+  offline_data_path: null
   benign_fallback_data_path: null
-  test_output_path: null
+  offline_output_path: null
 
 llms:
   ata_agent_llm:

@@ -20,28 +20,28 @@ functions:
   hardware_check:
     _type: hardware_check
     llm_name: tool_reasoning_llm
-    test_mode: true
+    offline_mode: true
   host_performance_check:
     _type: host_performance_check
     llm_name: tool_reasoning_llm
-    test_mode: true
+    offline_mode: true
   monitoring_process_check:
     _type: monitoring_process_check
     llm_name: tool_reasoning_llm
-    test_mode: true
+    offline_mode: true
   network_connectivity_check:
     _type: network_connectivity_check
     llm_name: tool_reasoning_llm
-    test_mode: true
+    offline_mode: true
   telemetry_metrics_host_heartbeat_check:
     _type: telemetry_metrics_host_heartbeat_check
     llm_name: tool_reasoning_llm
-    test_mode: true
+    offline_mode: true
     metrics_url: http://your-monitoring-server:9090 # Replace with your monitoring system URL if running in live mode
   telemetry_metrics_host_performance_check:
     _type: telemetry_metrics_host_performance_check
     llm_name: tool_reasoning_llm
-    test_mode: true
+    offline_mode: true
     metrics_url: http://your-monitoring-server:9090 # Replace with your monitoring system URL if running in live mode
   telemetry_metrics_analysis_agent:
     _type: telemetry_metrics_analysis_agent
@@ -66,11 +66,11 @@ workflow:
     - network_connectivity_check
     - telemetry_metrics_analysis_agent
   llm_name: ata_agent_llm
-  test_mode: true
-  # The below paths are only used if test_mode is true
-  test_data_path: examples/alert_triage_agent/data/test_data.csv
-  benign_fallback_data_path: examples/alert_triage_agent/data/benign_fallback_test_data.json
-  test_output_path: .tmp/aiq/examples/alert_triage_agent/output/test_output.csv
+  offline_mode: true
+  # The below paths are only used if offline_mode is true
+  offline_data_path: examples/alert_triage_agent/data/offline_data.csv
+  benign_fallback_data_path: examples/alert_triage_agent/data/benign_fallback_offline_data.json
+  offline_output_path: .tmp/aiq/examples/alert_triage_agent/output/offline_output.csv
 
 llms:
   ata_agent_llm:

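Because this change renames three config keys, existing user configs that still use the old names would need updating. A minimal migration sketch is shown below, assuming the config has already been parsed into nested Python dicts and lists (for example by a YAML loader). The `migrate_config` helper is hypothetical and not part of the toolkit; only the key names come from the diff.

```python
# Old-to-new key names taken from the rename shown in this diff.
RENAMES = {
    "test_mode": "offline_mode",
    "test_data_path": "offline_data_path",
    "test_output_path": "offline_output_path",
}


def migrate_config(node):
    """Recursively return a copy of `node` with the old keys renamed."""
    if isinstance(node, dict):
        return {RENAMES.get(key, key): migrate_config(value) for key, value in node.items()}
    if isinstance(node, list):
        return [migrate_config(item) for item in node]
    return node


if __name__ == "__main__":
    old_config = {
        "functions": {
            "hardware_check": {"_type": "hardware_check", "test_mode": True},
        },
        "workflow": {
            "test_mode": True,
            "test_data_path": None,
            "benign_fallback_data_path": None,
            "test_output_path": None,
        },
    }
    print(migrate_config(old_config))
```

The sketch renames keys only. Values such as file paths are left untouched, so any data files renamed in the same change would still need their paths updated by hand.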
@@ -33,7 +33,7 @@ class HardwareCheckToolConfig(FunctionBaseConfig, name="hardware_check"):
                  "hardware degradation, and anomalies that could explain alerts. Args: host_id: str"),
         description="Description of the tool for the agent.")
     llm_name: LLMRef
-    test_mode: bool = Field(default=True, description="Whether to run in test mode")
+    offline_mode: bool = Field(default=True, description="Whether to run in offline mode")
 
 
 def _get_ipmi_monitor_data(ip_address, username, password):
@@ -74,14 +74,14 @@ async def _arun(host_id: str) -> str:
         utils.log_header("Hardware Status Checker")
 
         try:
-            if not config.test_mode:
+            if not config.offline_mode:
                 ip = "ipmi_ip"  # Replace with your actual IPMI IP address
                 user = "ipmi_user"  # Replace with your actual username
                 pwd = "ipmi_password"  # Replace with your actual password
                 monitoring_data = _get_ipmi_monitor_data(ip, user, pwd)
             else:
-                # In test mode, load test data from CSV file
-                df = utils.get_test_data()
+                # In offline mode, load test data from CSV file
+                df = utils.get_offline_data()
 
                 # Get IPMI data from test data, falling back to static data if needed
                 monitoring_data = utils.load_column_or_static(

@@ -32,7 +32,7 @@ class HostPerformanceCheckToolConfig(FunctionBaseConfig, name="host_performance_
                  "and hardware I/O usage details for a given host. Args: host_id: str"),
         description="Description of the tool for the agent.")
     llm_name: LLMRef
-    test_mode: bool = Field(default=True, description="Whether to run in test mode")
+    offline_mode: bool = Field(default=True, description="Whether to run in offline mode")
 
 
 async def _run_ansible_playbook_for_host_performance_check(config: HostPerformanceCheckToolConfig,
@@ -113,7 +113,7 @@ async def _arun(host_id: str) -> str:
         utils.log_header("Host Performance Analyzer")
 
         try:
-            if not config.test_mode:
+            if not config.offline_mode:
                 # In production mode, use actual Ansible connection details
                 # Replace placeholder values with connection info from configuration
                 ansible_host = "your.host.example.name"  # Input your target host
@@ -130,8 +130,8 @@ async def _arun(host_id: str) -> str:
                     ansible_port=ansible_port,
                     ansible_private_key_path=ansible_private_key_path)
             else:
-                # In test mode, load performance data from test dataset
-                df = utils.get_test_data()
+                # In offline mode, load performance data from test dataset
+                df = utils.get_offline_data()
 
                 # Get CPU metrics from test data, falling back to static data if needed
                 data_top_cpu = utils.load_column_or_static(df=df,

@@ -31,7 +31,7 @@ class MonitoringProcessCheckToolConfig(FunctionBaseConfig, name="monitoring_proc
                                       "on a target host by executing system commands. Args: host_id: str"),
                              description="Description of the tool for the agent.")
     llm_name: LLMRef
-    test_mode: bool = Field(default=True, description="Whether to run in test mode")
+    offline_mode: bool = Field(default=True, description="Whether to run in offline mode")
 
 
 async def _run_ansible_playbook_for_monitor_process_check(ansible_host: str,
@@ -72,7 +72,7 @@ async def monitoring_process_check_tool(config: MonitoringProcessCheckToolConfig
 
     async def _arun(host_id: str) -> str:
         try:
-            if not config.test_mode:
+            if not config.offline_mode:
                 # In production mode, use actual Ansible connection details
                 # Replace placeholder values with connection info from configuration
                 ansible_host = "your.host.example.name"  # Input your target host
@@ -87,8 +87,8 @@ async def _arun(host_id: str) -> str:
                     ansible_private_key_path=ansible_private_key_path)
                 output_for_prompt = f"`ps` and `top` result:{output}"
             else:
-                # In test mode, load performance data from test dataset
-                df = utils.get_test_data()
+                # In offline mode, load performance data from test dataset
+                df = utils.get_offline_data()
 
                 # Load process status data from ps command output
                 ps_data = utils.load_column_or_static(df=df,

@@ -34,7 +34,7 @@ class NetworkConnectivityCheckToolConfig(FunctionBaseConfig, name="network_conne
                  "Args: host_id: str"),
         description="Description of the tool for the agent.")
     llm_name: LLMRef
-    test_mode: bool = Field(default=True, description="Whether to run in test mode")
+    offline_mode: bool = Field(default=True, description="Whether to run in offline mode")
 
 
 def _check_service_banner(host: str, port: int = 80, connect_timeout: float = 10, read_timeout: float = 10) -> str:
@@ -72,7 +72,7 @@ async def _arun(host_id: str) -> str:
         utils.log_header("Network Connectivity Tester")
 
         try:
-            if not config.test_mode:
+            if not config.offline_mode:
                 # NOTE: The ping and telnet commands below are example implementations of network connectivity checking.
                 # Users should implement their own network connectivity check logic specific to their environment
                 # and infrastructure setup.
@@ -91,7 +91,7 @@ async def _arun(host_id: str) -> str:
 
             else:
                 # Load test data
-                df = utils.get_test_data()
+                df = utils.get_offline_data()
 
                 # Get ping data from test data, falling back to static data if needed
                 ping_data = utils.load_column_or_static(df=df,