diff --git a/examples/alert_triage_agent/README.md b/examples/alert_triage_agent/README.md index 82853c527..adb8f204c 100644 --- a/examples/alert_triage_agent/README.md +++ b/examples/alert_triage_agent/README.md @@ -41,7 +41,7 @@ This example demonstrates how to build an intelligent alert triage system using - [Running in a live environment](#running-in-a-live-environment) - [Note on credentials and access](#note-on-credentials-and-access) - [Running live with a HTTP server listening for alerts](#running-live-with-a-http-server-listening-for-alerts) - - [Running in test mode](#running-in-test-mode) + - [Running in offline mode](#running-in-offline-mode) ## Use case description @@ -149,7 +149,7 @@ The triage agent may call one or more of the following tools based on the alert #### Functions -Each entry in the `functions` section defines a tool or sub-agent that can be invoked by the main workflow agent. Tools can operate in test mode, using mocked data for simulation. +Each entry in the `functions` section defines a tool or sub-agent that can be invoked by the main workflow agent. Tools can operate in offline mode, using mocked data for simulation. Example: @@ -157,12 +157,12 @@ Example: hardware_check: _type: hardware_check llm_name: tool_reasoning_llm - test_mode: true + offline_mode: true ``` * `_type`: Identifies the name of the tool (matching the names in the tools' python files.) * `llm_name`: LLM used to support the tool’s reasoning of the raw fetched data. -* `test_mode`: If `true`, the tool uses predefined mock results for offline testing. +* `offline_mode`: If `true`, the tool uses predefined mock results for offline testing. Some entries, like `telemetry_metrics_analysis_agent`, are sub-agents that coordinate multiple tools: @@ -185,19 +185,19 @@ workflow: - hardware_check - ... llm_name: ata_agent_llm - test_mode: true - test_data_path: ... + offline_mode: true + offline_data_path: ... benign_fallback_data_path: ... - test_output_path: ... 
+ offline_output_path: ... ``` * `_type`: The name of the agent (matching the agent's name in `register.py`). * `tool_names`: List of tools (from the `functions` section) used in the triage process. * `llm_name`: Main LLM used by the agent for reasoning, tool-calling, and report generation. -* `test_mode`: Enables test execution using predefined input/output instead of real systems. -* `test_data_path`: CSV file containing test alerts and their corresponding mocked tool responses. +* `offline_mode`: Enables offline execution using predefined input/output instead of real systems. +* `offline_data_path`: CSV file containing offline test alerts and their corresponding mocked tool responses. * `benign_fallback_data_path`: JSON file with baseline healthy system responses for tools not explicitly mocked. -* `test_output_path`: Output CSV file path where the agent writes triage results. Each processed alert adds a new `output` column with the generated report. +* `offline_output_path`: Output CSV file path where the agent writes triage results. Each processed alert adds a new `output` column with the generated report. #### LLMs @@ -240,7 +240,7 @@ export $(grep -v '^#' .env | xargs) ``` ## Example Usage -You can run the agent in [test mode](#running-in-test-mode) or [live mode](#running-live-with-a-http-server-listening-for-alerts). Test mode allows you to evaluate the agent in a controlled, offline environment using synthetic data. Live mode allows you to run the agent in a real environment. +You can run the agent in [offline mode](#running-in-offline-mode) or [live mode](#running-live-with-a-http-server-listening-for-alerts). Offline mode allows you to evaluate the agent in a controlled, offline environment using synthetic data. Live mode allows you to run the agent in a real environment. ### Running in a live environment In live mode, each tool used by the triage agent connects to real systems to collect data. 
These systems can include: @@ -262,11 +262,11 @@ To run the agent live, follow these steps: If your environment includes unique systems or data sources, you can define new tools or modify existing ones. This allows your triage agent to pull in the most relevant data for your alerts and infrastructure. -3. **Disable test mode** +3. **Disable offline mode** - Set `test_mode: false` in the workflow section and for each tool in the functions section of your config file to ensure the agent uses real data instead of synthetic test datasets. + Set `offline_mode: false` in the workflow section and for each tool in the functions section of your config file to ensure the agent uses real data instead of offline datasets. - You can also selectively keep some tools in test mode by leaving their `test_mode: true` for more granular testing. + You can also selectively keep some tools in offline mode by leaving their `offline_mode: true` for more granular testing. 4. **Run the agent with a real alert** @@ -371,31 +371,31 @@ To use this mode, first ensure you have configured your live environment as desc You can monitor the progress of the triage process through these logs and the generated reports. -### Running in test mode -Test mode lets you evaluate the triage agent in a controlled, offline environment using synthetic data. Instead of calling real systems, the agent uses predefined inputs to simulate alerts and tool outputs, ideal for development, debugging, and tuning. +### Running in offline mode +Offline mode lets you evaluate the triage agent in a controlled, offline environment using synthetic data. Instead of calling real systems, the agent uses predefined inputs to simulate alerts and tool outputs, ideal for development, debugging, and tuning. -To run in test mode: +To run in offline mode: 1. 
**Set required environment variables** - Make sure `test_mode: true` is set in both the `workflow` section and individual tool sections of your config file (see [Understanding the config](#understanding-the-config) section). + Make sure `offline_mode: true` is set in both the `workflow` section and individual tool sections of your config file (see [Understanding the config](#understanding-the-config) section). 1. **How it works** -- The **main test CSV** provides both alert details and a mock environment. For each alert, expected tool return values are included. These simulate how the environment would behave if the alert occurred on a real system. -- The **benign fallback dataset** fills in tool responses when the agent calls a tool not explicitly defined in the alert's test data. These fallback responses mimic healthy system behavior and help provide the "background scenery" without obscuring the true root cause. +- The **main CSV offline dataset** provides both alert details and a mock environment. For each alert, expected tool return values are included. These simulate how the environment would behave if the alert occurred on a real system. +- The **benign fallback dataset** fills in tool responses when the agent calls a tool not explicitly defined in the alert's offline data. These fallback responses mimic healthy system behavior and help provide the "background scenery" without obscuring the true root cause. -3. **Run the agent in test mode** +3. **Run the agent in offline mode** Run the agent with: ```bash - aiq run --config_file=examples/alert_triage_agent/configs/config_test_mode.yml --input "test_mode" + aiq run --config_file=examples/alert_triage_agent/configs/config_offline_mode.yml --input "offline_mode" ``` - Note: The `--input` value is ignored in test mode. + Note: The `--input` value is ignored in offline mode. 
The agent will: - - Load alerts from the test dataset specified in `test_data_path` in the workflow config + - Load alerts from the offline dataset specified in `offline_data_path` in the workflow config - Simulate an investigation using predefined tool results - Iterate through all the alerts in the dataset - - Save reports as a new column in a copy of the test CSV file to the path specified in `test_output_path` in the workflow config + - Save reports as a new column in a copy of the offline CSV file to the path specified in `offline_output_path` in the workflow config 2. **Understanding the output** diff --git a/examples/alert_triage_agent/src/aiq_alert_triage_agent/configs/config_live_mode.yml b/examples/alert_triage_agent/src/aiq_alert_triage_agent/configs/config_live_mode.yml index 1acd676ea..e1111758e 100644 --- a/examples/alert_triage_agent/src/aiq_alert_triage_agent/configs/config_live_mode.yml +++ b/examples/alert_triage_agent/src/aiq_alert_triage_agent/configs/config_live_mode.yml @@ -20,27 +20,27 @@ functions: hardware_check: _type: hardware_check llm_name: tool_reasoning_llm - test_mode: false + offline_mode: false host_performance_check: _type: host_performance_check llm_name: tool_reasoning_llm - test_mode: false + offline_mode: false monitoring_process_check: _type: monitoring_process_check llm_name: tool_reasoning_llm - test_mode: false + offline_mode: false network_connectivity_check: _type: network_connectivity_check llm_name: tool_reasoning_llm - test_mode: false + offline_mode: false telemetry_metrics_host_heartbeat_check: _type: telemetry_metrics_host_heartbeat_check llm_name: tool_reasoning_llm - test_mode: false + offline_mode: false telemetry_metrics_host_performance_check: _type: telemetry_metrics_host_performance_check llm_name: tool_reasoning_llm - test_mode: false + offline_mode: false telemetry_metrics_analysis_agent: _type: telemetry_metrics_analysis_agent tool_names: @@ -64,11 +64,11 @@ workflow: - network_connectivity_check - 
telemetry_metrics_analysis_agent llm_name: ata_agent_llm - test_mode: false - # The below paths are only used if test_mode is true - test_data_path: null + offline_mode: false + # The below paths are only used if offline_mode is true + offline_data_path: null benign_fallback_data_path: null - test_output_path: null + offline_output_path: null llms: ata_agent_llm: diff --git a/examples/alert_triage_agent/src/aiq_alert_triage_agent/configs/config_test_mode.yml b/examples/alert_triage_agent/src/aiq_alert_triage_agent/configs/config_offline_mode.yml similarity index 88% rename from examples/alert_triage_agent/src/aiq_alert_triage_agent/configs/config_test_mode.yml rename to examples/alert_triage_agent/src/aiq_alert_triage_agent/configs/config_offline_mode.yml index 6ee8419fd..db236728f 100644 --- a/examples/alert_triage_agent/src/aiq_alert_triage_agent/configs/config_test_mode.yml +++ b/examples/alert_triage_agent/src/aiq_alert_triage_agent/configs/config_offline_mode.yml @@ -20,28 +20,28 @@ functions: hardware_check: _type: hardware_check llm_name: tool_reasoning_llm - test_mode: true + offline_mode: true host_performance_check: _type: host_performance_check llm_name: tool_reasoning_llm - test_mode: true + offline_mode: true monitoring_process_check: _type: monitoring_process_check llm_name: tool_reasoning_llm - test_mode: true + offline_mode: true network_connectivity_check: _type: network_connectivity_check llm_name: tool_reasoning_llm - test_mode: true + offline_mode: true telemetry_metrics_host_heartbeat_check: _type: telemetry_metrics_host_heartbeat_check llm_name: tool_reasoning_llm - test_mode: true + offline_mode: true metrics_url: http://your-monitoring-server:9090 # Replace with your monitoring system URL if running in live mode telemetry_metrics_host_performance_check: _type: telemetry_metrics_host_performance_check llm_name: tool_reasoning_llm - test_mode: true + offline_mode: true metrics_url: http://your-monitoring-server:9090 # Replace with your 
monitoring system URL if running in live mode telemetry_metrics_analysis_agent: _type: telemetry_metrics_analysis_agent @@ -66,11 +66,11 @@ workflow: - network_connectivity_check - telemetry_metrics_analysis_agent llm_name: ata_agent_llm - test_mode: true - # The below paths are only used if test_mode is true - test_data_path: examples/alert_triage_agent/data/test_data.csv - benign_fallback_data_path: examples/alert_triage_agent/data/benign_fallback_test_data.json - test_output_path: .tmp/aiq/examples/alert_triage_agent/output/test_output.csv + offline_mode: true + # The below paths are only used if offline_mode is true + offline_data_path: examples/alert_triage_agent/data/offline_data.csv + benign_fallback_data_path: examples/alert_triage_agent/data/benign_fallback_offline_data.json + offline_output_path: .tmp/aiq/examples/alert_triage_agent/output/offline_output.csv llms: ata_agent_llm: diff --git a/examples/alert_triage_agent/src/aiq_alert_triage_agent/data/benign_fallback_test_data.json b/examples/alert_triage_agent/src/aiq_alert_triage_agent/data/benign_fallback_offline_data.json similarity index 100% rename from examples/alert_triage_agent/src/aiq_alert_triage_agent/data/benign_fallback_test_data.json rename to examples/alert_triage_agent/src/aiq_alert_triage_agent/data/benign_fallback_offline_data.json diff --git a/examples/alert_triage_agent/src/aiq_alert_triage_agent/data/test_data.csv b/examples/alert_triage_agent/src/aiq_alert_triage_agent/data/offline_data.csv similarity index 100% rename from examples/alert_triage_agent/src/aiq_alert_triage_agent/data/test_data.csv rename to examples/alert_triage_agent/src/aiq_alert_triage_agent/data/offline_data.csv diff --git a/examples/alert_triage_agent/src/aiq_alert_triage_agent/hardware_check_tool.py b/examples/alert_triage_agent/src/aiq_alert_triage_agent/hardware_check_tool.py index 7b9cb539e..df565bce2 100644 --- a/examples/alert_triage_agent/src/aiq_alert_triage_agent/hardware_check_tool.py +++ 
b/examples/alert_triage_agent/src/aiq_alert_triage_agent/hardware_check_tool.py @@ -33,7 +33,7 @@ class HardwareCheckToolConfig(FunctionBaseConfig, name="hardware_check"): "hardware degradation, and anomalies that could explain alerts. Args: host_id: str"), description="Description of the tool for the agent.") llm_name: LLMRef - test_mode: bool = Field(default=True, description="Whether to run in test mode") + offline_mode: bool = Field(default=True, description="Whether to run in offline mode") def _get_ipmi_monitor_data(ip_address, username, password): @@ -74,14 +74,14 @@ async def _arun(host_id: str) -> str: utils.log_header("Hardware Status Checker") try: - if not config.test_mode: + if not config.offline_mode: ip = "ipmi_ip" # Replace with your actual IPMI IP address user = "ipmi_user" # Replace with your actual username pwd = "ipmi_password" # Replace with your actual password monitoring_data = _get_ipmi_monitor_data(ip, user, pwd) else: - # In test mode, load test data from CSV file - df = utils.get_test_data() + # In offline mode, load test data from CSV file + df = utils.get_offline_data() # Get IPMI data from test data, falling back to static data if needed monitoring_data = utils.load_column_or_static( diff --git a/examples/alert_triage_agent/src/aiq_alert_triage_agent/host_performance_check_tool.py b/examples/alert_triage_agent/src/aiq_alert_triage_agent/host_performance_check_tool.py index 9e2015193..1982b4500 100644 --- a/examples/alert_triage_agent/src/aiq_alert_triage_agent/host_performance_check_tool.py +++ b/examples/alert_triage_agent/src/aiq_alert_triage_agent/host_performance_check_tool.py @@ -32,7 +32,7 @@ class HostPerformanceCheckToolConfig(FunctionBaseConfig, name="host_performance_ "and hardware I/O usage details for a given host. 
Args: host_id: str"), description="Description of the tool for the agent.") llm_name: LLMRef - test_mode: bool = Field(default=True, description="Whether to run in test mode") + offline_mode: bool = Field(default=True, description="Whether to run in offline mode") async def _run_ansible_playbook_for_host_performance_check(config: HostPerformanceCheckToolConfig, @@ -113,7 +113,7 @@ async def _arun(host_id: str) -> str: utils.log_header("Host Performance Analyzer") try: - if not config.test_mode: + if not config.offline_mode: # In production mode, use actual Ansible connection details # Replace placeholder values with connection info from configuration ansible_host = "your.host.example.name" # Input your target host @@ -130,8 +130,8 @@ async def _arun(host_id: str) -> str: ansible_port=ansible_port, ansible_private_key_path=ansible_private_key_path) else: - # In test mode, load performance data from test dataset - df = utils.get_test_data() + # In offline mode, load performance data from test dataset + df = utils.get_offline_data() # Get CPU metrics from test data, falling back to static data if needed data_top_cpu = utils.load_column_or_static(df=df, diff --git a/examples/alert_triage_agent/src/aiq_alert_triage_agent/monitoring_process_check_tool.py b/examples/alert_triage_agent/src/aiq_alert_triage_agent/monitoring_process_check_tool.py index 542315432..a41d52371 100644 --- a/examples/alert_triage_agent/src/aiq_alert_triage_agent/monitoring_process_check_tool.py +++ b/examples/alert_triage_agent/src/aiq_alert_triage_agent/monitoring_process_check_tool.py @@ -31,7 +31,7 @@ class MonitoringProcessCheckToolConfig(FunctionBaseConfig, name="monitoring_proc "on a target host by executing system commands. 
Args: host_id: str"), description="Description of the tool for the agent.") llm_name: LLMRef - test_mode: bool = Field(default=True, description="Whether to run in test mode") + offline_mode: bool = Field(default=True, description="Whether to run in offline mode") async def _run_ansible_playbook_for_monitor_process_check(ansible_host: str, @@ -72,7 +72,7 @@ async def monitoring_process_check_tool(config: MonitoringProcessCheckToolConfig async def _arun(host_id: str) -> str: try: - if not config.test_mode: + if not config.offline_mode: # In production mode, use actual Ansible connection details # Replace placeholder values with connection info from configuration ansible_host = "your.host.example.name" # Input your target host @@ -87,8 +87,8 @@ async def _arun(host_id: str) -> str: ansible_private_key_path=ansible_private_key_path) output_for_prompt = f"`ps` and `top` result:{output}" else: - # In test mode, load performance data from test dataset - df = utils.get_test_data() + # In offline mode, load performance data from test dataset + df = utils.get_offline_data() # Load process status data from ps command output ps_data = utils.load_column_or_static(df=df, diff --git a/examples/alert_triage_agent/src/aiq_alert_triage_agent/network_connectivity_check_tool.py b/examples/alert_triage_agent/src/aiq_alert_triage_agent/network_connectivity_check_tool.py index af157e61c..56fa48181 100644 --- a/examples/alert_triage_agent/src/aiq_alert_triage_agent/network_connectivity_check_tool.py +++ b/examples/alert_triage_agent/src/aiq_alert_triage_agent/network_connectivity_check_tool.py @@ -34,7 +34,7 @@ class NetworkConnectivityCheckToolConfig(FunctionBaseConfig, name="network_conne "Args: host_id: str"), description="Description of the tool for the agent.") llm_name: LLMRef - test_mode: bool = Field(default=True, description="Whether to run in test mode") + offline_mode: bool = Field(default=True, description="Whether to run in offline mode") def _check_service_banner(host: 
str, port: int = 80, connect_timeout: float = 10, read_timeout: float = 10) -> str: @@ -72,7 +72,7 @@ async def _arun(host_id: str) -> str: utils.log_header("Network Connectivity Tester") try: - if not config.test_mode: + if not config.offline_mode: # NOTE: The ping and telnet commands below are example implementations of network connectivity checking. # Users should implement their own network connectivity check logic specific to their environment # and infrastructure setup. @@ -91,7 +91,7 @@ async def _arun(host_id: str) -> str: else: # Load test data - df = utils.get_test_data() + df = utils.get_offline_data() # Get ping data from test data, falling back to static data if needed ping_data = utils.load_column_or_static(df=df, diff --git a/examples/alert_triage_agent/src/aiq_alert_triage_agent/register.py b/examples/alert_triage_agent/src/aiq_alert_triage_agent/register.py index 4e55e4b0f..7481f3ae2 100644 --- a/examples/alert_triage_agent/src/aiq_alert_triage_agent/register.py +++ b/examples/alert_triage_agent/src/aiq_alert_triage_agent/register.py @@ -61,15 +61,15 @@ class AlertTriageAgentWorkflowConfig(FunctionBaseConfig, name="alert_triage_agen """ tool_names: list[str] = [] llm_name: LLMRef - test_mode: bool = Field(default=True, description="Whether to run in test mode") - test_data_path: str | None = Field( - default="examples/alert_triage_agent/data/test_data.csv", - description="Path to the main test dataset in CSV format containing alerts and their simulated environments") + offline_mode: bool = Field(default=True, description="Whether to run in offline mode") + offline_data_path: str | None = Field( + default="examples/alert_triage_agent/data/offline_data.csv", + description="Path to the main offline dataset in CSV format containing alerts and their simulated environments") benign_fallback_data_path: str | None = Field( - default="examples/alert_triage_agent/data/benign_fallback_test_data.json", + 
default="examples/alert_triage_agent/data/benign_fallback_offline_data.json", description="Path to the JSON file with baseline/normal system behavior data") - test_output_path: str | None = Field(default=".tmp/aiq/examples/alert_triage_agent/output/test_output.csv", - description="Path to save the test output CSV file") + offline_output_path: str | None = Field(default=".tmp/aiq/examples/alert_triage_agent/output/offline_output.csv", + description="Path to save the offline output CSV file") @register_function(config_type=AlertTriageAgentWorkflowConfig, framework_wrappers=[LLMFrameworkEnum.LANGCHAIN]) @@ -144,20 +144,20 @@ async def _response_fn(input_message: str) -> str: finally: utils.logger.info("Finished agent execution") - async def _response_test_fn(input_message: str) -> str: - """Test mode response function that processes multiple alerts from a CSV file. + async def _response_offline_fn(input_message: str) -> str: + """Offline mode response function that processes multiple alerts from a CSV file. 
Args: - input_message: Not used in test mode, alerts are read from CSV instead + input_message: Not used in offline mode, alerts are read from CSV instead Returns: Confirmation message after processing completes """ - if config.test_output_path is None: - raise ValueError("test_output_path must be provided") + if config.offline_output_path is None: + raise ValueError("offline_output_path must be provided") # Load test alerts from CSV file - df = utils.get_test_data() + df = utils.get_offline_data() df["output"] = "" # Initialize output column utils.log_header(f"Processing {len(df)} Alerts") @@ -172,18 +172,18 @@ async def _response_test_fn(input_message: str) -> str: utils.log_header("Saving Results") # Write results to output CSV - os.makedirs(os.path.dirname(config.test_output_path), exist_ok=True) - df.to_csv(config.test_output_path, index=False) + os.makedirs(os.path.dirname(config.offline_output_path), exist_ok=True) + df.to_csv(config.offline_output_path, index=False) utils.log_footer() - return f"Successfully processed {len(df)} alerts. Results saved to {config.test_output_path}" + return f"Successfully processed {len(df)} alerts. 
Results saved to {config.offline_output_path}" try: - if config.test_mode: - utils.preload_test_data(test_data_path=config.test_data_path, - benign_fallback_data_path=config.benign_fallback_data_path) - utils.log_header("Running in test mode", dash_length=120, level=logging.INFO) - yield _response_test_fn + if config.offline_mode: + utils.preload_offline_data(offline_data_path=config.offline_data_path, + benign_fallback_data_path=config.benign_fallback_data_path) + utils.log_header("Running in offline mode", dash_length=120, level=logging.INFO) + yield _response_offline_fn else: yield _response_fn diff --git a/examples/alert_triage_agent/src/aiq_alert_triage_agent/telemetry_metrics_host_heartbeat_check_tool.py b/examples/alert_triage_agent/src/aiq_alert_triage_agent/telemetry_metrics_host_heartbeat_check_tool.py index 94521bcd5..f5b35f9b8 100644 --- a/examples/alert_triage_agent/src/aiq_alert_triage_agent/telemetry_metrics_host_heartbeat_check_tool.py +++ b/examples/alert_triage_agent/src/aiq_alert_triage_agent/telemetry_metrics_host_heartbeat_check_tool.py @@ -32,7 +32,7 @@ class TelemetryMetricsHostHeartbeatCheckToolConfig(FunctionBaseConfig, name="tel "This tells us if the host is up and running. 
Args: host_id: str"), description="Description of the tool for the agent.") llm_name: LLMRef - test_mode: bool = Field(default=True, description="Whether to run in test mode") + offline_mode: bool = Field(default=True, description="Whether to run in offline mode") metrics_url: str = Field(default="", description="URL of the monitoring system") @@ -44,7 +44,7 @@ async def _arun(host_id: str) -> str: utils.log_header("Telemetry Metrics Host Heartbeat Check", dash_length=50) try: - if not config.test_mode: + if not config.offline_mode: # Example implementation using a monitoring system's API to check host status monitoring_url = config.metrics_url @@ -61,8 +61,8 @@ async def _arun(host_id: str) -> str: if data is not None: data = data["data"] else: - # In test mode, load test data from CSV file - df = utils.get_test_data() + # In offline mode, load test data from CSV file + df = utils.get_offline_data() data = utils.load_column_or_static( df=df, host_id=host_id, column="telemetry_metrics_host_heartbeat_check_tool:heartbeat_check_output") diff --git a/examples/alert_triage_agent/src/aiq_alert_triage_agent/telemetry_metrics_host_performance_check_tool.py b/examples/alert_triage_agent/src/aiq_alert_triage_agent/telemetry_metrics_host_performance_check_tool.py index 391411702..32c49f886 100644 --- a/examples/alert_triage_agent/src/aiq_alert_triage_agent/telemetry_metrics_host_performance_check_tool.py +++ b/examples/alert_triage_agent/src/aiq_alert_triage_agent/telemetry_metrics_host_performance_check_tool.py @@ -38,7 +38,7 @@ class TelemetryMetricsHostPerformanceCheckToolConfig(FunctionBaseConfig, "usage timeseries. 
Args: host_id: str"), description="Description of the tool for the agent.") llm_name: LLMRef - test_mode: bool = Field(default=True, description="Whether to run in test mode") + offline_mode: bool = Field(default=True, description="Whether to run in offline mode") metrics_url: str = Field(default="", description="URL of the monitoring system") @@ -121,7 +121,7 @@ async def _arun(host_id: str) -> str: utils.log_header("Telemetry Metrics CPU Usage Pattern Analysis", dash_length=100) try: - if not config.test_mode: + if not config.offline_mode: # Example implementation using a monitoring system's API to check host status monitoring_url = config.metrics_url @@ -144,8 +144,8 @@ async def _arun(host_id: str) -> str: data = response.json() else: - # In test mode, load test data from CSV file - df = utils.get_test_data() + # In offline mode, load offline data from CSV file + df = utils.get_offline_data() data_str = utils.load_column_or_static( df=df, host_id=host_id, diff --git a/examples/alert_triage_agent/src/aiq_alert_triage_agent/utils.py b/examples/alert_triage_agent/src/aiq_alert_triage_agent/utils.py index ceeb46536..2aa3b673b 100644 --- a/examples/alert_triage_agent/src/aiq_alert_triage_agent/utils.py +++ b/examples/alert_triage_agent/src/aiq_alert_triage_agent/utils.py @@ -30,8 +30,8 @@ # module‐level variable; loaded on first use _DATA_CACHE: dict[str, pd.DataFrame | dict | None] = { - 'test_data': None, - 'benign_fallback_test_data': None, + 'offline_data': None, + 'benign_fallback_offline_data': None, } # Cache LLMs by name and wrapper type @@ -86,40 +86,40 @@ def log_footer(dash_length: int = 100, level: int = logging.DEBUG): logger.log(level, footer) -def preload_test_data(test_data_path: str | None, benign_fallback_data_path: str | None): +def preload_offline_data(offline_data_path: str | None, benign_fallback_data_path: str | None): """ Preloads test data from CSV and JSON files into module-level cache. 
Args: - test_data_path (str): Path to the test data CSV file + offline_data_path (str): Path to the test data CSV file benign_fallback_data_path (str): Path to the benign fallback data JSON file """ - if test_data_path is None: - raise ValueError("test_data_path must be provided") + if offline_data_path is None: + raise ValueError("offline_data_path must be provided") if benign_fallback_data_path is None: raise ValueError("benign_fallback_data_path must be provided") - _DATA_CACHE['test_data'] = pd.read_csv(test_data_path) - logger.info(f"Preloaded test data from: {test_data_path}") + _DATA_CACHE['offline_data'] = pd.read_csv(offline_data_path) + logger.info(f"Preloaded test data from: {offline_data_path}") with open(benign_fallback_data_path, "r") as f: - _DATA_CACHE['benign_fallback_test_data'] = json.load(f) + _DATA_CACHE['benign_fallback_offline_data'] = json.load(f) logger.info(f"Preloaded benign fallback data from: {benign_fallback_data_path}") -def get_test_data() -> pd.DataFrame: +def get_offline_data() -> pd.DataFrame: """Returns the preloaded test data.""" - if _DATA_CACHE['test_data'] is None: - raise ValueError("Test data not preloaded. Call `preload_test_data` first.") - return pd.DataFrame(_DATA_CACHE['test_data']) + if _DATA_CACHE['offline_data'] is None: + raise ValueError("Test data not preloaded. Call `preload_offline_data` first.") + return pd.DataFrame(_DATA_CACHE['offline_data']) def _get_static_data(): """Returns the preloaded benign fallback test data.""" - if _DATA_CACHE['benign_fallback_test_data'] is None: - raise ValueError("Benign fallback test data not preloaded. Call `preload_test_data` first.") - return _DATA_CACHE['benign_fallback_test_data'] + if _DATA_CACHE['benign_fallback_offline_data'] is None: + raise ValueError("Benign fallback test data not preloaded. 
Call `preload_offline_data` first.") + return _DATA_CACHE['benign_fallback_offline_data'] def load_column_or_static(df, host_id, column): diff --git a/examples/alert_triage_agent/tests/test_alert_triage_agent_workflow.py b/examples/alert_triage_agent/tests/test_alert_triage_agent_workflow.py index ec5212b58..c08275856 100644 --- a/examples/alert_triage_agent/tests/test_alert_triage_agent_workflow.py +++ b/examples/alert_triage_agent/tests/test_alert_triage_agent_workflow.py @@ -34,14 +34,15 @@ async def test_full_workflow(): package_name = inspect.getmodule(AlertTriageAgentWorkflowConfig).__package__ - config_file: Path = importlib.resources.files(package_name).joinpath("configs", "config_test_mode.yml").absolute() + config_file: Path = importlib.resources.files(package_name).joinpath("configs", + "config_offline_mode.yml").absolute() with open(config_file, "r") as file: config = yaml.safe_load(file) - output_filepath = config["workflow"]["test_output_path"] + output_filepath = config["workflow"]["offline_output_path"] output_filepath_abs = importlib.resources.files(package_name).joinpath("../../../../", output_filepath).absolute() - input_message = "run in test mode" + input_message = "run in offline mode" async with load_workflow(config_file) as workflow: async with workflow.run(input_message) as runner: diff --git a/examples/alert_triage_agent/tests/test_maintenance_check.py b/examples/alert_triage_agent/tests/test_maintenance_check.py index 04880ce9e..977a8f485 100644 --- a/examples/alert_triage_agent/tests/test_maintenance_check.py +++ b/examples/alert_triage_agent/tests/test_maintenance_check.py @@ -42,7 +42,8 @@ def test_load_maintenance_data(): # Load paths from config like in test_utils.py package_name = inspect.getmodule(AlertTriageAgentWorkflowConfig).__package__ - config_file: Path = importlib.resources.files(package_name).joinpath("configs", "config_test_mode.yml").absolute() + config_file: Path = 
importlib.resources.files(package_name).joinpath("configs", + "config_offline_mode.yml").absolute() with open(config_file, "r") as file: config = yaml.safe_load(file) maintenance_data_path = config["functions"]["maintenance_check"]["static_data_path"] diff --git a/examples/alert_triage_agent/tests/test_telemetry_metrics_host_heartbeat_check_tool.py b/examples/alert_triage_agent/tests/test_telemetry_metrics_host_heartbeat_check_tool.py index 2bcb2c9bf..cb4d8bb18 100644 --- a/examples/alert_triage_agent/tests/test_telemetry_metrics_host_heartbeat_check_tool.py +++ b/examples/alert_triage_agent/tests/test_telemetry_metrics_host_heartbeat_check_tool.py @@ -68,7 +68,7 @@ async def test_telemetry_metrics_host_heartbeat_check_tool(): # Configure the tool config = TelemetryMetricsHostHeartbeatCheckToolConfig( llm_name=LLMRef(value="dummy"), - test_mode=False, # Important: testing in live mode + offline_mode=False, # Important: testing in live mode metrics_url="http://test-monitoring-system:9090") # Set up mock builder and LLM diff --git a/examples/alert_triage_agent/tests/test_telemetry_metrics_host_performance_check_tool.py b/examples/alert_triage_agent/tests/test_telemetry_metrics_host_performance_check_tool.py index 05cf4b14f..1fc52a438 100644 --- a/examples/alert_triage_agent/tests/test_telemetry_metrics_host_performance_check_tool.py +++ b/examples/alert_triage_agent/tests/test_telemetry_metrics_host_performance_check_tool.py @@ -82,7 +82,7 @@ async def test_telemetry_metrics_host_performance_check_tool(): # Configure the tool config = TelemetryMetricsHostPerformanceCheckToolConfig( llm_name=LLMRef(value="dummy"), - test_mode=False, # Testing in live mode + offline_mode=False, # Testing in live mode metrics_url="http://test-monitoring-system:9090") # Set up mock builder and LLM diff --git a/examples/alert_triage_agent/tests/test_utils.py b/examples/alert_triage_agent/tests/test_utils.py index 7851f2744..a994c70fd 100644 --- 
a/examples/alert_triage_agent/tests/test_utils.py +++ b/examples/alert_triage_agent/tests/test_utils.py @@ -29,7 +29,7 @@ from aiq_alert_triage_agent.utils import _LLM_CACHE from aiq_alert_triage_agent.utils import _get_llm from aiq_alert_triage_agent.utils import load_column_or_static -from aiq_alert_triage_agent.utils import preload_test_data +from aiq_alert_triage_agent.utils import preload_offline_data from aiq_alert_triage_agent.utils import run_ansible_playbook from aiq.builder.framework_enum import LLMFrameworkEnum @@ -87,53 +87,55 @@ async def test_get_llm(): assert _LLM_CACHE[(llm_name_2, wrapper_type)] is llms[(llm_name_2, wrapper_type)] -def test_preload_test_data(): +def test_preload_offline_data(): # Clear the data cache before test _DATA_CACHE.clear() - _DATA_CACHE.update({'test_data': None, 'benign_fallback_test_data': None}) + _DATA_CACHE.update({'offline_data': None, 'benign_fallback_offline_data': None}) # Load paths from config package_name = inspect.getmodule(AlertTriageAgentWorkflowConfig).__package__ - config_file: Path = importlib.resources.files(package_name).joinpath("configs", "config_test_mode.yml").absolute() + config_file: Path = importlib.resources.files(package_name).joinpath("configs", + "config_offline_mode.yml").absolute() with open(config_file, "r") as file: config = yaml.safe_load(file) - test_data_path = config["workflow"]["test_data_path"] + offline_data_path = config["workflow"]["offline_data_path"] benign_fallback_data_path = config["workflow"]["benign_fallback_data_path"] - test_data_path_abs = importlib.resources.files(package_name).joinpath("../../../../", test_data_path).absolute() + offline_data_path_abs = importlib.resources.files(package_name).joinpath("../../../../", + offline_data_path).absolute() benign_fallback_data_path_abs = importlib.resources.files(package_name).joinpath( "../../../../", benign_fallback_data_path).absolute() # Test successful loading with actual test files - 
preload_test_data(test_data_path_abs, benign_fallback_data_path_abs) + preload_offline_data(offline_data_path_abs, benign_fallback_data_path_abs) # Verify data was loaded correctly assert len(_DATA_CACHE) == 2 - assert isinstance(_DATA_CACHE['test_data'], pd.DataFrame) - assert isinstance(_DATA_CACHE['benign_fallback_test_data'], dict) - assert not _DATA_CACHE['test_data'].empty - assert len(_DATA_CACHE['benign_fallback_test_data']) > 0 + assert isinstance(_DATA_CACHE['offline_data'], pd.DataFrame) + assert isinstance(_DATA_CACHE['benign_fallback_offline_data'], dict) + assert not _DATA_CACHE['offline_data'].empty + assert len(_DATA_CACHE['benign_fallback_offline_data']) > 0 # Test error cases - with pytest.raises(ValueError, match="test_data_path must be provided"): - preload_test_data(None, benign_fallback_data_path) + with pytest.raises(ValueError, match="offline_data_path must be provided"): + preload_offline_data(None, benign_fallback_data_path) with pytest.raises(ValueError, match="benign_fallback_data_path must be provided"): - preload_test_data(test_data_path, None) + preload_offline_data(offline_data_path, None) # Test with non-existent files with pytest.raises(FileNotFoundError): - preload_test_data("nonexistent.csv", benign_fallback_data_path) + preload_offline_data("nonexistent.csv", benign_fallback_data_path) with pytest.raises(FileNotFoundError): - preload_test_data(test_data_path, "nonexistent.json") + preload_offline_data(offline_data_path, "nonexistent.json") def test_load_column_or_static(): # Clear and initialize the data cache with test data _DATA_CACHE.clear() _DATA_CACHE.update({ - 'test_data': None, - 'benign_fallback_test_data': { + 'offline_data': None, + 'benign_fallback_offline_data': { 'static_column': 'static_value', 'another_static': 'another_value' } }) @@ -169,8 +171,8 @@ def test_load_column_or_static(): load_column_or_static(df_duplicate, 'host1', 'string_column') # Test error when benign fallback data not preloaded - 
_DATA_CACHE['benign_fallback_test_data'] = None - with pytest.raises(ValueError, match="Benign fallback test data not preloaded. Call `preload_test_data` first."): + _DATA_CACHE['benign_fallback_offline_data'] = None + with pytest.raises(ValueError, match="Benign fallback test data not preloaded. Call `preload_offline_data` first."): load_column_or_static(df, 'host1', 'static_column')