[Security Solution] Migrate Threat Hunting Agent to modular Agent Builder skills #255697
…lder skills

Decomposes the monolithic security.agent (Threat Hunting Agent) into three focused skills for the Elastic AI Agent, improving modularity and enabling on-demand capability loading:

- **threat-hunting**: Hypothesis-driven hunting with ES|QL, IOC search, and statistical anomaly detection across security indices
- **alert-analysis**: Alert triage, entity correlation, threat intelligence enrichment, and disposition workflow
- **detection-engineering**: Rule authoring (KQL/EQL/threshold), coverage gap analysis, and exception management

Key changes:

- Create three SkillDefinition registrations with curated tool sets
- Add `security.manage_rule_exceptions` registry tool for programmatic exception list management
- Add inline `get-related-alerts` tool using space-scoped detection engine alert search for entity-based alert correlation
- Enrich threat hunting agent instructions to skill-quality content
- Add comprehensive unit tests for skills and tools
- Add eval suite for skill activation and tool selection validation

Closes elastic/security-team#15697
/ci

    if ((await uiSettings.get(AGENT_BUILDER_EXPERIMENTAL_FEATURES_SETTING_ID)) === true) {
      return {
        status: 'unavailable',
        reason:
          'Skills are enabled, which takes precedence over threat hunting agent availability',
      };
    }
    return getAgentBuilderResourceAvailability({ core, request, logger });

If skills are enabled, the threat hunting agent should not be available.
We also invoke attachments with the threat hunting agent, so we should add a condition to use_agent_builder_attachment.ts: if skills are enabled, do not pass an agentId argument.
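
The condition suggested in this comment might look like the minimal sketch below. `buildAttachmentParams`, `AttachmentParams`, and the `skillsEnabled` flag are hypothetical names used only for illustration; the real hook in `use_agent_builder_attachment.ts` may be shaped differently.

```typescript
// Hypothetical shape of the arguments passed when opening an attachment.
interface AttachmentParams {
  attachmentType: string;
  agentId?: string;
}

// Assumption: the caller has already read the experimental-features setting.
function buildAttachmentParams(
  attachmentType: string,
  threatHuntingAgentId: string,
  skillsEnabled: boolean
): AttachmentParams {
  // When skills are enabled the threat hunting agent is unavailable,
  // so agentId is left out entirely rather than passed as undefined.
  return skillsEnabled
    ? { attachmentType }
    : { attachmentType, agentId: threatHuntingAgentId };
}
```

Omitting the key, instead of setting it to `undefined`, keeps the payload identical to a call that never knew about the threat hunting agent.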
# Conflicts:
# x-pack/platform/packages/shared/agent-builder/agent-builder-server/allow_lists.ts
# x-pack/solutions/security/plugins/security_solution/server/agent_builder/skills/register_skills.ts
# x-pack/solutions/security/plugins/security_solution/server/agent_builder/tools/register_tools.ts
# x-pack/solutions/security/plugins/security_solution/server/plugin.ts
- Make threat hunting agent unavailable when skills are enabled via AGENT_BUILDER_EXPERIMENTAL_FEATURES_SETTING_ID check
- Conditionally pass agentId in use_agent_builder_attachment only when skills are disabled
- Fix Zod v3→v4 migration for manage_rule_exceptions_tool schema
- Fix StaticToolRegistration return type and ExceptionRulesClient typing
- Add 3 new test cases: lists-unavailable, createExceptionListItem error, rulesClient.update payload verification
- Align threat hunting agent description with actual capabilities
- Add entity_risk_score dedicated eval example, fix grammar in eval spec
…d inline tool tests

- Sanitize alertIds against injection patterns before interpolating into ES|QL queries in attack_discovery_search_tool (reject IDs with quotes, semicolons, spaces, etc.)
- Validate spaceId format before use in ES|QL index patterns (defense-in-depth)
- Add comprehensive handler-level tests for alert_analysis get-related-alerts inline tool covering entity extraction, multi-valued fields, space scoping, and error paths
- Add KQL injection guard for ruleId in manage_rule_exceptions_tool
- Move lists plugin availability check early in manage_rule_exceptions_tool handler
- Add transactional gap protection: log orphaned exception lists on rule update failure
- Improve threat_hunting_agent description and soften exception/tuning language
- Clarify skill content: reference templates as embedded content, not filesystem paths
- Ensure consistent await on skill registration calls
- Add warn-once feedback in useAgentBuilderAttachment when agentBuilder unavailable
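
The ID sanitization described in this commit can be sketched roughly as follows. The alert ID pattern is the one quoted in the PR summary; `sanitizeAlertIds`, `isValidSpaceId`, the `SanitizeResult` type, and the space ID pattern are assumptions made for illustration, not the code in `attack_discovery_search_tool`.

```typescript
// Pattern quoted in the PR summary for alert IDs.
const SAFE_ID_PATTERN = /^[a-zA-Z0-9_\-.:]+$/;
// Assumption: space IDs are limited to lowercase letters, digits, '_' and '-'.
const SPACE_ID_PATTERN = /^[a-z0-9_-]+$/;

type SanitizeResult =
  | { ok: true; ids: string[] }
  | { ok: false; error: string };

// Drops IDs containing quotes, semicolons, spaces, and similar characters.
// An error is returned only when no ID survives the filter.
function sanitizeAlertIds(alertIds: string[]): SanitizeResult {
  const ids = alertIds.filter((id) => SAFE_ID_PATTERN.test(id));
  if (ids.length === 0) {
    return { ok: false, error: 'No valid alert IDs provided' };
  }
  return { ok: true, ids };
}

// Checked before the space ID is interpolated into an index pattern.
function isValidSpaceId(spaceId: string): boolean {
  return SPACE_ID_PATTERN.test(spaceId);
}
```

Using an allow-list pattern rather than escaping means anything unexpected is rejected before it reaches the query string.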
Await getInlineTools() since it returns MaybePromise<SkillBoundedTool[]>, which cannot be indexed directly without resolving the promise first.
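
A minimal sketch of why the `await` is needed. The `MaybePromise` alias, the `SkillBoundedTool` shape, and the `skill` object here are simplified stand-ins assumed for illustration, not the Agent Builder definitions.

```typescript
// Assumption: MaybePromise<T> is either a value or a promise of that value.
type MaybePromise<T> = T | Promise<T>;

// Hypothetical stand-in for the real SkillBoundedTool type.
interface SkillBoundedTool {
  id: string;
}

// Hypothetical skill whose inline tools are produced asynchronously.
const skill = {
  getInlineTools: (): MaybePromise<SkillBoundedTool[]> =>
    Promise.resolve([{ id: 'get-related-alerts' }]),
};

async function firstInlineToolId(): Promise<string | undefined> {
  // `skill.getInlineTools()[0]` does not type-check: a Promise has no index 0.
  // Awaiting works for both the plain array and the promise.
  const tools = await skill.getInlineTools();
  return tools[0]?.id;
}
```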
Both items from @stephmilovic's review addressed:
Additionally, the sample |
…eferences

- Add ExpectedSkillInvocation evaluator to dynamically validate expectedSkill and shouldNotActivateSkill metadata from eval examples against traces
- Restore platformCoreTools.cases to threat-hunting skill (lost during agent-to-skill migration) so hunts can escalate into cases
- Add cross-skill reference to entity-analytics in alert-analysis skill content for deeper entity profiling and asset criticality review
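
The evaluator's pass/fail logic could be sketched as below. The PR summary says it looks for `filestore.read` tool calls matching a skill's `SKILL.md` path; the `ToolCallSpan` shape, the `<skill>/SKILL.md` path layout, and the function names are assumptions for illustration only.

```typescript
// Hypothetical, simplified view of a tool call recovered from a trace.
interface ToolCallSpan {
  toolName: string;
  path?: string;
}

// Metadata fields named in the PR; both are optional per example.
interface ExampleMetadata {
  expectedSkill?: string;
  shouldNotActivateSkill?: string;
}

// Assumption: a skill counts as activated when its SKILL.md was read,
// and the file lives at "<skill>/SKILL.md" under some prefix.
function skillActivated(spans: ToolCallSpan[], skill: string): boolean {
  const suffix = `${skill}/SKILL.md`;
  return spans.some(
    (s) =>
      s.toolName === 'filestore.read' &&
      s.path !== undefined &&
      (s.path === suffix || s.path.endsWith(`/${suffix}`))
  );
}

// Returns 1 (pass) or 0 (fail) for one eval example.
function scoreExample(spans: ToolCallSpan[], meta: ExampleMetadata): 0 | 1 {
  if (meta.expectedSkill && !skillActivated(spans, meta.expectedSkill)) return 0;
  if (meta.shouldNotActivateSkill && skillActivated(spans, meta.shouldNotActivateSkill)) return 0;
  return 1;
}
```

Driving the check from per-example metadata means new examples need no evaluator changes, only the two optional fields.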
- Fix threat-hunting tool count assertion (5 -> 6) after adding cases tool
- Add assertion that platformCoreTools.cases is in getRegistryTools()
- Add content assertions for cross-skill references:
  - threat-hunting -> alert-analysis, detection-engineering
  - alert-analysis -> entity-analytics, detection-engineering
  - detection-engineering -> threat-hunting, alert-analysis
The detection-engineering skill is net-new functionality, not a decomposition of the existing threat hunting agent. Moving it to a dedicated follow-up PR keeps this PR focused on the actual agent decomposition (threat-hunting + alert-analysis).

Changes:

- Remove detection_engineering/ skill directory
- Remove registration and barrel export
- Remove detection-engineering tests and eval examples
- Remove cross-skill references to detection-engineering from threat-hunting and alert-analysis content
- Update distractor eval examples
The manage_rule_exceptions tool was only used by the detection-engineering skill, which was extracted to a separate follow-up PR. Removing the orphaned tool and reverting the registerTools signature change (setupPlugins param) keeps this PR focused on the threat-hunting agent decomposition.

Changes:

- Delete manage_rule_exceptions_tool.ts and its test (738 lines)
- Remove from tools/index.ts barrel export
- Revert registerTools to 4-param signature (remove setupPlugins)
- Revert plugin.ts call site to not pass lists plugin
Missed this reference when extracting the manage_rule_exceptions tool to the detection-engineering follow-up PR.
Without this entry, skill registration throws at server startup because isAllowedBuiltinSkill rejects unrecognized skill IDs.
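
The failure mode described here can be illustrated with a small sketch. Only the name `isAllowedBuiltinSkill` comes from the commit message; the list contents, the function body, and `registerSkill` are hypothetical stand-ins for whatever `allow_lists.ts` and the skill registry actually do.

```typescript
// Hypothetical allow list; the real entries live in allow_lists.ts.
const ALLOWED_BUILTIN_SKILLS: readonly string[] = ['threat-hunting', 'alert-analysis'];

// Stand-in implementation of the check named in the commit message.
function isAllowedBuiltinSkill(skillId: string): boolean {
  return ALLOWED_BUILTIN_SKILLS.includes(skillId);
}

// Sketch of a registry that fails fast on IDs missing from the allow list.
function registerSkill(registry: Set<string>, skillId: string): void {
  if (!isAllowedBuiltinSkill(skillId)) {
    throw new Error(`Skill "${skillId}" is not in the built-in allow list`);
  }
  registry.add(skillId);
}
```

Because the check throws during registration, a missing entry surfaces at startup rather than when a user first triggers the skill.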
Extract useSecurityAgentId hook to centralize the skills-based agent selection logic. When skills are enabled, Security now falls back to the default elastic-ai-agent instead of the unavailable security.agent, preventing the "Agent has been deleted" error in the UI.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
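
The selection logic that such a hook centralizes might reduce to a pure function like the one below. The two agent IDs are the ones named in the commit message; `selectSecurityAgentId` and its `skillsEnabled` parameter are hypothetical, and the real `useSecurityAgentId` hook presumably reads the setting itself.

```typescript
// Agent IDs as named in the commit message.
const THREAT_HUNTING_AGENT_ID = 'security.agent';
const DEFAULT_AGENT_ID = 'elastic-ai-agent';

// Pure selection logic that a hook such as useSecurityAgentId could wrap.
// Assumption: `skillsEnabled` mirrors the experimental-features UI setting.
function selectSecurityAgentId(skillsEnabled: boolean): string {
  return skillsEnabled ? DEFAULT_AGENT_ID : THREAT_HUNTING_AGENT_ID;
}
```

Keeping the decision in one place means every Security entry point resolves the same agent for a given setting.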
# Conflicts:
# x-pack/solutions/security/plugins/security_solution/public/app/app.tsx
# x-pack/solutions/security/plugins/security_solution/server/agent_builder/skills/register_skills.ts
Only keep the availability check that makes the agent unavailable when skills are enabled. The agent description and instructions should not change in this PR.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
jonwalstedt
left a comment
Nice work Patryk! 🚢
Skill vs. Agent: Comparison Summary
Setup: Two agent configurations tested on the same prompts in the same Elastic Security environment. The "skill" version pre-loads SKILL.md files with ES|QL templates from a filestore before acting. The "agent" (threat hunting agent) uses Elastic Security tools directly without pre-loading.
Results by Task
| Task | Winner | Key differentiator |
|---|---|---|
| Lateral movement hunt (7d) | Skill | Found 4-user PowerShell anomaly via aggregation query; agent never ran it |
| C2 beaconing | Skill | Pivoted to process-frequency analysis when network logs absent; surfaced 5 IPs, counts, paths. Agent stayed surface-level |
| Brute force login | Tie | Both hit dead end (no auth indices). Skill got there in 1 wildcard query vs. 3 targeted searches; gave credential-spraying query too |
| Rare process execution | Ambiguous | Different query filters → different results. Agent found 3 processes cheaply; skill found 1 rare + full landscape including mimikatz (11×) |
| Summarize malware alert | Split | Skill: 20-alert scope table, 5 C2 IPs, MITRE T1003, explicit "True Positive". Agent: found "Detect Malware Only" policy gap (critical config finding skill missed) |
| Write certutil ES\|QL rule | Agent slight edge | Agent covered both `-` and `/` flag prefixes, `ftp://`, and MITRE + schedule guidance. Skill had better index targeting |
| Find lateral movement from host | Tie | Same conclusion, same IPs. Skill broader investigation + data gap disclosure; agent tighter table output |
| How many critical alerts? | Agent (marginal) | Skill loaded full SKILL.md unnecessarily; agent answered in one sentence |
| Complex flow (Alert→VT→On-call→Slack) | Skill | Used get-related-alerts for 20 structured alerts + Security Labs enrichment. Agent did flat summary only |
Overall Pattern
Skill wins on depth (5-6/9 tasks), especially when:
- Data is absent and investigation requires pivoting (C2 beaconing is the clearest example)
- Alert triage requires scope awareness across related alerts
- Template queries surface patterns the agent would never formulate independently
Agent wins on efficiency — consistently 40–60% fewer tokens and faster to first token. Also occasionally surfaces unique signals: the "Detect Malware Only" policy gap, ML job recommendations, and risk engine status check are things the skill never exercised.
Core Tradeoff
| Dimension | Skill | Agent |
|---|---|---|
| Avg. token cost | Higher (sometimes 4×) | Lower |
| Template queries | Yes — ready-to-run patterns, catches edge cases (credential spraying, proxy beaconing) | No |
| Alert correlation | get-related-alerts → full scope | Flat alert list only |
| Elastic ecosystem tools | Never uses ML jobs, risk engine, entity analytics | Uses them (even when not enabled) |
| Error recovery | Investigates failures, adapts queries | Moves on faster |
| Simple factual queries | Wastes tokens loading SKILL.md | Answers directly |
| Security Labs / product docs | Available in this env → valuable | Not installed → dead end |
Bottom Line
The skill approach is the better threat hunter — it finds more signals and handles data gaps more gracefully. The agent is cheaper and occasionally finds high-value operational findings (config gaps, ML job recommendations) that the skill ignores entirely. For the tasks in this eval, they are complementary rather than substitutes, which is exactly the architecture the PR implements: decompose by specialization and let the router decide which skill activates.

    .array(z.string())
    .min(1)

With this change, result.success flips from true to false for empty arrays. This is correct behavior, but any existing callers passing [] will now get a schema error instead of an empty result. It might be worth checking how downstream callers handle this.
…ty fields

- Reduce attack discovery search LIMIT from 100 to 10 to prevent overwhelming LLM context with large payloads
- Add optional entity fields (hostNames, userNames, sourceIps, destIps) to get-related-alerts tool so callers can skip the GET round-trip when entities are already available from a previous tool call

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
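
One way the optional entity fields could feed the correlation query is sketched below. The four input names come from the commit message and `terms` queries are mentioned in the PR summary; the mapping to ECS field names, the `bool`/`should` wrapper, and the function names are assumptions, not the tool's actual query.

```typescript
// Optional entity fields added to the tool input in this commit.
interface EntityInput {
  hostNames?: string[];
  userNames?: string[];
  sourceIps?: string[];
  destIps?: string[];
}

// Assumption: each input maps to the matching ECS field.
const ECS_FIELD: Record<keyof EntityInput, string> = {
  hostNames: 'host.name',
  userNames: 'user.name',
  sourceIps: 'source.ip',
  destIps: 'destination.ip',
};

type TermsClause = { terms: Record<string, string[]> };

// Builds one `terms` clause per non-empty entity list.
function buildEntityClauses(input: EntityInput): TermsClause[] {
  const clauses: TermsClause[] = [];
  for (const key of Object.keys(ECS_FIELD) as Array<keyof EntityInput>) {
    const values = input[key];
    if (Array.isArray(values) && values.length > 0) {
      clauses.push({ terms: { [ECS_FIELD[key]]: values } });
    }
  }
  return clauses;
}

// Wraps the clauses so that a match on any one entity relates the alert.
function buildRelatedAlertsQuery(input: EntityInput) {
  return { bool: { should: buildEntityClauses(input), minimum_should_match: 1 } };
}
```

When no entity lists are supplied the clause list is empty, which corresponds to the case where the tool still has to fetch the source alert to extract its entities.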
Use Array.isArray() instead of truthiness checks and nullish
coalescing for optional array fields, since @kbn/zod/v4's
.optional() infers `T | {}` rather than `T | undefined`.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
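
The guard described in this commit can be sketched as a small normalizer. `toStringArray` is a hypothetical helper, and the input is typed as `unknown` here to model the widened inference the commit message describes.

```typescript
// Assumption: an optional array field may be inferred as `T | {}`,
// as the commit message describes, so the input is modelled as `unknown`.
function toStringArray(value: unknown): string[] {
  // A truthiness check would let `{}` through; Array.isArray does not.
  return Array.isArray(value)
    ? value.filter((v): v is string => typeof v === 'string')
    : [];
}
```

`value ?? []` has the same weakness as a truthiness check here, since `{}` is neither null nor undefined.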
💛 Build succeeded, but was flaky
denar50
left a comment
Code review only. LGTM for files owned by the Detection Engine.
jaredburgettelastic
left a comment
Entity Analytics changes LGTM 👍
Thank you!
Summary

Decomposes the monolithic `security.agent` (Threat Hunting Agent) into two focused, modular skills for the Elastic AI Agent. This improves discoverability, reduces token overhead via on-demand loading, and enables independent evolution of each security capability.

New Skills

- `threat-hunting`
- `alert-analysis`

Key Changes

- … `SkillDefinition` registrations with curated tool sets and rich skill content
- … `get-related-alerts` tool using space-scoped detection engine alert search for entity-based alert correlation (with `terms` queries for multi-valued ECS fields)
- … `platformCoreTools.cases` to threat-hunting skill so confirmed findings can be escalated into investigation cases
- … `ExpectedSkillInvocation` evaluator for dynamic skill activation validation

Security Hardening

- `alertIds` in `attack_discovery_search_tool` are validated against `SAFE_ID_PATTERN` (`/^[a-zA-Z0-9_\-.:]+$/`) before interpolation into ES|QL queries. Invalid IDs are silently filtered; all-invalid returns an error result.
- `spaceId` validation: Validated before interpolation into ES|QL index patterns (defense-in-depth).

Architecture

- … `getRegistryTools()` to reference shared platform/security tools
- … `ToolHandlerContext` (esClient, spaceId, request), with no plugin service dependencies
- … `agentId` when skills are enabled (agent availability gated on `AGENT_BUILDER_EXPERIMENTAL_FEATURES_SETTING_ID`)

Eval Suite

The eval suite includes a metadata-driven `ExpectedSkillInvocation` evaluator that dynamically validates skill activation based on `expectedSkill` and `shouldNotActivateSkill` metadata on each example. It queries OTEL traces for `filestore.read` tool calls matching the skill's `SKILL.md` path, producing pass/fail scores per example. This covers:

- `threat-hunting` skill activation
- `alert-analysis` skill activation

Run locally:

    node scripts/evals start --suite agent-builder

Trigger in CI: add `evals:agent-builder` label to PR

Test Coverage

- `attack_discovery_search` tool tests
- `alert_analysis` inline tool tests
- `use_agent_builder_attachment` hook tests

Test Plan

Automated (CI)

- `attack_discovery_search` tool: injection guards, spaceId validation (12 tests)
- `alert_analysis` inline tool handler: entity extraction, correlation query, error paths (11 tests)
- `use_agent_builder_attachment` hook: conditional `agentId` when skills enabled (7 tests)
- `ExpectedSkillInvocation` evaluator validates `expectedSkill`/`shouldNotActivateSkill` metadata against OTEL traces
- … `security_solution` and `kbn-evals-suite-agent-builder` projects
- `node scripts/check_changes.ts` passes

Manual verification

- … `AGENT_BUILDER_EXPERIMENTAL_FEATURES_SETTING_ID` is enabled
- … `threat-hunting` skill (e.g. "Hunt for lateral movement in my environment")
- … `alert-analysis` skill (e.g. "Help me triage this critical LSASS alert")
- … `threat-hunting` skill can create cases via `platform.core.cases` when escalating confirmed findings
- … `alert-analysis` skill's `get-related-alerts` inline tool returns related alerts using space-scoped `.alerts-security.alerts-<spaceId>` index
- … `alert-analysis` skill references entity-analytics for deeper entity profiling (visible in skill content)
- … `agentId` when skills are enabled (attachment hook respects feature flag)

Follow-up PRs

- … `manage_rule_exceptions` tool: net-new skill for rule authoring, coverage gaps, and exception management (tracked separately, not part of this decomposition)

Closes https://github.com/elastic/security-team/issues/15697