[Security Solution] Migrate Threat Hunting Agent to modular Agent Builder skills by patrykkopycinski · Pull Request #255697 · elastic/kibana

patrykkopycinski · 2026-03-03T09:11:57Z

Summary

Decomposes the monolithic security.agent (Threat Hunting Agent) into two focused, modular skills for the Elastic AI Agent. This improves discoverability, reduces token overhead via on-demand loading, and enables independent evolution of each security capability.

New Skills

| Skill | Purpose | Registry Tools | Inline Tools |
| --- | --- | --- | --- |
| `threat-hunting` | Hypothesis-driven hunting, IOC search, anomaly detection, case creation | 6 (generateEsql, executeEsql, search, listIndices, getIndexMapping, cases) | none |
| `alert-analysis` | Alert triage, entity correlation, disposition workflow | 3 (security.alerts, security_labs_search, entity_risk_score) | 1 (get-related-alerts) |

Key Changes

Created 2 SkillDefinition registrations with curated tool sets and rich skill content
Added inline get-related-alerts tool using space-scoped detection engine alert search for entity-based alert correlation (with terms queries for multi-valued ECS fields)
Enriched threat hunting agent instructions to skill-quality content
Added platformCoreTools.cases to threat-hunting skill so confirmed findings can be escalated into investigation cases
Added cross-skill reference: alert-analysis references entity-analytics for deeper entity profiling and asset criticality
Added comprehensive unit tests for all skills, tools, and the inline handler
Added eval suite with ExpectedSkillInvocation evaluator for dynamic skill activation validation
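
As a rough illustration of the entity-based correlation mentioned above, the sketch below builds a `bool` query with one `terms` clause per entity field, so single-valued and multi-valued ECS fields are handled the same way. The function names, the field list, and the `ids` exclusion of the source alert are assumptions for illustration, not the handler shipped in this PR; only the `.alerts-security.alerts-<spaceId>` index naming comes from the description.

```typescript
// Assumed set of ECS entity fields used for correlation.
const ENTITY_FIELDS = ['host.name', 'user.name', 'source.ip', 'destination.ip'] as const;
type EntityField = (typeof ENTITY_FIELDS)[number];
type AlertEntities = Partial<Record<EntityField, string | string[]>>;

type TermsClause = { terms: Record<string, string[]> };

// Hypothetical helper: one `terms` clause per populated entity field.
function buildRelatedAlertsQuery(entities: AlertEntities, excludeAlertId: string) {
  const should: TermsClause[] = [];
  for (const field of ENTITY_FIELDS) {
    const value = entities[field];
    if (value === undefined) continue;
    // ECS fields may hold one value or many; normalize to an array.
    const values = Array.isArray(value) ? value : [value];
    if (values.length > 0) should.push({ terms: { [field]: values } });
  }
  return {
    bool: {
      should,
      minimum_should_match: 1,
      // Assumption: the alert under analysis is excluded from its own results.
      must_not: [{ ids: { values: [excludeAlertId] } }],
    },
  };
}

// Space-scoped alerts index, following the naming used in the test plan.
function alertsIndexForSpace(spaceId: string): string {
  return `.alerts-security.alerts-${spaceId}`;
}
```

Using `terms` rather than `term` means a field such as `host.name` that carries several values matches if any one of them appears on another alert.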

Security Hardening

ES|QL injection prevention: alertIds in attack_discovery_search_tool are validated against SAFE_ID_PATTERN (/^[a-zA-Z0-9_\-.:]+$/) before interpolation into ES|QL queries. Invalid IDs are silently dropped; if every ID is invalid, the tool returns an error result.
spaceId validation: spaceId is validated before interpolation into ES|QL index patterns (defense-in-depth).
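
The filtering behavior described above can be sketched as follows. The regular expression is the one quoted in this description; `filterSafeIds`, `toEsqlInList`, the result shape, and the error message are hypothetical names chosen for illustration and may differ from the actual tool code.

```typescript
// Pattern quoted in the PR description.
const SAFE_ID_PATTERN = /^[a-zA-Z0-9_\-.:]+$/;

// Hypothetical result shape, for illustration only.
type FilterResult =
  | { ok: true; ids: string[] }
  | { ok: false; error: string };

// Drop unsafe IDs silently; fail only when no safe ID is left.
function filterSafeIds(alertIds: string[]): FilterResult {
  const ids = alertIds.filter((id) => SAFE_ID_PATTERN.test(id));
  if (ids.length === 0) {
    return { ok: false, error: 'No valid alert IDs provided' };
  }
  return { ok: true, ids };
}

// Hypothetical helper: render already-validated IDs as an ES|QL IN (...) list.
// Safe only because quotes and whitespace cannot pass SAFE_ID_PATTERN.
function toEsqlInList(ids: string[]): string {
  return ids.map((id) => `"${id}"`).join(', ');
}
```

Validating against an allow-list pattern, rather than escaping, keeps the check independent of ES|QL quoting rules: anything outside the known-safe character set never reaches the query string.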

Architecture

Skills use getRegistryTools() to reference shared platform/security tools
Inline tools only access ToolHandlerContext (esClient, spaceId, request) — no plugin service dependencies
Threat hunting agent conditionally hides agentId when skills are enabled (agent availability gated on AGENT_BUILDER_EXPERIMENTAL_FEATURES_SETTING_ID)
Softened the exception/tuning language in the agent description and instructions to match actual tool capabilities

Eval Suite

The eval suite includes a metadata-driven ExpectedSkillInvocation evaluator that dynamically validates skill activation based on expectedSkill and shouldNotActivateSkill metadata on each example. It queries OTEL traces for filestore.read tool calls matching the skill's SKILL.md path, producing pass/fail scores per example. This covers:

Threat hunting queries → threat-hunting skill activation
Alert investigation queries → alert-analysis skill activation
Negative cases ensuring non-security queries do not activate security skills
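
A minimal sketch of the pass/fail logic described above, assuming the tool calls have already been extracted from the traces into a plain array. The `ToolCall` shape, the `skills/<id>/SKILL.md` path layout, and the function names are assumptions; only the `expectedSkill` / `shouldNotActivateSkill` metadata keys and the `filestore.read` tool name come from this description.

```typescript
// Hypothetical shape: the real evaluator reads tool calls from OTEL traces.
interface ToolCall {
  tool: string; // e.g. 'filestore.read'
  path?: string; // file path argument, when present
}

interface ExampleMetadata {
  expectedSkill?: string;
  shouldNotActivateSkill?: string;
}

// Assumed path layout; the real SKILL.md location is not shown in the PR.
const skillPath = (skillId: string): string => `skills/${skillId}/SKILL.md`;

// A skill counts as activated when its SKILL.md was read from the filestore.
function skillWasLoaded(calls: ToolCall[], skillId: string): boolean {
  const suffix = skillPath(skillId);
  return calls.some(
    (c) => c.tool === 'filestore.read' && c.path !== undefined && c.path.endsWith(suffix)
  );
}

// Returns 1 (pass) or 0 (fail) for a single eval example.
function scoreExample(meta: ExampleMetadata, calls: ToolCall[]): 0 | 1 {
  if (meta.expectedSkill && !skillWasLoaded(calls, meta.expectedSkill)) return 0;
  if (meta.shouldNotActivateSkill && skillWasLoaded(calls, meta.shouldNotActivateSkill)) return 0;
  return 1;
}
```

Because the expectation lives in each example's metadata, the same evaluator covers positive and negative cases without per-skill code.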

Run locally: node scripts/evals start --suite agent-builder
Trigger in CI: add evals:agent-builder label to PR

Test Coverage

| Category | Count | Details |
| --- | --- | --- |
| Skill definition tests | 20 | Metadata, tool counts, cases tool presence, content assertions, cross-skill references, referenced content |
| `attack_discovery_search` tool tests | 12 | Schema, ES |
| `alert_analysis` inline tool tests | 11 | Entity extraction (single/multi-type/multi-valued), space-scoped index, error paths |
| `use_agent_builder_attachment` hook tests | 7 | Conditional agentId, skills-enabled behavior, attachment callbacks |
| Eval suite examples | 12 | Skill activation for threat hunting, alert analysis queries; cross-skill routing; distractor queries |
| Total | 62 | Across 5 test files |

Test Plan

Automated (CI)

Manual verification

Skills appear in Agent Builder skill listing when AGENT_BUILDER_EXPERIMENTAL_FEATURES_SETTING_ID is enabled
Threat hunting queries activate the threat-hunting skill (e.g. "Hunt for lateral movement in my environment")
Alert investigation queries activate the alert-analysis skill (e.g. "Help me triage this critical LSASS alert")
threat-hunting skill can create cases via platform.core.cases when escalating confirmed findings
alert-analysis skill's get-related-alerts inline tool returns related alerts using space-scoped .alerts-security.alerts-<spaceId> index
alert-analysis skill references entity-analytics for deeper entity profiling (visible in skill content)
Threat hunting agent hides agentId when skills are enabled (attachment hook respects feature flag)
Non-security queries (e.g. "Show me Kibana dashboards") do not activate security skills

Follow-up PRs

Detection Engineering skill + manage_rule_exceptions tool — net-new skill for rule authoring, coverage gaps, and exception management (tracked separately, not part of this decomposition)
Scout API integration tests — [Security Solution] Add Scout API integration tests for Agent Builder security skills #259755

Closes https://github.com/elastic/security-team/issues/15697

…lder skills

Decomposes the monolithic security.agent (Threat Hunting Agent) into three focused skills for the Elastic AI Agent, improving modularity and enabling on-demand capability loading:

- **threat-hunting**: Hypothesis-driven hunting with ES|QL, IOC search, and statistical anomaly detection across security indices
- **alert-analysis**: Alert triage, entity correlation, threat intelligence enrichment, and disposition workflow
- **detection-engineering**: Rule authoring (KQL/EQL/threshold), coverage gap analysis, and exception management

Key changes:

- Create three SkillDefinition registrations with curated tool sets
- Add `security.manage_rule_exceptions` registry tool for programmatic exception list management
- Add inline `get-related-alerts` tool using space-scoped detection engine alert search for entity-based alert correlation
- Enrich threat hunting agent instructions to skill-quality content
- Add comprehensive unit tests for skills and tools
- Add eval suite for skill activation and tool selection validation

Closes elastic/security-team#15697

patrykkopycinski · 2026-03-03T09:14:12Z

/ci

patrykkopycinski · 2026-03-03T10:32:11Z

/ci

patrykkopycinski · 2026-03-03T11:17:46Z

/ci

stephmilovic · 2026-03-05T20:38:39Z

Suggested change

    if ((await uiSettings.get(AGENT_BUILDER_EXPERIMENTAL_FEATURES_SETTING_ID)) === true) {
      return {
        status: 'unavailable',
        reason:
          'Skills are enabled, which takes precedence over threat hunting agent availability',
      };
    }
    return getAgentBuilderResourceAvailability({ core, request, logger });

If skills are enabled, the threat hunting agent should not be available.

We also invoke attachments with the threat hunting agent, so use_agent_builder_attachment.ts should get a condition as well: if skills are enabled, do not pass an agentId argument.

# Conflicts:
#   x-pack/platform/packages/shared/agent-builder/agent-builder-server/allow_lists.ts
#   x-pack/solutions/security/plugins/security_solution/server/agent_builder/skills/register_skills.ts
#   x-pack/solutions/security/plugins/security_solution/server/agent_builder/tools/register_tools.ts
#   x-pack/solutions/security/plugins/security_solution/server/plugin.ts

- Make threat hunting agent unavailable when skills are enabled via AGENT_BUILDER_EXPERIMENTAL_FEATURES_SETTING_ID check
- Conditionally pass agentId in use_agent_builder_attachment only when skills are disabled
- Fix Zod v3→v4 migration for manage_rule_exceptions_tool schema
- Fix StaticToolRegistration return type and ExceptionRulesClient typing
- Add 3 new test cases: lists-unavailable, createExceptionListItem error, rulesClient.update payload verification
- Align threat hunting agent description with actual capabilities
- Add entity_risk_score dedicated eval example, fix grammar in eval spec

…d inline tool tests

- Sanitize alertIds against injection patterns before interpolating into ES|QL queries in attack_discovery_search_tool (reject IDs with quotes, semicolons, spaces, etc.)
- Validate spaceId format before use in ES|QL index patterns (defense-in-depth)
- Add comprehensive handler-level tests for alert_analysis get-related-alerts inline tool covering entity extraction, multi-valued fields, space scoping, and error paths
- Add KQL injection guard for ruleId in manage_rule_exceptions_tool
- Move lists plugin availability check early in manage_rule_exceptions_tool handler
- Add transactional gap protection: log orphaned exception lists on rule update failure
- Improve threat_hunting_agent description and soften exception/tuning language
- Clarify skill content: reference templates as embedded content, not filesystem paths
- Ensure consistent await on skill registration calls
- Add warn-once feedback in useAgentBuilderAttachment when agentBuilder unavailable

patrykkopycinski · 2026-03-25T12:19:19Z

/ci

Await getInlineTools() since it returns MaybePromise<SkillBoundedTool[]>, which cannot be indexed directly without resolving the promise first.

patrykkopycinski · 2026-03-25T12:28:27Z

/ci

patrykkopycinski · 2026-03-25T20:34:07Z

Both items from @stephmilovic's review addressed:

Agent availability gating: When AGENT_BUILDER_EXPERIMENTAL_FEATURES_SETTING_ID is true, the agent returns { status: 'unavailable' } — see threat_hunting_agent.ts line 93.
Conditional agentId in attachments: use_agent_builder_attachment.ts reads the setting and conditionally omits agentId: ...(skillsEnabled ? {} : { agentId: THREAT_HUNTING_AGENT_ID }) — see line 79.
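
The conditional spread quoted above can be illustrated in isolation. This is a sketch, not the hook itself: `buildAttachmentOptions` and the `AttachmentOptions` shape are hypothetical, and the constant's value is assumed from the `security.agent` ID mentioned in the PR description.

```typescript
// Assumed value, based on the agent ID named in the PR description.
const THREAT_HUNTING_AGENT_ID = 'security.agent';

// Hypothetical options shape for illustration.
interface AttachmentOptions {
  attachmentType: string;
  agentId?: string;
}

// When skills are enabled, the agentId key is omitted entirely (not set to
// undefined), so the consumer falls back to its default agent.
function buildAttachmentOptions(attachmentType: string, skillsEnabled: boolean): AttachmentOptions {
  return {
    attachmentType,
    ...(skillsEnabled ? {} : { agentId: THREAT_HUNTING_AGENT_ID }),
  };
}
```

Spreading an empty object leaves the key absent, which matters for consumers that distinguish a missing `agentId` from one explicitly set to `undefined`.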

Additionally, the sample alertAnalysisSampleSkill (skills/alert_analysis_skill.ts) has been deleted since the registered alertAnalysisSkill (skills/alert_analysis/) supersedes it.

…eferences

- Add ExpectedSkillInvocation evaluator to dynamically validate expectedSkill and shouldNotActivateSkill metadata from eval examples against traces
- Restore platformCoreTools.cases to threat-hunting skill (lost during agent-to-skill migration) so hunts can escalate into cases
- Add cross-skill reference to entity-analytics in alert-analysis skill content for deeper entity profiling and asset criticality review

patrykkopycinski · 2026-03-26T09:21:47Z

/ci

patrykkopycinski · 2026-03-26T09:23:30Z

/ci

- Fix threat-hunting tool count assertion (5 -> 6) after adding cases tool
- Add assertion that platformCoreTools.cases is in getRegistryTools()
- Add content assertions for cross-skill references:
  - threat-hunting -> alert-analysis, detection-engineering
  - alert-analysis -> entity-analytics, detection-engineering
  - detection-engineering -> threat-hunting, alert-analysis

patrykkopycinski · 2026-03-26T11:31:09Z

/ci

The detection-engineering skill is net-new functionality, not a decomposition of the existing threat hunting agent. Moving it to a dedicated follow-up PR keeps this PR focused on the actual agent decomposition (threat-hunting + alert-analysis).

Changes:

- Remove detection_engineering/ skill directory
- Remove registration and barrel export
- Remove detection-engineering tests and eval examples
- Remove cross-skill references to detection-engineering from threat-hunting and alert-analysis content
- Update distractor eval examples

patrykkopycinski · 2026-03-26T21:19:06Z

/ci

The manage_rule_exceptions tool was only used by the detection-engineering skill, which was extracted to a separate follow-up PR. Removing the orphaned tool and reverting the registerTools signature change (setupPlugins param) keeps this PR focused on the threat-hunting agent decomposition.

Changes:

- Delete manage_rule_exceptions_tool.ts and its test (738 lines)
- Remove from tools/index.ts barrel export
- Revert registerTools to 4-param signature (remove setupPlugins)
- Revert plugin.ts call site to not pass lists plugin

patrykkopycinski · 2026-03-26T23:48:33Z

/ci

Missed this reference when extracting the manage_rule_exceptions tool to the detection-engineering follow-up PR.

patrykkopycinski · 2026-03-26T23:50:11Z

/ci

Without this entry, skill registration throws at server startup because isAllowedBuiltinSkill rejects unrecognized skill IDs.
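
The allow-list gate described above can be sketched roughly as follows. Only the names `AGENT_BUILDER_BUILTIN_SKILLS` and `isAllowedBuiltinSkill` appear in this PR; the list contents, `registerSkill`, and the array-based registry are assumptions made for illustration.

```typescript
// Assumed contents; the real allow list lives in allow_lists.ts.
const AGENT_BUILDER_BUILTIN_SKILLS: readonly string[] = ['threat-hunting', 'alert-analysis'];

const isAllowedBuiltinSkill = (skillId: string): boolean =>
  AGENT_BUILDER_BUILTIN_SKILLS.indexOf(skillId) !== -1;

// Hypothetical registry: rejects any built-in skill that is not allow-listed,
// which is why a missing entry surfaces as a startup failure.
function registerSkill(registry: string[], skillId: string): void {
  if (!isAllowedBuiltinSkill(skillId)) {
    throw new Error(`Skill "${skillId}" is not in the built-in skill allow list`);
  }
  registry.push(skillId);
}
```

Failing at registration time makes a forgotten allow-list entry visible immediately instead of producing a skill that silently never loads.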

patrykkopycinski · 2026-03-27T05:34:24Z

/ci

Extract useSecurityAgentId hook to centralize the skills-based agent selection logic. When skills are enabled, Security now falls back to the default elastic-ai-agent instead of the unavailable security.agent, preventing the "Agent has been deleted" error in the UI.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

# Conflicts:
#   x-pack/solutions/security/plugins/security_solution/public/app/app.tsx
#   x-pack/solutions/security/plugins/security_solution/server/agent_builder/skills/register_skills.ts

Only keep the availability check that makes the agent unavailable when skills are enabled. The agent description and instructions should not change in this PR.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

jonwalstedt

Nice work Patryk! 🚢

Skill vs. Agent: Comparison Summary

Setup: Two agent configurations tested on the same prompts in the same Elastic Security environment. The "skill" version pre-loads SKILL.md files with ES|QL templates from a filestore before acting. The "agent" (threat hunting agent) uses Elastic Security tools directly without pre-loading.

Results by Task

| Task | Winner | Key differentiator |
| --- | --- | --- |
| Lateral movement hunt (7d) | Skill | Found 4-user PowerShell anomaly via aggregation query; agent never ran it |
| C2 beaconing | Skill | Pivoted to process-frequency analysis when network logs absent; surfaced 5 IPs, counts, paths. Agent stayed surface-level |
| Brute force login | Tie | Both hit dead end (no auth indices). Skill got there in 1 wildcard query vs. 3 targeted searches; gave credential-spraying query too |
| Rare process execution | Ambiguous | Different query filters → different results. Agent found 3 processes cheaply; skill found 1 rare + full landscape including mimikatz (11×) |
| Summarize malware alert | Split | Skill: 20-alert scope table, 5 C2 IPs, MITRE T1003, explicit "True Positive". Agent: found "Detect Malware Only" policy gap (critical config finding skill missed) |
| Write certutil ES\|QL rule | Agent slight edge | Agent covered both `-`/`/` flag prefixes, ftp://, MITRE+schedule guidance. Skill had better index targeting |
| Find lateral movement from host | Tie | Same conclusion, same IPs. Skill broader investigation + data gap disclosure; agent tighter table output |
| How many critical alerts? | Agent (marginal) | Skill loaded full SKILL.md unnecessarily; agent answered in one sentence |
| Complex flow (Alert→VT→On-call→Slack) | Skill | Used `get-related-alerts` for 20 structured alerts + Security Labs enrichment. Agent did flat summary only |

Overall Pattern

Skill wins on depth (5-6/9 tasks), especially when:

Data is absent and investigation requires pivoting (C2 beaconing is the clearest example)
Alert triage requires scope awareness across related alerts
Template queries surface patterns the agent would never formulate independently

Agent wins on efficiency — consistently 40–60% fewer tokens and faster to first token. Also occasionally surfaces unique signals: the "Detect Malware Only" policy gap, ML job recommendations, and risk engine status check are things the skill never exercised.

Core Tradeoff

| Dimension | Skill | Agent |
| --- | --- | --- |
| Avg. token cost | Higher (sometimes 4×) | Lower |
| Template queries | Yes: ready-to-run patterns, catches edge cases (credential spraying, proxy beaconing) | No |
| Alert correlation | `get-related-alerts` → full scope | Flat alert list only |
| Elastic ecosystem tools | Never uses ML jobs, risk engine, entity analytics | Uses them (even when not enabled) |
| Error recovery | Investigates failures, adapts queries | Moves on faster |
| Simple factual queries | Wastes tokens loading SKILL.md | Answers directly |
| Security Labs / product docs | Available in this env → valuable | Not installed → dead end |

Bottom Line

The skill approach is the better threat hunter — it finds more signals and handles data gaps more gracefully. The agent is cheaper and occasionally finds high-value operational findings (config gaps, ML job recommendations) that the skill ignores entirely. For the tasks in this eval, they are complementary rather than substitutes, which is exactly the architecture the PR implements: decompose by specialization and let the router decide which skill activates.

jonwalstedt · 2026-03-31T05:53:42Z

    .array(z.string())
+    .min(1)


With this change, result.success flips from true to false for empty arrays. This is correct behavior, but any existing callers passing [] will now get a schema error instead of an empty result. It might be worth checking how downstream callers handle this.

…ty fields

- Reduce attack discovery search LIMIT from 100 to 10 to prevent overwhelming LLM context with large payloads
- Add optional entity fields (hostNames, userNames, sourceIps, destIps) to get-related-alerts tool so callers can skip the GET round-trip when entities are already available from a previous tool call

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

Use Array.isArray() instead of truthiness checks and nullish coalescing for optional array fields, since @kbn/zod/v4's .optional() infers `T | {}` rather than `T | undefined`.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
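
To make the reasoning in this commit message concrete, here is a small sketch of why `??` and truthiness checks are not enough if an optional array field can arrive as `{}`. The helper name `asStringArray` is hypothetical, and the input is typed as `unknown` rather than the exact inferred type, which is not shown in the PR.

```typescript
// Hypothetical helper: normalize an optional array field to string[].
// `{}` is truthy and not nullish, so `value ?? []` and `value || []` would
// both pass it through unchanged; Array.isArray rules it out explicitly.
function asStringArray(value: unknown): string[] {
  if (!Array.isArray(value)) return [];
  return value.filter((v): v is string => typeof v === 'string');
}

// Example with the optional entity fields named in the preceding commit.
const params: { hostNames?: unknown; userNames?: unknown } = {
  hostNames: ['web-01'],
  userNames: {},
};
const hostNames = asStringArray(params.hostNames);
const userNames = asStringArray(params.userNames);
```

Here `hostNames` keeps its single entry while `userNames` becomes an empty array, where a nullish-coalescing fallback would have left the `{}` in place.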

patrykkopycinski · 2026-03-31T10:41:36Z

/ci

elasticmachine · 2026-03-31T12:48:54Z

💛 Build succeeded, but was flaky

Buildkite Build
Commit: adfbee0

Failed CI Steps

Test Failures

[job] [logs] FTR Configs #51 / task_manager migrations 8.5.0 migrates active tasks to set enabled to true

Metrics [docs]

Module Count

Fewer modules leads to a faster build time

| id | before | after | diff |
| --- | --- | --- | --- |
| `securitySolution` | 9241 | 9242 | +1 |

Async chunks

Total size of all lazy-loaded chunks that will be downloaded as the user navigates the app

| id | before | after | diff |
| --- | --- | --- | --- |
| `securitySolution` | 11.5MB | 11.5MB | +603.0B |

History

💔 Build #419380 failed e40496b
💛 Build #417613 was flaky a25fe92
💛 Build #417556 was flaky 56db000
💔 Build #417482 failed 9bb0fd7
💛 Build #417039 was flaky 250eeee
💔 Build #416910 failed 71c79d8

denar50

Code review only. LGTM for files owned by the Detection Engine.

jaredburgettelastic

Entity Analytics changes LGTM 👍

Thank you!

patrykkopycinski · 2026-04-01T19:43:48Z

/ci

coderabbitai · 2026-04-02T13:39:53Z

Caution

Review failed

An error occurred during the review process. Please try again later.

Changes from node scripts/eslint_all_files --no-cache --fix

083e04e

patrykkopycinski mentioned this pull request Mar 3, 2026

[Observability] Migrate Observability Agent to modular Agent Builder skills #255706

Closed

8 tasks

stephmilovic reviewed Mar 5, 2026

patrykkopycinski added 3 commits March 24, 2026 11:47

Fix MaybePromise type error in alert_analysis inline tool tests

1a64405

Await getInlineTools() since it returns MaybePromise<SkillBoundedTool[]>, which cannot be indexed directly without resolving the promise first.

Merge remote-tracking branch 'upstream/main' into threat-hunting-skill

71c79d8

patrykkopycinski mentioned this pull request Mar 26, 2026

[Security Solution] Add Scout API integration tests for Agent Builder security skills #259755

Open

9 tasks

kibanamachine and others added 2 commits March 26, 2026 21:48

Changes from node scripts/eslint_all_files --no-cache --fix

d52fca4

Remove manage_rule_exceptions from allow_lists.ts

56db000

Missed this reference when extracting the manage_rule_exceptions tool to the detection-engineering follow-up PR.

Add threat-hunting to AGENT_BUILDER_BUILTIN_SKILLS allow list

e5b1021

Without this entry, skill registration throws at server startup because isAllowedBuiltinSkill rejects unrecognized skill IDs.

patrykkopycinski requested review from a team as code owners March 27, 2026 05:38

patrykkopycinski added backport:skip This PR does not require backporting release_note:skip Skip the PR/issue when compiling release notes labels Mar 27, 2026

pgayvallet approved these changes Mar 27, 2026

patrykkopycinski requested review from a team as code owners March 30, 2026 09:54

patrykkopycinski requested review from nkhristinin and tiansivive March 30, 2026 09:54

patrykkopycinski and others added 2 commits March 30, 2026 16:33

Merge remote-tracking branch 'upstream/main' into threat-hunting-skill

ebdfb35

# Conflicts:
#   x-pack/solutions/security/plugins/security_solution/public/app/app.tsx
#   x-pack/solutions/security/plugins/security_solution/server/agent_builder/skills/register_skills.ts

jonwalstedt approved these changes Mar 31, 2026

patrykkopycinski and others added 2 commits March 31, 2026 08:56

Fix type errors with zod v4 optional array inference

adfbee0

Use Array.isArray() instead of truthiness checks and nullish coalescing for optional array fields, since @kbn/zod/v4's .optional() infers `T | {}` rather than `T | undefined`. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

denar50 approved these changes Apr 1, 2026

Comment thread x-pack/solutions/security/plugins/security_solution/public/app/app.tsx

jaredburgettelastic approved these changes Apr 1, 2026

Merge branch 'main' into threat-hunting-skill

a1ab76b

patrykkopycinski enabled auto-merge (squash) April 1, 2026 15:54

patrykkopycinski added ci:build-next-docs v9.4.0 labels Apr 2, 2026

Merge branch 'main' into threat-hunting-skill

a257a4d

patrykkopycinski merged commit ca51db3 into elastic:main Apr 2, 2026
19 checks passed

vitaliidm mentioned this pull request Apr 10, 2026

[DE][9.4 & Serverless] Attach Security Detection rule as AI Agent context elastic/docs-content#5540

Merged

2 tasks

