
[Security Solution] Migrate Threat Hunting Agent to modular Agent Builder skills #255697

Merged
patrykkopycinski merged 22 commits into
elastic:main from
patrykkopycinski:threat-hunting-skill
Apr 2, 2026

@patrykkopycinski patrykkopycinski commented Mar 3, 2026

Summary

Decomposes the monolithic security.agent (Threat Hunting Agent) into two focused, modular skills for the Elastic AI Agent. This improves discoverability, reduces token overhead via on-demand loading, and enables independent evolution of each security capability.

New Skills

| Skill | Purpose | Registry Tools | Inline Tools |
| --- | --- | --- | --- |
| threat-hunting | Hypothesis-driven hunting, IOC search, anomaly detection, case creation | 6 (generateEsql, executeEsql, search, listIndices, getIndexMapping, cases) | None |
| alert-analysis | Alert triage, entity correlation, disposition workflow | 3 (security.alerts, security_labs_search, entity_risk_score) | 1 (get-related-alerts) |

Key Changes

  • Created 2 SkillDefinition registrations with curated tool sets and rich skill content
  • Added inline get-related-alerts tool using space-scoped detection engine alert search for entity-based alert correlation (with terms queries for multi-valued ECS fields)
  • Enriched threat hunting agent instructions to skill-quality content
  • Added platformCoreTools.cases to threat-hunting skill so confirmed findings can be escalated into investigation cases
  • Added cross-skill reference: alert-analysis references entity-analytics for deeper entity profiling and asset criticality
  • Added comprehensive unit tests for all skills, tools, and the inline handler
  • Added eval suite with ExpectedSkillInvocation evaluator for dynamic skill activation validation
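Entity-based correlation with terms queries can be pictured as an Elasticsearch bool query that has one `terms` clause per entity type: `terms` matches a document when any value of a (possibly multi-valued) field equals any supplied value. The sketch below is hypothetical; the function name, argument shape, and ECS field mapping are assumptions, not the tool's actual code.

```typescript
// Hypothetical entity shape; the real tool's argument names may differ.
interface RelatedAlertEntities {
  hostNames: string[];
  userNames: string[];
  sourceIps: string[];
  destIps: string[];
}

// Assumed mapping from entity type to ECS field.
const ENTITY_FIELDS: Record<keyof RelatedAlertEntities, string> = {
  hostNames: 'host.name',
  userNames: 'user.name',
  sourceIps: 'source.ip',
  destIps: 'destination.ip',
};

// Matches alerts that share at least one entity value with the source alert,
// excluding the source alert itself. Callers should skip the search entirely
// when no entity values are present, because an empty `should` list with
// minimum_should_match: 1 matches nothing.
function buildRelatedAlertsQuery(sourceAlertId: string, entities: RelatedAlertEntities) {
  const should = (Object.keys(ENTITY_FIELDS) as Array<keyof RelatedAlertEntities>)
    .filter((key) => entities[key].length > 0)
    .map((key) => ({ terms: { [ENTITY_FIELDS[key]]: entities[key] } }));

  return {
    bool: {
      should,
      minimum_should_match: 1,
      must_not: [{ ids: { values: [sourceAlertId] } }],
    },
  };
}
```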

Security Hardening

  • ES|QL injection prevention: alertIds in attack_discovery_search_tool are validated against SAFE_ID_PATTERN (/^[a-zA-Z0-9_\-.:]+$/) before interpolation into ES|QL queries. Invalid IDs are silently filtered out; if every ID is invalid, the tool returns an error result.
  • spaceId validation: Validated before interpolation into ES|QL index patterns (defense-in-depth).
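The validation order described above can be sketched as follows. `SAFE_ID_PATTERN` is quoted from this description; the space ID pattern, the function name, and the index pattern and `alert_id` field in the generated query are placeholders, not the tool's real ES|QL.

```typescript
// Pattern quoted in the PR description.
const SAFE_ID_PATTERN = /^[a-zA-Z0-9_\-.:]+$/;
// Assumption: Kibana space IDs use lowercase letters, digits, '_' and '-'.
const SAFE_SPACE_ID_PATTERN = /^[a-z0-9_-]+$/;

type QueryResult = { ok: true; query: string } | { ok: false; error: string };

// Hypothetical builder; the index pattern and field name are placeholders.
function buildAttackDiscoveryQuery(alertIds: string[], spaceId: string): QueryResult {
  // Defense-in-depth: reject the space ID before it reaches the index pattern.
  if (!SAFE_SPACE_ID_PATTERN.test(spaceId)) {
    return { ok: false, error: 'Invalid spaceId' };
  }
  // Silently drop unsafe IDs (quotes, spaces, semicolons, ...).
  const safeIds = alertIds.filter((id) => SAFE_ID_PATTERN.test(id));
  if (safeIds.length === 0) {
    return { ok: false, error: 'No valid alert IDs provided' };
  }
  const idList = safeIds.map((id) => `"${id}"`).join(', ');
  return {
    ok: true,
    query: `FROM example-attack-discovery-${spaceId} | WHERE alert_id IN (${idList}) | LIMIT 10`,
  };
}
```

Filtering rather than rejecting the whole request keeps one malformed ID from failing an otherwise valid lookup, while the all-invalid case still surfaces an explicit error.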

Architecture

  • Skills use getRegistryTools() to reference shared platform/security tools
  • Inline tools only access ToolHandlerContext (esClient, spaceId, request) — no plugin service dependencies
  • Threat hunting agent conditionally hides agentId when skills are enabled (agent availability gated on AGENT_BUILDER_EXPERIMENTAL_FEATURES_SETTING_ID)
  • Softened the exception/tuning language in the agent description and instructions to match actual tool capabilities
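The split between registry tools and inline tools can be illustrated with simplified, hypothetical stand-ins for the Agent Builder types (the real `ToolHandlerContext` also carries `request`, and the real skill API differs). The point shown is that the inline handler derives everything, including the space-scoped alerts index, from its context argument rather than from captured plugin services.

```typescript
// Simplified, hypothetical stand-ins for the Agent Builder types.
interface ToolHandlerContext {
  spaceId: string;
  esClient: {
    search: (params: { index: string; query: unknown }) => Promise<{ hits: unknown[] }>;
  };
}

interface InlineTool {
  id: string;
  handler: (args: { query: unknown }, ctx: ToolHandlerContext) => Promise<{ hits: unknown[] }>;
}

interface SkillDefinition {
  id: string;
  getRegistryTools: () => string[]; // IDs of shared registry tools
  inlineTools: InlineTool[];
}

// The inline handler only uses ctx; no plugin services are closed over.
const getRelatedAlerts: InlineTool = {
  id: 'get-related-alerts',
  handler: async (args, ctx) =>
    ctx.esClient.search({ index: `.alerts-security.alerts-${ctx.spaceId}`, query: args.query }),
};

const alertAnalysisSkill: SkillDefinition = {
  id: 'alert-analysis',
  // Tool names as listed in the skill table above; real registry IDs may be prefixed.
  getRegistryTools: () => ['security.alerts', 'security_labs_search', 'entity_risk_score'],
  inlineTools: [getRelatedAlerts],
};
```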

Eval Suite

The eval suite includes a metadata-driven ExpectedSkillInvocation evaluator that dynamically validates skill activation based on expectedSkill and shouldNotActivateSkill metadata on each example. It queries OTEL traces for filestore.read tool calls matching the skill's SKILL.md path, producing pass/fail scores per example. This covers:

  • Threat hunting queries → threat-hunting skill activation
  • Alert investigation queries → alert-analysis skill activation
  • Negative cases ensuring non-security queries do not activate security skills
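The scoring rule of the ExpectedSkillInvocation evaluator can be sketched as a pure function over tool-call spans. This is a hypothetical simplification: the flattened span shape and the assumption that a skill counts as activated when `filestore.read` loads a path ending in `/<skill>/SKILL.md` are illustrative, not the evaluator's real trace query.

```typescript
// Hypothetical, flattened view of an OTEL span for one tool call.
interface ToolCallSpan {
  toolId: string;
  path?: string;
}

// Metadata fields named in the PR description.
interface ExampleMetadata {
  expectedSkill?: string;
  shouldNotActivateSkill?: string;
}

// Assumption: activation means filestore.read loaded `.../<skill>/SKILL.md`.
function skillWasRead(spans: ToolCallSpan[], skill: string): boolean {
  return spans.some(
    (s) => s.toolId === 'filestore.read' && (s.path ?? '').endsWith(`/${skill}/SKILL.md`)
  );
}

// Returns 1 (pass) or 0 (fail) for a single eval example.
function scoreSkillInvocation(spans: ToolCallSpan[], meta: ExampleMetadata): number {
  if (meta.expectedSkill && !skillWasRead(spans, meta.expectedSkill)) return 0;
  if (meta.shouldNotActivateSkill && skillWasRead(spans, meta.shouldNotActivateSkill)) return 0;
  return 1;
}
```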

Run locally: node scripts/evals start --suite agent-builder
Trigger in CI: add evals:agent-builder label to PR

Test Coverage

| Category | Count | Details |
| --- | --- | --- |
| Skill definition tests | 20 | Metadata, tool counts, cases tool presence, content assertions, cross-skill references, referenced content |
| attack_discovery_search tool tests | 12 | Schema, ES\|QL injection guards, spaceId validation |
| alert_analysis inline tool tests | 11 | Entity extraction (single/multi-type/multi-valued), space-scoped index, error paths |
| use_agent_builder_attachment hook tests | 7 | Conditional agentId, skills-enabled behavior, attachment callbacks |
| Eval suite examples | 12 | Skill activation for threat hunting, alert analysis queries; cross-skill routing; distractor queries |
| Total | 62 | Across 5 test files |

Test Plan

Automated (CI)

  • Unit tests pass for both skills — metadata, tool counts, content/description lengths, cross-skill references, cases tool, referenced content (20 tests)
  • Unit tests pass for attack_discovery_search tool — injection guards, spaceId validation (12 tests)
  • Unit tests pass for alert_analysis inline tool handler — entity extraction, correlation query, error paths (11 tests)
  • Unit tests pass for use_agent_builder_attachment hook — conditional agentId when skills enabled (7 tests)
  • Eval suite examples cover skill activation routing for both skills plus distractor queries (12 examples)
  • ExpectedSkillInvocation evaluator validates expectedSkill / shouldNotActivateSkill metadata against OTEL traces
  • ESLint passes on all changed files
  • TypeScript type check passes for security_solution and kbn-evals-suite-agent-builder projects
  • node scripts/check_changes.ts passes
  • Buildkite CI passes

Manual verification

  • Skills appear in Agent Builder skill listing when AGENT_BUILDER_EXPERIMENTAL_FEATURES_SETTING_ID is enabled
  • Threat hunting queries activate the threat-hunting skill (e.g. "Hunt for lateral movement in my environment")
  • Alert investigation queries activate the alert-analysis skill (e.g. "Help me triage this critical LSASS alert")
  • threat-hunting skill can create cases via platform.core.cases when escalating confirmed findings
  • alert-analysis skill's get-related-alerts inline tool returns related alerts using space-scoped .alerts-security.alerts-<spaceId> index
  • alert-analysis skill references entity-analytics for deeper entity profiling (visible in skill content)
  • Threat hunting agent hides agentId when skills are enabled (attachment hook respects feature flag)
  • Non-security queries (e.g. "Show me Kibana dashboards") do not activate security skills

Follow-up PRs

Closes https://github.com/elastic/security-team/issues/15697

…lder skills

Decomposes the monolithic security.agent (Threat Hunting Agent) into three
focused skills for the Elastic AI Agent, improving modularity and enabling
on-demand capability loading:

- **threat-hunting**: Hypothesis-driven hunting with ES|QL, IOC search,
  and statistical anomaly detection across security indices
- **alert-analysis**: Alert triage, entity correlation, threat intelligence
  enrichment, and disposition workflow
- **detection-engineering**: Rule authoring (KQL/EQL/threshold), coverage
  gap analysis, and exception management

Key changes:
- Create three SkillDefinition registrations with curated tool sets
- Add `security.manage_rule_exceptions` registry tool for programmatic
  exception list management
- Add inline `get-related-alerts` tool using space-scoped detection engine
  alert search for entity-based alert correlation
- Enrich threat hunting agent instructions to skill-quality content
- Add comprehensive unit tests for skills and tools
- Add eval suite for skill activation and tool selection validation

Closes elastic/security-team#15697
@patrykkopycinski

/ci

@patrykkopycinski

/ci

@patrykkopycinski

/ci

Contributor

Suggested change

    if ((await uiSettings.get(AGENT_BUILDER_EXPERIMENTAL_FEATURES_SETTING_ID)) === true) {
      return {
        status: 'unavailable',
        reason:
          'Skills are enabled, which takes precedence over threat hunting agent availability',
      };
    }
    return getAgentBuilderResourceAvailability({ core, request, logger });

If skills are enabled, the threat hunting agent should not be available.
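The suggested gate can be exercised in isolation with a minimal, hypothetical harness. The `UiSettingsLike` interface, the `'available'` status, the setting's string value, and the injected `fallback` (standing in for `getAgentBuilderResourceAvailability`) are all assumptions for illustration.

```typescript
// Hypothetical shapes; only the setting constant name and the 'unavailable'
// status come from the suggested change.
type Availability = { status: 'available' } | { status: 'unavailable'; reason: string };

interface UiSettingsLike {
  get: (key: string) => Promise<unknown>;
}

// Assumption: placeholder value; the real constant's value may differ.
const AGENT_BUILDER_EXPERIMENTAL_FEATURES_SETTING_ID = 'example:agentBuilderExperimentalFeatures';

async function getThreatHuntingAgentAvailability(
  uiSettings: UiSettingsLike,
  fallback: () => Promise<Availability>
): Promise<Availability> {
  // Strict `=== true`, so an unset or non-boolean value never hides the agent.
  if ((await uiSettings.get(AGENT_BUILDER_EXPERIMENTAL_FEATURES_SETTING_ID)) === true) {
    return {
      status: 'unavailable',
      reason: 'Skills are enabled, which takes precedence over threat hunting agent availability',
    };
  }
  return fallback();
}
```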

Contributor

We also invoke attachments with the threat hunting agent, so we should add a condition to use_agent_builder_attachment.ts: if skills are enabled, do not pass an agentId argument.

# Conflicts:
#	x-pack/platform/packages/shared/agent-builder/agent-builder-server/allow_lists.ts
#	x-pack/solutions/security/plugins/security_solution/server/agent_builder/skills/register_skills.ts
#	x-pack/solutions/security/plugins/security_solution/server/agent_builder/tools/register_tools.ts
#	x-pack/solutions/security/plugins/security_solution/server/plugin.ts
- Make threat hunting agent unavailable when skills are enabled via
  AGENT_BUILDER_EXPERIMENTAL_FEATURES_SETTING_ID check
- Conditionally pass agentId in use_agent_builder_attachment only when
  skills are disabled
- Fix Zod v3→v4 migration for manage_rule_exceptions_tool schema
- Fix StaticToolRegistration return type and ExceptionRulesClient typing
- Add 3 new test cases: lists-unavailable, createExceptionListItem error,
  rulesClient.update payload verification
- Align threat hunting agent description with actual capabilities
- Add entity_risk_score dedicated eval example, fix grammar in eval spec
…d inline tool tests

- Sanitize alertIds against injection patterns before interpolating into ES|QL queries
  in attack_discovery_search_tool (reject IDs with quotes, semicolons, spaces, etc.)
- Validate spaceId format before use in ES|QL index patterns (defense-in-depth)
- Add comprehensive handler-level tests for alert_analysis get-related-alerts inline tool
  covering entity extraction, multi-valued fields, space scoping, and error paths
- Add KQL injection guard for ruleId in manage_rule_exceptions_tool
- Move lists plugin availability check early in manage_rule_exceptions_tool handler
- Add transactional gap protection: log orphaned exception lists on rule update failure
- Improve threat_hunting_agent description and soften exception/tuning language
- Clarify skill content: reference templates as embedded content, not filesystem paths
- Ensure consistent await on skill registration calls
- Add warn-once feedback in useAgentBuilderAttachment when agentBuilder unavailable
@patrykkopycinski

/ci

Await getInlineTools() since it returns MaybePromise<SkillBoundedTool[]>,
which cannot be indexed directly without resolving the promise first.
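The fix described in this commit can be sketched with local stand-in types (the real `MaybePromise` and `SkillBoundedTool` definitions live in Agent Builder and may differ). Indexing `skill.getInlineTools()[0]` does not type-check because a `Promise` has no numeric index; awaiting first works for both branches, since awaiting a plain array yields the array itself.

```typescript
// Local stand-in for a MaybePromise helper type.
type MaybePromise<T> = T | Promise<T>;

// Hypothetical, minimal tool shape.
interface SkillBoundedTool {
  id: string;
}

interface SkillWithInlineTools {
  getInlineTools: () => MaybePromise<SkillBoundedTool[]>;
}

// Await before indexing so sync and async implementations are handled alike.
async function getFirstInlineTool(skill: SkillWithInlineTools): Promise<SkillBoundedTool | undefined> {
  const tools = await skill.getInlineTools();
  return tools[0];
}
```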
@patrykkopycinski

/ci

@patrykkopycinski

Both items from @stephmilovic's review addressed:

  1. Agent availability gating: When AGENT_BUILDER_EXPERIMENTAL_FEATURES_SETTING_ID is true, the agent returns { status: 'unavailable' } — see threat_hunting_agent.ts line 93.

  2. Conditional agentId in attachments: use_agent_builder_attachment.ts reads the setting and conditionally omits agentId: ...(skillsEnabled ? {} : { agentId: THREAT_HUNTING_AGENT_ID }) — see line 79.
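The conditional-spread pattern from item 2 can be shown in a self-contained helper. The helper name and `AttachmentParams` shape are hypothetical, and the `'security.agent'` value is assumed from this PR's references to `security.agent`. Spreading `{}` omits the key entirely instead of setting `agentId: undefined`, which matters for `in` checks and for callees that apply defaults only when a key is absent.

```typescript
// Assumed value, based on the PR's references to security.agent.
const THREAT_HUNTING_AGENT_ID = 'security.agent';

// Hypothetical parameter shape for an attachment invocation.
interface AttachmentParams {
  attachmentType: string;
  agentId?: string;
}

// Omit agentId entirely (not just undefined) when skills are enabled.
function buildAttachmentParams(skillsEnabled: boolean, attachmentType: string): AttachmentParams {
  return {
    attachmentType,
    ...(skillsEnabled ? {} : { agentId: THREAT_HUNTING_AGENT_ID }),
  };
}
```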

Additionally, the sample alertAnalysisSampleSkill (skills/alert_analysis_skill.ts) has been deleted since the registered alertAnalysisSkill (skills/alert_analysis/) supersedes it.

…eferences

- Add ExpectedSkillInvocation evaluator to dynamically validate expectedSkill
  and shouldNotActivateSkill metadata from eval examples against traces
- Restore platformCoreTools.cases to threat-hunting skill (lost during
  agent-to-skill migration) so hunts can escalate into cases
- Add cross-skill reference to entity-analytics in alert-analysis skill
  content for deeper entity profiling and asset criticality review
@patrykkopycinski

/ci

@patrykkopycinski

/ci

- Fix threat-hunting tool count assertion (5 -> 6) after adding cases tool
- Add assertion that platformCoreTools.cases is in getRegistryTools()
- Add content assertions for cross-skill references:
  threat-hunting -> alert-analysis, detection-engineering
  alert-analysis -> entity-analytics, detection-engineering
  detection-engineering -> threat-hunting, alert-analysis
@patrykkopycinski

/ci

The detection-engineering skill is net-new functionality, not a
decomposition of the existing threat hunting agent. Moving it to a
dedicated follow-up PR keeps this PR focused on the actual agent
decomposition (threat-hunting + alert-analysis).

Changes:
- Remove detection_engineering/ skill directory
- Remove registration and barrel export
- Remove detection-engineering tests and eval examples
- Remove cross-skill references to detection-engineering from
  threat-hunting and alert-analysis content
- Update distractor eval examples
@patrykkopycinski

/ci

kibanamachine and others added 2 commits March 26, 2026 21:48
The manage_rule_exceptions tool was only used by the detection-engineering
skill, which was extracted to a separate follow-up PR. Removing the orphaned
tool and reverting the registerTools signature change (setupPlugins param)
keeps this PR focused on the threat-hunting agent decomposition.

Changes:
- Delete manage_rule_exceptions_tool.ts and its test (738 lines)
- Remove from tools/index.ts barrel export
- Revert registerTools to 4-param signature (remove setupPlugins)
- Revert plugin.ts call site to not pass lists plugin
@patrykkopycinski

/ci

Missed this reference when extracting the manage_rule_exceptions tool
to the detection-engineering follow-up PR.
@patrykkopycinski

/ci

Without this entry, skill registration throws at server startup because
isAllowedBuiltinSkill rejects unrecognized skill IDs.
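Why a missing allow-list entry breaks startup can be illustrated with a hypothetical fail-fast registration. The set contents, the `registerBuiltinSkill` helper, and the error message are assumptions; only the behavior (unrecognized skill IDs are rejected) comes from the commit message.

```typescript
// Hypothetical allow list of built-in skill IDs.
const ALLOWED_BUILTIN_SKILLS = new Set<string>(['threat-hunting', 'alert-analysis']);

function isAllowedBuiltinSkill(skillId: string): boolean {
  return ALLOWED_BUILTIN_SKILLS.has(skillId);
}

// Registration throws on unknown IDs, so a skill missing from the list
// fails at server startup rather than silently disappearing.
function registerBuiltinSkill(registry: Map<string, unknown>, skillId: string, definition: unknown): void {
  if (!isAllowedBuiltinSkill(skillId)) {
    throw new Error(`Skill "${skillId}" is not in the built-in skill allow list`);
  }
  registry.set(skillId, definition);
}
```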
@patrykkopycinski

/ci

@patrykkopycinski patrykkopycinski requested review from a team as code owners March 27, 2026 05:38
@patrykkopycinski patrykkopycinski added the backport:skip and release_note:skip labels Mar 27, 2026
Extract useSecurityAgentId hook to centralize the skills-based agent
selection logic. When skills are enabled, Security now falls back to
the default elastic-ai-agent instead of the unavailable security.agent,
preventing the "Agent has been deleted" error in the UI.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
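The selection rule that the useSecurityAgentId hook centralizes can be reduced to a pure function; a React hook would read the setting and memoize this result. Both agent ID strings are taken from the commit message and are assumptions about the real constants; the function name is hypothetical.

```typescript
// Assumed IDs, as named in the commit message.
const DEFAULT_AGENT_ID = 'elastic-ai-agent';
const SECURITY_AGENT_ID = 'security.agent';

// Pure core of a hypothetical useSecurityAgentId hook: never return the
// security agent while skills make it unavailable.
function selectSecurityAgentId(skillsEnabled: boolean): string {
  return skillsEnabled ? DEFAULT_AGENT_ID : SECURITY_AGENT_ID;
}
```

Keeping the rule in one place means every Security entry point falls back consistently instead of each call site re-checking the setting.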
patrykkopycinski and others added 2 commits March 30, 2026 16:33
# Conflicts:
#	x-pack/solutions/security/plugins/security_solution/public/app/app.tsx
#	x-pack/solutions/security/plugins/security_solution/server/agent_builder/skills/register_skills.ts
Only keep the availability check that makes the agent unavailable
when skills are enabled. The agent description and instructions
should not change in this PR.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
@jonwalstedt jonwalstedt left a comment

Nice work Patryk! 🚢

Skill vs. Agent: Comparison Summary

Setup: Two agent configurations tested on the same prompts in the same Elastic Security environment. The "skill" version pre-loads SKILL.md files with ES|QL templates from a filestore before acting. The "agent" (threat hunting agent) uses Elastic Security tools directly without pre-loading.


Results by Task

| Task | Winner | Key differentiator |
| --- | --- | --- |
| Lateral movement hunt (7d) | Skill | Found 4-user PowerShell anomaly via aggregation query; agent never ran it |
| C2 beaconing | Skill | Pivoted to process-frequency analysis when network logs were absent; surfaced 5 IPs, counts, paths. Agent stayed surface-level |
| Brute force login | Tie | Both hit a dead end (no auth indices). Skill got there in 1 wildcard query vs. 3 targeted searches; gave a credential-spraying query too |
| Rare process execution | Ambiguous | Different query filters → different results. Agent found 3 processes cheaply; skill found 1 rare process + full landscape including mimikatz (11×) |
| Summarize malware alert | Split | Skill: 20-alert scope table, 5 C2 IPs, MITRE T1003, explicit "True Positive". Agent: found "Detect Malware Only" policy gap (critical config finding the skill missed) |
| Write certutil ES\|QL rule | Agent (slight edge) | Agent covered both the - and / flag prefixes, ftp://, and MITRE + schedule guidance. Skill had better index targeting |
| Find lateral movement from host | Tie | Same conclusion, same IPs. Skill: broader investigation + data gap disclosure; agent: tighter table output |
| How many critical alerts? | Agent (marginal) | Skill loaded full SKILL.md unnecessarily; agent answered in one sentence |
| Complex flow (Alert→VT→On-call→Slack) | Skill | Used get-related-alerts for 20 structured alerts + Security Labs enrichment. Agent did flat summary only |

Overall Pattern

Skill wins on depth (5-6/9 tasks), especially when:

  • Data is absent and investigation requires pivoting (C2 beaconing is the clearest example)
  • Alert triage requires scope awareness across related alerts
  • Template queries surface patterns the agent would never formulate independently

Agent wins on efficiency: it consistently uses 40–60% fewer tokens and is faster to first token. It also occasionally surfaces unique signals that the skill never exercised: the "Detect Malware Only" policy gap, ML job recommendations, and a risk engine status check.


Core Tradeoff

| Dimension | Skill | Agent |
| --- | --- | --- |
| Avg. token cost | Higher (sometimes 4×) | Lower |
| Template queries | Yes: ready-to-run patterns, catches edge cases (credential spraying, proxy beaconing) | No |
| Alert correlation | get-related-alerts → full scope | Flat alert list only |
| Elastic ecosystem tools | Never uses ML jobs, risk engine, entity analytics | Uses them (even when not enabled) |
| Error recovery | Investigates failures, adapts queries | Moves on faster |
| Simple factual queries | Wastes tokens loading SKILL.md | Answers directly |
| Security Labs / product docs | Available in this env → valuable | Not installed → dead end |

Bottom Line

The skill approach is the better threat hunter — it finds more signals and handles data gaps more gracefully. The agent is cheaper and occasionally finds high-value operational findings (config gaps, ML job recommendations) that the skill ignores entirely. For the tasks in this eval, they are complementary rather than substitutes, which is exactly the architecture the PR implements: decompose by specialization and let the router decide which skill activates.

Comment on lines 19 to +20
.array(z.string())
.min(1)
Contributor

With this change, result.success flips from true to false for empty arrays. This is correct behavior, but any existing callers passing [] will now get a schema error instead of an empty result. It might be worth checking how downstream callers handle this.
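One way a downstream caller could preserve the old "empty in, empty out" behavior is to short-circuit before validation. The sketch below uses a hand-rolled stand-in for `z.array(z.string()).min(1)` to stay dependency-free; `parseAlertIds` and `searchAttackDiscoveries` are hypothetical names, not functions from the PR.

```typescript
// Dependency-free stand-in for z.array(z.string()).min(1).
type ParseResult = { success: true; data: string[] } | { success: false; error: string };

function parseAlertIds(input: unknown): ParseResult {
  if (!Array.isArray(input) || !input.every((v) => typeof v === 'string')) {
    return { success: false, error: 'alertIds must be an array of strings' };
  }
  if (input.length < 1) {
    return { success: false, error: 'alertIds must contain at least 1 element' };
  }
  return { success: true, data: input };
}

// Hypothetical caller: treat [] as "nothing to search" instead of surfacing
// the new schema error; other invalid input still reports an error.
function searchAttackDiscoveries(alertIds: string[]): { searchedIds: string[]; error?: string } {
  if (alertIds.length === 0) return { searchedIds: [] };
  const parsed = parseAlertIds(alertIds);
  if (!parsed.success) return { searchedIds: [], error: parsed.error };
  return { searchedIds: parsed.data };
}
```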

patrykkopycinski and others added 2 commits March 31, 2026 08:56
…ty fields

- Reduce attack discovery search LIMIT from 100 to 10 to prevent
  overwhelming LLM context with large payloads
- Add optional entity fields (hostNames, userNames, sourceIps, destIps)
  to get-related-alerts tool so callers can skip the GET round-trip
  when entities are already available from a previous tool call

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Use Array.isArray() instead of truthiness checks and nullish
coalescing for optional array fields, since @kbn/zod/v4's
.optional() infers `T | {}` rather than `T | undefined`.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
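The two commits above combine into one pattern: read optional entity arrays defensively with `Array.isArray()`, and skip the alert GET only when the caller already supplied entities. Argument and entity shapes here are hypothetical, and the optional fields are typed `unknown` to model the commit's point that their inferred type is not simply `T | undefined`.

```typescript
// Hypothetical entity and argument shapes for get-related-alerts.
interface Entities {
  hostNames: string[];
  userNames: string[];
  sourceIps: string[];
  destIps: string[];
}

interface RelatedAlertsArgs {
  alertId: string;
  hostNames?: unknown;
  userNames?: unknown;
  sourceIps?: unknown;
  destIps?: unknown;
}

// Array.isArray narrows to an array; `value ?? []` would not, because a
// non-array object such as {} is not nullish and would pass through.
function toStringArray(value: unknown): string[] {
  return Array.isArray(value) ? value.filter((v): v is string => typeof v === 'string') : [];
}

// Use caller-supplied entities when any are present; otherwise fetch the alert.
function resolveEntities(
  args: RelatedAlertsArgs,
  fetchAlertEntities: (alertId: string) => Promise<Entities>
): Promise<Entities> {
  const provided: Entities = {
    hostNames: toStringArray(args.hostNames),
    userNames: toStringArray(args.userNames),
    sourceIps: toStringArray(args.sourceIps),
    destIps: toStringArray(args.destIps),
  };
  const hasAny = Object.values(provided).some((values) => values.length > 0);
  return hasAny ? Promise.resolve(provided) : fetchAlertEntities(args.alertId);
}
```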
@patrykkopycinski

/ci

@elasticmachine

💛 Build succeeded, but was flaky

Failed CI Steps

Test Failures

  • FTR Configs #51 / task_manager migrations 8.5.0 migrates active tasks to set enabled to true

Metrics

Module Count

Fewer modules leads to a faster build time

| id | before | after | diff |
| --- | --- | --- | --- |
| securitySolution | 9241 | 9242 | +1 |

Async chunks

Total size of all lazy-loaded chunks that will be downloaded as the user navigates the app

| id | before | after | diff |
| --- | --- | --- | --- |
| securitySolution | 11.5MB | 11.5MB | +603.0B |


@denar50 denar50 left a comment

Code review only. LGTM for files owned by the Detection Engine.

@jaredburgettelastic jaredburgettelastic left a comment

Entity Analytics changes LGTM 👍

Thank you!

@patrykkopycinski patrykkopycinski enabled auto-merge (squash) April 1, 2026 15:54
@patrykkopycinski

/ci

coderabbitai Bot commented Apr 2, 2026

Caution

Review failed

An error occurred during the review process. Please try again later.


@patrykkopycinski patrykkopycinski merged commit ca51db3 into elastic:main Apr 2, 2026
19 checks passed

Labels

  • backport:skip: This PR does not require backporting
  • ci:build-next-docs
  • release_note:skip: Skip the PR/issue when compiling release notes
  • v9.4.0


8 participants