[Agent Builder] Add skill developer experience CLI by patrykkopycinski · Pull Request #255890 · elastic/kibana

patrykkopycinski · 2026-03-03T23:31:45Z

Summary

Introduces node scripts/agent_builder_skill — a CLI tool that provides the framework for Agent Builder skill creation, validation, evaluation, and distribution.

Commands

| Command | Purpose |
| --- | --- |
| `generate` | Scaffold skill definitions (`defineSkillType`), tool files (Zod schema + handler), and Jest tests from templates |
| `validate` | Check naming conventions, schema constraints, content quality, base path validity, TODO detection, and test coverage |
| `eval:generate` | Create 50-100 agent task datasets with pass/fail rubrics from skill metadata |
| `eval:run` | Execute evaluations across LLMs with phase gate metrics (80%/60%/85% thresholds), delegates to `@kbn/evals` |
| `export` | Extract a Kibana skill to universal format (`SKILL.md` + `skill.extensions.yaml`) for distribution |
| `import` | Sync universal format changes back into Kibana source, preserving hand-authored handler implementations |
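
To make one of the `validate` checks concrete, here is a minimal sketch of TODO detection over a file's text. The function name `findTodos` and the `TodoFinding` shape are hypothetical and not taken from the PR; the actual command also covers naming, schema, base path, and test-coverage checks that are not shown here.

```typescript
// Hypothetical shape for a single finding; not the CLI's actual output type.
interface TodoFinding {
  line: number; // 1-based line number
  text: string; // trimmed content of the offending line
}

// Scan source text line by line and report every line containing a TODO marker.
function findTodos(source: string): TodoFinding[] {
  const findings: TodoFinding[] = [];
  source.split(/\r?\n/).forEach((raw, index) => {
    if (/\bTODO\b/.test(raw)) {
      findings.push({ line: index + 1, text: raw.trim() });
    }
  });
  return findings;
}
```

A scaffolded handler that still contains `// TODO: fill in handler` would be reported with its line number, which is the kind of leftover the generated templates are likely to contain.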

Architecture

- New package: `@kbn/agent-builder-skill-cli` at `x-pack/platform/packages/shared/kbn-agent-builder-skill-cli/`
- Entry point: `scripts/agent_builder_skill.js` (follows standard Kibana script conventions)
- Uses `RunWithCommands` from `@kbn/dev-cli-runner` (same pattern as the `@kbn/evals` CLI)
- Domain-aware: supports security, observability, platform, and search skill domains
- Template-based generation produces code that follows existing Agent Builder patterns (`defineSkillType`, Zod schemas, `BuiltinToolDefinition`)
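
The domain-aware behavior can be illustrated with a TypeScript assertion function. This is a sketch under assumptions: the PR names a `validateDomain` assertion and a `SkillDomain` type in a later commit, but their implementations are not shown, so the bodies below are illustrative only. `describeDomain` is a hypothetical caller.

```typescript
// The four domains named in the PR description.
const SKILL_DOMAINS = ['security', 'observability', 'platform', 'search'] as const;
type SkillDomain = (typeof SKILL_DOMAINS)[number];

// Assertion function: once it returns, TypeScript narrows `value` to SkillDomain,
// so callers do not need an `as SkillDomain` cast.
function validateDomain(value: string): asserts value is SkillDomain {
  if (!(SKILL_DOMAINS as readonly string[]).includes(value)) {
    throw new Error(
      `Unknown domain "${value}". Expected one of: ${SKILL_DOMAINS.join(', ')}`
    );
  }
}

// Hypothetical caller showing the narrowing in action.
function describeDomain(input: string): string {
  validateDomain(input);
  const domain: SkillDomain = input; // compiles without a cast
  return `skill domain: ${domain}`;
}
```

Deriving the type from the constant array keeps the runtime check and the compile-time union in sync when a domain is added.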

Mapping to Program Phase Gates

| Phase Gate | CLI Command | Threshold |
| --- | --- | --- |
| Phase 1: ES skills first-try | `eval:run --metric first-try` | >= 80% |
| Phase 2: Solution skills e2e | `eval:run --metric e2e` | >= 60% |
| Phase 3: Overall eval | `eval:run --metric overall` | >= 85% |
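
The gate logic implied by this table can be sketched as a pass-rate comparison. The names `GateMetric`, `PHASE_GATE_THRESHOLDS`, and `checkPhaseGate` are hypothetical; only the metric names and threshold values come from the table, and how `eval:run` actually computes pass rates is not shown in the PR.

```typescript
type GateMetric = 'first-try' | 'e2e' | 'overall';

// Thresholds from the phase-gate table, expressed as fractions.
const PHASE_GATE_THRESHOLDS: Record<GateMetric, number> = {
  'first-try': 0.8,
  e2e: 0.6,
  overall: 0.85,
};

interface GateResult {
  metric: GateMetric;
  passRate: number;
  threshold: number;
  passed: boolean;
}

// Compare the observed pass rate with the threshold for the chosen metric.
function checkPhaseGate(
  metric: GateMetric,
  passedTasks: number,
  totalTasks: number
): GateResult {
  if (totalTasks <= 0) {
    throw new Error('totalTasks must be positive');
  }
  const passRate = passedTasks / totalTasks;
  const threshold = PHASE_GATE_THRESHOLDS[metric];
  return { metric, passRate, threshold, passed: passRate >= threshold };
}
```

For example, 45 passing tasks out of 50 clears the `first-try` gate, while 84 out of 100 falls short of the `overall` gate.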

Test plan

- CLI help output renders correctly for all commands
- `generate` creates skill, tool, and test files in the correct plugin directories
- `validate` detects TODOs, naming violations, and missing test files
- `eval:generate` creates eval datasets with categorized tasks from skill metadata
- `export` produces `SKILL.md` and `skill.extensions.yaml` from `defineSkillType` source
- `import --dry-run` shows changes without modifying files
- Unit tests pass: 14/14 (utils, templates)
- CI type-check passes
- CI eslint passes

Production-Readiness Checklist — Agent Skills Ecosystem

Generated against [Epic] Creation of the Agent Skills Ecosystem for Elastic Security.

Narrative role: Out-of-band delivery foundation. Maps directly to the epic's "decoupled delivery for agent definitions, skills, tools, workflow templates" requirement.

Must-do before this can ship

- Wire the declared phase-gate thresholds (80%/60%/85%) to a required Buildkite step that fails on regression; today they exist in docs only
- Round-trip integration test in CI: `generate` → `validate` → `eval:generate` → `eval:run` against a real skill
- `export` / `import` universal format (`SKILL.md` + `skill.extensions.yaml`) needs a signed-manifest step and version pinning before it can replace in-repo authoring
- Add a `kill_switch` field to `SkillDefinition` and surface it in the Agent Builder admin UI; this is required by the out-of-band delivery pillar
- Define the skill invocation telemetry schema (invocation count, time saved vs baseline, accept/reject, error rate); skills cannot ship out-of-band until this schema exists
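
To show what the telemetry item might involve, here is a purely illustrative sketch of an event type and a summary over it. Every name and field below (`SkillInvocationEvent`, `baselineDurationMs`, `summarizeInvocations`, and so on) is an assumption; the checklist lists only the four measurements and no schema has been defined.

```typescript
// Hypothetical event shape covering the four measurements in the checklist.
interface SkillInvocationEvent {
  skillId: string;
  durationMs: number;
  baselineDurationMs: number; // assumed estimate of the manual baseline
  outcome: 'accepted' | 'rejected' | 'error';
}

interface SkillTelemetrySummary {
  invocationCount: number;
  timeSavedMs: number;
  acceptRate: number;
  errorRate: number;
}

// Aggregate raw events into the summary metrics. Time saved is summed over
// all invocations, including rejected and failed ones (a design assumption).
function summarizeInvocations(events: SkillInvocationEvent[]): SkillTelemetrySummary {
  const invocationCount = events.length;
  if (invocationCount === 0) {
    return { invocationCount: 0, timeSavedMs: 0, acceptRate: 0, errorRate: 0 };
  }
  let timeSavedMs = 0;
  let accepted = 0;
  let errors = 0;
  for (const event of events) {
    timeSavedMs += event.baselineDurationMs - event.durationMs;
    if (event.outcome === 'accepted') accepted += 1;
    if (event.outcome === 'error') errors += 1;
  }
  return {
    invocationCount,
    timeSavedMs,
    acceptRate: accepted / invocationCount,
    errorRate: errors / invocationCount,
  };
}
```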

Follow-ups (post-merge)

- Document the "SKILL.md as source of truth" authoring flow in a top-level contributor guide
- Add a skill lint mode that enforces the PCI hardening patterns (param-bound ES|QL, dark-by-default flag, scope claims) as a default validator

Introduces `node scripts/agent_builder_skill`, a CLI tool that provides the framework for Agent Builder skill creation, validation, evaluation, and distribution. Implements the "framework for skill creation" from workstream 1 and connects to the eval-suite.md specification.

Commands:
- generate: Scaffold skill definitions, tool files, and tests
- validate: Check naming, schema, content, and test coverage
- eval:generate: Create 50-100 agent task datasets with rubrics
- eval:run: Execute evals across LLMs with phase gate metrics
- export: Extract skills to universal format (SKILL.md + extensions)
- import: Sync universal format changes back preserving handlers

elasticmachine · 2026-03-03T23:32:00Z

🤖 Jobs for this PR can be triggered through checkboxes. To trigger CI, tick the checkbox for the relevant job:

- kibana-pull-request
- kibana-deploy-project-from-pr
- kibana-deploy-cloud-from-pr
- kibana-entity-store-performance-from-pr
- kibana-storybooks-from-pr

… templates

- Extract shared `extractSkillMetadata` to `utils.ts`, remove duplicated versions from `eval_generate.ts` and `export.ts`
- Fix `discoverSkillFiles` skipping `index.ts`-based skills (e.g. `automatic_troubleshooting/index.ts` containing `defineSkillType`)
- Remove dead baseline flag from `eval:run` command
- Fix inline tool count heuristic to match dot-notation tool IDs only
- Remove redundant `as SkillDomain` casts after `validateDomain` assertion
- Guard empty description in eval task generation
- Add `escapeForSingleQuoteString` for rendered eval datasets
- Escape single quotes in rendered skill and tool descriptions
- Add "`--name` requires `--domain`" validation in `validate` command
- Fix misleading `eval:run` example missing required `--name` flag
- Deduplicate `findEvalSuiteConfig` candidate paths for flat skills
- Use `toPascalCase`/`toSnakeCase` utilities in `renderToolTestFile`
- Remove unused imports (ToolingLog, ToolingLog, SkillDomain)
- Use `@kbn/repo-info` for `resolveRepoRoot`
- Add `toPascalCase` underscore test, split `validateDomain` test
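
The string utilities named in this commit are not shown in the PR, so the following is a sketch of what helpers with these names could look like, not their actual implementations. The escaping helper matters because skill and tool descriptions are rendered into single-quoted string literals in generated files.

```typescript
// Escape backslashes first, then single quotes, so the result can be
// embedded safely between single quotes in generated source.
function escapeForSingleQuoteString(value: string): string {
  return value.replace(/\\/g, '\\\\').replace(/'/g, "\\'");
}

// Split on underscores, hyphens, and whitespace, then capitalize each part.
function toPascalCase(value: string): string {
  return value
    .split(/[_\-\s]+/)
    .filter((part) => part.length > 0)
    .map((part) => part.charAt(0).toUpperCase() + part.slice(1))
    .join('');
}

// Insert underscores at lower-to-upper boundaries, normalize separators,
// and lowercase the result.
function toSnakeCase(value: string): string {
  return value
    .replace(/([a-z0-9])([A-Z])/g, '$1_$2')
    .replace(/[\-\s]+/g, '_')
    .toLowerCase();
}
```

With these definitions, `toPascalCase('alert_triage')` gives `AlertTriage`, `toSnakeCase('AlertTriage')` gives `alert_triage`, and a description such as `User's alerts` renders with its quote preceded by a backslash.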

patrykkopycinski · 2026-03-16T09:45:57Z

/ci

These commands are not Kibana-internal concerns — they bridge Kibana skill definitions with the universal SKILL.md format for distribution and external tooling. Moved to the agent-builder-skill-dev Cursor plugin as export_skill and import_skill MCP tools. The Kibana CLI now only contains Kibana-focused commands: generate, validate, eval:generate, eval:run.

patrykkopycinski · 2026-03-16T09:55:17Z

/ci

…mprovements

New commands:
- `list`: discover and summarize all skills with table/JSON output

Validate improvements:
- Detect duplicate skill IDs across domains
- Check `referencedContent` entries for valid paths and content
- Verify `getRegistryTools()` tool IDs exist in the codebase
- Content quality heuristics: min length, structured sections, tool refs
- Accept `@kbn/zod/v4` import path alongside `@kbn/zod`
- `--fix` flag auto-creates missing test files and fixes invalid `basePath`
- Tag fixable issues in output with [fixable]

Generate improvements:
- `--nested`: scaffold skill in nested directory with barrel export
- `--with-inline-tool`: generate `getInlineTools()` with Zod schema stub
- `--with-referenced-content`: scaffold `referencedContent` with ES|QL sample
- `--auto-register`: auto-add import/registration to `register_skills.ts`

Eval improvements:
- Smarter task generation: inline tools, `referencedContent`, negative tests
- `--count` flag to control tasks per category
- `eval:run` parses stdout JSON to display pass/fail summary vs threshold
- `eval:run --watch` mode re-runs on file changes

Cleanup:
- Remove unused `discoverToolFiles` (replaced by `discoverRegisteredToolIds`)
- Add `MIN_CONTENT_LENGTH` constant
- Enrich `SkillFileMetadata` with `hasInlineTools`, `referencedContent`, etc.
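
The duplicate-ID check listed under the validate improvements can be sketched as a grouping pass over discovered skills. The `DiscoveredSkill` shape and the function name `findDuplicateSkillIds` are hypothetical; the PR does not show how the real check is implemented.

```typescript
// Hypothetical minimal record for a skill found on disk.
interface DiscoveredSkill {
  id: string;
  domain: string;
  filePath: string;
}

// Group skills by ID and keep only the groups with more than one member.
function findDuplicateSkillIds(
  skills: DiscoveredSkill[]
): Map<string, DiscoveredSkill[]> {
  const byId = new Map<string, DiscoveredSkill[]>();
  for (const skill of skills) {
    const group = byId.get(skill.id) ?? [];
    group.push(skill);
    byId.set(skill.id, group);
  }
  const duplicates = new Map<string, DiscoveredSkill[]>();
  byId.forEach((group, id) => {
    if (group.length > 1) {
      duplicates.set(id, group);
    }
  });
  return duplicates;
}
```

Returning the full group for each duplicated ID lets the caller report every conflicting file path and domain, not just the ID.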

patrykkopycinski · 2026-03-16T10:18:55Z

/ci

elasticmachine · 2026-03-16T10:24:41Z

💔 Build Failed

Buildkite Build
Commit: 3c56160

Failed CI Steps

Metrics [docs]

Public APIs missing comments

Total count of every public API that lacks a comment. Target amount is 0. Run `node scripts/build_api_docs --plugin [yourplugin] --stats comments` for more detailed information.

| id | before | after | diff |
| --- | --- | --- | --- |
| `@kbn/agent-builder-skill-cli` | - | 1 | +1 |

Unknown metric groups

API count

| id | before | after | diff |
| --- | --- | --- | --- |
| `@kbn/agent-builder-skill-cli` | - | 1 | +1 |

History

💔 Build #410417 failed a02c3d7

patrykkopycinski and others added 2 commits March 16, 2026 09:03

Merge branch 'main' into skill-developer-experience

ff16dbc

kibanamachine and others added 2 commits March 16, 2026 10:01

Changes from node scripts/lint.js --fix

998344a

Changes from node scripts/lint.js --fix

c504b8a

patrykkopycinski closed this Apr 27, 2026

patrykkopycinski wants to merge 7 commits into elastic:main from patrykkopycinski:skill-developer-experience

3 participants
