[Agent Builder] Add skill developer experience CLI #255890

Closed
patrykkopycinski wants to merge 7 commits into elastic:main from patrykkopycinski:skill-developer-experience

patrykkopycinski commented Mar 3, 2026

Summary

Introduces node scripts/agent_builder_skill — a CLI tool that provides the framework for Agent Builder skill creation, validation, evaluation, and distribution.

Commands

| Command | Purpose |
| --- | --- |
| generate | Scaffold skill definitions (defineSkillType), tool files (Zod schema + handler), and Jest tests from templates |
| validate | Check naming conventions, schema constraints, content quality, base path validity, TODO detection, and test coverage |
| eval:generate | Create 50-100 agent task datasets with pass/fail rubrics from skill metadata |
| eval:run | Execute evaluations across LLMs with phase gate metrics (80%/60%/85% thresholds), delegates to @kbn/evals |
| export | Extract a Kibana skill to universal format (SKILL.md + skill.extensions.yaml) for distribution |
| import | Sync universal format changes back into Kibana source, preserving hand-authored handler implementations |
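
To illustrate the TODO-detection check that validate is described as performing, here is a minimal sketch. It assumes a simple line-by-line scan for an uppercase `TODO` marker; `findTodos`, `TodoFinding`, and the sample source are hypothetical names for illustration and are not taken from the CLI's code.

```typescript
// Hypothetical sketch: not the CLI's actual validator.
interface TodoFinding {
  line: number; // 1-based line number
  text: string; // trimmed source line containing the marker
}

// Scan file content line by line and report leftover TODO markers,
// such as those a template-based generator leaves for the author to fill in.
function findTodos(source: string): TodoFinding[] {
  const findings: TodoFinding[] = [];
  source.split(/\r?\n/).forEach((text, index) => {
    if (/\bTODO\b/.test(text)) {
      findings.push({ line: index + 1, text: text.trim() });
    }
  });
  return findings;
}

// Sample input resembling a freshly scaffolded skill file (illustrative only).
const sampleSkillSource = [
  'export const mySkill = defineSkillType({',
  "  description: 'TODO: describe when the agent should use this skill',",
  '});',
].join('\n');

const todoFindings = findTodos(sampleSkillSource);
```

The word-boundary match keeps identifiers such as `todoList` from being flagged, at the cost of missing lowercase markers.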

Architecture

  • New package: @kbn/agent-builder-skill-cli at x-pack/platform/packages/shared/kbn-agent-builder-skill-cli/
  • Entry point: scripts/agent_builder_skill.js (follows standard Kibana script conventions)
  • Uses RunWithCommands from @kbn/dev-cli-runner (same pattern as @kbn/evals CLI)
  • Domain-aware: supports security, observability, platform, and search skill domains
  • Template-based generation produces code that follows existing Agent Builder patterns (defineSkillType, Zod schemas, BuiltinToolDefinition)
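
The domain awareness described above can be sketched with a TypeScript assertion function. The four domain names come from this description, and `validateDomain` and `SkillDomain` are mentioned in the commit notes, but the body below is an assumption about how such a check might look, not the package's code. `describeDomain` is a hypothetical caller.

```typescript
// Domains listed in the PR description; the implementation is an illustrative assumption.
const SKILL_DOMAINS = ['security', 'observability', 'platform', 'search'] as const;
type SkillDomain = (typeof SKILL_DOMAINS)[number];

// Assertion function: once it returns, TypeScript narrows `value` to SkillDomain,
// so callers do not need an `as SkillDomain` cast afterwards.
function validateDomain(value: string): asserts value is SkillDomain {
  if ((SKILL_DOMAINS as readonly string[]).indexOf(value) === -1) {
    throw new Error(
      `Unknown domain "${value}". Expected one of: ${SKILL_DOMAINS.join(', ')}`
    );
  }
}

// Hypothetical caller showing the narrowing in action.
function describeDomain(input: string): string {
  validateDomain(input);
  const domain: SkillDomain = input; // compiles without a cast
  return `skill domain: ${domain}`;
}
```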

Mapping to Program Phase Gates

| Phase Gate | CLI Command | Threshold |
| --- | --- | --- |
| Phase 1: ES skills first-try | eval:run --metric first-try | >= 80% |
| Phase 2: Solution skills e2e | eval:run --metric e2e | >= 60% |
| Phase 3: Overall eval | eval:run --metric overall | >= 85% |
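
A minimal sketch of how the thresholds in this mapping could be applied to an eval result. The metric names and percentages are taken from the table; `GateMetric`, `PHASE_GATE_THRESHOLDS`, and `checkPhaseGate` are hypothetical names, and the pass rate is assumed to be passed tasks divided by total tasks.

```typescript
// Metric names and thresholds come from the phase gate table;
// the type and function names are hypothetical.
type GateMetric = 'first-try' | 'e2e' | 'overall';

const PHASE_GATE_THRESHOLDS: Record<GateMetric, number> = {
  'first-try': 0.8, // Phase 1: ES skills first-try
  e2e: 0.6, // Phase 2: solution skills e2e
  overall: 0.85, // Phase 3: overall eval
};

interface GateResult {
  metric: GateMetric;
  passRate: number;
  threshold: number;
  passed: boolean;
}

// Compare a pass rate (passed / total) against the gate for the given metric.
function checkPhaseGate(
  metric: GateMetric,
  passedTasks: number,
  totalTasks: number
): GateResult {
  if (totalTasks <= 0) {
    throw new Error('totalTasks must be greater than zero');
  }
  const passRate = passedTasks / totalTasks;
  const threshold = PHASE_GATE_THRESHOLDS[metric];
  return { metric, passRate, threshold, passed: passRate >= threshold };
}
```

The comparison is inclusive, matching the `>=` in the table, so a result that lands exactly on the threshold passes the gate.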

Test plan

  • CLI help output renders correctly for all commands
  • generate creates skill, tool, and test files in the correct plugin directories
  • validate detects TODOs, naming violations, missing test files
  • eval:generate creates eval datasets with categorized tasks from skill metadata
  • export produces SKILL.md and skill.extensions.yaml from defineSkillType source
  • import --dry-run shows changes without modifying files
  • Unit tests pass: 14/14 (utils, templates)
  • CI type-check passes
  • CI eslint passes

Production-Readiness Checklist — Agent Skills Ecosystem

Generated against [Epic] Creation of the Agent Skills Ecosystem for Elastic Security.

Narrative role: Out-of-band delivery foundation. Maps directly to the epic's "decoupled delivery for agent definitions, skills, tools, workflow templates" requirement.

Must-do before this can ship

  • Wire the declared phase-gate thresholds (80%/60%/85%) to a required Buildkite step that fails on regression — today they exist in docs only
  • Round-trip integration test in CI: generate → validate → eval:generate → eval:run against a real skill
  • export / import universal format (SKILL.md + skill.extensions.yaml) needs a signed-manifest step and version pinning before it can replace in-repo authoring
  • Add a kill_switch field to SkillDefinition and surface it in Agent Builder admin UI — required by the out-of-band delivery pillar
  • Define the skill invocation telemetry schema (invocation count, time saved vs baseline, accept/reject, error rate) — skills cannot ship out-of-band until this schema exists
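
As a starting point for the telemetry item above, the four listed measurements could be derived from per-invocation events roughly as follows. Every name and field here is hypothetical, since the checklist states the schema does not exist yet; in particular, the `baselineMs` field is an assumed way to express "time saved vs baseline".

```typescript
// Hypothetical shapes: the checklist names these measurements but no schema is defined yet.
interface SkillInvocationEvent {
  skillId: string;
  durationMs: number; // how long the skill-assisted task took
  baselineMs: number; // assumed baseline duration for the same task without the skill
  accepted: boolean; // whether the user accepted the result
  errored: boolean; // whether the invocation failed
}

interface SkillTelemetrySummary {
  invocationCount: number;
  timeSavedMs: number;
  acceptRate: number;
  errorRate: number;
}

// Aggregate raw events into the four measurements named in the checklist.
function summarizeInvocations(events: SkillInvocationEvent[]): SkillTelemetrySummary {
  const invocationCount = events.length;
  if (invocationCount === 0) {
    return { invocationCount: 0, timeSavedMs: 0, acceptRate: 0, errorRate: 0 };
  }
  let timeSavedMs = 0;
  let acceptedCount = 0;
  let erroredCount = 0;
  events.forEach((event) => {
    timeSavedMs += event.baselineMs - event.durationMs;
    if (event.accepted) acceptedCount += 1;
    if (event.errored) erroredCount += 1;
  });
  return {
    invocationCount,
    timeSavedMs,
    acceptRate: acceptedCount / invocationCount,
    errorRate: erroredCount / invocationCount,
  };
}
```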

Follow-ups (post-merge)

  • Document the "SKILL.md as source of truth" authoring flow in a top-level contributor guide
  • Add a skill lint mode that enforces the PCI hardening patterns (param-bound ES|QL, dark-by-default flag, scope claims) as a default validator

Introduces `node scripts/agent_builder_skill` — a CLI tool that provides
the framework for Agent Builder skill creation, validation, evaluation,
and distribution. Implements the "framework for skill creation" from
workstream 1 and connects to the eval-suite.md specification.

Commands:
- generate: Scaffold skill definitions, tool files, and tests
- validate: Check naming, schema, content, and test coverage
- eval:generate: Create 50-100 agent task datasets with rubrics
- eval:run: Execute evals across LLMs with phase gate metrics
- export: Extract skills to universal format (SKILL.md + extensions)
- import: Sync universal format changes back preserving handlers

patrykkopycinski and others added 2 commits March 16, 2026 09:03
… templates

- Extract shared extractSkillMetadata to utils.ts, remove duplicated
  versions from eval_generate.ts and export.ts
- Fix discoverSkillFiles skipping index.ts-based skills (e.g.
  automatic_troubleshooting/index.ts containing defineSkillType)
- Remove dead baseline flag from eval:run command
- Fix inline tool count heuristic to match dot-notation tool IDs only
- Remove redundant as SkillDomain casts after validateDomain assertion
- Guard empty description in eval task generation
- Add escapeForSingleQuoteString for rendered eval datasets
- Escape single quotes in rendered skill and tool descriptions
- Add --name requires --domain validation in validate command
- Fix misleading eval:run example missing required --name flag
- Deduplicate findEvalSuiteConfig candidate paths for flat skills
- Use toPascalCase/toSnakeCase utilities in renderToolTestFile
- Remove unused imports (ToolingLog, ToolingLog, SkillDomain)
- Use @kbn/repo-info for resolveRepoRoot
- Add toPascalCase underscore test, split validateDomain test
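
For readers unfamiliar with the helpers named in this commit, the following sketch shows one plausible behavior for each. The function names match the commit message, but the bodies are re-implementations written for illustration and may differ from the PR's code; newline escaping, for instance, is not handled here.

```typescript
// Hypothetical re-implementations for illustration; the PR's own utilities may differ.

// Escape backslashes first, then single quotes, so the result can be embedded
// between single quotes in generated TypeScript source.
function escapeForSingleQuoteString(value: string): string {
  return value.replace(/\\/g, '\\\\').replace(/'/g, "\\'");
}

// Split on underscores, hyphens, and whitespace, then capitalize each part.
function toPascalCase(value: string): string {
  return value
    .split(/[_\-\s]+/)
    .filter((part) => part.length > 0)
    .map((part) => part.charAt(0).toUpperCase() + part.slice(1))
    .join('');
}

// Insert underscores at lower-to-upper boundaries, normalize separators, lowercase.
function toSnakeCase(value: string): string {
  return value
    .replace(/([a-z0-9])([A-Z])/g, '$1_$2')
    .replace(/[\-\s]+/g, '_')
    .toLowerCase();
}

// Rendering a description into a single-quoted literal in a generated file.
const renderedDescription =
  "description: '" + escapeForSingleQuoteString("Analyzes a host's alerts") + "',";
```

Escaping the backslash before the quote matters: in the opposite order, the backslash added for the quote would itself be doubled.
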
@patrykkopycinski

/ci

The export and import commands are not Kibana-internal concerns: they bridge
Kibana skill definitions with the universal SKILL.md format for distribution
and external tooling. They moved to the agent-builder-skill-dev Cursor
plugin as the export_skill and import_skill MCP tools.

The Kibana CLI now only contains Kibana-focused commands: generate,
validate, eval:generate, eval:run.
@patrykkopycinski

/ci

kibanamachine and others added 2 commits March 16, 2026 10:01
…mprovements

New commands:
- `list` — discover and summarize all skills with table/JSON output

Validate improvements:
- Detect duplicate skill IDs across domains
- Check referencedContent entries for valid paths and content
- Verify getRegistryTools() tool IDs exist in the codebase
- Content quality heuristics: min length, structured sections, tool refs
- Accept @kbn/zod/v4 import path alongside @kbn/zod
- --fix flag auto-creates missing test files and fixes invalid basePath
- Tag fixable issues in output with [fixable]
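
The duplicate-ID check listed above could work along these lines. This is a sketch under assumptions: `DiscoveredSkill`, `DuplicateSkillId`, and `findDuplicateSkillIds` are hypothetical names, and the commit's real metadata type (`SkillFileMetadata`) carries more fields than shown.

```typescript
// Hypothetical shapes; the CLI's real metadata type has more fields.
interface DiscoveredSkill {
  id: string;
  domain: string;
  filePath: string;
}

interface DuplicateSkillId {
  id: string;
  locations: string[]; // "domain:filePath" for every definition sharing the ID
}

// Group discovered skills by ID and report every ID defined more than once,
// regardless of which domain each definition lives in.
function findDuplicateSkillIds(skills: DiscoveredSkill[]): DuplicateSkillId[] {
  // A prototype-less object avoids collisions with keys such as "constructor".
  const locationsById: Record<string, string[]> = Object.create(null);
  skills.forEach((skill) => {
    if (!locationsById[skill.id]) {
      locationsById[skill.id] = [];
    }
    locationsById[skill.id].push(`${skill.domain}:${skill.filePath}`);
  });
  return Object.keys(locationsById)
    .filter((id) => locationsById[id].length > 1)
    .map((id) => ({ id, locations: locationsById[id] }));
}
```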

Generate improvements:
- --nested: scaffold skill in nested directory with barrel export
- --with-inline-tool: generate getInlineTools() with Zod schema stub
- --with-referenced-content: scaffold referencedContent with ES|QL sample
- --auto-register: auto-add import/registration to register_skills.ts

Eval improvements:
- Smarter task generation: inline tools, referencedContent, negative tests
- --count flag to control tasks per category
- eval:run parses stdout JSON to display pass/fail summary vs threshold
- eval:run --watch mode re-runs on file changes
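
The stdout-parsing step for eval:run might look roughly like the sketch below. The JSON shape is an assumption: it supposes the eval process prints a single-line object with numeric `passed` and `total` fields, which this PR does not confirm. `parseEvalSummary`, `isEvalSummary`, and `formatEvalSummary` are hypothetical names.

```typescript
// Assumed output shape; the real @kbn/evals output may differ.
interface EvalSummary {
  passed: number;
  total: number;
}

function isEvalSummary(value: unknown): value is EvalSummary {
  if (typeof value !== 'object' || value === null) return false;
  const candidate = value as Record<string, unknown>;
  return typeof candidate.passed === 'number' && typeof candidate.total === 'number';
}

// Scan stdout from the end and return the last line that parses as a summary object.
function parseEvalSummary(stdout: string): EvalSummary | undefined {
  const lines = stdout.split(/\r?\n/);
  for (let i = lines.length - 1; i >= 0; i--) {
    const line = lines[i].trim();
    if (line.charAt(0) !== '{') continue;
    try {
      const parsed: unknown = JSON.parse(line);
      if (isEvalSummary(parsed)) return parsed;
    } catch {
      // Not valid JSON; keep scanning earlier lines.
    }
  }
  return undefined;
}

// Render a one-line pass/fail summary against a threshold given as a fraction (0..1).
function formatEvalSummary(summary: EvalSummary, threshold: number): string {
  const passRate = summary.total > 0 ? summary.passed / summary.total : 0;
  const verdict = passRate >= threshold ? 'PASS' : 'FAIL';
  return (
    `${summary.passed}/${summary.total} passed (${Math.round(passRate * 100)}%), ` +
    `threshold ${Math.round(threshold * 100)}%: ${verdict}`
  );
}
```

Scanning from the end tolerates log lines printed before the summary, and returning `undefined` lets the caller decide how to report a run that produced no parsable result.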

Cleanup:
- Remove unused discoverToolFiles (replaced by discoverRegisteredToolIds)
- Add MIN_CONTENT_LENGTH constant
- Enrich SkillFileMetadata with hasInlineTools, referencedContent, etc.
@patrykkopycinski
Copy link
Copy Markdown
Contributor Author

/ci

elasticmachine commented Mar 16, 2026

💔 Build Failed

Failed CI Steps

Metrics [docs]

Public APIs missing comments

Total count of every public API that lacks a comment. Target amount is 0. Run node scripts/build_api_docs --plugin [yourplugin] --stats comments for more detailed information.

| id | before | after | diff |
| --- | --- | --- | --- |
| @kbn/agent-builder-skill-cli | - | 1 | +1 |

Unknown metric groups

API count

| id | before | after | diff |
| --- | --- | --- | --- |
| @kbn/agent-builder-skill-cli | - | 1 | +1 |
