[AO] realign-kibana-265798-inline-local-into-kbn-evals-retire-kbn by patrykkopycinski · Pull Request #268862 · elastic/kibana

patrykkopycinski · 2026-05-12T11:32:47Z

Auto-generated by patryks-treadmill (per-plan worktree)

Realign kibana#265798: inline `--local` into `@kbn/evals`, retire `@kbn/evals-local`

Why

PR #265798 put too much responsibility on the framework too early — auto-provisioning a runtime, maintaining a model registry, running a tool-calling probe before every eval, auto-generating recommendatio

Tasks completed

One commit per task on a single shared plan branch.
This PR was autonomously generated and verified by the patryks-treadmill pipeline.

Move src/cli/inject.ts from @kbn/evals-local into @kbn/evals as src/cli/inject_local_connector.ts. The file is made self-contained by inlining the runtime-detection helpers (probeEndpoint, getOllamaModels, getLmStudioModel, commandExists, detect) and the connector env-setter from connector_factory, both of which are being deleted in the broader kbn-evals-local retirement. The ModelRegistry dependency is dropped in favour of accepting --local-model values as plain model-name strings (bare-connector path). The three load-bearing behaviours are preserved verbatim: the hard-fail guard when no endpoint is detected, the process.argv strip-and-sync after --local removal, and the execFileSync-based commandExists call that prevents shell injection. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
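As an illustration of the third behaviour, here is a minimal sketch of an `execFileSync`-based existence check. It is not the code from this PR: the use of `which` / `where` as the lookup tool and the function body are assumptions. The point it shows is that `execFileSync` passes the binary name as a single argument without spawning a shell, so shell metacharacters in the name are never interpreted.

```typescript
import { execFileSync } from 'child_process';

// Hypothetical sketch (not the PR's implementation): report whether a binary is on PATH.
// Assumption: `which` (POSIX) or `where` (Windows) is the lookup tool.
function commandExists(binary: string): boolean {
  const lookup = process.platform === 'win32' ? 'where' : 'which';
  try {
    // execFileSync does not spawn a shell: `binary` is handed over as one
    // argument, so input like "ollama; echo injected" is only ever treated
    // as a (nonexistent) program name.
    execFileSync(lookup, [binary], { stdio: 'ignore' });
    return true;
  } catch {
    // A non-zero exit status or a missing lookup tool both mean "not found".
    return false;
  }
}
```

An `exec`-style call that interpolates the name into a command string would instead let the shell parse the `;` and run the second command.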

Adds the named re-export so scripts/evals.js can require it directly from @kbn/evals instead of @kbn/evals-local. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

When scripts/evals.js is invoked with --local, call injectLocalConnector(process.argv) from @kbn/evals before handing off to cli.run(). Passing process.argv directly (not a slice) lets the function strip --local / --local-endpoint / --local-model in-place so cli.run() sees a clean argv. The .then() chain ensures cli.run() only starts after connector env vars are set. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
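The reason for passing `process.argv` itself can be sketched as follows. This is an illustrative stand-in, not the real `injectLocalConnector`: the helper name `stripLocalFlags`, the support for both `--flag value` and `--flag=value`, and the model name `llama3` are all assumptions. It only shows that `splice` mutates the caller's array, whereas a slice would leave the original untouched.

```typescript
// Hypothetical sketch of in-place flag stripping (not the PR's code).
interface LocalFlags {
  local: boolean;
  endpoint?: string;
  model?: string;
}

function stripLocalFlags(argv: string[]): LocalFlags {
  const flags: LocalFlags = { local: false };
  let i = 0;
  while (i < argv.length) {
    const arg = argv[i];
    const eq = arg.indexOf('=');
    const flagName = eq === -1 ? arg : arg.slice(0, eq);
    // The value is either inline ("--flag=value") or the next argv entry.
    const inlineValue = eq === -1 ? undefined : arg.slice(eq + 1);

    if (arg === '--local') {
      flags.local = true;
      argv.splice(i, 1); // mutate in place; do not advance i
    } else if (flagName === '--local-endpoint' || flagName === '--local-model') {
      const value = inlineValue !== undefined ? inlineValue : argv[i + 1];
      if (flagName === '--local-endpoint') flags.endpoint = value;
      else flags.model = value;
      argv.splice(i, inlineValue !== undefined ? 1 : 2);
    } else {
      i += 1;
    }
  }
  return flags;
}

// Passing the array itself: the caller sees the cleaned argv afterwards.
const directArgv = ['node', 'scripts/evals', 'run', '--local', '--local-model', 'llama3', '--suite', 'agent-builder'];
const directFlags = stripLocalFlags(directArgv);

// Passing a slice: the copy is cleaned, the original still contains --local.
const originalArgv = ['node', 'scripts/evals', 'run', '--local'];
stripLocalFlags(originalArgv.slice());
```

With a slice, `cli.run()` would still find `--local` in `process.argv` after the call.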

…anner Replace the early-return stub with env-var injection so --dry-run falls through to spawn Playwright: sets EVALUATION_REPETITIONS=1 and EVALUATION_DRY_RUN=true in envOverrides, prints the '[DRY-RUN] sampling 1 example per dataset, repetitions=1' banner, then proceeds to the existing spawn block. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

…RUN=true In KibanaEvalsClient.runExperiment(), check process.env.EVALUATION_DRY_RUN and slice resolvedDataset.examples to [examples[0]] before the run loop. This wires the --dry-run flag end-to-end: CLI sets EVALUATION_DRY_RUN=true, Playwright inherits it, and the executor limits each dataset to one example. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
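The executor-side check might look like the sketch below. `selectExamples` is a hypothetical helper introduced only for illustration; the environment is passed in as a parameter so the snippet stays self-contained, where the real code reads `process.env`.

```typescript
// Hypothetical sketch; `selectExamples` is not part of KibanaEvalsClient.
type Env = Record<string, string | undefined>;

function selectExamples<T>(examples: T[], env: Env): T[] {
  // In dry-run mode keep only the first example. slice(0, 1) returns [] for an
  // empty dataset, whereas the literal form [examples[0]] would give [undefined].
  return env['EVALUATION_DRY_RUN'] === 'true' ? examples.slice(0, 1) : examples;
}
```

Using `slice(0, 1)` is a small deviation from the `[examples[0]]` form described above; the two agree for any non-empty dataset.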

Covers: one-line Ollama install, one model recommendation per RAM tier (16/32/48/64 GB+), EVAL_TASK_TIMEOUT_MS=600000 requirement, --local vs --dry-run guidance, and pointer to elastic-agent-builder-skill-dev for advanced benchmarking orchestration. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

…view Move the --dry-run envOverrides block above the commandPreview snapshot so EVALUATION_REPETITIONS=1 and EVALUATION_DRY_RUN=true appear in the logged "Running: ..." line. Previously the preview was built before the dry-run mutation, making the logged command unreproducible when copy-pasted. Runtime behavior is unchanged — spawn() always received the correct env. Smoke test verified: node scripts/evals run --suite agent-builder --dry-run --local prints the [DRY-RUN] banner, shows all overrides in the Running: line, and Playwright starts 12 tests (1 per dataset spec) with EVALUATION_REPETITIONS=1 and EVALUATION_DRY_RUN=true set. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
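The ordering issue can be sketched like this. The function names, the `KEY=value` preview format, and the `npx playwright test` command are assumptions made for illustration; only the two environment variables, the banner text, and the `Running:` prefix come from the commit message.

```typescript
// Hypothetical sketch of the ordering fix (not the PR's code).
type EnvOverrides = Record<string, string>;

function applyDryRun(envOverrides: EnvOverrides, log: (line: string) => void): void {
  envOverrides['EVALUATION_REPETITIONS'] = '1';
  envOverrides['EVALUATION_DRY_RUN'] = 'true';
  log('[DRY-RUN] sampling 1 example per dataset, repetitions=1');
}

function formatCommandPreview(envOverrides: EnvOverrides, command: string, args: string[]): string {
  const envPrefix = Object.keys(envOverrides).map((key) => key + '=' + envOverrides[key]);
  return 'Running: ' + envPrefix.concat([command], args).join(' ');
}

function planRun(dryRun: boolean, command: string, args: string[]) {
  const lines: string[] = [];
  const envOverrides: EnvOverrides = {};
  // 1. Mutate the overrides first...
  if (dryRun) applyDryRun(envOverrides, (line) => lines.push(line));
  // 2. ...then snapshot the preview, so the logged line matches what is spawned.
  lines.push(formatCommandPreview(envOverrides, command, args));
  return { envOverrides, lines };
}
```

If step 2 ran before step 1, the spawned process would still receive the overrides, but the logged line would omit them.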

…d; fix timer leak Adds inject_local_connector.test.ts with 7 unit tests covering the hard-fail path (no Ollama, no LM Studio, no binary installed), the env-var injection happy path, and the binary-installed-but-not-running path. All assertions verified: throws with actionable message, strips --local from args before detection, never sets EVALUATION_CONNECTOR_ID or KIBANA_TESTING_AI_CONNECTORS when no runtime is found. Also fixes a timer resource leak in probeEndpoint / getOllamaModels / getLmStudioModel: clearTimeout was only called on the success path; moved to finally{} so it fires on rejection too. This eliminated the "Jest did not exit" open-handle warning that surfaced during test authoring. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
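The timer fix follows a common pattern, sketched below under stated assumptions: `probeWithTimeout` is a hypothetical generic wrapper rather than the real `probeEndpoint`, and mapping any failure to `false` is an assumption about the probe's contract. What it demonstrates is that `clearTimeout` in `finally` runs on both the resolve and the reject path.

```typescript
// Hypothetical sketch of the clearTimeout-in-finally pattern (not the PR's code).
async function probeWithTimeout(
  probe: (signal: AbortSignal) => Promise<boolean>,
  timeoutMs: number
): Promise<boolean> {
  const controller = new AbortController();
  const timer = setTimeout(() => controller.abort(), timeoutMs);
  try {
    return await probe(controller.signal);
  } catch {
    // Assumption: a refused connection or an abort simply means "not reachable".
    return false;
  } finally {
    // Runs after success and after rejection, so no pending timer outlives
    // the call and keeps the process (or Jest) from exiting.
    clearTimeout(timer);
  }
}
```

With `clearTimeout` only on the success path, a rejected probe would leave the timer pending until `timeoutMs` elapsed, which is the kind of open handle Jest reports.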

…lsClient Verifies that runExperiment() limits execution to the first example when EVALUATION_DRY_RUN=true (regardless of repetitions), and runs all examples when the var is absent. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

patrykkopycinski and others added 9 commits May 12, 2026 09:14

feat(kbn-evals): export injectLocalConnector from package index

6073da3


patrykkopycinski wants to merge 9 commits into elastic:main from patrykkopycinski:ao/realign-kibana-265798-inl-f9be30
