[AO] realign-kibana-265798-inline-local-into-kbn-evals-retire-kbn#268862
Draft
patrykkopycinski wants to merge 9 commits into
Move src/cli/inject.ts from @kbn/evals-local into @kbn/evals as src/cli/inject_local_connector.ts. The file is made self-contained by inlining the runtime-detection helpers (probeEndpoint, getOllamaModels, getLmStudioModel, commandExists, detect) and the connector env-setter from connector_factory, both of which are being deleted in the broader kbn-evals-local retirement. The ModelRegistry dependency is dropped in favour of accepting --local-model values as plain model-name strings (bare-connector path). The three load-bearing behaviours are preserved verbatim: the hard-fail guard when no endpoint is detected, the process.argv strip-and-sync after --local removal, and the execFileSync-based commandExists call that prevents shell injection. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
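The strip-and-sync behaviour described in this commit can be sketched roughly as below. This is a minimal illustration, not the code in inject_local_connector.ts: `stripLocalFlags` and `LocalFlags` are hypothetical names, and the accepted flag forms (`--flag value` and `--flag=value`) are an assumption.

```typescript
// Hypothetical result shape; the real module may track different fields.
interface LocalFlags {
  local: boolean;
  endpoint?: string;
  model?: string;
}

// Hypothetical helper: removes --local, --local-endpoint and --local-model
// from `argv` IN PLACE and returns whatever values were found.
function stripLocalFlags(argv: string[]): LocalFlags {
  const flags: LocalFlags = { local: false };
  const kept: string[] = [];
  const queue = [...argv];

  while (queue.length > 0) {
    const arg = queue.shift() as string;
    if (arg === '--local') {
      flags.local = true;
    } else if (arg === '--local-endpoint') {
      // Assumed form: the value is the next token.
      flags.endpoint = queue.shift();
    } else if (arg === '--local-model') {
      flags.model = queue.shift();
    } else if (arg.startsWith('--local-endpoint=')) {
      flags.endpoint = arg.slice('--local-endpoint='.length);
    } else if (arg.startsWith('--local-model=')) {
      flags.model = arg.slice('--local-model='.length);
    } else {
      kept.push(arg);
    }
  }

  // Mutate the caller's array rather than returning a copy, so a caller that
  // passed process.argv sees the cleaned list afterwards.
  argv.splice(0, argv.length, ...kept);
  return flags;
}
```

Mutating the array the caller owns is what keeps process.argv and the stripped argument list in sync; returning a filtered copy would leave the original flags visible to whatever reads process.argv next.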
Adds the named re-export so scripts/evals.js can require it directly from @kbn/evals instead of @kbn/evals-local. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
When scripts/evals.js is invoked with --local, call injectLocalConnector(process.argv) from @kbn/evals before handing off to cli.run(). Passing process.argv directly (not a slice) lets the function strip --local / --local-endpoint / --local-model in-place so cli.run() sees a clean argv. The .then() chain ensures cli.run() only starts after connector env vars are set. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
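The ordering guarantee from the .then() chain might look like the sketch below. scripts/evals.js itself is plain JavaScript; the `main` function and the `Deps` interface here are hypothetical stand-ins used only to make the ordering explicit, with `runCli` standing in for cli.run().

```typescript
// Hypothetical dependency bundle so the ordering can be shown in isolation.
interface Deps {
  injectLocalConnector: (argv: string[]) => Promise<void>;
  runCli: () => void;
}

function main(argv: string[], deps: Deps): Promise<void> {
  if (!argv.includes('--local')) {
    // No local connector requested: start the CLI immediately.
    deps.runCli();
    return Promise.resolve();
  }
  // Pass the array itself (not a slice) so in-place stripping is visible to
  // the CLI, and start the CLI only after injection has resolved.
  return deps.injectLocalConnector(argv).then(() => deps.runCli());
}
```

If injection rejects, the returned promise rejects and `runCli` is never called, which matches the hard-fail intent described in the earlier commit.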
…anner Replace the early-return stub with env-var injection so --dry-run falls through to spawn Playwright: sets EVALUATION_REPETITIONS=1 and EVALUATION_DRY_RUN=true in envOverrides, prints the '[DRY-RUN] sampling 1 example per dataset, repetitions=1' banner, then proceeds to the existing spawn block. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
…RUN=true In KibanaEvalsClient.runExperiment(), check process.env.EVALUATION_DRY_RUN and slice resolvedDataset.examples to [examples[0]] before the run loop. This wires the --dry-run flag end-to-end: CLI sets EVALUATION_DRY_RUN=true, Playwright inherits it, and the executor limits each dataset to one example. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
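A minimal sketch of the dry-run slice follows. The PR performs this check inside KibanaEvalsClient.runExperiment(); the standalone `selectExamples` helper and its explicit `env` parameter are hypothetical, introduced here so the behaviour can be exercised without touching process.env.

```typescript
// Hypothetical helper: returns only the first example when the dry-run
// variable is set to the string 'true', otherwise the full list.
function selectExamples<T>(
  examples: T[],
  env: Record<string, string | undefined>
): T[] {
  const isDryRun = env.EVALUATION_DRY_RUN === 'true';
  // slice(0, 1) is used instead of [examples[0]] so that an empty dataset
  // stays empty rather than becoming [undefined].
  return isDryRun ? examples.slice(0, 1) : examples;
}
```

In the executor this would be called with process.env as the second argument. Whether the actual implementation guards against empty datasets is not stated in the commit message, so the slice(0, 1) choice is this sketch's own.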
Covers: one-line Ollama install, one model recommendation per RAM tier (16/32/48/64 GB+), EVAL_TASK_TIMEOUT_MS=600000 requirement, --local vs --dry-run guidance, and pointer to elastic-agent-builder-skill-dev for advanced benchmarking orchestration. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
…view Move the --dry-run envOverrides block above the commandPreview snapshot so EVALUATION_REPETITIONS=1 and EVALUATION_DRY_RUN=true appear in the logged "Running: ..." line. Previously the preview was built before the dry-run mutation, making the logged command unreproducible when copy-pasted. Runtime behavior is unchanged — spawn() always received the correct env. Smoke test verified: node scripts/evals run --suite agent-builder --dry-run --local prints the [DRY-RUN] banner, shows all overrides in the Running: line, and Playwright starts 12 tests (1 per dataset spec) with EVALUATION_REPETITIONS=1 and EVALUATION_DRY_RUN=true set. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
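The ordering fix can be illustrated with the sketch below: the dry-run overrides are applied first, and the preview string is built afterwards. `prepareRun`, `buildCommandPreview`, and the shape of the preview string are hypothetical; only the two env var names and the banner text come from the commit messages.

```typescript
type Env = Record<string, string>;

// Hypothetical formatter for the logged "Running: ..." line.
function buildCommandPreview(envOverrides: Env, command: string, args: string[]): string {
  const envPart = Object.entries(envOverrides).map(([key, value]) => `${key}=${value}`);
  return [...envPart, command, ...args].join(' ');
}

// Hypothetical stand-in for the relevant part of the run command.
function prepareRun(options: { dryRun: boolean }, command: string, args: string[]) {
  const envOverrides: Env = {};
  const banners: string[] = [];

  // 1. Apply the dry-run overrides first...
  if (options.dryRun) {
    envOverrides.EVALUATION_REPETITIONS = '1';
    envOverrides.EVALUATION_DRY_RUN = 'true';
    banners.push('[DRY-RUN] sampling 1 example per dataset, repetitions=1');
  }

  // 2. ...then snapshot the preview, so the logged command includes them and
  //    can be copy-pasted to reproduce the run.
  const commandPreview = buildCommandPreview(envOverrides, command, args);

  return { envOverrides, banners, commandPreview };
}
```

Building the preview before step 1 would reproduce the earlier behaviour: the spawned process would still receive the overrides, but the logged line would omit them.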
…d; fix timer leak
Adds inject_local_connector.test.ts with 7 unit tests covering the
hard-fail path (no Ollama, no LM Studio, no binary installed), the
env-var injection happy path, and the binary-installed-but-not-running
path. All assertions verified: throws with actionable message, strips
--local from args before detection, never sets EVALUATION_CONNECTOR_ID
or KIBANA_TESTING_AI_CONNECTORS when no runtime is found.
Also fixes a timer resource leak in probeEndpoint / getOllamaModels /
getLmStudioModel: clearTimeout was only called on the success path; moved
to finally{} so it fires on rejection too. This eliminated the
"Jest did not exit" open-handle warning that surfaced during test authoring.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
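The timer fix described above follows a common pattern, sketched here under assumptions: the real probeEndpoint signature is not shown in this PR page, so the `Fetcher` type, the injected `fetcher` parameter, and the use of AbortController for the timeout are all illustrative.

```typescript
// Hypothetical minimal fetch-like signature, injected so the sketch needs no network.
type Fetcher = (url: string, init: { signal: AbortSignal }) => Promise<{ ok: boolean }>;

async function probeEndpoint(url: string, timeoutMs: number, fetcher: Fetcher): Promise<boolean> {
  const controller = new AbortController();
  const timer = setTimeout(() => controller.abort(), timeoutMs);
  try {
    const response = await fetcher(url, { signal: controller.signal });
    return response.ok;
  } catch {
    // Connection refused, aborted, DNS failure: treat all as "not reachable".
    return false;
  } finally {
    // Runs on both the success and the rejection path, so no pending timer
    // keeps the event loop (or the Jest worker) alive.
    clearTimeout(timer);
  }
}
```

With clearTimeout only after the awaited call, a rejected request skips it and the timer stays pending until it fires, which is consistent with the open-handle warning mentioned in the commit message.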
…lsClient Verifies that runExperiment() limits execution to the first example when EVALUATION_DRY_RUN=true (regardless of repetitions), and runs all examples when the var is absent. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Auto-generated by patryks-treadmill (per-plan worktree)
Realign kibana#265798: inline --local into @kbn/evals, retire @kbn/evals-local
Why
PR #265798 put too much responsibility on the framework too early — auto-provisioning a runtime, maintaining a model registry, running a tool-calling probe before every eval, auto-generating recommendatio
Tasks completed
- Add to x-pack/platform/packages/shared/kbn-evals/index.ts: export { injectLocalConnector } from './src/cli/inject_local_connector'
- Delete the x-pack/platform/packages/shared/kbn-evals-local/ directory via git rm -r
- Move x-pack/platform/packages/shared/kbn-evals-local/src/cli/inject.ts to x-pack/platform/packages/shared/kbn-evals/src/cli/inject_local_connector.ts, preserving all logic verbatim (hard-fail guard, process.argv sync, execFileSync model-list call)
- Remove the kbn-evals-local entry from .github/CODEOWNERS
- Remove the @kbn/evals-local path mapping from tsconfig.base.json
- Remove the @kbn/evals-local workspace entry from root package.json
- Run yarn install to regenerate yarn.lock without stale @kbn/evals-local entries
- Update scripts/evals.js to require from @kbn/evals instead of @kbn/evals-local for the --local branch, preserving the .then() chaining
- In x-pack/platform/packages/shared/kbn-evals/src/cli/commands/run.ts, replace the --dry-run early-return stub with logic that: sets EVALUATION_REPETITIONS=1 in envOverrides, sets EVALUATION_DRY_RUN=true in envOverrides, prints the [DRY-RUN] sampling 1 example per dataset, repetitions=1 banner, and falls through to spawn Playwright (~15 LOC)
- Check for EVALUATION_DRY_RUN=true and slice the examples array to [examples[0]] when true (~15–35 LOC across fixture files)
- Add a section to x-pack/platform/packages/shared/kbn-evals/README.md containing: one-line install command (brew install ollama && ollama pull <model>), one recommended model per RAM tier (16 GB / 32 GB / 48 GB / 64 GB+), required env var (EVAL_TASK_TIMEOUT_MS=600000), guidance on when to use --local vs --dry-run, and pointer to the local-evals skill in elastic-agent-builder-skill-dev for automated orchestration
- Run node scripts/eslint --fix on all changed files and verify no remaining lint errors
- Run node scripts/type_check --project x-pack/platform/packages/shared/kbn-evals/tsconfig.json and verify zero exit code
- Run node scripts/evals run --suite agent-builder --dry-run and verify 1 example per dataset executes with EVALUATION_REPETITIONS=1
- Run node scripts/evals run --suite agent-builder --local with no Ollama running and verify hard-fail with actionable error message and non-zero exit
- Run node scripts/evals run --suite agent-builder --local end-to-end
- … (a) --local connector injection in @kbn/evals, (b) configurable timeout, (c) --dry-run, (d) doc. Orchestrator + benchmark + registry moved to elastic-agent-builder-skill-dev; see ."