[Agent Builder] add basic smoke tests by pgayvallet · Pull Request #243248 · elastic/kibana

pgayvallet · 2025-11-17T16:38:16Z

Summary

Fix https://github.com/elastic/search-team/issues/11697
Inspired by #198000

Add basic "real LLM" smoke tests to make sure agent builder's basic functionalities are working as expected with our list of officially supported models.

Smoke tests are triggered when the agent-builder:run-smoke-tests label is present on the PR, or when any of the following code changes:

the GenAI stack connectors
the inference plugin and packages
the agent builder plugin or packages

This PR re-uses the set of "CI" LLM connectors which was introduced in #198000 for the AI-Infra inference tests.

pgayvallet · 2025-11-18T06:52:25Z

/ci

pgayvallet

self-review

pgayvallet · 2025-11-18T13:15:12Z

.buildkite/pipelines/agent_builder/smoke_tests.yml

+env:
+  FTR_GEN_AI: '1'
+steps:
+  - label: '👨‍🔧 Pre-Build'
+    command: .buildkite/scripts/lifecycle/pre_build.sh
+    agents:
+      image: family/kibana-ubuntu-2404


The dedicated pipeline which will be running every day

Would you like to set up a scheduled pipeline for this?
If so, find inspiration in .buildkite/pipeline-resource-definitions/kibana-api-docs.yml for example. You'll need such a pipeline resource def file (and an entry in locations.yml) so that Terrazzo can generate its own pipeline.

Thank you! Done in 8e21dd1

pgayvallet · 2025-11-18T13:15:28Z

.buildkite/pipelines/pull_request/agent_builder_smoke_tests.yml

+steps:
+  - group: Agent Builder Smoke Tests
+    key: agent-builder-smoke-tests
+    depends_on:
+      - build
+      - quick_checks
+      - checks


The conditional pipeline for the pull requests

pgayvallet · 2025-11-18T13:16:14Z

.buildkite/scripts/pipelines/pull_request/pipeline.ts

+    if (
+      (await doAnyChangesMatch([...aiInfraPaths, ...aiConnectorPaths, ...agentBuilderPaths])) ||
+      GITHUB_PR_LABELS.includes('agent-builder:run-smoke-tests') ||
+      GITHUB_PR_LABELS.includes('ci:all-gen-ai-suites')
+    ) {
+      pipeline.push(getPipeline('.buildkite/pipelines/pull_request/agent_builder_smoke_tests.yml'));
+    }


Condition on how the smoke test pipeline is executed for PRs
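As a rough illustration of the condition above, here is a minimal sketch of the same "changed paths OR trigger label" decision. The path prefixes and the `shouldRunSmokeTests` helper are hypothetical, and labels are modelled as an array; the real `aiInfraPaths`, `aiConnectorPaths`, `agentBuilderPaths` and `doAnyChangesMatch` live in the Kibana Buildkite scripts and are not reproduced here.

```typescript
// Hypothetical path prefixes, for illustration only. The real lists are
// defined in the Kibana repository and are not shown in this PR excerpt.
const watchedPrefixes: string[] = [
  'example/stack_connectors/',
  'example/inference/',
  'example/agent_builder/',
];

// The two label names come from the diff above.
const triggerLabels: string[] = ['agent-builder:run-smoke-tests', 'ci:all-gen-ai-suites'];

// Returns true when at least one changed file sits under a watched prefix,
// or when the PR carries one of the trigger labels.
function shouldRunSmokeTests(changedFiles: string[], prLabels: string[]): boolean {
  const touchesWatchedCode = changedFiles.some((file) =>
    watchedPrefixes.some((prefix) => file.startsWith(prefix))
  );
  const hasTriggerLabel = prLabels.some((label) => triggerLabels.includes(label));
  return touchesWatchedCode || hasTriggerLabel;
}
```

With this shape, a docs-only PR skips the suite unless someone adds one of the labels by hand.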

pgayvallet · 2025-11-18T13:18:49Z

x-pack/platform/test/onechat/smoke_tests/tests/index.ts

+    connectors.forEach((connector) => {
+      describe(`Connector "${connector.id}"`, () => {
+        converseApiSuite(connector, providerContext);
+      });
+    });


Run the smoke tests for each CI connector

Just so I understand, this will run for every connector that is created with this line in the config:

const preconfiguredConnectors = getPreconfiguredConnectorConfig();? And presumably, right now that's only 1 given the naming of this function is singular: getPreconfiguredConnectorConfig

There is a single config, but it contains multiple connectors (I'm just bad at naming or at plurals).

I didn't check the exact list, but from what I remember we are running against

Claude 3.5 and 3.7
Gemini 2.5 Pro
GPT-4o
Maybe others
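To make the "single config, multiple connectors" point concrete, here is a small sketch of how one config object can fan out into one suite per connector. The `ConnectorConfig` shape, the `listConnectors` and `suiteTitles` helpers, and the example ids are all assumptions for illustration; the actual return type of `getPreconfiguredConnectorConfig` is not shown in this thread.

```typescript
// Assumed connector shape; only `id` is confirmed by the test code above.
interface CiConnector {
  id: string;
  actionTypeId: string;
}

// Assumption: one config object keyed by connector id.
type ConnectorConfig = Record<string, Omit<CiConnector, 'id'>>;

// Flattens the single config into a list with one entry per connector.
function listConnectors(config: ConnectorConfig): CiConnector[] {
  return Object.entries(config).map(([id, connector]) => ({ id, ...connector }));
}

// Builds the describe() titles the suite would register, one per connector.
function suiteTitles(config: ConnectorConfig): string[] {
  return listConnectors(config).map((connector) => `Connector "${connector.id}"`);
}

// Illustrative data only; these are not the real CI connectors.
const exampleConfig: ConnectorConfig = {
  'example-connector-a': { actionTypeId: '.bedrock' },
  'example-connector-b': { actionTypeId: '.gen-ai' },
};
```

Here `suiteTitles(exampleConfig)` yields two titles, so the same converse suite would run twice, once per connector.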

pgayvallet · 2025-11-18T13:20:41Z

x-pack/platform/test/onechat/smoke_tests/tests/converse.ts

+  describe('Converse API', () => {
+    describe('sync', () => {
+      it('returns an answer for a simple message', async () => {
+        const response = await converse({
+          input: 'Hello',
+          connector_id: connectorId,
+        });
+
+        expect(response.response.message.length).to.be.greaterThan(0);
+      });
+
+      it('can execute a tool', async () => {
+        const response = await converse({
+          input: 'Please list my indices',
+          connector_id: connectorId,
+        });
+
+        expect(response.response.message.length).to.be.greaterThan(0);
+
+        const toolCalls = response.steps.filter(isToolCallStep);
+        expect(toolCalls.length).to.eql(1);
+
+        const toolCall = toolCalls[0];
+        expect(toolCall.tool_id).to.eql(platformCoreTools.listIndices);
+      });
+    });
+  });


We have stubbed API integ tests to test individual features. The goal here is just to make sure the framework "functions" properly with real LLMs.

Makes sense, I don't know what the requirements are for smoke tests or how far we should go with them
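For readers unfamiliar with the `response.steps.filter(isToolCallStep)` pattern in the test above, here is a minimal sketch of a type guard over a discriminated union. The step shapes and the `isToolCallStepSketch` name are hypothetical; the real `isToolCallStep` helper and step types come from the onechat packages and may look different.

```typescript
// Hypothetical step shapes, discriminated by `type`.
interface ToolCallStep {
  type: 'tool_call';
  tool_id: string;
}

interface ReasoningStep {
  type: 'reasoning';
  reasoning: string;
}

type Step = ToolCallStep | ReasoningStep;

// Type guard: narrows a Step to ToolCallStep so filter() returns ToolCallStep[].
function isToolCallStepSketch(step: Step): step is ToolCallStep {
  return step.type === 'tool_call';
}

// Illustrative response steps, not real LLM output.
const steps: Step[] = [
  { type: 'reasoning', reasoning: 'The user wants an index list' },
  { type: 'tool_call', tool_id: 'example.list_indices' },
];

const toolCalls = steps.filter(isToolCallStepSketch);
```

Because the guard narrows the element type, `toolCalls[0].tool_id` type-checks without a cast, which is what lets the test assert on `tool_id` directly.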

chrisbmar · 2025-11-18T13:38:40Z

x-pack/platform/test/onechat/smoke_tests/config.stateful.ts

+export default async function (ftrContext: FtrConfigProviderContext) {
+  const preconfiguredConnectors = getPreconfiguredConnectorConfig();
+
+  return createStatefulTestConfig({
+    services: oneChatApiServices,


Can you use the services that you expose in the ftrContext argument instead of importing them via import { oneChatApiServices } from '../services/api';?

That's misleading because I had to wrap createStatefulTestConfig with my own function to get the connector list, but you are supposed to pass the list of services explicitly (this is actually where you defined your list of services)

We're doing the same for our other test suites, e.g.

kibana/x-pack/platform/test/onechat_api_integration/configs/config.stateful.ts

Lines 9 to 12 in 0303293

import { oneChatApiServices } from '../../onechat/services/api';

export default createStatefulTestConfig({

services: oneChatApiServices,

chrisbmar · 2025-11-18T13:39:54Z

x-pack/platform/test/onechat/smoke_tests/tests/index.ts

+    connectors.forEach((connector) => {
+      describe(`Connector "${connector.id}"`, () => {
+        converseApiSuite(connector, providerContext);
+      });
+    });


Just so I understand, this will run for every connector that is created with this line in the config:

const preconfiguredConnectors = getPreconfiguredConnectorConfig();? And presumably, right now that's only 1 given the naming of this function is singular: getPreconfiguredConnectorConfig

chrisbmar · 2025-11-18T13:41:34Z

x-pack/platform/test/onechat/smoke_tests/tests/converse.ts

+  describe('Converse API', () => {
+    describe('sync', () => {
+      it('returns an answer for a simple message', async () => {
+        const response = await converse({
+          input: 'Hello',
+          connector_id: connectorId,
+        });
+
+        expect(response.response.message.length).to.be.greaterThan(0);
+      });
+
+      it('can execute a tool', async () => {
+        const response = await converse({
+          input: 'Please list my indices',
+          connector_id: connectorId,
+        });
+
+        expect(response.response.message.length).to.be.greaterThan(0);
+
+        const toolCalls = response.steps.filter(isToolCallStep);
+        expect(toolCalls.length).to.eql(1);
+
+        const toolCall = toolCalls[0];
+        expect(toolCall.tool_id).to.eql(platformCoreTools.listIndices);
+      });
+    });
+  });


Makes sense, I don't know what the requirements are for smoke tests or how far we should go with them

chrisbmar · 2025-11-18T13:42:47Z

.buildkite/pipelines/pull_request/agent_builder_smoke_tests.yml

+          FTR_GEN_AI: '1'
+        label: Agent Builder API Smoke Tests
+        key: ftr-agent-builder-smoke-tests
+        timeout_in_minutes: 50


In your experience testing this, how many minutes did a successful run take on average?

Atm it takes less than 10 mins to run. Now TBH this is the default timeout we have on all our FTR test pipelines, I just re-used it 😄

jedrazb · 2025-11-18T15:04:08Z

.buildkite/pipelines/agent_builder/smoke_tests.yml

+  - command: .buildkite/scripts/steps/test/ftr_configs.sh
+    env:
+      FTR_CONFIG: 'x-pack/platform/test/onechat/smoke_tests/config.stateful.ts'
+      FTR_CONFIG_GROUP_KEY: 'ftr-ai-infra-gen-ai-inference-api'


Is this a typo? Shouldn't this be ftr-agent-builder-smoke-tests?

It indeed is a typo, thanks! (not that this key is really being used for anything as long as it's unique)
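Since the only stated requirement for the key is uniqueness, a check along these lines could catch a copy-pasted key. This is a sketch with a hypothetical `findDuplicateKeys` helper, not something that exists in the Kibana CI scripts as far as this thread shows.

```typescript
// Returns every key that appears more than once, in first-duplicate order.
function findDuplicateKeys(keys: string[]): string[] {
  const seen = new Set<string>();
  const duplicates = new Set<string>();
  for (const key of keys) {
    if (seen.has(key)) {
      duplicates.add(key);
    }
    seen.add(key);
  }
  return Array.from(duplicates);
}

// Example input: the copy-pasted key from the diff above used by two steps.
const exampleKeys: string[] = [
  'ftr-ai-infra-gen-ai-inference-api',
  'ftr-agent-builder-smoke-tests',
  'ftr-ai-infra-gen-ai-inference-api',
];

const duplicateKeys = findDuplicateKeys(exampleKeys);
```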

elasticmachine · 2025-11-18T15:15:58Z

💚 Build Succeeded

Buildkite Build
Commit: 7efd725

Metrics [docs]

Unknown metric groups

ESLint disabled line counts

id	before	after	diff
`@kbn/test-suites-xpack-platform`	158	160	+2

Total ESLint disabled count

id	before	after	diff
`@kbn/test-suites-xpack-platform`	168	170	+2

delanni

with a small comment of how to set up a scheduled pipeline if needed

[Agent Builder] add basic smoke tests

c79070c

pgayvallet added the agent-builder:run-smoke-tests, release_note:skip, backport:skip, and v9.3.0 labels Nov 18, 2025

pgayvallet marked this pull request as ready for review November 18, 2025 13:03

pgayvallet requested review from a team as code owners November 18, 2025 13:03

pgayvallet added 3 commits November 18, 2025 14:17

update suite name

bde5cc7

Merge remote-tracking branch 'upstream/main' into ab-11697-smoke-tests

826b9ce

typo

7efd725

pgayvallet commented Nov 18, 2025

chrisbmar reviewed Nov 18, 2025

chrisbmar approved these changes Nov 18, 2025

jedrazb reviewed Nov 18, 2025

delanni approved these changes Nov 18, 2025

joemcelroy approved these changes Nov 18, 2025

pgayvallet added 4 commits November 19, 2025 13:33

Merge remote-tracking branch 'upstream/main' into ab-11697-smoke-tests

27b206e

add pipeline definition

8e21dd1

fix FTR key

7e54952

add one more test

42e9ff4

pgayvallet merged commit f4b00f3 into elastic:main Nov 19, 2025
12 checks passed
