[Agent Builder] add basic smoke tests #243248

Merged
pgayvallet merged 8 commits into elastic:main from pgayvallet:ab-11697-smoke-tests
Nov 19, 2025

@pgayvallet pgayvallet commented Nov 17, 2025

Summary

Fix https://github.com/elastic/search-team/issues/11697
Inspired by #198000

Add basic "real LLM" smoke tests to make sure agent builder's basic functionalities are working as expected with our list of officially supported models.

Smoke tests are triggered when the agent-builder:run-smoke-tests tag is present on the PR, or when any of the following code has changes:

  • the GenAI stack connectors
  • the inference plugin and packages
  • the agent builder plugin or packages

This PR re-uses the set of "CI" LLM connectors which was introduced in #198000 for the AI-Infra inference tests.

@pgayvallet pgayvallet added labels agent-builder:run-smoke-tests (Run agent builder's smoke tests on the PR), release_note:skip (Skip the PR/issue when compiling release notes), backport:skip (This PR does not require backporting), and v9.3.0 on Nov 18, 2025
@pgayvallet
Contributor Author

/ci

@pgayvallet pgayvallet marked this pull request as ready for review November 18, 2025 13:03
@pgayvallet pgayvallet requested review from a team as code owners November 18, 2025 13:03
Contributor Author

@pgayvallet pgayvallet left a comment

self-review

Comment on lines +1 to +7
env:
  FTR_GEN_AI: '1'
steps:
  - label: '👨‍🔧 Pre-Build'
    command: .buildkite/scripts/lifecycle/pre_build.sh
    agents:
      image: family/kibana-ubuntu-2404
Contributor Author

The dedicated pipeline which will be running every day

Contributor

Would you like to set up a scheduled pipeline for this?
If so, find inspiration in .buildkite/pipeline-resource-definitions/kibana-api-docs.yml, for example. You'll need such a pipeline resource definition file (and an entry in locations.yml) so that Terrazzo can generate its own pipeline.

Contributor Author

Thank you! Done in 8e21dd1

Comment on lines +1 to +7
steps:
  - group: Agent Builder Smoke Tests
    key: agent-builder-smoke-tests
    depends_on:
      - build
      - quick_checks
      - checks
Contributor Author

The conditional pipeline for the pull requests

Comment on lines +158 to +164
if (
  (await doAnyChangesMatch([...aiInfraPaths, ...aiConnectorPaths, ...agentBuilderPaths])) ||
  GITHUB_PR_LABELS.includes('agent-builder:run-smoke-tests') ||
  GITHUB_PR_LABELS.includes('ci:all-gen-ai-suites')
) {
  pipeline.push(getPipeline('.buildkite/pipelines/pull_request/agent_builder_smoke_tests.yml'));
}
Contributor Author

The condition that decides whether the smoke test pipeline runs for a PR
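The trigger rule (run when watched paths change or when a trigger label is present) can be sketched in isolation. This is only an illustration, not the Kibana CI script: `shouldRunSmokeTests`, `watchedPaths`, and the `example/...` paths are hypothetical, and the labels are modeled as an array here for simplicity. Only the two label names come from the snippet above.

```typescript
// Hypothetical path patterns standing in for the real path lists
// (aiInfraPaths, aiConnectorPaths, agentBuilderPaths) used by the CI script.
const watchedPaths: RegExp[] = [
  /^example\/agent_builder\//,
  /^example\/inference\//,
  /^example\/stack_connectors\//,
];

// Label names taken from the PR snippet.
const triggerLabels: string[] = ['agent-builder:run-smoke-tests', 'ci:all-gen-ai-suites'];

// Returns true when any changed file matches a watched path,
// or when the PR carries one of the trigger labels.
function shouldRunSmokeTests(changedFiles: string[], prLabels: string[]): boolean {
  const pathMatch = changedFiles.some((file) => watchedPaths.some((re) => re.test(file)));
  const labelMatch = triggerLabels.some((label) => prLabels.includes(label));
  return pathMatch || labelMatch;
}
```

Keeping the decision in a pure function like this makes the rule easy to unit test without touching git or the CI environment.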

Comment on lines +17 to +21
connectors.forEach((connector) => {
  describe(`Connector "${connector.id}"`, () => {
    converseApiSuite(connector, providerContext);
  });
});
Contributor Author

Run the smoke tests for each CI connector

Contributor

Just so I understand, this will run for every connector that is created with this line in the config:

const preconfiguredConnectors = getPreconfiguredConnectorConfig();? And presumably, right now that's only one, given that the function name is singular: getPreconfiguredConnectorConfig

Contributor Author

@pgayvallet pgayvallet Nov 18, 2025

There is a single config, but it contains multiple connectors (I'm just bad at naming or at plurals).

I didn't check the exact list, but from what I remember we are running against:

  • Claude 3.5 and 3.7
  • Gemini 2.5 Pro
  • GPT-4o
  • Maybe others
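To illustrate how a single config can still yield one suite per connector, here is a minimal sketch. The `ConnectorConfig` shape, the `listConnectors` and `suiteTitles` helpers, and the example connector ids and type ids are all hypothetical placeholders; the real preconfigured connector config may be structured differently.

```typescript
// Hypothetical shape: one config object keyed by connector id.
interface ConnectorConfig {
  name: string;
  actionTypeId: string;
}

interface AvailableConnector extends ConnectorConfig {
  id: string;
}

// Flatten the single config object into a list, one entry per connector.
function listConnectors(config: Record<string, ConnectorConfig>): AvailableConnector[] {
  return Object.keys(config).map((id) => ({ id, ...config[id] }));
}

// Compute the suite titles that a forEach/describe loop would register.
function suiteTitles(connectors: AvailableConnector[]): string[] {
  return connectors.map((connector) => `Connector "${connector.id}"`);
}

// Illustrative values only; not the actual CI connector list.
const exampleConfig: Record<string, ConnectorConfig> = {
  'example-model-a': { name: 'Example model A', actionTypeId: '.example-a' },
  'example-model-b': { name: 'Example model B', actionTypeId: '.example-b' },
};
```

With this shape, adding a connector to the config adds a suite without any change to the test file.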

Comment on lines +35 to +61
describe('Converse API', () => {
  describe('sync', () => {
    it('returns an answer for a simple message', async () => {
      const response = await converse({
        input: 'Hello',
        connector_id: connectorId,
      });

      expect(response.response.message.length).to.be.greaterThan(0);
    });

    it('can execute a tool', async () => {
      const response = await converse({
        input: 'Please list my indices',
        connector_id: connectorId,
      });

      expect(response.response.message.length).to.be.greaterThan(0);

      const toolCalls = response.steps.filter(isToolCallStep);
      expect(toolCalls.length).to.eql(1);

      const toolCall = toolCalls[0];
      expect(toolCall.tool_id).to.eql(platformCoreTools.listIndices);
    });
  });
});
Contributor Author

We have stubbed API integration tests to test individual features. The goal here is just to make sure the framework "functions" properly with real LLMs.

Contributor

Makes sense. I don't know what the requirements are for smoke tests or how far we should go with them.
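The tool-call assertion in the test above relies on filtering the response steps with a type guard. A minimal sketch of that pattern follows. The step shapes, the `type` discriminator values, and the `calledToolIds` helper are assumptions made for illustration; only the names `isToolCallStep` and `tool_id` appear in the PR, and the real types live in the agent builder packages.

```typescript
// Hypothetical step shapes, for illustration only.
interface ToolCallStep {
  type: 'tool_call';
  tool_id: string;
}

interface ReasoningStep {
  type: 'reasoning';
  reasoning: string;
}

type Step = ToolCallStep | ReasoningStep;

// Type guard: lets filter() narrow Step[] down to ToolCallStep[].
function isToolCallStep(step: Step): step is ToolCallStep {
  return step.type === 'tool_call';
}

// Returns the ids of all tools called during a conversation round.
function calledToolIds(steps: Step[]): string[] {
  return steps.filter(isToolCallStep).map((step) => step.tool_id);
}
```

Asserting on the tool id rather than on the message text keeps the check stable across models, since the wording of a real LLM answer varies between runs.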

Comment on lines +14 to +18
export default async function (ftrContext: FtrConfigProviderContext) {
  const preconfiguredConnectors = getPreconfiguredConnectorConfig();

  return createStatefulTestConfig({
    services: oneChatApiServices,
Contributor

Can you use the services exposed in the ftrContext argument instead of using import { oneChatApiServices } from '../services/api';?

Contributor Author

That's misleading because I had to wrap createStatefulTestConfig with my own function to get the connector list, but you are supposed to pass the list of services explicitly (this is actually where you define your list of services).

We're doing the same for our other test suites, e.g.

import { oneChatApiServices } from '../../onechat/services/api';
export default createStatefulTestConfig({
  services: oneChatApiServices,

    FTR_GEN_AI: '1'
  label: Agent Builder API Smoke Tests
  key: ftr-agent-builder-smoke-tests
  timeout_in_minutes: 50
Contributor

In your experience testing this, how many minutes did a successful run take on average?

Contributor Author

At the moment it takes less than 10 minutes to run. TBH, this is the default timeout we have on all our FTR test pipelines; I just re-used it 😄

- command: .buildkite/scripts/steps/test/ftr_configs.sh
  env:
    FTR_CONFIG: 'x-pack/platform/test/onechat/smoke_tests/config.stateful.ts'
    FTR_CONFIG_GROUP_KEY: 'ftr-ai-infra-gen-ai-inference-api'
Contributor

Is this a typo? Shouldn't this be ftr-agent-builder-smoke-tests?

Contributor Author

It is indeed a typo, thanks! (Not that this key is really being used for anything, as long as it's unique.)

@elasticmachine
Contributor

💚 Build Succeeded

Metrics [docs]

Unknown metric groups

ESLint disabled line counts

| id | before | after | diff |
| --- | --- | --- | --- |
| @kbn/test-suites-xpack-platform | 158 | 160 | +2 |

Total ESLint disabled count

| id | before | after | diff |
| --- | --- | --- | --- |
| @kbn/test-suites-xpack-platform | 168 | 170 | +2 |

Contributor

@delanni delanni left a comment

with a small comment on how to set up a scheduled pipeline if needed

@pgayvallet pgayvallet merged commit f4b00f3 into elastic:main Nov 19, 2025
12 checks passed
andrimal pushed a commit to andrimal/kibana that referenced this pull request Nov 20, 2025
eokoneyo pushed a commit to eokoneyo/kibana that referenced this pull request Dec 2, 2025