[Agent Builder] add basic smoke tests #243248
/ci
```yaml
env:
  FTR_GEN_AI: '1'
steps:
  - label: '👨🔧 Pre-Build'
    command: .buildkite/scripts/lifecycle/pre_build.sh
    agents:
      image: family/kibana-ubuntu-2404
```
This is the dedicated pipeline, which will run every day.
Would you like to set up a scheduled pipeline for this?
If so, you can find inspiration in `.buildkite/pipeline-resource-definitions/kibana-api-docs.yml`, for example. You'll need a pipeline resource definition file like that one (and an entry in `locations.yml`) so that Terrazzo can generate its own pipeline.
```yaml
steps:
  - group: Agent Builder Smoke Tests
    key: agent-builder-smoke-tests
    depends_on:
      - build
      - quick_checks
      - checks
```
The conditional pipeline for pull requests.
```typescript
if (
  (await doAnyChangesMatch([...aiInfraPaths, ...aiConnectorPaths, ...agentBuilderPaths])) ||
  GITHUB_PR_LABELS.includes('agent-builder:run-smoke-tests') ||
  GITHUB_PR_LABELS.includes('ci:all-gen-ai-suites')
) {
  pipeline.push(getPipeline('.buildkite/pipelines/pull_request/agent_builder_smoke_tests.yml'));
}
```
The condition that decides when the smoke test pipeline runs for PRs.
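To make the trigger rule concrete, here is a minimal sketch of the same decision as a pure function. It is not the pipeline's actual code: the real check uses Kibana's own `doAnyChangesMatch` helper and path lists, while this sketch assumes the changed files and labels are passed in explicitly and models the watched paths as plain `RegExp`s. `shouldRunSmokeTests` and the example path patterns are hypothetical.

```typescript
// Hypothetical, simplified model of the PR trigger condition above.
// Either opt-in label triggers the smoke tests on its own.
const SMOKE_TEST_LABELS: string[] = ['agent-builder:run-smoke-tests', 'ci:all-gen-ai-suites'];

function shouldRunSmokeTests(
  changedFiles: readonly string[],
  labels: readonly string[],
  watchedPaths: readonly RegExp[]
): boolean {
  // Any changed file under a watched path (connectors, inference, agent builder) triggers the suite.
  const pathMatch = changedFiles.some((file) =>
    watchedPaths.some((pattern) => pattern.test(file))
  );
  // Any of the opt-in labels triggers it as well.
  const labelMatch = labels.some((label) => SMOKE_TEST_LABELS.includes(label));
  return pathMatch || labelMatch;
}

// Usage with hypothetical path patterns:
const examplePaths = [/^hypothetical\/agent_builder\//, /^hypothetical\/inference\//];
console.log(shouldRunSmokeTests(['hypothetical/agent_builder/server/index.ts'], [], examplePaths)); // true
console.log(shouldRunSmokeTests(['docs/README.md'], [], examplePaths)); // false
```

Keeping the rule a pure OR of "paths changed" and "label present" means a PR that touches none of the watched code can still opt in through a label.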
```typescript
connectors.forEach((connector) => {
  describe(`Connector "${connector.id}"`, () => {
    converseApiSuite(connector, providerContext);
  });
});
```
Run the smoke tests for each CI connector
Just so I understand: this will run for every connector created by this line in the config, `const preconfiguredConnectors = getPreconfiguredConnectorConfig();`? And presumably right now that's only one, given that the function name is singular: `getPreconfiguredConnectorConfig`.
There is a single config, but it contains multiple connectors (I'm just bad at naming, or at plurals).
I didn't check the exact list, but from what I remember we are running against:
- Claude 3.5 and 3.7
- Gemini 2.5 Pro
- GPT-4o
- Maybe others
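A minimal sketch of how one config can fan out into one suite per connector. It assumes the config is a single object keyed by connector id, in the style of Kibana's preconfigured connectors; the field names, `toConnectorList`, `suiteTitle`, and the example ids and action type ids are illustrative assumptions, not taken from this PR.

```typescript
// Assumed shape: one config object keyed by connector id (field names are illustrative).
interface ConnectorConfig {
  name: string;
  actionTypeId: string;
}

type PreconfiguredConnectors = Record<string, ConnectorConfig>;

interface AvailableConnector extends ConnectorConfig {
  id: string;
}

// Hypothetical helper: flattens the single config into one entry per connector.
function toConnectorList(config: PreconfiguredConnectors): AvailableConnector[] {
  return Object.entries(config).map(([id, connector]) => ({ id, ...connector }));
}

// Title of the suite registered for each connector, mirroring the describe() call in the test file.
function suiteTitle(connector: AvailableConnector): string {
  return `Connector "${connector.id}"`;
}

// Hypothetical example: one config, two connectors, so two suites.
const exampleConfig: PreconfiguredConnectors = {
  'claude-3-7': { name: 'Claude 3.7', actionTypeId: '.bedrock' },
  'gpt-4o': { name: 'GPT-4o', actionTypeId: '.gen-ai' },
};
console.log(toConnectorList(exampleConfig).map(suiteTitle));
// Two titles: Connector "claude-3-7" and Connector "gpt-4o"
```

Under this assumption, the number of suites tracks the number of keys in the config, so adding a model to the CI connector set adds a suite without touching the test file.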
```typescript
describe('Converse API', () => {
  describe('sync', () => {
    it('returns an answer for a simple message', async () => {
      const response = await converse({
        input: 'Hello',
        connector_id: connectorId,
      });

      expect(response.response.message.length).to.be.greaterThan(0);
    });

    it('can execute a tool', async () => {
      const response = await converse({
        input: 'Please list my indices',
        connector_id: connectorId,
      });

      expect(response.response.message.length).to.be.greaterThan(0);

      const toolCalls = response.steps.filter(isToolCallStep);
      expect(toolCalls.length).to.eql(1);

      const toolCall = toolCalls[0];
      expect(toolCall.tool_id).to.eql(platformCoreTools.listIndices);
    });
  });
});
```
We have stubbed API integration tests to test individual features. The goal here is just to make sure the framework "functions" properly with real LLMs.
Makes sense. I don't know what the requirements are for smoke tests, or how far we should go with them.
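The tool-call test relies on `response.steps.filter(isToolCallStep)` narrowing the steps to tool calls. Here is a minimal, self-contained sketch of that pattern. The step types, `checkSingleToolCall`, and the `'list_indices'` tool id are hypothetical stand-ins; the real converse response types live in the agent builder packages and may carry more fields.

```typescript
// Hypothetical step types; the real ones may differ.
interface ToolCallStep {
  type: 'tool_call';
  tool_id: string;
  params: Record<string, unknown>;
}

interface ReasoningStep {
  type: 'reasoning';
  reasoning: string;
}

type ConversationStep = ToolCallStep | ReasoningStep;

// User-defined type guard: lets steps.filter(isToolCallStep) return ToolCallStep[].
function isToolCallStep(step: ConversationStep): step is ToolCallStep {
  return step.type === 'tool_call';
}

// Mirrors the smoke test's checks: a non-empty answer and exactly one call to the expected tool.
function checkSingleToolCall(
  message: string,
  steps: ConversationStep[],
  expectedToolId: string
): boolean {
  if (message.length === 0) return false;
  const toolCalls = steps.filter(isToolCallStep);
  return toolCalls.length === 1 && toolCalls[0].tool_id === expectedToolId;
}
```

Because the assertions only look at the shape of the response (an answer exists, one tool was called, it was the expected tool), they stay stable across models whose wording differs.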
```typescript
export default async function (ftrContext: FtrConfigProviderContext) {
  const preconfiguredConnectors = getPreconfiguredConnectorConfig();

  return createStatefulTestConfig({
    services: oneChatApiServices,
```
Can you use the services that you expose in the `ftrContext` argument instead of `import { oneChatApiServices } from '../services/api';`?
That's misleading: I had to wrap `createStatefulTestConfig` with my own function to get the connector list, but you are supposed to pass the list of services explicitly (this is actually where you define your list of services).
We're doing the same for our other test suites, e.g.
```yaml
    FTR_GEN_AI: '1'
  label: Agent Builder API Smoke Tests
  key: ftr-agent-builder-smoke-tests
  timeout_in_minutes: 50
```
In your experience testing this, how many minutes did a successful run take on average?
At the moment it takes less than 10 minutes to run. To be honest, 50 minutes is the default timeout we have on all our FTR test pipelines; I just reused it 😄
```yaml
- command: .buildkite/scripts/steps/test/ftr_configs.sh
  env:
    FTR_CONFIG: 'x-pack/platform/test/onechat/smoke_tests/config.stateful.ts'
    FTR_CONFIG_GROUP_KEY: 'ftr-ai-infra-gen-ai-inference-api'
```
Is this a typo? Shouldn't this be ftr-agent-builder-smoke-tests?
It is indeed a typo, thanks! (Not that this key is really used for anything, as long as it's unique.)
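Since the only requirement on `FTR_CONFIG_GROUP_KEY` is uniqueness, a copy-pasted key is the kind of mistake a simple check could catch, assuming the AI-Infra inference step uses that same key. This is a hypothetical sketch, not something this PR or Kibana's CI is known to do; `FtrStep` and `findDuplicateGroupKeys` are illustrative names, and the step shape is reduced to just its `env` map.

```typescript
// Hypothetical minimal step shape: only the env vars matter here.
interface FtrStep {
  env: Record<string, string>;
}

// Returns every FTR_CONFIG_GROUP_KEY value used by more than one step.
function findDuplicateGroupKeys(steps: FtrStep[]): string[] {
  const counts = new Map<string, number>();
  for (const step of steps) {
    const key = step.env.FTR_CONFIG_GROUP_KEY;
    if (key === undefined) continue; // steps without a group key are ignored
    counts.set(key, (counts.get(key) ?? 0) + 1);
  }
  return Array.from(counts.entries())
    .filter(([, count]) => count > 1)
    .map(([key]) => key);
}
```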
delanni left a comment
With a small comment on how to set up a scheduled pipeline, if needed.
## Summary

Fix elastic/search-team#11697

Inspired by elastic#198000

Add basic "real LLM" smoke tests to make sure Agent Builder's basic functionality works as expected with our list of officially supported models.

Smoke tests are triggered when the `agent-builder:run-smoke-tests` or `ci:all-gen-ai-suites` label is present on the PR, or when any of the following code has changes:
- the GenAI stack connectors
- the `inference` plugin and packages
- the agent builder plugin or packages

This PR reuses the set of "CI" LLM connectors that was introduced in elastic#198000 for the AI-Infra inference tests.