[Obs AI Assistant] Gemini prompt improvements by viduni94 · Pull Request #223476 · elastic/kibana

viduni94 · 2025-06-11T22:15:20Z

Closes https://github.com/elastic/obs-ai-assistant-team/issues/276

Summary

This PR includes the following:

- A new system prompt structure that works with the models below:
  - Claude 3.5
  - Claude 3.7
  - Claude 4
  - GPT-4o
  - GPT-4.1
  - Gemini 2.0 Flash
  - Gemini 2.5 Flash
- Refactoring of where some tool instructions are defined. Some tool instructions were moved to the system prompt because we have no way of enforcing an order for instructions in the system prompt when they are registered at the point of function registration.
- Tool names were extracted to a single file for ease of use.
- Improvements for some scenarios in the evaluation framework.
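
As a rough illustration of the two refactoring points above (tool names in one file, and instruction order fixed by the system prompt rather than by registration order), here is a minimal sketch. The `TOOL_NAMES` object, the `PromptSection` type, and `buildToolInstructions` are hypothetical names chosen for this example and are not taken from the PR; only the tool name strings themselves appear in this thread.

```typescript
// Hypothetical single source of truth for tool names.
// The actual file and identifiers in the PR may differ.
const TOOL_NAMES = {
  CONTEXT: 'context',
  SUMMARIZE: 'summarize',
  ALERTS: 'alerts',
  GET_ALERTS_DATASET_INFO: 'get_alerts_dataset_info',
  GET_APM_DATASET_INFO: 'get_apm_dataset_info',
} as const;

type ToolName = (typeof TOOL_NAMES)[keyof typeof TOOL_NAMES];

// One instruction block per tool (assumed shape, for illustration only).
interface PromptSection {
  tool: ToolName;
  instruction: string;
}

// The array order is the order in the rendered prompt, so ordering is
// decided in one place instead of depending on when a tool was registered.
function buildToolInstructions(sections: PromptSection[]): string {
  return sections
    .map((section, index) => `${index + 1}. \`${section.tool}\`: ${section.instruction}`)
    .join('\n');
}
```

With this shape, renaming a tool means changing one constant, and reordering instructions means reordering one array.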

These are the scores after the improvements:

There are still some known issues (especially in the APM area). They are still being worked on, but I would appreciate any feedback or suggestions.

Checklist

- [x] Unit or functional tests were updated or added to match the most common scenarios
- [x] The PR description includes the appropriate Release Notes section, and the correct `release_note:*` label is applied per the guidelines

viduni94 · 2025-07-28T14:08:49Z

Prompt changes in #229497 need to be pulled into this PR.

sorenlouv · 2025-07-28T21:38:36Z

...server/prompts/tests/__snapshots__/system_prompt.obs_kb_ready_doc_available_ech.test.ts.snap

+          * **Take Action:** When you detect a keyword, your primary action is to call the \`summarize\` tool. Do not just say that you will remember something.
+          * **Language:** All summaries **MUST** be generated in English.
+
+**Context Retrieval:** You can use the \`context\` tool to retrieve relevant information from the knowledge database. The response will include a \\"learnings\\" field containing information


Side note, and perhaps better for a follow-up: should we still call these "learnings" and not just "knowledge base"? It includes the internal knowledge base, search connectors, custom indices and (soon) product docs.

Updating it makes sense to me.

sorenlouv · 2025-07-28T22:08:08Z

...__snapshots__/system_prompt.obs_kb_ready_doc_available_serverless_all_functions.test.ts.snap

+
+1. **Be Proactive but Clear:** Try to fulfill the user's request directly. If essential information like a time range is missing for tools like \`alerts\` or \`get_apm_dataset_info\` first attempt to retrieve it using the \`context\` tool response. If the context does not provide it, assume a default time range of **start='now-15m'** and **end='now'**. When you use a default time range, *always inform the user* which range was used in your response (e.g., \\"Based on the last 15 minutes...\\").
+
+2. **Ask Only When Necessary:** If key information is missing or ambiguous, or if using a default seems inappropriate for the specific request, ask the user for clarification. **Exception:**  as mentioned, time range can be missing and you can assume the default time range.


Gemini suggested adding an example here.

Suggested change

2. **Ask Only When Necessary:** If key information is missing or ambiguous, or if using a default seems inappropriate for the specific request, ask the user for clarification. **Exception:** as mentioned, time range can be missing and you can assume the default time range.

2. **Ask Only When Necessary:** If key information is missing or ambiguous, or if using a default seems inappropriate for the specific request, ask the user for clarification. **Exception:** as mentioned, time range can be missing and you can assume the default time range. **Example:** If a user asks, *"Are there errors in the checkout service?"*, you should use `get_dataset_info` to find relevant fields and assume they mean fields like `error.message` or `log.level: "error"`. Do NOT ask for the specific field name. However, if a user asks, *"Why is my app slow?"*, this is too ambiguous. This is a case where you MUST ask for clarification (e.g., *"Which service are you referring to? I can then check its latency and error rate."*).
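
For readers following the time range rule quoted above: the behavior is specified in the prompt and carried out by the model, not by code. Purely to illustrate the decision order (explicit value first, then the `context` tool response, then the `now-15m` / `now` default with a note to the user), here is a minimal sketch. `TimeRange`, `ResolvedTimeRange`, and `resolveTimeRange` are hypothetical names and do not exist in the PR.

```typescript
interface TimeRange {
  start: string;
  end: string;
}

interface ResolvedTimeRange extends TimeRange {
  usedDefault: boolean;
  // Message to surface to the user when the default was applied.
  note?: string;
}

// Default taken from the quoted prompt text.
const DEFAULT_TIME_RANGE: TimeRange = { start: 'now-15m', end: 'now' };

function isComplete(range?: Partial<TimeRange>): range is TimeRange {
  return range !== undefined && typeof range.start === 'string' && typeof range.end === 'string';
}

// Hypothetical helper mirroring the order described in the prompt:
// 1. a range given in the request, 2. a range found via context, 3. the default.
function resolveTimeRange(
  fromRequest?: Partial<TimeRange>,
  fromContext?: Partial<TimeRange>
): ResolvedTimeRange {
  if (isComplete(fromRequest)) {
    return { start: fromRequest.start, end: fromRequest.end, usedDefault: false };
  }
  if (isComplete(fromContext)) {
    return { start: fromContext.start, end: fromContext.end, usedDefault: false };
  }
  return {
    ...DEFAULT_TIME_RANGE,
    usedDefault: true,
    note: 'Based on the last 15 minutes...',
  };
}
```

The `usedDefault` flag corresponds to the prompt's requirement to always tell the user which range was assumed.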

sorenlouv · 2025-07-29T15:00:11Z

...ity/plugins/observability_ai_assistant_app/scripts/evaluation/scenarios/alerts/index.spec.ts

    const result = await chatClient.evaluate(conversation, [
      'Uses the get_alerts_dataset_info function',
-      'Correctly uses the alerts function without a filter',
+      'Correctly uses the alerts function',


Why no longer a filter? I suppose it was adding a filter for "service:my-service"

It doesn't add `service:my-service` as a filter, but it sometimes adds a filter for `alert-status: active` or `rule-type: threshold`, which are valid filters.
I can rephrase this to avoid confusion.

elasticmachine · 2025-08-11T18:15:29Z

💛 Build succeeded, but was flaky

Buildkite Build
Commit: 51e38ff
Kibana Serverless Image: docker.elastic.co/kibana-ci/kibana-serverless:pr-223476-51e38ffba219

Failed CI Steps

FTR Configs #129

Test Failures

[job] [logs] FTR Configs #129 / Interactive setup APIs - Manual configuration flow without TLS should be able to configure with valid authentication code

Metrics [docs]

Module Count

Fewer modules leads to a faster build time

| id | before | after | diff |
| --- | --- | --- | --- |
| `observabilityAIAssistant` | 107 | 108 | +1 |

Public APIs missing comments

Total count of every public API that lacks a comment. Target amount is 0. Run node scripts/build_api_docs --plugin [yourplugin] --stats comments for more detailed information.

| id | before | after | diff |
| --- | --- | --- | --- |
| `observabilityAIAssistant` | 434 | 474 | +40 |
| `observabilityAIAssistantApp` | 7 | 5 | -2 |
| total | | | +38 |

Unknown metric groups

API count

| id | before | after | diff |
| --- | --- | --- | --- |
| `observabilityAIAssistant` | 436 | 476 | +40 |
| `observabilityAIAssistantApp` | 8 | 6 | -2 |
| total | | | +38 |

History

💚 Build #328357 succeeded 13bc13f
💔 Build #328345 failed 8edf472
💛 Build #328021 was flaky 1db394b
💔 Build #327904 failed 61c0732
💚 Build #327447 succeeded 5e2f90d

cc @SrdjanLL @viduni94

viduni94 · 2025-08-11T23:36:35Z

Latest eval results for ES|QL:

Current eval framework evals - https://github.com/elastic/obs-ai-assistant-team/issues/276#issuecomment-3161763946
Phoenix evals - https://github.com/elastic/obs-ai-assistant-team/issues/276#issuecomment-3177197823

Co-authored-by: Srdjan Lulic <srdjan.lulic@elastic.co>

viduni94 self-assigned this Jun 11, 2025

Gemini prompts

23c1a47

viduni94 force-pushed the improve-prompts-for-gemini-models branch from 4cf5f51 to 23c1a47 on June 16, 2025 17:13

viduni94 and others added 9 commits June 16, 2025 14:29

Add new prompt with conditions

3e159ba

Update prompt based on deployment

8679aa6

Remove unnecessary new line

f5aae43

Alerts function

6c6ea27

Elasticsearch function prompt tweaks

9569a46

Merge branch 'main' into improve-prompts-for-gemini-models

07904a1

Merge branch 'main' into improve-prompts-for-gemini-models

42d77c4

Improve system prompt for generating ES|QL

f81358f

Merge branch 'main' into improve-prompts-for-gemini-models

4015d54

viduni94 assigned SrdjanLL Jun 25, 2025

SrdjanLL and others added 17 commits June 26, 2025 14:15

Merge branch 'main' into improve-prompts-for-gemini-models

38f247f

Merge branch 'main' into improve-prompts-for-gemini-models

59b9f68

Prompt tunning for APM scenarios

18f3c1e

Merge branch 'main' into improve-prompts-for-gemini-models

d5c2906

Re-factor system prompt

ebe6a7f

Set "context" tool as non-internal in order to fix knowledge base ret…

5dcbe60

…rieval and further refine related parts of the system prompt

Merge branch 'main' into improve-prompts-for-gemini-models

eac2764

Additional system prompt changes

5de47ff

Make context tool internal again and use its function name from "comm…

8b614a4

…on" module everywhere

A few more refactors

4b9ebd7

Remove *query tools from the system prompt

21e641d

Improve KB retrieval

e8d73af

Merge branch 'main' into improve-prompts-for-gemini-models

55735ff

Update tool name

72d032f

Merge branch 'main' into improve-prompts-for-gemini-models

266979f

Add instructions to the NL-ES|QL system prompt specific to the query …

e7dd784

…execution workflow

Move documentation tool instruction to the system prompt to fix instr…

02ca4a4

…uction ordering

viduni94 added 3 commits July 28, 2025 10:11

Merge branch 'main' into improve-prompts-for-gemini-models

d46b55a

Improve instructions for the ES tool to disallow some actions

a71c1d9

Add snapshot tests

b682dfb

sorenlouv reviewed Jul 28, 2025


sorenlouv reviewed Jul 28, 2025


sphilipse approved these changes Jul 29, 2025


sorenlouv reviewed Jul 29, 2025


viduni94 added 15 commits July 30, 2025 11:56

Merge branch 'main' into improve-prompts-for-gemini-models

2366166

Merge branch 'main' into improve-prompts-for-gemini-models

0122059

Fix i18n

218351e

Fix typo

d008e38

Update snapshots

8437ff6

Merge branch 'main' into improve-prompts-for-gemini-models

c918222

Merge branch 'main' into improve-prompts-for-gemini-models

daaa0f1

Merge branch 'main' into improve-prompts-for-gemini-models

a03ef08

Merge branch 'main' into improve-prompts-for-gemini-models

5e2f90d

Small prompt tweak for time range handling

57a44b2

Merge branch 'main' into improve-prompts-for-gemini-models

61c0732

Update prompt snapshots

1db394b

Merge branch 'main' into improve-prompts-for-gemini-models

8edf472

Fix merge conflict

13bc13f

Remove extra ES tool instruction from merging main

51e38ff

viduni94 merged commit 763d503 into elastic:main Aug 11, 2025
12 checks passed

kibanamachine added the v9.2.0 label Aug 11, 2025

viduni94 merged 85 commits into elastic:main from viduni94:improve-prompts-for-gemini-models

12 participants

