
[Obs AI Assistant] Gemini prompt improvements #223476

Merged
viduni94 merged 85 commits into elastic:main from viduni94:improve-prompts-for-gemini-models
Aug 11, 2025

@viduni94
Contributor

viduni94 commented Jun 11, 2025

Closes https://github.com/elastic/obs-ai-assistant-team/issues/276

Summary

This PR includes the following:

  • A new system prompt structure designed to work with the following models:
    • Claude 3.5
    • Claude 3.7
    • Claude 4
    • GPT-4o
    • GPT-4.1
    • Gemini 2.0 Flash
    • Gemini 2.5 Flash
  • Refactoring of where some tool instructions are defined. Some tool instructions were moved into the system prompt because there is no way to enforce the order of instructions in the system prompt when they are registered at the point of function registration.
  • Tool names were extracted to a single file for ease of use.
  • Improvements for some scenarios in the evaluation framework.
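
The ordering problem behind this refactor can be illustrated with a short sketch. It is hypothetical and not the Kibana implementation: `TOOL_NAMES`, `registerInstruction`, `PROMPT_ORDER` and `buildSystemPrompt` are names invented for the example, and only the tool names themselves are taken from the prompt text in this PR. Instructions registered alongside each function end up in registration order; keeping tool names in one constants module and assembling the instructions in the system prompt builder with an explicit order makes the final prompt deterministic.

```typescript
// Hypothetical single source of truth for tool names (names taken from the prompt text in this PR).
export const TOOL_NAMES = {
  CONTEXT: 'context',
  SUMMARIZE: 'summarize',
  ALERTS: 'alerts',
  GET_ALERTS_DATASET_INFO: 'get_alerts_dataset_info',
  GET_DATASET_INFO: 'get_dataset_info',
} as const;

type ToolName = (typeof TOOL_NAMES)[keyof typeof TOOL_NAMES];

interface ToolInstruction {
  tool: ToolName;
  text: string;
}

// Instructions registered alongside functions arrive in whatever order the
// functions happen to be registered, which the prompt author cannot control.
const registered: ToolInstruction[] = [];
function registerInstruction(instruction: ToolInstruction): void {
  registered.push(instruction);
}

// Hypothetical fixed order in which the prompt author wants the model to see instructions.
const PROMPT_ORDER: ToolName[] = [
  TOOL_NAMES.CONTEXT,
  TOOL_NAMES.SUMMARIZE,
  TOOL_NAMES.GET_ALERTS_DATASET_INFO,
  TOOL_NAMES.ALERTS,
  TOOL_NAMES.GET_DATASET_INFO,
];

// Builds the prompt from an explicit order instead of registration order.
// Instructions for tools missing from PROMPT_ORDER are placed at the end.
export function buildSystemPrompt(base: string, instructions: ToolInstruction[]): string {
  const rank = (tool: ToolName): number => {
    const index = PROMPT_ORDER.indexOf(tool);
    return index === -1 ? PROMPT_ORDER.length : index;
  };
  // Copy before sorting so the caller's array is left untouched.
  const ordered = [...instructions].sort((a, b) => rank(a.tool) - rank(b.tool));
  return [base, ...ordered.map((i) => `- ${i.tool}: ${i.text}`)].join('\n');
}
```

Because `Array.prototype.sort` is stable (ES2019 and later), instructions that share a rank keep their registration order relative to each other.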

These are the scores after the improvements:

[Image: evaluation scores after the improvements]

There are still some known issues (especially in the APM area). They are still being worked on, but any feedback or suggestions would be appreciated.

Checklist

  • Unit or functional tests were updated or added to match the most common scenarios
  • The PR description includes the appropriate Release Notes section, and the correct release_note:* label is applied per the guidelines

viduni94 self-assigned this Jun 11, 2025
viduni94 force-pushed the improve-prompts-for-gemini-models branch from 4cf5f51 to 23c1a47 on June 16, 2025 17:13
@viduni94
Contributor Author

The prompt changes in #229497 need to be pulled into this PR.

* **Take Action:** When you detect a keyword, your primary action is to call the \`summarize\` tool. Do not just say that you will remember something.
* **Language:** All summaries **MUST** be generated in English.

**Context Retrieval:** You can use the \`context\` tool to retrieve relevant information from the knowledge database. The response will include a \\"learnings\\" field containing information
Contributor


Side note, and perhaps better for a follow-up: should we still call these "learnings" rather than just "knowledge base"? It includes the internal knowledge base, search connectors, custom indices and (soon) product docs.

Contributor Author


Updating it makes sense to me.


1. **Be Proactive but Clear:** Try to fulfill the user's request directly. If essential information like a time range is missing for tools like \`alerts\` or \`get_apm_dataset_info\` first attempt to retrieve it using the \`context\` tool response. If the context does not provide it, assume a default time range of **start='now-15m'** and **end='now'**. When you use a default time range, *always inform the user* which range was used in your response (e.g., \\"Based on the last 15 minutes...\\").

2. **Ask Only When Necessary:** If key information is missing or ambiguous, or if using a default seems inappropriate for the specific request, ask the user for clarification. **Exception:** as mentioned, time range can be missing and you can assume the default time range.
Contributor


Gemini suggested adding an example here.

Suggested change
2. **Ask Only When Necessary:** If key information is missing or ambiguous, or if using a default seems inappropriate for the specific request, ask the user for clarification. **Exception:** as mentioned, time range can be missing and you can assume the default time range.
2. **Ask Only When Necessary:** If key information is missing or ambiguous, or if using a default seems inappropriate for the specific request, ask the user for clarification. **Exception:** as mentioned, time range can be missing and you can assume the default time range. **Example:** If a user asks, *"Are there errors in the checkout service?"*, you should use `get_dataset_info` to find relevant fields and assume they mean fields like `error.message` or `log.level: "error"`. Do NOT ask for the specific field name. However, if a user asks, *"Why is my app slow?"*, this is too ambiguous. This is a case where you MUST ask for clarification (e.g., *"Which service are you referring to? I can then check its latency and error rate."*).
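
The time-range rule in the prompt is an instruction to the model rather than code, but the decision it describes can be sketched as a small function. This is a hypothetical illustration: `TimeRange`, `resolveTimeRange` and its two arguments are assumptions made for the example; only the default of start='now-15m' and end='now', and the idea of first checking the `context` tool response, come from the prompt text.

```typescript
// Hypothetical types; start/end are date-math strings as used in the prompt.
interface TimeRange {
  start: string;
  end: string;
}

interface ResolvedTimeRange extends TimeRange {
  // True when the default was applied: the case in which the prompt asks
  // the model to tell the user which range it used.
  usedDefault: boolean;
}

// Default taken from the prompt text: start='now-15m', end='now'.
const DEFAULT_TIME_RANGE: TimeRange = { start: 'now-15m', end: 'now' };

// Resolution order assumed from the prompt: a range given in the request,
// then one recovered from the context tool response, then the default.
// A range with only one bound is treated as missing (an assumption).
function resolveTimeRange(
  fromRequest?: Partial<TimeRange>,
  fromContext?: Partial<TimeRange>,
): ResolvedTimeRange {
  for (const candidate of [fromRequest, fromContext]) {
    const start = candidate?.start;
    const end = candidate?.end;
    if (start && end) {
      return { start, end, usedDefault: false };
    }
  }
  return { ...DEFAULT_TIME_RANGE, usedDefault: true };
}
```

For example, `resolveTimeRange()` with no inputs returns the default range with `usedDefault: true`, which corresponds to the "Based on the last 15 minutes..." disclosure the prompt requires.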

const result = await chatClient.evaluate(conversation, [
'Uses the get_alerts_dataset_info function',
'Correctly uses the alerts function without a filter',
'Correctly uses the alerts function',
Contributor


Why is there no longer a filter? I suppose it was adding a filter for "service:my-service"?

Contributor Author


It doesn't add service:my-service as a filter, but it sometimes adds a filter for alert-status: active or rule-type: threshold, which are valid filters.
I can rephrase this to avoid confusion.

@elasticmachine
Contributor

elasticmachine commented Aug 11, 2025

💛 Build succeeded, but was flaky

  • Buildkite Build
  • Commit: 51e38ff
  • Kibana Serverless Image: docker.elastic.co/kibana-ci/kibana-serverless:pr-223476-51e38ffba219

Failed CI Steps

Test Failures

  • [job] [logs] FTR Configs #129 / Interactive setup APIs - Manual configuration flow without TLS should be able to configure with valid authentication code

Metrics [docs]

Module Count

Fewer modules leads to a faster build time

| id | before | after | diff |
|---|---|---|---|
| observabilityAIAssistant | 107 | 108 | +1 |

Public APIs missing comments

Total count of every public API that lacks a comment. Target amount is 0. Run node scripts/build_api_docs --plugin [yourplugin] --stats comments for more detailed information.

| id | before | after | diff |
|---|---|---|---|
| observabilityAIAssistant | 434 | 474 | +40 |
| observabilityAIAssistantApp | 7 | 5 | -2 |
| total | | | +38 |

Unknown metric groups

API count

| id | before | after | diff |
|---|---|---|---|
| observabilityAIAssistant | 436 | 476 | +40 |
| observabilityAIAssistantApp | 8 | 6 | -2 |
| total | | | +38 |
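
The diff and total columns in these reports are consistent with simple arithmetic: each diff is after minus before, and the total is the sum of the per-plugin diffs (40 + (-2) = +38 for both the API count and the missing-comments count). A minimal sketch reproducing that; `MetricRow`, `formatDiff` and `computeDiffs` are illustrative names, not part of the Kibana CI tooling.

```typescript
interface MetricRow {
  id: string;
  before: number;
  after: number;
}

// Renders a diff as the report does: an explicit "+" for increases.
function formatDiff(diff: number): string {
  return diff > 0 ? `+${diff}` : `${diff}`;
}

// diff = after - before per row; total = sum of the per-row diffs.
function computeDiffs(rows: MetricRow[]): { diffs: Record<string, string>; total: string } {
  const diffs: Record<string, string> = {};
  let total = 0;
  for (const row of rows) {
    const diff = row.after - row.before;
    diffs[row.id] = formatDiff(diff);
    total += diff;
  }
  return { diffs, total: formatDiff(total) };
}

const apiCount = computeDiffs([
  { id: 'observabilityAIAssistant', before: 436, after: 476 },
  { id: 'observabilityAIAssistantApp', before: 8, after: 6 },
]);
console.log(apiCount.total); // prints +38
```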


cc @SrdjanLL @viduni94

@viduni94
Contributor Author

viduni94 merged commit 763d503 into elastic:main on Aug 11, 2025
12 checks passed
NicholasPeretti pushed a commit to NicholasPeretti/kibana that referenced this pull request Aug 18, 2025
Closes elastic/obs-ai-team#276

## Summary

This PR includes the following:

- A new system prompt structure that would work with the below models
  - Claude 3.5
  - Claude 3.7
  - Claude 4
  - GPT-4o
  - GPT-4.1
  - Gemini 2.0 Flash
  - Gemini 2.5 Flash
- Re-factoring around where some tool instructions are defined. Some
tool instructions were moved to the system prompt because we don't have
a way of enforcing an order for the instructions in the system prompt if
they are registered at the point of function registration.
- Tool names were extracted to a single file for ease of use.
- Improvements for some scenarios in the evaluation framework.

These are the scores after the improvements:

<img width="1572" height="320" alt="image"
src="https://github.com/user-attachments/assets/6777da63-2100-415d-8d3d-edb9e9e6c8ae"
/>


There are still some known issues (especially in the APM area). It's
still being worked on, but would appreciate any feedback/suggestions.

### Checklist

- [x] [Unit or functional
tests](https://www.elastic.co/guide/en/kibana/master/development-tests.html)
were updated or added to match the most common scenarios
- [x] The PR description includes the appropriate Release Notes section,
and the correct `release_note:*` label is applied per the
[guidelines](https://www.elastic.co/guide/en/kibana/master/contributing.html#kibana-release-notes-process)

---------

Co-authored-by: Srdjan Lulic <srdjan.lulic@elastic.co>
qn895 pushed a commit to qn895/kibana that referenced this pull request Aug 26, 2025

Labels

  • backport:skip (This PR does not require backporting)
  • ci:project-deploy-observability (Create an Observability project)
  • release_note:enhancement
  • Team:Obs AI Assistant (Observability AI Assistant)
  • Team:obs-ux-infra_services - DEPRECATED (DEPRECATED - Use Team:obs-presentation.)
  • v9.2.0
