[Obs AI Assistant] Gemini prompt improvements #223476
4cf5f51 to 23c1a47
…rieval and further refine related parts of the system prompt
…on" module everywhere
…execution workflow
Prompt changes in #229497 need to be pulled into this PR.
| * **Take Action:** When you detect a keyword, your primary action is to call the \`summarize\` tool. Do not just say that you will remember something. | ||
| * **Language:** All summaries **MUST** be generated in English. | ||
| **Context Retrieval:** You can use the \`context\` tool to retrieve relevant information from the knowledge database. The response will include a \\"learnings\\" field containing information |
Side note, and perhaps better for a follow-up: should we still call these "learnings" rather than just "knowledge base"? It includes the internal knowledge base, search connectors, custom indices and (soon) product docs.
Updating it makes sense to me.
...server/prompts/tests/__snapshots__/system_prompt.obs_kb_ready_doc_available_ech.test.ts.snap
| 1. **Be Proactive but Clear:** Try to fulfill the user's request directly. If essential information like a time range is missing for tools like \`alerts\` or \`get_apm_dataset_info\` first attempt to retrieve it using the \`context\` tool response. If the context does not provide it, assume a default time range of **start='now-15m'** and **end='now'**. When you use a default time range, *always inform the user* which range was used in your response (e.g., \\"Based on the last 15 minutes...\\"). | ||
| 2. **Ask Only When Necessary:** If key information is missing or ambiguous, or if using a default seems inappropriate for the specific request, ask the user for clarification. **Exception:** as mentioned, time range can be missing and you can assume the default time range. |
Gemini suggested adding an example here.
| 2. **Ask Only When Necessary:** If key information is missing or ambiguous, or if using a default seems inappropriate for the specific request, ask the user for clarification. **Exception:** as mentioned, time range can be missing and you can assume the default time range. | |
| 2. **Ask Only When Necessary:** If key information is missing or ambiguous, or if using a default seems inappropriate for the specific request, ask the user for clarification. **Exception:** as mentioned, time range can be missing and you can assume the default time range. **Example:** If a user asks, *"Are there errors in the checkout service?"*, you should use `get_dataset_info` to find relevant fields and assume they mean fields like `error.message` or `log.level: "error"`. Do NOT ask for the specific field name. However, if a user asks, *"Why is my app slow?"*, this is too ambiguous. This is a case where you MUST ask for clarification (e.g., *"Which service are you referring to? I can then check its latency and error rate."*). |
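The time-range fallback that these prompt rules describe (use what is available from the `context` tool, otherwise assume `now-15m` to `now` and tell the user) can be sketched in code. This is only an illustration of the decision order: the type names, the `resolveTimeRange` helper and the `source`/`notice` fields are hypothetical and do not come from the PR; only the default values and the example notice wording are taken from the prompt text.

```typescript
// Hypothetical types and helper; only the default values ('now-15m' / 'now')
// and the example notice come from the prompt text under review.
interface TimeRange {
  start: string;
  end: string;
}

interface ResolvedTimeRange extends TimeRange {
  source: 'user' | 'context' | 'default';
  // Message to surface to the user when a default was assumed.
  notice?: string;
}

const DEFAULT_TIME_RANGE: TimeRange = { start: 'now-15m', end: 'now' };

function resolveTimeRange(
  fromUser?: TimeRange,
  fromContext?: TimeRange
): ResolvedTimeRange {
  // 1. An explicit range in the user's request always wins.
  if (fromUser) {
    return { ...fromUser, source: 'user' };
  }
  // 2. Otherwise use a range recovered from the context tool response.
  if (fromContext) {
    return { ...fromContext, source: 'context' };
  }
  // 3. Fall back to the default and record that the user must be told.
  return {
    ...DEFAULT_TIME_RANGE,
    source: 'default',
    notice: 'Based on the last 15 minutes...',
  };
}
```

Under this sketch a missing time range never triggers a clarifying question, which matches the stated exception; questions are reserved for cases such as the "Why is my app slow?" example, where no sensible default exists.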
| const result = await chatClient.evaluate(conversation, [ | ||
| 'Uses the get_alerts_dataset_info function', | ||
| 'Correctly uses the alerts function without a filter', | ||
| 'Correctly uses the alerts function', |
Why no longer a filter? I suppose it was adding a filter for "service:my-service"
It doesn't add service:my-service as a filter, but it sometimes adds a filter for alert-status: active or rule-type: threshold, which are valid filters.
I can rephrase this to avoid confusion.
💛 Build succeeded, but was flaky
Latest eval results for ES|QL: Current eval framework evals - https://github.com/elastic/obs-ai-assistant-team/issues/276#issuecomment-3161763946 |
Closes elastic/obs-ai-team#276

## Summary

This PR includes the following:

- A new system prompt structure that works with the models below:
  - Claude 3.5
  - Claude 3.7
  - Claude 4
  - GPT-4o
  - GPT-4.1
  - Gemini 2.0 Flash
  - Gemini 2.5 Flash
- Refactoring of where some tool instructions are defined. Some tool instructions were moved to the system prompt because we don't have a way of enforcing an order for the instructions in the system prompt if they are registered at the point of function registration.
- Tool names were extracted to a single file for ease of use.
- Improvements for some scenarios in the evaluation framework.

These are the scores after the improvements:

<img width="1572" height="320" alt="image" src="https://github.com/user-attachments/assets/6777da63-2100-415d-8d3d-edb9e9e6c8ae" />

There are still some known issues (especially in the APM area). They are still being worked on, but any feedback or suggestions would be appreciated.

### Checklist

- [x] [Unit or functional tests](https://www.elastic.co/guide/en/kibana/master/development-tests.html) were updated or added to match the most common scenarios
- [x] The PR description includes the appropriate Release Notes section, and the correct `release_note:*` label is applied per the [guidelines](https://www.elastic.co/guide/en/kibana/master/contributing.html#kibana-release-notes-process)

---------

Co-authored-by: Srdjan Lulic <srdjan.lulic@elastic.co>
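Two points from the summary, keeping tool names in a single file and enforcing the order of tool instructions in the system prompt rather than relying on registration order, might look roughly like the sketch below. The tool name strings are the ones mentioned in this thread; `TOOL_NAMES`, `INSTRUCTION_ORDER` and `buildToolSection` are hypothetical names chosen for illustration and are not taken from the PR's code.

```typescript
// Illustrative only: the string values are tool names mentioned in this thread;
// the constant names and the builder function are hypothetical.
const TOOL_NAMES = {
  summarize: 'summarize',
  context: 'context',
  alerts: 'alerts',
  getDatasetInfo: 'get_dataset_info',
  getApmDatasetInfo: 'get_apm_dataset_info',
  getAlertsDatasetInfo: 'get_alerts_dataset_info',
} as const;

type ToolName = (typeof TOOL_NAMES)[keyof typeof TOOL_NAMES];

interface ToolInstruction {
  tool: ToolName;
  text: string;
}

// The order is declared once here, so it no longer depends on the order in
// which functions happen to be registered. This particular order is assumed.
const INSTRUCTION_ORDER: ToolName[] = [
  TOOL_NAMES.context,
  TOOL_NAMES.summarize,
  TOOL_NAMES.alerts,
];

function buildToolSection(
  instructions: ToolInstruction[],
  availableTools: ReadonlySet<ToolName>
): string {
  const byTool = new Map<ToolName, string>();
  for (const { tool, text } of instructions) {
    byTool.set(tool, text);
  }
  // Emit instructions in the declared order, skipping unavailable tools.
  return INSTRUCTION_ORDER
    .filter((tool) => availableTools.has(tool) && byTool.has(tool))
    .map((tool) => byTool.get(tool) as string)
    .join('\n\n');
}
```

With this shape, instructions passed in any order come out in the declared order, and instructions for tools that are not available in the current deployment are simply omitted.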