[Obs AI Assistant] Add intent parameter to the query function and control downstream tool calling #228456
SrdjanLL wants to merge 4 commits into elastic:main
…l choice dependeing on the user's prompt
@SrdjanLL We've had this before actually. It did work better for our specific evaluations, but I don't think we should over-index on those. My concern here is that you force the LLM to make a decision before it understands the context. E.g., once it sees "execute", it might forcefully try to execute a query even though either ES|QL doesn't support it or it cannot find the data. I would prefer that we first expand our evaluation examples with more realistic scenarios, and in general, I'd like to hold off on making changes here until we have expanded our evals. We also have #226616 in the waiting room.
Thanks for the context @dgieselaar. This change gave a notable bump to the performance, but I didn't want to throw it into an already crowded PR for prompt improvements and was hoping to get some quick feedback like this. From the workflow point of view, I didn't see this change cause any errors (such as invalid/unavailable tool calls), which was a positive. On the decision-making/reasoning constraints imposed by a forced tool call, did you have a way to test this, or can you recall a scenario from the past that prompted the update to this workflow?
I agree with this, overall (and I have some ideas that I'd happily share), but I just want to point out that even with the current evaluation scenarios, we see a pattern of hesitant tool calling with Gemini, and I was hoping to avoid yelling in the system prompt and tool descriptions, hence this PR 😅 I would like to see whether #226616 helps overcome some of this.
💔 Build Failed
No longer considering this change. Future enhancements on this will likely move us towards Agent Builder.
Relates to https://github.com/elastic/obs-ai-assistant-team/issues/276
Closes https://github.com/elastic/obs-ai-assistant-team/issues/324
Summary
Adds explicit `queryIntent` handling to the query function of the AI Assistant for more controlled downstream tool calling:
- `queryIntent` parameter (`'example' | 'data' | 'visual'`).
  - `'data'` → force `execute_query`
  - `'visual'` → force `visualize_query`
  - `'example'` → expose no execution / visualisation tools
- `toolChoice` hint to `naturalLanguageToEsql` for deterministic tool calling.
Why?
As part of the Gemini prompt improvements (PR), we found that some models are more eager than others to execute tools and that, no matter how DIRECT the system prompt is, tool execution is not deterministic. We also need to tread carefully between overly eager models (like Claude) and less eager models (like Gemini 2.0 Flash) and find a structured balance.
The change outlined above was one of the strongest contributors to the score improvements in the evaluation framework.
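For readers skimming the description, the intent-to-tool mapping from the Summary could be sketched roughly as below. This is a minimal illustration, not the code in this PR: only `queryIntent`, its three values, `execute_query`, `visualize_query` and the idea of a `toolChoice` hint come from the description. The `ToolSelection` shape and the `selectTools` helper are hypothetical names introduced for the example.

```typescript
// The three intents named in the PR description.
type QueryIntent = 'example' | 'data' | 'visual';

// Hypothetical shape (an assumption, not the actual Kibana types):
// which tools are exposed downstream, and which one, if any, is forced.
interface ToolSelection {
  tools: string[];
  toolChoice?: { function: string };
}

// Hypothetical helper: maps an intent to the downstream tool selection.
function selectTools(intent: QueryIntent): ToolSelection {
  switch (intent) {
    case 'data':
      // Force execution of the generated query.
      return { tools: ['execute_query'], toolChoice: { function: 'execute_query' } };
    case 'visual':
      // Force visualisation of the generated query.
      return { tools: ['visualize_query'], toolChoice: { function: 'visualize_query' } };
    case 'example':
      // Expose no execution / visualisation tools; the model answers in text only.
      return { tools: [] };
  }
}
```

Under these assumptions, `selectTools('data')` would force `execute_query`, while `selectTools('example')` would leave the model with no tools to call. The review concern raised in the conversation applies at exactly this point: the intent is fixed before the model has seen whether the data exists or whether ES|QL supports the request.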
Evaluation Benchmark
*execute_connector evaluation scores are available, but omitted from the summary for comparison
Running on prompt improvements branch:
Gemini:
Claude:
Running on main
Gemini (no significant changes):
Claude (no significant changes):
Testing
Identify risks
`queryIntent` in the system prompt and/or provide few-shot examples.