Re-query speaker name when multiple speaker names returned during Group Chat speaker selection #2304
Conversation
Codecov Report
Attention: Patch coverage is
Additional details and impacted files
@@ Coverage Diff @@
## main #2304 +/- ##
===========================================
+ Coverage 38.14% 50.01% +11.86%
===========================================
Files 78 78
Lines 7865 7874 +9
Branches 1683 1824 +141
===========================================
+ Hits 3000 3938 +938
+ Misses 4615 3605 -1010
- Partials 250 331 +81
Flags with carried forward coverage won't be shown. View full report in Codecov by Sentry.
Thanks for the analysis! It looks like requerying (i.e., giving the LLM a second chance) makes the selection more robust. I am wondering, instead of further parametrizing the "auto" method, can we add another speaker_selection_method, such as "auto_with_retry", which retries the selection until a single speaker is returned? Do you think this approach goes a step further to address the robustness issue? Effectively, this would be a new built-in speaker selection method. You can see how we can currently use a user-defined selection method like this: https://microsoft.github.io/autogen/docs/topics/groupchat/customized_speaker_selection
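The user-defined selection method @ekzhu links to could host the retry loop directly. Below is a minimal, library-independent sketch of the idea: `query_llm` is a stand-in callable for the actual speaker-selection LLM call, and `extract_agent_names` / `select_with_retry` are hypothetical helper names, not functions from this PR.

```python
import re

def extract_agent_names(response, agent_names):
    """Return every known agent name mentioned in an LLM response."""
    found = []
    for name in agent_names:
        # Word boundaries stop one agent name matching inside another.
        if re.search(rf"\b{re.escape(name)}\b", response):
            found.append(name)
    return found

def select_with_retry(query_llm, agent_names, max_retries=2):
    """Ask for a speaker; while the reply names zero or several agents,
    feed the ambiguous text back and ask again, up to max_retries times."""
    response = query_llm("Select the next speaker.")
    for _ in range(max_retries):
        mentioned = extract_agent_names(response, agent_names)
        if len(mentioned) == 1:
            return mentioned[0]
        response = query_llm(
            "The text below mentioned several speakers:\n"
            f"{response}\n"
            f"Reply with exactly one name from: {', '.join(agent_names)}"
        )
    raise ValueError("failed to resolve a single speaker name")

# Stubbed LLM: ambiguous first answer, clean second answer.
replies = iter([
    "Product_Manager because they speak after the Chief_Marketing_Officer.",
    "Product_Manager",
])
picked = select_with_retry(lambda prompt: next(replies),
                           ["Product_Manager", "Chief_Marketing_Officer"])
print(picked)  # Product_Manager
```

With `max_retries` exhausted, the sketch raises, which mirrors the current failure mode; an "auto_with_retry" built-in could instead fall back to another strategy.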
Thanks for the PR! Love the analysis.
Thanks for highlighting the possible approach, @ekzhu - I didn't think about it as a new method but it's definitely worth considering.

Regarding the retries until a single speaker is returned: during my testing, I found that when it didn't succeed with the first re-query, it was rare for the returned text to still have agent names or useful context to then feed back for another re-query. It is possible that we could introduce a second re-query prompt that is different to the first one (possibly simpler). Alternatively, we could just take the first mentioned name from the original response (which seems to be, more often than not, the correct one) rather than throwing an error.

New method, auto_with_retry

In terms of adding a new method, I wasn't sure how much "auto" was already used for logic within the code, and was wondering if adding a new, but similar, method would result in having to replicate those changes.

Let me know if we do want to make this change and I'll update the code.
Okay, so the manual selection process is breaking when the user does not select an agent (the response is blank or "q"). So that needs to be handled. As the on-screen direction is "enter nothing or 'q' to use auto selection", I will have it select the next agent if they don't select a valid one during the manual selection. I've committed a fix.
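The fallback described above can be sketched as a small helper. This is illustrative only: `resolve_manual_selection` is a hypothetical function name, and the actual fix in the PR may structure the logic differently inside the group-chat selection code.

```python
def resolve_manual_selection(user_input, agents, last_speaker_idx):
    """Hypothetical sketch: if the user enters nothing, 'q', or an
    invalid choice during manual selection, pick the next agent in
    round-robin order instead of failing."""
    choice = user_input.strip()
    if choice.isdigit() and 1 <= int(choice) <= len(agents):
        # A valid 1-based menu choice selects that agent directly.
        return agents[int(choice) - 1]
    # Blank, 'q', or out-of-range input: advance past the last speaker.
    return agents[(last_speaker_idx + 1) % len(agents)]

agents = ["Product_Manager", "Chief_Marketing_Officer", "Digital_Marketer"]
print(resolve_manual_selection("2", agents, 0))  # Chief_Marketing_Officer
print(resolve_manual_selection("q", agents, 2))  # Product_Manager
```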
…up Chat speaker selection (microsoft#2304)

* Added requery_on_multiple_speaker_names to GroupChat and updated _finalize_speaker to requery on multiple speaker names (if enabled)
* Removed unnecessary comments
* Update to current main
* Tweak error message.
* Comment clarity
* Expanded description of Group Chat requery_on_multiple_speaker_names
* Reworked to two-way nested chat for speaker selection with default of 2 retries.
* Adding validation of new GroupChat attributes
* Updates as per @ekzhu's suggestions
* Update groupchat
  - Added select_speaker_auto_multiple_template and select_speaker_auto_none_template
  - Added max_attempts comment
  - Re-instated support for role_for_select_speaker_messages
* Update conversable_agent.py - Added ability to force override role for a message to support select speaker prompt.
* Update test_groupchat.py - Updated existing select_speaker test functions as underlying approach has changed, added necessary tests for new functionality.
* Removed block for manual selection in select_speaker function.
* Catered for no-selection during manual selection mode

---------

Co-authored-by: Chi Wang <[email protected]>
Note: See UPDATED approach in the comment below.
Why are these changes needed?
During the speaker selection process (when in "auto" mode), the LLM returns the name of the next speaker. This is fairly reliable with OpenAI's models but with open-source/weight models they can sometimes have trouble returning, simply, the name of the next speaker. Often, they will return a sentence, paragraph, or even a large sequence of text. Furthermore, I have found that to get the correct next speaker name you often have to prompt these LLMs to provide an explanation in order to have a chain-of-thought that leads the LLM to the correct agent and the resulting response can often include the other agent names as part of the reasoning.
Currently, if there is more than one valid agent name it fails the speaker selection process by returning:
GroupChat select_speaker failed to resolve the next speaker's name. This is because the speaker selection OAI call returned: ...
This PR aims to provide a second-chance by prompting the LLM again, this time with a specific prompt text together with the returned text, asking the LLM to provide just the one agent name based on some rules. In my testing, I have found that this simpler step helps to overcome what would be a failing step and, depending on the model, this can drastically reduce the occurrence of the multiple agent names failure.
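Conceptually, the second chance is one more LLM call whose prompt embeds the ambiguous first answer. The template below is purely illustrative (the actual prompt text shipped in the PR is not reproduced in this excerpt), and `build_requery_message` is a hypothetical helper name.

```python
# Illustrative template only; the PR's actual prompt wording differs.
REQUERY_TEMPLATE = (
    "The following text was returned while selecting the next speaker:\n\n"
    "{response}\n\n"
    "From this text, respond with ONLY the name of one speaker from this "
    "list and nothing else: {agent_names}"
)

def build_requery_message(response, agent_names):
    """Embed the ambiguous first answer in a second-chance prompt."""
    return REQUERY_TEMPLATE.format(
        response=response, agent_names=", ".join(agent_names)
    )

msg = build_requery_message(
    "The next speaker, after the Chief_Marketing_Officer, is the Product_Manager.",
    ["Product_Manager", "Chief_Marketing_Officer", "Digital_Marketer"],
)
print(msg)
```

Because this simpler, tightly-scoped instruction accompanies the model's own earlier output, even weaker models tend to comply with the "one name only" constraint.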
How does it work?
The re-query logic is implemented in GroupChat._finalize_speaker.

The prompt to select the name
I have put together a prompt that performs reasonably well in selecting a speaker name. However, I foresee that this prompt will be tweaked over time and, possibly, have the option to be overridden by the user.
I tried zero-shot prompts and few-shot prompts and, oddly, the zero-shot prompt worked better.
The prompt I have is (where {name} is the response with multiple agent names):

Testing
I tried the following multiple agent-name response texts; here are the results of my testing. Tests are based on these 7 multiple agent-name responses (which I have seen during my testing of models), each paired with the expected speaker name:
- "Product_Manager because they speak after the Chief_Marketing_Officer." → Product_Manager
- "Thanks Chief_Marketing_Officer, as the Product_Manager my plan is to produce some amazing product ideas." → Product_Manager
- "Product_Manager. Here are five ideas that I think will impress the Chief_Marketing_Officer and be great for a marketing strategy for the Digital_Marketer." → Product_Manager
- "The next speaker, after the Chief_Marketing_Officer, is the Product_Manager." → Product_Manager
- "As the Product_Manager I've decided that an infotainment system that links up with your phone will have the biggest impact on the marketplace. Digital_Marketer, over to you for an amazing strategy." → Product_Manager
- "Thank you Digital_Marketer, the next speaker will be Chief_Marketing_Officer." → Chief_Marketing_Officer
- "What a great team! Let's hear from the Chief_Marketing_Officer now that the Product_Manager has some ideas." → Chief_Marketing_Officer
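As a rough baseline for the "take the first mentioned name" alternative discussed earlier in the conversation, the seven test responses above can be run through a simple first-mention heuristic. This is a sketch for comparison only, not code from the PR; `first_mentioned` is a hypothetical helper.

```python
AGENTS = ["Product_Manager", "Chief_Marketing_Officer", "Digital_Marketer"]

# (ambiguous LLM response, expected next speaker) pairs from the PR tests.
CASES = [
    ("Product_Manager because they speak after the Chief_Marketing_Officer.", "Product_Manager"),
    ("Thanks Chief_Marketing_Officer, as the Product_Manager my plan is to produce some amazing product ideas.", "Product_Manager"),
    ("Product_Manager. Here are five ideas that I think will impress the Chief_Marketing_Officer and be great for a marketing strategy for the Digital_Marketer.", "Product_Manager"),
    ("The next speaker, after the Chief_Marketing_Officer, is the Product_Manager.", "Product_Manager"),
    ("As the Product_Manager I've decided that an infotainment system that links up with your phone will have the biggest impact on the marketplace. Digital_Marketer, over to you for an amazing strategy.", "Product_Manager"),
    ("Thank you Digital_Marketer, the next speaker will be Chief_Marketing_Officer.", "Chief_Marketing_Officer"),
    ("What a great team! Let's hear from the Chief_Marketing_Officer now that the Product_Manager has some ideas.", "Chief_Marketing_Officer"),
]

def first_mentioned(text, agents):
    """Return the agent whose name appears earliest in the text (or None)."""
    positions = [(text.find(a), a) for a in agents if a in text]
    return min(positions)[1] if positions else None

hits = sum(first_mentioned(text, AGENTS) == expected for text, expected in CASES)
print(f"first-mention heuristic: {hits}/{len(CASES)} correct")
# first-mention heuristic: 4/7 correct
```

The heuristic fails whenever the response talks *about* another agent before naming the next speaker (cases 2, 4, and 6 above), which is exactly the ambiguity the LLM re-query is meant to resolve.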
Results
I believe this capability will noticeably improve the reliability of GroupChat speaker selection.
What is missing from this PR
Tests - As this PR is based on LLM responses, I need some guidance on what tests (if any) to create. Additionally, as this is focused more towards alt-models, can that be tested in any way? Finally, we would need to make sure that the responses are consistent.
Documentation - Along with the broader need for GroupChat documents (see #2243), I think this could be added to that PR as well as tips for Non-OpenAI models.
Thanks!
Related issue number
Based on shortcomings identified in #1746.
Checks
passed.