Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

add math-class group chat test #309

Closed

Conversation

LittleLittleCloud
Copy link
Collaborator

@LittleLittleCloud LittleLittleCloud commented Oct 20, 2023

Why are these changes needed?

this test is imported from #102

After #294 it's possible to test the group chat result in a more robust way.

By using function_call and adding a special prefix to messages, we can examine the conversation flow inside a group chat.

This test also provides an example for #133, which terminates group chat/two agent chat in a more controlable way

Related issue number

Checks

@codecov-commenter
Copy link

codecov-commenter commented Oct 20, 2023

Codecov Report

Attention: 6 lines in your changes are missing coverage. Please review.

Comparison is base (d542340) 28.70% compared to head (c8032f2) 37.00%.

Files Patch % Lines
autogen/agentchat/groupchat.py 14.28% 5 Missing and 1 partial ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##             main     #309      +/-   ##
==========================================
+ Coverage   28.70%   37.00%   +8.29%     
==========================================
  Files          27       27              
  Lines        3383     3389       +6     
  Branches      760      762       +2     
==========================================
+ Hits          971     1254     +283     
+ Misses       2341     2018     -323     
- Partials       71      117      +46     
Flag Coverage Δ
unittests 36.94% <14.28%> (+8.29%) ⬆️

Flags with carried forward coverage won't be shown. Click here to find out more.

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

Copy link
Contributor

@sonichi sonichi left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good structure wise.

- speaker selection should work under a continuous q&a scenario among two agents and GPT 3.5 model.
- admin should end the class when teacher has created 3 questions.
"""
skip_if_openai_not_available()
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice. This is better than my old way. @rickyloynd-microsoft @thinkall @kevin666aa FYI.

@sonichi
Copy link
Contributor

sonichi commented Oct 20, 2023

If you know which user can benefit from this PR, could you ask them to review it?

@LittleLittleCloud
Copy link
Collaborator Author

@sonichi. It's @afourney. I saw you just sent him a request

],
}

def terminate_group_chat(message):
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@afourney This is how to achieve a more robust terminating strategy via function_call, could you review it?

Copy link
Collaborator

@gagb gagb Nov 3, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@LittleLittleCloud I am having trouble understand why this strategy is more robust?

Suggested strategy: one of the agents calls a group chat termination function. user proxy executes. manager detects termination string

Old strategy: one of the agent generates a termination string. manager detects.

If an agent is smart enough for suggested strategy then it should be able to also do old strategy? Sorry if I missed the argument for the increased robustness.

Copy link
Collaborator Author

@LittleLittleCloud LittleLittleCloud Nov 3, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If an agent is smart enough for suggested strategy then it should be able to also do old strategy?

Yes! Unfortunately, that's not the case in the real world, especially for gpt-3.5-turbo. For some quite obvious reason I still use gpt-3.5-turbo most frequently and gpt-3.5-turbo agents are not very robust in generating correct termination messages.

Even for gpt-4 agents, they might also fail to give the correct termination world when the conversation grows.

The new strategy (using termination_function_call) can make sure the group chat terminate correctly when that termination function_call get triggered. That strategy also works well on gpt-3.5-turbo which fined-tuned for function_call.

Copy link
Collaborator

@gagb gagb Nov 3, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Got it -- the fact that function calls are prioritized makes it more robust. I added this comment to #525

@qingyun-wu qingyun-wu requested a review from yiranwu0 October 21, 2023 19:09
@gagb
Copy link
Collaborator

gagb commented Nov 3, 2023

Once we review and approve this PR I think we can make progress on #525 and #517. (cc @victordibia @pcdeadeasy)

Comment on lines 183 to 191

if (
type(reply) == dict
and self._is_termination_msg(reply)
or type(reply) == str
and self._is_termination_msg({"content": reply})
):
break

Copy link
Collaborator

@gagb gagb Nov 3, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@LittleLittleCloud, should these lines be after speaker.send below??

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

also, I usually use isinstance() instead of type()

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@LittleLittleCloud, should these lines be after speaker.send below??

Did this suggestion make sense?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nice catch

system_message="""You are a pre-school math teacher, you create 3 math questions for student to resolve.
Here's your workflow:
-workflow-
if question count > 3 say [COMPLETE].
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Perhaps it would be better to make the if-else grammar consistent in your code.

@qingyun-wu
Copy link
Contributor

Hi @LittleLittleCloud, do you have a plan to further update this PR? There are some conflicts in two files due to some recent updates in group chat.

@ekzhu
Copy link
Collaborator

ekzhu commented Dec 25, 2023

@LittleLittleCloud bump so this PR doesn't get abandoned :D

@gagb gagb closed this Aug 26, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

8 participants