User Proxy Spam - Unresolved Agent Collaboration #624
I've seen this too. Here's an example:
I'm seeing this also on GPT-4 Turbo; not sure if this also happens on vanilla GPT-4. Version:
Code:
Output
Please save the above script as
exitcode: 1 (execution failed)
Coder (to chat_manager): The error indicates that the `feedparser` module is not installed. Please run the following command to install it:

pip install feedparser

After installing the module, please try running the script again.
exitcode: 0 (execution succeeded)
exitcode: 0 (execution succeeded)
Okay, I briefly looked into this, and I think I know broadly the logic flow that leads to this behavior, at least for my example. At the end of the installation, the chat manager selects the user proxy as the next speaker, reasonably so, since the next action is to execute the previously failing Python code. But at that point the user proxy can't actually do that: the code was posted several messages ago, so as far as the user proxy auto-reply is concerned, there is nothing inside the lookback window set by `last_n_messages` in its code execution config, and it simply replies with its default auto-reply value (an empty string).

Edit: I found something of a hack that fixes the case I had, though I don't know how generalizable it is. Basically, I added a task manager agent for reminding/terminating the work, and had the user proxy auto-reply by asking for the next task. That's enough to get the group to rewind back to the previous code execution:

```python
import autogen
from autogen import config_list_from_models

config_list = config_list_from_models(model_list=["gpt-4-1106-preview"])
llm_config = {
    "raise_on_ratelimit_or_timeout": None,
    "request_timeout": 600,
    "seed": 42,
    "config_list": config_list,
    "temperature": 0,
}
user_proxy = autogen.UserProxyAgent(
    name="User_proxy",
    system_message="A human admin.",
    code_execution_config={"last_n_messages": 2, "work_dir": "groupchat"},
    human_input_mode="TERMINATE",
    default_auto_reply="Is there anything else left to do?",
)
tm = autogen.AssistantAgent(
    name="Task_manager",
    system_message="Keeps the group on track by reminding everyone of what needs to be done next, repeating instructions/code if necessary. Reply TERMINATE if the original task is done.",
    llm_config=llm_config,
)
coder = autogen.AssistantAgent(
    name="Coder",
    llm_config=llm_config,
)
pm = autogen.AssistantAgent(
    name="Product_manager",
    system_message="Creative in software product ideas.",
    llm_config=llm_config,
)
```
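For anyone following along, here is a minimal sketch (my own illustration, not autogen's actual implementation) of the lookback behavior described above: the proxy only scans the last N messages for code, so code posted earlier falls outside the window and the proxy falls back to its default auto-reply.

```python
def auto_reply(messages, last_n_messages=2, default_auto_reply=""):
    """Toy model of a user proxy's auto-reply: execute code if a code
    block appears within the last N messages, else return the fallback."""
    for msg in messages[-last_n_messages:]:
        if "```" in msg:  # crude stand-in for real code-block detection
            return "exitcode: 0 (execution succeeded)"
    return default_auto_reply

chat = [
    "Here is the script:\n```python\nprint('hi')\n```",   # code posted here
    "You need to run `pip install feedparser` first.",
    "Done, the module is installed.",
]

# The code block is now three messages back, outside the 2-message window,
# so the proxy emits its (empty by default) auto-reply:
print(repr(auto_reply(chat)))                      # -> ''
print(auto_reply(chat, last_n_messages=3))         # -> exitcode: 0 (execution succeeded)
```

Widening the window or setting a non-empty `default_auto_reply`, as in the hack above, is what breaks the empty-message loop.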
Correct! That's what happened.
Clever! @gagb @qingyun-wu and I discussed it before and thought this was a promising way.
Just to confirm: a failure to find the original code is definitely occurring and should be addressed. But it's also the case that the GroupChatManager keeps picking the user_proxy, and I'm not sure that's what we want either. After a few identical failures, I'd expect the GroupChatManager to change its strategy and maybe get the coder involved again.
I dug into it some more, and it feels like part of the issue is the speaker selection prompts, which are currently hardcoded. The first one is an override of `select_speaker_msg`:

```python
def select_speaker_msg(self, agents: List[Agent]):
    """Return the message for selecting the next speaker."""
    return f"""You are in a role play game. The following roles are available:
{self._participant_roles()}.
Read the following conversation.
Then select the next role from {[agent.name for agent in agents]} to play. Only return the role."""
```

This one is easy enough to change via a custom subclass. Then there's also this prompt, which is likewise hardcoded:

```
"Read the above conversation. Then select the next role from {[agent.name for agent in agents]} to play. Only return the role."
```

This probably has a bigger effect on the behavior seen here, since this prompt is attached at the very end of the chain and is the most direct instruction the LLM receives. To me, the problem is that the prompt doesn't actually give any specific instructions on which speaker to choose, so it makes sense that it doesn't handle things like empty replies very well. Perhaps that's intended given the diversity of possible group configs, but if so, it really should be a configurable parameter, probably as part of `GroupChat`.
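To make the subclass-override idea concrete, here is a toy stand-in (my own minimal classes, not autogen's real API) showing the pattern: a subclass replaces `select_speaker_msg` with a more directive prompt, e.g. one that steers the LLM away from re-selecting a speaker whose last reply was empty.

```python
from typing import List


class Agent:
    """Minimal stand-in for an agent with just a name."""
    def __init__(self, name: str):
        self.name = name


class GroupChatSketch:
    """Toy stand-in for a group chat, just enough to show the override point."""
    def select_speaker_msg(self, agents: List[Agent]) -> str:
        return (
            "You are in a role play game. "
            f"Select the next role from {[a.name for a in agents]} to play. "
            "Only return the role."
        )


class GuidedGroupChat(GroupChatSketch):
    """Subclass with extra guidance appended to the base prompt."""
    def select_speaker_msg(self, agents: List[Agent]) -> str:
        base = super().select_speaker_msg(agents)
        return base + (
            " If a role's last reply was empty, do not select that role "
            "again; pick a role that can make progress on the task."
        )


agents = [Agent("User_proxy"), Agent("Coder")]
print(GuidedGroupChat().select_speaker_msg(agents))
```

The same shape applies to a real `GroupChat` subclass, with the caveat noted below that the trailing hardcoded prompt isn't reachable this way.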
100% agree with everything here. I'll set up some benchmarks to measure progress, and we can experiment with different prompts. But the prompt templates should be made overridable, perhaps without subclassing.
Is this one the same issue?
Similar in spirit but with a different exact cause, I think, at least from glancing at the issue. That one looks like it's caused by the assistant asking the user proxy to do some non-coding task without actually sending it any code; by default, the user proxy can't do anything on its own except execute code or return the empty reply. I suppose this relates to an issue orthogonal to the technical cause: from a UX perspective, it's really confusing when the user proxy replies with an empty message, but that's the default fallback behavior when "something goes wrong", i.e. in both of these cases, some party requested the user proxy to do something it's not able to do.
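That UX point can be made concrete with a small sketch (again my own toy model, not autogen's API): the confusing blank message is just the fallback reply defaulting to an empty string, and swapping in a self-describing fallback makes the failure visible.

```python
class ProxySketch:
    """Toy model of a user-proxy reply loop."""
    def __init__(self, default_auto_reply=""):
        self.default_auto_reply = default_auto_reply

    def generate_reply(self, message):
        # The proxy can only act on code; anything else hits the fallback.
        if "```" in message:
            return "exitcode: 0 (execution succeeded)"
        return self.default_auto_reply


silent = ProxySketch()
chatty = ProxySketch(
    default_auto_reply="I can only execute code; please repost the code block."
)

# A non-coding request produces the confusing empty message by default:
print(repr(silent.generate_reply("Please summarize the results.")))  # -> ''
# A descriptive fallback at least tells the group what went wrong:
print(chatty.generate_reply("Please summarize the results."))
```

In both reported cases, a descriptive `default_auto_reply` would have surfaced the "I was asked to do something I can't do" situation instead of an empty turn.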
The only thing is that I just realized the current prompts have runtime values embedded inside them, so they might be tricky to parametrize. Personally, I'm now leaning in favor of just having a better default prompt.

Anyway, I've been thinking this over some more, and putting the prompts aside, I do think there's also a bit of a single-responsibility violation in the usual user proxy setup examples in the docs that contributes to this behavior. The user proxy is a stand-in for the user, but it's also the one that executes the code, and we'd expect different default behaviors, names, and system messages for those two roles. The GroupChatManager selects the user proxy initially to execute code, but when that goes wrong because of the blank message, it then selects the role it thinks has the authority to solve the problem, which... is the same agent. I tried separating these roles out into a code executor and a task manager, and it seems to work much better, though admittedly I've only tried this example a few times, since it's quite expensive to run without caching:

```python
import autogen
from autogen import config_list_from_models

config_list = config_list_from_models(model_list=["gpt-4-1106-preview"])
llm_config = {
    "raise_on_ratelimit_or_timeout": None,
    "request_timeout": 600,
    "seed": 42,
    "config_list": config_list,
    "temperature": 0,
}
code_execution_config = {"last_n_messages": 2, "work_dir": "groupchat"}
user_proxy = autogen.UserProxyAgent(
    name="User_proxy",
    system_message="A human admin.",
    human_input_mode="TERMINATE",
)
code_executor = autogen.UserProxyAgent(
    name="Code_Executor",
    system_message="Executes code.",
    code_execution_config=code_execution_config,
    human_input_mode="NEVER",
    default_auto_reply=f"I'm sorry, I am only able to execute code that's posted within the last {code_execution_config['last_n_messages']} messages.",
)
tm = autogen.UserProxyAgent(
    name="Task_manager",
    system_message="Keeps the group on track by reminding everyone of what needs to be done next, repeating instructions/code if necessary. Reply TERMINATE if the original task is done.",
    llm_config=llm_config,
    human_input_mode="TERMINATE",
)
coder = autogen.AssistantAgent(
    name="Coder",
    llm_config=llm_config,
)
pm = autogen.AssistantAgent(
    name="Product_manager",
    system_message="Creative in software product ideas.",
    llm_config=llm_config,
)
groupchat = autogen.GroupChat(
    agents=[code_executor, coder, pm, tm], messages=[], max_round=12
)
manager = autogen.GroupChatManager(groupchat=groupchat, llm_config=llm_config)
user_proxy.initiate_chat(
    manager,
    message="Find a latest paper about gpt-4 on arxiv and find its potential applications in software.",
)
```
The UX is not the problem; rather, I'm unable to tell what's going wrong when I can't see the context and the code execution results. For this issue, is the solution the same as above? Or, if it's a different issue, should I open a new GitHub issue for it?
The "solution" (it's really just a hack) that I posted wouldn't work for what you're seeing, I think. It's not the same root cause, even though the error looks similar: the issue in this thread is specific to GroupChat with GPT-4, whereas it looks like you and the others in #151 were running into an issue with GPT-3.5-Turbo in regular chat. I would think #151 would be the best place to continue that conversation, but apologies, I haven't been around here long enough to know whether the etiquette would be to reopen that issue or to create a new one.
Closing this issue due to inactivity. If you have further questions, please open a new issue or join the discussion in the AutoGen Discord server: https://discord.com/invite/Yb5gwGVkE5
When trying the code below with multiple agents, it goes one round and then spams user_proxy and loses context.