User Proxy Spam - Unresolved Agent Collaboration #624

Closed
jakobkellett opened this issue Nov 10, 2023 · 15 comments
Labels
group chat/teams group-chat-related issues

Comments

@jakobkellett

When running the code below with multiple agents, the chat completes one round and then repeatedly selects user_proxy (which posts empty messages) and loses context.

```python
from autogen import AssistantAgent, UserProxyAgent, config_list_from_json, GroupChat, GroupChatManager

config_list = config_list_from_json(
    "OAI_CONFIG_LIST.json",
    filter_dict={
        # "model": ["gpt-4", "gpt-4-0314", "gpt4", "gpt-4-32k", "gpt-4-32k-0314", "gpt-4-32k-v0314"],
        "model": ["gpt-4-1106-preview"]
    },
)

llm_config = {"config_list": config_list}

user_proxy = UserProxyAgent(
    name="user_proxy",
    code_execution_config=False
)

ceo = AssistantAgent(
    name="CEO",
    system_message="CEO. Visionary and business owner for a large software company. Your job is to come up with a plan for the questions that are asked.",
    llm_config=llm_config,
)
cfo = AssistantAgent(
    name="CFO",
    llm_config=llm_config,
    system_message="CFO. You are a CFO for a large software company. Your job is to incorporate financial recommendations."
)
salesperson = AssistantAgent(
    name="Salesperson",
    llm_config=llm_config,
    system_message="Salesperson. You are a salesperson for a large software company. Your job is to make recommendations on how we could sell products."
)
marketer = AssistantAgent(
    name="Marketer",
    llm_config=llm_config,
    system_message="Marketer. You are a marketer for a large software company. Your job is to come up with recommendations for marketing products."
)
groupchat = GroupChat(agents=[user_proxy, ceo, cfo, salesperson, marketer], messages=[], max_round=30)
manager = GroupChatManager(groupchat=groupchat, llm_config=llm_config)

user_proxy.initiate_chat(manager, message="We need to create a thorough business plan to adopt new products. Please collaborate and come up with a comprehensive plan.")
```
@sonichi added the "group chat/teams" label on Nov 11, 2023
@afourney
Member

I've seen this too. Here's an example:

Installing collected packages: webencodings, pytz, peewee, multitasking, appdirs, tzdata, soupsieve, six, pyparsing, pillow, packaging, lxml, kiwisolver, frozendict, fonttools, cycler, contourpy, python-dateutil, html5lib, beautifulsoup4, pandas, matplotlib, yfinance
Successfully installed appdirs-1.4.4 beautifulsoup4-4.12.2 contourpy-1.2.0 cycler-0.12.1 fonttools-4.44.0 frozendict-2.3.8 html5lib-1.1 kiwisolver-1.4.5 lxml-4.9.3 matplotlib-3.8.1 multitasking-0.0.11 packaging-23.2 pandas-2.1.3 peewee-3.17.0 pillow-10.1.0 pyparsing-3.1.1 python-dateutil-2.8.2 pytz-2023.3.post1 six-1.16.0 soupsieve-2.5 tzdata-2023.3 webencodings-0.5.1 yfinance-0.2.31


--------------------------------------------------------------------------------
user_proxy (to chat_manager):



--------------------------------------------------------------------------------
user_proxy (to chat_manager):



--------------------------------------------------------------------------------
user_proxy (to chat_manager):



--------------------------------------------------------------------------------
coder (to chat_manager):

It looks like you've successfully installed the necessary packages. Now, please run the previously provided Python script again.

@solarapparition
Collaborator

solarapparition commented Nov 13, 2023

I'm seeing this also on GPT-4 Turbo; not sure if this also happens on vanilla GPT-4.

Version:

0.1.14

Code:

```python
import autogen

config_list = autogen.config_list_from_models(model_list=["gpt-4-1106-preview"])

llm_config = {
    "raise_on_ratelimit_or_timeout": None,
    "request_timeout": 600,
    "seed": 42,
    "config_list": config_list,
    "temperature": 0,
}

user_proxy = autogen.UserProxyAgent(
    name="User_proxy",
    system_message="A human admin.",
    code_execution_config={"last_n_messages": 2, "work_dir": "groupchat"},
    human_input_mode="TERMINATE"
)
coder = autogen.AssistantAgent(
    name="Coder",
    llm_config=llm_config,
)
pm = autogen.AssistantAgent(
    name="Product_manager",
    system_message="Creative in software product ideas.",
    llm_config=llm_config,
)
groupchat = autogen.GroupChat(agents=[user_proxy, coder, pm], messages=[], max_round=12)
manager = autogen.GroupChatManager(groupchat=groupchat, llm_config=llm_config)

user_proxy.initiate_chat(manager, message="Find a latest paper about gpt-4 on arxiv and find its potential applications in software.")
```

Output

User_proxy (to chat_manager):

Find a latest paper about gpt-4 on arxiv and find its potential applications in software.

--------------------------------------------------------------------------------
Coder (to chat_manager):

To find the latest paper about GPT-4 on arXiv, we can use the arXiv API to search for papers that mention "GPT-4" in their metadata. I will write a Python script that queries the arXiv API for the latest papers on GPT-4 and prints out the title, authors, summary, and URL of the most recent paper. After we have the information about the paper, I can analyze the summary to identify potential applications in software.

Let's start with the Python script to query the arXiv API:

```python
# filename: arxiv_gpt4_search.py
import urllib.request
import urllib.parse
import feedparser

# Base API query URL
base_url = 'http://export.arxiv.org/api/query?'

# Search parameters
search_query = 'all:gpt-4'  # search for the term 'gpt-4' in all fields
start = 0                    # start at the first result
max_results = 1              # maximum results to return

# Construct the query with the search parameters
query = f'search_query={search_query}&start={start}&max_results={max_results}&sortBy=submittedDate&sortOrder=descending'
url = base_url + query

# Perform the GET request to the arXiv API
with urllib.request.urlopen(url) as response:
    response_text = response.read()

# Parse the response using feedparser
feed = feedparser.parse(response_text)

# Print out the information of the latest paper on GPT-4
if feed.entries:
    entry = feed.entries[0]
    print(f"Title: {entry.title}")
    print(f"Authors: {', '.join(author.name for author in entry.authors)}")
    print(f"Summary: {entry.summary}")
    print(f"URL: {entry.link}")
else:
    print("No papers found on GPT-4.")
```

Please save the above script as arxiv_gpt4_search.py and run it. After we get the information about the latest paper, I will proceed to identify its potential applications in software based on the summary provided.


USING AUTO REPLY...

EXECUTING CODE BLOCK 0 (inferred language is python)...
WARNING:autogen.code_utils:execute_code was called without specifying a value for use_docker. Since the python docker package is not available, code will be run natively. Note: this fallback behavior is subject to change
User_proxy (to chat_manager):

exitcode: 1 (execution failed)
Code output:
Traceback (most recent call last):
File "arxiv_gpt4_search.py", line 4, in
import feedparser
ModuleNotFoundError: No module named 'feedparser'


Coder (to chat_manager):

The error indicates that the feedparser module is not installed in your Python environment. We need to install this module before running the script. You can install feedparser using pip, which is the package installer for Python.

Please run the following command to install the feedparser module:

```sh
pip install feedparser
```

After installing the module, please try running the arxiv_gpt4_search.py script again. If you encounter any further issues, please let me know.


USING AUTO REPLY...

EXECUTING CODE BLOCK 0 (inferred language is sh)...
WARNING:autogen.code_utils:execute_code was called without specifying a value for use_docker. Since the python docker package is not available, code will be run natively. Note: this fallback behavior is subject to change
User_proxy (to chat_manager):

exitcode: 0 (execution succeeded)
Code output:
Collecting feedparser
Downloading feedparser-6.0.10-py3-none-any.whl (81 kB)
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 81.1/81.1 kB 1.7 MB/s eta 0:00:00
Collecting sgmllib3k (from feedparser)
Downloading sgmllib3k-1.0.0.tar.gz (5.8 kB)
Installing build dependencies: started
Installing build dependencies: finished with status 'done'
Getting requirements to build wheel: started
Getting requirements to build wheel: finished with status 'done'
Preparing metadata (pyproject.toml): started
Preparing metadata (pyproject.toml): finished with status 'done'
Building wheels for collected packages: sgmllib3k
Building wheel for sgmllib3k (pyproject.toml): started
Building wheel for sgmllib3k (pyproject.toml): finished with status 'done'
Created wheel for sgmllib3k: filename=sgmllib3k-1.0.0-py3-none-any.whl size=6049 sha256=bacd5d058c920369a3fb57b193cda4f677cc7e8ba9e5dc81ba304a4f5940d6ff
Stored in directory: /Users/solarapparition/Library/Caches/pip/wheels/3b/25/2a/105d6a15df6914f4d15047691c6c28f9052cc1173e40285d03
Successfully built sgmllib3k
Installing collected packages: sgmllib3k, feedparser
Successfully installed feedparser-6.0.10 sgmllib3k-1.0.0


USING AUTO REPLY...

EXECUTING CODE BLOCK 0 (inferred language is sh)...
WARNING:autogen.code_utils:execute_code was called without specifying a value for use_docker. Since the python docker package is not available, code will be run natively. Note: this fallback behavior is subject to change
User_proxy (to chat_manager):

exitcode: 0 (execution succeeded)
Code output:
Requirement already satisfied: feedparser in /Users/solarapparition/repos/hivemind/.venv/lib/python3.11/site-packages (6.0.10)
Requirement already satisfied: sgmllib3k in /Users/solarapparition/repos/hivemind/.venv/lib/python3.11/site-packages (from feedparser) (1.0.0)


USING AUTO REPLY...
User_proxy (to chat_manager):


USING AUTO REPLY...
User_proxy (to chat_manager):


USING AUTO REPLY...
User_proxy (to chat_manager):


USING AUTO REPLY...
User_proxy (to chat_manager):


USING AUTO REPLY...
User_proxy (to chat_manager):


USING AUTO REPLY...
User_proxy (to chat_manager):

@solarapparition
Collaborator

solarapparition commented Nov 13, 2023

Okay, I briefly looked into this and I think I know broadly the logic flow that is leading to this behavior, at least for my example.

Basically, at the end of the installation, the chat manager selects the user proxy as the next speaker, which is reasonable since the next action is to run the previously failing Python script. But the user proxy can't actually do that: the code was posted several messages ago, outside the lookback window set by last_n_messages in its code execution config. With nothing to execute, it falls back to its default auto-reply value of "".
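To make the flow concrete, here's a minimal, self-contained sketch of the lookback behavior described above. This is not AutoGen's actual implementation; the `auto_reply` function and its message format are hypothetical, but the windowing logic mirrors the `last_n_messages` setting.

```python
# Hypothetical sketch of the lookback behavior described above;
# this is NOT AutoGen's actual implementation.
def auto_reply(messages, last_n_messages=2, default_auto_reply=""):
    """Scan only the most recent `last_n_messages` for a code block."""
    for msg in reversed(messages[-last_n_messages:]):
        if "```" in msg:
            # A code block is inside the window, so it would be executed.
            return "exitcode: 0 (execution succeeded)"
    # The code block has scrolled out of the window: nothing to do, so
    # the agent falls back to the (empty) default auto-reply.
    return default_auto_reply

history = [
    "```python\nprint('hello')\n```",     # code posted 3 messages ago
    "pip install feedparser",             # install instruction
    "Successfully installed feedparser",  # install output
]
print(repr(auto_reply(history)))  # -> '' (the empty "spam" message)
```

With a window of 2, the code block three messages back is invisible to the executor, which is exactly why the group chat sees a stream of blank user_proxy turns.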

Edit: I found something of a hack that fixes the case I had, though I don't know how generalizable it is. Basically, I added a task manager agent for reminding the group of the work (and terminating it), and had the user proxy auto-reply by asking for the next task. That's enough to get the group to rewind back to the previous code execution:

Code

```python
import autogen

config_list = autogen.config_list_from_models(model_list=["gpt-4-1106-preview"])

llm_config = {
    "raise_on_ratelimit_or_timeout": None,
    "request_timeout": 600,
    "seed": 42,
    "config_list": config_list,
    "temperature": 0,
}
user_proxy = autogen.UserProxyAgent(
    name="User_proxy",
    system_message="A human admin.",
    code_execution_config={"last_n_messages": 2, "work_dir": "groupchat"},
    human_input_mode="TERMINATE",
    default_auto_reply="Is there anything else left to do?",
)
tm = autogen.AssistantAgent(
    name="Task_manager",
    system_message="Keeps the group on track by reminding everyone of what needs to be done next, repeating instructions/code if necessary. Reply TERMINATE if the original task is done.",
    llm_config=llm_config,
)
coder = autogen.AssistantAgent(
    name="Coder",
    llm_config=llm_config,
)
pm = autogen.AssistantAgent(
    name="Product_manager",
    system_message="Creative in software product ideas.",
    llm_config=llm_config,
)
# (groupchat/manager setup continues as in the original example)
```
@sonichi
Contributor

sonichi commented Nov 13, 2023

Correct! That's what happened.

@sonichi
Contributor

sonichi commented Nov 13, 2023

Clever! @gagb @qingyun-wu and I discussed this before and thought it was a promising approach.

@afourney
Member

Just to confirm, a failure to find the original code is 100% occurring and should be addressed. But, it's also the case that the GroupChatManager keeps picking the user_proxy... and I'm not sure that's what we want either. After a few identical failures, I'd expect the GroupChatManager to change its strategy and maybe get the coder involved again.

@solarapparition
Collaborator

solarapparition commented Nov 13, 2023

Just to confirm, a failure to find the original code is 100% occurring and should be addressed. But, it's also the case that the GroupChatManager keeps picking the user_proxy... and I'm not sure that's what we want either. After a few identical failures, I'd expect the GroupChatManager to change its strategy and maybe get the coder involved again.

I dug into it some more and it feels like part of the issue is the speaker selection prompts. These are currently hardcoded in GroupChat, and consist of two parts.

The first one is an override of the GroupChatManager's system prompt:

```python
def select_speaker_msg(self, agents: List[Agent]):
    """Return the message for selecting the next speaker."""
    return f"""You are in a role play game. The following roles are available:
{self._participant_roles()}.

Read the following conversation.
Then select the next role from {[agent.name for agent in agents]} to play. Only return the role."""
```

This one is easy enough to change via a custom subclass, but since this method only exists to mutate the original GroupChatManager system message, it feels redundant, and we'd be better served to just change the default GCM system message instead.

Then there's also this, which is hardcoded in GroupChat.select_speaker, and gets attached to the end of the GCM message list:

"Read the above conversation. Then select the next role from {[agent.name for agent in agents]} to play. Only return the role."

This probably has a bigger effect on the behavior seen here since the prompt's attached to the very end of the chain and is the most direct instruction that the LLM has. To me the problem is that the prompt doesn't actually give any specific instructions on which speaker to choose, and so it makes sense that it doesn't handle things like empty replies very well.

Perhaps that's intended given the diversity of possible group configs, but if so it really should be a configurable parameter, and probably as part of GroupChatManager instead of GroupChat.
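For experimentation, the prompt can be overridden in a subclass today. The sketch below uses a minimal stand-in class rather than autogen.GroupChat so it runs standalone, and the alternative prompt wording (discouraging re-selection after an empty message) is purely illustrative:

```python
# Sketch of overriding select_speaker_msg. GroupChatStub is a hypothetical
# stand-in for autogen.GroupChat so this example runs without AutoGen.
from typing import List


class GroupChatStub:
    """Minimal stand-in for autogen.GroupChat."""

    def __init__(self, agent_names: List[str]):
        self.agent_names = agent_names

    def _participant_roles(self) -> str:
        return "\n".join(self.agent_names)


class SteeredGroupChat(GroupChatStub):
    def select_speaker_msg(self, agents: List[str]) -> str:
        # More direct guidance than the default "role play" prompt:
        # explicitly discourage re-selecting a speaker that just sent
        # an empty message.
        return (
            "You are coordinating a team with these roles:\n"
            f"{self._participant_roles()}.\n\n"
            "Read the conversation, then select the next role from "
            f"{agents} to speak. If the previous message was empty, do not "
            "select the same role again. Only return the role."
        )


chat = SteeredGroupChat(["User_proxy", "Coder", "Product_manager"])
prompt = chat.select_speaker_msg(["User_proxy", "Coder", "Product_manager"])
```

In a real subclass of GroupChat the same override would be picked up by the manager's speaker-selection call; whether this wording actually reduces the repeated user_proxy selection would need to be benchmarked.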

@afourney
Member

100% agree with everything here. I'll set up some benchmarks to measure progress, and we can experiment with different prompts... but the prompt templates should be made overridable, perhaps without subclassing.

@AugustusLZJ

Is this one the same issue?
#151

@solarapparition
Collaborator

Is this one the same issue? #151

Similar in spirit but with a different exact cause, I think, at least from glancing at the issue. That one looks like it's caused by the assistant asking the user proxy to do some non-coding task without actually sending it any code. Basically, by default, the user proxy can't do anything on its own except execute code or return the empty reply.

I suppose this points at an orthogonal, UX-level issue: it's really confusing when the user proxy replies with an empty message, yet that's the default fallback behavior when "something goes wrong", i.e. in both of these cases some party asked the user proxy to do something it isn't able to do. I kind of wish the default_auto_reply were something like "I'm sorry, I'm only able to execute code in the last {n} messages and cannot perform any other task", to at least give some feedback to the other agents, but perhaps that would break existing behavior that relies on the empty auto-reply.

@solarapparition
Collaborator

100% agree with everything here. I'll set up some benchmarks to measure progress here and we can experiment with different prompts... but the prompt templates should be made overridable, perhaps, without subclassing.

The only thing is that I just realized the current prompts have runtime values embedded inside them, so they might be tricky to parametrize. Personally I'm now leaning toward just having a better default prompt.

Anyway, I've been thinking this over some more. Putting aside the GroupChatManager prompting, it seems plausible that in these examples the user proxy keeps getting selected because its name identifies it as a "User" (and it shows up as such in the GCM's system prompt), and LLMs tend to defer to humans by default when something goes wrong.

I also think there's a bit of a single-responsibility violation in the usual user proxy setup in the docs that contributes to this behavior. The user proxy is a stand-in for the user, but it is also the agent that executes code, and we'd expect different default behaviors, names, and system messages for those two roles. The GCM initially selects the user proxy to execute code, but when that goes wrong because of the blank message, it then selects the role it thinks has the authority to solve the problem, which... is the same agent.

I tried separating these roles into a code executor and a task manager, and it seems to work much better, though admittedly I've only tried this example a few times, since it's quite expensive to run without caching...

```python
import autogen

config_list = autogen.config_list_from_models(model_list=["gpt-4-1106-preview"])

llm_config = {
    "raise_on_ratelimit_or_timeout": None,
    "request_timeout": 600,
    "seed": 42,
    "config_list": config_list,
    "temperature": 0,
}
code_execution_config = {"last_n_messages": 2, "work_dir": "groupchat"}
user_proxy = autogen.UserProxyAgent(
    name="User_proxy",
    system_message="A human admin.",
    human_input_mode="TERMINATE",
)
code_executor = autogen.UserProxyAgent(
    name="Code_Executor",
    system_message="Executes code.",
    code_execution_config=code_execution_config,
    human_input_mode="NEVER",
    default_auto_reply=f"I'm sorry, I am only able to execute code that's posted within the last {code_execution_config['last_n_messages']} messages.",
)
tm = autogen.UserProxyAgent(
    name="Task_manager",
    system_message="Keeps the group on track by reminding everyone of what needs to be done next, repeating instructions/code if necessary. Reply TERMINATE if the original task is done.",
    llm_config=llm_config,
    human_input_mode="TERMINATE",
)
coder = autogen.AssistantAgent(
    name="Coder",
    llm_config=llm_config,
)
pm = autogen.AssistantAgent(
    name="Product_manager",
    system_message="Creative in software product ideas.",
    llm_config=llm_config,
)
groupchat = autogen.GroupChat(
    agents=[code_executor, coder, pm, tm], messages=[], max_round=12
)
manager = autogen.GroupChatManager(groupchat=groupchat, llm_config=llm_config)
user_proxy.initiate_chat(
    manager,
    message="Find a latest paper about gpt-4 on arxiv and find its potential applications in software.",
)
```

@AugustusLZJ

Is this one the same issue? #151

Similar in spirit but different exact cause I think, at least from just glancing at the issue. That one looks like it's caused by the assistant asking the user proxy to do some non-coding task, and didn't actually send it any code—basically, by default, the user proxy can't do anything on its own except execute code and returning the empty reply.

I suppose this relates to something of an orthogonal issue from the technical cause—essentially, from a UX perspective, it's really confusing when the user proxy replies with an empty message, but that's the default fallback behavior when "something goes wrong", i.e. in both of these cases, some party requested the user proxy to do something it's not able to do. I kind of wish the default_auto_reply would be something like "I'm sorry, I'm only able to execute code in the last {n} messages and cannot perform any other task" to at least give some feedback to the other agents, but perhaps that would break some existing behavior that relies on the empty autoreply.

The UX isn't my concern; the problem is that I can't tell what's going wrong when I can't see the context and code execution results. For this issue, is the solution the same as above? Or, if it's a different issue, should I open a new GitHub issue for it?

@afourney
Member

Here's another report:
[screenshot attached]

@solarapparition
Collaborator

The UX is not a problem, but I am unable to see what's going wrong if I cannot see the context and code execution results. For this issue, is it the same solution as above? Or if it is another issue, shall I open a new github issue for it?

The "solution" (it's really just a hack) that I posted wouldn't work for what you're seeing, I think. It's not the same root cause, even though the error looks similar: the issue in this thread is specific to GroupChat with GPT-4, whereas it looks like you and the others in #151 were hitting an issue with GPT-3.5-Turbo in a regular chat.

I would think #151 would be the best place to continue that conversation, but apologies, I haven't been around here long enough to know whether the etiquette would be to reopen that issue or create a new one.

@thinkall
Collaborator

Closing this issue due to inactivity. If you have further questions, please open a new issue or join the discussion in AutoGen Discord server: https://discord.com/invite/Yb5gwGVkE5

jackgerrits pushed a commit that referenced this issue Oct 2, 2024
rename the main classes and mixup folder structure
move some things from samples into core
cleanup cross-deps
cleanup grpc deps