
Introducing Looping Heuristics / Detection #3668

Closed
1 task done
Boostrix opened this issue May 2, 2023 · 11 comments
Labels: AI efficacy, enhancement, meta

Comments

@Boostrix
Contributor

Boostrix commented May 2, 2023

Duplicates

  • I have searched the existing issues

Summary 💡

This is a "meta" issue to keep track of issues relating to redundant/unnecessary (infinite/endless) looping and the idea to keep track of previous arguments to detect such situations, as per: #3444 (comment)

Idea: maintain a "call stack" that contains hashed values of each query/prompt. Whenever the same hash comes up again, increment a counter to detect whether we're inside a loop without making much progress (the arguments will remain the same, and so will the hash). If we are not making any progress at all, the response will also be the same, so hash that too.

This should also help the agent determine if it's trying to re-solve a task that was previously tackled.
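For illustration, a minimal sketch of this idea in Python (the class and method names are hypothetical, not AutoGPT's actual API): hash each prompt/response pair and count how often the same pair recurs.

```python
import hashlib
from collections import Counter

class LoopDetector:
    """Counts repeated prompt/response pairs to flag a likely loop."""

    def __init__(self, max_repeats: int = 3):
        self.max_repeats = max_repeats
        self.seen = Counter()

    @staticmethod
    def _digest(*parts: str) -> str:
        h = hashlib.sha256()
        for part in parts:
            h.update(part.encode("utf-8"))
        return h.hexdigest()

    def record(self, prompt: str, response: str) -> bool:
        """Record one step; return True once it has repeated too often."""
        key = self._digest(prompt, response)
        self.seen[key] += 1
        return self.seen[key] >= self.max_repeats
```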

Solution: a sub-agent should notify its parent agent, either via the messaging API or by raising the equivalent of an exception, so that it can be terminated/restarted: #1548 (comment)
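A hypothetical sketch of that escalation path (the agent interface shown here is assumed, not AutoGPT's real one): the sub-agent raises a dedicated exception when the detector trips, and the parent catches it to terminate or restart the child.

```python
class LoopDetected(Exception):
    """Raised by a sub-agent that has detected it is stuck in a loop."""

def run_sub_agent(agent, detector) -> None:
    # Hypothetical driver loop: step the agent, bail out on a detected loop.
    while not agent.done:
        prompt = agent.next_prompt()
        response = agent.step(prompt)
        if detector.record(prompt, response):
            raise LoopDetected(f"{agent.name}: same step repeated too often")

# The parent agent can then decide what to do:
#   try:
#       run_sub_agent(child, LoopDetector())
#   except LoopDetected:
#       ...  # terminate or restart the child, or ask the user
```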

For top-level agents, it's probably best to interrupt the loop and pursue an alternate option, which may involve human feedback: #3396
The cleanest method might be offering the user a list of options (inspired by the current state of things, as per #1548), including an option for open-ended feedback.

This feedback should be serialized in some form, so that the agent can easily refer back to it, as per #1377.
The goal is to provide a means of doing some form of self-assessment, as per #305.
This may involve telling the agent to log its progress to a task-specific log file, so that a parent agent can evaluate the log file and compare it to the stated long-term goal.

Examples 🌈

This is just based on hashing full thoughts + new decision (command + args) and incrementing a counter every time we see the same "situation":
[screenshot: bailout]

Motivation 🔦

  • detect whether an agent is trying to tackle a task that it tackled previously
  • detect whether it's using the same arguments and seeing the same response (=being stuck)
  • get rid of unnecessary looping
  • allow an agent to detect whether its work is in line with the stated goal or not
  • provide a means to bail out if necessary, either informing the parent agent and/or asking for human feedback
  • at the very least, use this as a means to change the problem-solving strategy
@zachary-kaelan

We could have a compact list of previously completed tasks fed into the prompt every iteration. But a more robust solution would be to make memory queries happen automatically every iteration and feed their top N results into the prompt as, "You remember X, Y, Z."
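A rough sketch of that second idea (`memory.get_relevant` mirrors the old Auto-GPT memory interface, but treat the exact signature as an assumption):

```python
def augment_prompt(prompt: str, memory, n: int = 3) -> str:
    """Query memory every iteration and prepend the top-N hits to the prompt."""
    results = memory.get_relevant(prompt, n)
    if not results:
        return prompt
    remembered = "; ".join(str(r) for r in results)
    return f"You remember: {remembered}\n\n{prompt}"
```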

@Boostrix
Contributor Author

Boostrix commented May 5, 2023

I've tried using a separate interpretation step to interpret the result of an action and modify the plan accordingly; that worked at least somewhat better than before. However, a number of folks have now mentioned two issues relating to Pinecone memory and self-feedback not working, so maybe what I'm seeing isn't currently representative.

@anonhostpi

I think it would be worth creating an issue label for this as well. That may prevent the need for this "meta" issue.

@anonhostpi

anonhostpi commented May 5, 2023

There should also be a tag for JSON issues: https://github.com/Significant-Gravitas/Auto-GPT/labels/invalid_json

Every 3rd notification I get is about issues with JSON

@Boostrix
Contributor Author

Boostrix commented May 11, 2023

This is just based on hashing full thoughts + new decision (command + args) and incrementing a counter every time we see the same "situation":
[screenshot: bailout]

For starters:

  • should add a counter to track the number of agent invocations where the same request/response results in the same local action
  • support an agent-specific setting to restrict this to MAX_ITERATIONS_IDENTICAL_STEPS (or via the env file)
  • also, should probably consider a configurable TIMEOUT_SECS, so that the action is interrupted with a timeout error and an exact message stating that it's doing something redundant (see the sketch below)

And this stuff needs to work per agent instance, so that sub-agents can be set up accordingly.
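A sketch of what such a per-instance guard could look like, reading the two settings named above from the environment (class name and defaults are assumptions):

```python
import os
import time

class AgentLoopGuard:
    """One guard per agent instance, so sub-agents can be configured independently."""

    def __init__(
        self,
        max_identical: int = int(os.getenv("MAX_ITERATIONS_IDENTICAL_STEPS", "3")),
        timeout_secs: float = float(os.getenv("TIMEOUT_SECS", "120")),
    ):
        self.max_identical = max_identical
        self.timeout_secs = timeout_secs
        self.started = time.monotonic()
        self.last_key = None
        self.repeats = 0

    def check(self, key: str) -> None:
        # Wall-clock timeout for the whole action.
        if time.monotonic() - self.started > self.timeout_secs:
            raise TimeoutError("action exceeded TIMEOUT_SECS")
        # Count consecutive invocations with the identical request/response key.
        self.repeats = self.repeats + 1 if key == self.last_key else 1
        self.last_key = key
        if self.repeats >= self.max_identical:
            raise TimeoutError(
                f"identical step repeated {self.repeats} times; "
                "the agent appears to be doing something redundant"
            )
```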

We could have a compact list of previously completed tasks fed into the prompt every iteration.

This is interesting stuff, touching on keeping track of "experiences": the agent being able to remember its actions by maintaining a history of command/param tuples that did or didn't work, along with the associated errors/interpretation, as per #3835 (comment)

@eyalk11
Contributor

eyalk11 commented Jun 30, 2023

There could be two sets:

  • a set of already executed commands+args
  • a set of already executed commands+args that we asked the user about and they approved

If a command was already executed, then the agent stops and asks the user. If the user confirms, then we know we are good to go with this command and we won't stop next time. If the user rejects or adds feedback, we will stop next time.
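A minimal sketch of that logic (hypothetical helper, not the actual #3914 code):

```python
executed: set[tuple] = set()  # commands+args that were already run
approved: set[tuple] = set()  # repeats the user has explicitly approved

def should_stop_and_ask(command: str, args: tuple) -> bool:
    key = (command, args)
    if key in approved:
        return False  # approved repeat: good to go, don't stop next time
    if key in executed:
        return True   # unapproved repeat: stop and ask the user
    executed.add(key)
    return False      # first execution: proceed

def on_user_confirmed(command: str, args: tuple) -> None:
    approved.add((command, args))  # we won't stop for this command again
```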

I originally introduced the first one in #3914. So I think I will delete this section and open a new PR.

8139493 is the old (deleted) version.

@eyalk11
Contributor

eyalk11 commented Jul 1, 2023

Following the discussion with @Boostrix, I did some work on the subject. I allowed every command to have its own calculate_hash function, so that it can return the hash of the file instead of the hash of the command arguments (which is the default case): eyalk11@25d694f
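An illustrative sketch of that override pattern (names are made up here; the actual implementation is in the linked commit):

```python
import hashlib
from pathlib import Path

class Command:
    def calculate_hash(self, **kwargs) -> str:
        # Default case: hash the command arguments.
        raw = repr(sorted(kwargs.items())).encode("utf-8")
        return hashlib.sha256(raw).hexdigest()

class ReadFileCommand(Command):
    def calculate_hash(self, filename: str, **kwargs) -> str:
        # Override: hash the file contents instead, so the step only counts
        # as a repeat while the file itself is unchanged.
        return hashlib.sha256(Path(filename).read_bytes()).hexdigest()
```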

@Boostrix
Contributor Author

Boostrix commented Jul 1, 2023

As I mentioned on Discord, I believe the first step is coming up with ideas/challenges that trigger redundant looping, and then using those as a baseline for any fixes we can come up with, no matter whether it's my original hashing-based approach or something that you came up with.

Therefore, I'm going to ping @merwanehamadi (head of our challenges department) to keep him in the loop.

@Pwuts Pwuts self-assigned this Jul 1, 2023
@Pwuts Pwuts added this to the v0.5.0 Release milestone Jul 1, 2023
@Boostrix
Contributor Author

Boostrix commented Jul 3, 2023

FWIW, this was recently posted on Discord, and the article covers our looping issue: https://lorenzopieri.com/autogpt_fix/

Do they work? Nope!
The problem is … AI agents do not work. The typical session of AutoGPT ends up stuck in an infinite cycle of actions, such as google something, write it to file, read the file, google again… In general, goals requiring more than 4-5 actions seem to be out of reach.

@github-actions
Contributor

github-actions bot commented Sep 6, 2023

This issue has automatically been marked as stale because it has not had any activity in the last 50 days. You can unstale it by commenting or removing the label. Otherwise, this issue will be closed in 10 days.

@github-actions github-actions bot added the Stale label Sep 6, 2023
@Pwuts Pwuts added the meta Meta-issue about a topic that multiple issues already exist for label Sep 14, 2023
@Pwuts Pwuts moved this to ⏩ In Progress in AutoGPT development kanban Sep 14, 2023
@Pwuts
Member

Pwuts commented Sep 14, 2023

Partial solution in e437065
