Psychological challenge: Sally and Anne's Test. #3871
Comments
Interesting proposal. I like the idea of collecting metrics on the behavior of AutoGPT. May I propose a different approach to resolving these issues: there has been a lot of talk about using observer/regulatory agents to catch, stop, and report bad behavior. The general consensus has been that these agents get obsessed with that kind of role and tend to report compliance violations too frequently. However, collecting metrics on how likely AutoGPT is to misbehave is also a good idea.
This would be a very fascinating test. It may serve as a basis for building such regulatory/observer agents.
And there goes all the fun, because you're offering to help us implement the equivalent of a police-officer.agent...
@Boostrix @anonhostpi do you know anyone who wants to do that challenge? It's pretty fun.
Personally, I doubt that agpt is there yet - there are currently so many challenges in other parts of the project. For instance, look at the number of folks who have complained that it takes agpt using GPT-3.5 more than 10 cycles and 10 minutes to write a hello world program in Python, whereas GPT itself can provide the same program directly in a single second. So I don't believe this is the level of problem to tackle right now. We need to solve at least another 20+ "basic" problems (challenges) before even thinking about psychological stuff like this. Agents not being able to plan out their actions properly is a problem, and it's worsened by the lack of suspend/resume support. We need to build the foundation for more complex challenges first.
If I give the full level 5 to ChatGPT-4, it solves it in one prompt. Auto-GPT manages level one well and is only slightly off on level 2. If you ask it, it can also give you the real position of the marbles, the conversations between the individuals, the beliefs about beliefs, and so on. Also, if people struggle to get a hello world, it comes from the prompt: the model is confused by the role/goals it is given, and the goals are very often not well written. I haven't fully tested this theory, but I think the AI is really sensitive to the exact wording. The differences between goals, outcomes, and objectives can potentially be crucial.
@Boostrix yeah, ok - I am never going to complain when someone says "too early".
Summary 💡
This idea was brought up by @javableu. We would like to create a psychological challenge inspired by the Sally–Anne test.
https://en.wikipedia.org/wiki/Sally%E2%80%93Anne_test
GPT-3.5 is able to solve the problem.
But we need to make sure that Auto-GPT is able to do it temporally.
What that means is: we can create a scenario that simulates the Sally–Anne test, but spread across multiple cycles, and then verify that Auto-GPT still remembers that Sally put the marble in her own basket. A rough sketch of what such a test could look like is given below.
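For concreteness, here is a minimal sketch of how the temporal version of the test might be written as a pytest-style challenge. This is only an illustration: the helpers `agent.observe`, `run_multiple_cycles`, and `ask_agent` are hypothetical stand-ins for whatever the challenge harness ends up providing, not existing Auto-GPT APIs.

```python
# Hypothetical Sally–Anne challenge sketch.
# NOTE: agent.observe(), run_multiple_cycles(), and ask_agent() are placeholder
# helpers, not real Auto-GPT functions; the point is the shape of the check.

SCENARIO = (
    "Sally puts her marble in her own basket and leaves the room. "
    "While she is away, Anne moves the marble into her own box. "
    "Sally comes back."
)

# Expected answers: Sally's (false) belief vs. the marble's real location.
QUESTIONS = {
    "Where will Sally look for her marble?": "basket",
    "Where is the marble really?": "box",
}


def test_sally_anne_across_cycles(agent):
    # Feed the scenario early, then run several unrelated cycles so the answers
    # must come from the agent's memory rather than the immediate context window.
    agent.observe(SCENARIO)
    run_multiple_cycles(agent, cycles=10)

    for question, expected in QUESTIONS.items():
        answer = ask_agent(agent, question)
        assert expected in answer.lower(), (
            f"Expected '{expected}' in the answer to '{question}', got: {answer}"
        )
```

The key design choice is the gap between feeding the scenario and asking the questions: running intervening cycles is what makes this a memory (temporal) test rather than a single-prompt reasoning test that GPT-3.5 already passes.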
If you're interested in creating this challenge, join us on Discord (https://discord.gg/autogpt) and DM me (my Discord is merwanehamadi).