
Psychological challenge: Sally and Anne's Test. #3871

Closed
waynehamadi opened this issue May 5, 2023 · 7 comments

Comments

@waynehamadi
Contributor

waynehamadi commented May 5, 2023

Summary 💡

This idea was brought up by @javableu. We would like to create a psychological challenge inspired by the Sally–Anne test:

https://en.wikipedia.org/wiki/Sally%E2%80%93Anne_test

[Screenshot: the Sally–Anne test scenario]

GPT-3.5 is able to solve the problem.
[Screenshot: GPT-3.5 solving the test]

But we need to make sure that Auto-GPT is able to do it temporally. What that means is: we can create a scenario that simulates the Sally–Anne test across multiple cycles, and verify that Auto-GPT remembers that Sally put the marble in her own basket.
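The multi-cycle scenario could be sketched roughly like this. This is a minimal, self-contained Python stub: `ScriptedAgent` and the `run_sally_anne_challenge` harness are hypothetical stand-ins for Auto-GPT's real agent loop and memory, not the project's actual API.

```python
# Hypothetical sketch of a multi-cycle Sally-Anne challenge.
# ScriptedAgent is a toy stand-in for Auto-GPT: it trivially passes,
# while the real challenge is whether Auto-GPT's memory does the same
# when each observation arrives in a separate cycle.

class ScriptedAgent:
    """Toy agent with a simple append-only memory."""

    def __init__(self):
        self.memory = []

    def observe(self, event: str) -> None:
        # In Auto-GPT this would be one cycle of the agent loop.
        self.memory.append(event)

    def answer(self, question: str) -> str:
        # Naive lookup: find where Sally last put the marble.
        for event in reversed(self.memory):
            if "Sally puts the marble" in event:
                return event.split("in ")[-1]
        return "unknown"


def run_sally_anne_challenge(agent) -> bool:
    # Feed the scenario one event per cycle.
    agent.observe("Sally puts the marble in her basket")
    agent.observe("Sally leaves the room")
    agent.observe("Anne moves the marble to her box")
    # The correct answer is where Sally *believes* the marble is:
    # the basket she left it in, not where Anne moved it.
    return agent.answer("Where will Sally look for the marble?") == "her basket"


if __name__ == "__main__":
    print(run_sally_anne_challenge(ScriptedAgent()))  # prints True
```

A real challenge would replace `ScriptedAgent` with the actual agent and spread the `observe` calls across genuine execution cycles, so the assertion exercises long-term memory rather than a single prompt.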

Join us on Discord (https://discord.gg/autogpt) and DM me (my Discord is merwanehamadi) if you're interested in creating this challenge.

@anonhostpi

Interesting proposal. I like the idea of collecting metrics on the behavior of AutoGPT.

May I propose a different approach to resolving these issues:
https://gist.github.com/anonhostpi/97d4bb3e9535c92b8173fae704b76264#observerregulatory-agents

There has been a lot of talk about using observer/regulatory agents to catch, stop, and report bad behavior. The general consensus is that these agents become obsessed with that kind of role and tend to report compliance violations too frequently.

However, collecting metrics on how likely Auto-GPT is to misbehave is also a good idea.

@anonhostpi

This would be a very fascinating test. It may serve as a basis for building such regulatory/observer agents.

@Boostrix
Contributor

Boostrix commented May 7, 2023

And there goes all the fun because you're offering to help us implement the equivalent of a police-officer.agent ....

@waynehamadi
Contributor Author

waynehamadi commented May 14, 2023

@Boostrix @anonhostpi do you know anyone that wants to do that challenge? It's pretty fun.

@Boostrix
Contributor

we need to make sure that autogpt is able to do it temporally [...] across multiple cycles

Personally, I doubt that Auto-GPT is there yet; there are currently so many challenges in other parts of the project. For instance, look at the number of people who have complained that Auto-GPT using GPT-3.5 takes more than 10 cycles and 10 minutes to write a hello-world program in Python, whereas GPT itself can provide the same program in a single second.

Thus, I don't believe this is the level of problem to be tackled right now. Don't get me wrong, I applaud you for all those challenges, but some of them are way off at this stage.

We need to solve at least another 20+ "basic" problems (challenges) first before even thinking about psychological stuff like this.

Agents not being able to plan out their actions properly is a problem, but it's worsened by lacking suspend/resume support.
So, while we need more challenges, these need to be more basic ones at this stage, I am afraid.

We need to build the foundation for more complex challenges at first.

@javableu
Contributor

If I give the full level 5 to ChatGPT-4, it solves it in one prompt. Auto-GPT manages level one well and is only slightly off on level 2. If you ask it, it can also give you the real position of the marbles, the conversations between each individual, the beliefs of beliefs, and so on. Also, if people struggle to get a hello world, it comes from the prompt: the model is confused by the role/goals it is given, and the goals are very often not well written. I haven't fully tested this theory, but I think the AI is REALLY sensitive to the wording. Differences between goals, outcomes, and objectives can potentially be crucial.

@waynehamadi
Contributor Author

@Boostrix yeah, ok, I am never going to complain when someone says "too early".
Good call, let's get the basics right first.
