Web Navigation Challenge #3936

waynehamadi · 2023-05-07T02:26:00Z

Duplicates

I have searched the existing issues

Summary 💡

Auto-GPT should be able to navigate websites, enter text, and submit forms.

FYI @NotSkynet

Examples 🌈

No response

Motivation 🔦

No response

anonhostpi · 2023-05-07T03:10:46Z

I would really like to learn more about these challenges you keep coming up with. I'm assuming that these serve as guidelines for writing unit tests?

waynehamadi · 2023-05-07T03:38:22Z

Yes, exactly, but they're more end-to-end tests than unit tests. We test the system, not the code. That way, even if we do a complete refactor in 6 months, the tests will still be valid, and they will make it possible to refactor Auto-GPT.
Ping me on Discord: https://discord.gg/autogpt (merwanehamadi)

Boostrix · 2023-05-07T05:17:06Z

> Auto-GPT should be able to navigate websites, enter text, and submit forms.

Good thinking:

- Detecting whether a website uses static HTML or relies heavily on JavaScript would be useful, so that a different crawler/spider strategy can be used.
- The very first milestone should probably be accepting cookies and privacy TOS :-)
- The next step might be to always try fetching sitemap.xml, if it's available, to orient itself, and to fetch the accessible version of a website to reduce parsing/processing overhead.
- Being able to detect and use search forms/fields would be enormously useful!
- It should be able to memorize entry points for searching and revisit/reuse them whenever information is missing.
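
As a rough illustration of the sitemap.xml idea, here is a minimal standard-library sketch. It assumes the sitemap lives at the conventional `/sitemap.xml` location at the site root (common, but not guaranteed) and that it is a plain `<urlset>` file. The function names are hypothetical and not part of Auto-GPT.

```python
from __future__ import annotations

import urllib.request
import xml.etree.ElementTree as ET
from urllib.parse import urljoin

# Namespace defined by the sitemaps.org protocol.
NS = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}


def sitemap_url(page_url: str) -> str:
    """Conventional location at the site root (an assumption; robots.txt may point elsewhere)."""
    return urljoin(page_url, "/sitemap.xml")


def parse_sitemap(xml_data: str | bytes) -> list[str]:
    """Return the <loc> URLs of a <urlset> sitemap, in document order."""
    root = ET.fromstring(xml_data)
    return [loc.text.strip() for loc in root.findall("sm:url/sm:loc", NS) if loc.text]


def fetch_sitemap(page_url: str, timeout: float = 10.0) -> list[str]:
    """Fetch and parse the sitemap; return [] if it is missing or not valid XML."""
    try:
        with urllib.request.urlopen(sitemap_url(page_url), timeout=timeout) as resp:
            return parse_sitemap(resp.read())
    except (OSError, ET.ParseError):
        return []
```

Sitemap index files (`<sitemapindex>`) are not handled here. A fuller version would recurse into the child sitemaps they list.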

anonhostpi · 2023-05-07T05:38:54Z

It's probably best to use Selenium waits for part of that and prompt the human user to handle those dialogs, since it is the human, not the bot, who is agreeing to them.

Boostrix · 2023-05-07T05:58:24Z

Those suggestions were based on watching the agent "browse" with the browser instance set to visible: it doesn't confirm anything, doesn't use search facilities, and doesn't seem to use sitemap.xml to find its way. (Personally, I'm not a huge fan of using Selenium for non-dynamic websites, where classic crawling should suffice.)

waynehamadi · 2023-05-07T13:09:58Z

@anonhostpi @Boostrix I like that you're thinking about how to solve this problem.

This issue is more about how to test whether the problem has been solved (aka create the challenge).

@NotSkynet is going to help us find a good static website where we can test whether Auto-GPT is able to navigate.

Boostrix · 2023-05-07T13:23:19Z

Just keep the browser open/visible and you'll see for yourself that browsing could use some TLC...

> This issue is more about how to test whether the problem has been solved (aka create the challenge).

Re navigation: randomly generate navbar structures using a nested loop with different labels and menus/submenus, generate a matching sitemap.xml, and use the two in combination to see whether the agent is able to "visit" a certain part. Each link would trigger the same Python CGI script to tell the back end which links were found/clicked.
Evaluation-wise, we would then need to specify a goal such as "navigate to the contacts/team/about/company page"; the crawler/Selenium should be able to determine which link that is (we could use random file names here to ensure that the LLM isn't guessing). The Python CGI script running inside HTTPServer would then register an event recording whether or not the link was found.
We could probably bootstrap the whole thing by copying a bunch of Drupal/WordPress templates into a directory to get actual navigation bars, which a Python helper would then customize (randomize) with different titles/descriptions and a href links.
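
A minimal sketch of the generator described above, under a few assumptions: a fixed two-level menu, hypothetical label lists, and a local base URL of `http://127.0.0.1:8000`. Instead of a CGI script, it assumes the test server logs every requested path, so links point straight at the randomly named pages. `generate_site` is a hypothetical helper, not existing Auto-GPT code.

```python
from __future__ import annotations

import random
from xml.sax.saxutils import escape

SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"
TOP_LABELS = ["Products", "Company", "Support", "Resources"]
SUB_LABELS = ["About", "Team", "Contact", "Careers", "Docs", "Blog", "Pricing"]


def generate_site(base_url: str = "http://127.0.0.1:8000", seed: int | None = None):
    """Return (navbar_html, sitemap_xml, pages), where pages maps label -> random path."""
    rng = random.Random(seed)
    subs = SUB_LABELS[:]
    rng.shuffle(subs)
    pages: dict[str, str] = {}
    menus = []
    # Nested loop: each top-level menu gets 1-3 submenu entries while labels remain.
    for top in TOP_LABELS:
        links = []
        for _ in range(rng.randint(1, 3)):
            if not subs:
                break
            label = subs.pop()
            # Random file name, so the target can't be guessed from the URL.
            path = f"/{rng.getrandbits(48):012x}.html"
            pages[label] = path
            links.append(f'<li><a href="{path}">{escape(label)}</a></li>')
        menus.append(f"<li>{escape(top)}<ul>{''.join(links)}</ul></li>")
    navbar = "<nav><ul>" + "".join(menus) + "</ul></nav>"
    # The sitemap lists exactly the same pages as the navbar.
    urls = "".join(f"<url><loc>{escape(base_url + p)}</loc></url>" for p in pages.values())
    sitemap = f'<?xml version="1.0" encoding="UTF-8"?><urlset xmlns="{SITEMAP_NS}">{urls}</urlset>'
    return navbar, sitemap, pages
```

A challenge could then pick one label from `pages` (e.g. "Contact") as the goal and check whether the server saw a request for the matching random path. Passing a `seed` keeps a failing run reproducible.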

Re the search form: this is probably analogous to the click form we talked about yesterday, with the twist that it's a single SEARCH field (text input) and a simple Python-based back end that we can run in a conventional HTTPServer instance. At that point we can hook the whole thing up to pytest, as before.
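
A minimal sketch of such a search back end, using only `http.server` from the standard library. The `/` page serves a form with a single text input. `/search?q=...` records the submitted query in a module-level list that a test could inspect afterwards. The paths, `SEARCH_LOG`, `record_search`, and the page markup are assumptions for illustration. The query handling lives in a plain function so it can be checked without starting a server.

```python
from __future__ import annotations

import html
from http.server import BaseHTTPRequestHandler, HTTPServer
from urllib.parse import parse_qs, urlparse

SEARCH_LOG: list[str] = []  # queries the agent submitted; a test inspects this

# Hypothetical test page: one text input inside a GET form.
FORM_PAGE = b"""<!doctype html>
<html><body>
  <form action="/search" method="get">
    <input type="text" name="q" aria-label="Search">
    <button type="submit">Search</button>
  </form>
</body></html>"""


def record_search(raw_query: str) -> bytes:
    """Parse the q=... parameter, log it, and return a tiny results page."""
    query = parse_qs(raw_query).get("q", [""])[0]
    SEARCH_LOG.append(query)
    return f"<p>Results for {html.escape(query)}</p>".encode()


class SearchHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        url = urlparse(self.path)
        if url.path == "/":
            self._send(200, FORM_PAGE)
        elif url.path == "/search":
            self._send(200, record_search(url.query))
        else:
            self._send(404, b"not found")

    def _send(self, status: int, body: bytes) -> None:
        self.send_response(status)
        self.send_header("Content-Type", "text/html; charset=utf-8")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)


def serve(port: int = 8000) -> None:
    """Blocking; run in a background thread (or separate process) during tests."""
    HTTPServer(("127.0.0.1", port), SearchHandler).serve_forever()
```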

zachary-kaelan · 2023-05-09T16:13:46Z

We can just use the XPath //form | //input | //textarea | //button | //select | //a[@href], maybe with a couple of other things, to find all the important elements on the page that it needs to know about for forms and navigation. Feed it the list of results and their paths, and it should be able to infer what it needs to do to navigate the webpage.
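
The XPath union above needs an XPath engine. With lxml installed, `lxml.html.fromstring(page).xpath(query)` would evaluate it directly. As a dependency-free stand-in, this sketch uses the standard library's `html.parser` to collect the same element types (forms, inputs, textareas, buttons, selects, and links with an `href`). For each one, it records an approximate XPath-style positional path (`/html[1]/body[1]/form[1]/input[1]`) plus its attributes, ready to be listed for the LLM. `collect_interactive_elements` is a hypothetical helper. The path is an approximation: it does not replicate browser fix-ups such as implied `<tbody>` elements.

```python
from html.parser import HTMLParser

# Same element types as: //form | //input | //textarea | //button | //select | //a[@href]
TARGET_TAGS = {"form", "input", "textarea", "button", "select"}
VOID_TAGS = {"area", "base", "br", "col", "embed", "hr", "img", "input",
             "link", "meta", "source", "track", "wbr"}


class InteractiveElementCollector(HTMLParser):
    """Collect interactive elements together with an XPath-like positional path."""

    def __init__(self):
        super().__init__()
        self.stack = []           # (tag, index) for each currently open element
        self.child_counts = [{}]  # per open element: tag -> children seen so far
        self.found = []           # (path, tag, attrs) in document order

    def handle_starttag(self, tag, attrs):
        counts = self.child_counts[-1]
        counts[tag] = counts.get(tag, 0) + 1
        step = (tag, counts[tag])
        path = "/" + "/".join(f"{t}[{i}]" for t, i in [*self.stack, step])
        attr_map = dict(attrs)
        if tag in TARGET_TAGS or (tag == "a" and "href" in attr_map):
            self.found.append((path, tag, attr_map))
        if tag not in VOID_TAGS:  # void elements never get an end tag
            self.stack.append(step)
            self.child_counts.append({})

    def handle_endtag(self, tag):
        # Pop back to the most recent matching open tag (tolerates unclosed <p>, <li>, ...).
        for depth in range(len(self.stack) - 1, -1, -1):
            if self.stack[depth][0] == tag:
                del self.stack[depth:]
                del self.child_counts[depth + 1:]
                return


def collect_interactive_elements(page: str) -> list:
    parser = InteractiveElementCollector()
    parser.feed(page)
    parser.close()
    return parser.found
```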

Boostrix · 2023-05-09T16:18:48Z

Yes, exactly what I suggested here: #3551 (comment)

I believe this sort of feature could be useful in general, so we could just as well implement a more generic "browse_website" or extend it as needed (with XPath support, like you say).

If this is augmented with sitemap.xml data, it's probably rather flexible as is:

browse_website <url> <focus> <constraints> [<xpath>, <use_sitemap.xml>]

zachary-kaelan · 2023-05-10T13:54:03Z

> If this is augmented with sitemap.xml data, it's probably rather flexible as is:
>
> browse_website <url> <focus> <constraints> [<xpath>, <use_sitemap.xml>]

Oh yeah, duh. I forgot sitemaps are a thing. We can integrate one of the many, many pre-existing crawling tools out there and automatically generate a sitemap for every visited site that doesn't already have one.

I was already on the verge of overcomplicating things by indexing the pages on the fly ourselves when I remembered that search engines support operators like site:example.com and inurl:contacts.
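
A tiny sketch of composing such an operator query and URL-encoding it. The default endpoint (`https://www.google.com/search` with a `q` parameter) is only an example. Operator support and terms of use vary by search engine, and the helper names are hypothetical.

```python
from urllib.parse import urlencode


def site_search_query(domain: str, *url_terms: str) -> str:
    """Build a query such as 'site:example.com inurl:contacts'."""
    return " ".join([f"site:{domain}", *(f"inurl:{term}" for term in url_terms)])


def site_search_url(domain: str, *url_terms: str,
                    endpoint: str = "https://www.google.com/search") -> str:
    # urlencode percent-encodes ':' and turns spaces into '+'.
    return f"{endpoint}?{urlencode({'q': site_search_query(domain, *url_terms)})}"
```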

Boostrix · 2023-05-10T14:08:20Z

There is one PR that maps HTTP requests to some custom commands for scraping purposes, I believe: #2730
Also, this would seem like a really good idea: #2181

anonhostpi · 2023-05-11T02:22:40Z

For a collection of issues/PRs/discussions to base the challenge on, you could use my trackers: Gist (Alt)

You are more than welcome to use them as the "static website." I recommend using the README.md file. I update my tracker files frequently, but I imagine you could find a GitHub XPath on it that doesn't change.

Boostrix · 2023-05-11T06:02:10Z

That gist looks really good and is super useful; it must have been a ton of work. Thanks for that!

(It should probably be added to the wiki in some shape or form.)

anonhostpi · 2023-05-11T13:12:43Z

> That gist looks really good and is super useful; it must have been a ton of work. Thanks for that!

Thank you!

> (It should probably be added to the wiki in some shape or form.)

I would have added it to the catalyzing and/or moderator page, but I think you have to have repo ownership privileges for that. You can't make a PR against a wiki, unfortunately.

anonhostpi · 2023-05-11T13:16:00Z

@Boostrix, if you are interested in contributing to my gist, you can take a look at https://github.com/anonhostpi/AUTOGPT.TRACKERS.

I use GitHub Copilot to autofill a lot of the data, and then use the PowerShell script https://github.com/anonhostpi/AUTOGPT.TRACKERS/blob/main/.SCRIPTS/CONTRIBUTE.ps1 to push updates to it.

waynehamadi · 2023-05-14T12:54:48Z

Yeah, great gist.
Is anyone up for the task of creating this challenge?
@NotSkynet, did you find a website we can use for this?

Boostrix · 2023-05-14T15:03:05Z

I would say this could just as well be a static local website, i.e. one served by a Python HTTP server that runs locally as part of the test suite. We talked about this idea in the context of the "contact form".

So basically we need a pytest module that starts up an HTTP server to track "actions" (GET/POST requests).

To the AGPT agent, it should not matter whether it's navigating google.com or 127.0.0.1 :-)
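
A minimal sketch of that harness, built on the standard library's `ThreadingHTTPServer`. `ActionRecorder`, `running_action_server`, `visit`, and `action_server` are hypothetical names. The server binds to port 0 so the OS picks a free port, runs in a background thread, and records every GET/POST as a `(method, path, body)` tuple before replying. `visit` stands in for the agent's browser, and `action_server` is shaped as a pytest fixture; it would need `@pytest.fixture` added in a real test module.

```python
from __future__ import annotations

import threading
import urllib.request
from contextlib import contextmanager
from http.server import BaseHTTPRequestHandler, ThreadingHTTPServer


class ActionRecorder(BaseHTTPRequestHandler):
    """Record every GET/POST the agent makes and answer with a trivial page."""

    def do_GET(self):
        self._record(b"")

    def do_POST(self):
        length = int(self.headers.get("Content-Length", 0))
        self._record(self.rfile.read(length))

    def _record(self, body: bytes) -> None:
        # Record first, so the action is logged before the client sees a response.
        self.server.actions.append((self.command, self.path, body.decode("utf-8", "replace")))
        page = b"<html><body>ok</body></html>"
        self.send_response(200)
        self.send_header("Content-Type", "text/html")
        self.send_header("Content-Length", str(len(page)))
        self.end_headers()
        self.wfile.write(page)

    def log_message(self, format, *args):
        pass  # keep test output quiet


@contextmanager
def running_action_server():
    """Start the recorder on a free localhost port in a background thread."""
    server = ThreadingHTTPServer(("127.0.0.1", 0), ActionRecorder)
    server.actions = []  # shared list of (method, path, body) tuples
    thread = threading.Thread(target=server.serve_forever, daemon=True)
    thread.start()
    try:
        yield server
    finally:
        server.shutdown()
        server.server_close()
        thread.join()


def base_url(server) -> str:
    host, port = server.server_address[:2]
    return f"http://{host}:{port}"


def visit(url: str, data: bytes | None = None) -> int:
    """Stand-in for the agent's browser: GET (or POST if data is given), return status."""
    opener = urllib.request.build_opener(urllib.request.ProxyHandler({}))  # bypass proxies
    with opener.open(url, data=data, timeout=5) as resp:
        resp.read()
        return resp.status


def action_server():
    """Fixture-shaped generator; decorate with @pytest.fixture in a real test module."""
    with running_action_server() as server:
        yield server
```

A test would then point the agent at `base_url(server)`, give it a goal such as "navigate to the contact page", and assert that the expected path appears in `server.actions`.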

BaseInfinity · 2023-05-15T09:15:01Z

Oh, I was able to get this working with ChromeGPT (https://github.com/richardyc/Chrome-GPT) using the AutoGPT agent.

I was able to log in to Reddit with a pretty basic sentence.

Here's my demo:
https://www.youtube.com/watch?v=RkpvyGla0PA

Not only that, but I got extensions working, so I was able to use the ScribeAI plugin to create an auto-generated (though basic) guide.

Edit:
I realize this issue is about testing, not "is this possible". My bad.

Boostrix · 2023-05-15T09:23:31Z

Testing is the level at which we see whether any submission solves the challenge. Ideally, we would be able to throw different URLs at the challenge, with goals to navigate to some predefined page and possibly perform an "action" (contact form, etc.).

For starters/experiments, this could be based on localhost, but eventually it will need to work with arbitrary websites.
If you are working on this, you should inform @merwanehamadi so that he can update the list of challenges accordingly and prevent others from working on the same problem. Also, state clearly whether you'd like to team up with others on the same challenge.

waynehamadi · 2023-05-23T22:51:27Z

@Noots123 thanks for the suggestion !
Does anyone know what to do to create a challenge around this?
FYI @Boostrix @BaseInfinity @anonhostpi @zachary-kaelan

waynehamadi · 2023-06-14T18:55:26Z

Mind2Web is groundbreaking news for web navigation agents: https://osu-nlp-group.github.io/Mind2Web/
@xiang-deng, @ysu1989, thank you for your work; hopefully I will look into this soon.

github-actions · 2023-09-06T20:52:54Z

This issue has automatically been marked as stale because it has not had any activity in the last 50 days. You can unstale it by commenting or removing the label. Otherwise, this issue will be closed in 10 days.

BaseInfinity · 2023-09-11T23:01:12Z

I would love to get some momentum on this again. Recently I played around with HyperWriteAI, and their autonomous browser is currently the best I have tried. Here's a small demo.

HyperWriteAI is closed source and ChromeGPT is a bit slow, so I think AutoGPT can make a splash here.

I mentioned this to @Pwuts and he put it on his radar; he mentioned having a Selenium wrapper that could do something similar to ChromeGPT's implementation:
https://discord.com/channels/1092243196446249134/1111659953493651547/1149034954488037469

I don't see why AutoGPT can't have browser navigation as good as HyperWriteAI's, and it would be a huge win for the OS community.

The whole reason I've been going down this rabbit hole is to experiment with generating E2E tests given a "task". I've been able to prototype with ChromeGPT but would prefer a solution inside the AutoGPT repo.

Plus, I feel like one of the best things about autonomous agents is being able to browse the web effectively. It's something people want, as you can tell from the hype HyperWriteAI got when it launched (it was number 4 on Product Hunt that day). Understandably, the general public went nuts over it. Seeing your browser perform a task like a human in front of your eyes is powerful stuff lol

Anyway, hopefully this sparks some convo and momentum in the browser navigation area. Let me know if there's any way I can help push it forward =)

github-actions · 2023-11-01T01:47:03Z

This issue has automatically been marked as stale because it has not had any activity in the last 50 days. You can unstale it by commenting or removing the label. Otherwise, this issue will be closed in 10 days.

BaseInfinity · 2023-11-02T04:04:09Z

Bump. Unfortunately, I haven't had time to dig into this.

ballonJourn · 2023-12-27T15:18:09Z

Beautiful show! That's my forte, @BaseInfinity.

BaseInfinity · 2024-01-03T18:06:10Z

@ballonJourn, while I haven't been able to make progress in this repo, I have been able to make progress in others that support AutoGPT.

Here's my latest prototype, PlaywrightGPT:
https://www.youtube.com/watch?v=DH9cIm1qfug

I imagine a future where tests can repair themselves and even generate code diffs on code changes in CI.

github-actions · 2024-02-28T01:45:20Z

This issue has automatically been marked as stale because it has not had any activity in the last 50 days. You can unstale it by commenting or removing the label. Otherwise, this issue will be closed in 10 days.

github-actions · 2024-03-13T01:45:54Z

This issue was closed automatically because it has been stale for 10 days with no activity.

waynehamadi mentioned this issue May 7, 2023: Help us build challenges! #3835 (Closed)

anonhostpi mentioned this issue May 7, 2023: [Challenge Creation] Information Retrieval Challenge B #3837 (Closed)

Boostrix mentioned this issue May 11, 2023: Improve reasoning based on web data #3029 (Closed)

Boostrix mentioned this issue Jul 3, 2023: Fill out forms and interact with a webpage using GPT functions? #4739 (Closed)

github-actions bot added the Stale label Sep 6, 2023

github-actions bot removed the Stale label Sep 12, 2023

github-actions bot added the Stale label Nov 1, 2023

github-actions bot removed the Stale label Nov 3, 2023

Pwuts mentioned this issue Jan 9, 2024: feat(benchmark): JungleGym WebArena #6691 (Merged)

github-actions bot added the Stale label Feb 28, 2024

github-actions bot closed this as not planned Mar 13, 2024
