Add headless browser to the WebSurferAgent #1534

vijaykramesh · 2024-02-05T00:43:08Z

Why are these changes needed?

This adds a simple Selenium-driven headless browser to the WebSurferAgent. It does not yet make the headless browser agent multi-modal (e.g., it can't do anything with images in the browser), but it should work much better for JavaScript-powered websites than the current SimpleTextBrowser implementation.

This also refactors the existing SimpleTextBrowser implementation a bit so that both browsers share a common base class, and adds unit tests for both the existing browser and the new HeadlessChromeBrowser implementation.

https://app.codecov.io/github/microsoft/autogen/pull/1534 shows the coverage additions.

Related issue number

Closes #1481

Checks

I've included any doc changes needed for https://microsoft.github.io/autogen/. See https://microsoft.github.io/autogen/docs/Contribute#documentation to build and test documentation locally.
I've added tests (if relevant) corresponding to the changes introduced in this PR.
I've made sure all auto checks have passed.

codecov-commenter · 2024-02-05T00:44:27Z

Codecov Report

Attention: Patch coverage is 88.11189% with 17 lines in your changes missing coverage. Please review.

Project coverage is 48.00%. Comparing base (26daa18) to head (b0ab6c1).
Report is 734 commits behind head on 0.2.

Files with missing lines	Patch %	Lines
autogen/browser_utils/headless_chrome_browser.py	90.42%	7 Missing and 2 partials ⚠️
autogen/browser_utils/abstract_browser.py	73.33%	8 Missing ⚠️

Additional details and impacted files

@@             Coverage Diff             @@
##              0.2    #1534       +/-   ##
===========================================
+ Coverage   34.26%   48.00%   +13.73%     
===========================================
  Files          42       45        +3     
  Lines        5099     5225      +126     
  Branches     1165     1261       +96     
===========================================
+ Hits         1747     2508      +761     
+ Misses       3209     2513      -696     
- Partials      143      204       +61

Flag	Coverage Δ
unittests	`47.94% <88.11%> (+13.68%)`	⬆️

Flags with carried forward coverage won't be shown.

afourney · 2024-02-05T17:38:26Z

First of all, this looks fantastic. Thanks so much for working on this.

I will dig in and try this out as soon as possible today. We've discussed some possibilities on Discord already, but those are arguably future improvements. I think driving the browser and using innerHTML are perfect first steps, and completely sufficient for a first PR.

One thing to test will be PDFs. In many benchmark scenarios, PDFs are the final document sought by browsing, but they don't have innerHTML. As a first step, simply downloading them to the Downloads folder might be sufficient.

afourney · 2024-02-07T05:42:17Z

Just adding some notes while I continue to test this:

To install Chrome in Docker or WSL:

wget https://dl.google.com/linux/direct/google-chrome-stable_current_amd64.deb
sudo dpkg -i google-chrome-stable_current_amd64.deb

Resolve dependencies with:
sudo apt -f install

Then try this again:
sudo dpkg -i google-chrome-stable_current_amd64.deb

afourney · 2024-02-07T06:23:32Z

autogen/browser_utils/headless_chrome_browser.py

+            script.extract()
+
+        # Convert to text
+        text = soup.get_text()


At least for now this should be:

text = markdownify.MarkdownConverter().convert_soup(soup)

GPT models reason about markdown well, and we want to preserve links etc.
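To illustrate what is lost with plain text extraction, here is a minimal sketch of link-preserving rendering. It is a standard-library stand-in and not markdownify itself; `LinkPreservingRenderer` and `render_with_links` are hypothetical names used only for this example, and the sketch handles nothing beyond `<a href>` tags.

```python
from html.parser import HTMLParser


class LinkPreservingRenderer(HTMLParser):
    """Render HTML to text, keeping <a href> targets as Markdown links."""

    def __init__(self):
        super().__init__()
        self.parts = []
        self._href_stack = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href")
            self._href_stack.append(href)
            if href:
                self.parts.append("[")

    def handle_endtag(self, tag):
        if tag == "a" and self._href_stack:
            href = self._href_stack.pop()
            if href:
                self.parts.append(f"]({href})")

    def handle_data(self, data):
        self.parts.append(data)


def render_with_links(html: str) -> str:
    """Return the text of `html` with anchors written as [text](url)."""
    renderer = LinkPreservingRenderer()
    renderer.feed(html)
    renderer.close()
    return "".join(renderer.parts)
```

With `soup.get_text()` the same input would yield only "See the docs.", so the model would have no URL to follow.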

afourney · 2024-02-07T06:27:02Z

autogen/browser_utils/headless_chrome_browser.py

+        if uri_or_path.startswith("bing:"):
+            self._bing_search(uri_or_path[len("bing:") :].strip())
+        else:
+            self.driver.get(uri_or_path)


In SimpleTextBrowser, this would also process the content. I recommend we maintain that behavior (again, at least for now).

afourney · 2024-02-07T06:29:44Z

autogen/browser_utils/headless_chrome_browser.py

+
+    def _split_pages(self):
+        # Split only regular pages
+        if not self.address.startswith("http:") and not self.address.startswith("https:"):


One intention here with SimpleTextBrowser is to not break up web search results across multiple viewports. In SimpleTextBrowser, that is indicated by "bing:" URLs, but this won't work here. I would recommend some other mechanism to ensure search results are kept whole to maintain the behavior.
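One possible mechanism, sketched under assumptions: have the search code path set an explicit flag instead of inferring "search results" from the URL scheme. The function name `split_pages` and the `keep_whole` parameter are hypothetical and not part of the PR; the word-boundary handling is only an approximation of what a viewport splitter might do.

```python
def split_pages(text: str, viewport_size: int, keep_whole: bool = False) -> list[tuple[int, int]]:
    """Return (start, end) character offsets for each viewport.

    When keep_whole is True (e.g. set by the search code path), the whole
    text is returned as a single viewport so results are never split.
    """
    if keep_whole or len(text) <= viewport_size:
        return [(0, len(text))]

    pages = []
    start = 0
    while start < len(text):
        end = min(start + viewport_size, len(text))
        # Extend to the next whitespace so a word is not cut in half.
        while end < len(text) and not text[end - 1].isspace():
            end += 1
        pages.append((start, end))
        start = end
    return pages
```

The caller that performs the search would pass `keep_whole=True`, while ordinary navigation would leave the default.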

afourney · 2024-02-07T06:33:59Z

autogen/browser_utils/headless_chrome_browser.py

+    def _bing_search(self, query):
+        self.driver.get("https://www.bing.com")
+
+        search_bar = self.driver.find_element(By.NAME, "q")


Clever, but maybe we can just navigate directly to "https://www.bing.com/search?q=" with the query provided in the url itself?

Also, an advantage of using the API is that we can render results to Markdown cleanly, however we want; that may not happen here. I recommend using the API if we have a key available, and otherwise falling back to this method.
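A minimal sketch of both suggestions, assuming the query only needs standard URL encoding. The helper names `bing_search_url` and `pick_search_strategy` are hypothetical; the sketch does not call the Bing API or Selenium, it only builds the URL and picks a code path.

```python
from typing import Optional
from urllib.parse import quote_plus


def bing_search_url(query: str) -> str:
    """Build a direct search URL so the browser can skip the search box."""
    return "https://www.bing.com/search?q=" + quote_plus(query)


def pick_search_strategy(bing_api_key: Optional[str]) -> str:
    """Prefer the API when a key is configured, else drive the browser."""
    return "api" if bing_api_key else "browser"
```

The browser path would then reduce to a single `self.driver.get(bing_search_url(query))` call, with no element lookup on the landing page.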

We may want to use implicit waits with driver.implicitly_wait(10).

Given the dynamic nature of the page, calling line 126 right after line 124 can cause a NoSuchElementException.

"Note that as soon as the element is located, the driver will return the element reference and the code will continue executing, so a larger implicit wait value won’t necessarily increase the duration of the session." So here, implicitly_wait(10) will not always wait for 10s; it stops waiting as soon as the element is located.
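The "returns as soon as the element is located" behavior can be illustrated with a small polling loop. This is a standard-library sketch of the general idea, not Selenium's implementation; `wait_for` and its parameters are hypothetical names, and `locate` stands in for an element lookup that returns None until the page is ready.

```python
import time
from typing import Callable, Optional, TypeVar

T = TypeVar("T")


def wait_for(locate: Callable[[], Optional[T]], timeout: float, poll_interval: float = 0.05) -> T:
    """Poll `locate` until it returns a value or `timeout` seconds pass.

    The timeout is an upper bound: the function returns immediately once
    `locate` succeeds, so a large timeout does not slow down the fast case.
    """
    deadline = time.monotonic() + timeout
    while True:
        result = locate()
        if result is not None:
            return result
        if time.monotonic() >= deadline:
            raise TimeoutError(f"nothing located within {timeout} seconds")
        time.sleep(poll_interval)
```

In the PR itself, the equivalent would be a single `driver.implicitly_wait(10)` call before the element lookup.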

gitguardian · 2024-07-20T21:22:01Z

️✅ There are no secrets present in this pull request anymore.

If these secrets were true positive and are still valid, we highly recommend you to revoke them.
Once a secret has been leaked into a git repository, you should consider it compromised, even if it was deleted immediately.
Find here more information about risks.

🦉 GitGuardian detects secrets in your source code to help developers and security teams secure the modern development process. You are seeing this because you or someone else with access to this repository has authorized GitGuardian to scan your pull request.

rysweet · 2024-10-10T21:08:38Z

This PR is against AutoGen 0.2. AutoGen 0.2 has been moved to the 0.2 branch. Please rebase your PR on the 0.2 branch or update it to work with the new AutoGen 0.4 that is now in main.

rysweet · 2024-10-11T21:53:31Z

@vijaykramesh closing as stale, please update addressing reviews if you would like to reopen

vijaykramesh had a problem deploying to openai1 February 5, 2024 00:43 — with GitHub Actions Failure

vijaykramesh changed the title ~~Add headless browser to the WebSurferAgent, closes #1481~~ Draft: Add headless browser to the WebSurferAgent Feb 5, 2024

victordibia requested a review from afourney February 5, 2024 05:53

vijaykramesh force-pushed the vr/headless_browser branch from b8e400d to 266a9ce Compare February 6, 2024 00:48

vijaykramesh had a problem deploying to openai1 February 6, 2024 00:48 — with GitHub Actions Failure

vijaykramesh force-pushed the vr/headless_browser branch from 266a9ce to db79a1b Compare February 6, 2024 01:12

vijaykramesh had a problem deploying to openai1 February 6, 2024 01:12 — with GitHub Actions Failure

vijaykramesh had a problem deploying to openai1 February 6, 2024 02:10 — with GitHub Actions Failure

Add headless browser to the WebSurferAgent, closes microsoft#1481

b0ab6c1

vijaykramesh force-pushed the vr/headless_browser branch from 1499ae3 to b0ab6c1 Compare February 6, 2024 02:15

vijaykramesh had a problem deploying to openai1 February 6, 2024 02:15 — with GitHub Actions Failure

vijaykramesh had a problem deploying to openai1 February 6, 2024 02:16 — with GitHub Actions Failure

vijaykramesh changed the title ~~Draft: Add headless browser to the WebSurferAgent~~ Add headless browser to the WebSurferAgent Feb 6, 2024

afourney reviewed Feb 7, 2024

This was referenced Mar 9, 2024

WebSurfer Updated (Selenium, Playwright, and support for many filetypes) #1929

Merged

[Roadmap] Web Browsing #2017

Closed

ekzhu changed the base branch from main to 0.2 October 2, 2024 18:30

jackgerrits added the 0.2 label (Issues which are related to the pre 0.4 codebase) Oct 4, 2024

rysweet added the awaiting-op-response label (Issue or pr has been triaged or responded to and is now awaiting a reply from the original poster) Oct 10, 2024

rysweet closed this Oct 11, 2024

INF800 Feb 29, 2024
