Commit
WebSurfer Updated (Selenium, Playwright, and support for many filetypes) (#1929)

* Feat/headless browser (retargeted) (#1832)

* Add headless browser to the WebSurferAgent, closes #1481

* replace soup.get_text() with markdownify.MarkdownConverter().convert_soup(soup)

* import HeadlessChromeBrowser

* implicitly wait for 10s

* increase max. wait time to 99s

* fix: trim trailing whitespace

* test: fix headless tests

* better bing query search

* docs: add example 3 for headless option

---------

Co-authored-by: Vijay Ramesh <[email protected]>

* Handle missing Selenium package.

* Added browser_chat.py example to simplify testing.

* Based browser on mdconvert. (#1847)

* Based browser on mdconvert.

* Updated web_surfer.

* Renamed HeadlessChromeBrowser to SeleniumChromeBrowser

* Added an initial POC with Playwright.

* Separated Bing search into its own utility module.

* Simple browser now uses Bing tools.

* Updated Playwright browser to inherit from SimpleTextBrowser

* Got Selenium working too.

* Renamed classes and files for consistency.

* Added more instructions.

* Initial work to support other search providers.

* Added some basic behavior when the BING_API_KEY is missing.

* Cleaned up some search results.

* Moved to using the requests.Session object. Moved Bing SERP parsing to mdconvert to be more broadly useful.

* Added backward compatibility to WebSurferAgent

* Selenium and Playwright now grab the whole DOM, not just the body, allowing the converters access to metadata.

* Fixed printing of page titles in Playwright.

* Moved installation of WebSurfer dependencies to contrib-tests.yml

* Fixing pre-commit issues.

* Reverting conversable_agent, which should not have been changed in prior commit.

* Added RequestMarkdownBrowser tests.

* Fixed a bug with Bing search, and added search test cases.

* Added tests for Bing search.

* Added tests for md_convert

* Added test files.

* Added missing pptx.

* Added more tests for WebSurfer coverage.

* Fixed guard on requests_markdown_browser test.

* Updated test coverage for mdconvert.

* Fix browser_utils tests.

* Removed image test from browser, since exiftool isn't installed on test machine.

* Disable Selenium GPU and sandbox to ensure it runs headless in Docker.

* Added option for Bing API results to be interleaved (as Bing specifies) or presented in a categorized list (Web, News, Videos, etc.)

* Print more details when requests exceptions are thrown.

* Added additional documentation to markdown_search

* Added documentation to the selenium_markdown_browser.

* Added documentation to playwright_markdown_browser.py

* Added documentation to requests_markdown_browser

* Added documentation to mdconvert.py

* Updated agentchat_surfer notebook.

* Update .github/workflows/contrib-tests.yml

Co-authored-by: Davor Runje <[email protected]>

* Merge main. Resolve conflicts.

* Resolve pre-commit checks.

* Removed offending LFS file.

* Re-added offending LFS file.

* Fixed browser_utils tests.

* Fixed style errors.

---------

Co-authored-by: Asapanna Rakesh <[email protected]>
Co-authored-by: Vijay Ramesh <[email protected]>
Co-authored-by: Eric Zhu <[email protected]>
Co-authored-by: Davor Runje <[email protected]>
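
One of the commits above adds an option for Bing API results to be interleaved (as Bing specifies) rather than grouped by category. A minimal sketch of how round-robin interleaving of ranked category lists could work — an illustration only, not the library's implementation:

```python
from itertools import chain, zip_longest

def interleave_results(*categories):
    """Round-robin merge of ranked result lists (e.g. Web, News, Videos),
    so each category's top hits appear early in the combined list."""
    merged = chain.from_iterable(zip_longest(*categories))
    return [r for r in merged if r is not None]

web = ["web1", "web2", "web3"]
news = ["news1"]
videos = ["vid1", "vid2"]
print(interleave_results(web, news, videos))
# → ['web1', 'news1', 'vid1', 'web2', 'vid2', 'web3']
```

The alternative (categorized) mode would simply concatenate the lists under per-category headings instead.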
5 people authored Sep 25, 2024
1 parent 2e1f788 commit 0d5163b
Showing 26 changed files with 5,471 additions and 607 deletions.
5 changes: 4 additions & 1 deletion .github/workflows/contrib-tests.yml
@@ -134,6 +134,9 @@ jobs:
- name: Install packages and dependencies for RetrieveChat
run: |
pip install -e .[retrievechat]
- name: Install packages and dependencies for WebSurfer and browser_utils
run: |
pip install -e .[test,websurfer]
- name: Set AUTOGEN_USE_DOCKER based on OS
shell: bash
run: |
@@ -275,7 +278,7 @@ jobs:
fi
- name: Coverage
run: |
pytest test/test_browser_utils.py test/agentchat/contrib/test_web_surfer.py --skip-openai
pytest test/browser_utils test/agentchat/contrib/test_web_surfer.py --skip-openai
- name: Upload coverage to Codecov
uses: codecov/codecov-action@v3
with:
3 changes: 2 additions & 1 deletion .pre-commit-config.yaml
@@ -46,7 +46,8 @@ repos:
website/docs/tutorial/code-executors.ipynb |
website/docs/topics/code-execution/custom-executor.ipynb |
website/docs/topics/non-openai-models/cloud-gemini.ipynb |
notebook/.*
notebook/.* |
test/browser_utils/test_files/.*
)$
# See https://jaredkhan.com/blog/mypy-pre-commit
- repo: local
105 changes: 88 additions & 17 deletions autogen/agentchat/contrib/web_surfer.py
@@ -1,15 +1,13 @@
import copy
import json
import logging
import re
from dataclasses import dataclass
from datetime import datetime
from typing import Any, Callable, Dict, List, Literal, Optional, Tuple, Union

from typing_extensions import Annotated

from ... import Agent, AssistantAgent, ConversableAgent, GroupChat, GroupChatManager, OpenAIWrapper, UserProxyAgent
from ...browser_utils import SimpleTextBrowser
from ...browser_utils import AbstractMarkdownBrowser, BingMarkdownSearch, RequestsMarkdownBrowser
from ...code_utils import content_str
from ...oai.openai_utils import filter_config
from ...token_count_utils import count_token, get_max_token_limit
@@ -20,12 +18,9 @@
class WebSurferAgent(ConversableAgent):
"""(In preview) An agent that acts as a basic web surfer that can search the web and visit web pages."""

DEFAULT_PROMPT = (
"You are a helpful AI assistant with access to a web browser (via the provided functions). In fact, YOU ARE THE ONLY MEMBER OF YOUR PARTY WITH ACCESS TO A WEB BROWSER, so please help out where you can by performing web searches, navigating pages, and reporting what you find. Today's date is "
+ datetime.now().date().isoformat()
)
DEFAULT_PROMPT = "You are a helpful AI assistant with access to a web browser (via the provided functions). In fact, YOU ARE THE ONLY MEMBER OF YOUR PARTY WITH ACCESS TO A WEB BROWSER, so please help out where you can by performing web searches, navigating pages, and reporting what you find."

DEFAULT_DESCRIPTION = "A helpful assistant with access to a web browser. Ask them to perform web searches, open pages, navigate to Wikipedia, answer questions from pages, and or generate summaries."
DEFAULT_DESCRIPTION = "A helpful assistant with access to a web browser. Ask them to perform web searches, open pages, navigate to Wikipedia, download files, etc. Once on a desired page, ask them to answer questions by reading the page, generate summaries, find specific words or phrases on the page (ctrl+f), or even just scroll up or down in the viewport."

def __init__(
self,
@@ -40,7 +35,8 @@ def __init__(
llm_config: Optional[Union[Dict, Literal[False]]] = None,
summarizer_llm_config: Optional[Union[Dict, Literal[False]]] = None,
default_auto_reply: Optional[Union[str, Dict, None]] = "",
browser_config: Optional[Union[Dict, None]] = None,
browser_config: Optional[Union[Dict, None]] = None, # Deprecated
browser: Optional[Union[AbstractMarkdownBrowser, None]] = None,
**kwargs,
):
super().__init__(
@@ -60,11 +56,39 @@ def __init__(
self._create_summarizer_client(summarizer_llm_config, llm_config)

# Create the browser
self.browser = SimpleTextBrowser(**(browser_config if browser_config else {}))
if browser_config is not None:
if browser is not None:
raise ValueError(
"WebSurferAgent cannot accept both a 'browser_config' (deprecated) parameter and 'browser' parameter at the same time. Use only one or the other."
)

inner_llm_config = copy.deepcopy(llm_config)
# Print a warning
logger.warning(
"Warning: the parameter 'browser_config' in WebSurferAgent.__init__() is deprecated. Use 'browser' instead."
)

# Update the settings to the new format
_bconfig = {}
_bconfig.update(browser_config)

if "bing_api_key" in _bconfig:
_bconfig["search_engine"] = BingMarkdownSearch(
bing_api_key=_bconfig["bing_api_key"], interleave_results=False
)
del _bconfig["bing_api_key"]
else:
_bconfig["search_engine"] = BingMarkdownSearch()

if "request_kwargs" in _bconfig:
_bconfig["requests_get_kwargs"] = _bconfig["request_kwargs"]
del _bconfig["request_kwargs"]

self.browser = RequestsMarkdownBrowser(**_bconfig)
else:
self.browser = browser

# Set up the inner monologue
inner_llm_config = copy.deepcopy(llm_config)
self._assistant = AssistantAgent(
self.name + "_inner_assistant",
system_message=system_message, # type: ignore[arg-type]
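
The hunk above translates the deprecated `browser_config` dict into the keyword arguments the new `RequestsMarkdownBrowser` expects. The key mapping can be sketched standalone (a hypothetical helper for illustration; the real shim constructs a `BingMarkdownSearch` object where this sketch only records the key that would seed it):

```python
def migrate_browser_config(cfg):
    """Map the deprecated WebSurferAgent browser_config dict onto the
    new-style browser keyword arguments (standalone sketch)."""
    out = dict(cfg)
    # 'bing_api_key' now seeds a search_engine object instead of being
    # passed through directly; None mirrors the keyless default.
    out["search_engine"] = out.pop("bing_api_key", None)
    # 'request_kwargs' was renamed to 'requests_get_kwargs'.
    if "request_kwargs" in out:
        out["requests_get_kwargs"] = out.pop("request_kwargs")
    return out

print(migrate_browser_config({"bing_api_key": "KEY", "request_kwargs": {"timeout": 10}}))
# → {'search_engine': 'KEY', 'requests_get_kwargs': {'timeout': 10}}
```

Passing both `browser_config` and `browser` raises a `ValueError`, so existing callers migrate one parameter at a time.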
@@ -130,6 +154,7 @@ def _browser_state() -> Tuple[str, str]:
total_pages = len(self.browser.viewport_pages)

header += f"Viewport position: Showing page {current_page+1} of {total_pages}.\n"

return (header, self.browser.viewport)

@self._user_proxy.register_for_execution()
@@ -138,7 +163,7 @@ def _browser_state() -> Tuple[str, str]:
description="Perform an INFORMATIONAL web search query then return the search results.",
)
def _informational_search(query: Annotated[str, "The informational web search query to perform."]) -> str:
self.browser.visit_page(f"bing: {query}")
self.browser.visit_page(f"search: {query}")
header, content = _browser_state()
return header.strip() + "\n=======================\n" + content

@@ -148,9 +173,9 @@ def _informational_search(query: Annotated[str, "The informational web search qu
description="Perform a NAVIGATIONAL web search query then immediately navigate to the top result. Useful, for example, to navigate to a particular Wikipedia article or other known destination. Equivalent to Google's \"I'm Feeling Lucky\" button.",
)
def _navigational_search(query: Annotated[str, "The navigational web search query to perform."]) -> str:
self.browser.visit_page(f"bing: {query}")
self.browser.visit_page(f"search: {query}")

# Extract the first linl
# Extract the first link
m = re.search(r"\[.*?\]\((http.*?)\)", self.browser.page_content)
if m:
self.browser.visit_page(m.group(1))
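
The navigational search above picks the top result by scanning the rendered Markdown for the first link. The same regex can be exercised standalone:

```python
import re

def first_link(markdown_text):
    """Return the URL of the first Markdown-style link, mirroring the
    regex the navigational-search tool uses to pick the top result."""
    m = re.search(r"\[.*?\]\((http.*?)\)", markdown_text)
    return m.group(1) if m else None

page = "1. [AutoGen](https://github.com/microsoft/autogen) - docs\n2. [Other](https://example.com)"
print(first_link(page))
# → https://github.com/microsoft/autogen
```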
@@ -168,6 +193,15 @@ def _visit_page(url: Annotated[str, "The relative or absolute url of the webapge
header, content = _browser_state()
return header.strip() + "\n=======================\n" + content

@self._user_proxy.register_for_execution()
@self._assistant.register_for_llm(
name="download_file", description="Download a file at a given URL and, if possible, return its text."
)
def _download_file(url: Annotated[str, "The relative or absolute url of the file to be downloaded."]) -> str:
self.browser.visit_page(url)
header, content = _browser_state()
return header.strip() + "\n=======================\n" + content

@self._user_proxy.register_for_execution()
@self._assistant.register_for_llm(
name="page_up",
@@ -188,14 +222,51 @@ def _page_down() -> str:
header, content = _browser_state()
return header.strip() + "\n=======================\n" + content

@self._user_proxy.register_for_execution()
@self._assistant.register_for_llm(
name="find_on_page_ctrl_f",
description="Scroll the viewport to the first occurrence of the search string. This is equivalent to Ctrl+F.",
)
def _find_on_page_ctrl_f(
search_string: Annotated[
str, "The string to search for on the page. This search string supports wildcards like '*'"
]
) -> str:
find_result = self.browser.find_on_page(search_string)
header, content = _browser_state()

if find_result is None:
return (
header.strip()
+ "\n=======================\nThe search string '"
+ search_string
+ "' was not found on this page."
)
else:
return header.strip() + "\n=======================\n" + content

@self._user_proxy.register_for_execution()
@self._assistant.register_for_llm(
name="find_next",
description="Scroll the viewport to next occurrence of the search string.",
)
def _find_next() -> str:
find_result = self.browser.find_next()
header, content = _browser_state()

if find_result is None:
return header.strip() + "\n=======================\nThe search string was not found on this page."
else:
return header.strip() + "\n=======================\n" + content

if self.summarization_client is not None:

@self._user_proxy.register_for_execution()
@self._assistant.register_for_llm(
name="answer_from_page",
name="read_page_and_answer",
description="Uses AI to read the page and directly answer a given question based on the content.",
)
def _answer_from_page(
def _read_page_and_answer(
question: Annotated[Optional[str], "The question to directly answer."],
url: Annotated[Optional[str], "[Optional] The url of the page. (Defaults to the current page)"] = None,
) -> str:
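
The `find_on_page_ctrl_f` tool registered above advertises wildcard support via `'*'`. One plausible way such matching could be implemented (an assumption — the diff does not show the browser-side semantics):

```python
import re

def find_on_page(content, search_string):
    """Sketch of a ctrl+f-style search where '*' acts as a wildcard.
    Returns the (start, end) span of the first case-insensitive match,
    or None if the string is not found."""
    # Escape literal parts, join with a non-greedy "match anything".
    pattern = ".*?".join(re.escape(part) for part in search_string.split("*"))
    m = re.search(pattern, content, re.IGNORECASE)
    return m.span() if m else None

text = "The quick brown fox jumps over the lazy dog."
print(find_on_page(text, "quick*fox"))
# → (4, 19)
```

`find_next` would then resume the scan from the end of the previous match rather than from the top of the page.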
@@ -256,7 +327,7 @@ def _summarize_page(
Optional[str], "[Optional] The url of the page to summarize. (Defaults to current page)"
] = None,
) -> str:
return _answer_from_page(url=url, question=None)
return _read_page_and_answer(url=url, question=None)

def generate_surfer_reply(
self,
