Commit
WebSurfer Updated (Selenium, Playwright, and support for many filetypes) (#1929)

* Feat/headless browser (retargeted) (#1832)

* Add headless browser to the WebSurferAgent, closes #1481

* replace soup.get_text() with markdownify.MarkdownConverter().convert_soup(soup)

* import HeadlessChromeBrowser

* implicitly wait for 10s

* increase max. wait time to 99s

* fix: trim trailing whitespace

* test: fix headless tests

* better bing query search

* docs: add example 3 for headless option

---------

Co-authored-by: Vijay Ramesh <[email protected]>

* Handle missing Selenium package.

* Added browser_chat.py example to simplify testing.

* Based browser on mdconvert. (#1847)

* Based browser on mdconvert.

* Updated web_surfer.

* Renamed HeadlessChromeBrowser to SeleniumChromeBrowser

* Added an initial POC with Playwright.

* Separated Bing search into its own utility module.

* Simple browser now uses Bing tools.

* Updated Playwright browser to inherit from SimpleTextBrowser

* Got Selenium working too.

* Renamed classes and files for consistency.

* Added more instructions.

* Initial work to support other search providers.

* Added some basic behavior when the BING_API_KEY is missing.

* Cleaned up some search results.

* Moved to using the requests.Session object. Moved Bing SERP parsing to mdconvert to be more broadly useful.

* Added backward compatibility to WebSurferAgent

* Selenium and Playwright now grab the whole DOM, not just the body, allowing the converters access to metadata.

* Fixed printing of page titles in Playwright.

* Moved installation of WebSurfer dependencies to contrib-tests.yml

* Fixing pre-commit issues.

* Reverting conversable_agent, which should not have been changed in prior commit.

* Added RequestMarkdownBrowser tests.

* Fixed a bug with Bing search, and added search test cases.

* Added tests for Bing search.

* Added tests for md_convert

* Added test files.

* Added missing pptx.

* Added more tests for WebSurfer coverage.

* Fixed guard on requests_markdown_browser test.

* Updated test coverage for mdconvert.

* Fix browser_utils tests.

* Removed image test from browser, since exiftool isn't installed on test machine.

* Disable Selenium GPU and sandbox to ensure it runs headless in Docker.

* Added option for Bing API results to be interleaved (as Bing specifies) or presented in a categorized list (Web, News, Videos, etc.)

* Print more details when requests exceptions are thrown.

* Added additional documentation to markdown_search

* Added documentation to the selenium_markdown_browser.

* Added documentation to playwright_markdown_browser.py

* Added documentation to requests_markdown_browser

* Added documentation to mdconvert.py

* Updated agentchat_surfer notebook.

* Update .github/workflows/contrib-tests.yml

Co-authored-by: Davor Runje <[email protected]>

* Merge main. Resolve conflicts.

* Resolve pre-commit checks.

* Removed offending LFS file.

* Re-added offending LFS file.

* Fixed browser_utils tests.

* Fixed style errors.

---------

Co-authored-by: Asapanna Rakesh <[email protected]>
Co-authored-by: Vijay Ramesh <[email protected]>
Co-authored-by: Eric Zhu <[email protected]>
Co-authored-by: Davor Runje <[email protected]>
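
One of the commits above adds an option for Bing API results to be interleaved (as Bing specifies) rather than grouped by category. A minimal sketch of how round-robin interleaving of ranked category lists could work — an illustration only, not the library's implementation:

```python
from itertools import chain, zip_longest

def interleave_results(*categories):
    """Round-robin merge of ranked result lists (e.g. Web, News, Videos),
    so each category's top hits appear early in the combined list."""
    merged = chain.from_iterable(zip_longest(*categories))
    return [r for r in merged if r is not None]

web = ["web1", "web2", "web3"]
news = ["news1"]
videos = ["vid1", "vid2"]
print(interleave_results(web, news, videos))
# → ['web1', 'news1', 'vid1', 'web2', 'vid2', 'web3']
```

The alternative (categorized) mode would simply concatenate the lists under per-category headings instead.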
5 people authored Sep 25, 2024
1 parent 2e1f788 commit 0d5163b
Showing 26 changed files with 5,471 additions and 607 deletions.
5 changes: 4 additions & 1 deletion .github/workflows/contrib-tests.yml
@@ -134,6 +134,9 @@ jobs:
- name: Install packages and dependencies for RetrieveChat
run: |
pip install -e .[retrievechat]
- name: Install packages and dependencies for WebSurfer and browser_utils
run: |
pip install -e .[test,websurfer]
- name: Set AUTOGEN_USE_DOCKER based on OS
shell: bash
run: |
@@ -275,7 +278,7 @@ jobs:
fi
- name: Coverage
run: |
pytest test/test_browser_utils.py test/agentchat/contrib/test_web_surfer.py --skip-openai
pytest test/browser_utils test/agentchat/contrib/test_web_surfer.py --skip-openai
- name: Upload coverage to Codecov
uses: codecov/codecov-action@v3
with:
3 changes: 2 additions & 1 deletion .pre-commit-config.yaml
@@ -46,7 +46,8 @@ repos:
website/docs/tutorial/code-executors.ipynb |
website/docs/topics/code-execution/custom-executor.ipynb |
website/docs/topics/non-openai-models/cloud-gemini.ipynb |
notebook/.*
notebook/.* |
test/browser_utils/test_files/.*
)$
# See https://jaredkhan.com/blog/mypy-pre-commit
- repo: local
105 changes: 88 additions & 17 deletions autogen/agentchat/contrib/web_surfer.py
@@ -1,15 +1,13 @@
import copy
import json
import logging
import re
from dataclasses import dataclass
from datetime import datetime
from typing import Any, Callable, Dict, List, Literal, Optional, Tuple, Union

from typing_extensions import Annotated

from ... import Agent, AssistantAgent, ConversableAgent, GroupChat, GroupChatManager, OpenAIWrapper, UserProxyAgent
from ...browser_utils import SimpleTextBrowser
from ...browser_utils import AbstractMarkdownBrowser, BingMarkdownSearch, RequestsMarkdownBrowser
from ...code_utils import content_str
from ...oai.openai_utils import filter_config
from ...token_count_utils import count_token, get_max_token_limit
@@ -20,12 +18,9 @@
class WebSurferAgent(ConversableAgent):
"""(In preview) An agent that acts as a basic web surfer that can search the web and visit web pages."""

DEFAULT_PROMPT = (
"You are a helpful AI assistant with access to a web browser (via the provided functions). In fact, YOU ARE THE ONLY MEMBER OF YOUR PARTY WITH ACCESS TO A WEB BROWSER, so please help out where you can by performing web searches, navigating pages, and reporting what you find. Today's date is "
+ datetime.now().date().isoformat()
)
DEFAULT_PROMPT = "You are a helpful AI assistant with access to a web browser (via the provided functions). In fact, YOU ARE THE ONLY MEMBER OF YOUR PARTY WITH ACCESS TO A WEB BROWSER, so please help out where you can by performing web searches, navigating pages, and reporting what you find."

DEFAULT_DESCRIPTION = "A helpful assistant with access to a web browser. Ask them to perform web searches, open pages, navigate to Wikipedia, answer questions from pages, and or generate summaries."
DEFAULT_DESCRIPTION = "A helpful assistant with access to a web browser. Ask them to perform web searches, open pages, navigate to Wikipedia, download files, etc. Once on a desired page, ask them to answer questions by reading the page, generate summaries, find specific words or phrases on the page (ctrl+f), or even just scroll up or down in the viewport."

def __init__(
self,
@@ -40,7 +35,8 @@ def __init__(
llm_config: Optional[Union[Dict, Literal[False]]] = None,
summarizer_llm_config: Optional[Union[Dict, Literal[False]]] = None,
default_auto_reply: Optional[Union[str, Dict, None]] = "",
browser_config: Optional[Union[Dict, None]] = None,
browser_config: Optional[Union[Dict, None]] = None, # Deprecated
browser: Optional[Union[AbstractMarkdownBrowser, None]] = None,
**kwargs,
):
super().__init__(
@@ -60,11 +56,39 @@ def __init__(
self._create_summarizer_client(summarizer_llm_config, llm_config)

# Create the browser
self.browser = SimpleTextBrowser(**(browser_config if browser_config else {}))
if browser_config is not None:
if browser is not None:
raise ValueError(
"WebSurferAgent cannot accept both a 'browser_config' (deprecated) parameter and 'browser' parameter at the same time. Use only one or the other."
)

inner_llm_config = copy.deepcopy(llm_config)
# Print a warning
logger.warning(
"Warning: the parameter 'browser_config' in WebSurferAgent.__init__() is deprecated. Use 'browser' instead."
)

# Update the settings to the new format
_bconfig = {}
_bconfig.update(browser_config)

if "bing_api_key" in _bconfig:
_bconfig["search_engine"] = BingMarkdownSearch(
bing_api_key=_bconfig["bing_api_key"], interleave_results=False
)
del _bconfig["bing_api_key"]
else:
_bconfig["search_engine"] = BingMarkdownSearch()

if "request_kwargs" in _bconfig:
_bconfig["requests_get_kwargs"] = _bconfig["request_kwargs"]
del _bconfig["request_kwargs"]

self.browser = RequestsMarkdownBrowser(**_bconfig)
else:
self.browser = browser

# Set up the inner monologue
inner_llm_config = copy.deepcopy(llm_config)
self._assistant = AssistantAgent(
self.name + "_inner_assistant",
system_message=system_message, # type: ignore[arg-type]
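
The hunk above translates the deprecated `browser_config` dict into the keyword arguments the new `RequestsMarkdownBrowser` expects. The key mapping can be sketched standalone (a hypothetical helper for illustration; the real shim constructs a `BingMarkdownSearch` object where this sketch only records the key that would seed it):

```python
def migrate_browser_config(cfg):
    """Map the deprecated WebSurferAgent browser_config dict onto the
    new-style browser keyword arguments (standalone sketch)."""
    out = dict(cfg)
    # 'bing_api_key' now seeds a search_engine object instead of being
    # passed through directly; None mirrors the keyless default.
    out["search_engine"] = out.pop("bing_api_key", None)
    # 'request_kwargs' was renamed to 'requests_get_kwargs'.
    if "request_kwargs" in out:
        out["requests_get_kwargs"] = out.pop("request_kwargs")
    return out

print(migrate_browser_config({"bing_api_key": "KEY", "request_kwargs": {"timeout": 10}}))
# → {'search_engine': 'KEY', 'requests_get_kwargs': {'timeout': 10}}
```

Passing both `browser_config` and `browser` raises a `ValueError`, so existing callers migrate one parameter at a time.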
@@ -130,6 +154,7 @@ def _browser_state() -> Tuple[str, str]:
total_pages = len(self.browser.viewport_pages)

header += f"Viewport position: Showing page {current_page+1} of {total_pages}.\n"

return (header, self.browser.viewport)

@self._user_proxy.register_for_execution()
@@ -138,7 +163,7 @@ def _browser_state() -> Tuple[str, str]:
description="Perform an INFORMATIONAL web search query then return the search results.",
)
def _informational_search(query: Annotated[str, "The informational web search query to perform."]) -> str:
self.browser.visit_page(f"bing: {query}")
self.browser.visit_page(f"search: {query}")
header, content = _browser_state()
return header.strip() + "\n=======================\n" + content

@@ -148,9 +173,9 @@ def _informational_search(query: Annotated[str, "The informational web search qu
description="Perform a NAVIGATIONAL web search query then immediately navigate to the top result. Useful, for example, to navigate to a particular Wikipedia article or other known destination. Equivalent to Google's \"I'm Feeling Lucky\" button.",
)
def _navigational_search(query: Annotated[str, "The navigational web search query to perform."]) -> str:
self.browser.visit_page(f"bing: {query}")
self.browser.visit_page(f"search: {query}")

# Extract the first linl
# Extract the first link
m = re.search(r"\[.*?\]\((http.*?)\)", self.browser.page_content)
if m:
self.browser.visit_page(m.group(1))
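
The navigational search above picks the top result by scanning the rendered Markdown for the first link. The same regex can be exercised standalone:

```python
import re

def first_link(markdown_text):
    """Return the URL of the first Markdown-style link, mirroring the
    regex the navigational-search tool uses to pick the top result."""
    m = re.search(r"\[.*?\]\((http.*?)\)", markdown_text)
    return m.group(1) if m else None

page = "1. [AutoGen](https://github.com/microsoft/autogen) - docs\n2. [Other](https://example.com)"
print(first_link(page))
# → https://github.com/microsoft/autogen
```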
@@ -168,6 +193,15 @@ def _visit_page(url: Annotated[str, "The relative or absolute url of the webapge
header, content = _browser_state()
return header.strip() + "\n=======================\n" + content

@self._user_proxy.register_for_execution()
@self._assistant.register_for_llm(
name="download_file", description="Download a file at a given URL and, if possible, return its text."
)
def _download_file(url: Annotated[str, "The relative or absolute url of the file to be downloaded."]) -> str:
self.browser.visit_page(url)
header, content = _browser_state()
return header.strip() + "\n=======================\n" + content

@self._user_proxy.register_for_execution()
@self._assistant.register_for_llm(
name="page_up",
@@ -188,14 +222,51 @@ def _page_down() -> str:
header, content = _browser_state()
return header.strip() + "\n=======================\n" + content

@self._user_proxy.register_for_execution()
@self._assistant.register_for_llm(
name="find_on_page_ctrl_f",
description="Scroll the viewport to the first occurrence of the search string. This is equivalent to Ctrl+F.",
)
def _find_on_page_ctrl_f(
search_string: Annotated[
str, "The string to search for on the page. This search string supports wildcards like '*'"
]
) -> str:
find_result = self.browser.find_on_page(search_string)
header, content = _browser_state()

if find_result is None:
return (
header.strip()
+ "\n=======================\nThe search string '"
+ search_string
+ "' was not found on this page."
)
else:
return header.strip() + "\n=======================\n" + content

@self._user_proxy.register_for_execution()
@self._assistant.register_for_llm(
name="find_next",
description="Scroll the viewport to next occurrence of the search string.",
)
def _find_next() -> str:
find_result = self.browser.find_next()
header, content = _browser_state()

if find_result is None:
return header.strip() + "\n=======================\nThe search string was not found on this page."
else:
return header.strip() + "\n=======================\n" + content

if self.summarization_client is not None:

@self._user_proxy.register_for_execution()
@self._assistant.register_for_llm(
name="answer_from_page",
name="read_page_and_answer",
description="Uses AI to read the page and directly answer a given question based on the content.",
)
def _answer_from_page(
def _read_page_and_answer(
question: Annotated[Optional[str], "The question to directly answer."],
url: Annotated[Optional[str], "[Optional] The url of the page. (Defaults to the current page)"] = None,
) -> str:
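
The `find_on_page_ctrl_f` tool registered above advertises wildcard support via `'*'`. One plausible way such matching could be implemented (an assumption — the diff does not show the browser-side semantics):

```python
import re

def find_on_page(content, search_string):
    """Sketch of a ctrl+f-style search where '*' acts as a wildcard.
    Returns the (start, end) span of the first case-insensitive match,
    or None if the string is not found."""
    # Escape literal parts, join with a non-greedy "match anything".
    pattern = ".*?".join(re.escape(part) for part in search_string.split("*"))
    m = re.search(pattern, content, re.IGNORECASE)
    return m.span() if m else None

text = "The quick brown fox jumps over the lazy dog."
print(find_on_page(text, "quick*fox"))
# → (4, 19)
```

`find_next` would then resume the scan from the end of the previous match rather than from the top of the page.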
@@ -256,7 +327,7 @@ def _summarize_page(
Optional[str], "[Optional] The url of the page to summarize. (Defaults to current page)"
] = None,
) -> str:
return _answer_from_page(url=url, question=None)
return _read_page_and_answer(url=url, question=None)

def generate_surfer_reply(
self,
