WebSurfer Updated (Selenium, Playwright, and support for many filetypes) #1929

Merged 61 commits on Sep 25, 2024
Commits
96683ee
Feat/headless browser (retargeted) (#1832)
INF800 Mar 2, 2024
348d676
Handle missing Selenium package.
afourney Mar 4, 2024
bb7a249
Added browser_chat.py example to simplify testing.
afourney Mar 4, 2024
7535226
Based browser on mdconvert. (#1847)
afourney Mar 4, 2024
8dc2220
Added an initial POC with Playwright.
afourney Mar 4, 2024
4e7e6a5
Merge branch 'main' into headless_web_surfer
afourney Mar 4, 2024
1d96568
Separated Bing search into its own utility module.
afourney Mar 8, 2024
21b1789
Simple browser now uses Bing tools.
afourney Mar 8, 2024
19bb19c
Updated Playwright browser to inherit from SimpleTextBrowser
afourney Mar 9, 2024
c6a7ee3
Got Selenium working too.
afourney Mar 9, 2024
d5d6644
Renamed classes and files for consistency.
afourney Mar 9, 2024
acb08c3
Added more instructions.
afourney Mar 9, 2024
d19c9c7
Merge branch 'main' into headless_web_surfer
afourney Mar 9, 2024
f595516
Initial work to support other search providers.
afourney Mar 12, 2024
e8e8de0
Merge branch 'headless_web_surfer' of github.com:microsoft/autogen in…
afourney Mar 12, 2024
df4e3e1
Added some basic behavior when the BING_API_KEY is missing.
afourney Mar 12, 2024
e33a2fa
Cleaned up some search results.
afourney Mar 12, 2024
e221a5f
Moved to using the requests.Session object. Moved Bing SERP parsing to…
afourney Mar 12, 2024
35c48fe
Added backward compatibility to WebSurferAgent
afourney Mar 12, 2024
df3ef28
Selenium and Playwright now grab the whole DOM, not just the body, all…
afourney Mar 12, 2024
0a52483
Fixed printing of page titles in Playwright.
afourney Mar 13, 2024
802f099
Merge branch 'main' into headless_web_surfer
afourney Mar 14, 2024
156e6f7
Moved installation of WebSurfer dependencies to contrib-tests.yml
afourney Mar 14, 2024
8744405
Fixing pre-commit issues.
afourney Mar 14, 2024
3c2a118
Reverting conversable_agent, which should not have been changed in pr…
afourney Mar 14, 2024
92bc064
Added RequestMarkdownBrowser tests.
afourney Mar 14, 2024
87119a4
Fixed a bug with Bing search, and added search test cases.
afourney Mar 15, 2024
ccf37a4
Added tests for Bing search.
afourney Mar 15, 2024
c33ac26
Added tests for md_convert
afourney Mar 15, 2024
6af5ff9
Added test files.
afourney Mar 15, 2024
9581c07
Added missing pptx.
afourney Mar 15, 2024
ecd5329
Added more tests for WebSurfer coverage.
afourney Mar 15, 2024
b5dca7e
Merge branch 'main' into headless_web_surfer
afourney Mar 15, 2024
25c78c0
Fixed guard on requests_markdown_browser test.
afourney Mar 15, 2024
de011b8
Updated test coverage for mdconvert.
afourney Mar 15, 2024
f897bf3
Fix browser_utils tests.
afourney Mar 16, 2024
3f8c65f
Removed image test from browser, since exiftool isn't installed on te…
afourney Mar 16, 2024
8e6b5e8
Removed image test from browser, since exiftool isn't installed on te…
afourney Mar 16, 2024
b280028
Merge branch 'main' into headless_web_surfer
afourney Mar 18, 2024
d3b6f68
Disable Selenium GPU and sandbox to ensure it runs headless in Docker.
afourney Mar 18, 2024
852ee33
Merge branch 'main' into headless_web_surfer
afourney Mar 27, 2024
f094e69
Added option for Bing API results to be interleaved (as Bing specifie…
afourney Mar 29, 2024
745dc21
Print more details when requests exceptions are thrown.
afourney Mar 29, 2024
fe8fa07
Merge branch 'main' into headless_web_surfer
afourney Apr 1, 2024
7353681
Added additional documentation to markdown_search
afourney Apr 1, 2024
a174d42
Added documentation to the selenium_markdown_browser.
afourney Apr 1, 2024
2f9de28
Added documentation to playwright_markdown_browser.py
afourney Apr 1, 2024
371b991
Added documentation to requests_markdown_browser
afourney Apr 1, 2024
1ac0a4d
Added documentation to mdconvert.py
afourney Apr 1, 2024
2c1398b
Updated agentchat_surfer notebook.
afourney Apr 1, 2024
6ba05c9
Merge branch 'main' into headless_web_surfer
ekzhu Apr 2, 2024
266cefc
Update .github/workflows/contrib-tests.yml
afourney Apr 2, 2024
8a6ebe1
Merge main. Resolve conflicts.
afourney May 20, 2024
ccbdd1b
Merge main. Resolve conflicts.
afourney May 20, 2024
37b3292
Resolve pre-commit checks.
afourney May 20, 2024
b1ca235
Merge branch 'main' into headless_web_surfer
ekzhu May 22, 2024
5304bab
Removed offending LFS file.
afourney Sep 25, 2024
a486843
Re-added offending LFS file.
afourney Sep 25, 2024
0c5a4a9
Merged main.
afourney Sep 25, 2024
764fb3f
Fixed browser_utils tests.
afourney Sep 25, 2024
42fe8f5
Fixed style errors.
afourney Sep 25, 2024
5 changes: 4 additions & 1 deletion .github/workflows/contrib-tests.yml
@@ -134,6 +134,9 @@ jobs:
       - name: Install packages and dependencies for RetrieveChat
         run: |
           pip install -e .[retrievechat]
+      - name: Install packages and dependencies for WebSurfer and browser_utils
+        run: |
+          pip install -e .[test,websurfer]
       - name: Set AUTOGEN_USE_DOCKER based on OS
         shell: bash
         run: |
@@ -275,7 +278,7 @@ jobs:
           fi
       - name: Coverage
         run: |
-          pytest test/test_browser_utils.py test/agentchat/contrib/test_web_surfer.py --skip-openai
+          pytest test/browser_utils test/agentchat/contrib/test_web_surfer.py --skip-openai
       - name: Upload coverage to Codecov
         uses: codecov/codecov-action@v3
         with:
3 changes: 2 additions & 1 deletion .pre-commit-config.yaml
@@ -46,7 +46,8 @@ repos:
                 website/docs/tutorial/code-executors.ipynb |
                 website/docs/topics/code-execution/custom-executor.ipynb |
                 website/docs/topics/non-openai-models/cloud-gemini.ipynb |
-                notebook/.*
+                notebook/.* |
+                test/browser_utils/test_files/.*
             )$
   # See https://jaredkhan.com/blog/mypy-pre-commit
   - repo: local
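As a rough check of the exclude change above, the pattern can be exercised with Python's `re` module. This is only a sketch: it assumes the hook's `exclude` value is a verbose-mode regular expression of the form `(?x)^( ... )$` (the opening line falls outside the hunk shown), it lists only three of the alternatives, and both sample paths are hypothetical.

```python
import re

# Assumed shape of the exclude pattern. The "(?x)^(" opener is an assumption,
# since only the tail of the pattern is visible in the hunk.
EXCLUDE = re.compile(
    r"""(?x)^(
        website/docs/topics/non-openai-models/cloud-gemini.ipynb |
        notebook/.* |
        test/browser_utils/test_files/.*
    )$"""
)


def is_excluded(path: str) -> bool:
    """Return True when the whole path matches one of the alternatives."""
    return EXCLUDE.match(path) is not None


# Hypothetical paths: a binary fixture and a regular test module.
fixture_excluded = is_excluded("test/browser_utils/test_files/test.pptx")
module_excluded = is_excluded("test/browser_utils/test_mdconvert.py")
print(fixture_excluded, module_excluded)
```

In verbose mode the whitespace and newlines inside the group are ignored, which is why each alternative can sit on its own line.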
105 changes: 88 additions & 17 deletions autogen/agentchat/contrib/web_surfer.py
@@ -1,15 +1,13 @@
import copy
import json
import logging
import re
from dataclasses import dataclass
from datetime import datetime
from typing import Any, Callable, Dict, List, Literal, Optional, Tuple, Union

from typing_extensions import Annotated

from ... import Agent, AssistantAgent, ConversableAgent, GroupChat, GroupChatManager, OpenAIWrapper, UserProxyAgent
from ...browser_utils import SimpleTextBrowser
from ...browser_utils import AbstractMarkdownBrowser, BingMarkdownSearch, RequestsMarkdownBrowser
from ...code_utils import content_str
from ...oai.openai_utils import filter_config
from ...token_count_utils import count_token, get_max_token_limit
@@ -20,12 +18,9 @@
 class WebSurferAgent(ConversableAgent):
     """(In preview) An agent that acts as a basic web surfer that can search the web and visit web pages."""

-    DEFAULT_PROMPT = (
-        "You are a helpful AI assistant with access to a web browser (via the provided functions). In fact, YOU ARE THE ONLY MEMBER OF YOUR PARTY WITH ACCESS TO A WEB BROWSER, so please help out where you can by performing web searches, navigating pages, and reporting what you find. Today's date is "
-        + datetime.now().date().isoformat()
-    )
+    DEFAULT_PROMPT = "You are a helpful AI assistant with access to a web browser (via the provided functions). In fact, YOU ARE THE ONLY MEMBER OF YOUR PARTY WITH ACCESS TO A WEB BROWSER, so please help out where you can by performing web searches, navigating pages, and reporting what you find."

-    DEFAULT_DESCRIPTION = "A helpful assistant with access to a web browser. Ask them to perform web searches, open pages, navigate to Wikipedia, answer questions from pages, and or generate summaries."
+    DEFAULT_DESCRIPTION = "A helpful assistant with access to a web browser. Ask them to perform web searches, open pages, navigate to Wikipedia, download files, etc. Once on a desired page, ask them to answer questions by reading the page, generate summaries, find specific words or phrases on the page (ctrl+f), or even just scroll up or down in the viewport."

     def __init__(
         self,
@@ -40,7 +35,8 @@ def __init__(
         llm_config: Optional[Union[Dict, Literal[False]]] = None,
         summarizer_llm_config: Optional[Union[Dict, Literal[False]]] = None,
         default_auto_reply: Optional[Union[str, Dict, None]] = "",
-        browser_config: Optional[Union[Dict, None]] = None,
+        browser_config: Optional[Union[Dict, None]] = None,  # Deprecated
+        browser: Optional[Union[AbstractMarkdownBrowser, None]] = None,
         **kwargs,
     ):
         super().__init__(
@@ -60,11 +56,39 @@ def __init__(
         self._create_summarizer_client(summarizer_llm_config, llm_config)

         # Create the browser
-        self.browser = SimpleTextBrowser(**(browser_config if browser_config else {}))
+        if browser_config is not None:
+            if browser is not None:
+                raise ValueError(
+                    "WebSurferAgent cannot accept both a 'browser_config' (deprecated) parameter and 'browser' parameter at the same time. Use only one or the other."
+                )

-        inner_llm_config = copy.deepcopy(llm_config)
+            # Print a warning
+            logger.warning(
+                "Warning: the parameter 'browser_config' in WebSurferAgent.__init__() is deprecated. Use 'browser' instead."
+            )
+
+            # Update the settings to the new format
+            _bconfig = {}
+            _bconfig.update(browser_config)
+
+            if "bing_api_key" in _bconfig:
+                _bconfig["search_engine"] = BingMarkdownSearch(
+                    bing_api_key=_bconfig["bing_api_key"], interleave_results=False
+                )
+                del _bconfig["bing_api_key"]
+            else:
+                _bconfig["search_engine"] = BingMarkdownSearch()
+
+            if "request_kwargs" in _bconfig:
+                _bconfig["requests_get_kwargs"] = _bconfig["request_kwargs"]
+                del _bconfig["request_kwargs"]
+
+            self.browser = RequestsMarkdownBrowser(**_bconfig)
+        else:
+            self.browser = browser

         # Set up the inner monologue
+        inner_llm_config = copy.deepcopy(llm_config)
         self._assistant = AssistantAgent(
             self.name + "_inner_assistant",
             system_message=system_message,  # type: ignore[arg-type]
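The backward-compatibility branch in this hunk boils down to a key-renaming step. A minimal, dependency-free sketch of that translation follows. `migrate_browser_config` and `make_search_engine` are hypothetical names introduced for illustration: the PR performs the translation inline in `__init__` and constructs a `BingMarkdownSearch`, whereas the stand-in factory here simply records its keyword arguments so the example runs without AutoGen installed.

```python
from typing import Any, Dict


def make_search_engine(**kwargs: Any) -> Dict[str, Any]:
    """Hypothetical stand-in for BingMarkdownSearch; it only records its arguments."""
    return dict(kwargs)


def migrate_browser_config(browser_config: Dict[str, Any]) -> Dict[str, Any]:
    """Translate a legacy 'browser_config' dict into the newer keyword names."""
    new_config = dict(browser_config)  # shallow copy, so the caller's dict is untouched

    # The legacy API key is replaced by a search engine object.
    if "bing_api_key" in new_config:
        new_config["search_engine"] = make_search_engine(
            bing_api_key=new_config.pop("bing_api_key"), interleave_results=False
        )
    else:
        new_config["search_engine"] = make_search_engine()

    # 'request_kwargs' is renamed to 'requests_get_kwargs'.
    if "request_kwargs" in new_config:
        new_config["requests_get_kwargs"] = new_config.pop("request_kwargs")

    return new_config


# Made-up legacy settings, for illustration only.
legacy = {"viewport_size": 2048, "bing_api_key": "dummy-key", "request_kwargs": {"timeout": 10}}
migrated = migrate_browser_config(legacy)
print(migrated)
```

Keys that are neither renamed nor replaced (here `viewport_size`) pass through unchanged, which mirrors how the hunk forwards the remaining entries to `RequestsMarkdownBrowser(**_bconfig)`.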
@@ -130,6 +154,7 @@ def _browser_state() -> Tuple[str, str]:
             total_pages = len(self.browser.viewport_pages)

             header += f"Viewport position: Showing page {current_page+1} of {total_pages}.\n"
+
             return (header, self.browser.viewport)

         @self._user_proxy.register_for_execution()
@@ -138,7 +163,7 @@ def _browser_state() -> Tuple[str, str]:
             description="Perform an INFORMATIONAL web search query then return the search results.",
         )
         def _informational_search(query: Annotated[str, "The informational web search query to perform."]) -> str:
-            self.browser.visit_page(f"bing: {query}")
+            self.browser.visit_page(f"search: {query}")
             header, content = _browser_state()
             return header.strip() + "\n=======================\n" + content

@@ -148,9 +173,9 @@ def _informational_search(query: Annotated[str, "The informational web search qu
             description="Perform a NAVIGATIONAL web search query then immediately navigate to the top result. Useful, for example, to navigate to a particular Wikipedia article or other known destination. Equivalent to Google's \"I'm Feeling Lucky\" button.",
         )
         def _navigational_search(query: Annotated[str, "The navigational web search query to perform."]) -> str:
-            self.browser.visit_page(f"bing: {query}")
+            self.browser.visit_page(f"search: {query}")

-            # Extract the first linl
+            # Extract the first link
             m = re.search(r"\[.*?\]\((http.*?)\)", self.browser.page_content)
             if m:
                 self.browser.visit_page(m.group(1))
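To illustrate the link-extraction step in `_navigational_search`, here is a small standalone sketch that applies the same regular expression to a hand-written Markdown snippet. The sample search results and the `first_link` helper are hypothetical; real search output will look different.

```python
import re
from typing import Optional

# Same pattern as in the hunk: a Markdown link whose target starts with "http".
LINK_PATTERN = re.compile(r"\[.*?\]\((http.*?)\)")


def first_link(markdown: str) -> Optional[str]:
    """Return the URL of the first Markdown link, or None if there is none."""
    m = LINK_PATTERN.search(markdown)
    return m.group(1) if m else None


# Made-up search results page, for illustration only.
sample = (
    "## Web Results\n"
    "1. [Example Domain](https://example.com/)\n"
    "2. [Another page](https://example.org/page)\n"
)

print(first_link(sample))           # https://example.com/
print(first_link("no links here"))  # None
```

Because the capture group is non-greedy, a URL that itself contains a closing parenthesis would be cut short at that character, which may matter for some article URLs.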
@@ -168,6 +193,15 @@ def _visit_page(url: Annotated[str, "The relative or absolute url of the webapge
             header, content = _browser_state()
             return header.strip() + "\n=======================\n" + content

+        @self._user_proxy.register_for_execution()
+        @self._assistant.register_for_llm(
+            name="download_file", description="Download a file at a given URL and, if possible, return its text."
+        )
+        def _download_file(url: Annotated[str, "The relative or absolute url of the file to be downloaded."]) -> str:
+            self.browser.visit_page(url)
+            header, content = _browser_state()
+            return header.strip() + "\n=======================\n" + content
+
         @self._user_proxy.register_for_execution()
         @self._assistant.register_for_llm(
             name="page_up",
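Every tool in these hunks returns the same shape of reply: the stripped header, a separator rule, then the viewport text. A standalone sketch of that formatting is shown below. The `format_reply` helper is hypothetical (the PR repeats the expression inline), and the `Address:` line and page values in the sample header are made up for illustration.

```python
SEPARATOR = "\n=======================\n"


def format_reply(header: str, content: str) -> str:
    """Join the browser-state header and the viewport text with the separator rule."""
    return header.strip() + SEPARATOR + content


# Made-up header; only the "Viewport position" line appears in the diff itself.
header = "Address: https://example.com/\nViewport position: Showing page 1 of 3.\n"
reply = format_reply(header, "Example Domain")
print(reply)
```

Stripping the header first avoids a blank line between the last header field and the rule, since the header text ends with a newline.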
@@ -188,14 +222,51 @@ def _page_down() -> str:
             header, content = _browser_state()
             return header.strip() + "\n=======================\n" + content

+        @self._user_proxy.register_for_execution()
+        @self._assistant.register_for_llm(
+            name="find_on_page_ctrl_f",
+            description="Scroll the viewport to the first occurrence of the search string. This is equivalent to Ctrl+F.",
+        )
+        def _find_on_page_ctrl_f(
+            search_string: Annotated[
+                str, "The string to search for on the page. This search string supports wildcards like '*'"
+            ]
+        ) -> str:
+            find_result = self.browser.find_on_page(search_string)
+            header, content = _browser_state()
+
+            if find_result is None:
+                return (
+                    header.strip()
+                    + "\n=======================\nThe search string '"
+                    + search_string
+                    + "' was not found on this page."
+                )
+            else:
+                return header.strip() + "\n=======================\n" + content
+
+        @self._user_proxy.register_for_execution()
+        @self._assistant.register_for_llm(
+            name="find_next",
+            description="Scroll the viewport to next occurrence of the search string.",
+        )
+        def _find_next() -> str:
+            find_result = self.browser.find_next()
+            header, content = _browser_state()
+
+            if find_result is None:
+                return header.strip() + "\n=======================\nThe search string was not found on this page."
+            else:
+                return header.strip() + "\n=======================\n" + content
+
         if self.summarization_client is not None:

             @self._user_proxy.register_for_execution()
             @self._assistant.register_for_llm(
-                name="answer_from_page",
+                name="read_page_and_answer",
                 description="Uses AI to read the page and directly answer a given question based on the content.",
             )
-            def _answer_from_page(
+            def _read_page_and_answer(
                 question: Annotated[Optional[str], "The question to directly answer."],
                 url: Annotated[Optional[str], "[Optional] The url of the page. (Defaults to the current page)"] = None,
             ) -> str:
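The `find_on_page_ctrl_f` description mentions wildcard support, but the browser-side `find_on_page` is not part of this hunk. One way such a search could work is sketched below, assuming the page is held as a list of viewport-sized text chunks and that `*` matches any run of characters. The `find_first_page` function, its wrap-around behaviour, and its case-insensitive matching are all hypothetical choices, not the PR's implementation.

```python
import re
from typing import List, Optional


def find_first_page(pages: List[str], query: str, start: int = 0) -> Optional[int]:
    """Return the index of the first page, searching from `start` and wrapping
    around, whose text matches the query; return None when nothing matches."""
    # Escape the whole query, then turn each escaped '*' back into a wildcard.
    pattern = re.escape(query.strip()).replace(r"\*", ".*?")
    regex = re.compile(pattern, re.IGNORECASE)

    for offset in range(len(pages)):
        idx = (start + offset) % len(pages)
        if regex.search(pages[idx]):
            return idx
    return None


# Made-up viewport pages, for illustration only.
pages = ["Intro text", "Selenium and Playwright backends", "Supported file types: pptx, docx"]

print(find_first_page(pages, "play*back"))  # 1
print(find_first_page(pages, "pptx", 2))    # 2
print(find_first_page(pages, "firefox"))    # None
```

Returning `None` for a miss matches how the tool functions above test `find_result is None` before deciding which message to send back.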
@@ -256,7 +327,7 @@ def _summarize_page(
                     Optional[str], "[Optional] The url of the page to summarize. (Defaults to current page)"
                 ] = None,
             ) -> str:
-                return _answer_from_page(url=url, question=None)
+                return _read_page_and_answer(url=url, question=None)

     def generate_surfer_reply(
         self,