Add headless browser to the WebSurferAgent #1534
Conversation
Codecov Report
Attention: patch coverage details and impacted files below.

@@            Coverage Diff             @@
##               0.2    #1534       +/-   ##
===========================================
+ Coverage   34.26%   48.00%   +13.73%
===========================================
  Files          42       45        +3
  Lines        5099     5225      +126
  Branches     1165     1261       +96
===========================================
+ Hits         1747     2508      +761
+ Misses       3209     2513      -696
- Partials      143      204       +61

Flags with carried forward coverage won't be shown. View the full report in Codecov by Sentry.
First of all, this looks fantastic. Thanks so much for working on this. I will dig in and try this out as soon as possible today. We've discussed some possibilities on Discord already, but those are arguably future improvements. I think driving the browser and using innerHTML are perfect first steps, and completely sufficient for a first PR.

One thing to test will be PDFs. In many benchmark scenarios, PDFs are the final document sought by browsing, but they don't have innerHTML. As a first step, simply downloading them to the Downloads folder might be sufficient.
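One hedged way to get that first-step behavior (not part of this PR; the download path and preference keys below are illustrative, and exact behavior varies with the Chrome/Selenium version) is to configure headless Chrome to download PDFs instead of trying to render them:

```python
from selenium import webdriver

options = webdriver.ChromeOptions()
options.add_argument("--headless=new")
options.add_experimental_option(
    "prefs",
    {
        "download.default_directory": "/tmp/downloads",  # illustrative folder
        "download.prompt_for_download": False,
        "plugins.always_open_pdf_externally": True,      # save PDFs rather than open the viewer
    },
)
driver = webdriver.Chrome(options=options)

# Navigating to a PDF now drops the file into the download directory (URL is hypothetical).
driver.get("https://example.com/some-report.pdf")
```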
b8e400d to 266a9ce
266a9ce to db79a1b
1499ae3 to b0ab6c1
Just adding some notes while I continue to test this.

To install Chrome in Docker or WSL:

wget https://dl.google.com/linux/direct/google-chrome-stable_current_amd64.deb

Resolve any missing dependencies, then try again.
script.extract()

# Convert to text
text = soup.get_text()
At least for now this should be:
text = markdownify.MarkdownConverter().convert_soup(soup)
GPT models reason about markdown well, and we want to preserve links etc.
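A minimal sketch of that suggestion, assuming the beautifulsoup4 and markdownify packages (the sample HTML is made up):

```python
from bs4 import BeautifulSoup
import markdownify

html = "<h1>Results</h1><p>See the <a href='https://example.com'>docs</a>.</p>"
soup = BeautifulSoup(html, "html.parser")

# Drop script/style content before converting, as the code above already does
for tag in soup(["script", "style"]):
    tag.extract()

# Markdown keeps headings and links that soup.get_text() would flatten away
text = markdownify.MarkdownConverter().convert_soup(soup)
print(text)
```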
if uri_or_path.startswith("bing:"): | ||
self._bing_search(uri_or_path[len("bing:") :].strip()) | ||
else: | ||
self.driver.get(uri_or_path) |
In the SimpleTextBrowser, setting this would also process the content. I recommend we maintain that behavior (again, at least for now).
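A sketch of one way to keep that contract; _process_current_page is a hypothetical helper name, not something from this PR:

```python
def set_address(self, uri_or_path: str) -> None:
    if uri_or_path.startswith("bing:"):
        self._bing_search(uri_or_path[len("bing:"):].strip())
    else:
        self.driver.get(uri_or_path)
    # Mirror SimpleTextBrowser: refresh the extracted page content on every navigation
    self._process_current_page()
```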
def _split_pages(self):
    # Split only regular pages
    if not self.address.startswith("http:") and not self.address.startswith("https:"):
One intention here with SimpleTextBrowser is to not break up web search results across multiple viewports. In SimpleTextBrowser, that is indicated by "bing:" URLs, but this won't work here. I would recommend some other mechanism to ensure search results are kept whole to maintain the behavior.
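One possible mechanism, sketched below; the _last_request_was_search flag is an assumption, and viewport_pages/page_content follow the SimpleTextBrowser naming but may differ here:

```python
def _split_pages(self):
    if self._last_request_was_search:
        # Keep search result pages in a single viewport so results are never split up
        self.viewport_pages = [(0, len(self.page_content))]
        return
    # ... fall through to the existing splitting logic for regular pages ...
```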
def _bing_search(self, query):
    self.driver.get("https://www.bing.com")

    search_bar = self.driver.find_element(By.NAME, "q")
Clever, but maybe we can just navigate directly to "https://www.bing.com/search?q=" with the query provided in the URL itself?
Also, an advantage of using the API is that we can render results to Markdown cleanly however we want; that may not happen here. I recommend using the API if we have a key available, otherwise falling back to this method.
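A sketch of the direct-URL fallback (to be used only when no Bing API key is configured):

```python
import urllib.parse

def _bing_search(self, query: str) -> None:
    # Put the query in the URL itself; no need to locate and fill the search box
    self.driver.get("https://www.bing.com/search?q=" + urllib.parse.quote_plus(query))
```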
We may want to use implicit waits with driver.implicitly_wait(10). Given the dynamic nature of the page, calling line 126 right after line 124 will cause a NoSuchElementException.
"Note that as soon as the element is located, the driver will return the element reference and the code will continue executing, so a larger implicit wait value won't necessarily increase the duration of the session." So here implicitly_wait(10) will not always wait for 10 seconds; it stops waiting as soon as the element is located.
✅ There are no secrets present in this pull request anymore. If these secrets were true positives and are still valid, we highly recommend you revoke them.
This PR is against AutoGen 0.2. AutoGen 0.2 has been moved to the 0.2 branch. Please rebase your PR on the 0.2 branch or update it to work with the new AutoGen 0.4 that is now in main.
@vijaykramesh Closing as stale; please update the PR to address the reviews if you would like to reopen it.
Why are these changes needed?
This adds a simple Selenium-driven headless browser to the WebSurferAgent. It does not yet make the headless browser agent multi-modal (e.g., it can't do anything with images in the browser), but it should work much better for JavaScript-powered websites than the current SimpleTextBrowser implementation.
This also refactors the existing SimpleTextBrowser implementation a bit so that both share a common base class, and adds unit tests covering both the existing implementation and the new HeadlessChromeBrowser.
https://app.codecov.io/github/microsoft/autogen/pull/1534 shows the added coverage.
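For context, a rough sketch of the kind of shared surface such a base class might expose (class and method names here are illustrative, not the PR's actual API):

```python
from abc import ABC, abstractmethod

class AbstractBrowser(ABC):
    """Interface shared by SimpleTextBrowser and HeadlessChromeBrowser (names illustrative)."""

    @abstractmethod
    def set_address(self, uri_or_path: str) -> None:
        """Navigate to a URL, a local path, or a 'bing:' search query."""

    @abstractmethod
    def page_down(self) -> None:
        """Advance to the next viewport of the current page."""

    @property
    @abstractmethod
    def viewport(self) -> str:
        """Text of the current viewport."""
```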
Related issue number
Closes #1481
Checks