Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

added Browserbase fetcher #221

Open
wants to merge 5 commits into
base: main
Choose a base branch
from
Open
Show file tree
Hide file tree
Changes from 2 commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
67 changes: 67 additions & 0 deletions integrations/browserbase.md
Original file line number Diff line number Diff line change
@@ -0,0 +1,67 @@
---
layout: integration
name: Browserbase
description: Use Browserbase headless browsers with Haystack
authors:
- name: Browserbase
socials:
github: https://github.com/browserbase
twitter: https://twitter.com/browserbasehq
linkedin: https://www.linkedin.com/company/browserbasehq
pypi: https://pypi.org/project/browserbase-haystack
repo: https://github.com/browserbase/haystack
report_issue: https://github.com/browserbase/haystack/issues
type: Data Ingestion
logo: /logos/browserbase.png
version: Haystack 2.0
---

# Browserbase Haystack Fetcher

[Browserbase](https://browserbase.com) is a serverless platform for running headless browsers, it offers advanced debugging, session recordings, stealth mode, integrated proxies and captcha solving.

## Installation and setup

- Get an API key from [browserbase.com](https://browserbase.com) and set it in environment variables (`BROWSERBASE_API_KEY`).
- Install the required dependencies:

```
pip install browserbase-haystack
```

## Usage

You can load webpages into Haystack using `BrowserbaseFetcher`. Optionally, you can set `text_content` parameter to convert the pages to text-only representation.

### Standalone

```py
from browserbase_haystack import BrowserbaseFetcher

browserbase_fetcher = BrowserbaseFetcher()
browserbase_fetcher.run(urls=["https://example.com"], text_content=False)
```

### In a pipeline

```py
from browserbase_haystack import BrowserbaseFetcher
from haystack import Pipeline
from haystack.components.generators import OpenAIGenerator
from haystack.components.builders import PromptBuilder

prompt_template = (
"Tell me the titles of the given pages. Pages: {{ documents }}"
)
prompt_builder = PromptBuilder(template=prompt_template)
llm = OpenAIGenerator()

pipe = Pipeline()
pipe.add_component("fetcher", self.browserbase_fetcher)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let's initialize this in this code block rather than referring to self.browserbase_fetcher

pipe.add_component("prompt_builder", prompt_builder)
pipe.add_component("llm", llm)

pipe.connect("fetcher.documents", "prompt_builder.documents")
pipe.connect("prompt_builder.prompt", "llm.prompt")
result = pipe.run(data={"fetcher": {"urls": ["https://example.com"]}})
```
Binary file added logos/browserbase.png
Loading
Sorry, something went wrong. Reload?
Sorry, we cannot display this file.
Sorry, this file is invalid so it cannot be displayed.