Merged
110 changes: 60 additions & 50 deletions README.md
Original file line number Diff line number Diff line change
@@ -17,7 +17,7 @@

**Fara-7B** is Microsoft's first **agentic small language model (SLM)** designed specifically for computer use. With only 7 billion parameters, Fara-7B is an ultra-compact Computer Use Agent (CUA) that achieves state-of-the-art performance within its size class and is competitive with larger, more resource-intensive agentic systems.

Try Fara-7B locally as follows (see [Installation](##Installation) for detailed instructions) or via Magentic-UI:
Try Fara-7B locally as follows (see [Installation](#installation) for detailed instructions on Windows) or via Magentic-UI:

```bash
# 1. Clone repository
@@ -44,7 +44,7 @@ To try Fara-7B inside Magentic-UI, please follow the instructions here [Magentic


Notes:
- If you're using Windows, we highly recommend using WSL2 (Windows Subsystem for Linux).
- If you're using Windows, we highly recommend using WSL2 (Windows Subsystem for Linux). Please follow the Windows instructions in the [Installation](#installation) section.


Please the Windows instructions in the Installation section

Missing word?
Please FOLLOW the Windows instructions...?
Please VIEW the Windows instructions...?

- You might need to do `--tensor-parallel-size 2` with vllm command if you run out of memory

<table>
@@ -156,27 +156,45 @@ Our evaluation setup leverages:

---

## Installation
# Installation

Install the package using either UV or pip:

```bash
uv sync --all-extras
```
## Linux

The following instructions are for Linux systems; see the Windows section below for Windows instructions.

or
Install the package using pip and set up the environment with Playwright:

```bash
pip install -e .
# 1. Clone repository
git clone https://github.com/microsoft/fara.git
cd fara

# 2. Setup environment
python3 -m venv .venv
source .venv/bin/activate
pip install -e .[vllm]
playwright install
```

Then install Playwright browsers:
Note: If you plan on hosting with Azure Foundry only, you can skip the `[vllm]` extra and just run `pip install -e .`


## Windows

For Windows, we highly recommend using WSL2 (Windows Subsystem for Linux) to provide a Linux-like environment. However, if you prefer to run natively on Windows, follow these steps:

```bash
playwright install
```
# 1. Clone repository
git clone https://github.com/microsoft/fara.git
cd fara

---
# 2. Setup environment
python3 -m venv .venv
.venv\Scripts\activate
pip install -e .
python3 -m playwright install
```

## Hosting the Model

@@ -189,74 +189,66 @@ Deploy Fara-7B on [Azure Foundry](https://ai.azure.com/explore/models/Fara-7B/ve
**Setup:**

1. Deploy the Fara-7B model on Azure Foundry and obtain your endpoint URL and API key
2. Add your endpoint details to the existing `endpoint_configs/` directory (example configs are already provided):

```bash
# Edit one of the existing config files or create a new one
# endpoint_configs/fara-7b-hosting-ansrz.json (example format):
Then create an endpoint configuration JSON file (e.g., `azure_foundry_config.json`):

```json
{
"model": "Fara-7B",
"base_url": "https://your-endpoint.inference.ml.azure.com/",
"api_key": "YOUR_API_KEY_HERE"
}
```
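As a sketch, loading and validating such a config file presumably amounts to the following (the helper name is hypothetical, not part of the actual package; the field names come from the example above):

```python
import json

# Fields the example config above provides; assumed to be required.
REQUIRED_FIELDS = ("model", "base_url", "api_key")

def load_endpoint_config(path):
    """Read an endpoint config JSON and check the expected fields are present."""
    with open(path) as f:
        config = json.load(f)
    missing = [key for key in REQUIRED_FIELDS if key not in config]
    if missing:
        raise ValueError(f"endpoint config missing fields: {missing}")
    return config
```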

3. Run the Fara agent:
Then you can run Fara-7B using this endpoint configuration.

2. Run the Fara agent:

```bash
fara-cli --task "how many pages does wikipedia have" --start_page "https://www.bing.com"
fara-cli --task "how many pages does wikipedia have" --endpoint_config azure_foundry_config.json [--headful]
```

That's it! No GPU or model downloads required.
Note: you can also specify the endpoint config with the args `--base_url [your_base_url] --api_key [your_api_key] --model [your_model_name]` instead of using a config JSON file.
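The precedence implied above (CLI flags override config-file values) can be sketched as follows; the helper name is illustrative, not the actual implementation:

```python
def merge_endpoint_config(file_config, api_key=None, base_url=None, model=None):
    """Return a config dict where any provided CLI value overrides the file value."""
    config = dict(file_config)  # copy so the caller's dict is not mutated
    for key, value in (("api_key", api_key), ("base_url", base_url), ("model", model)):
        if value is not None:
            config[key] = value
    return config
```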

### Self-hosting with VLLM

If you have access to GPU resources, you can self-host Fara-7B using VLLM. This requires a GPU machine with sufficient VRAM.

All that is required is to run the following command to start the VLLM server:
Note: If you see an error that the `fara-cli` command is not found, then try:

```bash
vllm serve "microsoft/Fara-7B" --port 5000 --dtype auto
python -m fara.run_fara --task "what is the weather in new york now"
```

### Testing the Fara Agent
That's it! No GPU or model downloads required.

Run the test script to see Fara in action:
### Self-hosting with vLLM or LM Studio / Ollama

**If you have access to GPU resources, you can self-host Fara-7B using vLLM. This requires a GPU machine with sufficient VRAM (e.g., 24GB or more).**

On Linux, all that is required is to run the following command to start the vLLM server:

```bash
fara-cli --task "how many pages does wikipedia have" --start_page "https://www.bing.com" --endpoint_config endpoint_configs/azure_foundry_config.json [--headful] [--downloads_folder "/path/to/downloads"] [--save_screenshots] [--max_rounds 100] [--browserbase]
vllm serve "microsoft/Fara-7B" --port 5000 --dtype auto
```
For quantized models or lower VRAM GPUs, please see [Fara-7B GGUF on HuggingFace](https://huggingface.co/bartowski/microsoft_Fara-7B-GGUF).

In self-hosting scenario the `endpoint_config` points to `endpoint_configs/vllm_config.json` from the VLLM server above.
**For Windows/Mac, vLLM is not natively supported. You can use WSL2 on Windows to run the above command, or use LM Studio / Ollama as described below.**

If you set `--browserbase`, export environment variables for the API key and project ID.
Otherwise, you can use [LM Studio](https://lmstudio.ai/) or [Ollama](https://ollama.com/) to host the model locally. We currently recommend the GGUF versions of our model at [Fara-7B GGUF on HuggingFace](https://huggingface.co/bartowski/microsoft_Fara-7B-GGUF) for use with LM Studio or Ollama. Select the largest variant that fits your GPU, ensure the context length is set to at least 15000 tokens, and set temperature to 0 for best results.
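As a rough sketch, a request to such a local OpenAI-compatible server would carry the settings recommended above (temperature 0). The base URL and model name are assumptions: LM Studio defaults to port 1234 (Ollama to 11434), and the model identifier depends on how you loaded the GGUF in your server:

```python
import json
import urllib.request

# Assumed local server; LM Studio's OpenAI-compatible API defaults to port 1234.
BASE_URL = "http://localhost:1234/v1"

def build_chat_request(prompt, model="microsoft_Fara-7B-GGUF"):
    """Build an OpenAI-compatible chat-completions request with temperature 0."""
    payload = {
        "model": model,        # name as registered in your local server (assumption)
        "temperature": 0,      # recommended above for best results
        "messages": [{"role": "user", "content": prompt}],
    }
    return urllib.request.Request(
        f"{BASE_URL}/chat/completions",
        data=json.dumps(payload).encode(),
        headers={"Content-Type": "application/json"},
    )

# To actually send it (requires a running server):
# with urllib.request.urlopen(build_chat_request("hello")) as resp:
#     print(json.load(resp))
```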

#### Expected Output

```
Initializing Browser...
Browser Running... Starting Fara Agent...
##########################################
Task: how many pages does wikipedia have
##########################################
Running Fara...
Then you can run Fara-7B pointing to your local server:

Run the test script to see Fara in action:

Thought #1: To find the current number of Wikipedia pages, I'll search for the latest Wikipedia page count statistics.
Action #1: executing tool 'web_search' with arguments {"action": "web_search", "query": "Wikipedia total number of articles"}
Observation#1: I typed 'Wikipedia total number of articles' into the browser search bar.
```bash
fara-cli --task "what is the weather in new york now"
```

Thought #2: Wikipedia currently has 7,095,446 articles.
Action #2: executing tool 'terminate' with arguments {"action": "terminate", "status": "success"}
Observation#2: Wikipedia currently has 7,095,446 articles.
If you didn't use vLLM to host, please specify the correct `--base_url [your_base_url] --api_key [your_api_key] --model [your_model_name]`.

Final Answer: Wikipedia currently has 7,095,446 articles.
If you see an error that the `fara-cli` command is not found, then try:

Enter another task (or press Enter to exit):
```bash
python -m fara.run_fara --task "what is the weather in new york now"
```

---

# Reproducibility

We provide a framework in `webeval/` to reproduce our results on WebVoyager and OnlineMind2Web.
8 changes: 7 additions & 1 deletion pyproject.toml
@@ -32,14 +32,20 @@ dependencies = [
"pyyaml",
"jsonschema",
"browserbase",
"vllm>=0.10.0"
]




[project.urls]
Homepage = "https://github.com/microsoft/fara"
Repository = "https://github.com/microsoft/fara"
Issues = "https://github.com/microsoft/fara/issues"

[project.optional-dependencies]
vllm = ["vllm>=0.10.0"]

vLLM crashes the Windows install, so kept it optional

lmstudio = ["lmstudio"]
ollama = ["ollama"]

[project.scripts]
fara-cli = "fara.run_fara:main"
9 changes: 5 additions & 4 deletions src/fara/browser/browser_bb.py
@@ -6,7 +6,7 @@
import subprocess
import time
from typing import Any, Dict, Optional, Callable

import platform
import browserbase
from browserbase import Browserbase
from playwright.async_api import (
@@ -48,7 +48,7 @@ def __init__(
self.single_tab_mode = single_tab_mode
self.use_browser_base = use_browser_base
self.logger = logger or logging.getLogger("browser_manager")

self.is_linux = platform.system() == "Linux"
self._viewport_height = viewport_height
self._viewport_width = viewport_width

@@ -194,7 +194,8 @@ async def delayed_resume():

async def _init_regular_browser(self, channel: str = "chromium") -> None:
"""Initialize regular browser according to the specified channel."""
if not self.headless:
if not self.headless and self.is_linux:
print("STARTING XVFB")
self.start_xvfb()

launch_args: Dict[str, Any] = {"headless": self.headless}
@@ -218,7 +219,7 @@ async def _init_regular_browser(self, channel: str = "chromium") -> None:

async def _init_persistent_browser(self) -> None:
"""Initialize persistent browser with data directory."""
if not self.headless:
if not self.headless and self.is_linux:
self.start_xvfb()

launch_args: Dict[str, Any] = {"headless": self.headless}
15 changes: 10 additions & 5 deletions src/fara/fara_agent.py
@@ -15,7 +15,7 @@
import asyncio
from .browser.playwright_controller import PlaywrightController
from ._prompts import get_computer_use_system_prompt
from .types import (
from .fara_types import (
LLMMessage,
SystemMessage,
UserMessage,
@@ -379,15 +379,20 @@ async def run(self, user_message: str) -> Tuple:
thoughts, action_dict = self._parse_thoughts_and_action(raw_response)
action_args = action_dict.get("arguments", {})
action = action_args["action"]
self.logger.info(f"\nThought #{i+1}: {thoughts}\nAction #{i+1}: executing tool '{action}' with arguments {json.dumps(action_args)}")

self.logger.debug(
f"\nThought #{i+1}: {thoughts}\nAction #{i+1}: executing tool '{action}' with arguments {json.dumps(action_args)}"
)
print(
f"\nThought #{i+1}: {thoughts}\nAction #{i+1}: executing tool '{action}' with arguments {json.dumps(action_args)}"
)
(
is_stop_action,
new_screenshot,
action_description,
) = await self.execute_action(function_call)
all_observations.append(action_description)
self.logger.info(f"Observation#{i+1}: {action_description}")
self.logger.debug(f"Observation#{i+1}: {action_description}")
print(f"Observation#{i+1}: {action_description}")

Windows doesn't show the logger output, so added a print

if is_stop_action:
final_answer = thoughts
break
@@ -564,7 +569,7 @@ async def execute_action(
elif args["action"] == "pause_and_memorize_fact":
fact = str(args.get("fact"))
self._facts.append(fact)
action_description= f"I memorized the following fact: {fact}"
action_description = f"I memorized the following fact: {fact}"
elif args["action"] == "stop" or args["action"] == "terminate":
action_description = args.get("thoughts")
is_stop_action = True
File renamed without changes.
50 changes: 40 additions & 10 deletions src/fara/run_fara.py
@@ -1,8 +1,8 @@
import asyncio
import argparse
import os
from fara import FaraAgent
from fara.browser.browser_bb import BrowserBB
from .fara_agent import FaraAgent
from .browser.browser_bb import BrowserBB
import logging
from typing import Dict
from pathlib import Path
@@ -11,8 +11,8 @@

# Configure logging to only show logs from fara.fara_agent
logging.basicConfig(
level=logging.CRITICAL,
format="%(message)s",
level=logging.CRITICAL,
format="%(message)s",
)

# Enable INFO level only for fara.fara_agent
@@ -159,21 +159,51 @@ def main():
default=None,
help="Path to the endpoint configuration JSON file. By default, tries local vllm on 5000 port",
)
parser.add_argument(
"--api_key",
type=str,
default=None,
help="API key for the model endpoint (overrides endpoint_config)",
)
parser.add_argument(
"--base_url",
type=str,
default=None,
help="Base URL for the model endpoint (overrides endpoint_config)",
)
parser.add_argument(
"--model",
type=str,
default=None,
help="Model name to use (overrides endpoint_config)",
)

args = parser.parse_args()

if args.browserbase:
assert os.environ.get("BROWSERBASE_API_KEY"), (
"BROWSERBASE_API_KEY environment variable must be set to use browserbase"
)
assert os.environ.get("BROWSERBASE_PROJECT_ID"), (
"BROWSERBASE_API_KEY and BROWSERBASE_PROJECT_ID environment variables must be set to use browserbase"
)
assert os.environ.get(
"BROWSERBASE_API_KEY"
), "BROWSERBASE_API_KEY environment variable must be set to use browserbase"
assert os.environ.get(
"BROWSERBASE_PROJECT_ID"
), "BROWSERBASE_API_KEY and BROWSERBASE_PROJECT_ID environment variables must be set to use browserbase"

endpoint_config = DEFAULT_ENDPOINT_CONFIG
if args.endpoint_config:
with open(args.endpoint_config, "r") as f:
endpoint_config = json.load(f)
assert (
"api_key" in endpoint_config
and "base_url" in endpoint_config
and "model" in endpoint_config
), "endpoint_config file must contain api_key, base_url, and model fields"
# Override with command-line arguments if provided
if args.api_key:
endpoint_config["api_key"] = args.api_key
if args.base_url:
endpoint_config["base_url"] = args.base_url
if args.model:
endpoint_config["model"] = args.model

asyncio.run(
run_fara_agent(