Build powerful micro-agents that observe your digital world, remember what matters, and react intelligently, all while keeping your data 100% private and secure.
- Screen: OCR & Vision
- Camera: Visual Input
- Audio: Mic + Computer
- Memory: Text + Image
- 🧠 Text & Visual Memory – Store and retrieve images or text intelligently.
- 🎥 Smart Screen Recording – Start recording when something happens, or analyze clips with custom labels.
- 💾 Persistent Context – Agents remember what matters across sessions.
- 📧 Email • 💬 Discord • 📱 Telegram • SMS • WhatsApp
- 🖥️ System Alerts – Native OS notifications and pop-ups
- 📺 Observer Overlay – Custom on-screen messages
Creating your own Observer AI agent is simple and consists of three things:
- SENSORS – the inputs your model receives
- MODELS – models run by Ollama or by Ob-Server
- TOOLS – functions your model can use
- Navigate to the Agent Dashboard and click "Create New Agent"
- Fill in the "Configuration" tab with basic details (name, description, model, loop interval)
- Give your model a system prompt and Sensors! The available Sensors are listed below (an example prompt follows the list):
- Screen OCR ($SCREEN_OCR) Captures screen content as text via OCR
- Screenshot ($SCREEN_64) Captures screen as an image for multimodal models
- Agent Memory ($MEMORY@agent_id) Accesses agents' stored information
- Agent Image Memory ($IMEMORY@agent_id) Accesses agents' stored images
- Clipboard ($CLIPBOARD) Inserts the clipboard contents
- Microphone* ($MICROPHONE) Captures microphone audio and adds a transcription
- Screen Audio* ($SCREEN_AUDIO) Captures and transcribes the audio of a screen-shared tab
- All audio* ($ALL_AUDIO) Mixes the microphone and screen audio and provides a complete transcription of both (used for meetings).
* Audio sensors use a Whisper model running on transformers.js
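Sensor placeholders are embedded directly in your system prompt and replaced with live data on each loop. For example, a minimal illustrative prompt (the wording and trigger word here are assumptions, not a built-in format):

```
You watch the user's screen. Here is the current screen text:

$SCREEN_OCR

If a social media site is visible, respond with exactly DISTRACTED.
Otherwise respond with OK.
```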
Agent Tools:
- `getMemory(agentId)`* – Retrieve stored memory
- `setMemory(agentId, content)`* – Replace stored memory
- `appendMemory(agentId, content)`* – Add to existing memory
- `getImageMemory(agentId)`* – Retrieve images stored in memory
- `setImageMemory(agentId, images)` – Set images in memory
- `appendImageMemory(agentId, images)` – Add images to memory
- `startAgent(agentId)`* – Starts an agent
- `stopAgent(agentId)`* – Stops an agent
- `time()` – Gets the current time
- `sleep(ms)` – Waits that amount of milliseconds
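For a quick feel of these, here is a minimal sketch for an agent's Code tab (the agent id "logger" is hypothetical):

```javascript
// Append a timestamped note to this agent's own memory;
// omitting agentId targets the agent running the code.
appendMemory(`[${time()}] ${response}`);

// Mirror the same note into a second agent's memory
// (the agent id "logger" is hypothetical).
appendMemory("logger", `[${time()}] ${response}`);

// Make sure the other agent is running so it can react to the note.
startAgent("logger");
```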
Notification Tools:
- `sendEmail(email, message, images?)` – Sends an email
- `sendPushover(user_token, message, images?, title?)` – Sends a Pushover notification
- `sendDiscord(discord_webhook, message, images?)` – Sends a Discord message to a server
- `sendTelegram(chat_id, message, images?)` – Sends a Telegram message with the Observer bot. Get the chat_id by messaging the bot @observer_notification_bot.
- `sendWhatsapp(phone_number, message)` – Sends a WhatsApp message with the Observer bot. Send a message first to +1 (555)783-4727 to use it.
- `notify(title, options)` – Sends a browser notification. ⚠️ IMPORTANT: Some browsers block notifications.
- `sendSms(phone_number, message, images?)` – Sends an SMS to a phone number, e.g. `sendSms("+181429367", "hello")`. ⚠️ IMPORTANT: Due to A2P policy, some SMS messages are blocked; not recommended for US/Canada.
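As an illustration, a minimal sketch that only notifies you when the model flags something (the trigger word and webhook URL are placeholders):

```javascript
// Alert only when the model's response contains the trigger word.
if (response.includes("ALERT")) {
  // Replace with your own Discord webhook URL.
  sendDiscord("YOUR_DISCORD_WEBHOOK_URL", response);
  // Browser notification as a fallback (some browsers block these).
  notify("Observer AI", { body: response });
}
```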
Video Recording Tools:
- `startClip()` – Starts a recording of any video media and saves it to the Recordings tab
- `stopClip()` – Stops an active recording
- `markClip(label)` – Adds a label to any active recording, displayed in the Recordings tab
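For example, a minimal sketch that lets the model drive recording (the trigger phrases are illustrative):

```javascript
// Start a labeled recording when the model spots the event,
// and stop it when the model reports the event is over.
if (response.includes("MEETING_STARTED")) {
  startClip();
  markClip("meeting start");
} else if (response.includes("MEETING_ENDED")) {
  stopClip();
}
```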
App Tools:
- `ask(question, title="Confirmation")` – Pops up a system confirmation dialog
- `message(message, title="Agent Message")` – Pops up a system message
- `system_notify(body, title="Observer AI")` – Sends a system notification
- `overlay(body)` – Pushes a message to the overlay
- `click()` – Triggers a mouse click at the current cursor position. ⚠️ IMPORTANT: Position the mouse before the agent runs.
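A minimal sketch combining these, assuming `ask()` returns a truthy value when the user confirms (check the actual return value in your setup):

```javascript
// Ask before acting, then click at the current cursor position.
const confirmed = ask("Click at the current cursor position?");
if (confirmed) {
  click();
  overlay("Clicked!");
}
```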
The "Code" tab receives the following variables as context before running:
- `response` – The model's response
- `agentId` – The id of the agent running the code
- `screen` – The screen as base64, if captured
- `camera` – The camera as base64, if captured
- `imemory` – The agent's current image array
- `images` – All images in context
JavaScript agents run in the browser sandbox, making them ideal for passive monitoring and notifications:
// Remove <think> tags from reasoning models like DeepSeek
const cleanedResponse = response.replace(/<think>[\s\S]*?<\/think>/g, '').trim();

// Get the current time (renamed so it doesn't shadow the time() function)
const timestamp = time();

// Update memory with a timestamp
appendMemory(`[${timestamp}] ${cleanedResponse}`);

// Send to Telegram for notekeeping ("12345678" is an example chat_id)
sendTelegram("12345678", cleanedResponse);

Note: any function marked with * takes an `agentId` argument. If you omit `agentId`, it defaults to the agent that's running the code.
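For example (the agent id "notes" is hypothetical):

```javascript
appendMemory("notes", "hello"); // writes to the agent with id "notes"
appendMemory("hello");          // writes to the agent running this code
```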
There are a few ways to get Observer up and running with local inference. I recommend the Observer App.
Option 1: Just install the Desktop App with any OpenAI-compatible endpoint (Ollama, llama.cpp, vLLM)
Download Ollama for the best compatibility.
Observer can connect directly to any server that exposes a `v1/chat/completions` endpoint.
If you're not using Ollama, set the Custom Model Server URL in the App to your vLLM, llama.cpp, or other OpenAI-compatible endpoint.
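As a quick sanity check that your server speaks the expected protocol, you can call the endpoint directly; this sketch assumes Ollama's default port 11434, a pulled gemma3:4b model, and a runtime with top-level await:

```javascript
// Minimal request against an OpenAI-compatible chat endpoint.
const res = await fetch("http://localhost:11434/v1/chat/completions", {
  method: "POST",
  headers: { "Content-Type": "application/json" },
  body: JSON.stringify({
    model: "gemma3:4b",
    messages: [{ role: "user", content: "Say hello" }],
  }),
});
const data = await res.json();
console.log(data.choices[0].message.content);
```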
✨ Major Update: Simpler Setup & More Flexibility! The `observer-ollama` service no longer requires SSL by default. This means no more browser security warnings for a standard local setup! It now also supports any backend that exposes a standard OpenAI-compatible (`v1/chat/completions`) endpoint, like llama.cpp.
This method uses Docker Compose to run everything you need in containers: the Observer WebApp, the observer-ollama translator, and a local Ollama instance. This is the easiest way to get a 100% private, local-first setup.
Prerequisites:
- Docker installed.
- Docker Compose installed.
Instructions:
1. Clone the repository and start the services:

       git clone https://github.com/Roy3838/Observer.git
       cd Observer/docker
       docker-compose up --build

2. Access the local WebApp:
   - Open your browser to http://localhost:8080. This is your self-hosted version of the Observer app.
3. Connect to your Ollama service:
   - In the app's header/settings, set the Model Server Address to http://localhost:3838. This is the `observer-ollama` translator that runs in a container and communicates with Ollama for you.
4. Pull Ollama models:
   - Navigate to the "Models" tab and click "Add Model". This opens a terminal to your Ollama instance.
   - Pull any model you need, for example:

         ollama run gemma3:4b # <- highly recommended model!
For NVIDIA GPUs: it's recommended to edit docker/docker-compose.yml and explicitly give the ollama container the NVIDIA GPU runtime.
Add these lines to the ollama section of docker/docker-compose.yml:
volumes:
- ollama_data:/root/.ollama
# ADD THIS SECTION
runtime: nvidia
deploy:
resources:
reservations:
devices:
- driver: nvidia
count: all
capabilities: [gpu]
# UP TO HERE
ports:
- "11434:11434"
To Stop the Docker Setup:

    cd docker && docker-compose down

To customize your setup (e.g., enable SSL to access it from app.observer-ai.com, or disable the docker exec feature), simply edit the `environment:` section in your docker/docker-compose.yml file. All options are explained with comments directly in the file.
Python agents run on a Jupyter server with system-level access, enabling them to interact directly with your computer:
#python <-- don't remove this!
print("Hello World!", response, agentId)

# Example: analyze screen content and take action
if "SHUTOFF" in response:
    # System-level commands can be executed here
    import os
    # os.system("command")  # Be careful with system commands!

To use Python agents:
- Run a Jupyter server on your machine with `c.ServerApp.allow_origin = '*'` set in its config
- Configure the connection in the Observer AI interface:
- Host: The server address (e.g., 127.0.0.1)
- Port: The server port (e.g., 8888)
- Token: Your Jupyter server authentication token
- Test the connection using the "Test Connection" button
- Switch to the Python tab in the code editor to write Python-based agents
Save your agent, test it from the dashboard, and export the configuration to share with others!
We welcome contributions from the community! Here's how you can help:
- Fork the repository
- Create your feature branch (`git checkout -b feature/amazing-feature`)
- Commit your changes (`git commit -m 'feat: add amazing feature'`)
- Push to the branch (`git push origin feature/amazing-feature`)
- Open a Pull Request
This project is licensed under the MIT License - see the LICENSE file for details.
- GitHub: @Roy3838
- Project Link: https://observer-ai.com
Built with ❤️ by Roy Medina for the Observer AI Community. Special thanks to the Ollama team for being an awesome backbone to this project!