VisionBot: AI Visual Assistance System

VisionBot is an AI-powered visual assistance system that helps visually impaired individuals by describing their surroundings in real time. An ESP32-CAM microcontroller captures an image and sends it to a custom Flask-based web API, which uses the Gemini LLM to generate a detailed description of the image.

Features

Image Capture: Captures images with an ESP32-CAM when a button is pressed.
Real-Time Descriptions: Returns a detailed description of each captured image, generated by the Gemini LLM.
Custom Flask API: A Flask-based web server receives each image and passes it to Gemini through the Gemini Python API.
AI Integration: Uses Gemini's image understanding and text generation to produce the descriptions.

System Components

ESP32-CAM MCU: A microcontroller unit that captures images upon button press.
Flask Web API: A server that receives the captured image, processes it, and returns a description.
Gemini LLM: A multimodal large language model that generates a text description of each image the server forwards to it.
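
The way these components fit together can be sketched as a small Flask app. This is an illustrative sketch, not the contents of web_api.py. The /describe route, the raw JPEG request body, the JSON response shape, the prompt text, the GEMINI_API_KEY environment variable, and the gemini-1.5-flash model name are all assumptions. The Gemini call assumes the google-generativeai package is installed.

```python
import os

from flask import Flask, jsonify, request

# Hypothetical prompt; the project's actual prompt is not documented here.
PROMPT = "Describe this scene for a visually impaired person in two or three sentences."


def describe_image(jpeg_bytes: bytes) -> str:
    """Send JPEG bytes to Gemini and return the generated description."""
    # Imported lazily so the route can be exercised without the SDK installed.
    import google.generativeai as genai

    genai.configure(api_key=os.environ["GEMINI_API_KEY"])  # assumed variable name
    model = genai.GenerativeModel("gemini-1.5-flash")  # assumed model name
    response = model.generate_content(
        [PROMPT, {"mime_type": "image/jpeg", "data": jpeg_bytes}]
    )
    return response.text


def create_app(describer=describe_image) -> Flask:
    """Build the app; `describer` can be swapped for a stub when testing."""
    app = Flask(__name__)

    @app.route("/describe", methods=["POST"])  # hypothetical route
    def describe():
        jpeg_bytes = request.get_data()  # raw request body sent by the camera
        if not jpeg_bytes:
            return jsonify(error="empty request body"), 400
        try:
            description = describer(jpeg_bytes)
        except Exception as exc:  # report any upstream failure to the client
            return jsonify(error=str(exc)), 502
        return jsonify(description=description)

    return app
```

To serve it, one option is to call create_app().run(host="0.0.0.0", port=5000) from a main guard. Binding to 0.0.0.0 makes the server reachable from other devices on the network, which the ESP32-CAM needs.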

Installation

Clone the Repository:

git clone https://github.com/confused-soul/VisionBot.git
cd VisionBot

Set Up the Flask Server:
- Install required dependencies:
```
pip install -r requirements.txt
```
- Start the Flask server:
```
python web_api.py
```
Configure the ESP32-CAM:
- Flash the provided firmware (visionbot.ino together with camera_pins.h) to the ESP32-CAM.
- Make sure the ESP32-CAM joins the same Wi-Fi network as the machine running the Flask server and can reach the server's address.
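
Before debugging the firmware, it can help to confirm that the Flask server accepts connections from another device on the same Wi-Fi network. The helper below is a minimal sketch using only the Python standard library. The function name is hypothetical, and the host and port you pass in depend on your own setup.

```python
import socket


def server_reachable(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within the timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and unreachable hosts.
        return False
```

For example, server_reachable("192.168.1.50", 5000) checks a server at a hypothetical LAN address on Flask's default port. A False result usually points to a firewall, a wrong address, or a server bound only to localhost.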

Usage

Capture Image: Press the button on the ESP32-CAM to capture an image.
Process Image: The image is sent to the Flask API, which processes it using the Gemini LLM.
Receive Description: The system returns a detailed description of the captured image, providing visual assistance.
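
The same capture, process, and receive cycle can be exercised without the hardware by posting a saved JPEG from a computer. The sketch below uses only the Python standard library and assumes the server accepts a raw JPEG body and replies with JSON containing a "description" field. That request and response shape, and all function names, are assumptions rather than the documented behavior of web_api.py.

```python
import json
import urllib.request


def build_upload_request(server_url: str, jpeg_bytes: bytes) -> urllib.request.Request:
    """Build the POST request a camera client would send for one image."""
    return urllib.request.Request(
        server_url,
        data=jpeg_bytes,
        headers={"Content-Type": "image/jpeg"},
        method="POST",
    )


def parse_description(body: bytes) -> str:
    """Extract the description from the assumed JSON response body."""
    return json.loads(body.decode("utf-8"))["description"]


def request_description(server_url: str, jpeg_bytes: bytes, timeout: float = 30.0) -> str:
    """Upload one image and return the description the server sends back."""
    req = build_upload_request(server_url, jpeg_bytes)
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        return parse_description(resp.read())
```

A call such as request_description("http://192.168.1.50:5000/describe", open("test.jpg", "rb").read()) would then return the description text. The address and route in that call are placeholders.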

Potential Applications

VisionBot can be used to assist visually impaired individuals by providing real-time descriptions of their environment, helping them navigate and understand their surroundings better.

Contributing

Contributions are welcome! Please fork this repository and submit a pull request for any feature enhancements or bug fixes.

License

This project is licensed under the MIT License - see the LICENSE file for details.

Contact

For any inquiries or support, please contact Md Yasir Khan at [[email protected]].

