Skip to content

VideoFinder is an advanced video analysis tool powered by multimodal AI, designed to help users easily locate and identify specific objects or people within video content. By combining the capabilities of Llama Vision model with a streamlined web interface, it enables real-time, frame-by-frame video analysis with natural language descriptions.

License

Notifications You must be signed in to change notification settings

win4r/VideoFinder-Llama3.2-vision-Ollama

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

5 Commits
 
 
 
 
 
 
 
 
 
 
 
 

Repository files navigation

VideoFinder

作者微信:stoeng

English | 中文

English

🔍 About VideoFinder

VideoFinder is an intelligent video analysis tool that leverages multimodal AI models to detect and locate specific objects or people in videos. Built with FastAPI and integrated with the Llama Vision model, it provides a user-friendly web interface for video analysis tasks.

✨ Features

  • Upload and analyze videos through an intuitive web interface
  • Real-time frame-by-frame analysis using multimodal AI
  • Natural language object description support
  • Visual results display with confidence scores
  • Image preprocessing for better detection accuracy
  • Streaming response for real-time analysis feedback

🚀 Getting Started

Prerequisites

  • Python 3.8+
  • Ollama with Llama Vision model installed
  • OpenCV

Installation

  1. Clone the repository
git clone https://github.com/win4r/VideoFinder-Llama3.2-vision-Ollama.git
cd VideoFinder
  1. Install dependencies
pip install -r requirements.txt
  1. Make sure Ollama is running with Llama Vision model
ollama run llama3.2-vision
  1. Start the application
python main.py
  1. Access the web interface at http://localhost:8000

🛠️ Usage

  1. Open the web interface
  2. Upload a video file
  3. Enter a description of the object/person you want to find
  4. Click "Start Analysis"
  5. View results as they appear in real-time

📦 Dependencies

  • FastAPI
  • OpenCV
  • Ollama
  • Jinja2
  • uvicorn

中文

🔍 关于 VideoFinder

VideoFinder 是一个智能视频分析工具,利用多模态AI模型来检测和定位视频中的特定物体或人物。该工具基于 FastAPI 构建,集成了 Llama Vision 模型,提供了友好的 Web 界面进行视频分析任务。

✨ 特性

  • 通过直观的网页界面上传和分析视频
  • 使用多模态 AI 进行实时逐帧分析
  • 支持自然语言目标描述
  • 可视化结果显示与置信度评分
  • 图像预处理以提高检测准确率
  • 流式响应实现实时分析反馈

🚀 快速开始

环境要求

  • Python 3.8+
  • 安装了 Llama Vision 模型的 Ollama
  • OpenCV

安装步骤

  1. 克隆仓库
git clone https://github.com/win4r/VideoFinder-Llama3.2-vision-Ollama.git
cd VideoFinder
  1. 安装依赖
pip install -r requirements.txt
  1. 确保 Ollama 已运行并加载 Llama Vision 模型
ollama run llama3.2-vision
  1. 启动应用
python main.py
  1. 访问 http://localhost:8000 打开 Web 界面

🛠️ 使用方法

  1. 打开 Web 界面
  2. 上传视频文件
  3. 输入要查找的目标描述
  4. 点击"开始分析"
  5. 实时查看分析结果

📦 依赖项

  • FastAPI
  • OpenCV
  • Ollama
  • Jinja2
  • uvicorn

About

VideoFinder is an advanced video analysis tool powered by multimodal AI, designed to help users easily locate and identify specific objects or people within video content. By combining the capabilities of Llama Vision model with a streamlined web interface, it enables real-time, frame-by-frame video analysis with natural language descriptions.

Resources

License

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published