A modern web interface for the Moondream vision-language model, built with Next.js and FastAPI. This project provides a user-friendly way to upload images and ask questions about them using Moondream.
- Light/Dark Mode: Automatic theme switching with system preference detection
- Drag-and-Drop Upload: Easy image uploading with drag-and-drop support
- Interactive Q&A: Ask questions about uploaded images through a chat interface
- Smooth Animations: Beautiful transitions powered by Framer Motion
- Privacy-First: All processing happens locally on your machine
- Responsive Design: Optimized for all devices and screen sizes
- CUDA Support: GPU acceleration for faster inference
- Modern UI: Built with Next.js, Tailwind CSS, and Framer Motion
- Theme System: Light/dark mode with system preference detection
- Image Upload Component: Drag-and-drop image handling and preview
- Chat Interface: Interactive Q&A about uploaded images
- State Management: Maintains image and chat state
- API Integration: Communicates with FastAPI backend
- Model Management: Loads and manages Moondream model
- Image Processing: Handles image encoding and caching
- Q&A System: Processes questions about encoded images
- Memory Management: Cleans up cached encodings
- User uploads image via frontend
- Image sent to `/describe` endpoint
- Backend encodes image and caches encoding
- Model generates description
- Frontend displays description and enables Q&A
- User types question in chat interface
- Question sent to `/ask` endpoint with image key
- Backend retrieves cached encoding
- Model generates answer
- Frontend displays response in chat
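The two flows above can be sketched from the client's point of view. This is a minimal illustration, not the project's frontend code: `ChatSession`, `describe_fn`, and `ask_fn` are hypothetical names, and the two callables stand in for the HTTP calls to `/describe` and `/ask`.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Optional, Tuple

# Hypothetical callables standing in for the HTTP calls to /describe and /ask.
DescribeFn = Callable[[bytes], dict]
AskFn = Callable[[str, str], dict]


@dataclass
class ChatSession:
    """Tracks the image key and chat history across the upload and Q&A flows."""

    describe_fn: DescribeFn
    ask_fn: AskFn
    image_key: Optional[str] = None
    history: List[Tuple[str, str]] = field(default_factory=list)

    def upload(self, image_bytes: bytes) -> str:
        """Upload flow: send the image, remember the key, restart the chat."""
        result = self.describe_fn(image_bytes)
        self.image_key = result["image_key"]
        self.history = [("assistant", result["description"])]
        return result["description"]

    def ask(self, question: str) -> str:
        """Q&A flow: only valid once an upload has produced an image key."""
        if self.image_key is None:
            raise RuntimeError("upload an image before asking questions")
        answer = self.ask_fn(question, self.image_key)["answer"]
        self.history.append(("user", question))
        self.history.append(("assistant", answer))
        return answer
```

Passing the two calls in as functions keeps the state handling testable without a running backend; in a real client they would wrap the network requests.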
# System Requirements
- Python 3.8+
- Node.js 16+
- CUDA-capable GPU (recommended)
- 8GB+ RAM
# Python Dependencies
pip install transformers einops torch fastapi uvicorn python-multipart pillow
# Node.js Dependencies
npm install axios framer-motion @radix-ui/react-slot formidable
# Clone repository
git clone [repository-url]
cd moondream-web
# Install Python dependencies
pip install -r requirements.txt
# Start FastAPI server
uvicorn app:app --host 127.0.0.1 --port 8000 --reload
# Navigate to frontend directory
cd moondream-web
# Install dependencies
npm install
# Start development server
npm run dev
`/describe`: Handles initial image upload and description
- Input: Image file (multipart/form-data)
- Output:
{ "description": "Generated description of the image", "image_key": "Unique key for cached encoding" }
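As a rough sketch of what an upload request could look like from a script, the helper below builds a multipart/form-data request with the standard library only. The form field name `file`, the `POST` method, and the base URL `http://127.0.0.1:8000` are assumptions (the URL mirrors the `uvicorn` command in this README); check `app.py` for the names the backend actually expects.

```python
import urllib.request
import uuid


def build_describe_request(
    image_bytes: bytes,
    filename: str,
    base_url: str = "http://127.0.0.1:8000",  # assumption: matches the uvicorn command
    field_name: str = "file",                  # assumption: hypothetical form field name
    content_type: str = "image/jpeg",
) -> urllib.request.Request:
    """Build (but do not send) a multipart/form-data POST for /describe."""
    boundary = uuid.uuid4().hex
    head = (
        f"--{boundary}\r\n"
        f'Content-Disposition: form-data; name="{field_name}"; filename="{filename}"\r\n'
        f"Content-Type: {content_type}\r\n\r\n"
    ).encode("utf-8")
    tail = f"\r\n--{boundary}--\r\n".encode("utf-8")
    body = head + image_bytes + tail
    return urllib.request.Request(
        f"{base_url}/describe",
        data=body,
        method="POST",
        headers={
            "Content-Type": f"multipart/form-data; boundary={boundary}",
            "Content-Length": str(len(body)),
        },
    )
```

With the server running, the request can be sent with `urllib.request.urlopen(req)` and the response body parsed as JSON to obtain `description` and `image_key`.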
`/ask`: Handles questions about previously uploaded images
- Input:
{ "question": "User's question about the image", "image_key": "Key from previous describe call" }
- Output:
{ "answer": "Model's answer to the question" }
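A question request could be assembled along these lines. This is a sketch under assumptions: the README shows the JSON shapes but not the HTTP method, so `POST` with a JSON body is assumed, as is the base URL `http://127.0.0.1:8000`. The helper names are hypothetical.

```python
import json
import urllib.request


def build_ask_request(
    question: str,
    image_key: str,
    base_url: str = "http://127.0.0.1:8000",  # assumption: matches the uvicorn command
) -> urllib.request.Request:
    """Build (but do not send) a JSON request for the /ask endpoint."""
    payload = json.dumps({"question": question, "image_key": image_key}).encode("utf-8")
    return urllib.request.Request(
        f"{base_url}/ask",
        data=payload,
        method="POST",  # assumption: the README does not state the method
        headers={"Content-Type": "application/json"},
    )


def parse_ask_response(raw: bytes) -> str:
    """Extract the answer text from the JSON body documented above."""
    return json.loads(raw.decode("utf-8"))["answer"]
```

The `image_key` must come from an earlier upload of the same image; the backend uses it to look up the cached encoding.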
System health and status check
- Output:
{ "status": "healthy", "model_loaded": true, "tokenizer_loaded": true, "cuda_available": true, "device": "cuda:0" }
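A script that waits for the backend could interpret this payload roughly as follows. The function name `check_health` and the decision to treat a missing GPU as a note rather than a failure are assumptions, based on the README listing a CUDA GPU as recommended rather than required.

```python
from typing import Dict, List, Tuple


def check_health(status: Dict[str, object]) -> Tuple[bool, List[str]]:
    """Return (ready, notes) for a parsed health payload."""
    notes: List[str] = []
    ready = status.get("status") == "healthy"
    if not ready:
        notes.append(f"status is {status.get('status')!r}, expected 'healthy'")
    for flag in ("model_loaded", "tokenizer_loaded"):
        if status.get(flag) is not True:
            ready = False
            notes.append(f"{flag} is not true")
    if status.get("cuda_available") is not True:
        # Not fatal: the README lists a GPU as recommended, not required.
        notes.append(f"CUDA unavailable, running on {status.get('device', 'unknown device')}")
    return ready, notes
```

Returning the notes alongside the boolean lets a caller log why the backend is not ready instead of only retrying.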
- Uses next-themes for theme management
- Automatically detects system color scheme preference
- Smooth transitions between light and dark modes
- Persists user theme preference
- Stores encoded images in memory, keyed by unique timestamps
- Enables fast subsequent Q&A without re-encoding
- Automatically cleans up on server restart
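The caching behaviour described above can be illustrated with a small in-memory store. This is a sketch, not the code in `app.py`: the class name `EncodingCache`, the nanosecond key format, the collision guard, and the lock are all assumptions.

```python
import threading
import time
from typing import Any, Dict, Optional


class EncodingCache:
    """In-memory store for encoded images, keyed by a timestamp string."""

    def __init__(self) -> None:
        self._items: Dict[str, Any] = {}
        self._lock = threading.Lock()

    def put(self, encoding: Any) -> str:
        """Store an encoding and return the key to hand back to the client."""
        with self._lock:
            key = str(time.time_ns())
            # Guard against two uploads landing on the same clock reading.
            while key in self._items:
                key = str(int(key) + 1)
            self._items[key] = encoding
            return key

    def get(self, key: str) -> Optional[Any]:
        """Return the cached encoding, or None if the key is unknown."""
        with self._lock:
            return self._items.get(key)

    def clear(self) -> None:
        """Drop every cached encoding."""
        with self._lock:
            self._items.clear()

    def __len__(self) -> int:
        with self._lock:
            return len(self._items)
```

Because the store is a plain dictionary in process memory, it is empty again whenever the server process restarts. It also grows with every upload until `clear()` is called, which matters for long-running sessions.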
- Frontend displays user-friendly error messages
- Backend provides detailed error logging
- Graceful fallbacks for common failure cases
- Uses torch.float16 for reduced memory usage
- CUDA acceleration when available
- Efficient image encoding caching
- Streaming responses for large payloads
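The device and precision choice can be sketched as below. The half-precision-on-GPU path follows the list above; falling back to `float32` on CPU is an assumption made here, not something the README states. The function names are hypothetical, and PyTorch is imported lazily so the sketch also runs where it is not installed.

```python
from typing import Tuple


def detect_cuda() -> bool:
    """True if PyTorch is installed and reports a usable CUDA device."""
    try:
        import torch
    except ImportError:
        return False
    return bool(torch.cuda.is_available())


def choose_device_and_dtype(cuda_available: bool) -> Tuple[str, str]:
    """Pick a device name and dtype name for loading the model."""
    if cuda_available:
        # Half precision roughly halves the memory needed for the weights.
        return "cuda", "float16"
    # Assumption: fall back to full precision on CPU.
    return "cpu", "float32"


if __name__ == "__main__":
    device, dtype_name = choose_device_and_dtype(detect_cuda())
    print(f"device={device} dtype={dtype_name}")
```

When PyTorch is available, the dtype name maps to the real dtype with `getattr(torch, dtype_name)`, for example `torch.float16`.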
moondream-web/
├── src/
│   ├── components/
│   │   ├── ui/              # Reusable UI components
│   │   ├── ImageUpload.tsx  # Image upload component
│   │   └── Chat.tsx         # Chat interface component
│   ├── pages/
│   │   ├── api/
│   │   │   ├── ask.ts       # Question handling endpoint
│   │   │   └── api.ts       # API utilities
│   │   ├── _app.tsx         # App configuration
│   │   ├── _document.tsx    # Document configuration
│   │   └── index.tsx        # Main page
│   └── styles/
│       └── globals.css      # Global styles
├── public/                  # Static assets
└── app.py                   # FastAPI backend
- Make changes to frontend or backend
- Backend auto-reloads with uvicorn
- Frontend hot-reloads with Next.js
- Test changes in development environment
- CUDA Memory Errors
  - Reduce batch size
  - Close other GPU applications
  - Monitor memory usage with `nvidia-smi`
- Connection Errors
  - Verify FastAPI server is running
  - Check correct ports are open
  - Ensure correct IP addresses (127.0.0.1)
- Image Processing Errors
  - Verify supported image formats
  - Check image file size
  - Monitor server logs
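For the connection checks, a quick way to confirm that something is listening on the backend port is a plain TCP connect. This is a minimal sketch: the helper name is hypothetical, and the defaults (`127.0.0.1`, port `8000`) are taken from the `uvicorn` command in this README.

```python
import socket


def is_port_open(host: str = "127.0.0.1", port: int = 8000, timeout: float = 1.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within the timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and unreachable hosts.
        return False


if __name__ == "__main__":
    state = "reachable" if is_port_open() else "not reachable"
    print(f"FastAPI backend on 127.0.0.1:8000 is {state}")
```

A successful connect only shows that a process is listening on the port; it does not confirm that the model has finished loading.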
- Fork the repository
- Create feature branch
- Implement changes
- Add tests if applicable
- Submit pull request
This project is licensed under the MIT License - see the LICENSE file for details.
- Built on Moondream model
- UI components from shadcn/ui
- Animations by Framer Motion
- Theme system by next-themes
Created with ❤️ for the AI community