13 changes: 12 additions & 1 deletion .gitignore
@@ -152,6 +152,18 @@ dmypy.json
# Cython debug symbols
cython_debug/

# cursor
.cursor/

# docker
docker/

# scripts
scripts/

# tests
tests/

Bug: Gitignore Excludes Essential Project Directories

The .gitignore file now excludes scripts/, docker/, and tests/. These directories contain essential project files, and ignoring them prevents them from being tracked by Git.


# PyCharm
# JetBrains specific template is maintained in a separate JetBrains.gitignore that can
# be added to the global gitignore or merged into this project gitignore. For a PyCharm
@@ -226,7 +238,6 @@ Dockerfile.dev

# Kubernetes
k8s/
*.yaml
*.yml
!configs/*.yaml
!configs/*.yml
143 changes: 143 additions & 0 deletions README.md
@@ -1,3 +1,146 @@
# vLLM-omni: Multi-modal Extension for vLLM

vLLM-omni extends vLLM to support multi-modal model inference and serving, with a particular focus on non-autoregressive architectures and non-textual outputs.

## 🎯 Overview

Upstream vLLM is limited to text-based, autoregressive generation. vLLM-omni addresses this limitation by adding support for:

- **Multi-modal Models**: Text, image, video, audio, and sensor data processing
- **Non-autoregressive Architectures**: Diffusion Transformers (DiT) and other parallel generation models
- **Heterogeneous Outputs**: Beyond traditional text generation to structured, binary, and streaming outputs

## 🏗️ Architecture

vLLM-omni is built on a modular architecture that extends vLLM's core functionality.


## 🚀 Key Features

### Multi-Engine Support

- **Autoregressive Engine**: Traditional text generation with enhanced KV-caching
- **Diffusion Engine**: Support for DiT models and iterative generation
- **Hybrid Engine**: Combined AR+DiT processing pipelines

### Modality Processing

- **Text**: Advanced tokenization and embedding generation
- **Image**: Vision encoder integration (CLIP, etc.)
- **Audio**: Speech processing and audio embedding
- **Video**: Frame-by-frame and temporal processing
- **Sensor**: IoT and sensor data interpretation

### Output Formats

- **Structured Data**: JSON, XML, and custom formats
- **Binary Outputs**: Images, audio, and video generation
- **Streaming**: Real-time progressive generation
- **Multipart**: Combined multi-modal responses

## 📋 Supported Models

### AR + Diffusion Transformer (DiT) Models
- Qwen-Image (Image generation and editing)
- Qwen-omni (Thinker-Talker-Codec structure)
- Custom DiT and hiybrid architectures
medium: There is a typo in 'hiybrid'. It should be 'hybrid'.

Suggested change
- Custom DiT and hiybrid architectures
- Custom DiT and hybrid architectures


## 🛠️ Installation

### Quick Start

#### Option 1: Docker (Recommended for macOS)

```bash
# Clone the repository
git clone https://github.com/hsliuustc0106/vllm-omni.git
cd vllm-omni

# Run the automated Docker setup
./scripts/docker-setup-macos.sh
```
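
The setup script assumes the Docker CLI is installed and the daemon is running; a quick pre-flight check with the standard Docker CLI (nothing specific to this repository):

```bash
# Pre-flight check before running the setup script:
# confirm Docker is installed and the daemon is reachable.
docker --version
docker info > /dev/null 2>&1 && echo "Docker daemon is running" || echo "Start Docker Desktop first"
```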

#### Option 2: Local Installation

```bash
# Clone the repository
git clone https://github.com/hsliuustc0106/vllm-omni.git
cd vllm-omni

# Run the installation script
./install.sh
```

### Prerequisites

- Python 3.11+ (recommended)
- Conda or Miniconda
- Git
- CUDA 11.8+ (for GPU acceleration) or CPU-only installation
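
Whether the GPU or CPU-only path applies can be confirmed once PyTorch is installed; the quick check below uses PyTorch's standard API and is not specific to vLLM-omni:

```bash
# Check the installed PyTorch version and whether a CUDA device is visible.
python -c "import torch; print(torch.__version__, '| CUDA available:', torch.cuda.is_available())"
```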

### Installation Methods

#### Method 1: Automated Installation (Recommended)
```bash
# Using shell script
./install.sh

# Or using Python script
python install.py
```

#### Method 2: Manual Installation
```bash
# Create conda environment
conda create -n vllm_omni python=3.11 -y
conda activate vllm_omni

# Install PyTorch (CPU or GPU)
pip install "torch>=2.7" --index-url https://download.pytorch.org/whl/cpu    # CPU
# pip install "torch>=2.7" --index-url https://download.pytorch.org/whl/cu121  # GPU (CUDA 12.1 build)

# Install dependencies
pip install -r requirements.txt
pip install "vllm>=0.10.2"

# Install vLLM-omni
pip install -e .
```

### Verify Installation

```bash
# Test the installation
python test_installation.py

# Test basic functionality
python -c "import vllm_omni; print('Ready!')"

# Test CLI
vllm --help
```

For detailed installation instructions, see [INSTALL.md](INSTALL.md).
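
The serving interface itself is not documented in this README. Assuming vLLM-omni keeps upstream vLLM's OpenAI-compatible server (an assumption, not something stated here), a minimal post-install smoke test might look like the sketch below; `Qwen/Qwen3-0.6B` is one of the small AR models referenced in the download section that follows.

```bash
# Sketch only: assumes vLLM-omni reuses upstream vLLM's OpenAI-compatible server.
# Serve a small AR model on the default port (8000).
vllm serve Qwen/Qwen3-0.6B

# From another shell, send a standard chat completion request.
curl http://localhost:8000/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{"model": "Qwen/Qwen3-0.6B", "messages": [{"role": "user", "content": "Hello"}]}'
```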

## 📥 Model Download

Models are automatically downloaded when first used, or you can pre-download them:

```bash
# Check downloaded models
python scripts/download_models.py --check-cache

# Download all default models
python scripts/download_models.py --all

# Download specific models
python scripts/download_models.py --ar-models Qwen/Qwen3-0.6B
python scripts/download_models.py --dit-models stabilityai/stable-diffusion-2-1
```

**Model Storage Location:**
- Default: `~/.cache/huggingface/hub/`
- AR models: 100MB - 1GB each
- DiT models: 2GB - 7GB each

For detailed model management, see [MODEL_DOWNLOAD_GUIDE.md](docs/MODEL_DOWNLOAD_GUIDE.md).
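
If disk space is a concern, the cache can be inspected or relocated with standard Hugging Face tooling; the commands below rely on `huggingface_hub` and the `HF_HOME` environment variable, which are upstream Hugging Face conventions rather than anything specific to vLLM-omni:

```bash
# Inspect what is already cached and how much space it uses
# (huggingface_hub ships with vLLM's dependencies).
python -c "from huggingface_hub import scan_cache_dir; c = scan_cache_dir(); print(f'{len(c.repos)} repos, {c.size_on_disk / 1e9:.2f} GB'); [print(' -', r.repo_id) for r in c.repos]"

# Relocate the cache to a larger disk before downloading models.
export HF_HOME=/path/to/bigger/disk/huggingface
```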
5 changes: 0 additions & 5 deletions docs/README.md

This file was deleted.
