# init main repo structure and demonstrate the AR + DiT demo for omni models #6
**Status:** Merged

## Commits (20)
All 20 commits are by hsliuustc0106:

- `8dbe6db` init request.py
- `0c5caf6` scheduler api design
- `e12761e` update readme
- `c1da072` del api md
- `51d2570` feat: Complete vLLM-omni implementation with conda environment setup
- `afe5c26` feat: enhance serving functionality and add comprehensive examples
- `e58e62f` Update docs/architecture/detailed arch design.md
- `91f60be` Update docs/architecture/detailed arch design.md
- `ffeb0dc` Update docs/architecture/detailed arch design.md
- `e8b9425` refactor: move omni_llm.py to entrypoints directory
- `bd43ce4` feat: add comprehensive testing scripts for vLLM-omni
- `f403cec` mv omni_llm.py to entrypoints
- `ff9c7d4` update gitignore
- `81f04d3` update ar_dit_test
- `4435d58` update diffusion_engine
- `f8a6606` fix diffusion engine
- `c330e5e` del unnessearcy files
- `7725831` del reamds
- `721833c` del
- `05d2367` change omni.py to serve.py and mv to cli
# vLLM-omni: Multi-modal Extension for vLLM

vLLM-omni is designed to extend vLLM's capabilities to support multi-modal model inference and serving, with a particular focus on non-autoregressive structures and non-textual outputs.

## 🎯 Overview

Traditional vLLM deployments are limited to text-based, autoregressive generation. vLLM-omni addresses this limitation by adding support for:

- **Multi-modal Models**: Text, image, video, audio, and sensor data processing
- **Non-autoregressive Architectures**: Diffusion Transformers (DiT) and other parallel generation models
- **Heterogeneous Outputs**: Beyond traditional text generation to structured, binary, and streaming outputs

## 🏗️ Architecture

vLLM-omni is built on a modular architecture that extends vLLM's core functionality.

## 🚀 Key Features

### Multi-Engine Support

- **Autoregressive Engine**: Traditional text generation with enhanced KV-caching
- **Diffusion Engine**: Support for DiT models and iterative generation
- **Hybrid Engine**: Combined AR+DiT processing pipelines
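The hybrid AR+DiT flow can be sketched as a toy pipeline: an autoregressive stage emits conditioning tokens one step at a time, and a diffusion stage then refines an output over several denoising iterations. Everything below (function names, the simplified "denoising" rule) is illustrative only, not the project's actual API.

```python
import random

# Toy sketch of a hybrid AR + DiT pipeline (illustrative only; not the
# real vLLM-omni API). The AR stage greedily appends tokens; the "DiT"
# stage iteratively moves a noisy vector toward a conditioning target.

def ar_generate(prompt_tokens, steps=4):
    """Autoregressive stage: append one token per step."""
    tokens = list(prompt_tokens)
    for _ in range(steps):
        tokens.append(tokens[-1] + 1)  # stand-in for argmax over logits
    return tokens

def dit_refine(condition, iterations=10):
    """Diffusion stage: iterative denoising toward the conditioning signal."""
    random.seed(0)
    x = [random.gauss(0.0, 1.0) for _ in condition]  # start from pure noise
    for _ in range(iterations):
        # each step removes half of the remaining "noise"
        x = [xi + 0.5 * (ci - xi) for xi, ci in zip(x, condition)]
    return x

tokens = ar_generate([1, 2, 3])
output = dit_refine([float(t) for t in tokens])
print(tokens)                              # conditioning tokens from the AR stage
print([round(v, 3) for v in output])       # refined output, close to the condition
```

The design point this illustrates: the AR stage runs sequentially (one token per step, benefiting from KV-caching), while each diffusion iteration updates the whole output in parallel.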

### Modality Processing

- **Text**: Advanced tokenization and embedding generation
- **Image**: Vision encoder integration (CLIP, etc.)
- **Audio**: Speech processing and audio embedding
- **Video**: Frame-by-frame and temporal processing
- **Sensor**: IoT and sensor data interpretation

### Output Formats

- **Structured Data**: JSON, XML, and custom formats
- **Binary Outputs**: Images, audio, and video generation
- **Streaming**: Real-time progressive generation
- **Multipart**: Combined multi-modal responses
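As an illustration of the multipart idea, a combined text + binary response can be serialized with binary parts base64-encoded inline so the whole message stays JSON-safe. The schema below is a hypothetical example, not a format defined by vLLM-omni.

```python
import base64
import json

# Hypothetical multipart response combining a text part with a binary
# (e.g. generated image) part. Binary payloads are base64-encoded so the
# whole response remains JSON-serializable. Schema is illustrative only.
fake_image_bytes = b"\x89PNG\r\n\x1a\n..."  # placeholder, not a real image
response = {
    "parts": [
        {"type": "text", "content": "A cat sitting on a mat."},
        {
            "type": "image",
            "encoding": "base64",
            "content": base64.b64encode(fake_image_bytes).decode("ascii"),
        },
    ]
}

payload = json.dumps(response)              # wire format
decoded = json.loads(payload)               # client side
restored = base64.b64decode(decoded["parts"][1]["content"])
print(restored == fake_image_bytes)         # binary part survives the round trip
```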

## 📋 Supported Models

### AR + Diffusion Transformer (DiT) Models

- Qwen-Image (image generation and editing)
- Qwen-omni (Thinker-Talker-Codec structure)
- Custom DiT and hybrid architectures
## 🛠️ Installation

### Quick Start

#### Option 1: Docker (Recommended for macOS)

```bash
# Clone the repository
git clone https://github.com/hsliuustc0106/vllm-omni.git
cd vllm-omni

# Run the automated Docker setup
./scripts/docker-setup-macos.sh
```

#### Option 2: Local Installation

```bash
# Clone the repository
git clone https://github.com/hsliuustc0106/vllm-omni.git
cd vllm-omni

# Run the installation script
./install.sh
```

### Prerequisites

- Python 3.11+ (recommended)
- Conda or Miniconda
- Git
- CUDA 11.8+ (for GPU acceleration) or CPU-only installation

### Installation Methods

#### Method 1: Automated Installation (Recommended)

```bash
# Using shell script
./install.sh

# Or using Python script
python install.py
```

#### Method 2: Manual Installation

```bash
# Create conda environment
conda create -n vllm_omni python=3.11 -y
conda activate vllm_omni

# Install PyTorch (CPU or GPU); quote the spec so ">" is not
# interpreted as shell redirection
pip install "torch>=2.7" --index-url https://download.pytorch.org/whl/cpu   # CPU
# pip install "torch>=2.7" --index-url https://download.pytorch.org/whl/cu121  # GPU

# Install dependencies
pip install -r requirements.txt
pip install "vllm>=0.10.2"

# Install vLLM-omni
pip install -e .
```

### Verify Installation

```bash
# Test the installation
python test_installation.py

# Test basic functionality
python -c "import vllm_omni; print('Ready!')"

# Test CLI
vllm --help
```

For detailed installation instructions, see [INSTALL.md](INSTALL.md).
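If `test_installation.py` is not at hand, a minimal sanity check can be done with the standard library alone. The package names below are assumed from the install steps above, not taken from an official script.

```python
import importlib.util

# Minimal post-install sanity check (illustrative): report which of the
# expected packages are importable in the current environment.
required = ["torch", "vllm", "vllm_omni"]
missing = [name for name in required if importlib.util.find_spec(name) is None]
print("missing:" if missing else "all importable:", missing or required)
```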

## 📥 Model Download

Models are automatically downloaded when first used, or you can pre-download them:

```bash
# Check downloaded models
python scripts/download_models.py --check-cache

# Download all default models
python scripts/download_models.py --all

# Download specific models
python scripts/download_models.py --ar-models Qwen/Qwen3-0.6B
python scripts/download_models.py --dit-models stabilityai/stable-diffusion-2-1
```

**Model Storage Location:**

- Default: `~/.cache/huggingface/hub/`
- AR models: 100 MB – 1 GB each
- DiT models: 2 GB – 7 GB each

For detailed model management, see [MODEL_DOWNLOAD_GUIDE.md](docs/MODEL_DOWNLOAD_GUIDE.md).
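Given the model sizes above, it can be useful to check how much disk the cache actually occupies. A small standard-library helper works on any platform; the cache path is simply the Hugging Face default noted above.

```python
import os
from pathlib import Path

def dir_size(path):
    """Total size in bytes of all regular files under `path`."""
    total = 0
    for root, _dirs, files in os.walk(path):
        for name in files:
            fp = os.path.join(root, name)
            if os.path.isfile(fp):  # skip broken symlinks
                total += os.path.getsize(fp)
    return total

# Default Hugging Face hub cache location (see "Model Storage Location")
cache = Path.home() / ".cache" / "huggingface" / "hub"
if cache.exists():
    print(f"HF cache: {dir_size(cache) / 1e9:.2f} GB")
else:
    print("no Hugging Face cache found at", cache)
```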
## Review comments

**Bug: Gitignore Excludes Essential Project Directories**

The `.gitignore` file now excludes `scripts/`, `examples/`, and `tests/`. These directories contain essential project files, and ignoring them prevents them from being tracked by Git.
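One way to address this (a sketch; adjust to the repository's actual ignore rules) is to remove the offending entries, or, if a broader pattern still matches them, re-include the directories with negation patterns, since in gitignore the last matching pattern wins:

```gitignore
# Re-include directories that must stay tracked; "!" negates any earlier
# rule (e.g. a literal "scripts/" entry) that excluded them.
!scripts/
!examples/
!tests/
```

Note that `!` re-inclusion works here because the directories themselves are negated; a negation for a file inside an excluded parent directory would have no effect.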