[Model] Add Index-AniSora I2V support (V1 5B + V2 14B) by dorhuri123 · Pull Request #877 · vllm-project/vllm-omni

dorhuri123 · 2026-01-20T22:58:49Z

Summary

This PR adds support for the Index-AniSora Image-to-Video (I2V) models, a family of anime-optimized video generation models developed by Bilibili. It supports both the 5B (CogVideoX-based) and 14B (Wan2.1-based) variants.

Closes #670

Supported Models

| Model | Architecture | VRAM Required | HuggingFace |
| --- | --- | --- | --- |
| AniSora V1 (5B) | CogVideoX | ~24GB | `IndexTeam/AniSora-v1-i2v-diffusers` |
| AniSora V2/V3 (14B) | Wan2.1 | ~65GB | `aardsoul-music/Wan2.1-Anisora-14B` |

Demo Results

AniSora V1 (5B) - RTX 6000

Input Image:

Generation Settings:

Prompt: "A cat playing with yarn"
Resolution: 480 × 720
Frames: 81 frames @ 16fps
Inference steps: 50
Guidance scale: 5.0

Output Video (5.06 seconds):

anisora_v1_demo_gh.mp4

AniSora V2 (14B) - Short - NVIDIA H200

Input Image:

Generation Settings:

Prompt: "a panda eating bamboo, natural lighting, detailed fur"
Resolution: 480 × 832
Frames: 17 frames @ 8fps
Inference steps: 30
Guidance scale: 5.0

Output Video (2.1 seconds):

anisora_v2_output_gh.mp4

AniSora V2 (14B) - Long - NVIDIA H200

Input Image:

Generation Settings:

Prompt: "a woman smiling gently, soft natural lighting, cinematic quality, subtle head movement, flowing hair"
Resolution: 480 × 832
Frames: 49 frames @ 8fps
Inference steps: 30
Guidance scale: 5.0

Output Video (6.1 seconds):

anisora_v2_long.mp4
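
The durations quoted for the three demo clips follow from frame count divided by frame rate. A minimal sketch of that arithmetic, using only the settings listed above (the helper name `clip_duration_seconds` is illustrative and not part of this PR):

```python
def clip_duration_seconds(num_frames: int, fps: float) -> float:
    """Return the playback length of a clip in seconds."""
    if fps <= 0:
        raise ValueError("fps must be positive")
    return num_frames / fps


# (label, frames, fps) taken from the generation settings of each demo.
DEMOS = [
    ("AniSora V1 (5B)", 81, 16),
    ("AniSora V2 (14B) short", 17, 8),
    ("AniSora V2 (14B) long", 49, 8),
]

for label, frames, fps in DEMOS:
    print(f"{label}: {clip_duration_seconds(frames, fps)} s")
```

The exact values are 5.0625 s, 2.125 s, and 6.125 s, which match the rounded durations reported for each output video.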

Usage

V1 (5B)

python examples/offline_inference/image_to_video/anisora_image_to_video.py \
  --model IndexTeam/AniSora-v1-i2v-diffusers \
  --image input.png \
  --prompt "anime scene, smooth motion" \
  --height 480 \
  --width 720 \
  --num_frames 81 \
  --guidance_scale 5.0 \
  --num_inference_steps 50 \
  --fps 16 \
  --output anisora_v1.mp4

V2/V3 (14B)

python examples/offline_inference/image_to_video/anisora_v2_image_to_video.py \
  --image input.png \
  --prompt "anime scene, high quality animation" \
  --height 480 \
  --width 832 \
  --num-frames 49 \
  --guidance-scale 5.0 \
  --num-inference-steps 30 \
  --fps 8 \
  --output anisora_v2.mp4

Changes

New Files

vllm_omni/diffusion/models/anisora/ - AniSora pipeline module
- pipeline_anisora_i2v_cogvideox.py - V1 (5B) CogVideoX-based pipeline
- pipeline_anisora_v2_i2v.py - V2/V3 (14B) Wan2.1-based pipeline with hybrid loading
- __init__.py - Module exports
examples/offline_inference/image_to_video/anisora_image_to_video.py - V1 CLI example
examples/offline_inference/image_to_video/anisora_v2_image_to_video.py - V2 CLI example

Modified Files

examples/offline_inference/image_to_video/README.md - Added AniSora documentation
vllm_omni/diffusion/registry.py - Register the AniSora V1/V2 pipelines and their pre-/post-process hooks

Technical Notes

V2 Hybrid Loading

The V2 pipeline uses a hybrid loading approach because the community-converted AniSora weights use a different config and different parameter names from the diffusers format:

- The VAE, T5 text encoder, and CLIP image encoder are loaded from Wan-AI/Wan2.1-I2V-14B-480P-Diffusers.
- The transformer weights are loaded from the community AniSora checkpoints.
- Transformer key names are converted from the AniSora format to the diffusers format.
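
The split described above can be written down as a small loading plan. This is only a sketch of the idea, not the code in `pipeline_anisora_v2_i2v.py`: the `ComponentSource` class, the `plan_hybrid_loading` function, the component labels, and the choice of `aardsoul-music/Wan2.1-Anisora-14B` as the transformer source are assumptions made for illustration.

```python
from dataclasses import dataclass

# Repository names as they appear in this PR description.
BASE_REPO = "Wan-AI/Wan2.1-I2V-14B-480P-Diffusers"
ANISORA_REPO = "aardsoul-music/Wan2.1-Anisora-14B"  # assumed transformer source


@dataclass(frozen=True)
class ComponentSource:
    component: str              # hypothetical label for a pipeline component
    repo: str                   # where its weights come from
    needs_key_conversion: bool  # True if keys must be renamed to diffusers format


def plan_hybrid_loading(base_repo: str = BASE_REPO,
                        transformer_repo: str = ANISORA_REPO) -> list[ComponentSource]:
    """Describe which repository each component is loaded from."""
    shared = ["vae", "text_encoder", "image_encoder"]
    plan = [ComponentSource(name, base_repo, False) for name in shared]
    # Only the transformer comes from the community checkpoint.
    plan.append(ComponentSource("transformer", transformer_repo, True))
    return plan


for source in plan_hybrid_loading():
    print(f"{source.component:<14} <- {source.repo}")
```

Keeping the plan as data makes it easy to see that only the transformer needs key conversion, while the other components load unchanged from the base Wan2.1 repository.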

Key Name Conversions

Community AniSora weights use different naming conventions:

self_attn → attn1
cross_attn → attn2
ffn → ff
k → to_k, q → to_q, v → to_v, o → to_out.0
modulation → scale_shift_table
And additional mappings for full compatibility
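
The renames listed above could be applied to a checkpoint's keys roughly as follows. This is a minimal sketch under stated assumptions, not the conversion code from this PR: the function names and the example keys are hypothetical, and it covers only the mappings listed here, not the additional ones the pipeline applies.

```python
# Sub-module renames from the list above.
MODULE_RENAMES = {
    "self_attn": "attn1",
    "cross_attn": "attn2",
    "ffn": "ff",
    "modulation": "scale_shift_table",
}

# Projection renames, applied only directly under an attention module.
PROJECTION_RENAMES = {"q": "to_q", "k": "to_k", "v": "to_v", "o": "to_out.0"}


def convert_key(key: str) -> str:
    """Rename one dotted parameter key from AniSora to diffusers style."""
    converted: list[str] = []
    for part in key.split("."):
        renamed = MODULE_RENAMES.get(part, part)
        # q/k/v/o are renamed only when their parent is attn1 or attn2,
        # so unrelated components with the same short name stay untouched.
        if converted and converted[-1] in ("attn1", "attn2") and part in PROJECTION_RENAMES:
            renamed = PROJECTION_RENAMES[part]
        converted.append(renamed)
    return ".".join(converted)


def convert_state_dict(state_dict: dict) -> dict:
    """Return a new dict with every key converted; values are left as-is."""
    return {convert_key(key): value for key, value in state_dict.items()}


# Hypothetical example keys, for illustration only.
print(convert_key("blocks.0.self_attn.q.weight"))
print(convert_key("blocks.0.cross_attn.o.bias"))
```

Working on dotted path components rather than raw substrings avoids accidentally rewriting a `k` or `o` that appears inside an unrelated name.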

Testing

| Model | GPU | Result |
| --- | --- | --- |
| V1 (5B) | RTX 6000 (~24GB) | ✅ Generates valid video with motion |
| V2 (14B) | NVIDIA H200 (~140GB) | ✅ Generates valid video with motion |

Both pipelines produce output with proper animation.

dorhuri123 wants to merge 1 commit into vllm-project:main from dorhuri123:feature/index-anisora
