[Model] Add Index-AniSora I2V support (V1 5B + V2 14B)#877

Open
dorhuri123 wants to merge 1 commit into vllm-project:main from dorhuri123:feature/index-anisora

dorhuri123 commented Jan 20, 2026

Summary

This PR adds support for the Index-AniSora Image-to-Video (I2V) models, a family of anime-optimized video generation models developed by Bilibili. It covers both the 5B (CogVideoX-based) and 14B (Wan2.1-based) variants.

Closes #670

Supported Models

| Model | Architecture | VRAM Required | HuggingFace |
| --- | --- | --- | --- |
| AniSora V1 (5B) | CogVideoX | ~24GB | IndexTeam/AniSora-v1-i2v-diffusers |
| AniSora V2/V3 (14B) | Wan2.1 | ~65GB | aardsoul-music/Wan2.1-Anisora-14B |

Demo Results

AniSora V1 (5B) - RTX 6000

Input Image:

anisora_v1_demo_frame

Generation Settings:

  • Prompt: "A cat playing with yarn"
  • Resolution: 480 × 720
  • Frames: 81 frames @ 16fps
  • Inference steps: 50
  • Guidance scale: 5.0

Output Video (5.06 seconds):

anisora_v1_demo_gh.mp4

AniSora V2 (14B) - Short - NVIDIA H200

Input Image:

panda_input

Generation Settings:

  • Prompt: "a panda eating bamboo, natural lighting, detailed fur"
  • Resolution: 480 × 832
  • Frames: 17 frames @ 8fps
  • Inference steps: 30
  • Guidance scale: 5.0

Output Video (2.1 seconds):

anisora_v2_output_gh.mp4

AniSora V2 (14B) - Long - NVIDIA H200

Input Image:

portrait_input_1

Generation Settings:

  • Prompt: "a woman smiling gently, soft natural lighting, cinematic quality, subtle head movement, flowing hair"
  • Resolution: 480 × 832
  • Frames: 49 frames @ 8fps
  • Inference steps: 30
  • Guidance scale: 5.0

Output Video (6.1 seconds):

anisora_v2_long.mp4
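
The output durations quoted for the three demos follow from dividing the frame count by the frame rate. The helper below is a minimal sketch for checking that arithmetic; `clip_duration_seconds` and the `DEMOS` table are illustrative names, not part of this PR.

```python
def clip_duration_seconds(num_frames: int, fps: int) -> float:
    """Return the playback length of a clip with num_frames frames at fps."""
    if num_frames <= 0 or fps <= 0:
        raise ValueError("num_frames and fps must be positive")
    return num_frames / fps


# (frames, fps) pairs copied from the demo settings in this description.
DEMOS = {
    "V1 (5B)": (81, 16),
    "V2 (14B) short": (17, 8),
    "V2 (14B) long": (49, 8),
}

if __name__ == "__main__":
    for name, (frames, fps) in DEMOS.items():
        print(f"{name}: {clip_duration_seconds(frames, fps):.3f} s")
```

The results (5.0625 s, 2.125 s, 6.125 s) agree with the rounded durations listed with each output video.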

Usage

V1 (5B)

python examples/offline_inference/image_to_video/anisora_image_to_video.py \
  --model IndexTeam/AniSora-v1-i2v-diffusers \
  --image input.png \
  --prompt "anime scene, smooth motion" \
  --height 480 \
  --width 720 \
  --num_frames 81 \
  --guidance_scale 5.0 \
  --num_inference_steps 50 \
  --fps 16 \
  --output anisora_v1.mp4

V2/V3 (14B)

python examples/offline_inference/image_to_video/anisora_v2_image_to_video.py \
  --image input.png \
  --prompt "anime scene, high quality animation" \
  --height 480 \
  --width 832 \
  --num-frames 49 \
  --guidance-scale 5.0 \
  --num-inference-steps 30 \
  --fps 8 \
  --output anisora_v2.mp4

Changes

New Files

  • vllm_omni/diffusion/models/anisora/ - AniSora pipeline module
    • pipeline_anisora_i2v_cogvideox.py - V1 (5B) CogVideoX-based pipeline
    • pipeline_anisora_v2_i2v.py - V2/V3 (14B) Wan2.1-based pipeline with hybrid loading
    • __init__.py - Module exports
  • examples/offline_inference/image_to_video/anisora_image_to_video.py - V1 CLI example
  • examples/offline_inference/image_to_video/anisora_v2_image_to_video.py - V2 CLI example

Modified Files

  • examples/offline_inference/image_to_video/README.md - Added AniSora documentation
  • vllm_omni/diffusion/registry.py - Registered the AniSora V1/V2 pipelines and their pre-/post-process hooks

Technical Notes

V2 Hybrid Loading

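The V2 pipeline uses a hybrid loading approach because the community-converted AniSora weights use a different config layout and parameter naming than the diffusers format, so they cannot be loaded as a complete diffusers checkpoint: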

  • VAE, T5 text encoder, CLIP image encoder loaded from Wan-AI/Wan2.1-I2V-14B-480P-Diffusers
  • Transformer weights loaded from community AniSora checkpoints
  • Includes comprehensive key name conversion (AniSora → diffusers format)
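
As a rough illustration of the split described above, the sketch below records which repository each component would come from and whether its keys need conversion. It is not the PR's loader: the component labels (`vae`, `text_encoder`, `image_encoder`, `transformer`), `ComponentSource`, `LOAD_PLAN`, and `source_for` are hypothetical names, and the AniSora repository ID is taken from the Supported Models table.

```python
from dataclasses import dataclass

# Repository IDs as they appear in this PR description.
BASE_REPO = "Wan-AI/Wan2.1-I2V-14B-480P-Diffusers"
ANISORA_REPO = "aardsoul-music/Wan2.1-Anisora-14B"


@dataclass(frozen=True)
class ComponentSource:
    repo: str                   # where the weights are fetched from
    needs_key_conversion: bool  # True if keys must be renamed to diffusers format


# Hypothetical component labels; the actual pipeline may name them differently.
LOAD_PLAN = {
    "vae": ComponentSource(BASE_REPO, False),
    "text_encoder": ComponentSource(BASE_REPO, False),   # T5
    "image_encoder": ComponentSource(BASE_REPO, False),  # CLIP
    "transformer": ComponentSource(ANISORA_REPO, True),
}


def source_for(component: str) -> ComponentSource:
    """Look up where a pipeline component should be loaded from."""
    try:
        return LOAD_PLAN[component]
    except KeyError:
        raise ValueError(f"unknown component: {component!r}") from None
```

Keeping the plan in one table makes it explicit that only the transformer comes from the community checkpoint and goes through key renaming.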

Key Name Conversions

Community AniSora weights use different naming conventions:

  • self_attn → attn1
  • cross_attn → attn2
  • ffn → ff
  • k → to_k, q → to_q, v → to_v, o → to_out.0
  • modulation → scale_shift_table
  • And additional mappings for full compatibility
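
The renames listed above can be sketched as an ordered set of regular-expression rules applied to each state-dict key. This is a minimal sketch covering only the listed mappings, not the additional ones the PR mentions; `RENAME_RULES`, `convert_key`, and `convert_state_dict` are hypothetical names, and the sample keys in the usage note are assumed examples of the community naming.

```python
import re
from typing import Any, Dict

# Ordered rules: module renames run first so the q/k/v/o rules can anchor on
# the already-renamed attn1/attn2 prefix. Covers only the listed mappings.
RENAME_RULES = [
    (re.compile(r"\bself_attn\b"), "attn1"),
    (re.compile(r"\bcross_attn\b"), "attn2"),
    (re.compile(r"\bffn\b"), "ff"),
    (re.compile(r"\b(attn[12])\.([qkv])\b"), r"\1.to_\2"),
    (re.compile(r"\b(attn[12])\.o\b"), r"\1.to_out.0"),
    (re.compile(r"\bmodulation\b"), "scale_shift_table"),
]


def convert_key(key: str) -> str:
    """Apply every rename rule, in order, to a single parameter name."""
    for pattern, replacement in RENAME_RULES:
        key = pattern.sub(replacement, key)
    return key


def convert_state_dict(state_dict: Dict[str, Any]) -> Dict[str, Any]:
    """Return a new dict with renamed keys; fail loudly on collisions."""
    converted: Dict[str, Any] = {}
    for old_key, value in state_dict.items():
        new_key = convert_key(old_key)
        if new_key in converted:
            raise KeyError(f"duplicate key after conversion: {new_key}")
        converted[new_key] = value
    return converted
```

For example, under these rules `blocks.0.self_attn.q.weight` becomes `blocks.0.attn1.to_q.weight` and `blocks.0.cross_attn.o.bias` becomes `blocks.0.attn2.to_out.0.bias`. Keys that match no rule pass through unchanged.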

Testing

| Model | GPU | Result |
| --- | --- | --- |
| V1 (5B) | RTX 6000 (~24GB) | ✅ Generates valid video with motion |
| V2 (14B) | NVIDIA H200 (~140GB) | ✅ Generates valid video with motion |

Both pipelines produce output with proper animation.
