[Model] Add Index-AniSora I2V support (V1 5B + V2 14B) #877
Open
dorhuri123 wants to merge 1 commit into
Summary
This PR adds support for the Index-AniSora Image-to-Video models, a family of anime-optimized video generation models developed by Bilibili. It supports both the 5B (CogVideoX-based) and 14B (Wan2.1-based) variants.
Closes #670
Supported Models
IndexTeam/AniSora-v1-i2v-diffusers
aardsoul-music/Wan2.1-Anisora-14B
Demo Results
AniSora V1 (5B) - RTX 6000
Input Image:
Generation Settings:
Prompt: "A cat playing with yarn"
Output Video (5.06 seconds):
anisora_v1_demo_gh.mp4
AniSora V2 (14B) - Short - NVIDIA H200
Input Image:
Generation Settings:
Prompt: "a panda eating bamboo, natural lighting, detailed fur"
Output Video (2.1 seconds):
anisora_v2_output_gh.mp4
AniSora V2 (14B) - Long - NVIDIA H200
Input Image:
Generation Settings:
Prompt: "a woman smiling gently, soft natural lighting, cinematic quality, subtle head movement, flowing hair"
Output Video (6.1 seconds):
anisora_v2_long.mp4
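As a sanity check on the durations listed above, clip length is roughly the frame count divided by the frame rate. The following is a minimal sketch, assuming the exported video plays every generated frame at the requested fps, and using the frame counts and fps values from the Usage commands in this PR (81 frames at 16 fps for V1, 49 frames at 8 fps for V2). The helper name `clip_duration_seconds` is hypothetical and not part of the PR.

```python
def clip_duration_seconds(num_frames: int, fps: int) -> float:
    """Return the approximate clip length in seconds.

    Assumes every generated frame is written to the output video
    and played back at exactly `fps` frames per second.
    """
    if num_frames < 0:
        raise ValueError("num_frames must be non-negative")
    if fps <= 0:
        raise ValueError("fps must be positive")
    return num_frames / fps


# Settings taken from the Usage commands in this PR.
v1_seconds = clip_duration_seconds(num_frames=81, fps=16)  # 5.0625
v2_seconds = clip_duration_seconds(num_frames=49, fps=8)   # 6.125

print(f"V1 (81 frames @ 16 fps): {v1_seconds:.2f} s")
print(f"V2 (49 frames @ 8 fps): {v2_seconds:.1f} s")
```

These values line up with the 5.06 second V1 demo and the 6.1 second V2 long demo, which suggests those demos used the same frame counts and frame rates as the Usage commands.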
Usage
V1 (5B)
python examples/offline_inference/image_to_video/anisora_image_to_video.py \
  --model IndexTeam/AniSora-v1-i2v-diffusers \
  --image input.png \
  --prompt "anime scene, smooth motion" \
  --height 480 \
  --width 720 \
  --num_frames 81 \
  --guidance_scale 5.0 \
  --num_inference_steps 50 \
  --fps 16 \
  --output anisora_v1.mp4
V2/V3 (14B)
python examples/offline_inference/image_to_video/anisora_v2_image_to_video.py \
  --image input.png \
  --prompt "anime scene, high quality animation" \
  --height 480 \
  --width 832 \
  --num-frames 49 \
  --guidance-scale 5.0 \
  --num-inference-steps 30 \
  --fps 8 \
  --output anisora_v2.mp4
Changes
New Files
vllm_omni/diffusion/models/anisora/ - AniSora pipeline module
  pipeline_anisora_i2v_cogvideox.py - V1 (5B) CogVideoX-based pipeline
  pipeline_anisora_v2_i2v.py - V2/V3 (14B) Wan2.1-based pipeline with hybrid loading
  __init__.py - Module exports
examples/offline_inference/image_to_video/anisora_image_to_video.py - V1 CLI example
examples/offline_inference/image_to_video/anisora_v2_image_to_video.py - V2 CLI example
Modified Files
examples/offline_inference/image_to_video/README.md - Added AniSora documentation
vllm_omni/diffusion/registry.py - Register AniSora V1/V2 pipelines and their post-/pre-process hooks
Technical Notes
V2 Hybrid Loading
The V2 pipeline uses a hybrid loading approach because community-converted AniSora weights use different config/naming:
Wan-AI/Wan2.1-I2V-14B-480P-Diffusers
Key Name Conversions
Community AniSora weights use different naming conventions:
self_attn → attn1
cross_attn → attn2
ffn → ff
k → to_k, q → to_q, v → to_v, o → to_out.0
modulation → scale_shift_table
Testing
Both pipelines run end to end and produce animated output, as shown in the demo results above.
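The renames listed under Key Name Conversions can be sketched as a small state-dict key converter. This is an illustrative sketch only, not the code in `pipeline_anisora_v2_i2v.py`: the function names `convert_key` and `convert_state_dict` and the example key paths (such as `blocks.0.self_attn.q.weight`) are assumptions, and the actual pipeline may handle further keys or nested layers that this PR description does not list. The sketch renames whole dotted path components, so that a short name like `k` is only rewritten when it sits directly under an attention module.

```python
from typing import Any, Dict

# Module-level renames from the PR description.
COMPONENT_RENAMES = {
    "self_attn": "attn1",
    "cross_attn": "attn2",
    "ffn": "ff",
    "modulation": "scale_shift_table",
}

# Projection renames, applied only directly under an attention module.
ATTN_PROJ_RENAMES = {
    "k": "to_k",
    "q": "to_q",
    "v": "to_v",
    "o": "to_out.0",
}

ATTN_MODULES = ("self_attn", "cross_attn")


def convert_key(key: str) -> str:
    """Rename one checkpoint key, component by component (hypothetical helper)."""
    parts = key.split(".")
    converted = []
    for i, part in enumerate(parts):
        parent = parts[i - 1] if i > 0 else ""
        if part in COMPONENT_RENAMES:
            converted.append(COMPONENT_RENAMES[part])
        elif parent in ATTN_MODULES and part in ATTN_PROJ_RENAMES:
            converted.append(ATTN_PROJ_RENAMES[part])
        else:
            converted.append(part)
    return ".".join(converted)


def convert_state_dict(state_dict: Dict[str, Any]) -> Dict[str, Any]:
    """Return a new dict with every key renamed; values are left untouched."""
    return {convert_key(k): v for k, v in state_dict.items()}


if __name__ == "__main__":
    # Assumed example keys, for illustration only.
    for name in (
        "blocks.0.self_attn.q.weight",
        "blocks.0.cross_attn.o.bias",
        "blocks.0.modulation",
    ):
        print(name, "->", convert_key(name))
```

Matching on full path components rather than using plain substring replacement avoids accidentally rewriting unrelated keys that merely contain the letters `k`, `q`, `v`, or `o`.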