vllm-project · gDINESH13 · Feb 8, 2026 · Feb 9, 2026
@@ -0,0 +1,144 @@
+# SkyReels-V3 Offline Inference Examples
+
+This directory contains examples for using the SkyReels-V3 multimodal video generation models with vLLM-Omni.
+
+## Models
+
+SkyReels-V3 is a family of multimodal video generation models that support:
+
+- **Image-to-Video (R2V)**: Generate videos from reference images
+- **Video-to-Video (V2V)**: Transform existing videos
+- **Audio-to-Video (A2V)**: Generate videos guided by audio
+
+### Available Models
+
+- `Skywork/SkyReels-V3-R2V-14B`: Image-to-Video (14B parameters)
+- `Skywork/SkyReels-V3-V2V-14B`: Video-to-Video (14B parameters)
+- `Skywork/SkyReels-V3-A2V-19B`: Audio-to-Video (19B parameters)
+
+## Installation
+
+Install the required dependencies:
+
+```bash
+pip install vllm-omni
+pip install imageio imageio-ffmpeg  # For video I/O
+```
+
+## Usage
+
+### Image-to-Video (R2V)
+
+Generate a video from a reference image:
+
+```bash
+python image_to_video.py \
+    --model Skywork/SkyReels-V3-R2V-14B \
+    --image path/to/your/image.jpg \
+    --prompt "A person walking through a beautiful garden" \
+    --height 480 \
+    --width 832 \
+    --num-frames 81 \
+    --num-inference-steps 50 \
+    --guidance-scale 7.5 \
+    --seed 42 \
+    --output-dir ./outputs/skyreels_v3 \
+    --output-format mp4
+```
+
+### Parameters
+
+- `--model`: Model name or path (default: `Skywork/SkyReels-V3-R2V-14B`)
+- `--image`: Path to the reference image (required)
+- `--prompt`: Text prompt describing the desired video (default: `A cinematic video`)
+- `--negative-prompt`: Negative prompt to avoid certain content (optional)
+- `--height`: Video height in pixels (default: 480)
+- `--width`: Video width in pixels (default: 832)
+- `--num-frames`: Number of frames to generate (default: 81)
+- `--num-inference-steps`: Number of denoising steps (default: 50, higher = better quality but slower)
+- `--guidance-scale`: Classifier-free guidance scale (default: 7.5, higher = more prompt adherence)
+- `--seed`: Random seed for reproducibility (default: 42)
+- `--output-dir`: Output directory for generated videos (default: `./outputs/skyreels_v3`)
+- `--output-format`: Output format: `mp4`, `gif`, or `frames` (default: `mp4`)
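
To make the effect of `--output-dir` and `--output-format` concrete, the following minimal sketch reproduces the naming scheme that `image_to_video.py` uses when saving results. The helper name `output_location` is hypothetical and exists only for illustration; the file and directory patterns are taken from the script.

```python
from pathlib import Path


def output_location(output_dir: str, idx: int, output_format: str) -> Path:
    """Return where the result for request `idx` is written (illustrative helper)."""
    base = Path(output_dir)
    if output_format in ("mp4", "gif"):
        # Single file per request, e.g. video_0000.mp4 or video_0000.gif
        return base / f"video_{idx:04d}.{output_format}"
    if output_format == "frames":
        # A directory per request; PNGs inside are named frame_0000.png, frame_0001.png, ...
        return base / f"video_{idx:04d}_frames"
    raise ValueError(f"Unsupported output format: {output_format}")


for fmt in ("mp4", "gif", "frames"):
    print(fmt, "->", output_location("./outputs/skyreels_v3", 0, fmt))
```
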
+
+## Examples
+
+### Basic Image-to-Video
+
+```bash
+python image_to_video.py \
+    --image examples/sample_image.jpg \
+    --prompt "A cinematic video of the scene"
+```
+
+### High-Quality Generation
+
+```bash
+python image_to_video.py \
+    --image examples/sample_image.jpg \
+    --prompt "A dramatic video with dynamic camera movement" \
+    --num-inference-steps 100 \
+    --guidance-scale 9.0 \
+    --num-frames 121
+```
+
+### Generate GIF
+
+```bash
+python image_to_video.py \
+    --image examples/sample_image.jpg \
+    --prompt "A looping animation" \
+    --output-format gif \
+    --num-frames 49
+```
+
+## Tips
+
+1. **Image Quality**: Use high-quality reference images for best results
+2. **Aspect Ratio**: The model works best at roughly 16:9 (e.g., 832x480, which is 26:15)
+3. **Frame Count**: More frames = longer videos but slower generation
+4. **Guidance Scale**:
+   - Lower (3-5): More creative, less adherence to prompt
+   - Medium (7-9): Balanced
+   - Higher (10+): Strong prompt adherence, may reduce quality
+5. **Inference Steps**: 50 steps is usually sufficient; 100+ for highest quality
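
The frame-count and aspect-ratio tips can be checked with a little arithmetic. This sketch assumes the playback rates hard-coded in `image_to_video.py` (24 fps for MP4, 12 fps for GIF); the helper names are hypothetical.

```python
from fractions import Fraction


def clip_seconds(num_frames: int, fps: int) -> float:
    """Playback length of a clip with `num_frames` frames at `fps` frames per second."""
    return num_frames / fps


def aspect_ratio(width: int, height: int) -> Fraction:
    """Exact width:height ratio in lowest terms."""
    return Fraction(width, height)


print(clip_seconds(81, 24))     # default MP4 settings: 3.375 seconds
print(clip_seconds(49, 12))     # GIF example: a little over 4 seconds
print(aspect_ratio(832, 480))   # 26/15, slightly narrower than 16/9
```
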
+
+## Performance
+
+- **GPU Memory**: ~24GB VRAM required for the R2V-14B model
+- **Generation Time**: ~2-5 minutes for 81 frames on an A100 GPU
+- **Batch Size**: Currently supports batch size of 1
+
+## Troubleshooting
+
+### Out of Memory
+
+If you encounter OOM errors:
+- Reduce `--num-frames`
+- Reduce `--height` and `--width`
+- Use a smaller model variant if available
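
When reducing resolution and frame count together, it can help to estimate how much work is being removed. The sketch below is a rough heuristic, not a measured memory model: it uses total pixel count (width x height x frames) as a proxy for cost, and it assumes dimensions should stay multiples of 16, which is an assumption not stated in this README. Both helper names are hypothetical.

```python
def scale_resolution(width: int, height: int, factor: float, multiple: int = 16):
    """Scale both dimensions by `factor`, rounding down to a multiple of `multiple`."""
    if not 0 < factor <= 1:
        raise ValueError("factor must be in (0, 1]")

    def snap(value: int) -> int:
        # Round down to the nearest multiple, but never below one multiple.
        return max(multiple, int(value * factor) // multiple * multiple)

    return snap(width), snap(height)


def relative_cost(width: int, height: int, num_frames: int,
                  base: tuple = (832, 480, 81)) -> float:
    """Pixel count relative to the default 832x480, 81-frame configuration."""
    base_width, base_height, base_frames = base
    return (width * height * num_frames) / (base_width * base_height * base_frames)


width, height = scale_resolution(832, 480, 0.75)
print(width, height)
print(round(relative_cost(width, height, 49), 3))
```

The resulting values can be passed to `--width`, `--height`, and `--num-frames`.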
+
+### Poor Quality
+
+If the output quality is poor:
+- Increase `--num-inference-steps` (try 75-100)
+- Adjust `--guidance-scale` (try 8-10)
+- Use a higher quality reference image
+- Refine your prompt to be more specific
+
+## Citation
+
+If you use SkyReels-V3 in your research, please cite:
+
+```bibtex
+@article{skyreels2025,
+  title={SkyReels-V3: Multimodal Video Generation with Unified In-Context Learning},
+  author={Skywork Team},
+  journal={arXiv preprint},
+  year={2025}
+}
+```
+
+## License
+
+SkyReels-V3 models are released under the Skywork License. Please refer to the model card on Hugging Face for details.
@@ -0,0 +1,194 @@
+#!/usr/bin/env python3
+# SPDX-License-Identifier: Apache-2.0
+# SPDX-FileCopyrightText: Copyright contributors to the vLLM project
+
+"""
+SkyReels-V3 Image-to-Video (R2V) Offline Inference Example.
+
+This script demonstrates how to use the SkyReels-V3 R2V model to generate
+videos from reference images using the vLLM-Omni framework.
+
+Usage:
+    python image_to_video.py --model Skywork/SkyReels-V3-R2V-14B \
+                             --image path/to/image.jpg \
+                             --prompt "A person walking in the park"
+"""
+
+import argparse
+import os
+from pathlib import Path
+
+from PIL import Image
+
+from vllm_omni.entrypoints.omni_diffusion import OmniDiffusion
+from vllm_omni.inputs.data import OmniDiffusionSamplingParams
+from vllm_omni.outputs import OmniRequestOutput
+
+
+def main():
+    parser = argparse.ArgumentParser(description="SkyReels-V3 Image-to-Video Generation")
+    parser.add_argument(
+        "--model",
+        type=str,
+        default="Skywork/SkyReels-V3-R2V-14B",
+        help="Model name or path (default: Skywork/SkyReels-V3-R2V-14B)",
+    )
+    parser.add_argument(
+        "--image",
+        type=str,
+        required=True,
+        help="Path to the reference image",
+    )
+    parser.add_argument(
+        "--prompt",
+        type=str,
+        default="A cinematic video",
+        help="Text prompt describing the desired video",
+    )
+    parser.add_argument(
+        "--negative-prompt",
+        type=str,
+        default="",
+        help="Negative prompt (optional)",
+    )
+    parser.add_argument(
+        "--height",
+        type=int,
+        default=480,
+        help="Video height (default: 480)",
+    )
+    parser.add_argument(
+        "--width",
+        type=int,
+        default=832,
+        help="Video width (default: 832)",
+    )
+    parser.add_argument(
+        "--num-frames",
+        type=int,
+        default=81,
+        help="Number of frames to generate (default: 81)",
+    )
+    parser.add_argument(
+        "--num-inference-steps",
+        type=int,
+        default=50,
+        help="Number of denoising steps (default: 50)",
+    )
+    parser.add_argument(
+        "--guidance-scale",
+        type=float,
+        default=7.5,
+        help="Guidance scale for classifier-free guidance (default: 7.5)",
+    )
+    parser.add_argument(
+        "--seed",
+        type=int,
+        default=42,
+        help="Random seed for reproducibility (default: 42)",
+    )
+    parser.add_argument(
+        "--output-dir",
+        type=str,
+        default="./outputs/skyreels_v3",
+        help="Output directory for generated videos (default: ./outputs/skyreels_v3)",
+    )
+    parser.add_argument(
+        "--output-format",
+        type=str,
+        default="mp4",
+        choices=["mp4", "gif", "frames"],
+        help="Output format: mp4, gif, or frames (default: mp4)",
+    )
+
+    args = parser.parse_args()
+
+    # Create output directory
+    output_dir = Path(args.output_dir)
+    output_dir.mkdir(parents=True, exist_ok=True)
+
+    # Load reference image
+    if not os.path.exists(args.image):
+        raise FileNotFoundError(f"Image not found: {args.image}")
+
+    image = Image.open(args.image).convert("RGB")
+    print(f"Loaded reference image: {args.image} ({image.size})")
+
+    # Initialize the model
+    print(f"Loading SkyReels-V3 model: {args.model}")
+    model = OmniDiffusion(
+        model=args.model,
+        model_class_name="SkyReelsV3R2VPipeline",
+        trust_remote_code=True,
+    )
+
+    # Prepare the request
+    print(f"\nGenerating video with prompt: '{args.prompt}'")
+    print("Parameters:")
+    print(f"  - Resolution: {args.width}x{args.height}")
+    print(f"  - Frames: {args.num_frames}")
+    print(f"  - Steps: {args.num_inference_steps}")
+    print(f"  - Guidance Scale: {args.guidance_scale}")
+    print(f"  - Seed: {args.seed}")
+
+    # Generate video
+    outputs = model.generate(
+        prompts=[
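+            {
+                "prompt": args.prompt,
+                # Forward --negative-prompt, which is otherwise parsed but never used.
+                # Assumption: the prompt dict accepts a "negative_prompt" key.
+                "negative_prompt": args.negative_prompt,
+                "multi_modal_data": {"image": image},
+            }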
+        ],
+        sampling_params=OmniDiffusionSamplingParams(
+            height=args.height,
+            width=args.width,
+            num_frames=args.num_frames,
+            num_inference_steps=args.num_inference_steps,
+            guidance_scale=args.guidance_scale,
+            seed=args.seed,
+        ),
+    )
+
+    # Save the generated video
+    for idx, output in enumerate(outputs):
+        # Extract video frames from OmniRequestOutput
+        if not isinstance(output, OmniRequestOutput):
+            raise TypeError(f"Unexpected output type: {type(output)}")
+        # In diffusion mode, output.images is the full list of frames
+        video_frames = getattr(output, "images", None)
+        if not video_frames:
+            raise ValueError("No video data found in diffusion output.")
+
+        if args.output_format == "mp4":
+            output_path = output_dir / f"video_{idx:04d}.mp4"
+            # Save as MP4 video
+            import imageio
+
+            imageio.mimsave(output_path, video_frames, fps=24, codec="libx264")
+            print(f"\nSaved video to: {output_path}")
+
+        elif args.output_format == "gif":
+            output_path = output_dir / f"video_{idx:04d}.gif"
+            # Save as GIF
+            import imageio
+
+            imageio.mimsave(output_path, video_frames, fps=12)
+            print(f"\nSaved GIF to: {output_path}")
+
+        else:  # frames
+            frames_dir = output_dir / f"video_{idx:04d}_frames"
+            frames_dir.mkdir(exist_ok=True)
+            # Save individual frames
+            for frame_idx, frame in enumerate(video_frames):
+                frame_path = frames_dir / f"frame_{frame_idx:04d}.png"
+                Image.fromarray(frame).save(frame_path)
+            print(f"\nSaved {len(video_frames)} frames to: {frames_dir}")
+
+    print("\nGeneration complete!")
+
+
+if __name__ == "__main__":
+    main()
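
The script above accepts any integer for `--height`, `--width`, `--num-frames`, and `--num-inference-steps`, so zero or negative values only fail later inside the pipeline. One possible way to reject them at parse time is a custom argparse type. This is a sketch, not part of the PR; `positive_int` and `build_parser` are hypothetical names.

```python
import argparse


def positive_int(value: str) -> int:
    """argparse type that accepts only integers greater than zero."""
    number = int(value)  # a ValueError here is reported by argparse as an invalid value
    if number <= 0:
        raise argparse.ArgumentTypeError(f"expected a positive integer, got {value!r}")
    return number


def build_parser() -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser(description="Argument validation sketch")
    # Defaults mirror those in image_to_video.py
    parser.add_argument("--height", type=positive_int, default=480)
    parser.add_argument("--width", type=positive_int, default=832)
    parser.add_argument("--num-frames", type=positive_int, default=81)
    parser.add_argument("--num-inference-steps", type=positive_int, default=50)
    return parser


args = build_parser().parse_args(["--num-frames", "49"])
print(args.width, args.height, args.num_frames, args.num_inference_steps)
```

With this in place, an invocation such as `--height 0` exits with a usage error before the model is loaded.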
@@ -0,0 +1,15 @@
+"""SkyReels-V3 multimodal video generation models."""
+
+from .pipeline_skyreels_v3_r2v import (
+    SkyReelsV3R2VPipeline,
+    get_skyreels_v3_r2v_post_process_func,
+    get_skyreels_v3_r2v_pre_process_func,
+)
+from .skyreels_v3_transformer import SkyReelsTransformer3DModel
+
+__all__ = [
+    "SkyReelsV3R2VPipeline",
+    "get_skyreels_v3_r2v_post_process_func",
+    "get_skyreels_v3_r2v_pre_process_func",
+    "SkyReelsTransformer3DModel",
+]