
[Bugfix] Fix RIFE device selection for CPU-transported videos #2876

Merged

hsliuustc0106 merged 1 commit into vllm-project:main from david6666666:codex/fix-rife-device-main on Apr 17, 2026

Conversation

@david6666666 (Collaborator)

Purpose

Fix a regression in RIFE frame interpolation device selection for video generation.

Before this patch, FrameInterpolator always preferred video.device when choosing where to load the RIFE model. In the diffusion serving architecture, the decoded video tensor can already be on CPU because of transport/offload state rather than because CPU execution was intended. In that case, RIFE was incorrectly loaded on CPU even when the active execution platform was CUDA/NPU.

This patch changes the selection rule to:

  • keep using video.device when the tensor is already on a non-CPU device
  • fall back to _select_torch_device() when the tensor is on CPU

That preserves accelerator execution for CPU-transported tensors while keeping CPU-only environments working as before.
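
In code, the new rule amounts to the following (a minimal sketch for illustration: _select_torch_device() is the helper named above, stubbed here, and the function name _resolve_rife_device is hypothetical rather than the actual FrameInterpolator code):

```python
import torch


def _select_torch_device() -> torch.device:
    # Stand-in for the platform helper named in the patch; here it
    # simply prefers CUDA when available and falls back to CPU.
    return torch.device("cuda") if torch.cuda.is_available() else torch.device("cpu")


def _resolve_rife_device(video: torch.Tensor) -> torch.device:
    # A non-CPU tensor device is trusted as the intended execution device.
    if video.device.type != "cpu":
        return video.device
    # A CPU tensor may only reflect transport/offload state, so defer to
    # the platform's device selection instead of loading RIFE on CPU.
    return _select_torch_device()
```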

Test Plan

  1. Unit test the frame interpolator device-selection behavior.
  2. Run full pre-commit checks.
  3. Run an e2e sync video request against vllm serve --omni with frame interpolation enabled and num_inference_steps=1 to verify the runtime device in logs.

Commands:

```bash
pytest -q tests/entrypoints/openai_api/test_video_api_utils.py
pre-commit run --all-files
```
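
For reference, a minimal sketch of the shape of the device-selection unit test (the test name and the stand-in helper are illustrative, not the actual contents of test_video_api_utils.py):

```python
import torch


def _resolve_rife_device(video, platform_device):
    # Stand-in mirroring the selection rule sketched in the description.
    return video.device if video.device.type != "cpu" else platform_device


def test_cpu_transported_video_prefers_platform_device():
    # Decoded frames arrive on CPU for transport reasons while the active
    # platform is CUDA: RIFE should still be loaded on the CUDA device.
    video = torch.zeros(1, 3, 8, 8)
    assert _resolve_rife_device(video, torch.device("cuda:0")) == torch.device("cuda:0")
```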

E2E serve command used:

```bash
source /mnt/data4/cwq/.venv/bin/activate
export PYTHONPATH=/mnt/data4/cwq/worktree/codex-fix-rife-device-main
export CUDA_VISIBLE_DEVICES=0
export VLLM_OMNI_STORAGE_PATH=/mnt/data4/cwq/tmp/storage-rife-e2e-main
python -m vllm_omni.entrypoints.cli.main serve \
  /mnt/data1/huggingface/hub/models--Wan-AI--Wan2.2-T2V-A14B-Diffusers/snapshots/5be7df9619b54f4e2667b2755bc6a756675b5cd7 \
  --omni --host 127.0.0.1 --port 18091
```

E2E request used:

```bash
curl -sS -D /mnt/data4/cwq/tmp/rife_e2e_main.headers \
  -o /mnt/data4/cwq/tmp/rife_e2e_main.mp4 \
  -X POST http://127.0.0.1:18091/v1/videos/sync \
  -F 'prompt=A small red ball rolling on a wooden table' \
  -F 'width=256' \
  -F 'height=256' \
  -F 'num_frames=5' \
  -F 'fps=4' \
  -F 'num_inference_steps=1' \
  -F 'guidance_scale=1.0' \
  -F 'guidance_scale_2=1.0' \
  -F 'boundary_ratio=0.875' \
  -F 'flow_shift=5.0' \
  -F 'enable_frame_interpolation=true' \
  -F 'frame_interpolation_exp=1' \
  -F 'frame_interpolation_scale=1.0' \
  -F 'frame_interpolation_model_path=/mnt/data1/huggingface/hub/models--elfgum--RIFE-4.22.lite/snapshots/99d6892a9f4c039cb37ff21c9530e79b13f0b30b' \
  -F 'seed=42'
```

Test Result

Unit test:

```console
$ pytest -q tests/entrypoints/openai_api/test_video_api_utils.py
4 passed
```

Pre-commit:

```console
$ pre-commit run --all-files
Passed
```

E2E:

  • POST /v1/videos/sync returned 200 OK
  • output file /mnt/data4/cwq/tmp/rife_e2e_main.mp4 was generated successfully
  • response header contained x-inference-time-s: 0.760
  • service log showed:

```text
Loaded RIFE weights from /mnt/data1/huggingface/hub/models--elfgum--RIFE-4.22.lite/snapshots/99d6892a9f4c039cb37ff21c9530e79b13f0b30b/flownet.pkl
RIFE model loaded on device: cuda
"POST /v1/videos/sync HTTP/1.1" 200 OK
```

@chatgpt-codex-connector

Codex usage limits have been reached for code reviews. Please check with the admins of this repo to increase the limits by adding credits.
Credits must be used to enable repository-wide code reviews.

Signed-off-by: david6666666 <530634352@qq.com>
@david6666666 force-pushed the codex/fix-rife-device-main branch from d58bd2a to ecbb6d4 on April 17, 2026 08:23
@hsliuustc0106 (Collaborator) left a comment

lgtm

@hsliuustc0106 (Collaborator)

The fix looks correct. However, a few concerns:

  1. Test coverage is narrow: the unit test only covers a CPU tensor with a CUDA platform. Consider adding tests for the cases below (a sketch follows after this list):
     • CPU tensor with CPU platform (CPU-only environments)
     • CUDA tensor (verify the unchanged path still works)
     • other accelerators if supported (NPU, XPU)
  2. E2E test is minimal: num_inference_steps=1 and a 256x256 resolution may not expose edge cases. Consider testing with more realistic parameters.
  3. Local test paths: the E2E validation uses hardcoded local paths (/mnt/data4/, /mnt/data1/). Consider making the test more portable or documenting how CI will validate this.
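
A sketch of what such broadened coverage could look like (the helper is a stand-in mirroring the patched rule, not the real FrameInterpolator; the CUDA-tensor case is skipped when no GPU is present):

```python
import pytest
import torch


def _resolve_rife_device(video, platform_device):
    # Stand-in for the patched selection rule.
    return video.device if video.device.type != "cpu" else platform_device


@pytest.mark.parametrize(
    ("platform", "expected"),
    [
        ("cpu", "cpu"),        # CPU tensor, CPU platform (CPU-only environment)
        ("cuda:0", "cuda:0"),  # CPU-transported tensor, CUDA platform
    ],
)
def test_cpu_tensor_follows_platform(platform, expected):
    video = torch.zeros(1, 3, 8, 8)  # always on CPU here
    assert _resolve_rife_device(video, torch.device(platform)) == torch.device(expected)


@pytest.mark.skipif(not torch.cuda.is_available(), reason="needs a CUDA device")
def test_cuda_tensor_keeps_its_device():
    # The unchanged path: a tensor already on an accelerator stays there.
    video = torch.zeros(1, 3, 8, 8, device="cuda:0")
    assert _resolve_rife_device(video, torch.device("cuda:0")) == video.device
```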

@hsliuustc0106 hsliuustc0106 merged commit b7f2398 into vllm-project:main Apr 17, 2026
8 checks passed
lvliang-intel pushed a commit to lvliang-intel/vllm-omni that referenced this pull request Apr 20, 2026
david6666666 added a commit that referenced this pull request Apr 20, 2026
 #2877 (#2878)

Signed-off-by: david6666666 <530634352@qq.com>
Signed-off-by: David Chen <530634352@qq.com>
Signed-off-by: WeiQing Chen <40507679+david6666666@users.noreply.github.com>
lengrongfu pushed a commit to lengrongfu/vllm-omni that referenced this pull request May 1, 2026
clodaghwalsh17 pushed a commit to clodaghwalsh17/nm-vllm-omni-ent that referenced this pull request May 12, 2026

Labels

ready (label to trigger buildkite CI)
